In [1]:
import pylena

# available objects:
- **LenaFile**  (the whole lena5min.csv file)
- **LenaRange** (a range within that whole LenaFile)
- **LenaRow**   (a single row within that csv file)

# class LenaFile

The **LenaFile** class is the main top-level class you'll be working with. To create a LenaFile object, pass it the path to a lena5min.csv file. For example:

In [2]:
lena_file = pylena.LenaFile("93_07_lena5min.csv")

Each LenaFile exposes some file-wide fields as attributes, for example:

In [3]:
print "child_key:   " + lena_file.child_key
print "birth_date:  " + lena_file.birth_date
print "age:         " + lena_file.age 
print "sex:         " + lena_file.sex

child_key:   46934939900000E7A
birth_date:  2014-05-29
age:         7m; 14d
sex:         M


# LenaFile.get_range(begin=0, end=0)

This returns a **LenaRange** object, which represents a range of rows (from "begin" up to, but not including, "end") within a lena5min csv file. The begin and end arguments are the indices of the rows within the csv file.

In [4]:
rows_1_to_7 = lena_file.get_range(begin=1, end=7) # from rows 1 to 7 (does not include row 7)
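
The half-open convention (begin included, end excluded) is the same one Python list slicing uses. The sketch below illustrates it on a plain list; the `get_range` helper and the integer "rows" are hypothetical stand-ins for illustration, not pylena's actual implementation.

```python
# Hypothetical stand-in for the rows of a lena5min.csv file: plain integers
# play the role of row indices. This is not pylena's implementation.
rows = list(range(10))

def get_range(rows, begin=0, end=0):
    """Return rows[begin:end]; begin is included, end is excluded."""
    return rows[begin:end]

selected = get_range(rows, begin=1, end=7)
print(selected)       # [1, 2, 3, 4, 5, 6]
print(len(selected))  # 6
```

Asking for rows 1 to 7 therefore yields six rows, not seven.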

# class LenaRange

In the previous example, the result (variable rows_1_to_7) was a **LenaRange** object. A LenaRange has a number of methods available, most of them mirroring the LenaFile methods of the same name.

# LenaRange.sum(*keys)

The sum() method returns the sum of all the values corresponding to the given key or keys. In the examples below, we first get the sum of all the "CTC" values in the LenaRange.

possible key values:
 - "ctc"
 - "cvc"
 - "awc"
 
 
## single key

Providing a single key will calculate the sum across that range for that single key, for example:
 

In [5]:
print rows_1_to_7.sum("ctc")

22


## multiple keys

Providing multiple keys will calculate the combined sum across that LenaRange for all the keys provided, for example:

In [6]:
print rows_1_to_7.sum("ctc", "cvc")

76
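
As a rough sketch of what a combined sum means, the snippet below adds up the values for every requested key over every row. The `sum_keys` helper and the row values are illustrative assumptions, as is the mapping from a key such as "ctc" to a `ctc_actual` field (borrowed from the LenaRow attribute names); pylena may do this differently internally.

```python
# Illustrative rows only; the "<key>_actual" naming is an assumption based on
# the LenaRow attributes ctc_actual, cvc_actual and awc_actual.
rows = [
    {"ctc_actual": 6, "cvc_actual": 15, "awc_actual": 106},
    {"ctc_actual": 2, "cvc_actual": 4, "awc_actual": 82},
    {"ctc_actual": 5, "cvc_actual": 9, "awc_actual": 104},
]

def sum_keys(rows, *keys):
    """Add up the "<key>_actual" value of every row, for every key given."""
    return sum(row[key + "_actual"] for row in rows for key in keys)

print(sum_keys(rows, "ctc"))         # 13
print(sum_keys(rows, "ctc", "cvc"))  # 41
```

The two-key result is simply the "ctc" total plus the "cvc" total (13 + 28).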


# LenaRange.total_time(begin=0, end=None)

The total_time() method returns the total duration of that LenaRange, calculated by summing all the "duration" fields. Most of the rows should be exactly 5 minutes, but sometimes (usually at the beginning) they aren't. If you call total_time() with no arguments, you'll get the total time of the entire range. If you provide begin and end arguments, it'll give you the duration of that subrange. The begin and end refer to the indices of the rows.

In [7]:
print rows_1_to_7.total_time()

0:30:00


In [8]:
print rows_1_to_7.total_time(begin=2, end=5) # from rows 2 to 5 (does not include row 5)

0:15:00
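
To make the calculation concrete, here is a minimal sketch of summing "HH:MM:SS" duration strings with the standard library's `datetime.timedelta`. The `parse_duration` and `total_time` helpers and the sample durations are hypothetical; they mimic the behavior described above and are not taken from pylena.

```python
from datetime import timedelta

def parse_duration(text):
    """Turn an "HH:MM:SS" string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def total_time(durations, begin=0, end=None):
    """Sum the durations in durations[begin:end] (end is excluded)."""
    return sum((parse_duration(d) for d in durations[begin:end]), timedelta())

# Illustrative data: one short opening row followed by full 5-minute rows.
durations = ["00:02:14"] + ["00:05:00"] * 6

print(total_time(durations))                  # 0:32:14
print(total_time(durations, begin=1, end=7))  # 0:30:00
print(total_time(durations, begin=2, end=5))  # 0:15:00
```

With `end=None` the slice runs to the last row, which is why calling the helper with no arguments covers the whole list.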


# LenaFile.total_time(begin=0, end=None)

The total_time() method is also available on the LenaFile object, and its arguments behave the same way. For example:


In [9]:
print lena_file.total_time()

15:17:14


In [10]:
print lena_file.total_time(begin=0, end=22) # from rows 0 to 22 (does not include row 22)

1:47:14


# LenaFile.sum(*keys)

The sum() method on a LenaFile behaves the same way as the sum() method on a LenaRange, except the "range" is the entire file, not just a subsample.

# LenaRange.range

You can access the underlying list of **LenaRow** objects by referring to the .range member of the LenaRange object. In the example below, we print the list of LenaRows:

In [11]:
print rows_1_to_7.range


[<pylena.elements.LenaRow object at 0x10488b450>, <pylena.elements.LenaRow object at 0x10488b4d0>, <pylena.elements.LenaRow object at 0x10488b550>, <pylena.elements.LenaRow object at 0x10488b5d0>, <pylena.elements.LenaRow object at 0x10488b650>, <pylena.elements.LenaRow object at 0x10488b6d0>]


# class LenaRow

Each element in a LenaRange is a **LenaRow** (a single row in the lena5min.csv). A LenaRow is the smallest unit of subdivision of a lena5min.csv file. Each LenaRow has a bunch of attributes. In the example below, we loop through the range and print each LenaRow's timestamp:

In [12]:
for element in rows_1_to_7.range:
    print element.timestamp

2015-01-12 08:45
2015-01-12 08:50
2015-01-12 08:55
2015-01-12 09:00
2015-01-12 09:05
2015-01-12 09:10


In this next example we do the same thing, except printing the awc_actual value of each row:

In [13]:
for element in rows_1_to_7.range:
    print element.awc_actual

106
82
104
106
236
214


# all the attributes of LenaRow

Here we're selecting the first element in the `rows_1_to_7.range` list and printing out a dictionary of all its attributes. All of these fields are available to you.

In [14]:
row_1 = rows_1_to_7.range[0]

In [15]:
import pprint
pp = pprint.PrettyPrinter(indent=3)
# ^ ignore this, it's just for printing the dictionary nicely.

pp.pprint(row_1.__dict__)

{  'age': '7m; 14d',
   'ava_stdscore': 102.11,
   'ava_stdscore_percent': 55,
   'awc_actual': 106,
   'birth_date': '2014-05-29',
   'child_key': '46934939900000E7A',
   'ctc_actual': 6,
   'cvc_actual': 15,
   'data_type': '5 Minute',
   'distant': '00:00:45',
   'dlp': '3729',
   'duration': '00:05:00',
   'first_name': '93',
   'id': 'C008',
   'last_name': '',
   'meaningful': '00:00:54',
   'noise': '00:00:00',
   'processing_file': '20150113_123848_003729.its',
   'row_index': 1,
   'sex': 'M',
   'silence': '00:03:16',
   'timestamp': '2015-01-12 08:45',
   'tv': '00:00:05',
   'tv_percent': 2}


^ These are all the values that are in a single LenaRow.

In [16]:
print "duration: " + row_1.duration
print "awc_actual: " + str(row_1.awc_actual) # have to cast int to string here
print "birth_date: " + row_1.birth_date
print "processing_file: " + row_1.processing_file

duration: 00:05:00
awc_actual: 106
birth_date: 2014-05-29
processing_file: 20150113_123848_003729.its
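
Time-valued fields such as duration, meaningful and silence are stored as "HH:MM:SS" strings, so they need converting before you can do arithmetic with them. The sketch below uses a plain dictionary with values copied from the row_1 output above; the `to_seconds` helper is a hypothetical name, not part of pylena.

```python
def to_seconds(text):
    """Convert an "HH:MM:SS" string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# Values copied from the row_1 dictionary printed above.
row = {"duration": "00:05:00", "meaningful": "00:00:54", "silence": "00:03:16"}

total = to_seconds(row["duration"])
# float() keeps the division exact under both Python 2 and Python 3.
meaningful_share = to_seconds(row["meaningful"]) / float(total)
silence_share = to_seconds(row["silence"]) / float(total)

print(meaningful_share)         # 0.18
print(round(silence_share, 3))  # 0.653
```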


# LenaFile.rank_window(window_size=0, *keys)

The rank_window() method of the LenaFile object returns a ranked list of all the subsamples (windows) of window_size rows. You specify which attribute or attributes to rank by with the "keys" parameter.

## single key:

A single key means we're ranking by the average of a single value, for example the "CTC" value:

In [17]:
ranked_ctc = lena_file.rank_window(6, "ctc")

#### top 5

Let's print out the top 5 results:

In [18]:
print ranked_ctc[:5]

[(98, 20.666666666666668), (99, 19.833333333333332), (100, 16.333333333333332), (97, 15.333333333333334), (101, 14.833333333333334)]


## multiple keys:

If you provide more than one key, the ranking uses the average of those key values.


In [19]:
ranked_ctc_cvc = lena_file.rank_window(6, "ctc", "cvc")

In [20]:
print ranked_ctc_cvc[:5]

[(99, 59.833333333333336), (98, 58.0), (102, 54.666666666666664), (100, 51.833333333333336), (101, 49.166666666666664)]
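
One way to picture a window ranking is as a sliding average over consecutive rows, sorted from highest to lowest. The sketch below is written under that assumption: it treats the first element of each tuple as the window's starting row index, which the output above suggests but does not confirm. The standalone `rank_window` function and the sample counts are hypothetical.

```python
def rank_window(values, window_size):
    """Rank every run of window_size consecutive values by its average.

    Returns (start_index, average) tuples, highest average first.
    """
    averages = []
    for start in range(len(values) - window_size + 1):
        window = values[start:start + window_size]
        averages.append((start, sum(window) / float(window_size)))
    # sorted() is stable, so windows with equal averages stay in index order.
    return sorted(averages, key=lambda pair: pair[1], reverse=True)

# Made-up per-row counts, for illustration only.
ctc = [1, 4, 2, 8, 6, 0, 3]

print(rank_window(ctc, 2)[:3])  # [(3, 7.0), (2, 5.0), (1, 3.0)]
```

For several keys, one option under the same assumption would be to add the per-row values together first and rank the combined series.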


# LenaFile.top_rows(n=5, *keys)

The top_rows() method on a LenaFile object returns the top N (default is 5) rows, ranked according to the keys provided. For example:

In [21]:
top_6_rows = lena_file.top_rows(6, "cvc")

print top_6_rows

[(103, 83.0), (107, 48.0), (102, 44.0), (104, 44.0), (84, 38.0), (99, 38.0)]


The result of the top_rows() method is a list of tuples. Each tuple contains the index of the row and the value it was ranked with.
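
The shape of that result can be reproduced with a short sketch: pair each value with its row index, sort by value in descending order, and keep the first n pairs. The standalone `top_rows` function and the counts below are hypothetical, and the tie-breaking by row index is an assumption that happens to match the output above.

```python
def top_rows(values, n=5):
    """Return the n highest values as (row_index, value) tuples."""
    indexed = [(index, float(value)) for index, value in enumerate(values)]
    # sorted() is stable, so tied rows keep their original (index) order.
    return sorted(indexed, key=lambda pair: pair[1], reverse=True)[:n]

# Made-up per-row cvc counts, for illustration only.
cvc = [12, 44, 83, 44, 7, 38]

top_3 = top_rows(cvc, n=3)
print(top_3)                          # [(2, 83.0), (1, 44.0), (3, 44.0)]
print([index for index, _ in top_3])  # [2, 1, 3]
```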

To get the actual rows that these indices refer to, pass a list of indices to the get_rows() method:


# LenaFile.get_rows(rows=[])

In [22]:
# this is a list comprehension. It basically says take the first 
# element of each tuple in the list and make a new list with them.
the_indices = [element[0] for element in top_6_rows] 

print "the_indices: "
print the_indices


top_6_LenaRows = lena_file.get_rows(rows=the_indices)


print "\nthe top_6_LenaRows: "

pp.pprint(top_6_LenaRows)

 
print "\nthe cvc values: \n"

for row in top_6_LenaRows:
    print row.cvc_actual


the_indices: 
[103, 107, 102, 104, 84, 99]

the top_6_LenaRows: 
[  <pylena.elements.LenaRow object at 0x1048b57d0>,
   <pylena.elements.LenaRow object at 0x1048b59d0>,
   <pylena.elements.LenaRow object at 0x1048b5750>,
   <pylena.elements.LenaRow object at 0x1048b5850>,
   <pylena.elements.LenaRow object at 0x1048ace10>,
   <pylena.elements.LenaRow object at 0x1048b55d0>]

the cvc values: 

83
48
44
44
38
38
