### Task 1
- Define a second generator that yields the number of words in each line
- Use charcount(filename) as the input to this generator
- The output print statement should include the line number, character count and word count

Tip: When writing a generator, I like to start with print statements instead of yield.
Once I am satisfied that the printed values are correct, I convert the print to yield.
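
As a small illustration of this workflow, the sketch below writes the same loop twice: first with `print` to check the values by eye, then with `yield`. The names `squares_debug` and `squares` are made up for this example and are not part of the tasks.

```python
def squares_debug(n):
    """Draft version: print each value to check it by eye."""
    for i in range(n):
        print(i, i * i)


def squares(n):
    """Final version: the same loop with print swapped for yield."""
    for i in range(n):
        yield i, i * i


squares_debug(3)           # prints the values immediately
result = list(squares(3))  # the generator produces the same values on demand
```

The only change between the two versions is the last line of the loop body, which is what makes the print-first approach convenient.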

### Task 1 - Solution

In [87]:
def countwords(linearray):
    for lines, ccount in linearray:
        yield ccount, len(lines.split())

In [106]:
for i, l in enumerate(countwords(charcount(filename = 'sometext.txt'))):
    print(f'> line number:{i} char count: {l[0]}, number of words: {l[1]}')

> line number:0 char count: 31, number of words: 8
> line number:1 char count: 38, number of words: 7
> line number:2 char count: 21, number of words: 2
> line number:3 char count: 27, number of words: 4
> line number:4 char count: 39, number of words: 8
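
To try the chaining without a file on disk, here is a minimal self-contained sketch. It assumes that `charcount` yields `(line, character count)` pairs, which is what `countwords` unpacks above; `charcount_demo` is a hypothetical stand-in that reads from any iterable of lines, and the sample text is made up.

```python
import io


def charcount_demo(lines):
    """Assumed stand-in for charcount: yield (line, number of characters)."""
    for line in lines:
        line = line.rstrip('\n')
        yield line, len(line)


def countwords(linearray):
    """Yield (character count, word count) for each (line, count) pair."""
    for line, ccount in linearray:
        yield ccount, len(line.split())


sample = io.StringIO("the quick brown fox\njumps over\n")  # made-up text
counts = list(countwords(charcount_demo(sample)))
for i, (ccount, wcount) in enumerate(counts):
    print(f'> line number:{i} char count: {ccount}, number of words: {wcount}')
```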


### Task 1b - Solution

In [118]:
total = 0
for count, n in enumerate(charcount(filename='WallabyDNAseq.txt'), 1):
    total += n[1]

print(f'Mean length per line {total/count:0.2f}')

FileNotFoundError: [Errno 2] No such file or directory: 'sometextIwrote.txt'

## Generators are great for large datasets that you want to process one line at a time
- Unlike batch-style programming, which can create a large memory footprint, generators iterate over data __lazily__ without loading the entire data source into memory at once.
- __yield__ is not __return__!!
- When a function reaches `return`, it is done for good. A generator stays alive until its values are exhausted.
- A regular function always starts from its first line; a generator resumes where it left off: at __yield__.
- __Limitation__ - with a generator you can only iterate forward. You can't peek ahead or look behind.
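
The points above can be seen in a short sketch. `countdown` is a hypothetical generator written only for this illustration.

```python
def countdown(n):
    """Hypothetical generator used only to illustrate pausing at yield."""
    while n > 0:
        yield n  # execution pauses here and resumes on the next request
        n -= 1


gen = countdown(3)
first = next(gen)   # runs the body up to the first yield
second = next(gen)  # resumes after the yield, not from the top
rest = list(gen)    # consumes whatever is left
again = list(gen)   # an exhausted generator has nothing more to give
```

Because `again` comes back empty, a generator that needs to be traversed twice has to be created twice.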

## Task 2 : Streaming with `yield`
Multiple CSV files stored in a directory contain the x-y position of a swimming zebrafish over time.
<br>__The task:__
1. Loop through each CSV file, read the x and y positions, and find the distance travelled by the fish at each time point.
2. To find the distance travelled between two time points, you need the x and y position of the fish in two consecutive frames.
3. Using these distances, print the time the fish spent at a speed below the threshold.

  <img src="files/fish.png"  width="400" >

In [103]:
import csv
import os


def CSVfileGrabber(dirname):
    """Step 1 : Grab CSV files from a directory """
    for filename in os.listdir(dirname):
        if filename.endswith('.csv'):
            print('Working on: {}'.format(filename[:5]))  # Print name of fish
            yield os.path.join(dirname, filename)


def readxy(filename):
    """Step 2 : read the csv files line by line """
    with open(filename) as f:
        csvreader = csv.reader(f)
        for i, line in enumerate(csvreader):
            # Skip the first 10 lines
            if i < 10:
                continue
            # x and y coordinates
            x = int(line[2])
            y = int(line[3])
            yield (x, y)

In [104]:
dirname = '/Users/seetha/Desktop/Microbetest/ExampleFile/'  # A small sample dataset

# for files in CSVfileGrabber(dirname):
#     print(files)

for files in CSVfileGrabber(dirname):
    numline = 0
    for g in readxy(files):
        numline += 1
    print('Parsed lines from this csv file is {}'.format(numline))

Working on: Fish1
Parsed lines from this csv file is 17
Working on: Fish2
Parsed lines from this csv file is 17


In [105]:
def consecutivexy1(linearray):
    """Step 4: get consecutive xy values"""
    # Here we want two consecutive xy positions to get the speed per frame
    # Make use of the built-in next() function
    for i, line in enumerate(linearray):
        if i == 0:
            prevxy = line
            nextxy = next(linearray)
        else:
            prevxy = nextxy
            nextxy = line
        yield prevxy, nextxy

A nice way to do this is to use itertools (an amazing library for looping through iterators): https://docs.python.org/3/library/itertools.html

In [92]:
from itertools import tee


def consecutivexy2(linearray):
    # tee returns two independent iterators over the same data
    prevxy, nextxy = tee(linearray, 2)
    next(nextxy, None)  # advance the second iterator by one item; the default avoids an error on empty input
    yield from zip(prevxy, nextxy)  # Note here I am using "yield from"
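
To see the pairing on its own, here is a minimal sketch that runs the same `tee` pattern on a small in-memory list. The name `consecutive_pairs` and the sample points are made up for this illustration.

```python
from itertools import tee


def consecutive_pairs(iterable):
    """Yield overlapping pairs: (s0, s1), (s1, s2), ..."""
    first, second = tee(iterable, 2)
    next(second, None)  # advance the second iterator by one item
    yield from zip(first, second)


points = [(0, 0), (3, 4), (6, 8)]  # made-up x-y positions
pairs = list(consecutive_pairs(points))
empty = list(consecutive_pairs([]))  # no pairs, and no error
```

Three points give two pairs, which matches the line counts in this task (17 parsed lines, 16 pairs). On Python 3.10 and later, `itertools.pairwise` provides the same pairing directly.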

#### Sidenote : `yield from`
With `yield from`, we can skip an extra `for` loop

In [93]:
# A simple example to see what yield from does
A = range(5)
B = range(6, 11)

def temp(A, B):  # Without yield from
    for a, b in zip(A, B):
        yield a, b

for i in temp(A, B):
    print(i)
# Two loops!! One inside the generator and one to consume it

(0, 6)
(1, 7)
(2, 8)
(3, 9)
(4, 10)


In [94]:
# Python 3.3 introduced yield from
def yieldfromexample(A, B):
    yield from zip(A, B)
for i in yieldfromexample(A, B):
    print(i)

(0, 6)
(1, 7)
(2, 8)
(3, 9)
(4, 10)
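
`yield from` is also handy when one generator has to pass along the items of several inner iterables in turn. The sketch below is a hypothetical helper, `read_all`, that is not part of the fish pipeline.

```python
def read_all(sources):
    """Hypothetical helper: stream items from several iterables in turn."""
    for source in sources:
        yield from source  # delegate to each inner iterable


merged = list(read_all([range(3), 'ab', [10, 20]]))
```

Without `yield from`, the body would need a second, nested `for` loop to re-yield each item.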


In [95]:
# Just to make sure things are working
for files in CSVfileGrabber(dirname):
    numline = 0
    for x, y in consecutivexy1(readxy(files)):
#         print(x, y)
        numline += 1
    print('Parsed lines from this csv file is {}'.format(numline))

Working on: Fish1
Parsed lines from this csv file is 16
Working on: Fish2
Parsed lines from this csv file is 16


## Write the next parts on your own
- Step 5 : Calculate distance between the two consecutive points
- Step 6 : Put it all together

In [96]:
# Step 5: Calculate euclidean distance
import math


def getdist(xy):
    """  
    Write a generator function that receives
    the previous and next x-y location of the fish 
    and calculates the distance between the two points
    
    Euclidean distance between two points (x1, y1) and (x2, y2) is 
    sqrt((x1-x2)^2 + (y1-y2)^2)
   """

In [97]:
# Step 6: Put it all together
def getframes(dist, threshold, frames_per_sec):
    """
    Count frames with distance below a user-defined threshold and
    complete the print statement given below
    (Hint: use enumerate to find number of frames)
    
    Example:
    Of 16.27 seconds recording time, time spent with speed less than 10 is 12.83 seconds
    """
    
    print('Of {:0.2f} seconds recording time, time spent with speed less than {} is {:0.2f} seconds')

## Task 2 : Solution

In [107]:
import math
def getdist(xy):
    # Calculate euclidean distance
    for prevxy, nextxy in xy:
        # zip lets you iterate over two sequences in parallel
        dist = [(a - b)**2 for a, b in zip(prevxy, nextxy)]
        dist = math.sqrt(sum(dist))
        yield dist

def getframes(dist, threshold=10, frames_per_sec=30):
    dist_count = 0
    # After the loop, i holds the zero-based index of the last distance
    for i, d in enumerate(dist):
        if d < threshold:
            dist_count += 1
    print('Of {:0.3f} seconds recording time, time spent with speed less than {} is {:0.3f} seconds'.format(
        i / frames_per_sec, threshold, dist_count / frames_per_sec))
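
As an alternative sketch for the distance step, Python 3.8 and later provide `math.dist`, which computes the Euclidean distance between two points directly. The name `getdist_alt` and the sample positions are made up for this illustration.

```python
import math


def getdist_alt(xy_pairs):
    """Yield the Euclidean distance for each (previous, next) pair."""
    for prevxy, nextxy in xy_pairs:
        yield math.dist(prevxy, nextxy)


pairs = [((0, 0), (3, 4)), ((3, 4), (3, 4))]  # made-up positions
distances = list(getdist_alt(pairs))
```

It takes the same `(prevxy, nextxy)` pairs as `getdist`, so it could be dropped into the same place in the pipeline.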

In [108]:
# Test your code with larger datasets
dirname = '/Users/seetha/Desktop/Microbetest/Collective/'
for files in CSVfileGrabber(dirname):
    getframes(
        getdist(
            consecutivexy2(
                readxy(files))), threshold=10, frames_per_sec=30)

Working on: Fish1
Of 16.267 seconds recording time, time spent with speed less than 10 is 12.833 seconds
Working on: Fish6
Of 16.267 seconds recording time, time spent with speed less than 10 is 15.133 seconds


### Task 3 - Solution

In [116]:
def words(text):
    return text.startswith('#FIXME')

def myfilter(myfunc, myseq):
    for x in myseq:
        if myfunc(x):
            yield x

In [117]:
for i in myfilter(words, open('FixationTaskToDo.txt')):
    print(i)

#FIXME: Reset all counters when the subject name changes

#FIXME: The program crashes sometimes, possibility after overlaying the video on top of the gray screen

#FIXME: Look into why rewarded fixations appear to fall outside the allowed window.

#FIXME: randomize calibration spot during calibration

#FIXME: run a check to ensure that the monkey does not enter other locations other than the target in the response window

#FIXME: Make sure we align the trials to the screen flip

#FIXME: Randomly cycle through fixation locations which can be interrupted, should be resumed when done

#FIXME: Add feedback on the experimenter's plot on how many trials have been completed
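
For comparison, Python 3's built-in `filter` is also lazy and behaves much like `myfilter`. The sketch below uses a made-up in-memory list instead of the to-do file, and `is_fixme` is a hypothetical name for the predicate.

```python
def is_fixme(text):
    """Return True for lines that start with the #FIXME tag."""
    return text.startswith('#FIXME')


# Made-up lines standing in for the contents of a to-do file
lines = ['#FIXME: reset counters\n', 'x = 1\n', '#FIXME: align trials\n']

matches = list(filter(is_fixme, lines))            # built-in lazy filter
same = [line for line in lines if is_fixme(line)]  # equivalent comprehension
```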

