# Section 2 - Coding

In this section we will load and manipulate an "unconventional" data file. You will write a simple loader for it and then use it to process and query the data.

There is a "section2_data.txt" file attached to this IPython notebook. The data file contains a few rows from a computer vision dataset. Each row corresponds to a frame in a video and contains some metadata and annotations for it.

The rest of the notebook poses some questions about reading and processing this data.

Feel free to use any existing Python library to answer the questions.

In [2]:
import pandas as pd
import json
import re

In [3]:
!head section2_data.txt

{"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]}
{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}
{"_i": 2, "frame": "frame_002.png", "video": "video000", "value": 25, "labels": ["panda", "panda"]}
{"_i": 3, "frame": "frame_003.png", "video": "video000", "value": 28, "labels": ["dog", "dog"]}
{"_i": 4, "frame": "frame_004.png", "video": "video000", "value": 16, "labels": ["cat"]}
{"_i": 5, "frame": "frame_005.png", "video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]}
{"_i": 6, "frame": "frame_006.png", "video": "video000", "value": 35, "labels": ["bird", "dog"]}
{"_i": 7, "frame": "frame_000.png", "video": "video001", "value": 25, "labels": ["dog", "bird"]}
{"_i": 8, "frame": "frame_001.png", "video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]}
{"_i": 9, "frame": "frame_002.png", "video": "video001", "value": 23, "labels": ["panda"]}
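The `head` output shows that every row is a self-contained JSON object (the JSON Lines layout), so the standard library's `json.loads` can parse a row directly. The short sketch below, which reuses the second row copied from the output above, contrasts it with splitting on `', '`, which cuts the `labels` list apart on any row that has more than one label.

```python
import json

# Second row of section2_data.txt, copied from the `head` output above.
row = '{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}'

# json.loads keeps the nested list intact and restores the original types.
parsed = json.loads(row)
print(parsed["value"], parsed["labels"])  # → 33 ['frog', 'dog']

# Splitting on ', ' cuts the labels list in two; the last fragment has no
# ': ' separator, so a key/value split on it fails.
fragments = row.strip("{}").split(", ")
print(fragments[-1])  # → "dog"]
```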


In [5]:
def parse(line):
    """Parse one row of the file into a dictionary.

    Each row is a JSON object, so json.loads also handles nested values such as
    the `labels` list, which splitting on ', ' and ': ' breaks apart.
    """
    return json.loads(line)

# `with` closes the file even if reading fails.
with open('section2_data.txt', 'rt') as geeky_file:
    lines = geeky_file.read().splitlines()

# Skip blank lines, e.g. one left by a trailing newline.
rows = [parse(l) for l in lines if l.strip()]
print(rows[0])

{'_i': 0, 'frame': 'frame_000.png', 'video': 'video000', 'value': 39, 'labels': ['bird']}


In [6]:
d = {}
for line in lines:
    if not line.strip():  # skip blank lines, e.g. after a trailing newline
        continue
    for field in line.strip('{}').split(','):
        key, _, val = field.replace('"', '').strip().partition(':')
        # `labels` is the last field and contains commas itself, so stop here;
        # the labels are extracted separately in the next cell.
        if key == 'labels':
            break
        # Values keep a leading space (e.g. ' 39'), as the outputs below show.
        d.setdefault(key, []).append(val)
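The cell above keeps every value as a string with a leading space and has to stop at `labels` because that field contains commas. A minimal sketch of building the same column-oriented dictionary with `json.loads` instead, assuming `lines` holds the raw rows as in the earlier cells and that every row has the same keys (`rows_to_columns` is a hypothetical helper name). Unlike the cell above, it yields integers for `_i` and `value` and real lists for `labels`.

```python
import json


def rows_to_columns(lines):
    """Turn JSON Lines rows into {column name: [values, ...]}."""
    columns = {}
    for line in lines:
        if not line.strip():  # ignore blank lines such as a trailing newline
            continue
        row = json.loads(line)
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


# Two rows copied from the `head` output, plus a blank line.
sample = [
    '{"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]}',
    '{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}',
    '',
]
print(rows_to_columns(sample)["value"])  # → [39, 33]
```

`pd.DataFrame(rows_to_columns(lines))` would then include a `labels` column of lists, making the separate regex step unnecessary. pandas can also read the file in one call with `pd.read_json('section2_data.txt', lines=True)`.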

In [7]:
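labels = []
for line in lines:
    if not line.strip():  # skip blank lines, which have no label list
        continue
    # Raw string so '\[' is a regex escape, not an (invalid) string escape.
    la = re.search(r'\[.*\]', line)
    labels.append(la.group())
print(labels)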

['["bird"]', '["frog", "dog"]', '["panda", "panda"]', '["dog", "dog"]', '["cat"]', '["bird", "frog", "bird"]', '["bird", "dog"]', '["dog", "bird"]', '["dog", "panda", "bird"]', '["panda"]', '["bird", "cat", "bird"]', '["frog", "cat", "cat"]', '["frog", "cat", "dog"]', '["dog", "panda"]', '["dog", "bird"]', '["frog"]', '["frog"]', '["dog", "frog", "panda"]', '["bird", "panda", "panda"]', '["frog", "frog", "frog"]', '["dog", "dog", "bird"]', '["cat", "dog", "bird"]', '["bird"]', '["bird", "dog", "dog"]', '["cat", "dog", "dog"]', '["panda"]', '["cat", "cat", "panda"]', '["cat", "cat", "dog"]', '["frog"]', '["frog", "bird"]', '["panda", "panda", "bird"]', '["dog", "dog", "frog"]', '["frog", "bird"]', '["bird", "cat"]', '["panda"]', '["panda"]', '["bird"]', '["dog", "panda", "cat"]', '["frog"]', '["dog", "panda"]', '["frog"]', '["frog", "dog", "cat"]', '["panda", "bird", "cat"]', '["panda", "cat", "bird"]', '["dog"]', '["panda"]', '["bird", "bird"]', '["bird"]', '["panda"]', '["frog"]', '["
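The regular expression returns each label list as a string such as `'["frog", "dog"]'`, so a check like `'dog' in labels[i]` is a substring test rather than a list-membership test. A small sketch, assuming label strings like the ones printed above, that decodes each string with `json.loads`:

```python
import json

# Two entries copied from the printed `labels` output above.
label_strings = ['["frog", "dog"]', '["panda", "panda"]']

# Each string is a JSON array, so json.loads turns it into a Python list.
label_lists = [json.loads(s) for s in label_strings]
print(label_lists)  # → [['frog', 'dog'], ['panda', 'panda']]
```

Applying the same comprehension to `labels` before `df['labels'] = labels` would store real lists in the DataFrame.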

In [19]:
df = pd.DataFrame(d)
df['labels'] = labels
df.head()

|   | _i | frame | video | value | labels |
|---|---|---|---|---|---|
| 0 | 0 | frame_000.png | video000 | 39 | ["bird"] |
| 1 | 1 | frame_001.png | video000 | 33 | ["frog", "dog"] |
| 2 | 2 | frame_002.png | video000 | 25 | ["panda", "panda"] |
| 3 | 3 | frame_003.png | video000 | 28 | ["dog", "dog"] |
| 4 | 4 | frame_004.png | video000 | 16 | ["cat"] |


## Section 1 - Design a data loader

Design a data structure that, given the file path `"section2_data.txt"`, reads and parses the contents of the file above.

#### Q1 - Design the data structure with the following properties:
- The data structure is either callable or indexable. It accepts a single integer parameter and returns the parsed contents of the row at that index.
- The data structure returns the number of rows in the file (and in memory) when passed to Python's built-in `len()`, as in `len(my_data_struct)`.


#### Q2 - Show that you can initialize the reader, compute its length with `len(reader)`, and print the 26th and 43rd elements of the dataset.
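One way to meet both Q1 requirements is a small class that parses the whole file once and implements `__len__`, `__getitem__` and `__call__`. The sketch below assumes the file fits in memory and that every non-blank line is one JSON object; the class name `FrameDataset` is hypothetical.

```python
import json
import operator


class FrameDataset:
    """Indexable, callable, in-memory reader for a JSON Lines file."""

    def __init__(self, path):
        # Parse every non-blank line once and keep the rows in memory.
        with open(path, "rt") as f:
            self._rows = [json.loads(line) for line in f if line.strip()]

    def __len__(self):
        # Enables len(reader).
        return len(self._rows)

    def __getitem__(self, index):
        # Enables reader[i]; operator.index rejects non-integers with TypeError.
        # Out-of-range indices raise IndexError, like a list.
        return self._rows[operator.index(index)]

    def __call__(self, index):
        # Enables reader(i) as an alias for reader[i].
        return self[index]
```

With this class, Q2 becomes `reader = FrameDataset("section2_data.txt")` followed by `print(len(reader))`, `print(reader[25])` and `print(reader[42])`, reading "26th" and "43rd" as 1-based positions (0-based indices 25 and 42). Given the `len(df)` result below, `len(reader)` should also report 51. Because `__getitem__` raises `IndexError` past the end, a `for row in reader:` loop also works.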

In [22]:
## YOUR SOLUTION
len(df)

51

## Section 2 - Process the data

#### Q1 - Write an algorithm that generates a dictionary whose keys are the names of the unique videos in the dataset and whose values are the number of frames in each video.

In [23]:
### YOUR SOLUTION
def video_frame_count(ds):
    # Collects each frame's `value` per video; len(vfc[v]) is the frame count of video v.
    vfc = {vid: [] for vid in pd.unique(ds['video'])}
    for video, value in zip(ds['video'], ds['value']):
        vfc[video].append(value)
    return vfc

In [24]:
k = {}
k[df['video'][0]] = df['value'][0] 

In [25]:
video_frame_count(df)

{' video000': [' 39', ' 33', ' 25', ' 28', ' 16', ' 32', ' 35'],
 ' video001': [' 25',
  ' 16',
  ' 23',
  ' 3',
  ' 8',
  ' 2',
  ' 0',
  ' 26',
  ' 38',
  ' 34'],
 ' video002': [' 40', ' 35', ' 23', ' 37', ' 14'],
 ' video003': [' 28',
  ' 26',
  ' 2',
  ' 24',
  ' 24',
  ' 5',
  ' 33',
  ' 26',
  ' 41',
  ' 35',
  ' 39',
  ' 6',
  ' 23',
  ' 5',
  ' 4',
  ' 40',
  ' 20',
  ' 21'],
 ' video004': [' 17',
  ' 26',
  ' 32',
  ' 31',
  ' 21',
  ' 5',
  ' 29',
  ' 41',
  ' 33',
  ' 14',
  ' 2']}
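The function above collects each frame's `value` per video, whereas Q1 asks for the number of frames, i.e. the length of each of those lists. A minimal sketch that counts frames directly, assuming rows are dicts with a `video` key such as the ones produced by `json.loads` (the function name is hypothetical):

```python
from collections import Counter


def count_frames_per_video(rows):
    """Map each video name to the number of rows (frames) that belong to it."""
    # Counter tallies how often each video name occurs, in first-seen order.
    return dict(Counter(row["video"] for row in rows))


rows = [
    {"video": "video000", "frame": "frame_000.png"},
    {"video": "video000", "frame": "frame_001.png"},
    {"video": "video001", "frame": "frame_000.png"},
]
print(count_frames_per_video(rows))  # → {'video000': 2, 'video001': 1}
```

With the existing function, `{v: len(vals) for v, vals in video_frame_count(df).items()}` gives the same counts; from the lists printed above these are 7, 10, 5, 18 and 11 frames for `video000` to `video004`.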

In [26]:
df['value'][1]

' 33'
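The leading space in `' 33'` comes from splitting each field on `':'` without stripping the value, so `value` is stored as text. A sketch of cleaning the column dictionary `d` before building the DataFrame, assuming `_i` and `value` are the only integer columns (`clean_columns` is a hypothetical helper):

```python
def clean_columns(columns, int_keys=("_i", "value")):
    """Strip whitespace from every value and convert the integer columns."""
    cleaned = {}
    for key, values in columns.items():
        stripped = [v.strip() for v in values]
        cleaned[key] = [int(v) for v in stripped] if key in int_keys else stripped
    return cleaned


print(clean_columns({"video": [" video000"], "value": [" 33"]}))
# → {'video': ['video000'], 'value': [33]}
```

Used as `df = pd.DataFrame(clean_columns(d))` followed by `df['labels'] = labels`. On an existing frame, pandas' `df['video'].str.strip()` and `df['value'].str.strip().astype(int)` do the same column by column.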

#### Q2 - Write an algorithm that generates a dictionary whose keys are the names of the unique videos in the dataset and whose values are the sum of the `value` field over all frames of that video containing a `dog`.

In [None]:
### YOUR SOLUTION
def video_value_sum_with_dog(my_data_struct):
    raise NotImplementedError
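A sketch of one possible solution, assuming `my_data_struct` behaves as Section 1 Q1 requires (supports `len()` and integer indexing) and that each row is a dict whose `value` is an int and whose `labels` is a list, as `json.loads` produces. Videos without any `dog` frame are kept with a sum of 0, reading "each unique video" literally, and a frame with two `dog` labels contributes its `value` once.

```python
def video_value_sum_with_dog(my_data_struct):
    """Sum `value` per video over the frames whose labels include 'dog'."""
    sums = {}
    for i in range(len(my_data_struct)):
        row = my_data_struct[i]
        # Register every video so ones without a dog still appear with 0.
        sums.setdefault(row["video"], 0)
        if "dog" in row["labels"]:
            sums[row["video"]] += row["value"]
    return sums


# A plain list of dicts also supports len() and integer indexing.
# Rows copied (trimmed) from the `head` output at the top of the notebook.
rows = [
    {"video": "video000", "value": 33, "labels": ["frog", "dog"]},
    {"video": "video000", "value": 28, "labels": ["dog", "dog"]},
    {"video": "video000", "value": 16, "labels": ["cat"]},
    {"video": "video001", "value": 23, "labels": ["panda"]},
]
print(video_value_sum_with_dog(rows))  # → {'video000': 61, 'video001': 0}
```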

#### Q3 - Finally, write an algorithm that returns a dictionary with the count of each animal type in the dataset, excluding occurrences in `video003` and rows whose `value` is odd.

In [None]:
### YOUR SOLUTION
def animal_count(my_data_struct):
    raise NotImplementedError
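A sketch for Q3 under the same assumptions as the Q2 sketch (indexable rows with an int `value` and a list of `labels`). It counts every label occurrence, so `["dog", "dog"]` adds 2 to `dog`; counting each animal at most once per frame would instead pass `set(row["labels"])` to `update`. Which reading the question intends is an assumption.

```python
from collections import Counter


def animal_count(my_data_struct):
    """Count label occurrences, skipping video003 and rows with an odd `value`."""
    counts = Counter()
    for i in range(len(my_data_struct)):
        row = my_data_struct[i]
        if row["video"] == "video003" or row["value"] % 2 == 1:
            continue
        counts.update(row["labels"])  # adds 1 per label occurrence
    return dict(counts)


rows = [
    {"video": "video000", "value": 39, "labels": ["bird"]},  # odd value: skipped
    {"video": "video000", "value": 28, "labels": ["dog", "dog"]},
    {"video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]},
    {"video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]},
    {"video": "video003", "value": 28, "labels": ["cat"]},  # hypothetical row: skipped by video
]
print(animal_count(rows))  # → {'dog': 3, 'bird': 3, 'frog': 1, 'panda': 1}
```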