# Section 2 - Coding

In this section you will load and manipulate an "unconventional" data file: you will write a simple loader for it and then use it to process and query the data.

A "section2_data.txt" file is attached to this IPython notebook. It contains a few rows from a computer vision dataset: each row corresponds to one frame of a video and holds some metadata and annotations for that frame.

The following notebook will pose some questions about reading and processing this data.

Feel free to use any existing python library to answer the questions.

In [1]:
!head section2_data.txt

{"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]}
{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}
{"_i": 2, "frame": "frame_002.png", "video": "video000", "value": 25, "labels": ["panda", "panda"]}
{"_i": 3, "frame": "frame_003.png", "video": "video000", "value": 28, "labels": ["dog", "dog"]}
{"_i": 4, "frame": "frame_004.png", "video": "video000", "value": 16, "labels": ["cat"]}
{"_i": 5, "frame": "frame_005.png", "video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]}
{"_i": 6, "frame": "frame_006.png", "video": "video000", "value": 35, "labels": ["bird", "dog"]}
{"_i": 7, "frame": "frame_000.png", "video": "video001", "value": 25, "labels": ["dog", "bird"]}
{"_i": 8, "frame": "frame_001.png", "video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]}
{"_i": 9, "frame": "frame_002.png", "video": "video001", "value": 23, "labels": ["panda"]}
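Each line of the file is a self-contained JSON object (the JSON Lines layout), so any single row can be decoded on its own with Python's standard `json` module. This is also why `pd.read_json(..., lines=True)` can read the whole file. A minimal sketch using the second line of the `head` output above:

```python
import json

# One raw line copied from the `head` output above
line = '{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}'

# json.loads turns one JSON object into a plain Python dict
row = json.loads(line)
print(row["video"], row["value"], row["labels"])  # video000 33 ['frog', 'dog']
```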


## Section 1 - Design a data loader

Design a data structure that, given the file path `"section2_data.txt"`, reads and parses the contents of the file above.

#### Q1 - Design the data structure with the following properties:
- The data structure is either callable or indexable: it accepts a single integer parameter and returns the parsed contents of the row at that index.
- The data structure returns the number of rows in the file (and in memory) when passed to Python's built-in `len()`, i.e. `len(my_data_struct)`.


In [1]:
## YOUR SOLUTION
import pandas as pd

In [18]:
my_data_struct = pd.read_json("section2_data.txt", lines=True)

my_data_struct

| | _i | frame | video | value | labels |
|---|---|---|---|---|---|
| 0 | 0 | frame_000.png | video000 | 39 | [bird] |
| 1 | 1 | frame_001.png | video000 | 33 | [frog, dog] |
| 2 | 2 | frame_002.png | video000 | 25 | [panda, panda] |
| 3 | 3 | frame_003.png | video000 | 28 | [dog, dog] |
| 4 | 4 | frame_004.png | video000 | 16 | [cat] |
| 5 | 5 | frame_005.png | video000 | 32 | [bird, frog, bird] |
| 6 | 6 | frame_006.png | video000 | 35 | [bird, dog] |
| 7 | 7 | frame_000.png | video001 | 25 | [dog, bird] |
| 8 | 8 | frame_001.png | video001 | 16 | [dog, panda, bird] |
| 9 | 9 | frame_002.png | video001 | 23 | [panda] |


In [19]:
len(my_data_struct)

51
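The DataFrame returned by `pd.read_json` supports `len()`, but `my_data_struct[26]` looks up a *column* labeled `26` (raising `KeyError`) rather than a row, so row access has to go through `.iloc`. Below is a minimal sketch of a loader that meets the Q1 requirements directly. The class name `FrameDataset` is hypothetical, and the sketch assumes the whole file fits in memory as a list of dicts.

```python
import json
import os
import tempfile


class FrameDataset:
    """Hypothetical loader for a JSON Lines file: one parsed dict per row.

    Supports len(ds), ds[i] and ds(i).
    """

    def __init__(self, path):
        with open(path, encoding="utf-8") as f:
            # Parse every non-blank line; all rows are kept in memory
            self._rows = [json.loads(line) for line in f if line.strip()]

    def __len__(self):
        # Number of rows read from the file
        return len(self._rows)

    def __getitem__(self, index):
        # Positional lookup with list semantics (negative indices, IndexError)
        return self._rows[index]

    def __call__(self, index):
        # Calling the object is an alias for indexing it
        return self[index]


# Usage: write the first three rows of the `head` output to a temporary file
sample = [
    '{"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]}',
    '{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}',
    '{"_i": 2, "frame": "frame_002.png", "video": "video000", "value": 25, "labels": ["panda", "panda"]}',
]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("\n".join(sample) + "\n")

ds = FrameDataset(tmp.name)
os.remove(tmp.name)  # rows are already in memory

print(len(ds))          # 3
print(ds[1]["labels"])  # ['frog', 'dog']
print(ds(2)["labels"])  # ['panda', 'panda']
```

With the real file, `reader = FrameDataset("section2_data.txt")` would give `len(reader)` equal to the row count reported above, and `reader[26]` or `reader(43)` would return a single row as a dict.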

#### Q2 - Show that you can initialize the reader, compute its length with `len(reader)`, and print the 26th and 43rd elements of the dataset.

In [21]:
## YOUR SOLUTION
my_data_struct.iloc[[26,43]]

| | _i | frame | video | value | labels |
|---|---|---|---|---|---|
| 26 | 26 | frame_004.png | video003 | 24 | [cat, cat, panda] |
| 43 | 43 | frame_003.png | video004 | 31 | [panda, cat, bird] |
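`DataFrame.iloc` selects rows by zero-based position, so `iloc[[26, 43]]` returns the rows at positions 26 and 43, which are the 27th and 44th rows if the question counts from one. Under a one-based reading, the positions would be 25 and 42 (`my_data_struct.iloc[[25, 42]]`). A small sketch of the offset on the first rows of the `head` output:

```python
import pandas as pd

# First three rows of the `head` output above (subset of columns)
df = pd.DataFrame({
    "_i": [0, 1, 2],
    "video": ["video000", "video000", "video000"],
    "labels": [["bird"], ["frog", "dog"], ["panda", "panda"]],
})

print(len(df))  # 3

# iloc is zero-based: the n-th row (counting from one) is at position n - 1
n = 3
third = df.iloc[n - 1]
print(third["_i"], third["labels"])  # 2 ['panda', 'panda']
```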


## Section 2 - Process the data

#### Q1 - Write an algorithm that generates a dictionary of key/value pairs, where the keys are the names of the unique videos in the dataset and each value is the number of frames in that video.

In [24]:
### YOUR SOLUTION
def video_frame_count(my_data_struct):
    """Map each video name to the number of frames (rows) it has."""
    video_frame_dict = {}
    for video_name, group in my_data_struct.groupby("video"):
        # len(group) counts the rows (frames) belonging to this video
        video_frame_dict[video_name] = len(group)
    return video_frame_dict

video_frame_dict = video_frame_count(my_data_struct)
print(video_frame_dict)
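pandas can also build this mapping without an explicit loop: `groupby("video").size()` counts the rows in each group, and `.to_dict()` turns the result into a dictionary. A sketch on the `video` column of the ten rows from the `head` output, so the counts cover only those rows and not the full file:

```python
import pandas as pd

# "video" column of the ten rows shown by `head` above
df = pd.DataFrame({"video": ["video000"] * 7 + ["video001"] * 3})

# size() counts rows per group; to_dict() maps group name -> count
frame_counts = df.groupby("video").size().to_dict()
print(frame_counts)  # counts: video000 -> 7, video001 -> 3
```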


#### Q2 - Write an algorithm that generates a dictionary of key/value pairs, where the keys are the names of the unique videos in the dataset and each value is the sum of the `value` field over all of that video's frames containing a `dog`.

In [26]:
### YOUR SOLUTION
def video_value_sum_with_dog(my_data_struct):
    """Map each video name to the summed `value` of its frames labelled "dog"."""
    video_dog_dict = {}
    for video_name, group in my_data_struct.groupby("video"):
        # Keep only the frames whose label list contains "dog"
        dog_group = group[group["labels"].apply(lambda labels: "dog" in labels)]
        # An empty selection sums to 0, so every video gets a key
        video_dog_dict[video_name] = dog_group["value"].sum()
    return video_dog_dict

video_dog_dict = video_value_sum_with_dog(my_data_struct)
print(video_dog_dict)

{'video000': 96, 'video001': 69, 'video002': 91, 'video003': 129, 'video004': 49}
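The same totals can be computed in vectorized form: build a boolean "contains dog" mask, zero out the `value` of frames without a dog using `Series.where`, then sum per video. Zeroing instead of filtering keeps a key for videos that have no dog frames, matching the loop above. A sketch on the ten rows from the `head` output; assuming each video's frames are listed contiguously, all of `video000` falls within those rows, so its total should agree with the output above:

```python
import pandas as pd

# "video", "value" and "labels" of the ten rows shown by `head` above
df = pd.DataFrame({
    "video": ["video000"] * 7 + ["video001"] * 3,
    "value": [39, 33, 25, 28, 16, 32, 35, 25, 16, 23],
    "labels": [
        ["bird"], ["frog", "dog"], ["panda", "panda"], ["dog", "dog"], ["cat"],
        ["bird", "frog", "bird"], ["bird", "dog"], ["dog", "bird"],
        ["dog", "panda", "bird"], ["panda"],
    ],
})

# Boolean mask: True where the frame's label list contains "dog"
has_dog = df["labels"].apply(lambda labels: "dog" in labels)

# Replace values of dog-free frames with 0, then sum per video
dog_sums = df["value"].where(has_dog, 0).groupby(df["video"]).sum().to_dict()
print(dog_sums)  # video000 -> 33 + 28 + 35 = 96, video001 -> 25 + 16 = 41
```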


#### Q3 - Finally, write an algorithm that returns a dictionary with the count of each animal type in the dataset, excluding occurrences in `video003` and rows where `value` is odd.

In [31]:
### YOUR SOLUTION
def animal_count(my_data_struct):
    """Count each animal label, skipping video003 and rows with an odd `value`."""
    animal_count_dict = {}
    filtered_df = my_data_struct[
        (my_data_struct["video"] != "video003") & (my_data_struct["value"] % 2 == 0)
    ]
    for animal_list in filtered_df["labels"]:
        for animal in animal_list:
            # Start unseen animals at 0, then increment
            animal_count_dict[animal] = animal_count_dict.get(animal, 0) + 1
    return animal_count_dict

animal_count_dict = animal_count(my_data_struct)
print(animal_count_dict)

{'dog': 10, 'cat': 7, 'bird': 6, 'frog': 8, 'panda': 4}
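The counting loop can be replaced by `collections.Counter`, which tallies every label across the filtered rows' label lists in one pass. A sketch on the ten rows from the `head` output; none of them belong to `video003`, so only the odd-value filter has an effect here:

```python
from collections import Counter

import pandas as pd

# "video", "value" and "labels" of the ten rows shown by `head` above
df = pd.DataFrame({
    "video": ["video000"] * 7 + ["video001"] * 3,
    "value": [39, 33, 25, 28, 16, 32, 35, 25, 16, 23],
    "labels": [
        ["bird"], ["frog", "dog"], ["panda", "panda"], ["dog", "dog"], ["cat"],
        ["bird", "frog", "bird"], ["bird", "dog"], ["dog", "bird"],
        ["dog", "panda", "bird"], ["panda"],
    ],
})

# Drop video003 and rows whose value is odd
kept = df[(df["video"] != "video003") & (df["value"] % 2 == 0)]

# Flatten the label lists of the remaining rows and count each animal
animal_counts = dict(Counter(label for labels in kept["labels"] for label in labels))
print(animal_counts)  # {'dog': 3, 'cat': 1, 'bird': 3, 'frog': 1, 'panda': 1}
```

A pandas-only alternative is `kept["labels"].explode().value_counts().to_dict()`, which yields the same counts, ordered by frequency rather than first occurrence.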
