# Section 2 - Coding

In this section we will load and manipulate "unconventional" data files - for which you will be required to create a simple loading functionality and then be able to process and query a data file.

There is a "section2_data.txt" file attached to this IPython notebook. The data file contains few rows from a computer vision dataset. Each row corresponds to a frame in a video and contains some metadata and annotations over it.

The following notebook will pose some questions about reading and processing this data.

Feel free to use any existing python library to answer the questions.

In [1]:
!head section2_data.txt

{"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]}
{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}
{"_i": 2, "frame": "frame_002.png", "video": "video000", "value": 25, "labels": ["panda", "panda"]}
{"_i": 3, "frame": "frame_003.png", "video": "video000", "value": 28, "labels": ["dog", "dog"]}
{"_i": 4, "frame": "frame_004.png", "video": "video000", "value": 16, "labels": ["cat"]}
{"_i": 5, "frame": "frame_005.png", "video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]}
{"_i": 6, "frame": "frame_006.png", "video": "video000", "value": 35, "labels": ["bird", "dog"]}
{"_i": 7, "frame": "frame_000.png", "video": "video001", "value": 25, "labels": ["dog", "bird"]}
{"_i": 8, "frame": "frame_001.png", "video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]}
{"_i": 9, "frame": "frame_002.png", "video": "video001", "value": 23, "labels": ["panda"]}


## Section 1 - Design a data loader

Design a data structure, that give a file path `"section2_data.txt"`, it will read and parse the contents of the file above.

#### Q1 - Design the data structure with the following properties:
- The data structure is either callable or indexable. It will accepts a single parameter, as integer, and return the parsed contents of the row corresponding to the given index.
- The data structure needs to return the number of rows in the file (and in memory) when called with the python command `len(my_data_struct)`


In [13]:
## YOUR SOLUTION
import pandas as pd
import numpy as np
import ast

dataList = []
with open('section2_data.txt') as f:
    for line in f.readlines():
        # ast.literal_eval raises an exception if the input isn't a valid Python datatype
        dataList.append(ast.literal_eval(line))

#storing the value into the data frame ( the data structure )
df = pd.DataFrame.from_records(dataList, index="_i")

df.head()

Unnamed: 0_level_0,frame,video,value,labels
_i,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,frame_000.png,video000,39,[bird]
1,frame_001.png,video000,33,"[frog, dog]"
2,frame_002.png,video000,25,"[panda, panda]"
3,frame_003.png,video000,28,"[dog, dog]"
4,frame_004.png,video000,16,[cat]


#### Q2 - Prove that you can initialize the reader and then calculate its length `len(reader)` and print the 26th and 43rd elements of the dataset.

In [26]:
## YOUR SOLUTION
#printing the length
print('Length of the Data Structure : ' , len(df) , end ='\n\n')

#accessing the value with index
print('26th Element of the Dataset : ' , df.loc[25] , end='\n\n')
print('43th Element of the Dataset : ' , df.loc[42])

Length of the Data Structure :  51

26th Element of the Dataset :  frame           frame_003.png
video                video003
value                      24
labels                [panda]
dog_in_label            False
Name: 25, dtype: object

43th Element of the Dataset :  frame                frame_002.png
video                     video004
value                           32
labels          [panda, bird, cat]
dog_in_label                 False
Name: 42, dtype: object


## Section 2 - Process the data

#### Q1 - Write an algorithm that will generate a dictionary with key/value pairs, where the keys are the name of each unique video in the dataset and the value is the number of frames of that video.

In [22]:
### YOUT SOLUTION
def video_frame_count(my_data_struct):
    #group the value with video and call count function for the frame
   groupDS = my_data_struct.groupby('video')['frame'].count()
   
   #returning dictionary
   return (dict(groupDS))
   
ans1 = video_frame_count(df)

#checking answer
print(ans1)
    

{'video000': 7, 'video001': 10, 'video002': 5, 'video003': 18, 'video004': 11}


#### Q2 - Write an algorithm that will generate a dictionary with key/value pairs, where the keys are the name of each unique video in the dataset and the value is the sum of the `value` field of all the frames containing a `dog`.

In [21]:
### YOUR SOLUTION
def video_value_sum_with_dog(my_data_struct):
    #check which labels have dog in it
   my_data_struct['dog_in_label'] = my_data_struct['labels'].apply( lambda x : 'dog' in x )
   
   #filter that value into another data structure
   fltr = my_data_struct['dog_in_label'] == True
   fltrData = my_data_struct.where(fltr).dropna()
   
   #return the value by grouping with video and calling sum function on value
   return dict(fltrData.groupby('video')['value'].sum() )

ans2 = video_value_sum_with_dog(df)
   
print(ans2)
            

{'video000': 96.0, 'video001': 69.0, 'video002': 91.0, 'video003': 129.0, 'video004': 49.0}


#### Q3 - Last, create an algorithm that returns a dictionary with the count of each of the animal types in the dataset, excluding occurrences in `video003` and rows where the `value` is odd.

In [20]:
### YOUR SOLUTION
def animal_count(my_data_struct):
   #filter the index having odd and video003
   indxRemove = list( my_data_struct[ ( my_data_struct['video'] == 'video003') & 
                                                  (my_data_struct['value'] %2 != 0) ].index )
   
   #drop that indexes
   new_df = my_data_struct.drop(indxRemove)
   
   #convert the list of list to single list 
   animalsList = list( np.concatenate( list( new_df['labels']) ).flat  )  
   
   #dict of the animal count 
   animals_count = {}
   
   #counting the count of each animal 
   for each_animal in set(animalsList):
       animals_count[each_animal] = animalsList.count(each_animal)
   
   #returing the each count 
   return animals_count

ans3 = animal_count(df)

print(ans3)

{'frog': 15, 'cat': 14, 'panda': 15, 'dog': 21, 'bird': 22}
