# Section 2 - Coding

In this section we will load and manipulate an "unconventional" data file: you will write a simple loader for it and then process and query its contents.

There is a "section2_data.txt" file attached to this IPython notebook. The data file contains a few rows from a computer vision dataset. Each row corresponds to a frame in a video and contains some metadata and annotations for that frame.

The following notebook will pose some questions about reading and processing this data.

Feel free to use any existing Python library to answer the questions.

In [144]:
!head /kaggle/input/data-kirana/section2_data.txt

{"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]}
{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}
{"_i": 2, "frame": "frame_002.png", "video": "video000", "value": 25, "labels": ["panda", "panda"]}
{"_i": 3, "frame": "frame_003.png", "video": "video000", "value": 28, "labels": ["dog", "dog"]}
{"_i": 4, "frame": "frame_004.png", "video": "video000", "value": 16, "labels": ["cat"]}
{"_i": 5, "frame": "frame_005.png", "video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]}
{"_i": 6, "frame": "frame_006.png", "video": "video000", "value": 35, "labels": ["bird", "dog"]}
{"_i": 7, "frame": "frame_000.png", "video": "video001", "value": 25, "labels": ["dog", "bird"]}
{"_i": 8, "frame": "frame_001.png", "video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]}
{"_i": 9, "frame": "frame_002.png", "video": "video001", "value": 23, "labels": ["panda"]}


## Section 1 - Design a data loader

Design a data structure that, given a file path such as `"section2_data.txt"`, reads and parses the contents of the file above.

#### Q1 - Design the data structure with the following properties:
- The data structure is either callable or indexable. It accepts a single integer parameter and returns the parsed contents of the row at that index.
- The data structure returns the number of rows in the file (and in memory) when called with the Python command `len(my_data_struct)`.


In [145]:
import json


class DataLoader:
    def __init__(self, file_path):
        self.data = []
        with open(file_path, 'r') as file:
            for line in file:
                row = json.loads(line.strip())  # Parse each line as JSON
                self.data.append(row)

    def __call__(self, index):
        if 0 <= index < len(self.data):
            return self.data[index]
        else:
            raise IndexError("Custom error (Ankur Shukla): Index out of range")

    def __len__(self):
        return len(self.data)

    
    def __getitem__(self, index):
        return self.data[index]

    def __iter__(self):
        self.current_index = 0
        return self

    def __next__(self):
        if self.current_index < len(self.data):
            result = self.data[self.current_index]
            self.current_index += 1
            return result
        else:
            raise StopIteration
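
As a quick self-contained check of the two Q1 properties, the sketch below writes two of the sample rows shown earlier to a temporary file and loads them with a compact variant of the loader. The class name `JsonLinesLoader` and the temporary file are assumptions made for illustration only. This variant delegates iteration to the underlying list instead of tracking `current_index`, so nested or repeated loops do not share position state, and it relies on the list's own `IndexError` rather than a custom message.

```python
import json
import os
import tempfile


class JsonLinesLoader:
    """Hypothetical compact loader: one JSON object per line."""

    def __init__(self, file_path):
        with open(file_path, "r") as file:
            # Skip blank lines so a trailing newline does not break parsing.
            self.data = [json.loads(line) for line in file if line.strip()]

    def __call__(self, index):
        # Calling and indexing behave the same way in this sketch.
        return self[index]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        # A fresh list iterator per loop, so loops never share state.
        return iter(self.data)


# Two rows copied from the `head` output above.
sample_lines = [
    '{"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]}',
    '{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}',
]

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(sample_lines) + "\n")
    tmp_path = tmp.name

loader = JsonLinesLoader(tmp_path)
os.remove(tmp_path)  # the rows are already in memory

print(len(loader))          # 2
print(loader(1)["labels"])  # ['frog', 'dog']
print(loader[0]["video"])   # video000
```

Unlike the class above, this sketch accepts negative indices, because it passes the index straight through to the list.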


In [146]:
my_data_struct = DataLoader("/kaggle/input/data-kirana/section2_data.txt")

In [147]:
row = my_data_struct(0)
print(row)
num_rows = len(my_data_struct)  
print(f"Number of rows: {num_rows}")

{'_i': 0, 'frame': 'frame_000.png', 'video': 'video000', 'value': 39, 'labels': ['bird']}
Number of rows: 51


#### Q2 - Prove that you can initialize the reader and then calculate its length `len(reader)` and print the 26th and 43rd elements of the dataset.

In [148]:
## YOUR SOLUTION
num_rows = len(my_data_struct)
print(f"Number of rows: {num_rows}")


Number of rows: 51


### Printing the 26th and 43rd elements of the dataset (indices are 0-based)

In [149]:
if num_rows >= 26:
    row_26 = my_data_struct(25)  # 26th element
    print("26th Element:", row_26)

26th Element: {'_i': 25, 'frame': 'frame_003.png', 'video': 'video003', 'value': 24, 'labels': ['panda']}


In [150]:
if num_rows >= 43:
    row_43 = my_data_struct(42)  # 43rd element
    print("43rd Element:", row_43)

43rd Element: {'_i': 42, 'frame': 'frame_002.png', 'video': 'video004', 'value': 32, 'labels': ['panda', 'bird', 'cat']}


### An index outside the valid range 0-50 must raise the custom `IndexError` defined in the class

In [151]:
# row_43 = my_data_struct(52)

# output : 
#     ---------------------------------------------------------------------------
# IndexError                                Traceback (most recent call last)
# Cell In[21], line 1
# ----> 1 row_43 = my_data_struct(52)

# Cell In[14], line 15, in DataLoader.__call__(self, index)
#      13     return self.data[index]
#      14 else:
# ---> 15     raise IndexError("Custom error (Ankur Shukla): Index out of range")

# IndexError: Custom error (Ankur Shukla): Index out of range

## Section 2 - Process the data

#### Q1 - Write an algorithm that will generate a dictionary with key/value pairs, where the keys are the name of each unique video in the dataset and the value is the number of frames of that video.

In [152]:
### YOUR SOLUTION


In [153]:
def video_frame_count(my_data_struct):
    frame_counts = {}

    # Each row returned by the loader is already a parsed dict
    for i in range(len(my_data_struct)):
        video_name = my_data_struct(i)['video']

        # Add the video with a count of 1 the first time it appears
        if video_name not in frame_counts:
            frame_counts[video_name] = 1
        else:
            # Increment the count for a video seen before
            frame_counts[video_name] += 1

    return frame_counts

In [154]:
data_dict=video_frame_count(my_data_struct)
print(data_dict)

{'video000': 7, 'video001': 10, 'video002': 5, 'video003': 18, 'video004': 11}
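
The same per-video tally can be sketched more compactly with `collections.Counter` from the standard library. The example below is only an illustration: `sample_rows` and `frames_per_video` are hypothetical names, and the rows are the first ten rows of the `head` output above reduced to their `video` field, assuming each row is already a parsed dict.

```python
from collections import Counter

# The first ten rows of the `head` output, keeping only the field used here:
# seven frames of video000 followed by three frames of video001.
sample_rows = [{"video": "video000"}] * 7 + [{"video": "video001"}] * 3

# Counter tallies each video name; dict() converts it to a plain dictionary.
frames_per_video = dict(Counter(row["video"] for row in sample_rows))

print(frames_per_video)  # {'video000': 7, 'video001': 3}
```

The count for `video001` is lower than in the full result because only its first three frames appear in the `head` output.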


#### Q2 - Write an algorithm that will generate a dictionary with key/value pairs, where the keys are the name of each unique video in the dataset and the value is the sum of the `value` field of all the frames containing a `dog`.

In [155]:
my_data_struct(0)

{'_i': 0,
 'frame': 'frame_000.png',
 'video': 'video000',
 'value': 39,
 'labels': ['bird']}

In [156]:
### YOUR SOLUTION
def video_value_sum_with_dog(my_data_struct):
    video_value_sum = {}

    # Each row returned by the loader is already a parsed dict
    for i in range(len(my_data_struct)):
        row = my_data_struct(i)
        video_name = row['video']

        # Only frames whose 'labels' include 'dog' contribute to the sum
        if 'dog' in row['labels']:
            # Start the video's sum at 0 the first time it appears
            if video_name not in video_value_sum:
                video_value_sum[video_name] = 0
            # Add the frame's 'value' to the running sum
            video_value_sum[video_name] += row['value']

    return video_value_sum

            

In [157]:
video_value_sum_with_dog(my_data_struct)

{'video000': 96,
 'video001': 69,
 'video002': 91,
 'video003': 129,
 'video004': 49}
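
A possible variant of the same aggregation uses `collections.defaultdict`, which removes the explicit "first time seen" check. This is a sketch under stated assumptions: `sample_rows` and `dog_value_sum` are hypothetical names, and the rows are the first ten rows of the `head` output above, assumed to be parsed dicts already.

```python
from collections import defaultdict

# The first ten rows of the `head` output, keeping the fields used here.
sample_rows = [
    {"video": "video000", "value": 39, "labels": ["bird"]},
    {"video": "video000", "value": 33, "labels": ["frog", "dog"]},
    {"video": "video000", "value": 25, "labels": ["panda", "panda"]},
    {"video": "video000", "value": 28, "labels": ["dog", "dog"]},
    {"video": "video000", "value": 16, "labels": ["cat"]},
    {"video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]},
    {"video": "video000", "value": 35, "labels": ["bird", "dog"]},
    {"video": "video001", "value": 25, "labels": ["dog", "bird"]},
    {"video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]},
    {"video": "video001", "value": 23, "labels": ["panda"]},
]

# defaultdict(int) starts every missing key at 0.
dog_value_sum = defaultdict(int)
for row in sample_rows:
    # A frame counts once even if 'dog' appears several times in its labels.
    if "dog" in row["labels"]:
        dog_value_sum[row["video"]] += row["value"]

print(dict(dog_value_sum))  # {'video000': 96, 'video001': 41}
```

As in the solution above, a video with no `dog` frames does not appear in the result at all. The `video000` total matches the full output because all seven of its frames are in the sample.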

#### Q3 - Last, create an algorithm that returns a dictionary with the count of each of the animal types in the dataset, excluding occurrences in `video003` and rows where the `value` is odd.

In [158]:
### YOUR SOLUTION
def animal_count(my_data_struct):
    animal_type_count = {}

    # Each row returned by the loader is already a parsed dict
    for i in range(len(my_data_struct)):
        row = my_data_struct(i)

        # Exclude 'video003' and rows with an odd 'value'
        if row['video'] != 'video003' and row['value'] % 2 == 0:
            for animal_type in row['labels']:
                # Count every occurrence, including repeats within a row
                if animal_type not in animal_type_count:
                    animal_type_count[animal_type] = 1
                else:
                    animal_type_count[animal_type] += 1

    return animal_type_count

In [159]:
animal_count(my_data_struct)

{'dog': 10, 'cat': 7, 'bird': 6, 'frog': 8, 'panda': 4}
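
The filtering and counting can also be sketched with `collections.Counter`, whose `update` method adds one count per label in a list. The names `sample_rows` and `animal_counts` are hypothetical, and the rows are again the first ten rows of the `head` output above, assumed to be parsed dicts.

```python
from collections import Counter

# The first ten rows of the `head` output, keeping the fields used here.
sample_rows = [
    {"video": "video000", "value": 39, "labels": ["bird"]},
    {"video": "video000", "value": 33, "labels": ["frog", "dog"]},
    {"video": "video000", "value": 25, "labels": ["panda", "panda"]},
    {"video": "video000", "value": 28, "labels": ["dog", "dog"]},
    {"video": "video000", "value": 16, "labels": ["cat"]},
    {"video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]},
    {"video": "video000", "value": 35, "labels": ["bird", "dog"]},
    {"video": "video001", "value": 25, "labels": ["dog", "bird"]},
    {"video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]},
    {"video": "video001", "value": 23, "labels": ["panda"]},
]

animal_counts = Counter()
for row in sample_rows:
    # Skip rows from video003 and rows whose value is odd.
    if row["video"] == "video003" or row["value"] % 2 != 0:
        continue
    # update() counts every label, including repeats within one row.
    animal_counts.update(row["labels"])

print(dict(animal_counts))
# {'dog': 3, 'cat': 1, 'bird': 3, 'frog': 1, 'panda': 1}
```

Only four of the ten sample rows pass the filter (values 28, 16, 32 and 16), which is why these counts are much smaller than the full-dataset result.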