# Disney+ Movies and TV Shows #


You are working as a Junior Data Analyst at Disney Plus. You have been given a combined dataset of movies and TV shows, which is available on [Kaggle](https://www.kaggle.com/datasets/shivamb/disney-movies-and-tv-shows). 

>The Content Manager wants you to clean the data and provide insights that answer questions such as:
>
>1. Identify the unique list of ratings. How many movies and TV shows are listed under each rating?
>2. What is the average duration of movies and TV shows?
>3. How many movies and TV shows from Germany have been released so far? Give the list year by year.
>4. Which director has directed the most movies, and in which genre?

The task has been handed over to you by your team lead in the data department. They expect you to apply your Python knowledge of strings and descriptive statistics, and they want you to <span style="color:#fc5a37">build reusable functions.</span>

Your analysis will help the Content Manager at Disney Plus decide on the direction of the company's future investments in movies as well as TV shows, and it will influence what gets released. 

Your manager also wants to use the functions himself, but he is not code-savvy. So that he can understand and reuse them, <span style="color:#fc5a37">add docstrings to your functions</span> stating what each function does, its expected inputs along with their default values, and its expected outputs.
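A minimal sketch of what such a docstring could look like, using a hypothetical function `count_rows` that is not part of the assignment. The docstring states what the function does, its inputs with their defaults, and its output; the built-in `help()` then shows that text to anyone calling the function, without them having to read the code.

```python
def count_rows(my_data_local, skip_header_local=False):
    """
    Count the rows in a list-of-lists dataset.

    Inputs:
        my_data_local (list of lists): the dataset, one inner list per row.
        skip_header_local (bool, default False): if True, the first row is
            treated as the header and is not counted.

    Output:
        int: the number of rows counted.
    """
    # Hypothetical example function, only here to illustrate the docstring layout
    rows = my_data_local[1:] if skip_header_local else my_data_local
    return len(rows)


# The manager can read the documentation without opening the code:
help(count_rows)
```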

Expected outputs are given for a few of the cells so that you can confirm you are on the right track. 

You might have to refer to Python documentation. 

## Data dictionary ##
|nr | name | description |
|----|----|:----|
|1.| **show_id** | Unique id |
|2.| **type** | Movie or TV Show |
|3.| **title** | Name of the movie/show |
|4.| **director** | Directors of the movie/show |
|5.| **cast** | Main cast of the movie/show |
|6.| **country** | Country of production |
|7.| **date_added** | Date added on Disney Plus |
|8.| **release_year** | Original Release Year of the movie/tv show |
|9.| **rating** | Rating of the movie/show |
|10.| **duration** | Total duration of the movie/show |
|11.| **listed_in** | Genres in which the movie/show is listed |  
|12.| **description** | One-Line content description |


## Step 1: Read the file

In [3]:
# This code block is pre-coded for you. You don't have to write anything in this one.

from csv import reader # reader() from the csv module parses each line of the file into a list of strings.

with open('./Data/disney_plus_titles.csv', encoding="utf-8", newline="") as opened_file:
    read_file = reader(opened_file)
    data_list = list(read_file) # This stores the data in the list of lists format.

- Extract the header (column names), which is the first row in data_list, and assign it to **`data_header`**
- Keep the remaining rows (all except the header) in **`data`**. 

Hint: How would you access the first element of a list?
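A small self-contained sketch of the same idea, assuming a tiny in-memory CSV (three of the twelve columns, with two rows taken from the sample output in Step 2) in place of `./Data/disney_plus_titles.csv`. `io.StringIO` stands in for the opened file; index `[0]` gives the header and the slice `[1:]` gives every row after it.

```python
import io
from csv import reader

# Hypothetical stand-in for the real file: three columns, two data rows
sample_csv = io.StringIO(
    "show_id,type,title\n"
    "s1,Movie - Animation,Duck the Halls: A Mickey Mouse Christmas Special\n"
    "s2,Movie-Comedy,Ernest Saves Christmas\n"
)

sample_list = list(reader(sample_csv))  # list of lists, one inner list per line
sample_header = sample_list[0]          # first row: the column names
sample_data = sample_list[1:]           # every row after the header

print(sample_header)     # ['show_id', 'type', 'title']
print(len(sample_data))  # 2
```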

In [None]:
data_list

In [4]:
# Complete this code block
data_header = data_list[0]
data = data_list[1:]

In [None]:
# print the column names of given data. 
data_header

In [None]:
data

## Step 2: Create function `explore_data` to explore data ##

### Function description and output overview
The function **`explore_data`** should print the following:
1. Selected rows from start to end.
2. Number of rows and columns if chosen to do so.

The function will take the following parameters as input:
1. **`my_data_local`**
2. **`start_row_local`**
3. **`end_row_local`**
4. **`rows_and_columns_local`**, a boolean that controls whether the total number of rows and columns is displayed. 

For example, suppose I want to display the first five rows and also the total number of rows and columns in the input data. 

So I will call this function as **`explore_data(data,0,5,True)`** and the expected output is:  
  
```python
['s1', 'Movie - Animation', 'Duck the Halls: A Mickey Mouse Christmas Special', 'Alonso Ramirez Ramos, Dave Wasson', 'Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton', '', 'November 26, 2021', '2016', 'TV-G', '23 min', 'Animation, Family', 'Join Mickey and the gang as they duck the halls!']


['s2', 'Movie-Comedy', 'Ernest Saves Christmas', 'John Cherry', 'Jim Varney, Noelle Parker, Douglas Seale', '', 'November 26, 2021', '1988', 'PG', '91 min', 'Comedy', 'Santa Claus passes his magic bag to a new St. Nic.']


['s3', 'Movie', 'Ice Age: A Mammoth Christmas', 'Karen Disher', 'Raymond Albert Romano, John Leguizamo, Denis Leary, Queen Latifah', 'United States', 'November 26, 2021', '2011', 'TV-G', '23 min', 'Animation, Comedy, Family', "Sid the Sloth is on Santa's naughty list."]


['s4', 'Movie', 'The Queen Family Singalong', 'Hamish Hamilton', 'Darren Criss, Adam Lambert, Derek Hough, Alexander Jean, Fall Out Boy, Jimmie Allen', '', 'November 26, 2021', '2021', 'TV-PG', '41 min', 'Musical', 'This is real life, not just fantasy!']


['s5', 'TV Show - Docuseries', 'The Beatles: Get Back', '', 'John Lennon, Paul McCartney, George Harrison, Ringo Starr', '', 'November 25, 2021', '2021', '', '1 Season', 'Docuseries, Historical, Music', 'A three-part documentary from Peter Jackson capturing a moment in music history with The Beatles.']


no. of rows: 1450

no. of columns: 12
```

### Pseudocode (example)


[How to write Pseudocode](https://www.geeksforgeeks.org/how-to-write-a-pseudo-code/)

Below is the Pseudocode for the function

def explore_data(Give the parameters along with default values):

  1. Slice **`my_data_local`** using **`start_row_local`** and **`end_row_local`**. Remember that the data is a list.

  2. Write a for loop that prints each row of the sliced data, adding an empty line after each row using print('\n').
        
  3. Write an **`if statement`** that checks whether the user wants to display the number of rows and columns.   
    
     If **yes**, then print: 
     - the number of rows, using the **`len`** function on the data 
     - and the number of columns, using the **`len`** function on the first element of the data. 

### Function definition

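def explore_data(my_data_local, start_row_local, end_row_local, rows_and_columns_local):
    """
    Print the rows of my_data_local from start_row_local up to (but not including)
    end_row_local and, if rows_and_columns_local is True, the number of rows and columns.
    """
    # Slice the data passed in (not the global `data`) so any dataset can be explored
    for row in my_data_local[start_row_local:end_row_local]:
        print(row, '\n')
    if rows_and_columns_local:
        print(f'no. of rows: {len(my_data_local)}\n')
        print(f'no. of columns: {len(my_data_local[0])}')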

In [None]:
explore_data(data, 0, 5, True)

In [5]:
# Update the function definition based on the Pseudocode above
def explore_data(my_data_local, start_row_local=0, end_row_local=None, rows_and_columns_local=False):
    '''
    This function explores the data. It prints the selected rows and, optionally,
    the size of the data.

    It takes the following parameters as input:
        1. my_data_local: the data, as a list of lists (one inner list per row)
        2. start_row_local: index of the first row to print (default 0);
           negative values count from the end, e.g. -5
        3. end_row_local: index to stop before (default None, which prints
           up to and including the last row)
        4. rows_and_columns_local: if True, also print the number of rows
           and columns (default False)

    It prints:
        1. The selected rows, each followed by an empty line
        2. The number of rows and columns, if requested
    It returns nothing.
    '''
    # Slicing handles positive and negative indexes; an end of None runs to the last row
    for row in my_data_local[start_row_local:end_row_local]:
        print(row, '\n')
    if rows_and_columns_local:
        print(f'no. of rows: {len(my_data_local)}\n')
        print(f'no. of columns: {len(my_data_local[0])}')

### Calling function to explore data

#### top 5 rows
Use the `explore_data` function to display the top 5 rows along with the number of rows and columns.

In [None]:
explore_data(data, 0, 5, True)

####  last 5 rows

Again use the `explore_data` function to display the last 5 rows. 

Use negative indexing for the **start_row**. 

For the **end_row**, modify your function definition so that it can display up to and including the last row. 

This time you don't have to display the number of rows and columns. 

Double-check that the **`title`** in the last row is <span style="color:#fc5a37">'Captain Sparky vs. The Flying Saucers'.</span>
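Why `-1` is the wrong end value: Python slices exclude the end index, so `[-5:-1]` stops one row short of the end. Leaving the end out, or passing `None`, runs the slice through the last row. A toy list stands in for `data` in this sketch.

```python
rows = ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7']  # toy stand-in for `data`

print(rows[-5:-1])    # ['r3', 'r4', 'r5', 'r6']        (last row missing)
print(rows[-5:None])  # ['r3', 'r4', 'r5', 'r6', 'r7']  (same as rows[-5:])
print(rows[0:5])      # ['r1', 'r2', 'r3', 'r4', 'r5']  (the top 5 rows)
```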

In [None]:
explore_data(data, -5)

## Step 3: Separate Movies and TV Shows

### Description
Write a **`for loop`** that separates the rows into three lists:

1. Movies to **`disney_movies`**
2. TV Shows to **`disney_shows`**
3. In **`disney_other`** collect anything other than movies and shows
    
So, if the **`type`** is *"Movie - Animation"* or *"MOVIE-Comedy"*, it should be in **`disney_movies`**.

There could be many other combinations. So, if the **`type`** starts with *"movie"* (in any mix of upper and lower case), it should be in **`disney_movies`**.

Similarly, for example, if the **`type`** is *"TV Show - Docuseries"* or *"TV show - Season 1"*, it should be in **`disney_shows`**. 

All other rows, whose **`type`** starts with neither *"Movie"* nor *"TV Show"*, should be classified as **`disney_other`**. 

### Pseudocode (example)

Write a for loop to go through each row in the **`data`**

- Convert the **`type`** to lower case. Because the **`type`** column is the second column, you will need to get the element at index 1 in each row. 
- Write an if-elif-else statement
  - If the type starts with <span style="color:#fc5a37"> "movie" </span>
    - Update the **`type`** to just <span style="color:#fc5a37"> "Movie" </span>
    - Append the row to **`disney_movies`**
  - Else, if it starts with <span style="color:#fc5a37"> "tv" </span>, update the **`type`** to "TV Show" and append the row to **`disney_shows`**
  - Anything else should be appended to **`disney_other`**
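A sketch of just the classification rule, using a hypothetical helper `classify_type` and the example values from the description above. Lower-casing before the prefix check makes it case-insensitive, so `'MOVIE-Comedy'` and `'TV show - Season 1'` land in the right group.

```python
def classify_type(type_value):
    """Return 'Movie', 'TV Show' or 'Other' for a raw `type` value (hypothetical helper)."""
    lowered = type_value.lower()
    if lowered.startswith('movie'):
        return 'Movie'
    elif lowered.startswith('tv'):
        return 'TV Show'
    return 'Other'


# 'Short Film' is a made-up value, included only to show the fallback to 'Other'
for raw in ['Movie - Animation', 'MOVIE-Comedy', 'TV Show - Docuseries',
            'TV show - Season 1', 'Short Film']:
    print(raw, '->', classify_type(raw))
```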


### Write code below

In [6]:
disney_movies = []
disney_shows = []
disney_other = []

data_1 = []
for r in data:
    # Copy each inner row: copying only the outer list would still share the row lists,
    # so changing the type below would also change `data`.
    data_1.append(r.copy())

for row in data_1:
    typ = row[1].lower()
    if typ.startswith('movie'):
        row[1] = 'Movie'
        disney_movies.append(row)
    elif typ.startswith('tv'):
        row[1] = 'TV Show'
        disney_shows.append(row)
    else:
        disney_other.append(row)

### Explore newly created datasets

Explore the new datasets using the function you created earlier. 

How many Movies and TV Shows are listed? 

Are there any other than Movies and TV Shows?

Expected outputs:

```python

Disney Movies:  1052

Disney TV Shows:  398

Disney Others:  0
```

In [None]:
# Movies
print("Disney Movies: ", len(disney_movies))

In [None]:
# TV Shows
print("Disney TV Shows: ", len(disney_shows))

In [None]:
# Others
print("Disney Others: ", len(disney_other))

## Step 4: Get a list of unique values in a column ##

### Function description and output overview

The function **`list_of_elements`** should return:
  - The list of unique values in any required column

The function will take the following as parameters:
1. **`my_data_local`**
2. **`col_index_local`**

For example, to get the unique values in the `rating` column,
I will call this function and the output will be:

```python
['TV-G', 'PG', 'TV-PG', '', 'PG-13', 'TV-14', 'G', 'TV-Y7', 'TV-Y', 'TV-Y7-FV']
```


### Pseudocode
Below is the Pseudocode for the function

def list_of_elements(Give the parameters along with default values):

  1. Create an empty list and assign it to **`result_list`**

  2. Write a for loop to iterate over each row element in **`my_data_local`**
      - If element in the row at **`col_index_local`** is not in the **`result_list`**, append it.
      
  3. Return **`result_list`**
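For comparison, here is a sketch of an alternative (not the approach the pseudocode asks for), with a hypothetical function name `list_of_elements_fast` and made-up sample rows. `dict.fromkeys` keeps one key per distinct value in first-seen order (dictionaries preserve insertion order since Python 3.7), so it should return the same list as the loop, without rescanning `result_list` for every row.

```python
def list_of_elements_fast(my_data_local, col_index_local):
    """Unique values of one column, in first-seen order (hypothetical variant)."""
    return list(dict.fromkeys(row[col_index_local] for row in my_data_local))


# Made-up rows; index 2 plays the role of the rating column
sample_rows = [
    ['s1', 'Movie', 'TV-G'],
    ['s2', 'Movie', 'PG'],
    ['s3', 'Movie', 'TV-G'],
    ['s4', 'TV Show', ''],
]
print(list_of_elements_fast(sample_rows, 2))  # ['TV-G', 'PG', '']
```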

### Function definition

In [7]:
# Update the function definition based on the Pseudocode above
def list_of_elements(my_data_local, col_index_local):
    '''
    This function returns the list of unique values in any required column, in the order they first appear.\n
    It takes the following parameters as input:\n
        1. Data (list of lists)
        2. Column index
    It returns a list of the unique values.
    '''
    result_list = []
    for row in my_data_local:
        if row[col_index_local] not in result_list:
            result_list.append(row[col_index_local])
    return result_list

### Calling function

What is the column index of the `rating` column?

Use the function created above to generate the list of unique `rating` values in the original dataset. 
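Rather than counting columns by hand, a column's index can be looked up by name with `list.index`. A sketch using the column names from the data dictionary; with the real data, `data_header.index('rating')` would do the same, assuming the CSV header uses exactly those names.

```python
# Column names as listed in the data dictionary, in order
header = ['show_id', 'type', 'title', 'director', 'cast', 'country',
          'date_added', 'release_year', 'rating', 'duration', 'listed_in', 'description']

print(header.index('rating'))        # 8
print(header.index('duration'))      # 9
print(header.index('listed_in'))     # 10
print(header.index('country'))       # 5
print(header.index('release_year'))  # 7
```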

In [None]:
print(list_of_elements(data,8))

## Step 5: For each corresponding unique value in a column get the number of movies and shows. ##

### Function description and output overview

The function `elements_count` should return:
  - A dictionary of key-value pairs, where
    - key is the unique element
    - value is the count of movies or TV shows or both with that unique element

The function will take the following as parameters:
1. **`my_data_local`**
2. **`col_index_local`**

For example, to get the unique values and their counts in the **`rating`** column of **`disney_movies`**,
I will call this function on **`disney_movies`** and the output will be:

```python
{'G': 253, 'PG': 235, 'TV-G': 233, 'TV-PG': 181, 'PG-13': 66, 'TV-14': 37, 'TV-Y7': 36, 'TV-Y7-FV': 7, 'TV-Y': 3, '': 1}
```


### Pseudocode
Below is the Pseudocode for the function

def elements_count(Give the parameters along with default values):

  1. Create an empty dictionary and assign it to **`element_count_dict`**

  2. Write a for loop to iterate over each row element in **`my_data_local`**
      - If element in the row at **`col_index_local`** is not in the key of **`element_count_dict`**, add it and initialize the value to 1.
      - If element in the row at **`col_index_local`** is in the **`element_count_dict`**, increment the value by 1.

  3. Sort the dictionary based on the corresponding values in descending order. 
      
  4. Return the sorted dictionary
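The standard library's `collections.Counter` packages the same count-then-sort pattern. A sketch with a hypothetical name `elements_count_counter` and made-up rows in place of `disney_movies`: `most_common()` returns `(value, count)` pairs from highest to lowest count, keeping first-seen order for ties, which should match the stable `sorted(..., reverse=True)` in the function definition.

```python
from collections import Counter


def elements_count_counter(my_data_local, col_index_local):
    """Count each value in one column, most frequent first (hypothetical variant)."""
    counts = Counter(row[col_index_local] for row in my_data_local)
    return dict(counts.most_common())


# Made-up rows; index 2 plays the role of the rating column
sample_rows = [
    ['s1', 'Movie', 'G'],
    ['s2', 'Movie', 'PG'],
    ['s3', 'Movie', 'G'],
    ['s4', 'Movie', 'TV-G'],
]
print(elements_count_counter(sample_rows, 2))  # {'G': 2, 'PG': 1, 'TV-G': 1}
```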

### Function definition

In [8]:
# Update the function definition based on the Pseudocode above
def elements_count(my_data_local, col_index_local):
    """
    This function returns the key value pair where
    - key is the unique element
    - value is the count of movies or TV shows or both with that unique element \n
    It takes following parameters as input:\n
        1. Data
        2. Column index
    It returns a dictionary in which the key-value pairs are stored with value in descending order.
    """
    element_count_dict = {}
    for row in my_data_local:
        item = row[col_index_local]
        if item not in element_count_dict:
            element_count_dict[item] = 1
        else:
            element_count_dict[item] += 1
    sorted_elements_count = dict(sorted(element_count_dict.items(), key=lambda x:x[1], reverse=True))
    return sorted_elements_count

In [9]:
elements_count(data, 10)

{'Animation, Comedy, Family': 124,
 'Action-Adventure, Animation, Comedy': 77,
 'Action-Adventure, Animation, Kids': 45,
 'Action-Adventure, Animation, Family': 40,
 'Animals & Nature, Documentary, Family': 40,
 'Animals & Nature, Docuseries, Family': 39,
 'Animals & Nature, Documentary': 35,
 'Animation, Family, Fantasy': 31,
 'Action-Adventure, Comedy, Family': 28,
 'Animation, Family': 26,
 'Documentary': 25,
 'Comedy, Family, Fantasy': 21,
 'Comedy, Family': 21,
 'Documentary, Historical': 21,
 'Animation, Kids': 17,
 'Action-Adventure, Animation, Fantasy': 17,
 'Action-Adventure, Family, Fantasy': 16,
 'Comedy, Coming of Age, Family': 16,
 'Action-Adventure, Family, Science Fiction': 16,
 'Animation, Family, Kids': 16,
 'Action-Adventure, Comedy, Coming of Age': 15,
 'Action-Adventure, Science Fiction': 14,
 'Comedy, Coming of Age, Drama': 13,
 'Documentary, Family': 12,
 'Action-Adventure, Science Fiction, Superhero': 11,
 'Action-Adventure, Animals & Nature, Family': 11,
 'Actio

### Calling function
Print the Movie Ratings Count and TV Shows Ratings Count. 

You will have to call above function separately on **`disney_movies`** and **`disney_shows`**

In [None]:
elements_count(disney_movies,8)

In [None]:
elements_count(disney_shows,8)

## Step 6: Get the list of categories for `listed_in` column ##

Get the list of unique values in the `listed_in` column. 

It now takes just one line of code. 

You just have to reuse one of the functions created earlier. 

Could you identify which one?

In [10]:
print(list_of_elements(data,10))

['Animation, Family', 'Comedy', 'Animation, Comedy, Family', 'Musical', 'Docuseries, Historical, Music', 'Biographical, Documentary', 'Action-Adventure, Superhero', 'Docuseries, Reality, Survival', 'Animals & Nature, Docuseries, Family', 'Comedy, Family, Musical', 'Documentary', 'Comedy, Family, Music', 'Documentary, Family', 'Action-Adventure, Animals & Nature, Docuseries', 'Animals & Nature', 'Animation', 'Animation, Kids', 'Comedy, Coming of Age, Drama', 'Comedy, Family, Fantasy', 'Animation, Comedy, Drama', 'Animation, Family, Fantasy', 'Action-Adventure, Animation, Comedy', 'Comedy, Family', 'Action-Adventure, Comedy, Family', 'Lifestyle', 'Movies', 'Action-Adventure, Science Fiction', 'Action-Adventure, Fantasy, Superhero', 'Coming of Age, Music', 'Animation, Drama', 'Concert Film, Music', 'Animation, Comedy, Coming of Age', 'Animation, Comedy', 'Animation, Crime, Family', 'Science Fiction', 'Action-Adventure, Fantasy', 'Comedy, Fantasy, Kids', 'Action-Adventure, Comedy, Kids', '

## Step 7: Get the unique list of above categories and then the count in each category ##

### Function description and output overview

From the output of **Step 6**, it is clear that multiple genres are listed in a single row.

So, a movie might be listed in animation as well as family.

The function **`elements_count`** should return:
  - A dictionary of key-value pairs, where
    - key is the unique element
    - value is the count of movies or TV shows or both with that unique element

The function will take the following as parameters:
1. **`my_data_local`**
2. **`col_index_local`**
3. **`sep_local`**, the separator between the items in a single cell. So if a title is listed in 'animation, family', it should be counted for both genres.

For example, to get the unique values and their counts in the `listed_in` column of `disney_movies`,
I will call this function on `disney_movies` and the output will be:

```python
{'Family': 533, 'Comedy': 407, 'Animation': 381, 'Action-Adventure': 314, 'Documentary': 174, 'Fantasy': 158, 'Coming of Age': 153, 'Animals & Nature': 130, 'Drama': 121, 'Science Fiction': 76, 'Biographical': 41, 'Musical': 40, 'Kids': 39, 'Music': 38, 'Sports': 38, 'Historical': 38, 'Buddy': 24, 'Romance': 19, 'Superhero': 16, 'Crime': 16, 'Mystery': 8, 'Concert Film': 7, 'Variety': 7, 'Parody': 7, 'Anthology': 7, 'Dance': 6, 'Thriller': 5, 'Western': 5, 'Reality': 4, 'Lifestyle': 3, 'Movies': 3, 'Survival': 3, 'Spy/Espionage': 2, 'Romantic Comedy': 2, 'Medical': 2, 'Disaster': 2}
```

### Pseudocode



<span style="color:#fc5a37"> UPDATE </span> the Pseudocode below for the requirement above.

def elements_count(my_data_local, col_index_local, sep_local=','):

  1. Create an empty dictionary and assign it to **`element_count_dict`**

  2. Write a for loop to iterate over each row element in **`my_data_local`**
      - Split the element in the row at **`col_index_local`** on **`sep_local`** to get the list of items in that cell.
      - Strip the surrounding spaces from each item.
      - If the stripped item is not a key of **`element_count_dict`**, add it and initialize the value to 1.
      - If the stripped item is already in **`element_count_dict`**, increment the value by 1.

  3. Sort the dictionary based on the corresponding values in descending order. 
      
  4. Return the sorted dictionary
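What the update boils down to is the inner part of the loop: each cell has to be split on the separator and every piece stripped, because `'Animation, Comedy, Family'.split(',')` leaves a leading space on all but the first genre. A minimal sketch of that step on one cell value from this dataset:

```python
cell = 'Animation, Comedy, Family'

raw_parts = cell.split(',')
print(raw_parts)  # ['Animation', ' Comedy', ' Family']  (note the leading spaces)

genres = [part.strip() for part in raw_parts]
print(genres)     # ['Animation', 'Comedy', 'Family']

# Each genre is then counted separately
genre_counts = {}
for genre in genres:
    genre_counts[genre] = genre_counts.get(genre, 0) + 1
print(genre_counts)  # {'Animation': 1, 'Comedy': 1, 'Family': 1}
```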

### Function definition
Update the function definition below. Include DocString that would explain what this function does.

In [None]:
data[2][10].split(',')

In [11]:
def elements_count(my_data_local, col_index_local, sep_local=','):
    """
    This function returns the key value pair where
    - key is the unique element
    - value is the count of movies or TV shows or both with that unique element \n
    It takes following parameters as input:\n
        1. Data
        2. Column index
        3. The delimiter that separates the elements in a single cell (default ',')
    It returns a dictionary in which the key-value pairs are stored with value in descending order.
    """
    element_count_dict = {}
    for row in my_data_local:
        cell_elements = row[col_index_local].split(sep_local)
        for item in cell_elements:
            item_strip = item.strip()
            if item_strip not in element_count_dict:
                element_count_dict[item_strip] = 1
            else:
                element_count_dict[item_strip] += 1
    sorted_elements_count = dict(sorted(element_count_dict.items(), key=lambda x:x[1], reverse=True))
    return sorted_elements_count

### Calling Function
Use the function created above to print the count of each unique genre in the column `listed_in`.

It should print this separately for **`Movies`** and **`TV Shows`**.

In [12]:
elements_count(disney_movies, 10)

{'Family': 533,
 'Comedy': 407,
 'Animation': 381,
 'Action-Adventure': 314,
 'Documentary': 174,
 'Fantasy': 158,
 'Coming of Age': 153,
 'Animals & Nature': 130,
 'Drama': 121,
 'Science Fiction': 76,
 'Biographical': 41,
 'Musical': 40,
 'Kids': 39,
 'Music': 38,
 'Sports': 38,
 'Historical': 38,
 'Buddy': 24,
 'Romance': 19,
 'Superhero': 16,
 'Crime': 16,
 'Mystery': 8,
 'Concert Film': 7,
 'Variety': 7,
 'Parody': 7,
 'Anthology': 7,
 'Dance': 6,
 'Thriller': 5,
 'Western': 5,
 'Reality': 4,
 'Lifestyle': 3,
 'Movies': 3,
 'Survival': 3,
 'Spy/Espionage': 2,
 'Romantic Comedy': 2,
 'Medical': 2,
 'Disaster': 2}

In [13]:
elements_count(disney_shows, 10)

{'Animation': 161,
 'Action-Adventure': 138,
 'Docuseries': 122,
 'Comedy': 119,
 'Kids': 102,
 'Family': 99,
 'Animals & Nature': 78,
 'Coming of Age': 52,
 'Fantasy': 34,
 'Reality': 22,
 'Anthology': 21,
 'Buddy': 16,
 'Historical': 15,
 'Science Fiction': 15,
 'Drama': 13,
 'Music': 10,
 'Game Show / Competition': 10,
 'Survival': 6,
 'Sports': 5,
 'Lifestyle': 5,
 'Variety': 5,
 'Medical': 4,
 'Anime': 4,
 'Musical': 4,
 'Mystery': 4,
 'Superhero': 3,
 'Series': 3,
 'Western': 2,
 'Soap Opera / Melodrama': 2,
 'Parody': 2,
 'Police/Cop': 1,
 'Talk Show': 1,
 'Romance': 1,
 'Spy/Espionage': 1,
 'Travel': 1}

In [14]:
elements_count(data,10)

{'Family': 632,
 'Animation': 542,
 'Comedy': 526,
 'Action-Adventure': 452,
 'Animals & Nature': 208,
 'Coming of Age': 205,
 'Fantasy': 192,
 'Documentary': 174,
 'Kids': 141,
 'Drama': 134,
 'Docuseries': 122,
 'Science Fiction': 91,
 'Historical': 53,
 'Music': 48,
 'Musical': 44,
 'Sports': 43,
 'Biographical': 41,
 'Buddy': 40,
 'Anthology': 28,
 'Reality': 26,
 'Romance': 20,
 'Superhero': 19,
 'Crime': 16,
 'Variety': 12,
 'Mystery': 12,
 'Game Show / Competition': 10,
 'Survival': 9,
 'Parody': 9,
 'Lifestyle': 8,
 'Concert Film': 7,
 'Western': 7,
 'Medical': 6,
 'Dance': 6,
 'Thriller': 5,
 'Anime': 4,
 'Movies': 3,
 'Spy/Espionage': 3,
 'Series': 3,
 'Romantic Comedy': 2,
 'Soap Opera / Melodrama': 2,
 'Disaster': 2,
 'Police/Cop': 1,
 'Talk Show': 1,
 'Travel': 1}

Fill in the blanks below manually, based on the outputs of the cells above.

Most movies are listed in __Family__, followed by __Comedy__.

Most TV shows are listed in __Animation__, followed by __Action-Adventure__.

## Step 8: What is the average duration of movies and shows ##

### Function description and output overview

You need to write two functions for this step:

1. Movie durations are given in minutes and TV show durations in seasons.

    Define a function **`duration_conversion`** that removes the suffixes <span style="color:#fc5a37"> min </span> and <span style="color:#fc5a37"> Season(s) </span> and converts the values to numbers.

    This should take below parameters:

    a. my_data_local

    b. col_index_local  
  
  
2. Create a function **`average`** that returns the average of all the values in a column of the given dataset.

    This should take below parameters:

    a. my_data_local
    
    b. col_index_local

### Pseudocode

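1. Pseudocode for function **`duration_conversion`**, which deletes any suffix after the space:

    def duration_conversion(my_data_local, col_index_local):

      1. Create an empty list and assign it to **`new_list`**
      2. Write a for loop to iterate over each row in **`my_data_local`**
          - Copy the row so that the original data is not changed
          - Find the position of the first space in the element at **`col_index_local`**
          - Keep only the text before that space and convert it to an integer
          - Append the modified row to **`new_list`**
      3. Return **`new_list`**

2. Pseudocode for function **`average`**:

    def average(my_data_local, col_index_local):

      1. Start a running total at 0
      2. Write a for loop that adds the element at **`col_index_local`** of each row to the running total
      3. Divide the running total by the number of rows and return the result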
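A sketch of the parsing step on its own, assuming every `duration` value has the form shown in the sample rows (`'23 min'`, `'1 Season'`); the plural form `'2 Seasons'` is an assumption. Splitting on whitespace and keeping the first piece drops the suffix, and `int()` converts the rest. `parse_duration` is a hypothetical helper name.

```python
def parse_duration(duration_value):
    """Turn '91 min' or '2 Seasons' into the integer before the space (hypothetical helper)."""
    number_part = duration_value.split()[0]  # text before the first whitespace
    return int(number_part)


# '2 Seasons' is an assumed plural form; the other values appear in the sample rows
for raw in ['23 min', '91 min', '1 Season', '2 Seasons']:
    print(raw, '->', parse_duration(raw))
```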

### Function definition
Update the function definition for **`duration_conversion`** and **`average`**.

In [15]:
def duration_conversion(my_data_local, col_index_local):
    """
    This function removes the suffix ("min", "Season" or "Seasons") from the given column and converts the remaining number to an int.\n
    It takes the following parameters as input:\n
        1. Data
        2. Column index
    It returns a new list of copied rows in which the given column contains only integers; the input data is left unchanged.
    """
    new_list = []
    for row in my_data_local:
        mod_row = row.copy()
        position = row[col_index_local].find(' ')
        mod_row[col_index_local] = int(mod_row[col_index_local][:position])
        new_list.append(mod_row)
    return new_list

In [16]:
def average(my_data_local, col_index_local):
    """
    This function returns the average of all the values in a column in given dataset\n
    It takes following parameters as input:\n
        1. Data
        2. Column index
    It returns the average of selected column.
    """
    total = 0  # named `total` so the built-in sum() is not shadowed
    for row in my_data_local:
        total += row[col_index_local]
    avg = total / len(my_data_local)
    return avg

### Calling functions
Use the newly created functions to calculate the average duration for movies. 

You will have to first call the function **`duration_conversion`** on disney_movies to convert the duration to numeric. 

Also, remember to store this in another data list **`disney_movies_mod`**. 

Now call your average function using this new data list to calculate the average duration of movies. 

The expected output is:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">
Average Duration of Movies in minutes:  71.9106463878327
</div>

In [17]:
disney_movies_mod = duration_conversion(disney_movies, 9)

In [18]:
avg_movies = average(disney_movies_mod, 9)

In [19]:
print("Average Duration of Movies in minutes: ", avg_movies)

Average Duration of Movies in minutes:  71.9106463878327
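As a cross-check on `average`, the standard library's `statistics.mean` computes the same arithmetic mean (sum divided by count). This sketch uses the first four movie durations from the sample rows in Step 2; on the real data the idea would be to compare `mean` on the duration column of `disney_movies_mod` against `avg_movies`.

```python
from statistics import mean

durations = [23, 91, 23, 41]  # minutes, from rows s1 to s4 of the sample output

total = 0
for value in durations:
    total += value
manual_average = total / len(durations)

print(manual_average)   # 44.5
print(mean(durations))  # 44.5
```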


Now calculate the average duration for TV shows. 

The expected output is:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">
The average duration of TV shows is 2 seasons.
</div>

In [20]:
disney_shows_mod = duration_conversion(disney_shows, 9)

In [21]:
avg_shows = average(disney_shows_mod, 9)
print("Average Duration of TV shows in seasons: ", avg_shows)

Average Duration of TV shows in seasons:  2.1180904522613067


## (OPTIONAL) Step 9: Can you make your function more robust? ##

### Function description and output overview
Make a function **`filter_data`** to filter the data based on criteria in the input.

Use this function **`filter_data`** inside a new function **`items_count`** (a filtered version of **`elements_count`**) to generate statistics for the filtered data. 

For example, suppose your content manager asks you for the count of movies by genre, but only for Germany. 

You start working on the function, and after some time he also wants to restrict the data to the year 2004. 

Because the filtering criteria keep changing, you decide to make your function accept them as an input. 

You now accept the filtering criteria as a dictionary whose keys are column indexes and whose values are the respective filter values.
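A minimal sketch of checking one row against such a dictionary, assuming that a comma-separated cell (several countries, for instance) matches when any one of its items equals the filter value. `row_matches` is a hypothetical helper; `all()` makes every criterion mandatory and returns `True` for an empty dictionary, so "no filter" lets every row through. The sample row is row `s129` (Eragon), shown in full later in this notebook, cut to its first eight columns with a shortened cast.

```python
def row_matches(row, filters):
    """True if the row satisfies every {column_index: value} criterion (hypothetical helper)."""
    def cell_items(cell):
        # '2006' -> ['2006']; 'United States, Hungary' -> ['United States', 'Hungary']
        return [item.strip() for item in cell.split(',')]

    return all(value in cell_items(row[col]) for col, value in filters.items())


# Row s129 truncated to eight columns (cast shortened) for illustration
eragon = ['s129', 'Movie', 'Eragon', 'Stefen Fangmeier', 'Ed Speleers, Jeremy Irons',
          'United States, United Kingdom, Hungary', 'August 20, 2021', '2006']

print(row_matches(eragon, {}))                         # True  (no criteria)
print(row_matches(eragon, {5: 'Hungary'}))             # True
print(row_matches(eragon, {5: 'Hungary', 7: '2006'}))  # True
print(row_matches(eragon, {5: 'Germany', 7: '2006'}))  # False
```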

To print the count of movies by genre without any filtering criteria, we just need to call the function with an empty dictionary, like this:
```python
print(items_count(disney_movies, 10, ',', {}))
```
The output should look like:

```python
{'Family': 533, 'Comedy': 407, 'Animation': 381, 'Action-Adventure': 314, 'Documentary': 174, 'Fantasy': 158, 'Coming of Age': 153, 'Animals & Nature': 130, 'Drama': 121, 'Science Fiction': 76, 'Biographical': 41, 'Musical': 40, 'Kids': 39, 'Music': 38, 'Sports': 38, 'Historical': 38, 'Buddy': 24, 'Romance': 19, 'Superhero': 16, 'Crime': 16, 'Mystery': 8, 'Concert Film': 7, 'Variety': 7, 'Parody': 7, 'Anthology': 7, 'Dance': 6, 'Thriller': 5, 'Western': 5, 'Reality': 4, 'Lifestyle': 3, 'Movies': 3, 'Survival': 3, 'Spy/Espionage': 2, 'Romantic Comedy': 2, 'Medical': 2, 'Disaster': 2}
```


In [22]:
def filter_data(row_local, filter_local):
    '''
    This function takes one row of the data at a time, together with the filter criteria as a dictionary whose keys are column indexes and whose values are the expected filter values.\n
    It takes the following parameters as input:\n
        1. A list containing one row
        2. A dictionary {column: criteria, ...}; an empty dictionary means "no filter"
    It returns the row if it fulfils every criterion, otherwise an empty list.
    For comma-separated cells (e.g. several countries), the row matches if any one item equals the filter value.
    '''
    include_row_ll = True # By default, the row qualifies for the filtering criteria
    # Loop over all the key-value pairs of the filters and check the values in the row.
    for key, value in filter_local.items():
        cell = row_local[key]
        # If there are "," in the field, split it into a list and check each value; else check it directly.
        if ',' in cell:
            cell_items = [item.strip() for item in cell.split(',')]
            if value not in cell_items:
                include_row_ll = False
                break
        elif cell.strip() != value:
            include_row_ll = False
            break
    if include_row_ll:
        return row_local
    return []

In [23]:
filter_data(data[128], {5:'Hungary'})

['s129',
 'Movie',
 'Eragon',
 'Stefen Fangmeier',
 'Ed Speleers, Jeremy Irons, Sienna Guillory, Robert Carlyle, John Malkovich, Garrett Hedlund',
 'United States, United Kingdom, Hungary',
 'August 20, 2021',
 '2006',
 'PG',
 '104 min',
 'Action-Adventure, Family, Fantasy',
 'In a mythical time, a teenage boy becomes a dragon rider and embarks on a journey of adventure.']

In [None]:
data[128]

In [24]:
def items_count(data_local, col_local, sep_local, filters_local):
    '''
    This function returns the filtered key value pair where
    - key is the unique element
    - value is the count of movies or TV shows or both with that unique element \n
    It takes following parameters as input:\n
        1. Data
        2. Column to be looked into
        3. The delimiter e.g. ','
        4. Filter criteria as dictionary e.g. {column:criteria, ....}
    It returns a dictionary in which the key-value pairs are stored with value in descending order. 
    '''    
    #use `filter_data` function within this function
    element_count_dict = {}
    for row in data_local:
        filtered_row = filter_data(row, filters_local)
        if not filtered_row:
            continue
        else:
            cell_elements = filtered_row[col_local].split(sep_local)
            for item in cell_elements:
                item_strip = item.strip()
                if item_strip not in element_count_dict:
                    element_count_dict[item_strip] = 1
                else:
                    element_count_dict[item_strip] += 1
    sorted_elements_count = dict(sorted(element_count_dict.items(), key=lambda x:x[1], reverse=True))
    return sorted_elements_count
        

In [25]:
filter_data(data[130],{5:'germany'})

[]

### Calling function

Print the count of movies based on genre and without any filtering criteria.

In [None]:
print(items_count(disney_movies, 10, ',', {}))

Now, print the count of movies by genre, only for Germany. To do so, give the filter criteria as {5: 'Germany'}

In [29]:
print(items_count(disney_movies, 10, ',', {5:'Hungary'}))

{'Family': 2, 'Fantasy': 2, 'Action-Adventure': 2, 'Animation': 1, 'Comedy': 1, 'Coming of Age': 1}


Now, print the count of movies by genre, only for Germany and only for the year 2004. The filter to use is {5: 'Germany', 7: '2004'}

The expected output is:

```python 
{'Comedy': 2, 'Action-Adventure': 1, 'Family': 1, 'Coming of Age': 1}
```

In [31]:
print(items_count(disney_movies, 10, ',', {5:'China',7:'2004'}))

{}
