# Disney+ Movies and TV Shows #


You are working as a Junior Data Analyst at Disney Plus. A combined dataset of movies and TV shows is available on [Kaggle](https://www.kaggle.com/datasets/shivamb/disney-movies-and-tv-shows).

The Content Manager wants you to clean the data and provide insights that answer questions such as:

1. What is the unique list of ratings, and how many movies and TV shows are listed under each rating?
2. What is the average duration of movies and TV shows?
3. How many movies and TV shows have been released so far in Germany? Give the list year-wise.
4. Which director has directed the largest number of movies, and in which genre?

The task has been handed over to you by your team lead in the data department. They expect you to apply your Python knowledge of strings and descriptive statistics, and they want you to <span style="color:#fc5a37">build re-usable functions.</span>

Your analysis will help the Content Manager at Disney Plus to make a decision on the direction and future investments the company makes in movies as well as TV shows and will influence what gets released. 

Also, your manager wants to use the functions himself, and he is not code savvy. So that he can understand the functions and re-use them, <span style="color:#fc5a37">add DocStrings to your functions</span> describing what each function does, its expected inputs along with their default values, and its expected outputs.

Expected outputs are provided for a few of the cells so you can confirm that you are on the right track.

You might have to refer to Python documentation. 

## Data dictionary ##

1. show_id - Unique id
2. type - Movie or TV Show
3. title - Name of the movie/show
4. director - Directors of the movie/show
5. cast - Main cast of the movie/show
6. country - Country of production
7. date_added - Date added on Disney+
8. release_year - Original Release Year of the movie/tv show
9. rating - Rating of the movie/show 
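10. duration - Total duration of the movie/show
11. listed_in - Genres the movie/show is listed in
12. description - Short description of the movie/show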

## Step 1: Read the file

In [1]:
# This code block is pre-coded for you. You don't have to write anything in this one.
from csv import reader  # Function that parses rows from a CSV file.

# newline="" is what the csv module documentation recommends when opening a file for reader().
with open('./Data/disney_plus_titles.csv', encoding="utf-8", newline="") as opened_file:
    read_file = reader(opened_file)
    data_list = list(read_file)  # This stores the data in the list of lists format.

Extract the header (column names), which is the first row of `data_list`, into `data_header`, and keep the remaining rows in `data`.

Hint: How would you access the first element of a list?

In [2]:
data_header = data_list[0]
data = data_list[1:]

In [None]:
print(data_header)

## Step 2: Create function `explore_data` to explore data ##

### Function description and output overview
The function `explore_data` should print the following:
1. Selected rows from start to end.
2. Number of rows and columns if chosen to do so.

The function will take following parameters as input
1. `my_data_local`
2. `start_row_local`
3. `end_row_local`
4. `rows_and_columns_local` a boolean to check if you want to display total number of rows and columns. 

For example, I want to display the first five rows and also the total number of rows and columns in the input data.

So I will call this function as `explore_data(data,0,5,True)` and the expected output is:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">

['s1', 'Movie - Animation', 'Duck the Halls: A Mickey Mouse Christmas Special', 'Alonso Ramirez Ramos, Dave Wasson', 'Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton', '', 'November 26, 2021', '2016', 'TV-G', '23 min', 'Animation, Family', 'Join Mickey and the gang as they duck the halls!']


['s2', 'Movie-Comedy', 'Ernest Saves Christmas', 'John Cherry', 'Jim Varney, Noelle Parker, Douglas Seale', '', 'November 26, 2021', '1988', 'PG', '91 min', 'Comedy', 'Santa Claus passes his magic bag to a new St. Nic.']


['s3', 'Movie', 'Ice Age: A Mammoth Christmas', 'Karen Disher', 'Raymond Albert Romano, John Leguizamo, Denis Leary, Queen Latifah', 'United States', 'November 26, 2021', '2011', 'TV-G', '23 min', 'Animation, Comedy, Family', "Sid the Sloth is on Santa's naughty list."]


['s4', 'Movie', 'The Queen Family Singalong', 'Hamish Hamilton', 'Darren Criss, Adam Lambert, Derek Hough, Alexander Jean, Fall Out Boy, Jimmie Allen', '', 'November 26, 2021', '2021', 'TV-PG', '41 min', 'Musical', 'This is real life, not just fantasy!']


['s5', 'TV Show - Docuseries', 'The Beatles: Get Back', '', 'John Lennon, Paul McCartney, George Harrison, Ringo Starr', '', 'November 25, 2021', '2021', '', '1 Season', 'Docuseries, Historical, Music', 'A three-part documentary from Peter Jackson capturing a moment in music history with The Beatles.']


no. of rows: 1450

no. of columns: 12

### Pseudocode
Below is the Pseudocode for the function

def explore_data(Give the parameters along with default values):

  1. Slice the `my_data_local` using `start_row_local` and `end_row_local`. Remember data is a list.

  2. Write a for loop to print each element in sliced data and adding a new empty line after each row using print('\n').
        
  3. Write an `if statement` which checks for if user wants to display number of rows and columns. 
      - If yes, then print the number of rows using `len` function on data and number of columns using `len` function on the first element of data. 

### Function definition

In [4]:
def explore_data(data_l, start_l, end_l, rows_and_columns_l=False):
    """Print the rows between the given start and end positions, and optionally the total number of rows and columns.

    Args:
        data_l: dataset as a list of rows (list of lists).
        start_l: index of the first row to print (negative values count from the end).
        end_l: index at which to stop (exclusive); pass None to print through the last row.
        rows_and_columns_l: if True, also print the total number of rows and columns. Defaults to False.

    Returns:
        None. The selected rows (and the counts, if requested) are printed.

    """
    # Compare with None using `is`; a missing end means "until the last row".
    if end_l is None:
        data_slice_l = data_l[start_l:]
    else:
        data_slice_l = data_l[start_l:end_l]

    for row_l in data_slice_l:
        print(row_l)
        print('\n')

    if rows_and_columns_l:
        print('no. of rows:', len(data_l))
        print('no. of columns:', len(data_l[0]))

### Calling function to explore data

Use `explore_data` function to display top 5 rows along with number of rows and columns.

In [None]:
explore_data(data,0,5,True)

Again use the `explore_data` function to display the last 5 rows. 

Use negative indexing for the start_row. 

For the end_row, modify your function definition to display until the last row. 

This time you don't have to display the number of rows and columns. 

Make sure that the `title` in the last row is <span style="color:#fc5a37">'Captain Sparky vs. The Flying Saucers'.</span>

In [None]:
explore_data(data,-5, None, False)
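
As a quick check of the slicing behaviour this cell relies on, the sketch below uses a small made-up list (`sample_rows` is illustrative, not the Disney+ data). It shows that a negative start counts from the end of the list and that an explicit `None` as the end index behaves like leaving the end out.

```python
# Illustrative rows only; these are not taken from the dataset.
sample_rows = [["s1"], ["s2"], ["s3"], ["s4"], ["s5"], ["s6"]]

# A negative start counts from the end of the list.
last_two = sample_rows[-2:]

# An explicit None as the end index behaves the same as omitting it.
last_two_with_none = sample_rows[-2:None]

print(last_two)            # [['s5'], ['s6']]
print(last_two_with_none)  # [['s5'], ['s6']]
```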

## Step 3: Separate Movies and TV Shows

### Description
Code a `for loop` to separate

1. Movies to `disney_movies`
2. TV Shows to `disney_shows`
3. Also check whether there are any rows that are neither movies nor TV shows.

So, if the `type` is `Movie - Animation` or `MOVIE-Comedy`, it should be in `disney_movies`.

There could be many other combinations, so any row whose `type` starts with `movie` (ignoring case) should be in `disney_movies`.

Similarly, for example, if the `type` is `TV Show - Docuseries` or `TV show - Season 1`, it should be in `disney_shows`.

All remaining rows, whose `type` does not start with `Movie` or `TV Show`, should be classified as `disney_other`.

### Pseudocode

Write a for loop to go through each row in the `data`

- Convert the `type` to lower case. Because the `type` column is the second column, you will need to get the element at index 1 in each row.
- Write an if else statement
  - If the type starts with <span style="color:#fc5a37"> "movie" </span>
    - Update the `type` to just <span style="color:#fc5a37"> "Movie" </span>
    - Append the row to `disney_movies`
  - Else, do the same with <span style="color:#fc5a37"> "tv" </span> (updating the `type` to "TV Show") and append the row to `disney_shows`
  - Anything else should be appended to `disney_other`


### Write code below

In [7]:
disney_movies = []
disney_shows = []
others = []

for row in data:
    # Use a name other than `type` so Python's built-in type() is not shadowed.
    type_lower = row[1].lower()
    if type_lower.startswith("movie"):
        row[1] = "Movie"
        disney_movies.append(row)
    elif type_lower.startswith("tv"):
        row[1] = "TV Show"
        disney_shows.append(row)
    else:
        others.append(row)

### Explore newly created datasets

Explore all your new datasets created using your earlier created function. 

How many Movies and TV Shows are listed? 

Are there any other than Movies and TV Shows?

Expected outputs:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">

Disney Movies:  1052

Disney TV Shows:  398

Disney Others:  0

In [None]:
print("Disney Movies: ", len(disney_movies))
print("Disney TV Shows: ", len(disney_shows))
print("Disney Others: ", len(others))

In [None]:
explore_data(disney_movies,0,5,False)

In [None]:
explore_data(disney_shows,0,5,False)

## Step 4: Get a list of unique values in a column ##

### Function description and output overview

The function `list_of_elements` should return:
  - The list of unique values in any required column

The function will take the following as parameters:
1. `my_data_local`
2. `col_index_local`

For example, I want to get the unique values in the `rating` column.
I will call this function and the output will be:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">
['TV-G', 'PG', 'TV-PG', '', 'PG-13', 'TV-14', 'G', 'TV-Y7', 'TV-Y', 'TV-Y7-FV']


### Pseudocode
Below is the Pseudocode for the function

def list_of_elements(Give the parameters along with default values):

  1. Create an empty list and assign it to `result_list`

  2. Write a for loop to iterate over each row element in `my_data_local`
      - If the element in the row at `col_index_local` is not in the `result_list`, append it.
      
  3. Return `result_list`

### Function definition

In [11]:
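def list_of_elements(my_data_local, col_index_local):
    """Return the unique values found in the given column, in order of first appearance.

    Args:
        my_data_local: dataset as a list of rows (list of lists).
        col_index_local: index of the column to inspect.

    Returns:
        List of the unique values in that column.

    """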
    result_list_l = []
    for row_l in my_data_local:
        if row_l[col_index_local] not in result_list_l:
            result_list_l.append(row_l[col_index_local])
    return result_list_l

### Calling function

What is the column index of the `rating` column?

Use the above created function to generate the list of unique `rating` in the original dataset. 

In [None]:
print(list_of_elements(data,8))

In [None]:
#Another way of accessing the index of the column
print(list_of_elements(data, data_header.index("rating")))
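
For comparison, here is a minimal sketch of the same idea using `dict.fromkeys`, which keeps the first occurrence of each key in insertion order (guaranteed for dictionaries since Python 3.7). The function name `unique_in_order` and the sample rows are hypothetical; this is an alternative for reference, not the approach the exercise asks for.

```python
def unique_in_order(my_data_local, col_index_local):
    """Return the unique values of a column in order of first appearance."""
    column_values = (row[col_index_local] for row in my_data_local)
    # Dictionary keys are unique and remember insertion order.
    return list(dict.fromkeys(column_values))


# Made-up rows with a rating in column 1.
sample_rows = [["s1", "TV-G"], ["s2", "PG"], ["s3", "TV-G"], ["s4", ""]]
print(unique_in_order(sample_rows, 1))  # ['TV-G', 'PG', '']
```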

## Step 5: For each corresponding unique value in a column get the number of movies and shows. ##

### Function description and output overview

The function `elements_count` should return:
  - The key value pair where
    - key is the unique element
    - value is the count of movies or TV shows or both with that unique element

The function will take the following as parameters:
1. `my_data_local`
2. `col_index_local`

For example, I want to get the unique values and their corresponding counts in the `rating` column in `disney_movies`.
I will call this function on `disney_movies` and the output will be:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">
{'G': 253, 'PG': 235, 'TV-G': 233, 'TV-PG': 181, 'PG-13': 66, 'TV-14': 37, 'TV-Y7': 36, 'TV-Y7-FV': 7, 'TV-Y': 3, '': 1}


### Pseudocode
Below is the Pseudocode for the function

def elements_count(Give the parameters along with default values):

  1. Create an empty dictionary and assign it to `element_count_dict`

  2. Write a for loop to iterate over each row element in `my_data_local`
      - If the element in the row at `col_index_local` is not in the keys of `element_count_dict`, add it and initialize the value to 1.
      - If the element in the row at `col_index_local` is in the `element_count_dict`, increment the value by 1.

  3. Sort the dictionary based on the corresponding values in descending order.

  4. Return the sorted dictionary

In [14]:
def elements_count(my_data_local, col_index_local):
    """Count how many rows (movies/TV shows) have each unique value in the given column.

    Args:
        my_data_local: dataset as a list of rows (list of lists).
        col_index_local: index of the column to count.

    Returns:
        Dictionary mapping each unique value to its count, sorted in descending order of count.

    """
    element_count_dict = {}
    for row_l in my_data_local:
        element_l = row_l[col_index_local]
        if element_l in element_count_dict:
            element_count_dict[element_l] += 1
        else:
            element_count_dict[element_l] = 1
    sorted_elements_count = dict(sorted(element_count_dict.items(), key=lambda item: item[1], reverse=True))
    return sorted_elements_count

### Calling function
Print the Movie Ratings Count and TV Shows Ratings Count. 

You will have to call the above function separately on `disney_movies` and `disney_shows`.

In [None]:
print("Movie Ratings Count:", elements_count(disney_movies,8))
print("Shows Ratings Count:", elements_count(disney_shows,8))
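
The counting loop above can also be expressed with `collections.Counter` from the standard library. The sketch below is an alternative under that assumption, not the implementation the exercise expects; the name `elements_count_counter` and the sample rows are made up for illustration.

```python
from collections import Counter


def elements_count_counter(my_data_local, col_index_local):
    """Count the values in one column, most frequent first."""
    counts = Counter(row[col_index_local] for row in my_data_local)
    # most_common() returns (value, count) pairs sorted by count, descending.
    return dict(counts.most_common())


# Made-up rows with a rating in column 1.
sample_rows = [["s1", "G"], ["s2", "PG"], ["s3", "G"], ["s4", "TV-G"], ["s5", "G"], ["s6", "PG"]]
print(elements_count_counter(sample_rows, 1))  # {'G': 3, 'PG': 2, 'TV-G': 1}
```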

## Step 6: Get the list of categories for `listed_in` column ##

Get the list of unique values in `listed_in` column. 

It's now a one line code. 

You have to just re-use one of the earlier created functions. 

Could you identify which one?

In [None]:
print(list_of_elements(data,data_header.index("listed_in")))

## Step 7: Get the unique list of above categories and then the count in each category ##

### Function description and output overview

From the output of `step 6`, it is clear that there are multiple genres listed in single row.

So, a movie might be listed in animation as well as family.

The function `elements_count` should return:
  - The key value pair where
    - key is the unique element
    - value is the count of movies or TV shows or both with that unique element

The function will take the following as parameters:
1. `my_data_local`
2. `col_index_local`
3. `sep_local`, the separator between the items in a single cell. So if a title is listed in 'animation, family', it should be counted for both genres.

For example, I want to get the unique values and their corresponding counts in the `listed_in` column in `disney_movies`.
I will call this function on `disney_movies` and the output will be:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">

{'Family': 533, 'Comedy': 407, 'Animation': 381, 'Action-Adventure': 314, 'Documentary': 174, 'Fantasy': 158, 'Coming of Age': 153, 'Animals & Nature': 130, 'Drama': 121, 'Science Fiction': 76, 'Biographical': 41, 'Musical': 40, 'Kids': 39, 'Music': 38, 'Sports': 38, 'Historical': 38, 'Buddy': 24, 'Romance': 19, 'Superhero': 16, 'Crime': 16, 'Mystery': 8, 'Concert Film': 7, 'Variety': 7, 'Parody': 7, 'Anthology': 7, 'Dance': 6, 'Thriller': 5, 'Western': 5, 'Reality': 4, 'Lifestyle': 3, 'Movies': 3, 'Survival': 3, 'Spy/Espionage': 2, 'Romantic Comedy': 2, 'Medical': 2, 'Disaster': 2}

### Pseudocode
<span style="color:#fc5a37"> UPDATE </span> the Pseudocode below for the requirement above.

def elements_count(Give the parameters along with default values):

  1. Create an empty dictionary and assign it to `element_count_dict`

  2. Write a for loop to iterate over each row element in `my_data_local`
      - Split the element in the row at `col_index_local` on `sep_local` to get the list of individual items.
      - For each item in that list:
          - If the item is not in the keys of `element_count_dict`, add it and initialize the value to 1.
          - If the item is already in the `element_count_dict`, increment the value by 1.

  3. Sort the dictionary based on the corresponding values in descending order.

  4. Return the sorted dictionary

### Function definition
Update the function definition below. Include DocString that would explain what this function does.

In [17]:
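def elements_count(my_data_local, col_index_local, sep_local):
    """Count how many rows (movies/TV shows) contain each unique item in the given column, where one cell may hold several items.

    Args:
        my_data_local: dataset as a list of rows (list of lists).
        col_index_local: index of the column to count.
        sep_local: separator between the items in a single cell, for example ", ".

    Returns:
        Dictionary mapping each unique item to its count, sorted in descending order of count.

    """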
    element_count_dict = {}
    for row_l in my_data_local:
        element_list_l = row_l[col_index_local].split(sep_local)
        for element_l in element_list_l:
            if element_l in element_count_dict:
                element_count_dict[element_l] += 1
            else:
                element_count_dict[element_l] = 1
    sorted_elements_count = dict(sorted(element_count_dict.items(), key=lambda item: item[1], reverse=True))
    return sorted_elements_count

### Calling Function
Use above created function to print the unique list of Genres which is in the column `listed_in`.

It should print this separately for `Movies` and `TV Shows`.

In [None]:
print("Movie Category Count:", elements_count(disney_movies, data_header.index("listed_in"), ", "))

In [None]:

print("Shows Category Count:", elements_count(disney_shows, 10, ", "))

Most movies are listed in Family, followed by Comedy and Animation.

Most shows are listed in Animation, followed by Action-Adventure and Docuseries.
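
The choice of separator matters when splitting a cell into genres. The following sketch, on a single made-up string, shows why splitting on a bare comma leaves leading spaces and how stripping each piece avoids counting `' Comedy'` and `'Comedy'` as different genres.

```python
genres = "Animation, Comedy, Family"

# Splitting on ", " works when every separator is exactly a comma plus one space.
print(genres.split(", "))  # ['Animation', 'Comedy', 'Family']

# Splitting on "," alone leaves leading spaces on the later items...
raw_parts = genres.split(",")
print(raw_parts)  # ['Animation', ' Comedy', ' Family']

# ...so strip each piece before counting.
clean_parts = [part.strip() for part in raw_parts]
print(clean_parts)  # ['Animation', 'Comedy', 'Family']
```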

## Step 8: What is the average duration of movies and shows ##

### Function description and output overview

You need to write two functions for this step:

1. The duration of movies is in minutes and shows in seasons.

    Define a function `duration_conversion` that removes the suffixes <span style="color:#fc5a37"> minutes </span> and <span style="color:#fc5a37"> seasons </span> and converts the values to numeric.

    This should take below parameters:

    a. my_data_local

    b. col_index_local

2. Create a function `average` that will calculate the average of all the values in a column of the given dataset.

    This should take below parameters:

    a. my_data_local
    
    b. col_index_local

### Pseudocode

1. Write pseudocode for the function `duration_conversion` that will delete any suffix after the space.

2. Write pseudocode for the function `average`.

### Function definition
Update the function definitions for `duration_conversion` and `average`.

In [20]:
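def duration_conversion(my_data_local, col_index_local=9):
    """Remove the suffix from the duration values and convert them to integers.

    Args:
        my_data_local: dataset as a list of rows (list of lists).
        col_index_local: index of the duration column. Defaults to 9.

    Returns:
        The same dataset, with the duration column updated in place to integers.

    """
    for row_l in my_data_local:
        # Keep only the part before the first space, e.g. '23 min' -> '23'.
        new_values = row_l[col_index_local].split(" ")
        row_l[col_index_local] = int(new_values[0])
    return my_data_local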

In [21]:
def average(data_l, loc_l):
    """Calculate the average of the values in the given column.

    Args:
        data_l: dataset as a list of rows; the column must already hold numbers.
        loc_l: index of the column to average.

    Returns:
        Average of all the values in that column.

    """
    sum_l = 0
    count_l = 0
    for row_l in data_l:
        sum_l += row_l[loc_l]
        count_l += 1
    return sum_l / count_l

### Calling functions
Use the newly created functions to calculate the average duration for movies. 

You will have to first call the function `duration_conversion` on disney_movies to convert the duration to numeric. 

Also, remember to store this in another data list `disney_movies_mod`. 

Now call your average function using this new data list to calculate the average duration of movies. 

The expected output is:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">
Average Duration of Movies in minutes:  71.9106463878327

In [22]:
disney_movies_mod = duration_conversion(disney_movies)

In [None]:
explore_data(disney_movies_mod,0,2,True)

In [None]:
print("Average Duration of Movies in minutes: ", average(disney_movies_mod, 9))

Now calculate the average duration for TV shows.

The expected output is:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">
 The average duration of TV shows is 2 seasons.

In [None]:
disney_show_mod = duration_conversion(disney_shows)
explore_data(disney_show_mod,0,2,True)

In [None]:
print("Average Duration of Shows in seasons: ", average(disney_show_mod, 9))
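
One thing worth noting: `duration_conversion` changes the rows of the list it receives and returns that same list, so `disney_movies_mod` and `disney_movies` refer to the same data, and calling the function a second time on the same list would fail because the values are already integers. The sketch below shows one possible non-mutating variant; the name `duration_converted_copy` is hypothetical and the sample rows are made up.

```python
def duration_converted_copy(my_data_local, col_index_local):
    """Return new rows with the duration column converted to int; the input is left unchanged."""
    converted = []
    for row in my_data_local:
        new_row = list(row)  # shallow copy so the original row is not modified
        # Keep only the number before the first space, e.g. '91 min' -> 91.
        new_row[col_index_local] = int(new_row[col_index_local].split(" ")[0])
        converted.append(new_row)
    return converted


# Made-up rows with the duration in column 1.
sample_movies = [["s1", "23 min"], ["s2", "91 min"]]
sample_movies_mod = duration_converted_copy(sample_movies, 1)

print(sample_movies)      # [['s1', '23 min'], ['s2', '91 min']]
print(sample_movies_mod)  # [['s1', 23], ['s2', 91]]

average_minutes = sum(row[1] for row in sample_movies_mod) / len(sample_movies_mod)
print(average_minutes)    # 57.0
```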

## (OPTIONAL) Step 9: Can you make your function more robust? ##

### Function description and output overview
Make a function `filter_data` to filter the data based on criteria in the input.

Use this function `filter_data` in `elements_count` to generate statistics for filtered data. 

For example, suppose your content manager asks you for the count of movies by genre, but only for Germany.

You start working on the function, and after some time he wants the data only for the year 2004.

With these ever-changing filtering criteria, you decide to make your function handle a dynamic list of filters.

The function now accepts the filtering criteria as a dictionary, where each key is a column index and each value is the filter value for that column.

To print the count of movies by genre without any filtering criteria,

we just need to call the function like this:

print(items_count(disney_movies, 10, ',', {}))

The output should look like:

<style>
    .code_block {
        background-color: #ff795c;
        color: white;
    }
</style>

<div class="code_block">

{'Family': 533, 'Comedy': 407, 'Animation': 381, 'Action-Adventure': 314, 'Documentary': 174, 'Fantasy': 158, 'Coming of Age': 153, 'Animals & Nature': 130, 'Drama': 121, 'Science Fiction': 76, 'Biographical': 41, 'Musical': 40, 'Kids': 39, 'Music': 38, 'Sports': 38, 'Historical': 38, 'Buddy': 24, 'Romance': 19, 'Superhero': 16, 'Crime': 16, 'Mystery': 8, 'Concert Film': 7, 'Variety': 7, 'Parody': 7, 'Anthology': 7, 'Dance': 6, 'Thriller': 5, 'Western': 5, 'Reality': 4, 'Lifestyle': 3, 'Movies': 3, 'Survival': 3, 'Spy/Espionage': 2, 'Romantic Comedy': 2, 'Medical': 2, 'Disaster': 2}

In [27]:
def filter_data(row_ll, filters_ll):
    """Selects the row based on list of filter criteria

    Args:
        single row, list of filter criteria in dictionary format where key is the column location and value is the desired value for that column

    Returns:
        True or False based on if it qualifies for the given filter criteria
        
    """
    include_row_ll = True
    for loc_filter_ll, val_filter_ll in filters_ll.items():
        if ',' in row_ll[loc_filter_ll]:
            parts_ll = [part.strip() for part in row_ll[loc_filter_ll].split(',')]
            if val_filter_ll not in parts_ll:
                include_row_ll = False
                break
        else:
            if row_ll[loc_filter_ll] != val_filter_ll:
                include_row_ll = False
                break
    return include_row_ll

In [28]:
def items_count(dataset_l, loc_l, sep_l, filters_l):
    """Return the unique values of a column with their counts, sorted in descending order of count. The column may contain several values separated by a separator. Any number of filter criteria can be given as a dictionary where each key is a column location and each value is the required value for that column.

    Args:
        dataset, column location, separator for values in given column, filter criteria dictionary
    Returns:
        Unique values along with their counts in descending order
        
    """
    items_count_l = {}
    for row_l in dataset_l:
        # filter_data returns True for an empty filter dictionary, so every row passes.
        if filter_data(row_l, filters_l):
            for item_l in row_l[loc_l].split(sep_l):
                # Strip surrounding spaces so ' Comedy' and 'Comedy' are counted together.
                item_l = item_l.strip()
                items_count_l[item_l] = items_count_l.get(item_l, 0) + 1
    sorted_items_count = dict(sorted(items_count_l.items(), key=lambda item: item[1], reverse=True))
    return sorted_items_count

Print the count of movies based on genre and without any filtering criteria

In [None]:
print(items_count(disney_movies, 10, ',', {}))

Now, print the count of movies based on genre and only for Germany

In [None]:
print(items_count(disney_movies, 10, ',', {5: 'Germany'}))

Now, print the count of movies based on genre for Germany, restricted to the year 2004

In [None]:
print(items_count(disney_movies, 10, ',', {5: 'Germany', 7: '2004'}))
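
The same filter-dictionary idea can also be pointed at the year-wise question for Germany from the introduction. The following is a self-contained sketch on made-up rows, using hypothetical helpers `row_matches` and `count_by_column` rather than the notebook's own functions; the column positions 0 to 2 are illustrative and do not match the real dataset.

```python
def row_matches(row, filters):
    """Return True if the row satisfies every {column index: value} filter."""
    for col_index, wanted in filters.items():
        # A cell may hold several comma-separated values, e.g. 'Germany, France'.
        cell_values = [part.strip() for part in row[col_index].split(",")]
        if wanted not in cell_values:
            return False
    return True


def count_by_column(rows, col_index, filters):
    """Count rows per value of one column, keeping only rows that pass the filters."""
    counts = {}
    for row in rows:
        if row_matches(row, filters):
            counts[row[col_index]] = counts.get(row[col_index], 0) + 1
    # Sort by the column value (here: the year) in ascending order.
    return dict(sorted(counts.items()))


# Made-up rows: [title, country, release_year]
sample_rows = [
    ["A", "Germany, United States", "2004"],
    ["B", "United States", "2004"],
    ["C", "Germany", "2010"],
    ["D", "Germany", "2004"],
]
print(count_by_column(sample_rows, 2, {1: "Germany"}))  # {'2004': 2, '2010': 1}
```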

## Step 10: Now let's create an interactive tool for the content manager ##

For this, create a new folder `Analysis_tool`. Make the different functions into modules, for example `explore_data.py`. In the main file `analysis.py`, import all the modules. Code your interactive tool after `if __name__ == "__main__":`. The tool should be interactive in the following way:

1. Ask whether the combined data, only movies, or only TV shows should be analyzed.
2. Create options based on the functions you created. Interactively ask for filter options or for the column on which the unique list is wanted. Be innovative.
3. Keep repeating until the user wants to exit.

Once you are done, execute this code from the terminal with `python analysis.py`

Your tool might look like this:
````
To analyze combined dataset type c 
To analyze only movies type m 
To analyze only TV shows type t  
To quit q 
Enter the option!: 
````
If you give option 'm', it will analyze only movies. Next it will ask:
````
To explore the dataset type e 
To list the unique values in a column type u 
To count number of movies/TV shows for each unique values type c 
To find the average value in a column type a 
Enter the option!: 
````
If you give option c, it will give the count of unique values in asked column based on given filter criteria.
````
Which column:
0: show_id
1: type
2: title
3: director
4: cast
5: country
6: date_added
7: release_year
8: rating
9: duration
10: listed_in
11: description
Enter the number as per the list above: 
````
If you give 10 and then ',' as the separator, it will ask you for filter criteria
````
Enter the number as per the list above: 10
Enter the separator in case of multiple values: ,
Are there filter criteria yes/no: 
````
If you want to give filter criteria, say 'yes' and it will ask for column and value on which to filter
````
Which column:
0: show_id
1: type
2: title
3: director
4: cast
5: country
6: date_added
7: release_year
8: rating
9: duration
10: listed_in
11: description
Enter column to filter as per above list: 
````
Say, you want to filter on country.
````
['', 'United States', 'Australia', 'Canada', 'United Kingdom', 'Ireland', 'Denmark', 'Spain', 'Poland', 'Hungary', 'Germany', 'Singapore', 'Thailand', 'Brazil', 'Belgium', 'Austria', 'South Africa', 'Japan', 'France', 'Hong Kong', 'United Arab Emirates', 'Mexico', 'Switzerland', 'India', 'China', 'Tanzania', 'Panama', 'Angola', 'Botswana', 'Namibia', 'South Korea', 'Russia', 'Malaysia', 'Kazakhstan', 'Taiwan', 'Syria', 'Iran', 'Egypt', 'Pakistan', 'New Zealand', 'Norway', 'Sweden', 'Slovenia', 'Czech Republic']
What value you want to filter: 
````
Now give the value of country. Say, Germany.
````
Are there more filter criteria yes/no: 
````
Now, let's say you want to filter on release_year
````
Which column:
0: show_id
1: type
2: title
3: director
4: cast
5: country
6: date_added
7: release_year
8: rating
9: duration
10: listed_in
11: description
Enter column to filter as per above list: 
````
After giving 7:
````
['2016', '1988', '2011', '2021', '2008', '2020', '2007', '2014', '2015', '2013', '2012', '2006', '2010', '1996', '2009', '1993', '1994', '2019', '2017', '2005', '2000', '2004', '1987', '1967', '1991', '1956', '1995', '1984', '1985', '1974', '1959', '2003', '1997', '2018', '2001', '1990', '1992', '1998', '1952', '1955', '1977', '1957', '1999', '1989', '1948', '1964', '1969', '1942', '1950', '1951', '1953', '1949', '1940', '1946', '1954', '1936', '1944', '1935', '1939', '1975', '1978', '1971', '1961', '2002', '1962', '1981', '1932', '1938', '1941', '1986', '1947', '1937', '1966', '1943', '1934', '1976', '1980', '1960', '1983', '1973', '1972', '1928', '1965', '1979', '1970', '1963', '1933', '1945', '1982', '1968']
What value you want to filter: 
````
Let's get the data for year 2004.
````
Are there more filter criteria yes/no: 
````
That's all. No more filter criteria. So it should return:
````
{'Comedy': 2, 'Action-Adventure': 1, 'Family': 1, 'Coming of Age': 1}

To analyze combined dataset type c 
To analyze only movies type m 
To analyze only TV shows type t  
To quit q 
Enter the option!:
````
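
A possible starting point for the outer menu loop is sketched below. It is only a skeleton under stated assumptions: the function name `run_tool`, its `ask` and `show` parameters, and the sample datasets are all hypothetical, and the second-level menu is replaced by a placeholder that prints the number of rows. Passing the input and output functions as parameters lets the loop be exercised with scripted answers instead of a live terminal.

```python
def run_tool(datasets, ask=input, show=print):
    """Keep asking which dataset to analyze until the user types q."""
    while True:
        show("To analyze combined dataset type c")
        show("To analyze only movies type m")
        show("To analyze only TV shows type t")
        show("To quit q")
        choice = ask("Enter the option!: ").strip().lower()
        if choice == "q":
            break
        if choice not in datasets:
            show("Unknown option, please try again.")
            continue
        # Placeholder action: a real tool would show the second menu here.
        show("no. of rows:", len(datasets[choice]))


# Made-up stand-ins for data, disney_movies and disney_shows.
sample = {"c": [["s1"], ["s2"], ["s3"]], "m": [["s1"], ["s2"]], "t": [["s3"]]}

# Scripted run: choose movies, then an invalid option, then quit.
scripted_answers = iter(["m", "x", "q"])
output_lines = []
run_tool(
    sample,
    ask=lambda prompt: next(scripted_answers),
    show=lambda *parts: output_lines.append(" ".join(str(p) for p in parts)),
)
print(output_lines[4])  # no. of rows: 2

# In analysis.py the interactive version would be started with the defaults:
# if __name__ == "__main__":
#     run_tool({"c": data, "m": disney_movies, "t": disney_shows})
```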
