<div style="text-align: center;">
    <div style="background-color:#000000; padding-top:25px"> 
        <img src="netflix-logo.png" style="max-height:150px;" />
        <p style="color:#FFFFFF;
                   font-size:30px; 
                   font-weight:700; 
                   text-align: center; 
                   text-transform:uppercase;
                   letter-spacing:1px; 
                   line-height:1.2; 
                   padding-bottom:25px;">Exploratory Data Analysis</p>
    </div>
</div>

# Contents ⬇️ <a id='contents'></a>

[1. Contents ⬇️](#contents)   
[2. Introduction 📓](#introduction)  
[3. Project Goal 🎯](#project_goal)  
[4. Data Analysis 📊](#data-analysis)    
- [4.1 Initialization](#initialization)  
- [4.2 Load data](#load-data)  
- [4.3 Prepare the data](#prepare-the-data)    

# Introduction 📓 <a id='introduction'></a> 
[Back to Contents](#contents)

**Netflix Inc.** is an American media company based in Los Gatos, California. Founded in 1997 by Reed Hastings and Marc Randolph in Scotts Valley, California, it operates the **over-the-top subscription video-on-demand service** Netflix, which offers **original films and television series commissioned or acquired by the company** as well as third-party content licensed from other distributors.   

As of mid-2021, Netflix had over 8000 movies and TV shows available on its platform and over 200M subscribers globally. The dataset used here has been taken from [Kaggle](https://www.kaggle.com/datasets/shivamb/netflix-shows) and consists of listings of all the movies and TV shows available on Netflix. It was last updated in 2021.  

The dataset consists of 12 columns:
1. `show_id`: Unique ID for every Movie / TV show
2. `type`: Identifier - A Movie or TV Show
3. `title`: Title of the Movie / TV show
4. `director`: Director of the Movie
5. `cast`: Actors involved in the movie / TV show
6. `country`: Country where the movie / TV show was produced
7. `date_added`: Date it was added on Netflix
8. `release_year`: Actual release year of the movie / TV show
9. `rating`: TV Rating of the movie / TV show
10. `duration`: Total Duration - in minutes or number of seasons
11. `listed_in`: Genre of the movie / TV show
12. `description`: The summary description

# Project Goal 🎯 <a id='project_goal'></a>  
[Back to Contents](#contents)

# Data Analysis 📊 <a id='data-analysis'></a>  
[Back to Contents](#contents)

## Initialization <a id='initialization'></a>  
[Back to Contents](#contents)

To begin with, we need a few libraries for our analysis: `pandas`, `matplotlib`, and `plotly.express`. We'll import all of them so that we can use the functions and methods they provide:  

1. **Pandas**: It is a data manipulation library that provides functions to read, write and manipulate data in various formats.  

2. **Matplotlib**: It is a plotting library that is used to visualize data in various formats.  

3. **Plotly Express**: Plotly Express is a terse, consistent, high-level API for creating figures.

In [1]:
# Loading all the libraries
import pandas as pd
from matplotlib import pyplot as plt
from plotly import express as px

## Load data <a id='load-data'></a>  
[Back to Contents](#contents)

We have been given the data in the `netflix_data.csv` file in the project's root directory, in CSV format with a comma as the field separator. We need to read this file and load it into a DataFrame using the `read_csv()` function provided by `pandas`.

In [2]:
# Load the data file into DataFrame
netflix_data = pd.read_csv('../netflix_data.csv')
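
As a minimal, self-contained sketch of what `read_csv()` does here, the snippet below parses a small in-memory CSV instead of the real file. The `sample_csv` and `sample_data` names are introduced only for illustration, the extract keeps just four of the twelve columns, and `sep=","` is already the default (it is spelled out only to mirror the description above).

```python
import io

import pandas as pd

# Small in-memory extract (hypothetical stand-in for netflix_data.csv)
sample_csv = io.StringIO(
    "show_id,type,title,release_year\n"
    "s1,Movie,Dick Johnson Is Dead,2020\n"
    "s2,TV Show,Blood & Water,2021\n"
)

# sep="," is the default field separator and is shown here for clarity
sample_data = pd.read_csv(sample_csv, sep=",")

print(sample_data.shape)
print(sample_data.dtypes)
```

Text columns are read as strings and `release_year` as an integer, which matches what we would expect `info()` to report for the full file.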

## Prepare the data <a id='prepare-the-data'></a>  
[Back to Contents](#contents)

Let's get the general information of the data in the DataFrame - `netflix_data`:

In [3]:
# Print the general/summary information about the netflix_data DataFrame
netflix_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       8807 non-null   object
 1   type          8807 non-null   object
 2   title         8807 non-null   object
 3   director      6173 non-null   object
 4   cast          7982 non-null   object
 5   country       7976 non-null   object
 6   date_added    8797 non-null   object
 7   release_year  8807 non-null   int64 
 8   rating        8803 non-null   object
 9   duration      8804 non-null   object
 10  listed_in     8807 non-null   object
 11  description   8807 non-null   object
dtypes: int64(1), object(11)
memory usage: 825.8+ KB


The DataFrame - `netflix_data` has a total of **8807 rows and 12 columns**.

The **column descriptions** are as follows:
1. `show_id`: Unique ID for every Movie / TV show
2. `type`: Identifier - A Movie or TV Show
3. `title`: Title of the Movie / TV show
4. `director`: Director of the Movie
5. `cast`: Actors involved in the movie / TV show
6. `country`: Country where the movie / TV show was produced
7. `date_added`: Date it was added on Netflix
8. `release_year`: Actual release year of the movie / TV show
9. `rating`: TV Rating of the movie / TV show
10. `duration`: Total Duration - in minutes or number of seasons
11. `listed_in`: Genre of the movie / TV show
12. `description`: The summary description

There are some null values in `director`, `cast`, `country`, `date_added`, `rating` and `duration`. There are also some mismatches in the data types of the columns that we'll address in a while.
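
`info()` reports non-null counts, so the number of missing values in a column is the row total minus that count, for example 8807 - 6173 = 2634 for `director`. The same numbers can be obtained directly with `isna().sum()` on the whole DataFrame. The sketch below uses a toy frame (the `toy` name and its values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy frame standing in for netflix_data (values are illustrative)
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", None, None],
    "rating": ["PG", "R", None],
})

# isna() marks missing cells; sum() then counts them per column
missing_per_column = toy.isna().sum()
print(missing_per_column)
```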

**Let's get a sample of 10 rows from the Dataframe**:

In [4]:
# Print a sample of data for netflix_data
netflix_data.sample(n=10, random_state=100)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
3695,s3696,TV Show,Rabbids Invasion,,"Damien Laquet, David Gasman, Barbara Scaff",France,"July 1, 2019",2018,TV-Y7,1 Season,"Kids' TV, TV Comedies","Giant, rambunctious rabbits have invaded and a..."
7338,s7339,TV Show,Los 10 años de Peter Capusotto,,"Diego Capusotto, Ivana Acosta",,"June 29, 2018",2015,TV-MA,1 Season,"International TV Shows, Spanish-Language TV Sh...",Fictional personality Peter Capusotto parodies...
6939,s6940,Movie,Haunting on Fraternity Row,Brant Sersen,"Jacob Artist, Jayson Blair, Shanley Caswell, C...",United States,"March 1, 2019",2018,TV-MA,99 min,"Horror Movies, Independent Movies",When a fraternity's last big luau serves up ho...
6626,s6627,Movie,Don't Be Afraid of the Dark,Troy Nixey,"Katie Holmes, Guy Pearce, Bailee Madison, Jack...","United States, Australia, Mexico","November 2, 2019",2010,R,99 min,Horror Movies,Young Sally Hurst discovers she isn't alone in...
3092,s3093,TV Show,Kevin Hart: Don’t F**k This Up,,,United States,"December 27, 2019",2019,TV-MA,1 Season,Docuseries,"Amid turmoil in his career and marriage, comed..."
5696,s5697,Movie,1000 Rupee Note,Shrihari Sathe,"Usha Naik, Sandeep Pathak, Shrikant Yadav, Gan...",India,"December 1, 2016",2014,TV-14,89 min,"Dramas, International Movies",After randomly receiving a handsome political ...
6472,s6473,Movie,Chittagong,Bedabrata Pain,"Manoj Bajpayee, Barry John, Delzad Hiwale, Veg...","United States, India, Bangladesh","January 1, 2018",2012,NR,105 min,"Dramas, Independent Movies, International Movies",In the turbulent 1930s of British colonial Ind...
4922,s4923,Movie,Mercury 13,"David Sington, Heather Walsh",,United States,"April 20, 2018",2018,TV-PG,79 min,Documentaries,"After rigorous testing in 1961, a small group ..."
4180,s4181,Movie,Old Lord Savanna,André D'Elia,Leonardo Ribeiro,Brazil,"January 18, 2019",2018,TV-14,96 min,"Documentaries, International Movies",This documentary captures the environmental an...
684,s685,TV Show,Elite,,"Danna Paola, Miguel Herrán, María Pedraza, Itz...",Spain,"June 18, 2021",2021,TV-MA,4 Seasons,"Crime TV Shows, International TV Shows, Spanis...",When three working-class teens enroll in an ex...


Looking at the data above, the `date_added` and `duration` columns are stored as plain strings; they need more suitable data types for better analysis. We'll fix that in a while.

We can use the `duplicated()` method together with `sum()` to **check if we have any duplicate rows in the DataFrame - `netflix_data`**. The `duplicated()` method returns a Boolean Series (True/False) marking duplicate rows. Because `True` counts as 1 and `False` as 0, applying `sum()` to that Series gives the number of duplicate rows.

In [5]:
# Checking for duplicated records in netflix data
netflix_data.duplicated().sum()

0

**We don't have any duplicate rows in the `netflix_data` Dataframe**.  
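
To make the counting concrete, here is a small sketch on a toy frame (the `toy` data is made up for illustration). By default `duplicated()` leaves the first occurrence unflagged and marks only the later repeats:

```python
import pandas as pd

# Toy frame with one repeated row (illustrative data)
toy = pd.DataFrame({
    "show_id": ["s1", "s2", "s2", "s3"],
    "title": ["A", "B", "B", "C"],
})

# Each repeat of an earlier row is flagged True; first occurrences stay False
flags = toy.duplicated()
print(flags.tolist())  # [False, False, True, False]

# True counts as 1 and False as 0, so the sum is the number of duplicate rows
print(flags.sum())     # 1
```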

Let's check for just duplicate show IDs using the `duplicated()` method together with `sum()`. Since this time we only care about duplicate show IDs, we first select the `show_id` column as a Series and then apply `duplicated()` and `sum()` to it.

In [6]:
# Checking for just duplicate show IDs
netflix_data['show_id'].duplicated().sum()

0

**We don't have any duplicate show IDs in the `netflix_data` Dataframe**.

Let's verify that the `type` column contains values for only the two types of shows - **Movie** and **TV Show**:

In [7]:
# Check type column contains only two values - Movie and TV Show
netflix_data['type'].unique()

array(['Movie', 'TV Show'], dtype=object)

**The `type` column has correct values and contains values for only the two types of shows - Movie and TV Show**.
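
Reading the output of `unique()` by eye works for two categories. For a check that can fail loudly, one option is `isin()` against the allowed set. This is only a sketch: `allowed_types` and the toy values (including the deliberately wrong `movie`) are assumptions for illustration.

```python
import pandas as pd

# Allowed categories and a toy column with one deliberate misspelling
allowed_types = {"Movie", "TV Show"}
toy_types = pd.Series(["Movie", "TV Show", "Movie", "movie"])

# Keep only the values that fall outside the allowed set
unexpected = toy_types[~toy_types.isin(allowed_types)]
print(unexpected.tolist())
```

An empty result would mean that every value belongs to one of the two expected categories.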

Let's verify that the `title` column doesn't have any duplicates:

In [8]:
netflix_data['title'].duplicated().sum()

0

Great! So, we don't have any duplicate titles in the dataset, for movies or TV shows.

### Fix Data

We know from before that there are some null values in `director`, `cast`, `country`, `date_added`, `rating` and `duration` columns. 

**Let's count the missing values in the `director` column using `isna().sum()`.** 

The `isna()` method returns a Series (or DataFrame) of Boolean values - True or False - indicating whether each cell is missing. The values that `isna()` treats as missing are `NaN` in numeric arrays, `None` or `NaN` in object arrays, and `NaT` in datetime-like arrays. It doesn't take any user-defined placeholder values into account.

The `sum()` method then adds up the Boolean values returned by `isna()`, which gives us the total count of missing values.

In [9]:
# Counting the number of missing values in the director column
netflix_data['director'].isna().sum()

2634

**There are in total 2634 missing values in the `director` column**.  
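
The point that `isna()` ignores user-defined placeholders matters for the cleaning steps in this section. A small sketch with illustrative values: `None` and `NaN` are flagged, while an empty string and the text `Unknown` are not.

```python
import numpy as np
import pandas as pd

# Mix of a real value, true missing markers, and placeholder strings
values = pd.Series(["Kirsten Johnson", None, np.nan, "", "Unknown"], dtype=object)

# Only None and NaN count as missing; "" and "Unknown" are ordinary strings
flags = values.isna()
print(flags.tolist())  # [False, True, True, False, False]
```

So once missing values are replaced with a placeholder such as `Unknown`, `isna().sum()` will report 0 for that column even though the information is still absent.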

Let's print the records where `director` column has missing values to peek into the data:

In [10]:
netflix_data[netflix_data['director'].isna()].head(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
10,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",,,,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...","Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
14,s15,TV Show,Crime Stories: India Detectives,,,,"September 22, 2021",2021,TV-MA,1 Season,"British TV Shows, Crime TV Shows, Docuseries",Cameras following Bengaluru police on the job ...
15,s16,TV Show,Dear White People,,"Logan Browning, Brandon P. Bell, DeRon Horton,...",United States,"September 22, 2021",2021,TV-MA,4 Seasons,"TV Comedies, TV Dramas",Students of color navigate the daily slights a...
17,s18,TV Show,Falsa identidad,,"Luis Ernesto Franco, Camila Sodi, Sergio Goyri...",Mexico,"September 22, 2021",2020,TV-MA,2 Seasons,"Crime TV Shows, Spanish-Language TV Shows, TV ...",Strangers Diego and Isabel flee their home in ...
19,s20,TV Show,Jaguar,,"Blanca Suárez, Iván Marcos, Óscar Casas, Adriá...",,"September 22, 2021",2021,TV-MA,1 Season,"International TV Shows, Spanish-Language TV Sh...","In the 1960s, a Holocaust survivor joins a gro..."
21,s22,TV Show,Resurrection: Ertugrul,,"Engin Altan Düzyatan, Serdar Gökhan, Hülya Dar...",Turkey,"September 22, 2021",2018,TV-14,5 Seasons,"International TV Shows, TV Action & Adventure,...",When a good deed unwittingly endangers his cla...
25,s26,TV Show,Love on the Spectrum,,Brooke Satchwell,Australia,"September 21, 2021",2021,TV-14,2 Seasons,"Docuseries, International TV Shows, Reality TV",Finding love can be hard for anyone. For young...


Since the records with a missing `director` still contain useful information, we can't just drop them. Instead, we will replace the missing values in the `director` column with `Unknown`:

In [11]:
# Filling missing values in director column with 'Unknown'
netflix_data['director'] = netflix_data['director'].fillna('Unknown')

Let's count the number of missing values in the `director` column once again:

In [12]:
# Counting the number of missing values in the director column
netflix_data['director'].isna().sum()

0

Awesome! Next, let's count the missing values in the `cast` column using `isna().sum()`.

In [13]:
# Counting the number of missing values in the cast column
netflix_data['cast'].isna().sum()

825

**There are in total 825 missing values in the `cast` column**.  

Let's print the records where `cast` column has missing values to peek into the data:

In [14]:
netflix_data[netflix_data['cast'].isna()].head(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
3,s4,TV Show,Jailbirds New Orleans,Unknown,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
10,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",Unknown,,,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...","Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
14,s15,TV Show,Crime Stories: India Detectives,Unknown,,,"September 22, 2021",2021,TV-MA,1 Season,"British TV Shows, Crime TV Shows, Docuseries",Cameras following Bengaluru police on the job ...
16,s17,Movie,Europe's Most Dangerous Man: Otto Skorzeny in ...,"Pedro de Echave García, Pablo Azorín Williams",,,"September 22, 2021",2020,TV-MA,67 min,"Documentaries, International Movies",Declassified documents reveal the post-WWII li...
20,s21,TV Show,Monsters Inside: The 24 Faces of Billy Milligan,Olivier Megaton,,,"September 22, 2021",2021,TV-14,1 Season,"Crime TV Shows, Docuseries, International TV S...","In the late 1970s, an accused serial rapist cl..."
45,s46,Movie,My Heroes Were Cowboys,Tyler Greco,,,"September 16, 2021",2021,PG,23 min,Documentaries,Robin Wiltshire's painful childhood was rescue...
66,s67,TV Show,Raja Rasoi Aur Anya Kahaniyan,Unknown,,India,"September 15, 2021",2014,TV-G,1 Season,"Docuseries, International TV Shows",Explore the history and flavors of regional In...
69,s70,TV Show,Stories by Rabindranath Tagore,Unknown,,India,"September 15, 2021",2015,TV-PG,1 Season,"International TV Shows, TV Dramas",The writings of Nobel Prize winner Rabindranat...
74,s75,TV Show,The World's Most Amazing Vacation Rentals,Unknown,,,"September 14, 2021",2021,TV-PG,2 Seasons,Reality TV,"With an eye for every budget, three travelers ..."


Since the records with a missing `cast` still contain useful information, we can't just drop them. Instead, we will replace the missing values in the `cast` column with `Unknown`:

In [15]:
# Filling missing values in cast column with 'Unknown'
netflix_data['cast'] = netflix_data['cast'].fillna('Unknown')

Let's count the number of missing values in the `cast` column once again:

In [16]:
# Counting the number of missing values in the cast column
netflix_data['cast'].isna().sum()

0

Awesome! Next, let's count the missing values in the `country` column using `isna().sum()`.

In [17]:
# Counting the number of missing values in the country column
netflix_data['country'].isna().sum()

831

**There are in total 831 missing values in the `country` column**.  

Let's print the records where `country` column has missing values to peek into the data:

In [18]:
netflix_data[netflix_data['country'].isna()].head(10)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,Unknown,Unknown,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
5,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",,"September 24, 2021",2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...
6,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...
10,s11,TV Show,"Vendetta: Truth, Lies and The Mafia",Unknown,Unknown,,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, Docuseries, International TV S...","Sicily boasts a bold ""Anti-Mafia"" coalition. B..."
11,s12,TV Show,Bangkok Breaking,Kongkiat Komesiri,"Sukollawat Kanarot, Sushar Manaying, Pavarit M...",,"September 23, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...","Struggling to earn a living in Bangkok, a man ..."
13,s14,Movie,Confessions of an Invisible Girl,Bruno Garotti,"Klara Castanho, Lucca Picon, Júlia Gomes, Marc...",,"September 22, 2021",2021,TV-PG,91 min,"Children & Family Movies, Comedies",When the clever but socially-awkward Tetê join...
14,s15,TV Show,Crime Stories: India Detectives,Unknown,Unknown,,"September 22, 2021",2021,TV-MA,1 Season,"British TV Shows, Crime TV Shows, Docuseries",Cameras following Bengaluru police on the job ...
16,s17,Movie,Europe's Most Dangerous Man: Otto Skorzeny in ...,"Pedro de Echave García, Pablo Azorín Williams",Unknown,,"September 22, 2021",2020,TV-MA,67 min,"Documentaries, International Movies",Declassified documents reveal the post-WWII li...
18,s19,Movie,Intrusion,Adam Salky,"Freida Pinto, Logan Marshall-Green, Robert Joh...",,"September 22, 2021",2021,TV-14,94 min,Thrillers,After a deadly home invasion at a couple’s new...


Since the records with a missing `country` still contain useful information, we can't just drop them. Instead, we will replace the missing values in the `country` column with `Unknown`:

In [19]:
# Filling missing values in country column with 'Unknown'
netflix_data['country'] = netflix_data['country'].fillna('Unknown')

Let's count the number of missing values in the `country` column once again:

In [20]:
# Counting the number of missing values in the country column
netflix_data['country'].isna().sum()

0

Awesome! Next, let's count the missing values in the `rating` column using `isna().sum()`.

In [21]:
# Counting the number of missing values in the rating column
netflix_data['rating'].isna().sum()

4

**There are in total 4 missing values in the `rating` column**.  

Let's print the records where `rating` column has missing values to peek into the data:

In [22]:
netflix_data[netflix_data['rating'].isna()]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
5989,s5990,Movie,13TH: A Conversation with Oprah Winfrey & Ava ...,Unknown,"Oprah Winfrey, Ava DuVernay",Unknown,"January 26, 2017",2017,,37 min,Movies,Oprah Winfrey sits down with director Ava DuVe...
6827,s6828,TV Show,Gargantia on the Verdurous Planet,Unknown,"Kaito Ishikawa, Hisako Kanemoto, Ai Kayano, Ka...",Japan,"December 1, 2016",2013,,1 Season,"Anime Series, International TV Shows","After falling through a wormhole, a space-dwel..."
7312,s7313,TV Show,Little Lunch,Unknown,"Flynn Curry, Olivia Deeble, Madison Lu, Oisín ...",Australia,"February 1, 2018",2015,,1 Season,"Kids' TV, TV Comedies","Adopting a child's perspective, this show take..."
7537,s7538,Movie,My Honor Was Loyalty,Alessandro Pepe,"Leone Frisa, Paolo Vaccarino, Francesco Miglio...",Italy,"March 1, 2017",2015,,115 min,Dramas,"Amid the chaos and horror of World War II, a c..."


Since the records with a missing `rating` still contain useful information, we can't just drop them. Instead, we will replace the missing values in the `rating` column with `Unknown`:

In [23]:
# Filling missing values in rating column with 'Unknown'
netflix_data['rating'] = netflix_data['rating'].fillna('Unknown')

Let's count the number of missing values in the `rating` column once again:

In [24]:
# Counting the number of missing values in the rating column
netflix_data['rating'].isna().sum()

0
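
The four fill steps above follow the same pattern, so they could also be written as a single call. The sketch below is an alternative under stated assumptions, not the approach used in this notebook: `toy` and `placeholder_columns` are illustrative names, and `fillna()` is given a dict that maps each column to its replacement value.

```python
import pandas as pd

# Toy frame with gaps in the four text columns (illustrative data)
toy = pd.DataFrame({
    "director": ["Kirsten Johnson", None],
    "cast": [None, "Ama Qamata"],
    "country": ["United States", None],
    "rating": ["PG-13", None],
    "release_year": [2020, 2021],
})

# Fill several columns at once by passing a column -> value mapping
placeholder_columns = ["director", "cast", "country", "rating"]
toy = toy.fillna({column: "Unknown" for column in placeholder_columns})

print(toy.isna().sum().sum())  # 0
```

Doing it column by column, as above, has the advantage that each column can be inspected before it is filled.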

Let's look at the set of values in the `rating` column of the DataFrame - `netflix_data` to validate that they are meaningful:

In [25]:
netflix_data['rating'].unique()

array(['PG-13', 'TV-MA', 'PG', 'TV-14', 'TV-PG', 'TV-Y', 'TV-Y7', 'R',
       'TV-G', 'G', 'NC-17', '74 min', '84 min', '66 min', 'NR',
       'Unknown', 'TV-Y7-FV', 'UR'], dtype=object)

All the values look valid except `74 min`, `84 min` and `66 min`, which are durations rather than ratings. Let's inspect the records that have any of these three values as their rating:

In [26]:
netflix_data[(netflix_data['rating'] == '74 min') | (netflix_data['rating'] == '84 min') | (netflix_data['rating'] == '66 min') ]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
5541,s5542,Movie,Louis C.K. 2017,Louis C.K.,Louis C.K.,United States,"April 4, 2017",2017,74 min,,Movies,"Louis C.K. muses on religion, eternal love, gi..."
5794,s5795,Movie,Louis C.K.: Hilarious,Louis C.K.,Louis C.K.,United States,"September 16, 2016",2010,84 min,,Movies,Emmy-winning comedy writer Louis C.K. brings h...
5813,s5814,Movie,Louis C.K.: Live at the Comedy Store,Louis C.K.,Louis C.K.,United States,"August 15, 2016",2015,66 min,,Movies,The comic puts his trademark hilarious/thought...


For these 3 records, the value stored in `rating` is actually the `duration`, and the `duration` itself is missing. Let's fix that first by moving each value into `duration` and setting `rating` to `Unknown`:

In [27]:
# Fix the row with show_id s5542: move the misplaced value into duration
netflix_data.loc[netflix_data['show_id'] == 's5542', 'duration'] = '74 min'
netflix_data.loc[netflix_data['show_id'] == 's5542', 'rating'] = 'Unknown'
netflix_data[netflix_data['show_id'] == 's5542']

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
5541,s5542,Movie,Louis C.K. 2017,Louis C.K.,Louis C.K.,United States,"April 4, 2017",2017,Unknown,74 min,Movies,"Louis C.K. muses on religion, eternal love, gi..."


In [28]:
# Fix the row with show_id s5795: move the misplaced value into duration
netflix_data.loc[netflix_data['show_id'] == 's5795', 'duration'] = '84 min'
netflix_data.loc[netflix_data['show_id'] == 's5795', 'rating'] = 'Unknown'
netflix_data[netflix_data['show_id'] == 's5795']

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
5794,s5795,Movie,Louis C.K.: Hilarious,Louis C.K.,Louis C.K.,United States,"September 16, 2016",2010,Unknown,84 min,Movies,Emmy-winning comedy writer Louis C.K. brings h...


In [29]:
# Fix the row with show_id s5814: move the misplaced value into duration
netflix_data.loc[netflix_data['show_id'] == 's5814', 'duration'] = '66 min'
netflix_data.loc[netflix_data['show_id'] == 's5814', 'rating'] = 'Unknown'
netflix_data[netflix_data['show_id'] == 's5814']

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
5813,s5814,Movie,Louis C.K.: Live at the Comedy Store,Louis C.K.,Louis C.K.,United States,"August 15, 2016",2015,Unknown,66 min,Movies,The comic puts his trademark hilarious/thought...
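
The three row-by-row fixes above can also be expressed as one pattern-based update. The following is a sketch on a toy frame, assuming that a misplaced value always looks like digits followed by ` min`; the `toy` and `misplaced` names are introduced here for illustration.

```python
import pandas as pd

# Toy frame: two rows with a duration stored in rating, one correct row
toy = pd.DataFrame({
    "show_id": ["s5542", "s5795", "s1"],
    "rating": ["74 min", "84 min", "PG-13"],
    "duration": [None, None, "90 min"],
})

# Rows whose rating looks like a duration, e.g. "74 min"
misplaced = toy["rating"].str.fullmatch(r"\d+ min", na=False)

# Copy the value into duration first, then reset the rating
toy.loc[misplaced, "duration"] = toy.loc[misplaced, "rating"]
toy.loc[misplaced, "rating"] = "Unknown"

print(toy)
```

The order of the two assignments matters: resetting `rating` first would lose the value that needs to be copied.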


Awesome! Let's now convert the `date_added` column from strings to pandas datetime values, which are displayed in the format `YYYY-MM-DD`:

In [30]:
# Convert date_added to datetime format
netflix_data['date_added'] = pd.to_datetime(netflix_data['date_added'].str.strip())

Let's have a peek into the data after changing the datatype of `date_added` column:

In [31]:
netflix_data.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,Unknown,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,Unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",Unknown,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,Unknown,Unknown,Unknown,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,Unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
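
As a small illustration of this conversion, the sketch below parses a few strings in the same style as `date_added`. The explicit `format` and `errors="coerce"` arguments are optional additions that the notebook cell does not use, the leading space in the second value is an assumed example of why stripping is useful, and month names are assumed to be parsed under an English locale.

```python
import pandas as pd

# Strings in the style of date_added, with stray whitespace and a gap
raw_dates = pd.Series(["September 25, 2021", " July 1, 2019", None])

# Strip whitespace, then parse "Month day, year"; unparseable values become NaT
parsed = pd.to_datetime(raw_dates.str.strip(), format="%B %d, %Y", errors="coerce")

print(parsed)
```

Missing inputs stay missing after the conversion: they appear as `NaT`, which `isna()` also treats as a missing value.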


Let's also derive a new column - `year_added` from `date_added`:

In [32]:
netflix_data['year_added'] = netflix_data['date_added'].dt.strftime('%Y')

Let's see which years we have in the `year_added` column:

In [33]:
netflix_data['year_added'].unique()

array(['2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014',
       '2013', '2012', '2011', '2009', '2008', nan, '2010'], dtype=object)

The `year_added` column also has missing values (`nan`), which come from the rows with no `date_added`. Let's replace them with `Unknown`:

In [34]:
# Filling missing values in year_added column with 'Unknown'
netflix_data['year_added'] = netflix_data['year_added'].fillna('Unknown')

Let's see again all the years we have in `year_added` column:

In [35]:
netflix_data['year_added'].unique()

array(['2021', '2020', '2019', '2018', '2017', '2016', '2015', '2014',
       '2013', '2012', '2011', '2009', '2008', 'Unknown', '2010'],
      dtype=object)
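
Storing the year as text with an `Unknown` placeholder is convenient for grouping and labelling. If numeric years are needed instead (for sorting or arithmetic), one alternative, not used in this notebook, is pandas' nullable integer dtype `Int64`, which keeps the gaps as `<NA>`. A minimal sketch with illustrative dates:

```python
import pandas as pd

# Datetime values with one missing entry (illustrative)
dates = pd.Series(pd.to_datetime(["2021-09-25", "2019-07-01", None]))

# dt.year is numeric; "Int64" (capital I) is the nullable integer dtype
year_added = dates.dt.year.astype("Int64")

print(year_added)
```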

Now, before proceeding let's have a fresh look into our data in `netflix_data`:

In [36]:
# Print a sample of data for netflix_data
netflix_data.sample(n=10, random_state=100)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added
3695,s3696,TV Show,Rabbids Invasion,Unknown,"Damien Laquet, David Gasman, Barbara Scaff",France,2019-07-01,2018,TV-Y7,1 Season,"Kids' TV, TV Comedies","Giant, rambunctious rabbits have invaded and a...",2019
7338,s7339,TV Show,Los 10 años de Peter Capusotto,Unknown,"Diego Capusotto, Ivana Acosta",Unknown,2018-06-29,2015,TV-MA,1 Season,"International TV Shows, Spanish-Language TV Sh...",Fictional personality Peter Capusotto parodies...,2018
6939,s6940,Movie,Haunting on Fraternity Row,Brant Sersen,"Jacob Artist, Jayson Blair, Shanley Caswell, C...",United States,2019-03-01,2018,TV-MA,99 min,"Horror Movies, Independent Movies",When a fraternity's last big luau serves up ho...,2019
6626,s6627,Movie,Don't Be Afraid of the Dark,Troy Nixey,"Katie Holmes, Guy Pearce, Bailee Madison, Jack...","United States, Australia, Mexico",2019-11-02,2010,R,99 min,Horror Movies,Young Sally Hurst discovers she isn't alone in...,2019
3092,s3093,TV Show,Kevin Hart: Don’t F**k This Up,Unknown,Unknown,United States,2019-12-27,2019,TV-MA,1 Season,Docuseries,"Amid turmoil in his career and marriage, comed...",2019
5696,s5697,Movie,1000 Rupee Note,Shrihari Sathe,"Usha Naik, Sandeep Pathak, Shrikant Yadav, Gan...",India,2016-12-01,2014,TV-14,89 min,"Dramas, International Movies",After randomly receiving a handsome political ...,2016
6472,s6473,Movie,Chittagong,Bedabrata Pain,"Manoj Bajpayee, Barry John, Delzad Hiwale, Veg...","United States, India, Bangladesh",2018-01-01,2012,NR,105 min,"Dramas, Independent Movies, International Movies",In the turbulent 1930s of British colonial Ind...,2018
4922,s4923,Movie,Mercury 13,"David Sington, Heather Walsh",Unknown,United States,2018-04-20,2018,TV-PG,79 min,Documentaries,"After rigorous testing in 1961, a small group ...",2018
4180,s4181,Movie,Old Lord Savanna,André D'Elia,Leonardo Ribeiro,Brazil,2019-01-18,2018,TV-14,96 min,"Documentaries, International Movies",This documentary captures the environmental an...,2019
684,s685,TV Show,Elite,Unknown,"Danna Paola, Miguel Herrán, María Pedraza, Itz...",Spain,2021-06-18,2021,TV-MA,4 Seasons,"Crime TV Shows, International TV Shows, Spanis...",When three working-class teens enroll in an ex...,2021


Let's see if we have any missing values in the `duration` column of the `netflix_data` DataFrame:

In [37]:
netflix_data['duration'].isna().sum()

0

We don't have any missing values in the `duration` column.
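
The `duration` column still mixes two units in one string: minutes for movies and a number of seasons for TV shows. If a numeric duration is needed later, one possible approach is to split it into a value and a unit with `str.extract()`. This is a sketch only and is not applied in this notebook; the `durations` sample and the `value`/`unit` column names are assumptions for illustration.

```python
import pandas as pd

# Sample values in the two formats that appear in the duration column
durations = pd.Series(["90 min", "1 Season", "4 Seasons"])

# Named groups become the column names of the resulting DataFrame
parts = durations.str.extract(r"(?P<value>\d+)\s+(?P<unit>min|Seasons?)")
parts["value"] = parts["value"].astype(int)

print(parts)
```

Because the two units are not comparable, any numeric analysis would then be done separately for movies and TV shows.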

Awesome! So, now our data is ready for analysis. Let's save the cleaned data into a separate file:

In [38]:
netflix_data.to_csv('../netflix_clean_data.csv', index=False)

## Overall trend of content available on Netflix  <a id='overall-trend-of-content'></a>  
[Back to Contents](#contents)

Let's create a histogram to visualize the distribution of movies and TV shows based on their release year. This will help us understand the overall trend of content available on Netflix and identify any patterns or changes over time.

In [39]:
fig = px.histogram(
    netflix_data,
    title='Distribution of Movies and TV shows based on their release year',
    x='release_year',
    color="type",
    barmode='overlay')
fig.update_layout(yaxis_title="Number of Releases")
fig.update_layout(xaxis_title="Year of Release")
fig.show()

Let's also compare the above histogram with the number of titles released per `release_year` for each `type`:

In [40]:
netflix_year_wise_distribution = netflix_data.pivot_table(index='release_year', columns='type', aggfunc='count', values='show_id')
netflix_year_wise_distribution = netflix_year_wise_distribution.reset_index()
netflix_year_wise_distribution.columns = ['release_year', 'no_of_movies', 'no_of_tv_shows']

In [41]:
netflix_year_wise_distribution.head(20)

Unnamed: 0,release_year,no_of_movies,no_of_tv_shows
0,1925,,1.0
1,1942,2.0,
2,1943,3.0,
3,1944,3.0,
4,1945,3.0,1.0
5,1946,1.0,1.0
6,1947,1.0,
7,1954,2.0,
8,1955,3.0,
9,1956,2.0,


In [42]:
netflix_year_wise_distribution.tail(20)

Unnamed: 0,release_year,no_of_movies,no_of_tv_shows
54,2002,44.0,7.0
55,2003,51.0,10.0
56,2004,55.0,9.0
57,2005,67.0,13.0
58,2006,82.0,14.0
59,2007,74.0,14.0
60,2008,113.0,23.0
61,2009,118.0,34.0
62,2010,154.0,40.0
63,2011,145.0,40.0


Based on the data above and the insights from the histogram, **we can make several observations about the overall trend of content available on Netflix and identify some patterns or changes over time**:

1. **Increase in Content**: **The number of movies and TV shows available on Netflix has shown a significant increase over the years**. The early years (1925-1967) had a relatively smaller number of releases, but from the early 2000s onwards, there has been a substantial growth in the content offerings.

2. **Shifting Focus to TV Shows**: Among titles with older release years, movies outnumber TV shows, as seen in the early years of the dataset. However, for release years from around 2013 onwards, the number of TV shows starts to increase rapidly.

3. **Steady Growth**: From the early 2000s to 2017, there was a consistent growth in both movie and TV show releases. The number of additions per year gradually increased, reflecting Netflix's expansion and efforts to diversify its content library.

4. **Plateauing of Movie Releases**: While the number of TV show releases has continued to rise, the number of movies added to Netflix seems to have plateaued since around 2017. **This suggests that Netflix has been focusing more on original TV show productions and acquiring TV show licenses, possibly due to their popularity and binge-watching nature**.

5. **Impact of the COVID-19 Pandemic**: Looking at the years 2020 and 2021, there is a slight decrease in the number of movie releases compared to previous years. **This could be attributed to the disruption caused by the COVID-19 pandemic, which may have affected the production and release schedules of movies**.

6. The largest number of TV shows (436) was released in 2020, and the largest number of movies (767) was released in the years 2017 and 2018.

Overall, the data indicates a significant shift in the content available on Netflix over time, with an emphasis on TV shows and consistent growth in the number of offerings. Understanding these trends can help Netflix make informed decisions about its content acquisition and production strategies and cater to the evolving preferences of its global subscriber base.
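
The peak years and the TV-show share discussed above can also be read directly off a pivot table instead of the histogram. The snippet below is a minimal sketch on a made-up toy sample (the `sample` dataframe and its values are hypothetical stand-ins for `netflix_data`); it uses `fill_value=0` so that years with no titles of a given type count as 0 rather than `NaN`.

```python
import pandas as pd

# Hypothetical toy sample standing in for netflix_data (values are made up).
sample = pd.DataFrame({
    "show_id": ["s1", "s2", "s3", "s4", "s5", "s6", "s7"],
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "TV Show", "Movie"],
    "release_year": [2017, 2017, 2017, 2018, 2020, 2020, 2020],
})

# Count titles per release year and type; empty cells become 0 instead of NaN.
counts = sample.pivot_table(
    index="release_year", columns="type", values="show_id",
    aggfunc="count", fill_value=0,
)

# Release year with the most titles for each type, and the count in that year.
peak_year = counts.idxmax()
peak_count = counts.max()

# Share of TV shows among all titles released in each year.
tv_share = counts["TV Show"] / counts.sum(axis=1)

print(peak_year, peak_count, tv_share, sep="\n\n")
```

Applied to the full `netflix_data`, the same three expressions would give the peak release year per type and show how the TV-show share changes from year to year.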

## Study popularity of different rating categories among Netflix subscribers  <a id='study-popularity'></a>  
[Back to Contents](#contents)

Let's plot a histogram to analyze the distribution of movies and TV shows by rating. The number of titles in each rating category gives a rough indication of how popular that category is among Netflix subscribers.

In [43]:
fig = px.histogram(
    netflix_data,
    title='Distribution of movies and TV shows based on their ratings',
    x='rating',
    color="type",
    barmode='overlay')
fig.update_layout(xaxis_title="Rating", yaxis_title="Number of movies / TV Shows")
fig.show()

Let's also compare the histogram above with the number of titles per `rating` for each `type`:

In [44]:
netflix_rating_wise_distribution = netflix_data.pivot_table(index='rating', columns='type', aggfunc='count', values='show_id')
netflix_rating_wise_distribution = netflix_rating_wise_distribution.reset_index()
netflix_rating_wise_distribution.columns = ['rating', 'no_of_movies', 'no_of_tv_shows']

In [45]:
netflix_rating_wise_distribution.head(10)

Unnamed: 0,rating,no_of_movies,no_of_tv_shows
0,G,41.0,
1,NC-17,3.0,
2,NR,75.0,5.0
3,PG,287.0,
4,PG-13,490.0,
5,R,797.0,2.0
6,TV-14,1427.0,733.0
7,TV-G,126.0,94.0
8,TV-MA,2062.0,1145.0
9,TV-PG,540.0,323.0


Based on the data and the histogram above, **we can conclude the following about the popularity of different rating categories among Netflix subscribers**:

1. **TV-MA (Mature Audience)**: TV-MA rated content has the highest number of both movies (2062) and TV shows (1145) available on Netflix. This rating category is intended for mature audiences and indicates that **Netflix has a significant amount of content targeting adult viewers**.  

2. **TV-14 (Parents Strongly Cautioned)**: TV-14 is the second-largest rating category on Netflix, with 1427 movies and 733 TV shows. This suggests that **there is a substantial demand for content suitable for viewers aged 14 and above**.

3. **R-Rated Movies**: Among movies specifically, the R rating category is one of the most prevalent, with 797 movies available on Netflix. The R rating indicates that the content may contain adult themes, strong language, violence, or other mature content. This suggests that **there is a significant audience for more mature and adult-oriented movies on the platform**.

4. **TV-PG and TV-G**: **TV-PG and TV-G ratings have a moderate presence on Netflix**, with 540 movies and 323 TV shows rated TV-PG, and 126 movies and 94 TV shows rated TV-G. TV-G content is suitable for all ages and TV-PG content suggests parental guidance, so these categories may appeal to families and younger viewers.

5. **Limited Availability of Other Ratings**: Few titles carry the G (41), NC-17 (3), or NR (80) ratings. The film ratings PG (287) and PG-13 (490) appear only on movies in the rows shown, and each is smaller than TV-MA, TV-14, and R. This could suggest that **Netflix focuses more on content for mature audiences (TV-MA and TV-14) and may have a relatively smaller selection of content for younger or family-oriented audiences**.

Overall, the data suggests that **Netflix subscribers have a higher preference for content rated TV-MA and TV-14, indicating a significant demand for mature and adult-oriented programming**.
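
Raw counts make it hard to compare movies and TV shows because there are far more movies overall. As a rough sketch, the counts can be converted into percentages per type. The numbers below are copied from the ten rows displayed above, with empty cells treated as 0; since that output comes from `head(10)`, any rating categories beyond those rows are not included, so the percentages are relative to the rows shown only. The names `rating_counts` and `rating_share` are introduced here for illustration.

```python
import pandas as pd

# Counts copied from the ten rows displayed above; NaN cells are treated as 0.
rating_counts = pd.DataFrame(
    {
        "no_of_movies": [41, 3, 75, 287, 490, 797, 1427, 126, 2062, 540],
        "no_of_tv_shows": [0, 0, 5, 0, 0, 2, 733, 94, 1145, 323],
    },
    index=["G", "NC-17", "NR", "PG", "PG-13", "R", "TV-14", "TV-G", "TV-MA", "TV-PG"],
)

# Percentage of each type's titles (among the rows shown) in each rating category.
rating_share = (rating_counts / rating_counts.sum() * 100).round(1)

print(rating_share.sort_values("no_of_movies", ascending=False))
```

Dividing by the column totals puts both types on the same scale, which makes it easier to see whether a rating is relatively more common among TV shows than among movies.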

## Study the growth and expansion of the platform's library <a id='study-the-growth'></a>  
[Back to Contents](#contents)

Let's create a histogram to visualize the frequency of content additions to Netflix over time. This can provide us insights into the growth and expansion of the platform's library.

In [46]:
fig = px.histogram(
    netflix_data,
    title='Frequency of content additions to Netflix over time',
    x='year_added',
    color="type",
    barmode='overlay')
fig.update_layout(yaxis_title="Number of Movies / TV Shows")
fig.update_layout(xaxis_title="Year added to Netflix")
fig.show()

Let's also compare the histogram above with the number of movies and TV shows added to Netflix per year:

In [47]:
netflix_added_year_wise_distribution = netflix_data.pivot_table(index='year_added', columns='type', aggfunc='count', values='show_id')
netflix_added_year_wise_distribution = netflix_added_year_wise_distribution.reset_index()
netflix_added_year_wise_distribution.columns = ['year_added', 'no_of_movies', 'no_of_tv_shows']

In [48]:
netflix_added_year_wise_distribution

Unnamed: 0,year_added,no_of_movies,no_of_tv_shows
0,2008,1.0,1.0
1,2009,2.0,
2,2010,1.0,
3,2011,13.0,
4,2012,3.0,
5,2013,6.0,5.0
6,2014,19.0,5.0
7,2015,56.0,26.0
8,2016,253.0,176.0
9,2017,839.0,349.0


Based on the data and the histogram above, **we can gain the following insights into the growth and expansion of Netflix's library over time**:

1. **Steady Growth**: **The number of content additions to Netflix shows a clear upward trend from 2008 through the late 2010s, indicating the platform's continuous expansion**. The number of movies and TV shows added per year generally increased over time, before easing slightly in 2020 and 2021.

2. **Accelerated Growth in Recent Years**: **The most substantial growth in content additions began in 2016**. The yearly total rose from 82 titles in 2015 to 429 in 2016 and 1188 in 2017, with both movie and TV show additions surging and notable spikes in 2017, 2018, and 2019.

3. **Shift towards TV Shows**: Movies still account for most additions, but TV show additions have grown markedly from a very low base: from 26 in 2015 to 176 in 2016 and 349 in 2017. **This indicates a strategic shift by Netflix towards expanding its TV show library, possibly in response to the growing popularity of binge-watching and serialized content.**

4. **High Growth in 2017-2019**: The years 2017, 2018, and 2019 witnessed remarkable growth in content additions, with the highest number of movies and TV shows added during this period. This could be attributed to **Netflix's aggressive content acquisition and production strategies, including the release of original series and licensing deals with various studios**.

5. **Slight Decrease in 2020-2021**: The years 2020 and 2021 saw a slight decrease in the number of content additions compared to the peak years of 2017-2019.   

Overall, **the data suggests that Netflix has experienced significant growth and expansion in its content library over the years, with a focus on increasing the number of TV shows. The platform has continuously added new movies and TV shows, contributing to its vast collection and offering diverse options to its subscribers**.
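
To put numbers on the growth described above, the yearly totals, the year-over-year change, and the cumulative number of additions can be derived from the per-year counts. The sketch below uses only the rows displayed above (2008 to 2017, with empty cells treated as 0), so later years are not covered; the column names `total`, `change`, and `cumulative` are introduced here for illustration.

```python
import pandas as pd

# Counts copied from the rows displayed above (2008-2017); NaN cells treated as 0.
added = pd.DataFrame({
    "year_added": [2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017],
    "no_of_movies": [1, 2, 1, 13, 3, 6, 19, 56, 253, 839],
    "no_of_tv_shows": [1, 0, 0, 0, 0, 5, 5, 26, 176, 349],
}).set_index("year_added")

# Total titles added per year.
added["total"] = added["no_of_movies"] + added["no_of_tv_shows"]

# Change in additions compared with the previous year (NaN for the first year).
added["change"] = added["total"].diff()

# Running total of titles added up to and including each year.
added["cumulative"] = added["total"].cumsum()

print(added)
```

The `change` column shows when additions accelerate or slow down, while `cumulative` approximates how the library in this dataset builds up over time.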

## Analyze the duration against the release year to check for any patterns or changes in the length of content over time

Analyzing the duration (in minutes for movies, in number of seasons for TV shows) against the release year can reveal patterns or changes in the length of content over time. Because the two types use different units, we first split the dataframe `netflix_data` into two parts, `netflix_movies` and `netflix_tv_shows`:

In [49]:
netflix_movies = netflix_data[netflix_data['type'] == 'Movie']
netflix_movies = netflix_movies.reset_index(drop=True)
netflix_movies.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,Unknown,United States,2021-09-25,2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",2021
1,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",Unknown,2021-09-24,2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021
2,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021
3,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021
4,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic",2021-09-23,2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,2021


In [50]:
netflix_tv_shows = netflix_data[netflix_data['type'] == 'TV Show']
netflix_tv_shows = netflix_tv_shows.reset_index(drop=True)
netflix_tv_shows.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added
0,s2,TV Show,Blood & Water,Unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021
1,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",Unknown,2021-09-24,2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021
2,s4,TV Show,Jailbirds New Orleans,Unknown,Unknown,Unknown,2021-09-24,2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021
3,s5,TV Show,Kota Factory,Unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021
4,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",Unknown,2021-09-24,2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...,2021


Let's clean up the `duration` column in both the dataframes - `netflix_movies` and `netflix_tv_shows`:

In [51]:
# Strip the literal " min" suffix and convert the remaining digits to integers.
netflix_movies['duration'] = netflix_movies['duration'].str.replace(' min', '', regex=False).astype(int)
netflix_movies.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,Unknown,United States,2021-09-25,2020,PG-13,90,Documentaries,"As her father nears the end of his life, filmm...",2021
1,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",Unknown,2021-09-24,2021,PG,91,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,2021
2,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...",2021-09-24,1993,TV-MA,125,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",2021
3,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,2021-09-24,2021,PG-13,104,"Comedies, Dramas",A woman adjusting to life after a loss contend...,2021
4,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic",2021-09-23,2021,TV-MA,127,"Dramas, International Movies",After most of her family is murdered in a terr...,2021


In [52]:
# Strip the trailing " Season" or " Seasons" suffix and convert the remaining digits to integers.
netflix_tv_shows['duration'] = netflix_tv_shows['duration'].str.replace(r' Seasons?$', '', regex=True).astype(int)
netflix_tv_shows.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,year_added
0,s2,TV Show,Blood & Water,Unknown,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,2021-09-24,2021,TV-MA,2,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",2021
1,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",Unknown,2021-09-24,2021,TV-MA,1,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,2021
2,s4,TV Show,Jailbirds New Orleans,Unknown,Unknown,Unknown,2021-09-24,2021,TV-MA,1,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo...",2021
3,s5,TV Show,Kota Factory,Unknown,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,2021-09-24,2021,TV-MA,2,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...,2021
4,s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, H...",Unknown,2021-09-24,2021,TV-MA,1,"TV Dramas, TV Horror, TV Mysteries",The arrival of a charismatic young priest brin...,2021
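
The suffix stripping above assumes that every value ends in ` min` or ` Season(s)` and that no value is missing; otherwise `astype(int)` raises an error. As a possible alternative, the leading number can be extracted with a regular expression, which handles both units with one expression and turns unparseable rows into missing values. This is only a sketch on a hypothetical sample Series (`durations` is made up), not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical sample values in the same format as the `duration` column.
durations = pd.Series(["90 min", "1 Season", "2 Seasons", None])

# Extract the leading run of digits; rows without a match become NaN.
digits = durations.str.extract(r"^(\d+)", expand=False)

# Convert to a nullable integer dtype so missing values stay <NA> instead of raising.
parsed = pd.to_numeric(digits).astype("Int64")

print(parsed)
```

The nullable `Int64` dtype keeps the column integer-valued even when some rows are missing, which a plain `int` column cannot do.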


Let's plot a scatterplot for movies on Netflix - `release_year` on x-axis and `duration` on y-axis. This plot can help determine if there is a trend towards shorter or longer movies.

In [53]:
fig = px.scatter(
    netflix_movies,
    title='Distribution of movies duration against the release year',
    x='release_year',
    y='duration',
    height=800)
fig.update_layout(yaxis_title="Duration (in mins)")
fig.update_layout(xaxis_title="Year of Release")
fig.show()

Let's plot a scatterplot for TV Shows on Netflix - `release_year` on x-axis and `duration` on y-axis. This plot can help determine if there is a trend towards shorter or longer TV Shows.

In [54]:
fig = px.scatter(
    netflix_tv_shows,
    title='Distribution of TV Shows duration against the release year',
    x='release_year',
    y='duration',
    height=800)
fig.update_layout(yaxis_title="Duration (no of seasons)")
fig.update_layout(xaxis_title="Year of Release")
fig.show()

With the scatter plots above, we can conclude three things:
1. There is a trend towards short and medium-length movies.
2. A few very long movies have also been released recently.
3. There is a trend towards longer TV shows.
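
Trends are hard to judge from a dense scatter plot alone, so a summary per decade can serve as a cross-check. The sketch below groups a made-up toy sample (the `movies` dataframe and its values are hypothetical stand-ins for `netflix_movies`) by release decade and reports the count, median, and maximum duration; the same grouping would apply to `netflix_tv_shows` with seasons instead of minutes.

```python
import pandas as pd

# Hypothetical toy sample standing in for netflix_movies (values are made up).
movies = pd.DataFrame({
    "release_year": [1995, 1998, 2004, 2009, 2012, 2016, 2019, 2021],
    "duration": [120, 110, 105, 115, 100, 95, 90, 150],
})

# Bucket release years into decades (e.g. 1995 -> 1990).
movies["decade"] = movies["release_year"] // 10 * 10

# Number of titles, typical length, and longest title per decade.
summary = movies.groupby("decade")["duration"].agg(["count", "median", "max"])

print(summary)
```

A falling median across decades would support the trend towards shorter movies, while the `max` column highlights the occasional very long title.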