# 0.0 About the data source

TV Shows and Movies listed on Netflix
This dataset consists of TV shows and movies available on Netflix as of 2019. It was collected from Flixable, a third-party Netflix search engine.

In 2018, Flixable released an interesting report showing that the number of TV shows on Netflix has nearly tripled since 2010, while the number of movies has decreased by more than 2,000 titles over the same period. It will be interesting to explore what other insights can be obtained from the same dataset.

Integrating this dataset with external datasets, such as IMDb ratings or Rotten Tomatoes scores, could also lead to many interesting findings.

Inspiration
Some interesting questions (tasks) that can be explored with this dataset:

- Understanding what content is available in different countries
- Identifying similar content by matching text-based features
- Running a network analysis of actors/directors to find interesting insights
- Checking whether Netflix has been increasingly focusing on TV rather than movies in recent years
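The last question can be approached by grouping titles by the year they were added to Netflix rather than by release year. Below is a minimal sketch, assuming the `date_added` column holds strings such as "January 1, 2016" (format `%B %d, %Y`). The toy rows are hypothetical, and `share_by_year_added` is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy rows shaped like netflix_titles.csv (only the two columns used here).
titles = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show", "TV Show", "Movie"],
    "date_added": ["January 1, 2016", "February 3, 2016", "March 5, 2016",
                   "June 1, 2017", "July 9, 2017", "May 2, 2018", None],
})


def share_by_year_added(df: pd.DataFrame) -> pd.DataFrame:
    """Count titles per year added to Netflix, split by type, plus the TV share."""
    # Assumed format "Month D, YYYY"; stray whitespace stripped, missing/unparseable -> NaT
    added = pd.to_datetime(df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce")
    counts = (
        df.assign(year_added=added.dt.year)
        .dropna(subset=["year_added"])      # drop titles with no usable date
        .astype({"year_added": int})
        .groupby(["year_added", "type"])
        .size()
        .unstack(fill_value=0)              # one column per type
    )
    # Fraction of each year's additions that are TV shows
    counts["tv_share"] = counts.get("TV Show", 0) / counts.sum(axis=1)
    return counts


print(share_by_year_added(titles))
```

Applied to `netflix`, a `tv_share` column that rises over recent years would support the idea that Netflix is shifting toward TV.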

# 1.1 Load the data

In [None]:
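import numpy as np
import pandas as pd
import seaborn as sns
import os

# List every file available under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Work from the dataset's folder so the CSV can be read by file name
path = '/kaggle/input/netflix-shows'
os.chdir(path)
os.getcwd()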

# 1.2 Basic exploration
The output below shows that the dataset contains 6234 rows and 12 columns; printing the first 5 rows gives a basic idea of what the data looks like.

In [None]:
netflix = pd.read_csv('netflix_titles.csv')
print(netflix.shape)
netflix.head()

# 2.1 Exploring the data through plots
The chart below shows that many titles are rated TV-MA. Note that these counts cover movies and TV shows together, not TV shows alone.

In [None]:
# Count the titles per rating and plot the counts as horizontal bars
netflix_bar_rating = netflix['rating'].value_counts()
netflix_bar_rating = pd.DataFrame(netflix_bar_rating).reset_index()
netflix_bar_rating.columns = ['rating','Nbr']
sns.barplot(y = 'rating',x = 'Nbr', data=netflix_bar_rating)
netflix_bar_rating
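Because the rating counts above mix movies and TV shows, a cross-tabulation of rating against type shows which ratings dominate among TV shows specifically. A minimal sketch with hypothetical toy rows; `rating_by_type` is an illustrative helper, and in the notebook you would pass `netflix` instead.

```python
import pandas as pd

# Hypothetical toy rows; in the notebook, pass the full `netflix` frame instead.
titles = pd.DataFrame({
    "type": ["TV Show", "Movie", "Movie", "TV Show", "TV Show"],
    "rating": ["TV-MA", "TV-MA", "R", "TV-14", "TV-MA"],
})


def rating_by_type(df: pd.DataFrame) -> pd.DataFrame:
    """Rows are ratings, columns are types, cells are title counts."""
    table = pd.crosstab(df["rating"], df["type"])
    # Put the ratings most common among TV shows first
    return table.sort_values("TV Show", ascending=False)


print(rating_by_type(titles))
```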

The chart below shows that Netflix lists more movies than TV shows.

In [None]:
# Count the titles of each type (Movie / TV Show) and plot them
netflix_bar_type = netflix['type'].value_counts()
netflix_bar_type = pd.DataFrame(netflix_bar_type).reset_index()
netflix_bar_type.columns = ['type','Nbr']
sns.barplot(y = 'type',x = 'Nbr', data=netflix_bar_type)
netflix_bar_type
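To put "more movies than TV shows" in relative terms, `value_counts(normalize=True)` returns proportions instead of raw counts. A small sketch on a hypothetical toy column; in the notebook this would be `netflix['type']`.

```python
import pandas as pd

# Hypothetical toy column; in the notebook this would be netflix["type"].
types = pd.Series(["Movie", "Movie", "TV Show", "Movie"], name="type")

# normalize=True divides each count by the total, so the values sum to 1
shares = types.value_counts(normalize=True)
print(shares.round(3))
```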

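I also wanted to know which release years account for the most titles. Note that `release_year` is the year a title was originally released, not the year Netflix added it (that is recorded in `date_added`). The chart shows that the number of titles per release year kept growing in the years leading up to 2019, but I am not sure why.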

In [None]:
# Count titles per original release year, then keep and plot the 10 busiest years
netflix_bar_year = netflix['release_year'].value_counts()
netflix_bar_year = pd.DataFrame(netflix_bar_year).reset_index()
netflix_bar_year.columns = ['year','Nbr']
netflix_bar_sort = netflix_bar_year.sort_values('Nbr',ascending=False)
netflix_bar_top = netflix_bar_sort.head(10)
sns.barplot(y = 'Nbr',x = 'year', data=netflix_bar_top)
netflix_bar_top
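The three cells above repeat the same value_counts, DataFrame, rename pattern. A small helper can factor it out; `count_table` is a hypothetical name and the toy rows stand in for `netflix`. This is a sketch, not the notebook's original code.

```python
from typing import Optional

import pandas as pd


def count_table(df: pd.DataFrame, column: str, label: Optional[str] = None) -> pd.DataFrame:
    """Two-column frame: each distinct value of `column` and its count ("Nbr")."""
    return (
        df[column]
        .value_counts()                 # most frequent first
        .rename_axis(label or column)   # index name becomes the first column's name
        .reset_index(name="Nbr")        # counts column named as in the notebook
    )


# Hypothetical toy rows
titles = pd.DataFrame({"release_year": [2018, 2017, 2018, 1925, 2018, 2017]})
print(count_table(titles, "release_year", label="year"))
```

With the real data, `sns.barplot(y='rating', x='Nbr', data=count_table(netflix, 'rating'))` would reproduce the first chart.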

The output below also shows that the oldest title in the catalogue was released in 1925. The next-oldest titles, two of them, were released 17 years later, in 1942. (Again, these are original release years, not the years Netflix added the titles.)

In [None]:
# Sorting newest-first makes tail(10) return the 10 oldest release years
netflix_bar_tail = netflix_bar_year.sort_values('year',ascending=False).tail(10)
netflix_bar_tail
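An alternative to sorting newest-first and taking `tail(10)` is `DataFrame.nsmallest`, which returns the earliest years already in ascending order. A sketch on hypothetical per-year counts (1925 with 1 title and 1942 with 2 mirror the text; the other rows are made up), including the 17-year gap between the two earliest years.

```python
import pandas as pd

# Hypothetical per-year counts; 1925 -> 1 and 1942 -> 2 mirror the text, the rest are made up.
year_counts = pd.DataFrame({"year": [2018, 1925, 2017, 1942, 1944],
                            "Nbr": [9, 1, 7, 2, 3]})

# nsmallest returns the rows with the smallest years, oldest first
oldest = year_counts.nsmallest(3, "year")
print(oldest)

# Gap between the two earliest release years
gap = oldest["year"].iloc[1] - oldest["year"].iloc[0]
print(gap)
```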

Let's find out the names of those 3 titles.
I have never watched these films, but judging by its name, the oldest title is a TV show that tells the stories of women filmmakers; it is something like a documentary about art. The titles released in 1942, by contrast, are both about the war.


In [None]:
netflix[netflix['release_year']==1925]

In [None]:
netflix[netflix['release_year']==1942]

I also found that the second movie in the list runs only 18 minutes. I am considering watching it someday.
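To find other very short movies, the runtime can be pulled out of the `duration` text. A minimal sketch, assuming movie durations look like "90 min" and TV show durations like "2 Seasons"; the toy rows and the `shortest_movies` helper are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows; assumes movie durations look like "90 min" and
# TV show durations like "2 Seasons".
titles = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "duration": ["90 min", "18 min", "2 Seasons", "45 min"],
})


def shortest_movies(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n shortest movies with their runtime in minutes."""
    movies = df[df["type"] == "Movie"]
    # Pull the number out of strings like "18 min"; non-matching values become NaN
    minutes = movies["duration"].str.extract(r"(\d+)\s*min", expand=False).astype(float)
    return movies.assign(minutes=minutes).nsmallest(n, "minutes")


print(shortest_movies(titles, n=2))
```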


# 2.1.1 An interesting 2018 TV show I would like to watch: "You"

While searching for the most popular titles of 2018, I found an interesting TV show named "You", though I am not sure it is as interesting as its description.
It tells the story of a girl and a boy who meet and fall in love. But the boy is actually a bad guy: he uses technology to collect information about the girl, then uses that information to make her happy and fall in love with him.
The show is told from two perspectives: from the girl's (the heroine's) perspective it is a love story, but from the boy's (the lead's) perspective it is a crime story.
Let's look at the details of this TV show using the code below.

In [None]:
netflix_you = netflix[netflix['title']=='You']
netflix_you
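The exact comparison `netflix['title']=='You'` only finds a title spelled exactly "You". When the exact spelling is unknown, a case-insensitive search is more forgiving; the optional whole-word mode avoids matching titles like "Young Adult". A sketch with hypothetical toy titles; `find_titles` is an illustrative helper.

```python
import re

import pandas as pd

# Hypothetical toy titles
titles = pd.DataFrame({"title": ["You", "You Me Her", "Young Adult", "Thank You", "Dark"]})


def find_titles(df: pd.DataFrame, text: str, whole_word: bool = False) -> pd.DataFrame:
    """Case-insensitive title search; optionally match `text` only as a whole word."""
    if whole_word:
        pattern = rf"\b{re.escape(text)}\b"   # word boundaries on both sides
        mask = df["title"].str.contains(pattern, case=False, regex=True, na=False)
    else:
        mask = df["title"].str.contains(text, case=False, regex=False, na=False)
    return df[mask]


print(find_titles(titles, "you"))
print(find_titles(titles, "you", whole_word=True))
```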