Dataset:

https://www.kaggle.com/shivamb/netflix-shows/download

### Import libraries and dataset

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px # for data visualization
from textblob import TextBlob  # for sentiment analysis

df = pd.read_csv("netflix_titles.csv")
df.shape

(8807, 12)

In [2]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

In [4]:
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


### Distribution of Content:

To begin, I’ll look at the distribution of content ratings on Netflix:

In [5]:
z = df.groupby(['rating']).size().reset_index(name='counts')
pieChart = px.pie(z, values='counts', names='rating',
                  title='Distribution of Content Ratings on Netflix',
                  color_discrete_sequence=px.colors.qualitative.Set3)
pieChart.show()

The graph above shows that “TV-MA” is the most common rating on Netflix, which means that the largest share of the catalog is intended for mature and adult audiences.
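To read exact shares rather than estimate them from the pie chart, the same counts can be expressed as fractions. The sketch below uses a small made-up `toy` frame in place of `df`, so the numbers it prints are illustrative only and say nothing about the real dataset.

```python
import pandas as pd

# Toy stand-in for the `rating` column; these values are illustrative only.
toy = pd.DataFrame(
    {"rating": ["TV-MA", "TV-MA", "TV-14", "PG", "TV-MA", "TV-14", None]}
)

# value_counts skips missing ratings by default, as groupby(...).size() does.
counts = toy["rating"].value_counts()
shares = toy["rating"].value_counts(normalize=True).round(3)

# Side-by-side view: absolute count and fraction of rated titles.
summary = pd.DataFrame({"counts": counts, "share": shares})
print(summary)
```

Replacing `toy` with `df` should give the share behind each slice of the chart.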

### Top 5 Actors and Directors:

Now let’s see the top 5 directors on this platform, ranked by number of titles:

In [6]:
# Fill missing directors so the split below works on every row
df['director'] = df['director'].fillna('No Director Specified')
# One row per director; strip the space left after each comma
filtered_directors = df['director'].str.split(',',expand=True).stack().str.strip()
filtered_directors = filtered_directors.to_frame()
filtered_directors.columns = ['Director']
directors = filtered_directors.groupby(['Director']).size().reset_index(name='Total Content')
# Drop the placeholder so it is not ranked as a director
directors = directors[directors.Director != 'No Director Specified']
directors = directors.sort_values(by=['Total Content'],ascending=False)
directorsTop5 = directors.head()
# Re-sort ascending so the largest bar is drawn at the top of the chart
directorsTop5 = directorsTop5.sort_values(by=['Total Content'])
fig1=px.bar(directorsTop5, x='Total Content', y='Director',title='Top 5 Directors on Netflix')
fig1.show()
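The split-and-count step can also be written with `Series.explode`. One thing to watch for is that splitting on `','` leaves a leading space on every name after the first, so names need to be stripped before counting. The sketch below uses a hypothetical `toy` frame with placeholder names, not rows from the dataset.

```python
import pandas as pd

# Hypothetical rows; the names are placeholders, not taken from the dataset.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["Ann Lee, Bo Kim", "Bo Kim", None],
})

names = (
    toy["director"]
    .dropna()          # skip titles with no director listed
    .str.split(",")    # "Ann Lee, Bo Kim" -> ["Ann Lee", " Bo Kim"]
    .explode()         # one row per name
    .str.strip()       # " Bo Kim" -> "Bo Kim"
)
top = names.value_counts().head(5)
print(top)

# Without the strip, " Bo Kim" and "Bo Kim" are counted as different people.
unstripped = toy["director"].dropna().str.split(",").explode().value_counts()
print(unstripped)
```

Dropping missing values up front also avoids the need for a placeholder such as 'No Director Specified'.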

Now let’s have a look at the top 5 actors on this platform, again ranked by number of titles:

In [7]:
df['cast']=df['cast'].fillna('No Cast Specified')
# One row per actor; strip the space left after each comma
filtered_cast=df['cast'].str.split(',',expand=True).stack().str.strip()
filtered_cast=filtered_cast.to_frame()
filtered_cast.columns=['Actor']
actors=filtered_cast.groupby(['Actor']).size().reset_index(name='Total Content')
# Drop the placeholder so it is not ranked as an actor
actors=actors[actors.Actor !='No Cast Specified']
actors=actors.sort_values(by=['Total Content'],ascending=False)
actorsTop5=actors.head()
actorsTop5=actorsTop5.sort_values(by=['Total Content'])
fig2=px.bar(actorsTop5,x='Total Content',y='Actor',title='Top 5 Actors on Netflix')
fig2.show()

### Analyzing Content on Netflix:

The next thing to analyze is how much content was released each year, split by type (note that this uses the release year, not the date a title was added to Netflix):

In [10]:
df1 = df[['type','release_year']]
df1=df1.rename(columns={"release_year":"Release Year"})
df2 = df1.groupby(['Release Year','type']).size().reset_index(name='Total Content')
df2=df2[df2['Release Year']>2000]
fig3 = px.line(df2, x="Release Year", y="Total Content", color='type',title='Trend of Content produced over the years on Netflix')
fig3.show()
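The grouped counts above are in long format, with one row per (year, type) pair. To compare movies and TV shows for the same year as plain numbers, the result can be pivoted into a wide table. This is a minimal sketch on a made-up `toy` frame; the values are illustrative only.

```python
import pandas as pd

# Toy stand-in for the `type` and `release_year` columns.
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
    "release_year": [2019, 2019, 2019, 2020, 2020],
})

# Long format: one row per (year, type) pair, as in the cell above.
long = (
    toy.groupby(["release_year", "type"])
    .size()
    .reset_index(name="Total Content")
)

# Wide format: one row per year, one column per type; missing pairs become 0.
wide = (
    long.pivot(index="release_year", columns="type", values="Total Content")
    .fillna(0)
    .astype(int)
)
print(wide)
```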

Finally, let’s analyze the sentiment of the content descriptions on Netflix:

In [13]:
dfx=df[['release_year','description']]
dfx=dfx.rename(columns={'release_year':'Release Year'})
for index, row in dfx.iterrows():
    z=row['description']
    testimonial=TextBlob(z)
    p=testimonial.sentiment.polarity
    if p==0:
        sent='Neutral'
    elif p>0:
        sent='Positive'
    else:
        sent='Negative'
    dfx.loc[index,'Sentiment']=sent

dfx=dfx.groupby(['Release Year','Sentiment']).size().reset_index(name='Total Content')

dfx=dfx[dfx['Release Year']>2000]
fig4 = px.bar(dfx, x="Release Year", y="Total Content", color="Sentiment",title="Sentiment of content on Netflix")
fig4.show()

The graph above shows that, for each release year, titles with positive descriptions outnumber those with neutral and negative descriptions combined.
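The labeling rule used in the loop (zero is neutral, above zero is positive, below zero is negative) can be pulled into a small function and applied to a whole column at once. In the sketch below, `label_polarity` is a hypothetical helper name, and the `polarity` column holds made-up scores standing in for the TextBlob output so the example runs without TextBlob installed.

```python
import pandas as pd

def label_polarity(p: float) -> str:
    """Map a polarity score in [-1, 1] to the three labels used above."""
    if p == 0:
        return "Neutral"
    return "Positive" if p > 0 else "Negative"

# Hypothetical polarity scores standing in for TextBlob(...).sentiment.polarity.
toy = pd.DataFrame({
    "Release Year": [2019, 2019, 2020, 2020],
    "polarity": [0.4, 0.0, -0.2, 0.1],
})

# Label every row in one pass instead of assigning inside a loop.
toy["Sentiment"] = toy["polarity"].apply(label_polarity)

counts = (
    toy.groupby(["Release Year", "Sentiment"])
    .size()
    .reset_index(name="Total Content")
)
print(counts)
```

On the real data, the polarity column could be filled with `dfx['description'].apply(lambda t: TextBlob(t).sentiment.polarity)` before labeling.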

Source:

https://thecleverprogrammer.com/2021/01/16/netflix-data-analysis-with-python/