# NETFLIX

## How Netflix’s Recommendations System Works

Our business is a subscription service that offers personalized recommendations to help you find shows and movies of interest to you. To do this, we have created a proprietary, complex recommendations system. This article provides a high-level description of that system in plain language.

![](https://storage.googleapis.com/craft-ediflo-website/user-uploads/img/netflix-rec.jpg?mtime=20190606092105)
### The basics

Whenever you access the Netflix service, our recommendations system strives to help you find a show or movie to enjoy with minimal effort. We estimate the likelihood that you will watch a particular title in our catalog based on a number of factors including:
* your interactions with our service (such as your viewing history and how you rated other titles),

* other members with similar tastes and preferences on our service, and

* information about the titles, such as their genre, categories, actors, release year, etc.

In addition to knowing what you have watched on Netflix, to best personalize the recommendations we also look at things like:
* the time of day you watch,

* the devices you are watching Netflix on, and

* how long you watch.

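To make the "information about the titles" factor concrete, here is a minimal sketch of content-based similarity using genre overlap (Jaccard similarity). It is only an illustration and not Netflix's actual system; the `catalog` dictionary, the titles in it, and the helper names `jaccard` and `most_similar` are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(title, catalog, top_n=2):
    """Rank the other titles in `catalog` by genre overlap with `title`."""
    target = catalog[title]
    scores = [
        (other, jaccard(target, genres))
        for other, genres in catalog.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy catalog: title -> set of genre tags.
catalog = {
    "Title A": {"Dramas", "International Movies"},
    "Title B": {"Dramas", "Independent Movies"},
    "Title C": {"Comedies", "Stand-Up Comedy"},
    "Title D": {"Dramas", "International Movies", "Thrillers"},
}

recommendations = most_similar("Title A", catalog)
print(recommendations)  # "Title D" ranks first, then "Title B"
```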
In [None]:
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt
from collections import Counter

In [None]:
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

In [None]:
data=pd.read_csv('../input/netflix-shows/netflix_titles.csv',index_col=0)

In [None]:
data.info()

In [None]:
data.head()

### Handling missing values

In [None]:
data.isna().sum()

In [None]:
data.drop(['director', 'date_added', 'description'], axis=1, inplace=True)

In [None]:
data.dropna(subset=['cast'],inplace=True)

In [None]:
data.country= data.country.fillna('United States')

In [None]:
data.rating=data.rating.fillna('TV-MA')

In [None]:
data.isna().sum()

## Visualizing trends between the variables

In [None]:
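# `cast` holds comma-separated names, so split it to count individual actors.
cast_counter_list = dict(Counter(data['cast'].str.split(',').explode().str.strip()).most_common(5))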

In [None]:
freq_actor=pd.DataFrame({'actor_name':cast_counter_list.keys(),'no_ofMovies':cast_counter_list.values()})

In [None]:
freq_actor

In [None]:
plt.figure(figsize=(10,7))
sb.barplot(x=freq_actor.actor_name,y=freq_actor['no_ofMovies'])
plt.xticks(rotation=20)
plt.title('most popular actor')
plt.show()

In [None]:
data['country'].value_counts()

In [None]:
country_counter_list = dict(Counter(data['country']).most_common(6))

In [None]:
country_counter_list

In [None]:
freq_country=pd.DataFrame({'country_name':country_counter_list.keys(),'no_ofMovies':country_counter_list.values()})

In [None]:
freq_country

In [None]:
plt.figure(figsize=(10,7))
sb.barplot(x=freq_country.country_name,y=freq_country['no_ofMovies'])
plt.xticks(rotation=20)
plt.title('popular film industries')
plt.show()

In [None]:
duration_counter_list = dict(Counter(data['duration']).most_common(40))

In [None]:
# pie chart of the 40 most common film durations

# Pass the counts directly; they do not align row-by-row with `data`.
px.pie(names=list(duration_counter_list.keys()),values=list(duration_counter_list.values()))

In [None]:
listed_in_counter_list = dict(Counter(data['listed_in']).most_common(30))

In [None]:
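# pie chart of the 30 most common genre combinations

# Pass the counts directly; they do not align row-by-row with `data`.
px.pie(names=list(listed_in_counter_list.keys()),values=list(listed_in_counter_list.values()))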

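Each `listed_in` value is a comma-separated string of genre labels, so counting whole strings counts genre combinations rather than individual genres. A minimal sketch of per-label counting follows; the helper `count_labels` and the `sample` list are hypothetical and only mimic the shape of that column.

```python
from collections import Counter


def count_labels(values, sep=",", top_n=3):
    """Count individual labels in strings like 'Dramas, Comedies'."""
    counts = Counter()
    for value in values:
        if not isinstance(value, str):  # skip missing entries such as NaN
            continue
        counts.update(label.strip() for label in value.split(sep) if label.strip())
    return counts.most_common(top_n)


# Hypothetical sample shaped like the `listed_in` column.
sample = [
    "Dramas, International Movies",
    "Comedies, Dramas",
    "Dramas",
    float("nan"),
    "Comedies, Stand-Up Comedy",
]

top_genres = count_labels(sample, top_n=2)
print(top_genres)  # [('Dramas', 3), ('Comedies', 2)]
```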
In [None]:
# plots between released year vs type of movie

sb.displot(data,x='release_year',hue='type')
plt.title('released year vs type of movie')
plt.show()

In [None]:
sb.stripplot(data=data, x="release_year", y="type")
plt.show()

In [None]:
data.rating.value_counts()

## Maturity ratings and classifications on Netflix
### Kids
* **TV-Y**   Designed to be appropriate for all children

* **TV-Y7**  Suitable for ages 7 and up

* **G**   Suitable for General Audiences

* **TV-G**  Suitable for General Audiences

* **PG**   Parental Guidance suggested

* **TV-PG**  Parental Guidance suggested

### Teens
* **PG-13**  Parents strongly cautioned. May be inappropriate for ages 12 and under.

* **TV-14**  Parents strongly cautioned. May not be suitable for ages 14 and under.

### Adults
* **R** Restricted. May be inappropriate for ages 17 and under.

* **TV-MA**  For Mature Audiences. May not be suitable for ages 17 and under.

* **NC-17**  Inappropriate for ages 17 and under.

In [None]:
adults = ['R','TV-MA','NC-17','NR','UR']
teens = ['PG-13','TV-14']
kids = ['TV-Y','TV-Y7','G','TV-G','PG','TV-PG','TV-Y7-FV']

In [None]:
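# Map each rating code to its audience group; unlisted codes are left unchanged.
# A vectorized replace avoids positional lookups on the `show_id` index.
rating_groups = {**dict.fromkeys(adults, 'adults'),
                 **dict.fromkeys(teens, 'teens'),
                 **dict.fromkeys(kids, 'kids')}
data['rating'] = data['rating'].replace(rating_groups)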

In [None]:
data.rating.value_counts()

In [None]:
# displot between film release year density vs rating

sb.displot(
    data=data,
    x="release_year", hue="rating",
    kind="kde", height=6,
    multiple="fill", clip=(0, None),
    palette="ch:rot=-.25,hue=1,light=.75",
)
plt.title('film release year density vs rating')
plt.show()

In [None]:
data.type = data.type.astype('category')
data.rating = data.rating.astype('category')

In [None]:
duration_list = list(data.duration)

In [None]:
# Convert '93 min' to 93; count each season as 90 minutes (a rough, arbitrary proxy).
for i in range(len(duration_list)):
    value, unit = duration_list[i].split(' ')
    if unit == 'min':
        duration_list[i] = int(value)
    else:
        duration_list[i] = int(value) * 90

In [None]:
data.duration = duration_list

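The conversion above assumes every `duration` entry is a string of the form `'<number> <unit>'`. If the column contains missing or malformed entries, a more defensive parser may be useful. The sketch below is one possible approach; the function name `duration_to_minutes` and its `minutes_per_season` parameter are hypothetical, with the default of 90 mirroring the multiplier used in the notebook.

```python
def duration_to_minutes(text, minutes_per_season=90):
    """Convert '93 min' or '2 Seasons' to minutes; return None if unparseable."""
    if not isinstance(text, str):  # e.g. NaN or None
        return None
    parts = text.split()
    if len(parts) != 2 or not parts[0].isdigit():
        return None
    amount, unit = int(parts[0]), parts[1].lower()
    if unit == "min":
        return amount
    if unit in ("season", "seasons"):
        return amount * minutes_per_season
    return None


examples = ["93 min", "1 Season", "2 Seasons", None, "unknown"]
converted = [duration_to_minutes(e) for e in examples]
print(converted)  # [93, 90, 180, None, None]
```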
In [None]:
data.info()

In [None]:
# movie type vs rating vs release year

fig = px.sunburst(data, path=['rating', 'type', 'release_year'])
fig.show()

In [None]:
# parallel categories diagram for the categorical variables

fig = px.parallel_categories(data,dimensions=['type', 'rating'])
fig.show()

In [None]:
# bubble plot between film duration vs released year

fig = px.scatter(data, x="release_year", y="duration",
                 color="type",size="duration",marginal_x="box",
                 marginal_y="violin",hover_name="title",
                 title="Film duration vs release year")
fig.show()

In [None]:
# area plot of rating distribution

fig = px.area(data, x="release_year", y="duration", color="rating")
fig.show()
