# Introduction
* Netflix is one of the most popular streaming platforms in the world, offering a wide range of movies and TV shows. As the platform grows, it is useful to understand what kinds of content it carries and how that content is distributed. This analysis explores several aspects of Netflix's library: which types of content are most common, how additions to the catalog have changed over time, and which countries account for the most titles. Specifically, we examine whether TV shows or movies are more prevalent by month added, and whether the focus has shifted towards TV shows in recent years. We also look at the top 10 directors by number of titles on Netflix. Visualizing the data with charts makes these trends and patterns easier to see, including how they vary across countries and over time.

## Content
* First look at the data, get some basic information
* Prepare Data
* Visualization
* Summarize

In [None]:
import os
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 'seaborn-whitegrid' was renamed in Matplotlib 3.6; fall back to the new name if needed.
try:
    plt.style.use('seaborn-whitegrid')
except OSError:
    plt.style.use('seaborn-v0_8-whitegrid')
sns.set_style('darkgrid')
warnings.filterwarnings('ignore')

# List the input files available in the Kaggle environment.
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))




In [None]:
netflix = pd.read_csv('/kaggle/input/netflix-shows/netflix_titles.csv')

## First look at the data, get some basic information

In [None]:
netflix.head()

In [None]:
netflix.info()

* dtypes: 
  * int64(1)
    * release_year
  * object(11)
    * show_id, type, title, director, cast, country, date_added, rating, duration, listed_in, description
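
Besides the dtypes, `netflix.info()` reports non-null counts per column, which is where missing values show up. As a minimal sketch, the helper below tabulates missing counts and percentages per column. The name `missing_summary` is hypothetical and not part of the original notebook, and the small DataFrame is made-up illustration data rather than rows from the dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "percent": (missing / len(df) * 100).round(1),
    })
    # Show the columns with the most gaps first.
    return summary.sort_values("missing", ascending=False)


# Made-up illustration data reusing three of the dataset's column names.
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "director": ["A. Director", np.nan, np.nan, "B. Director"],
    "country": ["India", "United States", np.nan, "Japan"],
})
print(missing_summary(toy))
```

On the real data the same call would be `missing_summary(netflix)`.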


# Prepare Data
* Data Types


In [None]:
netflix['date_added'] = netflix['date_added'].str.strip()
netflix['date_added'] = pd.to_datetime(netflix['date_added'], format='%B %d, %Y')


In [None]:
netflix["date_added"].dtypes
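
The conversion above strips whitespace before parsing because a value with stray leading or trailing spaces may not match the `'%B %d, %Y'` format string. The sketch below shows the same two steps on a few made-up strings (not rows from the dataset), including one missing value, which becomes `NaT`.

```python
import pandas as pd

# Made-up sample strings in the "Month day, Year" layout; one has a leading
# space and one is missing.
raw = pd.Series([" August 4, 2017", "September 25, 2021", None])

# Strip surrounding whitespace first so every value matches the format string.
cleaned = raw.str.strip()
parsed = pd.to_datetime(cleaned, format="%B %d, %Y")

print(parsed)
print("missing dates:", parsed.isna().sum())
```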

# Visualization

In [None]:
TV_show = []
movies = []
for i in range(1, 13):
    # Count titles of each type added in calendar month i.
    counts = netflix.loc[netflix["date_added"].dt.month == i, "type"].value_counts()
    # Look counts up by label instead of relying on the order of value_counts()
    # or on both types being present in every month.
    movies.append(counts.get("Movie", 0))
    TV_show.append(counts.get("TV Show", 0))

In [None]:
plt.rcParams['axes.facecolor'] = 'white'
plt.rcParams['grid.color'] = 'gray'
plt.rcParams['grid.linestyle'] = ':'
# Label each line so the legend tells the two series apart.
plt.plot(range(1, 13), TV_show, color="red", label="TV Show")
plt.plot(range(1, 13), movies, color="gray", label="Movie")
plt.title('Distribution of movies and TV shows by month', fontname='Arial', fontsize=15, fontweight='bold')
plt.xticks(range(1, 13))
plt.xlabel("Month added")
plt.ylabel("Count")
plt.legend()
plt.show()
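
The per-month counts can also be built in one step with a cross-tabulation. This is an alternative sketch, not the notebook's own method; it runs on made-up rows and assumes `date_added` has already been converted to datetime.

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
toy = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
    "date_added": pd.to_datetime(
        ["2020-01-05", "2020-01-20", "2021-01-11", "2019-03-02", "2021-12-31"]
    ),
})

# Rows = calendar month, columns = content type, values = number of titles.
monthly = pd.crosstab(toy["date_added"].dt.month, toy["type"])

# Make sure all 12 months appear, with 0 where nothing was added.
monthly = monthly.reindex(range(1, 13), fill_value=0)

print(monthly)
```

With a table like this, `monthly.plot()` would draw one line per content type.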

In [None]:
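country_counts = netflix.country.value_counts()
# Titles whose country field is exactly "United States" vs. all other listed values.
# Look the count up by label; positional [0] on a labelled Series is deprecated.
us = country_counts["United States"]
others = country_counts.drop("United States").sum()
a = country_counts.head(10)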

fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 8))

# Pie chart
explode = (0, 0.1)
labels = ["US", "Others"]
ax1.pie((us, others), shadow=True, autopct='%1.1f%%', labels=labels,
        colors=["#E50914", "#343a40"], explode=explode, startangle=90,
        textprops={'color': 'white', 'fontsize': 12, 'fontweight': 'bold'})
ax1.set_title("Percentage of Content by Country", fontsize=15, fontweight='bold')
ax1.text(-0.7, -0.1, "USA",color = "white",fontsize=12, fontweight='bold')
ax1.text(0.2, 0, "OTHERS",color = "white",fontsize=12, fontweight='bold')
# Bar chart
ax2.barh(a.index, a.values, color="#E50914")
ax2.set_title('Number of Titles by Country', fontsize=15, fontweight='bold')


# Take the 10 countries with the most titles
top_countries = netflix.country.value_counts().index[:10]

# For each of those countries, find its most frequent listed_in value
listed_in = []
countries = []
for country in top_countries:
    temp = netflix["listed_in"][netflix.country == country].value_counts()
    if not temp.empty:
        listed_in.append(temp.index[0])
        countries.append(country)

# Several countries can share the same top category; keep each category once
# so the bar chart does not draw identical bars on top of each other
top_listed_in = list(dict.fromkeys(listed_in))
top_countries = countries

# Count how often each of those categories occurs in the whole dataset
listed_in_counts = [
    (netflix["listed_in"] == category).sum() for category in top_listed_in
]

# create bar plot
ax3.barh(top_listed_in, listed_in_counts, color="#E50914")

ax3.set_title("Most Common Listed In Categories in Top 10 Countries", fontname='Arial', fontsize=15, fontweight='bold')
ax3.set_xlabel("Frequency")
ax3.set_ylabel("Listed In Category")

# Pie chart: share of movies vs. TV shows
type_counts = netflix["type"].value_counts()
explode = (0, 0.1)
# Take the labels from the data so they always match the order of the counts
labels = type_counts.index.tolist()
ax4.pie(type_counts, shadow=True, startangle=90, autopct='%1.1f%%',
        textprops={'color': 'white', 'fontsize': 12, 'fontweight': 'bold'},
        labels=labels, colors=["#E50914", "#343a40"], explode=explode)
ax4.legend(loc="best", labels=labels, prop={'size': 12}, title_fontsize=14,
           facecolor='white', framealpha=1, edgecolor='black', bbox_to_anchor=(1, 0.5))
ax4.axis('equal')
ax4.set_title('Netflix Streaming movies and TV shows between 2008 and 2021', fontname='Arial', fontsize=15, fontweight='bold')

# Adjust subplots
plt.subplots_adjust(left=0.01, right=0.95, bottom=0.05, top=0.95,
                    wspace=0.9, hspace=0.50)
plt.show()


In [None]:
# Compute the top 10 listed_in values once and reuse them for both axes
top_categories = netflix.listed_in.value_counts().head(10)

plt.figure(figsize=(12, 5))
plt.barh(top_categories.index, top_categories.values, color="#E50914")
plt.title("Top 10 categories by number of titles", fontsize=15, fontweight='bold')
plt.show()
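
The chart above counts each complete `listed_in` string as one category. If a `listed_in` value can hold several comma-separated genres, as the category names in this analysis suggest, then counting individual genres requires splitting the strings first. A minimal sketch on made-up rows, assuming a comma followed by a space as the separator:

```python
import pandas as pd

# Made-up rows; the real column is assumed to use the same "A, B, C" layout.
toy = pd.DataFrame({
    "listed_in": [
        "Dramas, International Movies",
        "Documentaries",
        "Comedies, Dramas, International Movies",
        "Dramas",
    ]
})

genre_counts = (
    toy["listed_in"]
    .str.split(",")      # one list of genres per title
    .explode()           # one row per (title, genre) pair
    .str.strip()         # drop the space left after each comma
    .value_counts()
)
print(genre_counts)
```

Counting this way lets a title contribute to every genre it is tagged with, so the totals can exceed the number of titles.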

In [None]:

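# Count titles per (year added, type); rows with a missing date are dropped by groupby
counts = netflix.groupby([netflix['date_added'].dt.year, 'type']).size().unstack()
# Years may come back as floats when some dates are missing; use integers for the ticks
counts.index = counts.index.astype(int)

counts.plot(kind='line', color=["#E50914", "#343a40"])
plt.xticks(counts.index, rotation=90)
plt.title("Movies and TV Shows Added by Year", fontsize=15, fontweight='bold')
plt.xlabel('Year Added to Netflix')
plt.ylabel('Count')
plt.show()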


In [None]:
top_directors = netflix.director.value_counts().head(10)

plt.barh(top_directors.index,top_directors.values, color = "#E50914")
plt.title("Top 10 Directors on Netflix",fontsize=15, fontweight='bold')
plt.xlabel("Number of titles (movies and TV shows)", fontsize=9)
plt.show()

# Summarize

Insights: 

1. The USA accounts for 35.3% of the titles on Netflix that list a country, with the rest coming from other countries.
1. India produces the most content in the Comedies, Dramas, and International Movies categories, while the USA produces the most content in the Documentaries category.
1. Between 2008 and 2021, 69.6% of the titles added to Netflix were movies and 30.4% were TV shows.
1. The most common content categories on Netflix worldwide are Dramas, International Movies, and Documentaries, while the least common of the top 10 categories charted are Romantic Movies and TV Dramas.
1. Between 2008 and 2015, the number of movies and TV shows added to Netflix each year was fairly steady, staying below about 200 per year. From 2015 to 2019 the number of movies rose sharply, reaching roughly 1,400 in 2019. From 2019 to 2021 the number of movies added declined, but movies remain the most common type of content on Netflix.
1. Rajiv Chilaka is the director with the most titles on Netflix.
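
The percentages quoted above are read from the pie charts; they can also be computed directly. The sketch below runs on made-up rows, so its numbers are illustrative only and are not the figures from the dataset. It mirrors the pie-chart logic: missing countries are ignored, and a title counts towards "United States" only when that is the whole `country` value.

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "Movie", "TV Show"],
    "country": ["United States", "India", "United States", None],
})

# Share of each content type, in percent.
type_share = toy["type"].value_counts(normalize=True).mul(100).round(1)

# Share of each country value among titles that list a country
# (value_counts skips missing values by default).
country_share = toy["country"].value_counts(normalize=True).mul(100).round(1)

print(type_share)
print(country_share)
```

On the real data, replacing `toy` with `netflix` would give the corresponding shares for the full catalog.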