**About Dataset**
Netflix is one of the most popular media and video streaming platforms. As of mid-2021, it offers over 8,000 movies and TV shows and has over 200 million subscribers worldwide. This tabular dataset lists the movies and TV shows available on Netflix, along with details such as cast, directors, ratings, release year, and duration.

In [1]:
# Let's import the necessary libraries

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
#Load the dataset

raw_data = pd.read_csv("/kaggle/input/netflix-shows/netflix_titles.csv")

FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/netflix-shows/netflix_titles.csv'
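
The `FileNotFoundError` above appears whenever the notebook runs outside Kaggle, where the `/kaggle/input/...` path does not exist. A minimal sketch of a more forgiving loader follows. The helper name `load_titles` and the local fallback path `netflix_titles.csv` are assumptions made for illustration; they are not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_titles(candidates):
    """Return the first CSV found among the candidate paths as a DataFrame."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return pd.read_csv(path)
    # Nothing matched: fail with a message that lists every path tried
    raise FileNotFoundError(f"None of these paths exist: {list(candidates)}")


# The Kaggle path comes from the cell above; the second entry is a
# hypothetical local copy of the same file.
CANDIDATE_PATHS = [
    "/kaggle/input/netflix-shows/netflix_titles.csv",
    "netflix_titles.csv",
]
```

With this in place, the loading cell could read `raw_data = load_titles(CANDIDATE_PATHS)`.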

In [None]:
#Let's get an overview

raw_data.head()

In [None]:
# Let's check for missing values

raw_data.isnull().sum()


In [None]:
# Let's drop the rows that have null values

data1=raw_data.dropna()

In [None]:
data1.isnull().sum()
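
Calling `dropna()` with no arguments removes every row that has a null in any column, which can discard a large share of the data. The sketch below uses a small made-up table (not the Netflix data) to show how to measure that loss and, as one possible alternative, fill selected columns with a placeholder instead. The `"Unknown"` label is an assumption chosen for illustration.

```python
import pandas as pd

# Toy data standing in for the real dataset
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "type": ["Movie", "TV Show", "Movie", "TV Show"],
    "director": ["Dir 1", None, "Dir 2", None],
    "country": ["India", "United States", None, "Japan"],
})

# Option 1: drop any row with a missing value and measure what was lost
dropped = toy.dropna()
share_lost = 1 - len(dropped) / len(toy)

# Option 2: keep every row and mark the gaps with a placeholder
filled = toy.fillna({"director": "Unknown", "country": "Unknown"})

print(f"Rows kept by dropna: {len(dropped)} of {len(toy)}")
print(f"Rows kept by fillna: {len(filled)} of {len(toy)}")
```

Which option fits depends on the question being asked: counts per director need real director names, while the movie/TV show split does not.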

In [None]:
# Let's remove the unwanted columns
unwanted_columns = ["show_id","description","cast"]

data = data1.drop(unwanted_columns, axis=1)

In [None]:
# Let's get the summary statistics
data.describe()

In [None]:
# Let's check for duplicates
duplicate_rows = data[data.duplicated()]
duplicate_rows
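
The cell above only displays duplicate rows; it does not remove them. If duplicates turn up, a sketch like the following could count and drop them. The toy table is made up for illustration.

```python
import pandas as pd

# Toy data with one exact duplicate row
toy = pd.DataFrame({
    "title": ["A", "A", "B"],
    "release_year": [2019, 2019, 2020],
})

# duplicated() flags every repeat after the first occurrence
n_duplicates = int(toy.duplicated().sum())

# Keep the first occurrence of each row and renumber the index
deduped = toy.drop_duplicates().reset_index(drop=True)

print(f"Duplicates found: {n_duplicates}, rows after removal: {len(deduped)}")
```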

**Movie/Show Distribution by Countries**

In [None]:
no_of_movies = data["country"].value_counts().nlargest(10)
no_of_movies

# Let's create a bar chart to visualize

sns.barplot(x = no_of_movies.index,y = no_of_movies.values, palette="viridis")
plt.title("Total Number of Shows/Movies by Country")
plt.xlabel("Country")
plt.ylabel("No of Movies/Shows")
plt.xticks(rotation=45)
plt.show()
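
One caveat with counting `country` directly: this sketch assumes a single cell can hold several comma-separated countries (for co-productions), in which case `value_counts()` treats each combination as its own category. Splitting and exploding the column counts each country once per title. The toy table below is made up for illustration.

```python
import pandas as pd

# Toy data: some titles list more than one country
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "country": ["United States, India", "India", "United States, Canada, India"],
})

# Split on commas, give each country its own row, and trim whitespace
countries = (
    toy["country"]
    .str.split(",")
    .explode()
    .str.strip()
)

per_country = countries.value_counts()
print(per_country)
```

The resulting `per_country` series can be passed to `sns.barplot` in the same way as `no_of_movies`.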

**Top directors by number of movies or shows**

In [None]:
top_director = data["director"].value_counts().nlargest(10)
top_director

sns.barplot(x = top_director.index, y = top_director.values, palette = "tab10")
plt.title("Number of Movies/Shows by Director")
plt.xlabel("Director")
plt.ylabel("No of Movies/Shows")
plt.xticks(rotation=90)
plt.show()

**Number of unique directors for each country**

In [None]:
directors_by_country = data.groupby("country")["director"].nunique().nlargest(10)
plt.figure(figsize=(12, 6))
sns.stripplot(x=directors_by_country.values, y=directors_by_country.index, color='blue', size=8, jitter=True)
plt.title('Top 10 Countries with the Most Unique Directors')
plt.xlabel('Number of Directors')
plt.ylabel('Country')
plt.tight_layout()
plt.show()

**Number of movies and TV shows on Netflix**

In [None]:
count_type = data["type"].value_counts()
plt.figure(figsize=(8, 8))
plt.pie(count_type, labels=count_type.index, autopct='%1.1f%%', colors=['#66b3ff', '#99ff99'])
plt.title('Distribution of Movies and TV Shows on Netflix')
plt.show()

**On which date were the most titles added?**

In [None]:
date = data["date_added"].value_counts().nlargest(10)
plt.figure(figsize=(12, 6))
plt.plot(date.index.astype(str), date.values, marker='o', linestyle='-', color='b')
plt.title('Top 10 Dates with the Highest Number of Additions on Netflix')
plt.xlabel('Date Added')
plt.ylabel('Number of Additions')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
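
The cell above counts raw `date_added` strings, so each point is a single day rather than a month. To aggregate by month, the strings can be parsed into datetimes first. This is a sketch on made-up data, assuming dates formatted like `January 1, 2020`, possibly with stray leading spaces.

```python
import pandas as pd

# Toy data in the assumed "Month day, year" format
toy = pd.DataFrame({
    "date_added": [
        "January 1, 2020",
        " January 15, 2020",
        "March 1, 2018",
        "November 1, 2019",
    ],
})

# Strip whitespace, then parse; unparseable values become NaT
added = pd.to_datetime(
    toy["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
)

# Collapse each date to its year-month period and count titles per month
per_month = added.dt.to_period("M").value_counts().sort_index()
print(per_month)
```

Plotting `per_month.index.astype(str)` against `per_month.values` would then give a true year-month axis.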

**In which year were the most titles released?**

In [None]:
year = data["release_year"].value_counts().nlargest(10)
plt.figure(figsize=(12, 6))
plt.plot(year.index.astype(str), year.values, marker='o', linestyle='-', color='r')
plt.title('Top 10 Years with the Most Releases')
plt.xlabel('Year')
plt.ylabel('Number of Movies/Shows')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

**1. Which countries have the most movies and TV shows?**

Answer: The United States and India.

**2. Which directors have the most movies or shows?**

Answer: Raul Campos, Jan Suter, and Marcus Raboy.

**3. Which country has the most unique directors?**

Answer: The United States.

**4. What is the distribution of movies and TV shows?**

Answer: In the cleaned data (after dropping rows with missing values), movies account for about 97% of titles and TV shows for 2.8%.

**5. On which date were the most titles added, and in which year were the most titles released?**

Answer: The most titles were added on January 1, 2020, and the most common release year is 2017, with over 650 titles.