<a href="https://colab.research.google.com/github/WMinerva292/WMinerva292/blob/main/CapstoneProjectNetflix.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Capstone Project - Netflix**

Customer behaviour and its prediction lie at the core of every business model. From stock exchanges, e-commerce and automobiles to presidential elections, predictions serve a great purpose. Most of these predictions are based on the data available about a person's activity, either online or in person.

Recommendation engines are a much-needed manifestation of this desire to predict user activity. They go one step further: they not only give information but also put forth strategies to increase users' interaction with the platform.

In today's world, OTT platforms and streaming services have taken up a big chunk of the retail and entertainment industry. Organizations like Netflix and Amazon analyse user activity patterns and suggest products that better suit users' needs and choices.

For the purpose of this project we will be creating one such recommendation engine from the ground up, where every user, based on their area of interest and ratings, is recommended a list of movies best suited to them.

**Dataset Information:**

1. ID – Contains the separate keys for customers and movies.
2. Rating – Contains the user ratings for all the movies.
3. Genre – Highlights the category of the movie.
4. Movie Name – Name of the movie corresponding to the movie ID.

**Objectives:**

1. Find the most popular and most liked genres.
2. Create a model that finds the best-suited movie for one user in every genre.
3. Find which genres have received the best and worst ratings, based on user ratings.

In [None]:
# Loading the Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the .csv file

df = pd.read_csv('/content/netflix_titles-1.csv')

In [None]:
df.head()

   ID  Movie Name             Rating  Genre
0  s1  Dick Johnson Is Dead   PG-13   Documentaries
1  s2  Blood & Water          TV-MA   International TV Shows, TV Dramas, TV Mysteries
2  s3  Ganglands              TV-MA   Crime TV Shows, International TV Shows, TV Act...
3  s4  Jailbirds New Orleans  TV-MA   Docuseries, Reality TV
4  s5  Kota Factory           TV-MA   International TV Shows, Romantic TV Shows, TV ...


In [None]:
df.describe()

        ID    Movie Name  Rating  Genre
count   8807  8807        8803    8807
unique  8807  8804        17      514
top     s1    15-Aug      TV-MA   Dramas, International Movies
freq    1     2           3207    362


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8807 entries, 0 to 8806
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   ID          8807 non-null   object
 1   Movie Name  8807 non-null   object
 2   Rating      8803 non-null   object
 3   Genre       8807 non-null   object
dtypes: object(4)
memory usage: 275.3+ KB


In [None]:
df.isnull().sum()

ID            0
Movie Name    0
Rating        4
Genre         0


In [None]:
df.isnull().sum()['Rating']

4
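
Before treating `Rating` as a number, it may help to look at what the column actually holds. The sketch below is one possible check, not part of the original notebook: it counts each label and measures what share of the non-missing values would survive `pd.to_numeric`. The `sample` frame is a hypothetical stand-in that reuses three rows from the `df.head()` output and adds one made-up row with a missing rating.

```python
import pandas as pd

# Hypothetical stand-in for the loaded DataFrame: three rows taken from the
# df.head() output above, plus one invented row with a missing rating.
sample = pd.DataFrame({
    "Movie Name": ["Dick Johnson Is Dead", "Blood & Water", "Ganglands", "Hypothetical Title"],
    "Rating": ["PG-13", "TV-MA", "TV-MA", None],
})

# Count how often each label occurs, including missing values.
label_counts = sample["Rating"].value_counts(dropna=False)

# Share of non-missing entries that can be parsed as numbers.
non_missing = sample["Rating"].dropna()
numeric_share = pd.to_numeric(non_missing, errors="coerce").notna().mean()

print(label_counts)
print(f"Share of ratings that parse as numbers: {numeric_share:.0%}")
```

A share of zero means the column holds labels rather than scores, so an average rating per genre cannot be computed from it directly.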

In [None]:
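# 1. Find out the list of most popular and liked genres.
# Calculate the average rating of each genre.
# Note: 'Rating' holds maturity labels such as 'PG-13' and 'TV-MA', not numeric
# scores, so errors='coerce' turns every value into NaN and every mean is NaN.
df['Rating'] = pd.to_numeric(df['Rating'], errors='coerce')
genre_ratings = df.groupby('Genre')['Rating'].mean().reset_index()
# Sort genres from highest to lowest average rating (NaN values sort last)
popular_genres = genre_ratings.sort_values(by='Rating', ascending=False)
print('Most popular and liked genre')
print(popular_genres)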

Most popular and liked genre
                                                 Genre  Rating
0                                   Action & Adventure     NaN
1                   Action & Adventure, Anime Features     NaN
2    Action & Adventure, Anime Features, Children &...     NaN
3    Action & Adventure, Anime Features, Classic Mo...     NaN
4    Action & Adventure, Anime Features, Horror Movies     NaN
..                                                 ...     ...
509             TV Horror, TV Mysteries, Teen TV Shows     NaN
510                           TV Horror, Teen TV Shows     NaN
511                  TV Sci-Fi & Fantasy, TV Thrillers     NaN
512                                           TV Shows     NaN
513                                          Thrillers     NaN

[514 rows x 2 columns]
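
Since every average above is NaN, one alternative reading of "most popular genre" is the genre that appears on the most titles. The sketch below illustrates that idea under two assumptions that are not in the original notebook: popularity is measured by title count, and each `Genre` string is a comma-separated list that can be split into individual genres. The `sample` frame is hypothetical; its last row is invented for illustration.

```python
import pandas as pd

# Hypothetical sample: genre strings follow the comma-separated pattern seen
# in the df.head() output; the last row is invented.
sample = pd.DataFrame({
    "Movie Name": ["Dick Johnson Is Dead", "Blood & Water",
                   "Jailbirds New Orleans", "Hypothetical Title"],
    "Genre": [
        "Documentaries",
        "International TV Shows, TV Dramas, TV Mysteries",
        "Docuseries, Reality TV",
        "International TV Shows, TV Dramas",
    ],
})

# Split each combined string into one row per individual genre.
genres = sample["Genre"].str.split(",").explode().str.strip()

# Count how many titles carry each individual genre.
genre_counts = genres.value_counts()

print(genre_counts)
```

Splitting first also reduces the 514 combined genre strings to a much shorter list of individual genres, which is easier to rank and to plot.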


In [None]:
# 2. Create Model that finds the best suited Movie for one user in every genre.
best_suited_movie = df.groupby(['Genre', 'Movie Name'])['Rating'].mean().reset_index()
best_suited_movie

      Genre               Movie Name            Rating
0     Action & Adventure  10,000 B.C.           NaN
1     Action & Adventure  16 Blocks             NaN
2     Action & Adventure  24 Hours to Live      NaN
3     Action & Adventure  3 Days to Kill        NaN
4     Action & Adventure  6 Bullets             NaN
...   ...                 ...                   ...
8799  Thrillers           The Vanished          NaN
8800  Thrillers           The Wrong Babysitter  NaN
8801  Thrillers           Two Graves            NaN
8802  Thrillers           We Belong Together    NaN


In [None]:
# Alternative approach: reload the CSV so 'Rating' holds its original string labels again
df = pd.read_csv('/content/netflix_titles-1.csv')


In [None]:
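# Pick one title per genre using idxmax() on 'Rating'.
# Note: 'Rating' contains maturity labels (strings), so the maximum is taken over
# label strings and does not indicate which title viewers rated highest.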
best_movies = df.loc[df.groupby("Genre")["Rating"].idxmax().dropna()]

# Display the results
print("Best-suited Movie for Each Genre:")
print(best_movies[["Genre", "Movie Name", "Rating"]])

Best-suited Movie for Each Genre:
                                                  Genre  \
270                                  Action & Adventure   
240                  Action & Adventure, Anime Features   
279   Action & Adventure, Anime Features, Children &...   
689   Action & Adventure, Anime Features, Classic Mo...   
1041  Action & Adventure, Anime Features, Horror Movies   
...                                                 ...   
3887             TV Horror, TV Mysteries, Teen TV Shows   
1277                           TV Horror, Teen TV Shows   
5263                  TV Sci-Fi & Fantasy, TV Thrillers   
297                                            TV Shows   
3429                                          Thrillers   

                                Movie Name Rating  
270                                Beckett  TV-MA  
240     The Witcher: Nightmare of the Wolf  TV-MA  
279   Monster Hunter: Legends of the Guild  TV-PG  
689                   Mobile Suit Gundam I  TV-14
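
The loaded CSV has no per-user scores, so a per-user recommendation cannot be built from it alone. The sketch below shows one way the second objective could look if such data were available. Everything in it is hypothetical: the `ratings` and `movies` tables, their column names, and the helper `recommend_per_genre`. It is a simple popularity baseline (highest average score among titles the user has not rated yet), not a collaborative-filtering model.

```python
import pandas as pd

# Hypothetical user ratings on a 1-5 scale; the notebook's CSV has no such table.
ratings = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u2", "u3", "u3"],
    "movie":   ["A",  "B",  "A",  "C",  "D",  "C",  "D"],
    "rating":  [5,    3,    4,    5,    2,    4,    3],
})

# Hypothetical movie-to-genre lookup with one genre per title.
movies = pd.DataFrame({
    "movie": ["A", "B", "C", "D"],
    "genre": ["Dramas", "Dramas", "Thrillers", "Thrillers"],
})


def recommend_per_genre(user_id, ratings, movies):
    """Return the highest-rated movie the user has not rated, per genre."""
    # Titles this user has already rated.
    seen = set(ratings.loc[ratings["user_id"] == user_id, "movie"])

    # Average score per movie across all users, joined with its genre.
    mean_rating = (
        ratings.groupby("movie", as_index=False)["rating"].mean()
        .merge(movies, on="movie")
    )

    # Keep only unseen titles, then take the top-scoring one in each genre.
    unseen = mean_rating[~mean_rating["movie"].isin(seen)]
    best_idx = unseen.groupby("genre")["rating"].idxmax()
    return unseen.loc[best_idx, ["genre", "movie", "rating"]].reset_index(drop=True)


recs = recommend_per_genre("u1", ratings, movies)
print(recs)
```

A genre in which the user has already rated every title produces no row, so the result can be shorter than the list of genres.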

In [None]:
# 3. Find which genres have received the best and worst ratings based on user rating.
# Drop genres whose average rating is NaN before looking for the extremes
genre_ratings = genre_ratings.dropna(subset=['Rating'])

# Guard against an empty table (this happens when every rating was coerced to NaN)
if genre_ratings.empty:
    print("genre_ratings is empty after dropping NaN values. Cannot calculate best and worst genres.")
else:
    best_genre = genre_ratings.loc[genre_ratings['Rating'].idxmax(), 'Genre']  # Genre with the best rating
    best_rating = genre_ratings['Rating'].max()  # Best average rating
    worst_genre = genre_ratings.loc[genre_ratings['Rating'].idxmin(), 'Genre']  # Genre with the worst rating
    worst_rating = genre_ratings['Rating'].min()  # Worst average rating

    # Display results
    print(f"The genre with the best rating is '{best_genre}' with a rating of {best_rating:.2f}.")
    print(f"The genre with the worst rating is '{worst_genre}' with a rating of {worst_rating:.2f}.")

genre_ratings is empty after dropping NaN values. Cannot calculate best and worst genres.


In [None]:
# Alternative approach, starting from a fresh load
# Load the .csv file
df = pd.read_csv('/content/netflix_titles-1.csv')

# Convert 'Rating' column to numeric, handling errors by setting them to NaN
df['Rating'] = pd.to_numeric(df['Rating'], errors='coerce')

# Calculate average ratings by genre, excluding NaN values
genre_rating = df.groupby("Genre")["Rating"].mean(numeric_only=True)

# Find the genres with the highest and lowest average rating
# (both come out as NaN when no rating could be converted to a number)
best_genre = genre_rating.idxmax()
best_rating = genre_rating.max()

worst_genre = genre_rating.idxmin()
worst_rating = genre_rating.min()

# Display results
print(f"The genre with the best average rating is '{best_genre}' with a rating of {best_rating:.2f}.")
print(f"The genre with the worst average rating is '{worst_genre}' with a rating of {worst_rating:.2f}.")

The genre with the best average rating is 'nan' with a rating of nan.
The genre with the worst average rating is 'nan' with a rating of nan.
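
The `nan` results above come from averaging a column in which no value is numeric. Assuming a numeric score column were available, the best and worst genres could be found along the following lines. The column name `user_rating`, the sample values and the helper `best_and_worst_genre` are hypothetical and serve only to illustrate dropping NaN means before taking the extremes.

```python
import pandas as pd

# Hypothetical numeric user ratings; one missing value is included on purpose.
sample = pd.DataFrame({
    "Genre": ["Dramas", "Dramas", "Thrillers", "Thrillers", "Docuseries"],
    "user_rating": [4.0, 5.0, 2.0, 3.0, None],
})


def best_and_worst_genre(frame, rating_col="user_rating"):
    """Return ((best_genre, mean), (worst_genre, mean)), or None if no ratings exist."""
    # Average per genre, discarding genres that have no usable rating.
    means = frame.groupby("Genre")[rating_col].mean().dropna()
    if means.empty:
        return None
    return (means.idxmax(), means.max()), (means.idxmin(), means.min())


result = best_and_worst_genre(sample)
if result is None:
    print("No numeric ratings available.")
else:
    (best_genre, best_rating), (worst_genre, worst_rating) = result
    print(f"Best: '{best_genre}' ({best_rating:.2f}); worst: '{worst_genre}' ({worst_rating:.2f})")
```

Returning `None` when nothing is left after `dropna()` avoids calling `idxmax()` on an empty or all-NaN series, which is what produced the `nan` output in the cell above.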


