**Advanced Analysis**

1. **Correlation Analysis**: Examines the relationship between movie duration and ratings by comparing the average duration in each rating category.

2. **Genre Analysis**: Identifies the most and least common genres on Netflix.

3. **Timing Analysis**: Calculates the average time for movies to be added after their release.

4. **Years Difference**: Highlights movies with the longest and shortest time differences between release and addition to Netflix.

5. **Negative Difference Rows**: Finds titles whose Netflix addition date is earlier than their listed release year.

In [1]:
import numpy as np
import pandas as pd 

#importing the dataset
data = pd.read_csv("C:/Users/HP/Desktop/ANUDIP/Python/Project Python Anudip/cleaned_netflix_database_Anudip.csv")

**Average Movie Duration by Rating**

In [42]:
mean_duration_by_rating = data.groupby('rating')['Duration_Movies'].mean().sort_values(ascending=False)
round(mean_duration_by_rating,2)

rating
NC-17       125.00
TV-14       110.29
PG-13       108.33
R           106.72
UR          106.33
PG           98.28
TV-MA        95.87
TV-PG        94.85
NR           94.53
G            90.27
TV-G         79.67
TV-Y7-FV     68.40
TV-Y7        66.29
TV-Y         48.11
Name: Duration_Movies, dtype: float64

- This analysis shows the average movie duration, in minutes, for each rating category.

- It reveals whether certain rating categories tend to have longer or shorter movies.

- NC-17 has the highest average duration (125.00 minutes), while TV-Y has the lowest (48.11 minutes).
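A group mean can rest on very few titles, so it may help to report the group size next to each average. The following is a minimal sketch on hypothetical toy data (the `toy` frame and its values are assumptions, not the Netflix dataset); only the column names `rating` and `Duration_Movies` come from the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `data` frame
toy = pd.DataFrame({
    "rating": ["PG", "PG", "R", "R", "R", "NC-17"],
    "Duration_Movies": [90.0, 100.0, 110.0, 120.0, 100.0, 125.0],
})

# Mean duration per rating, plus the number of titles behind each mean
summary = (
    toy.groupby("rating")["Duration_Movies"]
       .agg(mean_duration="mean", n_titles="count")
       .sort_values("mean_duration", ascending=False)
       .round(2)
)
print(summary)
```

In this toy example the top-ranked rating is backed by a single title, which is the kind of case where the average should be read with caution.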

**Genre Analysis**

In [3]:
from collections import Counter

# Join all genres into a single string and split them into a list
all_genres = ', '.join(data['listed_in'].dropna()).split(', ')

# Count the occurrences of each genre
genre_counts = Counter(all_genres)

# Create a DataFrame from the genre counts
df_genre = pd.DataFrame(list(genre_counts.most_common()), columns=['genre', 'Count'])

# Sort the DataFrame by Count in descending order
df_genre = df_genre.sort_values(by='Count', ascending=False)

# Print the top 10 most common genres
print("The Top 10 genres are:\n")
print(df_genre.head(10))

# Print the last 10 common genres
print("\nThe least common genres are:\n")
print(df_genre.tail(10))

The Top 10 genres are:

                      genre  Count
0      International Movies   2752
1                    Dramas   2427
2                  Comedies   1674
3    International TV Shows   1351
4             Documentaries    869
5        Action & Adventure    859
6                 TV Dramas    763
7        Independent Movies    756
8  Children & Family Movies    641
9           Romantic Movies    616

The least common genres are:

                           genre  Count
32                     TV Horror     75
33                Anime Features     71
34                   Cult Movies     71
35                 Teen TV Shows     69
36          Faith & Spirituality     65
37                  TV Thrillers     57
38                        Movies     57
39  Stand-Up Comedy & Talk Shows     56
40             Classic & Cult TV     28
41                      TV Shows     16


- International Movies is the most common genre, followed by Dramas and Comedies.
- TV Shows is the least common genre on Netflix, followed by Classic & Cult TV and Stand-Up Comedy & Talk Shows.
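The same genre count can also be expressed with pandas alone, without `Counter`. This is a sketch on hypothetical toy data (the `toy` frame is an assumption); it splits each comma-separated `listed_in` value, gives every genre its own row, and counts the rows.

```python
import pandas as pd

# Hypothetical toy data; `listed_in` holds comma-separated genres as in the notebook
toy = pd.DataFrame({
    "listed_in": ["Dramas, International Movies", "Comedies", None, "Dramas, Comedies"],
})

# Split each entry into a list, expand to one genre per row, then count
genre_counts = (
    toy["listed_in"]
    .dropna()
    .str.split(", ")
    .explode()
    .value_counts()
)
print(genre_counts)
```

`value_counts` already returns the counts in descending order, so no separate sort is needed.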

**Average Time Period for Content Addition**

In [43]:
# Convert 'date_added' to datetime
date_added = pd.to_datetime(data['date_added'])

# Create a release date from 'release_year' by converting it to a datetime format
release_year = pd.to_datetime(data['release_year'].astype(str) + '-01-01')

# Calculate the average years difference without adding a new column
average_years_difference = ((date_added - release_year).dt.days / 365).mean()

# Display the result
print("Average years after release before being added to Netflix:", round(average_years_difference, 1), 'Years')

Average years after release before being added to Netflix: 5.2 Years


- This analysis calculates the average time between a title's release and its addition to Netflix. Because only the release year is available, each release date is approximated as January 1 of that year.

- The result indicates how long content typically takes to appear on the platform after its release.

- The average difference is 5.2 years.
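To make the calculation concrete, here is a small worked sketch on hypothetical toy rows (the `toy` frame is an assumption). It uses 365.25 days per year and also prints the median, since a few very old titles can pull the mean upward.

```python
import pandas as pd

# Hypothetical toy rows; the last one mimics a very old title added recently
toy = pd.DataFrame({
    "release_year": [2015, 2019, 1925],
    "date_added": ["2020-01-01", "2019-07-02", "2018-12-30"],
})

date_added = pd.to_datetime(toy["date_added"])
# Only the year is known, so approximate each release date as January 1
release_date = pd.to_datetime(toy["release_year"].astype(str) + "-01-01")

# Difference in years, using 365.25 days per year to account for leap years
years_difference = (date_added - release_date).dt.days / 365.25

mean_years = years_difference.mean()
median_years = years_difference.median()
print(round(mean_years, 1), round(median_years, 1))
```

In this toy data the single old title raises the mean far above the median, so the two figures together give a fuller picture than the mean alone.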

**Maximum Years Difference**

In [16]:
# Convert 'date_added' to datetime
date_added = pd.to_datetime(data['date_added'])

# Create a release date from 'release_year' by converting it to a datetime format
release_date = pd.to_datetime(data['release_year'].astype(str) + '-01-01')

# Calculate the difference in years
years_difference = (date_added - release_date).dt.days/365.25

# Find the maximum years difference
max_years_difference = years_difference.max()

# Get the index of the maximum difference
max_index = years_difference.idxmax()

# Retrieve the corresponding title, date_added, and release_year
# (idxmax returns an index label, so look the row up with .loc)
max_entry = data.loc[max_index]
title = max_entry['title']
year_added = max_entry['date_added']
release_year = max_entry['release_year']

# Display the results
print(f"Title: {title}")
print(f"Date Added: {year_added}")
print(f"Release Year: {release_year}")
print(f"Maximum Years Difference: {round(max_years_difference, 1)} Years")


Title: Pioneers: First Women Filmmakers*
Date Added: 12/30/2018
Release Year: 1925
Maximum Years Difference: 94.0 Years


- This section identifies the title with the largest gap in years between its release and its addition to Netflix.

- It reports the title, date added, release year, and the gap itself, showing how old some catalog content is.

- The maximum difference is 94 years, for a 1925 release added on 12/30/2018.
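`idxmax` returns an index label rather than a position, so label-based lookup with `.loc` is the safer way to fetch the row. The sketch below uses a hypothetical toy frame (an assumption, not the dataset) with a non-default index to show the difference.

```python
import pandas as pd

# Hypothetical toy frame with non-default index labels
toy = pd.DataFrame(
    {"title": ["A", "B", "C"], "years_difference": [1.0, 94.0, 5.2]},
    index=[10, 20, 30],
)

# idxmax gives the label of the row holding the maximum value
max_label = toy["years_difference"].idxmax()

# .loc looks rows up by label, which matches what idxmax returns
max_entry = toy.loc[max_label]
print(max_label, max_entry["title"])

# .iloc looks rows up by position; position 20 does not exist here
try:
    toy.iloc[max_label]
    iloc_failed = False
except IndexError:
    iloc_failed = True
```

With a default `RangeIndex`, labels and positions coincide and both lookups return the same row; they diverge as soon as the frame has been filtered or re-indexed.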

**Minimum Years Difference**

In [None]:
# Convert 'date_added' to datetime
data['date_added'] = pd.to_datetime(data['date_added'])

# Create a release date from 'release_year' by converting it to a datetime format
data['release_date'] = pd.to_datetime(data['release_year'].astype(str) + '-01-01')

# Calculate the difference in years using days/365.25 to account for leap years
years_difference = (data['date_added'] - data['release_date']).dt.days / 365.25

# Filter for rows where the difference is greater than 0
positive_difference = years_difference[years_difference > 0]

# Find the minimum years difference from the filtered results
min_years_difference = positive_difference.min()

# Get the index of the minimum difference
min_index = positive_difference.idxmin()

# Retrieve the corresponding title, date_added, and release_year
# (idxmin returns an index label, so look the row up with .loc)
min_entry = data.loc[min_index]
title = min_entry['title']
year_added = min_entry['date_added']
release_year = min_entry['release_year']

# Display the results
print(f"Title: {title}")
print(f"Date Added: {year_added}")
print(f"Release Year: {release_year}")
print(f"Minimum Years Difference: {round(min_years_difference, 1)} Years")

Title: Sex, Explained
Date Added: 2020-01-02 00:00:00
Release Year: 2020
Minimum Years Difference: 0.0 Years


- Like the maximum years difference, this part finds the title with the smallest positive difference, showing how quickly some content is added to Netflix after its release.

- The minimum difference rounds to 0 years: the title was added on 2020-01-02, in the same year as its release.
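Since only the release year is known, an alternative is to compare calendar years directly instead of measuring days from January 1. This is a sketch on hypothetical toy rows (the `toy` frame and title names are assumptions); a result of 0 means the title was added in its release year.

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "title": ["Title A", "Title B"],
    "date_added": ["2020-01-02", "2020-12-14"],
    "release_year": [2020, 2021],
})

# Year in which each title was added
year_added = pd.to_datetime(toy["date_added"]).dt.year

# Whole-year difference: 0 means added in the release year,
# negative means added before the listed release year
whole_years = year_added - toy["release_year"]
print(whole_years.tolist())
```

This avoids fractional values that depend on the January 1 approximation, at the cost of losing within-year detail.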

**Negative Difference Rows**

In [37]:
# Convert 'date_added' to datetime
date_added = pd.to_datetime(data['date_added'])

# Create a release date from 'release_year' by converting it to a datetime format
release_date = pd.to_datetime(data['release_year'].astype(str) + '-01-01')

# Calculate the difference in years using days/365.25 to account for leap years
years_difference = (date_added - release_date).dt.days / 365.25

# Find rows where the difference is less than 0
negative_difference_rows = data[years_difference < 0]

# Display the rows with negative differences
negative_difference_rows

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Duration_Movies,release_date
1551,s1552,TV Show,Hilda,Unknown,"Bella Ramsey, Ameerah Falzon-Ojo, Oliver Nelso...","United Kingdom, Canada, United States",12/14/2020,2021,TV-Y7,2 Seasons,Kids' TV,"Fearless, free-spirited Hilda finds new friend...",,2021-01-01
1696,s1697,TV Show,Polly Pocket,Unknown,"Emily Tennant, Shannon Chan-Kent, Kazumi Evans...","Canada, United States, Ireland",11/15/2020,2021,TV-Y,2 Seasons,Kids' TV,After uncovering a magical locket that allows ...,,2021-01-01
2920,s2921,TV Show,Love Is Blind,Unknown,"Nick Lachey, Vanessa Lachey",United States,2/13/2020,2021,TV-MA,1 Season,"Reality TV, Romantic TV Shows",Nick and Vanessa Lachey host this social exper...,,2021-01-01
3168,s3169,TV Show,Fuller House,Unknown,"Candace Cameron Bure, Jodie Sweetin, Andrea Ba...",United States,12/6/2019,2020,TV-PG,5 Seasons,TV Comedies,The Tanner family’s adventures continue as DJ ...,,2020-01-01
3287,s3288,TV Show,Maradona in Mexico,Unknown,Diego Armando Maradona,"Argentina, United States, Mexico",11/13/2019,2020,TV-MA,1 Season,"Docuseries, Spanish-Language TV Shows","In this docuseries, soccer great Diego Maradon...",,2020-01-01
3369,s3370,TV Show,BoJack Horseman,Unknown,"Will Arnett, Aaron Paul, Amy Sedaris, Alison B...",United States,10/25/2019,2020,TV-MA,6 Seasons,TV Comedies,Meet the most beloved sitcom horse of the '90s...,,2020-01-01
3433,s3434,TV Show,The Hook Up Plan,Unknown,"Marc Ruchmann, Zita Hanrot, Sabrina Ouazani, J...",France,10/11/2019,2020,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...","When Parisian Elsa gets hung up on her ex, her...",,2020-01-01
4844,s4845,TV Show,Unbreakable Kimmy Schmidt,Unknown,"Ellie Kemper, Jane Krakowski, Tituss Burgess, ...",United States,5/30/2018,2019,TV-14,4 Seasons,TV Comedies,When a woman is rescued from a doomsday cult a...,,2019-01-01
4845,s4846,TV Show,Arrested Development,Unknown,"Jason Bateman, Portia de Rossi, Will Arnett, M...",United States,5/29/2018,2019,TV-MA,5 Seasons,TV Comedies,It's the Emmy-winning story of a wealthy famil...,,2019-01-01
5394,s5395,Movie,Hans Teeuwen: Real Rancour,Doesjka van Hoogdalem,Hans Teeuwen,Netherlands,7/1/2017,2018,TV-MA,86 min,Stand-Up Comedy,Comedian Hans Teeuwen rebels against political...,86.0,2018-01-01


- This analysis identifies titles that were added to Netflix before their listed release year, that is, rows where the date added is earlier than January 1 of the release year.

- It highlights instances where content appears on Netflix before its recorded release year. Nine of the ten rows shown are TV shows, so these are not necessarily theatrical releases.
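A quick breakdown by `type` shows whether the negative differences come mainly from movies or from TV shows. The following is a minimal sketch on hypothetical toy rows (the `toy` frame is an assumption); the column names `type`, `date_added`, and `release_year` come from the notebook.

```python
import pandas as pd

# Hypothetical toy rows; the last one has a normal, positive difference
toy = pd.DataFrame({
    "type": ["TV Show", "TV Show", "Movie", "Movie"],
    "date_added": ["2020-12-14", "2019-10-25", "2017-07-01", "2019-03-01"],
    "release_year": [2021, 2020, 2018, 2015],
})

date_added = pd.to_datetime(toy["date_added"])
# Approximate each release date as January 1 of the release year
release_date = pd.to_datetime(toy["release_year"].astype(str) + "-01-01")

# Keep rows added before the approximated release date, then count by type
negative_rows = toy[date_added < release_date]
type_counts = negative_rows["type"].value_counts()
print(type_counts)
```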