In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import pandas as pd

# Load the data from the CSV file
netflix_data = pd.read_csv('/kaggle/input/netflix-title-data-2008-2018/netflix_titles 2.csv')
netflix_data.head()

To gauge Netflix's success in the movie and TV show industry based on the given dataset, we might consider the following factors:

- **Growth Over Time:** This could be determined by the number of titles added to Netflix each year.
- **Diversity of Content:** By examining the range of countries producing content and the range of genres, we can determine how diversified Netflix's content is.
- **Ratings Distribution:** While the dataset does not have viewer counts or ratings from audiences, it does provide content ratings (e.g., TV-MA, TV-PG). A distribution of these ratings can give insight into the type of content Netflix is hosting and its potential audience reach.

Let's start by visualizing the growth of Netflix's content over time:

<h1> Visualization 1: Number of Titles Added to Netflix Each Year </h1>

We'll first check the distribution of movies and TV shows added to Netflix over the years.

In [None]:
import matplotlib.pyplot as plt

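# Convert 'date_added' to datetime format and extract the year
# (strip stray whitespace first; unparseable dates become NaT instead of raising)
netflix_data['year_added'] = pd.to_datetime(netflix_data['date_added'].str.strip(), errors='coerce').dt.year

# Drop NaN values in 'year_added'; .copy() avoids a SettingWithCopyWarning on the next assignment
netflix_data_cleaned = netflix_data.dropna(subset=['year_added']).copy()
netflix_data_cleaned['year_added'] = netflix_data_cleaned['year_added'].astype(int)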

# Group by 'year_added' and 'type' to get counts of movies and TV shows added each year
titles_by_year = netflix_data_cleaned.groupby(['year_added', 'type']).size().unstack().fillna(0)

# Plotting
plt.figure(figsize=(14, 7))
titles_by_year.plot(kind='bar', stacked=True, figsize=(14,7), ax=plt.gca(), color=['red','black'])
plt.title('Number of Titles Added to Netflix Each Year')
plt.xlabel('Year')
plt.ylabel('Number of Titles')
plt.legend(title='Type')
plt.tight_layout()
plt.grid(False)
plt.show()


The graph illustrates the number of titles (both movies and TV shows) added to Netflix each year. 

**Key observations:**

**Rapid Growth:** There's a noticeable growth in the number of titles added to Netflix over the years. This indicates an aggressive expansion and investment in content.

**Movies vs. TV Shows:** While Netflix has added movies every year, there's a significant uptick in the addition of TV shows, especially in recent years. This could be due to Netflix's focus on producing and acquiring episodic content.

### 1. Content Diversity:

#### a) What genres are most popular on Netflix?

To determine this, we'll look at the `listed_in` column and count the occurrences of each genre.

#### Most Popular Genres on Netflix:
1. International Movies: 1927
2. Dramas: 1623
3. Comedies: 1113
4. International TV Shows: 1001
5. Documentaries: 668
6. TV Dramas: 599
7. Action & Adventure: 597
8. Independent Movies: 552
9. TV Comedies: 436
10. Thrillers: 392
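
The counting step described above can be sketched as follows. This is a minimal illustration, not necessarily the exact code that produced the figures: `sample` is a hypothetical stand-in for `netflix_data`, and `count_genres` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical stand-in for netflix_data; only 'listed_in' matters here.
sample = pd.DataFrame({
    'listed_in': [
        'Dramas, International Movies',
        'Comedies, International Movies',
        'Dramas',
        None,  # missing genre information
    ]
})

def count_genres(df: pd.DataFrame) -> pd.Series:
    """Count how many titles carry each genre label in 'listed_in'."""
    return (
        df['listed_in']
        .dropna()
        .str.split(',')   # one list of genres per title
        .explode()        # one row per (title, genre) pair
        .str.strip()      # remove the space left after each comma
        .value_counts()
    )

genre_counts_sample = count_genres(sample)
print(genre_counts_sample.head(10))
```

Because a title can carry several genres, the counts sum to more than the number of titles.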

### 2. Content Production:

#### a) What is the growth rate of new content added to Netflix annually?

For this, we will look at the `date_added` column, extract the year, and count the number of titles added each year.

#### Growth Rate of New Content Added Annually:
- 2009: No data from the previous year for comparison.
- 2010: -50.0% (Decrease)
- 2011: 1200.0% (Increase)
- 2012: -46.15% (Decrease)
- 2013: 71.43% (Increase)
- 2014: 108.33% (Increase)
- 2015: 260.0% (Increase)
- 2016: 406.67% (Increase)
- 2017: 185.09% (Increase)
- 2018: 37.08% (Increase)
- 2019: 31.82% (Increase)
- 2020: -92.17% (Significant Decrease)

Note: The significant decrease in 2020 might be due to incomplete data for that year or other external factors (e.g., the COVID-19 pandemic affecting production).
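
One way to compute year-over-year growth is pandas' `pct_change`. The sketch below assumes `date_added` holds strings such as "January 1, 2016"; `sample` and `yearly_growth` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for netflix_data; only 'date_added' matters here.
sample = pd.DataFrame({
    'date_added': [
        'January 1, 2016', ' March 5, 2016',
        'July 4, 2017', 'May 1, 2017', 'June 9, 2017',
        None,
    ]
})

def yearly_growth(df: pd.DataFrame) -> pd.DataFrame:
    """Titles added per year and the percentage change from the previous year."""
    years = pd.to_datetime(df['date_added'].str.strip(), errors='coerce').dt.year
    titles_per_year = years.dropna().astype(int).value_counts().sort_index()
    growth_pct = titles_per_year.pct_change().mul(100).round(2)
    return pd.DataFrame({'titles_added': titles_per_year, 'growth_pct': growth_pct})

growth = yearly_growth(sample)
print(growth)
```

The first year has no predecessor, so its growth value is `NaN`. Percentages computed from very small yearly counts (as in the earliest years) swing widely and are best read alongside the raw counts.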

#### b) How much of the content is Netflix original vs. licensed?

The dataset doesn't directly indicate whether a title is a Netflix original or licensed. However, one potential (though not entirely accurate) approach is to look for titles where the `date_added` is close to the `release_year`. If the difference is small, it could be an indicator that the title is a Netflix original. Let's calculate the proportion of potential originals using this approach, with a difference of 1 year as the threshold.

#### Proportion of Potential Netflix Originals:
Approximately 55.04% of the content might be Netflix originals, based on the criterion of a 1-year difference between the date added and the release year.
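
The heuristic can be sketched as below. The exact rule behind the figure above is not shown in this notebook, so this version assumes an absolute gap of at most one year between the year added and `release_year`; `sample` and `potential_originals_share` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for netflix_data.
sample = pd.DataFrame({
    'date_added': ['March 1, 2019', 'June 10, 2018', 'May 5, 2017', None],
    'release_year': [2019, 2010, 2016, 2015],
})

def potential_originals_share(df: pd.DataFrame, max_gap_years: int = 1) -> float:
    """Percentage of titles added within max_gap_years of their release year."""
    year_added = pd.to_datetime(df['date_added'].str.strip(), errors='coerce').dt.year
    gap = (year_added - df['release_year']).dropna()  # ignore titles with no add date
    return (gap.abs() <= max_gap_years).mean() * 100

share = potential_originals_share(sample)
print(f"{share:.2f}% of titles were added within 1 year of release")
```

Recently released licensed titles also pass this filter, so the result is an upper-bound style estimate rather than a true count of originals.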

### 3. Global Reach:

#### a) In how many different countries is Netflix producing content?

To determine this, we'll look at the `country` column and count the unique countries mentioned.

#### Number of Different Countries Producing Content for Netflix:
Netflix has content produced in 113 different countries.
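
Since a title can list several countries in one comma-separated string, the unique-country count needs a split step. A minimal sketch, with `sample` and `unique_countries` as hypothetical names:

```python
import pandas as pd

# Hypothetical stand-in for netflix_data; only 'country' matters here.
sample = pd.DataFrame({
    'country': ['United States', 'United States, India', 'India,', None]
})

def unique_countries(df: pd.DataFrame) -> list:
    """Return the sorted list of distinct countries named in 'country'."""
    countries = df['country'].dropna().str.split(',').explode().str.strip()
    countries = countries[countries != '']  # drop empties left by trailing commas
    return sorted(countries.unique())

countries_found = unique_countries(sample)
print(len(countries_found), countries_found)
```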


### 4. Content Lifespan:

#### How long does content stay on Netflix? 

The dataset doesn't provide the removal date for titles, so we can't directly determine the lifespan of content on Netflix. 

### 5. Viewer Preferences:

The dataset doesn't include viewer ratings, so we can't answer questions related to viewer preferences directly.


### 6. Talent Collaborations:

#### a) With which directors does Netflix collaborate most frequently?

To determine this, we'll look at the `director` column and count the occurrences of each director.

#### Most Frequent Directors on Netflix:
1. Jan Suter: 21 titles
2. Raúl Campos: 19 titles
3. Jay Karas: 14 titles
4. Marcus Raboy: 14 titles
5. Jay Chapman: 12 titles
6. Steven Spielberg: 9 titles
7. Martin Scorsese: 9 titles
8. Lance Bangs: 8 titles
9. Johnnie To: 8 titles
10. Umesh Mehra: 8 titles
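
A sketch of the director count, assuming co-directed titles are credited to each listed director (splitting the `director` string on commas). `sample` and `top_directors` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for netflix_data; only 'director' matters here.
sample = pd.DataFrame({
    'director': ['Raúl Campos, Jan Suter', 'Jan Suter', 'Jay Karas', None]
})

def top_directors(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Count titles per individual director and return the n most frequent."""
    return (
        df['director']
        .dropna()            # many titles have no director listed
        .str.split(',')
        .explode()
        .str.strip()
        .value_counts()
        .head(n)
    )

directors_sample = top_directors(sample)
print(directors_sample)
```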

### 7. Seasonal Trends:

#### a) Are there specific months or seasons when Netflix releases more content?

To determine this, we'll extract the month from the `date_added` column and count the number of titles added each month.

#### Monthly Content Additions on Netflix:
1. January: 610 titles
2. February: 378 titles
3. March: 551 titles
4. April: 447 titles
5. May: 428 titles
6. June: 393 titles
7. July: 474 titles
8. August: 509 titles
9. September: 479 titles
10. October: 646 titles
11. November: 612 titles
12. December: 696 titles

From the data, we observe that Netflix adds the most content during the months of October, November, and December, with December being the month with the highest number of additions.
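
The monthly tally can be sketched like this. `sample` and `monthly_additions` are hypothetical names, and the date format is assumed to match strings such as "December 1, 2019".

```python
import calendar
import pandas as pd

# Hypothetical stand-in for netflix_data; only 'date_added' matters here.
sample = pd.DataFrame({
    'date_added': ['December 1, 2019', 'December 15, 2018', 'January 1, 2020', None]
})

def monthly_additions(df: pd.DataFrame) -> pd.Series:
    """Count titles added in each calendar month, January through December."""
    dates = pd.to_datetime(df['date_added'].str.strip(), errors='coerce').dropna()
    counts = dates.dt.month.value_counts().reindex(range(1, 13), fill_value=0)
    counts.index = [calendar.month_name[m] for m in counts.index]
    return counts

monthly = monthly_additions(sample)
print(monthly)
```

Reindexing over months 1 to 12 keeps the calendar order and shows months with no additions as zero.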

### 8. Content Format:

#### a) What is the ratio of TV shows to movies?

To determine this, we'll look at the `type` column and count the occurrences of each content type.

#### Content Types on Netflix:
- Movies: 4,265 titles
- TV Shows: 1,969 titles

#### Ratio of TV Shows to Movies:
Approximately 0.462, meaning for every movie on Netflix, there are about 0.462 TV shows, or approximately 1 TV show for every 2 movies.
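
The ratio follows directly from the two counts above. The sketch below checks that arithmetic and shows how the counts could be derived from the `type` column, assuming its labels are `'Movie'` and `'TV Show'`; `sample` and `tv_to_movie_ratio` are hypothetical names.

```python
import pandas as pd

# Arithmetic check using the counts reported above.
reported_ratio = 1969 / 4265
print(round(reported_ratio, 3))

# Hypothetical stand-in for netflix_data; only 'type' matters here.
sample = pd.DataFrame({'type': ['Movie', 'Movie', 'TV Show', 'Movie', 'Movie', 'TV Show']})

def tv_to_movie_ratio(df: pd.DataFrame) -> float:
    """Number of TV shows per movie (assumes labels 'TV Show' and 'Movie')."""
    counts = df['type'].value_counts()
    return counts.get('TV Show', 0) / counts['Movie']

ratio_sample = tv_to_movie_ratio(sample)
print(ratio_sample)
```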

### 9. Content Age:

#### a) What is the distribution of content release years?

To understand the age distribution of the content on Netflix, we'll count the number of titles released in each year.

#### Distribution of Content Release Years (Last 10 Years):
- 2011: 136 titles
- 2012: 183 titles
- 2013: 237 titles
- 2014: 288 titles
- 2015: 517 titles
- 2016: 830 titles
- 2017: 959 titles
- 2018: 1063 titles
- 2019: 843 titles
- 2020: 25 titles

From this distribution, we can observe:
- There's been a steady increase in the number of titles released each year from 2011 to 2018.
- In 2019, there's a slight decrease compared to 2018.
- The count for 2020 is significantly lower, possibly due to incomplete data or external factors, like the COVID-19 pandemic impacting production.
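
A minimal sketch of the release-year tally, restricted to the most recent years present in the data. `sample` and `titles_per_release_year` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for netflix_data; only 'release_year' matters here.
sample = pd.DataFrame({'release_year': [2018, 2018, 2019, 2005, 2020]})

def titles_per_release_year(df: pd.DataFrame, last_n_years: int = 10) -> pd.Series:
    """Count titles per release year for the last_n_years ending at the latest year."""
    counts = df['release_year'].value_counts().sort_index()
    latest = counts.index.max()
    return counts.loc[latest - last_n_years + 1:]

recent = titles_per_release_year(sample, last_n_years=3)
print(recent)
```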

#### b) Is Netflix focusing more on newer content, or is there a mix of classics and new releases?

By observing the distribution above, it's evident that Netflix has a substantial amount of newer content from recent years. However, to further analyze the mix of classics vs. new releases, we can look at the proportion of content that was released in the last decade compared to content from previous decades.

#### Proportion of Content Based on Release Year:
- Content from the last decade (2010-2020): Approximately 83.89%
- Content from previous decades (before 2010): Approximately 16.11%

This indicates that Netflix has a significant focus on newer content, with the majority of its titles being released in the last decade. However, there is still a mix with around 16% of titles being classics or content from before 2010.
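
The split between recent and older titles reduces to the mean of a boolean mask. In this sketch the 2010 cutoff comes from the text above, while `sample` and `recent_share` are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-in for netflix_data; only 'release_year' matters here.
sample = pd.DataFrame({'release_year': [2005, 2012, 2018, 2019]})

def recent_share(df: pd.DataFrame, start_year: int = 2010) -> float:
    """Percentage of titles released in start_year or later."""
    return (df['release_year'] >= start_year).mean() * 100

share_recent = recent_share(sample)
print(f"Released 2010 or later: {share_recent:.2f}%")
print(f"Released before 2010:   {100 - share_recent:.2f}%")
```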

### 10. Viewer Engagement Metrics:
As mentioned earlier, the dataset doesn't provide viewer engagement metrics like watch time or re-watch frequency, so we cannot provide insights related to this question.


<h1>Visualization 2: Top 10 Countries Producing Content for Netflix</h1>
    
To understand Netflix's global reach and content diversity, it's helpful to see which countries are producing the most content available on the platform. Let's visualize the top 10 countries based on the number of titles they've produced.

In [None]:
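# Split multi-country entries so co-productions count toward each country, then count the titles
country_counts = netflix_data['country'].dropna().str.split(',').explode().str.strip()
country_counts = country_counts[country_counts != ''].value_counts().head(10)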

# Plotting
plt.figure(figsize=(14, 7))
country_counts.plot(kind='barh', color='red')
plt.title('Top 10 Countries Producing Content for Netflix')
plt.xlabel('Number of Titles')
plt.ylabel('Country')
plt.gca().invert_yaxis()  # To display the country with the highest count at the top
plt.tight_layout()
plt.grid(axis='x')
plt.show()

The bar chart illustrates the top 10 countries producing content for Netflix:

1. **United States Dominance:** The United States is the clear leader in terms of the number of titles on Netflix, which is expected given that Netflix originated there.
2. **Diverse Content:** Countries like India, the United Kingdom, Canada, and others also have a significant presence, indicating Netflix's global reach and its commitment to providing diverse content to its audience.

<h1> Visualization 3: Distribution of Content Ratings on Netflix </h1>

The content rating provides insight into the intended audience for a title. By examining the distribution of these ratings, we can infer the type of audience Netflix is targeting. Let's visualize the distribution of content ratings for both movies and TV shows on Netflix.

In [None]:
# Grouping by 'rating' and 'type' to get counts of movies and TV shows for each rating
rating_distribution = netflix_data.groupby(['rating', 'type']).size().unstack().fillna(0)

# Plotting
plt.figure(figsize=(16, 8))
rating_distribution.plot(kind='bar', stacked=True, figsize=(16,8), ax=plt.gca(), color=['red', 'black'])
plt.title('Distribution of Content Ratings on Netflix')
plt.xlabel('Rating')
plt.ylabel('Number of Titles')
plt.legend(title='Type')
plt.tight_layout()
plt.grid(axis='y')
plt.show()


The bar chart showcases the distribution of content ratings for titles on Netflix:

**Mature Content:** The ratings "TV-MA" and "R" indicate content specifically designed to be viewed by adults. These ratings have a significant presence on Netflix, suggesting a large portion of adult-targeted content.

**Family and Kids Content:** Ratings such as "TV-Y", "TV-G", "TV-Y7", and "TV-PG" cater to younger audiences and families. Their presence indicates Netflix's commitment to providing content for all age groups.

**Movies vs. TV Shows:** For most ratings, there's a good mix of both movies and TV shows, suggesting a balanced content strategy.


The visualization above provides insight into the diversity of content available on Netflix in terms of the intended audience.

To analyze how content maturity has changed over time, we can examine the `rating` column together with the `release_year` column. Grouping the content by release year and rating, then counting the titles in each combination, shows how the mix of maturity ratings has evolved.
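
A year-by-rating table of this kind can be built with `pd.crosstab`. The sketch below is illustrative only: `sample` is a hypothetical stand-in for `netflix_data`, and `ratings_by_year` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical stand-in for netflix_data.
sample = pd.DataFrame({
    'release_year': [2019, 2019, 2020, 2001],
    'rating': ['TV-MA', 'TV-14', 'TV-MA', 'R'],
})

def ratings_by_year(df: pd.DataFrame, last_n_years: int = 10) -> pd.DataFrame:
    """Titles per (release year, rating) for the last_n_years ending at the latest year."""
    table = pd.crosstab(df['release_year'], df['rating'])
    latest = table.index.max()
    return table.loc[latest - last_n_years + 1:]

rating_table = ratings_by_year(sample)
print(rating_table)
```

Calling `rating_table.plot(kind='line')` on the result would give one line per rating, which makes rising or flat categories easier to compare.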

The table above provides a distribution of content ratings for the last 10 years. Here's a brief overview of the ratings:

- **G**: Suitable for all ages.
- **PG**: Parental guidance suggested.
- **PG-13**: Parents strongly cautioned. May be inappropriate for children under 13.
- **R**: Restricted. Contains adult material.
- **NC-17**: No one 17 and under admitted.
- **TV-Y**: Designed to be appropriate for all children, including the very young.
- **TV-Y7**: Suitable for ages 7 and up.
- **TV-Y7-FV**: Suitable for ages 7 and up, with fantasy violence.
- **TV-G**: Suitable for all ages.
- **TV-PG**: Parental guidance suggested.
- **TV-14**: Parents strongly cautioned. May be inappropriate for children under 14.
- **TV-MA**: Specifically designed to be viewed by adults and not suitable for children under 17.
- **NR/UR**: Not rated or Unrated.

From the data, some trends that can be observed:


1. **TV-MA content (Mature Audiences)**: There's a clear upward trend, indicating that Netflix has been producing and acquiring more content targeted towards mature audiences over the years.
2. **TV-14 content**: This category also shows a steady increase, plateauing slightly in the last couple of years.
3. **Children's content (TV-Y and TV-Y7)**: We can observe a modest rise, especially in the TV-Y category.
4. **Traditional movie ratings (G, PG, R)**: These categories have remained relatively constant or have seen minor fluctuations over the years.

The graph provides a comprehensive view of how the maturity of content on Netflix has evolved, with a noticeable increase in content meant for mature audiences. 


<h1> Visualization 4: Content Diversity: How does the genre distribution change over the years? </h1>

In [None]:
# Split the genres and count their occurrences
genre_counts = netflix_data['listed_in'].str.split(', ').explode().value_counts()

# Display the top 10 most popular genres on Netflix
top_genres = genre_counts.head(10)
print(top_genres)

# Build one row per (title, genre) pair and keep only exact matches for the top 5 genres
# (a substring/regex match would also pick up e.g. "TV Dramas" when looking for "Dramas")
top_5_genres = top_genres.index[:5]
genres_long = netflix_data.assign(genre=netflix_data['listed_in'].str.split(', ')).explode('genre')
genres_long = genres_long[genres_long['genre'].isin(top_5_genres)]

# Count titles per release year for each of the top 5 genres
genre_over_years = genres_long.groupby(['release_year', 'genre']).size().unstack(fill_value=0)
genre_over_years = genre_over_years[top_5_genres]

# Plotting the trend of top 5 genres over the years
import matplotlib.pyplot as plt

plt.figure(figsize=(14,7))
for genre in top_5_genres:
    plt.plot(genre_over_years.index, genre_over_years[genre], label=genre)

plt.title('Trend of Top 5 Genres Over the Years')
plt.xlabel('Release Year')
plt.ylabel('Number of Titles')
plt.legend()
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.tight_layout()
plt.show()


The plot shows the trend of the top 5 genres on Netflix over the years:

- International Movies have seen substantial growth, especially in recent years. This suggests that Netflix is investing significantly in global content.
- Dramas have remained consistently popular throughout, with a noticeable spike in the mid-2010s.
- Comedies have seen some fluctuations but have also remained popular, especially in recent years.
- International TV Shows have seen a dramatic rise, particularly in the last few years, reflecting Netflix's focus on global TV content.
- Documentaries saw a steady rise from the early 2000s and have maintained their presence on Netflix.

Least Represented Genres on Netflix:

1. TV Shows: 10
2. Classic & Cult TV: 24
3. Stand-Up Comedy & Talk Shows: 42
4. TV Thrillers: 44
5. Teen TV Shows: 44
6. Anime Features: 45
7. Faith & Spirituality: 47
8. TV Horror: 54
9. Cult Movies: 55
10. Movies: 56

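These underserved genres may present opportunities for Netflix to expand its content and cater to niche audiences. Note that "TV Shows" and "Movies" read as generic catch-all labels rather than genres, so the more meaningful gaps are likely in categories such as Classic & Cult TV, Anime Features, or Faith & Spirituality.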


<h1> Summary:</h1>

1. **Dataset Overview**:
   - The dataset contains information on various titles available on Netflix, including details like type (movie or TV show), title, director, cast, country of production, date added to Netflix, release year, rating, duration, genre, and a brief description.

2. **Growth Over Time (How many new shows/movies does Netflix add to its platform each year?):**
   - There's been a significant increase in the number of titles added to Netflix over the years, showcasing aggressive expansion and content investment.
   - While movies have been a constant addition, there's been a noticeable uptick in TV shows in recent years, indicating a shift in content strategy towards episodic content.

3. **Diversity of Content (How has the genre distribution evolved over time? Are there any underserved genres that could be potential opportunities?):**
   - International Movies, Dramas, and Comedies are the most common genres, and International Movies and International TV Shows have grown sharply in recent years. Genres such as Classic & Cult TV, Anime Features, and Faith & Spirituality have few titles and may be opportunities.
   - The United States is the dominant producer of content on Netflix, followed by countries like India, the UK, and Canada. This highlights Netflix's global reach and its focus on providing diverse content.
   - A variety of content ratings are present on Netflix. There's a significant portion of content targeted towards adults (with ratings like "TV-MA" and "R"), but there's also a substantial amount of content for younger audiences and families.

4. **Ratings Distribution (Are they targeting more mature audiences, children, etc?):**
   - Netflix hosts content with a variety of ratings, catering to a wide range of audiences.
   - Both movies and TV shows are well-represented across different ratings, suggesting a balanced content strategy.
   
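5. **Content Age (How old are most of the titles? Is Netflix focusing on newer content or keeping a mix of classics and new releases?):**
   - Approximately 84% of titles were released between 2010 and 2020, so the catalog skews heavily toward recent content.
   - Around 16% of titles were released before 2010, which keeps a smaller share of older content in the mix.
   - The dataset has no removal dates, so how long titles stay on the platform cannot be determined.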

In summary, Netflix has shown rapid growth in its content library over the years, with a diverse range of titles from different countries and for different audience groups. This strategy has likely contributed to its widespread appeal and success in the entertainment industry.