<center><img src="redpopcorn.jpg"></center>

**Netflix**! What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment and media companies.

Given the large number of movies and series available on the platform, it is a perfect opportunity to flex your exploratory data analysis skills and dive into the entertainment industry.

You work for a production company that specializes in nostalgic styles. You want to do some research on movies released in the 1990s. You'll delve into Netflix data and perform exploratory data analysis to better understand this awesome movie decade!

You have been supplied with the dataset `netflix_data.csv`, along with the following table detailing the column names and descriptions. Feel free to experiment further after submitting!

## The data
### **netflix_data.csv**
| Column | Description |
|--------|-------------|
| `show_id` | The ID of the show |
| `type` | Type of show |
| `title` | Title of the show |
| `director` | Director of the show |
| `cast` | Cast of the show |
| `country` | Country of origin |
| `date_added` | Date added to Netflix |
| `release_year` | Year of Netflix release |
| `duration` | Duration of the show in minutes |
| `description` | Description of the show |
| `genre` | Show genre |

# EXPLORATORY DATA ANALYSIS ON THE NETFLIX MOVIES DATA

In [1]:
# Importing pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

# Read in the Netflix CSV as a DataFrame
netflix_df = pd.read_csv("C:/Users/uwicl/Documents/00_DSTI/Courses/05_Python/Investigating-Netflix-Movies/01_Exploring Netflix Movies/netflix_data.csv")

## A. QUICK VIEW ON NETFLIX DATASET

In [2]:
# Preview the first 3 rows of the dataset
netflix_df.head(3)

| | show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s2 | Movie | 7:19 | Jorge Michel Grau | Demián Bichir, Héctor Bonilla, Oscar Serrano, ... | Mexico | December 23, 2016 | 2016 | 93 | After a devastating earthquake hits Mexico Cit... | Dramas |
| 1 | s3 | Movie | 23:59 | Gilbert Chan | Tedd Chan, Stella Chung, Henley Hii, Lawrence ... | Singapore | December 20, 2018 | 2011 | 78 | When an army recruit is found dead, his fellow... | Horror Movies |
| 2 | s4 | Movie | 9 | Shane Acker | Elijah Wood, John C. Reilly, Jennifer Connelly... | United States | November 16, 2017 | 2009 | 80 | In a postapocalyptic world, rag-doll robots hi... | Action |


In [3]:
# Preview the last 3 rows of the dataset
netflix_df.tail(3)

| | show_id | type | title | director | cast | country | date_added | release_year | duration | description | genre |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4809 | s7782 | Movie | Zoom | Peter Hewitt | Tim Allen, Courteney Cox, Chevy Chase, Kate Ma... | United States | January 11, 2020 | 2006 | 88 | Dragged from civilian life, a former superhero... | Children |
| 4810 | s7783 | Movie | Zozo | Josef Fares | Imad Creidi, Antoinette Turk, Elias Gergi, Car... | Sweden | October 19, 2020 | 2005 | 99 | When Lebanon's Civil War deprives Zozo of his ... | Dramas |
| 4811 | s7784 | Movie | Zubaan | Mozez Singh | Vicky Kaushal, Sarah-Jane Dias, Raaghav Chanan... | India | March 2, 2019 | 2015 | 111 | A scrappy but poor boy worms his way into a ty... | Dramas |


In [4]:
# Basic statistical analysis on the dataset
netflix_df.describe()

| | release_year | duration |
|---|---|---|
| count | 4812.0 | 4812.0 |
| mean | 2012.711554 | 99.566708 |
| std | 9.517978 | 30.889305 |
| min | 1942.0 | 1.0 |
| 25% | 2011.0 | 88.0 |
| 50% | 2016.0 | 99.0 |
| 75% | 2018.0 | 116.0 |
| max | 2021.0 | 253.0 |


In [5]:
# Checking the dimensions of the dataset (rows, columns)
netflix_df.shape

(4812, 11)

## B. DATA PREPARATION

In [6]:
# checking the data types
netflix_df.dtypes

show_id         object
type            object
title           object
director        object
cast            object
country         object
date_added      object
release_year     int64
duration         int64
description     object
genre           object
dtype: object

In [7]:
# Checking the column information (non-null counts and dtypes)
netflix_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4812 entries, 0 to 4811
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       4812 non-null   object
 1   type          4812 non-null   object
 2   title         4812 non-null   object
 3   director      4812 non-null   object
 4   cast          4812 non-null   object
 5   country       4812 non-null   object
 6   date_added    4812 non-null   object
 7   release_year  4812 non-null   int64 
 8   duration      4812 non-null   int64 
 9   description   4812 non-null   object
 10  genre         4812 non-null   object
dtypes: int64(2), object(9)
memory usage: 413.7+ KB
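
The summary above shows `date_added` stored as `object` (plain strings). If date arithmetic is needed later, the column could be parsed into datetimes. The sketch below is an optional extra step that is not part of the original notebook; it uses two date strings copied from the previews above and assumes every value follows the same `Month day, year` pattern.

```python
import pandas as pd

# Two date strings as they appear in the head/tail previews
dates = pd.Series(["December 23, 2016", "March 2, 2019"])

# Assumption: all values use the "Month day, year" layout shown above
parsed = pd.to_datetime(dates, format="%B %d, %Y")

# Once parsed, date parts are available through the .dt accessor
years = parsed.dt.year.tolist()
months = parsed.dt.month.tolist()
print(years, months)
```

On the real DataFrame the same call would be applied to `netflix_df["date_added"]`, provided no value deviates from that pattern.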


In [8]:
# Remove any leading or trailing spaces from the column names for easier access later
netflix_df.columns = netflix_df.columns.str.strip()

In [9]:
# Checking for duplicate rows
netflix_df.duplicated().any()

False

In [10]:
# Checking for null values
netflix_df.isna().any()

show_id         False
type            False
title           False
director        False
cast            False
country         False
date_added      False
release_year    False
duration        False
description     False
genre           False
dtype: bool
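
The duplicate and null checks above only say whether a problem exists, not how large it is. A minimal sketch of a combined report follows; `quality_report` and `toy_df` are hypothetical names introduced for illustration, and the data is made up rather than taken from `netflix_data.csv`.

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Count duplicate rows and missing values per column (hypothetical helper)."""
    return {
        "n_rows": len(df),
        "n_duplicate_rows": int(df.duplicated().sum()),
        "nulls_per_column": df.isna().sum().to_dict(),
    }

# Made-up data: one repeated row and one missing title
toy_df = pd.DataFrame({
    "title": ["A", "B", "B", None],
    "duration": [90, 100, 100, 85],
})

report = quality_report(toy_df)
print(report)
```

Calling `quality_report(netflix_df)` would be expected to report zero duplicates and zero nulls, in line with the outputs above.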

## C. RESPONDING TO QUESTIONS IN THE NETFLIX MOVIE EXPLORATION ANALYSIS  

##### C.1 What was the most frequent movie duration in the 1990s? Save an approximate answer as an integer called `duration` (use 1990 as the decade's start year).

In [11]:
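# Cleaning the 'type' column by replacing non-alphabetic characters with a single space
# (regex=True is required: since pandas 2.0, str.replace treats the pattern as a literal string by default)
netflix_df['type'] = netflix_df['type'].str.replace('[^a-zA-Z]', ' ', regex=True)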

In [12]:
# Converting 'type' column values to lowercase
netflix_df['type']=netflix_df['type'].str.lower()
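
The two cleaning steps above depend on the pattern being treated as a regular expression. In pandas 2.0 and later, `str.replace` matches the pattern literally unless `regex=True` is passed. The sketch below illustrates the difference on a made-up Series; the `.str.strip()` call is an extra step that the notebook does not perform.

```python
import pandas as pd

# Made-up values with stray punctuation and inconsistent case
raw = pd.Series(["Movie", "TV-Show", "movie!"])

# Literal matching: the text "[^a-zA-Z]" never occurs, so nothing changes
literal = raw.str.replace("[^a-zA-Z]", " ", regex=False)

# Regex matching: every non-alphabetic character becomes a space
cleaned = (
    raw.str.replace(r"[^a-zA-Z]", " ", regex=True)
       .str.strip()   # extra step: drop spaces left at either end
       .str.lower()
)
print(cleaned.tolist())
```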

In [13]:
# Creating a subset containing only movies
netflix_movies=netflix_df[netflix_df['type']== "movie"]

In [14]:
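# Filtering movies released between 1990 and 1999 (inclusive)
# .copy() makes the subset independent, which avoids SettingWithCopyWarning on later column assignments
netflix_movies_1990s = netflix_movies[(netflix_movies["release_year"] >= 1990)
                                      & (netflix_movies["release_year"] <= 1999)].copy()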

In [15]:
# Retrieving the most frequent movie duration in the 1990s as a plain Python int
duration = int(netflix_movies_1990s['duration'].value_counts().idxmax())
print(f"The most frequent movie duration in the 1990s is {duration}")

The most frequent movie duration in the 1990s is 94
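
To make the logic of C.1 easier to verify, the same steps can be run on a small made-up DataFrame where the answer is obvious by inspection. This is only a sketch: `toy` and `toy_duration` are hypothetical names, and `Series.between` is used as a compact alternative to the two comparisons in the notebook.

```python
import pandas as pd

# Made-up rows for illustration; not taken from netflix_data.csv
toy = pd.DataFrame({
    "type": ["movie", "movie", "movie", "movie", "tv show"],
    "release_year": [1989, 1992, 1995, 1999, 1994],
    "duration": [101, 94, 94, 120, 45],
})

# between() is inclusive on both ends by default, so this keeps 1990..1999
in_1990s = toy["release_year"].between(1990, 1999)
is_movie = toy["type"] == "movie"
movies_1990s = toy[is_movie & in_1990s]

# value_counts() counts each duration; idxmax() returns the most common one
toy_duration = int(movies_1990s["duration"].value_counts().idxmax())
print(toy_duration)
```

If several durations tie for the highest count, `idxmax()` returns only one of them, which is why the question asks for an approximate answer.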


##### C.2 A movie is considered short if it is less than 90 minutes. Count the number of short action movies released in the 1990s and save this integer as `short_movie_count`.

In [16]:
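# Cleaning the 'genre' column by replacing non-alphabetic characters with a single space
# (regex=True is required: since pandas 2.0, str.replace treats the pattern as a literal string by default)
netflix_movies_1990s.loc[:, 'genre'] = netflix_movies_1990s['genre'].str.replace('[^a-zA-Z]', ' ', regex=True)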

In [17]:
# Converting 'genre' column values to lowercase
netflix_movies_1990s.loc[:,'genre']=netflix_movies_1990s['genre'].str.lower()

In [18]:
# Creating a subset containing only action movies in 1990s
netflix_action_movies_1990s = netflix_movies_1990s[netflix_movies_1990s['genre'] == 'action']

In [19]:
# Retrieving action movies from the 1990s with a duration of less than 90 minutes
short_action_movies=netflix_action_movies_1990s[netflix_action_movies_1990s['duration']<90 ]

In [20]:
# Counting the rows in the short action movies subset
short_movie_count = len(short_action_movies)
print(f"The total number of short action movies is {short_movie_count}")

The total number of short action movies is 7
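
The three filters used for C.2 (genre, decade, duration) can also be combined into a single boolean mask. The sketch below shows this on made-up data where the expected count is easy to check by hand; `toy` and `toy_short_count` are hypothetical names, not part of the original notebook.

```python
import pandas as pd

# Made-up rows for illustration; not taken from netflix_data.csv
toy = pd.DataFrame({
    "genre": ["action", "action", "dramas", "action", "action"],
    "release_year": [1991, 1998, 1995, 2003, 1999],
    "duration": [85, 110, 80, 88, 89],
})

# Each condition yields a boolean Series; & keeps rows where all three hold
short_action_1990s = toy[
    (toy["genre"] == "action")
    & toy["release_year"].between(1990, 1999)
    & (toy["duration"] < 90)
]

toy_short_count = len(short_action_1990s)
print(toy_short_count)
```

Only the 1991 and 1999 action rows satisfy all three conditions. Intermediate subsets, as used in the notebook, are easier to inspect step by step, while a single mask avoids creating several temporary DataFrames.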
