<center><img src="redpopcorn.jpg"></center>

**Netflix**! What started in 1997 as a DVD rental service has since exploded into one of the largest entertainment and media companies.

Given the large number of movies and series available on the platform, it is a perfect opportunity to flex your exploratory data analysis skills and dive into the entertainment industry.

You work for a production company that specializes in nostalgic styles. You want to do some research on movies released in the 1990s. You'll delve into Netflix data and perform exploratory data analysis to better understand this awesome movie decade!

You have been supplied with the dataset `netflix_data.csv`, along with the following table detailing the column names and descriptions. Feel free to experiment further after submitting!

## The data
### **netflix_data.csv**
| Column | Description |
|--------|-------------|
| `show_id` | The ID of the show |
| `type` | Type of show |
| `title` | Title of the show |
| `director` | Director of the show |
| `cast` | Cast of the show |
| `country` | Country of origin |
| `date_added` | Date added to Netflix |
| `release_year` | Year the show was originally released |
| `duration` | Duration of the show in minutes |
| `description` | Description of the show |
| `genre` | Show genre |

In [1]:
# Importing pandas and NumPy

import pandas as pd
import numpy as np


netflix_df = pd.read_csv("netflix_data.csv")
netflix_df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,duration,description,genre
0,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,93,After a devastating earthquake hits Mexico Cit...,Dramas
1,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",2011,78,"When an army recruit is found dead, his fellow...",Horror Movies
2,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",2009,80,"In a postapocalyptic world, rag-doll robots hi...",Action
3,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",2008,123,A brilliant group of students become card-coun...,Dramas
4,s6,TV Show,46,Serdar Akar,"Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan...",Turkey,"July 1, 2017",2016,1,A genetics professor experiments with a treatm...,International TV


In [3]:
n_rows, n_cols = netflix_df.shape  # shape is (number of rows, number of columns)

print(f" Netflix_df has {n_rows} rows and {n_cols} columns")


 Netflix_df has 4812 rows and 11 columns


In [9]:
netflix_df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'duration', 'description', 'genre'],
      dtype='object')

In [11]:
netflix_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4812 entries, 0 to 4811
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       4812 non-null   object
 1   type          4812 non-null   object
 2   title         4812 non-null   object
 3   director      4812 non-null   object
 4   cast          4812 non-null   object
 5   country       4812 non-null   object
 6   date_added    4812 non-null   object
 7   release_year  4812 non-null   int64 
 8   duration      4812 non-null   int64 
 9   description   4812 non-null   object
 10  genre         4812 non-null   object
dtypes: int64(2), object(9)
memory usage: 413.7+ KB


# We only need the columns "type", "release_year", "duration", and "genre" to answer the first and second questions.

# 1) What was the most frequent movie duration in the 1990s? 

# Save an approximate answer as an integer called duration (use 1990 as the decade's start year).

* First, let us see how many categories the `type` column has and how often each one appears

In [20]:
# Count how many times each unique value appears

val_counts_type = netflix_df["type"].value_counts()
print(val_counts_type)



type
Movie      4677
TV Show     135
Name: count, dtype: int64
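
The counts above can also be read as proportions, which makes the Movie/TV Show split easier to compare. The following is a minimal sketch on a made-up frame; `toy_df` and its values are hypothetical stand-ins for `netflix_df`, not the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for netflix_df["type"]
toy_df = pd.DataFrame({"type": ["Movie", "Movie", "Movie", "TV Show"]})

# Absolute counts per category, most frequent first
type_counts = toy_df["type"].value_counts()

# Share of each category (fractions that sum to 1)
type_shares = toy_df["type"].value_counts(normalize=True)

print(type_counts)
print(type_shares)
```

On the real dataset the same two calls would be applied to `netflix_df["type"]`.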


*  First filter:  Filter rows where `type` = 'Movie'

*  This gives you a dataframe containing only movies.

In [23]:
movies_df = netflix_df[netflix_df["type"] == "Movie"]
movies_df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,duration,description,genre
0,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,93,After a devastating earthquake hits Mexico Cit...,Dramas
1,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",2011,78,"When an army recruit is found dead, his fellow...",Horror Movies
2,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",2009,80,"In a postapocalyptic world, rag-doll robots hi...",Action
3,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",2008,123,A brilliant group of students become card-coun...,Dramas
5,s7,Movie,122,Yasir Al Yasiri,"Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed...",Egypt,"June 1, 2020",2019,95,"After an awful accident, a couple admitted to ...",Horror Movies
...,...,...,...,...,...,...,...,...,...,...,...
4807,s7779,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,88,Looking to survive in a world taken over by zo...,Comedies
4808,s7781,Movie,Zoo,Shlok Sharma,"Shashank Arora, Shweta Tripathi, Rahul Kumar, ...",India,"July 1, 2018",2018,94,A drug dealer starts having doubts about his t...,Dramas
4809,s7782,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,88,"Dragged from civilian life, a former superhero...",Children
4810,s7783,Movie,Zozo,Josef Fares,"Imad Creidi, Antoinette Turk, Elias Gergi, Car...",Sweden,"October 19, 2020",2005,99,When Lebanon's Civil War deprives Zozo of his ...,Dramas


* Check unique values in the "type" column to confirm only movies are included


In [25]:
val_counts_movie = movies_df["type"].value_counts()
val_counts_movie

type
Movie    4677
Name: count, dtype: int64

* Second filter (NumPy filter): Keep rows where 1990 <= "release_year" < 2000, i.e. movies released in the 1990s

In [30]:
movies_90_df = movies_df[np.logical_and(movies_df["release_year"] >= 1990, movies_df["release_year"] < 2000)]
movies_90_df

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,duration,description,genre
6,s8,Movie,187,Kevin Reynolds,"Samuel L. Jackson, John Heard, Kelly Rowan, Cl...",United States,"November 1, 2019",1997,119,After one of his high school students attacks ...,Dramas
118,s167,Movie,A Dangerous Woman,Stephen Gyllenhaal,"Debra Winger, Barbara Hershey, Gabriel Byrne, ...",United States,"April 1, 2018",1993,101,At the center of this engrossing melodrama is ...,Dramas
145,s211,Movie,A Night at the Roxbury,John Fortenberry,"Will Ferrell, Chris Kattan, Dan Hedaya, Molly ...",United States,"December 1, 2019",1998,82,"After a run-in with Richard Grieco, dimwits Do...",Comedies
167,s239,Movie,A Thin Line Between Love & Hate,Martin Lawrence,"Martin Lawrence, Lynn Whitfield, Regina King, ...",United States,"December 1, 2020",1996,108,When a philandering club promoter sets out to ...,Comedies
194,s274,Movie,Aashik Awara,Umesh Mehra,"Saif Ali Khan, Mamta Kulkarni, Mohnish Bahl, S...",India,"June 1, 2017",1993,154,"Raised by a kindly thief, orphaned Jimmy goes ...",Dramas
...,...,...,...,...,...,...,...,...,...,...,...
4672,s7536,Movie,West Beirut,Ziad Doueiri,"Rami Doueiri, Mohamad Chamas, Rola Al Amin, Ca...",France,"October 19, 2020",1999,106,Three intrepid teens roam the streets of Beiru...,Dramas
4689,s7571,Movie,What's Eating Gilbert Grape,Lasse Hallström,"Johnny Depp, Leonardo DiCaprio, Juliette Lewis...",United States,"January 1, 2021",1993,118,"In a backwater Iowa town, young Gilbert is tor...",Classic Movies
4718,s7624,Movie,Wild Wild West,Barry Sonnenfeld,"Will Smith, Kevin Kline, Kenneth Branagh, Salm...",United States,"January 1, 2020",1999,106,"Armed with an ingenious arsenal, two top-notch...",Action
4746,s7682,Movie,Wyatt Earp,Lawrence Kasdan,"Kevin Costner, Dennis Quaid, Gene Hackman, Dav...",United States,"January 1, 2020",1994,191,Legendary lawman Wyatt Earp is continually at ...,Action
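
The decade filter above combines two comparisons. One possible alternative, shown here only as a sketch, is `Series.between`, which is inclusive on both ends by default. The frame `toy_movies` and its titles are hypothetical and chosen to hit the boundary years.

```python
import pandas as pd

# Hypothetical toy data with years on both sides of the decade boundaries
toy_movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": [1989, 1990, 1999, 2000],
})

# between() includes both endpoints by default, so 1990..1999 covers the 1990s
in_90s = toy_movies["release_year"].between(1990, 1999)
toy_90s = toy_movies[in_90s]

print(toy_90s["title"].tolist())
```

Only the rows for 1990 and 1999 are kept, which matches the `>= 1990` and `< 2000` conditions for integer years.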


* Count how many movies came out each year (1990s subset)


In [33]:
val_counts_90 = movies_90_df["release_year"].value_counts()
val_counts_90

release_year
1997    26
1998    26
1999    26
1993    16
1995    16
1992    16
1996    15
1990    14
1991    14
1994    14
Name: count, dtype: int64
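
`value_counts()` orders its result by count, which is why the years above are not chronological. If a year-by-year view is wanted, the result can be sorted by its index. This is a small sketch on a hypothetical series (`toy_years`), not the real data.

```python
import pandas as pd

# Hypothetical release years; stands in for movies_90_df["release_year"]
toy_years = pd.Series([1997, 1990, 1997, 1993, 1990, 1997], name="release_year")

# Count per year, then reorder by year instead of by frequency
per_year = toy_years.value_counts().sort_index()

print(per_year)
```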

* What is the most frequent movie duration in the 1990s?

In [45]:
# mode() returns a Series because ties are possible; for a single integer (as this assignment asks), take the first value:

duration = int(movies_90_df["duration"].mode()[0])

print(f" The most frequent movie duration in the 1990s is {duration} ")


 The most frequent movie duration in the 1990s is 94 
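
The `[0]` after `.mode()` is needed because `Series.mode()` returns every most-frequent value, in sorted order, rather than a single number. The sketch below uses a hypothetical series (`toy_durations`) with a deliberate tie to show this behavior.

```python
import pandas as pd

# Hypothetical durations where 94 and 101 both appear twice (a tie)
toy_durations = pd.Series([94, 94, 101, 101, 120])

# mode() returns a Series holding all tied modes, sorted ascending
modes = toy_durations.mode()

# Taking the first element therefore picks the smallest of the tied values
first_mode = int(modes.iloc[0])

print(modes.tolist())
print(first_mode)
```

When there is a tie, taking the first element is a choice rather than the only valid answer, so it can be worth inspecting the full result of `.mode()` first.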


# Alternative solution using the pandas boolean operator & (element-wise AND)


In [48]:
duration = int(netflix_df[
    (netflix_df["type"] == "Movie") &
    (netflix_df["release_year"] >= 1990) &
    (netflix_df["release_year"] < 2000)
]["duration"].mode()[0])
print(f" The most frequent movie duration in the 1990s is {duration}")

 The most frequent movie duration in the 1990s is 94


# 2) A movie is considered short if it is less than 90 minutes.

# Count the number of short action movies released in the 1990s and save this integer as short_movie_count.

# Filter short action movies using NumPy logical_and

In [89]:

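# Keep 1990s movies that are both in the "Action" genre and shorter than 90 minutes
short_movie = movies_90_df[np.logical_and(movies_90_df["genre"] == "Action", movies_90_df["duration"] < 90)]

# The number of remaining rows is the number of short action movies
short_movie_count = len(short_movie)
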
print(f" The number of short action movies released in the 1990s is {short_movie_count}")

 The number of short action movies released in the 1990s is 7


# Alternative solution using the pandas boolean operator & (element-wise AND)

In [54]:
short_movie_count = len(netflix_df[
    (netflix_df["type"] == "Movie") &
    (netflix_df["genre"] == "Action") &
    (netflix_df["release_year"] >= 1990) &
    (netflix_df["release_year"] < 2000) &
    (netflix_df["duration"] < 90)
])
print(f" The number of short action movies released in the 1990s is {short_movie_count}")

 The number of short action movies released in the 1990s is 7


# Another way to solve: filter the 1990s subset and count rows with .shape[0]

In [78]:
short_movie_count = movies_90_df[
                                  (movies_90_df["genre"] == "Action") &
                                  (movies_90_df["duration"] < 90)].shape[0]

print(f" The number of short action movies released in the 1990s is {short_movie_count}")

 The number of short action movies released in the 1990s is 7


# Another way to solve: filter the 1990s subset and count rows with len()

In [72]:
short_movie_count = len(movies_90_df[
                                  (movies_90_df["genre"] == "Action") &
                                  (movies_90_df["duration"] < 90)])

print(f" The number of short action movies released in the 1990s is {short_movie_count}")

 The number of short action movies released in the 1990s is 7
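
All of the variants above filter the frame first and then count the remaining rows. A further option, sketched here on hypothetical data (`toy_90s`), is to sum the boolean mask directly, since each `True` counts as 1.

```python
import pandas as pd

# Hypothetical 1990s movies; stands in for movies_90_df
toy_90s = pd.DataFrame({
    "genre": ["Action", "Action", "Dramas", "Action"],
    "duration": [85, 120, 80, 89],
})

# Boolean mask: True where the movie is an action movie under 90 minutes
is_short_action = (toy_90s["genre"] == "Action") & (toy_90s["duration"] < 90)

# Summing a boolean Series counts the True values
short_action_count = int(is_short_action.sum())

print(short_action_count)
```

This avoids building the filtered frame when only the count is needed.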


# Investigating Netflix Movies

Exploratory data analysis of Netflix 1990s movies to identify trends in duration and short action films using Python and pandas.

---

## Project Overview
This project was completed as part of my DataCamp coursework after finishing **Introduction to Python** and **Intermediate Python**.  
The goal is to analyze a dataset (`netflix_data.csv`) and answer key questions about movies from the 1990s decade:  

1. What was the most frequent movie duration in the 1990s?  
2. How many **short action movies** (less than 90 minutes) were released in the 1990s?

---

## Skills Demonstrated
- Python programming fundamentals  
- Data manipulation with **pandas**  
- Exploratory Data Analysis (EDA)  
- Filtering and grouping data  
- Basic statistics and counting

---

## Dataset
- **File:** `netflix_data.csv`  
- **Source:** Provided by DataCamp for educational purposes  
- **Details:** Includes information about Netflix movies and TV shows (title, type, duration, release year, etc.)

---

## How to Run This Project

### Requirements
- Python 3.8 or higher
- pandas
- matplotlib (optional, for visualizations)
- Jupyter Notebook (optional, for interactive use)

### Steps
```bash
# Clone the repository
git clone https://github.com/yourusername/investigating-netflix-movies.git

# Go to the project folder
cd investigating-netflix-movies

# (Optional) Create a virtual environment and activate it

# Install dependencies
pip install pandas matplotlib jupyter
