# Part 1: Basic Pandas Operations

1) Load and explore the dataset:

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np

In [2]:
# Load the dataset using Pandas
df = pd.read_csv('amazon_prime_titles.csv')

In [3]:
# Display first and last 5 rows
df.head()

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,"March 30, 2021",2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,"March 30, 2021",2014,,69 min,Documentary,"Pink breaks the mold once again, bringing her ..."
4,s5,Movie,Monster Maker,Giles Foster,"Harry Dean Stanton, Kieran O'Brien, George Cos...",United Kingdom,"March 30, 2021",1989,,45 min,"Drama, Fantasy",Teenage Matt Banting wants to work with a famo...


In [4]:
df.tail()

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
9663,s9664,Movie,Pride Of The Bowery,Joseph H. Lewis,"Leo Gorcey, Bobby Jordan",,,1940,7+,60 min,Comedy,New York City street principles get an East Si...
9664,s9665,TV Show,Planet Patrol,,"DICK VOSBURGH, RONNIE STEVENS, LIBBY MORRIS, M...",,,2018,13+,4 Seasons,TV Shows,"This is Earth, 2100AD - and these are the adve..."
9665,s9666,Movie,Outpost,Steve Barker,"Ray Stevenson, Julian Wadham, Richard Brake, M...",,,2008,R,90 min,Action,"In war-torn Eastern Europe, a world-weary grou..."
9666,s9667,TV Show,Maradona: Blessed Dream,,"Esteban Recagno, Ezequiel Stremiz, Luciano Vit...",,,2021,TV-MA,1 Season,"Drama, Sports","The series tells the story of Diego Maradona, ..."
9667,s9668,Movie,Harry Brown,Daniel Barber,"Michael Caine, Emily Mortimer, Joseph Gilgun, ...",,,2010,R,103 min,"Action, Drama, Suspense","Harry Brown, starring two-time Academy Award w..."


In [5]:
# Print the shape, column names, and info summary of the dataset
df.shape

(9668, 12)

In [6]:
df.columns

Index(['show_id', 'type', 'title', 'director', 'cast', 'country', 'date_added',
       'release_year', 'rating', 'duration', 'listed_in', 'description'],
      dtype='object')

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9668 entries, 0 to 9667
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   show_id       9668 non-null   object
 1   type          9668 non-null   object
 2   title         9668 non-null   object
 3   director      7586 non-null   object
 4   cast          8435 non-null   object
 5   country       672 non-null    object
 6   date_added    155 non-null    object
 7   release_year  9668 non-null   int64 
 8   rating        9331 non-null   object
 9   duration      9668 non-null   object
 10  listed_in     9668 non-null   object
 11  description   9668 non-null   object
dtypes: int64(1), object(11)
memory usage: 906.5+ KB


2) Data Cleaning:

In [8]:
# Check for missing values and count them per column
df.isnull().sum()

show_id            0
type               0
title              0
director        2082
cast            1233
country         8996
date_added      9513
release_year       0
rating           337
duration           0
listed_in          0
description        0
dtype: int64
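Raw counts are easier to judge as a share of the 9,668 rows. For example, `country` is missing in 8,996 rows (about 93%) and `date_added` in 9,513 rows (about 98%). A minimal sketch of the percentage view, run on a small hypothetical frame rather than the real CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the Amazon Prime data
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "country": ["India", np.nan, np.nan, "Canada"],
    "date_added": [np.nan, np.nan, np.nan, "March 30, 2021"],
})

# isnull().mean() gives the fraction of missing values per column;
# multiply by 100 for a percentage and sort so the sparsest columns come first
missing_pct = df.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```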

In [9]:
# Drop all rows where the title or type is missing
df.dropna(subset=['title', 'type'], inplace=True)

In [10]:
# Convert date_added to datetime, stripping stray leading/trailing spaces first
df['date_added'] = pd.to_datetime(df['date_added'].str.strip())
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9668 entries, 0 to 9667
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   show_id       9668 non-null   object        
 1   type          9668 non-null   object        
 2   title         9668 non-null   object        
 3   director      7586 non-null   object        
 4   cast          8435 non-null   object        
 5   country       672 non-null    object        
 6   date_added    155 non-null    datetime64[ns]
 7   release_year  9668 non-null   int64         
 8   rating        9331 non-null   object        
 9   duration      9668 non-null   object        
 10  listed_in     9668 non-null   object        
 11  description   9668 non-null   object        
dtypes: datetime64[ns](1), int64(1), object(10)
memory usage: 981.9+ KB
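The dates shown in the outputs follow a "March 30, 2021" pattern. Assuming the whole column uses that pattern, passing an explicit `format` to `pd.to_datetime` makes the parse strict and avoids per-value format guessing. A small sketch on hypothetical sample strings:

```python
import pandas as pd

# Hypothetical sample in the "Month day, year" style seen in date_added
raw = pd.Series(["March 30, 2021", " April 1, 2021", None])

# strip() removes stray spaces first; missing values become NaT
parsed = pd.to_datetime(raw.str.strip(), format="%B %d, %Y")
print(parsed)
```

If some rows might not match the pattern, adding `errors="coerce"` turns them into `NaT` instead of raising.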


3) Filtering and indexing:

In [11]:
# Show all titles (movies and TV shows) released after 2015
df[df['release_year'] > 2015]

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,2021-03-30,2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,2021-03-30,2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...
6,s7,Movie,Hired Gun,Fran Strine,"Alice Cooper, Liberty DeVitto, Ray Parker Jr.,...",United States,2021-03-30,2017,,98 min,"Documentary, Special Interest","They are the ""First Call, A-list"" musicians, j..."
7,s8,Movie,Grease Live!,"Thomas Kail, Alex Rudzinski","Julianne Hough, Aaron Tveit, Vanessa Hudgens, ...",United States,2021-03-30,2016,,131 min,Comedy,"This honest, uncompromising comedy chronicles ..."
8,s9,Movie,Global Meltdown,Daniel Gilboy,"Michael Paré, Leanne Khol Young, Patrick J. Ma...",Canada,2021-03-30,2017,,87 min,"Action, Science Fiction, Suspense",A helicopter pilot and an environmental scient...
...,...,...,...,...,...,...,...,...,...,...,...,...
9656,s9657,Movie,Anaganaga Oka Nenu,Vamshi P,"Vijay, Sweta chaudhary, NNR CHOWDARY, Hariya",,NaT,2021,18+,135 min,"Action, Drama, Suspense",Peter(hero) Who went to the forest to find a c...
9660,s9661,Movie,The Man in the Hat,"John-Paul Davidson, Stephen Warbeck","Ciaran Hinds, Stephen Dillane, Maïwenn",,NaT,2021,13+,96 min,Comedy,The Man in the Hat journeys through France in ...
9662,s9663,Movie,River,Emily Skye,"Mary Cameron Rogers, Alexandra Rose, Rob Marsh...",,NaT,2021,16+,93 min,"Drama, Science Fiction, Suspense","River is a grounded Sci-Fi mystery Thriller, t..."
9664,s9665,TV Show,Planet Patrol,,"DICK VOSBURGH, RONNIE STEVENS, LIBBY MORRIS, M...",,NaT,2018,13+,4 Seasons,TV Shows,"This is Earth, 2100AD - and these are the adve..."
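The mask in this cell filters only on `release_year`, so TV shows such as Planet Patrol also pass. Restricting to movies needs a second condition combined with `&`. A minimal sketch on a hypothetical toy frame (the titles and years are copied from rows above for illustration only):

```python
import pandas as pd

# Hypothetical toy frame; the notebook itself uses df loaded from amazon_prime_titles.csv
df = pd.DataFrame({
    "title": ["Take Care Good Night", "Planet Patrol", "Monster Maker", "River"],
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "release_year": [2018, 2018, 1989, 2021],
})

# Each condition must be wrapped in parentheses because & binds tighter than > and ==
recent_movies = df[(df["type"] == "Movie") & (df["release_year"] > 2015)]
print(recent_movies)
```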


In [12]:
# Filter all titles whose country is exactly India (co-productions that also list India are not matched)
df[df['country'].str.lower() == 'india']

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,2021-03-30,2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...
107,s108,Movie,Whoop!,Kreeti Gogia,Abish Mathew,India,NaT,2018,18+,64 min,"Arts, Entertainment, and Culture",Abish Mathew is the world's greatest stand-up ...
142,s143,Movie,Wedding Cha Shinema,Saleel Kulkarni,"Shivraj Waichal, Rucha Inamdar, Mukta Barve, P...",India,NaT,2019,ALL,138 min,"Comedy, International",An aspiring filmmaker reluctantly takes up the...
178,s179,Movie,Viswasam,Siva,"Ajith Kumar, Nayanthara",India,NaT,2019,13+,151 min,"Action, Drama, International","A village ruffian, who settles disputes in his..."
180,s181,Movie,Virus,Aashiq Abu,"Revathy, Kunchako Boban, Parvathy Thiruvoth",India,NaT,2019,ALL,149 min,"Drama, Science Fiction, Suspense","Virus is a fiction based on true events, revol..."
...,...,...,...,...,...,...,...,...,...,...,...,...
7505,s7506,Movie,The Priest (Telugu),Jofin T. Chacko,"Mammootty, Manju Warrier, Ameya Mathew, Nikhil...",India,NaT,2021,13+,145 min,Suspense,"Father Carmen, a priest, joins hands with the ..."
8366,s8367,Movie,Aashiqui 2,Mohit Suri,"Aditya Roy Kapoor, Shraddha Kapoor, Shaad Rand...",India,NaT,2013,ALL,127 min,"International, Romance",Rahul loses his fans and fame due to alcoholis...
8569,s8570,Movie,The Priest (Tamil),Jofin T. Chacko,"Mammootty, Manju Warrier, Ameya Mathew, Nikhil...",India,NaT,2021,13+,145 min,Suspense,"Father Carmen, a priest, joins hands with the ..."
8825,s8826,Movie,Sunny (4K UHD),Ranjith Sankar,Jayasurya,India,NaT,2021,PG-13,93 min,"Drama, Suspense","The life of a failed musician, sunny takes a d..."
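The exact-match filter skips entries that list several countries; the dataset does contain such entries (for example "United Kingdom, United States" in the country counts later on). A substring match with `str.contains` also catches co-productions. A sketch on hypothetical data, assuming no other country name in the column contains "India":

```python
import pandas as pd

# Hypothetical toy frame with single- and multi-country entries
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "country": ["India", "United States, India", None, "Canada"],
})

exact = df[df["country"].str.lower() == "india"]
# Substring match; na=False treats missing countries as "no match" instead of NaN
any_india = df[df["country"].str.contains("India", case=False, na=False)]
print(len(exact), len(any_india))
```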


In [13]:
# Select only Movie type entries with duration over 90 minutes
# .copy() gives df2 its own data, so adding a column does not trigger SettingWithCopyWarning
df2 = df[df['duration'].str.contains('min', na=False)].copy()
# Extract the leading number from strings like "113 min"
df2['duration_minutes'] = pd.to_numeric(
    df2['duration'].str.extract(r'(\d+)')[0],
    errors='coerce'
)
df2[(df2['type'].str.lower() == 'movie') & (df2['duration_minutes'] > 90)]


,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,duration_minutes
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,2021-03-30,2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...,113
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,2021-03-30,2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,110
6,s7,Movie,Hired Gun,Fran Strine,"Alice Cooper, Liberty DeVitto, Ray Parker Jr.,...",United States,2021-03-30,2017,,98 min,"Documentary, Special Interest","They are the ""First Call, A-list"" musicians, j...",98
7,s8,Movie,Grease Live!,"Thomas Kail, Alex Rudzinski","Julianne Hough, Aaron Tveit, Vanessa Hudgens, ...",United States,2021-03-30,2016,,131 min,Comedy,"This honest, uncompromising comedy chronicles ...",131
9,s10,Movie,David's Mother,Robert Allan Ackerman,"Kirstie Alley, Sam Waterston, Stockard Channing",United States,2021-04-01,1994,,92 min,Drama,Sally Goodson is a devoted mother to her autis...,92
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9656,s9657,Movie,Anaganaga Oka Nenu,Vamshi P,"Vijay, Sweta chaudhary, NNR CHOWDARY, Hariya",,NaT,2021,18+,135 min,"Action, Drama, Suspense",Peter(hero) Who went to the forest to find a c...,135
9659,s9660,Movie,10 Things I Hate About You,Gil Junger,"Heath Ledger, Julia Stiles, Joseph Gordon-Levi...",,NaT,1999,PG-13,97 min,"Comedy, Drama, Romance","On the first day at his new school, Cameron in...",97
9660,s9661,Movie,The Man in the Hat,"John-Paul Davidson, Stephen Warbeck","Ciaran Hinds, Stephen Dillane, Maïwenn",,NaT,2021,13+,96 min,Comedy,The Man in the Hat journeys through France in ...,96
9662,s9663,Movie,River,Emily Skye,"Mary Cameron Rogers, Alexandra Rose, Rob Marsh...",,NaT,2021,16+,93 min,"Drama, Science Fiction, Suspense","River is a grounded Sci-Fi mystery Thriller, t...",93


4) Column operations:

In [14]:
# Create a new column called `release_decade` from release_year:
# drop the last digit of the year and append "0s" (e.g. 2014 -> "2010s")
df['release_decade'] = df['release_year'].astype(str).str[:-1] + '0s'
df
df

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,release_decade
0,s1,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,2021-03-30,2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...,2010s
1,s2,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,2021-03-30,2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,2010s
2,s3,Movie,Secrets of Deception,Josh Webber,"Tom Sizemore, Lorenzo Lamas, Robert LaSardo, R...",United States,2021-03-30,2017,,74 min,"Action, Drama, Suspense",After a man discovers his wife is cheating on ...,2010s
3,s4,Movie,Pink: Staying True,Sonia Anderson,"Interviews with: Pink, Adele, Beyoncé, Britney...",United States,2021-03-30,2014,,69 min,Documentary,"Pink breaks the mold once again, bringing her ...",2010s
4,s5,Movie,Monster Maker,Giles Foster,"Harry Dean Stanton, Kieran O'Brien, George Cos...",United Kingdom,2021-03-30,1989,,45 min,"Drama, Fantasy",Teenage Matt Banting wants to work with a famo...,1980s
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9663,s9664,Movie,Pride Of The Bowery,Joseph H. Lewis,"Leo Gorcey, Bobby Jordan",,NaT,1940,7+,60 min,Comedy,New York City street principles get an East Si...,1940s
9664,s9665,TV Show,Planet Patrol,,"DICK VOSBURGH, RONNIE STEVENS, LIBBY MORRIS, M...",,NaT,2018,13+,4 Seasons,TV Shows,"This is Earth, 2100AD - and these are the adve...",2010s
9665,s9666,Movie,Outpost,Steve Barker,"Ray Stevenson, Julian Wadham, Richard Brake, M...",,NaT,2008,R,90 min,Action,"In war-torn Eastern Europe, a world-weary grou...",2000s
9666,s9667,TV Show,Maradona: Blessed Dream,,"Esteban Recagno, Ezequiel Stremiz, Luciano Vit...",,NaT,2021,TV-MA,1 Season,"Drama, Sports","The series tells the story of Diego Maradona, ...",2020s
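The string-slicing approach assumes every year has exactly four digits. An alternative is to compute the decade arithmetically with floor division, which keeps the logic numeric until the final label. A minimal sketch on a few sample years:

```python
import pandas as pd

years = pd.Series([2014, 1989, 2021, 1940], name="release_year")

# Floor-divide by 10 and multiply back to get the decade start, then add the "s" suffix
release_decade = (years // 10 * 10).astype(str) + "s"
print(release_decade.tolist())
```

Keeping `years // 10 * 10` as an integer column (without the string suffix) is also handy if the decades need to be sorted or plotted.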


In [15]:
# Display unique values in type, rating, and country
df['type'].unique()

array(['Movie', 'TV Show'], dtype=object)

In [16]:
df['rating'].unique()

array([nan, '13+', 'ALL', '18+', 'R', 'TV-Y', 'TV-Y7', 'NR', '16+',
       'TV-PG', '7+', 'TV-14', 'TV-NR', 'TV-G', 'PG-13', 'TV-MA', 'G',
       'PG', 'NC-17', 'UNRATED', '16', 'AGES_16_', 'AGES_18_', 'ALL_AGES',
       'NOT_RATE'], dtype=object)
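The rating labels include several spellings that look like the same category ('16' and 'AGES_16_' next to '16+', 'ALL_AGES' next to 'ALL', 'NOT_RATE' next to 'NR'). One way to consolidate them is `Series.replace` with a dictionary. The mapping below is a hypothetical assumption inferred from the label names only and should be checked against the platform's rating definitions before use:

```python
import numpy as np
import pandas as pd

ratings = pd.Series([np.nan, "16", "16+", "AGES_16_", "AGES_18_", "ALL_AGES", "NOT_RATE", "13+"])

# Hypothetical mapping of variant labels onto one spelling each (an assumption, not an official list)
rating_map = {
    "16": "16+",
    "AGES_16_": "16+",
    "AGES_18_": "18+",
    "ALL_AGES": "ALL",
    "NOT_RATE": "NR",
}
clean = ratings.replace(rating_map)  # labels not in the dict, and NaN, are left unchanged
print(clean.value_counts(dropna=False))
```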

In [17]:
# Split multi-country entries on commas, put one country per row, and trim spaces
all_countries = df['country'].str.split(',').explode().str.strip()
unique_countries = all_countries.unique()
unique_countries

array(['Canada', 'India', 'United States', 'United Kingdom', 'France',
       'Spain', nan, 'Italy', 'Germany', 'Japan', 'China', 'Denmark',
       'Czech Republic', 'Netherlands', 'Ireland', 'Thailand', 'Brazil',
       'Switzerland', 'Australia', 'Belgium', 'Chile', 'Argentina',
       'Mexico', 'Sweden', 'New Zealand', 'Portugal', 'Hungary', 'Iran',
       'Luxembourg', 'South Africa', 'Austria', 'Monaco', 'Egypt',
       'United Arab Emirates', 'Singapore', 'South Korea', 'Afghanistan',
       'Colombia', 'Norway', 'Kosovo', 'Kazakhstan', 'Malaysia', 'Poland',
       'Albania', 'Georgia', 'Hong Kong'], dtype=object)

# Part 2: GroupBy Operations

5) Aggregations:

In [18]:
# Group by type and count the number of titles in each category
df.groupby('type')['title'].count()

type
Movie      7814
TV Show    1854
Name: title, dtype: int64

In [19]:
# Group by country and find the 5 countries with the most titles (multi-country entries form their own groups)
df.groupby('country')['title'].count().sort_values(ascending=False).head(5)

country
United States                    253
India                            229
United Kingdom                    28
Canada                            16
United Kingdom, United States     12
Name: title, dtype: int64
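Grouping on the raw `country` column treats "United Kingdom, United States" as a separate country, so those 12 titles are credited to neither the United Kingdom nor the United States. Reusing the split/explode step from the unique-countries cell gives per-country counts instead. A sketch on a hypothetical Series:

```python
import pandas as pd

country = pd.Series(["United States", "India", "United Kingdom, United States", None, "India"])

# One row per country, trimmed, then counted; value_counts() drops missing values by default
per_country = country.str.split(",").explode().str.strip().value_counts()
print(per_country.head(5))
```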

In [20]:
# Group by rating and find the average release year for each rating
df.groupby('rating')['release_year'].mean()

rating
13+         2003.767123
16          2005.000000
16+         2013.248222
18+         2012.567176
7+          2003.932468
AGES_16_    2017.000000
AGES_18_    2017.000000
ALL         2012.048107
ALL_AGES    2020.000000
G           2002.924731
NC-17       2020.333333
NOT_RATE    1985.666667
NR          1999.883408
PG          1990.747036
PG-13       2004.882952
R           2005.527723
TV-14       2014.865385
TV-G        2006.888889
TV-MA       2017.415584
TV-NR       2015.790476
TV-PG       2010.402367
TV-Y        2009.824324
TV-Y7       2011.205128
UNRATED     2014.333333
Name: release_year, dtype: float64

6) Multiple aggregations:

In [21]:
# Group by release_year and show:
# - Total number of titles

df.groupby('release_year').agg(
    total_no_of_titles = ('title', 'count')
)

release_year,total_no_of_titles
1920,3
1922,2
1923,1
1924,1
1925,8
...,...
2017,562
2018,623
2019,929
2020,962


In [22]:
# - Count of Movies and TV Shows separately
df.groupby('release_year')['type'].value_counts().unstack(fill_value=0)

release_year,Movie,TV Show
1920,3,0
1922,2,0
1923,1,0
1924,1,0
1925,8,0
...,...,...
2017,404,158
2018,438,185
2019,730,199
2020,736,226
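The two cells above produce the total and the per-type counts as separate tables. `pd.crosstab` with `margins=True` can put both in one table, with an extra "Total" column (titles per year) and "Total" row. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical toy frame with release years and title types
df = pd.DataFrame({
    "release_year": [2019, 2019, 2020, 2020, 2020],
    "type": ["Movie", "TV Show", "Movie", "Movie", "TV Show"],
})

# Rows are years, columns are types; margins add row and column totals
summary = pd.crosstab(df["release_year"], df["type"], margins=True, margins_name="Total")
print(summary)
```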


7) Advanced:

In [23]:
# Create a new column `is_recent` that is True when `release_year` > 2015
df['is_recent'] = df['release_year'] > 2015
df.tail()

,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,release_decade,is_recent
9663,s9664,Movie,Pride Of The Bowery,Joseph H. Lewis,"Leo Gorcey, Bobby Jordan",,NaT,1940,7+,60 min,Comedy,New York City street principles get an East Si...,1940s,False
9664,s9665,TV Show,Planet Patrol,,"DICK VOSBURGH, RONNIE STEVENS, LIBBY MORRIS, M...",,NaT,2018,13+,4 Seasons,TV Shows,"This is Earth, 2100AD - and these are the adve...",2010s,True
9665,s9666,Movie,Outpost,Steve Barker,"Ray Stevenson, Julian Wadham, Richard Brake, M...",,NaT,2008,R,90 min,Action,"In war-torn Eastern Europe, a world-weary grou...",2000s,False
9666,s9667,TV Show,Maradona: Blessed Dream,,"Esteban Recagno, Ezequiel Stremiz, Luciano Vit...",,NaT,2021,TV-MA,1 Season,"Drama, Sports","The series tells the story of Diego Maradona, ...",2020s,True
9667,s9668,Movie,Harry Brown,Daniel Barber,"Michael Caine, Emily Mortimer, Joseph Gilgun, ...",,NaT,2010,R,103 min,"Action, Drama, Suspense","Harry Brown, starring two-time Academy Award w...",2010s,False


In [24]:
# Group by country and count how many recent titles there are per country (each True adds 1 to the sum)
df.groupby('country')['is_recent'].sum()


country
Afghanistan, France                          1
Australia                                    2
Australia, Colombia, United Kingdom          1
Australia, United States, Germany            1
Austria                                      1
                                            ..
United States, United Arab Emirates          0
United States, United Kingdom                1
United States, United Kingdom, Canada        1
United States, United Kingdom, Germany       0
United States, United Kingdom, Kazakhstan    0
Name: is_recent, Length: 86, dtype: int64

# Part 3: Analysis Questions

In [25]:
# 1. What is the most common rating for TV Shows?
df['rating'].value_counts().head(1)

# value_counts() sorts ratings from most to least frequent, so head(1) returns the most common one. Note: this counts ratings across all titles (movies and TV shows), not TV shows alone.

13+    2117
Name: rating, dtype: int64
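The cell above counts ratings over every title, so its answer is not specific to TV shows. Filtering on `type` before counting answers the question as asked. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical toy frame with types and ratings
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "TV Show", "Movie", "TV Show"],
    "rating": ["13+", "TV-MA", "TV-MA", "13+", "16+"],
})

# Restrict to TV shows first, then take the most frequent rating
tv_ratings = df.loc[df["type"] == "TV Show", "rating"].value_counts()
most_common_tv_rating = tv_ratings.idxmax()
print(most_common_tv_rating, tv_ratings.max())
```

On this toy data the overall most common rating would be a tie between 13+ and TV-MA, while the TV-show-only answer is TV-MA, which illustrates why the filter matters.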

In [26]:
# 2. Which country has the most content on Amazon Prime?
df['country'].value_counts().head(1)

# value_counts() sorts countries by frequency, so head(1) returns the one with the most titles. Rows with no country are excluded, and multi-country entries such as "United Kingdom, United States" are counted as separate values.

United States    253
Name: country, dtype: int64

In [27]:
# 3. Are most Amazon Prime titles older or recent (released after 2015)?
df['is_recent'].value_counts()

# Recent: 5,039 titles were released after 2015, versus 4,629 released in 2015 or earlier.

True     5039
False    4629
Name: is_recent, dtype: int64
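Expressed as proportions, the split above is roughly 52% recent (5,039 of 9,668) versus 48% older, a slight majority rather than a large one. `value_counts(normalize=True)` returns these shares directly, and the mean of a boolean column gives the fraction of True values. A sketch on a hypothetical Series:

```python
import pandas as pd

is_recent = pd.Series([True, True, False, True])

# normalize=True returns proportions instead of counts
shares = is_recent.value_counts(normalize=True)
# The mean of a boolean Series is the fraction of True values
share_recent = is_recent.mean()
print(shares)
print(share_recent)
```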

In [28]:
# 4. What's the average release year for Amazon Prime movies?
average_release_year_movies = df.loc[df['type'].str.lower() == 'movie', 'release_year'].mean()
print(f"Average release year for Amazon Prime movies: {average_release_year_movies:.2f}")

Average release year for Amazon Prime movies: 2006.87


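Conclusion:
Among the 672 titles that list a country, the United States is the most frequent single country (253 titles), followed closely by India (229); country information is missing for most rows. Movies (7,814) far outnumber TV shows (1,854), and release years span many decades, starting as early as 1920. A slight majority of titles (5,039 of 9,668, about 52%) were released after 2015. The most common rating across all titles is 13+ (2,117 titles); this is a separate label from TV-MA, and the count covers movies and TV shows together rather than TV shows alone.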