# 🎬 IMDb Dataset Analysis with Pandas

This notebook demonstrates how to work with the IMDb dataset using `pandas` and `numpy`. You'll learn how to:

- Load data
- Index, slice and filter rows/columns
- Apply custom functions
- Handle null values
- Sort and modify data

## 📥 Step 1: Import Libraries

We start by importing essential libraries for data handling and analysis.

## 📂 Step 2: Load the IMDb Dataset

Read the CSV file using `pd.read_csv` to create a DataFrame.
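
If a CSV was saved together with its row index, the first column has no header and pandas loads it as `Unnamed: 0`. The sketch below uses a tiny in-memory CSV (three rows copied from the preview, not the real file) to show how `index_col=0` can absorb that column. Whether the actual file needs this is an assumption.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file; the leading comma in the
# header means the first column has no name.
csv_text = """,Movie_Title,Year,Rating
0,Kantara,2022,9.3
1,The Dark Knight,2008,9.0
2,Inception,2010,8.8
"""

# Default load: the unnamed column becomes a regular data column.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# index_col=0 uses that first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
```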

## 📄 Step 3: Series and DataFrame Operations

Explore how to access individual columns and slices of the dataset.
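
A quick way to see the difference between a Series and a DataFrame is to compare single and double brackets. This is a minimal sketch on a three-row frame built by hand from the preview rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Movie_Title": ["Kantara", "The Dark Knight", "Inception"],
    "Year": [2022, 2008, 2010],
    "Rating": [9.3, 9.0, 8.8],
})

title_series = df["Movie_Title"]     # single brackets: a Series
title_frame = df[["Movie_Title"]]    # double brackets: a one-column DataFrame

print(type(title_series).__name__, type(title_frame).__name__)
```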

## 🧾 Step 4: Input and Selection

Use `.head()` and column selection to peek into your dataset.

## 📌 Step 5: Indexing Rows and Columns

Use `.iloc` for positional indexing and `.loc` for label-based indexing.
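
With the default `RangeIndex`, positions and labels coincide, which hides the difference between the two accessors. The sketch below uses a hand-built frame with a non-default index (an assumption made purely for illustration) so the difference becomes visible, including the fact that `.loc` slices include the stop label.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from row positions.
df = pd.DataFrame(
    {"Movie_Title": ["Kantara", "The Dark Knight", "Inception"],
     "Rating": [9.3, 9.0, 8.8]},
    index=[10, 20, 30],
)

first_by_position = df.iloc[0]   # row at position 0
first_by_label = df.loc[10]      # row whose index label is 10

# Slices: .iloc excludes the stop position, .loc includes the stop label.
by_position = df.iloc[0:2]       # positions 0 and 1
by_label = df.loc[10:30]         # labels 10, 20 and 30

print(len(by_position), len(by_label))
```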

## 🔍 Step 6: Conditional Selection

Filter rows based on conditions like ratings or genre.
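
Conditions can be combined with `&`, `|` and `~`, as long as each comparison is wrapped in parentheses. A minimal sketch on illustrative data (the "Example Drama" row is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "Movie_Title": ["Kantara", "The Dark Knight", "Inception", "Example Drama"],
    "Rating": [9.3, 9.0, 8.8, 7.1],
    "main_genre": ["Action", "Action", "Action", "Drama"],
})

# Both conditions must hold; the parentheses are required.
high_action = df[(df["Rating"] >= 9.0) & (df["main_genre"] == "Action")]

# ~ negates a condition.
not_action = df[~(df["main_genre"] == "Action")]

# isin() tests membership in a list of values.
drama_or_crime = df[df["main_genre"].isin(["Drama", "Crime"])]

print(high_action["Movie_Title"].tolist())
```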

## 🎯 Step 7: Selecting Subsets

Use `.loc` to select specific rows and columns simultaneously.
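
Besides label slices, `.loc` also accepts a boolean mask as its row selector, so filtering and column selection fit in one call. A small sketch on hand-built data:

```python
import pandas as pd

df = pd.DataFrame({
    "Movie_Title": ["Kantara", "The Dark Knight", "Inception"],
    "Year": [2022, 2008, 2010],
    "Rating": [9.3, 9.0, 8.8],
})

mask = df["Rating"] >= 9.0                       # boolean Series, one entry per row
top_titles = df.loc[mask, ["Movie_Title", "Rating"]]

print(top_titles)
```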

## 🧱 Step 8: Setting Index

Change the DataFrame index to a column like `Movie_Title` for easier referencing.
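
Once a column is the index, rows can be looked up by its values. The sketch below assumes the chosen column holds unique values (not verified for the full dataset) and shows that `set_index` returns a new frame rather than changing the original.

```python
import pandas as pd

df = pd.DataFrame({
    "Movie_Title": ["Kantara", "The Dark Knight", "Inception"],
    "Year": [2022, 2008, 2010],
    "Rating": [9.3, 9.0, 8.8],
})

by_title = df.set_index("Movie_Title")          # df itself is unchanged
inception_year = by_title.loc["Inception", "Year"]

# reset_index() turns the index back into an ordinary column.
restored = by_title.reset_index()

print(inception_year)
```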

## ⚙️ Step 9: DataFrame Operations

Summarize the numeric columns (count, mean, std, min, quartiles, max) with `.describe()`.
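
The result of `.describe()` is itself a DataFrame, so individual statistics can be read from it with `.loc`. A minimal sketch on hand-built data:

```python
import pandas as pd

df = pd.DataFrame({
    "Movie_Title": ["Kantara", "The Dark Knight", "Inception"],
    "Year": [2022, 2008, 2010],
    "Rating": [9.3, 9.0, 8.8],
})

summary = df.describe()                       # numeric columns only by default
mean_rating = summary.loc["mean", "Rating"]   # same value as df["Rating"].mean()

print(summary.columns.tolist())
```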

## 🔢 Step 10: Unique Values and Counts

Check distinct genres and count their frequencies using `.unique()` and `.value_counts()`.
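
Two related helpers are often useful here: `.nunique()` for the number of distinct values and `value_counts(normalize=True)` for shares instead of raw counts. A sketch on a short, made-up genre column:

```python
import pandas as pd

# Illustrative values, not the real genre distribution.
genres = pd.Series(
    ["Action", "Comedy", "Action", "Drama", "Action", "Comedy"],
    name="main_genre",
)

distinct = genres.unique()                      # values in order of first appearance
n_distinct = genres.nunique()                   # number of distinct values
counts = genres.value_counts()                  # sorted by frequency, descending
shares = genres.value_counts(normalize=True)    # fractions that sum to 1

print(counts)
```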

## 🛠️ Step 11: Applying Custom Functions

Create and apply a custom function using `.apply()` to classify ratings.
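
The definition of the classifier is not shown in this notebook, so the following is only a sketch of what such a function might look like. The thresholds and the labels other than `Excellent` are assumptions. `pd.cut` is included as an alternative that expresses the same bins without a Python-level function.

```python
import pandas as pd

def rating_category(rating):
    """Map a numeric rating to a label (assumed thresholds)."""
    if rating >= 8.0:
        return "Excellent"
    if rating >= 6.0:
        return "Good"
    return "Average"

ratings = pd.Series([9.3, 8.8, 6.8, 4.5], name="Rating")
categories = ratings.apply(rating_category)

# right=False makes each bin include its lower edge, matching the >= tests.
binned = pd.cut(
    ratings,
    bins=[0, 6.0, 8.0, float("inf")],
    labels=["Average", "Good", "Excellent"],
    right=False,
)

print(categories.tolist())
```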

## 🧾 Step 12: Column and Index Names

Inspect the column names with `.columns` and the row index with `.index`.

## 🔃 Step 13: Sorting and Ordering

Sort the data by rating in descending order using `.sort_values()`.
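
Several movies share the same rating, so a second sort key can make the order of ties explicit. A minimal sketch that breaks ties by year (the choice of tie-breaker is an assumption) and renumbers the rows afterwards:

```python
import pandas as pd

df = pd.DataFrame({
    "Movie_Title": [
        "The Dark Knight",
        "Kantara",
        "The Lord of the Rings: The Return of the King",
        "Inception",
    ],
    "Year": [2008, 2022, 2003, 2010],
    "Rating": [9.0, 9.3, 9.0, 8.8],
})

# Highest rating first; among equal ratings, the older film first.
ranked = (
    df.sort_values(by=["Rating", "Year"], ascending=[False, True])
      .reset_index(drop=True)   # replace the shuffled labels with 0..n-1
)

print(ranked["Movie_Title"].tolist())
```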

## 🧼 Step 14: Null Value Checks

Check for missing data using `.isnull().sum()`.
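
A zero from `.isnull().sum()` only means there are no real missing values. Placeholder strings such as the `Gross Unkown` entries in `Total_Gross` are ordinary text and are not counted. The sketch below shows the difference on a hand-built frame (the "Example Movie" row and its missing value are made up).

```python
import pandas as pd

df = pd.DataFrame({
    "Movie_Title": ["Kantara", "The Dark Knight", "Example Movie"],
    "Total_Gross": ["Gross Unkown", "$534.86M", None],
})

null_counts = df.isnull().sum()                               # real missing values only
placeholder_count = (df["Total_Gross"] == "Gross Unkown").sum()  # text placeholders

print(null_counts["Total_Gross"], placeholder_count)
```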

## 🔁 Step 15: Value Replacement

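Replace placeholder values like `'N/A'` with more meaningful ones. Note that `pd.read_csv` already converts `'N/A'` to `NaN` by default, so a literal `'N/A'` string only survives loading if that default is turned off (for example with `keep_default_na=False`).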
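
The same idea applies to the `Gross Unkown` placeholder in `Total_Gross`. The sketch below replaces it with `NaN` and then converts the remaining strings to numbers. It assumes every other value has the form `$<number>M`, which has not been checked against the full column.

```python
import numpy as np
import pandas as pd

gross = pd.Series(["Gross Unkown", "$534.86M", "$377.85M"], name="Total_Gross")

# Turn the placeholder string into a real missing value.
gross_clean = gross.replace("Gross Unkown", np.nan)

# Strip the "$" prefix and "M" suffix, then convert to float (millions).
gross_millions = (
    gross_clean.str.replace("$", "", regex=False)
               .str.replace("M", "", regex=False)
               .astype(float)
)

print(gross_millions.tolist())
```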

## 🗑️ Step 16: Dropping Rows and Columns

Use `.drop()` to remove columns or rows by label, and `.dropna()` to remove rows with missing values.
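
A minimal sketch contrasting the two methods. The missing values in this hand-built frame are made up for illustration, since the real dataset reports no nulls.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Movie_Title": ["Kantara", "The Dark Knight", "Inception"],
    "Rating": [9.3, np.nan, 8.8],     # hypothetical gap
    "Censor": ["UA", "UA", None],     # hypothetical gap
})

without_censor = df.drop(columns=["Censor"])    # remove a column by name
without_first_row = df.drop(index=[0])          # remove a row by index label

complete_rows = df.dropna()                     # rows with no missing values at all
rated_rows = df.dropna(subset=["Rating"])       # only require Rating to be present

print(len(complete_rows), len(rated_rows))
```

All four calls return new frames, so `df` keeps its original shape unless the result is assigned back to it.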

---

✅ This concludes the tutorial overview. The code cells and their outputs follow.

In [1]:
import pandas as pd
import numpy as np

data = pd.read_csv('/content/IMDb_All_Genres_etf_clean1.csv')
data.head()

Unnamed: 0,Movie_Title,Year,Director,Actors,Rating,Runtime(Mins),Censor,Total_Gross,main_genre,side_genre
0,Kantara,2022,Rishab Shetty,"Rishab Shetty, Sapthami Gowda, Kishore Kumar G...",9.3,148,UA,Gross Unkown,Action,"Adventure, Drama"
1,The Dark Knight,2008,Christopher Nolan,"Christian Bale, Heath Ledger, Aaron Eckhart, M...",9.0,152,UA,$534.86M,Action,"Crime, Drama"
2,The Lord of the Rings: The Return of the King,2003,Peter Jackson,"Elijah Wood, Viggo Mortensen, Ian McKellen, Or...",9.0,201,U,$377.85M,Action,"Adventure, Drama"
3,Inception,2010,Christopher Nolan,"Leonardo DiCaprio, Joseph Gordon-Levitt, Ellio...",8.8,148,UA,$292.58M,Action,"Adventure, Sci-Fi"
4,The Lord of the Rings: The Two Towers,2002,Peter Jackson,"Elijah Wood, Ian McKellen, Viggo Mortensen, Or...",8.8,179,UA,$342.55M,Action,"Adventure, Drama"


In [5]:
print("\nColumn as Series (Title):")
print(data['Movie_Title'].head())



Column as Series (Title):
0                                          Kantara
1                                  The Dark Knight
2    The Lord of the Rings: The Return of the King
3                                        Inception
4            The Lord of the Rings: The Two Towers
Name: Movie_Title, dtype: object


In [7]:
print("\nDataFrame with only a few columns:")
print(data[['Movie_Title', 'Year', 'Rating']].head())



DataFrame with only a few columns:
                                     Movie_Title  Year  Rating
0                                        Kantara  2022     9.3
1                                The Dark Knight  2008     9.0
2  The Lord of the Rings: The Return of the King  2003     9.0
3                                      Inception  2010     8.8
4          The Lord of the Rings: The Two Towers  2002     8.8


In [9]:
print("\nFirst 5 rows using .head():")
print(data.head(5))



First 5 rows using .head():
                                     Movie_Title  Year           Director  \
0                                        Kantara  2022      Rishab Shetty   
1                                The Dark Knight  2008  Christopher Nolan   
2  The Lord of the Rings: The Return of the King  2003      Peter Jackson   
3                                      Inception  2010  Christopher Nolan   
4          The Lord of the Rings: The Two Towers  2002      Peter Jackson   

                                              Actors  Rating  Runtime(Mins)  \
0  Rishab Shetty, Sapthami Gowda, Kishore Kumar G...     9.3            148   
1  Christian Bale, Heath Ledger, Aaron Eckhart, M...     9.0            152   
2  Elijah Wood, Viggo Mortensen, Ian McKellen, Or...     9.0            201   
3  Leonardo DiCaprio, Joseph Gordon-Levitt, Ellio...     8.8            148   
4  Elijah Wood, Ian McKellen, Viggo Mortensen, Or...     8.8            179   

  Censor   Total_Gross main_genre

In [10]:
print("\nSelecting a column (Series) - 'Rating':")
print(data['Rating'].head())



Selecting a column (Series) - 'Rating':
0    9.3
1    9.0
2    9.0
3    8.8
4    8.8
Name: Rating, dtype: float64


In [11]:
print("\nSelecting first row using .iloc:")
print(data.iloc[0])



Selecting first row using .iloc:
Movie_Title                                                Kantara
Year                                                          2022
Director                                             Rishab Shetty
Actors           Rishab Shetty, Sapthami Gowda, Kishore Kumar G...
Rating                                                         9.3
Runtime(Mins)                                                  148
Censor                                                          UA
Total_Gross                                           Gross Unkown
main_genre                                                  Action
side_genre                                       Adventure,  Drama
Name: 0, dtype: object


In [12]:
print("\nSelecting specific value [Row 0, 'Movie_Title'] using .loc:")
print(data.loc[0, 'Movie_Title'])



Selecting specific value [Row 0, 'Movie_Title'] using .loc:
Kantara


In [13]:
print("\nMovies with Rating > 8:")
print(data[data['Rating'] > 8].head())



Movies with Rating > 8:
                                     Movie_Title  Year           Director  \
0                                        Kantara  2022      Rishab Shetty   
1                                The Dark Knight  2008  Christopher Nolan   
2  The Lord of the Rings: The Return of the King  2003      Peter Jackson   
3                                      Inception  2010  Christopher Nolan   
4          The Lord of the Rings: The Two Towers  2002      Peter Jackson   

                                              Actors  Rating  Runtime(Mins)  \
0  Rishab Shetty, Sapthami Gowda, Kishore Kumar G...     9.3            148   
1  Christian Bale, Heath Ledger, Aaron Eckhart, M...     9.0            152   
2  Elijah Wood, Viggo Mortensen, Ian McKellen, Or...     9.0            201   
3  Leonardo DiCaprio, Joseph Gordon-Levitt, Ellio...     8.8            148   
4  Elijah Wood, Ian McKellen, Viggo Mortensen, Or...     8.8            179   

  Censor   Total_Gross main_genre    

In [16]:
print("\nMovies of Genre 'Action':")
print(data[data['main_genre'] == 'Action'].head())



Movies of Genre 'Action':
                                     Movie_Title  Year           Director  \
0                                        Kantara  2022      Rishab Shetty   
1                                The Dark Knight  2008  Christopher Nolan   
2  The Lord of the Rings: The Return of the King  2003      Peter Jackson   
3                                      Inception  2010  Christopher Nolan   
4          The Lord of the Rings: The Two Towers  2002      Peter Jackson   

                                              Actors  Rating  Runtime(Mins)  \
0  Rishab Shetty, Sapthami Gowda, Kishore Kumar G...     9.3            148   
1  Christian Bale, Heath Ledger, Aaron Eckhart, M...     9.0            152   
2  Elijah Wood, Viggo Mortensen, Ian McKellen, Or...     9.0            201   
3  Leonardo DiCaprio, Joseph Gordon-Levitt, Ellio...     8.8            148   
4  Elijah Wood, Ian McKellen, Viggo Mortensen, Or...     8.8            179   

  Censor   Total_Gross main_genre  

In [17]:
print("\nSubset with rows 0-4 and columns 'Movie_Title', 'Rating':")
print(data.loc[0:4, ['Movie_Title', 'Rating']])



Subset with rows 0-4 and columns 'Movie_Title', 'Rating':
                                     Movie_Title  Rating
0                                        Kantara     9.3
1                                The Dark Knight     9.0
2  The Lord of the Rings: The Return of the King     9.0
3                                      Inception     8.8
4          The Lord of the Rings: The Two Towers     8.8


In [18]:
print("\nSetting 'Movie_Title' as index:")
data_indexed = data.set_index('Movie_Title')
print(data_indexed.head())



Setting 'Movie_Title' as index:
                                               Year           Director  \
Movie_Title                                                              
Kantara                                        2022      Rishab Shetty   
The Dark Knight                                2008  Christopher Nolan   
The Lord of the Rings: The Return of the King  2003      Peter Jackson   
Inception                                      2010  Christopher Nolan   
The Lord of the Rings: The Two Towers          2002      Peter Jackson   

                                                                                          Actors  \
Movie_Title                                                                                        
Kantara                                        Rishab Shetty, Sapthami Gowda, Kishore Kumar G...   
The Dark Knight                                Christian Bale, Heath Ledger, Aaron Eckhart, M...   
The Lord of the Rings: The Return of the King  Elijah 

In [19]:
print("\nDescriptive statistics:")
print(data.describe())



Descriptive statistics:
              Year       Rating  Runtime(Mins)
count  5562.000000  5562.000000    5562.000000
mean   2002.792521     6.755861     112.226717
std      16.143990     0.937133      21.612655
min    1920.000000     1.000000      45.000000
25%    1997.000000     6.200000      97.000000
50%    2007.000000     6.800000     108.000000
75%    2014.000000     7.400000     123.000000
max    2022.000000     9.300000     321.000000


In [20]:
print("\nMean Rating:")
print(data['Rating'].mean())



Mean Rating:
6.755861201006832


In [21]:
print("\nUnique genres:")
print(data['main_genre'].unique())



Unique genres:
['Action' 'Animation' 'Biography' 'Adventure' 'Western' 'Drama' 'Crime'
 'Comedy' 'Horror' 'Mystery' 'Film-Noir' 'Fantasy' 'Musical']


In [22]:
print("\nGenre value counts:")
print(data['main_genre'].value_counts())



Genre value counts:
main_genre
Action       1577
Comedy       1350
Drama        1027
Crime         447
Biography     355
Animation     321
Adventure     296
Horror        142
Mystery        26
Fantasy        13
Western         4
Film-Noir       3
Musical         1
Name: count, dtype: int64


In [27]:
print("\nApplying custom function to categorize ratings:")
# rating_category must already be defined; its definition is not part of this export.
data['Rating_Category'] = data['Rating'].apply(rating_category)
print(data[['Movie_Title', 'Rating', 'Rating_Category']].head())




Applying custom function to categorize ratings:
                                     Movie_Title  Rating Rating_Category
0                                        Kantara     9.3       Excellent
1                                The Dark Knight     9.0       Excellent
2  The Lord of the Rings: The Return of the King     9.0       Excellent
3                                      Inception     8.8       Excellent
4          The Lord of the Rings: The Two Towers     8.8       Excellent


In [28]:
print("\nColumn Names:")
print(data.columns)
print("\nIndex:")
print(data.index)



Column Names:
Index(['Movie_Title', 'Year', 'Director', 'Actors', 'Rating', 'Runtime(Mins)',
       'Censor', 'Total_Gross', 'main_genre', 'side_genre', 'Rating_Category'],
      dtype='object')

Index:
RangeIndex(start=0, stop=5562, step=1)


In [29]:
print("\nChecking for null values:")
print(data.isnull().sum())



Checking for null values:
Movie_Title        0
Year               0
Director           0
Actors             0
Rating             0
Runtime(Mins)      0
Censor             0
Total_Gross        0
main_genre         0
side_genre         0
Rating_Category    0
dtype: int64


In [30]:
print("\nData sorted by Rating (descending):")
print(data.sort_values(by='Rating', ascending=False).head())



Data sorted by Rating (descending):
                   Movie_Title  Year                Director  \
0                      Kantara  2022           Rishab Shetty   
3448  The Shawshank Redemption  1994          Frank Darabont   
2253             The Godfather  1972    Francis Ford Coppola   
3449            Hababam Sinifi  1975           Ertem Egilmez   
2254                  Aynabaji  2016  Amitabh Reza Chowdhury   

                                                 Actors  Rating  \
0     Rishab Shetty, Sapthami Gowda, Kishore Kumar G...     9.3   
3448  Tim Robbins, Morgan Freeman, Bob Gunton, Willi...     9.3   
2253  Marlon Brando, Al Pacino, James Caan, Diane Ke...     9.2   
3449  Kemal Sunal, Münir Özkul, Halit Akçatepe, Tari...     9.2   
2254  Chanchal Chowdhury, Masuma Rahman Nabila, Part...     9.0   

      Runtime(Mins)     Censor   Total_Gross main_genre           side_genre  \
0               148         UA  Gross Unkown     Action    Adventure,  Drama   
3448           

In [34]:
print("\nReplacing 'N/A' in 'Censor' column with 'Unrated' (if exists):")
if 'Censor' in data.columns:
    data['Censor'] = data['Censor'].replace('N/A', 'Unrated')
    print(data['Censor'].value_counts())



Replacing 'N/A' in 'Censor' column with 'Unrated' (if exists):
Censor
UA           1118
A            1101
U            1023
R             926
Not Rated     495
PG-13         405
18            136
PG            120
16             71
13             53
UA 16+         22
15+            18
7              17
UA 13+         12
G               9
(Banned)        8
UA 7+           7
All             5
12+             5
Unrated         4
U/A             2
18+             2
12              1
M/PG            1
NC-17           1
Name: count, dtype: int64


In [35]:
print("\nDropping column 'Rating_Category':")
data = data.drop('Rating_Category', axis=1)
print(data.head())



Dropping column 'Rating_Category':
                                     Movie_Title  Year           Director  \
0                                        Kantara  2022      Rishab Shetty   
1                                The Dark Knight  2008  Christopher Nolan   
2  The Lord of the Rings: The Return of the King  2003      Peter Jackson   
3                                      Inception  2010  Christopher Nolan   
4          The Lord of the Rings: The Two Towers  2002      Peter Jackson   

                                              Actors  Rating  Runtime(Mins)  \
0  Rishab Shetty, Sapthami Gowda, Kishore Kumar G...     9.3            148   
1  Christian Bale, Heath Ledger, Aaron Eckhart, M...     9.0            152   
2  Elijah Wood, Viggo Mortensen, Ian McKellen, Or...     9.0            201   
3  Leonardo DiCaprio, Joseph Gordon-Levitt, Ellio...     8.8            148   
4  Elijah Wood, Ian McKellen, Viggo Mortensen, Or...     8.8            179   

  Censor   Total_Gross mai

In [36]:
print("\nDropping rows with any null values:")
data_dropped = data.dropna()
print(data_dropped.head())



Dropping rows with any null values:
                                     Movie_Title  Year           Director  \
0                                        Kantara  2022      Rishab Shetty   
1                                The Dark Knight  2008  Christopher Nolan   
2  The Lord of the Rings: The Return of the King  2003      Peter Jackson   
3                                      Inception  2010  Christopher Nolan   
4          The Lord of the Rings: The Two Towers  2002      Peter Jackson   

                                              Actors  Rating  Runtime(Mins)  \
0  Rishab Shetty, Sapthami Gowda, Kishore Kumar G...     9.3            148   
1  Christian Bale, Heath Ledger, Aaron Eckhart, M...     9.0            152   
2  Elijah Wood, Viggo Mortensen, Ian McKellen, Or...     9.0            201   
3  Leonardo DiCaprio, Joseph Gordon-Levitt, Ellio...     8.8            148   
4  Elijah Wood, Ian McKellen, Viggo Mortensen, Or...     8.8            179   

  Censor   Total_Gross ma