Initially, we attempted to scrape movie data directly from IMDb. However, IMDb blocked our requests, even after we added browser-like headers to avoid bot detection.

Fortunately, a Python package imported as `imdb` (the Cinemagoer project, formerly IMDbPY) provides detailed information about movies through its `IMDb` class. The main limitation is that it looks up movies one at a time by name, so it cannot return a ranked list (e.g., the top 250 movies) in a single call that we could save as a CSV file.

To overcome this, we scraped the names of the top 250 movies from a website called [https://digimoviez.com/top-250-movies/](https://digimoviez.com/top-250-movies/). These movie names were stored in a list called `movie_titles`. 

Next, we used the `search_movie` method of the `IMDb` instance to search for each movie by its name, took the first search result, retrieved the relevant information, and stored the data in a CSV file.

Finally, some columns in the dataset contained `NaN` values, often because the first search result was a different title with the same name (for example, a short film or a TV series). We manually replaced these values with data from IMDb. After cleaning, only the `Grossed` column still has missing values (64 rows).

### Web Scraping

In [1]:
import requests
from bs4 import BeautifulSoup
from imdb import IMDb
import time
import csv

In [2]:
# IMDb instance
ia = IMDb()

In [3]:
# URLs
url_movies = "https://digimoviez.com/top-250-movies/"

In [4]:
# Headers to avoid bot detection
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

In [5]:
def get_titles(url):
    """Fetch movie titles from the list page at `url`."""
    response = requests.get(url, headers=headers, timeout=10)

    if response.status_code != 200:
        print(f"Failed to retrieve the page: {url} | Status code: {response.status_code}")
        return []
    
    # Parse HTML content
    soup = BeautifulSoup(response.text, "html.parser")

    # Find all movie/series blocks
    items = soup.find_all("div", class_="loop_item_list")

    # Extract titles
    titles = []
    for item in items:
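        # Guard against blocks without a title heading to avoid an AttributeError
        heading = item.find("h2", class_="title_h2")
        title_tag = heading.find("a") if heading else None
        if title_tag:
            titles.append(title_tag.text.strip())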

    return titles

In [6]:
def fetch_and_store(titles, writer):
    """Fetches IMDb details for each movie/series and writes them to the CSV file."""
    for name in titles:
        try:
            # Search IMDb
            search_results = ia.search_movie(name)
            if not search_results:
                print(f"'{name}' not found on IMDb.")
                continue

            # Get the first matching result
            movie_id = search_results[0].movieID
            movie = ia.get_movie(movie_id)

            # Extract details
            certificates = movie.get('certificates', [])
            certificate = "N/A"

            # Find only the USA rating
            for cert in certificates:
                if "United States" in cert or "USA" in cert:
                    certificate = cert.split(":")[-1] 
                    break  

            # Note: a missing runtime is stored as "N/A min" and has to be fixed during cleaning
            duration = movie.get('runtimes', ['N/A'])[0] + " min"
            genre = ', '.join(movie.get('genres', ['N/A']))
            imdb_rating = movie.get('rating', 'N/A')
            director = ', '.join([d['name'] for d in movie.get('directors', [])])
            stars = ', '.join([a['name'] for a in movie.get('cast', [])[:5]])
            votes = movie.get('votes', 'N/A')
            gross = movie.get('box office', {}).get('Cumulative Worldwide Gross', 'N/A')
            plot = movie.get('plot outline', 'N/A')
            air_date = movie.get('original air date', 'N/A')

            # Write to CSV
            writer.writerow([
                name, certificate, duration, genre, imdb_rating, 
                director, stars, votes, gross, plot, air_date
            ])

            print(f"Fetched and saved: {name}")

            # Pause between requests to avoid IMDb rate limiting
            time.sleep(2)

        except Exception as e:
            print(f"Error fetching data for '{name}': {e}")
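
Taking `search_results[0]` assumes the first hit is the intended film. The cleaning section shows that this is not always true (for example, `Snatch` came back as a 60-minute title with a different cast). Below is a minimal sketch of a stricter selection step. It assumes that each search result exposes `title` and `kind` through `.get()`, as Cinemagoer `Movie` objects are expected to do; `pick_best_match` is a hypothetical helper, not part of the `imdb` package.

```python
def pick_best_match(results, name, preferred_kinds=("movie",)):
    """Return the search result that best matches `name`, or None if there are no results."""
    wanted = name.casefold().strip()

    def score(result):
        title = str(result.get("title", "")).casefold().strip()
        kind = result.get("kind", "")
        # An exact title match counts most, then a preferred kind such as "movie".
        return (title == wanted) * 2 + (kind in preferred_kinds)

    # max() returns the earliest item among ties, so IMDb's own ranking breaks ties.
    return max(results, key=score, default=None)
```

Under these assumptions, `fetch_and_store` could call `pick_best_match(search_results, name)` where it currently takes `search_results[0]`.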

In [7]:
# Fetch titles
movie_titles = get_titles(url_movies)


In [8]:
movie_titles

['The Shawshank Redemption',
 'The Godfather',
 'The Dark Knight',
 'The Godfather Part II',
 '12 Angry Men',
 'The Lord of the Rings: The Return of the King',
 "Schindler's List",
 'Pulp Fiction',
 'The Lord of the Rings: The Fellowship of the Ring',
 'The Good, the Bad and the Ugly',
 'Forrest Gump',
 'The Lord of the Rings: The Two Towers',
 'Fight Club',
 'Inception',
 'Star Wars: Episode V - The Empire Strikes Back',
 'The Matrix',
 'Goodfellas',
 "One Flew Over the Cuckoo's Nest",
 'Interstellar',
 'Se7en',
 "It's a Wonderful Life",
 'Seven Samurai',
 'The Silence of the Lambs',
 'Saving Private Ryan',
 'City of God',
 'The Green Mile',
 'Life Is Beautiful',
 'Terminator 2: Judgment Day',
 'Star Wars: Episode IV - A New Hope',
 'Back to the Future',
 'Spirited Away',
 'The Pianist',
 'Gladiator',
 'Parasite',
 'Psycho',
 'The Lion King',
 'The Departed',
 'Grave of the Fireflies',
 'Whiplash',
 'Spider-Man: Across the Spider-Verse',
 'American History X',
 'The Prestige',
 'Léon:

In [9]:
# CSV file name
csv_filename = "movies_data.csv"

In [10]:
# Open CSV file for writing
with open(csv_filename, mode="w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)

    # Write header row
    writer.writerow([
        "Name", "Certificate", "Duration", "Genre", "IMDb Rating", 
        "Director", "Stars", "Votes", "Grossed", "Plot", "Initial Air Date"
    ])

    # Fetch and store movie details
    fetch_and_store(movie_titles, writer)

print("\nSuccessful")

Fetched and saved: The Shawshank Redemption
Fetched and saved: The Godfather
Fetched and saved: The Dark Knight
Fetched and saved: The Godfather Part II
Fetched and saved: 12 Angry Men
Fetched and saved: The Lord of the Rings: The Return of the King
Fetched and saved: Schindler's List
Fetched and saved: Pulp Fiction
Fetched and saved: The Lord of the Rings: The Fellowship of the Ring
Fetched and saved: The Good, the Bad and the Ugly
Fetched and saved: Forrest Gump
Fetched and saved: The Lord of the Rings: The Two Towers
Fetched and saved: Fight Club
Fetched and saved: Inception
Fetched and saved: Star Wars: Episode V - The Empire Strikes Back
Fetched and saved: The Matrix
Fetched and saved: Goodfellas
Fetched and saved: One Flew Over the Cuckoo's Nest
Fetched and saved: Interstellar
Fetched and saved: Se7en
Fetched and saved: It's a Wonderful Life
Fetched and saved: Seven Samurai
Fetched and saved: The Silence of the Lambs
Fetched and saved: Saving Private Ryan
Fetched and saved: City 

### Loading and Cleaning the Data

In [11]:
import pandas as pd

In [12]:
df = pd.read_csv("movies_data.csv")
df

Unnamed: 0,Name,Certificate,Duration,Genre,IMDb Rating,Director,Stars,Votes,Grossed,Plot,Initial Air Date
0,The Shawshank Redemption,(DLSV),142 min,Drama,9.3,Frank Darabont,"Tim Robbins, Morgan Freeman, Bob Gunton, Willi...",3019590.0,"$58,500,000",Chronicles the experiences of a formerly succe...,23 Sep 1994 (Canada)
1,The Godfather,TV-14,175 min,"Crime, Drama",9.2,Francis Ford Coppola,"Marlon Brando, Al Pacino, James Caan, Richard ...",2107292.0,"$245,066,411","The Godfather ""Don"" Vito Corleone is the head ...",24 Mar 1972 (Canada)
2,The Dark Knight,(LV),152 min,"Action, Crime, Drama, Thriller",9.0,Christopher Nolan,"Christian Bale, Heath Ledger, Aaron Eckhart, M...",2995852.0,"$1,004,558,444, 19 Jul 2012",Set within a year after the events of Batman B...,18 Jul 2008 (Canada)
3,The Godfather Part II,(IFC Rating),202 min,"Crime, Drama",9.0,Francis Ford Coppola,"Al Pacino, Robert Duvall, Diane Keaton, Robert...",1419253.0,,The continuing saga of the Corleone crime fami...,18 Dec 1974 (Canada)
4,12 Angry Men,Approved,96 min,"Crime, Drama",9.0,Sidney Lumet,"Martin Balsam, John Fiedler, Lee J. Cobb, E.G....",916544.0,,"The defense and the prosecution have rested, a...",13 Apr 1957 (Canada)
...,...,...,...,...,...,...,...,...,...,...,...
245,Amores Perros,R,154 min,"Drama, Thriller",8.0,Alejandro G. Iñárritu,"Emilio Echevarría, Gael García Bernal, Goya To...",261018.0,,On the brink of the new Millennium in the bust...,16 Jun 2000 (Mexico)
246,The Help,PG-13,146 min,Drama,8.1,Tate Taylor,"Emma Stone, Viola Davis, Bryce Dallas Howard, ...",509767.0,"$216,639,112","In early-1960s Jackson, Mississippi, Skeeter (...",10 Aug 2011 (Canada)
247,Rebecca,Approved,130 min,"Drama, Mystery, Romance, Thriller",8.1,Alfred Hitchcock,"Laurence Olivier, Joan Fontaine, George Sander...",153247.0,"$7,592,465","A shy lady's companion, staying in Monte Carlo...",07 Jun 2024 (Canada)
248,A Silent Voice: The Movie,Not Rated,130 min,"Animation, Drama",8.1,Naoko Yamada,"Miyu Irino, Saori Hayami, Aoi Yûki, Kenshô Ono...",116801.0,,"The story revolves around Shôko Nishimiya, a g...",17 Sep 2016 (Brazil)


In [13]:
nan_counts = df.isna().sum()
nan_counts

Name                 0
Certificate         12
Duration             0
Genre                0
IMDb Rating          4
Director             7
Stars                5
Votes                4
Grossed             77
Plot                11
Initial Air Date     7
dtype: int64

In [16]:
row_index = df[df["Name"] == "The Good, the Bad and the Ugly"].index

df.loc[row_index, "Certificate"] = "Approved"
df.loc[row_index, "Duration"] = "161 min"
df.loc[row_index, "IMDb Rating"] = 8.8
df.loc[row_index, "Votes"] = 846678
df.loc[row_index, "Grossed"] = "$38,000,000"
df.loc[row_index, "Plot"] = "A bounty-hunting scam joins two men in an uneasy alliance against a third in a race to find a fortune in gold buried in a remote cemetery"
df.loc[row_index, "Initial Air Date"] = "23 Dec 1966"
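
The same select-then-assign pattern is repeated for every movie that needs fixing. One risk of the pattern is that a misspelled name selects no rows, in which case the assignments change nothing and raise no error. Below is a minimal sketch of a helper that applies several corrections at once and fails loudly on an unknown name; `apply_corrections` is a hypothetical name, not a pandas function.

```python
import pandas as pd

def apply_corrections(df, corrections):
    """Overwrite cells in place for each movie name in `corrections` (hypothetical helper)."""
    for name, values in corrections.items():
        mask = df["Name"] == name
        if not mask.any():
            # Fail loudly instead of silently assigning to zero rows.
            raise KeyError(f"No row named {name!r}")
        for column, value in values.items():
            df.loc[mask, column] = value
    return df
```

For example, `apply_corrections(df, {"The Good, the Bad and the Ugly": {"Certificate": "Approved", "Duration": "161 min"}})` would perform the first two assignments of the cell above.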

In [18]:
rows_with_missing_plot = df[df["Plot"].isna()]
rows_with_missing_plot

Unnamed: 0,Name,Certificate,Duration,Genre,IMDb Rating,Director,Stars,Votes,Grossed,Plot,Initial Air Date
28,Star Wars: Episode IV - A New Hope,,16 min,"Short, Action, Drama, Sci-Fi",8.5,,"Peter Barbour, Paul Blake, Ronn Brown, Janice ...",10077.0,,,
37,Grave of the Fireflies,,N/A min,Drama,6.9,,,8.0,,,16 Sep 2024 (Spain)
52,Once Upon a Time in the West,,N/A min,Documentary,,,"Tom Betts, Jay Jennings",,,,
59,The Lives of Others,,N/A min,"Animation, Short",,Bahareh Ahmadi,,,,,10 Jun 2014 (Iran)
120,Snatch,Not Rated,60 min,"Comedy, Crime",6.9,,"Rupert Grint, Luca Pasqualino, Lucien Laviscou...",5325.0,,,
130,For a Few Dollars More,,3 min,"Music, Western",7.5,Meg Pfeiffer,,10.0,,,01 Jul 2020 (Germany)
176,Demon Slayer: Kimetsu no Yaiba - Tsuzumi Mansi...,,87 min,"Animation, Action, Fantasy",8.5,Haruo Sotozaki,,30604.0,,,07 Oct 2021 (Canada)
200,How to Train Your Dragon,(#55372),N/A min,"Action, Adventure, Comedy, Drama, Family, Fantasy",,Dean DeBlois,"Mason Thames, Nico Parker, Gerard Butler, Juli...",,,,13 Jun 2025 (Canada)
207,The Wages of Fear,,N/A min,"Drama, Thriller",,,,,,,
227,The Passion of Joan of Arc,,9 min,Short,4.8,Marie Losier,Marie Losier,13.0,,,2001 (USA)
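
Several of these rows are wrong search matches (a 16-minute and a 9-minute short, entries with only a handful of votes) rather than films with genuinely missing data, so filtering on `Plot` alone can miss similar cases. Below is a minimal sketch of a broader sanity check. The function name `flag_suspect_rows` and the thresholds `min_minutes=40` and `min_votes=1000` are illustrative assumptions, not values used in this notebook.

```python
import pandas as pd

def flag_suspect_rows(df, min_minutes=40, min_votes=1000):
    """Return rows that look like a wrong IMDb match (hypothetical helper)."""
    # Pull the leading number out of strings like "142 min"; "N/A min" becomes NaN.
    minutes = pd.to_numeric(
        df["Duration"].astype(str).str.extract(r"(\d+)")[0], errors="coerce"
    )
    votes = pd.to_numeric(df["Votes"], errors="coerce")

    suspect = (
        minutes.isna()
        | (minutes < min_minutes)
        | votes.isna()
        | (votes < min_votes)
        | df["Genre"].astype(str).str.contains("Short")
    )
    return df[suspect]
```

A check like this does not catch every mismatch (a 60-minute TV series with several thousand votes passes these thresholds), so the flagged rows are a starting point for manual review rather than a complete list.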


In [19]:
row_index = df[df["Name"] == "Star Wars: Episode IV - A New Hope"].index

df.loc[row_index, "Certificate"] = "PG"
df.loc[row_index, "Duration"] = "121 min"
df.loc[row_index, "Director"] = "George Lucas"
df.loc[row_index, "Grossed"] = "$775,000,000"
df.loc[row_index, "Plot"] = "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a Wookiee and two droids to save the galaxy from the Empire's world-destroying battle station, while also attempting to rescue Princess Leia from the mysterious Darth Vader."
df.loc[row_index, "Initial Air Date"] = "25 May 1977"

In [20]:
row_index = df[df["Name"] == "Grave of the Fireflies"].index

df.loc[row_index, "Certificate"] = "PG"
df.loc[row_index, "Duration"] = "89 min"
df.loc[row_index, "IMDb Rating"] = 8.5
df.loc[row_index, "Director"] = "Isao Takahata"
df.loc[row_index, "Stars"] = "Tsutomu Tatsumi, Ayano Shiraishi, Yoshiko Shinohara, Akemi Yamaguchi"
df.loc[row_index, "Votes"] = 150000
df.loc[row_index, "Grossed"] = "$516,962"
df.loc[row_index, "Plot"] = "A young boy and his little sister struggle to survive in Japan during World War II."
df.loc[row_index, "Initial Air Date"] = "16 Apr 1988"

In [21]:
row_index = df[df["Name"] == "Once Upon a Time in the West"].index

df.loc[row_index, "Certificate"] = "PG-13"
df.loc[row_index, "Duration"] = "165 min"
df.loc[row_index, "IMDb Rating"] = 8.5
df.loc[row_index, "Director"] = "Sergio Leone"
df.loc[row_index, "Stars"] = "Henry Fonda, Charles Bronson, Claudia Cardinale, Jason Robards"
df.loc[row_index, "Votes"] = 330000
df.loc[row_index, "Grossed"] = "$5.3 million"
df.loc[row_index, "Plot"] = "A mysterious stranger with a harmonica joins forces with a notorious desperado to protect a beautiful widow from a ruthless assassin working for the railroad."
df.loc[row_index, "Initial Air Date"] = "21 Dec 1968"


In [22]:
row_index = df[df["Name"] == "The Lives of Others"].index

df.loc[row_index, "Certificate"] = "R"
df.loc[row_index, "Duration"] = "137 min"
df.loc[row_index, "IMDb Rating"] = 8.4
df.loc[row_index, "Director"] = "Florian Henckel von Donnersmarck"
df.loc[row_index, "Stars"] = "Ulrich Mühe, Martina Gedeck, Sebastian Koch, Ulrich Tukur"
df.loc[row_index, "Votes"] = 400000
df.loc[row_index, "Grossed"] = "$11.3 million"
df.loc[row_index, "Plot"] = "In 1984 East Berlin, an agent of the secret police, conducting surveillance on a writer and his lover, finds himself becoming increasingly absorbed by their lives."
df.loc[row_index, "Initial Air Date"] = "23 Mar 2006"

In [23]:
row_index = df[df["Name"] == "Snatch"].index

df.loc[row_index, "Certificate"] = "R"
df.loc[row_index, "Duration"] = "102 min"
df.loc[row_index, "IMDb Rating"] = 8.3
df.loc[row_index, "Director"] = "Guy Ritchie"
df.loc[row_index, "Stars"] = "Jason Statham, Brad Pitt, Stephen Graham, Alan Ford"
df.loc[row_index, "Votes"] = 850000
df.loc[row_index, "Grossed"] = "$83.6 million"
df.loc[row_index, "Plot"] = "Unscrupulous boxing promoters, violent bookmakers, a Russian gangster, incompetent amateur robbers, and supposedly Jewish jewelers fight to track down a priceless stolen diamond."
df.loc[row_index, "Initial Air Date"] = "1 Sep 2000"


In [24]:
row_index = df[df["Name"] == "For a Few Dollars More"].index

df.loc[row_index, "Certificate"] = "PG-13"
df.loc[row_index, "Duration"] = "132 min"
df.loc[row_index, "IMDb Rating"] = 8.3
df.loc[row_index, "Director"] = "Sergio Leone"
df.loc[row_index, "Stars"] = "Clint Eastwood, Lee Van Cleef, Gian Maria Volontè, Mara Krupp"
df.loc[row_index, "Votes"] = 250000
df.loc[row_index, "Grossed"] = "$15 million"
df.loc[row_index, "Plot"] = "Two bounty hunters with the same intentions team up to track down a Western outlaw."
df.loc[row_index, "Initial Air Date"] = "10 Dec 1965"

In [25]:
row_index = df[df["Name"] == "The Wages of Fear"].index

df.loc[row_index, "Certificate"] = "PG"
df.loc[row_index, "Duration"] = "131 min"
df.loc[row_index, "IMDb Rating"] = 8.1
df.loc[row_index, "Director"] = "Henri-Georges Clouzot"
df.loc[row_index, "Stars"] = "Yves Montand, Charles Vanel, Peter van Eyck, Folco Lulli"
df.loc[row_index, "Votes"] = 60000
df.loc[row_index, "Grossed"] = "$600,000"
df.loc[row_index, "Plot"] = "In a decrepit South American village, four men are hired to transport an urgent nitroglycerine shipment without the equipment that would make it safe."
df.loc[row_index, "Initial Air Date"] = "22 Apr 1953"


In [26]:
row_index = df[df["Name"] == "The Passion of Joan of Arc"].index

df.loc[row_index, "Certificate"] = "Not Rated"
df.loc[row_index, "Duration"] = "82 min"
df.loc[row_index, "IMDb Rating"] = 8.2
df.loc[row_index, "Director"] = "Carl Theodor Dreyer"
df.loc[row_index, "Stars"] = "Maria Falconetti, Eugene Silvain, André Berley, Maurice Schutz"
df.loc[row_index, "Votes"] = 50000
df.loc[row_index, "Grossed"] = "$21,000" 
df.loc[row_index, "Plot"] = "A classic silent film depicting the trial and execution of Joan of Arc, focusing on her faith and courage."
df.loc[row_index, "Initial Air Date"] = "25 Oct 1928"

In [27]:
df.isna().sum()

Name                 0
Certificate          5
Duration             0
Genre                0
IMDb Rating          1
Director             2
Stars                1
Votes                1
Grossed             69
Plot                 3
Initial Air Date     3
dtype: int64

In [28]:
rows_with_missing_certificate = df[df["Certificate"].isna()]
rows_with_missing_certificate

Unnamed: 0,Name,Certificate,Duration,Genre,IMDb Rating,Director,Stars,Votes,Grossed,Plot,Initial Air Date
60,12th Fail,,147 min,"Biography, Drama",8.8,Vidhu Vinod Chopra,"Vikrant Massey, Medha Shankr, Anant Joshi, Ans...",144337.0,,Manoj Kumar Sharma belongs to Chambal village ...,27 Oct 2023 (Canada)
176,Demon Slayer: Kimetsu no Yaiba - Tsuzumi Mansi...,,87 min,"Animation, Action, Fantasy",8.5,Haruo Sotozaki,,30604.0,,,07 Oct 2021 (Canada)
197,Maharaja,,141 min,"Action, Crime, Drama, Thriller",8.4,Nithilan Saminathan,"Vijay Sethupathi, Anurag Kashyap, Mamta Mohand...",65111.0,"INR1,900,000,000, 26 Jan 2025",A simple and soft spoken barber in a small tow...,13 Jun 2024 (USA)
227,The Passion of Joan of Arc,,9 min,Short,4.8,Marie Losier,Marie Losier,13.0,,,2001 (USA)
241,The Battle of Algiers,,117 min,Documentary,8.0,Malek Bensmaïl,"Brahim Hadjadj, Gillo Pontecorvo, Yacef Saadi",44.0,,The Battle of Algiers is one of the most celeb...,


In [29]:
row_index = df[df["Name"] == "12th Fail"].index

df.loc[row_index, "Certificate"] = "UA"

In [30]:
row_index = df[df["Name"] == "Maharaja"].index

df.loc[row_index, "Certificate"] = "UA"

In [31]:
row_index = df[df["Name"] == "The Battle of Algiers"].index

df.loc[row_index, "Certificate"] = "Not Rated" 
df.loc[row_index, "Grossed"] = "$900,000" 
df.loc[row_index, "Initial Air Date"] = "8 Sep 1966" 

In [32]:
df.isna().sum()

Name                 0
Certificate          2
Duration             0
Genre                0
IMDb Rating          1
Director             2
Stars                1
Votes                1
Grossed             68
Plot                 3
Initial Air Date     2
dtype: int64

In [33]:
rows_with_missing_IMDb_Rating = df[df["IMDb Rating"].isna()]
rows_with_missing_IMDb_Rating 

Unnamed: 0,Name,Certificate,Duration,Genre,IMDb Rating,Director,Stars,Votes,Grossed,Plot,Initial Air Date
200,How to Train Your Dragon,(#55372),N/A min,"Action, Adventure, Comedy, Drama, Family, Fantasy",,Dean DeBlois,"Mason Thames, Nico Parker, Gerard Butler, Juli...",,,,13 Jun 2025 (Canada)


In [34]:
row_index = df[df["Name"] == "How to Train Your Dragon"].index

df.loc[row_index, "Certificate"] = "PG"
df.loc[row_index, "Duration"] = "98 min"
df.loc[row_index, "IMDb Rating"] = 8.1
df.loc[row_index, "Votes"] = 750000
df.loc[row_index, "Grossed"] = "$494.9 million"
df.loc[row_index, "Plot"] = "A young Viking boy named Hiccup aspires to follow his tribe's tradition of becoming a dragon slayer, but befriends a dragon instead and learns there's more to the creatures than he thought."

In [42]:
df.isna().sum()

Name                 0
Certificate          1
Duration             0
Genre                0
IMDb Rating          0
Director             0
Stars                1
Votes                0
Grossed             64
Plot                 1
Initial Air Date     0
dtype: int64

In [36]:
rows_with_missing_Director = df[df["Director"].isna()]
rows_with_missing_Director 

Unnamed: 0,Name,Certificate,Duration,Genre,IMDb Rating,Director,Stars,Votes,Grossed,Plot,Initial Air Date
174,Fargo,TV-MA,60 min,"Crime, Drama, Thriller",8.8,,"Allison Tolman, Billy Bob Thornton, Colin Hank...",441504.0,,"The all new ""true crime"" case of Fargo's new c...",
175,Warrior,TV-MA,60 min,"Action, Crime, Drama, History",8.4,,"Andrew Koji, Olivia Cheng, Jason Tobin, Dianne...",49280.0,,"In the late 1800s, a martial arts prodigy trav...",


In [37]:
row_index = df[df["Name"] == "Fargo"].index

df.loc[row_index, "Duration"] = "98 min"
df.loc[row_index, "Director"] = "Joel Coen, Ethan Coen"
df.loc[row_index, "Grossed"] = "$60.6 million"
df.loc[row_index, "Initial Air Date"] = "5 Apr 1996"

In [38]:
row_index = df[df["Name"] == "Warrior"].index

df.loc[row_index, "Duration"] = "140 min"
df.loc[row_index, "Director"] = "Gavin O'Connor"
df.loc[row_index, "Grossed"] = "$23.1 million"
df.loc[row_index, "Initial Air Date"] = "9 Sep 2011"

In [40]:
rows_with_missing_Plot = df[df["Plot"].isna()]
rows_with_missing_Plot 

Unnamed: 0,Name,Certificate,Duration,Genre,IMDb Rating,Director,Stars,Votes,Grossed,Plot,Initial Air Date
176,Demon Slayer: Kimetsu no Yaiba - Tsuzumi Mansi...,,87 min,"Animation, Action, Fantasy",8.5,Haruo Sotozaki,,30604.0,,,07 Oct 2021 (Canada)
227,The Passion of Joan of Arc,,9 min,Short,4.8,Marie Losier,Marie Losier,13.0,,,2001 (USA)


In [43]:
row_index = df[df["Name"] == "Demon Slayer: Kimetsu no Yaiba - Tsuzumi Mansion Arc"].index

df.loc[row_index, "Certificate"] = "TV-14" 
df.loc[row_index, "Stars"] = "Natsuki Hanae, Akari Kitō, Hiro Shimono, Yoshitsugu Matsuoka"
df.loc[row_index, "Plot"] = "Tanjiro, Zenitsu, and Inosuke investigate mysterious disappearances in the Tsuzumi Mansion, where they face a powerful demon with a connection to the Twelve Kizuki."

In [44]:
df.isna().sum()

Name                 0
Certificate          0
Duration             0
Genre                0
IMDb Rating          0
Director             0
Stars                0
Votes                0
Grossed             64
Plot                 0
Initial Air Date     0
dtype: int64

In [45]:
df

Unnamed: 0,Name,Certificate,Duration,Genre,IMDb Rating,Director,Stars,Votes,Grossed,Plot,Initial Air Date
0,The Shawshank Redemption,(DLSV),142 min,Drama,9.3,Frank Darabont,"Tim Robbins, Morgan Freeman, Bob Gunton, Willi...",3019590.0,"$58,500,000",Chronicles the experiences of a formerly succe...,23 Sep 1994 (Canada)
1,The Godfather,TV-14,175 min,"Crime, Drama",9.2,Francis Ford Coppola,"Marlon Brando, Al Pacino, James Caan, Richard ...",2107292.0,"$245,066,411","The Godfather ""Don"" Vito Corleone is the head ...",24 Mar 1972 (Canada)
2,The Dark Knight,(LV),152 min,"Action, Crime, Drama, Thriller",9.0,Christopher Nolan,"Christian Bale, Heath Ledger, Aaron Eckhart, M...",2995852.0,"$1,004,558,444, 19 Jul 2012",Set within a year after the events of Batman B...,18 Jul 2008 (Canada)
3,The Godfather Part II,(IFC Rating),202 min,"Crime, Drama",9.0,Francis Ford Coppola,"Al Pacino, Robert Duvall, Diane Keaton, Robert...",1419253.0,,The continuing saga of the Corleone crime fami...,18 Dec 1974 (Canada)
4,12 Angry Men,Approved,96 min,"Crime, Drama",9.0,Sidney Lumet,"Martin Balsam, John Fiedler, Lee J. Cobb, E.G....",916544.0,,"The defense and the prosecution have rested, a...",13 Apr 1957 (Canada)
...,...,...,...,...,...,...,...,...,...,...,...
245,Amores Perros,R,154 min,"Drama, Thriller",8.0,Alejandro G. Iñárritu,"Emilio Echevarría, Gael García Bernal, Goya To...",261018.0,,On the brink of the new Millennium in the bust...,16 Jun 2000 (Mexico)
246,The Help,PG-13,146 min,Drama,8.1,Tate Taylor,"Emma Stone, Viola Davis, Bryce Dallas Howard, ...",509767.0,"$216,639,112","In early-1960s Jackson, Mississippi, Skeeter (...",10 Aug 2011 (Canada)
247,Rebecca,Approved,130 min,"Drama, Mystery, Romance, Thriller",8.1,Alfred Hitchcock,"Laurence Olivier, Joan Fontaine, George Sander...",153247.0,"$7,592,465","A shy lady's companion, staying in Monte Carlo...",07 Jun 2024 (Canada)
248,A Silent Voice: The Movie,Not Rated,130 min,"Animation, Drama",8.1,Naoko Yamada,"Miyu Irino, Saori Hayami, Aoi Yûki, Kenshô Ono...",116801.0,,"The story revolves around Shôko Nishimiya, a g...",17 Sep 2016 (Brazil)


In [46]:
df.to_csv("cleaned_data.csv", index=False) 
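
The `Grossed` column in the saved file mixes several text formats, such as `$58,500,000`, `$1,004,558,444, 19 Jul 2012`, `INR1,900,000,000, 26 Jan 2025`, and `$5.3 million`, so it cannot be used numerically as is. Below is a minimal sketch of a parser for these four shapes, assuming no other formats occur in the column; `parse_gross` is a hypothetical helper, not part of pandas or the `imdb` package.

```python
import re
from decimal import Decimal

_GROSS_RE = re.compile(
    r"^\s*(?P<currency>\$|[A-Z]{3})\s*"
    r"(?P<number>\d[\d,]*(?:\.\d+)?)\s*"
    r"(?P<unit>million|billion)?",
    re.IGNORECASE,
)
_UNITS = {"million": 10**6, "billion": 10**9}

def parse_gross(text):
    """Return (currency, amount) for a Grossed string, or None if it cannot be parsed."""
    if not isinstance(text, str):
        return None  # NaN cells arrive as floats
    match = _GROSS_RE.match(text)
    if match is None:
        return None
    currency = "USD" if match["currency"] == "$" else match["currency"].upper()
    # Decimal avoids floating-point rounding for values like "5.3 million".
    number = Decimal(match["number"].replace(",", ""))
    scale = _UNITS.get((match["unit"] or "").lower(), 1)
    return currency, int(number * scale)
```

It could be applied with `df["Grossed"].map(parse_gross)` before any numeric analysis; any trailing date in the string is ignored.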
