<a href="https://colab.research.google.com/github/Arash-Kamboj/IMDb-s-Top-50-Movies-Webscraping-Project-/blob/main/IMDb's_Top_50_Movies_Webscraping_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this project, I scraped the first 50 titles of IMDb's 'Top 250' movie list and extracted each movie's ranking, title, release year, IMDb rating, vote count, and genres, using web scraping and data parsing in Python.

---



In [None]:
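import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/search/title/?groups=top_250&sort=user_rating"
# IMDb may reject requests sent with the default python-requests User-Agent,
# so send a browser-like one (the exact value here is illustrative)
headers = {"User-Agent": "Mozilla/5.0"}

response = requests.get(url, headers=headers, timeout=30)
print(response.status_code)
response.raise_for_status()  # stop early on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(response.content, 'html.parser')

# Each movie sits in its own "lister-item mode-advanced" container. This matches
# the search-page layout when the notebook was written; IMDb's markup changes
# over time, so these selectors may need updating.
movie_containers = soup.find_all('div', {'class': "lister-item mode-advanced"})
print(len(movie_containers))

movie_data = {}
for container in movie_containers:
    movie_info = {}
    # Ranking is shown as "1." in the first <span> of the <h3>
    movie_info['ranking'] = container.h3.span.text.replace('.', '')
    # Title is the text of the link inside the <h3>
    name = container.h3.a.text
    movie_info['name'] = name
    # Release year is shown as "(1994)"; strip the parentheses
    year = container.h3.find('span', {'class': 'lister-item-year text-muted unbold'}).text
    movie_info['year'] = year.strip('()')
    # IMDb rating is the text of the first <strong> in the container
    movie_info['imdb_rating'] = float(container.strong.text)
    # The first <span name="nv"> holds the vote count, e.g. "2,791,397"
    movie_info['votes'] = container.find('span', {'name': 'nv'}).text
    # Genres are a comma-separated list, e.g. "Crime, Drama"
    movie_info['genre'] = container.p.find('span', {'class': 'genre'}).text.strip()
    movie_data[name] = movie_info
print(movie_data)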
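Apart from the rating, the loop above stores every field as a string (the vote count stays as '2,791,397', for example), which makes numeric sorting and filtering awkward. The following is a minimal sketch of post-processing helpers that could convert one scraped record into typed values, assuming the field formats seen in the scraped data; the helper names (parse_ranking, parse_year, parse_votes, parse_genres, to_typed) are hypothetical and not part of the original notebook.

```python
import re

def parse_ranking(raw):
    """Convert a ranking such as '1' or '1.' to an int."""
    return int(raw.strip().rstrip('.'))

def parse_year(raw):
    """Pull the first four-digit year out of a string such as '1994' or '(I) (2019)'."""
    match = re.search(r'\d{4}', raw)
    if match is None:
        raise ValueError(f"no year found in {raw!r}")
    return int(match.group())

def parse_votes(raw):
    """Convert a comma-grouped count such as '2,791,397' to 2791397."""
    return int(raw.replace(',', ''))

def parse_genres(raw):
    """Split 'Crime, Drama' into ['Crime', 'Drama']."""
    return [genre.strip() for genre in raw.split(',') if genre.strip()]

def to_typed(info):
    """Return a copy of one scraped movie record with numeric and list fields."""
    return {
        'ranking': parse_ranking(info['ranking']),
        'name': info['name'],
        'year': parse_year(info['year']),
        'imdb_rating': float(info['imdb_rating']),
        'votes': parse_votes(info['votes']),
        'genre': parse_genres(info['genre']),
    }

# Example record in the same shape as the entries of movie_data
example = {'ranking': '1', 'name': 'The Shawshank Redemption', 'year': '1994',
           'imdb_rating': 9.3, 'votes': '2,791,397', 'genre': 'Drama'}
print(to_typed(example))
# {'ranking': 1, 'name': 'The Shawshank Redemption', 'year': 1994, 'imdb_rating': 9.3, 'votes': 2791397, 'genre': ['Drama']}
```

Converting the whole dataset would then be a dictionary comprehension such as {name: to_typed(info) for name, info in movie_data.items()}.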

In the following cell, I stored the scraped data in a dictionary and wrote a Python program to perform the operations described below:

**Sorting Movie Data by IMDb Ratings:**
After extracting the movie data, sort it by IMDb rating in descending order using the sorted() function with a lambda function as the custom sort key.

**Selecting Top 5 Movies:**
Create a list named top_5_movies containing the five highest-rated movies by slicing the first five entries of the sorted movie data.

**Displaying Top Movies Information:**
Iterate through the top_5_movies list and print the details of each movie.


---



In [4]:
movie_data = {'The Shawshank Redemption': {'ranking': '1', 'name': 'The Shawshank Redemption', 'year': '1994', 'imdb_rating': 9.3, 'votes': '2,791,397', 'genre': 'Drama'}, 'The Godfather': {'ranking': '2', 'name': 'The Godfather', 'year': '1972', 'imdb_rating': 9.2, 'votes': '1,944,229', 'genre': 'Crime, Drama'}, 'The Dark Knight': {'ranking': '3', 'name': 'The Dark Knight', 'year': '2008', 'imdb_rating': 9.0, 'votes': '2,771,569', 'genre': 'Action, Crime, Drama'}, "Schindler's List": {'ranking': '4', 'name': "Schindler's List", 'year': '1993', 'imdb_rating': 9.0, 'votes': '1,403,821', 'genre': 'Biography, Drama, History'}, 'The Godfather Part II': {'ranking': '5', 'name': 'The Godfather Part II', 'year': '1974', 'imdb_rating': 9.0, 'votes': '1,320,870', 'genre': 'Crime, Drama'}, 'The Lord of the Rings: The Return of the King': {'ranking': '6', 'name': 'The Lord of the Rings: The Return of the King', 'year': '2003', 'imdb_rating': 9.0, 'votes': '1,911,924', 'genre': 'Action, Adventure, Drama'}, '12 Angry Men': {'ranking': '7', 'name': '12 Angry Men', 'year': '1957', 'imdb_rating': 9.0, 'votes': '829,179', 'genre': 'Crime, Drama'}, 'Pulp Fiction': {'ranking': '8', 'name': 'Pulp Fiction', 'year': '1994', 'imdb_rating': 8.9, 'votes': '2,141,402', 'genre': 'Crime, Drama'}, 'Spider-Man: Across the Spider-Verse': {'ranking': '9', 'name': 'Spider-Man: Across the Spider-Verse', 'year': '2023', 'imdb_rating': 8.8, 'votes': '243,641', 'genre': 'Animation, Action, Adventure'}, 'Inception': {'ranking': '10', 'name': 'Inception', 'year': '2010', 'imdb_rating': 8.8, 'votes': '2,460,643', 'genre': 'Action, Adventure, Sci-Fi'}, 'The Lord of the Rings: The Fellowship of the Ring': {'ranking': '11', 'name': 'The Lord of the Rings: The Fellowship of the Ring', 'year': '2001', 'imdb_rating': 8.8, 'votes': '1,940,118', 'genre': 'Action, Adventure, Drama'}, 'Fight Club': {'ranking': '12', 'name': 'Fight Club', 'year': '1999', 'imdb_rating': 8.8, 'votes': '2,225,606', 'genre': 'Drama'}, 
'Forrest Gump': {'ranking': '13', 'name': 'Forrest Gump', 'year': '1994', 'imdb_rating': 8.8, 'votes': '2,171,468', 'genre': 'Drama, Romance'}, 'Il buono, il brutto, il cattivo': {'ranking': '14', 'name': 'Il buono, il brutto, il cattivo', 'year': '1966', 'imdb_rating': 8.8, 'votes': '787,484', 'genre': 'Adventure, Western'}, 'The Lord of the Rings: The Two Towers': {'ranking': '15', 'name': 'The Lord of the Rings: The Two Towers', 'year': '2002', 'imdb_rating': 8.8, 'votes': '1,725,289', 'genre': 'Action, Adventure, Drama'}, 'Jai Bhim': {'ranking': '16', 'name': 'Jai Bhim', 'year': '2021', 'imdb_rating': 8.8, 'votes': '209,719', 'genre': 'Crime, Drama, Mystery'}, 'Interstellar': {'ranking': '17', 'name': 'Interstellar', 'year': '2014', 'imdb_rating': 8.7, 'votes': '1,978,037', 'genre': 'Adventure, Drama, Sci-Fi'}, 'Goodfellas': {'ranking': '18', 'name': 'Goodfellas', 'year': '1990', 'imdb_rating': 8.7, 'votes': '1,211,198', 'genre': 'Biography, Crime, Drama'}, 'The Matrix': {'ranking': '19', 'name': 'The Matrix', 'year': '1999', 'imdb_rating': 8.7, 'votes': '1,985,257', 'genre': 'Action, Sci-Fi'}, "One Flew Over the Cuckoo's Nest": {'ranking': '20', 'name': "One Flew Over the Cuckoo's Nest", 'year': '1975', 'imdb_rating': 8.7, 'votes': '1,041,219', 'genre': 'Drama'}, 'Star Wars: Episode V - The Empire Strikes Back': {'ranking': '21', 'name': 'Star Wars: Episode V - The Empire Strikes Back', 'year': '1980', 'imdb_rating': 8.7, 'votes': '1,338,664', 'genre': 'Action, Adventure, Fantasy'}, 'Oppenheimer': {'ranking': '22', 'name': 'Oppenheimer', 'year': '2023', 'imdb_rating': 8.6, 'votes': '399,297', 'genre': 'Biography, Drama, History'}, 'Se7en': {'ranking': '23', 'name': 'Se7en', 'year': '1995', 'imdb_rating': 8.6, 'votes': '1,727,769', 'genre': 'Crime, Drama, Mystery'}, 'The Silence of the Lambs': {'ranking': '24', 'name': 'The Silence of the Lambs', 'year': '1991', 'imdb_rating': 8.6, 'votes': '1,489,061', 'genre': 'Crime, Drama, Thriller'}, 'The Green Mile': 
{'ranking': '25', 'name': 'The Green Mile', 'year': '1999', 'imdb_rating': 8.6, 'votes': '1,356,698', 'genre': 'Crime, Drama, Fantasy'}, 'Saving Private Ryan': {'ranking': '26', 'name': 'Saving Private Ryan', 'year': '1998', 'imdb_rating': 8.6, 'votes': '1,445,823', 'genre': 'Drama, War'}, 'Terminator 2: Judgment Day': {'ranking': '27', 'name': 'Terminator 2: Judgment Day', 'year': '1991', 'imdb_rating': 8.6, 'votes': '1,138,887', 'genre': 'Action, Sci-Fi'}, 'Star Wars': {'ranking': '28', 'name': 'Star Wars', 'year': '1977', 'imdb_rating': 8.6, 'votes': '1,410,486', 'genre': 'Action, Adventure, Fantasy'}, 'Sen to Chihiro no kamikakushi': {'ranking': '29', 'name': 'Sen to Chihiro no kamikakushi', 'year': '2001', 'imdb_rating': 8.6, 'votes': '806,573', 'genre': 'Animation, Adventure, Family'}, 'Cidade de Deus': {'ranking': '30', 'name': 'Cidade de Deus', 'year': '2002', 'imdb_rating': 8.6, 'votes': '779,885', 'genre': 'Crime, Drama'}, 'La vita è bella': {'ranking': '31', 'name': 'La vita è bella', 'year': '1997', 'imdb_rating': 8.6, 'votes': '720,584', 'genre': 'Comedy, Drama, Romance'}, "It's a Wonderful Life": {'ranking': '32', 'name': "It's a Wonderful Life", 'year': '1946', 'imdb_rating': 8.6, 'votes': '477,856', 'genre': 'Drama, Family, Fantasy'}, 'Shichinin no samurai': {'ranking': '33', 'name': 'Shichinin no samurai', 'year': '1954', 'imdb_rating': 8.6, 'votes': '357,156', 'genre': 'Action, Drama'}, 'Seppuku': {'ranking': '34', 'name': 'Seppuku', 'year': '1962', 'imdb_rating': 8.6, 'votes': '63,281', 'genre': 'Action, Drama, Mystery'}, 'Gladiator': {'ranking': '35', 'name': 'Gladiator', 'year': '2000', 'imdb_rating': 8.5, 'votes': '1,558,002', 'genre': 'Action, Adventure, Drama'}, 'The Prestige': {'ranking': '36', 'name': 'The Prestige', 'year': '2006', 'imdb_rating': 8.5, 'votes': '1,391,963', 'genre': 'Drama, Mystery, Sci-Fi'}, 'The Departed': {'ranking': '37', 'name': 'The Departed', 'year': '2006', 'imdb_rating': 8.5, 'votes': '1,376,163', 'genre': 'Crime, Drama, Thriller'}, 
'Back to the Future': {'ranking': '38', 'name': 'Back to the Future', 'year': '1985', 'imdb_rating': 8.5, 'votes': '1,259,666', 'genre': 'Adventure, Comedy, Sci-Fi'}, 'Django Unchained': {'ranking': '39', 'name': 'Django Unchained', 'year': '2012', 'imdb_rating': 8.5, 'votes': '1,629,532', 'genre': 'Drama, Western'}, 'Gisaengchung': {'ranking': '40', 'name': 'Gisaengchung', 'year': '2019', 'imdb_rating': 8.5, 'votes': '882,754', 'genre': 'Drama, Thriller'}, 'Alien': {'ranking': '41', 'name': 'Alien', 'year': '1979', 'imdb_rating': 8.5, 'votes': '916,536', 'genre': 'Horror, Sci-Fi'}, 'Whiplash': {'ranking': '42', 'name': 'Whiplash', 'year': '2014', 'imdb_rating': 8.5, 'votes': '927,771', 'genre': 'Drama, Music'}, 'Léon': {'ranking': '43', 'name': 'Léon', 'year': '1994', 'imdb_rating': 8.5, 'votes': '1,205,981', 'genre': 'Action, Crime, Drama'}, 'The Usual Suspects': {'ranking': '44', 'name': 'The Usual Suspects', 'year': '1995', 'imdb_rating': 8.5, 'votes': '1,116,458', 'genre': 'Crime, Drama, Mystery'}, 'The Pianist': {'ranking': '45', 'name': 'The Pianist', 'year': '2002', 'imdb_rating': 8.5, 'votes': '874,314', 'genre': 'Biography, Drama, Music'}, 'The Lion King': {'ranking': '46', 'name': 'The Lion King', 'year': '1994', 'imdb_rating': 8.5, 'votes': '1,102,361', 'genre': 'Animation, Adventure, Drama'}, 'American History X': {'ranking': '47', 'name': 'American History X', 'year': '1998', 'imdb_rating': 8.5, 'votes': '1,155,786', 'genre': 'Crime, Drama'}, 'Psycho': {'ranking': '48', 'name': 'Psycho', 'year': '1960', 'imdb_rating': 8.5, 'votes': '696,715', 'genre': 'Horror, Mystery, Thriller'}, 'The Intouchables': {'ranking': '49', 'name': 'The Intouchables', 'year': '2011', 'imdb_rating': 8.5, 'votes': '895,241', 'genre': 'Biography, Comedy, Drama'}, 'Casablanca': {'ranking': '50', 'name': 'Casablanca', 'year': '1942', 'imdb_rating': 8.5, 'votes': '590,567', 'genre': 'Drama, Romance, War'}}
# Sort the (name, info) pairs by IMDb rating, highest first
sorted_movie_data = sorted(movie_data.items(), key=lambda x: x[1]['imdb_rating'], reverse=True)
# The first five pairs are the five highest-rated movies
top_5_movies = sorted_movie_data[:5]
for movie_name, movie_info in top_5_movies:
    print(f"Movie: {movie_name}")
    print(f"IMDb Rating: {movie_info['imdb_rating']}")
    print(f"Year: {movie_info['year']}")
    print(f"Votes: {movie_info['votes']}")
    print(f"Genre: {movie_info['genre']}")
    print("=" * 30)  # separator between movies


Movie: The Shawshank Redemption
IMDb Rating: 9.3
Year: 1994
Votes: 2,791,397
Genre: Drama
Movie: The Godfather
IMDb Rating: 9.2
Year: 1972
Votes: 1,944,229
Genre: Crime, Drama
Movie: The Dark Knight
IMDb Rating: 9.0
Year: 2008
Votes: 2,771,569
Genre: Action, Crime, Drama
Movie: Schindler's List
IMDb Rating: 9.0
Year: 1993
Votes: 1,403,821
Genre: Biography, Drama, History
Movie: The Godfather Part II
IMDb Rating: 9.0
Year: 1974
Votes: 1,320,870
Genre: Crime, Drama
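Because sorted() is stable, movies that share a rating (the three 9.0 entries above) simply keep their original dictionary order. The following minimal sketch, which is not part of the original notebook, breaks such ties by vote count and uses heapq.nlargest, which the Python documentation describes as equivalent to sorted(iterable, key=key, reverse=True)[:n] without sorting the whole list. The small sample dictionary copies a few entries from movie_data, and the helper names rating_then_votes and top_k are hypothetical.

```python
import heapq

# A few entries copied from movie_data (votes are still comma-grouped strings)
sample = {
    "The Dark Knight": {"imdb_rating": 9.0, "votes": "2,771,569"},
    "Schindler's List": {"imdb_rating": 9.0, "votes": "1,403,821"},
    "The Godfather Part II": {"imdb_rating": 9.0, "votes": "1,320,870"},
    "The Godfather": {"imdb_rating": 9.2, "votes": "1,944,229"},
    "Pulp Fiction": {"imdb_rating": 8.9, "votes": "2,141,402"},
}

def rating_then_votes(item):
    """Sort key: IMDb rating first, then vote count as the tie-breaker."""
    _name, info = item
    return (info["imdb_rating"], int(info["votes"].replace(",", "")))

def top_k(movies, k):
    """Return the k best (name, info) pairs, best first."""
    return heapq.nlargest(k, movies.items(), key=rating_then_votes)

for name, info in top_k(sample, 3):
    print(f"{name}: {info['imdb_rating']} ({info['votes']} votes)")
# The Godfather: 9.2 (1,944,229 votes)
# The Dark Knight: 9.0 (2,771,569 votes)
# Schindler's List: 9.0 (1,403,821 votes)
```

Applied to the full movie_data, this key would place The Lord of the Rings: The Return of the King (9.0, 1,911,924 votes) fourth and push The Godfather Part II (1,320,870 votes) out of the top five.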


**Analyze User Reviews for a Movie**

Fetch the user reviews for a movie (https://www.imdb.com/title/tt13375076/reviews/?ref_=ttexr_ql_2), display a subset of them along with their sentiment, and analyze whether each review is positive, negative, or neutral.

**Implementation Steps**

In this section, I gather the user reviews from an IMDb reviews page, flag negative sentiment by checking each review against a predefined list of bad words, and print a label for each review based on the result.

In [5]:
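# Reviews page for the movie analysed below
url = "https://www.imdb.com/title/tt13375076/reviews/?ref_=ttexr_ql_2"

# IMDb may reject the default python-requests User-Agent, so send a browser-like one
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=30)
print(response.status_code)
soup = BeautifulSoup(response.content, 'html.parser')

# Each review sits in a "lister-item-content" container (layout at the time of
# writing; IMDb's markup may have changed since)
reviews = soup.find_all('div', {'class': "lister-item-content"})
print(len(reviews))

user_reviews = []
for r in reviews:
    text_div = r.find("div", {"class": "text show-more__control"})
    if text_div is not None:  # skip containers without a review body
        user_reviews.append(text_div.text)
print(user_reviews)

# Keyword heuristic: a review is flagged if it contains any of these phrases
bad_words = ['not good', 'pathetic', 'cannot', 'poor', 'disappointed', 'disappointment', 'bad', 'uninspired', 'negative']

def contains_bad_words(comment_text):
    """Return True if any phrase in bad_words occurs in the text (case-insensitive substring match)."""
    comment_text_lower = comment_text.lower()
    return any(word in comment_text_lower for word in bad_words)

# Number the reviews from 1 without overwriting the user_reviews list
for idx, rev in enumerate(user_reviews, start=1):
    if contains_bad_words(rev):
        print(f"Quote {idx}: Contains bad words")
    else:
        print(f"Quote {idx}: Good")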

200
25
["Like clockwork we have at least 2 possession films each year around the same time. Easter and Halloween it seems. They're usually hit or miss with the same cliches strewn about with maybe some attempts at being slightly innovative. While this doesn't break away too much from the plot formula, Crowe's powerhouse performance does and amplifies the overall film to enticing levels. He has clearly found his late career niche with thrillers/horrors of late and it works extremely well. The scares are serviceable and the story is as well. Visually it's one of the best possession films I've seen in quite sometime with fantastic settings and terrific practical and cgi features. This doesn't mean the cliches don't hinder it at times but overall a very welcome film for the genre.", "Father Gabriele Amorth is dispatched by The Vatican to assist a family, who's youngest member has become possessed by a powerful Demon, Amorth unearths all manner of secrets, some of which The Vatican would pr
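The check above uses plain substring matching, so 'bad' also fires inside words such as 'badge', and it produces only two labels although the task asks for positive, negative, or neutral. The following is a minimal lexicon-based sketch that matches whole words or phrases and returns all three labels. It reuses the notebook's bad_words as negative cues; the POSITIVE_WORDS list, the helper names, and the count-comparison rule are hypothetical illustrations rather than a validated sentiment model.

```python
import re

# Negative cues copied from the notebook's bad_words list
NEGATIVE_WORDS = ['not good', 'pathetic', 'cannot', 'poor', 'disappointed',
                  'disappointment', 'bad', 'uninspired', 'negative']
# Hypothetical positive cues, chosen only for illustration
POSITIVE_WORDS = ['good', 'great', 'excellent', 'fantastic', 'terrific',
                  'best', 'enticing', 'welcome']

def build_pattern(words):
    """Compile a case-insensitive regex matching any cue as a whole word or phrase."""
    # Try longer cues first so a phrase is preferred over a shorter cue at the same position
    alternatives = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    return re.compile(rf'\b(?:{alternatives})\b', re.IGNORECASE)

NEGATIVE_RE = build_pattern(NEGATIVE_WORDS)
POSITIVE_RE = build_pattern(POSITIVE_WORDS)

def classify(review):
    """Label a review 'positive', 'negative' or 'neutral' by comparing cue counts."""
    negative_hits = len(NEGATIVE_RE.findall(review))
    # Blank out negative phrases first so the 'good' in 'not good' is not also counted as positive
    positive_hits = len(POSITIVE_RE.findall(NEGATIVE_RE.sub(' ', review)))
    if positive_hits > negative_hits:
        return 'positive'
    if negative_hits > positive_hits:
        return 'negative'
    return 'neutral'

# Hypothetical short reviews for illustration
sample_reviews = [
    "The movie was not good at all.",
    "Fantastic settings and terrific practical effects.",
    "It is a possession film released around Easter.",
]
for idx, rev in enumerate(sample_reviews, start=1):
    print(f"Quote {idx}: {classify(rev)}")
# Quote 1: negative
# Quote 2: positive
# Quote 3: neutral
```

To label the scraped reviews instead, the same loop could iterate over user_reviews rather than sample_reviews.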

**Summary**

With this project, I extended the web scraping implementation to sort the extracted movie data by IMDb rating and display the five highest-rated movies. Using the sorted() function, I ordered the movies by rating in descending order and then printed key details (movie name, IMDb rating, release year, number of votes, and genre) for each of the top 5 movies. I also scraped the user reviews for one movie and labeled each review with a simple bad-word check.