**Steps to implement web scraping in Python to extract IMDb movies and their ratings:**

*   Import the required modules.



In [45]:
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd



*   Access the HTML content of the webpage by requesting the URL and creating a BeautifulSoup object.



In [46]:
# URL of the IMDb Top Rated Movies page
url = "https://www.imdb.com/chart/top/"

# Headers to mimic a browser request
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36'}

# Send a GET request to the URL
response = requests.get(url, headers=headers)

# Parse the HTML content using BeautifulSoup
beauty = BeautifulSoup(response.content, 'html.parser')
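
The request above is parsed whatever the server returns. A slightly more defensive variant, sketched here as an assumption rather than part of the original notebook, fails fast on an HTTP error status instead of parsing an error page. The helper name `fetch_html` and the 10-second timeout are illustrative choices.

```python
import requests


def fetch_html(url, headers, timeout=10):
    """Return the response body as bytes, raising on HTTP error statuses.

    `fetch_html` and the default timeout are hypothetical, chosen for this sketch.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed to `BeautifulSoup(..., 'html.parser')` exactly as `response.content` is in the cell above.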



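*   Extract the movies and their details. The page embeds the chart as structured JSON-LD data inside a `<script type="application/ld+json">` tag, so we locate that tag in the BeautifulSoup object and parse its contents as JSON instead of walking individual HTML tags.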



In [47]:
# Find the <script> tag with type="application/ld+json"
script_tag = beauty.find('script', type='application/ld+json')

# Extract the JSON data from the script tag
json_data = json.loads(script_tag.string)

# Keep only the first list entry and pretty-print it to inspect the structure
movies = [item["item"] for item in json_data["itemListElement"][:1]]
print(json.dumps(movies, indent=1))

[
 {
  "@type": "Movie",
  "url": "https://www.imdb.com/title/tt0111161/",
  "name": "The Shawshank Redemption",
  "description": "A banker convicted of uxoricide forms a friendship over a quarter century with a hardened convict, while maintaining his innocence and trying to remain hopeful through simple compassion.",
  "image": "https://m.media-amazon.com/images/M/MV5BMDAyY2FhYjctNDc5OS00MDNlLThiMGUtY2UxYWVkNGY2ZjljXkEyXkFqcGc@._V1_.jpg",
  "aggregateRating": {
   "@type": "AggregateRating",
   "bestRating": 10,
   "worstRating": 1,
   "ratingValue": 9.3,
   "ratingCount": 3070035
  },
  "contentRating": "R",
  "genre": "Drama",
  "duration": "PT2H22M"
 }
]
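
The extraction step assumes the JSON-LD block exists and has the expected shape. `find` returns `None` when no matching tag is present, and the page layout may change, so a small validation helper can make failures easier to diagnose. The sketch below works on a hand-written sample string rather than the live page; the function name `extract_items` and the sample data are assumptions for illustration.

```python
import json


def extract_items(raw_json):
    """Return the movie dicts from a JSON-LD ItemList string (hypothetical helper)."""
    data = json.loads(raw_json)
    elements = data.get("itemListElement") if isinstance(data, dict) else None
    if not isinstance(elements, list):
        raise ValueError("JSON-LD block has no 'itemListElement' list")
    # Each element wraps the movie under the "item" key; skip malformed entries
    return [el["item"] for el in elements if isinstance(el, dict) and "item" in el]


# Hand-written sample mirroring the structure printed above
sample = json.dumps({
    "@type": "ItemList",
    "itemListElement": [
        {"@type": "ListItem",
         "item": {"@type": "Movie", "name": "The Shawshank Redemption"}},
    ],
})
sample_movies = extract_items(sample)
print(sample_movies[0]["name"])
```

With the real page, the input would be `script_tag.string`, after first checking that `script_tag` is not `None`.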




*   After extracting the movie details, create an empty list, store each movie's details in a dictionary, and append that dictionary to the list.



In [48]:
result = []
data = json_data['itemListElement']
for entry in data:
    item = entry['item']
    # The JSON-LD text carries HTML entities, so restore apostrophes in titles
    movie_name = item['name'].replace("&apos;", "'")
    movie_genre = item['genre']
    movie_rating = item['aggregateRating']['ratingValue']
    # Durations are ISO 8601 strings such as "PT2H22M"; drop the "PT" prefix
    movie_duration = item['duration'].replace("PT", "")
    final_data = {"Movie": movie_name,
                  "Genre": movie_genre,
                  "Rating": movie_rating,
                  "Duration": movie_duration
                  }
    result.append(final_data)
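
The loop above keeps durations as strings such as `2H22M` and only repairs the `&apos;` entity. If numeric durations or other HTML entities matter, a sketch along these lines could be used. The helper names `duration_to_minutes` and `clean_title` are hypothetical, and the pattern only covers the hour and minute fields seen in the sample output, not the full ISO 8601 duration syntax.

```python
import html
import re

# Matches durations like "PT2H22M", "PT2H" or "PT45M" (hours and minutes only)
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?$")


def duration_to_minutes(duration):
    """Convert an ISO 8601 style duration such as 'PT2H22M' to total minutes."""
    match = _DURATION.match(duration)
    if match is None or not any(match.groups()):
        raise ValueError(f"Unsupported duration: {duration!r}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + minutes


def clean_title(name):
    """Decode HTML entities (for example &apos; and &amp;) in a title."""
    return html.unescape(name)


print(duration_to_minutes("PT2H22M"))        # 142
print(clean_title("Schindler&apos;s List"))  # Schindler's List
```

Storing minutes as an integer makes the column sortable once the data is loaded into a DataFrame.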



*   Our list now holds the top IMDb movies along with their details. Display each movie's details.



In [49]:
for movie in result:
    print(movie['Movie'], movie['Genre'], '(' + str(movie['Rating']) + ')', movie['Duration'])

The Shawshank Redemption Drama (9.3) 2H22M
The Godfather Crime, Drama (9.2) 2H55M
The Dark Knight Action, Crime, Drama (9) 2H32M
The Godfather Part II Crime, Drama (9) 3H22M
12 Angry Men Crime, Drama (9) 1H36M
The Lord of the Rings: The Return of the King Adventure, Drama, Fantasy (9) 3H21M
Schindler's List Biography, Drama, History (9) 3H15M
Pulp Fiction Crime, Drama (8.8) 2H34M
The Lord of the Rings: The Fellowship of the Ring Adventure, Drama, Fantasy (8.9) 2H58M
Il buono, il brutto, il cattivo Adventure, Drama, Western (8.8) 2H58M
Forrest Gump Drama, Romance (8.8) 2H22M
The Lord of the Rings: The Two Towers Adventure, Drama, Fantasy (8.8) 2H59M
Fight Club Crime, Drama, Thriller (8.8) 2H19M
Inception Action, Adventure, Sci-Fi (8.8) 2H28M
Star Wars: Episode V - The Empire Strikes Back Action, Adventure, Fantasy (8.7) 2H4M
The Matrix Action, Sci-Fi (8.7) 2H16M
GoodFellas Biography, Crime, Drama (8.7) 2H25M
Interstellar Adventure, Drama, Sci-Fi (8.7) 2H49M
One Flew Over the Cuckoo's Ne



*   With the following lines of code, the same data can be saved to a .csv file for further use as a dataset.



In [50]:
# Convert the result to a DataFrame,
# then write it to a .csv file
df = pd.DataFrame(result)
df.to_csv('imdb_top_250_movies.csv',index=False)
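
To confirm that the file can be used as a dataset, it can be read back with `pd.read_csv`. The sketch below uses two hand-copied rows and a temporary directory so it does not depend on the scrape; `sample_result` and the file name `imdb_sample.csv` are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Two rows copied from the output above, standing in for the full result list
sample_result = [
    {"Movie": "The Shawshank Redemption", "Genre": "Drama",
     "Rating": 9.3, "Duration": "2H22M"},
    {"Movie": "The Godfather", "Genre": "Crime, Drama",
     "Rating": 9.2, "Duration": "2H55M"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "imdb_sample.csv")
    # index=False keeps the DataFrame's row index out of the file
    pd.DataFrame(sample_result).to_csv(csv_path, index=False)
    loaded = pd.read_csv(csv_path)

print(loaded.shape)
print(loaded.dtypes)
```

Genres that contain commas, such as `Crime, Drama`, are quoted on write and restored as a single field on read.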