## Let’s start writing the script by requesting the content of this single web page:
http://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1. 

#### The URL has several parameters after the question mark:

- **release_date**: Shows only the movies released in a **specific year.**
- **sort**: Sorts the movies on the page. **sort=num_votes,desc** translates to sort by number of votes in descending order.
- **page**: Specifies the **page number**.
- **ref_**: Takes us to the next or the previous page. The reference is the page we are currently on. **adv_nxt** and **adv_prv** are the **two possible values**. They translate to advance to the next page and advance to the previous page, respectively.
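
As a minimal sketch, the same query string can be assembled from a dictionary of parameters with the standard library’s `urlencode()`. The names `BASE_URL` and `build_search_url` are hypothetical and exist only for this illustration; `safe=','` keeps the comma in the sort value unescaped so the result matches the URL shown above.

```python
from urllib.parse import urlencode

# Hypothetical constant for this sketch: the part of the URL before the question mark
BASE_URL = 'http://www.imdb.com/search/title'

def build_search_url(year, page, sort='num_votes,desc'):
    """Assemble the search URL from its query parameters."""
    params = {'release_date': year, 'sort': sort, 'page': page}
    # safe=',' leaves the comma in 'num_votes,desc' as-is instead of writing %2C
    return BASE_URL + '?' + urlencode(params, safe=',')

print(build_search_url(2017, 1))
```

Changing `year` or `page` in the call is enough to address a different results page.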

![Imgur](https://i.imgur.com/szahWTe.png)

#### In the following code cell we will:

- Import the **get()** function from the **requests** module.
- Assign the **address** of the web page to a variable named **url.**
- Request the **content** of the web page from the server by using **get()**, and store the server’s response in the variable response.
- Print a small part of the **response’s content** by accessing its **.text attribute** (response is now a Response object).

In [1]:
from requests import get
url = 'http://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1'
response = get(url)
print(response.text[:500])




<!DOCTYPE html>
<html
    xmlns:og="http://ogp.me/ns#"
    xmlns:fb="http://www.facebook.com/2008/fbml">
    <head>
         
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">

    <meta name="apple-itunes-app" content="app-id=342792525, app-argument=imdb:///?src=mdot">



        <script type="text/javascript">var IMDbTimer={starttime: new Date().getTime(),pt:'java'};</script>

<script>
    if (typeof uet == 'function') {
      uet("bb", "LoadTitle"


#### Let’s check the status code

In [2]:
response.status_code

200
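
A status code of 200 means the request succeeded. As a small illustration that is not part of the original notebook, the standard library’s `http.HTTPStatus` enum can translate a numeric code into its standard phrase; the helper name `describe_status` is hypothetical.

```python
from http import HTTPStatus

def describe_status(code):
    """Return the numeric code together with its standard reason phrase."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    return '{} {}'.format(status.value, status.phrase)

print(describe_status(200))
print(describe_status(404))
```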

## Using BeautifulSoup to parse the HTML content

#### In the following code cell we will:

- Import the BeautifulSoup class from the bs4 package.
- Parse response.text by creating a BeautifulSoup object, and assign this object to html_soup. 
- The 'html.parser' argument indicates that we want to do the parsing using Python’s built-in HTML parser.

In [3]:
from bs4 import BeautifulSoup
html_soup = BeautifulSoup(response.text, 'html.parser')
type(html_soup)

bs4.BeautifulSoup

#### If you inspect the HTML lines of the containers of interest, you’ll notice that the class attribute has two values: lister-item and mode-advanced. 

- Now let’s use the **find_all()** method to extract all the div containers that have a class attribute of **lister-item mode-advanced:**

In [4]:
movie_containers = html_soup.find_all('div', class_ = 'lister-item mode-advanced')
print(type(movie_containers))
print(len(movie_containers))

<class 'bs4.element.ResultSet'>
50


#### Now we’ll select only the first container, and extract, by turn, each item of interest:

- The name of the movie.
- The year of release.
- The IMDB rating.
- The Metascore.
- The number of votes.

![Imgur](https://i.imgur.com/Nh6UjKX.jpg)

## Extracting the data for a single movie

- We can access the first container, which contains information about a **single movie**, by using list notation on movie_containers.

In [5]:
first_movie = movie_containers[0]

In [18]:
print(first_movie)

<div class="lister-item mode-advanced">
<div class="lister-top-right">
<div class="ribbonize" data-caller="filmosearch" data-tconst="tt3315342"></div>
</div>
<div class="lister-item-image float-left">
<a href="/title/tt3315342/"> <img alt="Logan" class="loadlate" data-tconst="tt3315342" height="98" loadlate="https://m.media-amazon.com/images/M/MV5BYzc5MTU4N2EtYTkyMi00NjdhLTg3NWEtMTY4OTEyMzJhZTAzXkEyXkFqcGdeQXVyNjc1NTYyMjg@._V1_UX67_CR0,0,67,98_AL_.jpg" src="https://m.media-amazon.com/images/G/01/imdb/images/nopicture/large/film-184890147._CB470041630_.png" width="67"/>
</a> </div>
<div class="lister-item-content">
<h3 class="lister-item-header">
<span class="lister-item-index unbold text-primary">1.</span>
<a href="/title/tt3315342/">Logan</a>
<span class="lister-item-year text-muted unbold">(2017)</span>
</h3>
<p class="text-muted">
<span class="certificate">A</span>
<span class="ghost">|</span>
<span class="runtime">137 min</span>
<span class="ghost">|</span>
<span class="genre">
Act

![Imgur](https://i.imgur.com/hWr7u0D.jpg)

#### If we run first_movie.div, we only get the content of the first div tag:

In [6]:
first_movie.div

<div class="lister-top-right">
<div class="ribbonize" data-caller="filmosearch" data-tconst="tt3315342"></div>
</div>

- Accessing the first **"anchor tag"** doesn’t take us to the movie’s name.
- The first **a** is somewhere within the second div:

In [7]:
first_movie.a

<a href="/title/tt3315342/"> <img alt="Logan" class="loadlate" data-tconst="tt3315342" height="98" loadlate="https://m.media-amazon.com/images/M/MV5BYzc5MTU4N2EtYTkyMi00NjdhLTg3NWEtMTY4OTEyMzJhZTAzXkEyXkFqcGdeQXVyNjc1NTYyMjg@._V1_UX67_CR0,0,67,98_AL_.jpg" src="https://m.media-amazon.com/images/G/01/imdb/images/nopicture/large/film-184890147._CB470041630_.png" width="67"/>
</a>

#### However, accessing the first h3 tag brings us very close:

In [8]:
first_movie.h3

<h3 class="lister-item-header">
<span class="lister-item-index unbold text-primary">1.</span>
<a href="/title/tt3315342/">Logan</a>
<span class="lister-item-year text-muted unbold">(2017)</span>
</h3>

 - From here, we can use **attribute notation** to access the first **a** inside the **h3** tag:

In [9]:
first_movie.h3.a

<a href="/title/tt3315342/">Logan</a>

In [10]:
first_name = first_movie.h3.a.text

In [11]:
print(first_name)

Logan


## Let’s find out the year of the movie’s release

- We move on with extracting the year.
- This data is stored within the <span> tag below the <a> that contains the name.

![Imgur](https://i.imgur.com/UOLcNd4.png)

- Dot notation will only access the **first span element.** 
- We’ll search by the distinctive mark of the **second "span".**
- We’ll use the **find()** method which is almost the same as **find_all()**, except that it only returns the first match.
- In fact, find() behaves like **find_all(limit = 1)**, except that it returns the matching element itself (or None if there is no match) rather than a list.
- The limit argument **limits** the output to the first match.
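
The relationship between find() and find_all() can be seen on a tiny document. This is a sketch only: the HTML string below is a hypothetical, simplified stand-in for one movie header, not the real page markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on a movie header
html = '''
<h3>
  <span class="lister-item-index">1.</span>
  <a href="/title/tt3315342/">Logan</a>
  <span class="lister-item-year">(2017)</span>
</h3>
'''
soup = BeautifulSoup(html, 'html.parser')

first_span = soup.find('span')               # a single Tag
span_list = soup.find_all('span', limit=1)   # a list-like ResultSet holding at most one Tag
missing = soup.find('strong')                # there is no <strong> here, so this is None

print(first_span.text)
print(len(span_list), span_list[0].text)
print(missing)
```

Because find() can return None, it is worth checking the result before accessing attributes such as .text on it.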

#### The distinguishing mark consists of the values lister-item-year text-muted unbold assigned to the class attribute.
#### So we look for the first "span" with these values within the "h3" tag:

In [12]:
first_year = first_movie.h3.find('span', class_ = 'lister-item-year text-muted unbold')
first_year

<span class="lister-item-year text-muted unbold">(2017)</span>

#### From here, we just access the text using attribute notation:

In [13]:
first_year = first_year.text
first_year

'(2017)'

## Let’s find out the IMDB rating:

- If you inspect the IMDB rating using DevTools, you’ll notice that the rating is contained within a <strong> tag.

![Imgur](https://i.imgur.com/FRePWmp.png)

In [14]:
first_movie.strong

<strong>8.1</strong>

####  We’ll access the text, convert it to the float type, and assign it to the variable first_imdb:

In [15]:
first_imdb = float(first_movie.strong.text)
first_imdb

8.1

## Let’s find the Metascore

- If we inspect the Metascore using **DevTools**, we’ll notice that we can find it within a span tag.

![Imgur](https://i.imgur.com/yVMIUbm.png)

#### Problem:
- There are many **span** tags before that.
- You can see one right above the **strong** tag. 
- We’d better use the distinctive values of the class attribute **(metascore favorable).**

In [16]:
first_mscore = first_movie.find('span', class_ = 'metascore favorable')
first_mscore = int(first_mscore.text)
print(first_mscore)

77


### Note
- If you copy-paste those values from **DevTools’ tab**, there will be two white space characters between **metascore and favorable.** 
- Make sure there will be only **one whitespace character** when you pass the **values as arguments** to the class_ parameter.
- Otherwise, **find()** won’t find anything.
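
The effect of the extra whitespace can be reproduced on a one-line snippet. This is a sketch under the assumption that the real markup looks like the hypothetical string below; it also shows that passing a single class value is enough, which sidesteps the whitespace issue entirely.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line snippet modeled on the Metascore markup
html = '<div><span class="metascore favorable">77</span></div>'
soup = BeautifulSoup(html, 'html.parser')

exact = soup.find('span', class_='metascore favorable')    # one space: matches
double = soup.find('span', class_='metascore  favorable')  # two spaces: no match
single = soup.find('span', class_='metascore')             # one class value is enough

print(exact is not None)
print(double is None)
print(single is not None)
```

Searching on the single value `metascore` would presumably also match spans whose second class value is something other than `favorable`.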

## Let’s find the number of votes

- The number of votes is contained within a **span tag.**
- Its distinctive mark is a **name attribute** with **the value nv.**

![Imgur](https://i.imgur.com/3W4WGBA.png)

In [17]:
first_votes = first_movie.find('span', attrs = {'name':'nv'})
first_votes

<span data-value="573425" name="nv">573,425</span>

#### Points to Remember:
- We could use **.text** notation to access the **span tag’s** content. 
- It would be better though if we accessed the value of the **data-value attribute.**
- This way we can convert the extracted datapoint to an int without having to **strip a comma.**

- You can treat a **Tag object** just like a dictionary.
- The HTML attributes are the **dictionary’s keys.**
- The values of the **HTML attributes** are the values of the dictionary’s keys.
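
A minimal sketch of the dictionary-like behavior, using a standalone copy of the span shown above. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = '<span data-value="573425" name="nv">573,425</span>'
votes_tag = BeautifulSoup(html, 'html.parser').span

print(votes_tag['data-value'])   # square brackets, as with a dictionary key
print(votes_tag.attrs)           # the underlying mapping of attribute names to values
print(votes_tag.get('title'))    # .get() returns None for an attribute that is absent

# Two routes to the same integer: the attribute needs no cleanup, the text does
from_attribute = int(votes_tag['data-value'])
from_text = int(votes_tag.text.replace(',', ''))
print(from_attribute == from_text)
```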

In [19]:
first_votes['data-value']

'573425'

#### Let’s convert that value to an integer, and assign it to first_votes:

In [20]:
first_votes = int(first_votes['data-value'])

## The script for a single page

- Before piecing together what we’ve done so far, we have to make sure that we’ll **extract the data** only from the **containers that have a Metascore.**
- We need to add a condition to **skip movies** without a Metascore.

![Imgur](https://i.imgur.com/Nt3OrwK.jpg)

#### Identifying the Metascore section of a container
- Using **DevTools** again, we see that the **Metascore section** is contained within a **"div" tag.**
- The class attribute has two values:**inline-block and ratings-metascore.**
- The distinctive one is clearly **ratings-metascore**.

![Imgur](https://i.imgur.com/Pve6KHt.png)

In [21]:
eighth_movie_mscore = movie_containers[7].find('div', class_ = 'ratings-metascore')
type(eighth_movie_mscore)

bs4.element.Tag

### In the next code block we:

- Declare some **list variables** to have something to store the extracted data in.
- Loop through each container in **movie_containers** (the variable which contains all the 50 movie containers).
- Extract the **data points** of interest only if the container has a **Metascore.**

In [22]:
# Lists to store the scraped data in
names = []
years = []
imdb_ratings = []
metascores = []
votes = []

# Extract data from individual movie container
for container in movie_containers:
    # If the movie has Metascore, then extract:
    if container.find('div', class_ = 'ratings-metascore') is not None:
        # The name
        name = container.h3.a.text
        names.append(name)
        # The year
        year = container.h3.find('span', class_ = 'lister-item-year').text
        years.append(year)
        # The IMDB rating
        imdb = float(container.strong.text)
        imdb_ratings.append(imdb)
        # The Metascore
        m_score = container.find('span', class_ = 'metascore').text
        metascores.append(int(m_score))
        # The number of votes
        vote = container.find('span', attrs = {'name':'nv'})['data-value']
        votes.append(int(vote))

### Let’s check the data collected so far:

In [23]:
import pandas as pd
test_df = pd.DataFrame({'movie': names,
                        'year': years,
                        'imdb': imdb_ratings,
                        'metascore': metascores,
                        'votes': votes
                        })
print(test_df.info())
test_df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 43 entries, 0 to 42
Data columns (total 5 columns):
movie        43 non-null object
year         43 non-null object
imdb         43 non-null float64
metascore    43 non-null int64
votes        43 non-null int64
dtypes: float64(1), int64(2), object(2)
memory usage: 1.8+ KB
None


| | movie | year | imdb | metascore | votes |
|---|---|---|---|---|---|
| 0 | Logan | (2017) | 8.1 | 77 | 573425 |
| 1 | Thor: Ragnarok | (2017) | 7.9 | 74 | 503658 |
| 2 | Guardians of the Galaxy Vol. 2 | (2017) | 7.6 | 67 | 499410 |
| 3 | Wonder Woman | (2017) | 7.4 | 76 | 497611 |
| 4 | Dunkirk | (2017) | 7.9 | 94 | 477577 |
| 5 | Star Wars: Episode VIII - The Last Jedi | (2017) | 7.1 | 85 | 472072 |
| 6 | Spider-Man: Homecoming | (2017) | 7.5 | 73 | 447821 |
| 7 | Get Out | (I) (2017) | 7.7 | 84 | 416123 |
| 8 | Blade Runner 2049 | (2017) | 8.0 | 81 | 387307 |
| 9 | Baby Driver | (2017) | 7.6 | 86 | 377328 |


## The script for multiple pages
- **Scraping multiple pages** is a bit more challenging. 
- We’ll build upon our **one-page script** by doing three more things:
    - Making all the requests we want from within the loop.
    - Controlling the loop’s rate to avoid bombarding the server with requests.
    - Monitoring the loop while it runs.

### Points to remember
- We’ll scrape the **first 4 pages** of each year in the interval **2000-2017**. 
- 4 pages for each of the 18 years makes for a **total of 72 pages.**
- Each page has **50 movies**, so we’ll scrape data for **3600 movies** at most. 
- But **not** all the movies have a **Metascore**, so the number will be lower than that. 
- Even so, we are still very likely to get **data for over 2000 movies.**
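
The arithmetic above can be checked with a short sketch. The variable names here are illustrative, and `MOVIES_PER_PAGE` is taken from the 50 containers found on the single page scraped earlier.

```python
from itertools import product

pages = range(1, 5)          # pages 1 to 4
years = range(2000, 2018)    # years 2000 to 2017, both included

MOVIES_PER_PAGE = 50  # number of containers observed on one page

# Every (year, page) pair corresponds to one request
combinations = list(product(years, pages))
total_requests = len(combinations)
max_movies = total_requests * MOVIES_PER_PAGE

print(total_requests)
print(max_movies)
```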

- As we are making the requests, we’ll have to vary the values of only two parameters of the URL: **release_date** and **page.**

In [24]:
pages = [str(i) for i in range(1,5)]
years_url = [str(i) for i in range(2000,2018)]

## Controlling the crawl -  rate

- If we avoid hammering the **server with tens of requests** per second, then we are much less likely to get our **IP address banned**. 

#### We’ll control the loop’s rate by using 

- **sleep()** from Python’s time module.
    - It pauses the execution of the loop for a specified number of seconds.

- **randint()** from Python’s random module. 
    - It randomly generates an integer within a specified interval.
    - We’ll use it to vary the waiting time between requests.
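
The two functions can be combined into a small helper. This is only a sketch: `polite_pause` and its `sleeper` parameter are hypothetical names, and the stand-in sleeper is there so the demonstration finishes instantly instead of actually waiting.

```python
from random import randint
from time import sleep

def polite_pause(min_seconds, max_seconds, sleeper=sleep):
    """Wait a random whole number of seconds and return how long the wait was."""
    delay = randint(min_seconds, max_seconds)  # both endpoints are included
    sleeper(delay)
    return delay

# Demonstration with a stand-in sleeper that only records the requested delays
waited = []
delays = [polite_pause(8, 15, sleeper=waited.append) for _ in range(5)]
print(delays)
```

Calling `polite_pause(8, 15)` without the `sleeper` argument would really pause execution for 8 to 15 seconds.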


In [25]:
from time import sleep
from random import randint

## Monitoring the loop as it’s still going

-  The greater the **number of pages**, the more helpful the **monitoring** becomes.
- It can be very **helpful** in the **testing and debugging process.**

#### For our script, we’ll make use of this feature, and monitor the following parameters:

- The frequency (speed) of requests, calculated by dividing the **number of requests by the time elapsed**, so we can make sure our program is **not overloading the server.**
- The number of requests, so we can halt the loop in case the **number of expected requests** is exceeded.
- The status code of our requests, so we make sure the **server is sending** back the **proper responses.**

### Steps Followed:

- Set a starting time using the **time() function** from the time module, and assign the value to start_time.
- Assign 0 to the variable requests which we’ll use to count the **number of requests.**
- Start a loop, and then with each iteration:
    - Simulate a **request.**
    - Increment the **number of requests** by 1.
    - Pause the loop for a random interval between **1 and 3 seconds** (the full script will wait between 8 and 15 seconds).
    - Calculate the **elapsed time** since the first request, and assign the value to elapsed_time.
    - Print the **number of requests** and the **frequency.**

In [26]:
from time import time
start_time = time()
requests = 0
for _ in range(5):
    # A request would go here
    requests += 1
    sleep(randint(1,3))
    elapsed_time = time() - start_time
    print('Request: {}; Frequency: {} requests/s'.format(requests, requests/elapsed_time))

Request: 1; Frequency: 0.3331971989631756 requests/s
Request: 2; Frequency: 0.49942764667495376 requests/s
Request: 3; Frequency: 0.4994856250501156 requests/s
Request: 4; Frequency: 0.4994540668484628 requests/s
Request: 5; Frequency: 0.4995029745901937 requests/s


- Since we’re going to make **72 requests**, our work will look a bit untidy as the **output accumulates.**
- To avoid that, we’ll **clear** the output after **each iteration**, and **replace** it with information about the **most recent request.** 
- To do that we’ll use the **clear_output() function** from **IPython’s core.display module.**

In [27]:
from IPython.core.display import clear_output
start_time = time()
requests = 0
for _ in range(5):
    # A request would go here
    requests += 1
    sleep(randint(1,3))
    current_time = time()
    elapsed_time = current_time - start_time
    print('Request: {}; Frequency: {} requests/s'.format(requests, requests/elapsed_time))
    # Clear the previous message as soon as the next one is ready to be shown
    clear_output(wait = True)

Request: 5; Frequency: 0.45377494675238506 requests/s


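- To be alerted to unexpected responses without stopping the loop, we’ll use the **warn()** function from Python’s **warnings** module. A quick simulation:

In [28]: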
from warnings import warn 
warn("Warning Simulation")

  


- If you run the code from a country where **English is not the main language,** it’s very likely that you’ll get some of the movie names translated into the **main language of that country.**

#### This may happen if you’re using a VPN while you’re making the GET requests.

In [None]:
headers = {"Accept-Language": "en-US, en;q=0.5"}
# The q parameter indicates the degree to which we prefer a certain language. If not specified,
# the value defaults to 1, as in the case of en-US.
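
To make the role of q concrete, here is a simplified sketch that splits the header value into (language, q) pairs. The function `parse_accept_language` is a hypothetical helper written for this illustration; it is not part of requests and it ignores edge cases of the full HTTP grammar.

```python
def parse_accept_language(header):
    """Split an Accept-Language value into (language, q) pairs."""
    preferences = []
    for part in header.split(','):
        pieces = [piece.strip() for piece in part.split(';')]
        language = pieces[0]
        quality = 1.0  # default weight when no q parameter is given
        for piece in pieces[1:]:
            if piece.startswith('q='):
                quality = float(piece[2:])
        preferences.append((language, quality))
    return preferences

headers = {"Accept-Language": "en-US, en;q=0.5"}
print(parse_accept_language(headers["Accept-Language"]))
```

The result shows en-US with the default weight of 1 and plain en as a lower-priority fallback.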

## Piecing everything together

Now let’s piece together everything we’ve done so far! In the following code cell, we start by:

- **Redeclaring the list** variables so they become empty again.
- Preparing the monitoring of the loop.

Then, we’ll:

- Loop through the **years_url list** to vary the **release_date parameter of the URL.**
- For each element in years_url, loop through the **pages list** to vary the **page parameter of the URL.**
- Make the **GET requests** within the pages loop (and give the headers parameter the right value to make sure we get only English content).
- **Pause the loop** for a time interval between **8 and 15 seconds.**
- Monitor each request as discussed before.
- **Throw a warning** for **non-200 status codes.**
- Break the loop if the **number of requests** is greater than expected.
- Convert the **response’s HTML content** to a **BeautifulSoup object**.
- Extract all **movie containers** from this **BeautifulSoup object.**
- **Loop** through all these **containers.**
- **Extract the data** if a container has a **Metascore.**

In [32]:
# Redeclaring the lists to store data in
names = []
years = []
imdb_ratings = []
metascores = []
votes = []

# Preparing the monitoring of the loop
start_time = time()
requests = 0

# For every year in the interval 2000-2017
for year_url in years_url:

    # For every page in the interval 1-4
    for page in pages:

        # Make a get request
        response = get('http://www.imdb.com/search/title?release_date=' + year_url +
        '&sort=num_votes,desc&page=' + page, headers = headers)

        # Pause the loop
        sleep(randint(8,15))

        # Monitor the requests
        requests += 1
        elapsed_time = time() - start_time
        print('Request:{}; Frequency: {} requests/s'.format(requests, requests/elapsed_time))
        clear_output(wait = True)

        # Throw a warning for non-200 status codes
        if response.status_code != 200:
            warn('Request: {}; Status code: {}'.format(requests, response.status_code))

        # Break the pages loop if the number of requests is greater than expected
        # (break only exits this inner loop; the outer loop over years keeps running)
        if requests > 72:
            warn('Number of requests was greater than expected.')
            break

        # Parse the content of the request with BeautifulSoup
        page_html = BeautifulSoup(response.text, 'html.parser')

        # Select all the 50 movie containers from a single page
        mv_containers = page_html.find_all('div', class_ = 'lister-item mode-advanced')

        # For every movie of these 50
        for container in mv_containers:
            # If the movie has a Metascore, then:
            if container.find('div', class_ = 'ratings-metascore') is not None:

                # Scrape the name
                name = container.h3.a.text
                names.append(name)

                # Scrape the year
                year = container.h3.find('span', class_ = 'lister-item-year').text
                years.append(year)

                # Scrape the IMDB rating
                imdb = float(container.strong.text)
                imdb_ratings.append(imdb)

                # Scrape the Metascore
                m_score = container.find('span', class_ = 'metascore').text
                metascores.append(int(m_score))

                # Scrape the number of votes
                vote = container.find('span', attrs = {'name':'nv'})['data-value']
                votes.append(int(vote))

Request:72; Frequency: 0.06953925173061648 requests/s
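
One detail worth noting about the cell above: a break statement only exits the innermost loop, so the outer loop over the years would keep running after the warning. A minimal sketch of one way to stop both loops is a flag that the outer loop checks. The function `count_requests`, the flag `limit_exceeded`, and the small `MAX_REQUESTS` value are hypothetical and chosen only to keep the demonstration short.

```python
MAX_REQUESTS = 5  # hypothetical small limit so the effect is easy to see

def count_requests(years, pages, max_requests):
    """Simulate the nested loops and stop both once the limit is exceeded."""
    requests = 0
    limit_exceeded = False
    for year in years:
        for page in pages:
            requests += 1  # a real request would go here
            if requests > max_requests:
                limit_exceeded = True
                break      # leaves the pages loop only
        if limit_exceeded:
            break          # leaves the years loop as well
    return requests

print(count_requests(range(2000, 2018), range(1, 5), MAX_REQUESTS))
print(count_requests(range(2000, 2018), range(1, 5), 72))
```

With a limit of 72 the loops run to completion, which matches the 72 requests reported above.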


#### Now let’s merge the data into a pandas DataFrame to examine what we’ve managed to scrape.

## Examining the scraped data

#### In the next code block we:

- **Merge the data** into a **pandas DataFrame.**
- Print some information about the **newly created DataFrame.**
- Show the **first 10 entries.**

In [33]:
movie_ratings = pd.DataFrame({'movie': names,
                              'year': years,
                              'imdb': imdb_ratings,
                              'metascore': metascores,
                              'votes': votes
                              })
print(movie_ratings.info())
movie_ratings.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3276 entries, 0 to 3275
Data columns (total 5 columns):
movie        3276 non-null object
year         3276 non-null object
imdb         3276 non-null float64
metascore    3276 non-null int64
votes        3276 non-null int64
dtypes: float64(1), int64(2), object(2)
memory usage: 128.0+ KB
None


| | movie | year | imdb | metascore | votes |
|---|---|---|---|---|---|
| 0 | Gladiator | (2000) | 8.5 | 67 | 1228428 |
| 1 | Memento | (2000) | 8.4 | 80 | 1041639 |
| 2 | Snatch | (2000) | 8.3 | 55 | 726707 |
| 3 | Requiem for a Dream | (2000) | 8.3 | 68 | 708268 |
| 4 | X-Men | (2000) | 7.4 | 64 | 541810 |
| 5 | Cast Away | (2000) | 7.8 | 73 | 478440 |
| 6 | American Psycho | (2000) | 7.6 | 64 | 437159 |
| 7 | Unbreakable | (2000) | 7.3 | 62 | 358201 |
| 8 | Meet the Parents | (2000) | 7.0 | 73 | 294992 |
| 9 | Mission: Impossible II | (2000) | 6.1 | 59 | 293940 |


### Observation:
- The output of info() shows we collected data for well over 2000 movies.
- We can also see that there are no null values in our dataset whatsoever.

## Cleaning the scraped data

#### Our data cleaning will consist of:

- Reordering the columns.
- Cleaning the year column and converting the values to integers.
- Checking the extreme rating values to determine if all the ratings are within the expected intervals.
- Normalizing one of the rating types (or both) to generate a comparative histogram.

In [34]:
movie_ratings = movie_ratings[['movie', 'year', 'imdb', 'metascore', 'votes']]
movie_ratings.head()

| | movie | year | imdb | metascore | votes |
|---|---|---|---|---|---|
| 0 | Gladiator | (2000) | 8.5 | 67 | 1228428 |
| 1 | Memento | (2000) | 8.4 | 80 | 1041639 |
| 2 | Snatch | (2000) | 8.3 | 55 | 726707 |
| 3 | Requiem for a Dream | (2000) | 8.3 | 68 | 708268 |
| 4 | X-Men | (2000) | 7.4 | 64 | 541810 |


#### Now let’s convert all the values in the year column to integers.

In [35]:
movie_ratings['year'].unique()

array(['(2000)', '(I) (2000)', '(2001)', '(2002)', '(2003)', '(2004)',
       '(I) (2004)', '(2005)', '(I) (2005)', '(2006)', '(I) (2006)',
       '(2007)', '(I) (2007)', '(2008)', '(I) (2008)', '(2009)',
       '(I) (2009)', '(2010)', '(I) (2010)', '(2011)', '(I) (2011)',
       '(2012)', '(I) (2012)', '(2013)', '(I) (2013)', '(2014)',
       '(I) (2014)', '(II) (2014)', '(2015)', '(I) (2015)', '(II) (2015)',
       '(2016)', '(II) (2016)', '(I) (2016)', '(IX) (2016)', '(2017)',
       '(I) (2017)'], dtype=object)

- Counting from the end toward the beginning, we can see that the year always occupies the **fifth-to-last through second-to-last characters.**
- We’ll use the **.str accessor** to slice out only that interval. 
- We’ll also convert the result to an integer using the **astype() method:**

In [36]:
movie_ratings.loc[:, 'year'] = movie_ratings['year'].str[-5:-1].astype(int)

In [37]:
movie_ratings['year'].head(3)

0    2000
1    2000
2    2000
Name: year, dtype: int32
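
The same slice works on plain Python strings, which makes it easy to check against the less regular values seen in the unique() output. The sketch below also shows a regular-expression alternative that does not depend on character positions; the variable names are illustrative only.

```python
import re

raw_years = ['(2000)', '(I) (2004)', '(II) (2015)', '(IX) (2016)']

# Same slice as in the pandas cell: start five characters from the end, stop before the last one
sliced = [int(text[-5:-1]) for text in raw_years]

# Alternative sketch: search for four consecutive digits
matched = [int(re.search(r'\d{4}', text).group()) for text in raw_years]

print(sliced)
print(matched)
```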

#### We’ll check the minimum and maximum values of each type of rating. We can do this very quickly by using pandas’ describe() method.

In [38]:
movie_ratings.describe().loc[['min', 'max'], ['imdb', 'metascore']]

| | imdb | metascore |
|---|---|---|
| min | 4.1 | 24.0 |
| max | 9.0 | 100.0 |


#### There are no unexpected outliers.

#### Let’s normalize the imdb column to a 100-points scale.

In [39]:
movie_ratings['n_imdb'] = movie_ratings['imdb'] * 10
movie_ratings.head(3)

| | movie | year | imdb | metascore | votes | n_imdb |
|---|---|---|---|---|---|---|
| 0 | Gladiator | 2000 | 8.5 | 67 | 1228428 | 85.0 |
| 1 | Memento | 2000 | 8.4 | 80 | 1041639 | 84.0 |
| 2 | Snatch | 2000 | 8.3 | 55 | 726707 | 83.0 |


In [40]:
movie_ratings.to_csv('movie_ratings.csv')

## Plotting and analyzing the distributions

In the following code cell we:

- Import the **matplotlib.pyplot** submodule.
- Run the Jupyter magic **%matplotlib** to activate Jupyter’s matplotlib mode and add inline to have our graphs displayed inside the notebook.
- Create a **figure object** with 3 axes.
- Plot the distribution of each **unnormalized rating** on an individual ax.
- Plot the **normalized distributions** of the two ratings on the same ax.
- Hide the **top and right spines** of all the three axes.

In [41]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows = 1, ncols = 3, figsize = (16,4))
ax1, ax2, ax3 = fig.axes
ax1.hist(movie_ratings['imdb'], bins = 10, range = (0,10)) # bin range = 1
ax1.set_title('IMDB rating')
ax2.hist(movie_ratings['metascore'], bins = 10, range = (0,100)) # bin range = 10
ax2.set_title('Metascore')
# Labels are needed so that the legend has entries to show
ax3.hist(movie_ratings['n_imdb'], bins = 10, range = (0,100), histtype = 'step', label = 'IMDB rating (normalized)')
ax3.hist(movie_ratings['metascore'], bins = 10, range = (0,100), histtype = 'step', label = 'Metascore')
ax3.legend(loc = 'upper left')
ax3.set_title('The Two Normalized Distributions')
for ax in fig.axes:
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
plt.show()


<Figure size 1600x400 with 3 Axes>

### Observations:
- We can see that most IMDB ratings are between 6 and 8. 
- There are few movies with a rating greater than 8, and even fewer with a rating smaller than 4.
- This indicates that both very good movies and very bad movies are rarer.