## Web Scraping
Web scraping is the extraction of data from a website. In this practical we will use the Python library **Beautiful Soup**. The scraper loads the HTML of the page the user wants to collect data from, and then either extracts all the data on the page or only the specific data the user selects. Selecting specific data means inspecting the website’s HTML and identifying the element or tag that contains the desired information.
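
The two approaches described above (extract everything, or select a specific element) can be sketched on a small hand-written page. The HTML string and its class names below are made up for illustration, and the built-in `html.parser` is used so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# A tiny hand-written page standing in for a downloaded website
# (hypothetical markup, not taken from any real site).
html = """
<html><body>
  <h1 class="headline">Example page</h1>
  <p class="intro">First paragraph.</p>
  <p>Second paragraph.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: extract all visible text on the page.
all_text = soup.get_text(" ", strip=True)

# Option 2: select specific elements by tag and class with CSS selectors.
headline = soup.select_one("h1.headline").get_text(strip=True)
paragraphs = [p.get_text(strip=True) for p in soup.select("p")]

print(all_text)
print(headline)
print(paragraphs)
```

`select()` returns a list of all matching elements, while `select_one()` returns only the first match (or `None` if nothing matches).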

### Data to Scrape
In this practical we will scrape imdb.com to fetch information about movies in different genres, using the Python libraries BeautifulSoup and requests. IMDB (Internet Movie Database), which is owned by Amazon, is one of the best platforms for finding information about films, television shows, web series, etc.

The data that we want to extract from it are:
* Movie title
* Release date
* Genre
* Movie length
* Movie certification
* Rating
* Metascore
* Description
* Votes

To extract all of this data, our scraper will need to go inside each film’s webpage. Now let's start scraping.

## Load Libraries
Before we begin, we need to import the libraries that will be used for this practical.

In [None]:
# Load packages
from bs4 import BeautifulSoup
import requests
import pandas as pd

## Getting URLs of different pages
The first thing we need to do is get the URLs for the different movie genres, such as Adventure, Animation, Drama, Comedy, and Horror.
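
One way to build these URLs is sketched below. The search endpoint `https://www.imdb.com/search/title/` and the `genres` query parameter are assumptions about how the site organises its genre pages, and `build_genre_urls` is a hypothetical helper name; check the URLs in a browser before relying on them.

```python
from urllib.parse import urlencode

# Assumed search endpoint and query parameter; verify against the live site.
BASE_URL = "https://www.imdb.com/search/title/"
GENRES = ["Adventure", "Animation", "Drama", "Comedy", "Horror"]


def build_genre_urls(genres, base_url=BASE_URL):
    """Map each genre name to a search URL for that genre."""
    return {
        genre: f"{base_url}?{urlencode({'genres': genre.lower()})}"
        for genre in genres
    }


genre_urls = build_genre_urls(GENRES)

for genre, url in genre_urls.items():
    print(genre, url)
```

Keeping the URLs in a dictionary keyed by genre makes it easy to label the results of each page later on.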


## Parsing Movie Information
Now let's parse the movie information from IMDB. We will work with one genre first.
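
As a minimal sketch of what parsing one movie entry looks like, the snippet below works on a hard-coded HTML fragment instead of a live page. The fragment is hypothetical: it only imitates class names such as `lister-item-header`, `genre`, and `runtime` that this practical relies on, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating a single list entry; the real page may differ.
sample_html = """
<div class="lister-item-content">
  <h3 class="lister-item-header"><a href="#">Example Movie</a></h3>
  <span class="genre"> Adventure, Drama </span>
  <span class="runtime">120 min</span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Locate the container of one movie, then read the fields inside it.
movie = soup.select_one(".lister-item-content")

title = movie.select_one(".lister-item-header").get_text(strip=True)
genre = movie.select_one(".genre").get_text(strip=True)
runtime = movie.select_one(".runtime").get_text(strip=True)

print(title, genre, runtime)
```

Searching inside the `movie` element rather than the whole page keeps the fields of different movies from being mixed up.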

## Creating a scraping function
Now let's create a function that performs this parsing and can be reused for different URLs.

In [2]:
def get_movies(url):
    """Scrape one IMDB list page and return its movies as a DataFrame."""
    # Download the page and parse its HTML (the 'lxml' parser must be installed)
    resp = requests.get(url)
    content = BeautifulSoup(resp.content, 'lxml')

    movie_list = []

    # Each movie entry sits in an element with the class 'lister-item-content'
    for movie in content.select('.lister-item-content'):
        try:
            # Collect the fields of one movie in a Python dictionary
            data = {
                "title": movie.select('.lister-item-header')[0].get_text().strip(),
                "year": movie.select('.lister-item-year')[0].get_text().strip(),
                "certificate": movie.select('.certificate')[0].get_text().strip(),
                "time": movie.select('.runtime')[0].get_text().strip(),
                "genre": movie.select('.genre')[0].get_text().strip(),
                "rating": movie.select('.ratings-imdb-rating')[0].get_text().strip(),
                "metascore": movie.select('.ratings-metascore')[0].get_text().strip(),
                "simple_desc": movie.select('.text-muted')[2].get_text().strip(),
                "votes": movie.select('.sort-num_votes-visible')[0].get_text().strip(),
            }
        except IndexError:
            # Skip entries that are missing any of the fields above
            continue

        movie_list.append(data)

    return pd.DataFrame(movie_list)


## Scraping movies of different genres
The **get_movies()** function we wrote above parses movie details from the IMDB page for any genre URL and returns them as a DataFrame, which can then be saved as a CSV file. By calling this function once per genre, we can scrape all genres and save each one as a separate CSV file. Let's see how this can be done.
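
A possible way to organise this step is sketched below. `scrape_genres_to_csv` is a hypothetical helper, not part of any library: it takes a dictionary of genre names and URLs plus a scraping function such as `get_movies`, and writes one CSV file per genre. The output folder name `data` and the file naming scheme are assumptions.

```python
from pathlib import Path


def scrape_genres_to_csv(genre_urls, scrape, out_dir="data"):
    """Scrape each genre URL and write one CSV file per genre.

    `scrape` is any callable that takes a URL and returns an object with a
    pandas-style `to_csv` method, for example the `get_movies` function.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for genre, url in genre_urls.items():
        df = scrape(url)
        # File name derived from the genre, e.g. "drama.csv" (assumed scheme)
        csv_file = out_path / f"{genre.lower()}.csv"
        df.to_csv(csv_file, index=False)
        written.append(csv_file)

    return written
```

With a dictionary of genre URLs at hand, a call such as `scrape_genres_to_csv(genre_urls, get_movies)` would create one file per genre in the `data` folder. Passing the scraper as an argument also makes it easy to try the loop with a stand-in function before hitting the real site.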