# Web Scraping
**Web scraping**, or **web harvesting**, is the manual or automated process of extracting data that is available on websites. The content of a page may be parsed, searched, or reformatted, and its data copied into a spreadsheet or loaded into a database. Web scrapers typically take something out of a page in order to use it for another purpose somewhere else.


## Prerequisites 
Python is a versatile language and well suited to automating this process. However, knowing Python and its related packages and syntax is not sufficient. A programmer who scrapes with Python must also have at least a basic understanding of **HTML** and **CSS**.

HTML stands for HyperText Markup Language, and every website on the internet uses it to display information. If you right-click on a web page and select "View Page Source", you can see the page's raw HTML. This is what Python will read in order to extract information.

CSS stands for Cascading Style Sheets. It is what gives "style" to a website, including colors and fonts, and even some animations! CSS uses selectors based on attributes such as `id` or `class` to connect an HTML element to a CSS rule, such as a particular color.
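The same `id` and `class` attributes that CSS uses for styling are what a scraper uses to locate elements. The following is a minimal sketch using Beautiful Soup (one of the libraries covered later in this document); the HTML fragment and its names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment, invented for this illustration.
html = """
<html>
  <body>
    <h1 id="title">Fruit prices</h1>
    <p class="price">Apple: 1.20</p>
    <p class="price">Pear: 0.95</p>
    <p class="note">Prices per piece.</p>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python's standard library.
soup = BeautifulSoup(html, "html.parser")

# "#title" matches the element whose id is "title".
title = soup.select_one("#title").get_text()

# ".price" matches every element whose class list contains "price".
prices = [p.get_text() for p in soup.select(".price")]

print(title)
print(prices)
```

Here `#title` and `.price` are ordinary CSS selectors, so anything you can target in a stylesheet you can usually target in a scraper the same way.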

In some instances, **JavaScript** is used to define the interactive elements of a web page. But as long as you stick to the HTML code, you should be able to bypass any JS code.

## Best Practices
There are certain rules to follow in order to web scrape successfully. Most importantly, keep in mind that you should have permission to scrape a site. Check the website's terms and conditions for more information.

Keep in mind that if you send requests to a website that does not allow automated web scraping, you might get your IP address blocked. Some websites also run software that blocks scraping.
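The terms and conditions remain the authority on what is permitted. As a complementary, machine-readable signal, many sites also publish a `robots.txt` file, which Python's standard library can read. The sketch below is only an illustration: the `robots.txt` content, the `my-scraper` user agent, and the `allowed` helper are all assumptions, and the rules are parsed from a string so that nothing is downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real file is served at <site>/robots.txt.
robots_txt = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(url, user_agent="my-scraper"):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


public_ok = allowed("https://example.com/articles/1")
private_ok = allowed("https://example.com/private/data")

# Seconds the site asks crawlers to wait between requests (fall back to 1).
delay = parser.crawl_delay("my-scraper") or 1

print(public_ok, private_ok, delay)
```

In a real scraper you would pause for `delay` seconds between requests, for example with `time.sleep(delay)`, to reduce the chance of being blocked.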

Every website is unique, so code written for one website will usually not work unchanged on another.

Also, because websites change all the time, design your code so that you can easily adapt it when a page's structure changes.
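One possible way to keep scraping code adaptable is to collect everything site-specific in a single place. The sketch below assumes Beautiful Soup is installed; the `SELECTORS` mapping, the `extract_fields` helper, and the sample markup are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup

# All site-specific knowledge lives in this one mapping (hypothetical selectors).
SELECTORS = {
    "title": "h1.article-title",
    "author": "span.byline",
}


def extract_fields(html, selectors=SELECTORS):
    """Return {field: text or None} for each configured CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for field, css in selectors.items():
        node = soup.select_one(css)
        # A missing element yields None instead of raising, so a layout
        # change shows up as missing data rather than as a crash.
        result[field] = node.get_text(strip=True) if node is not None else None
    return result


sample = '<h1 class="article-title"> Scraping 101 </h1><div>no byline here</div>'
record = extract_fields(sample)
print(record)
```

When the site changes its markup, only the `SELECTORS` mapping needs updating, and fields that come back as `None` point to the selectors that no longer match.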

# Web Scraping Libraries

A number of popular modules are used for web scraping with Python, but we are going to focus on three of the most popular: Requests, lxml, and Beautiful Soup.

## Requests: HTTP for Humans
... or simply known as *requests*.

**Requests** is a Python library for making various types of HTTP requests, such as GET and POST. Because of its simplicity and ease of use, it carries the motto "HTTP for Humans".
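As a rough illustration, the sketch below builds a GET request without sending it, so you can see the URL that Requests would use, and then defines a small helper for downloading a page. The URL, the query parameter, and the `fetch_html` helper are placeholders invented for this example.

```python
import requests

# Build (but do not send) a GET request to inspect what would go on the wire.
prepared = requests.Request(
    "GET",
    "https://example.com/search",
    params={"q": "python"},
).prepare()

print(prepared.method)
print(prepared.url)


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text
```

Calling `fetch_html("https://example.com")` would perform a real network request. Setting a `timeout` is a sensible default, because otherwise a stalled server can block the script indefinitely.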

[Documentation](https://docs.python-requests.org/en/latest/)

## lxml
The requests library cannot parse the HTML it retrieves from a web page, so a separate parser is needed. **lxml** is one such parser: it combines the speed and power of element trees with the simplicity of Python, and it works well when we’re aiming to scrape large datasets. The combination of requests and lxml is very common in web scraping. lxml also allows you to extract data from HTML using XPath and CSS selectors.
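A minimal sketch of XPath extraction with lxml might look like the following. The markup is a made-up stand-in for the text of a downloaded page, and the element names and paths are assumptions for illustration.

```python
from lxml import html

# Hypothetical markup standing in for a downloaded page (e.g. response.text).
page = """
<html><body>
  <ul id="books">
    <li class="book"><a href="/b/1">Dune</a></li>
    <li class="book"><a href="/b/2">Emma</a></li>
  </ul>
</body></html>
"""

tree = html.fromstring(page)

# XPath: the text of every <a> directly inside an <li class="book">.
titles = tree.xpath('//li[@class="book"]/a/text()')

# XPath: the href attribute of the same links.
links = tree.xpath('//li[@class="book"]/a/@href')

print(titles)
print(links)
```

Both calls return plain lists of strings, which makes the results easy to pair up with `zip(titles, links)` or load into a table.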

[Documentation](https://lxml.de/)

## Beautiful Soup
Beautiful Soup is perhaps the most widely used Python library for web scraping. It builds a parse tree from HTML and XML documents. One of the primary reasons Beautiful Soup is so popular is that it is easy to work with and well suited for beginners. It can also be combined with other parsers, such as lxml. But all this ease of use comes at a cost: it is slower than lxml. Even when using lxml as its parser, it is slower than pure lxml.
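To give a feel for working with the parse tree, here is a small sketch that walks over repeated blocks and collects a heading and a link from each. The markup and field names are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two repeated "post" blocks.
doc = """
<div class="post">
  <h2>First post</h2>
  <a href="https://example.com/a">Read more</a>
</div>
<div class="post">
  <h2>Second post</h2>
  <a href="https://example.com/b">Read more</a>
</div>
"""

# "html.parser" ships with Python; passing "lxml" instead selects the
# lxml parser, provided that package is installed.
soup = BeautifulSoup(doc, "html.parser")

posts = []
for div in soup.find_all("div", class_="post"):
    posts.append({
        "heading": div.find("h2").get_text(),
        "url": div.find("a")["href"],
    })

print(posts)
```

Switching parsers only changes the second argument to `BeautifulSoup`; the navigation code stays the same.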

[Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

```python
import requests
import lxml
import bs4
```