# Introduction to Web Scraping

Web scraping is a technique to automatically access and extract large amounts of information from a website, which can save a huge amount of time and effort. In this guide, we’ll be touring the essential stack of Python web scraping libraries. We’ll show you the tricks to get the job done with just a few lines of code.

### The main things we need to understand:

- Rules of web scraping
- Limitations of web scraping
- Basic HTML and CSS

### Rules of web scraping

- Always check a website’s Terms and Conditions before you scrape it. Be careful to read the statements about legal use of data. Usually, the data you scrape should not be used for commercial purposes.

- If you make too many requests too quickly, the target website might block you. To be on the safe side, limit yourself to about one request per second.

- Some sites automatically block IP addresses that send too many requests. By rotating your IP addresses, you can make the scraper requests look like they are coming from different computers.
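
The one-request-per-second rule above can be sketched as a small helper that pauses between consecutive requests. The name `fetch_politely` and its parameters are illustrative assumptions, not part of any library; the `fetch` argument is whatever function actually downloads a URL, and the demo uses a stand-in so it runs without network access.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    `fetch` and `sleep` are passed in so the pacing logic stays independent
    of the HTTP library (hypothetical helper, for illustration only).
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        results.append(fetch(url))
    return results


# Demo with a stand-in fetch function instead of a real download.
pages = fetch_politely(["page-1", "page-2"], fetch=str.upper, delay_seconds=0.1)
print(pages)
```

Once Requests is installed, passing `requests.get` as `fetch` should give the same pacing for real downloads.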

### Limitations of web scraping

- In general, web scraping is legal as long as you use it in an ethical way and don’t violate the website’s Terms of Service. When a website offers an API, prefer it to scraping: with an API you can avoid parsing HTML and instead access the data directly in formats like JSON and XML.

- A slight change or update to the website may completely break your scraper. In that case, you have to rewrite your CSS selectors.
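
To illustrate why an API is easier to work with than raw HTML, here is a minimal sketch that reads a JSON payload with the standard library. The payload and its field names (`books`, `title`, `price`) are made up for the example; a real API defines its own structure.

```python
import json

# Hypothetical API response body: the data is already structured,
# so no HTML parsing or CSS selectors are needed.
api_body = '{"books": [{"title": "Dune", "price": 12.5}, {"title": "Emma", "price": 8.99}]}'

data = json.loads(api_body)
titles = [book["title"] for book in data["books"]]
print(titles)
```

When the body comes from Requests, `response.json()` performs the same decoding step on the response.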

### Basic HTML and CSS

- HTML is the standard markup language for creating web pages and web applications. Together with CSS, it is one of the fundamental technologies for building web pages.

- CSS (Cascading Style Sheets) is a language that describes how the elements of an HTML document should be displayed. Scrapers reuse CSS selectors to locate the elements they want to extract.

- JavaScript is the programming language of the Web; it adds interactivity to your website. Content that JavaScript renders after the page loads may be missing from the raw HTML a scraper downloads.
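
To make the HTML vocabulary concrete, the sketch below uses Python's built-in `html.parser` module to list the tags and attributes in a small, made-up page. The sample markup and the `TagLister` class are assumptions for illustration. The `id` and `class` attributes it prints are what CSS selectors such as `#title` and `.price` refer to.

```python
from html.parser import HTMLParser

# Made-up sample page used only for this illustration.
SAMPLE_HTML = """
<html>
  <body>
    <h1 id="title">Book list</h1>
    <p class="price">12.50</p>
    <p class="price">8.99</p>
  </body>
</html>
"""


class TagLister(HTMLParser):
    """Record every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))


lister = TagLister()
lister.feed(SAMPLE_HTML)
for tag, attrs in lister.tags:
    print(tag, attrs)
```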

### The main libraries for web scraping in Python

- Requests: Requests is a simple and elegant Python HTTP library. It provides methods for accessing Web resources via HTTP.

In [1]:
pip install requests

Note: you may need to restart the kernel to use updated packages.


- lxml: lxml is a Python library which allows for easy handling of XML and HTML files, and can also be used for web scraping. It is a very fast and extensible tool, and apart from parsing XML, it can also be used for creating and modifying XML and HTML files.

In [2]:
pip install lxml

Note: you may need to restart the kernel to use updated packages.


- beautifulsoup4: Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.

In [3]:
pip install beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [None]:
import requests
import bs4
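
With both libraries imported, a typical scraper downloads a page with Requests and hands the HTML to Beautiful Soup. The sketch below is one possible arrangement, not a definitive one: the helper names `fetch_html` and `extract_prices`, the `p.price` selector, and the sample markup are all assumptions. The parsing step runs on an inline string so the example works without network access.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def extract_prices(html):
    """Return the text of every <p class="price"> element."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    return [tag.get_text(strip=True) for tag in soup.select("p.price")]


# Made-up markup standing in for a downloaded page.
sample = (
    '<html><body><h1 id="title">Book list</h1>'
    '<p class="price">12.50</p><p class="price">8.99</p></body></html>'
)
print(extract_prices(sample))
```

For a real site, `extract_prices(fetch_html(url))` would chain the two steps, with the selector adjusted to that site's markup.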
