Commit a4d4f86

authored Jun 8, 2020
Update README.md
1 parent 0d7d97c commit a4d4f86

1 file changed: README.md (+13, -0)
@@ -29,3 +29,16 @@ To run our example scraper, you are going to need these libraries:
If you’re here, you’re probably interested in learning how to scrape the web and put the data you gather to use. Before we dive in, we first need to understand what web scraping is. In general terms, scraping is the process of fetching a web page with all of its information and then extracting selected fields for further processing. Usually, the purpose of gathering that information is to make it easy to monitor. Examples include reviews, prices, weather reports, Billboard hits, and so on.

## Be polite

Just as you are polite and considerate in the real world, you should be the same online. Before you start scraping, make sure that the website you’re targeting allows it. You can do that by checking its robots.txt file. If the site doesn’t permit crawling or scraping of its content, be kind and respect the owner’s wishes. Failing to do so might get your IP blocked or even lead to legal action against you, so be careful. Also, check whether the site you’re targeting has an API. If it does, just use that: it will be easier to get the data you need, and you won’t put unnecessary load on the site’s infrastructure.
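
As a rough illustration of the robots.txt check described above, here is a minimal sketch that uses `urllib.robotparser` from Python's standard library. The helper name `can_scrape`, the `SAMPLE_ROBOTS` rules, and the `example.com` URLs are made up for this example and are not part of the tutorial's scraper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def can_scrape(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_scrape(SAMPLE_ROBOTS, "https://example.com/catalogue/"))
print(can_scrape(SAMPLE_ROBOTS, "https://example.com/private/report.html"))
```

For a live site, you could instead point the parser at the real file with `parser.set_url(...)` followed by `parser.read()`, which downloads the rules before you call `can_fetch`.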

## Let’s get to it

In the following tutorial, you will not only see how a basic scraper is written but will also learn how to adjust it to your own needs. Moreover, you will learn how to do it via a proxy!

As mentioned, we will be using these libraries:

- Requests
- BeautifulSoup 4

The page we’re going to scrape is http://books.toscrape.com/. It doesn’t have a robots.txt file, but I think we can agree that the name of the site is asking you to scrape it. Before we carry on with the coding part, let’s inspect the website first.

So, this is what the main page of the website looks like. It contains books with their titles, prices, ratings, and availability information, plus a list of genres in the sidebar.
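
To make the fields listed above more concrete, here is a hedged sketch of how they might be pulled out with BeautifulSoup. The tag and class names (`product_pod`, `price_color`, `star-rating`, `availability`) are assumptions about the page's markup that you should confirm in your browser's inspector. The `SAMPLE_HTML` snippet, its values, and the `parse_books` helper are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet mimicking one book card; the class names are
# assumptions about the real page and should be verified by inspecting it.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="catalogue/example-book/index.html" title="Example Book">Example ...</a></h3>
  <div class="product_price">
    <p class="price_color">£10.00</p>
    <p class="instock availability">In stock</p>
  </div>
</article>
"""


def parse_books(html):
    """Extract title, price, rating and availability from each book card."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.find_all("article", class_="product_pod"):
        # The rating is assumed to be encoded as a second CSS class, e.g. "Three".
        rating_classes = card.find("p", class_="star-rating")["class"]
        books.append({
            "title": card.h3.a["title"],
            "price": card.find("p", class_="price_color").get_text(strip=True),
            "rating": next(c for c in rating_classes if c != "star-rating"),
            "availability": card.find("p", class_="availability").get_text(strip=True),
        })
    return books


books = parse_books(SAMPLE_HTML)
print(books)
```

To try it against the live page, you could pass `requests.get("http://books.toscrape.com/").content` to `parse_books` instead of the sample snippet, adjusting the selectors if the real markup differs.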