A simple example of web scraping using the Scrapy framework, with the Yelp website as the target.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
First, you need the latest version of pip installed on your computer in order to install the project dependencies. If you do not already have it installed, see the pip installation guide.
yelp-scraper depends on the Scrapy Python framework. You can install the latest version by running the following command in your terminal:
$ (sudo) pip install scrapy
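To confirm that the install worked before moving on, a quick check along these lines may help. This is only a sketch: the `is_installed` helper is a hypothetical name introduced here, not part of the project, and it relies solely on the Python standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the current interpreter."""
    # find_spec returns None when the package cannot be located.
    return importlib.util.find_spec(package_name) is not None


print("scrapy installed:", is_installed("scrapy"))
```

If this prints `False`, pip may have installed Scrapy for a different Python interpreter than the one you are running.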
From here on, I am assuming that you already have all prerequisites installed and properly configured on your machine.
Clone the repo or download it.
Open your terminal and change into the project folder:
$ cd ~/<folder>/yelp-scraper
Where <folder> is the directory where you downloaded or cloned the repo.
Then you can start the scraping process by using the following command:
$ scrapy crawl yelp -a find='something' -a near='somewhere'
Note: Each argument must be preceded by the -a option; this is how Scrapy passes arguments to a spider.
find: This argument is required. It accepts the same values that you can use on the Yelp website, for example:
- Restaurants, Nightlife, Air Conditioning & Heating, Contractors, Electricians, Home Cleaners, Landscapers, Locksmiths, Movers, Painters, Plumbers.
near: This argument is required. It accepts the same values that you can use on the Yelp website, for example:
- London, San Francisco, etc.
max_results: This argument is optional, and its default value is 3. It limits the number of results that Scrapy will scrape from the website.
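Scrapy hands each -a key=value pair to the spider's constructor as a keyword argument, and the values always arrive as strings. The sketch below shows one way the three arguments above could be validated and turned into a search URL. It is not the project's actual code: the `YelpSearchArgs` class, its `search_url` method, and the `find_desc`/`find_loc` URL layout are assumptions made for illustration.

```python
from urllib.parse import urlencode


class YelpSearchArgs:
    """Hypothetical helper that validates the values given with -a."""

    def __init__(self, find=None, near=None, max_results=3):
        # find and near are required, so fail early if either is missing or blank.
        if not find or not near:
            raise ValueError("both 'find' and 'near' are required")
        self.find = find
        self.near = near
        # Command-line values arrive as strings, so convert explicitly.
        self.max_results = int(max_results)
        if self.max_results < 1:
            raise ValueError("'max_results' must be a positive integer")

    def search_url(self):
        # Assumed URL layout; the real spider may build its request differently.
        query = urlencode({"find_desc": self.find, "find_loc": self.near})
        return "https://www.yelp.com/search?" + query


args = YelpSearchArgs(find="Restaurants", near="San Francisco", max_results="5")
print(args.search_url())
print(args.max_results)
```

Converting `max_results` with `int()` matters because a string such as `"5"` would otherwise compare incorrectly against a numeric result counter.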
Feel free to make suggestions and/or contributions.
This project is licensed under the MIT License - see the LICENSE file for details.