
A simple example of how to perform web scraping using the Scrapy framework, with the Yelp website as the target


Eustacio/yelp-scraper


yelp-scraper

A simple example of how to perform web scraping using the Scrapy framework, with the Yelp website as the target.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

First, you need a recent version of pip installed on your computer to install the project dependencies. If you do not already have it installed, check the pip installation guide.

yelp-scraper depends on the Scrapy Python framework. You can install the latest version by running the following command in your terminal:

$ (sudo) pip install scrapy
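
To confirm that the installation worked, a quick check like the following may help. This is only a sketch: the helper name installed_version is made up for illustration, and it assumes Python 3.8 or newer, where importlib.metadata is part of the standard library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "scrapy" is the distribution name used by the pip command above.
version = installed_version("scrapy")
if version:
    print(f"Scrapy {version} is installed")
else:
    print("Scrapy is not installed")
```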

How to set up and run the project

From here on, I am assuming that you already have all prerequisites installed and properly configured on your machine.

Setup

Clone the repo or download it.

Running

Open your terminal and change into the project folder:

$ cd ~/<folder>/yelp-scraper

Here <folder> is the directory where you downloaded or cloned the repo.

Then you can start the scraping process by using the following command:

$ scrapy crawl yelp -a find='something' -a near='somewhere'

Arguments

Note: Every argument must be preceded by the -a option; this is how Scrapy passes arguments to a spider.
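
To make the -a convention more concrete, here is a rough imitation of how repeated -a NAME=VALUE pairs can be collected into a dictionary. It uses only the standard library and is not Scrapy's actual code; the function name parse_spider_args is hypothetical.

```python
import argparse


def parse_spider_args(argv):
    """Collect repeated `-a NAME=VALUE` options into a dict of strings."""
    parser = argparse.ArgumentParser()
    # action="append" gathers every -a occurrence into one list.
    parser.add_argument("-a", dest="spargs", action="append",
                        default=[], metavar="NAME=VALUE")
    options = parser.parse_args(argv)
    # Split on the first "=" only, so values may themselves contain "=".
    return dict(item.split("=", 1) for item in options.spargs)


print(parse_spider_args(["-a", "find=Restaurants", "-a", "near=San Francisco"]))
# → {'find': 'Restaurants', 'near': 'San Francisco'}
```

Note that every value arrives as a string, so a numeric argument has to be converted by the code that receives it.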

find: This argument is required. It accepts the same values that you can use in the search box on the Yelp website, for example:

  • Restaurants, Nightlife, Air Conditioning & Heating, Contractors, Electricians, Home Cleaners, Landscapers, Locksmiths, Movers, Painters, Plumbers.

near: This argument is required. It accepts the same values that you can use in the location box on the Yelp website, for example:

  • London, San Francisco, etc.
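
As an illustration of how the two required arguments might end up in a search request, the sketch below builds a Yelp search URL from find and near. The endpoint and the find_desc / find_loc parameter names are assumptions based on how Yelp search URLs commonly look, and build_search_url is a hypothetical helper; the project itself may build its requests differently.

```python
from urllib.parse import urlencode

# Assumed for illustration; not taken from the project's source.
SEARCH_ENDPOINT = "https://www.yelp.com/search"


def build_search_url(find, near):
    """Return a search URL for the given `find` and `near` values."""
    if not find or not near:
        raise ValueError("both 'find' and 'near' are required")
    # urlencode escapes spaces and characters such as '&' in the values.
    query = urlencode({"find_desc": find, "find_loc": near})
    return f"{SEARCH_ENDPOINT}?{query}"


print(build_search_url("Air Conditioning & Heating", "San Francisco"))
# → https://www.yelp.com/search?find_desc=Air+Conditioning+%26+Heating&find_loc=San+Francisco
```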

max_results: This argument is optional, and its default value is 3. It limits the number of results that Scrapy will scrape from the website.
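
One way such a limit could be applied is sketched below. It is not the project's implementation: parse_max_results and take_results are hypothetical names, and the only details taken from this README are the argument name and its default of 3. The conversion from a string is included because values given with -a reach the spider as strings.

```python
from itertools import islice

DEFAULT_MAX_RESULTS = 3  # the default stated in this README


def parse_max_results(value=None):
    """Convert the raw argument (a string or None) into a positive int."""
    if value is None:
        return DEFAULT_MAX_RESULTS
    limit = int(value)  # raises ValueError for non-numeric input
    if limit < 1:
        raise ValueError("max_results must be a positive integer")
    return limit


def take_results(results, max_results=None):
    """Return at most `max_results` items from any iterable of results."""
    return list(islice(results, parse_max_results(max_results)))


print(take_results(range(10)))        # → [0, 1, 2]
print(take_results(range(10), "5"))   # → [0, 1, 2, 3, 4]
```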

Contributing

Feel free to make suggestions and/or contributions.

License

This project is licensed under the MIT License. See the LICENSE file for details.
