# Scrapy Project Sample

This is my first project demonstrating how to use Scrapy spiders to scrape data, using http://chubbygrub.com/ as the sample site.

## Step 1: Brief outline of initializing the Scrapy spider

Install the Scrapy package with `conda install scrapy`. The queries in this project are written in XPath. CSS selectors work as well, and the SelectorGadget extension for Chrome can help find them.

In the terminal, `cd` into the desired directory and create a new Scrapy project with `scrapy startproject chubby`. The project name in this case is `chubby`.

Define an item. Open the `items.py` file under the `chubby` folder (VS Code is used here) and define the item, i.e. the fields you want to scrape.

```python
import scrapy

class ChubbyItem(scrapy.Item):
    # Define the fields for your item here like:
    # name = scrapy.Field()
    calories = scrapy.Field()
    carbs = scrapy.Field()
    category = scrapy.Field()
    name = scrapy.Field()
    restaurant = scrapy.Field()
```

## Step 2: Identifying the target/sample

```python
foods = [
    {
        'calories': '0',
        'carbs': '0',
        'category': 'Drinks',
        'fat': '0',
        'name': 'A&W® Diet Root Beer',
        'restaurant': 'A&W Restaurants'
    },
    {
        'calories': '0',
        'carbs': '0',
        'category': 'Drinks',
        'fat': '0',
        'name': 'A&W® Diet Root Beer',
        'restaurant': 'A&W Restaurants'
    },
    ...
]
```
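A small check using only the standard library can confirm that each scraped record carries the keys shown in the target above. This is only a sketch: `EXPECTED_KEYS` and `missing_keys` are hypothetical names, not part of Scrapy. It also highlights that the target includes `fat`, which the `ChubbyItem` in Step 1 does not define.

```python
# Keys taken from the target sample above; the helper name is illustrative.
EXPECTED_KEYS = {"calories", "carbs", "category", "fat", "name", "restaurant"}


def missing_keys(record):
    """Return the expected keys that a scraped record lacks, sorted by name."""
    return sorted(EXPECTED_KEYS - set(record))


# A record with the five fields that ChubbyItem defines.
sample = {
    "calories": "0",
    "carbs": "0",
    "category": "Drinks",
    "name": "A&W® Diet Root Beer",
    "restaurant": "A&W Restaurants",
}

print(missing_keys(sample))  # ['fat']
```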

## Step 3: Using the documentation and XPath Helper, select the right information

```python
import scrapy
from chubby.items import ChubbyItem

class ChubbySpider(scrapy.Spider):
    name = "chubby"
    start_urls = [
        "http://www.chubbygrub.com/"
    ]

    def parse(self, response):
        # Follow the link to each restaurant page to obtain its menu.
        for url in response.xpath('/html/body/div[4]/div/div/div/a/@href'):
            # Schedule a second callback to run the menu queries on the new page.
            yield response.follow(url, self.parse_restaurant)

    # We are now on an individual restaurant menu page.
    def parse_restaurant(self, response):
        items = []
        for i in response.xpath('//*[@id="items"]/tbody/tr'):
            item = ChubbyItem()
            item['calories'] = i.xpath('./td[3]/text()').get()
            item['name'] = i.xpath('./td[1]/text()').get()
            item['carbs'] = i.xpath('./td[5]/text()').get()
            item['category'] = i.xpath('./td[2]/a/text()').get()
            # The restaurant name sits outside the table rows, so it is queried
            # from the full response with an absolute path.
            item['restaurant'] = response.xpath('/html/body/div[2]/div[1]/div/div/h1/span/text()').get()
            items.append(item)
        return items
```
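The row-by-row extraction in the spider above relies on paths such as `./td[1]` and `./td[3]` that are evaluated relative to each `<tr>`. The sketch below imitates that pattern offline with the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (there is no `text()`, so the element text is read instead). The table markup and its values are made up for illustration and assume the same column order the spider uses; the real page layout may differ.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for a restaurant menu table (not real site data).
SAMPLE = """
<table id="items">
  <tbody>
    <tr>
      <td>Example Burger</td>
      <td><a href="/category/burgers">Burgers</a></td>
      <td>500</td>
      <td>25</td>
      <td>40</td>
    </tr>
    <tr>
      <td>Example Soda</td>
      <td><a href="/category/drinks">Drinks</a></td>
      <td>150</td>
      <td>0</td>
      <td>39</td>
    </tr>
  </tbody>
</table>
"""


def extract_rows(markup):
    """Yield one dict per table row, using paths relative to each <tr>."""
    root = ET.fromstring(markup)
    for row in root.findall("./tbody/tr"):
        yield {
            "name": row.findtext("./td[1]"),
            "category": row.findtext("./td[2]/a"),
            "calories": row.findtext("./td[3]"),
            "carbs": row.findtext("./td[5]"),
        }


for record in extract_rows(SAMPLE):
    print(record)
```

Because every path starts with `./`, each lookup stays inside the current row, which is the same reason the spider's per-row queries return one value per item.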

## Step 4: Run the spider

In the terminal, with `chubby_spider.py` open in VS Code, type `scrapy crawl chubby` to run the spider. Errors are fairly common, so it is advisable to try the XPath queries in `scrapy shell 'website-name'` first to see what they return before running a full crawl.

## Step 5: Save the output into a `.csv` file

Still in the terminal, type the following, where `items.csv` is the desired file name. The spider name must match the `name` defined in the spider (`chubby`), and Scrapy infers the CSV format from the file extension.

```
scrapy crawl chubby -o items.csv
```
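
Once the export exists, the standard library's `csv` module is enough to spot-check it. The sketch below assumes the header row carries the field names from `ChubbyItem`; the column order is an assumption, the sample row is borrowed from the target in Step 2, and an in-memory string stands in for the real file. `load_items` is a hypothetical helper name.

```python
import csv
import io

# Stand-in for the exported file; the column order here is assumed.
SAMPLE_CSV = (
    "calories,carbs,category,name,restaurant\n"
    "0,0,Drinks,A&W® Diet Root Beer,A&W Restaurants\n"
)


def load_items(handle):
    """Read exported rows into a list of dicts keyed by the CSV header."""
    return list(csv.DictReader(handle))


rows = load_items(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["name"])

# With the real export, open the file instead (newline="" as the csv docs advise):
# with open("items.csv", newline="", encoding="utf-8") as f:
#     rows = load_items(f)
```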