# Scraping Craigslist

### Introduction

In this lesson, we'll learn how to use Selenium to scrape Craigslist. As we'll see, Selenium lets us automate many (if not all) of the things we do when browsing the web by hand: visiting pages, reading text, clicking buttons, and filling out forms. Let's get started.

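> For the code below to work, we need Firefox installed on our computer, along with geckodriver, the program Selenium uses to control Firefox. If Selenium complains that the geckodriver executable needs to be in `PATH`, [this Stack Overflow thread](https://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path) explains how to fix it.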

### Loading up Selenium

To get started, let's load Selenium and have it open a Craigslist page. We can do so with the following code.

```python
from selenium import webdriver

# Start a Firefox browser controlled by Selenium
driver = webdriver.Firefox()
craigslist_url = "https://newyork.craigslist.org/search/apa"
driver.get(craigslist_url)
```

The main tool we use above is the `webdriver`. We initialize a Firefox webdriver object and then call `driver.get` to navigate to Craigslist's New York apartment listings. This opens a Firefox browser window at that page, provided Firefox is installed.

### Selecting Elements

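After navigating to the webpage, we can select elements with the `find_element` and `find_elements` methods, passing a locator strategy such as `By.CLASS_NAME` along with the value to match. (Older Selenium releases exposed these as `find_elements_by_class_name` and similar methods; Selenium 4.3 removed them.) Here we'll find elements by class name to pull out information about apartment listings.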

We first set an implicit wait of one second, which tells Selenium to keep retrying any element lookup for up to one second before giving up, and then find the `result-info` boxes.
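To make the "up to one second" behavior concrete, here is a simplified model of an implicit wait in plain Python. It is only an illustration: the real retrying happens inside the browser driver (geckodriver for Firefox), not in our script, and `find_with_implicit_wait` is a hypothetical helper, not part of Selenium.

```python
import time

def find_with_implicit_wait(find, timeout=1.0, poll_interval=0.05):
    """Simplified model of an implicit wait (hypothetical helper).

    Calls `find` repeatedly until it returns a non-empty list or until
    `timeout` seconds have passed, then returns the last result.
    """
    deadline = time.monotonic() + timeout
    while True:
        results = find()
        # Return as soon as something matches, or once time is up
        if results or time.monotonic() >= deadline:
            return results
        time.sleep(poll_interval)  # brief pause before trying again
```

The key point the model shows: a lookup returns as soon as a match appears, so on a page that has already loaded, the one-second implicit wait costs nothing. Only lookups that find nothing wait out the full timeout.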

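```python
from selenium.webdriver.common.by import By

# Retry element lookups for up to one second before giving up
driver.implicitly_wait(1)

infos = driver.find_elements(By.CLASS_NAME, "result-info")
```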

On Craigslist, each `result-info` box holds the key information about one apartment listing.

<img src="./result-info-box.png" width="40%">

```python
first_info = infos[0]
```

Here we select the first of the `result-info` boxes.

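Inside this box are HTML elements that hold details such as the posting date, the title, the price, and the neighborhood. We can select these child elements by calling `find_element` or `find_elements` on `first_info` itself. The difference matters: `find_element` returns the first match and raises a `NoSuchElementException` if there is none, while `find_elements` returns a list, which is empty when nothing matches.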

```python
from selenium.webdriver.common.by import By

# Search only inside first_info, not the whole page
date_el = first_info.find_element(By.CLASS_NAME, "result-date")
title_el = first_info.find_element(By.CLASS_NAME, "result-title")
```

We can then read the visible text of these selected elements through their `text` property.

```python
date_el.text
```
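Because `find_elements` returns an empty list when nothing matches (where `find_element` raises an exception), optional fields can be read by asking for a list and falling back to a default. Below is a minimal sketch with a hypothetical helper, `first_text`. To keep it runnable without a browser, it passes the locator string `"class name"` (the value of Selenium's `By.CLASS_NAME`) directly, so it works with any object that has a `find_elements(by, value)` method, including a real `WebElement`.

```python
# Same value as selenium.webdriver.common.by.By.CLASS_NAME
CLASS_NAME = "class name"

def first_text(parent, class_name, default=""):
    """Return the text of the first child of `parent` with `class_name`,
    or `default` when no such child exists (hypothetical helper)."""
    matches = parent.find_elements(CLASS_NAME, class_name)
    return matches[0].text if matches else default
```

For example, inside a loop over `infos`, `first_text(info, "result-hood")` gives the neighborhood, or `""` for a listing that doesn't show one.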

### Looping Through Elements

Once we know how to select information from one listing, we can loop through and select the information from multiple listings.

Here is all of the code.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
craigslist_url = "https://newyork.craigslist.org/search/apa"
driver.get(craigslist_url)

# Retry element lookups for up to one second before giving up
driver.implicitly_wait(1)

infos = driver.find_elements(By.CLASS_NAME, "result-info")
listings = []
for info in infos:
    # Every listing has a date and a title, so find_element is safe here
    date_el = info.find_element(By.CLASS_NAME, "result-date")
    title_el = info.find_element(By.CLASS_NAME, "result-title")

    # Neighborhood and housing details are optional: find_elements returns
    # an empty list instead of raising (each miss waits out the implicit wait)
    hood_els = info.find_elements(By.CLASS_NAME, "result-hood")
    housing_els = info.find_elements(By.CLASS_NAME, "housing")
    hood_text = hood_els[0].text if hood_els else ""
    housing_text = housing_els[0].text if housing_els else ""

    listing_ob = {"date": date_el.text, "title": title_el.text,
                  "hood": hood_text, "href": title_el.get_property("href"),
                  "housing": housing_text}
    listings.append(listing_ob)

# quit() closes the browser and ends the geckodriver session
driver.quit()
```
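Since `listings` ends up as a plain list of dictionaries, Python's standard `csv` module can turn it into a spreadsheet-friendly file. This is a minimal sketch: `listings_to_csv` is a hypothetical helper, and the column order simply follows the keys the loop above builds.

```python
import csv
import io

# Column order matches the keys of each listing dictionary built above
FIELDNAMES = ["date", "title", "hood", "href", "housing"]

def listings_to_csv(listings):
    """Return the listing dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(listings)
    return buffer.getvalue()
```

To save the result, open the output file with `newline=""` so the CSV line endings are written unchanged, then call `f.write(listings_to_csv(listings))`.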

### Bonus Points 

Here are some other things we can do with Selenium. We can click on a selected element with its `.click()` method.

We can even fill out a form on a webpage by selecting its input elements and typing into them with the `send_keys` method.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
craigslist_url = "https://newyork.craigslist.org/search/apa"
driver.get(craigslist_url)

# Find the search form, then the text input inside it
query_box = driver.find_element(By.CLASS_NAME, "querybox")
search_box = query_box.find_element(By.TAG_NAME, "input")
search_box.send_keys("3 br")

# Submit the search by clicking the form's search button
query_box.find_element(By.CLASS_NAME, "searchbtn").click()
```
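Wrapping these steps in a function makes it easy to run several searches in one session. This is a minimal sketch: `run_search` is a hypothetical helper, it assumes the page still uses the `querybox` and `searchbtn` classes from the example above, and it calls `clear()` first so text from an earlier search isn't kept. It passes the locator strings `"class name"` and `"tag name"` (the values of Selenium's `By.CLASS_NAME` and `By.TAG_NAME`) directly, so it can be tried with stand-in objects as well as a real driver.

```python
# Same values as By.CLASS_NAME and By.TAG_NAME in selenium.webdriver.common.by
CLASS_NAME = "class name"
TAG_NAME = "tag name"

def run_search(driver, query):
    """Type `query` into the search form and submit it (hypothetical helper).

    The form is looked up again on every call instead of reusing elements
    from a page that has since reloaded.
    """
    query_box = driver.find_element(CLASS_NAME, "querybox")
    search_box = query_box.find_element(TAG_NAME, "input")
    search_box.clear()  # remove any text left over from a previous search
    search_box.send_keys(query)
    query_box.find_element(CLASS_NAME, "searchbtn").click()
```

With the driver from the example above, `run_search(driver, "3 br")` repeats that search, and a second call such as `run_search(driver, "2 br")` replaces the query rather than appending to it.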

### Summary

In this lesson, we saw how to use Selenium to scrape information from webpages.  We started by navigating to a webpage with the following: 
```python
driver = webdriver.Firefox()
driver.get("https://newyork.craigslist.org/search/apa")
```

Then we selected elements with a call to:

```python
query_box = driver.find_element(By.CLASS_NAME, "querybox")
```

From there, we saw that we could select child elements by calling `find_element` or `find_elements` on a selected element rather than on the driver, and read their contents through the `text` property.

Finally, we saw that we can fill out and submit forms with Selenium by calling `send_keys('text')` on an input and `.click()` on a button.
