## Logging on

Use Selenium to visit https://webapps1.chicago.gov/buildingrecords/ and accept the agreement.

> Think about when you use `.find_element_...` and when you use `.find_elementSSS_...`

In [1]:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://webapps1.chicago.gov/buildingrecords/")

In [2]:
driver.find_element_by_id("rbnAgreement1").click()

In [3]:
submit_button = driver.find_element_by_xpath("/html/body/div/div[4]/form/div[4]/div/button")
submit_button.click()

## Searching

Search for **400 E 41ST ST**.

In [4]:
textbox = driver.find_element_by_xpath("/html/body/div/div[4]/form/div[1]/div/input")
# Type in the textbox
textbox.send_keys("400 E 41ST ST")

In [5]:
submit_button = driver.find_element_by_xpath("/html/body/div/div[4]/form/div[2]/div/button")
submit_button.click()

## Saving tables with pandas

Use pandas to save a CSV of all **permits** to `Permits - 400 E 41ST ST.csv`. Note that there are **different sections of the page**, not just one long permits table.
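If the permits turn out to be split across more than one table, one option is to pick out every table that has a permit column and stack them. The sketch below is only an illustration: the small DataFrames are hand-built stand-ins for the list that `pd.read_html` returns, the way they are split is hypothetical, and the middle table's column names are made up.

```python
import pandas as pd

# Stand-ins for the list returned by pd.read_html(driver.page_source).
# How the real page splits its tables is an assumption here.
tables = [
    pd.DataFrame({
        "PERMIT #": [100479194],
        "DESCRIPTION OF WORK": ["INTERNALLY LIT SIGN CABINET ON SOUTH ELEVATION"],
    }),
    # A non-permit table (hypothetical column names)
    pd.DataFrame({"INSP #": [13175960], "STATUS": ["FAILED"]}),
    pd.DataFrame({
        "PERMIT #": [100195892],
        "DESCRIPTION OF WORK": ["INTERIOR ALTERATIONS TO 1ST FLOOR TE"],
    }),
]

# Keep only the tables that look like permit tables, then stack them
permit_tables = [table for table in tables if "PERMIT #" in table.columns]
permits = pd.concat(permit_tables, ignore_index=True)
print(permits)
```

Printing `[table.columns for table in tables]` first is a quick way to see which tables are worth keeping.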

In [6]:
import pandas as pd
tables = pd.read_html(driver.page_source)



In [7]:
print(tables[0])
permits = tables[0]

     PERMIT #  DATE ISSUED                                DESCRIPTION OF WORK
0   100845718          NaN  ERECT TWO SCAFFOLDS FROM 10/14/2019 TO 10/14/2...
1   100778302          NaN  PERMIT EXPIRES ON 10/17/2018 Erection Starts: ...
2   100721255          NaN  PERMIT EXPIRES ON 10/24/2017 ERECTION STARTS: ...
3   100693399          NaN  INSTALLATION OF LOW VOLTAGE BURGLAR ALARM INTE...
4   100665436          NaN  PERMIT EXPIRES ON 10/24/2016 ERECTION STARTS: ...
5   100610771          NaN  PERMIT EXPIRES ON 10/28/2015 ERECTION STARTS: ...
6   100581991          NaN  TRACE AND REPAIR BROKEN UNDERGROUND FEED TO EX...
7   100479194          NaN     INTERNALLY LIT SIGN CABINET ON SOUTH ELEVATION
8   100385721          NaN  RPACE CONCRETE SLAB WITH NEW AT GROUNGD FLOOR ...
9   100267298          NaN  INTERIOR ALTERATIONS TO MEDICAL OFFICE SUITE 1...
10  100218969          NaN  Revision to Permit. Removing walls at elevator...
11  100195892          NaN  INTERIOR ALTERATIONS TO 1ST FLOOR TE

In [8]:
permits.to_csv(r'/Users/biancapallaro/Documents/Foundations/Permits - 400 E 41ST ST.csv', index=False)

## Saving tables the long way

Save a CSV of all DOB inspections to `Inspections - 400 E 41ST ST.csv`, but **you also need to save the URL to the inspection**. `pd.read_html` only keeps the text of each cell, not the link behind it, so you won't be able to use pandas to read the table. You'll need to use a loop and create a list of dictionaries instead.

You can use Selenium (my recommendation) or you can feed the source to BeautifulSoup. You should have approximately 157 rows.

You'll probably need to find the table first, then the rows inside, then the cells inside of each row. You'll probably use lots of list indexing. I might recommend XPath for finding the table.

*Tip: If you get a "list index out of range" error, it's probably due to an issue involving `thead` vs `tbody` elements. What are they? What are they for? What's in them? There are a few ways to troubleshoot it.*
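To see why the header row causes trouble, here is a minimal sketch that runs without a browser. It uses the standard library's `html.parser` rather than Selenium or BeautifulSoup, and the HTML is a made-up table that only resembles a results table; the real page's markup may differ. The class name `RowCollector` is also just for illustration.

```python
from html.parser import HTMLParser

# Made-up HTML shaped like a results table (an assumption, not the real page)
SAMPLE = """
<table id="resultstable_inspections">
  <thead>
    <tr><th>INSP #</th><th>STATUS</th></tr>
  </thead>
  <tbody>
    <tr><td><a href="/inspectiondetails?insp=1">1</a></td><td>FAILED</td></tr>
    <tr><td><a href="/inspectiondetails?insp=2">2</a></td><td>PASSED</td></tr>
  </tbody>
</table>
"""

class RowCollector(HTMLParser):
    """Collect the td texts and the first link of every tr, header row included."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append({"cells": [], "url": None})
        elif tag == "td":
            self.in_td = True
            self.rows[-1]["cells"].append("")
        elif tag == "a" and self.rows and self.rows[-1]["url"] is None:
            self.rows[-1]["url"] = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.rows[-1]["cells"][-1] += data

parser = RowCollector()
parser.feed(SAMPLE)

# The header row holds th cells only, so its list of td cells is empty and
# indexing it with [0] would raise "list index out of range".
header_cells = parser.rows[0]["cells"]

inspections = []
for row in parser.rows:
    if not row["cells"]:  # skip rows without td cells (the thead row)
        continue
    inspections.append({
        "inspection": row["cells"][0],
        "status": row["cells"][1],
        "url": row["url"],
    })
print(inspections)
```

With Selenium the same idea applies: either look for rows inside `tbody` only, or skip any row whose list of `td` cells is empty.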

In [9]:
items = driver.find_elements_by_id("resultstable_inspections")
big_list = []
for item in items:
    # Only take rows inside tbody, which leaves out the header row in thead
    rows = item.find_elements_by_css_selector('tbody tr')
    for row in rows:
        # Look up the cells once per row instead of once per column
        cells = row.find_elements_by_tag_name('td')
        dictionary = {}
        dictionary['inspection'] = cells[0].text
        dictionary['inspection_date'] = cells[1].text
        dictionary['status'] = cells[2].text
        dictionary['type_description'] = cells[3].text
        # The inspection number is a link; keep its URL as well
        for link in row.find_elements_by_tag_name('a'):
            dictionary['url'] = link.get_attribute('href')
        print(dictionary)
        big_list.append(dictionary)

{'inspection': '13175960', 'inspection_date': '11/30/2020', 'status': 'FAILED', 'type_description': 'ANNUAL INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=13175960'}
{'inspection': '12770690', 'inspection_date': '05/30/2019', 'status': 'PASSED', 'type_description': 'BOILER ANNUAL INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=12770690'}
{'inspection': '12670542', 'inspection_date': '05/21/2019', 'status': 'FAILED', 'type_description': 'CONSERVATION ANNUAL', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=12670542'}
{'inspection': '12277260', 'inspection_date': '08/27/2018', 'status': 'FAILED', 'type_description': 'CONSERVATION ANNUAL', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=12277260'}
{'inspection': '12418304', 'inspection_date': '05/30/2018', 'status': 'PASSED', 'type_description': 'BOILER A

{'inspection': '10560743', 'inspection_date': '05/23/2012', 'status': 'PASSED', 'type_description': 'ANNUAL INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=10560743'}
{'inspection': '9995366', 'inspection_date': '04/09/2012', 'status': 'PASSED', 'type_description': 'ANNUAL INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=9995366'}
{'inspection': '10411517', 'inspection_date': '03/07/2012', 'status': 'PASSED', 'type_description': 'BOILER ANNUAL INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=10411517'}
{'inspection': '10530867', 'inspection_date': '02/21/2012', 'status': 'FAILED', 'type_description': 'ANNUAL INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=10530867'}
{'inspection': '10230414', 'inspection_date': '10/27/2011', 'status': 'FAILED', 'type_description': 'ANNUAL INSPECT

{'inspection': '2017660', 'inspection_date': '01/15/2009', 'status': 'PASSED', 'type_description': 'DOB NEW CONSTRUCTION INSP', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=2017660'}
{'inspection': '2017661', 'inspection_date': '01/15/2009', 'status': 'PASSED', 'type_description': 'DOB NEW CONSTRUCTION INSP', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=2017661'}
{'inspection': '2020266', 'inspection_date': '01/15/2009', 'status': 'PASSED', 'type_description': 'DOB NEW CONSTRUCTION INSP', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=2020266'}
{'inspection': '1613901', 'inspection_date': '01/15/2009', 'status': 'PARTIAL PASSED', 'type_description': 'DOB VENT/FURNACE INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=1613901'}
{'inspection': '1613908', 'inspection_date': '01/15/2009', 'status': 'PARTIAL PASSED',

{'inspection': '1621186', 'inspection_date': '05/16/2007', 'status': 'PARTIAL PASSED', 'type_description': 'PERMIT INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=1621186'}
{'inspection': '1613899', 'inspection_date': '05/10/2007', 'status': 'PARTIAL PASSED', 'type_description': 'DOB PLUMBING INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=1613899'}
{'inspection': '1881043', 'inspection_date': '04/20/2007', 'status': 'PARTIAL PASSED', 'type_description': 'CONSTRUCTION EQUIPMENT PERMIT', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=1881043'}
{'inspection': '1652466', 'inspection_date': '04/06/2007', 'status': 'PASSED', 'type_description': 'BOILER ANNUAL INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=1652466'}
{'inspection': '1613909', 'inspection_date': '04/06/2007', 'status': 'PARTIAL P

{'inspection': '25836', 'inspection_date': '07/09/2001', 'status': 'PASSED', 'type_description': 'SIGN ANNUAL INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=25836'}
{'inspection': '130126', 'inspection_date': '05/09/1997', 'status': 'CLOSED', 'type_description': 'FIRE PREVENTION PUMPS LEGACY', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=130126'}
{'inspection': '9475223', 'inspection_date': '01/28/1997', 'status': 'CLOSED', 'type_description': 'ELEVATOR LEGACY INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=9475223'}
{'inspection': '9457580', 'inspection_date': '01/21/1997', 'status': 'CLOSED', 'type_description': 'ELEVATOR LEGACY INSPECTION', 'url': 'https://webapps1.chicago.gov/buildingrecords/inspectiondetails?addr=364923&insp=9457580'}
{'inspection': '130125', 'inspection_date': '03/21/1996', 'status': 'CLOSED', 'type_description': 'F

In [10]:
import pandas as pd
df2 = pd.DataFrame(big_list)
df2.head()

|   | inspection | inspection_date | status | type_description | url |
|---|---|---|---|---|---|
| 0 | 13175960 | 11/30/2020 | FAILED | ANNUAL INSPECTION | https://webapps1.chicago.gov/buildingrecords/i... |
| 1 | 12770690 | 05/30/2019 | PASSED | BOILER ANNUAL INSPECTION | https://webapps1.chicago.gov/buildingrecords/i... |
| 2 | 12670542 | 05/21/2019 | FAILED | CONSERVATION ANNUAL | https://webapps1.chicago.gov/buildingrecords/i... |
| 3 | 12277260 | 08/27/2018 | FAILED | CONSERVATION ANNUAL | https://webapps1.chicago.gov/buildingrecords/i... |
| 4 | 12418304 | 05/30/2018 | PASSED | BOILER ANNUAL INSPECTION | https://webapps1.chicago.gov/buildingrecords/i... |


In [11]:
df2.to_csv(r'/Users/biancapallaro/Documents/Foundations/Inspections - 400 E 41ST ST.csv', index=False)


### Loopity loops

> If you used Selenium for the last question, copy the code and use it as a starting point for what we're about to do!

If you click the inspection number, it'll open up a new window that shows you details of the violations from that visit. Count the number of violations for each visit and save it in a new column called **num_violations**.

Save this file as `Inspections - 400 E 41ST ST - with counts.csv`.

Since it opens in a new window, we have to say "Hey Selenium, pay attention to that new window!" We do that with `driver.switch_to.window(driver.window_handles[-1])`. Each window gets a `window_handle`, `driver.window_handles` is the list of them, and we're just asking the driver to switch to the last one. A rough sketch of what your code will look like is here:

```python
# Click the link that opens the new window

# Switch to the new window/tab
driver.switch_to.window(driver.window_handles[-1])

# Do your scraping in here

# Close the new window/tab
driver.close()

# Switch back to the original window/tab
driver.switch_to.window(driver.window_handles[0])
```

You'll want to play around with them individually before you try it with the whole set - the ones that pass are very different pages than the ones with violations! There are a few ways to get the number of violations, some easier than others.

In [12]:
items = driver.find_elements_by_id("resultstable_inspections")
num_violations = []
for item in items:
    rows = item.find_elements_by_css_selector('tbody tr')
    for row in rows:
        for link in row.find_elements_by_tag_name('a'):
            # Click the link that opens the new window, then switch to it
            link.click()
            driver.switch_to.window(driver.window_handles[-1])
            # find_elements returns an empty list when the detail page has no
            # violations table, so those visits are counted as 0
            violation_rows = driver.find_elements_by_css_selector('#resultstable tbody tr')
            num_violations.append(len(violation_rows))
            print(len(violation_rows))
            # Close the new window and switch back on every pass through the loop.
            # If this only happens when a table was found, the driver can be left on
            # the detail window, and the next row lookup is then likely to raise
            # StaleElementReferenceException.
            driver.close()
            driver.switch_to.window(driver.window_handles[0])
