# Chicago Building Permits, Part 1

You'll find the [Selenium-Playwright conversion reference](https://jonathansoma.com/everything/scraping/selenium-playwright-conversion/) helpful for clicking, entering text, and selecting from dropdowns.

**Use Playwright or Selenium to visit https://webapps1.chicago.gov/buildingrecords/ and accept the agreement.**

In [9]:
from playwright.async_api import async_playwright
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless = False)
page = await browser.new_page()

In [10]:
url = "https://webapps1.chicago.gov/buildingrecords/"
await page.goto(url)

<Response url='https://webapps1.chicago.gov/buildingrecords/' request=<Request url='https://webapps1.chicago.gov/buildingrecords/' method='GET'>>

In [14]:
# Python's Playwright API uses snake_case (get_by_label), not JavaScript's camelCase
# (getByLabel), and it has no trailing semicolons or expect(...).toBeTruthy() chains.
# Assumes "rbnAgreement1" (from the first attempt) is the id of the agreement radio button.
await page.locator("#rbnAgreement1").check()

# Alternative, if the radio button has a matching label:
# await page.get_by_label("I accept the terms of this license").check()

# Part 1: Building permits

**Topics:** Completing forms, reading tables, saving one CSV for every row in a dataframe

## Searching

Search for **400 E 41ST ST**. It might be useful later to save it as a variable called `address`.

The Playwright documentation [suggests using `.fill` to fill in text form fields](https://playwright.dev/docs/input), but it won't work in this case: the website won't let you click submit until it "sees" you type on the keyboard. [Luckily there's a SECOND way of doing it with Playwright](https://playwright.dev/docs/input#type-characters).
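
Here's a minimal sketch of the keyboard-style approach. It assumes a recent Playwright release, where the per-character method is `press_sequentially` (older releases call it `.type`). The `#fullAddress` and `#submit` selectors are guesses, not something this page confirms, so inspect the form to find the real ones. `search_address` is a made-up helper name.

```python
async def search_address(page, address,
                         input_selector="#fullAddress",   # assumed selector
                         submit_selector="#submit"):      # assumed selector
    """Type an address one key at a time, then submit the search form."""
    box = page.locator(input_selector)
    # press_sequentially sends key events for each character,
    # which is what the form needs to "see" before it allows a submit.
    await box.press_sequentially(address)
    await page.locator(submit_selector).click()

# In the notebook: await search_address(page, address)
```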

## Saving the permits table

Use pandas to save a CSV of all **permits** to `permits-400 E 41ST ST.csv`. Note that there are **different sections of the page**, not just one long permits table!

> - *Tip: When using `.read_html`, try using `flavor='lxml'` and comparing the results to `flavor='html5lib'`. Which works better?*
> - *Tip: You might need to install `html5lib` using `pip`. If so, you'll need to restart the notebook using **Kernel > Restart** before it will work.*
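
One way to compare the two parsers is to run both on the same HTML and look at the shape of every table each one finds. This is only a sketch: `tables_by_flavor` is a hypothetical helper, and `html` is assumed to be the page source (for example from `await page.content()`).

```python
from io import StringIO

import pandas as pd

def tables_by_flavor(html):
    """Parse the same HTML with both parsers and report each table's shape."""
    shapes = {}
    for flavor in ("lxml", "html5lib"):
        try:
            tables = pd.read_html(StringIO(html), flavor=flavor)
        except (ImportError, ValueError) as err:
            # ImportError: the parser isn't installed; ValueError: no tables found
            shapes[flavor] = f"failed: {err}"
            continue
        shapes[flavor] = [df.shape for df in tables]
    return shapes

# In the notebook: tables_by_flavor(await page.content())
```

If the two parsers report different table counts or shapes, that usually points to which one is handling the page's markup better.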

## Moving it all into one cell

Now let's try getting the permits for the address `3444 North Elaine Place`.

Move the code from the sections above into **one cell** that both searches *and* saves.

* **Tip:** You CAN'T just click the back button to go back to the search page! You **must** click the **Search** link in the top left-hand corner (you can do it manually this time)
* **Tip:** If you get the error **No tables found**, it's because, unlike Playwright, pandas doesn't wait for the page to load. You can use `await page.locator("table").wait_for()` to wait for the table to show up before processing the page with pandas.
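
A small sketch of the wait-then-read pattern from the tip above. `html_when_table_ready` is a hypothetical helper. It uses `.first` on the assumption that the results page has several tables, in which case a plain `page.locator("table").wait_for()` may complain about matching more than one element.

```python
async def html_when_table_ready(page, selector="table"):
    """Wait for the first matching table, then return the page source."""
    # .first narrows the locator to a single element before waiting on it
    await page.locator(selector).first.wait_for()
    return await page.content()

# In the notebook (needs `import pandas as pd` and `from io import StringIO`):
# html = await html_when_table_ready(page)
# tables = pd.read_html(StringIO(html))
```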

## Turning your code into a function

Convert that code into a function called `get_permits` that does the following:
 
1. Takes an address
2. Prints that it's about to work on the address
3. Clicks the "Search" button on the top left of the page
4. Downloads the permits to a CSV.

Test it by running the code below:

```python
# Skip the await/async parts if you're using Selenium
await get_permits('25 W Randolph St')
```

Confirm that `permits-25 W Randolph St.csv` is saved and *it isn't the same content as the other permit files.*

* **Tip:** When you use `await` inside of a function, instead of `def` you name it with `async def`. And then when you use it you call it with `await`. We're fancy playwright people now!
* **Tip:** We usually use `.locator` with fancy ids and classes and stuff, but `.get_by_text` is going to be a lot easier for the search button! If you get a "multiple elements found" issue [check the documentation](https://playwright.dev/python/docs/api/class-page#page-get-by-text)
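
To show the `async def` / `await` pattern and `.get_by_text` together, here's a sketch of just the opening steps of such a function. `open_search_page` is a hypothetical name, and the assumption is that the link's visible text is exactly "Search". The typing, table-reading, and CSV-saving steps are left for you to fill in.

```python
async def open_search_page(page, address):
    """Announce the address, then click the Search link to reset the form."""
    print(f"Working on {address}")
    # exact=True matches the whole text only, which can help with
    # "multiple elements found" errors; add .first if more than one still matches
    await page.get_by_text("Search", exact=True).click()

# In the notebook: await open_search_page(page, '25 W Randolph St')
```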

## Loops in pandas without `.apply`

It's honestly a pain to use `.apply` with async functions, so here's a secret: if you want to do a loop with a pandas dataframe, you can use `.iterrows()`.

```python
for index, row in buildings_df.iterrows():
    print(row['address'])
```

Use `.iterrows()` to run the async `get_permits` function for each row of our dataframe. *If we were scraping the page with Selenium or requests + BeautifulSoup we wouldn't need to do this.*
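
A sketch of what that loop might look like. The dataframe contents here are placeholders built from the addresses used earlier, and `run_for_each_address` is a hypothetical helper that accepts any async function taking an address.

```python
import pandas as pd

# Placeholder dataframe; yours will come from wherever your building list lives
buildings_df = pd.DataFrame({
    "address": ["400 E 41ST ST", "3444 North Elaine Place", "25 W Randolph St"],
})

async def run_for_each_address(df, scrape, column="address"):
    """Await an async scraper once per row, one address at a time."""
    for index, row in df.iterrows():
        await scrape(row[column])

# In the notebook: await run_for_each_address(buildings_df, get_permits)
```

Running the addresses one after another (instead of all at once) matters here because every call shares the same `page`.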

# Part 2: More complex tables

Now we want to save a CSV of all DOB inspections.

This is more complicated than the last one because **we also need to save the URL to the inspection** (see how the inspection number is a link?). As a result, you won't be able to use pandas! Instead, you'll need to use a loop and create a list of dictionaries.

You can use Playwright itself or you can feed the source to BeautifulSoup. You should have approximately 160 rows.

## Search for `400 E 41ST ST` and save a dataframe of inspection details

Include the following fields:

* Inspection number
* Inspection date
* Status
* Description
* URL link

**Tip:** Again, pandas won't work since you need the `href` from the link.

**Tip:** You'll probably need to find the table first, then the rows inside, then the cells inside of each row. You'll probably use lots of list indexing.

**Tip:** Your data will have a lot of `\t` and `\n` in it. You can (somewhat) clean that in pandas with `.str.strip()`
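
The table, then rows, then cells approach from the tips above could be sketched with BeautifulSoup like this. The column order, the `table_selector` default, and the relative-link handling are all assumptions, so check them against the real page. `parse_inspections` is a hypothetical helper.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

def parse_inspections(html, table_selector="table",
                      base_url="https://webapps1.chicago.gov/buildingrecords/"):
    """Turn one HTML table into a list of dictionaries, keeping the link URL."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one(table_selector)
    if table is None:
        return []
    rows = []
    for tr in table.select("tr"):
        cells = tr.find_all("td")
        if len(cells) < 4:
            continue  # skip header rows, which use <th> instead of <td>
        link = cells[0].find("a")
        rows.append({
            # Assumed column order: number, date, status, description
            "inspection_number": cells[0].get_text(),
            "inspection_date": cells[1].get_text(),
            "status": cells[2].get_text(),
            "description": cells[3].get_text(),
            "url": urljoin(base_url, link["href"]) if link else None,
        })
    return rows

# In the notebook: df = pd.DataFrame(parse_inspections(await page.content()))
```

The text is left uncleaned on purpose, so the `.str.strip()` step below still has something to do.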

### Use `.str.strip()` to somewhat clean up the inspection number and inspection date

### Make sure the inspection date *looks like a date*

Instead of this:

```
20221004\n\t\t\t\t\t\t10/04/2022
```

We want this:

```
10/04/2022
```

You can either go back and scrape slightly differently, use regex, split the string... lots of options here.
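
As one example of the regex option, this sketch pulls the first `MM/DD/YYYY` pattern out of a messy string. `clean_date` is a hypothetical helper, and it assumes the date you want always uses slashes.

```python
import re

def clean_date(raw):
    """Return the first MM/DD/YYYY-looking piece of a string."""
    match = re.search(r"\d{2}/\d{2}/\d{4}", raw)
    # Fall back to the stripped original if nothing looks like a date
    return match.group(0) if match else raw.strip()

print(clean_date("20221004\n\t\t\t\t\t\t10/04/2022"))
```

On a dataframe column you could apply the same function with `.apply(clean_date)`, or use the same pattern with `.str.extract`.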

### Save the dataframe to `inspections-400 E 41ST ST.csv`

Use a variable for the `filename` to make your life easier later on.
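
A tiny sketch of building the filename from the address, so the same code works for any address later. `csv_filename` is a hypothetical helper.

```python
def csv_filename(kind, address):
    """Build names like 'inspections-400 E 41ST ST.csv'."""
    return f"{kind}-{address}.csv"

address = "400 E 41ST ST"
filename = csv_filename("inspections", address)
# df.to_csv(filename, index=False)  # df is your inspections dataframe
```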

## Put it all into one cell and run it for `3444 North Elaine Place`

Start from the point of clicking the **Search** button, and put this bit of code on the first line:

```python
address = '3444 North Elaine Place'
```

* **Tip:** I kept getting 0 rows after moving all the code into one cell, probably because the page loads in a weird way before it populates the tables. My solution was a hack: if I used `time.sleep(1)` to pause for a moment after submitting but before scraping the rows, everything worked out okay.
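
If a fixed pause feels fragile, an alternative sketch is to check repeatedly until rows actually show up. This is not the approach from the tip above, just a variation on it. `wait_for_rows` is a hypothetical helper and the `table tbody tr` selector is an assumption about the page's markup. It uses `asyncio.sleep` rather than `time.sleep` so the pause doesn't block the event loop that async Playwright runs on.

```python
import asyncio

async def wait_for_rows(page, row_selector="table tbody tr",
                        timeout=10.0, interval=0.25):
    """Poll until at least one row matches, or give up after `timeout` seconds."""
    waited = 0.0
    while waited < timeout:
        count = await page.locator(row_selector).count()
        if count > 0:
            return count
        await asyncio.sleep(interval)
        waited += interval
    return 0  # nothing showed up in time

# In the notebook: row_count = await wait_for_rows(page)
```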

## Move this into a function

Move this into a function named `get_inspections` and test it with the following code:

```python
await get_inspections('25 W Randolph St')
```

Confirm the `inspections-25 W Randolph St.csv` file is created and has the correct information in it.

Refer to the previous function section for tips and things to be careful about.

* **Tip:** This one has a LOT of inspections, which seems to take longer to load. You might need to change your `time.sleep` line a bit....

If you did this with Playwright, try it again with BeautifulSoup. What's the difference in speed?