# Welcome to the Real World 

Okay, but what happens when it's not a nice, clean training example? In today's lesson, we'll go over some of the ways web scraping can get messy and how to work around them.

- Thinking through how any one pass through your loop might be different than others
    - Does the webpage layout look different for certain options?
    - Show example for Philadelphia County UJS - there are no evictions available for just that county: https://ujsportal.pacourts.us/
- Error handling to deal with slow websites or edge cases
    - try/except logic
    - time.sleep()
- Picking up where you leave off by adding function arguments
- Workshop
    - Build upon session 4 example by adding error handling and pickup-where-you-left-off functionality

## Changes between loops

In an ideal world, each time we loop through our variables, the web page will stay the same. However, sometimes new elements show up on the page based on the state/county/variable you want. In our KFF premium plan example, when a zip code covers multiple counties, a county dropdown menu will show up so you can select which county you want data for.

![IL counties](images/il-counties.png)

Since there are ~40,000 zip codes in the US, we only have one zip code in our data for each county. This means that when multiple counties are associated with a zip code, we need to make sure we select the one we want, or else we'll have errors in our data! This is where the trusty `try`/`except` statement comes in handy. We can specify that we want to TRY to select a county from the dropdown, and if that doesn't work we just say `pass` and move on.

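Python runs the code inside the `try` block first. If that code raises an error, Python jumps to the `except` block instead of crashing, and then carries on with whatever comes after the statement. If no error is raised, the `except` block is skipped entirely.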


In [None]:
try:
    # Select the county when the dropdown is shown
    select_dropdown(identifier='//*[@id="locale-inner"]/select', driver=driver, option=county)

except Exception:
    # The dropdown element is not present, proceed with the next steps.
    # Catching Exception (rather than a bare except) still lets Ctrl+C stop the script.
    pass
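
To see the flow in isolation, here's a toy version with no browser involved. Everything in it is made up for illustration: `select_county` is a hypothetical stand-in for `select_dropdown`, and the "pages" are plain dictionaries with placeholder zip codes and counties. It also catches one specific error type instead of everything, which keeps unrelated bugs from being silently swallowed.

```python
def select_county(page, county):
    """Hypothetical stand-in for select_dropdown.

    Raises KeyError when the page has no county dropdown at all.
    """
    options = page["county_dropdown"]  # KeyError if the dropdown is absent
    if county not in options:
        raise ValueError(f"{county!r} is not one of {options}")
    return county


# Placeholder pages: the first zip code spans two counties, the second does not
pages = [
    {"zipcode": "00001", "county_dropdown": ["County A", "County B"]},
    {"zipcode": "00002"},
]

selected = []
for page in pages:
    try:
        # Works only when the dropdown exists on this page
        choice = select_county(page, "County A")
    except KeyError:
        # No dropdown on this page, so there is nothing to select
        choice = None
    selected.append(choice)

print(selected)
```

Because only `KeyError` is caught, asking for a county that isn't in the dropdown would still raise a `ValueError` and stop the loop, which is usually what you want for a genuine data problem.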


Additionally, a bigger curveball was that New York and Vermont each have completely different pages, where you can only select which members of the family are enrolling in the marketplace. This was something I only figured out as the code was running, when I noticed that it was getting stuck on those states because the options I was telling `selenium` to select weren't available. For this case, I checked in with the project team to see how they wanted me to proceed and ended up creating a separate, slightly tailored script that would scrape the available values for those two states.

![NY example](images/ny-example.png)
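
Not every hiccup is a different layout; sometimes the page is just slow. Here's a minimal sketch of pairing `try`/`except` with `time.sleep()` to retry a step a few times before giving up. The `retry` helper and the `slow_page` function are hypothetical names invented for this example, and the wait time is kept tiny so it runs quickly; for a real site you'd likely wait a few seconds.

```python
import time


def retry(action, attempts=3, wait_seconds=1.0):
    """Call action(); if it fails, pause and try again, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:
            if attempt == attempts:
                raise  # out of attempts, so let the error surface
            print(f"Attempt {attempt} failed ({err}); waiting {wait_seconds}s")
            time.sleep(wait_seconds)


# Pretend page that only finishes loading on the third request
calls = {"count": 0}


def slow_page():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("page still loading")
    return "loaded"


result = retry(slow_page, attempts=5, wait_seconds=0.01)
print(result, "after", calls["count"], "tries")
```

Re-raising on the last attempt matters: if the page never loads, you want to find out rather than quietly record nothing.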

## ABORT MISSION

Oh no! Our code stopped running! 

How can we create safety nets within the code so that when something stops running (say your computer went to sleep), the code can keep running from where we left off?

First, we need to think through how to get back to the state/county/zip code that we were on. To do this, it'll be helpful to have a counter running along with our code to tell us what number we're on. 
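
A minimal sketch of such a counter, under some assumptions: `scrape_all` and `flaky_scrape` are hypothetical names, and the "scraping" is just upper-casing a letter so the example runs anywhere. The idea is that the counter only goes up after an item succeeds, so when something breaks it tells you exactly where to restart.

```python
def scrape_all(items, scrape_one, start=0):
    """Scrape items[start:] and return (results, counter).

    counter is the number of items finished so far, counting the
    ones skipped via `start`.
    """
    results = []
    counter = start
    try:
        for item in items[start:]:
            results.append(scrape_one(item))
            counter += 1  # only count an item once it has succeeded
    except Exception as err:
        print(f"Stopped at item {counter}: {err}")
    return results, counter


# Pretend scraper that fails the first time it sees "c"
failures = {"c"}


def flaky_scrape(item):
    if item in failures:
        failures.discard(item)
        raise RuntimeError(f"page for {item!r} did not load")
    return item.upper()


items = ["a", "b", "c", "d"]

first, counter = scrape_all(items, flaky_scrape)
# Pick up where we left off by passing the counter back in
rest, counter_after = scrape_all(items, flaky_scrape, start=counter)
print(first, rest, counter_after)
```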

In [None]:
import pandas as pd

# Helper function to rebuild the state_counties_zipcodes dictionary,
# skipping the first n rows of data when the counter is not 0
def skip_counties(n):
    ###--- Get list of all states, counties, and zip codes ---###

    # Assumes a CSV file with columns 'state_abbr', 'county', and 'zipcode'
    csv_file_path = 'data/tate.csv'

    # Read into a dataframe. Row 0 is the header, so skip rows 1 through n.
    # When n is 0 the condition is never true and nothing is skipped.
    # Reading zipcode as a string preserves leading zeros.
    raw_csv = pd.read_csv(csv_file_path, skiprows=lambda x: 0 < x <= n, dtype={'zipcode': str})

    # Filter out New York and Vermont because their pages are different
    raw_csv = raw_csv[~raw_csv['state_abbr'].isin(['ny', 'vt'])]

    # Create a nested dictionary: {state: {county: zipcode}}
    state_counties_zipcodes = {}

    for index, row in raw_csv.iterrows():
        state = row['state_abbr']
        county = row['county']
        zipcode = row['zipcode']

        if state not in state_counties_zipcodes:
            state_counties_zipcodes[state] = {}

        state_counties_zipcodes[state][county] = zipcode
    return state_counties_zipcodes
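
To check what the `skiprows` argument in `skip_counties` actually skips, here's a small sketch that reads a CSV from a string instead of a file. The rows are placeholder data made up for the example (only the column names match the ones above), and `read_after` is a hypothetical helper.

```python
import io

import pandas as pd

# Placeholder data with the same column names as the real CSV
csv_text = """state_abbr,county,zipcode
aa,County A,00001
aa,County B,00002
bb,County C,00003
bb,County D,00004
"""


def read_after(n):
    """Read the CSV, keeping the header (row 0) and skipping data rows 1..n."""
    return pd.read_csv(
        io.StringIO(csv_text),
        skiprows=lambda x: 0 < x <= n,  # same condition as x > 0 and x <= n
        dtype={"zipcode": str},  # keep leading zeros
    )


everything = read_after(0)  # nothing skipped
remaining = read_after(2)   # first two counties already scraped

print(remaining)
```

One thing to keep in mind: `n` counts rows in the CSV file, so the counter you pass in needs to count rows the same way.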
