# Airbnb Scrape by Montana Cities

I will use Selenium and Beautiful Soup to scrape the Airbnb listings for every town in Montana with a population below 2,500 people. Then I will try to determine if there is a correlation between the number of listings in a given town and: 
- Median household income
- Median home value
- Population of select age bands (esp. <18 years old)

If my theory holds, the number of Airbnb listings will be positively correlated with median household income and median home value, but negatively correlated with the share of the population under 18 years old. My theory is that Airbnb increases earning potential for locals but drives up local home prices by restricting housing supply, so fewer families will move to those towns because housing will be unaffordable. 
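As a rough illustration of the test this theory implies, the sketch below computes a Pearson correlation coefficient in plain Python. The function name `pearson_r` and all of the numbers are placeholders invented for this example; they are not scraped or Census figures. With the real data in a pandas DataFrame, `DataFrame.corr()` would give the same statistic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT real listing or Census data.
toy_listings = [2, 10, 25, 80, 150]
toy_income_thousands = [40, 45, 50, 60, 70]
toy_share_under_18 = [30, 27, 25, 20, 15]

r_income = pearson_r(toy_listings, toy_income_thousands)
r_under_18 = pearson_r(toy_listings, toy_share_under_18)
print(f"listings vs. income: {r_income:.2f}")
print(f"listings vs. share under 18: {r_under_18:.2f}")
```

If the theory holds, the first coefficient should come out positive and the second negative, as it does for these made-up numbers. A correlation alone would not show that Airbnb causes either effect.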

For this analysis, I am especially interested in the effect of Airbnb on small towns, so I'm not too concerned about locales with more than 300 listings. I will take note of which ones they are, though, and do additional scraping if that location fits my other population criteria (<2,500 residents). (Note: Airbnb does not show more than 300 listings for one area without clicking around the map).  
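Because of that cap, a scraped count can arrive either as a plain number or as the string `300+`. The helper below is a minimal sketch of one way to keep both the number and the fact that it was capped. The name `parse_listing_count` is hypothetical and is not used elsewhere in this notebook.

```python
import re


def parse_listing_count(token):
    """Turn a count token such as '133' or '300+' into (count, is_capped)."""
    match = re.fullmatch(r"(\d[\d,]*)(\+?)", token.strip())
    if match is None:
        raise ValueError(f"unrecognized listing count: {token!r}")
    count = int(match.group(1).replace(",", ""))
    is_capped = match.group(2) == "+"
    return count, is_capped


print(parse_listing_count("133"))
print(parse_listing_count("300+"))
```

Keeping the flag makes it easy to list the capped towns later and decide which ones need additional scraping.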

In [1]:
# imports
import pandas as pd
import requests               
from bs4 import BeautifulSoup 
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
from collections import defaultdict
import math
import numpy as np

## Source List of Montana Cities and Towns

The following website has population data sourced from the 2020 Census, where available, else from the 2019 American Community Survey: https://www.montana-demographics.com/cities_by_population.  

I read it in below and keep only places with fewer than 2,500 residents. 

In [2]:
places = pd.read_csv('Montana.Towns.csv')
places.head()
places["Population"] = pd.to_numeric(places["Population"])

In [3]:
small_towns = places[places['Population'] <2500] 
print(small_towns.head())
len(small_towns.index)

             City  Population
43  East Missoula        2465
44         Conrad        2318
45      Red Lodge        2257
46          Pablo        2138
47       Colstrip        2096


273

## Search Airbnb by Place and Scrape Listings

Now that I have a list of cities to search on Airbnb, I can use Selenium to go through each place in the list and scrape the listing info. 

However, there are some issues here. Airbnb is sometimes far too generous with the search radius it affords a location when it returns results. For example, when I search listings in Absarokee, MT, Airbnb returns listings all the way from Greycliff to Red Lodge and Park City. If I zoom in twice on the map, I am given listings that appear to be located in Absarokee proper. 

However, if I do the same zooming action when I search Gardiner, MT, I wind up with zero listings because it zooms into a mountain range. An additional example: searching for listings in Belfry should be fruitless. There are no listings in the town. However, Airbnb generously draws a circle for Belfry that includes Red Lodge (among other towns), which has many listings. 

There is no perfect solution to this problem: sometimes no zoom is necessary, sometimes one should zoom in one map level, and other times one should zoom in two map levels. In an effort to get the most accurate data, I will scrape Airbnb in two ways: first, by scraping all of the listings that the site returns; second, by scraping all listings returned if I search an area and then zoom in on the map two times. 

In my final analysis, I will average the two results for a "final" listing figure. 

In [4]:
# grab the town names, which will be put into the search field
towns = small_towns['City'].tolist()
len(towns)

273

In [5]:
# create a small subset of towns to test the code on first
town_subset = towns[6:8]
town_subset

['Stevensville', 'West Glendive']

In [6]:
url = 'https://www.airbnb.com/'

In [7]:
town_subset_test = {}

In [8]:
# test version

for town in town_subset :
    # open a fresh driver for this town and go to Airbnb's homepage
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.get(url)
    time.sleep(4)
    
    # ID the search field and input the town into it
    try :
        search_box = driver.find_element(By.CLASS_NAME, "_1xq16jy")
        search_box.send_keys(f'{town}, MT')
        search_box.send_keys(Keys.RETURN)
        time.sleep(2)
    
    except :
        print(f"couldn't find search box for {town}")
    
     # ID the "I'm Flexible" button and click it
    try: 
        flexible_button = driver.find_element(By.CLASS_NAME, "_9qlt59")
        flexible_button.click()
        time.sleep(2)
    
    except :
        print(f"couldn't find flexible box for {town}")
    
     # then click search and provide some time to load the page
    try :
        search_button = driver.find_element(By.XPATH, '//*[@id="search-tabpanel"]/div/div[5]/div[2]/button/span[1]/span')
        search_button.click()
        time.sleep(3)
    
    except :
        print(f"couldn't find search button for {town}")
    
    # pull the soup
    html = driver.page_source
    html_soup = BeautifulSoup(html, 'html.parser')
    
    # grab the number of listings
    num_listings = html_soup.find_all('h1', class_ = '_78tyg5')
    num_listings_string = []
    for x in num_listings:
        num_listings_string.append(str(x))
    listings = num_listings_string[0].split(">")[1].split(" ")[0]
    
    # clear out the list for next round
    num_listings_string.clear()
    
    # add to dictionary
    town_subset_test[town] = listings
    
    driver.quit()
    



Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome
Driver [/Users/austinsmith/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome
Driver [/Users/austinsmith/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


In [9]:
town_subset_test

{'Stevensville': '133', 'West Glendive': '10'}
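The loop above pauses with fixed `time.sleep` calls, which can be too short on a slow page load and wasteful on a fast one. Selenium ships `WebDriverWait` for waiting on a condition instead. The underlying idea can be sketched in plain Python as a polling helper; `wait_until` is a hypothetical name for this example, not a Selenium function.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With a live driver, something like `wait_until(lambda: driver.find_elements(By.CLASS_NAME, "_1xq16jy"))` would return as soon as the search box exists, since `find_elements` returns an empty list rather than raising when nothing matches.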

In [10]:
test_df = pd.DataFrame(list(town_subset_test.items()), columns = ['Town','Listings - No Zoom'])
test_df

| | Town | Listings - No Zoom |
|---|---|---|
| 0 | Stevensville | 133 |
| 1 | West Glendive | 10 |


In [9]:
town_listings = {}
errors = []

In [10]:
# scrape all towns
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

for town in towns :
    # open the driver and go to Airbnb's homepage ahead of this loop
    driver.get(url)
    time.sleep(4)
    
    # ID the search field and input the town into it
    try :
        search_box = driver.find_element(By.CLASS_NAME, "_1xq16jy")
        search_box.send_keys(f'{town}, MT')
        search_box.send_keys(Keys.RETURN)
        time.sleep(2)
    
    except :
        print(f"couldn't find search box for {town}")
        errors.append(town)
    
     # ID the "I'm Flexible" button and click it
    try: 
        flexible_button = driver.find_element(By.CLASS_NAME, "_9qlt59")
        flexible_button.click()
        time.sleep(2)
    
    except :
        print(f"couldn't find flexible box for {town}")
        errors.append(town)
    
     # then click search and provide some time to load the page
    try :
        search_button = driver.find_element(By.XPATH, '//*[@id="search-tabpanel"]/div/div[5]/div[2]/button/span[1]/span')
        search_button.click()
        time.sleep(3)
    
    except :
        print(f"couldn't find search button for {town}")
        errors.append(town)
    
    # pull the soup
    html = driver.page_source
    html_soup = BeautifulSoup(html, 'html.parser')
    
    # grab the number of listings
    num_listings = html_soup.find_all('h1', class_ = '_78tyg5')
    num_listings_string = []
    for x in num_listings:
        num_listings_string.append(str(x))
    
    try :
        listings = num_listings_string[0].split(">")[1].split(" ")[0]
    
    except :
        print(f"couldn't get the listings from {town}")
        errors.append(town)
        
    # clear out the list for next round
    num_listings_string.clear()
    
    # add to dictionary
    town_listings[town] = listings




Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome
Driver [/Users/austinsmith/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


couldn't get the listings from Roberts
couldn't get the listings from Ballantine
couldn't get the listings from Huson a
couldn't get the listings from Forest Hill Village
couldn't find flexible box for Fort Peck
couldn't find search button for Fort Peck
couldn't find flexible box for Fox Lake
couldn't find search button for Fox Lake
couldn't find flexible box for Willow Creek
couldn't find search button for Willow Creek
couldn't find flexible box for Belfry 
couldn't find search button for Belfry 
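One caveat about the towns in this log: when the count cannot be read, the loop above leaves `listings` set to the previous town's value, and that stale value is what gets stored for the failing town. The sketch below shows one way to record an explicit missing value instead. `collect_counts` and the stub `fake_fetch` are hypothetical names for illustration; `fake_fetch` stands in for the Selenium steps.

```python
def collect_counts(towns, fetch_count):
    """Run fetch_count(town) for each town, storing None when it fails.

    Returns (results, errors), where errors lists (town, reason) pairs.
    """
    results = {}
    errors = []
    for town in towns:
        try:
            results[town] = fetch_count(town)
        except Exception as exc:
            # Store an explicit missing value rather than reusing an old one.
            results[town] = None
            errors.append((town, repr(exc)))
    return results, errors


def fake_fetch(town):
    """Stand-in for the scrape; fails for one town to show the behavior."""
    if town == "Belfry":
        raise LookupError("no listing-count heading found")
    return {"Stevensville": "133", "West Glendive": "10"}[town]


counts, failed = collect_counts(["Stevensville", "Belfry", "West Glendive"], fake_fetch)
print(counts)
print(failed)
```

A `None` entry would need handling before any integer conversion; `pd.to_numeric(..., errors="coerce")` is one option, since it turns missing values into `NaN`.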


In [11]:
# save to data frame

noZoom_listings = pd.DataFrame(list(town_listings.items()), columns = ['Town','Listings - No Zoom'])

In [14]:
noZoom_listings.head()
len(noZoom_listings.index)

273

In [16]:
# conduct scrape of listings when zoomed in twice
# note: did a test run first 

town_listings_zoom = {}
town_listings_zoom_test = {}

In [17]:
# test scrape with zoom in on map twice

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

for town in town_subset :
    # open the driver and go to Airbnb's homepage ahead of this loop
    driver.get(url)
    time.sleep(4)
    
    # ID the search field and input the town into it
    try :
        search_box = driver.find_element(By.CLASS_NAME, "_1xq16jy")
        search_box.send_keys(f'{town}, MT')
        search_box.send_keys(Keys.RETURN)
        time.sleep(2)
    
    except :
        print(f"couldn't find search box for {town}")
        errors.append(town)
    
     # ID the "I'm Flexible" button and click it
    try: 
        flexible_button = driver.find_element(By.CLASS_NAME, "_9qlt59")
        flexible_button.click()
        time.sleep(2)
    
    except :
        print(f"couldn't find flexible box for {town}")
        errors.append(town)
    
     # then click search and provide some time to load the page
    try :
        search_button = driver.find_element(By.XPATH, '//*[@id="search-tabpanel"]/div/div[5]/div[2]/button/span[1]/span')
        search_button.click()
        time.sleep(3)
    
    except :
        print(f"couldn't find search button for {town}")
        errors.append(town)
    
    # select the "zoom" button on the map x2
    try :
        zoom_button = driver.find_element(By.XPATH, '//*[@id="site-content"]/div[3]/div/div/div/div/div/div/div[3]/div/button[1]')
        zoom_button.click()
        time.sleep(1)
        zoom_button.click()
        time.sleep(1)
    
    except :
        print(f"couldn't find the zoom button for {town}")
        errors.append(town)
    
    # pull the soup
    html = driver.page_source
    html_soup = BeautifulSoup(html, 'html.parser')
    
    # grab the number of listings
    num_listings = html_soup.find_all('h1', class_ = '_78tyg5')
    num_listings_string = []
    for x in num_listings:
        num_listings_string.append(str(x))
    
    try :
        listings = num_listings_string[0].split(">")[1].split(" ")[0]
    
    except :
        print(f"couldn't get the listings from {town}")
        errors.append(town)
        
    # clear out the list for next round
    num_listings_string.clear()
    
    # add to dictionary
    town_listings_zoom_test[town] = listings



Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome
Driver [/Users/austinsmith/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


In [18]:
# results of test scrape
town_listings_zoom_test

{'Stevensville': '32', 'West Glendive': '0'}

In [20]:
# perform full scrape with zooms
# in hindsight, I should have done these scrapes in one step - one after the other. 
# note to self to edit code accordingly if pulled in the future

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

for town in towns :
    # open the driver and go to Airbnb's homepage ahead of this loop
    driver.get(url)
    time.sleep(4)
    
    # ID the search field and input the town into it
    try :
        search_box = driver.find_element(By.CLASS_NAME, "_1xq16jy")
        search_box.send_keys(f'{town}, MT')
        search_box.send_keys(Keys.RETURN)
        time.sleep(2)
    
    except :
        print(f"couldn't find search box for {town}")
        errors.append(town)
    
     # ID the "I'm Flexible" button and click it
    try: 
        flexible_button = driver.find_element(By.CLASS_NAME, "_9qlt59")
        flexible_button.click()
        time.sleep(2)
    
    except :
        print(f"couldn't find flexible box for {town}")
        errors.append(town)
    
     # then click search and provide some time to load the page
    try :
        search_button = driver.find_element(By.XPATH, '//*[@id="search-tabpanel"]/div/div[5]/div[2]/button/span[1]/span')
        search_button.click()
        time.sleep(3)
    
    except :
        print(f"couldn't find search button for {town}")
        errors.append(town)
    
    # select the "zoom" button on the map x2
    try :
        zoom_button = driver.find_element(By.XPATH, '//*[@id="site-content"]/div[3]/div/div/div/div/div/div/div[3]/div/button[1]')
        zoom_button.click()
        time.sleep(1)
        zoom_button.click()
        time.sleep(1)
    
    except :
        print(f"couldn't find the zoom button for {town}")
        errors.append(town)
    
    # pull the soup
    html = driver.page_source
    html_soup = BeautifulSoup(html, 'html.parser')
    
    # grab the number of listings
    num_listings = html_soup.find_all('h1', class_ = '_78tyg5')
    num_listings_string = []
    for x in num_listings:
        num_listings_string.append(str(x))
    
    try :
        listings = num_listings_string[0].split(">")[1].split(" ")[0]
    
    except :
        print(f"couldn't get the listings from {town}")
        errors.append(town)
        
    # clear out the list for next round
    num_listings_string.clear()
    
    # add to dictionary
    town_listings_zoom[town] = listings



Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome
Driver [/Users/austinsmith/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


couldn't find the zoom button for Colstrip
couldn't find the zoom button for Huson a
couldn't get the listings from Huson a


In [24]:
driver.close()

In [27]:
# save to data frame

Zoom_listings = pd.DataFrame(list(town_listings_zoom.items()), columns = ['Town','Listings - Zoom'])

In [28]:
Zoom_listings.head()
len(Zoom_listings.index)

273

In [29]:
# save to csv 
Zoom_listings.to_csv('Zoom_listings.csv')

In [76]:
# combine the two dataframes and include a new column that is the mean of the two listing figures
# note: need to change all of the 300+ listings to 300 


listings = pd.merge(noZoom_listings, Zoom_listings, on=['Town'])

listings.head()

| | Town | Listings - No Zoom | Listings - Zoom |
|---|---|---|---|
| 0 | East Missoula | 281 | 180 |
| 1 | Conrad | 28 | 0 |
| 2 | Red Lodge | 146 | 115 |
| 3 | Pablo | 45 | 24 |
| 4 | Colstrip | 1 | 1 |


In [77]:
# check to see which columns are numerical (because I'll need to take the mean)
listings.select_dtypes(include=np.number).columns.tolist()

[]

In [80]:
listings = listings.replace('300+',300)
listings.select_dtypes(include=np.number).columns.tolist()

[]

In [85]:
listings['Listings - No Zoom'] = listings['Listings - No Zoom'].astype(int)
listings['Listings - Zoom'] = listings['Listings - Zoom'].astype(int)

In [88]:
# add in a mean column
listings['Mean Listings'] = listings[['Listings - No Zoom', 'Listings - Zoom']].mean(axis=1)

In [89]:
listings.head()

| | Town | Listings - No Zoom | Listings - Zoom | Mean Listings |
|---|---|---|---|---|
| 0 | East Missoula | 281 | 180 | 230.5 |
| 1 | Conrad | 28 | 0 | 14.0 |
| 2 | Red Lodge | 146 | 115 | 130.5 |
| 3 | Pablo | 45 | 24 | 34.5 |
| 4 | Colstrip | 1 | 1 | 1.0 |


In [91]:
# save

listings.to_csv('listings_by_town.csv')

## Appendix: Code to Scrape Airbnb for Listing Features

The code in this Appendix goes through every page of search results for all small Montana towns and returns a set of features for each listing: header, number of guests, number of beds, number of baths, etc. 

This level of information is not needed for the correlation analysis, but is included for potential future analysis.

### Part 1: Test Code

In [10]:
url = 'https://www.airbnb.com/'

In [11]:
# go to the Airbnb homepage 

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)




Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome
Driver [/Users/austinsmith/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


In [13]:
# identify the search box
search_box = driver.find_element(By.CLASS_NAME, "_1xq16jy")

In [14]:
# Type the search term into the search box
search_box.send_keys('Conrad, MT')

#Submit the text
search_box.send_keys(Keys.RETURN)

In [15]:
# Identify the "I'm flexible" button
flexible_button = driver.find_element(By.XPATH, '//*[@id="tab--tabs--1"]')

In [16]:
# click on it
flexible_button.click()

In [17]:
# then identify the "search" button
search_button = driver.find_element(By.XPATH, '//*[@id="search-tabpanel"]/div/div[5]/div[2]/button/span[1]/span')

In [18]:
# and click on it
search_button.click()

In [19]:
# REMINDER: add in some time for the page to load before pulling listings

# parse the html on the page and grab the listings
html = driver.page_source
html_soup = BeautifulSoup(html, 'html.parser')

listings = html_soup.find_all('div', class_ = 'cuj8fzj ln13ysw dir dir-ltr')


In [20]:
# check point - should be 20

len(listings)

20

In [21]:
# figure out how many pages to search
    
# number of listings
num_listings = html_soup.find_all('h1', class_ = '_78tyg5')
num_listings_string = []
for x in num_listings:
    num_listings_string.append(str(x))
raw_listings = int(num_listings_string[0].split(">")[1].split(" ")[0])

# divide by 20 and round up since there are 20 results per page
num_pages = math.ceil(raw_listings /20)

In [22]:
# check point - should be 2
num_pages

2

In [23]:
url_list = []

listings_per_page=20

current_url = driver.current_url

for i in range(num_pages) :
    offset = listings_per_page * i
    url_pagination = current_url + f'&items_offset={offset}'
    url_list.append(url_pagination)
    

In [24]:
url_list

['https://www.airbnb.com/s/Conrad--MT/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&flexible_trip_dates%5B%5D=april&flexible_trip_dates%5B%5D=march&flexible_trip_lengths%5B%5D=weekend_trip&date_picker_type=flexible_dates&source=structured_search_input_header&search_type=filter_change&items_offset=0',
 'https://www.airbnb.com/s/Conrad--MT/homes?tab_id=home_tab&refinement_paths%5B%5D=%2Fhomes&flexible_trip_dates%5B%5D=april&flexible_trip_dates%5B%5D=march&flexible_trip_lengths%5B%5D=weekend_trip&date_picker_type=flexible_dates&source=structured_search_input_header&search_type=filter_change&items_offset=20']
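The pagination step above can be wrapped in a small helper. This is a sketch, not the code that produced the URLs shown: `build_page_urls` is a hypothetical name, and it rebuilds the query string with `urllib.parse` rather than appending `&items_offset=...`, so it also works when the search URL has no query string or already carries an offset.

```python
import math
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(search_url, total_listings, per_page=20):
    """Return one URL per results page, each with its own items_offset."""
    parts = urlsplit(search_url)
    # Keep the existing parameters, dropping any items_offset already present.
    base_query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "items_offset"
    ]
    num_pages = math.ceil(total_listings / per_page)
    urls = []
    for page in range(num_pages):
        query = urlencode(base_query + [("items_offset", page * per_page)])
        urls.append(urlunsplit(parts._replace(query=query)))
    return urls


# example.com is a placeholder address, not a real search URL
for page_url in build_page_urls("https://example.com/s/homes?tab_id=home_tab", 45):
    print(page_url)
```

For 45 listings at 20 per page this yields three URLs with offsets 0, 20, and 40.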

In [25]:
# load necessary functions for scraping

# this function extracts one element from the html, as described by an entry of the "search_page" dictionary (defined in a later cell)

def extract_elements(listing_html, params) :
     # Find the right tag
    if 'class' in params:
        elements_found = listing_html.find_all(params['tag'], params['class'])
    else:
        elements_found = listing_html.find_all(params['tag'])

    # Extract the right element
    tag_order = params.get('order', 0)
    element = elements_found[tag_order]
        
    # Get text
    if 'get' in params:
        output = element.get(params['get'])
    else:
        output = element.get_text()

    return output


# extract all of the elements with this function

def extract_page_features(soup, search_items):
    # create a dictionary to hold the features
    features_dict = {}
    
    # go through each item of the search block above and try to find it and put it in dict
    for feature in search_items :
        try:
            features_dict[feature] = extract_elements(soup, search_items[feature])
            
        # if it doesn't exist, place empty in that field
        except:
            features_dict[feature] = 'empty'
    
    return features_dict


def get_listings(search_page) :
    # put the driver ahead of running this function
    # driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    
    # add in some wait time so the page can load
    
    driver.get(search_page)
    
    time.sleep(4)
    
    # then parse the html on the page
    html = driver.page_source
    html_soup = BeautifulSoup(html, 'html.parser')
    listings = html_soup.find_all('div', class_ = 'cuj8fzj ln13ysw dir dir-ltr')
    
    # remember to close the driver after running the function
    #driver.close()
    
    return listings

def process_search_pages(url_list) :
    features_list = []
    for page in url_list:
        listings = get_listings(page)
        for listing in listings:
            features = extract_page_features(listing, search_page)
            features_list.append(features)

    return features_list

In [26]:
# items to scrape on each page
# this is accurate as of 2/21/22 ... if there are empty fields in data frame, fix these tags

search_page = {
    'name': {'tag':'meta', 'get':'content', 'order':0},
    'url': {'tag':'meta', 'get':'content', 'order':2},
    'header': {'tag':'div', 'class': 'cuu4odx c1frjvtt dir dir-ltr'},
    'guests': {'tag':'span', 'class': 'mp2hv9t dir dir-ltr', 'order':0},
    'rooms': {'tag':'span', 'class': 'mp2hv9t dir dir-ltr', 'order':1},
    'beds': {'tag':'span', 'class': 'mp2hv9t dir dir-ltr', 'order':2},
    'baths': {'tag':'span', 'class': 'mp2hv9t dir dir-ltr', 'order':3},
    'price': {'tag':'span', 'class':'a8jt5op dir dir-ltr'},
    'rating': {'tag':'span', 'class':'rpz7y38 dir dir-ltr'},
    'n_reviews': {'tag':'span', 'class': 'r1xr6rtg dir dir-ltr'},
    'superhost': {'tag':'div', 'class': 't1oq1m17 dir dir-ltr'} 
}

In [27]:
# grab the information for each listing in each URL
Conrad = process_search_pages(url_list)

In [30]:
# check results
Conrad[1]

{'name': 'Blackleaf Creek Ranch Guesthouse',
 'url': 'www.airbnb.com/rooms/38430730?adults=1&children=0&infants=0&check_in=2022-04-22&check_out=2022-04-24&previous_page_section_name=1000',
 'header': 'Entire cabin in Bynum',
 'guests': '4 guests',
 'rooms': '1 bedroom',
 'beds': '2 beds',
 'baths': '1 bath',
 'price': '$125 per night',
 'rating': '4.88',
 'n_reviews': '\xa0(24 reviews)',
 'superhost': 'SUPERHOST'}
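Every feature comes back as text (`'4 guests'`, `'$125 per night'`, `'\xa0(24 reviews)'`), so any later analysis would need numeric values. The sketch below shows one possible cleaning step. The names `first_number` and `clean_listing` are hypothetical, and the sample record repeats the output shown above with the URL omitted.

```python
import re


def first_number(text):
    """Return the first number in `text` as int or float, or None if absent."""
    if text is None or text == "empty":
        return None
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        return None
    value = match.group(0)
    return float(value) if "." in value else int(value)


def clean_listing(raw):
    """Copy a scraped listing dict, converting count-like fields to numbers."""
    numeric_fields = ("guests", "rooms", "beds", "baths", "price", "rating", "n_reviews")
    cleaned = dict(raw)
    for field in numeric_fields:
        cleaned[field] = first_number(raw.get(field))
    cleaned["superhost"] = raw.get("superhost") == "SUPERHOST"
    return cleaned


sample = {
    "name": "Blackleaf Creek Ranch Guesthouse",
    "header": "Entire cabin in Bynum",
    "guests": "4 guests",
    "rooms": "1 bedroom",
    "beds": "2 beds",
    "baths": "1 bath",
    "price": "$125 per night",
    "rating": "4.88",
    "n_reviews": "\xa0(24 reviews)",
    "superhost": "SUPERHOST",
}
print(clean_listing(sample))
```

Fields with no digits, such as a `'Studio'` room description or the `'empty'` placeholder, become `None` here. Whether a studio should count as zero bedrooms is a choice left to the analysis.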

In [31]:
driver.close()

In [32]:
Conrad_listings = pd.DataFrame(Conrad)

In [33]:
# check 
Conrad_listings.head()

| | name | url | header | guests | rooms | beds | baths | price | rating | n_reviews | superhost |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Small Town Charm/Lake Access/Glacier Park | www.airbnb.com/rooms/43789823?adults=1&childre... | Entire residential home in Valier | 6 guests | 3 bedrooms | 4 beds | 2 baths | $150 per night | 4.94 | (31 reviews) | SUPERHOST |
| 1 | Blackleaf Creek Ranch Guesthouse | www.airbnb.com/rooms/38430730?adults=1&childre... | Entire cabin in Bynum | 4 guests | 1 bedroom | 2 beds | 1 bath | $125 per night | 4.88 | (24 reviews) | SUPERHOST |
| 2 | NEW! Remodeled Choteau Cottage: Ski & Fish Nea... | www.airbnb.com/rooms/46246886?adults=1&childre... | Entire cottage in Choteau | 5 guests | 2 bedrooms | 4 beds | 1 bath | $101 per night | 4.93 | (30 reviews) | SUPERHOST |
| 3 | Cut Bank Studio #4 near Glacier National Park! | www.airbnb.com/rooms/19982124?adults=1&childre... | Entire rental unit in Cut Bank | 3 guests | Studio | 2 beds | 1 bath | $89 per night | 4.90 | (167 reviews) | SUPERHOST |
| 4 | Tranquil 4-bedroom Glacier View Getaway | www.airbnb.com/rooms/532553660666635523?adults... | Farm stay in Cut Bank | 9 guests | 4 bedrooms | 5 beds | 3 baths | $250 per night | empty | empty | empty |


In [None]:
# Last step would be saving to csv

### Part 2: Programmatic Scrape

Here, I'll try to loop through two towns and scrape information as I did for one town above. Then I will expand to scrape through all 274 towns. 

In [26]:
towns = small_towns['City'].tolist()
len(towns)

274

In [27]:
# tested with a town subset first
town_subset = towns[6:8]
town_subset

['Stevensville', 'West Glendive']

In [28]:
# default dictionary to hold the results by city
city_listings = defaultdict(dict)
test = defaultdict(dict)

In [29]:
# testing out on two towns first ... 

for town in town_subset :
    # open a fresh driver for this town and go to Airbnb's homepage
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.get(url)
    time.sleep(4)
    
    # ID the search field and input the town into it
    try :
        search_box = driver.find_element(By.CLASS_NAME, "_1xq16jy")
        search_box.send_keys(f'{town}, MT')
        search_box.send_keys(Keys.RETURN)
        time.sleep(2)
    except:
        print(f"couldn't find search field for {town}")
    
     # ID the "I'm Flexible" button and click it
    try :
        flexible_button = driver.find_element(By.CLASS_NAME, "_9qlt59")
        flexible_button.click()
        time.sleep(2)
    except:
        print(f"couldn't find flexible button for {town}")
    
     # then click search and provide some time to load the page
    try:
        search_button = driver.find_element(By.XPATH, '//*[@id="search-tabpanel"]/div/div[5]/div[2]/button/span[1]/span')
        search_button.click()
        time.sleep(3)
    except:
        print(f"couldn't find search button for {town}")
    
    # figure out how many pages to search
    html = driver.page_source
    html_soup = BeautifulSoup(html, 'html.parser')
    
    # number of listings
    num_listings = html_soup.find_all('h1', class_ = '_78tyg5')
    num_listings_string = []
    for x in num_listings:
        num_listings_string.append(str(x))
    try :
        raw_listings = int(num_listings_string[0].split(">")[1].split(" ")[0])
    except :
        print(f"there's something weird with pages in {town}")
    
    # clear out the list for next round
    num_listings_string.clear()
    
    # divide by 20 and round up since there are 20 results per page
    num_pages = math.ceil(raw_listings /20)
    
    # grab the urls to search for each page of results
    url_list = []
    listings_per_page=20
    
    current_url = driver.current_url

    for i in range(num_pages) :
        offset = listings_per_page * i
        url_pagination = current_url + f'&items_offset={offset}'
        url_list.append(url_pagination)
    
    # scrape each page and add the results to the default dictionary
    if len(url_list) == num_pages :
        test[town] = process_search_pages(url_list)
    else :
        print("Didn't get the correct # of pages.")
    
    # clear out URL list
    url_list.clear()
    
    # then quit that webpage before starting on the next town
    driver.quit()




Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome
Driver [/Users/austinsmith/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


Current google-chrome version is 98.0.4758
Get LATEST chromedriver version for 98.0.4758 google-chrome
Driver [/Users/austinsmith/.wdm/drivers/chromedriver/mac64/98.0.4758.102/chromedriver] found in cache


In [30]:
# check point - should be 133
len(test['Stevensville'])

133

In [31]:
# check point - should be 10
len(test['West Glendive'])

10

In [None]:
# Expand to all towns 
# not run because this takes forever

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

for town in towns :
    # open the driver and go to Airbnb's homepage ahead of this loop
    driver.get(url)
    time.sleep(4)
    
    # ID the search field and input the town into it
    try :
        search_box = driver.find_element(By.CLASS_NAME, "_1xq16jy")
        search_box.send_keys(f'{town}, MT')
        search_box.send_keys(Keys.RETURN)
        time.sleep(2)
    
    except :
        print(f"couldn't find the search box for {town}")
    
     # ID the "I'm Flexible" button and click it
    try :
        flexible_button = driver.find_element(By.CLASS_NAME, "_9qlt59")
        flexible_button.click()
        time.sleep(2)
    
    except :
        print(f"couldn't find the flexible button for {town}")
    
     # then click search and provide some time to load the page
    try :
        search_button = driver.find_element(By.XPATH, '//*[@id="search-tabpanel"]/div/div[5]/div[2]/button/span[1]/span')
        search_button.click()
        time.sleep(3)
    
    except :
        print(f"couldn't find the search button for {town}")
    
    # figure out how many pages to search
    html = driver.page_source
    html_soup = BeautifulSoup(html, 'html.parser')
    
    # number of listings
    num_listings = html_soup.find_all('h1', class_ = '_78tyg5')
    num_listings_string = []
    for x in num_listings:
        num_listings_string.append(str(x))
    
    # the code below accounts for the possibility that there may be "300+" listings
    try :
        raw_listings = (num_listings_string[0].split(">")[1].split(" ")[0])
    except :
        print(f"couldn't get the page #s for {town}")

    if raw_listings == '300+' :
        num_listings = int(raw_listings[:-1])
    else :
        num_listings = int(raw_listings)

    # clear out the list for next round
    num_listings_string.clear()
    
    # divide by 20 and round up since there are 20 results per page
    num_pages = math.ceil(num_listings /20)
    
    # grab the urls to search for each page of results
    url_list = []
    listings_per_page=20
    
    current_url = driver.current_url

    for i in range(num_pages) :
        offset = listings_per_page * i
        url_pagination = current_url + f'&items_offset={offset}'
        url_list.append(url_pagination)
    
    # scrape each page and add the results to the default dictionary
    if len(url_list) == num_pages :
        city_listings[town] = process_search_pages(url_list)
    else :
        print("Didn't get the correct # of pages.")
    
    # clear out URL list
    url_list.clear()
    
    # keep the shared driver open for the next town; it is closed in the next cell

In [92]:
driver.close()