# Web Scraping Housing Data From Facebook Marketplace

In this project, I set out to collect all the rental listings available on Facebook Marketplace for Kelowna. Facebook Marketplace is one of the most popular platforms for renting and buying, which is why I chose to scrape it. The resulting data offers insight into Kelowna's rental market and can support real estate analysis, property management, and rental price prediction.

## Importing libraries

The web scraping was done using the Python library Selenium, along with other relevant libraries.

In [9]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from time import sleep
import requests
import warnings
warnings.filterwarnings('ignore')
import random
import pandas as pd
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

## Declaring XPaths

To access the various elements on the Facebook Marketplace page, I used XPath, a language for navigating XML and HTML documents and selecting elements from the document tree. In this project, XPath expressions select specific elements from the page so that the relevant data can be extracted. The complete list of XPaths used to access the different elements is provided below for reference:

In [7]:
open_xpath = '//div[@ class="x3ct3a4"]'
#Xpath for opening a listing on the web page

name_xpath = '//span[@class="x193iq5w xeuugli x13faqbe x1vvkbs xlh3980 xvmahel x1n0sxbx x1lliihq x1s928wv xhkezso x1gmr53x x1cpjm7i x1fgarty x1943h6x xtoi2st x41vudc xngnso2 x1qb5hxa x1xlr1w8 xzsf02u"]'
#Xpath for the name of the listings, for instance "1 Bed 1 Bath House"

price_xpath = '//div[@class="x1anpbxc"]'
#Xpath for the price of the listings, for example "$1,100/Month"

unit_details_xpath = '//span[@ class="x193iq5w xeuugli x13faqbe x1vvkbs xlh3980 xvmahel x1n0sxbx x1lliihq x1s928wv xhkezso x1gmr53x x1cpjm7i x1fgarty x1943h6x x4zkp8e x3x7a5m x6prxxf xvq8zen xo1l8bm xzsf02u x1yc453h"]'
#Xpath for the unit details of each listing
#Typically includes information on the area, number of beds and baths, and other details

see_more_xpath = '//span[@class="x193iq5w xeuugli x13faqbe x1vvkbs xlh3980 xvmahel x1n0sxbx x6prxxf xvq8zen x1s688f xzsf02u"]'
#Xpath for the "See More" button, which provides the full description of the listing when clicked

description_xpath = '//div[@class="xz9dl7a x4uap5 xsag5q8 xkhd6sd x126k92a"]'
#Xpath for the description of each listing

score_xpath = '//span[@class="x193iq5w xeuugli x13faqbe x1vvkbs xlh3980 xvmahel x1n0sxbx x1lliihq x1s928wv xhkezso x1gmr53x x1cpjm7i x1fgarty x1943h6x x4zkp8e x3x7a5m x6prxxf xvq8zen xo1l8bm xi81zsa x1yc453h"]'
#Xpath for the various scores provided by Walk Score

adress_xpath = '//span[@class="x193iq5w xeuugli x13faqbe x1vvkbs xlh3980 xvmahel x1n0sxbx x1lliihq x1s928wv xhkezso x1gmr53x x1cpjm7i x1fgarty x1943h6x x4zkp8e x676frb x1nxh6w3 x1sibtaa xo1l8bm xi81zsa x1yc453h"]'
#Xpath for the address of the listing

close_xpath = '//div[@class="x1i10hfl x6umtig x1b1mbwd xaqea5y xav7gou x1ypdohk xe8uvvx xdj266r x11i5rnm xat24cr x1mh8g0r x16tdsg8 x1hl2dhg xggy1nq x87ps6o x1lku1pv x1a2a7pz x6s0dn4 x14yjl9h xudhj91 x18nykt9 xww2gxu x972fbf xcfux6l x1qhh985 xm0m39n x9f619 x78zum5 xl56j7k xexx8yu x4uap5 x18d9i69 xkhd6sd x1n2onr6 x1vqgdyp x100vrsf x18l40ae x14ctfv"]'
#Xpath for the "Close" button, which is used to close the listing after all the information has been extracted
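
To make the role of the `@class` predicate concrete, here is a small, browser-free sketch. It uses Python's built-in `xml.etree.ElementTree`, which supports a limited subset of XPath, on a hypothetical and heavily simplified snippet of markup (the real Marketplace page is far more complex). It shows that `[@class='...']` matches only when the whole attribute value is identical, which is why the XPaths above spell out complete class strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a listing page
SAMPLE = """
<html><body>
  <div class="x1anpbxc"><span>$1,100 / Month</span></div>
  <div class="x1anpbxc extra"><span>$2,000 / Month</span></div>
  <div class="other"><span>Not a price</span></div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Select every <div> whose class attribute is exactly "x1anpbxc"
matches = root.findall(".//div[@class='x1anpbxc']")

# Collect the visible text of each matching element
prices = ["".join(div.itertext()).strip() for div in matches]
print(prices)
```

Only the first `<div>` is selected: the second one carries an additional class, so its attribute value no longer equals the string in the predicate.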

## Opening the main page

The code below opens the Facebook Marketplace rental listings page for Kelowna, which is the starting point for the scraping process. A 65 km radius around Kelowna was chosen as the scope for this project, as it was judged wide enough to capture the rental listings in the area.

In [None]:
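from selenium.webdriver.chrome.service import Service
# Selenium 4 takes the driver path via Service (executable_path was removed in 4.10)
driver = webdriver.Chrome(service=Service('/Users/allienn/chromedriver_mac_arm64/chromedriver'))
driver.get('https://www.facebook.com/marketplace/111949595490847/propertyrentals/?exact=false')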

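## Setting up the request session

A `requests` session is configured to retry failed connections up to three times with an increasing delay between attempts, and is then used to check that the Marketplace page responds.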

In [55]:
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
session.get('https://www.facebook.com/marketplace/111949595490847/propertyrentals/?exact=false')


<Response [200]>

## Start of the Web Scraping

A dictionary named "dict" is created to store all the data scraped from the page. Note that this name shadows Python's built-in `dict` type for the rest of the notebook.

In [54]:
dict = {      'name':'',
              'price':'',
              'adress':'',
              'unit_details':'',
              'description':'',
              'score':'',
              'URL':''}

In [None]:
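# Initializing lists of lists to store the scraped data, one inner list per listing
# (each starts with one empty inner list, which becomes an empty first row in the DataFrame)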
namell = [[]]
pricell = [[]]
addressll = [[]]
unit_detailsll = [[]]
descriptionll = [[]]
scorell = [[]]
urlll = [[]]
counter = 0

# Finding all elements with the XPATH specified in open_xpath
open = driver.find_elements(By.XPATH, open_xpath)

# Looping until scrolling stops producing new listings
while True:

    # Keep only the listings loaded since the last pass, skipping those already read
    open = open[counter:]
    if not open:
        break
    for o in open:
        # Initializing lists for each iteration to store data for one element
        namel = []
        pricel = []
        addressl = []
        unit_detailsl = []
        descriptionl = []
        scorel = []
        urll = []

        # Clicking on the listing link
        o.click()
        # Sleeping for a random time to avoid detection
        sleep(random.randint(2, 4))

        # Finding elements with the XPATHs specified for each data type
        # If an error occurs, the value is set to an empty string
        try:
            name = driver.find_elements(By.XPATH, name_xpath)
        except:
            name = ''
        try:
            price = driver.find_elements(By.XPATH, price_xpath)
        except:
            price = ''
        try:
            unit_details = driver.find_elements(By.XPATH, unit_details_xpath)
        except:
            unit_details = ''
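        try:
            # Expand the full description when a "See more" button is present
            see_more = driver.find_element(By.XPATH, see_more_xpath)
            if see_more.text == 'See more':
                see_more.click()
        except:
            print('Could not expand the description: "See more" button missing or not clickable')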
        try:
            description = driver.find_element(By.XPATH, description_xpath)
        except:
            description = ''
        try:
            score = driver.find_elements(By.XPATH, score_xpath)
        except:
            score = ''
        try:
            adress = driver.find_element(By.XPATH, adress_xpath)
        except:
            adress = ''
        # Getting the URL of the current page
        strUrl = driver.current_url

        # Adding the data for each element to its corresponding list
        for x in name:
            namel.append(x.text)
        try:
            addressl.append(adress.text)
        except:
            addressl.append(adress)
        for x in price:
            pricel.append(x.text)
        for x in unit_details:
            unit_detailsl.append(x.text)
        try:
            descriptionl.append(description.text)
        except:
            descriptionl.append(description)
        for s in score:
            scorel.append(s.text)
        urll.append(strUrl)

        # Updating the double lists with the lists for each data type
        namell.append(namel)
        pricell.append(pricel)
        addressll.append(addressl)
        unit_detailsll.append(unit_detailsl)
        descriptionll.append(descriptionl)
        scorell.append(scorel)
        urlll.append(urll)

        # Updating the dictionary with the list for each data type
        dict.update({'name':namell})
        dict.update({'adress':addressll})
        dict.update({'price':pricell})
        dict.update({'unit_details':unit_detailsll})
        dict.update({'description':descriptionll})
        dict.update({'score':scorell})
        dict.update({'URL':urlll})

        # Increasing counter by 1
        counter += 1

        # Closing the listing; if the first attempt fails, wait briefly and retry once
        try:
            close_button = driver.find_element(By.XPATH, close_xpath)
            close_button.click()
            sleep(random.randint(2, 4))
        except:
            print('Could not click the close button, retrying')
            sleep(random.randint(1, 2))
            close_button = driver.find_element(By.XPATH, close_xpath)
            close_button.click()
    # Scrolling the page down
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Sleeping for a random time to avoid detection
    sleep(random.randint(10, 15))

    # Finding new elements with the XPATH specified in open_xpath after the scroll
    open = driver.find_elements(By.XPATH, open_xpath)
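
The counter-and-slice pattern used in this loop can be checked without a browser. The sketch below is a simplified simulation, not the scraper itself: `FakePage` is a hypothetical stand-in for a lazily loading page that reveals a few more listings on every scroll, and `scrape_all` is an illustrative name. The sketch stops once a scroll produces nothing new.

```python
class FakePage:
    """Hypothetical stand-in for a lazily loading page (not a Selenium object)."""

    def __init__(self, total, batch):
        self.total = total      # listings that exist in total
        self.batch = batch      # listings revealed per scroll
        self.loaded = batch     # listings visible right now

    def find_elements(self):
        # Like driver.find_elements, this returns every element currently on the page
        return [f"listing-{i}" for i in range(min(self.loaded, self.total))]

    def scroll(self):
        self.loaded += self.batch


def scrape_all(page):
    seen = []
    counter = 0
    elements = page.find_elements()
    while True:
        # Skip what was already processed in earlier passes
        new_elements = elements[counter:]
        if not new_elements:
            break  # the last scroll produced nothing new, so stop
        for element in new_elements:
            seen.append(element)
            counter += 1
        page.scroll()
        elements = page.find_elements()
    return seen


result = scrape_all(FakePage(total=7, batch=3))
print(len(result), len(set(result)))
```

Because each call to `find_elements` returns the full list again, slicing from `counter` is what prevents the same listing from being read twice.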

## Checking the dataframe and saving it

In [59]:
df = pd.DataFrame(dict)
df.describe()

| | name | price | adress | unit_details | description | score | URL |
|---|---|---|---|---|---|---|---|
| count | 1350 | 1350 | 1350 | 1350 | 1350 | 1350 | 1350 |
| unique | 380 | 208 | 499 | 1319 | 1339 | 180 | 1350 |
| top | [, Private Room For Rent] | [$2,000 / Month] | [Location is approximate] | [Condition, Used - like new, Kelowna, BC, Join... | [Bright and spacious 2 bed 1 bath walk-out bas... | [] | [] |
| freq | 107 | 58 | 563 | 6 | 2 | 1156 | 1 |


In [64]:
df.to_csv('../raw/kelowna_housing_data.csv')
df.to_excel('../raw/kelowna_housing_data.xlsx')
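
Since the `price` column holds strings such as `$2,000 / Month`, a possible next step before any price analysis is to convert them to numbers. The helper below is a minimal sketch and not part of the original notebook: `parse_price` is a hypothetical name, and it assumes prices are written with a dollar sign and optional thousands separators.

```python
import re

# Hypothetical helper: pulls the first dollar amount out of a scraped price string
PRICE_PATTERN = re.compile(r"\$\s*(\d[\d,]*)")


def parse_price(text):
    """Return the price as an int, or None when no dollar amount is found."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting
    return int(match.group(1).replace(",", ""))


print(parse_price("$2,000 / Month"))
print(parse_price("Location is approximate"))
```

Returning `None` for strings without a dollar amount keeps unparseable entries visible, so they can be inspected rather than silently dropped.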