# Housing price prediction in Bogota, Colombia
[Author: Elias Buitrago Bolivar](https://github.com/ebuitrago?tab=repositories)

This Jupyter notebook presents a Python-based web scraping algorithm that collects real estate data from the portal fincaraiz.com.co. The code is functional and was tested by scraping listings of used apartments for sale in Bogota, Colombia; the example cells below query the Medellín listings page. Running the notebook locally in Anaconda is recommended, as there is a bug when running it on hosted platforms such as Google Colab.

_Updated: February 25th, 2023_


## Experimental design

### Web Scraping real estate data
This section explains the web scraping process applied to the fincaraiz.com.co web page.

#### Import required libraries

In [9]:
from bs4 import BeautifulSoup as bs
from selenium import webdriver
import pandas as pd

In [10]:
#Function to get 'href' from each article item
def gethref(soup):

    links = []

    for article in soup.find_all(attrs={"class": "listingCard"}):
        
        url = article.find('a', href=True)

        if url:
            link = url['href']
            links.append(link)

    print("Href obtained: ", len(links))

    return links

In [11]:
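def varsfromscrap(soup, cols):
    """Map the technical-sheet cells to the values of `cols`, in order."""

    aux_dict = {}

    # Cells are read in groups of three: label, (unused), value.
    # Stopping at len(soup) - 2 avoids an IndexError on a trailing partial group.
    for i in range(0, len(soup) - 2, 3):

        # "¡Pregúntale!" ("Ask!") is the placeholder shown for a missing value
        if soup[i + 2].text == "¡Pregúntale!":
            val = "NaN"
        else:
            val = soup[i + 2].text

        # Drop the first two characters of the label text
        aux_dict[soup[i].text[2:]] = val

    # Features absent from the page default to "NaN"
    features = [aux_dict.get(col, "NaN") for col in cols]

    return features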
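
To make the grouping logic in `varsfromscrap` easier to follow, here is a minimal, offline sketch of the same idea. `FakeTag` and `pairs_from_triples` are hypothetical names introduced only for illustration, and the `"• "` label prefix is an assumption about what the two stripped characters are; the real page markup may differ.

```python
from dataclasses import dataclass


@dataclass
class FakeTag:
    """Hypothetical stand-in for a BeautifulSoup tag; only `.text` is used."""
    text: str


def pairs_from_triples(tags, cols, placeholder="¡Pregúntale!"):
    """Read (label, unused, value) triples and return values ordered by `cols`."""
    found = {}
    for i in range(0, len(tags) - 2, 3):
        value = tags[i + 2].text
        # Strip the assumed two-character prefix from the label
        found[tags[i].text[2:]] = "NaN" if value == placeholder else value
    # Columns that never appeared on the page fall back to "NaN"
    return [found.get(col, "NaN") for col in cols]


tags = [
    FakeTag("• Estrato"), FakeTag(""), FakeTag("6"),
    FakeTag("• Estado"), FakeTag(""), FakeTag("¡Pregúntale!"),
]
row = pairs_from_triples(tags, ["Estrato", "Estado", "Baños"])
print(row)
```

Here `Estado` carries the placeholder text and `Baños` is missing altogether, so both end up as `"NaN"` in the resulting row.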

In [12]:
# Version 1.0
def housing_features(soup, cols):

    #Obtain the html section that stores the main housing variables
    s = soup.find('div',{'class':'technical-sheet'}).find_all(class_='ant-col')

    #Extract the features listed in cols (the last position is filled in below)
    features = varsfromscrap(s, cols)

    #Overwrite the last position with the price
    price = soup.find('span',{'class':'price'}).text
    features[-1] = price

    return features

In [13]:
#Function to call housing_features routine on each href
def scrapper(id_inmueble, cols):

    #Initialize and execute Selenium (headless Chrome)
    op = webdriver.ChromeOptions()
    op.add_argument('headless')
    browser = webdriver.Chrome(options=op)

    try:
        url_inm = 'https://www.fincaraiz.com.co' + id_inmueble + '/'
        print(url_inm)
        browser.get(url_inm)
        #Note: implicitly_wait only sets the timeout for element lookups;
        #it does not pause before page_source is read
        browser.implicitly_wait(10)
        html = browser.page_source
    finally:
        #quit() closes every window and ends the driver session, even on error
        browser.quit()

    #Parse the html rendered by Selenium
    soup = bs(html,'lxml')

    #Obtain the variables in cols for this particular property
    features = housing_features(soup, cols)

    return features
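
`scrapper` builds the listing URL by plain string concatenation, which assumes every `href` is a site-relative path without a trailing slash. A slightly more defensive variant, sketched below with the standard library only, also tolerates absolute links and existing trailing slashes. `listing_url` and `BASE_URL` are hypothetical names, not part of the notebook.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.fincaraiz.com.co"


def listing_url(href, base=BASE_URL):
    """Return an absolute listing URL that ends with exactly one slash."""
    absolute = urljoin(base + "/", href)  # absolute hrefs are returned unchanged
    return absolute.rstrip("/") + "/"


example = listing_url("/apartamento-en-venta/191702331")
print(example)
```

Whether the site ever returns absolute or slash-terminated links is not established here; the helper simply makes the code indifferent to it.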

## Selenium

In [14]:
#Selenium+BS
pag = 1
url = f'https://www.fincaraiz.com.co/venta/casas-y-apartamentos-y-apartaestudios/medellin/antioquia/usados?pagina={pag}'

op = webdriver.ChromeOptions()
op.add_argument('headless')
browser = webdriver.Chrome(options=op)

print(url)
browser.get(url)
browser.implicitly_wait(10)
html = browser.page_source
browser.quit()  #Release the headless browser once the html is captured
soup = bs(html,'lxml')

https://www.fincaraiz.com.co/venta/casas-y-apartamentos-y-apartaestudios/medellin/antioquia/usados?pagina=1
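
The cell above fetches a single results page (`pagina=1`). If more pages are wanted, the same URL pattern can be generated for a range of page numbers, as in the sketch below. `page_urls` is a hypothetical helper, and how many result pages the site actually serves is an assumption to verify before looping.

```python
BASE = ("https://www.fincaraiz.com.co/venta/"
        "casas-y-apartamentos-y-apartaestudios/medellin/antioquia/usados")


def page_urls(first, last, base=BASE):
    """Build one results-page URL per page number in [first, last]."""
    return [f"{base}?pagina={pag}" for pag in range(first, last + 1)]


urls = page_urls(1, 3)
for u in urls:
    print(u)
```

Each generated URL could then be passed through the same Selenium and `gethref` steps used for the first page.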


In [15]:
#Get href
links = gethref(soup)

Href obtained:  21


In [16]:
#Scraping
cols = ["Estrato", "Tipo de Inmueble", "Estado", "Baños", "Área Construida", "Área Privada", "Antigüedad", "Habitaciones",
        "Parqueaderos", "Administración", "Precio"]

data = pd.DataFrame(columns=cols)

p = []
#Scrape each of the filtered properties
for i, link in enumerate(links):

    print('Scrapping', i, '/', len(links), '...')
    p.append(scrapper(link, cols))
    print(p[i])

    #append list to DataFrame
    data.loc[len(data)] = p[i]

Scrapping 0 / 21 ...
https://www.fincaraiz.com.co/apartamento-en-venta/191702331/
['6', 'Apartamento', 'NaN', '5', '234.00  m2', '234.00  m2', '16 a 30 años', '3', '3', '1,110,000.00', '$ 2.129.000.000']
Scrapping 1 / 21 ...
https://www.fincaraiz.com.co/apartamento-en-venta/191466600/
['5', 'Apartamento', 'NaN', '4', '166.00  m2', '166.00  m2', '9 a 15 años', '3', '1', 'NaN', '$ 850.000.000']
Scrapping 2 / 21 ...
https://www.fincaraiz.com.co/apartamento-en-venta/191266208/
['4', 'Apartamento', 'NaN', '2', '60.32  m2', '60.32  m2', '1 a 8 años', '3', '1', '250,000.00', '$ 339.000.000']
Scrapping 3 / 21 ...
https://www.fincaraiz.com.co/apartamento-en-venta/191415776/
['5', 'Apartamento', 'NaN', '2', '83.00  m2', '83.00  m2', 'NaN', '2', 'NaN', '660,000.00', '$ 540.000.000']
Scrapping 4 / 21 ...
https://www.fincaraiz.com.co/apartamento-en-venta/191435403/
['5', 'Apartamento', 'NaN', '3', '166.00  m2', '166.00  m2', 'más de 30 años', '3', '2', '1,075,000.00', '$ 990.000.000']
Scrapping 5 /

In [17]:
display(data)

| | Estrato | Tipo de Inmueble | Estado | Baños | Área Construida | Área Privada | Antigüedad | Habitaciones | Parqueaderos | Administración | Precio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | Apartamento | | 5.0 | 234.00 m2 | 234.00 m2 | 16 a 30 años | 3 | 3.0 | 1110000.0 | $ 2.129.000.000 |
| 1 | 5 | Apartamento | | 4.0 | 166.00 m2 | 166.00 m2 | 9 a 15 años | 3 | 1.0 | | $ 850.000.000 |
| 2 | 4 | Apartamento | | 2.0 | 60.32 m2 | 60.32 m2 | 1 a 8 años | 3 | 1.0 | 250000.0 | $ 339.000.000 |
| 3 | 5 | Apartamento | | 2.0 | 83.00 m2 | 83.00 m2 | | 2 | | 660000.0 | $ 540.000.000 |
| 4 | 5 | Apartamento | | 3.0 | 166.00 m2 | 166.00 m2 | más de 30 años | 3 | 2.0 | 1075000.0 | $ 990.000.000 |
| 5 | 6 | Apartamento | | 3.0 | 132.00 m2 | 132.00 m2 | 9 a 15 años | 3 | 2.0 | 912300.0 | $ 1.300.000.000 |
| 6 | 5 | Apartamento | | 3.0 | 120.00 m2 | 120.00 m2 | 1 a 8 años | 3 | 1.0 | | $ 900.000.000 |
| 7 | 3 | Apartamento | | 2.0 | 60.00 m2 | 60.00 m2 | más de 30 años | 3 | 1.0 | 127000.0 | $ 350.000.000 |
| 8 | 5 | Apartamento | Buen estado | | 82.00 m2 | 82.00 m2 | más de 30 años | 3 | 1.0 | 356000.0 | $ 438.000.000 |
| 9 | 4 | Casa | | 6.0 | 343.00 m2 | 261.00 m2 | menor a 1 año | 6 | 2.0 | | $ 1.500.000.000 |
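
The `Precio` and area columns are still raw strings such as `$ 2.129.000.000` and `234.00  m2`, so they need to be converted to numbers before any price modelling. The sketch below shows one possible conversion; `parse_price` and `parse_area` are hypothetical helpers, and it assumes that dots in prices are thousands separators while dots in areas are decimal points, as the sample rows suggest.

```python
def parse_price(text):
    """Convert a price string to an integer number of pesos, or None if no digits."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def parse_area(text):
    """Convert an area string like '60.32  m2' to float square metres, or None."""
    parts = text.split()
    if not parts or parts[0] == "NaN":
        return None
    return float(parts[0])


print(parse_price("$ 2.129.000.000"))
print(parse_area("60.32  m2"))
```

With pandas, these could be applied column-wise, for example `data["Precio"].map(parse_price)`, although the notebook itself does not show this step.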


#### Initialize and execute Selenium + BeautifulSoup

In [18]:
# data.to_csv('used_properties_medellin_2024.csv', encoding='utf-8', index=False)