## Web Scraper - IMFD Journalists' Workshop (Taller Periodistas) 2019
We will collect data from real-estate listing sources.

#### Sources:
1. www.portalinmobiliario.com
2. http://www.propiedades.emol.com
3. www.zoominmobiliario.com

The goal is to collect real-estate listings aimed at the Chilean middle class (socioeconomic groups C1b, C2 and C3).

#### Middle-class assumptions:
- Monthly income between 900,000 and 2,000,000 CLP.
- Mortgage term of 25 to 40 years.
- The average Chilean spends 40% to 60% of their salary on housing (360,000 to 1,200,000 CLP).
- Purchase price between 2,000 and 3,000 UF.
- Rent between 360,000 and 1,200,000 CLP (UF 13.0 to 44.0).
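The bounds above are simple arithmetic. A minimal sketch that reproduces the housing-budget range and converts it to UF; `housing_budget` and `clp_to_uf` are hypothetical helpers, and the UF value of 27,500 CLP is an illustrative assumption (the real UF value in CLP changes daily).

```python
# Hypothetical helpers illustrating the middle-class assumptions above.


def housing_budget(income_clp: int, share_pct: int) -> int:
    """Monthly CLP spent on housing for a given income and share (in %)."""
    return income_clp * share_pct // 100


def clp_to_uf(amount_clp: float, uf_in_clp: float) -> float:
    """Convert CLP to UF, rounded to one decimal."""
    return round(amount_clp / uf_in_clp, 1)


# Lowest income at the lowest share, highest income at the highest share
low = housing_budget(900_000, 40)
high = housing_budget(2_000_000, 60)
print(low, high)  # 360000 1200000

# Illustrative UF value only (assumption): the real value is updated daily
UF_IN_CLP = 27_500.0
print(clp_to_uf(low, UF_IN_CLP), clp_to_uf(high, UF_IN_CLP))  # 13.1 43.6
```

With this illustrative rate the rent bounds come out at about UF 13.1 and 43.6, close to the UF 13.0 to 44.0 used as search filters.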

Tutorials:
1. Selenium Web Scraping: https://medium.com/the-andela-way/introduction-to-web-scraping-using-selenium-7ec377a8cf72
2. GeckoDriver: https://askubuntu.com/questions/870530/how-to-install-geckodriver-in-ubuntu
3. Selenium: https://selenium-python.readthedocs.io/installation.html

In [6]:
import sys

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.common.exceptions import TimeoutException

----
### Portal Inmobiliario

In [7]:
# Variables

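# Purchase price bounds in UF (query parameters pd and ph), Chilean format with a dot as thousands separator
price_down = '2.000'
price_high = '3.000'

# Rent bounds in UF
rent_min = '13'
rent_max = '44'

# Hard-coded number of result pages per search.
# Outer index: property type, pages[0] = apartments ('departamento'), pages[1] = houses ('casa').
# Inner index: [0] = purchase ('venta'), [1] = rent ('arriendo'); e.g. pages[0][1] is apartment rentals.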
pages = [[148, 357],[69, 61]]

tipos = ['departamento', 'casa']

urls = []
for i, t in enumerate(tipos):
    # Purchase listings
    for p in range(1, pages[i][0] + 1):
        urls.append(f"https://www.portalinmobiliario.com/venta/{t}/metropolitana?pd={price_down}&ph={price_high}&pg={p}")
    # Rental listings
    for p in range(1, pages[i][1] + 1):
        urls.append(f"https://www.portalinmobiliario.com/arriendo/{t}/metropolitana?pd={rent_min}&ph={rent_max}&pg={p}")
    
# for u in urls:
#     print(u)
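The cell above builds every search URL by string concatenation. A sketch of the same list built with `urllib.parse.urlencode`, which escapes query values for you; `search_urls`, `bounds` and `page_counts` are hypothetical names, and the page counts are the hard-coded values from the cell above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.portalinmobiliario.com"


def search_urls(operation, prop_type, low, high, n_pages):
    """One search URL per results page (pages are 1-based)."""
    return [
        f"{BASE_URL}/{operation}/{prop_type}/metropolitana?"
        + urlencode({"pd": low, "ph": high, "pg": page})
        for page in range(1, n_pages + 1)
    ]


# Same price bounds and page counts as the cell above
bounds = {"venta": ("2.000", "3.000"), "arriendo": ("13", "44")}
page_counts = {
    "departamento": {"venta": 148, "arriendo": 357},
    "casa": {"venta": 69, "arriendo": 61},
}

# Same order as above: apartments (buy, rent), then houses (buy, rent)
all_urls = [
    url
    for prop_type, counts in page_counts.items()
    for operation, n in counts.items()
    for url in search_urls(operation, prop_type, *bounds[operation], n)
]
print(len(all_urls))  # 635
print(all_urls[15])
```

Dots are never percent-encoded by `urlencode`, so `pd=2.000` reaches the site unchanged.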

In [None]:
# Get data
browser = webdriver.Firefox()
p_inmobiliario = []

for u in urls:
    browser.get(u)

    # Wait (up to 10 s) for the listing cards; skip the page if none appear
    try:
        titles_element = WebDriverWait(browser, 10).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, 'product-item-data')))
    except TimeoutException:
        print('Timeout:', u)
        continue

    # Fields per card: code, address, rooms, price(s), size(s)
    for prop in titles_element:

        data = prop.text.split('\n')
        try:
            rooms = 'n/a'

            # Clean data depending on "Proyecto" (new project) or used property
            if 'Proyecto' in data[0]:
                data = data[2:]
            else:
                rooms = data[3]
                data = data[1:3] + data[4:]

            # Address
            addr = data[0]

            # Listing code ("Código: 4852854" -> 4852854)
            code = int(data[1].split()[1])

            # Price in UF, Chilean format, keeping decimals ("UF 2.065,00" -> 2065.0, "UF 13,58" -> 13.58)
            price_min = float(data[3].replace("UF ", "").replace(".", "").replace(",", "."))

            # Projects may give a price range ("Desde:" ... "Hasta:")
            if data[4] == "Hasta:":
                price_max = float(data[5].replace("UF ", "").replace(".", "").replace(",", "."))
                values = data[7].replace(",", ".").split()
            else:
                price_max = 0.0
                values = data[5].replace(",", ".").split()

            # Size in m² ("37,21 - 67,65 m²" -> 37.21 and 67.65)
            size_min = float(values[0])
            size_max = float(values[2]) if len(values) > 2 else 0.0

            purchase = 'venta' if 'venta' in u else 'arriendo'
            elem_type = 'casa' if 'casa' in u else 'departamento'

            p_inmobiliario.append([code, addr, rooms, price_min, price_max, size_min, size_max, purchase, elem_type])

        except (IndexError, ValueError):
            # Cards without a "Valor:" or "Superficie:" line break the fixed indices
            print(u)
            print(sys.exc_info())
            print(data)
browser.quit()

https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=16
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b92a8cc48>)
['Proyecto Revelacion - Estudio - Santiago, Santiago', 'Código: 4852854', 'Valor:', 'UF 2.000,00']
https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=21
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b92a89808>)
['Departamento Barrio Yungay, Santiago, Santiago', 'Código: 5009529', 'UF 2.065,00']
https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=23
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b92a17d88>)
['SD-01 Santo Domingo - Metro Santa Ana, Santiago', 'Código: 4534753', 'Valor:', 'UF 2.099,00']
https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=28
(<class 'IndexError'>, IndexError('list index out of

https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=88
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b929bfa08>)
['Proyecto Vista Bella La Florida - 2D1B+B, La Florida', 'Código: 4935921', 'Valor:', 'UF 2.588,00']
https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=90
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b929bf748>)
['Edificio Portal Independencia 3 - 2D1B -, Independencia', 'Código: 4942644', 'Valor:', 'UF 2.600,00']
https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=93
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b929b36c8>)
['Edificio Portal Independencia 3 - 2D1B -, Independencia', 'Código: 4942644', 'Valor:', 'UF 2.600,00']
https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=94
(<class 'IndexError'>, 

https://www.portalinmobiliario.com/arriendo/departamento/metropolitana?pd=13&ph=44&pg=21
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b928fe1c8>)
['Padre Alonso De Ovalle 1621, Santiago', 'Código: 5023834', 'Valor:', 'UF 13,58']
https://www.portalinmobiliario.com/arriendo/departamento/metropolitana?pd=13&ph=44&pg=21
(<class 'ValueError'>, ValueError("could not convert string to float: 'Superficie:'"), <traceback object at 0x7f5b929044c8>)
['Antiguo Remodelado / Espectacular Terraz, Santiago', 'Código: 5027563', 'UF 13,58', 'Superficie:', '48 - 48 m²']
https://www.portalinmobiliario.com/arriendo/departamento/metropolitana?pd=13&ph=44&pg=31
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b928f6bc8>)
['Amalia Errazuriz 956 (+ Bod), Independencia', 'Código: 5022473', 'Valor:', 'UF 13,93']
https://www.portalinmobiliario.com/arriendo/departamento/metropolitana?pd=13&ph=44&pg=31
(<class 'IndexError'>, IndexErro

https://www.portalinmobiliario.com/arriendo/departamento/metropolitana?pd=13&ph=44&pg=201
(<class 'ValueError'>, ValueError("could not convert string to float: 'Superficie:'"), <traceback object at 0x7f5b9273c788>)
['COVENTRY, Ñuñoa', 'Código: 4835560', 'UF 25,04', 'Superficie:', '75 - 85 m²']
https://www.portalinmobiliario.com/arriendo/departamento/metropolitana?pd=13&ph=44&pg=202
(<class 'ValueError'>, ValueError("could not convert string to float: 'Superficie:'"), <traceback object at 0x7f5b92737f48>)
['COVENTRY, Ñuñoa', 'Código: 4835560', 'UF 25,04', 'Superficie:', '75 - 85 m²']
https://www.portalinmobiliario.com/arriendo/departamento/metropolitana?pd=13&ph=44&pg=210
(<class 'IndexError'>, IndexError('list index out of range'), <traceback object at 0x7f5b92751f48>)
['Las Hualtatas/ San Damian/ Imagina, Las Condes', 'Código: 4334500', 'Valor:', 'UF 26,47']
https://www.portalinmobiliario.com/arriendo/departamento/metropolitana?pd=13&ph=44&pg=225
(<class 'IndexError'>, IndexError('lis
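The log shows two recurring failures: cards with no "Valor:" label (so the price sits at index 2 instead of 3) and cards with no "Superficie:" block. A sketch of a label-based parser, a swapped-in approach rather than the notebook's own, that looks fields up by label instead of by fixed position; `parse_listing` and `parse_chilean_number` are hypothetical names, and the input is assumed to be the cleaned list produced by the "Proyecto"/used-property step. Missing values become `None`, and a single size is used as both minimum and maximum.

```python
import re


def parse_chilean_number(text: str) -> float:
    """'2.065,00' -> 2065.0 and '13,58' -> 13.58 (dot = thousands, comma = decimals)."""
    return float(text.replace(".", "").replace(",", "."))


def parse_listing(fields: list) -> dict:
    """Parse a cleaned card (address, code, then labelled fields) by label."""
    record = {
        "code": int(fields[1].split()[-1]),
        "address": fields[0],
        "price_min": None,
        "price_max": None,
        "size_min": None,
        "size_max": None,
    }
    rest = fields[2:]

    # Prices: every "UF ..." line; the one right after "Hasta:" is the upper bound
    for i, item in enumerate(rest):
        if item.startswith("UF "):
            value = parse_chilean_number(item[3:])
            if i > 0 and rest[i - 1] == "Hasta:":
                record["price_max"] = value
            elif record["price_min"] is None:
                record["price_min"] = value

    # Size: the first line ending in "m²", e.g. "37,21 - 67,65 m²"
    for item in rest:
        if item.endswith("m²"):
            numbers = re.findall(r"[0-9][0-9.,]*", item)
            if numbers:
                record["size_min"] = parse_chilean_number(numbers[0])
                record["size_max"] = parse_chilean_number(numbers[-1])
            break
    return record


# Rows taken from the outputs in this notebook
rows = [
    ['Departamento Barrio Yungay, Santiago, Santiago', 'Código: 5009529', 'UF 2.065,00'],
    ['Antiguo Remodelado / Espectacular Terraz, Santiago', 'Código: 5027563', 'UF 13,58', 'Superficie:', '48 - 48 m²'],
    ['Departamental 1475, La Florida', 'Código: 7364', 'Desde:', 'UF 2.100,00', 'Hasta:', 'UF 3.690,00', 'Superficie:', '37,21 - 67,65 m²'],
]
for row in rows:
    print(parse_listing(row))
```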

#### Google Maps
After collecting the data, we need to show it on a map.

In [None]:
# Read the Google Maps API key from the first line of API_key.txt, dropping the trailing newline
with open('API_key.txt') as f:
    api_key = f.readline().strip()
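The notebook loads the key but does not show how it is used. A minimal sketch, assuming the Google Geocoding API is used to turn the scraped addresses into coordinates before plotting; `geocode_url` is a hypothetical helper, `"YOUR_KEY"` is a placeholder, and no request is sent here.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address: str, api_key: str) -> str:
    """Build a Geocoding API request URL for one scraped address (biased to Chile)."""
    params = {"address": address, "region": "cl", "key": api_key}
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


print(geocode_url("Lazo 1456, San Miguel", "YOUR_KEY"))
```

The JSON response lists matches under `results`, and each match carries its coordinates in `geometry.location.lat` and `geometry.location.lng`.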

### Scratch block (trying things out)

In [148]:
# Assumes `browser` is an open webdriver session (start one with webdriver.Firefox() if needed)
browser.get('https://www.portalinmobiliario.com/venta/departamento/metropolitana?pd=2.000&ph=3.000&pg=1')
# find_elements returns a list of Selenium WebElement objects
titles_element = browser.find_elements(By.CLASS_NAME, 'product-item-data')

# print(titles_element[0].text.split('\n'))

# Separate list so the scraped p_inmobiliario is not overwritten
sample = []

# Code, Address, Price(s), Size
for prop in titles_element:
    data = prop.text.split('\n')
    rooms = 'n/a'
    # Clean data depending on "Proyecto" (new project) or used property
    if 'Proyecto' in data[0]:
        data = data[2:]
    else:
        rooms = data[3]
        data = data[1:3] + data[4:]

    print(data)
    # Get address
    addr = data[0]
    # Get code
    code = int(data[1].split()[1])
    # Get price
    price_min = float(data[3].split(',')[0].replace("UF ", "").replace(".", ''))
    # Check if there's a "Hasta:" (upper) price
    if data[4] == "Hasta:":
        price_max = float(data[5].split(',')[0].replace("UF ", "").replace(".", ''))
        values = data[7].replace(",", ".").split()
    else:
        price_max = 0.0
        values = data[5].replace(",", ".").split()
    size_min = float(values[0])
    size_max = float(values[2]) if len(values) > 2 else 0.0

    sample.append([code, addr, rooms, price_min, price_max, size_min, size_max, "sell", "dpto"])
    print(sample[-1])



['Departamental 1475, La Florida', 'Código: 7364', 'Desde:', 'UF 2.100,00', 'Hasta:', 'UF 3.690,00', 'Superficie:', '37,21 - 67,65 m²']
[7364, 'Departamental 1475, La Florida', 'n/a', 2100.0, 3690.0, 37.21, 67.65, 'sell', 'dpto']
['Lazo 1456, San Miguel', 'Código: 7401', 'Desde:', 'UF 2.797,00', 'Hasta:', 'UF 3.570,00', 'Superficie:', '55,90 - 71,70 m²']
[7401, 'Lazo 1456, San Miguel', 'n/a', 2797.0, 3570.0, 55.9, 71.7, 'sell', 'dpto']
['Vicuña Mackenna 6130, La Florida', 'Código: 7612', 'Desde:', 'UF 2.460,00', 'Hasta:', 'UF 3.500,00', 'Superficie:', '39,06 - 74,24 m²']
[7612, 'Vicuña Mackenna 6130, La Florida', 'n/a', 2460.0, 3500.0, 39.06, 74.24, 'sell', 'dpto']
['Camino del Paisaje 6546, La Florida', 'Código: 4708', 'Desde:', 'UF 2.690,00', 'Hasta:', 'UF 6.920,00', 'Superficie:', '53,08 - 129,69 m²']
[4708, 'Camino del Paisaje 6546, La Florida', 'n/a', 2690.0, 6920.0, 53.08, 129.69, 'sell', 'dpto']
['José Miguel Carrera 680, Santiago', 'Código: 6952', 'Desde:', 'UF 2.825,00', 'Supe

In [138]:
print(data[1:])

['Avenida Vicuña Mackenna 2935 - Departamento 704, San Joaquín', 'Código: 4763668', '1D/1B', 'Valor:', 'UF 2.380,00', 'Superficie:', '33 - 36 m²']


In [146]:
arr = ['a','c,','s','r','t','f','d','q']

print(arr[1:3] + arr[4:])

['c,', 's', 't', 'f', 'd', 'q']
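For used properties the slice drops the first raw line and the rooms line (the fourth raw line, e.g. `'1D/1B'` in the `print(data[1:])` output above), which the scraper keeps as raw text in `rooms`. A sketch that splits it into numbers, assuming D stands for dormitorios (bedrooms) and B for baños (bathrooms), which is presumably the listing convention; `parse_rooms` is a hypothetical helper.

```python
import re


def parse_rooms(rooms: str):
    """'1D/1B' or '2D1B' -> (bedrooms, bathrooms); anything else, e.g. 'n/a', -> (None, None).

    Assumes D = dormitorios (bedrooms) and B = baños (bathrooms).
    """
    match = re.fullmatch(r"(\d+)D\s*/?\s*(\d+)B", rooms.strip())
    if match is None:
        return None, None
    return int(match.group(1)), int(match.group(2))


print(parse_rooms("1D/1B"))  # (1, 1)
print(parse_rooms("n/a"))    # (None, None)
```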
