# Final Project IS602
## Real Estate Price Modeler

### Marco Siqueira Campos

This modeler predicts the real estate price of apartments in the city of Porto Alegre, in southern Brazil.  
The data were collected from the web by another project.  
Multiple regression and random forest regression were used to calculate the value estimates.  
Random forest shows the lower estimation error, but a full analysis of model performance was outside the scope of this project.  
As a benchmark for testing, the following neighborhoods have different price profiles: *Bela Vista* (high value), *Centro* (city center, intermediate value), and *Restinga* (low value).  


In [1]:
# import modules
# __future__ imports must come before any other statement
from __future__ import print_function
from __future__ import division
import ipywidgets as widgets
from IPython.display import display, Javascript
from ipywidgets import interact, interactive, fixed, interact_manual
from ipyleaflet import (Map, Marker)
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.ensemble import RandomForestRegressor
import numpy as np
import pylab as pl
import operator

### Functions
Two functions were developed:  
   *place* - Returns the latitude and longitude of a neighborhood as a list, the format the map needs for its center.  
   *run_cell* - Runs a JavaScript call that executes the current notebook cell, which makes the Submit button interactive.  


In [2]:
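# Functions
def place(c, d):
    """Return [lat, lng] for the row at position c of DataFrame d."""
    e = d[['lat', 'lng']][c:c+1]   # one-row slice, selected by position
    e = list(e.values.flatten())   # flatten the 1x2 array into a plain list
    return e

def run_cell(ev):
    """Button callback: ask the notebook front end to execute the current cell."""
    display(Javascript('IPython.notebook.execute_cell()'))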
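
As a quick illustration of what *place* returns, here is a minimal sketch that calls it on a tiny table. The table `nbh_demo`, its neighborhood names, and its coordinates are made up for the example and are not taken from *poa_nbh.csv*.

```python
import pandas as pd

def place(c, d):
    # Same logic as the notebook function: row at position c -> [lat, lng]
    e = d[['lat', 'lng']][c:c+1]
    return list(e.values.flatten())

# Hypothetical neighborhood table; the coordinates are illustrative only
nbh_demo = pd.DataFrame({
    'nbh': ['A', 'B'],
    'lat': [-30.0, -30.1],
    'lng': [-51.2, -51.1],
})

center = place(1, nbh_demo)    # latitude and longitude of the second row
missing = place(5, nbh_demo)   # a position past the end gives an empty list
print(center, missing)
```

Because the function slices by position, the integer passed in has to match the row order of the table, which is why the neighborhood ids are later assigned as `0, 1, 2, ...` in that same order.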


### Database
Two data files were downloaded from my GitHub account.  
   *poa_database.csv* is the table of the main apartment features, taken from web ads.  
   *poa_nbh.csv* is the neighborhood table, with the latitude and longitude of the center of each neighborhood.  

In [3]:
# load data base
poa_db = pd.read_csv('https://raw.githubusercontent.com/MarcoSCampos/testdata/master/poa_database.csv')
poa_nbh = pd.read_csv('https://raw.githubusercontent.com/MarcoSCampos/testdata/master/poa_nbh.csv')

### Organizing and modeling data


In [4]:
# tidying the data
poa_db.rename(columns={'Bairro':'nbh','Area':'area', 'Valor':'value', 'Dormitorios':'rooms', 'Vagas':'parking'}, inplace=True)
poa_nbh.rename(columns={'bairro':'nbh'}, inplace=True)

# Assign an integer id to each neighborhood (a simple label encoding)
poa_nbh.insert(0, 'nbh_id', range(0, len(poa_nbh)))

# Merge with right join
poa_db2 = pd.merge(poa_nbh, poa_db, on='nbh', how='right')
poa_db2.head()
poa_db2['nbh2'] = poa_db2['nbh'].str.replace('_', ' ')
poa_nbh['nbh2'] = poa_nbh['nbh'].str.replace('_', ' ')

# Generate a dict and sort
mydict=dict(zip(poa_nbh.nbh2,poa_nbh.nbh_id))
mydict=sorted(mydict.items(),key=operator.itemgetter(1))

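# Fit the linear model (lm) and create the random forest (rf);
# rf is fit later, when a random forest prediction is requested
lm = smf.ols(formula='value ~ area + rooms + parking + nbh', data=poa_db2).fit()
rf = RandomForestRegressor(n_estimators=150)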
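
The id assignment, the right join, and the sorted option list can be hard to picture without the data, so here is a minimal sketch of the same steps on two tiny tables. The frames `nbh_demo` and `ads_demo` and all their values are hypothetical stand-ins for `poa_nbh` and `poa_db`.

```python
import operator
import pandas as pd

# Hypothetical stand-ins for the neighborhood and ad tables
nbh_demo = pd.DataFrame({
    'nbh': ['Bela_Vista', 'Centro', 'Restinga'],
    'lat': [-30.03, -30.03, -30.15],   # illustrative coordinates only
    'lng': [-51.19, -51.23, -51.13],
})
ads_demo = pd.DataFrame({
    'nbh': ['Centro', 'Restinga', 'Centro'],
    'area': [60, 45, 80],
    'value': [300000, 120000, 410000],
})

# Integer id per neighborhood, in table order
nbh_demo.insert(0, 'nbh_id', range(len(nbh_demo)))

# Right join: keep every ad and attach the neighborhood id and coordinates
merged = pd.merge(nbh_demo, ads_demo, on='nbh', how='right')

# Display names without underscores, then (name, id) pairs sorted by id
nbh_demo['nbh2'] = nbh_demo['nbh'].str.replace('_', ' ')
options = sorted(dict(zip(nbh_demo.nbh2, nbh_demo.nbh_id)).items(),
                 key=operator.itemgetter(1))
print(merged[['nbh', 'nbh_id', 'area', 'value']])
print(options)
```

The sorted `(name, id)` pairs are the shape the neighborhood dropdown receives: the name is shown to the user and the id is the widget value. Note that with a right join, an ad whose neighborhood is absent from the neighborhood table would get a missing `nbh_id`, so it is worth checking for missing values before fitting a model on that column.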




### Widgets
Generate the widgets with start values

In [5]:
w1 = widgets.IntSlider(description='rooms', min=1, max=4, step=1, value=2)
w2 = widgets.IntSlider(description='garage', min=0, max=2, step=1, value=1)
w3 = widgets.BoundedIntText(value=1000, description='area', min=10, max=2700, step=1, disabled=False)
w4 = widgets.RadioButtons(description='Prediction:', options=['linear regression', 'random forest'], value='linear regression', disabled=False)
w5 = widgets.Dropdown(options=mydict, value=12, description='Neighborhood:')
w7 = widgets.RadioButtons(description='Measuring', options=['m2', 'ft2'], value='ft2', disabled=False)
a = [-30.028591, -51.228060]
w6 = Map(center=a, zoom=15)


### Display the data
Two checks are made: whether the measuring system is ft2 or m2, and whether the model is random forest (rf) or multiple linear regression (lm).  
The predicted value is converted from Brazilian real to US dollar.  
The text shown inside the map marker is generated.  
The widgets and the map are displayed.  
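
The two conversions can be sketched as small standalone functions. The helper names `to_square_meters` and `brl_to_usd` are hypothetical and do not appear in the notebook; the factor 0.09290 and the fixed rate of 3.2 real per dollar are the values the notebook code uses, and a real exchange rate would change over time.

```python
SQFT_TO_SQM = 0.09290   # conversion factor used in the notebook
BRL_PER_USD = 3.2       # fixed rate hard-coded in the notebook (an assumption, not a live rate)

def to_square_meters(area, unit):
    """Return the area in whole square meters; unit is 'm2' or 'ft2'."""
    if unit == 'ft2':
        return int(round(float(area) * SQFT_TO_SQM))
    return int(area)

def brl_to_usd(value_brl, rate=BRL_PER_USD):
    """Convert a value in Brazilian real to US dollar at the given rate."""
    return value_brl / rate

area_m2 = to_square_meters(1000, 'ft2')   # the default widget values: 1000 ft2
price_usd = brl_to_usd(320000.0)          # an illustrative price in real
print(area_m2, 'US$ %.2f' % price_usd)
```

Keeping the rate as a named parameter makes it easy to update in one place instead of editing the division inside the display cell.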


In [7]:

l=[]
l.append(w1.value)# l[0] rooms 2
l.append(w2.value)# l[1] garage 1
l.append(w3.value)# l[2] area 1000
l.append(w5.value)# l[3] neighborhood 12="Centro"
l.append(w7.value)# l[4] measuring ft2

w8= widgets.Button(description="Submit")
w8.on_click(run_cell)

# measuring system: convert ft2 to m2 before predicting
if w7.value == 'ft2':
    l[2] = int(round(float(l[2]) * 0.09290))
    
# modeling
if w4.value == 'linear regression':
    # Linear modeling: the formula treats the neighborhood name as a categorical variable
    new = {'area': [l[2]], 'rooms': [l[0]], 'parking': [l[1]], 'nbh': [poa_nbh.nbh[l[3]]]}
    z = lm.predict(new)[0]
else:
    # Random forest modeling: uses the numeric neighborhood id
    new1 = pd.DataFrame({'area': [l[2]], 'rooms': [l[0]], 'parking': [l[1]], 'nbh_id': [l[3]]})
    cols = ['area', 'rooms', 'parking', 'nbh_id']
    rf.fit(poa_db2[cols], poa_db2.value)
    z = rf.predict(new1[cols])[0]
    
z = z / 3.2  # convert real to dollar at a fixed rate of 3.2 real per dollar
w6 = Map(center=place(w5.value, poa_nbh), zoom=15)
mark = Marker(location=w6.center, title='Property value \n US$ %.2f' % z)
w6 += mark

display(w1, w2, w3, w7, w4, w5, w6, w8)

