# Prediction Models for Real Estate

## Variables

- ad_description: Ad description
- ad_last_update: Last update of the ad
- air_conditioner: Air conditioning
- balcony: Balcony
- bath_num: Number of bathrooms
- built_in_wardrobe: Built-in wardrobe
- chimney: Fireplace
- condition: Condition (second-hand, new construction)
- construct_date: Construction date
- energetic_certif: Whether the property has an energy certificate
- floor: Floor (height within the building, e.g. penthouse or ground floor)
- garage: Garage
- garden: Garden
- ground_size
- heating
- house_id
- house_type
- kitchen
- lift: Elevator
- loc_city
- loc_district
- loc_full
- loc_neigh
- loc_street
- loc_zone
- m2_real
- m2_useful
- obtention_date
- orientation
- price
- reduced_mobility
- room_num
- storage_room
- swimming_pool
- terrace
- unfurnished
- number_of_companies_prov
- population_prov
- companies_prov_vs_national_%
- population_prov_vs_national_%
- renta_media_prov

In [2]:
### Basic data-handling libraries
import pandas as pd
import numpy as np

### Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

### Scikit-learn utilities
from sklearn.model_selection import train_test_split

### Models
from xgboost import XGBClassifier, plot_importance
from sklearn.ensemble import RandomForestClassifier

### Metrics
from sklearn import metrics
from sklearn.metrics import accuracy_score, roc_curve, auc

## Preprocessing

In [3]:
df = pd.read_csv('./data/spanish_houses.csv')
df.head()

| | ad_description | ad_last_update | air_conditioner | balcony | bath_num | built_in_wardrobe | chimney | condition | construct_date | energetic_certif | ... | room_num | storage_room | swimming_pool | terrace | unfurnished | number_of_companies_prov | population_prov | companies_prov_vs_national_% | population_prov_vs_national_% | renta_media_prov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Precio chalet individual en la localidad de Ab... | Anuncio actualizado el 27 de marzo | 0 | 0 | 2 | 0 | 0 | segunda mano/buen estado | | | ... | 4 | 0 | 0 | 1 | | 19147 | 328868 | 0.57 | 0.7 | 19889.0 |
| 1 | Atico de 80m2, para entrar a vivir, con salón ... | más de 5 meses sin actualizar | 0 | 0 | 2 | 0 | 0 | segunda mano/buen estado | 2006.0 | no indicado | ... | 3 | 1 | 0 | 0 | | 19147 | 328868 | 0.57 | 0.7 | 19889.0 |
| 2 | B/ Etxaguen. Casa de reciente construcción con... | más de 5 meses sin actualizar | 0 | 0 | 3 | 0 | 0 | segunda mano/buen estado | | no indicado | ... | 4 | 1 | 0 | 1 | | 19147 | 328868 | 0.57 | 0.7 | 19889.0 |
| 3 | Se vende vivienda en abornikano (ayuntamiento ... | más de 5 meses sin actualizar | 0 | 1 | 1 | 1 | 1 | segunda mano/buen estado | | en trámite | ... | 4 | 1 | 0 | 1 | | 19147 | 328868 | 0.57 | 0.7 | 19889.0 |
| 4 | Negociables. | más de 5 meses sin actualizar | 0 | 0 | 1 | 0 | 0 | segunda mano/buen estado | | no indicado | ... | 2 | 1 | 1 | 1 | | 19147 | 328868 | 0.57 | 0.7 | 19889.0 |


In [4]:
# Descriptive statistics
df.describe()

| | number_of_companies_prov | population_prov | companies_prov_vs_national_% | population_prov_vs_national_% | renta_media_prov |
|---|---|---|---|---|---|
| count | 100000.0 | 100000.0 | 100000.0 | 100000.0 | 59280.0 |
| mean | 100050.34209 | 1320589.0 | 3.000553 | 2.827112 | 11864.35119 |
| std | 114436.597411 | 1379332.0 | 3.429043 | 2.952647 | 9980.4353 |
| min | 5689.0 | 88600.0 | 0.17 | 0.19 | 21.613 |
| 25% | 49582.0 | 720592.0 | 1.49 | 1.54 | 22.822 |
| 50% | 75628.0 | 1128908.0 | 2.27 | 2.42 | 19818.0 |
| 75% | 96638.0 | 1149628.0 | 2.9 | 2.46 | 19818.0 |
| max | 538917.0 | 6578079.0 | 16.15 | 14.08 | 21714.0 |
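
The summary covers only five columns because `describe()` reports numeric dtypes by default, while columns such as `bath_num` or `room_num` are stored as `object`. A minimal sketch of one way to bring such columns into the summary, assuming they hold numbers stored as text. The toy frame, its values, and the helper name `coerce_numeric` are illustrative and not part of the notebook:

```python
import pandas as pd


def coerce_numeric(frame, columns):
    """Return a copy with the given columns converted to a numeric dtype.

    Values that cannot be parsed as numbers become NaN (errors="coerce").
    """
    out = frame.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Toy stand-in for the real data: numbers stored as strings (hypothetical values).
toy = pd.DataFrame({
    "bath_num": ["2", "2", "3", "unknown"],
    "room_num": ["4", "3", "4", "2"],
    "population_prov": [328868, 328868, 328868, 328868],
})

numeric_toy = coerce_numeric(toy, ["bath_num", "room_num"])
summary = numeric_toy.describe()
print(summary)
```

On the real data the call would look like `coerce_numeric(df, [...])` with whichever columns are expected to be numeric. Because unparseable entries silently become NaN, it is worth comparing the non-null counts before and after the conversion.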


In [5]:
# Data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 41 columns):
 #   Column                         Non-Null Count   Dtype  
---  ------                         --------------   -----  
 0   ad_description                 95426 non-null   object 
 1   ad_last_update                 100000 non-null  object 
 2   air_conditioner                100000 non-null  object 
 3   balcony                        100000 non-null  object 
 4   bath_num                       100000 non-null  object 
 5   built_in_wardrobe              100000 non-null  object 
 6   chimney                        100000 non-null  object 
 7   condition                      86059 non-null   object 
 8   construct_date                 32059 non-null   object 
 9   energetic_certif               74691 non-null   object 
 10  floor                          79693 non-null   object 
 11  garage                         40811 non-null   object 
 12  garden                         
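
The non-null counts above point to substantial missingness: `construct_date`, for example, has 32059 non-null values out of 100000 rows. A minimal sketch of ranking columns by their share of missing values. The toy frame and the helper name `missing_share` are hypothetical:

```python
import numpy as np
import pandas as pd


def missing_share(frame):
    """Fraction of missing values per column, sorted from highest to lowest."""
    return frame.isna().mean().sort_values(ascending=False)


# Toy stand-in for the real data (placeholder values, not taken from the dataset).
toy = pd.DataFrame({
    "construct_date": [np.nan, "2006.0", np.nan, np.nan],
    "garage": [np.nan, "yes", np.nan, "yes"],
    "ad_last_update": ["a", "b", "c", "d"],
})

shares = missing_share(toy)
print(shares)
```

Applied to the notebook's frame, `missing_share(df)` would give the same ranking for all 41 columns, which helps decide which ones to impute and which to drop.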

In [6]:
# Count the cells equal to 0 in each column
df.isin([0]).sum()

ad_description                   0
ad_last_update                   0
air_conditioner                  0
balcony                          0
bath_num                         0
built_in_wardrobe                0
chimney                          0
condition                        0
construct_date                   0
energetic_certif                 0
floor                            0
garage                           0
garden                           0
ground_size                      0
heating                          0
house_id                         0
house_type                       0
kitchen                          0
lift                             0
loc_city                         0
loc_district                     0
loc_full                         0
loc_neigh                        0
loc_street                       0
loc_zone                         0
m2_real                          0
m2_useful                        0
obtention_date                   0
orientation         
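
Every count shown above is zero, even though `df.head()` displays `0` in columns such as `air_conditioner`. One plausible explanation, assuming those `object` columns store the text `'0'` rather than the integer `0`, is that `isin([0])` never matches a string. A minimal sketch with a toy frame (hypothetical values) that illustrates the difference:

```python
import pandas as pd

# Toy stand-in: flag columns stored as text (hypothetical values).
toy = pd.DataFrame({
    "air_conditioner": ["0", "0", "1", "0"],
    "balcony": ["0", "1", "0", "1"],
})

# The integer 0 is not equal to the string "0", so nothing matches.
int_matches = toy.isin([0]).sum()

# Looking for the text "0" finds the expected cells.
str_matches = toy.isin(["0"]).sum()

print(int_matches)
print(str_matches)
```

If this assumption holds for the real data, converting the flag columns to a numeric dtype first would make `df.isin([0]).sum()` behave as intended.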