# Project: Building a rental price barometer for Madrid using Idealista

### Notebook: Model deployment
 
Antonio Montilla

Madrid, December 2023

- This notebook brings together the analysis and output produced in the previous two notebooks (Rental_price_barometer1_data_extraction.ipynb and Rental_price_barometer2_EDA_modelling.ipynb) to build the rental price barometer interface, based on the data collected from Idealista.
- The user interface asks the user for the desired rental features (such as neighbourhood, property size, number of rooms and bathrooms, among others) and produces the following outputs:
    1. A prediction of the rental price based on market trends as of early December 2023. The output is given in the form of an approximate range rounded to the nearest hundred euros. For example, if the model predicts 1,760 euros per month, the output will be presented as 1,700 to 1,800 euros per month. The output serves as a price reference or a benchmark of current market prices.
    2. Actual statistics of rental prices from properties currently available that match the user's features. These include: average price, minimum and maximum.
    3. Up to five listings that match the search criteria, together with their URLs. For (2) and (3), if no available properties match the criteria, a warning message is displayed instead.
    4. An option to retrieve socio-economic data for the enquired district.

## Importing libraries

In [60]:
import pandas as pd
import numpy as np
import random as rnd
from pandas import read_csv
from scipy.stats import ttest_1samp
import time
import json
import math
import pickle

import seaborn as sns
from scipy.stats import norm, skew
from scipy import stats 
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

## Importing the database, the model and other transformers

In [2]:
#importing database
data = pd.read_csv('data_1_90.csv')

In [3]:
#importing RandomForestRegressor model (the with-blocks close each file after loading)
with open('final_model.pkl', 'rb') as f:
    final_model = pickle.load(f)

#importing numerical scaler
with open('scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)

#importing categorical encoder
with open('encoder.pkl', 'rb') as f:
    encoder = pickle.load(f)


In [4]:
data

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,hasPlan,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus
0,103241997,https://img3.idealista.com/blur/WEB_LISTING/90...,ACCM,21,1350.0,flat,rent,55.0,1,2,...,True,False,False,True,False,False,False,False,False,False
1,101227123,https://img3.idealista.com/blur/WEB_LISTING/0/...,3d26340f0da112d7aafd,15,1743.0,studio,rent,25.0,0,1,...,True,False,False,True,False,False,False,False,False,False
2,458306,https://img3.idealista.com/blur/WEB_LISTING/0/...,,24,1595.0,flat,rent,98.0,2,2,...,False,False,False,True,False,False,False,False,False,False
3,99586539,https://img3.idealista.com/blur/WEB_LISTING/0/...,2392,22,2400.0,flat,rent,57.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4,102075463,https://img3.idealista.com/blur/WEB_LISTING/0/...,120413,23,1488.0,flat,rent,42.0,1,1,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4495,100627014,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline VI,13,2160.0,flat,rent,66.0,2,2,...,False,False,False,True,False,False,False,False,False,False
4496,100930829,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline V,16,2160.0,flat,rent,55.0,1,1,...,False,False,False,True,False,False,False,False,False,False
4497,100097037,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline I,11,2700.0,flat,rent,85.0,2,2,...,False,False,False,True,False,False,False,False,False,False
4498,100931418,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline VIII,13,2160.0,flat,rent,50.0,1,1,...,False,False,False,True,False,False,False,False,False,False


## Step 1: building a function for getting data ready for modelling

- **Step 1.1.** Before any cleaning, the data has to be obtained from the user. I will first define a function that asks the user for inputs and returns the answers as a dataframe.
- **Step 1.2.** The next step is to create a function that cleans that dataframe and transforms it so it can be used to make a prediction.

In [5]:
#1.1: Creating a function for asking inputs from users:

def get_user_input():
    #First, creating the list of questions with available choices
    questions = [
        ("In which district are you looking? (Please select one of the following options)", ["Centro", "Barrio de Salamanca", "Chamberí", "Chamartín", "Tetuán",
                                                "Retiro", "Moncloa", "Arganzuela", "Ciudad Lineal", "Latina",
                                                "Carabanchel", "Puente de Vallecas", "Usera", "Moratalaz"]),
        ("What type of property are you looking for? (Please select one of the following options)", ["flat", "penthouse", "studio", "duplex", "chalet"]),
        ("How big (in mts2) is your desired home? (Enter a number from 20 to 350)", None),
        ("How many rooms do you need? (Enter a number from 0 to 6)", None),
        ("How many bathrooms do you need? (Enter a number from 1 to 5)", None),
        ("Do you need parking? (1 for yes, 0 for no)", None),
        ("Do you want the rental to include a terrace? (1 for yes, 0 for no)", None),
        ("Do you want the rental to include a lift? (1 for yes, 0 for no)", None),
        ("Do you prefer luxury properties? (1 for luxury, 0 for no luxury)", None)
    ]

    #creating an empty list to store user responses
    user_responses = []

    #initiating a loop for getting user input for each question
    for question, choices in questions:
        answer = None
        while answer is None:
            try:
                if choices is not None:
                    #If choices are provided, display them to the user
                    print(f"{question} ({', '.join(choices)})")
                    answer = input().strip()
                    if answer not in choices:
                        raise ValueError("Invalid choice. Please select from the provided options.")
                else:
                    #If no choices are provided, take a numeric input
                    print(question)
                    answer = input().strip().lower()
                    if question.lower().startswith("how big") and not (20 <= float(answer) <= 350):
                        raise ValueError("Invalid input. Please enter a number between 20 and 350.")
                    elif question.lower().startswith("how many rooms") and not (0 <= float(answer) <= 6):
                        raise ValueError("Invalid input. Please enter a number between 0 and 6.")
                    elif question.lower().startswith("how many bathrooms") and not (1 <= float(answer) <= 5):
                        raise ValueError("Invalid input. Please enter a number between 1 and 5.")
                    elif question.lower().startswith("do") and float(answer) not in [0, 1]:
                        raise ValueError("Invalid input. Please enter 1 for yes or 0 for no.")
            except ValueError as e:
                print(f"Error: {e}")
                answer = None

        #now appending the user response list
        user_responses.append(answer)

    #creating a DataFrame from the user responses with specified column names
    user_df = pd.DataFrame([user_responses], columns=['district', 'propertyType', 'size', 'rooms', 'bathrooms',
                                                      'parkingSpace', 'terrace', 'hasLift', 'luxuryType'])
    #finally declaring integer columns to be ready for next stage
    user_df[['size', 'rooms', 'bathrooms', 'parkingSpace', 'terrace', 'hasLift', 'luxuryType']] = user_df[['size', 'rooms', 'bathrooms', 'parkingSpace', 'terrace', 'hasLift', 'luxuryType']].astype(int)

    return user_df


In [75]:
#using the get_user_input() function:
user_dataframe = get_user_input()

In which district are you looking? (Please select one of the following options) (Centro, Barrio de Salamanca, Chamberí, Chamartín, Tetuán, Retiro, Moncloa, Arganzuela, Ciudad Lineal, Latina, Carabanchel, Puente de Vallecas, Usera, Moratalaz)
Usera
What type of property are you looking for? (Please select one of the following options) (flat, penthouse, studio, duplex, chalet)
flat
How big (in mts2) is your desired home? (Enter a number from 20 to 350)
50
How many rooms do you need? (Enter a number from 0 to 6)
1
How many bathrooms do you need? (Enter a number from 1 to 5)
1
Do you need parking? (1 for yes, 0 for no)
0
Do you want the rental to include a terrace? (1 for yes, 0 for no)
0
Do you want the rental to include a lift? (1 for yes, 0 for no)
0
Do you prefer luxury properties? (1 for luxury, 0 for no luxury)
0


In [76]:
user_dataframe

Unnamed: 0,district,propertyType,size,rooms,bathrooms,parkingSpace,terrace,hasLift,luxuryType
0,Usera,flat,50,1,1,0,0,0,0


In [8]:
user_dataframe.dtypes

district        object
propertyType    object
size             int64
rooms            int64
bathrooms        int64
parkingSpace     int64
terrace          int64
hasLift          int64
luxuryType       int64
dtype: object
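
The range checks in `get_user_input()` accept any value that `float()` can parse, so an answer such as `50.5` would pass validation and only fail later at `.astype(int)`. A minimal sketch of a stricter check follows; `parse_int_in_range` and `ask` are hypothetical helpers, not part of the notebook, and the `read` parameter is an assumption added so the loop can be exercised without typing answers by hand.

```python
def parse_int_in_range(answer, low, high):
    """Return the answer as an int if it is a whole number in [low, high]."""
    text = answer.strip()
    try:
        value = int(text)
    except ValueError:
        raise ValueError(
            f"Invalid input. Please enter a whole number between {low} and {high}."
        ) from None
    if not low <= value <= high:
        raise ValueError(f"Invalid input. Please enter a number between {low} and {high}.")
    return value


def ask(question, low, high, read=input):
    """Repeat the question until the answer passes parse_int_in_range()."""
    while True:
        print(question)
        try:
            return parse_int_in_range(read(), low, high)
        except ValueError as error:
            print(f"Error: {error}")


# Simulated answers: two invalid ones followed by a valid one
answers = iter(["abc", "7", "3"])
rooms = ask("How many rooms do you need? (Enter a number from 0 to 6)", 0, 6,
            read=lambda: next(answers))
print(rooms)
```

Keeping the parsing separate from `input()` means the validation rules can be checked on their own, while the interactive loop stays the same as in the notebook.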

In [9]:
#1.2: Creating a function for cleaning the user's dataframe before using in model

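def cleaning_user_data(data):
    #working on a copy, so the user's original answers (ungrouped district and property type) stay intact for the listing filter in Step 3
    data = data.copy()
    #transforming 'propertyType'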
    data['propertyType'] = data['propertyType'].replace(['studio', 'duplex'], 'other')
    data['propertyType'] = data['propertyType'].replace('chalet', 'penthouse')
    #transforming 'district'
    Moncloa_Arganzuela = ['Moncloa', 'Arganzuela']
    data['district'] = np.where(data.district.isin(Moncloa_Arganzuela),'Moncloa-Arganzuela', data.district)
    Vallecas_ciudad = ["Ciudad Lineal", "Moratalaz", "Puente de Vallecas"]
    data['district'] = np.where(data.district.isin(Vallecas_ciudad),'Vallecas-Ciudad Lineal', data.district)
    sur = ["Latina", "Carabanchel", "Usera"]
    data['district'] = np.where(data.district.isin(sur),'Distritos Sur', data.district)
    #splitting X_num and scaling
    X_num = data[['size', 'rooms', 'bathrooms', 'parkingSpace', 'hasLift', 'luxuryType', 'terrace']]
    x_standardized = scaler.transform(X_num)
    X_num_s = pd.DataFrame(x_standardized, columns=X_num.columns)
    #splitting X_cat and encoding
    X_cat = data[['propertyType', 'district']]
    cols = ['propertyType_other', 'propertyType_penthouse', 'district_Centro', 'district_Chamartín', 'district_Chamberí', 'district_Distritos Sur', 'district_Moncloa-Arganzuela', 'district_Retiro', 'district_Tetuán', 'district_Vallecas-Ciudad Lineal']
    encoded = encoder.transform(X_cat).toarray()
    X_cat_onehot_encoded = pd.DataFrame(encoded, columns=cols)
    #concatenating
    X_trans = pd.concat([X_num_s, X_cat_onehot_encoded], axis=1)
    return X_trans

In [10]:
#applying function cleaning_user_data() on user_dataframe:
user_dataframe_clean = cleaning_user_data(user_dataframe)
user_dataframe_clean

Unnamed: 0,size,rooms,bathrooms,parkingSpace,hasLift,luxuryType,terrace,propertyType_other,propertyType_penthouse,district_Centro,district_Chamartín,district_Chamberí,district_Distritos Sur,district_Moncloa-Arganzuela,district_Retiro,district_Tetuán,district_Vallecas-Ciudad Lineal
0,0.056604,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
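
The category grouping done inside `cleaning_user_data()` can also be written as two lookup tables, which makes the mapping easier to read and to check. This is only a sketch: the group names are copied from the function above, while `DISTRICT_GROUPS`, `PROPERTY_TYPE_GROUPS` and `group_categories` are hypothetical names introduced here.

```python
# Mapping copied from the np.where() calls in cleaning_user_data()
DISTRICT_GROUPS = {
    "Moncloa": "Moncloa-Arganzuela",
    "Arganzuela": "Moncloa-Arganzuela",
    "Ciudad Lineal": "Vallecas-Ciudad Lineal",
    "Moratalaz": "Vallecas-Ciudad Lineal",
    "Puente de Vallecas": "Vallecas-Ciudad Lineal",
    "Latina": "Distritos Sur",
    "Carabanchel": "Distritos Sur",
    "Usera": "Distritos Sur",
}

# Mapping copied from the two replace() calls on 'propertyType'
PROPERTY_TYPE_GROUPS = {"studio": "other", "duplex": "other", "chalet": "penthouse"}


def group_categories(district, property_type):
    """Return the grouped (district, propertyType) pair; unlisted values pass through unchanged."""
    return (
        DISTRICT_GROUPS.get(district, district),
        PROPERTY_TYPE_GROUPS.get(property_type, property_type),
    )


print(group_categories("Usera", "flat"))     # ('Distritos Sur', 'flat')
print(group_categories("Retiro", "chalet"))  # ('Retiro', 'penthouse')
```

On a dataframe, the same dictionaries could be applied with `data['district'].replace(DISTRICT_GROUPS)`, which leaves values that are not in the dictionary untouched.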


## Step 2: building a function for making prediction ranges

In [11]:
def prediction_ranges(X_transf):
    #making predictions
    predictions = final_model.predict(X_transf)
    #rounding predictions down to a multiple of one hundred euros (this is the lower bound of the range)
    rounded_predictions = np.floor(predictions / 100) * 100
    #creating a df with the original predictions and rounded predictions
    result_df = pd.DataFrame({'original_prediction': predictions, 'rounded_prediction': rounded_predictions})
    #calculating the approximation range between the two nearest hundred euros
    result_df['pred_range'] = result_df.apply(lambda row: f"{int(row['rounded_prediction'])}-{int(row['rounded_prediction'] + 100)}", axis=1)
    result_df = result_df.drop(['rounded_prediction'], axis = 1)
    return result_df

In [12]:
#now using function taking as input user_dataframe_clean:
user_dataframe_predictions = prediction_ranges(user_dataframe_clean)
user_dataframe_predictions

Unnamed: 0,original_prediction,pred_range
0,1466.41,1400-1500


In [62]:
user_dataframe_predictions['pred_range'][0]

'1400-1500'
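
The range logic in `prediction_ranges()` can be checked in isolation. The sketch below reproduces only the arithmetic (floor to the lower hundred, then add one hundred); `price_range` and its `step` parameter are hypothetical and not part of the notebook.

```python
import math


def price_range(prediction, step=100):
    """Format a prediction as 'lower-upper', where lower is the prediction floored to a multiple of step."""
    lower = math.floor(prediction / step) * step
    return f"{int(lower)}-{int(lower + step)}"


print(price_range(1466.41))  # 1400-1500
print(price_range(1760))     # 1700-1800
print(price_range(1700))     # 1700-1800
```

Note that a prediction falling exactly on a hundred, such as 1,700, is reported as the range that starts at that value.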

## Step 3: Building a function for filtering data to compute statistics and show listings

In [13]:
#3.1: the first step is to have the database with all properties ready to be filtered
#importing data
data = pd.read_csv('data_1_90.csv')

#dropping duplicates
data = data.drop_duplicates()
#filling NaN for description
data['description'] = data['description'].fillna('not available')
#transforming booleans into int
boolean_col = ['parkingSpace', 'luxuryType', 'hasLift']
for col in boolean_col:
    data[col] = data[col].astype(int)
#creating 'terrace'
data['terrace'] = data['description'].str.lower().apply(lambda x: 1 if any(word in x for word in ["balcón", "balcon", "balcones", "terraza", "terrace", "balcony"]) else 0) 

In [14]:
data

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus,terrace
0,103241997,https://img3.idealista.com/blur/WEB_LISTING/90...,ACCM,21,1350.0,flat,rent,55.0,1,2,...,False,False,1,False,0,False,False,False,False,1
1,101227123,https://img3.idealista.com/blur/WEB_LISTING/0/...,3d26340f0da112d7aafd,15,1743.0,studio,rent,25.0,0,1,...,False,False,1,False,0,False,False,False,False,1
2,458306,https://img3.idealista.com/blur/WEB_LISTING/0/...,,24,1595.0,flat,rent,98.0,2,2,...,False,False,1,False,0,False,False,False,False,1
3,99586539,https://img3.idealista.com/blur/WEB_LISTING/0/...,2392,22,2400.0,flat,rent,57.0,1,1,...,False,False,1,False,0,False,False,False,False,1
4,102075463,https://img3.idealista.com/blur/WEB_LISTING/0/...,120413,23,1488.0,flat,rent,42.0,1,1,...,False,False,0,False,0,False,False,False,False,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4495,100627014,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline VI,13,2160.0,flat,rent,66.0,2,2,...,False,False,1,False,0,False,False,False,False,0
4496,100930829,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline V,16,2160.0,flat,rent,55.0,1,1,...,False,False,1,False,0,False,False,False,False,1
4497,100097037,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline I,11,2700.0,flat,rent,85.0,2,2,...,False,False,1,False,0,False,False,False,False,1
4498,100931418,https://img3.idealista.com/blur/WEB_LISTING/0/...,Skyline VIII,13,2160.0,flat,rent,50.0,1,1,...,False,False,1,False,0,False,False,False,False,1


In [15]:
data.columns

Index(['propertyCode', 'thumbnail', 'externalReference', 'numPhotos', 'price',
       'propertyType', 'operation', 'size', 'rooms', 'bathrooms', 'address',
       'province', 'municipality', 'district', 'country', 'latitude',
       'longitude', 'showAddress', 'url', 'distance', 'description',
       'hasVideo', 'status', 'newDevelopment', 'parkingSpace', 'priceByArea',
       'typology', 'subTypology', 'subtitle', 'title', 'hasPlan', 'has3DTour',
       'has360', 'hasLift', 'hasStaging', 'luxuryType', 'villaType',
       'superTopHighlight', 'topNewDevelopment', 'topPlus', 'terrace'],
      dtype='object')
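
The `terrace` column created in step 3.1 is a plain substring search over the lower-cased description. A small sketch of the same rule applied to single strings is shown below; the keyword list is the one used above, while `has_terrace` and the example descriptions are made up for illustration.

```python
# Keywords copied from the lambda in step 3.1
TERRACE_WORDS = ["balcón", "balcon", "balcones", "terraza", "terrace", "balcony"]


def has_terrace(description):
    """Return 1 if any terrace/balcony keyword appears in the description, else 0."""
    text = description.lower()
    return 1 if any(word in text for word in TERRACE_WORDS) else 0


print(has_terrace("Piso luminoso con TERRAZA y ascensor"))  # 1
print(has_terrace("not available"))                          # 0
print(has_terrace("Piso sin terraza"))                       # 1
```

The last example shows a limit of keyword matching: a description that says the flat has no terrace is still flagged, because the rule looks only for the word itself.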

In [16]:
#3.2: the second step is to build a function that takes the user's answers as input and returns
# the listings that match those answers.
# The average, minimum and maximum price and the top 5 matching listings
# are then computed from this filtered dataframe in step 3.3.
def get_matching_list(total_df, user_df):
    #dropping size and luxuryType as otherwise it excludes most properties
    user_df = user_df.drop(['size', 'luxuryType'], axis = 1)
    #starting from an all-True mask, so every listing is kept until a filter excludes it
    mask = pd.Series(True, index=total_df.index)
    #now applying filtering based on each column in user_df
    for column in user_df.columns:
        if column in total_df.columns:
            mask &= total_df[column] == user_df[column].values[0]

    #finally, applying the mask to total_df
    filtered_dataframe = total_df[mask]

    return filtered_dataframe


In [17]:
filtered_df = get_matching_list(data, user_dataframe)

In [18]:
filtered_df

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus,terrace
2194,102454674,https://img3.idealista.com/blur/WEB_LISTING/0/...,204811,6,1600.0,flat,rent,45.0,1,1,...,False,False,0,False,0,False,False,False,False,0
2224,103365277,https://img3.idealista.com/blur/WEB_LISTING/0/...,Retiro- MP Bajo,28,1600.0,flat,rent,56.0,1,1,...,False,False,0,False,0,False,False,False,False,0
2826,103362998,https://img3.idealista.com/blur/WEB_LISTING/0/...,,17,1200.0,flat,rent,46.0,1,1,...,False,False,0,False,0,False,False,False,False,0
2933,90114863,https://img3.idealista.com/blur/WEB_LISTING/0/...,dd9c6219a6ae49385113,35,2200.0,flat,rent,45.0,1,1,...,False,False,0,False,0,False,False,False,False,0
3247,102075773,https://img3.idealista.com/blur/WEB_LISTING/0/...,120416,71,1490.0,flat,rent,65.0,1,1,...,False,False,0,False,0,False,False,False,False,0
3292,102075390,https://img3.idealista.com/blur/WEB_LISTING/0/...,120409,25,1503.0,flat,rent,50.0,1,1,...,False,False,0,False,0,False,False,False,False,0
3410,99306353,https://img3.idealista.com/blur/WEB_LISTING/0/...,30,24,1100.0,flat,rent,50.0,1,1,...,False,False,0,False,0,False,False,False,False,0
3451,102075486,https://img3.idealista.com/blur/WEB_LISTING/0/...,120517,47,1770.0,flat,rent,55.0,1,1,...,False,False,0,False,0,False,False,False,False,0


In [20]:
user_dataframe

Unnamed: 0,district,propertyType,size,rooms,bathrooms,parkingSpace,terrace,hasLift,luxuryType
0,Retiro,flat,50,1,1,0,0,0,0


In [21]:
#confirming it shows the actual filtered data
data[(data['district']== 'Retiro')&(data['rooms']== 1)&(data['bathrooms']== 1)&(data['terrace']== 0)&(data['hasLift']== 0)&(data['parkingSpace']== 0)&(data['propertyType']== 'flat')]

Unnamed: 0,propertyCode,thumbnail,externalReference,numPhotos,price,propertyType,operation,size,rooms,bathrooms,...,has3DTour,has360,hasLift,hasStaging,luxuryType,villaType,superTopHighlight,topNewDevelopment,topPlus,terrace
2194,102454674,https://img3.idealista.com/blur/WEB_LISTING/0/...,204811,6,1600.0,flat,rent,45.0,1,1,...,False,False,0,False,0,False,False,False,False,0
2224,103365277,https://img3.idealista.com/blur/WEB_LISTING/0/...,Retiro- MP Bajo,28,1600.0,flat,rent,56.0,1,1,...,False,False,0,False,0,False,False,False,False,0
2826,103362998,https://img3.idealista.com/blur/WEB_LISTING/0/...,,17,1200.0,flat,rent,46.0,1,1,...,False,False,0,False,0,False,False,False,False,0
2933,90114863,https://img3.idealista.com/blur/WEB_LISTING/0/...,dd9c6219a6ae49385113,35,2200.0,flat,rent,45.0,1,1,...,False,False,0,False,0,False,False,False,False,0
3247,102075773,https://img3.idealista.com/blur/WEB_LISTING/0/...,120416,71,1490.0,flat,rent,65.0,1,1,...,False,False,0,False,0,False,False,False,False,0
3292,102075390,https://img3.idealista.com/blur/WEB_LISTING/0/...,120409,25,1503.0,flat,rent,50.0,1,1,...,False,False,0,False,0,False,False,False,False,0
3410,99306353,https://img3.idealista.com/blur/WEB_LISTING/0/...,30,24,1100.0,flat,rent,50.0,1,1,...,False,False,0,False,0,False,False,False,False,0
3451,102075486,https://img3.idealista.com/blur/WEB_LISTING/0/...,120517,47,1770.0,flat,rent,55.0,1,1,...,False,False,0,False,0,False,False,False,False,0
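
The mask logic of `get_matching_list()` can be tried on a small dataframe to see how each user answer narrows the result. The listings below are invented for illustration and are not taken from the Idealista data; the loop itself follows the function above.

```python
import pandas as pd

# Toy listings, invented for illustration only
listings = pd.DataFrame({
    "district": ["Retiro", "Retiro", "Centro", "Retiro"],
    "propertyType": ["flat", "flat", "flat", "studio"],
    "rooms": [1, 2, 1, 1],
    "price": [1500.0, 1900.0, 1700.0, 1100.0],
})

# One-row request with the same shape as user_dataframe
request = pd.DataFrame([{"district": "Retiro", "propertyType": "flat", "rooms": 1, "size": 50}])

# Start with every listing selected, then keep only rows equal to each requested value
mask = pd.Series(True, index=listings.index)
for column in request.columns.drop("size"):  # 'size' is excluded, as in get_matching_list()
    if column in listings.columns:
        mask &= listings[column] == request[column].values[0]

matches = listings[mask]
print(matches)
```

Because the comparison is an exact equality, the user's answers need the same type as the listing columns; this is presumably why `get_user_input()` casts the numeric answers to `int` before returning them.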


In [None]:
#3.3. As a final step, I will now combine the filtering function with the summary statistics to return all desired outputs.
#I will also add a warning message in case there are no matching properties
def get_matching(total_df, user_df):
    filtered_df = get_matching_list(total_df, user_df)
    if len(filtered_df) > 0:
        mean_price = round(filtered_df['price'].mean())
        max_price = round(filtered_df['price'].max())
        min_price = round(filtered_df['price'].min())
        print("The average rental price for the requested criteria is", mean_price, "euros/month")
        time.sleep(2)
        print("The rental price ranges from a minimum of", min_price, "euros/month to a maximum of", max_price, "euros/month")
        time.sleep(2)
        print("Also, here is a list of potential properties you could check in Idealista: ")
        time.sleep(3)
        display(filtered_df['url'].head())
    else:
        print("However, unfortunately there are no properties currently available in Idealista that match your criteria.")

In [26]:
#now using the function on the data df and the user requests
get_matching(data, user_dataframe)

The average rental price for the requested criteria is 1558 euros/month
The rental price ranges from a minimum of 1100 euros/month to a maximum of 2200 euros/month
Also, here is a list of potential properties you could check in Idealista: 


2194    https://www.idealista.com/inmueble/102454674/
2224    https://www.idealista.com/inmueble/103365277/
2826    https://www.idealista.com/inmueble/103362998/
2933     https://www.idealista.com/inmueble/90114863/
3247    https://www.idealista.com/inmueble/102075773/
Name: url, dtype: object
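
The three statistics printed by `get_matching()` can be reproduced by hand from the eight prices listed in `filtered_df` above. The sketch below does this with plain Python; `summarise_prices` is a hypothetical helper that returns the values instead of printing them.

```python
from statistics import mean

# Prices of the eight matching listings shown in filtered_df above
prices = [1600.0, 1600.0, 1200.0, 2200.0, 1490.0, 1503.0, 1100.0, 1770.0]


def summarise_prices(prices):
    """Return the rounded mean, minimum and maximum, or None when nothing matches."""
    if not prices:
        return None
    return {
        "mean": round(mean(prices)),
        "min": round(min(prices)),
        "max": round(max(prices)),
    }


summary = summarise_prices(prices)
print(summary)  # {'mean': 1558, 'min': 1100, 'max': 2200}
```

The sum of the eight prices is 12,463 euros, so the mean is 1,557.875 and rounds to the 1,558 euros/month reported above.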

## Step 4: building a function to retrieve socio-economic data

In [57]:
#importing the database
socio_data = pd.read_excel('Madrid_distritos.xlsx')



In [58]:
socio_data

Unnamed: 0,Socio-economic and demographic indicators,Centro,Arganzuela,Retiro,Barrio de Salamanca,Chamartín,Tetuán,Chamberí,Fuencarral-El Pardo,Moncloa,...,Usera,Puente de Vallecas,Moratalaz,Ciudad Lineal,Hortaleza,Villaverde,Villa de Vallecas,Vicálvaro,San Blas,Barajas
0,Area (hm2),522.82,646.22,546.62,539.24,917.55,537.47,467.92,23783.84,4653.11,...,777.77,1496.86,610.32,1142.57,2762.61,2018.76,5146.72,3526.67,2229.24,4171.65
1,Density (hab./hm2),267.179909,237.231902,215.722074,270.198798,157.807204,297.694755,295.358181,10.445874,26.1668,...,183.532407,161.406544,152.074322,192.850329,71.812887,78.780043,22.830269,23.762927,72.320163,11.661093
2,Population: all,139687.0,153304.0,117918.0,145702.0,144796.0,160002.0,138204.0,248443.0,121757.0,...,142746.0,241603.0,92814.0,220345.0,198391.0,159038.0,117501.0,83804.0,161219.0,48646.0
3,Population: male,70770.0,71754.0,53458.0,64452.0,65245.0,72938.0,61149.0,116944.0,56357.0,...,67538.0,114542.0,42212.0,100759.0,94100.0,76131.0,56726.0,40580.0,76154.0,23332.0
4,Population: female,68917.0,81550.0,64460.0,81250.0,79551.0,87064.0,77055.0,131499.0,65400.0,...,75208.0,127061.0,50602.0,119586.0,104291.0,82907.0,60775.0,43224.0,85065.0,25314.0
5,Average age (years),44.0,45.4,47.6,46.2,45.8,44.1,46.3,43.8,44.9,...,42.6,43.3,48.3,46.0,42.8,42.1,40.0,40.9,43.8,43.3
6,School-age population (% of total),7.4,11.7,12.0,10.9,13.4,11.2,10.5,17.4,13.8,...,14.4,13.1,11.1,11.6,17.2,15.2,18.3,16.1,13.8,16.6
7,Population > 65 years (% of total),15.7,20.5,26.6,24.0,23.8,19.3,24.4,21.5,22.2,...,16.7,17.8,26.4,23.0,18.7,16.7,13.1,13.6,17.7,18.8
8,Spanish citizens (% of total),73.4,88.9,91.0,84.3,89.5,78.5,86.7,90.2,87.7,...,75.0,77.8,88.9,81.9,87.2,75.8,83.8,82.0,84.4,89.5
9,Non-Spanish citizens (% of total),26.6,10.6,8.8,15.6,10.3,19.9,12.7,8.9,11.2,...,23.6,19.7,10.6,15.1,11.1,20.9,13.9,12.6,14.1,10.0


In [55]:
socio_data[['Socio-economic and demographic indicators', user_dataframe['district'][0]]]

Unnamed: 0,Socio-economic and demographic indicators,Retiro
0,Area (hm2),546.62
1,Density (hab./hm2),215.722074
2,Population: all,117918.0
3,Population: male,53458.0
4,Population: female,64460.0
5,Average age (years),47.58
6,School-age population (% of total),11.99308
7,Population > 65 years (% of total),26.57
8,Spanish citizens (% of total),90.987805
9,Non-Spanish citizens (% of total),8.802727


In [95]:
def get_socio_eco_data(socio_eco_df, district):
    output = socio_eco_df[['Socio-economic and demographic indicators', district]]
    print("Sure, here is a table with socio-economic and demographic indicators for the district", district, ".")
    time.sleep(3)
    display(output)

In [59]:
#now applying the function on the user_dataframe['district']
get_socio_eco_data(socio_data, user_dataframe['district'][0])

Sure, here is a table with socio-economic and demographic indicators for the district  Retiro


Unnamed: 0,Socio-economic and demographic indicators,Retiro
0,Area (hm2),546.62
1,Density (hab./hm2),215.722074
2,Population: all,117918.0
3,Population: male,53458.0
4,Population: female,64460.0
5,Average age (years),47.6
6,School-age population (% of total),12.0
7,Population > 65 years (% of total),26.6
8,Spanish citizens (% of total),91.0
9,Non-Spanish citizens (% of total),8.8


## Wrapping up: introducing the rental price barometer prototype

In [96]:
#Combining all the above functions to produce the rental price barometer interface

def rental_price_barometer():
    #First printing a message welcoming the user and explaining the product
    print('Hello, Let me first introduce myself.')
    print("     ")
    time.sleep(2)
    print('I am your assistant for getting the most accurate information for rental prices in the city of Madrid.')
    print("     ")
    time.sleep(3)
    print('My aim is to give you a reference range of rental prices of properties featuring your selected criteria.')
    time.sleep(3)
    print("     ")
    print('Please note my estimations are based on listings published in Idealista, with data taken in December 2023.')
    time.sleep(5)
    print("     ")
    print('Ok. I will start by first asking you some questions on the type of property you would like to search for.')
    time.sleep(5)
    print("     ")
    #asking user input
    user_dataframe = get_user_input()
    #storing the district to be used later as input
    district_user = user_dataframe['district'][0]
    #cleaning data
    user_dataframe_clean = cleaning_user_data(user_dataframe)
    #making predictions
    user_dataframe_predictions = prediction_ranges(user_dataframe_clean)
    print("...processing your request...     ")
    time.sleep(3)
    print(f"Ok, my model predicts that the rental price in the district of {district_user} should be in the range of {user_dataframe_predictions['pred_range'][0]} euros/month")
    time.sleep(3)
    #now using the function on the data df and the user requests
    print("     ")
    get_matching(data, user_dataframe)
    #finally, asking if the user would want to print socio-economic data
    print("     ")
    answer_socio = input('Finally, would you like to know socio-economic and demographic information for your requested district? (1 for yes, 0 for no)').strip()
    #asking again until the answer is valid, instead of stopping with an error
    while answer_socio not in ['0', '1']:
        answer_socio = input("Invalid input. Please enter 1 for yes or 0 for no.").strip()
    if float(answer_socio) == 1:
        get_socio_eco_data(socio_data, district_user)
        print('Thanks for using our barometer and feel free to make new requests. Best of luck in your rental search:)')
    else:
        print('Sure. Thanks for using our barometer and feel free to make new requests. Best of luck in your rental search:)')


In [97]:
rental_price_barometer()

Hello, Let me first introduce myself.
     
I am your assistant for getting the most accurate information for rental prices in the city of Madrid.
     
My aim is to give you a reference range of rental prices of properties featuring your selected criteria.
     
Please note my estimations are based on listings published in Idealista, with data taken in December 2023.
     
Ok. I will start by first asking you some questions on the type of property you would like to search for.
     
In which district are you looking? (Please select one of the following options) (Centro, Barrio de Salamanca, Chamberí, Chamartín, Tetuán, Retiro, Moncloa, Arganzuela, Ciudad Lineal, Latina, Carabanchel, Puente de Vallecas, Usera, Moratalaz)
Barrio de Salamanca
What type of property are you looking for? (Please select one of the following options) (flat, penthouse, studio, duplex, chalet)
flat
How big (in mts2) is your desired home? (Enter a number from 20 to 350)
120
How many rooms do you need? (Enter a n

1642    https://www.idealista.com/inmueble/103382812/
3397    https://www.idealista.com/inmueble/103271206/
3465    https://www.idealista.com/inmueble/103364149/
3488     https://www.idealista.com/inmueble/96258138/
3595    https://www.idealista.com/inmueble/102956175/
Name: url, dtype: object

     
Finally, would you like to know socio-economic and demographic information for your requested district? (1 for yes, 0 for no)1
Sure, here is a table with socio-economic and demographic indicators for the district Barrio de Salamanca .


Unnamed: 0,Socio-economic and demographic indicators,Barrio de Salamanca
0,Area (hm2),539.24
1,Density (hab./hm2),270.198798
2,Population: all,145702.0
3,Population: male,64452.0
4,Population: female,81250.0
5,Average age (years),46.2
6,School-age population (% of total),10.9
7,Population > 65 years (% of total),24.0
8,Spanish citizens (% of total),84.3
9,Non-Spanish citizens (% of total),15.6


Thanks for using our barometer and feel free to make new requests. Best of luck in your rental search:)


## Final remarks, caveats and further analysis.

- This project aimed to construct a rental price barometer to help both tenants and landlords get access to aggregated information on prices in the rental market of the city of Madrid.
- We successfully built a regression model using the random forest algorithm, with data extracted through the Idealista API. The model, trained and tested on Idealista data, reached a coefficient of determination (R2) of 0.8 on the test set, which suggests that it captures a large share of the variation in rental prices based on property features.
- The model's deployment in the form of a user-friendly rental barometer interface marked a significant step forward. This interface not only enables users to input property details for personalized price predictions but also serves as a reference point for current market trends. By providing statistics on rental prices from Idealista listings and suggesting relevant property links, the barometer empowers users in their property evaluation process.
- In addition to predictive pricing, our rental barometer offers users the ability to access socio-economic information about the desired district, providing a holistic view of the neighborhood. This integration enhances the user experience by considering broader factors beyond just rental prices.
- Despite the project's success, we acknowledge several limitations in the current version. 
    * The model's training data is limited to properties listed as of December 11, 2023, so its predictions may drift away from actual prices as the market moves.
    * Furthermore, the inability to make real-time recommendations due to constraints in accessing live data from the Idealista API is a recognized challenge.
    * The user interface, while functional, requires refinement to enhance user-friendliness. 
- In conclusion, our rental price barometer prototype offers a valuable tool for navigating the Madrid rental market. While acknowledging its current limitations, we view it as a stepping stone for further development.