![AIRBNB](https://www.stevenridercpa.au/wp-content/uploads/2022/09/airbnb-tax.jpeg)

# Airbnb - Price Prediction
-------

## Problem Statement

The dataset contains information about accommodations listed on Airbnb, together with their prices. The training set is approximately 1.5 GB and the test set approximately 0.5 GB. There are 84 predictor variables, which you may use as you see fit.

The objective is to assign the correct price to the listed accommodations. 

In addition to the dataset, you are provided with this notebook, which contains the data loading script and a baseline model based on a feed-forward architecture.

------

## Guidelines

### Participation in Kaggle Competition

The goal of this section is to participate in the Kaggle competition and achieve a Mean Absolute Error of less than 70 points. [->Link to the competition<-](https://www.kaggle.com/t/69c648e3aa214d1f812bf2314c8d4ffa).
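
To track progress against the target locally, the metric can be computed as in the sketch below. It assumes NumPy is available; the function name `mean_absolute_error` and the sample prices are illustrative and not part of the competition code.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical prices, for illustration only
y_true = [159.0, 49.0, 75.0]
y_pred = [150.0, 60.0, 75.0]
print(mean_absolute_error(y_true, y_pred))
```

Because MAE is expressed in the same units as the price, a value of 70 means predictions are off by 70 price units on average.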

### Use of Grid Search (or equivalent)

To meet the requirement for optimal model searching, a comprehensive and methodical grid search must be conducted. We strongly recommend [Weights and Biases](https://wandb.ai/site).
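
A grid search evaluates every combination of the candidate hyperparameter values. The sketch below shows how such a grid can be enumerated with the standard library; the search space and the helper `expand_grid` are hypothetical and only meant to illustrate how quickly the number of runs grows.

```python
from itertools import product

# Hypothetical search space; replace with the hyperparameters you want to tune
param_grid = {
    "neurons": [[32], [64, 32]],
    "optimizer": ["adam", "sgd"],
    "dropout": [0.0, 0.1],
}

def expand_grid(grid):
    """Return one dict per combination of the values in `grid`."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

configs = expand_grid(param_grid)
print(len(configs))  # 2 * 2 * 2 = 8 combinations
for config in configs[:3]:
    print(config)
```

Each configuration would then be used to train and validate one model, which is the bookkeeping a sweep tool automates.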

### You must also research and implement the following techniques

- [Batch Normalization](https://machinelearningmastery.com/how-to-accelerate-learning-of-deep-neural-networks-with-batch-normalization/)
- [Gradient Normalization and/or Gradient Clipping](https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/)
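
As a starting point for that research, the sketch below shows what the two techniques compute, using plain NumPy rather than a deep learning framework. The function names `batch_norm` and `clip_by_global_norm` and all sample values are assumptions made for illustration; the batch normalization shown is the training-mode forward pass only, without running statistics.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def clip_by_global_norm(grads, max_norm):
    """Rescale gradient arrays so their combined L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

# Batch normalization on hypothetical activations with a large mean and spread
rng = np.random.default_rng(117)
x = rng.normal(loc=50.0, scale=20.0, size=(128, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))

# Gradient clipping on hypothetical, deliberately large gradients
grads = [np.full((3, 3), 10.0), np.full(3, 10.0)]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm_before, norm_after)
```

In Keras, the corresponding building blocks are the `BatchNormalization` layer and the `clipnorm` and `clipvalue` optimizer arguments.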


-------

## Setup
### Imports

In [2]:
import pandas as pd
import numpy as np
# import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

### Seeds

In [None]:
np.random.seed(117)
# tf.random.set_seed(117)

## 2. Data Loading

In [3]:
TRAIN_PATH = './data/public_train_data.csv'
df = pd.read_csv(TRAIN_PATH)

## 3. Exploratory Data Analysis
### 3.1 Dimensions

In [4]:
df.shape

(326287, 85)

### 3.2 Obtain information about columns and data types

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 326287 entries, 0 to 326286
Data columns (total 85 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   id                              326287 non-null  int64  
 1   Last Scraped                    326286 non-null  object 
 2   Name                            326018 non-null  object 
 3   Summary                         315651 non-null  object 
 4   Space                           228792 non-null  object 
 5   Description                     326188 non-null  object 
 6   Experiences Offered             326287 non-null  object 
 7   Neighborhood Overview           192513 non-null  object 
 8   Notes                           130729 non-null  object 
 9   Transit                         200649 non-null  object 
 10  Access                          177108 non-null  object 
 11  Interaction                     169193 non-null  object 
 12  House Rules     

In [19]:
# Select only numeric columns
df_numeric = df.select_dtypes(include=['float64', 'int64'])

# Calculate the correlation matrix
correlation_matrix = df_numeric.corr()

# Get the correlation of every numeric feature with 'Price'
price_correlation = correlation_matrix['Price'].sort_values(ascending=False)

# Keep the features whose absolute correlation with 'Price' is at least 0.1
important_features = price_correlation[price_correlation.abs() >= 0.1]

# Room Type, Smart Location	

In [20]:
print(important_features)

Price               1.000000
Cleaning Fee        0.739040
Security Deposit    0.394896
Accommodates        0.373885
Bedrooms            0.361354
Beds                0.293338
Extra People        0.274710
Bathrooms           0.260185
Guests Included     0.192998
Square Feet         0.111586
Name: Price, dtype: float64


### 3.3 View the First Rows of the Dataset

In [14]:
pd.set_option('display.max_columns', None)

df.head(10)

Unnamed: 0,id,Last Scraped,Name,Summary,Space,Description,Experiences Offered,Neighborhood Overview,Notes,Transit,Access,Interaction,House Rules,Thumbnail Url,Medium Url,Picture Url,XL Picture Url,Host ID,Host URL,Host Name,Host Since,Host Location,Host About,Host Response Time,Host Response Rate,Host Acceptance Rate,Host Thumbnail Url,Host Picture Url,Host Neighbourhood,Host Listings Count,Host Total Listings Count,Host Verifications,Street,Neighbourhood,Neighbourhood Cleansed,Neighbourhood Group Cleansed,City,State,Zipcode,Market,Smart Location,Country Code,Country,Latitude,Longitude,Property Type,Room Type,Accommodates,Bathrooms,Bedrooms,Beds,Bed Type,Amenities,Square Feet,Security Deposit,Cleaning Fee,Guests Included,Extra People,Minimum Nights,Maximum Nights,Calendar Updated,Has Availability,Availability 30,Availability 60,Availability 90,Availability 365,Calendar last Scraped,Number of Reviews,First Review,Last Review,Review Scores Rating,Review Scores Accuracy,Review Scores Cleanliness,Review Scores Checkin,Review Scores Communication,Review Scores Location,Review Scores Value,License,Jurisdiction Names,Cancellation Policy,Calculated host listings count,Reviews per Month,Geolocation,Features,Price
0,0,2017-05-12,Grand Loft in the heart of historic Antwerp,Best location for visiting Antwerp!! Beautiful...,Welcome in Antwerp!! The loft is situated on t...,Best location for visiting Antwerp!! Beautiful...,none,,,,The Loft will be only accessible for you and y...,Please feel free to ask me anything and I will...,- It would be greatly appreciated to be respec...,https://a0.muscache.com/im/pictures/ddd94a2b-b...,https://a0.muscache.com/im/pictures/ddd94a2b-b...,https://public.opendatasoft.com/api/explore/v2...,https://a0.muscache.com/im/pictures/ddd94a2b-b...,50969699,https://www.airbnb.com/users/show/50969699,Petra,2015-12-10,"Antwerp, Flanders, Belgium","Hi, I'm Petra , a Belgian interior designer. ...",within an hour,100.0,,https://a0.muscache.com/im/pictures/5271a897-2...,https://a0.muscache.com/im/pictures/5271a897-2...,,2.0,2.0,"email,phone,reviews","Antwerp, Flanders, Belgium",,Historisch Centrum,,Antwerp,Flanders,,Antwerp,"Antwerp, Belgium",BE,Belgium,51.219388,4.403444,Apartment,Entire home/apt,6.0,1.0,2.0,4.0,Real Bed,"TV,Cable TV,Wireless Internet,Kitchen,Elevator...",,200.0,50.0,6.0,25.0,2.0,51.0,6 weeks ago,,15.0,36.0,54.0,317.0,2017-05-12,9.0,2017-01-29,2017-04-26,98.0,9.0,9.0,9.0,9.0,10.0,9.0,,,strict,2.0,2.6,"51.21938762207894, 4.4034442505151885","Host Has Profile Pic,Instant Bookable",159.0
1,1,2017-05-03,"CHARMING, CLEAN & COZY BUNGALOW!",Very centrally located and less than 15 min fr...,"Well lit, private entrance with small patio.",Very centrally located and less than 15 min fr...,none,"Quiet. Pretty tree lined streets, safe area.",Has dining table and high back desk chair.,"Uber, bus line and metro link is less than 5 m...",Front parking,,"No smoking in unit, quiet hours after 10pm, no...",https://a0.muscache.com/im/pictures/f025506b-d...,https://a0.muscache.com/im/pictures/f025506b-d...,https://public.opendatasoft.com/api/explore/v2...,https://a0.muscache.com/im/pictures/f025506b-d...,50132213,https://www.airbnb.com/users/show/50132213,Laurie,2015-11-29,"Los Angeles, California, United States",,within a few hours,100.0,,https://a0.muscache.com/im/pictures/58bde558-c...,https://a0.muscache.com/im/pictures/58bde558-c...,Valley Glen,1.0,1.0,phone,"Valley Glen, Los Angeles, CA 91401, United States",Valley Glen,Valley Glen,,Los Angeles,CA,91401,Los Angeles,"Los Angeles, CA",US,United States,34.189269,-118.419935,House,Private room,2.0,1.0,1.0,1.0,Real Bed,"TV,Internet,Wireless Internet,Air conditioning...",,100.0,50.0,1.0,0.0,1.0,7.0,4 weeks ago,,29.0,59.0,89.0,89.0,2017-05-02,0.0,,,,,,,,,,,"City of Los Angeles, CA",flexible,1.0,,"34.1892692286356, -118.41993491931177","Host Has Profile Pic,Is Location Exact",49.0
2,2,2017-05-09,la casa di maurizio,"nice apartment with view to via veneto , very ...",,"nice apartment with view to via veneto , very ...",none,,,,,,,https://a0.muscache.com/im/pictures/31df3f0d-3...,https://a0.muscache.com/im/pictures/31df3f0d-3...,https://public.opendatasoft.com/api/explore/v2...,https://a0.muscache.com/im/pictures/31df3f0d-3...,47460762,https://www.airbnb.com/users/show/47460762,Stefano,2015-10-26,"Rome, Lazio, Italy",,,,,https://a0.muscache.com/im/pictures/6272bc04-d...,https://a0.muscache.com/im/pictures/6272bc04-d...,Ludovisi,1.0,1.0,"email,phone","Ludovisi, Roma, Lazio 00187, Italy",Ludovisi,I Centro Storico,,Roma,Lazio,00187,Rome,"Roma, Italy",IT,Italy,41.908596,12.493518,Apartment,Private room,2.0,1.0,1.0,1.0,Real Bed,"TV,Wireless Internet,Air conditioning,Breakfas...",,,,1.0,0.0,1.0,1125.0,18 months ago,,30.0,60.0,90.0,365.0,2017-05-08,0.0,,,,,,,,,,,,flexible_new,1.0,,"41.90859623057272, 12.493518028459327","Host Has Profile Pic,Is Location Exact",75.0
3,3,2017-05-09,Camera a due passi dal Colosseo NEW,In un quartiere tranquillo un'accogliente came...,"L'appartamento è nel centro storico di Roma, i...",In un quartiere tranquillo un'accogliente came...,none,,,,Gli ospiti avranno a disposizione la stanza da...,,In casa non si fuma e non sono ammessi animali...,https://a0.muscache.com/im/pictures/79298437/4...,https://a0.muscache.com/im/pictures/79298437/4...,https://public.opendatasoft.com/api/explore/v2...,https://a0.muscache.com/im/pictures/79298437/4...,33016967,https://www.airbnb.com/users/show/33016967,Attilio,2015-05-10,"Rome, Lazio, Italy",,within an hour,100.0,,https://a0.muscache.com/im/pictures/44ac861d-7...,https://a0.muscache.com/im/pictures/44ac861d-7...,Monti,1.0,1.0,"email,phone,reviews","Monti, Roma, Lazio 00184, Italy",Monti,I Centro Storico,,Roma,Lazio,00184,Rome,"Roma, Italy",IT,Italy,41.883985,12.499185,Apartment,Private room,2.0,1.0,1.0,1.0,Real Bed,"Breakfast,Elevator in building,Heating,Washer,...",,,10.0,1.0,0.0,2.0,1125.0,a week ago,,5.0,15.0,45.0,46.0,2017-05-08,55.0,2015-05-17,2017-04-29,98.0,10.0,10.0,10.0,10.0,10.0,10.0,,,moderate_new,1.0,2.28,"41.883984550594604, 12.499185366336528","Host Is Superhost,Host Has Profile Pic,Is Loca...",55.0
4,4,2017-05-04,Perfectly Located Apartment in Downtown Manhattan,"The apartment is bright, clean and simple in a...",No need to waste money on bogus frills. The ha...,"The apartment is bright, clean and simple in a...",none,Awesome neighborhood! The apartment is perfect...,"This apartment is a walk up: free exercise, th...",The apartment is conveniently located near man...,24/7 Access. Private Apartment!,Here to help and answer any questions!,,,,https://public.opendatasoft.com/api/explore/v2...,,86382914,https://www.airbnb.com/users/show/86382914,Bianca,2016-07-27,"New York, New York, United States","Hi, I'm a 27 year old teacher and traveler! Fe...",within an hour,70.0,,https://a0.muscache.com/im/pictures/7a052645-e...,https://a0.muscache.com/im/pictures/7a052645-e...,Soho,2.0,2.0,"phone,reviews","Soho, New York, NY 10013, United States",Soho,SoHo,Manhattan,New York,NY,10013,New York,"New York, NY",US,United States,40.720983,-73.997836,Apartment,Entire home/apt,2.0,1.0,1.0,1.0,Real Bed,"Internet,Wireless Internet,Air conditioning,Ki...",,250.0,95.0,2.0,20.0,1.0,1125.0,today,,0.0,5.0,15.0,234.0,2017-05-04,28.0,2016-08-14,2017-05-01,86.0,9.0,9.0,9.0,9.0,10.0,9.0,,,strict,1.0,3.18,"40.72098312129154, -73.99783555096808","Host Has Profile Pic,Is Location Exact,Instant...",177.0
5,5,2017-04-05,Adorable Studio à deux pas de la tour Eiffel,"Mon logement est proche de Eiffel Tower, Inval...",,"Mon logement est proche de Eiffel Tower, Inval...",none,,,,,,- Pas de bruit aprés une certaine heure (23h3...,https://a0.muscache.com/im/pictures/7173a3d1-e...,https://a0.muscache.com/im/pictures/7173a3d1-e...,https://public.opendatasoft.com/api/explore/v2...,https://a0.muscache.com/im/pictures/7173a3d1-e...,87211695,https://www.airbnb.com/users/show/87211695,Adnane,2016-08-01,"Paris, Île-de-France, France",,within an hour,100.0,,https://a0.muscache.com/im/pictures/0a011da9-2...,https://a0.muscache.com/im/pictures/0a011da9-2...,Tour Eiffel - Champ de Mars,1.0,1.0,"email,phone,reviews","Tour Eiffel - Champ de Mars, Paris, Île-de-Fra...",Tour Eiffel - Champ de Mars,Palais-Bourbon,,Paris,Île-de-France,75007,Paris,"Paris, France",FR,France,48.859354,2.308245,Apartment,Entire home/apt,2.0,1.5,0.0,1.0,Real Bed,"TV,Wireless Internet,Air conditioning,Kitchen,...",,,,1.0,20.0,1.0,15.0,3 weeks ago,,15.0,35.0,59.0,324.0,2017-04-05,4.0,2017-02-24,2017-04-05,67.0,6.0,5.0,7.0,7.0,9.0,6.0,,Paris,strict,1.0,2.93,"48.85935433452363, 2.308245117928801","Host Has Profile Pic,Is Location Exact,Instant...",47.0
6,6,2017-04-04,Charming Studio Canal Saint Martin,,I am renting my recently refurbished studio (d...,I am renting my recently refurbished studio (d...,none,,,,,,,https://a0.muscache.com/im/pictures/13734840/5...,https://a0.muscache.com/im/pictures/13734840/5...,https://public.opendatasoft.com/api/explore/v2...,https://a0.muscache.com/im/pictures/13734840/5...,4974964,https://www.airbnb.com/users/show/4974964,Pierre,2013-02-05,"Paris, Île-de-France, France","Curieux de nature, adepte de voyage sac à dos...",within an hour,100.0,,https://a0.muscache.com/im/users/4974964/profi...,https://a0.muscache.com/im/users/4974964/profi...,Buttes-Chaumont - Belleville,2.0,2.0,"email,phone,reviews,jumio","Buttes-Chaumont - Belleville, Paris, Île-de-Fr...",Buttes-Chaumont - Belleville,Buttes-Chaumont,,Paris,Île-de-France,75010,Paris,"Paris, France",FR,France,48.879162,2.371675,Apartment,Entire home/apt,2.0,,1.0,1.0,Pull-out Sofa,"TV,Cable TV,Internet,Wireless Internet,Kitchen...",,350.0,20.0,1.0,0.0,4.0,1125.0,3 days ago,,3.0,12.0,21.0,276.0,2017-04-04,79.0,2013-03-24,2017-03-12,93.0,10.0,9.0,10.0,10.0,9.0,9.0,,Paris,moderate,1.0,1.61,"48.87916169458883, 2.371675497225248","Host Has Profile Pic,Host Identity Verified,Is...",70.0
7,7,2017-05-05,Summer in the heart of Harlem,Live in a classic uptown apartment with splend...,Gorgeous sunlight and sweeping views looking w...,Live in a classic uptown apartment with splend...,none,"This is the 'undiscovered' part of Harlem, wit...",,Three busses stop directly outside the apartme...,"A gateway to Central Harlem, the space is conv...",,,https://a0.muscache.com/im/pictures/aaeb41a5-b...,https://a0.muscache.com/im/pictures/aaeb41a5-b...,https://public.opendatasoft.com/api/explore/v2...,https://a0.muscache.com/im/pictures/aaeb41a5-b...,12279426,https://www.airbnb.com/users/show/12279426,Dan,2014-02-16,"New York, New York, United States",,,,,https://a0.muscache.com/im/users/12279426/prof...,https://a0.muscache.com/im/users/12279426/prof...,Harlem,1.0,1.0,"email,phone,reviews,jumio","Harlem, New York, NY 10037, United States",Harlem,Harlem,Manhattan,New York,NY,10037,New York,"New York, NY",US,United States,40.816603,-73.93702,Apartment,Entire home/apt,3.0,1.0,1.0,2.0,Real Bed,"Wireless Internet,Kitchen,Smoking allowed,Pets...",,200.0,75.0,2.0,15.0,25.0,1125.0,13 months ago,,0.0,0.0,0.0,0.0,2017-05-05,0.0,,,,,,,,,,,,flexible,1.0,,"40.816602667155735, -73.93702018967657","Host Has Profile Pic,Host Identity Verified,Is...",57.0
8,8,2017-04-03,East Melbourne Abundance awaits,"As you enter the large living room, you will n...",This is a spacious two bedroom apartment is in...,"As you enter the large living room, you will n...",none,Enjoy staying in a PREMIUM LOCATION. This apar...,Additional Notes The listed price is for up to...,We have easy access to a range of transport: A...,"PREMIUM LOCATION - easy tram, train and bus ac...","I, and my close friend George live extremely c...",This is a newly renovated apartment so please ...,,,https://public.opendatasoft.com/api/explore/v2...,,24164998,https://www.airbnb.com/users/show/24164998,Luci,2014-11-25,"Victoria, Australia","We are a family of four, very passionate about...",within a few hours,100.0,,https://a0.muscache.com/im/pictures/b68d6459-9...,https://a0.muscache.com/im/pictures/b68d6459-9...,East Melbourne,7.0,7.0,"email,phone,reviews,jumio","East Melbourne, East Melbourne, VIC 3002, Aust...",East Melbourne,Yarra,,East Melbourne,VIC,3002,Melbourne,"East Melbourne, Australia",AU,Australia,-37.809188,144.986107,Apartment,Entire home/apt,5.0,1.0,2.0,3.0,Real Bed,"TV,Internet,Wireless Internet,Air conditioning...",,200.0,60.0,4.0,50.0,4.0,1125.0,7 weeks ago,,24.0,54.0,84.0,359.0,2017-04-03,0.0,,,,,,,,,,,,strict,7.0,,"-37.8091875093531, 144.98610729053317","Host Has Profile Pic,Host Identity Verified,Is...",275.0
9,9,2017-03-05,Double Room in Streatham,Light and bright double room in a quiet clean ...,Bright and sunny double room in a friendly qu...,Light and bright double room in a quiet clean ...,none,"Lots of cafes, restaurants, shops and green sp...",,Located equal distance between Streatham and S...,Kitchen and bathroom available,Host available for advice and information on l...,As this is my family home I would hope that gu...,https://a0.muscache.com/im/pictures/9a9759ed-8...,https://a0.muscache.com/im/pictures/9a9759ed-8...,https://public.opendatasoft.com/api/explore/v2...,https://a0.muscache.com/im/pictures/9a9759ed-8...,17922605,https://www.airbnb.com/users/show/17922605,Cathy,2014-07-10,GB,,,,,https://a0.muscache.com/im/users/17922605/prof...,https://a0.muscache.com/im/users/17922605/prof...,Streatham,1.0,1.0,"email,phone,manual_online,reviews,jumio","Harborough Road, London, SW16, United Kingdom",Streatham,Lambeth,,London,,SW16,London,"London, United Kingdom",GB,United Kingdom,51.430897,-0.119824,Apartment,Private room,1.0,1.0,1.0,1.0,Real Bed,"Internet,Pets live on this property,Cat(s),Hea...",,,,1.0,0.0,2.0,4.0,10 months ago,,0.0,0.0,0.0,0.0,2017-03-04,0.0,,,,,,,,,,,,flexible,1.0,,"51.430896573331204, -0.11982396382815948","Host Has Profile Pic,Host Identity Verified,Is...",30.0


### 3.4 Descriptive Statistics

In [None]:
df.describe()

In [None]:
df.columns

## 4. Baseline Model

### 4.1 Select Relevant Features

In [None]:
features = ['Bathrooms', 'Bedrooms']  # Replace with the relevant features
target = 'Price'
# Keep only the selected columns and drop rows with missing values
df = df[[*features, target]].dropna()

In [None]:
X = df[features]
y = df[target]

### 4.2 Split the Data into Training and Test Sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### 4.3 Define the Model

In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense


model = Sequential([
    Dense(32, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(1, activation='relu')  # Output layer for the price prediction
])

model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])

### 4.4 Train

In [None]:
history = model.fit(X_train, y_train, epochs=5, batch_size=128, validation_split=0.2)

### 4.5 Evaluate on the Test Set

In [None]:
loss, mae = model.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test MAE: {mae}')

## 5. Generate the Output for the Kaggle Competition

In [None]:
file_path2 = './airbnb_data/private_data_to_predict.csv'
data_for_kaggle = pd.read_csv(file_path2)

In [None]:
kaggle_results = model.predict(data_for_kaggle[features])
# Build the submission directly so that 'id' keeps its integer type
submission = pd.DataFrame({
    'id': data_for_kaggle['id'],
    'expected': kaggle_results.ravel(),
})
# Replace any missing predictions with 0
submission['expected'] = submission['expected'].fillna(0)
submission.to_csv("output_to_submit.csv", index=False)
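
Kaggle expects a prediction for every `id`, so rows with missing features cannot simply be dropped from the prediction file the way `dropna` drops them from the training data. One option is to fill missing feature values with statistics computed on the training data before calling `predict`. The sketch below assumes median imputation is acceptable; the two small DataFrames and the names `train_df`, `predict_df`, `fill_values` and `X_kaggle` are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

features = ['Bathrooms', 'Bedrooms']

# Hypothetical stand-ins for the training data and the Kaggle prediction data
train_df = pd.DataFrame({'Bathrooms': [1.0, 1.5, 2.0], 'Bedrooms': [1.0, 2.0, 3.0]})
predict_df = pd.DataFrame({
    'id': [10, 11],
    'Bathrooms': [np.nan, 1.0],
    'Bedrooms': [2.0, np.nan],
})

# Compute the fill values on the training data only, then apply them to the prediction data
fill_values = train_df[features].median()
X_kaggle = predict_df[features].fillna(fill_values)

print(X_kaggle)
```

Using training-set statistics keeps the preprocessing identical for both datasets and avoids defaulting a listing's price to 0.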


## 6. Example Use of Weights and Biases

In [None]:
from tensorflow.keras.layers import Dropout

def get_model(neurons, optimizer, dropout):
    # Build one hidden Dense layer, followed by Dropout, per entry in `neurons`
    layers = []
    input_shape = (X_train.shape[1],)
    for n in neurons:
        layers.append(Dense(n, activation="relu", input_shape=input_shape))
        layers.append(Dropout(dropout))
        input_shape = (n,)

    # Single-unit output layer for the price, as in the baseline model
    layers.append(Dense(1, activation="relu"))

    model = Sequential(layers)
    model.compile(optimizer=optimizer, loss='mean_squared_error', metrics=['mae'])
    return model

In [None]:
# Import the W&B Python Library and log into W&B
import wandb

wandb.login()

# Create a project in W&B through its web interface
project = "obligatorio_dl"
entity = "franzmayr"

In [None]:
import sys
import traceback

import tensorflow as tf

def run_train():
    try:
        with wandb.init(config=None, project=project, entity=entity):
            # The sweep agent populates wandb.config with this run's hyperparameters
            config = wandb.config
            print(config)
            # Reset Keras global state before building the model for this run
            tf.keras.backend.clear_session()
            model = get_model(config.neurons, config.optimizer, config.dropout)
            wandb_callback = wandb.keras.WandbCallback()
            model.fit(X_train, y_train, epochs=5, batch_size=128, validation_split=0.2, callbacks=[wandb_callback], max_queue_size=3, workers=2)

    except Exception:
        # Print the traceback and exit with a non-zero status so W&B records the failure
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)

In [None]:
import pprint

sweep_config = {
    'name': 'sweep_example',
    'method': 'grid',
    'metric': {
        'name': 'val_loss',
        'goal': 'minimize',
    },
    'parameters': {
        'dropout': {'value': 0.1},
        'neurons': {'values': [[32, 2], [64, 32, 2]]},
        'optimizer': {'values': ['adam', 'sgd']},
    },
}

pprint.pprint(sweep_config)

In [None]:
sweep_id = wandb.sweep(sweep_config, project = project, entity = entity)

In [None]:
wandb.agent(sweep_id, function = run_train, count=10, project = project, entity = entity)