In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

![](https://drive.google.com/uc?export=view&id=1Ztchy5vwth2oX0PSRrQXXLjzTGudH9Ph)

# **PREDICTING CUSTOMER EXPERIENCE USING AIRBnB DATA**

Airbnb has grown rapidly since its launch in 2008, with the number of rentals listed on its site growing exponentially each year. Airbnb has turned out to be a game changer for the traditional travel and hospitality industry, as more and more travelers look for true value for their money.

This analysis is based on 13,578 unique listings in Lisbon. The minimum rental price in Lisbon is 10 dollars per night, found in neighborhoods such as Campolide, Santa Maria Maior, Santa Clara, São Vicente, Avenidas Novas, Areeiro, Carnide and Alvalade.

The price for a listing ranges from 10 dollars per night to a whopping 4K dollars per night in Benfica.

The aim of this analysis is to identify key indicators that can help improve the customer and property experience, ultimately helping Airbnb Lisbon win more business and deliver a positive customer experience.

# **Getting Data**

There are multiple ways to get data into a Google Colab environment, but my preferred way is to mount Google Drive, which saves me a lot of time. This step is also helpful because, within Google Colab, uploaded files are lost once the session is terminated. In this notebook the CSV is read from the Kaggle input directory.

In [None]:
import pandas as pd

In [None]:
data=pd.read_csv('/kaggle/input/airbnb-analysis-lisbon/airbnb_lisbon_1480_2017-07-27.csv')

In [None]:
data.head()
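
The listing count and price range quoted in the introduction can be checked directly with pandas. The sketch below uses a tiny made-up frame so that it runs on its own; the column name `price` is an assumption about the CSV schema (only `room_id` and `neighborhood` appear elsewhere in this notebook), so adjust the names to match `data.columns` before running it on the real data.

```python
import pandas as pd

# Hypothetical stand-in for `data`; the values are illustrative, not real listings.
sample = pd.DataFrame({
    "room_id": [1, 2, 3, 4],
    "neighborhood": ["Campolide", "Campolide", "Benfica", "Benfica"],
    "price": [10.0, 55.0, 80.0, 4000.0],
})

# Number of unique listings.
n_listings = sample["room_id"].nunique()

# Cheapest and most expensive nightly price per neighborhood.
price_range = sample.groupby("neighborhood")["price"].agg(["min", "max"])

print(n_listings)
print(price_range)
```

Replacing `sample` with `data` should reproduce the summary figures, provided the assumed column names exist in the CSV.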

# **Firing up PyCaret Environment For Analysis🚀**

PyCaret is an open-source, low-code machine learning library in Python that aims to reduce the cycle time from hypothesis to insights. It is well suited for seasoned data scientists who want to increase the productivity of their ML experiments by using PyCaret in their workflows or for citizen data scientists and those new to data science with little or no background in coding. PyCaret allows you to go from preparing your data to deploying your model within seconds using your choice of notebook environment.

In [None]:
!pip install pycaret[full]

# **Step 1: Importing a Module**

Depending on the type of experiment you want to perform, one of the six currently supported modules must be imported into your Python environment. Importing a module prepares the environment for a specific task. For this analysis we will use the Regression, NLP and probably Classification modules to perform a series of analyses to make sense of the data and to make predictions.

In [None]:
from pycaret.regression import *

In [None]:
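# Initialize the regression experiment; session_id fixes the random seed for reproducibility.
reg1 = setup(data = data, target = 'overall_satisfaction', session_id = 123,
             # Remove features whose inter-correlation exceeds the 0.95 threshold
             remove_multicollinearity = True, multicollinearity_threshold = 0.95,
             # Columns excluded from modeling
             ignore_features = ['bathrooms','name','neighborhood','room_id','host_id','country','city','borough','minstay','last_modified','latitude','longitude','location'],
             # Log the experiment under the name 'hotel1'
             log_experiment = True, experiment_name = 'hotel1')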
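
To make the `remove_multicollinearity` option more concrete, here is a minimal sketch of the underlying idea: flag feature pairs whose absolute Pearson correlation exceeds the threshold and drop one column from each pair. This is an illustration only, not PyCaret's actual implementation. The helper name `drop_highly_correlated` and the toy columns `accommodates` and `reviews` are hypothetical.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.95):
    """Drop one column from every pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Toy data (hypothetical): `bedrooms` duplicates `accommodates` exactly.
toy = pd.DataFrame({
    "accommodates": [1, 2, 3, 4, 5, 6],
    "bedrooms":     [1, 2, 3, 4, 5, 6],
    "reviews":      [10, 3, 8, 1, 7, 2],
})
reduced, dropped = drop_highly_correlated(toy)
print(dropped)
print(list(reduced.columns))
```

This simple version always drops the later column of a correlated pair; a library may use a different rule to choose which one to keep.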

# **Comparing Models**

Notice that we have used the n_select parameter within compare_models. By default, compare_models returns only the best-performing model, ranked by R2 (you can rank by a metric of your choice with the sort parameter). With n_select you can instead return the top N models. Here we use compare_models to return our top 3 models, sorted by RMSLE (Root Mean Squared Logarithmic Error), where lower is better.

In [None]:
top3 = compare_models(exclude = ['ransac'], n_select = 3, sort='RMSLE')
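
For reference, RMSLE is the square root of the mean squared difference between log(1 + prediction) and log(1 + actual value). A minimal sketch of the metric using only the standard library is shown below. The function name `rmsle` and the sample values are illustrative; PyCaret computes the metric internally.

```python
import math

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error; all inputs must be greater than -1."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))

# Illustrative ratings and predictions (not taken from the dataset).
actual = [4.5, 5.0, 3.0]
predicted = [4.0, 5.0, 3.5]
print(rmsle(actual, predicted))
```

Because the errors are measured on a log scale, RMSLE reflects relative rather than absolute differences, and an under-prediction is penalized more than an over-prediction of the same size.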

# **Let's see if we can further optimize the Gradient Boosting Regressor model**

In [None]:
gbr1 = create_model('gbr', fold = 10)

In [None]:
tuned_gbr = tune_model(gbr1, optimize='R2')
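
The `fold = 10` argument asks for 10-fold cross-validation: the training data is split into ten parts, and each part serves once as the validation set while the model is fitted on the other nine. The sketch below shows the index bookkeeping in plain Python. `k_fold_indices` is a hypothetical helper for illustration (contiguous folds, no shuffling), not the splitter PyCaret uses internally.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_idx, valid_idx) lists for contiguous k-fold splits."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        valid_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, valid_idx
        start += size

folds = list(k_fold_indices(25, n_folds=10))
# Size of the validation part in each of the ten folds.
print([len(valid) for _, valid in folds])
```

Every row lands in exactly one validation fold, so the cross-validated score is an average over predictions the model never saw during fitting.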

# **Plotting the Tuned Model**

Before finalizing the model, I will use the plot_model() function to analyze its performance from different angles, such as the Residuals Plot, Prediction Error, and Feature Importance. This function takes a trained model object and returns a plot based on the test / hold-out set.

In [None]:
plot_model(tuned_gbr)

In [None]:
plot_model(tuned_gbr, plot = 'error')

In [None]:
plot_model(tuned_gbr, plot='feature')

In [None]:
plot_model(tuned_gbr, plot='residuals_interactive')
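
The residuals and prediction error plots are both built from the same two quantities: the actual target values and the model's predictions on the hold-out set. As a rough sketch, the snippet below computes the residuals and the R2 score by hand in plain Python. The numbers and the helper name `r2_score_manual` are illustrative only.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative hold-out values (not taken from the dataset).
actual = [3.0, 4.0, 5.0, 4.0]
predicted = [3.5, 4.0, 4.5, 4.0]

# Residual = actual - predicted; a well-fitted model leaves residuals scattered around 0.
residuals = [t - p for t, p in zip(actual, predicted)]
r2 = r2_score_manual(actual, predicted)
print(residuals)
print(r2)
```

A residuals plot puts these differences on the vertical axis against the predicted values, which makes systematic over- or under-prediction easy to spot.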

# **Finalizing the Model**

In [None]:
final_gbr = finalize_model(tuned_gbr)

In [None]:
print(final_gbr)

In [None]:
predict_model(final_gbr);

In [None]:
predictions = predict_model(final_gbr, data = data)

In [None]:
predictions.head(2000)

In [None]:
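prediction = pd.DataFrame(predictions, columns=['Id','Label','room_type','overall_satisfaction','bedrooms'])  # names missing from `predictions` become NaN columns
prediction.to_csv('prediction.csv')  # to_csv returns None when given a path, so it is called as a separate step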

In [None]:
save_model(final_gbr,'Final GBR Model 7Jun21')