### AIRBNB NASHVILLE DATA ANALYSIS AND VISUALIZATION STEP BY STEP ###

The objective of this notebook is to analyze the rental real estate market in Nashville as listed on Airbnb.

#### Data link: http://insideairbnb.com/get-the-data

Detailed Listings data: <br>
http://data.insideairbnb.com/united-states/tn/nashville/2022-03-20/data/listings.csv.gz

Detailed Calendar Data: <br>
http://data.insideairbnb.com/united-states/tn/nashville/2022-03-20/data/calendar.csv.gz 

Detailed Review Data: <br>
http://data.insideairbnb.com/united-states/tn/nashville/2022-03-20/data/reviews.csv.gz 

Summary information and metrics for listings in Nashville (good for visualisations): <br>
http://data.insideairbnb.com/united-states/tn/nashville/2022-03-20/visualisations/listings.csv

Summary Review data and Listing ID (to facilitate time based analytics and visualisations linked to a listing): <br>
http://data.insideairbnb.com/united-states/tn/nashville/2022-03-20/visualisations/reviews.csv 

Neighbourhood list for geo filter. Sourced from city or open source GIS files: <br>
http://data.insideairbnb.com/united-states/tn/nashville/2022-03-20/visualisations/neighbourhoods.csv 

GeoJSON file of neighbourhoods of the city: <br>
http://data.insideairbnb.com/united-states/tn/nashville/2022-03-20/visualisations/neighbourhoods.geojson 
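
All the files listed above share one URL prefix, so the download addresses can be built programmatically. The following is a minimal sketch, assuming the 2022-03-20 snapshot is still hosted at these addresses; `BASE_URL`, `FILES` and `snapshot_url` are illustrative names and are not part of the notebook.

```python
# Common prefix of every link listed above (assumed to still be valid)
BASE_URL = "http://data.insideairbnb.com/united-states/tn/nashville/2022-03-20"

# Relative paths exactly as they appear in the list of links
FILES = {
    "listings_details": "data/listings.csv.gz",
    "calendar": "data/calendar.csv.gz",
    "reviews_details": "data/reviews.csv.gz",
    "listings_summary": "visualisations/listings.csv",
    "reviews_summary": "visualisations/reviews.csv",
    "neighbourhoods": "visualisations/neighbourhoods.csv",
    "neighbourhoods_geojson": "visualisations/neighbourhoods.geojson",
}


def snapshot_url(name):
    """Return the full download URL for one of the files listed above."""
    return f"{BASE_URL}/{FILES[name]}"


print(snapshot_url("listings_details"))
```

A URL built this way can be passed to `pd.read_csv`, which should infer gzip compression from the `.gz` extension.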


#### IMPORTING LIBRARIES AND PACKAGES

In [2]:
import os

import datapane as dp
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix

px.defaults.width = 1200
px.defaults.height = 700

#### CHECKING DOWNLOADED FILES IN DIRECTORY

In [3]:
print(os.listdir("/home/dm/Desktop/airbnb/input"))

['reviews.csv', 'neighbourhoods.csv', 'listings_details.csv', 'calendar.csv', 'neighbourhoods.geojson', 'review2.csv', 'polarity_values_reviews.csv']


#### LOADING THE FILES INTO DATAFRAMES USING THE PANDAS LIBRARY


In [4]:
# pd.read_csv already returns a DataFrame, so no extra conversion is needed
listings_details = pd.read_csv("/home/dm/Desktop/airbnb/input/listings_details.csv")

calendar = pd.read_csv("/home/dm/Desktop/airbnb/input/calendar.csv")

neighbourhoods = pd.read_csv("/home/dm/Desktop/airbnb/input/neighbourhoods.csv")

reviews = pd.read_csv("/home/dm/Desktop/airbnb/input/reviews.csv")

review2 = pd.read_csv("/home/dm/Desktop/airbnb/input/review2.csv")



#### CHECKING COLUMNS 

In [5]:
listings_details.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'description',
       'neighborhood_overview', 'picture_url', 'host_id', 'host_url',
       'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
       'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude',
       'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms',
       'bathrooms_text', 'bedrooms', 'beds', 'amenities', 'price',
       'minimum_nights', 'maximum_nights', 'minimum_minimum_nights',
       'maximum_minimum_nights', 'minimum_maximum_nights',
       'maximum_maximum_nights', 'minimum_nights_avg_ntm',
       'maximum_nights_avg_ntm', 'calendar_upd

#### BEGIN THE TREATMENT OF NULL VALUES

In [6]:
listings_details.isnull().sum().sum() 

46794

In [7]:
# Null values in listings_details

listings_details.isnull().sum()[listings_details.isnull().sum() > 0] 


description                       46
neighborhood_overview           1911
host_location                     10
host_about                      2091
host_response_time               619
host_response_rate               619
host_acceptance_rate             497
host_neighbourhood              2373
neighbourhood                   1911
neighbourhood_group_cleansed    6799
bathrooms                       6799
bathrooms_text                     4
bedrooms                         321
beds                              41
minimum_minimum_nights             1
maximum_minimum_nights             1
minimum_maximum_nights             1
maximum_maximum_nights             1
minimum_nights_avg_ntm             1
maximum_nights_avg_ntm             1
calendar_updated                6799
first_review                     908
last_review                      908
review_scores_rating             908
review_scores_accuracy           919
review_scores_cleanliness        919
review_scores_checkin            920
r

In [8]:
# Percentage of null values in listings_details

listings_details.isnull().sum()[listings_details.isnull().sum() > 0]/listings_details.shape[0]*100 

description                       0.676570
neighborhood_overview            28.107075
host_location                     0.147080
host_about                       30.754523
host_response_time                9.104280
host_response_rate                9.104280
host_acceptance_rate              7.309899
host_neighbourhood               34.902191
neighbourhood                    28.107075
neighbourhood_group_cleansed    100.000000
bathrooms                       100.000000
bathrooms_text                    0.058832
bedrooms                          4.721283
beds                              0.603030
minimum_minimum_nights            0.014708
maximum_minimum_nights            0.014708
minimum_maximum_nights            0.014708
maximum_maximum_nights            0.014708
minimum_nights_avg_ntm            0.014708
maximum_nights_avg_ntm            0.014708
calendar_updated                100.000000
first_review                     13.354905
last_review                      13.354905
review_scor
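
The percentage above can also be computed in one pass: the mean of a boolean mask is the fraction of `True` values. This is a small sketch on made-up data (the `toy` frame and the `null_percentage` helper are illustrative, not part of the notebook).

```python
import numpy as np
import pandas as pd


def null_percentage(df):
    """Percentage of missing values per column, only for columns that have any."""
    pct = df.isnull().mean() * 100  # mean of booleans = fraction of nulls
    return pct[pct > 0].sort_values(ascending=False)


# Toy data, not the Airbnb listings
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],   # 1 of 4 values missing
    "b": [np.nan] * 4,              # entirely missing
    "c": ["x", "y", "z", "w"],      # complete
})

print(null_percentage(toy))
```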

##### REMOVING COLUMNS WITH MORE THAN 80% NULL VALUES. SAVE THIS FUNCTION FOR THE FUTURE, YOU WILL NEED IT IN LATER DATA ANALYSIS JOBS :)

In [9]:

def eliminar_columnas(df):
    """
    Remove the columns of a dataframe in which more than 80% of the values are null.

    Change the 0.8 factor in df.shape[0] * 0.8 to use a different percentage.
    inplace=True applies the change to the original dataframe.
    """

    for col in df.columns:
        if df[col].isnull().sum() > df.shape[0] * 0.8:
            df.drop(col, axis=1, inplace=True)
    return df


eliminar_columnas(listings_details)
eliminar_columnas(calendar)
eliminar_columnas(neighbourhoods)
eliminar_columnas(review2)
eliminar_columnas(reviews)


Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,6422,1927,2009-04-30,14100,Melissa,I can't say enough about how wonderful it was ...
1,6422,3867,2009-06-11,17413,Raquel,Michelle and Collier's home is wonderful! They...
2,6422,4159,2009-06-17,20253,Ulrike,I spent one night at Michele's home and felt j...
3,6422,5724,2009-07-18,22544,Phil,Michele and Collier are two of the loveliest p...
4,6422,11891,2009-09-29,33409,Claire,We had the most lovely time staying with Miche...
...,...,...,...,...,...,...
415079,578352515187036769,585443355907226000,2022-03-18,439418241,Sean,Great communicator. Great Host. Close to every...
415080,579080729746540168,586259968944174013,2022-03-19,165475856,Carlos,The house was great very nice and clean.
415081,579246749462338110,585443160650045205,2022-03-18,117194076,Austin,"Location, location, location! This property is..."
415082,579510477958268252,582543243980031914,2022-03-14,11001577,Jeanine,Alex was a great host very easy to communicate...
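
If mutating the input is not desired, the same 80% rule can be written without `inplace=True`. A minimal sketch, assuming a hypothetical helper name `drop_sparse_columns` and toy data:

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df, max_null_fraction=0.8):
    """Return a copy of df without columns whose null fraction exceeds the limit."""
    keep = df.isnull().mean() <= max_null_fraction
    return df.loc[:, keep].copy()


# Toy data, not the Airbnb listings
toy = pd.DataFrame({
    "half_null": [np.nan, np.nan, 1.0, 2.0],  # 50% null, kept
    "all_null": [np.nan] * 4,                 # 100% null, dropped
    "full": [1, 2, 3, 4],                     # 0% null, kept
})

cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))
```

The original frame is left untouched, which makes it easier to rerun a cell without losing columns.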


In [10]:
# Percentage of null values in listings_details after dropping the sparse columns

listings_details.isnull().sum()[listings_details.isnull().sum() > 0]/listings_details.shape[0]*100

description                     0.676570
neighborhood_overview          28.107075
host_location                   0.147080
host_about                     30.754523
host_response_time              9.104280
host_response_rate              9.104280
host_acceptance_rate            7.309899
host_neighbourhood             34.902191
neighbourhood                  28.107075
bathrooms_text                  0.058832
bedrooms                        4.721283
beds                            0.603030
minimum_minimum_nights          0.014708
maximum_minimum_nights          0.014708
minimum_maximum_nights          0.014708
maximum_maximum_nights          0.014708
minimum_nights_avg_ntm          0.014708
maximum_nights_avg_ntm          0.014708
first_review                   13.354905
last_review                    13.354905
review_scores_rating           13.354905
review_scores_accuracy         13.516694
review_scores_cleanliness      13.516694
review_scores_checkin          13.531402
review_scores_co

In [11]:
calendar.isnull().sum()[calendar.isnull().sum() > 0]/calendar.shape[0]*100 

minimum_nights    0.000605
maximum_nights    0.000605
dtype: float64

In [12]:
neighbourhoods.isnull().sum()[neighbourhoods.isnull().sum() > 0]/neighbourhoods.shape[0]*100

Series([], dtype: float64)

In [13]:
reviews.isnull().sum()[reviews.isnull().sum() > 0]/reviews.shape[0]*100

reviewer_name    0.000241
comments         0.032524
dtype: float64

In [14]:
reviews.isnull().sum()[reviews.isnull().sum() > 0]/reviews.shape[0]*100

reviewer_name    0.000241
comments         0.032524
dtype: float64

##### This function fills null values with the mode (categorical columns) or the mean (numerical columns)


In [15]:

def repair_null(df):
    """
    Repair null values in a dataframe: use the mode if the column is
    categorical (object dtype) or the mean if the column is numerical.
    """

    for col in df.columns:
        if df[col].dtype == 'object':
            # mode() can return several values; take the first one
            df[col] = df[col].fillna(df[col].mode()[0])
        else:
            df[col] = df[col].fillna(df[col].mean())
    return df

# Repair null values in dataframes
repair_null(listings_details)
repair_null(calendar)
repair_null(neighbourhoods)
repair_null(review2)
repair_null(reviews)

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,6422,1927,2009-04-30,14100,Melissa,I can't say enough about how wonderful it was ...
1,6422,3867,2009-06-11,17413,Raquel,Michelle and Collier's home is wonderful! They...
2,6422,4159,2009-06-17,20253,Ulrike,I spent one night at Michele's home and felt j...
3,6422,5724,2009-07-18,22544,Phil,Michele and Collier are two of the loveliest p...
4,6422,11891,2009-09-29,33409,Claire,We had the most lovely time staying with Miche...
...,...,...,...,...,...,...
415079,578352515187036769,585443355907226000,2022-03-18,439418241,Sean,Great communicator. Great Host. Close to every...
415080,579080729746540168,586259968944174013,2022-03-19,165475856,Carlos,The house was great very nice and clean.
415081,579246749462338110,585443160650045205,2022-03-18,117194076,Austin,"Location, location, location! This property is..."
415082,579510477958268252,582543243980031914,2022-03-14,11001577,Jeanine,Alex was a great host very easy to communicate...
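
The mean is sensitive to extreme values, so the median is a common alternative for skewed numerical columns. The following is a hedged sketch of such a variant, not the method used in this notebook; `fill_nulls` and the toy frame are illustrative names and data.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_nulls(df, numeric_strategy="median"):
    """Return a copy with nulls filled: mode for non-numeric, median or mean for numeric."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isnull().any():
            continue
        if is_numeric_dtype(out[col]):
            if numeric_strategy == "median":
                fill = out[col].median()
            else:
                fill = out[col].mean()
        else:
            modes = out[col].mode()
            if modes.empty:  # column is entirely null, nothing to impute from
                continue
            fill = modes.iloc[0]
        out[col] = out[col].fillna(fill)
    return out


# Toy data with one extreme price
toy = pd.DataFrame({
    "price": [100.0, 120.0, np.nan, 10000.0],
    "room_type": ["Entire home", None, "Entire home", "Private room"],
})

print(fill_nulls(toy))
```

With the median, the missing price becomes 120.0; with the mean it would be pulled above 3000 by the single 10000.0 value.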


In [16]:
listings_details.isnull().sum() #Checking again to be sure

id                                              0
listing_url                                     0
scrape_id                                       0
last_scraped                                    0
name                                            0
                                               ..
calculated_host_listings_count                  0
calculated_host_listings_count_entire_homes     0
calculated_host_listings_count_private_rooms    0
calculated_host_listings_count_shared_rooms     0
reviews_per_month                               0
Length: 70, dtype: int64

In [17]:
listings_details.columns

Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'description',
       'neighborhood_overview', 'picture_url', 'host_id', 'host_url',
       'host_name', 'host_since', 'host_location', 'host_about',
       'host_response_time', 'host_response_rate', 'host_acceptance_rate',
       'host_is_superhost', 'host_thumbnail_url', 'host_picture_url',
       'host_neighbourhood', 'host_listings_count',
       'host_total_listings_count', 'host_verifications',
       'host_has_profile_pic', 'host_identity_verified', 'neighbourhood',
       'neighbourhood_cleansed', 'latitude', 'longitude', 'property_type',
       'room_type', 'accommodates', 'bathrooms_text', 'bedrooms', 'beds',
       'amenities', 'price', 'minimum_nights', 'maximum_nights',
       'minimum_minimum_nights', 'maximum_minimum_nights',
       'minimum_maximum_nights', 'maximum_maximum_nights',
       'minimum_nights_avg_ntm', 'maximum_nights_avg_ntm', 'has_availability',
       'availability_30', 'availability_60', 

In [18]:
# In listings_details, set every neighbourhood value other than "Nashville" to "Nashville"

listings_details.loc[listings_details.neighbourhood != "Nashville", "neighbourhood"] = "Nashville"

In [19]:
# Repair the price column: strip "$", thousands separators and line-break characters, then convert to float
listings_details["price"] = listings_details["price"].str.replace(r"[$,\n\t\r]", "", regex=True).astype(float)
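
Because `$` is a regular-expression metacharacter (end of string), it is safer to state explicitly whether `str.replace` should treat the pattern as a regex. A small sketch on made-up price strings, assuming prices look like `"$1,250.00"`:

```python
import pandas as pd

# Made-up price strings in the same style as the listings file
raw = pd.Series(["$100.00", "$1,250.00", " $8,958.00\n"])

# Literal replacement: "$" is removed as a plain character
literal = raw.str.replace("$", "", regex=False)

# Regex replacement: inside a character class "$" is literal,
# and \s also removes surrounding whitespace and line breaks
clean = raw.str.replace(r"[$,\s]", "", regex=True).astype(float)

print(literal.tolist())
print(clean.tolist())
```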


In [20]:
listings_details["price"].value_counts()

100.0     70
150.0     69
200.0     52
103.0     49
99.0      45
          ..
8958.0     1
630.0      1
1188.0     1
626.0      1
481.0      1
Name: price, Length: 861, dtype: int64

In [21]:
listings_details.price.describe()

count     6799.000000
mean       288.304015
std        402.602153
min          0.000000
25%        129.000000
50%        210.000000
75%        337.000000
max      10000.000000
Name: price, dtype: float64

In [23]:
# Normalize property_type (strip line breaks, tabs and commas) so it works correctly with the model
listings_details["property_type"] = listings_details["property_type"].str.replace(r"[\n\t\r,]", "", regex=True).astype(str)


In [26]:
# Group every property_type that appears fewer than 50 times into a single "others" category

def property_type_others(df):
    # Count each category once instead of recounting on every iteration
    counts = df["property_type"].value_counts()
    rare = counts[counts < 50].index
    df.loc[df["property_type"].isin(rare), "property_type"] = "others"
    return df
property_type_others(listings_details)

Unnamed: 0,id,listing_url,scrape_id,last_scraped,name,description,neighborhood_overview,picture_url,host_id,host_url,...,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month
0,46607590,https://www.airbnb.com/rooms/46607590,20220320051215,2022-03-20,Music City Getaway,Single level home with 3 Beds and 2 Baths. Le...,Great home in a quiet neighborhood that is con...,https://a0.muscache.com/pictures/4c876991-3103...,230819613,https://www.airbnb.com/users/show/230819613,...,5.000000,5.000000,4.850000,4.930000,f,1,1,0,0,3.580000
1,28916403,https://www.airbnb.com/rooms/28916403,20220320051215,2022-03-20,Asher House 2,"Your Nashville retreat for Music, History and ...",We have the luxury of not being part of a form...,https://a0.muscache.com/pictures/84979b63-d8aa...,217285315,https://www.airbnb.com/users/show/217285315,...,4.960000,4.900000,4.920000,4.900000,t,2,0,2,0,1.140000
2,26496915,https://www.airbnb.com/rooms/26496915,20220320051215,2022-03-20,"The ""Hillbilly"" country retreat 30 min to down...",Country Retreat - just 15 min from dining and...,This is a very unique place that has been in m...,https://a0.muscache.com/pictures/5eff6e82-30a5...,132765119,https://www.airbnb.com/users/show/132765119,...,4.990000,4.950000,4.900000,4.930000,f,4,4,0,0,2.600000
3,44395269,https://www.airbnb.com/rooms/44395269,20220320051215,2022-03-20,"Coco's Cabin, Pets Stay FREE, near Nashville","Coco's Cabin, all the comforts of home and the...","Relaxing, cozy, and safe setting.",https://a0.muscache.com/pictures/d88c0a29-4bcc...,88197167,https://www.airbnb.com/users/show/88197167,...,5.000000,5.000000,5.000000,5.000000,t,3,3,0,0,4.390000
4,32341026,https://www.airbnb.com/rooms/32341026,20220320051215,2022-03-20,"Cali's Cottage, Pets Stay FREE, near Nashville","Cali's Cottage, a cozy pet friendly retreat ne...","I love this place because it is so relaxing, i...",https://a0.muscache.com/pictures/65a85c1a-ad86...,88197167,https://www.airbnb.com/users/show/88197167,...,4.990000,4.990000,4.930000,4.980000,t,3,3,0,0,8.070000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6794,585581618267659285,https://www.airbnb.com/rooms/585581618267659285,20220320051215,2022-03-20,"Ideally-located condo with city view, central ...",<b>The space</b><br />Muse #212<br /><br />Exp...,Resort Description:<br /><br />Nestled on the ...,https://a0.muscache.com/pictures/prohost-api/H...,134126657,https://www.airbnb.com/users/show/134126657,...,4.895098,4.887935,4.792611,4.772563,t,58,58,0,0,2.316179
6795,585908990326289759,https://www.airbnb.com/rooms/585908990326289759,20220320051215,2022-03-20,Landing | Modern Apartment with Amazing Amenit...,This Landing listing is only available for mon...,Resort Description:<br /><br />Nestled on the ...,https://a0.muscache.com/pictures/prohost-api/H...,263502162,https://www.airbnb.com/users/show/263502162,...,4.895098,4.887935,4.792611,4.772563,t,9,9,0,0,2.316179
6796,585927996724919949,https://www.airbnb.com/rooms/585927996724919949,20220320051215,2022-03-20,Landing | Modern Apartment with Amazing Amenit...,This Landing listing is only available for mon...,Resort Description:<br /><br />Nestled on the ...,https://a0.muscache.com/pictures/prohost-api/H...,263502162,https://www.airbnb.com/users/show/263502162,...,4.895098,4.887935,4.792611,4.772563,t,9,9,0,0,2.316179
6797,585996401122268935,https://www.airbnb.com/rooms/585996401122268935,20220320051215,2022-03-20,A place of your own | 1BR in Nashville,Stay for 90+ nights (minimum nights and rates ...,Resort Description:<br /><br />Nestled on the ...,https://a0.muscache.com/pictures/a4re/floorpla...,368944610,https://www.airbnb.com/users/show/368944610,...,4.895098,4.887935,4.792611,4.772563,t,119,119,0,0,2.316179
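
The grouping of rare categories can be checked on a small example. This is an illustrative sketch with a hypothetical `group_rare` helper and a threshold of 2 instead of 50, so the effect is visible on six rows:

```python
import pandas as pd


def group_rare(series, min_count=2, label="others"):
    """Replace categories that occur fewer than min_count times with label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # where() keeps values where the condition is True and substitutes label elsewhere
    return series.where(~series.isin(rare), label)


# Toy data, not the Airbnb listings
toy = pd.Series(
    ["Entire home", "Entire home", "Tent", "Entire loft", "Entire loft", "Barn"]
)

grouped = group_rare(toy)
print(grouped.value_counts())
```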


In [27]:
listings_details.property_type.value_counts()

Entire residential home             2306
Entire rental unit                  1062
Entire condominium (condo)           863
Entire townhouse                     848
Private room in residential home     345
others                               340
Entire guest suite                   268
Entire serviced apartment            201
Entire guesthouse                    159
Entire loft                          133
Room in boutique hotel                87
Entire bungalow                       69
Room in hotel                         68
Entire cottage                        50
Name: property_type, dtype: int64

##### STARTING EDA (EXPLORATORY DATA ANALYSIS)