## Airbnb Bangkok Listings

Date 2013-2022
From https://www.kaggle.com/code/dyahwitamara/air-bnb-bangkok-analysis/dataset

# Introduction

This dataset provides information on short-term rental listings offered through Airbnb in Bangkok, Thailand. It serves as a valuable resource for researchers, data analysts, and anyone interested in the Bangkok Airbnb market.

Data Source:

The data can originate from various sources, including:

- Publicly available dumps from websites like Inside Airbnb (https://insideairbnb.com/get-the-data)
- Public data repositories like Kaggle (https://www.kaggle.com/code/sheldonlopez1001/airbnb-listings-bangkok)

Data Updates:

The frequency of data updates depends on the source. Public repositories might offer periodic updates, while scraped data might require refreshing at specific intervals.


In [1]:
## Importing important libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats


## Loading Dataset

In [2]:
# Loaded variable 'df' from URI: c:\Users\HP\Desktop\python ds\Projects\Airbnb Listings Bangkok.csv
df = pd.read_csv(r'c:\Users\HP\Desktop\python ds\Projects\Airbnb Listings Bangkok.csv')

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
0,0,27934,Nice room with superb city view,120437,Nuttee,Ratchathewi,13.75983,100.54134,Entire home/apt,1905,3,65,2020-01-06,0.5,2,353,0
1,1,27979,"Easy going landlord,easy place",120541,Emy,Bang Na,13.66818,100.61674,Private room,1316,1,0,,,2,358,0
2,2,28745,modern-style apartment in Bangkok,123784,Familyroom,Bang Kapi,13.75232,100.62402,Private room,800,60,0,,,1,365,0
3,3,35780,Spacious one bedroom at The Kris Condo Bldg. 3,153730,Sirilak,Din Daeng,13.78823,100.57256,Private room,1286,7,2,2022-04-01,0.03,1,323,1
4,4,941865,Suite Room 3 at MetroPoint,610315,Kasem,Bang Kapi,13.76872,100.63338,Private room,1905,1,0,,,3,365,0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15854 entries, 0 to 15853
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   Unnamed: 0                      15854 non-null  int64  
 1   id                              15854 non-null  int64  
 2   name                            15846 non-null  object 
 3   host_id                         15854 non-null  int64  
 4   host_name                       15853 non-null  object 
 5   neighbourhood                   15854 non-null  object 
 6   latitude                        15854 non-null  float64
 7   longitude                       15854 non-null  float64
 8   room_type                       15854 non-null  object 
 9   price                           15854 non-null  int64  
 10  minimum_nights                  15854 non-null  int64  
 11  number_of_reviews               15854 non-null  int64  
 12  last_review                     

## Checking Null values

In [5]:
df.isnull().sum()

Unnamed: 0                           0
id                                   0
name                                 8
host_id                              0
host_name                            1
neighbourhood                        0
latitude                             0
longitude                            0
room_type                            0
price                                0
minimum_nights                       0
number_of_reviews                    0
last_review                       5790
reviews_per_month                 5790
calculated_host_listings_count       0
availability_365                     0
number_of_reviews_ltm                0
dtype: int64

In [6]:
df[df['name'].isna()]               

Unnamed: 0.1,Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
439,439,4549768,,18852579,Titawan,Phra Khanong,13.69406,100.59619,Private room,1080,5,0,,,1,365,0
544,544,4720818,,24386225,Cherry,Din Daeng,13.77562,100.57346,Private room,1200,1,0,,,1,365,0
572,572,4245018,,22030043,Parichart,Bang Phlat,13.78376,100.49821,Private room,1200,1,0,,,1,365,0
669,669,6148415,,31895202,Chira,Bang Na,13.68276,100.60894,Entire home/apt,2424,2,0,,,1,365,0
1030,1030,8055144,,42521288,Nantida,Vadhana,13.74126,100.55761,Private room,5000,3,0,,,1,365,0
1282,1282,10000742,,51374914,Diamond Bangkok,Ratchathewi,13.75328,100.52928,Private room,930,1,6,2017-05-13,0.07,1,365,0
1594,1594,10710165,,55347997,Khaneungnit,Vadhana,13.71757,100.60464,Private room,1000,1,0,,,1,365,0
2075,2075,13142743,,73275200,Pakaphol,Khlong Toei,13.72566,100.56416,Private room,850,1,2,2017-12-11,0.03,3,220,0


We drop the rows with a missing `name`: only 8 of the 15,854 rows (about 0.05%) are affected, so removing them has a negligible effect on the analysis.
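
The share of missing values per column can be computed directly rather than read off the raw counts. The sketch below uses a small made-up frame; `toy` and `missing_pct` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for the listings table (values are made up for illustration).
toy = pd.DataFrame({
    "name": ["Room A", None, "Room C", "Room D"],
    "price": [1905, 1316, 800, 1286],
    "last_review": ["2020-01-06", None, None, "2022-04-01"],
})

# isna().mean() is the fraction of missing values in each column;
# multiplying by 100 expresses it as a percentage.
missing_pct = toy.isna().mean().mul(100).round(2)
print(missing_pct)
```

Run on the real `df`, the same expression would also show that `last_review` and `reviews_per_month` are missing in 5,790 of 15,854 rows (about 36.5%), which is why those columns need a different treatment from `name`.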

In [7]:
# Drop rows with missing data in column: 'name'
df = df.dropna(subset=['name'])

Checking for duplicate rows

In [8]:
df_duplicate = df[df.duplicated(subset=['longitude', 'latitude','last_review','price','name','room_type'])]
display(len(df_duplicate))
df_duplicate

14

Unnamed: 0.1,Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm
5976,5976,28907857,NA BANGLAMPOO GUEST HOUSE,87704107,Yui,Phra Nakhon,13.76279,100.4979,Private room,1400,1,0,,,9,361,0
8790,8790,37954129,small1,97598307,Soo,Don Mueang,13.92072,100.57578,Shared room,350,1,0,,,4,180,0
15121,15121,767128654727724698,Sathon Luxury Loft/China Town/Icon Siam,276781306,Alex,Sathon,13.710777,100.519891,Entire home/apt,1580,1,0,,,13,339,0
15144,15144,767945912006659422,Sathon Luxury Loft/China Town/Icon Siam,344327171,Alice,Sathon,13.710777,100.519891,Entire home/apt,1580,1,0,,,40,339,0
15186,15186,765691389894680033,Sathon Luxury 2 br/China Town/Icon Siam,264864968,Tricia,Sathon,13.710777,100.519891,Entire home/apt,2221,1,0,,,44,340,0
15190,15190,765716244664642439,Sathon Luxury 2 br/China Town/Icon Siam,344327171,Alice,Sathon,13.710777,100.519891,Entire home/apt,2221,1,0,,,40,340,0
15191,15191,765721954905526928,Sathon Luxury 2 br/China Town/Icon Siam,344327171,Alice,Sathon,13.710777,100.519891,Entire home/apt,2221,1,0,,,40,340,0
15192,15192,765728211212001811,Sathon Luxury 2 br/China Town/Icon Siam,276781306,Alex,Sathon,13.710777,100.519891,Entire home/apt,2221,1,0,,,13,340,0
15195,15195,765781484209218358,Sathon Luxury 2 br/China Town/Icon Siam,20133201,Willam,Sathon,13.710777,100.519891,Entire home/apt,2221,1,0,,,33,340,0
15371,15371,775777794097427183,New! Gateway/ Bangkok University 1BR 2PPL near...,52161947,Noons,Khlong Toei,13.716669,100.584967,Entire home/apt,2976,1,0,,,99,362,0
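
By default, `duplicated` flags only the second and later copies in each group, so the first occurrence of every duplicated listing does not appear in the table above. To inspect whole groups, `keep=False` can be passed. A minimal sketch on made-up rows (`toy`, `key_cols`, `later_copies`, and `all_members` are illustrative names):

```python
import pandas as pd

# Three identical toy listings plus one distinct listing (made-up values).
toy = pd.DataFrame({
    "name": ["Loft", "Loft", "Loft", "Studio"],
    "latitude": [13.71, 13.71, 13.71, 13.75],
    "longitude": [100.52, 100.52, 100.52, 100.54],
    "price": [1580, 1580, 1580, 900],
    "last_review": [None, None, None, "2022-01-01"],
})
key_cols = ["name", "latitude", "longitude", "price", "last_review"]

# Default keep='first': the first row of each group is NOT flagged.
later_copies = toy[toy.duplicated(subset=key_cols)]

# keep=False: every row that belongs to a duplicate group is flagged.
all_members = toy[toy.duplicated(subset=key_cols, keep=False)]

print(len(later_copies), len(all_members))  # prints: 2 3
```

Missing values in the key columns compare as equal here, which matches the real output above, where every flagged row has an empty `last_review`.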


In [9]:
# Drop column: 'Unnamed: 0' Because it is a duplicate of Index
df = df.drop(columns=['Unnamed: 0'])
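
An alternative, assuming the CSV's first column is the unnamed saved index (which is what produces `Unnamed: 0`), is to declare it as the index when reading, so there is nothing to drop afterwards. The sketch below reads from an in-memory string instead of the real file; `csv_text`, `with_extra`, and `without_extra` are illustrative names.

```python
import io
import pandas as pd

# Small in-memory CSV whose first header field is empty, like a file
# written with DataFrame.to_csv() and its default index=True.
csv_text = """,id,name,price
0,27934,Nice room with superb city view,1905
2,28745,modern-style apartment in Bangkok,800
"""

# Plain read: the unnamed first column becomes 'Unnamed: 0'.
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column is used as the index instead.
without_extra = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_extra.columns))
print(list(without_extra.columns))
```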

In [16]:

df.drop_duplicates(subset=['longitude', 'latitude','last_review','price','name','room_type'], keep='first', inplace=True, ignore_index=False)

In [17]:
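# Assign the result back instead of calling fillna(inplace=True) on a column selection, which newer pandas versions warn about.
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)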

In [10]:
# Creating a new column (last_review (Year)) to facilitate creating time series graphs.

df['last_review (Year)']=df['last_review'].apply(lambda x: str(x)[:4])

In [12]:
# Convert 'last_review' and 'last_review (Year)' to datetime

df['last_review']=pd.to_datetime(df['last_review'])
df['last_review (Year)']=pd.to_datetime(df['last_review (Year)'])
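
If only the calendar year is needed, it can also be taken from the parsed dates with the `.dt` accessor instead of slicing strings. This is an alternative sketch, not the notebook's approach; `toy` and `review_year` are illustrative names.

```python
import pandas as pd

# Toy dates, including a missing one (made-up subset for illustration).
toy = pd.DataFrame({"last_review": ["2020-01-06", None, "2022-04-01"]})

# Parse once; missing entries become NaT.
toy["last_review"] = pd.to_datetime(toy["last_review"])

# .dt.year returns NaN for NaT rows; the nullable Int64 dtype keeps the
# years as integers and shows the gap as <NA>.
toy["review_year"] = toy["last_review"].dt.year.astype("Int64")
print(toy)
```

The notebook's version instead stores the year as a timestamp (January 1 of that year), which is convenient for a datetime axis; the integer form is easier to group by.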

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 15846 entries, 0 to 15853
Data columns (total 17 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   id                              15846 non-null  int64         
 1   name                            15846 non-null  object        
 2   host_id                         15846 non-null  int64         
 3   host_name                       15845 non-null  object        
 4   neighbourhood                   15846 non-null  object        
 5   latitude                        15846 non-null  float64       
 6   longitude                       15846 non-null  float64       
 7   room_type                       15846 non-null  object        
 8   price                           15846 non-null  int64         
 9   minimum_nights                  15846 non-null  int64         
 10  number_of_reviews               15846 non-null  int64         
 11  last_re

In [15]:
df.describe()

Unnamed: 0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,last_review (Year)
count,15846.0,15846.0,15846.0,15846.0,15846.0,15846.0,15846.0,10062,10062.0,15846.0,15846.0,15846.0,10062
mean,1.580194e+17,154163400.0,13.745149,100.559899,3218.465,15.299129,16.66206,2021-08-30 15:36:57.531305728,0.813297,13.895999,244.326896,3.483277,2020-12-19 10:23:32.522361344
min,27934.0,58920.0,13.5273,100.32955,0.0,1.0,0.0,2012-12-15 00:00:00,0.01,1.0,0.0,0.0,2012-01-01 00:00:00
25%,21045130.0,39744310.0,13.720092,100.52969,900.0,1.0,0.0,2020-02-20 00:00:00,0.12,1.0,138.0,0.0,2020-01-01 00:00:00
50%,35052970.0,122455600.0,13.73849,100.561405,1429.0,1.0,2.0,2022-10-24 00:00:00,0.44,4.0,309.0,0.0,2022-01-01 00:00:00
75%,52584730.0,239194100.0,13.759497,100.585148,2429.0,7.0,13.0,2022-12-08 00:00:00,1.06,13.0,360.0,3.0,2022-01-01 00:00:00
max,7.908162e+17,492665900.0,13.95354,100.92344,1100000.0,1125.0,1224.0,2022-12-28 00:00:00,19.13,228.0,365.0,325.0,2022-01-01 00:00:00
std,2.946545e+17,131880400.0,0.043043,0.050917,24978.38,50.826943,40.622034,,1.090251,30.276153,125.849295,8.918845,


In [19]:
df.isna().sum()   

id                                   0
name                                 0
host_id                              0
host_name                            1
neighbourhood                        0
latitude                             0
longitude                            0
room_type                            0
price                                0
minimum_nights                       0
number_of_reviews                    0
last_review                       5770
reviews_per_month                    0
calculated_host_listings_count       0
availability_365                     0
number_of_reviews_ltm                0
last_review (Year)                5770
dtype: int64

In [22]:
# Replace missing values with 0 in column: 'reviews_per_month'
df = df.fillna({'reviews_per_month': 0})

# Drop rows with missing data in column: 'last_review'
df = df.dropna(subset=['last_review'])
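
Before dropping these rows, it can be worth checking that a missing `last_review` simply means the listing has never been reviewed. The sketch below runs that check on the first four rows shown by `df.head()` earlier; `toy`, `no_review_date`, `zero_reviews`, `same_rows`, and `rows_lost` are illustrative names.

```python
import pandas as pd

# First four rows of the df.head() output, reduced to the relevant columns.
toy = pd.DataFrame({
    "number_of_reviews": [65, 0, 0, 2],
    "last_review": ["2020-01-06", None, None, "2022-04-01"],
    "reviews_per_month": [0.5, None, None, 0.03],
})

no_review_date = toy["last_review"].isna()
zero_reviews = toy["number_of_reviews"].eq(0)

# True if the two masks flag exactly the same rows.
same_rows = bool((no_review_date == zero_reviews).all())

# Number of rows a dropna(subset=['last_review']) would remove.
rows_lost = int(no_review_date.sum())
print(same_rows, rows_lost)  # prints: True 2
```

On the real data the `isna()` output above reports 5,770 such rows, so this step removes more than a third of the listings and restricts the analysis to listings that have at least one review.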

In [23]:
df.isna().sum()   

id                                0
name                              0
host_id                           0
host_name                         1
neighbourhood                     0
latitude                          0
longitude                         0
room_type                         0
price                             0
minimum_nights                    0
number_of_reviews                 0
last_review                       0
reviews_per_month                 0
calculated_host_listings_count    0
availability_365                  0
number_of_reviews_ltm             0
last_review (Year)                0
dtype: int64

In [26]:
# Sort by column: 'price' (ascending)
df = df.sort_values(['price'])

In [27]:
df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,last_review (Year)
15473,780103082979427963,Flourish Capsule Hostel,491129139,Chirapat,Bang Rak,13.732114,100.526675,Shared room,295,1,1,2022-12-19,1.0,1,176,1,2022-01-01
9529,40547972,Private fan room with local family,314098135,Khwanjai,Bangkok Noi,13.75421,100.47077,Private room,300,1,1,2019-12-27,0.03,1,227,0,2019-01-01
9636,40651692,Mystery Hostel l: Deluxe 10 Bred Mix Dormitory,313221586,Krit,Phra Nakhon,13.75753,100.49709,Shared room,303,1,2,2022-09-29,0.05,4,181,1,2022-01-01
8974,38406752,🏡5 mins walk to 🚅 Cozy room in a local living😊,29685153,Mon,Phasi Charoen,13.72635,100.46514,Private room,304,2,16,2022-10-10,0.42,2,330,11,2022-01-01
9558,40597532,Standard Room 10 Bed With Shared Bathroom,313221586,Krit,Phra Nakhon,13.75938,100.49829,Shared room,304,1,1,2022-11-21,0.79,4,271,1,2022-01-01


In [28]:
# Dropping the rows where the price has a value of 0 because it is considered an anomaly (accommodation price cannot be set to 0).
df.drop(df[df['price'] == 0].index, inplace=True)
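
The same filter can be written as a boolean mask that keeps the valid rows, which avoids `inplace=True` and makes the kept condition explicit. A minimal sketch on made-up rows (`toy` and `filtered` are illustrative names):

```python
import pandas as pd

# Toy listings, one of which has an invalid price of 0 (made-up values).
toy = pd.DataFrame({"name": ["A", "B", "C"], "price": [0, 295, 1905]})

# Keep only rows with a positive price; .copy() gives an independent frame.
filtered = toy[toy["price"] > 0].copy()
print(len(toy), len(filtered))  # prints: 3 2
```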

In [30]:
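# Detect which numeric columns still contain a 0 after the zero-price rows were dropped.
# Restricting the check to numeric columns avoids comparing 0 against text and datetime values.

for column in df.select_dtypes(include='number').columns:
    if (df[column] == 0).any():
        print(f"Column {column} contains a value of 0.")

Column availability_365 contains a value of 0.
Column number_of_reviews_ltm contains a value of 0.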


