# Section 1: Overview
>## Background
This notebook explores Airbnb market datasets. The source data comprises a listings file, a reviews file and a location file called neighbourhood. The analysis focuses on the listings file, which includes information such as location, listing keywords, host id and name, room type, price and reviews.
>## Use cases 
This notebook explores the data and analyses it in terms of prices, users and listings. The analysis is expected to reveal several features of Airbnb listings that serve the needs of stakeholders (users and hosts).
>>### 1. Listing prices and their location distribution
>>### 2. User and host profile analysis
>>### 3. Listing keyword analysis





# Section 2: Preprocessing

In [1]:
## Import libraries to support the analysis
import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

In [1]:
import geopandas

ModuleNotFoundError: No module named 'geopandas'

In [2]:
## Read in file
airBnb_listing = pd.read_csv('DataSource_AirBnb/listings.csv')
airBnb_listing.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
0,9835,Beautiful Room & House,33057,Manju,,Manningham,-37.77268,145.09213,Private room,60,1,4,2015-09-12,0.03,1,365,0,
1,12936,St Kilda 1BR+BEACHSIDE+BALCONY+WIFI+AC,50121,The A2C Team,,Port Phillip,-37.85999,144.97662,Entire home/apt,95,3,42,2020-03-15,0.3,10,0,0,
2,33111,Million Dollar Views Over Melbourne,143550,Paul,,Melbourne,-37.81997,144.96834,Private room,1000,1,2,2012-01-27,0.02,1,265,0,
3,38271,Melbourne - Old Trafford Apartment,164193,Daryl & Dee,,Casey,-38.05725,145.33936,Entire home/apt,110,1,171,2021-12-16,1.26,1,313,18,
4,41836,CLOSE TO CITY & MELBOURNE AIRPORT,182833,Diana,,Darebin,-37.69729,145.00082,Private room,40,7,159,2018-08-22,1.17,2,0,0,


In [3]:
airBnb_listing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17409 entries, 0 to 17408
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              17409 non-null  int64  
 1   name                            17407 non-null  object 
 2   host_id                         17409 non-null  int64  
 3   host_name                       17405 non-null  object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   17409 non-null  object 
 6   latitude                        17409 non-null  float64
 7   longitude                       17409 non-null  float64
 8   room_type                       17409 non-null  object 
 9   price                           17409 non-null  int64  
 10  minimum_nights                  17409 non-null  int64  
 11  number_of_reviews               17409 non-null  int64  
 12  last_review                     

In [4]:
airBnb_listing.isnull().sum()

id                                    0
name                                  2
host_id                               0
host_name                             4
neighbourhood_group               17409
neighbourhood                         0
latitude                              0
longitude                             0
room_type                             0
price                                 0
minimum_nights                        0
number_of_reviews                     0
last_review                        3925
reviews_per_month                  3925
calculated_host_listings_count        0
availability_365                      0
number_of_reviews_ltm                 0
license                           17409
dtype: int64
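
The counts above show that `neighbourhood_group` and `license` are missing in all 17409 rows. A minimal sketch of how fully empty columns could be detected and dropped follows. The small frame `demo` is made-up illustration data, and dropping these columns is only a possible follow-up, not a step this notebook performs.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the listing table (made-up values for illustration).
demo = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['A', None, 'C'],
    'neighbourhood_group': [np.nan, np.nan, np.nan],
    'license': [np.nan, np.nan, np.nan],
})

# Columns in which every value is missing.
empty_cols = demo.columns[demo.isna().all()].tolist()

# dropna(axis=1, how='all') removes exactly those columns and keeps
# partially filled ones such as 'name'.
demo_clean = demo.dropna(axis=1, how='all')

print(empty_cols)
print(demo_clean.columns.tolist())
```

Using `how='all'` rather than the default `how='any'` matters here: the default would also drop `name`, which has only a few gaps.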

## 2. Data cleaning

In [5]:
## check the rows with missing listing name
airBnb_listing.loc[airBnb_listing['name'].isna()]

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
1021,5406148,,27981790,Celine,,Melbourne,-37.82165,144.95687,Entire home/apt,125,1,0,,,1,0,0,
3804,15822412,,39805494,Bernadette,,Bayside,-37.89076,144.99128,Private room,120,1,17,2019-05-18,0.29,1,88,0,


**The missing names cannot be inferred from the other columns, so they are filled with 'unknown'.**

In [16]:
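## assign the result back; inplace=True on a selected column is deprecated in recent pandas
airBnb_listing['name'] = airBnb_listing['name'].fillna('unknown')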
airBnb_listing['name'].isna().sum()

0

In [6]:
## check the rows with missing host name
airBnb_listing.loc[airBnb_listing['host_name'].isna()]

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
1203,6147642,Large Room Just Across Crown Hotel,31891911,,,Melbourne,-37.82445,144.96246,Private room,75,1,3,2015-07-12,0.04,1,0,0,
2750,11999648,"Private Bedroom, central as it gets",64163227,,,Melbourne,-37.81085,144.96711,Private room,68,1,1,2016-04-12,0.01,2,0,0,
7616,25766554,Lucky Home,193648165,,,Bayside,-37.9826,145.05115,Shared room,80,1,1,2018-08-10,0.02,1,88,0,
9658,32327308,Rowville Beauty,64163227,,,Knox,-37.93473,145.22225,Entire home/apt,225,2,3,2019-04-22,0.09,2,0,0,


**None of these hosts appears to have another listing with the host name filled in, so the name cannot be recovered by matching on host id. Hence 'unknown' will be used.**
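
The lookup described above can be made explicit. The sketch below, on a made-up frame `demo`, searches for other listings by the same `host_id` that do carry a name, back-fills from them where possible, and falls back to 'unknown' otherwise. The variable names `missing_hosts`, `donors` and `name_by_host` are illustrative, not taken from the notebook.

```python
import pandas as pd

# Hypothetical miniature of the listing table (made-up values for illustration).
demo = pd.DataFrame({
    'id': [10, 11, 12, 13],
    'host_id': [100, 100, 200, 300],
    'host_name': ['Ann', None, None, 'Bob'],
})

# Host ids that have at least one listing without a host name.
missing_hosts = demo.loc[demo['host_name'].isna(), 'host_id'].unique()

# Listings by those hosts that do carry a name: candidates for back-filling.
donors = demo.loc[demo['host_id'].isin(missing_hosts) & demo['host_name'].notna()]

# Map host_id -> known name and fill only where such a donor exists.
name_by_host = donors.drop_duplicates('host_id').set_index('host_id')['host_name']
demo['host_name'] = demo['host_name'].fillna(demo['host_id'].map(name_by_host))

# Whatever is still missing has no donor listing, so use the placeholder.
demo['host_name'] = demo['host_name'].fillna('unknown')

print(demo)
```

If `donors` comes back empty on the real data, as the note above suggests, the back-fill step changes nothing and every gap ends up as 'unknown'.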

In [17]:
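## assign the result back; inplace=True on a selected column is deprecated in recent pandas
airBnb_listing['host_name'] = airBnb_listing['host_name'].fillna('unknown')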
airBnb_listing['host_name'].isna().sum()

0

In [18]:
## Read in review file
airBnb_review = pd.read_csv('DataSource_AirBnb/reviews.csv')
airBnb_review.head()

Unnamed: 0,listing_id,date
0,9835,2011-05-24
1,9835,2013-02-26
2,9835,2014-12-08
3,9835,2015-09-12
4,12936,2010-08-04


In [10]:
airBnb_review.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 469148 entries, 0 to 469147
Data columns (total 2 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   listing_id  469148 non-null  int64 
 1   date        469148 non-null  object
dtypes: int64(1), object(1)
memory usage: 7.2+ MB


In [35]:
## check the price range
airBnb_listing['price'].describe()

count    17409.000000
mean       190.626458
std        411.494624
min          0.000000
25%         75.000000
50%        122.000000
75%        200.000000
max      15000.000000
Name: price, dtype: float64

In [36]:
px.box(airBnb_listing, x='price')
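
A box plot conventionally flags points beyond 1.5 times the interquartile range (IQR) from the quartiles as outliers. As a rough worked example, assuming that convention and taking the quartiles from the `describe()` output above (25% = 75, 75% = 200), the fences come out as follows. Plotly may compute quartiles slightly differently, so the values shown on hover can differ a little.

```python
# Quartiles taken from the describe() output above.
q1, q3 = 75.0, 200.0

iqr = q3 - q1                    # interquartile range
upper_fence = q3 + 1.5 * iqr     # prices above this are conventionally flagged as outliers
lower_fence = q1 - 1.5 * iqr     # negative here, so no price falls below it

print(iqr, upper_fence, lower_fence)
```

On the real frame the same quartiles could be read with `airBnb_listing['price'].quantile([0.25, 0.75])` instead of being typed in by hand.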

In [25]:
## check listings with a price of 0
price0 = airBnb_listing.loc[airBnb_listing['price']==0]
price0

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
12813,41278626,Free bushfire accommodation for family in need,175946909,Shaun,,Yarra Ranges,-37.78599,145.38469,Private room,0,1,0,,,1,0,0,


**Nothing else in the data allows the price to be inferred, so this listing is dropped from the analysis in this notebook.**
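
The drop itself is not shown in the cells here. One way it might be done is sketched below on a made-up frame `demo`; filtering on `price > 0` and resetting the index are assumptions about how the step could look, not the notebook's own code.

```python
import pandas as pd

# Hypothetical miniature of the listing table (made-up values for illustration).
demo = pd.DataFrame({
    'id': [1, 2, 3],
    'price': [60, 0, 95],
})

# Keep only listings with a positive price; reset the index so it stays contiguous.
demo = demo.loc[demo['price'] > 0].reset_index(drop=True)

print(demo)
```

Filtering by condition rather than by row label keeps the step valid even if the file is refreshed and the zero-price listing moves to a different row.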

In [29]:
## check the extremely expensive prices
priceEx = airBnb_listing.loc[airBnb_listing['price']>1000]
priceEx

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,number_of_reviews_ltm,license
172,997517,Kanturk Country Retreat hideaway,5247469,Sally (And Gary),,Frankston,-38.18977,145.20910,Private room,1600,2,18,2019-04-21,0.18,3,68,0,
274,1656742,Inner City Luxury Designer Home,5090138,Eva,,Port Phillip,-37.83163,144.94641,Entire home/apt,1061,7,7,2019-12-29,0.07,1,178,0,
399,2258136,"Convention Centre/Crown, bay views, balcony",11532952,Nina & Gary,,Melbourne,-37.82688,144.95641,Entire home/apt,5000,2,359,2019-07-26,3.74,1,38,0,
456,2532934,A country paradise - Kangaroo Manor,4681384,Louise,,Yarra Ranges,-37.80369,145.64636,Entire home/apt,1200,3,121,2021-12-22,1.44,1,333,7,
519,2998252,Private Double Room in Melbourne CBD,7063206,Dene,,Melbourne,-37.81288,144.95406,Private room,2000,5,9,2016-09-04,0.13,1,0,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17122,53832531,Luxurious 2-Story Home With Stunning Beach View,436051681,James,,Bayside,-37.95369,145.00703,Entire home/apt,1019,3,0,,,1,50,0,
17299,54019976,"Beautiful 4-Bedroom, Minutes from Albert Park ...",49893267,Maddison,,Port Phillip,-37.84633,144.96190,Entire home/apt,1250,2,0,,,1,362,0,
17382,54130492,Large 7 BR residence in an acreage-Sleeps 15,147004384,Lal,,Cardinia,-38.09091,145.73929,Entire home/apt,1280,2,0,,,1,363,0,
17391,54150737,St Aubin,11914644,Luxico Holiday Homes,,Port Phillip,-37.84424,144.94413,Entire home/apt,1126,3,0,,,44,353,0,


In [None]:
airBnb_listing

# Section 3: Data analysis
## 1. Price analysis

## 2. User analysis

## 3. Listing analysis