# Boston Airbnb Open Data

This data set was obtained from [Kaggle](https://www.kaggle.com/datasets/airbnb/boston).

This dataset describes the listing activity of homestays in Boston, MA.

The following Airbnb activity is included in this Boston dataset:

 - Listings, including full descriptions and average review score
 - Reviews, including unique id for each reviewer and detailed comments
 - Calendar, including listing id and the price and availability for that day

# Boston Airbnb Data Aggregation

## Import libraries and load data

### Import cleaned listing data

In [1]:
import pandas as pd
import numpy as np

## Load the data for further analysis

#listing=pd.read_csv('airbnb_boston/listings_cleaned.csv')

listing=pd.read_csv('https://raw.githubusercontent.com/Suhong88/AA630_Spring2023/main/listings_cleaned.csv')

listing.head()

Unnamed: 0,id,listing_url,name,neighbourhood,city,state,zipcode,latitude,longitude,host_name,...,bed_type,amenities,price,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
0,12147973,https://www.airbnb.com/rooms/12147973,Sunny Bungalow in the City,Roslindale,Boston,MA,2131.0,42.282619,-71.133068,Virginia,...,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",250.0,91.916667,9.431571,9.258041,9.646293,9.646549,9.414043,9.168234
1,3075044,https://www.airbnb.com/rooms/3075044,Charming room in pet friendly apt,Roslindale,Boston,MA,2131.0,42.286241,-71.134374,Andrea,...,Real Bed,"{TV,Internet,""Wireless Internet"",""Air Conditio...",65.0,94.0,10.0,9.0,10.0,10.0,9.0,9.0
2,6976,https://www.airbnb.com/rooms/6976,Mexican Folk Art Haven in Boston,Roslindale,Boston,MA,2131.0,42.292438,-71.135765,Phil,...,Real Bed,"{TV,""Cable TV"",""Wireless Internet"",""Air Condit...",65.0,98.0,10.0,9.0,10.0,10.0,9.0,10.0
3,1436513,https://www.airbnb.com/rooms/1436513,Spacious Sunny Bedroom Suite in Historic Home,Roslindale,Boston,MA,,42.281106,-71.121021,Meghna,...,Real Bed,"{TV,Internet,""Wireless Internet"",""Air Conditio...",75.0,100.0,10.0,10.0,10.0,10.0,10.0,10.0
4,7651065,https://www.airbnb.com/rooms/7651065,Come Home to Boston,Roslindale,Boston,MA,2131.0,42.284512,-71.136258,Linda,...,Real Bed,"{Internet,""Wireless Internet"",""Air Conditionin...",79.0,99.0,10.0,10.0,10.0,10.0,9.0,10.0


### Import reviews data

- Each reviewer can write multiple reviews.
- Listings and reviews are joined on `id` in the listing table and `listing_id` in the reviews table.

In [2]:
# reviews=pd.read_csv('airbnb_boston/reviews.csv')

reviews=pd.read_csv('https://raw.githubusercontent.com/Suhong88/AA630_Spring2023/main/airbnb_boston/reviews.csv')

reviews.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...
3,1178162,5150351,2013-06-15,2215611,Marine,The room was nice and clean and so were the co...
4,1178162,5171140,2013-06-16,6848427,Andrew,Great location. Just 5 mins walk from the Airp...


In [3]:
reviews.dtypes

listing_id        int64
id                int64
date             object
reviewer_id       int64
reviewer_name    object
comments         object
dtype: object

In [4]:
# convert listing_id from int to object

reviews['listing_id']=reviews['listing_id'].astype('object')

# convert date from object to datetime

reviews['date']=pd.to_datetime(reviews.date)

In [5]:
reviews.dtypes

listing_id               object
id                        int64
date             datetime64[ns]
reviewer_id               int64
reviewer_name            object
comments                 object
dtype: object

In [6]:
# drop id from reviews table

reviews1=reviews.drop(['id'], axis=1)

In [8]:
reviews1.head()

Unnamed: 0,listing_id,date,reviewer_id,reviewer_name,comments
0,1178162,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...
3,1178162,2013-06-15,2215611,Marine,The room was nice and clean and so were the co...
4,1178162,2013-06-16,6848427,Andrew,Great location. Just 5 mins walk from the Airp...


# Merging Dataframe

## Merge data from listings and reviews

In [8]:
# inner join keeps only the listings that appear in both tables

combined_inner=listing.merge(reviews1, left_on='id', right_on='listing_id', how='inner')

# left join keeps every listing, including listings that do not have a review yet

combined_left=listing.merge(reviews1, left_on='id', right_on='listing_id', how='left')

# right join keeps every review, including reviews whose listing_id has no match in the listing table (bad data)

combined_right=listing.merge(reviews1, left_on='id', right_on='listing_id', how='right')

# outer join keeps every row from both tables, whether or not it has a match

combined_outer=listing.merge(reviews1, left_on='id', right_on='listing_id', how='outer')

In [9]:
print(combined_inner.shape, combined_left.shape,combined_right.shape,combined_outer.shape)

(68260, 31) (69007, 31) (68275, 31) (69022, 31)
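
The four shapes are consistent with one another: the left join has 69007 - 68260 = 747 more rows than the inner join (listings without a review), the right join has 68275 - 68260 = 15 more (reviews without a matching listing), and 68260 + 747 + 15 = 69022 is the outer join. A minimal sketch of how to see this breakdown directly, using the `indicator=True` option of `merge`. The two toy tables and their values are made up for illustration and are not part of the Boston data.

```python
import pandas as pd

# Toy stand-ins for the listing and reviews tables (hypothetical values).
toy_listing = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
toy_reviews = pd.DataFrame({
    "listing_id": [1, 1, 2, 9],
    "comments": ["good", "ok", "nice", "orphan review"],
})

# indicator=True adds a `_merge` column: 'both', 'left_only', or 'right_only'.
toy_outer = toy_listing.merge(
    toy_reviews, left_on="id", right_on="listing_id", how="outer", indicator=True
)

# Count how many rows fall into each category.
merge_counts = toy_outer["_merge"].value_counts()
print(merge_counts)
```

Rows tagged `left_only` are listings without reviews, and rows tagged `right_only` are reviews that point at a listing id missing from the listing table.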


In [10]:
combined_inner.head()

Unnamed: 0,id,listing_url,name,neighbourhood,city,state,zipcode,latitude,longitude,host_name,...,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,listing_id,date,reviewer_id,reviewer_name,comments
0,3075044,https://www.airbnb.com/rooms/3075044,Charming room in pet friendly apt,Roslindale,Boston,MA,2131,42.286241,-71.134374,Andrea,...,9.0,10.0,10.0,9.0,9.0,3075044,2014-06-01,9645972,Dmitrii,Andrea is a great host. Neighborhood is wonder...
1,3075044,https://www.airbnb.com/rooms/3075044,Charming room in pet friendly apt,Roslindale,Boston,MA,2131,42.286241,-71.134374,Andrea,...,9.0,10.0,10.0,9.0,9.0,3075044,2014-06-06,12020681,Paola,We had a great time at Andrea's place. He is v...
2,3075044,https://www.airbnb.com/rooms/3075044,Charming room in pet friendly apt,Roslindale,Boston,MA,2131,42.286241,-71.134374,Andrea,...,9.0,10.0,10.0,9.0,9.0,3075044,2014-06-30,8165047,Jaydee,Adrea was very welcoming and flexible to our n...
3,3075044,https://www.airbnb.com/rooms/3075044,Charming room in pet friendly apt,Roslindale,Boston,MA,2131,42.286241,-71.134374,Andrea,...,9.0,10.0,10.0,9.0,9.0,3075044,2014-09-18,21319433,Anthony,Andrea made us feel welcome because he made th...
4,3075044,https://www.airbnb.com/rooms/3075044,Charming room in pet friendly apt,Roslindale,Boston,MA,2131,42.286241,-71.134374,Andrea,...,9.0,10.0,10.0,9.0,9.0,3075044,2014-09-27,21706950,Xinny,Me and my friend were so happy about Andrea's ...


# Binning

## Classify price into different range. Create 4 bins for price

In [11]:
listing['price_level']=pd.cut(listing.price, bins=4, labels=['low', 'medium', 'high', 'very high'])


In [12]:
listing['price_level'].value_counts()

low          3002
medium        517
high           46
very high       8
Name: price_level, dtype: int64

In [14]:
listing['price_level']=pd.cut(listing.price, bins=4)

listing['price_level'].value_counts()

(19.02, 265.0]     3002
(265.0, 510.0]      517
(510.0, 755.0]       46
(755.0, 1000.0]       8
Name: price_level, dtype: int64

In [15]:
listing['price_level']=pd.cut(listing.price, bins=[0, 100, 200, 300, 400, 500, 1000])

listing['price_level'].value_counts()

(100, 200]     1337
(0, 100]       1250
(200, 300]      648
(300, 400]      222
(400, 500]       60
(500, 1000]      56
Name: price_level, dtype: int64
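
With four equal-width bins, 3002 of the 3573 listings land in the first bin, which is why explicit edges were tried next. A small sketch of two related options, on made-up prices: `pd.cut` with explicit edges plus readable labels, and `pd.qcut`, which chooses edges from quantiles. The label names and the toy series are assumptions for illustration.

```python
import pandas as pd

# Hypothetical prices, not taken from the listing data.
toy_price = pd.Series([20, 65, 100, 150, 250, 400, 1000])

# Explicit edges with labels; intervals are right-closed by default,
# so a price of exactly 100 falls in (0, 100] and is labelled 'budget'.
levels = pd.cut(
    toy_price, bins=[0, 100, 300, 1000], labels=["budget", "mid", "premium"]
)

# qcut derives the edges from quantiles, so bins hold roughly equal counts.
quartiles = pd.qcut(toy_price, q=4, labels=["q1", "q2", "q3", "q4"])

print(levels.tolist())
print(quartiles.value_counts().sort_index())
```

Equal-width bins suit evenly spread values; quantile bins may be easier to read when the distribution is as skewed as the prices here.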

# Summarizing Dataframes

In [72]:
listing.columns

Index(['id', 'listing_url', 'name', 'neighbourhood', 'city', 'state',
       'zipcode', 'latitude', 'longitude', 'host_name', 'property_type',
       'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds',
       'bed_type', 'amenities', 'price', 'review_scores_rating',
       'review_scores_accuracy', 'review_scores_cleanliness',
       'review_scores_checkin', 'review_scores_communication',
       'review_scores_location', 'review_scores_value', 'price_level'],
      dtype='object')

In [17]:
# display max, min, average price for all listings

listing['price'].agg(['mean', 'min', 'max'])

mean     169.626644
min       20.000000
max     1000.000000
Name: price, dtype: float64

In [20]:
# display max, min, average price and review_score_rating, and average number of bedrooms for all listings

listing[['price', 'review_scores_rating', 'bedrooms']].agg({
    'price':['mean', 'min', 'max'],
    'review_scores_rating': ['mean', 'min', 'max'],
    "bedrooms": 'mean'
}
).sort_index()

Unnamed: 0,price,review_scores_rating,bedrooms
max,1000.0,100.0,
mean,169.626644,91.910999,1.254845
min,20.0,20.0,


# Using groupby



## Display count, mean, max, and min price for all listings by neighbourhood, limited to the top 5 neighbourhoods by number of listings

In [21]:
# show the mean of every numeric column (pandas 2.0+ needs numeric_only=True to skip text columns)

listing.groupby('neighbourhood').mean(numeric_only=True)

neighbourhood,id,latitude,longitude,accommodates,bathrooms,bedrooms,beds,price,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
Allston,9939201.0,42.35449,-71.130603,2.559846,1.227799,1.173745,1.420849,112.698842,90.669884,9.328549,9.069989,9.600042,9.611702,9.280731,9.081553
Back Bay,8002780.0,42.349464,-71.081209,3.245847,1.16113,1.105541,1.531561,237.598007,91.586656,9.353766,9.295783,9.573606,9.573662,9.831651,9.130471
Bay Village,9089334.0,42.349222,-71.068435,3.541667,1.458333,1.458333,1.833333,266.833333,92.638889,9.35219,9.25268,9.715431,9.757183,9.763014,9.139411
Beacon Hill,8025664.0,42.359043,-71.067841,2.84375,1.114583,1.07425,1.479167,213.34375,93.203559,9.444571,9.318713,9.763078,9.737069,9.869464,9.234571
Brighton,8938194.0,42.347561,-71.150239,2.740541,1.178378,1.243243,1.572973,118.767568,91.16036,9.396982,9.137395,9.646526,9.668219,9.31638,9.160801
Charlestown,8981856.0,42.379216,-71.066982,3.207207,1.256757,1.423423,1.702703,198.045045,93.135886,9.541486,9.378662,9.726309,9.771434,9.472896,9.305299
Chinatown,9832464.0,42.350481,-71.061289,3.732394,1.316901,1.450704,1.732394,232.352113,92.035211,9.393622,9.320299,9.526603,9.597133,9.343962,9.141508
Dorchester,7713704.0,42.304301,-71.062219,2.6171,1.198674,1.315985,1.561798,91.639405,89.711276,9.319153,9.029256,9.62353,9.613735,8.771866,9.069036
Downtown,8811625.0,42.35654,-71.061078,3.649123,1.248538,1.233918,1.701754,237.783626,92.272417,9.373528,9.360679,9.58043,9.568813,9.607861,9.157406
East Boston,8929267.0,42.374669,-71.030639,2.906667,1.163333,1.26,1.606667,119.153333,90.425556,9.366947,9.170751,9.613654,9.620345,9.065311,9.062369


In [26]:
# only show the mean for the price

listing.groupby('neighbourhood')['price'].mean().sort_values(ascending=False).head(5)

neighbourhood
Bay Village                266.833333
South Boston Waterfront    261.148148
Leather District           253.600000
Downtown                   237.783626
Back Bay                   237.598007
Name: price, dtype: float64

In [27]:
# display count, mean, min and max price by neighbourhood

listing.groupby('neighbourhood')['price'].agg(['count', 'mean', 'min', 'max'])


neighbourhood,count,mean,min,max
Allston,259,112.698842,20.0,550.0
Back Bay,301,237.598007,40.0,975.0
Bay Village,24,266.833333,90.0,500.0
Beacon Hill,192,213.34375,75.0,849.0
Brighton,185,118.767568,29.0,999.0
Charlestown,111,198.045045,39.0,1000.0
Chinatown,71,232.352113,80.0,399.0
Dorchester,269,91.639405,25.0,395.0
Downtown,171,237.783626,60.0,600.0
East Boston,150,119.153333,30.0,359.0


In [29]:
# limit to the top 5 neighbourhoods by number of listings

listing.groupby('neighbourhood')['price'].agg(['count', 'mean', 'min', 'max']).\
sort_values(by=['count'], ascending=False).head(5)

neighbourhood,count,mean,min,max
Jamaica Plain,343,138.478134,22.0,750.0
South End,325,200.978462,45.0,800.0
Back Bay,301,237.598007,40.0,975.0
Fenway,287,199.536585,30.0,750.0
Dorchester,269,91.639405,25.0,395.0
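
The same summary can also be written with named aggregation, which lets each output column carry a descriptive name instead of `count`, `mean`, `min`, `max`. This is a sketch on made-up data; the output column names (`n_listings`, `avg_price`, and so on) are chosen here for illustration.

```python
import pandas as pd

# Hypothetical listings, not taken from the Boston data.
toy = pd.DataFrame({
    "neighbourhood": ["Fenway", "Fenway", "Back Bay", "Back Bay", "Back Bay", "Allston"],
    "price": [150.0, 250.0, 200.0, 300.0, 400.0, 90.0],
})

# Named aggregation: new_column=(source_column, function).
summary = (
    toy.groupby("neighbourhood")
       .agg(
           n_listings=("price", "count"),
           avg_price=("price", "mean"),
           min_price=("price", "min"),
           max_price=("price", "max"),
       )
       .sort_values("n_listings", ascending=False)
       .head(2)  # keep the two neighbourhoods with the most listings
)
print(summary)
```

Wrapping the chain in parentheses avoids the trailing backslash used for line continuation.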


In [7]:
# sort the value by count



## Explore amenities

- what are the most popular amenities?

In [30]:
listing['amenities']

0       {TV,"Wireless Internet",Kitchen,"Free Parking ...
1       {TV,Internet,"Wireless Internet","Air Conditio...
2       {TV,"Cable TV","Wireless Internet","Air Condit...
3       {TV,Internet,"Wireless Internet","Air Conditio...
4       {Internet,"Wireless Internet","Air Conditionin...
                              ...                        
3568    {Internet,"Wireless Internet","Air Conditionin...
3569    {TV,Internet,"Wireless Internet","Air Conditio...
3570    {"translation missing: en.hosting_amenity_49",...
3571    {Kitchen,Gym,"Family/Kid Friendly",Washer,Drye...
3572    {"Wireless Internet",Kitchen,Essentials,"trans...
Name: amenities, Length: 3573, dtype: object

In [31]:
# increase width of pandas column to see the full content of amenities

pd.set_option('display.max_colwidth', 100)

listing['amenities']

0       {TV,"Wireless Internet",Kitchen,"Free Parking on Premises","Pets live on this property",Dog(s),H...
1       {TV,Internet,"Wireless Internet","Air Conditioning",Kitchen,"Pets Allowed","Pets live on this pr...
2       {TV,"Cable TV","Wireless Internet","Air Conditioning",Kitchen,"Free Parking on Premises",Heating...
3       {TV,Internet,"Wireless Internet","Air Conditioning",Kitchen,"Free Parking on Premises",Gym,Break...
4       {Internet,"Wireless Internet","Air Conditioning",Kitchen,Breakfast,Heating,"Smoke Detector","Car...
                                                       ...                                                 
3568    {Internet,"Wireless Internet","Air Conditioning",Kitchen,"Free Parking on Premises",Heating,"Fam...
3569    {TV,Internet,"Wireless Internet","Air Conditioning",Kitchen,"Free Parking on Premises","Smoking ...
3570            {"translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}
3571    {Kitchen,Gym,"Family

In [39]:
listing['amenities1']=listing['amenities'].str.split(",")

listing['amenities2']=listing['amenities1'].explode(ignore_index=True)

listing['amenities3']=listing['amenities2'].str.replace(" ", "_").str.replace(r'[{}"]', "", regex=True)

listing['amenities3']

0                               TV
1                Wireless_Internet
2                          Kitchen
3         Free_Parking_on_Premises
4       Pets_live_on_this_property
                   ...            
3568                Smoke_Detector
3569      Carbon_Monoxide_Detector
3570                 First_Aid_Kit
3571                   Safety_Card
3572             Fire_Extinguisher
Name: amenities3, Length: 3573, dtype: object
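
One caveat about the cell above: `Series.explode` returns one row per amenity, so its result is longer than `listing`. Assigning it back as a column aligns on the index, and it appears that only the first 3,573 exploded values are kept (the first rows of `amenities3` match the first listing's amenities in order). The amenity in row *i* is therefore not necessarily an amenity of listing *i*. A minimal sketch of a row-preserving alternative, assuming a toy table with made-up values: explode the whole DataFrame so that `id` and `price` are repeated for each amenity of the same listing.

```python
import pandas as pd

# Hypothetical listings with amenities stored in the same "{a,b,c}" style.
toy = pd.DataFrame({
    "id": [1, 2],
    "price": [100.0, 200.0],
    "amenities": ['{TV,"Wireless Internet",Kitchen}', '{"Wireless Internet",Heating}'],
})

# Strip braces and quotes first, then split into one list per listing.
toy["amenity"] = (
    toy["amenities"]
    .str.replace(r'[{}"]', "", regex=True)
    .str.split(",")
)

# DataFrame.explode repeats id and price for every amenity of that listing.
long_form = toy.explode("amenity", ignore_index=True)
long_form["amenity"] = long_form["amenity"].str.replace(" ", "_")

amenity_counts = long_form["amenity"].value_counts()
avg_price = long_form.groupby("amenity")["price"].mean()
print(long_form[["id", "price", "amenity"]])
```

On the real data this approach would produce different counts and averages from the outputs shown in this notebook, which come from the original cell.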

In [41]:
# show top 10 amenities

listing['amenities3'].value_counts().head(10)

Wireless_Internet           217
Heating                     212
Smoke_Detector              194
Kitchen                     194
Essentials                  185
Carbon_Monoxide_Detector    171
Washer                      163
Dryer                       162
Internet                    156
Air_Conditioning            145
Name: amenities3, dtype: int64

In [46]:
# for the top 10 amenities, what is the average price by amenity?

top10_amenities=listing['amenities3'].value_counts().head(10).index

top10_amenities

listing[listing['amenities3'].isin(top10_amenities)].groupby('amenities3')['price'].mean().sort_values(ascending=False)

amenities3
Internet                    194.705128
Heating                     177.627358
Washer                      171.613497
Dryer                       170.006173
Air_Conditioning            169.862069
Wireless_Internet           169.294931
Carbon_Monoxide_Detector    168.497076
Essentials                  168.205405
Kitchen                     166.149485
Smoke_Detector              162.381443
Name: price, dtype: float64

In [55]:
# for the top 10 amenities, what is the average price by amenity and neighbourhood? Limit to the top 5 neighbourhoods
top5_neighbourhood=listing.groupby('neighbourhood')['price'].agg(['count', 'mean', 'min', 'max']).\
sort_values(by=['count'], ascending=False).head(5).index

condition= (listing['amenities3'].isin(top10_amenities)) & (listing['neighbourhood'].isin(top5_neighbourhood))

listing[condition].groupby(['neighbourhood', 'amenities3'])\
['price'].mean().sort_values(ascending=False).reset_index().sort_values(by='neighbourhood')

Unnamed: 0,neighbourhood,amenities3,price
0,Back Bay,Essentials,264.941176
12,Back Bay,Kitchen,228.368421
10,Back Bay,Wireless_Internet,233.4
7,Back Bay,Carbon_Monoxide_Detector,244.076923
6,Back Bay,Dryer,244.933333
9,Back Bay,Smoke_Detector,237.933333
4,Back Bay,Internet,247.153846
2,Back Bay,Washer,250.8125
5,Back Bay,Heating,246.9
18,Back Bay,Air_Conditioning,214.125


In [50]:
top5_neighbourhood

Index(['Jamaica Plain', 'South End', 'Back Bay', 'Fenway', 'Dorchester'], dtype='object', name='neighbourhood')

# Pivot table and crosstabs

In [18]:
# for the top 10 amenities, what is the average price by amenity and neighbourhood? Limit to the top 5 neighbourhoods

condition=(listing.amenities2.isin(top10_amenities)) & (listing.neighbourhood.isin(top5_neighbourhood))

listing2=listing[condition]

listing2.pivot_table(
    index='neighbourhood',
    columns='amenities2',
    values='price',
    aggfunc='mean'
).reset_index()

amenities2,neighbourhood,"""Air Conditioning""","""Carbon Monoxide Detector""","""Smoke Detector""","""Wireless Internet""",Dryer,Essentials,Heating,Kitchen,Washer,{TV
0,Back Bay,214.125,244.076923,237.933333,249.928571,244.933333,288.384615,243.105263,228.368421,250.8125,219.25
1,Dorchester,88.0,82.714286,96.0,109.133333,73.416667,85.363636,101.714286,76.266667,96.692308,76.9
2,Fenway,181.875,251.571429,231.5,159.928571,212.777778,216.8,220.266667,184.785714,201.4,168.636364
3,Jamaica Plain,152.764706,123.333333,148.1,109.619048,120.153846,154.25,138.047619,112.833333,175.230769,163.0
4,South End,224.692308,221.066667,211.111111,219.611111,173.647059,147.8,196.904762,172.631579,214.647059,237.181818


In [19]:
# modify the above query to show the average price by amenities by property type for each neighbourhood

condition=(listing.amenities2.isin(top10_amenities)) & (listing.neighbourhood.isin(top5_neighbourhood))

listing2=listing[condition]

listing2.pivot_table(
    index=['neighbourhood','property_type'],
    columns='amenities3',
    values='price',
    aggfunc='mean'
).reset_index()

amenities3,neighbourhood,property_type,Air_Conditioning,Carbon_Monoxide_Detector,Dryer,Essentials,Heating,Kitchen,Smoke_Detector,TV,Washer,Wireless_Internet
0,Back Bay,Apartment,244.0,247.75,241.785714,262.272727,268.285714,228.588235,237.933333,219.25,244.2,266.076923
1,Back Bay,Condominium,89.0,200.0,289.0,432.0,172.0,235.0,,,350.0,
2,Back Bay,Loft,160.0,,,,175.0,,,,,
3,Back Bay,Other,,,,,,218.0,,,,40.0
4,Dorchester,Apartment,94.666667,78.7,84.333333,61.333333,81.375,78.625,73.5,73.333333,114.857143,152.857143
5,Dorchester,Bed & Breakfast,,,,,89.5,,,,,100.0
6,Dorchester,Condominium,80.0,,,189.0,,61.333333,155.0,,,
7,Dorchester,House,77.333333,92.75,64.0,95.5,148.5,90.666667,70.75,85.333333,75.5,66.714286
8,Dorchester,Townhouse,,,55.0,,,59.0,,73.0,,
9,Fenway,Apartment,181.875,256.307692,212.777778,216.8,235.166667,184.785714,223.6,168.0,201.4,150.769231
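
A pivot table with `aggfunc='mean'` is a reshaped groupby: the same numbers come out of grouping by both keys and unstacking the second one. The sketch below illustrates this on made-up data; the toy values are assumptions, and a combination with no rows shows up as `NaN`, as in the output above.

```python
import pandas as pd

# Hypothetical listing/amenity pairs, not taken from the Boston data.
toy = pd.DataFrame({
    "neighbourhood": ["Back Bay", "Back Bay", "Back Bay", "Dorchester", "Dorchester"],
    "amenity": ["Kitchen", "Kitchen", "Heating", "Kitchen", "Kitchen"],
    "price": [200.0, 300.0, 250.0, 80.0, 100.0],
})

# Wide table: one row per neighbourhood, one column per amenity.
pivoted = toy.pivot_table(
    index="neighbourhood", columns="amenity", values="price", aggfunc="mean"
)

# Same result: group by both keys, then move `amenity` into the columns.
unstacked = toy.groupby(["neighbourhood", "amenity"])["price"].mean().unstack()

print(pivoted)
```

Dorchester has no `Heating` row in the toy data, so that cell is `NaN` in both results.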


We can use the pd.crosstab() function to create a frequency table. For example, we may want to see the number of listings by neighbourhood and property type.

In [20]:
pd.crosstab(
    index=listing2.neighbourhood,
    columns=listing2.property_type,
    colnames=['property type'] # name the columns index
)

property type,Apartment,Bed & Breakfast,Condominium,Dorm,House,Loft,Other,Townhouse
neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Back Bay,129,0,11,0,0,2,2,0
Dorchester,70,3,9,0,43,0,0,3
Fenway,119,1,4,0,2,0,0,0
Jamaica Plain,98,5,10,0,48,2,0,8
South End,134,3,11,1,6,3,2,4


In [21]:
# The normalize parameter divides each count by its row or column total; normalize='columns' shows each cell as a share of its column total

pd.crosstab(
    index=listing2.neighbourhood,
    columns=listing2.property_type,
    colnames=['property type'],  # name the columns index
    normalize='columns'         # normalize by column
)

property type,Apartment,Bed & Breakfast,Condominium,Dorm,House,Loft,Other,Townhouse
neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Back Bay,0.234545,0.0,0.244444,0.0,0.0,0.285714,0.5,0.0
Dorchester,0.127273,0.25,0.2,0.0,0.434343,0.0,0.0,0.2
Fenway,0.216364,0.083333,0.088889,0.0,0.020202,0.0,0.0,0.0
Jamaica Plain,0.178182,0.416667,0.222222,0.0,0.484848,0.285714,0.0,0.533333
South End,0.243636,0.25,0.244444,1.0,0.060606,0.428571,0.5,0.266667


In [22]:
pd.crosstab(
    index=listing2.neighbourhood,
    columns=listing2.property_type,
    colnames=['property type'],  # name the columns index
    normalize='index'  # normalize by row/index
)

property type,Apartment,Bed & Breakfast,Condominium,Dorm,House,Loft,Other,Townhouse
neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Back Bay,0.895833,0.0,0.076389,0.0,0.0,0.013889,0.013889,0.0
Dorchester,0.546875,0.023438,0.070312,0.0,0.335938,0.0,0.0,0.023438
Fenway,0.944444,0.007937,0.031746,0.0,0.015873,0.0,0.0,0.0
Jamaica Plain,0.573099,0.02924,0.05848,0.0,0.280702,0.011696,0.0,0.046784
South End,0.817073,0.018293,0.067073,0.006098,0.036585,0.018293,0.012195,0.02439
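
With `normalize='index'` each row sums to 1; for Back Bay, 129 of 144 listings are apartments, which gives the 0.895833 shown above. A short sketch on made-up data of two related options: `margins=True` to append row and column totals to the counts, and a check that row-normalized shares sum to 1. The toy table and the `"Total"` label are assumptions for illustration.

```python
import pandas as pd

# Hypothetical listings, not taken from the Boston data.
toy = pd.DataFrame({
    "neighbourhood": ["Back Bay", "Back Bay", "Fenway", "Fenway", "Fenway"],
    "property_type": ["Apartment", "Condominium", "Apartment", "Apartment", "House"],
})

# Counts with an extra "Total" row and column.
counts = pd.crosstab(
    index=toy["neighbourhood"],
    columns=toy["property_type"],
    margins=True,
    margins_name="Total",
)

# Share of each property type within a neighbourhood (rows sum to 1).
row_share = pd.crosstab(toy["neighbourhood"], toy["property_type"], normalize="index")

print(counts)
print(row_share.sum(axis=1))
```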


## Working with dates
Extract year, month, and weekday.

In [23]:
reviews=pd.read_csv('https://raw.githubusercontent.com/Suhong88/AA630_Spring2023/main/airbnb_boston/reviews.csv')

# convert listing_id from int to object

reviews['listing_id']=reviews['listing_id'].astype('object')

# convert date from object to datetime

reviews['date']=pd.to_datetime(reviews.date)

reviews.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,1178162,4724140,2013-05-21,4298113,Olivier,My stay at islam's place was really cool! Good...
1,1178162,4869189,2013-05-29,6452964,Charlotte,Great location for both airport and city - gre...
2,1178162,5003196,2013-06-06,6449554,Sebastian,We really enjoyed our stay at Islams house. Fr...
3,1178162,5150351,2013-06-15,2215611,Marine,The room was nice and clean and so were the co...
4,1178162,5171140,2013-06-16,6848427,Andrew,Great location. Just 5 mins walk from the Airp...


In [11]:
# create a column for year, month and weekday



In [12]:
# show number of reviews by weekday



In [13]:
# show number of reviews by year
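
The three cells above are left empty in the notebook. One possible way to fill them in is the `.dt` accessor, which exposes date parts once a column has the `datetime64` dtype. This is a sketch on a small made-up frame; the new column names `year`, `month`, and `weekday` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical review dates, already converted to datetime.
toy_reviews = pd.DataFrame({
    "date": pd.to_datetime(
        ["2013-05-21", "2013-05-29", "2013-06-06", "2014-06-15", "2014-06-16"]
    ),
})

# Extract date parts with the .dt accessor.
toy_reviews["year"] = toy_reviews["date"].dt.year
toy_reviews["month"] = toy_reviews["date"].dt.month
toy_reviews["weekday"] = toy_reviews["date"].dt.day_name()

# Number of reviews by weekday and by year.
by_weekday = toy_reviews["weekday"].value_counts()
by_year = toy_reviews["year"].value_counts().sort_index()

print(by_weekday)
print(by_year)
```

`dt.day_name()` returns labels such as `Tuesday`; `dt.weekday` would return integers with Monday as 0 if a sortable value is preferred.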



# Take-home assignments

## Please complete the following five questions based on the cleaned listing data.

- 1. List the top 5 neighbourhoods with the best review scores rating
- 2. Display the number of listings, the average review scores rating, and the highest and lowest review scores rating by neighbourhood and property type
- 3. List the top 10 amenities with the highest average review scores rating
- 4. For each of the top 5 amenities with the highest average review scores rating, show the number of listings, average price, and average review scores rating by neighbourhood.
- 5. Which neighbourhood has the best review scores for location? Do listings with higher location review scores also have higher prices? (Hint: use the corr() function.)
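
For the hint in question 5, `Series.corr` computes the Pearson correlation between two numeric columns by default. A minimal sketch on made-up numbers, only to show the call; it says nothing about the relationship in the actual listing data.

```python
import pandas as pd

# Hypothetical scores and prices, not taken from the listing data.
toy = pd.DataFrame({
    "review_scores_location": [8.0, 9.0, 9.5, 10.0],
    "price": [80.0, 120.0, 150.0, 210.0],
})

# Pearson correlation coefficient, between -1 and 1.
r = toy["review_scores_location"].corr(toy["price"])
print(round(r, 3))
```

A value near 1 means the two columns rise together, a value near 0 means no linear relationship, and a value near -1 means one falls as the other rises.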

In [28]:
import pandas as pd
import numpy as np

## Load the data for further analysis

#listing=pd.read_csv('airbnb_boston/listings_cleaned.csv')

df=pd.read_csv('https://raw.githubusercontent.com/Suhong88/AA630_Spring2023/main/listings_cleaned.csv')

df.head()

Unnamed: 0,id,listing_url,name,neighbourhood,city,state,zipcode,latitude,longitude,host_name,...,bed_type,amenities,price,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value
0,12147973,https://www.airbnb.com/rooms/12147973,Sunny Bungalow in the City,Roslindale,Boston,MA,2131.0,42.282619,-71.133068,Virginia,...,Real Bed,"{TV,""Wireless Internet"",Kitchen,""Free Parking ...",250.0,91.916667,9.431571,9.258041,9.646293,9.646549,9.414043,9.168234
1,3075044,https://www.airbnb.com/rooms/3075044,Charming room in pet friendly apt,Roslindale,Boston,MA,2131.0,42.286241,-71.134374,Andrea,...,Real Bed,"{TV,Internet,""Wireless Internet"",""Air Conditio...",65.0,94.0,10.0,9.0,10.0,10.0,9.0,9.0
2,6976,https://www.airbnb.com/rooms/6976,Mexican Folk Art Haven in Boston,Roslindale,Boston,MA,2131.0,42.292438,-71.135765,Phil,...,Real Bed,"{TV,""Cable TV"",""Wireless Internet"",""Air Condit...",65.0,98.0,10.0,9.0,10.0,10.0,9.0,10.0
3,1436513,https://www.airbnb.com/rooms/1436513,Spacious Sunny Bedroom Suite in Historic Home,Roslindale,Boston,MA,,42.281106,-71.121021,Meghna,...,Real Bed,"{TV,Internet,""Wireless Internet"",""Air Conditio...",75.0,100.0,10.0,10.0,10.0,10.0,10.0,10.0
4,7651065,https://www.airbnb.com/rooms/7651065,Come Home to Boston,Roslindale,Boston,MA,2131.0,42.284512,-71.136258,Linda,...,Real Bed,"{Internet,""Wireless Internet"",""Air Conditionin...",79.0,99.0,10.0,10.0,10.0,10.0,9.0,10.0


# 1. List the top 5 neighbourhoods with the best review scores rating

# 2. Display the number of listings, the average review scores rating, and the highest and lowest review scores rating by neighbourhood and property type

# 3. List the top 10 amenities with the highest average review scores rating

# 4. For each of the top 5 amenities with the highest average review scores rating, show the number of listings, average price, and average review scores rating by neighbourhood.

# 5. Which neighbourhood has the best review scores for location? Do listings with higher location review scores also have higher prices? (Hint: use the corr() function.)