# Exploring the Airbnb London Listings dataset 
This Jupyter notebook starts by exploring some simple questions about the types of listings in London.

First, import pandas, NumPy and matplotlib, then load the dataset.

In [130]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
airbnb = pd.read_csv('data/listings.csv', low_memory=False, header=0)
#list(airbnb.columns) 
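If memory is a concern, `pd.read_csv` can parse only the columns this notebook actually uses through its `usecols` parameter. The sketch below is a minimal illustration that assumes the analysis only needs `id`, `room_type` and `availability_365`; a tiny made-up CSV stands in for `data/listings.csv`.

```python
import io

import pandas as pd

# Made-up CSV standing in for data/listings.csv (not real listings).
csv_text = """id,name,room_type,availability_365,price
1,Flat A,Entire home/apt,365,$120.00
2,Room B,Private room,30,$45.00
3,Bed C,Shared room,75,$20.00
"""

# Assumption: these are the only columns the analysis needs.
# usecols skips parsing every other column, which reduces memory use.
cols = ['id', 'room_type', 'availability_365']
airbnb_small = pd.read_csv(io.StringIO(csv_text), usecols=cols)
print(airbnb_small)
```

On the real file, the same call would be `pd.read_csv('data/listings.csv', usecols=cols)`.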

Airbnb hosts can list entire homes/apartments, private rooms, or shared rooms.

Private rooms are more like hotels, and shared rooms more like hostels. This can be disruptive to neighbourhoods; as Airbnb has said: 'Depending on the room type, availability, and activity, an airbnb listing could be more like a hotel, disruptive for neighbours, taking away housing, and illegal.'

In [132]:
airbnb_room_type = airbnb.groupby('room_type')['id'].count()
print(airbnb_room_type)

room_type
Entire home/apt    45065
Private room       34964
Shared room          738
Name: id, dtype: int64


This gives 45,065 listings for entire homes or apartments, 34,964 for private rooms, and 738 for shared rooms.

It is more useful, though, to know what proportion of listings falls into each category.

In [133]:
percent_room_type = airbnb_room_type/(airbnb['room_type'].count())
print(percent_room_type)

room_type
Entire home/apt    0.557963
Private room       0.432900
Shared room        0.009137
Name: id, dtype: float64


The largest share of listings (56%) are for entire homes or apartments, followed by private rooms (43%), and finally shared rooms (about 1%).
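Both the counts and the proportions can also be obtained directly from the `room_type` column with `Series.value_counts`. The sketch below uses a small made-up sample instead of the full dataset; it matches the `groupby(...)['id'].count()` approach as long as `id` has no missing values.

```python
import pandas as pd

# Small made-up sample; the notebook itself uses the full `airbnb` DataFrame.
sample = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'room_type': ['Entire home/apt', 'Private room',
                  'Entire home/apt', 'Shared room'],
})

# Raw counts per room type (missing room types are dropped by default).
counts = sample['room_type'].value_counts()

# Proportions per room type: normalize=True divides each count
# by the number of non-missing room_type values.
proportions = sample['room_type'].value_counts(normalize=True)

print(counts)
print(proportions)
```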

Airbnb has also stated that: 'Entire homes or apartments highly available year-round for tourists, probably don't have the owner present, could be illegal, and more importantly, are displacing residents.' It would therefore be interesting to see how many listings are available all year round, and which room types these tend to be.

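The `availability_365` column records how many days of the year a listing is available. Availability is grouped into three bands (defined by Airbnb):

- Low availability: 0-60 days a year
- Medium availability: 61-90 days a year
- High availability: more than 90 days a year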
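The bands can be assigned to every listing in one step with `pd.cut`. This is a minimal sketch on made-up values; it assumes `availability_365` always lies between 0 and 365 and uses right-inclusive bins so that the thresholds agree with the `<= 60` and `> 90` comparisons in the code cell below.

```python
import pandas as pd

# Made-up availability values (days per year) for illustration.
availability = pd.Series([0, 45, 60, 61, 90, 91, 365])

# Bins are right-inclusive: (-1, 60], (60, 90], (90, 365].
# The lower edge of -1 makes 0 days fall into the Low band.
bands = pd.cut(availability,
               bins=[-1, 60, 90, 365],
               labels=['Low', 'Medium', 'High'])

print(bands.tolist())
# ['Low', 'Low', 'Low', 'Medium', 'Medium', 'High', 'High']
```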

In [141]:
# Count how many listings fall into each availability band
# (low: 0-60 days, medium: 61-90 days, high: more than 90 days a year).
listings = airbnb['id'].count()

print("Total number of listings: " + str(listings))

availability_low = airbnb[airbnb['availability_365'] <= 60]['id']
availability_medium = airbnb[(airbnb['availability_365'] > 60) &
                             (airbnb['availability_365'] <= 90)]['id']
availability_high = airbnb[airbnb['availability_365'] > 90]['id']

# Print the count and the share of all listings for each band
for label, band in [("Low", availability_low),
                    ("Medium", availability_medium),
                    ("High", availability_high)]:
    print(f"{label} availability: {band.count()} ({band.count() / listings:.1%})")


Total number of listings: 80767
Low availability: 40557 (50.2%)
Medium availability: 6958 (8.6%)
High availability: 33252 (41.2%)


In [None]:
#calculate the percentage of each room type that has each type of availability. 
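The cell above asks what share of each room type falls into each availability band. One way to sketch this, assuming the low/medium/high thresholds listed earlier, is to bin `availability_365` with `pd.cut` and then cross-tabulate against `room_type` with `pd.crosstab(..., normalize='index')`, so each row sums to 1. The sample data and the `availability_band` column name are hypothetical.

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be the `airbnb` DataFrame.
sample = pd.DataFrame({
    'room_type': ['Entire home/apt', 'Entire home/apt', 'Entire home/apt',
                  'Private room', 'Private room', 'Shared room'],
    'availability_365': [365, 20, 200, 10, 75, 300],
})

# Assign each listing to an availability band (hypothetical column name).
sample['availability_band'] = pd.cut(sample['availability_365'],
                                     bins=[-1, 60, 90, 365],
                                     labels=['Low', 'Medium', 'High'])

# Share of each room type in each band; normalize='index' makes rows sum to 1.
share = pd.crosstab(sample['room_type'], sample['availability_band'],
                    normalize='index')
print(share)
```

Here two of the three entire homes are highly available, so the `Entire home/apt` row has about 0.67 under `High`.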

In [73]:
# Subsets of the listings by room type, kept for later
# visualisations of other parts of the data.


airbnb_entire = airbnb[airbnb.room_type == 'Entire home/apt']
airbnb_private = airbnb[airbnb.room_type == 'Private room']
airbnb_shared = airbnb[airbnb.room_type == 'Shared room']
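Instead of filtering once per room type, the same subsets can be collected in a single pass by iterating over a `groupby`, which yields `(key, sub-DataFrame)` pairs. This is a sketch on a hypothetical sample; the names `by_room_type` and `high_share` are illustrative, and the second step shows one possible use, the fraction of each room type that is highly available (more than 90 days).

```python
import pandas as pd

# Hypothetical sample standing in for the full `airbnb` DataFrame.
sample = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'room_type': ['Entire home/apt', 'Private room',
                  'Shared room', 'Entire home/apt'],
    'availability_365': [365, 30, 120, 10],
})

# One dict holding every room-type subset, keyed by room type.
by_room_type = {room: group for room, group in sample.groupby('room_type')}

# Fraction of each subset that is highly available (> 90 days a year);
# the mean of a boolean Series is the share of True values.
high_share = {room: (group['availability_365'] > 90).mean()
              for room, group in by_room_type.items()}
print(high_share)
```

A new room type appearing in the data would get its own entry automatically, without adding another filter line.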