![NYC Skyline](img/nyc.jpg)

Welcome to New York City, one of the most-visited cities in the world. There are many [Airbnb](https://www.airbnb.com/) listings in New York City to meet the high demand for temporary lodging, with stays ranging from a few nights to many months. In this notebook, we will take a closer look at the New York Airbnb market by combining data from multiple file types: `.csv`, `.tsv`, and `.xlsx`.

Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. 
Three files containing data on 2019 Airbnb listings are available to you:

**data/airbnb_price.csv**
- **`listing_id`**: unique identifier of listing
- **`price`**: nightly listing price in USD
- **`nbhood_full`**: name of borough and neighborhood where listing is located

**data/airbnb_room_type.xlsx**
This is an Excel file containing data on Airbnb listing descriptions and room types.
- **`listing_id`**: unique identifier of listing
- **`description`**: listing description
- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

**data/airbnb_last_review.tsv**
This is a TSV file containing data on Airbnb host names and review dates.
- **`listing_id`**: unique identifier of listing
- **`host_name`**: name of listing host
- **`last_review`**: date when the listing was last reviewed
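CSV and TSV files differ only in the character that separates fields, so pandas reads both with `pd.read_csv` and a `sep` argument. Excel files need `pd.read_excel` instead, plus an engine such as openpyxl for `.xlsx`. A minimal sketch using small in-memory samples built from the first two listings, not the real files:

```python
import io

import pandas as pd

# Illustrative in-memory samples with identical content; only the delimiter differs
csv_text = "listing_id,host_name\n2595,Jennifer\n3831,LisaRoxanne\n"
tsv_text = "listing_id\thost_name\n2595\tJennifer\n3831\tLisaRoxanne\n"

csv_df = pd.read_csv(io.StringIO(csv_text))            # default separator is ","
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-separated values

print(csv_df.equals(tsv_df))  # True
```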

Our goals are to convert untidy data into formats suitable for analysis and to answer key questions, including:

- What is the average price, per night, of an Airbnb listing in NYC?
- How does the average price of an Airbnb listing, per month, compare to the private rental market?
- How many adverts are for private rooms?
- How do Airbnb listing prices compare across the five NYC boroughs?


In [56]:
# Import the packages used in this notebook
import numpy as np
import pandas as pd

In [57]:
# Load Data
prices = pd.read_csv('data/airbnb_price.csv')
room_types = pd.read_excel('data/airbnb_room_type.xlsx', sheet_name=0)
reviews = pd.read_csv('data/airbnb_last_review.tsv', sep='\t')

# Check data head
display(prices.head(), room_types.head(), reviews.head())

| | listing_id | price | nbhood_full |
|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill |
| 3 | 5178 | 79 dollars | Manhattan, Hell's Kitchen |
| 4 | 5238 | 150 dollars | Manhattan, Chinatown |


| | listing_id | description | room_type |
|---|---|---|---|
| 0 | 2595 | Skylit Midtown Castle | Entire home/apt |
| 1 | 3831 | Cozy Entire Floor of Brownstone | Entire home/apt |
| 2 | 5099 | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt |
| 3 | 5178 | Large Furnished Room Near B'way | private room |
| 4 | 5238 | Cute & Cozy Lower East Side 1 bdrm | Entire home/apt |


| | listing_id | host_name | last_review |
|---|---|---|---|
| 0 | 2595 | Jennifer | May 21 2019 |
| 1 | 3831 | LisaRoxanne | July 05 2019 |
| 2 | 5099 | Chris | June 22 2019 |
| 3 | 5178 | Shunichi | June 24 2019 |
| 4 | 5238 | Ben | June 09 2019 |


In [58]:
# Clean the price column on the prices dataframe
prices['price'] = prices['price'].str.replace(r'[^0-9]', '', regex=True).astype(int)
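The regex above deletes every non-digit character, which works for whole-dollar strings such as `225 dollars`. A sketch comparing it with a `str.extract` alternative on hypothetical values shows where the two diverge: a decimal amount (assumed here for illustration; the rows shown above all use whole dollars) would be inflated by the digits-only approach.

```python
import pandas as pd

# Hypothetical raw values in the "<amount> dollars" format of the price column
raw_prices = pd.Series(["225 dollars", "89 dollars", "0 dollars", "12.50 dollars"])

# Approach used above: delete every non-digit character, then cast to int
digits_only = raw_prices.str.replace(r"[^0-9]", "", regex=True).astype(int)

# Alternative: capture the leading number (with optional decimals) and cast to float
extracted = raw_prices.str.extract(r"^(\d+(?:\.\d+)?)", expand=False).astype(float)

print(digits_only.tolist())  # [225, 89, 0, 1250]
print(extracted.tolist())    # [225.0, 89.0, 0.0, 12.5]
```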

In [59]:
# Calculate average price
display(prices.describe())

avg_price = prices[prices['price'] > 0]['price'].mean().round(2)
print(avg_price)

| | listing_id | price |
|---|---|---|
| count | 25209.0 | 25209.0 |
| mean | 20689220.0 | 141.777936 |
| std | 11029280.0 | 147.349137 |
| min | 2595.0 | 0.0 |
| 25% | 12022730.0 | 69.0 |
| 50% | 22343910.0 | 105.0 |
| 75% | 30376690.0 | 175.0 |
| max | 36455810.0 | 7500.0 |


141.82


In [63]:
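# Compare the monthly cost of an Airbnb listing with the private rental market
# Monthly price = nightly price * 365 nights / 12 months; zero-price listings are left as NaN
prices['price_per_month'] = prices['price'].where(prices['price'] > 0) * 365 / 12
average_price_per_month = prices['price_per_month'].mean().round(2)

# Benchmark monthly rent (USD) for the NYC private rental market used in this comparison
private_rent = 3100
difference = round(abs(average_price_per_month - private_rent), 2)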

print(f'Average price: {average_price_per_month}, difference: {difference}')

Average price: 4313.61, difference: 1213.61
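The monthly figure assumes a listing is booked every night of the year, so a nightly price p becomes p × 365 / 12 per month. A small check with a hypothetical helper, `nightly_to_monthly`, reproduces the `price_per_month` values shown for the first two listings (225 and 89 dollars a night) in the merged table, as well as the difference printed above:

```python
def nightly_to_monthly(nightly_price: float) -> float:
    """Hypothetical helper: monthly cost if a listing is booked every night of the year."""
    return nightly_price * 365 / 12


# Nightly prices of the first two listings (225 and 89 dollars)
print(nightly_to_monthly(225))           # 6843.75
print(round(nightly_to_monthly(89), 2))  # 2707.08

# Gap between the printed Airbnb average and the 3100 benchmark used above
private_rent = 3100
print(round(abs(4313.61 - private_rent), 2))  # 1213.61
```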


In [64]:
# Calculating room frequencies by type
display(room_types['room_type'].unique())

# Cleaning room type
room_types['room_type'] = room_types['room_type'].str.lower().astype('category')
room_frequencies = room_types['room_type'].value_counts()
print(room_frequencies)


array(['Entire home/apt', 'private room', 'Private room',
       'entire home/apt', 'PRIVATE ROOM', 'shared room',
       'ENTIRE HOME/APT', 'Shared room', 'SHARED ROOM'], dtype=object)

entire home/apt    13266
private room       11356
shared room          587
Name: room_type, dtype: int64
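Lower-casing collapses the nine spellings into three categories. A sketch on a few of those spellings; the padded value and the extra `str.strip()` are assumptions, guarding against stray whitespace that this dataset may not contain:

```python
import pandas as pd

# A few of the inconsistent spellings listed above, plus one hypothetical padded value
raw_types = pd.Series(["Entire home/apt", "private room", "PRIVATE ROOM",
                       "Shared room", " Private room "])

# strip() is an extra precaution; lower() is what unifies the spellings
clean_types = raw_types.str.strip().str.lower().astype("category")
counts = clean_types.value_counts()
print(counts)
```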


In [65]:
# Check the structure of the reviews table (info() prints its summary directly)
reviews.info()

# Convert the last_review column to datetime
reviews['last_review'] = pd.to_datetime(reviews['last_review'], format="%B %d %Y")

# Check first and last reviews
last_review = reviews['last_review'].dt.date.max()
first_review = reviews['last_review'].dt.date.min()

print(first_review, last_review)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25209 entries, 0 to 25208
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   listing_id   25209 non-null  int64
 1   host_name    25201 non-null  object
 2   last_review  25209 non-null  object
dtypes: int64(1), object(2)
memory usage: 591.0+ KB


2019-01-01 2019-07-09
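Passing an explicit `format` tells pandas exactly how to read strings such as `May 21 2019` instead of inferring the layout. A sketch on illustrative strings in that layout (the January value is made up for the example):

```python
import pandas as pd

# Illustrative strings in the "Month DD YYYY" layout of last_review
raw_dates = pd.Series(["May 21 2019", "July 05 2019", "January 01 2019"])

# %B = full month name, %d = zero-padded day of month, %Y = four-digit year
dates = pd.to_datetime(raw_dates, format="%B %d %Y")

print(dates.min().date(), dates.max().date())  # 2019-01-01 2019-07-05
```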


In [66]:
# Merging dataframes for final analysis
airbnb_merged = prices.merge(room_types, on='listing_id', how='outer')
airbnb_merged = airbnb_merged.merge(reviews, on='listing_id', how='outer')

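# Drop rows with any missing value: listings absent from one of the files (if any),
# listings without a host_name, and zero-price listings (their price_per_month is NaN)
airbnb_merged = airbnb_merged.dropna()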

# Check for duplicates
print(airbnb_merged.duplicated().sum())

display(airbnb_merged.head())

0


| | listing_id | price | nbhood_full | price_per_month | description | room_type | host_name | last_review |
|---|---|---|---|---|---|---|---|---|
| 0 | 2595 | 225 | Manhattan, Midtown | 6843.75 | Skylit Midtown Castle | entire home/apt | Jennifer | 2019-05-21 |
| 1 | 3831 | 89 | Brooklyn, Clinton Hill | 2707.083333 | Cozy Entire Floor of Brownstone | entire home/apt | LisaRoxanne | 2019-07-05 |
| 2 | 5099 | 200 | Manhattan, Murray Hill | 6083.333333 | Large Cozy 1 BR Apartment In Midtown East | entire home/apt | Chris | 2019-06-22 |
| 3 | 5178 | 79 | Manhattan, Hell's Kitchen | 2402.916667 | Large Furnished Room Near B'way | private room | Shunichi | 2019-06-24 |
| 4 | 5238 | 150 | Manhattan, Chinatown | 4562.5 | Cute & Cozy Lower East Side 1 bdrm | entire home/apt | Ben | 2019-06-09 |
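Because `dropna()` with no arguments drops a row if any column is missing, the cleaning step above removes more than unmatched listings: it also removes zero-price listings, whose `price_per_month` was left as NaN. A sketch on hypothetical toy frames (`toy_prices`, `toy_rooms`) makes this visible, using `indicator=True` to record where each merged row came from and `subset=` to restrict the check:

```python
import pandas as pd

# Hypothetical toy frames: listing 3 has no room type, listing 4 has no price
toy_prices = pd.DataFrame({"listing_id": [1, 2, 3], "price": [100, 0, 80]})
toy_rooms = pd.DataFrame({
    "listing_id": [1, 2, 4],
    "room_type": ["private room", "shared room", "entire home/apt"],
})

# Mirror the notebook: zero prices get NaN in price_per_month
toy_prices["price_per_month"] = toy_prices["price"].where(toy_prices["price"] > 0) * 365 / 12

# indicator=True adds a _merge column: "both", "left_only" or "right_only"
merged = toy_prices.merge(toy_rooms, on="listing_id", how="outer", indicator=True)
print(merged[["listing_id", "_merge"]])

# dropna() with no arguments removes listings 3 and 4 AND the zero-price listing 2
print(merged.dropna()["listing_id"].tolist())  # [1]

# Restricting the check to the columns that matter keeps listing 2
print(merged.dropna(subset=["price", "room_type"])["listing_id"].tolist())  # [1, 2]
```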


In [67]:
# Analyzing prices by borough
airbnb_merged['borough'] = airbnb_merged['nbhood_full'].str.partition(',')[0]

# Group by borough to calculate summary statistics
boroughs = (
    airbnb_merged.groupby('borough')['price']
    .agg(['sum', 'mean', 'median', 'count'])
    .round(2)
    .sort_values('mean', ascending=False)
)
display(boroughs)

| borough | sum | mean | median | count |
|---|---|---|---|---|
| Manhattan | 1898417 | 184.04 | 149.0 | 10315 |
| Brooklyn | 1275250 | 122.02 | 95.0 | 10451 |
| Queens | 320715 | 92.83 | 70.0 | 3455 |
| Staten Island | 22974 | 86.04 | 71.0 | 267 |
| Bronx | 55156 | 79.25 | 65.0 | 696 |
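`str.partition(',')` splits each string at the first comma into three columns (text before, the separator, text after), and column 0 holds the borough. A sketch comparing it with `str.split` on illustrative values; the Staten Island neighborhood is an assumption, not taken from the data:

```python
import pandas as pd

# Illustrative "Borough, Neighborhood" strings; the Staten Island entry is hypothetical
nbhood = pd.Series(["Manhattan, Midtown", "Brooklyn, Clinton Hill",
                    "Staten Island, Tompkinsville"])

# partition returns three columns: text before the first comma, the comma, the rest
via_partition = nbhood.str.partition(",")[0]

# Equivalent with split: at most one split, keep the first piece
via_split = nbhood.str.split(",", n=1).str[0]

print(via_partition.tolist())  # ['Manhattan', 'Brooklyn', 'Staten Island']
```

Both keep multi-word boroughs such as Staten Island intact, since they split on the comma rather than on whitespace.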


In [68]:
# Price range by borough
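# Define label names and price ranges
# Bins are right-inclusive: (0, 69], (69, 175], (175, 350], (350, inf).
# 69 and 175 match the 25th and 75th percentiles of price reported by describe()
label_names = ["Budget", "Average", "Expensive", "Extravagant"]
ranges = [0, 69, 175, 350, np.inf]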

# Create the price_range column using pd.cut
airbnb_merged['price_range'] = pd.cut(airbnb_merged['price'], bins=ranges, labels=label_names)

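# Count listings for each borough/price_range pair; observed=False keeps empty
# combinations such as Staten Island/Extravagant (count 0) instead of relying on
# the pandas default for categorical groupers, which has been deprecated
prices_by_borough = airbnb_merged.groupby(['borough', 'price_range'], observed=False)['price_range'].count()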

# Print the result
print(prices_by_borough)

borough        price_range
Bronx          Budget          381
               Average         285
               Expensive        25
               Extravagant       5
Brooklyn       Budget         3194
               Average        5532
               Expensive      1466
               Extravagant     259
Manhattan      Budget         1148
               Average        5285
               Expensive      3072
               Extravagant     810
Queens         Budget         1631
               Average        1505
               Expensive       291
               Extravagant      28
Staten Island  Budget          124
               Average         123
               Expensive        20
               Extravagant       0
Name: price_range, dtype: int64
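`pd.cut` uses right-inclusive intervals by default, so a price of exactly 69 is Budget, 70 is Average, and a price of 0 falls outside the first bin and becomes NaN. That does not change the counts above, because zero-price rows were already dropped with the other missing values, but it matters if the bins are reused on unfiltered data. A sketch on hypothetical prices, with `include_lowest=True` as one way to keep 0 in the first bin:

```python
import numpy as np
import pandas as pd

label_names = ["Budget", "Average", "Expensive", "Extravagant"]
ranges = [0, 69, 175, 350, np.inf]

# Hypothetical prices chosen to sit on and around the bin edges
sample = pd.Series([0, 69, 70, 175, 350, 351])
binned = pd.cut(sample, bins=ranges, labels=label_names)
print(binned.tolist())
# [nan, 'Budget', 'Average', 'Average', 'Expensive', 'Extravagant']

# include_lowest=True closes the first interval on the left, so 0 becomes Budget
print(pd.cut(sample, bins=ranges, labels=label_names, include_lowest=True).iloc[0])  # Budget
```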


In [69]:
# Final result
airbnb_analysis = {
    'avg_price': avg_price,
    'average_price_per_month': average_price_per_month,
    'difference': difference,
    'room_frequencies': room_frequencies,
    'first_reviewed': first_review,
    'last_reviewed': last_review,
    'prices_by_borough': prices_by_borough,
}

print(airbnb_analysis)

{'avg_price': 141.82, 'average_price_per_month': 4313.61, 'difference': 1213.61, 'room_frequencies': entire home/apt    13266
private room       11356
shared room          587
Name: room_type, dtype: int64, 'first_reviewed': datetime.date(2019, 1, 1), 'last_reviewed': datetime.date(2019, 7, 9), 'prices_by_borough': borough        price_range
Bronx          Budget          381
               Average         285
               Expensive        25
               Extravagant       5
Brooklyn       Budget         3194
               Average        5532
               Expensive      1466
               Extravagant     259
Manhattan      Budget         1148
               Average        5285
               Expensive      3072
               Extravagant     810
Queens         Budget         1631
               Average        1505
               Expensive       291
               Extravagant      28
Staten Island  Budget          124
               Average         123
               Expensive  