# Exploring Airbnb Market Trends

Welcome to New York City, one of the most-visited cities in the world. The city has many Airbnb listings to meet the high demand for temporary lodging, with stays ranging from a few nights to many months. In this project, we take a closer look at the New York Airbnb market by combining data from multiple file types: `.csv`, `.tsv`, and `.xlsx`.

## Objective

Perform exploratory data analysis by combining data from multiple sources and file types into a single pandas DataFrame, then extract insights about market trends.

**data/airbnb_price.csv**
This is a CSV file containing data on Airbnb listing prices and locations.
- **`listing_id`**: unique identifier of listing
- **`price`**: nightly listing price in USD
- **`nbhood_full`**: name of borough and neighborhood where listing is located

**data/airbnb_room_type.xlsx**
This is an Excel file containing data on Airbnb listing descriptions and room types.
- **`listing_id`**: unique identifier of listing
- **`description`**: listing description
- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

**data/airbnb_last_review.tsv**
This is a TSV file containing data on Airbnb host names and review dates.
- **`listing_id`**: unique identifier of listing
- **`host_name`**: name of listing host
- **`last_review`**: date when the listing was last reviewed

In [106]:
# Import necessary packages
import pandas as pd

# Read each source file into its own DataFrame
airbnb_price = pd.read_csv('data/airbnb_price.csv')
airbnb_room_type = pd.read_excel('data/airbnb_room_type.xlsx', names=['listing_id','description','room_type'])
# A TSV file is tab-separated text, so read_csv with sep='\t' parses it directly
airbnb_last_review_df = pd.read_csv('data/airbnb_last_review.tsv', sep='\t')
# Make sure the merge key is an integer in every DataFrame
airbnb_last_review_df['listing_id'] = airbnb_last_review_df['listing_id'].astype('int')

# Merge DataFrames on 'listing_id'
room_type_and_price_df = pd.merge(airbnb_room_type,airbnb_price,how='inner',on='listing_id')
airbnb = pd.merge(room_type_and_price_df, airbnb_last_review_df, how='inner', on='listing_id')

# Check for duplicates: 
print('Duplicated IDs:',airbnb.duplicated(subset='listing_id').sum())

airbnb.head()


Duplicated IDs: 0


,listing_id,description,room_type,price,nbhood_full,host_name,last_review
0,2595,Skylit Midtown Castle,Entire home/apt,225 dollars,"Manhattan, Midtown",Jennifer,May 21 2019
1,3831,Cozy Entire Floor of Brownstone,Entire home/apt,89 dollars,"Brooklyn, Clinton Hill",LisaRoxanne,July 05 2019
2,5099,Large Cozy 1 BR Apartment In Midtown East,Entire home/apt,200 dollars,"Manhattan, Murray Hill",Chris,June 22 2019
3,5178,Large Furnished Room Near B'way,private room,79 dollars,"Manhattan, Hell's Kitchen",Shunichi,June 24 2019
4,5238,Cute & Cozy Lower East Side 1 bdrm,Entire home/apt,150 dollars,"Manhattan, Chinatown",Ben,June 09 2019
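The duplicate check above confirms that `listing_id` is unique after merging, but an inner merge also silently drops listings that are missing from one of the files. The following is a minimal sketch, on made-up toy data rather than the project files, of how pandas' `validate` and `indicator` options could be used to audit a merge. The frames `rooms` and `prices` and their values are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for two of the source tables (values are illustrative only)
rooms = pd.DataFrame({
    'listing_id': [1, 2, 4],
    'room_type': ['Entire home/apt', 'private room', 'Shared room'],
})
prices = pd.DataFrame({
    'listing_id': [1, 2, 3],
    'price': ['225 dollars', '89 dollars', '200 dollars'],
})

# validate='one_to_one' raises a MergeError if listing_id repeats on either side
inner = pd.merge(rooms, prices, how='inner', on='listing_id', validate='one_to_one')

# An outer merge with indicator=True labels each row by where it was found,
# which reveals the listings an inner merge would discard
audit = pd.merge(rooms, prices, how='outer', on='listing_id', indicator=True)
unmatched = audit[audit['_merge'] != 'both']

print('Rows kept by inner merge:', len(inner))
print(unmatched[['listing_id', '_merge']])
```

If `unmatched` is empty on the real data, the inner merge loses nothing. Otherwise it lists exactly which IDs appear in only one file.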


In [107]:
# ---------------------------------- # 
# Assigning appropriate column types
# ---------------------------------- # 

# room_type column
airbnb['room_type'] = airbnb['room_type'].str.title() # Normalizes capitalization, e.g. 'private room' -> 'Private Room'
airbnb['room_type'] = airbnb['room_type'].astype('category') # Stores the few distinct room types as a categorical column

# price column
airbnb['price'] = airbnb['price'].str.replace(' dollars','').astype('int') # Strips the ' dollars' suffix and casts to int

# last_review column
airbnb['last_review'] = pd.to_datetime(airbnb['last_review']) # Parses strings such as 'May 21 2019' into datetimes

print(airbnb.dtypes)

listing_id              int64
description            object
room_type            category
price                   int64
nbhood_full            object
host_name              object
last_review    datetime64[ns]
dtype: object
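To make the effect of these conversions easier to see, here is a small sketch that applies the same three steps to a hand-built sample. The sample values are modeled on the rows printed earlier, and the all-caps `'PRIVATE ROOM'` entry is an assumed formatting variation added for illustration. Passing an explicit `format` to `pd.to_datetime` is an optional choice, not what the cell above does; it makes parsing fail loudly if a date does not match the expected pattern.

```python
import pandas as pd

# Hand-built sample; 'PRIVATE ROOM' is an assumed formatting variation
sample = pd.DataFrame({
    'room_type': ['Entire home/apt', 'private room', 'PRIVATE ROOM'],
    'price': ['225 dollars', '79 dollars', '150 dollars'],
    'last_review': ['May 21 2019', 'June 24 2019', 'June 09 2019'],
})

# Title-casing collapses capitalization variants into one label per room type
sample['room_type'] = sample['room_type'].str.title().astype('category')

# Strip the literal suffix, then cast the remaining digits to integers
sample['price'] = sample['price'].str.replace(' dollars', '', regex=False).astype('int')

# '%B %d %Y' means full month name, day of month, four-digit year
sample['last_review'] = pd.to_datetime(sample['last_review'], format='%B %d %Y')

print(sample['room_type'].cat.categories.tolist())
print(sample.dtypes)
```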


In [108]:
# ---------------------------------- # 
# Earliest and most recent reviews 
# ---------------------------------- # 

earliest = airbnb['last_review'].min().strftime('%Y-%m-%d')
latest = airbnb['last_review'].max().strftime('%Y-%m-%d')

print('Earliest review:', earliest)
print('Latest review:', latest)

Earliest review: 2019-01-01
Latest review: 2019-07-09


In [109]:
# ---------------------------------- # 
# How many listings are private rooms? 
# ---------------------------------- # 

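# Count by label rather than by position in value_counts(), whose order depends on the data
num_private_rooms = (airbnb['room_type'] == 'Private Room').sum()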

print('Number of Listings that are private rooms:', num_private_rooms)

Number of Listings that are private rooms: 11356
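When counting a single category, it is safer to select it by label than by its position in `value_counts()`, because the position depends on how the counts happen to rank. The sketch below uses an invented six-row Series (not the project data) to show the difference.

```python
import pandas as pd

# Invented toy data: here private rooms are the MOST common type
room_type = pd.Series(
    ['Private Room', 'Private Room', 'Private Room',
     'Entire Home/Apt', 'Entire Home/Apt', 'Shared Room'],
    dtype='category',
)

counts = room_type.value_counts()  # sorted by count, descending

by_label = counts['Private Room']               # label lookup
by_mask = (room_type == 'Private Room').sum()   # boolean mask
by_position = counts.iloc[1]                    # second-most-common type, whatever it is

print(by_label, by_mask, by_position)
```

In this toy example the second position holds the count for entire homes, so a positional lookup would report the wrong room type even though the code runs without error.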


In [110]:
# ---------------------------------- # 
# Avg Listing Price
# ---------------------------------- # 

avg_list_price = round(airbnb['price'].mean(),2)

print(f'The average listing price is {avg_list_price}.')

The average listing price is 141.78.


In [111]:
# ---------------------------------- # 
# Combining results into a DataFrame
# ---------------------------------- # 

# Create a dictionary storing the results
review_dates_dict = {
    'first_reviewed':[earliest],
    'last_reviewed':[latest],
    'nb_private_rooms':[num_private_rooms],
    'avg_price':[avg_list_price]
}

# Convert dictionary to pandas DataFrame
review_dates = pd.DataFrame(review_dates_dict)

print(review_dates)

  first_reviewed last_reviewed  nb_private_rooms  avg_price
0     2019-01-01    2019-07-09             11356     141.78
