# New York City Airbnb Market Analysis

Welcome to this notebook, where we will explore the short-term rental market in New York City by analyzing Airbnb listings data. As a consultant working for a real estate start-up, I've gathered Airbnb data from various sources to help the company gain insights into private room listings. In particular, we'll focus on understanding the prices, room types, and reviews associated with these listings.

## Project Overview

New York City is one of the most-visited cities in the world, attracting millions of travelers each year. Demand for temporary lodging is high, especially through platforms like Airbnb, with stays ranging from a few nights to several months. This project examines that market by analyzing data on Airbnb listings.

To conduct this analysis, we will combine data from three file types (CSV, Excel, and TSV) that hold different information about the listings, such as pricing, room type, and review dates.

## Data Files Overview

The following files are available in the `Dataset/airbnb_data/` folder for this analysis:

1. **airbnb_price.csv**  
   A CSV file containing data on listing prices and their locations:
   - `listing_id`: Unique identifier of the listing
   - `price`: Nightly price of the listing (USD)
   - `nbhood_full`: Borough and neighborhood where the listing is located

2. **airbnb_room_type.xlsx**  
   An Excel file with data on listing descriptions and room types:
   - `listing_id`: Unique identifier of the listing
   - `description`: Text description of the listing
   - `room_type`: Type of room offered (shared room, private room, or entire home/apartment)

3. **airbnb_last_review.tsv**  
   A TSV file containing host names and the dates of the last review:
   - `listing_id`: Unique identifier of the listing
   - `host_name`: Name of the listing host
   - `last_review`: Date of the last review for the listing

## Key Questions to Address
- What are the dates of the earliest and most recent reviews?
- How many of the listings are private rooms?
- What is the average price of all listings?

## Objective

Using the data provided, we will extract and process key information to answer the above questions. In particular, we will:
- Identify the earliest and most recent review dates.
- Count the number of private room listings.
- Calculate the average price of all listings, rounded to two decimal places.

Finally, we will combine this information into a single DataFrame called `review_dates` with four columns: `first_reviewed`, `last_reviewed`, `nb_private_rooms`, and `avg_price`. The DataFrame will contain one row summarizing these insights.
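
As a rough illustration of the target output, the sketch below builds a one-row `review_dates` DataFrame with the four columns named above. The `toy` table and all of its values are invented for illustration and are not taken from the Airbnb files; it also assumes that prices are already numeric and that room types are already lowercase.

```python
import pandas as pd

# Toy stand-in for the merged listings table (illustrative values only).
toy = pd.DataFrame({
    "listing_id": [1, 2, 3, 4],
    "price": [100, 80, 150, 75],
    "room_type": ["private room", "entire home/apt", "private room", "shared room"],
    "last_review": pd.to_datetime(
        ["2019-01-05", "2019-03-10", "2019-07-01", "2019-05-20"]
    ),
})

# One row summarizing the three questions of the project.
review_dates = pd.DataFrame({
    "first_reviewed": [toy["last_review"].min()],
    "last_reviewed": [toy["last_review"].max()],
    "nb_private_rooms": [int((toy["room_type"] == "private room").sum())],
    "avg_price": [round(float(toy["price"].mean()), 2)],
})

print(review_dates)
```

Wrapping each value in a single-element list is what makes pandas build exactly one row.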

Let's dive into the analysis!

In [1]:
import pandas as pd

In [2]:
# Load the data from the three file types (CSV, Excel, TSV)
airbnb_price = pd.read_csv("Dataset/airbnb_data/airbnb_price.csv")
airbnb_room_type = pd.read_excel("Dataset/airbnb_data/airbnb_room_type.xlsx")
airbnb_last_review = pd.read_csv('Dataset/airbnb_data/airbnb_last_review.tsv', sep='\t')

In [3]:
airbnb_price.head()

Unnamed: 0,listing_id,price,nbhood_full
0,2595,225 dollars,"Manhattan, Midtown"
1,3831,89 dollars,"Brooklyn, Clinton Hill"
2,5099,200 dollars,"Manhattan, Murray Hill"
3,5178,79 dollars,"Manhattan, Hell's Kitchen"
4,5238,150 dollars,"Manhattan, Chinatown"
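
The preview above shows that `price` is stored as text such as `225 dollars`, so it has to be converted to a number before an average can be computed. The following is a minimal sketch of one way to do that, using the first three rows of the preview as sample data. The column name `price_usd` is a hypothetical name introduced here, and the sketch assumes every value follows the `<number> dollars` pattern.

```python
import pandas as pd

# Sample rows copied from the preview of airbnb_price.
prices = pd.DataFrame({
    "listing_id": [2595, 3831, 5099],
    "price": ["225 dollars", "89 dollars", "200 dollars"],
})

# Strip the literal " dollars" suffix, then convert to numbers.
# errors="coerce" turns anything that still fails to parse into NaN.
prices["price_usd"] = pd.to_numeric(
    prices["price"].str.replace(" dollars", "", regex=False).str.strip(),
    errors="coerce",
)

# Average price rounded to two decimal places.
avg_price = round(float(prices["price_usd"].mean()), 2)
print(avg_price)
```

Checking `prices["price_usd"].isna().sum()` afterwards is a quick way to see whether any value did not match the assumed pattern.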


In [4]:
airbnb_room_type.head()

Unnamed: 0,listing_id,description,room_type
0,2595,Skylit Midtown Castle,Entire home/apt
1,3831,Cozy Entire Floor of Brownstone,Entire home/apt
2,5099,Large Cozy 1 BR Apartment In Midtown East,Entire home/apt
3,5178,Large Furnished Room Near B'way,private room
4,5238,Cute & Cozy Lower East Side 1 bdrm,Entire home/apt
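
In the preview above, `room_type` mixes capitalization (`Entire home/apt` versus `private room`), so other spellings of the same category may exist in the full file. A minimal sketch of normalizing the labels before counting private rooms follows. The last two rows, including their `listing_id` values, are invented to show the effect, and `room_type_clean` is a hypothetical column name.

```python
import pandas as pd

rooms = pd.DataFrame({
    # The first three rows come from the preview; the last two are invented.
    "listing_id": [2595, 3831, 5178, 9001, 9002],
    "room_type": [
        "Entire home/apt",
        "Entire home/apt",
        "private room",
        "Private room",
        "PRIVATE ROOM",
    ],
})

# Lowercase and trim so that all spellings of a category compare equal.
rooms["room_type_clean"] = rooms["room_type"].str.strip().str.lower()

nb_private_rooms = int((rooms["room_type_clean"] == "private room").sum())
print(nb_private_rooms)
```

Running `rooms["room_type"].value_counts()` on the real data before and after cleaning would show whether the normalization changes the count.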


In [5]:
airbnb_last_review.head()

Unnamed: 0,listing_id,host_name,last_review
0,2595,Jennifer,May 21 2019
1,3831,LisaRoxanne,July 05 2019
2,5099,Chris,June 22 2019
3,5178,Shunichi,June 24 2019
4,5238,Ben,June 09 2019


All three datasets share the `listing_id` column, so we **merge** them into a single DataFrame. This lets us analyze prices, room types, and review dates together.

In [7]:
# Make a copy of the datasets before manipulating them
airbnb_price_copy = airbnb_price.copy()
airbnb_room_type_copy = airbnb_room_type.copy()
airbnb_last_review_copy = airbnb_last_review.copy()

In [9]:
# Merge the three datasets on listing_id (an inner join keeps only listings present in all three)
merged_data = airbnb_price_copy.merge(airbnb_room_type_copy, on='listing_id', how='inner') \
                               .merge(airbnb_last_review_copy, on='listing_id', how='inner')
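
An inner join silently drops listings that are missing from any one of the tables, and duplicated keys would multiply rows. The sketch below shows one possible way to check both with the `validate` and `indicator` parameters of `DataFrame.merge`. The small frames reuse values from the previews, and one listing is deliberately left out of `review` to show what a dropped row looks like.

```python
import pandas as pd

price = pd.DataFrame({
    "listing_id": [2595, 3831, 5099],
    "price": ["225 dollars", "89 dollars", "200 dollars"],
})
room = pd.DataFrame({
    "listing_id": [2595, 3831, 5099],
    "room_type": ["Entire home/apt", "Entire home/apt", "Entire home/apt"],
})
# Listing 5099 is intentionally omitted here for illustration.
review = pd.DataFrame({
    "listing_id": [2595, 3831],
    "last_review": ["May 21 2019", "July 05 2019"],
})

# validate="one_to_one" raises MergeError if listing_id is duplicated on either side.
merged = price.merge(room, on="listing_id", how="inner", validate="one_to_one")

# An outer join with indicator=True adds a _merge column marking each row as
# "both", "left_only", or "right_only".
check = merged.merge(
    review, on="listing_id", how="outer", indicator=True, validate="one_to_one"
)

# Listings that an inner join would have dropped.
dropped = check.loc[check["_merge"] != "both", "listing_id"].tolist()
print(dropped)
```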

In [10]:
merged_data.head()

Unnamed: 0,listing_id,price,nbhood_full,description,room_type,host_name,last_review
0,2595,225 dollars,"Manhattan, Midtown",Skylit Midtown Castle,Entire home/apt,Jennifer,May 21 2019
1,3831,89 dollars,"Brooklyn, Clinton Hill",Cozy Entire Floor of Brownstone,Entire home/apt,LisaRoxanne,July 05 2019
2,5099,200 dollars,"Manhattan, Murray Hill",Large Cozy 1 BR Apartment In Midtown East,Entire home/apt,Chris,June 22 2019
3,5178,79 dollars,"Manhattan, Hell's Kitchen",Large Furnished Room Near B'way,private room,Shunichi,June 24 2019
4,5238,150 dollars,"Manhattan, Chinatown",Cute & Cozy Lower East Side 1 bdrm,Entire home/apt,Ben,June 09 2019


### 1. Determining the earliest and most recent review dates

In [15]:
merged_data["last_review"].dtypes

dtype('O')

In [17]:
# Convert the 'last_review' column to datetime; errors='coerce' turns unparseable values into NaT
merged_data["last_review"] = pd.to_datetime(merged_data["last_review"], errors='coerce')

In [18]:
merged_data["last_review"].dtypes

dtype('<M8[ns]')
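
Because `errors='coerce'` replaces unparseable values with `NaT` without raising an error, it can be worth counting how many values failed. The sketch below also passes an explicit `format`, assuming every date follows the `Month DD YYYY` pattern seen in the preview (for example `May 21 2019`). The `"not a date"` entry is invented to show the failure case.

```python
import pandas as pd

# The first three values come from the preview; the last one is invented.
raw = pd.Series(["May 21 2019", "July 05 2019", "June 22 2019", "not a date"])

# %B = full month name, %d = day of month, %Y = four-digit year.
parsed = pd.to_datetime(raw, format="%B %d %Y", errors="coerce")

# Number of values that could not be parsed and became NaT.
n_failed = int(parsed.isna().sum())

print(parsed.min(), parsed.max(), n_failed)
```

`min()` and `max()` skip `NaT` by default, so a few failed values do not break the calculation, but they are excluded from the result.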

In [24]:
earliest_review = merged_data["last_review"].min()

The earliest value in `last_review` is January 1, 2019.

In [21]:
merged_data["last_review"].max()

Timestamp('2019-07-09 00:00:00')

The most recent review is dated July 9, 2019.