# Exploring Airbnb Market Trends

Source: DataCamp

![NYC Skyline](nyc.jpg)

Welcome to New York City, one of the most-visited cities in the world. Many Airbnb listings in New York City meet the high demand for temporary lodging, with stays ranging from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: `.csv`, `.tsv`, and `.xlsx`.

Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. 
Three files containing data on 2019 Airbnb listings are available to you:

**data/airbnb_price.csv**
This is a CSV file containing data on Airbnb listing prices and locations.
- **`listing_id`**: unique identifier of listing
- **`price`**: nightly listing price in USD
- **`nbhood_full`**: name of borough and neighborhood where listing is located

**data/airbnb_room_type.xlsx**
This is an Excel file containing data on Airbnb listing descriptions and room types.
- **`listing_id`**: unique identifier of listing
- **`description`**: listing description
- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

**data/airbnb_last_review.tsv**
This is a TSV file containing data on Airbnb host names and review dates.
- **`listing_id`**: unique identifier of listing
- **`host_name`**: name of listing host
- **`last_review`**: date when the listing was last reviewed

In [1]:
# Import necessary packages
import pandas as pd
import numpy as np


## Overview 

As a consultant working for a real estate start-up, you have collected Airbnb listing data from various sources to investigate the short-term rental market in New York. You'll analyze this data to provide insights on private rooms to the real estate company.

There are three files in the `data` folder: `airbnb_price.csv`, `airbnb_room_type.xlsx`, and `airbnb_last_review.tsv`.

## Read and explore files

In [2]:
df_price = pd.read_csv('data/airbnb_price.csv')
df_price.head(3)

| | listing_id | price | nbhood_full |
|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill |


In [3]:
df_room_type = pd.read_excel('data/airbnb_room_type.xlsx')
df_room_type.head(3)

| | listing_id | description | room_type |
|---|---|---|---|
| 0 | 2595 | Skylit Midtown Castle | Entire home/apt |
| 1 | 3831 | Cozy Entire Floor of Brownstone | Entire home/apt |
| 2 | 5099 | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt |


In [4]:
df_last_review = pd.read_csv('data/airbnb_last_review.tsv', sep='\t')
df_last_review.head(3)

| | listing_id | host_name | last_review |
|---|---|---|---|
| 0 | 2595 | Jennifer | May 21 2019 |
| 1 | 3831 | LisaRoxanne | July 05 2019 |
| 2 | 5099 | Chris | June 22 2019 |


### Information about the DataFrames

In [5]:
df_price.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25209 entries, 0 to 25208
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   listing_id   25209 non-null  int64 
 1   price        25209 non-null  object
 2   nbhood_full  25209 non-null  object
dtypes: int64(1), object(2)
memory usage: 591.0+ KB


In [6]:
df_room_type.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25209 entries, 0 to 25208
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   listing_id   25209 non-null  int64 
 1   description  25199 non-null  object
 2   room_type    25209 non-null  object
dtypes: int64(1), object(2)
memory usage: 591.0+ KB


In [7]:
df_last_review.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25209 entries, 0 to 25208
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   listing_id   25209 non-null  int64 
 1   host_name    25201 non-null  object
 2   last_review  25209 non-null  object
dtypes: int64(1), object(2)
memory usage: 591.0+ KB
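The `info()` summaries show 25209 rows in each DataFrame, with 25199 non-null `description` values and 25201 non-null `host_name` values, so 10 descriptions and 8 host names are missing. As a minimal sketch, missing values can also be counted directly with `isna().sum()`. The `sample` frame below is a hypothetical miniature stand-in with made-up values, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for df_room_type; the values are made up.
sample = pd.DataFrame({
    'listing_id': [1, 2, 3, 4],
    'description': ['Sunny loft', np.nan, 'Quiet studio', np.nan],
    'room_type': ['Private room', 'Shared room', 'Entire home/apt', 'Private room'],
})

# isna() flags missing cells; sum() then counts them per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

Running the same call on `df_room_type` or `df_last_review` should give per-column counts that agree with the non-null counts reported by `info()`.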


### Merge the three DataFrames

In [8]:
df = pd.merge(
    left=df_price,
    right=df_room_type,
    on='listing_id'
)

df = pd.merge(
    left=df,
    right=df_last_review,
    on='listing_id'
)

df.head(3)

| | listing_id | price | nbhood_full | description | room_type | host_name | last_review |
|---|---|---|---|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown | Skylit Midtown Castle | Entire home/apt | Jennifer | May 21 2019 |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill | Cozy Entire Floor of Brownstone | Entire home/apt | LisaRoxanne | July 05 2019 |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt | Chris | June 22 2019 |
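The merge above relies on each `listing_id` appearing exactly once in every file. One way to check that assumption is sketched below with the `validate` and `indicator` parameters of `pd.merge`. The `prices` and `rooms` frames are hypothetical miniatures with made-up values, not the real data.

```python
import pandas as pd

# Hypothetical miniature frames that share a listing_id key; values are made up.
prices = pd.DataFrame({
    'listing_id': [1, 2, 3],
    'price': ['10 dollars', '20 dollars', '30 dollars'],
})
rooms = pd.DataFrame({
    'listing_id': [1, 2, 3],
    'room_type': ['Private room', 'Shared room', 'Private room'],
})

# validate='one_to_one' raises pandas.errors.MergeError if a key repeats on either side.
# indicator=True adds a _merge column recording which input each row came from.
merged = pd.merge(prices, rooms, on='listing_id', how='outer',
                  validate='one_to_one', indicator=True)

# True only if every listing_id was found in both inputs.
all_matched = (merged['_merge'] == 'both').all()
print(merged.shape, all_matched)
```

An outer join is used here only so that unmatched keys would show up in `_merge`; the default inner join would silently drop them.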


## Requirements

- What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names.
- How many of the listings are private rooms? Save this into any variable.
- What is the average listing price? Round to two decimal places and save into a variable.
- Combine the new variables into one DataFrame called `review_dates` with four columns in the following order: `first_reviewed`, `last_reviewed`, `nb_private_rooms`, and `avg_price`. The DataFrame should only contain one row of values.

### Requirement 1

What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names.

In [9]:
# check the data type of the last_review column
str(df['last_review'].dtype)

'object'

In [10]:
# convert the last_review column to a datetime type
df['last_review'] = pd.to_datetime(df['last_review'])

df.head(3)

| | listing_id | price | nbhood_full | description | room_type | host_name | last_review |
|---|---|---|---|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown | Skylit Midtown Castle | Entire home/apt | Jennifer | 2019-05-21 |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill | Cozy Entire Floor of Brownstone | Entire home/apt | LisaRoxanne | 2019-07-05 |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt | Chris | 2019-06-22 |


In [11]:
first_reviewed = df['last_review'].min()
last_reviewed = df['last_review'].max()

print(f'first_reviewed={first_reviewed}, last_reviewed={last_reviewed}')

first_reviewed=2019-01-01 00:00:00, last_reviewed=2019-07-09 00:00:00
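The printed values are full timestamps with a midnight time component. If only the calendar date is wanted, `Timestamp.date()` drops the time part. The sketch below also passes an explicit `format`, on the assumption that every value follows the "Month DD YYYY" pattern seen in the preview. It uses only the three previewed values, so its minimum and maximum differ from those of the full dataset.

```python
import pandas as pd

# The three last_review values shown in the preview, not the full column.
raw = pd.Series(['May 21 2019', 'July 05 2019', 'June 22 2019'])

# An explicit format avoids per-value inference; %B matches the full month name.
parsed = pd.to_datetime(raw, format='%B %d %Y')

# Timestamp.date() returns a plain datetime.date without the time component.
earliest = parsed.min().date()
latest = parsed.max().date()
print(earliest, latest)
```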


### Requirement 2

How many of the listings are private rooms? Save this into any variable.

In [12]:
df['room_type'].value_counts()

room_type
Entire home/apt    8458
Private room       7241
entire home/apt    2665
private room       2248
ENTIRE HOME/APT    2143
PRIVATE ROOM       1867
Shared room         380
shared room         110
SHARED ROOM          97
Name: count, dtype: int64

In [13]:
df_private_rooms = df[df['room_type'].str.lower() == 'private room']
display(df_private_rooms.head(3))
nb_private_rooms = df_private_rooms.shape[0]
print(f'nb_private_rooms={nb_private_rooms}')

| | listing_id | price | nbhood_full | description | room_type | host_name | last_review |
|---|---|---|---|---|---|---|---|
| 3 | 5178 | 79 dollars | Manhattan, Hell's Kitchen | Large Furnished Room Near B'way | private room | Shunichi | 2019-06-24 |
| 6 | 5441 | 85 dollars | Manhattan, Hell's Kitchen | Central Manhattan/near Broadway | Private room | Kate | 2019-06-23 |
| 7 | 5803 | 89 dollars | Brooklyn, South Slope | Lovely Room 1, Garden, Best Area, Legal rental | Private room | Laurie | 2019-06-24 |


nb_private_rooms=11356
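This count agrees with the `value_counts()` output: 7241 + 2248 + 1867 = 11356 across the three capitalizations of "private room". As an alternative sketch, the labels can be normalized once and the count taken from a boolean mask. The `room_type` series below is a hypothetical sample that mimics the inconsistent capitalization. The `str.strip()` call is an extra precaution against stray whitespace, which the source does not report.

```python
import pandas as pd

# Hypothetical sample mimicking the inconsistent capitalization; not the real column.
room_type = pd.Series(['Private room', 'private room', 'PRIVATE ROOM',
                       'Entire home/apt', 'shared room'])

# Normalize case (and any stray whitespace) before comparing or counting.
normalized = room_type.str.strip().str.lower()
counts = normalized.value_counts()

# Summing a boolean mask counts the rows where the condition is True.
n_private = int((normalized == 'private room').sum())
print(counts)
print(n_private)
```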


### Requirement 3

What is the average listing price? Round to two decimal places and save into a variable.

In [14]:
df['price'].head()

0    225 dollars
1     89 dollars
2    200 dollars
3     79 dollars
4    150 dollars
Name: price, dtype: object

In [15]:
# if the price column's data type is object (i.e., string), convert the values to integers
if df['price'].dtype == object:
    # regex=False treats ' dollars' as a literal string rather than a pattern
    df['price'] = df['price'].str.replace(' dollars', '', regex=False).astype(np.int64)

display(df.head(3))
print(f"The price column's data type is {str(df['price'].dtype)}.")

| | listing_id | price | nbhood_full | description | room_type | host_name | last_review |
|---|---|---|---|---|---|---|---|
| 0 | 2595 | 225 | Manhattan, Midtown | Skylit Midtown Castle | Entire home/apt | Jennifer | 2019-05-21 |
| 1 | 3831 | 89 | Brooklyn, Clinton Hill | Cozy Entire Floor of Brownstone | Entire home/apt | LisaRoxanne | 2019-07-05 |
| 2 | 5099 | 200 | Manhattan, Murray Hill | Large Cozy 1 BR Apartment In Midtown East | Entire home/apt | Chris | 2019-06-22 |


The price column's data type is int64.


In [16]:
# calculate the average price
avg_price = df['price'].mean().round(2)
print(f"The average listing price is {avg_price}.")

The average listing price is 141.78.
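The conversion above assumes every price ends in " dollars". A more defensive variant, sketched here as one possible alternative rather than the approach used in this notebook, extracts the digits and converts them with `pd.to_numeric`. It uses only the five prices from the preview, so the resulting mean differs from the full-dataset average.

```python
import pandas as pd

# The five price values shown in the preview, not the full column.
price = pd.Series(['225 dollars', '89 dollars', '200 dollars',
                   '79 dollars', '150 dollars'])

# Extract the first run of digits from each string.
digits = price.str.extract(r'(\d+)', expand=False)

# errors='coerce' turns anything unparseable into NaN instead of raising.
price_num = pd.to_numeric(digits, errors='coerce')

sample_avg = round(float(price_num.mean()), 2)
print(sample_avg)
```

With `errors='coerce'`, malformed entries become `NaN` and are skipped by `mean()`, so it is worth checking `price_num.isna().sum()` before trusting the average.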


### Requirement 4

Combine the new variables into one DataFrame called `review_dates` with four columns in the following order: `first_reviewed`, `last_reviewed`, `nb_private_rooms`, and `avg_price`. The DataFrame should only contain one row of values.

In [17]:
review_dates = pd.DataFrame({
    'first_reviewed': [first_reviewed],
    'last_reviewed': [last_reviewed],
    'nb_private_rooms': [nb_private_rooms],
    'avg_price': [avg_price]
})

review_dates

| | first_reviewed | last_reviewed | nb_private_rooms | avg_price |
|---|---|---|---|---|
| 0 | 2019-01-01 | 2019-07-09 | 11356 | 141.78 |
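The requirement fixes both the column order and the number of rows, so a quick sanity check can be useful. This is a minimal sketch that rebuilds the frame from the values printed above; `expected_columns` is a name introduced here for illustration.

```python
import pandas as pd

# Values copied from the outputs shown above.
review_dates = pd.DataFrame({
    'first_reviewed': [pd.Timestamp('2019-01-01')],
    'last_reviewed': [pd.Timestamp('2019-07-09')],
    'nb_private_rooms': [11356],
    'avg_price': [141.78],
})

# Hypothetical helper name: the column order required by the task.
expected_columns = ['first_reviewed', 'last_reviewed', 'nb_private_rooms', 'avg_price']

# A dict-built DataFrame keeps the key insertion order as its column order.
columns_ok = list(review_dates.columns) == expected_columns
single_row = len(review_dates) == 1
print(columns_ok, single_row)
```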
