![NYC Skyline](nyc.jpg)

Welcome to New York City, one of the most-visited cities in the world. Airbnb listings across the city meet the high demand for temporary lodging from travelers, whose stays can range from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from three file types: `.csv`, `.tsv`, and `.xlsx`.

Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. 
Three files containing data on 2019 Airbnb listings are available to you:

**data/airbnb_price.csv**
This is a CSV file containing data on Airbnb listing prices and locations.
- **`listing_id`**: unique identifier of listing
- **`price`**: nightly listing price in USD
- **`nbhood_full`**: name of borough and neighborhood where listing is located

**data/airbnb_room_type.xlsx**
This is an Excel file containing data on Airbnb listing descriptions and room types.
- **`listing_id`**: unique identifier of listing
- **`description`**: listing description
- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

**data/airbnb_last_review.tsv**
This is a TSV file containing data on Airbnb host names and review dates.
- **`listing_id`**: unique identifier of listing
- **`host_name`**: name of listing host
- **`last_review`**: date when the listing was last reviewed
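Of the three formats, only the TSV needs extra configuration. `pd.read_csv` splits on commas by default, so a tab-separated file needs `sep="\t"` (`delimiter` is an accepted alias). The sketch below reads small in-memory stand-ins built with `io.StringIO`. The single rows are toy data modeled on the first listing, not the real files. Reading `.xlsx` with `pd.read_excel` usually requires an extra engine package such as openpyxl, so it is left out here.

```python
import io

import pandas as pd

# Toy stand-ins for the CSV and TSV files (one row each, modeled on listing 2595)
csv_text = "listing_id,price,nbhood_full\n2595,225 dollars,\"Manhattan, Midtown\"\n"
tsv_text = "listing_id\thost_name\tlast_review\n2595\tJennifer\tMay 21 2019\n"

prices = pd.read_csv(io.StringIO(csv_text))             # comma is the default separator
reviews = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tabs need an explicit sep

print(prices.loc[0, "nbhood_full"])  # Manhattan, Midtown  (quoted field keeps its comma)
print(reviews.columns.tolist())      # ['listing_id', 'host_name', 'last_review']
```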

As a consultant working for a real estate start-up, you have collected Airbnb listing data from various sources to investigate the short-term rental market in New York. You'll analyze this data to provide insights on private rooms to the real estate company.

There are three files in the data folder: airbnb_price.csv, airbnb_room_type.xlsx, airbnb_last_review.tsv.

1. What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names.

2. How many of the listings are private rooms? Save this into any variable.

3. What is the average listing price? Round to two decimal places and save into a variable.

4. Combine the new variables into one DataFrame called `review_dates` with four columns in the following order: `first_reviewed`, `last_reviewed`, `nb_private_rooms`, and `avg_price`. The DataFrame should contain only one row of values.

In [2]:
# importing the packages
import pandas as pd
import numpy as np

In [3]:
# importing the dataframes
airbnb_price = pd.read_csv("data/airbnb_price.csv")
airbnb_room_type = pd.read_excel("data/airbnb_room_type.xlsx")
airbnb_last_review = pd.read_csv("data/airbnb_last_review.tsv", delimiter='\t')

In [43]:
# checking the dataframes
airbnb_price.head(n=5)

|   | listing_id | price | nbhood_full |
|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill |
| 3 | 5178 | 79 dollars | Manhattan, Hell's Kitchen |
| 4 | 5238 | 150 dollars | Manhattan, Chinatown |


In [44]:
airbnb_room_type.sample(n=5)

|   | listing_id | description | room_type |
|---|---|---|---|
| 18937 | 30393853 | Spacious 1 Bedroom in Historic Brownstone | Entire home/apt |
| 5667 | 10034187 | Unique Studio Loft in Brooklyn | Private room |
| 7321 | 14151551 | Great location 15minutes to South central park... | PRIVATE ROOM |
| 4516 | 7461910 | Brooklyn Brownstone In Historic Bed Stuy - LEGAL | entire home/apt |
| 5448 | 9594995 | Beautiful private 1BR apt in Harlem | ENTIRE HOME/APT |
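This sample shows that `room_type` is spelled with inconsistent capitalization (`Private room`, `PRIVATE ROOM`, `entire home/apt`, ...), so an exact equality check would undercount. A minimal sketch using the five values from this sample; the `.str.strip()` call is a precaution against stray whitespace, which the sample does not actually show:

```python
import pandas as pd

# The five room_type values from the sample above
room_type = pd.Series([
    "Entire home/apt", "Private room", "PRIVATE ROOM",
    "entire home/apt", "ENTIRE HOME/APT",
])

# An exact match misses the upper-case spelling
raw_matches = int((room_type == "Private room").sum())

# Normalize whitespace and case before comparing or counting
normalized = room_type.str.strip().str.lower()
counts = normalized.value_counts()

print(raw_matches)               # 1
print(counts["private room"])    # 2
```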


In [45]:
airbnb_last_review.sample(n=5)

|   | listing_id | host_name | last_review |
|---|---|---|---|
| 12056 | 21800820 | Sydney | April 30 2019 |
| 1980 | 1954450 | Erika | June 18 2019 |
| 20365 | 31647516 | Ryan | May 31 2019 |
| 4618 | 7712750 | Galina | June 12 2019 |
| 1943 | 1906993 | Jonathan | June 23 2019 |
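`last_review` is stored as text such as `April 30 2019`, so taking `min()` or `max()` on the raw column compares strings alphabetically rather than chronologically. A minimal sketch using the five values from this sample shows the pitfall and the fix with an explicit `format` string:

```python
import pandas as pd

# last_review values from the sample above
raw = pd.Series(["April 30 2019", "June 18 2019", "May 31 2019",
                 "June 12 2019", "June 23 2019"])

# String comparison is alphabetical: "May" sorts after "June"
lexical_latest = raw.max()

# %B = full month name, %d = day of the month, %Y = four-digit year
parsed = pd.to_datetime(raw, format="%B %d %Y")
earliest, latest = parsed.min(), parsed.max()

print(lexical_latest)                  # May 31 2019  (wrong "latest")
print(earliest.date(), latest.date())  # 2019-04-30 2019-06-23
```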


1. What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names.

In [46]:
# let's combine the dataframes
airbnb_listings = pd.merge(airbnb_price, airbnb_room_type, on='listing_id')
airbnb_listings = pd.merge(airbnb_listings, airbnb_last_review, on='listing_id')

airbnb_listings.sample(n=5)

|   | listing_id | price | nbhood_full | description | room_type | host_name | last_review |
|---|---|---|---|---|---|---|---|
| 6940 | 13459691 | 90 dollars | Brooklyn, Williamsburg | Private Bedroom in Williamsburg! | Private room | Andi | March 24 2019 |
| 25127 | 36078121 | 164 dollars | Manhattan, Lower East Side | Large Sunny Apartment-By Subway-Lower East Side | entire home/apt | Kevin | July 06 2019 |
| 22235 | 33497642 | 120 dollars | Manhattan, Upper East Side | Walk distance to Central Park, 10min Time Square | Private room | Raquel | July 04 2019 |
| 267 | 103806 | 249 dollars | Manhattan, East Village | BOHEMIAN EAST VILLAGE 2 BED HAVEN | Entire home/apt | Jason | May 27 2019 |
| 10097 | 19190579 | 118 dollars | Manhattan, Upper East Side | Charming, Sleek UES Studio | Entire home/apt | Deaton | July 06 2019 |
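`pd.merge` performs an inner join by default, so a `listing_id` missing from any one of the files is dropped without warning. Whether that happens in this dataset is not shown here. The sketch below uses hypothetical toy IDs to illustrate two checks: `validate="one_to_one"` raises a `MergeError` if a key is duplicated on either side, and an outer join with `indicator=True` labels each row as `both`, `left_only`, or `right_only`.

```python
import pandas as pd

# Hypothetical toy tables: listing 3 has no room type, listing 4 has no price
price = pd.DataFrame({"listing_id": [1, 2, 3],
                      "price": ["10 dollars", "20 dollars", "30 dollars"]})
room = pd.DataFrame({"listing_id": [1, 2, 4],
                     "room_type": ["private room", "shared room", "private room"]})

# Inner join (the default); raises MergeError if listing_id repeats on either side
inner = pd.merge(price, room, on="listing_id", validate="one_to_one")

# Outer join with an indicator column to see which IDs failed to match
outer = pd.merge(price, room, on="listing_id", how="outer", indicator=True)
unmatched = outer.loc[outer["_merge"] != "both", ["listing_id", "_merge"]]

print(inner["listing_id"].tolist())  # [1, 2]
print(unmatched)
```

If `unmatched` is empty for the real files, the inner join loses nothing.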


In [47]:
# parse the last_review strings (e.g. "May 21 2019") into a new datetime column
airbnb_listings['last_review_date'] = pd.to_datetime(airbnb_listings['last_review'], format='%B %d %Y')

earliest_date = airbnb_listings.last_review_date.min()
recent_date = airbnb_listings.last_review_date.max()

In [48]:
# returning the earliest_date
earliest_date

Timestamp('2019-01-01 00:00:00')

In [49]:
recent_date

Timestamp('2019-07-09 00:00:00')

2. How many of the listings are private rooms? Save this into any variable.

In [54]:
# let's convert the room type to lowercase
airbnb_listings.room_type = airbnb_listings.room_type.str.lower()
number_of_pivate_room = airbnb_listings.loc[airbnb_listings.room_type == 'private room'].shape[0]
number_of_pivate_room

11356
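Filtering with `.loc` and reading `.shape[0]` works. Summing the boolean mask gives the same count without building the filtered DataFrame, and `value_counts()` returns the counts for every room type in one call. A minimal sketch on hypothetical, already-lowercased values:

```python
import pandas as pd

# Hypothetical room types, already lowercased
room_type = pd.Series(["private room", "entire home/apt", "private room", "shared room"])

# Summing a boolean mask counts the True values
n_private = int((room_type == "private room").sum())

# value_counts gives the count for every room type at once
counts = room_type.value_counts()

print(n_private)              # 2
print(counts["shared room"])  # 1
```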

3. What is the average listing price? Round to two decimal places and save into a variable.

In [51]:
airbnb_listings

|   | listing_id | price | nbhood_full | description | room_type | host_name | last_review | last_review_date |
|---|---|---|---|---|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown | Skylit Midtown Castle | entire home/apt | Jennifer | May 21 2019 | 2019-05-21 |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill | Cozy Entire Floor of Brownstone | entire home/apt | LisaRoxanne | July 05 2019 | 2019-07-05 |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill | Large Cozy 1 BR Apartment In Midtown East | entire home/apt | Chris | June 22 2019 | 2019-06-22 |
| 3 | 5178 | 79 dollars | Manhattan, Hell's Kitchen | Large Furnished Room Near B'way | private room | Shunichi | June 24 2019 | 2019-06-24 |
| 4 | 5238 | 150 dollars | Manhattan, Chinatown | Cute & Cozy Lower East Side 1 bdrm | entire home/apt | Ben | June 09 2019 | 2019-06-09 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25204 | 36425863 | 129 dollars | Manhattan, Upper East Side | Lovely Privet Bedroom with Privet Restroom | private room | Rusaa | July 07 2019 | 2019-07-07 |
| 25205 | 36427429 | 45 dollars | Queens, Flushing | No.2 with queen size bed | private room | H Ai | July 07 2019 | 2019-07-07 |
| 25206 | 36438336 | 235 dollars | Staten Island, Great Kills | Seas The Moment | private room | Ben | July 07 2019 | 2019-07-07 |
| 25207 | 36442252 | 100 dollars | Bronx, Mott Haven | 1B-1B apartment near by Metro | entire home/apt | Blaine | July 07 2019 | 2019-07-07 |


In [52]:
# strip the "dollars" suffix and convert the price strings to floats
airbnb_listings["price"] = airbnb_listings["price"].astype(str)
airbnb_listings["clean_price"] = (
    airbnb_listings["price"]
    .str.replace("dollars", "", regex=False)  # literal match, not a regex
    .str.strip()                              # drop the leftover space
    .astype(float)
)

airbnb_listings

|   | listing_id | price | nbhood_full | description | room_type | host_name | last_review | last_review_date | clean_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2595 | 225 dollars | Manhattan, Midtown | Skylit Midtown Castle | entire home/apt | Jennifer | May 21 2019 | 2019-05-21 | 225.0 |
| 1 | 3831 | 89 dollars | Brooklyn, Clinton Hill | Cozy Entire Floor of Brownstone | entire home/apt | LisaRoxanne | July 05 2019 | 2019-07-05 | 89.0 |
| 2 | 5099 | 200 dollars | Manhattan, Murray Hill | Large Cozy 1 BR Apartment In Midtown East | entire home/apt | Chris | June 22 2019 | 2019-06-22 | 200.0 |
| 3 | 5178 | 79 dollars | Manhattan, Hell's Kitchen | Large Furnished Room Near B'way | private room | Shunichi | June 24 2019 | 2019-06-24 | 79.0 |
| 4 | 5238 | 150 dollars | Manhattan, Chinatown | Cute & Cozy Lower East Side 1 bdrm | entire home/apt | Ben | June 09 2019 | 2019-06-09 | 150.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25204 | 36425863 | 129 dollars | Manhattan, Upper East Side | Lovely Privet Bedroom with Privet Restroom | private room | Rusaa | July 07 2019 | 2019-07-07 | 129.0 |
| 25205 | 36427429 | 45 dollars | Queens, Flushing | No.2 with queen size bed | private room | H Ai | July 07 2019 | 2019-07-07 | 45.0 |
| 25206 | 36438336 | 235 dollars | Staten Island, Great Kills | Seas The Moment | private room | Ben | July 07 2019 | 2019-07-07 | 235.0 |
| 25207 | 36442252 | 100 dollars | Bronx, Mott Haven | 1B-1B apartment near by Metro | entire home/apt | Blaine | July 07 2019 | 2019-07-07 | 100.0 |
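`astype(float)` raises a `ValueError` as soon as it meets a string it cannot parse. The rows shown here all parse cleanly; if the column might contain unexpected entries, `pd.to_numeric(..., errors="coerce")` turns them into `NaN` so they can be counted and inspected before averaging. A sketch using three prices from this output plus one hypothetical bad entry, `"n/a"`:

```python
import pandas as pd

# Three prices from the output above plus one hypothetical bad entry
price = pd.Series(["225 dollars", "89 dollars", "200 dollars", "n/a"])

clean_price = pd.to_numeric(
    price.str.replace("dollars", "", regex=False).str.strip(),
    errors="coerce",  # unparseable values become NaN instead of raising
)

print(clean_price.tolist())           # [225.0, 89.0, 200.0, nan]
print(int(clean_price.isna().sum()))  # 1
```

Note that `Series.mean()` skips `NaN` by default, so coerced values silently drop out of the average unless they are checked first.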


In [53]:
# the task asks for the mean rounded to two decimal places
avg_listing_price = round(airbnb_listings.clean_price.mean(), 2)
avg_listing_price

141.78
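Python's built-in `round(x, 2)` returns a number rounded to two decimal places, whereas an f-string such as `f"{x:.2f}"` only formats the value for display and returns a string. A quick check using the unrounded mean this dataset produces, 141.7779364512674, plus a hypothetical toy column:

```python
import pandas as pd

# Unrounded mean reported for this dataset
raw_mean = 141.7779364512674
avg_price = round(raw_mean, 2)        # still a float
display_only = f"{raw_mean:.2f}"      # a string, for printing only

# The same rounding applied to a hypothetical price column
prices = pd.Series([225.0, 89.0, 80.0])
toy_avg = round(float(prices.mean()), 2)

print(avg_price)     # 141.78
print(display_only)  # 141.78
print(toy_avg)       # 131.33
```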

4. Combine the new variables into one DataFrame called `review_dates` with four columns in the following order: `first_reviewed`, `last_reviewed`, `nb_private_rooms`, and `avg_price`. The DataFrame should contain only one row of values.

In [55]:
review_dates = pd.DataFrame(
    {
        "first_reviewed": [earliest_date],
        "last_reviewed": [recent_date],
        "nb_private_rooms": [number_of_pivate_room],
        # column name required by the task; round() is a no-op if already rounded
        "avg_price": [round(avg_listing_price, 2)],
    }
)

review_dates

|   | first_reviewed | last_reviewed | nb_private_rooms | avg_price |
|---|---|---|---|---|
| 0 | 2019-01-01 | 2019-07-09 | 11356 | 141.78 |
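A dict of one-element lists sets the column order by key insertion order. An alternative sketch passes a list holding a single dict together with an explicit `columns=` list, which makes the required order visible in one place. With `columns=`, a misspelled dict key does not raise; it simply leaves the intended column filled with `NaN`, so a final missing-value check catches the typo. The values below are taken from this notebook's outputs, with the average rounded to 141.78:

```python
import pandas as pd

# Column order required by the task
columns = ["first_reviewed", "last_reviewed", "nb_private_rooms", "avg_price"]

summary = {
    "first_reviewed": pd.Timestamp("2019-01-01"),
    "last_reviewed": pd.Timestamp("2019-07-09"),
    "nb_private_rooms": 11356,
    "avg_price": 141.78,
}

# A list holding one dict yields exactly one row; columns= fixes the order
review_dates = pd.DataFrame([summary], columns=columns)

# A mistyped key would leave a NaN column behind, so check for gaps
assert not review_dates.isna().any().any()
print(review_dates.shape)  # (1, 4)
```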
