![NYC Skyline](nyc.jpg)

Welcome to New York City, one of the most-visited cities in the world. Many Airbnb listings there meet the high demand for temporary lodging, with stays ranging from a few nights to many months. In this project, we will take a closer look at the New York Airbnb market by combining data from multiple file types: `.csv`, `.tsv`, and `.xlsx`.

Recall that **CSV**, **TSV**, and **Excel** files are three common formats for storing data. 
Three files containing data on 2019 Airbnb listings are available to you:

**data/airbnb_price.csv**
This is a CSV file containing data on Airbnb listing prices and locations.
- **`listing_id`**: unique identifier of listing
- **`price`**: nightly listing price in USD
- **`nbhood_full`**: name of borough and neighborhood where listing is located

**data/airbnb_room_type.xlsx**
This is an Excel file containing data on Airbnb listing descriptions and room types.
- **`listing_id`**: unique identifier of listing
- **`description`**: listing description
- **`room_type`**: Airbnb has three types of rooms: shared rooms, private rooms, and entire homes/apartments

**data/airbnb_last_review.tsv**
This is a TSV file containing data on Airbnb host names and review dates.
- **`listing_id`**: unique identifier of listing
- **`host_name`**: name of listing host
- **`last_review`**: date when the listing was last reviewed
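
Because all three files share the `listing_id` column, they could be combined into a single table by joining on that key. The following is a minimal sketch of that idea, not a step taken in this notebook: the three small frames and their values are made up for illustration, and the names `prices`, `room_types`, `reviews`, and `listings` are hypothetical.

```python
import pandas as pd

# Tiny made-up frames standing in for the three files (values are illustrative only)
prices = pd.DataFrame({"listing_id": [1, 2, 3],
                       "price": ["100 dollars", "80 dollars", "150 dollars"]})
room_types = pd.DataFrame({"listing_id": [1, 2, 3],
                           "room_type": ["Private room", "shared room", "ENTIRE HOME/APT"]})
reviews = pd.DataFrame({"listing_id": [1, 2, 3],
                        "last_review": ["May 21 2019", "June 09 2019", "July 05 2019"]})

# Chain two inner joins on the shared key
listings = prices.merge(room_types, on="listing_id").merge(reviews, on="listing_id")
print(listings.shape)  # (3, 4)
```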

In [64]:
# Import necessary packages
import pandas as pd
import numpy as np

# Load each file with the reader that matches its format
airbnb_price = pd.read_csv("data/airbnb_price.csv")
airbnb_last_review = pd.read_csv("data/airbnb_last_review.tsv", sep="\t")
airbnb_room_type = pd.read_excel("data/airbnb_room_type.xlsx")

In [65]:
airbnb_last_review.head()

   listing_id    host_name   last_review
0        2595     Jennifer   May 21 2019
1        3831  LisaRoxanne  July 05 2019
2        5099        Chris  June 22 2019
3        5178     Shunichi  June 24 2019
4        5238          Ben  June 09 2019


In [66]:
# Parse the review dates and keep only the calendar date
airbnb_last_review["last_review"] = pd.to_datetime(airbnb_last_review["last_review"]).dt.date
earliest_review = airbnb_last_review["last_review"].min()
most_recent_review = airbnb_last_review["last_review"].max()
most_recent_review

datetime.date(2019, 7, 9)
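
The review dates look like `May 21 2019`, so one option is to pass an explicit format string to `pd.to_datetime` instead of relying on format inference. The sketch below assumes every value follows the "full month name, day, year" pattern and an English locale; the sample values and the names `sample_dates`, `parsed`, `first_date`, and `last_date` are illustrative only.

```python
import pandas as pd

# Illustrative values in the same style as the last_review column
sample_dates = pd.Series(["May 21 2019", "July 05 2019", "June 09 2019"])

# %B = full month name, %d = day of month, %Y = four-digit year
parsed = pd.to_datetime(sample_dates, format="%B %d %Y")

# Timestamp.date() returns a plain datetime.date
first_date = parsed.min().date()
last_date = parsed.max().date()
print(first_date, last_date)  # 2019-05-21 2019-07-05
```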

In [67]:
airbnb_room_type.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25209 entries, 0 to 25208
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   listing_id   25209 non-null  int64 
 1   description  25199 non-null  object
 2   room_type    25209 non-null  object
dtypes: int64(1), object(2)
memory usage: 591.0+ KB


In [68]:
airbnb_room_type["room_type"].value_counts()

Entire home/apt    8458
Private room       7241
entire home/apt    2665
private room       2248
ENTIRE HOME/APT    2143
PRIVATE ROOM       1867
Shared room         380
shared room         110
SHARED ROOM          97
Name: room_type, dtype: int64

In [69]:
# Normalize capitalization so each room type is counted once
airbnb_room_type["room_type"] = airbnb_room_type["room_type"].str.lower()
airbnb_room_type["room_type"].value_counts()

entire home/apt    13266
private room       11356
shared room          587
Name: room_type, dtype: int64

In [70]:
# Look up the count by label rather than by position
private_rooms_count = airbnb_room_type["room_type"].value_counts()["private room"]
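
Looking up a count by its label rather than by its position keeps the result independent of how `value_counts()` happens to order the categories. A small sketch with made-up values (the names `sample_types`, `counts`, and `private_count` are hypothetical):

```python
import pandas as pd

# Mixed capitalization, as in the raw room_type column (values are illustrative)
sample_types = pd.Series(["Private room", "private room", "PRIVATE ROOM",
                          "Entire home/apt", "shared room"])

# Lowercase first so the three spellings collapse into one category
counts = sample_types.str.lower().value_counts()

# Select by label; this does not depend on the sort order of the counts
private_count = int(counts["private room"])
print(private_count)  # 3
```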

In [71]:
airbnb_price.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25209 entries, 0 to 25208
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   listing_id   25209 non-null  int64 
 1   price        25209 non-null  object
 2   nbhood_full  25209 non-null  object
dtypes: int64(1), object(2)
memory usage: 591.0+ KB


In [72]:
airbnb_price.head()

   listing_id        price                nbhood_full
0        2595  225 dollars         Manhattan, Midtown
1        3831   89 dollars     Brooklyn, Clinton Hill
2        5099  200 dollars     Manhattan, Murray Hill
3        5178   79 dollars  Manhattan, Hell's Kitchen
4        5238  150 dollars       Manhattan, Chinatown


In [73]:
# Strip the "dollars" suffix and convert prices to integers
airbnb_price["price"] = airbnb_price["price"].str.replace(r"dollars?", "", regex=True).str.strip()
airbnb_price["price"] = airbnb_price["price"].astype("int")
airbnb_price["price"].unique()

array([ 225,   89,  200,   79,  150,  135,   85,  140,  215,   99,  130,
         80,  110,  120,   60,   44,  180,   50,   52,   55,   70,   40,
         68,  151,  228,  144,   69,   49,  250,  275,   51,   65,  105,
        190,   95,  145,  285,   94,  131,   98,  175,  500,  101,  125,
        100,   59,  325,  235,  170,  185,  115,   77,   76,  160,  195,
        156,  219,  165,  196,  350,   90,   75,  299,   83,  123,  265,
        249,  121,   45,   71,  199,   64,  159,  189,  239,  305,  155,
         92,   36,   37,  205,   39,  390,  129,  212,  124,  122,  109,
        575,  169,  179,  349,  139,   67,  211,  290,  395,   97,  259,
        295,  451,  300,  255,   72,   88,   42,  198,   46,   33,   91,
        400,  429,   43,  149,  248,   41,  230,  146,  116,  220,  288,
        438,  279,  137,  226,  154,  700,  850,   54,  495,  760,  153,
         73,  167,   96,   34,   93,  402,  800,  240,  209,  157,   86,
        106,   87,   56,  549,   20,  104,  298,  1

In [74]:
# Mean nightly price, rounded to two decimal places
average_price = airbnb_price["price"].mean().round(2)
average_price

141.78
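
The price cleaning and averaging can be checked on a few values small enough to verify by hand. This sketch uses made-up prices and assumes every entry is a whole number followed by "dollar" or "dollars"; the names `sample_prices`, `cleaned`, and `mean_price` are hypothetical.

```python
import pandas as pd

# Illustrative raw prices in the same style as the price column
sample_prices = pd.Series(["225 dollars", "89 dollars", "1 dollar"])

# Remove the trailing unit (singular or plural) and any whitespace before it
cleaned = sample_prices.str.replace(r"\s*dollars?$", "", regex=True).astype(int)

# (225 + 89 + 1) / 3 = 105.0
mean_price = round(float(cleaned.mean()), 2)
print(mean_price)  # 105.0
```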

In [75]:
# Collect the four results in a one-row summary DataFrame
review_dates = pd.DataFrame({
    'first_reviewed': [earliest_review],
    'last_reviewed': [most_recent_review],
    'nb_private_rooms': [private_rooms_count],
    'avg_price': [average_price]
})