# Short-term Rental Price in NYC EDA

A property management company rents rooms and properties for short periods on various rental platforms. The company needs to estimate the typical price for a given property based on the prices of similar properties. To that end, we will build a model based on historical data. Before starting the learning task, we perform an exploratory data analysis (EDA) to gain a deeper understanding of the data.

In [1]:
import wandb
import pandas as pd
import pandas_profiling

To guarantee reproducibility, we will use the W&B data versioning system:

In [2]:
run = wandb.init(
    project="nyc_airbnb", 
    group="eda",
    save_code=True
)

[34m[1mwandb[0m: Currently logged in as: [33mheber[0m (use `wandb login --relogin` to force relogin)
[34m[1mwandb[0m: wandb version 0.12.11 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


We download the data:

In [3]:
local_path = wandb.use_artifact("sample.csv:latest").file()
df = pd.read_csv(filepath_or_buffer=local_path)

Then we will use pandas_profiling to automate the data analysis:

In [None]:
profile = pandas_profiling.ProfileReport(df)
profile.to_notebook_iframe()


The report reveals the following:

1. There are missing values in a few columns; we will impute them within the inference pipeline.
2. The column last_review holds dates but is stored as strings; we will convert it to a datetime type.
3. The price column has some outliers. After talking with the stakeholders, we decided to keep only prices from a minimum of 10 to a maximum of 350 per night.
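
The notebook defers imputation to the inference pipeline and does not say which strategy is used there. As a minimal sketch of what the first point involves, the example below counts missing values per column and fills a numeric column with its median. The toy values, the column name `reviews_per_month`, and the choice of median imputation are all assumptions made for illustration.

```python
import pandas as pd

# Toy data with made-up values; reviews_per_month is an assumed column name.
toy = pd.DataFrame(
    {
        "price": [80, 120, 95, 60],
        "reviews_per_month": [0.2, None, 1.5, 3.0],
    }
)

# Count missing values per column to see where imputation is needed.
missing_per_column = toy.isna().sum()

# Median imputation is one common choice for numeric columns (an assumption here,
# not necessarily the strategy used in the actual pipeline).
median_value = toy["reviews_per_month"].median()
imputed = toy.assign(
    reviews_per_month=toy["reviews_per_month"].fillna(median_value)
)
```

`assign` returns a new DataFrame, so `toy` keeps its missing value and the imputed copy can be compared against it.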

In [None]:
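# Drop price outliers: keep rows with min_price <= price <= max_price
# (Series.between is inclusive on both ends by default)
min_price = 10
max_price = 350
idx = df['price'].between(min_price, max_price)
df = df[idx].copy()
# Convert last_review from string to datetime
df['last_review'] = pd.to_datetime(df['last_review'])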
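
Before re-running the profile, it can help to confirm that the cleaning in the cell above took effect. The following is only a sketch: `check_cleaned` is a hypothetical helper, not part of the original notebook, and the toy frame uses made-up values so the example runs on its own.

```python
import pandas as pd


def check_cleaned(df: pd.DataFrame, min_price: float, max_price: float) -> None:
    """Hypothetical helper: raise AssertionError if the cleaning did not take effect."""
    # Every remaining price must lie inside the agreed range (both ends inclusive).
    assert df["price"].between(min_price, max_price).all(), "price out of range"
    # last_review must be a datetime column, not a column of strings.
    assert pd.api.types.is_datetime64_any_dtype(df["last_review"]), "last_review is not datetime"


# Toy frame with made-up values, cleaned the same way as in the cell above.
toy = pd.DataFrame(
    {
        "price": [5, 10, 200, 350, 351],
        "last_review": ["2019-01-01", "2019-02-01", None, "2019-03-01", "2019-04-01"],
    }
)
toy = toy[toy["price"].between(10, 350)].copy()
toy["last_review"] = pd.to_datetime(toy["last_review"])

check_cleaned(toy, 10, 350)  # passes silently when the frame is clean
kept_prices = toy["price"].tolist()
```

Because the bounds are inclusive, prices of exactly 10 and 350 are kept, while 5 and 351 are dropped.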

In [None]:
profile = pandas_profiling.ProfileReport(df)
profile.to_notebook_iframe()

In [None]:
run.finish()