# NYC Airbnb Dataset - Exploratory Data Analysis (EDA)
In this notebook, we load the NYC Airbnb dataset and apply basic preprocessing. We will:
1. Load the dataset from a Weights & Biases (wandb) artifact.
2. Clean the dataset by removing listings with outlier prices.
3. Convert relevant columns to appropriate data types for analysis.

In [1]:
import wandb
import pandas as pd


## Initialize W&B Run
We initialize a Weights & Biases (wandb) run to track this EDA experiment.


In [2]:
run = wandb.init(project="nyc_airbnb", group="eda", save_code=True) 

wandb: Currently logged in as: schuyler-helder (schuyler-helder-s). Use `wandb login --relogin` to force relogin


## Load Dataset
We use the latest version of the `sample.csv` artifact stored in W&B.


In [3]:
local_path = wandb.use_artifact("sample.csv:latest").file() 
df = pd.read_csv(local_path)
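
Before cleaning, it can help to confirm that the columns used later are present in the loaded frame. The following is a minimal sketch, not part of the original notebook: `missing_columns` is a hypothetical helper, and the small in-memory CSV is an assumed stand-in for the downloaded `sample.csv`.

```python
import io

import pandas as pd


def missing_columns(frame, required):
    """Return the names in `required` that are absent from `frame` (hypothetical helper)."""
    return [col for col in required if col not in frame.columns]


# Toy stand-in for the downloaded artifact; the real file has more columns.
toy_csv = io.StringIO("id,price,last_review\n1,120,2019-05-21\n2,80,\n")
toy_df = pd.read_csv(toy_csv)

# An empty list means every required column is available.
print(missing_columns(toy_df, ["price", "last_review"]))
print(missing_columns(toy_df, ["price", "room_type"]))
```

On the real data, the same check could be run as `missing_columns(df, ["price", "last_review"])` right after `pd.read_csv`.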

## Data Cleaning: Remove Outliers
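We keep only listings whose price falls between 10 and 350 (inclusive); prices outside this range are likely to be errors or not representative of a typical Airbnb listing. In the same cell, we convert the `last_review` column from strings to datetimes.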


In [4]:
# Drop outliers: keep rows whose price lies in [min_price, max_price]
min_price = 10
max_price = 350
idx = df['price'].between(min_price, max_price)
df = df[idx].copy()

# Convert last_review to datetime
df['last_review'] = pd.to_datetime(df['last_review'])
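
To illustrate how this cell behaves at the boundaries, here is a small sketch on made-up data (the `toy` frame and its values are assumptions for illustration, not rows from the dataset). It shows that `Series.between` includes both endpoints by default and that missing review dates become `NaT` after `pd.to_datetime`.

```python
import pandas as pd

# Made-up rows: two prices fall outside the range, two review dates are missing.
toy = pd.DataFrame({
    "price": [5, 10, 200, 350, 351],
    "last_review": ["2019-01-15", None, "2019-03-02", "2019-06-30", None],
})

min_price, max_price = 10, 350

# between() is inclusive on both ends by default, so 10 and 350 are kept.
kept = toy[toy["price"].between(min_price, max_price)].copy()

# Missing values are converted to NaT rather than raising an error.
kept["last_review"] = pd.to_datetime(kept["last_review"])

print(kept["price"].tolist())
print(kept["last_review"].isna().sum())
```

The `.copy()` call mirrors the notebook's cell: it makes `kept` an independent frame, so assigning the converted column does not trigger pandas' chained-assignment warning.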

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 19001 entries, 0 to 19999
Data columns (total 16 columns):
 #   Column                          Non-Null Count  Dtype         
---  ------                          --------------  -----         
 0   id                              19001 non-null  int64         
 1   name                            18994 non-null  object        
 2   host_id                         19001 non-null  int64         
 3   host_name                       18993 non-null  object        
 4   neighbourhood_group             19001 non-null  object        
 5   neighbourhood                   19001 non-null  object        
 6   latitude                        19001 non-null  float64       
 7   longitude                       19001 non-null  float64       
 8   room_type                       19001 non-null  object        
 9   price                           19001 non-null  int64         
 10  minimum_nights                  19001 non-null  int64         
 11  number_

## Finish W&B Run
We finish the wandb run so that everything logged during this session is uploaded and the run is marked as complete.


In [6]:
run.finish()