# Exploratory Data Analysis on Airbnb Listings
#### Objective: Use the Airbnb NYC listings dataset (https://www.kaggle.com/datasets/dgomonov/new-york-city-airbnb-open-data) to clean messy data, extract insights, and uncover pricing dynamics.


In [1]:
import pandas as pd
pd.set_option('display.max_columns', None)
df = pd.read_csv("AB_NYC_2019.csv")
df

,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.94190,Private room,150,3,0,,,1,365
3,3831,Cozy Entire Floor of Brownstone,4869,LisaRoxanne,Brooklyn,Clinton Hill,40.68514,-73.95976,Entire home/apt,89,1,270,2019-07-05,4.64,1,194
4,5022,Entire Apt: Spacious Studio/Loft by central park,7192,Laura,Manhattan,East Harlem,40.79851,-73.94399,Entire home/apt,80,10,9,2018-11-19,0.10,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48890,36484665,Charming one bedroom - newly renovated rowhouse,8232441,Sabrina,Brooklyn,Bedford-Stuyvesant,40.67853,-73.94995,Private room,70,2,0,,,2,9
48891,36485057,Affordable room in Bushwick/East Williamsburg,6570630,Marisol,Brooklyn,Bushwick,40.70184,-73.93317,Private room,40,4,0,,,2,36
48892,36485431,Sunny Studio at Historical Neighborhood,23492952,Ilgar & Aysel,Manhattan,Harlem,40.81475,-73.94867,Entire home/apt,115,10,0,,,1,27
48893,36485609,43rd St. Time Square-cozy single bed,30985759,Taz,Manhattan,Hell's Kitchen,40.75751,-73.99112,Shared room,55,1,0,,,6,2


In [4]:
df.isnull().sum()
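
The preview shows listings with `number_of_reviews` equal to 0 whose `last_review` and `reviews_per_month` are empty, so the null counts are worth acting on. Here is a minimal sketch of one way to handle those gaps. The toy frame and the helper name `fill_review_gaps` are made up for illustration, and treating a missing `reviews_per_month` as 0 is an assumption, not something the dataset documents.

```python
import pandas as pd

# Toy frame mirroring three columns of the dataset (illustrative values only).
toy = pd.DataFrame({
    "number_of_reviews": [9, 0, 45],
    "last_review": ["2018-10-19", None, "2019-05-21"],
    "reviews_per_month": [0.21, None, 0.38],
})

def fill_review_gaps(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the review-related gaps handled (hypothetical helper)."""
    out = frame.copy()
    # Assumption: a missing reviews_per_month means the listing has no reviews yet.
    out["reviews_per_month"] = out["reviews_per_month"].fillna(0)
    # Parse dates; listings without any review keep a missing value (NaT).
    out["last_review"] = pd.to_datetime(out["last_review"])
    return out

cleaned = fill_review_gaps(toy)
print(cleaned.isnull().sum())
```

On the real data the same helper could be called as `fill_review_gaps(df)`; other columns with nulls, if any, would need their own decision.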

## 💡 Tasks:


In [6]:

# Clean price fields ($1,200 → 1200)
# The price column was already clean on inspection: no symbols such as "$" were present.
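
For a version of the data where prices arrive as strings like `$1,200`, the cleaning step could look roughly like this. The function name `clean_price` and the sample values are hypothetical; the sketch assumes prices use `$` and `,` as the only decorations.

```python
import pandas as pd

def clean_price(series: pd.Series) -> pd.Series:
    """Strip '$' and thousands separators, then convert to numbers."""
    stripped = series.astype(str).str.replace(r"[$,]", "", regex=True).str.strip()
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    return pd.to_numeric(stripped, errors="coerce")

# Illustrative inputs, not taken from the dataset.
raw = pd.Series(["$1,200", "149", "$80.00", "n/a"])
cleaned_prices = clean_price(raw)
print(cleaned_prices.tolist())
```

Coercing bad values to NaN keeps the conversion from failing halfway through and lets `isnull().sum()` report how many entries could not be parsed.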


# Analyze:


## Top 10 neighborhoods by revenue


In [44]:
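# "Revenue" here is the sum of listed nightly prices per neighbourhood,
# a rough proxy because the dataset contains no booking data.
dx = df.groupby("neighbourhood")[["price"]].sum().sort_values(by="price", ascending=False)
dx.head(10)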

neighbourhood,price
Williamsburg,442227
Bedford-Stuyvesant,332817
Hell's Kitchen,284545
Upper West Side,276560
Midtown,263837
East Village,260242
Harlem,258195
Upper East Side,235061
Chelsea,183424
Bushwick,165321
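
Summing listed prices rewards neighbourhoods that simply have many listings. One possible refinement weights each price by an estimate of booked nights. The sketch below assumes, loudly, that every night a listing is not available (`365 - availability_365`) was booked at the listed price; that is a simplification, since hosts also block dates for other reasons. The toy frame and the function name are hypothetical.

```python
import pandas as pd

# Toy data with the same column names as the dataset (illustrative values only).
toy = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "B"],
    "price": [100, 200, 150, 50],
    "availability_365": [365, 165, 265, 0],
})

def estimated_revenue_by_neighbourhood(frame: pd.DataFrame) -> pd.Series:
    """Sum price * assumed booked nights per neighbourhood, largest first."""
    # Assumption: unavailable nights were booked at the listed price.
    booked_nights = 365 - frame["availability_365"]
    estimate = frame["price"] * booked_nights
    return estimate.groupby(frame["neighbourhood"]).sum().sort_values(ascending=False)

ranking = estimated_revenue_by_neighbourhood(toy)
print(ranking.head(10))
```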


## Average price per room type


In [48]:
dx = df.groupby("room_type")[["price"]].mean().sort_values(by="price", ascending=False)
dx.head(10)

room_type,price
Entire home/apt,196.315929
Private room,83.985272
Shared room,63.213948
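
Because the dataset contains prices in the thousands, the mean can be pulled upward by a handful of listings. A minimal sketch of reporting the median and the listing count next to the mean, on made-up values:

```python
import pandas as pd

# Toy data: one extreme price in the first group (illustrative values only).
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 3 + ["Private room"] * 3,
    "price": [150, 200, 10000, 60, 80, 100],
})

# Mean, median, and number of listings per room type in one table.
summary = toy.groupby("room_type")["price"].agg(["mean", "median", "count"])
print(summary)
```

In the toy data the single 10000 listing lifts the mean for `Entire home/apt` far above its median, which is the kind of gap worth checking for on the real `df`.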


## Availability vs. price correlation


In [50]:
df[['availability_365', "price"]].corr()

,availability_365,price
availability_365,1.0,0.078276
price,0.078276,1.0
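
A coefficient of about 0.08 indicates almost no linear relationship between availability and price. `corr()` defaults to Pearson, which is sensitive to extreme prices, so a rank-based (Spearman) coefficient can serve as a cross-check. The sketch below uses made-up values to show how the two can differ:

```python
import pandas as pd

# Toy data: price rises with availability, but the last value is extreme.
toy = pd.DataFrame({
    "availability_365": [0, 30, 90, 180, 365],
    "price": [50, 60, 80, 120, 5000],
})

pearson = toy.corr(method="pearson")
spearman = toy.corr(method="spearman")  # correlation of the ranks

print(pearson.loc["availability_365", "price"])
print(spearman.loc["availability_365", "price"])
```

On the real data the same call would be `df[["availability_365", "price"]].corr(method="spearman")`.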


## Hosts with >10 listings (potential businesses)


In [68]:
# Count listings per host_id, then keep hosts with more than 10 listings
dz = df["host_id"].value_counts()
listing_count_greater_than_10 = dz[dz > 10]
listing_count_greater_than_10

host_id
219517861    207
61391963      79
16098958      61
137358866     51
7503643       49
            ... 
14898658      11
10457196      11
310670        11
5144567       11
4291007       11
Name: count, Length: 65, dtype: int64
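
The output above lists only `host_id` values. A sketch of one way to attach host names and compare the counts against the dataset's own `calculated_host_listings_count` column follows. The helper name `hosts_above` and the toy rows are hypothetical, and the threshold is lowered to 1 so the small example returns something.

```python
import pandas as pd

# Toy data with the same column names as the dataset (illustrative values only).
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3, 3],
    "host_name": ["Ann", "Ann", "Ann", "Bob", "Cy", "Cy"],
    "calculated_host_listings_count": [3, 3, 3, 1, 2, 2],
})

def hosts_above(frame: pd.DataFrame, threshold: int) -> pd.DataFrame:
    """Hosts with more than `threshold` listings, largest first."""
    per_host = frame.groupby("host_id").agg(
        host_name=("host_name", "first"),       # first non-null name per host
        listings=("host_id", "size"),           # rows (listings) per host
        reported=("calculated_host_listings_count", "first"),
    )
    big = per_host[per_host["listings"] > threshold]
    return big.sort_values("listings", ascending=False).reset_index()

big_hosts = hosts_above(toy, threshold=1)
print(big_hosts)
```

Where `listings` and `reported` disagree on the real data, the difference would need investigating before either number is trusted.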

## Detect outliers (e.g., prices > $1000/night)


In [70]:
dx = df[["host_name", 'neighbourhood', "room_type", "minimum_nights", "price"]]
dx[dx['price'] > 1000]

,host_name,neighbourhood,room_type,minimum_nights,price
496,Henry,Upper West Side,Entire home/apt,30,2000
762,West Village,West Village,Entire home/apt,5,1300
1480,Henry,Upper West Side,Entire home/apt,30,2000
2018,Martin,East Village,Entire home/apt,30,2500
2236,Loretta,Carroll Gardens,Entire home/apt,1,1395
...,...,...,...,...,...
45185,Cheryl,Crown Heights,Entire home/apt,1,2500
45666,Sandra,East Flatbush,Private room,1,7500
45967,Matt,Greenwich Village,Entire home/apt,31,1099
46533,Viberlyn,Chelsea,Entire home/apt,30,2995
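
The $1000 cutoff is a fixed threshold. A data-driven alternative is the interquartile range (IQR) rule, which flags values more than `k` times the IQR beyond the quartiles. This is a swapped-in technique, not the method used above; the function name `iqr_outliers`, the default `k=1.5`, and the toy prices are illustrative choices.

```python
import pandas as pd

def iqr_outliers(frame: pd.DataFrame, column: str = "price", k: float = 1.5) -> pd.DataFrame:
    """Rows whose `column` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return frame[(frame[column] < lower) | (frame[column] > upper)]

# Toy prices (illustrative values only).
toy = pd.DataFrame({"price": [40, 55, 70, 80, 89, 115, 149, 150, 225, 7500]})
outliers = iqr_outliers(toy)
print(outliers)
```

On skewed price data this rule usually flags far more rows than a fixed $1000 cutoff would, so the choice between the two depends on what counts as unusual for the analysis.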
