# Airbnb NYC Analysis

This notebook analyzes Airbnb listings data from the provided Excel file (`Airbnb_Open_Data.xlsx`).

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Display options
pd.set_option('display.max_columns', None)
sns.set_style("whitegrid")

In [None]:
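# Load Excel data (/content/ is the default working directory in Google Colab; adjust the path elsewhere)
file_path = "/content/Airbnb_Open_Data.xlsx"
# pandas reads .xlsx files through the openpyxl package, which must be installed
df = pd.read_excel(file_path, sheet_name="in")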

# Preview data
df.head()

In [None]:
# Dataset information
df.info()
df.describe(include='all').T.head(20)
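
`describe` shows counts per column, but the gaps are easier to rank as fractions. The sketch below assumes a hypothetical helper, `missing_summary`, that reports the share of missing values per column, largest first. The demo frame is illustrative and is not the real dataset.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.Series:
    """Return the share of missing values per column, sorted from most to least missing."""
    shares = frame.isna().mean()
    # Keep only columns that actually have gaps
    return shares[shares > 0].sort_values(ascending=False)


# Small illustrative frame (not the Airbnb data)
demo = pd.DataFrame({
    "price": [100, np.nan, 250, 80],
    "reviews per month": [np.nan, np.nan, 1.2, 0.5],
    "room type": ["Private room", "Entire home/apt", "Entire home/apt", "Shared room"],
})
print(missing_summary(demo))
```

On the notebook's data the call would be `missing_summary(df)`. Run it before the cleaning cell to see which `fillna` choices matter most.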

In [None]:
# Convert date columns
if 'last review' in df.columns:
    df['last review'] = pd.to_datetime(df['last review'], errors='coerce')

# Treat missing review counts and rates as zero (assumes a missing value means the listing has no reviews)
df['number of reviews'] = df['number of reviews'].fillna(0)
df['reviews per month'] = df['reviews per month'].fillna(0)

# Drop exact duplicate rows
df = df.drop_duplicates()

df.shape
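
The histogram and correlation cells need `price` to be numeric. The Excel sheet may store prices as text such as "$1,060", which is an assumption you can check with `df['price'].dtype`. In that case, a small helper can strip the symbols before plotting. `to_numeric_currency` is a hypothetical name, and it assumes the text contains only `$`, thousands separators and whitespace around the number.

```python
import pandas as pd


def to_numeric_currency(values: pd.Series) -> pd.Series:
    """Convert strings like '$1,060 ' to floats; unparseable entries become NaN."""
    # Assumption: currency text only contains '$', thousands separators and whitespace
    cleaned = values.astype(str).str.replace(r"[$,\s]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


raw = pd.Series(["$966 ", "$1,060", None, 142.0])
print(to_numeric_currency(raw).tolist())
```

If the check shows text values, apply it with `df['price'] = to_numeric_currency(df['price'])`. The same call works for any other money column stored the same way. Values that are already numeric pass through unchanged.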

In [None]:
plt.figure(figsize=(10,5))
# binrange restricts the 100 bins to 0-1000, so prices above 1,000 are left out of the plot
sns.histplot(df['price'], bins=100, binrange=(0, 1000))
plt.title("Price Distribution (prices up to 1,000)")
plt.xlabel("Price")
plt.ylabel("Count")
plt.show()
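
The plot hides everything above 1,000, so check how many listings fall outside that window before reading its shape. The sketch below uses a hypothetical helper, `share_above`, with the same 1,000 cutoff as the plot. The quantiles offer a data-driven alternative to a hard-coded limit.

```python
import pandas as pd


def share_above(prices: pd.Series, cutoff: float) -> float:
    """Fraction of non-missing prices strictly greater than `cutoff`."""
    valid = prices.dropna()
    if valid.empty:
        return 0.0
    return float((valid > cutoff).mean())


demo_prices = pd.Series([50, 120, 300, 950, 1200, None])
print(share_above(demo_prices, 1000))  # 1 of 5 valid prices is above the cutoff
print(demo_prices.quantile([0.5, 0.9, 0.99]))
```

On the notebook's data, `share_above(df['price'], 1000)` shows what fraction of listings the histogram omits.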

In [None]:
plt.figure(figsize=(7,5))
sns.countplot(data=df, x='room type', order=df['room type'].value_counts().index)
plt.title("Room Type Distribution")
plt.xlabel("Room Type")
plt.ylabel("Count")
plt.show()
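
Raw counts show volume, but the claim that entire homes dominate is easier to state as a percentage. A minimal sketch, assuming a hypothetical helper `category_shares` built on `value_counts(normalize=True)`. The demo frame is synthetic.

```python
import pandas as pd


def category_shares(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of rows in each category of `column`, largest first (missing values excluded)."""
    return (frame[column].value_counts(normalize=True) * 100).round(1)


demo = pd.DataFrame({"room type": ["Entire home/apt", "Private room", "Entire home/apt", "Shared room"]})
print(category_shares(demo, "room type"))
```

The same call works for `'neighbourhood group'` or any other categorical column in `df`.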

In [None]:
plt.figure(figsize=(7,5))
sns.countplot(data=df, x='neighbourhood group', order=df['neighbourhood group'].value_counts().index)
plt.title("Neighbourhood Group Distribution")
plt.xlabel("Neighbourhood Group")
plt.ylabel("Count")
plt.show()
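
Group counts are only meaningful if each group is spelled one way. If `df['neighbourhood group'].value_counts()` shows variants that differ in spacing or spelling, merge them before plotting. A minimal sketch follows. The helper name and the correction map are illustrative assumptions, so build the map from the variants your own data actually contains.

```python
import pandas as pd


def normalize_labels(values: pd.Series, corrections: dict) -> pd.Series:
    """Trim whitespace, then map known misspellings (compared case-insensitively) to canonical labels."""
    trimmed = values.str.strip()
    lookup = {k.lower(): v for k, v in corrections.items()}
    # Entries not in the lookup keep their trimmed original text; missing values stay missing
    return trimmed.map(lambda s: lookup.get(s.lower(), s) if isinstance(s, str) else s)


# Hypothetical misspellings, for illustration only
fixes = {"brookln": "Brooklyn", "manhatan": "Manhattan"}
demo = pd.Series(["Brooklyn", "brookln", " Manhattan ", "manhatan", None])
print(normalize_labels(demo, fixes).value_counts())
```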

In [None]:
plt.figure(figsize=(10,5))
sns.histplot(df['availability 365'], bins=50, kde=False)
plt.title("Availability (days per year) Distribution")
plt.xlabel("Days available")
plt.ylabel("Count")
plt.show()
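
The column name implies values between 0 and 365. Before interpreting the histogram, a quick range check shows whether any rows fall outside that interval. The helper name `out_of_range` is hypothetical, and masking such values (rather than clipping them) is one possible choice, not the only one.

```python
import numpy as np
import pandas as pd


def out_of_range(values: pd.Series, low: float, high: float) -> pd.Series:
    """Boolean mask of non-missing values outside the closed interval [low, high]."""
    return values.notna() & ~values.between(low, high)


avail = pd.Series([0, 120, 365, -10, 400, np.nan])
mask = out_of_range(avail, 0, 365)
print(mask.sum(), "values outside [0, 365]")
# Treat impossible values as missing instead of silently keeping them
cleaned = avail.mask(mask)
```

On the notebook's data: `out_of_range(df['availability 365'], 0, 365).sum()`.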

In [None]:
plt.figure(figsize=(12,6))
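# Counts listings by the year of their most recent review (not the total number of reviews written that year)
df['last review'].dropna().dt.year.value_counts().sort_index().plot(kind='bar')
plt.title("Listings by Year of Last Review")
plt.xlabel("Year of last review")
plt.ylabel("Number of listings")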
plt.show()
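
`value_counts().sort_index()` produces bars only for years that occur in the data. A year with no last reviews therefore disappears from the axis instead of showing as zero. Reindexing over the full year range keeps such gaps visible. A minimal sketch, with `year_counts` as a hypothetical helper:

```python
import pandas as pd


def year_counts(dates: pd.Series) -> pd.Series:
    """Count non-missing dates per calendar year, including zero-count years in between."""
    years = pd.to_datetime(dates, errors="coerce").dropna().dt.year
    if years.empty:
        return pd.Series(dtype="int64")
    counts = years.value_counts().sort_index()
    full_range = range(int(years.min()), int(years.max()) + 1)
    return counts.reindex(full_range, fill_value=0)


demo = pd.Series(["2015-03-01", "2015-07-19", "2018-01-02", None])
print(year_counts(demo))
```

On the notebook's data: `year_counts(df['last review']).plot(kind='bar')`.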

In [None]:
plt.figure(figsize=(10,8))
num_cols = ['price', 'minimum nights', 'number of reviews', 'reviews per month',
            'calculated host listings count', 'availability 365']
sns.heatmap(df[num_cols].corr(), annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()
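
`.corr()` defaults to Pearson correlation. Pearson measures linear association and is sensitive to the extreme prices that the histogram had to cut off. Rank-based Spearman correlation, available through `method='spearman'` on `DataFrame.corr`, is a common robustness check. The illustration below uses synthetic data, not the Airbnb listings.

```python
import numpy as np
import pandas as pd

# Synthetic monotonic but non-linear relationship (not the Airbnb data)
x = np.arange(1, 11, dtype=float)
demo = pd.DataFrame({"x": x, "y": x ** 4})

pearson = demo.corr(method="pearson").loc["x", "y"]
spearman = demo.corr(method="spearman").loc["x", "y"]
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Spearman reaches 1 here because the relationship is perfectly monotonic, while Pearson stays below 1. On the notebook's data, call `.corr(method='spearman')` on the same column list used in the heatmap and compare the two matrices.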

# Conclusion and Insights

This study analyzed Airbnb’s New York City open dataset to uncover market dynamics, pricing strategies, availability patterns, and customer satisfaction factors.  

### Key Findings
- **Property Types & Market Diversity (Q1, Q2, Q5):**  
  Entire homes/apartments dominate, and Manhattan has the highest listing concentration. A few hosts control a large number of listings, which suggests the platform is becoming professionalized.  

- **Pricing Strategies (Q3, Q4, Q7):**  
  Manhattan commands the highest average prices, followed by Brooklyn. Construction year showed little correlation with price, while service fees scaled moderately with listing price.  

- **Customer Satisfaction (Q6, Q8):**  
  Hosts with verified identities tend to receive slightly higher review scores, but the difference is small and only weakly significant. Review scores vary somewhat by neighborhood and room type, with entire homes in central areas rated more consistently.  

- **Availability Patterns (Q9):**  
  Hosts with many listings often maintain higher annual availability, suggesting they manage properties professionally rather than casually.  
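
Several of these findings are group-level comparisons that pandas `groupby` can reproduce. The sketch below rests on a few assumptions. The column names `price`, `neighbourhood group`, `calculated host listings count` and `availability 365` follow the cells above. The host-size buckets (1, 2-5, 6+) are an arbitrary illustrative choice, and the helper names are hypothetical. The demo frame is synthetic, not the NYC data.

```python
import numpy as np
import pandas as pd


def price_by_group(frame: pd.DataFrame, group_col: str = "neighbourhood group") -> pd.DataFrame:
    """Mean, median and count of price per group, sorted by median price (highest first)."""
    stats = frame.groupby(group_col)["price"].agg(["mean", "median", "count"])
    return stats.sort_values("median", ascending=False)


def availability_by_host_size(frame: pd.DataFrame) -> pd.Series:
    """Median annual availability for hosts bucketed by how many listings they manage."""
    buckets = pd.cut(
        frame["calculated host listings count"],
        bins=[0, 1, 5, np.inf],
        labels=["1", "2-5", "6+"],
    )
    return frame.groupby(buckets, observed=True)["availability 365"].median()


demo = pd.DataFrame({
    "neighbourhood group": ["Manhattan", "Manhattan", "Brooklyn", "Brooklyn", "Queens"],
    "price": [250, 300, 150, 170, 90],
    "calculated host listings count": [1, 8, 1, 3, 12],
    "availability 365": [30, 340, 60, 200, 300],
})
print(price_by_group(demo))
print(availability_by_host_size(demo))
```

Medians appear alongside means because a few very high prices can pull the mean upward.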

### Recommendations
- **For Hosts:** Verified profiles and competitive service fees may improve guest trust and satisfaction, although the observed effect of verification is small.  
- **For Guests:** Higher prices do not always mean higher review scores; choosing verified hosts may offer a modest advantage.  
- **For Policymakers:** The concentration of listings among a small number of multi-listing hosts highlights the need to distinguish casual from professional operators in regulation.  

---

This analysis provides actionable insights into Airbnb’s NYC market, bridging technical data exploration with strategic understanding.
