## Conclusion & Key Insights

This project explored the New York City Airbnb dataset end-to-end, starting from
raw, inconsistent data and progressing through cleaning, exploratory analysis,
visualization, and clustering. The objective was to transform raw listing data
into meaningful insights that reflect real-world hosting and demand behavior.

---

### Data Preparation & Quality

- Extensive data cleaning was required due to missing values, inconsistent
  formats, and extreme outliers.
- Monetary fields were standardized, invalid values were corrected, and
  redundant or low-information columns were removed.
- Feature engineering introduced meaningful indicators such as occupancy rate
  and long-term stay flags to better capture listing behavior.
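
The cleaning and feature-engineering steps above can be sketched roughly as follows. This is a minimal illustration on toy rows, not the project's actual code: the column names (`price`, `minimum_nights`, `availability_365`), the occupancy definition, and the 30-night threshold are all assumptions.

```python
import pandas as pd

# Toy rows standing in for the raw listings; all column names are assumptions.
raw = pd.DataFrame({
    "price": ["$150.00", "$1,200.00", None, "$80.00"],
    "minimum_nights": [2, 30, 1, 90],
    "availability_365": [65, 365, 0, 180],
})


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Strip currency symbols and thousands separators, then convert to float.
    out["price"] = pd.to_numeric(
        out["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
    )
    # Drop rows whose price is missing or non-positive.
    out = out[out["price"] > 0].copy()
    # Assumed definition: share of the year the listing is NOT available.
    out["occupancy_rate"] = 1 - out["availability_365"].clip(0, 365) / 365
    # Assumed threshold: a minimum stay of 30+ nights counts as long-term.
    out["is_long_term"] = out["minimum_nights"] >= 30
    return out


listings = clean_listings(raw)
print(listings)
```

Availability-based occupancy is only a proxy, since a blocked calendar is not necessarily a booked one.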

---


### Exploratory & Temporal Insights

- Entire homes and private rooms dominate the NYC Airbnb market, while shared
  and hotel rooms represent niche segments.
- Prices vary significantly across boroughs, with Manhattan consistently
  commanding the highest average prices.
- Time-based analysis using review dates revealed clear growth in Airbnb
  activity prior to 2020, followed by a noticeable decline during the COVID-19
  period and partial recovery thereafter.
- Seasonal patterns indicate increased activity during summer months,
  reflecting tourism-driven demand.
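
A time-based view like the one described above could be built along these lines. The sketch uses toy dates and assumes a `last_review` column; it counts listings by the year and calendar month of that date as a rough proxy for activity.

```python
import pandas as pd

# Toy review dates; "last_review" is an assumed column name.
reviews = pd.DataFrame({
    "last_review": [
        "2018-07-04", "2019-07-15", "2019-12-01", "2020-04-10", "not a date",
    ],
})

# Unparseable values become NaT and are dropped.
dates = pd.to_datetime(reviews["last_review"], errors="coerce").dropna()

# Counts per review year: a rough proxy for activity over time.
by_year = dates.dt.year.value_counts().sort_index()

# Counts per calendar month across all years: a rough view of seasonality.
by_month = dates.dt.month.value_counts().sort_index()

print(by_year)
print(by_month)
```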

---

### Clustering & Segmentation Findings

- KMeans clustering identified five distinct listing segments based on pricing,
  availability, demand, and stay duration:
  - **Frequently booked listings** with the highest occupancy but few reviews
  - **Underperforming listings** with high availability and few bookings
  - **Budget listings** with balanced demand
  - **Long-term rental listings** characterized by extended minimum stay
    requirements
  - **High-demand listings** with strong occupancy and review activity


---

### Explanation of the Segments

## 🔍 Cluster Interpretation & Business Segments

Based on KMeans clustering of pricing, availability, demand, and stay behavior,
five distinct Airbnb listing segments were identified. Cluster values represent
relative magnitudes derived from log-transformed and standardized features.
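
The preprocessing and clustering described above might look roughly like this. It is a sketch on synthetic data, assuming scikit-learn; the feature order, the choice to apply `log1p` to every column, and the random seeds are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic, right-skewed stand-ins for the real features (assumed order).
features = np.column_stack([
    rng.lognormal(5.0, 0.6, 500),  # price
    rng.lognormal(2.0, 1.0, 500),  # number of reviews
    rng.lognormal(1.0, 1.0, 500),  # minimum nights
    rng.uniform(0.0, 1.0, 500),    # occupancy rate
])

# log1p compresses long right tails; standardizing puts features on one scale.
X = StandardScaler().fit_transform(np.log1p(features))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
labels = kmeans.labels_

# Each row is one cluster's centre in standardized units: positive values are
# above the dataset average for that feature, negative values below it.
centers = kmeans.cluster_centers_
print(np.round(centers, 2))
```

Reading the centres this way is what allows labels such as "High" or "Lowest" to be assigned to each cluster relative to the others.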

---

### 🔵 Cluster 0: High Occupancy, Low Engagement
**Key Characteristics**
- Price: High
- Reviews: Very low
- Minimum nights: Low
- Occupancy rate: **Highest**

**Interpretation**
- Listings are booked frequently but accumulate few reviews
- Likely short stays, repeat guests, or newly listed properties
- Represents steady-demand listings with limited guest feedback

**Segment Label:** *Frequently booked, low-review-volume listings*

---

### Cluster 1: Low Occupancy, Moderate Reviews
**Key Characteristics**
- Price: High
- Reviews: Moderate
- Occupancy rate: **Very low**

**Interpretation**
- Listings exist but are rarely booked
- Potentially overpriced, poorly located, or less competitive
- May include older listings that have lost market appeal

**Segment Label:** *Low-demand or underperforming listings*

---

### Cluster 2: Balanced Listings
**Key Characteristics**
- Price: **Lowest among clusters**
- Reviews: Moderate
- Occupancy rate: Moderate

**Interpretation**
- Affordable listings with consistent engagement
- Balanced demand and availability
- Commonly associated with private rooms or budget accommodations

**Segment Label:** *Budget, well-balanced listings*

---

### Cluster 3: Long-Term Rentals
**Key Characteristics**
- Price: High
- Reviews: Low
- Minimum nights: **Highest**
- Occupancy rate: Moderate to low

**Interpretation**
- Clearly defined by extended minimum stay requirements
- Fewer reviews due to longer guest stays
- Primarily monthly or extended-stay rental strategies

**Segment Label:** *Long-term / monthly rentals*

---

### Cluster 4: High Engagement Listings
**Key Characteristics**
- Price: High
- Reviews: **Highest among clusters**
- Occupancy rate: High

**Interpretation**
- Highly popular and frequently reviewed listings
- Strong demand and consistent guest engagement
- Likely well-located, high-quality properties

**Segment Label:** *High-demand, high-engagement listings*

---

### Key Takeaway
The clustering results highlight that **price alone does not determine listing
success**. Occupancy and engagement metrics reveal distinct hosting strategies,
ranging from high-demand short stays to long-term rental models.


### Business & Analytical Takeaways

- High price alone does not guarantee strong demand; occupancy and engagement
  metrics provide a more reliable indicator of listing performance.
- Long-term rental strategies form a clearly distinct segment within the
  short-term rental market.
- Density-based methods are valuable for identifying outliers and niche
  behaviors but are less suitable for full market segmentation.
- Scalable clustering techniques such as KMeans are essential when working
  with large, real-world datasets.
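
To illustrate the point about density-based methods and outliers, here is a small sketch using DBSCAN from scikit-learn. DBSCAN is chosen only as a common example of a density-based method; the data is synthetic, and the `eps` and `min_samples` values are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# One dense blob of "typical" points plus two far-away points.
dense = rng.normal(0.0, 0.3, size=(200, 2))
far_points = np.array([[8.0, 8.0], [-9.0, 7.5]])
X = np.vstack([dense, far_points])

# DBSCAN assigns the label -1 to points it treats as noise.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
noise_mask = labels == -1

print("points flagged as noise:", int(noise_mask.sum()))
```

Because isolated points receive the noise label instead of being forced into a cluster, this kind of method suits outlier detection better than assigning every listing to a segment.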

---



### Tools & Skills Demonstrated

- **Python**: Data cleaning, feature engineering, visualization, clustering
- **Power BI**: Interactive dashboards and KPI reporting
- **Statistical Thinking**: Handling skewed data, scaling, and model validation
- **Unsupervised Learning**: KMeans

---

### 🔹 Final Remarks

This project demonstrates a practical approach to data
analysis and unsupervised learning. By combining robust preprocessing,
thoughtful visualization, and multiple clustering techniques, meaningful
patterns were uncovered in a complex and noisy real-world dataset.

Future work could extend this analysis by incorporating geographic clustering,
predictive modeling, or host-level behavior analysis.
