## Market Clustering Assignment

In this assignment, you'll explore how clustering can help improve our car pricing models. Building on our previous work with individual location models, you'll investigate whether grouping similar markets together provides a better balance of accuracy and coverage.

### Assignment Goals
1. Compare different approaches to market clustering
2. Evaluate the tradeoff between model granularity and accuracy
3. Make a data-driven recommendation for a clustering strategy

### Tasks
1. Try three different numbers of clusters (e.g., 5, 8, and 12)
2. For each clustering:
   - Build pricing models for each cluster
   - Compare accuracy to the national model
   - Analyze coverage versus the individual-market approach
3. Make a recommendation supported by:
   - Quantitative metrics (accuracy, coverage)
   - Visualizations showing cluster characteristics
   - Clear explanation of tradeoffs
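
One way to frame the accuracy and coverage comparison in the tasks above is sketched below. It is only a starting point under loud assumptions: the column names `market` and `price`, the `min_listings` threshold, and the helper name `evaluate_clustering` are all hypothetical, the data is synthetic, and the "pricing model" is just a median price per group standing in for whatever model you actually fit.

```python
import numpy as np
import pandas as pd


def evaluate_clustering(listings, market_to_cluster, min_listings=30, seed=0):
    """Score one clustering against a national baseline (illustrative only).

    listings: DataFrame with hypothetical columns 'market' and 'price'.
    market_to_cluster: dict mapping each market to a cluster label.
    """
    df = listings.copy()
    df["cluster"] = df["market"].map(market_to_cluster)

    # Hold out roughly 25% of listings so errors are measured out of sample.
    rng = np.random.default_rng(seed)
    is_test = rng.random(len(df)) < 0.25
    train, test = df[~is_test], df[is_test]

    # Placeholder "models": one median for the nation, one per cluster.
    national_pred = train["price"].median()
    cluster_median = train.groupby("cluster")["price"].median()

    # A cluster counts as covered only if it has enough training listings.
    cluster_count = train["cluster"].value_counts()
    usable = cluster_count[cluster_count >= min_listings].index
    covered = test["cluster"].isin(usable)

    # Compare both models on the same covered test listings.
    actual = test.loc[covered, "price"]
    cluster_pred = test.loc[covered, "cluster"].map(cluster_median)
    return {
        "coverage": float(covered.mean()),
        "cluster_mae": float((actual - cluster_pred).abs().mean()),
        "national_mae": float((actual - national_pred).abs().mean()),
    }


# Synthetic demo data (not from the course dataset).
rng = np.random.default_rng(42)
centers = {"A": 10_000, "B": 10_500, "C": 30_000, "D": 29_500, "E": 20_000}
sizes = {"A": 200, "B": 200, "C": 200, "D": 200, "E": 10}
listings = pd.concat(
    [
        pd.DataFrame({"market": m, "price": rng.normal(centers[m], 1_000, sizes[m])})
        for m in centers
    ],
    ignore_index=True,
)

market_to_cluster = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2}
result = evaluate_clustering(listings, market_to_cluster)
print(result)
```

Calling the same function once per candidate clustering gives you a small table of coverage and error figures to compare. In the synthetic data, market `E` has too few listings to pass the threshold, which is the kind of coverage gap the tradeoff discussion should address.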

### Tips
- Start by exploring what makes markets "similar"
- Consider different features for clustering
- Think about how to handle categorical variables
- Pay attention to missing data
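
To make the last two tips concrete, here is a hand-rolled sketch of a Gower-style distance, which averages per-feature differences and so can mix numeric and categorical columns. The notebook's `gower` package offers a `gower_matrix` function for the same purpose; the version below exists only to show the mechanics, and skipping features where either value is missing is one possible choice for handling missing data, not the required one. The function name `gower_distance` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def gower_distance(df):
    """Pairwise Gower-style distance between the rows of df (illustrative).

    Numeric columns contribute |a - b| / range, categorical columns
    contribute 0 if equal and 1 otherwise. A feature is skipped for a
    pair of rows when either value is missing.
    """
    n = len(df)
    total = np.zeros((n, n))
    weight = np.zeros((n, n))
    for col in df.columns:
        s = df[col]
        present = s.notna().to_numpy()
        both = present[:, None] & present[None, :]
        if pd.api.types.is_numeric_dtype(s):
            x = s.to_numpy(dtype=float)
            span = np.nanmax(x) - np.nanmin(x)
            diff = np.abs(x[:, None] - x[None, :])
            d = diff / span if span > 0 else np.zeros((n, n))
        else:
            # Integer codes make the equality check safe for any labels.
            codes, _ = pd.factorize(s)
            d = (codes[:, None] != codes[None, :]).astype(float)
        total += np.where(both, d, 0.0)
        weight += both
    # Pairs with no shared features get NaN rather than a made-up distance.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(weight > 0, total / weight, np.nan)


# Tiny made-up example with one missing value.
markets = pd.DataFrame(
    {
        "median_price": [10_000, 20_000, 30_000],
        "region": ["west", "west", "east"],
        "n_listings": [100, np.nan, 300],
    },
    index=["A", "B", "C"],
)
D = gower_distance(markets)
print(pd.DataFrame(D, index=markets.index, columns=markets.index))
```

A matrix like `D` can be handed to a clusterer that accepts precomputed distances, such as `KMedoids(metric="precomputed")` from the imports below, provided it contains no NaN entries.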

---

### Assignment Boilerplate

#### Purpose of Assignments
- Develop practical implementation skills
- Build intuition through experimentation
- Create a portfolio of working examples
- Practice real-world data workflows

#### Using This Notebook
- Start by exploring the data
- Build incrementally
- Test each step
- Document your process

#### Using AI Assistants
AI coding assistants (ChatGPT, Claude, GitHub Copilot) are allowed for some uses (✅) but not others (❌):
- ✅ Understanding concepts
- ✅ Debugging errors
- ✅ Exploring approaches
- ❌ Complete solutions

Document any AI assistance and include links to the relevant chats.

---

# Essential imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn_extra.cluster import KMedoids
import gower

# Helper functions
def load_data():
    """Load the market summary and listing data from CSV files."""
    loc_data = pd.read_csv("../data/location_summary_data.csv")
    listing_data = pd.read_csv("../data/processed_listing_pages.csv")
    return loc_data, listing_data

# Load data
loc_data, listing_data = load_data()

# TODO: Explore market characteristics
# What makes markets similar?

# TODO: Prepare data for clustering
# Which features will you use?

# TODO: Implement clustering
# Try different numbers of clusters

# TODO: Evaluate results
# Compare accuracy and coverage

# TODO: Visualize results
# Support your recommendation