# Airbnb Recommendation System with Clustering

This notebook is part of our AI exam project. It recommends Airbnb listings across European cities based on user preferences and budget.

The system supports two recommendation modes:

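- **Duration** (number of nights left blank): Finds the listings where the user can stay the longest within their budget.
- **Value** (fixed number of nights): Keeps only listings that fit the budget for that stay, then ranks them by a weighted composite score of:
  - Guest satisfaction (40%)
  - Price per night (30%, lower is better)
  - Distance to city center (20%, closer is better)
  - Distance to metro (10%, closer is better)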
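The Duration mode comes down to a single division. The sketch below assumes, as the notebook's code does, that `realSum` is a nightly price, and it uses made-up prices. Floor division gives the number of whole nights the budget covers, and listings that cannot be booked for even one night are dropped.

```python
import pandas as pd

# Hypothetical toy listings; realSum is treated as a price per night.
listings = pd.DataFrame({
    "City": ["Paris", "Rome", "Lisbon"],
    "realSum": [150.0, 90.0, 420.0],
})
budget = 400.0

# Whole nights affordable within the budget (floor division).
listings["max_nights"] = (budget // listings["realSum"]).astype(int)

# Keep listings bookable for at least one night, longest stay first.
affordable = (
    listings[listings["max_nights"] >= 1]
    .sort_values("max_nights", ascending=False)
)
print(affordable.to_string(index=False))
```

With a 400 € budget, Rome (4 nights) ranks above Paris (2 nights), and Lisbon is dropped because one night already exceeds the budget.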

Listings are also grouped with **KMeans clustering** into 6 listing types (e.g., budget shared, high-end central, suburban midrange). The cluster labels are read from the `cluster` column of the input CSV. Users can optionally restrict results to one cluster type for more tailored recommendations.
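The clustering step is not part of this notebook, which only reads the finished labels. The sketch below shows one way such labels could be produced with scikit-learn's `KMeans`. The feature list `FEATURES`, the standardization step, and the synthetic data are all assumptions made for illustration; they are not necessarily what was used to build `clustered_airbnb.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature set; the real features behind clustered_airbnb.csv
# are not shown in this notebook.
FEATURES = ["realSum", "bedrooms", "dist", "metro_dist", "guest_satisfaction_overall"]

# Synthetic stand-in data so the sketch runs on its own.
rng = np.random.default_rng(0)
listings = pd.DataFrame({
    "realSum": rng.uniform(50, 600, 200),
    "bedrooms": rng.integers(0, 4, 200),
    "dist": rng.uniform(0.1, 10, 200),
    "metro_dist": rng.uniform(0.05, 3, 200),
    "guest_satisfaction_overall": rng.uniform(60, 100, 200),
})

# Standardize so no single feature (e.g. price in euros) dominates the distances.
X = StandardScaler().fit_transform(listings[FEATURES])

# Fit 6 clusters; n_init and random_state make the result reproducible.
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
listings["cluster"] = kmeans.fit_predict(X)

# Per-cluster means are what you would inspect to attach names
# such as "High-End Central" or "Budget and Shared".
print(listings.groupby("cluster")[FEATURES].mean().round(2))
```

KMeans numbers its clusters arbitrarily. The mapping from cluster id to a descriptive name therefore has to be decided by inspecting the cluster means after each fit.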


def recommend_airbnbs(filepath="../ML-exam/data/clustered_airbnb.csv"):
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    # ## Load and Preview Data
    df = pd.read_csv(filepath)

    # --- Available cities ---
    cities = sorted(df['City'].unique())
    print("Available cities:")
    for c in cities:
        print("-", c)

    # ## User Input Section
    print("\n--- Enter your preferences ---")
    try:
        user_budget = float(input("Total budget (€): "))
        duration_input = input("Number of nights (leave blank to maximize duration): ").strip()
        min_bedrooms = int(input("Minimum number of bedrooms: "))
        max_city_dist = float(input("Max distance to city center (e.g. 3.0): "))
        max_metro_dist = float(input("Max distance to metro (e.g. 1.0): "))
        weekend = input("Is your stay during a weekend? (yes/no): ").lower() == "yes"
        city_input = input("Pick a city (leave blank to search all): ").strip()

        # --- Optional Cluster Filter ---
        use_cluster = input("Would you like to filter by listing type (cluster)? (yes/no): ").lower() == "yes"
        if use_cluster:
            print("\nAvailable clusters:")
            print("  0: Large and Expensive")
            print("  1: Budget and Shared")
            print("  2: Compact Private Rooms")
            print("  3: High-End Central")
            print("  4: Suburban Midrange")
            print("  5: Poorly Rated")
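            selected_cluster = int(input("Enter cluster number (0–5): "))
            if selected_cluster not in range(6):
                raise ValueError("Cluster number must be between 0 and 5.")
    except ValueError:
        print("Invalid input. Please run the cell again with valid values.")
        return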

    # ## Determine Recommendation Mode
    if duration_input == "":
        mode = "duration"
        user_duration = None
    else:
        mode = "value"
        try:
            user_duration = int(duration_input)
        except ValueError:
            print("Invalid number of nights. Please enter an integer.")
            return
        if user_duration < 1:
            print("Number of nights must be at least 1.")
            return

    # ## Filter Listings
    filtered_df = df.copy()

    # Apply basic filters: weekend prices for weekend stays, weekday prices otherwise
    filtered_df = filtered_df[filtered_df['Is_weekend_bool'] == int(weekend)]
    if city_input:
        filtered_df = filtered_df[filtered_df['City'].str.lower() == city_input.lower()]

    # Filter by distance and bedrooms
    filtered_df = filtered_df[
        (filtered_df['bedrooms'] >= min_bedrooms) &
        (filtered_df['dist'] <= max_city_dist) &
        (filtered_df['metro_dist'] <= max_metro_dist)
    ]

    # Filter by selected cluster if enabled
    if use_cluster:
        filtered_df = filtered_df[filtered_df['cluster'] == selected_cluster]

    # Stop early if no listings match
    if filtered_df.empty:
        print("\nNo listings match your criteria.")
        return

    # ## Recommendation Logic
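    if mode == "duration":
        # Whole nights the budget covers, treating realSum as a nightly price;
        # assign() returns a new frame, avoiding pandas' SettingWithCopyWarning
        filtered_df = filtered_df.assign(
            max_nights=(user_budget // filtered_df['realSum']).astype(int)
        )
        # Drop listings the budget cannot cover for even one night
        filtered_df = filtered_df[filtered_df['max_nights'] >= 1]
        if filtered_df.empty:
            print("\nNo listings within your budget for even one night.")
            return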
        recommended = filtered_df.sort_values(by='max_nights', ascending=False)
        display_cols = ['City', 'realSum', 'bedrooms', 'dist', 'metro_dist',
                        'guest_satisfaction_overall', 'max_nights']
    else:
        filtered_df = filtered_df[filtered_df['realSum'] * user_duration <= user_budget]
        if filtered_df.empty:
            print("\nNo listings within your budget for the selected duration.")
            return

        # Normalize relevant features
        scaler = MinMaxScaler()
        features = filtered_df[['guest_satisfaction_overall', 'realSum', 'dist', 'metro_dist']]
        scaled = pd.DataFrame(scaler.fit_transform(features), columns=[
            'norm_satisfaction', 'norm_price', 'norm_dist', 'norm_metro'
        ])
        filtered_df = pd.concat([filtered_df.reset_index(drop=True), scaled], axis=1)

        # Invert price and distance to reward lower values
        filtered_df['inv_price'] = 1 - filtered_df['norm_price']
        filtered_df['inv_dist'] = 1 - filtered_df['norm_dist']
        filtered_df['inv_metro'] = 1 - filtered_df['norm_metro']

        # Weighted composite score
        filtered_df['value_score'] = (
            0.4 * filtered_df['norm_satisfaction'] +
            0.3 * filtered_df['inv_price'] +
            0.2 * filtered_df['inv_dist'] +
            0.1 * filtered_df['inv_metro']
        ) * 100

        recommended = filtered_df.sort_values(by='value_score', ascending=False)
        display_cols = ['City', 'realSum', 'bedrooms', 'dist', 'metro_dist',
                        'guest_satisfaction_overall', 'value_score']

    # ## Output Results
    print(f"\nTop 10 Recommended Listings (Mode: {mode}):\n")
    print(recommended[display_cols].head(10).to_string(index=False))


recommend_airbnbs()
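Because `recommend_airbnbs()` reads everything through `input()`, its filtering logic is hard to test. The sketch below shows a non-interactive version of the filtering step. `filter_listings` and its parameters are hypothetical names introduced only for illustration. The sketch matches `Is_weekend_bool` against the requested stay type (weekend or weekday) and applies the city and cluster filters only when they are given.

```python
import pandas as pd

def filter_listings(df, min_bedrooms, max_city_dist, max_metro_dist,
                    weekend, city=None, cluster=None):
    """Hypothetical non-interactive version of the notebook's filtering step."""
    mask = (
        (df["Is_weekend_bool"] == int(weekend))
        & (df["bedrooms"] >= min_bedrooms)
        & (df["dist"] <= max_city_dist)
        & (df["metro_dist"] <= max_metro_dist)
    )
    if city:
        # Case-insensitive city match, as in the interactive version
        mask &= df["City"].str.lower() == city.lower()
    if cluster is not None:
        mask &= df["cluster"] == cluster
    return df[mask]

# Made-up rows using the column names the notebook expects.
sample = pd.DataFrame({
    "City": ["Amsterdam", "Amsterdam", "Berlin"],
    "Is_weekend_bool": [1, 0, 1],
    "bedrooms": [2, 2, 1],
    "dist": [1.5, 1.5, 4.0],
    "metro_dist": [0.4, 0.4, 0.2],
    "cluster": [3, 3, 4],
})
result = filter_listings(sample, min_bedrooms=1, max_city_dist=3.0,
                         max_metro_dist=1.0, weekend=True)
print(result)
```

Only the first row survives here. The second row is a weekday price, and the Berlin listing is 4.0 from the center, beyond the 3.0 limit.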

### How the Value Score Is Calculated

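When the user enters a fixed number of nights (instead of leaving the field blank), the system first drops every listing whose `realSum * number_of_nights` exceeds the budget. It then ranks the remaining listings by a **value score**.

The score is a weighted sum of four features. Each feature is min-max normalized to the range 0 to 1 across the listings that passed the filters:

```python
value_score = 100 * (
    0.4 * norm_satisfaction +
    0.3 * (1 - norm_price) +
    0.2 * (1 - norm_dist) +
    0.1 * (1 - norm_metro)
)
```

Where:

* `norm_satisfaction`: normalized `guest_satisfaction_overall`, a rating from 0 to 100 indicating how satisfied previous guests were.
* `norm_price`: normalized `realSum`, which the code treats as the price per night.
* `norm_dist`: normalized `dist`, the distance to the city center.
* `norm_metro`: normalized `metro_dist`, the distance to the metro.

Price and both distances are inverted (`1 - x`) so that cheaper and closer listings score higher.

### Interpretation

* A **higher value score** (on a 0 to 100 scale) indicates a better balance of high guest satisfaction, a low nightly price, and short distances to the center and metro. Satisfaction carries the largest weight, so the top results are well rated and not merely cheap.
* Normalization is recomputed for every search, so scores are relative. They rank listings within one result set and should not be compared across different searches.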
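The worked example below applies the same weights and normalization as the notebook to three made-up listings, so the arithmetic can be checked by hand. As in the notebook, `realSum` is treated as a nightly price.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Three made-up listings that already fit the budget.
listings = pd.DataFrame({
    "guest_satisfaction_overall": [90.0, 70.0, 100.0],
    "realSum": [100.0, 50.0, 150.0],
    "dist": [1.0, 3.0, 2.0],
    "metro_dist": [0.5, 0.1, 0.3],
})

cols = ["guest_satisfaction_overall", "realSum", "dist", "metro_dist"]
# Min-max scale each column to [0, 1] within this result set.
norm = pd.DataFrame(MinMaxScaler().fit_transform(listings[cols]),
                    columns=cols, index=listings.index)

# Satisfaction is rewarded; price and distances are inverted so lower is better.
listings["value_score"] = 100 * (
    0.4 * norm["guest_satisfaction_overall"]
    + 0.3 * (1 - norm["realSum"])
    + 0.2 * (1 - norm["dist"])
    + 0.1 * (1 - norm["metro_dist"])
)
print(listings.sort_values("value_score", ascending=False))
```

The first listing scores about 61.7 (0.4 × 0.667 + 0.3 × 0.5 + 0.2 × 1 + 0.1 × 0). The third scores 55.0, and the second, the cheapest but worst rated, scores 40.0. Adding a fourth listing would change the min and max of each column and could shift all three scores. This is why the scores are only meaningful within a single search.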