UserInput loads the clustered dataset that was created in the KMeans file. The function uses this dataset to search for available holiday stays that match the search parameters the user provides.

Check for KMeans model parameters

In [18]:
def recommend_airbnbs(filepath="../ML-exam/data/clustered_airbnb.csv"):
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    # ## Load Data
    df = pd.read_csv(filepath)

    # --- Available cities ---
    cities = sorted(df['City'].unique())
    print("Available cities:")
    for c in cities:
        print("-", c)

    # ## User Input Section
    print("\n--- Enter your preferences ---")
    try:
        user_budget = float(input("Total budget (€): "))
        duration_input = input("Number of nights (leave blank to maximize duration): ").strip()
        min_bedrooms = int(input("Minimum number of bedrooms: "))
        max_city_dist = float(input("Max distance to city center (e.g. 3.0): "))
        max_metro_dist = float(input("Max distance to metro (e.g. 1.0): "))
        weekend = input("Is your stay during a weekend? (yes/no): ").strip().lower() == "yes"
        city_input = input("Pick a city (leave blank to search all): ").strip()
    except ValueError:
        print("Invalid input. Please try again.")
        return

    # ## Determine Recommendation Mode
    # If user provides number of nights → value mode, otherwise duration mode
    if duration_input == "":
        mode = "duration"
        user_duration = None
    else:
        mode = "value"
        try:
            user_duration = int(duration_input)
        except ValueError:
            print("Invalid number of nights. Please enter an integer.")
            return

    # ## Filter Listings
    # Start with a filtered copy of the dataset
    filtered_df = df.copy()
    if weekend:
        filtered_df = filtered_df[filtered_df['Is_weekend_bool'] == 1]
    if city_input:
        filtered_df = filtered_df[filtered_df['City'].str.lower() == city_input.lower()]

    # Filter by room size and distances
    filtered_df = filtered_df[
        (filtered_df['bedrooms'] >= min_bedrooms) &
        (filtered_df['dist'] <= max_city_dist) &
        (filtered_df['metro_dist'] <= max_metro_dist)
    ].copy()

    # Stop early if no listings match
    if filtered_df.empty:
        print("\nNo listings match your criteria.")
        return

    # ## Recommendation Logic
    # MODE 1: DURATION — where can I stay the longest for my money?
    if mode == "duration":
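        # Calculate how many whole nights the user can afford for each listing
        filtered_df['max_nights'] = (user_budget // filtered_df['realSum']).astype(int)
        # Drop listings the budget cannot cover for even a single night
        filtered_df = filtered_df[filtered_df['max_nights'] >= 1]
        if filtered_df.empty:
            print("\nNo listings within your budget.")
            return
        # Sort by longest stay
        recommended = filtered_df.sort_values(by='max_nights', ascending=False)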
        # Columns to display
        display_cols = ['City', 'realSum', 'bedrooms', 'dist', 'metro_dist',
                        'guest_satisfaction_overall', 'max_nights']

    # MODE 2: VALUE — what gives me the best combination of quality and location?
    else:
        # Filter out listings that exceed total cost (price per night × nights)
        filtered_df = filtered_df[filtered_df['realSum'] * user_duration <= user_budget]

        if filtered_df.empty:
            print("\nNo listings within your budget for the selected duration.")
            return

        # --- Normalize components and compute value score ---
        scaler = MinMaxScaler()
        to_normalize = filtered_df[['guest_satisfaction_overall', 'realSum', 'dist', 'metro_dist']].copy()
        scaled = pd.DataFrame(scaler.fit_transform(to_normalize), columns=[
            'norm_satisfaction', 'norm_price', 'norm_dist', 'norm_metro'
        ])

        # Merge normalized columns back into the filtered DataFrame
        filtered_df = pd.concat([filtered_df.reset_index(drop=True), scaled], axis=1)

        # Invert "bad" features so that low price and short distances count positively
        filtered_df['inv_price'] = 1 - filtered_df['norm_price']
        filtered_df['inv_dist'] = 1 - filtered_df['norm_dist']
        filtered_df['inv_metro'] = 1 - filtered_df['norm_metro']

        # --- Compute final composite value score ---
        # Weighted sum: adjust weights to prioritize certain features
        filtered_df['value_score'] = (
            0.4 * filtered_df['norm_satisfaction'] +
            0.3 * filtered_df['inv_price'] +
            0.2 * filtered_df['inv_dist'] +
            0.1 * filtered_df['inv_metro']
        ) * 100  # Scale to 0–100 for readability

        # Sort by best value
        recommended = filtered_df.sort_values(by='value_score', ascending=False)
        # Columns to show the user
        display_cols = ['City', 'realSum', 'bedrooms', 'dist', 'metro_dist',
                        'guest_satisfaction_overall', 'value_score']

    # ## Output Results
    print(f"\nTop 10 Recommended Listings (Mode: {mode}):\n")
    print(recommended[display_cols].head(10).to_string(index=False))

In [16]:
recommend_airbnbs()

Available cities:
- Amsterdam
- Athens
- Barcelona
- Berlin
- Budapest
- Lisbon
- London
- Paris
- Rome
- Vienna

--- Enter your preferences ---


Total budget (€):  500
Number of nights (leave blank to maximize duration):  5
Minimum number of bedrooms:  2
Max distance to city center (e.g. 3.0):  6
Max distance to metro (e.g. 1.0):  3
Is your stay during a weekend? (yes/no):  yes
Pick a city (leave blank to search all):  



Top 10 Recommended Listings (Mode: value):

    City  realSum  bedrooms  dist  metro_dist  guest_satisfaction_overall  value_score
  Athens    60.46         3   1.7         1.1                          94    84.313782
  Athens    57.88         2   2.7         0.3                          89    82.962574
  Athens    72.41         2   1.7         0.1                          97    81.747776
  Athens    67.26         2   2.3         0.5                          96    80.937339
  Athens    69.60         2   2.6         0.6                         100    79.912488
  Athens    69.60         2   1.8         0.4                          92    79.590810
  Athens    74.05         3   2.0         0.1                          96    79.071014
  Athens    69.60         2   2.7         0.8                         100    78.779621
  Athens    76.63         2   1.8         0.4                          99    78.436208
Budapest    54.28         3   2.5         0.8                          77    78.15035

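In the run above, a budget of €500 for 5 nights means that only listings priced at €100 per night or less pass the budget filter. The sketch below reproduces that filter, and the duration-mode alternative, on a small made-up table. The `toy` frame and its prices are hypothetical and are not taken from the dataset; `realSum` is treated as the price per night, as in `recommend_airbnbs`.

```python
import pandas as pd

# Hypothetical toy listings, invented for illustration only.
toy = pd.DataFrame({
    "City": ["Athens", "Budapest", "Paris"],
    "realSum": [60.0, 125.0, 250.0],  # assumed to be the price per night
})

budget = 500.0
nights = 5

# Value mode: keep listings whose total cost fits within the budget.
affordable = toy[toy["realSum"] * nights <= budget]
print(affordable["City"].tolist())

# Duration mode: number of whole nights the budget covers for each listing.
max_nights = (budget // toy["realSum"]).astype(int)
print(max_nights.tolist())
```

Only the first listing fits a 5-night stay, while duration mode would rank all three by how many nights the same budget buys.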
### How the Value Score Is Calculated

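When the user provides a fixed number of nights (i.e., does not leave the "number of nights" field blank), the function first drops every listing whose total cost (`realSum * number_of_nights`) exceeds the budget. It then calculates a **value score** for each remaining listing and ranks the listings from highest to lowest score.

The formula used is:

```python
value_score = 100 * (
    0.4 * norm_satisfaction
    + 0.3 * (1 - norm_price)
    + 0.2 * (1 - norm_dist)
    + 0.1 * (1 - norm_metro)
)
```

Where:

* `norm_satisfaction`: `guest_satisfaction_overall`, a rating from 0 to 100 indicating how satisfied previous guests were, rescaled to the range 0 to 1.
* `norm_price`: `realSum`, rescaled the same way. The code treats `realSum` as the price per night, not the cost of the entire stay.
* `norm_dist` and `norm_metro`: `dist` (distance to the city center) and `metro_dist` (distance to the metro), rescaled the same way.
* Each rescaling is the min-max normalization applied by `MinMaxScaler` over the listings that passed the filters:

  ```python
  norm_x = (x - x.min()) / (x.max() - x.min())
  ```

  Price and both distances are inverted (`1 - norm_x`) so that cheaper and closer listings score higher.

### Interpretation

* A **higher value score** indicates a better weighted combination of guest satisfaction (40%), low price (30%), proximity to the city center (20%), and proximity to the metro (10%).
* The score is relative. Because the normalization uses the minimum and maximum of the filtered listings, the same listing can receive a different score under different search parameters.
* This helps highlight listings that are not just cheap but also highly rated and well located.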
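The weighted score computed in `recommend_airbnbs` can be traced by hand on a small example. The sketch below uses three made-up listings (the `toy` frame is hypothetical) and replaces `MinMaxScaler` with the equivalent pandas expression, so it needs nothing beyond pandas. The weights are the ones used in the function.

```python
import pandas as pd

# Hypothetical listings, invented for illustration only.
toy = pd.DataFrame({
    "guest_satisfaction_overall": [100, 80, 90],
    "realSum": [100.0, 50.0, 75.0],
    "dist": [1.0, 3.0, 2.0],
    "metro_dist": [0.2, 1.0, 0.6],
})

cols = ["guest_satisfaction_overall", "realSum", "dist", "metro_dist"]

# Column-wise min-max scaling to [0, 1].
# Note: a column with a single repeated value would produce NaN here.
norm = (toy[cols] - toy[cols].min()) / (toy[cols].max() - toy[cols].min())

# Weighted sum; price and distances are inverted so that lower is better.
toy["value_score"] = 100 * (
    0.4 * norm["guest_satisfaction_overall"]
    + 0.3 * (1 - norm["realSum"])
    + 0.2 * (1 - norm["dist"])
    + 0.1 * (1 - norm["metro_dist"])
)

ranked = toy.sort_values("value_score", ascending=False)
print(ranked[["realSum", "guest_satisfaction_overall", "value_score"]])
```

In this toy table the most expensive listing ranks first, because its top satisfaction rating and short distances together outweigh the 30% weight on price.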