# Airbnb Price Prediction in Copenhagen: Inference

This notebook demonstrates how to use the previously trained XGBoost model to predict the price of an Airbnb listing in Copenhagen. We'll load the saved model and preprocessing objects, then use them to make a prediction for a single observation.

## Setup

First, let's import the necessary libraries and load our saved objects.

In [None]:
!pip install xgboost shap -q

In [None]:
import pandas as pd
import numpy as np
import joblib
from xgboost import XGBRegressor
from sklearn.preprocessing import StandardScaler, OneHotEncoder
import shap

In [None]:
# Load the saved model and preprocessing objects
model_xgb = joblib.load('model_xgb.joblib')
scaler = joblib.load('scaler.joblib')
ohe = joblib.load('ohe.joblib')

print("Model and preprocessing objects loaded successfully.")

## Create a Sample Observation

Let's create a sample Airbnb listing to predict its price.

In [None]:
# Create a sample observation
sample_listing = pd.DataFrame({
    'neighbourhood_cleansed': ['Indre By'],
    'room_type': ['Entire home/apt'],
    'instant_bookable': [False],
    'accommodates': [2],
    'bedrooms': [1],
    'beds': [2],
    'minimum_nights_avg_ntm': [3]
})

print("Sample listing:")
print(sample_listing)
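
Before encoding, it can help to confirm that every categorical value in the sample is one the encoder saw during training. A misspelled neighbourhood would otherwise either raise an error or be encoded as all zeros, depending on how the encoder was configured. The sketch below uses a hypothetical helper, `find_unknown_categories`, and a made-up `known` mapping; neither comes from the training notebook.

```python
import pandas as pd

def find_unknown_categories(listing, known):
    """Return {column: value} for first-row values that are not in known[column]."""
    unknown = {}
    for column, allowed in known.items():
        value = listing[column].iloc[0]
        if value not in set(allowed):
            unknown[column] = value
    return unknown

# Illustrative category lists (assumed, not read from the trained encoder)
known = {
    "neighbourhood_cleansed": ["Indre By", "Amager Vest", "Frederiksberg"],
    "room_type": ["Entire home/apt", "Private room"],
}

good = pd.DataFrame({"neighbourhood_cleansed": ["Indre By"],
                     "room_type": ["Entire home/apt"]})
typo = pd.DataFrame({"neighbourhood_cleansed": ["Indre by"],
                     "room_type": ["Entire home/apt"]})

print(find_unknown_categories(good, known))  # {}
print(find_unknown_categories(typo, known))  # {'neighbourhood_cleansed': 'Indre by'}
```

In this notebook, the real mapping could be built as `dict(zip(['neighbourhood_cleansed', 'room_type'], ohe.categories_))`, assuming the encoder was fitted on those two columns in that order.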

## Preprocess the Sample Observation

Now, let's preprocess our sample observation using the same steps as in the training process.

In [None]:
# Separate categorical and numerical features
cat_features = ['neighbourhood_cleansed', 'room_type']
num_features = ['instant_bookable', 'accommodates', 'bedrooms', 'beds', 'minimum_nights_avg_ntm']

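# One-hot encode categorical features
# (the encoder may return a sparse matrix or a dense array, depending on its settings)
encoded = ohe.transform(sample_listing[cat_features])
if hasattr(encoded, "toarray"):
    encoded = encoded.toarray()
X_cat = pd.DataFrame(encoded, columns=ohe.get_feature_names_out(cat_features))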

# Scale numerical features
X_num = pd.DataFrame(scaler.transform(sample_listing[num_features]),
                     columns=num_features)

# Combine processed features
X_processed = pd.concat([X_num, X_cat], axis=1)

print("Processed features:")
print(X_processed)
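
The processed frame needs the same columns, in the same order, as the matrix the model was trained on. Depending on how the model was saved, a mismatch may raise an error or go unnoticed. A minimal sketch of an explicit check, using a hypothetical helper `align_features` and made-up column names:

```python
import pandas as pd

def align_features(X, expected):
    """Reorder X to the expected columns, failing loudly on any mismatch."""
    missing = [c for c in expected if c not in X.columns]
    extra = [c for c in X.columns if c not in expected]
    if missing or extra:
        raise ValueError(f"missing columns: {missing}; unexpected columns: {extra}")
    return X[list(expected)]

# Illustrative names only; the real list depends on the training notebook
expected = ["accommodates", "bedrooms", "room_type_Entire home/apt"]
X = pd.DataFrame({
    "bedrooms": [1.0],
    "room_type_Entire home/apt": [1.0],
    "accommodates": [2.0],
})

aligned = align_features(X, expected)
print(list(aligned.columns))
```

Here, the expected names could be taken from `model_xgb.get_booster().feature_names`, which is `None` if the model was trained on data without column names.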

## Make a Prediction

With our preprocessed sample, we can now make a price prediction.

In [None]:
# Make a prediction
predicted_price = model_xgb.predict(X_processed)[0]

print(f"Predicted price: {predicted_price:.2f}")

## Explain the Prediction

Let's use SHAP values to explain this prediction.

In [None]:
# Create a SHAP explainer
explainer = shap.TreeExplainer(model_xgb)

# Calculate SHAP values for our sample
shap_values = explainer.shap_values(X_processed)

# Initialize JavaScript visualization code
shap.initjs()

# Create a force plot
shap.force_plot(explainer.expected_value, shap_values[0,:], X_processed.iloc[0,:])

This force plot shows how each feature pushes the model output from the base value (the average model output over the training dataset) to the output for this specific listing. Features that push the prediction higher are shown in red; those that push it lower are shown in blue.
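
The force plot rests on the additivity property of SHAP values: the base value plus the sum of a row's SHAP values equals the model output for that row. A small worked sketch with invented numbers (these are not outputs of this model):

```python
import numpy as np

base_value = 1000.0                              # stand-in for explainer.expected_value
contributions = np.array([250.0, 120.0, -70.0])  # stand-in for one row of SHAP values

# Base value plus all contributions reconstructs the model output
reconstructed = base_value + contributions.sum()
print(reconstructed)  # 1300.0
```

In this notebook the same check would compare `explainer.expected_value + shap_values[0].sum()` with `predicted_price`. The two should agree up to floating-point error, assuming the model was trained with a plain regression objective so that the explained output is the prediction itself.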

## Interpretation

Let's interpret the results:

1. The predicted price for this listing is printed by the prediction cell above.
2. The SHAP force plot shows which features had the largest effect on this prediction.
3. Read the largest contributions off the force plot. For example, you might find that:
   - The location (Indre By) seems to increase the price.
   - Being an entire home/apartment also positively affects the price.
   - The number of people it accommodates (2) might be pushing the price down slightly.
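
To pick out the top features without reading them off the plot, one option is to rank a row's SHAP values by absolute size. The sketch below uses invented values and feature names for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative names and values only, not results from the trained model
feature_names = [
    "accommodates",
    "neighbourhood_cleansed_Indre By",
    "room_type_Entire home/apt",
]
row_shap = np.array([-70.0, 250.0, 120.0])

# Sort by magnitude so large negative effects rank alongside large positive ones
ranked = pd.Series(row_shap, index=feature_names).sort_values(
    key=lambda s: s.abs(), ascending=False
)
print(ranked)
```

With the objects from this notebook, the series could be built as `pd.Series(shap_values[0], index=X_processed.columns)`.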

Remember, this is just one example. The model's predictions and the importance of different features can vary significantly depending on the specific characteristics of each listing.

This notebook demonstrates how to use our trained model to make predictions on new data and how to interpret those predictions using SHAP values. This can be valuable both for hosts trying to price their listings competitively and for guests trying to understand what influences the prices they see.