# Predicting Airbnb Prices for Munich

The goal of our data mining project is to predict prices for new Airbnb listings in Munich. To achieve this, we will train a regression model on existing Airbnb data from www.insideairbnb.com.

## Table of Contents
##### [1 Preprocessing](#preprocessing)
##### [2 Data Mining (Support Vector Machine)](#data_mining)
##### [3 Interpretation and Evaluation](#interpretation_evaluation)

<a id='preprocessing'></a>
## 1 Preprocessing

In [2]:
%run modules/preprocessing.py

import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

df = load_and_preprocess_dataset()

# Split the features and labels
X = df.loc[:, df.columns.intersection(['property_type', 'room_type', 'accommodates', 'distance_centre'])]
# Min-max scale every feature to [0, 1]; keep df's index so X_norm stays aligned with Y
X_norm = pd.DataFrame(preprocessing.MinMaxScaler().fit_transform(X), columns=X.columns, index=X.index)
Y = df['max_price']

# Instance selection: randomly hold out 20% of the listings as the test set
xTrain, xTest, yTrain, yTest = train_test_split(X_norm, Y, test_size=0.2, random_state=0)

2019-11-15 23:42:12 : Dataset loaded successfully.
2019-11-15 23:42:32 : Dataset preprocessed successfully.
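`MinMaxScaler` maps each feature to [0, 1] via `x' = (x - min) / (max - min)`. In the cell above it is fit on all listings before `train_test_split`, so the minima and maxima of the held-out test listings also shape the training features, a mild form of data leakage. With scikit-learn the usual remedy is to call `fit_transform` on `xTrain` and only `transform` on `xTest`. A minimal stdlib sketch of that fit-on-train-only idea, assuming plain lists of numeric rows; `fit_min_max` and `transform_min_max` are illustrative helpers, not part of the project's modules:

```python
def fit_min_max(rows):
    """Learn the per-column (min, max) pairs from the training rows only."""
    return [(min(column), max(column)) for column in zip(*rows)]


def transform_min_max(rows, params):
    """Rescale each column with the training (min, max); constant columns map to 0.0."""
    return [
        [(value - lo) / (hi - lo) if hi > lo else 0.0 for value, (lo, hi) in zip(row, params)]
        for row in rows
    ]


# Toy two-feature rows (hypothetical values, not taken from the dataset)
train_rows = [[1, 2.0], [3, 4.0], [5, 10.0]]
test_rows = [[7, 6.0]]

params = fit_min_max(train_rows)  # fit on the training split only
train_scaled = transform_min_max(train_rows, params)
test_scaled = transform_min_max(test_rows, params)

print(train_scaled)  # [[0.0, 0.0], [0.5, 0.25], [1.0, 1.0]]
print(test_scaled)   # [[1.5, 0.5]]  test values may fall outside [0, 1]
```

Values outside [0, 1] in the test rows are expected: they simply lie beyond the range seen during training.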


<a id='data_mining'></a>
## 2 Data Mining (Support Vector Machine)
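`SVR(kernel='rbf')` measures how similar two listings are with the Gaussian kernel `k(x, z) = exp(-gamma * ||x - z||^2)`. Because the squared distance sums over all features, a feature with a large numeric range can dominate it, so feature scaling matters for this model. Since scikit-learn 0.22 the default is `gamma='scale'`, i.e. `1 / (n_features * X.var())`; earlier releases used `1 / n_features`. The stdlib sketch below reimplements both formulas for illustration only; the helper names and the toy rows (with distance assumed to be in metres) are hypothetical:

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def gamma_scale(rows):
    """gamma='scale' heuristic: 1 / (n_features * variance of all feature values)."""
    values = [v for row in rows for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return 1.0 / (len(rows[0]) * variance) if variance > 0 else 1.0


# Toy unscaled rows: (accommodates, distance in metres), hypothetical values
rows = [[2, 1500.0], [4, 1500.0], [2, 3500.0]]
gamma = gamma_scale(rows)

# Doubling the guest count leaves the kernel value close to 1,
# while a 2000 m difference in distance lowers it substantially.
print(rbf_kernel(rows[0], rows[1], gamma))
print(rbf_kernel(rows[0], rows[2], gamma))
```

On unscaled data like this, the distance feature effectively decides which listings count as similar, and the guest count is nearly ignored.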

In [4]:
%run modules/evaluation.py

import itertools
from sklearn.svm import SVR

# Generate all non-empty feature combinations, ordered by size and then by column order
feature_combinations = [
    list(combination)
    for size in range(1, len(X.columns) + 1)
    for combination in itertools.combinations(X.columns, size)
]

# Run SVR for each combination
for feature_combination in feature_combinations:
    # Note: this selects from the unscaled features X, not X_norm
    x_combination = X[feature_combination]
    xTrain, xTest, yTrain, yTest = train_test_split(x_combination, Y, test_size=0.2, random_state=0)

    # Train support vector regressor
    regressor = SVR(kernel='rbf')
    regressor.fit(xTrain, yTrain)

    # Predict prices for the whole test dataset in one call
    test_price_prediction = regressor.predict(xTest)

    # Print results
    print("Results for combination: ", feature_combination)
    calculate_r_squared(yTest, test_price_prediction)
    calculate_root_mean_squared_error(yTest, test_price_prediction)

Results for combination:  ['property_type']
2019-11-15 23:43:23 : R_Squared =  -0.09915255159697134
2019-11-15 23:43:23 : Root Mean Squared Error =  91.63297561096516
Results for combination:  ['room_type']
2019-11-15 23:43:29 : R_Squared =  -0.01715802395489341
2019-11-15 23:43:29 : Root Mean Squared Error =  88.14892456830171
Results for combination:  ['accommodates']
2019-11-15 23:43:35 : R_Squared =  0.3100870902811871
2019-11-15 23:43:35 : Root Mean Squared Error =  72.59721145975206
Results for combination:  ['distance_centre']
2019-11-15 23:43:40 : R_Squared =  -0.07049660426310678
2019-11-15 23:43:40 : Root Mean Squared Error =  90.4306079396739
Results for combination:  ['property_type', 'room_type']
2019-11-15 23:43:46 : R_Squared =  -0.028255197011331612
2019-11-15 23:43:46 : Root Mean Squared Error =  88.62847162852276
Results for combination:  ['property_type', 'accommodates']
2019-11-15 23:43:53 : R_Squared =  0.25520381110314816
2019-11-15 23:43:53 : Root Mean Squared Er
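With four candidate features there are 2^4 - 1 = 15 non-empty subsets, so the loop trains 15 SVR models; the log above stops partway through the sixth. The subsets can also be produced with `itertools.chain`. A small sketch that hard-codes the four column names in the order they appear in the log (an assumption; in the notebook the order comes from `X.columns`):

```python
from itertools import chain, combinations

features = ['property_type', 'room_type', 'accommodates', 'distance_centre']

# All non-empty subsets, ordered by size and then by column order,
# which is the order in which the combination loop visits them.
feature_combinations = [
    list(combo)
    for combo in chain.from_iterable(
        combinations(features, size) for size in range(1, len(features) + 1)
    )
]

print(len(feature_combinations))  # 15
print(feature_combinations[5])    # ['property_type', 'accommodates']
```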

In [3]:
%run modules/evaluation.py

from sklearn.svm import SVR

# Recreate the split on all four scaled features, because the combination loop above
# overwrites xTrain/xTest (same random_state, so this matches the preprocessing split)
xTrain, xTest, yTrain, yTest = train_test_split(X_norm, Y, test_size=0.2, random_state=0)

# Train support vector regressor
regressor = SVR(kernel='rbf')
regressor.fit(xTrain, yTrain)

# Predict prices for the training and test datasets
train_price_prediction = regressor.predict(xTrain)
test_price_prediction = regressor.predict(xTest)

# Evaluate predictions
print("Training dataset:")
calculate_r_squared(yTrain, train_price_prediction)
calculate_root_mean_squared_error(yTrain, train_price_prediction)

print("Test dataset:")
calculate_r_squared(yTest, test_price_prediction)
calculate_root_mean_squared_error(yTest, test_price_prediction)

Training dataset:
2019-11-15 22:20:12 : R_Squared =  0.0877828443714822
2019-11-15 22:20:12 : Root Mean Squared Error =  84.42029689385168
Test dataset:
2019-11-15 22:20:12 : R_Squared =  0.10027971262409352
2019-11-15 22:20:12 : Root Mean Squared Error =  82.9042052091836
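The two metrics are commonly defined as R² = 1 - SS_res / SS_tot and RMSE = sqrt(mean squared residual), with RMSE in the same unit as `max_price`. An R² of 0 corresponds to always predicting the mean price of the evaluated set, so the negative values for single features such as `room_type` mean those models do worse than that constant baseline, while the four-feature model reaches about 0.10 on the test set. The code in `modules/evaluation.py` is not shown here; the sketch below uses the standard definitions and is not claimed to match it line for line. The toy prices are hypothetical:

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the prices."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


prices = [60.0, 80.0, 100.0, 160.0]                   # toy prices, not from the dataset
baseline = [sum(prices) / len(prices)] * len(prices)  # always predict the mean

print(r_squared(prices, baseline))       # 0.0
print(r_squared(prices, [120.0] * 4))    # negative: worse than predicting the mean
print(rmse(prices, baseline))
```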


<a id='interpretation_evaluation'></a>
## 3 Interpretation and Evaluation