<h1> Mini project: content-based collaborative filtering</h1>

Description

__Goal:__

Develop a recommender system model for a real-estate promotional email campaign. The gaol is to improve the performance of a real-estete promotional email campaign, by recommending three properties deemed most similar to user's preference from past searches on a real-estate website. The features used for modeling are price, distance to city, number of bedrooms and batchrooms, land size, building size, and year built.

__Technique:__

User-based collaborative filtering via K-nearest neighbors clustering algorithm

__Dataset:__

Real-estate search history, provided by Domain.com.au.

- Dataset source [link](https://www.kaggle.com/anthonypino/melbourne-housing-market/)

# Packages and data


## Load packages

In [1]:
import os, gc, warnings
import numpy as np
import pandas as pd
warnings.filterwarnings('ignore')

from sklearn.neighbors import NearestNeighbors


## load dataset


In [4]:
# load dataset
root = '/content/drive/MyDrive/Projects/RecommendationSystems/Book1/data/p03/'
df = pd.read_csv(root + 'Melbourne_housing_FULL.csv')
print(df.columns)
display(df.shape, df.head())

Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')


(34857, 21)

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,68 Studley St,2,h,,SS,Jellis,3/09/2016,2.5,3067.0,...,1.0,1.0,126.0,,,Yarra City Council,-37.8014,144.9958,Northern Metropolitan,4019.0
1,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra City Council,-37.7996,144.9984,Northern Metropolitan,4019.0
2,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra City Council,-37.8079,144.9934,Northern Metropolitan,4019.0
3,Abbotsford,18/659 Victoria St,3,u,,VB,Rounds,4/02/2016,2.5,3067.0,...,2.0,1.0,0.0,,,Yarra City Council,-37.8114,145.0116,Northern Metropolitan,4019.0
4,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra City Council,-37.8093,144.9944,Northern Metropolitan,4019.0


# Preprocessing

- imputation: remove rows with missing values
- independent variables: `['Price', 'Distance', 'Bedroom2', 'Bathroom', 'Landsize', 'BuildingArea', 'YearBuilt']`


In [7]:
X = df[['Price', 'Distance', 'Bedroom2', 'Bathroom', 'Landsize', 'BuildingArea', 'YearBuilt']]
#convert features to numeric for KNN
X = X.apply(pd.to_numeric, errors='coerce')

# remove rows with missing values
X = X.dropna()
print(X.columns, X.shape)

Index(['Price', 'Distance', 'Bedroom2', 'Bathroom', 'Landsize', 'BuildingArea',
       'YearBuilt'],
      dtype='object') (9028, 7)


# Develop model: KNN classifier
Model specs:
- type: KNN classifier
- number of neighbors: 3 (we need to provide 3 recommendations in the promotion email)

In [8]:
model = model = NearestNeighbors(n_neighbors = 3).fit(X)

# Make inference

In [19]:
# generate recommendation for target user (the information for the target is collected based on the search history)
target = [950000, #price
           2, #km distance to city
           2, # no.bedrooms
           2, # no. bathrooms
           220, #land size
           200, #buidling size
           2005]# yr built

# make prediction
print(f"prediction distance to target: {model.kneighbors([target])[0]} | 3 closest properties to user preference, index : {model.kneighbors([target])[1]}")
predictions = {}
for i in range(3):
  predictions[i] = df.iloc[model.kneighbors([target])[1][0][i]]

# show results seperated by line

print("3 recommendation for the promotoional email campaign for the provided user target")
print('\n')
print(f"1:\n{predictions[0]}")
print('\n')
print(f"2:\n{predictions[1]}")
print('\n')
print(f"3:\n{predictions[2]}")

prediction distance to target: [[48.41735226 53.52046338 64.6470417 ]] | 3 closest properties to user preference, index : [[8810 8811 2741]]
3 recommendation for the promotoional email campaign for the provided user target


1:
Suburb                        Richmond
Address                  35A Hunter St
Rooms                                2
Type                                 t
Price                        1430000.0
Method                               S
SellerG                         Jellis
Date                        12/11/2016
Distance                           2.6
Postcode                        3121.0
Bedroom2                           2.0
Bathroom                           2.0
Car                                2.0
Landsize                         153.0
BuildingArea                     125.0
YearBuilt                       2004.0
CouncilArea         Yarra City Council
Lattitude                     -37.8209
Longtitude                    145.0055
Regionname       Northern Metro

The result show the target user is expectedto click on the advertisement and the ads is recommended to be sent to the user.