## Netherlands Rent Prediction

Given data about rental properties in the Netherlands, let's try to predict the rent for a given property.

We will use a scikit-learn pipeline that encodes the categorical features and passes them to a random forest regression model to make our predictions.

Data source: https://www.kaggle.com/datasets/juangesino/netherlands-rent-properties?select=properties.json

### Getting Started

In [3]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

from sklearn.ensemble import RandomForestRegressor

In [6]:
data = pd.read_json('properties.json', lines=True)
data

Unnamed: 0,_id,externalId,areaRaw,areaSqm,city,coverImageUrl,crawlStatus,crawledAt,datesPublished,firstSeenAt,furnish,lastSeenAt,latitude,longitude,postalCode,postedAgo,propertyType,rawAvailability,rent,rentDetail,rentRaw,source,title,url,additionalCosts,additionalCostsRaw,deposit,depositRaw,descriptionNonTranslated,descriptionNonTranslatedRaw,descriptionTranslated,descriptionTranslatedRaw,detailsCrawledAt,energyLabel,gender,internet,isRoomActive,kitchen,living,matchAge,matchAgeBackup,matchCapacity,matchGender,matchGenderBackup,matchLanguages,matchStatus,matchStatusBackup,pageDescription,pageTitle,pets,registrationCost,registrationCostRaw,roommates,shower,smokingInside,toilet,userDisplayName,userId,userLastLoggedOn,userMemberSince,userPhotoUrl,additionalCostsDescription
0,{'$oid': '5d2b113a43cbfd7c77a998f4'},room-1686123,14 m2,14,Rotterdam,https://resources.kamernet.nl/image/913b4b03-5...,done,{'$date': '2019-07-26T22:18:23.018+0000'},"[{'$date': '2019-07-14T11:25:46.511+0000'}, {'...",{'$date': '2019-07-14T11:25:46.511+0000'},Unfurnished,{'$date': '2019-07-26T22:18:23.142+0000'},51.896601,4.514993,3074HN,4w,Room,26-06-'19 - Indefinite period,500,,"€ 500,-",kamernet,West-Varkenoordseweg,https://kamernet.nl/en/for-rent/room-rotterdam...,50.0,\n € 50\n ...,500.0,\n € 500\n ...,"Nice room for rent, accros the Feyenoord stadi...","\nNice room for rent, accros the Feyenoord sta...","Nice room for rent, accros the Feyenoord stadi...","\nNice room for rent, accros the Feyenoord sta...",{'$date': '2019-07-22T07:10:41.849+0000'},Unknown,Mixed,Yes,true,Shared,,16 years -\n 99 years,16 years -\n 99 years,1 person,Not important,Not important,Not important,Not important,Not important,"Room for rent in Rotterdam, West-Varkenoordse...",Room for rent in Rotterdam €500 | Kamernet,No,0,\n € 0\n ...,5,Shared,No,Shared,Huize west,4680711.0,21-07-2019,26-06-2019,https://resources.kamernet.nl/Content/images/s...,
1,{'$oid': '5d2b113a43cbfd7c77a9991a'},studio-1691193,30 m2,30,Amsterdam,https://resources.kamernet.nl/image/5e11d6b5-8...,done,{'$date': '2019-08-10T22:28:46.099+0000'},"[{'$date': '2019-07-14T11:25:46.677+0000'}, {'...",{'$date': '2019-07-14T11:25:46.677+0000'},Furnished,{'$date': '2019-08-10T22:28:46.229+0000'},52.370200,4.920721,1018AS,4w,Studio,15-08-'19 - Indefinite period,950,Utilities incl.,"€ 950,- Utilities incl.",kamernet,Parelstraat,https://kamernet.nl/en/for-rent/studio-amsterd...,0.0,\n € 0\n ...,895.0,\n € 895\n ...,"Efficiently furnished, with a large balcony, a...","\nEfficiently furnished, with a large balcony,...","Efficiently furnished, with a large balcony, a...","\nEfficiently furnished, with a large balcony,...",{'$date': '2019-07-22T06:29:33.112+0000'},Unknown,Unknown,Yes,true,Own,Own,18 years -\n 99 years,18 years -\n 99 years,1 person,Not important,Not important,Not important,"Working student, Working","Working student, Working","Studio for rent in Amsterdam, Parelstraat, fo...",Studio for rent in Amsterdam €950 | Kamernet,No,0,\n € 0\n ...,,Own,No,Own,Cor,1865530.0,20-07-2019,05-01-2012,https://resources.kamernet.nl/Content/images/p...,
2,{'$oid': '5d2b113a43cbfd7c77a99931'},room-1690545,11 m2,11,Amsterdam,https://resources.kamernet.nl/image/74b93a27-a...,done,{'$date': '2019-10-02T22:00:33.141+0000'},"[{'$date': '2019-07-14T11:25:46.834+0000'}, {'...",{'$date': '2019-07-14T11:25:46.834+0000'},Furnished,{'$date': '2019-10-02T22:00:33.264+0000'},52.350880,4.854786,1075SB,09 Jul,Room,01-08-'19 - Indefinite period,1000,Utilities incl.,"€ 1000,- Utilities incl.",kamernet,Zeilstraat,https://kamernet.nl/en/for-rent/room-amsterdam...,,\n -\n ...,1000.0,\n € 1000\n ...,Kamer van 11m2 vlakbij het Vondelpark. Met een...,\nKamer van 11m2 vlakbij het Vondelpark. Met e...,Kamer van 11m2 vlakbij het Vondelpark. Met een...,\nKamer van 11m2 vlakbij het Vondelpark. Met e...,{'$date': '2019-07-21T08:44:32.816+0000'},Unknown,Mixed,Yes,true,Shared,Shared,16 years -\n 93 years,16 years -\n 93 years,1 person,Not important,Not important,Not important,Not important,Not important,"Room for rent in Amsterdam, Zeilstraat, for €...",Room for rent in Amsterdam €1000 | Kamernet,Yes,,\n -\n ...,1,Shared,Yes,Shared,Felix,4466569.0,20-07-2019,05-07-2018,https://resources.kamernet.nl/Content/images/p...,
3,{'$oid': '5d2b113a43cbfd7c77a9994a'},room-1680036,16 m2,16,Assen,https://resources.kamernet.nl/image/84e95365-6...,done,{'$date': '2019-07-18T22:00:31.018+0000'},"[{'$date': '2019-07-14T11:25:46.988+0000'}, {'...",{'$date': '2019-07-14T11:25:46.988+0000'},Unfurnished,{'$date': '2019-07-18T22:00:31.174+0000'},53.013494,6.561012,9407BG,17 Jun,Room,16-06-'19 - Indefinite period,290,Utilities incl.,"€ 290,- Utilities incl.",kamernet,Ruiterakker,https://kamernet.nl/en/for-rent/room-assen/rui...,,-,290.0,€ 290,De kamer is 16m2De kamer is voorzien van een z...,De kamer is 16m2<br><br>De kamer is voorzien ...,De kamer is 16m2De kamer is voorzien van een z...,De kamer is 16m2<br><br>De kamer is voorzien ...,{'$date': '2019-07-27T19:03:44.443+0000'},Unknown,Female,Yes,false,Shared,,18 years - 32 years,18 years - 32 years,1 person,Female,Female,Not important,"Student, Working student","Student, Working student","Room for rent in Assen, Ruiterakker, for €290...",Room for rent in Assen €290 | Kamernet,No,,-,4,Shared,Yes,Shared,Albert,783341.0,26-07-2019,09-11-2006,https://resources.kamernet.nl/Content/images/p...,
4,{'$oid': '5d2b113b43cbfd7c77a9997c'},room-1691356,22 m2,22,Rotterdam,https://resources.kamernet.nl/Content/images/p...,done,{'$date': '2019-08-12T02:06:14.635+0000'},"[{'$date': '2019-07-14T11:25:47.193+0000'}, {'...",{'$date': '2019-07-14T11:25:47.193+0000'},Unfurnished,{'$date': '2019-08-12T02:06:14.755+0000'},51.932871,4.479732,3035AK,4w,Room,01-08-'19 - Indefinite period,475,Utilities incl.,"€ 475,- Utilities incl.",kamernet,Zwart Janstraat,https://kamernet.nl/en/for-rent/room-rotterdam...,,\n -\n ...,500.0,\n € 500\n ...,"gedeeltelijk gemeubileerd,met een kitchenette ...","\ngedeeltelijk gemeubileerd,met een kitchenett...",,\n,{'$date': '2019-07-21T08:13:53.217+0000'},Unknown,Male,Unknown,true,Own,Own,16 years -\n 99 years,16 years -\n 99 years,1 person,Male,Male,Not important,"Student, Working student, Working","Student, Working student, Working","Room for rent in Rotterdam, Zwart Janstraat, ...",Room for rent in Rotterdam €475 | Kamernet,No,,\n -\n ...,1,Shared,No,Shared,John,3338401.0,19-07-2019,24-08-2014,https://resources.kamernet.nl/image/3177baf7-5...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
46717,{'$oid': '5e5dada77fc8c93d83042924'},room-1774711,28 m2,28,Rotterdam,https://resources.kamernet.nl/image/198697b0-6...,done,{'$date': '2020-03-03T01:06:47.516+0000'},[{'$date': '2020-03-03T01:06:47.640+0000'}],{'$date': '2020-03-03T01:06:47.640+0000'},Furnished,{'$date': '2020-03-03T01:06:47.640+0000'},51.928624,4.507187,3061AG,2d,Room,01-03-'20 - Indefinite period,800,,"€ 800,-",kamernet,Oudedijk,https://kamernet.nl/en/for-rent/room-rotterdam...,200.0,€ 200,1000.0,€ 1000,"Recent available, completely furnished room in...","Recent available, completely furnished room i...","Recent available, completely furnished room in...","Recent available, completely furnished room i...",{'$date': '2020-03-03T09:32:18.481+0000'},Unknown,Unknown,Yes,true,Shared,Shared,16 years - 99 years,16 years - 99 years,> 5 persons,Not important,Not important,Not important,Not important,Not important,"Room for rent in Rotterdam, Oudedijk, for €80...",Room for rent in Rotterdam €800 | Kamernet,No,,,Unknown,Shared,No,Shared,Xandra,4490595.0,03-03-2020,08-08-2018,https://resources.kamernet.nl/image/1317498e-a...,
46718,{'$oid': '5e5daddf7fc8c93d83043cfd'},room-1774600,16 m2,16,Harmelen,https://resources.kamernet.nl/image/8d75725a-d...,done,{'$date': '2020-03-03T01:07:43.775+0000'},[{'$date': '2020-03-03T01:07:43.898+0000'}],{'$date': '2020-03-03T01:07:43.898+0000'},Furnished,{'$date': '2020-03-03T01:07:43.898+0000'},52.086568,4.959942,3481VE,3d,Room,08-02-'20 - Indefinite period,400,Utilities incl.,"€ 400,- Utilities incl.",kamernet,Wilhelminalaan,https://kamernet.nl/en/for-rent/room-harmelen/...,,,,-,Gerenoveerde zolderkamer te huur per direct.De...,Gerenoveerde zolderkamer te huur per direct.<...,,,{'$date': '2020-03-03T08:30:24.981+0000'},Unknown,Male,Yes,true,Shared,Shared,16 years - 99 years,16 years - 99 years,1 person,Not important,Not important,Not important,Student,Student,"Room for rent in Harmelen, Wilhelminalaan, fo...",Room for rent in Harmelen €400 | Kamernet,No,,,2,Shared,No,Shared,Michel,4356840.0,24-02-2020,06-12-2017,https://resources.kamernet.nl/image/336ab03e-f...,
46719,{'$oid': '5e5dade07fc8c93d83043d2d'},room-1774595,30 m2,30,Rotterdam,https://resources.kamernet.nl/image/481fb7dd-a...,done,{'$date': '2020-03-03T01:07:43.947+0000'},[{'$date': '2020-03-03T01:07:44.071+0000'}],{'$date': '2020-03-03T01:07:44.071+0000'},Furnished,{'$date': '2020-03-03T01:07:44.071+0000'},51.928624,4.507187,3061AG,3d,Room,01-03-'20 - Indefinite period,950,,"€ 950,-",kamernet,Oudedijk,https://kamernet.nl/en/for-rent/room-rotterdam...,300.0,€ 300,1250.0,€ 1250,"Beautiful, new furnished room/apartment in 5 m...","Beautiful, new furnished room/apartment in 5 ...","Beautiful, new furnished room/apartment in 5 m...","Beautiful, new furnished room/apartment in 5 ...",{'$date': '2020-03-03T02:35:17.266+0000'},Unknown,Unknown,Yes,true,Shared,Shared,16 years - 99 years,16 years - 99 years,> 5 persons,Not important,Not important,Not important,"Student, Working student, Working","Student, Working student, Working","Room for rent in Rotterdam, Oudedijk, for €95...",Room for rent in Rotterdam €950 | Kamernet,No,,,Unknown,Shared,No,Shared,Xandra,4490595.0,02-03-2020,08-08-2018,https://resources.kamernet.nl/image/1317498e-a...,
46720,{'$oid': '5e5dade27fc8c93d83043e7b'},room-1774582,35 m2,35,Rotterdam,https://resources.kamernet.nl/image/809ca8b1-c...,done,{'$date': '2020-03-03T01:07:45.950+0000'},[{'$date': '2020-03-03T01:07:46.072+0000'}],{'$date': '2020-03-03T01:07:46.072+0000'},Furnished,{'$date': '2020-03-03T01:07:46.072+0000'},51.928624,4.507187,3061AG,3d,Room,01-03-'20 - Indefinite period,1050,Utilities incl.,"€ 1050,- Utilities incl.",kamernet,Oudedijk,https://kamernet.nl/en/for-rent/room-rotterdam...,300.0,€ 300,1350.0,€ 1350,"Large, completely furnished room in 5 minutes ...","Large, completely furnished room in 5 minutes...","Large, completely furnished room in 5 minutes ...","Large, completely furnished room in 5 minutes...",{'$date': '2020-03-03T06:31:13.654+0000'},Unknown,Unknown,Yes,true,Shared,Shared,16 years - 99 years,16 years - 99 years,1 person,Not important,Not important,Not important,"Student, Working student, Working","Student, Working student, Working","Room for rent in Rotterdam, Oudedijk, for €10...",Room for rent in Rotterdam €1050 | Kamernet,No,,,6,Shared,No,Shared,Xandra,4490595.0,03-03-2020,08-08-2018,https://resources.kamernet.nl/image/1317498e-a...,


In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 46722 entries, 0 to 46721
Data columns (total 62 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   _id                          46722 non-null  object 
 1   externalId                   46722 non-null  object 
 2   areaRaw                      46722 non-null  object 
 3   areaSqm                      46722 non-null  int64  
 4   city                         46722 non-null  object 
 5   coverImageUrl                46722 non-null  object 
 6   crawlStatus                  46722 non-null  object 
 7   crawledAt                    46722 non-null  object 
 8   datesPublished               46722 non-null  object 
 9   firstSeenAt                  46722 non-null  object 
 10  furnish                      46722 non-null  object 
 11  lastSeenAt                   46722 non-null  object 
 12  latitude                     46722 non-null  float64
 13  longitude       

### Preprocessing

In [42]:
def preprocess_inputs(df):
    df = df.copy()

    # Drop rows whose crawlStatus is "unavailable"
    bad_rows = df.query('crawlStatus == "unavailable"').index
    df = df.drop(bad_rows, axis=0).reset_index(drop=True)
    
    # Use only select features 
    df = df[[
        'areaSqm',
        'city',
        'furnish',
        'latitude',
        'longitude',
        'propertyType',
        'rent',
        'internet',
        'kitchen',
        'living',
        'pets',
        'shower',
        'smokingInside',
        'toilet'
    ]]

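    # Treat empty strings and 'Unknown' as missing values
    # (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
    df = df.replace({'': np.nan, 'Unknown': np.nan})

    # Fill missing values with each column's most frequent value (mode)
    missing_value_columns = df.columns[df.isna().sum() > 0]
    for column in missing_value_columns:
        df[column] = df[column].fillna(df[column].mode()[0])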

    # Split df into X and y
    y = df['rent']
    X = df.drop('rent', axis=1)

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)
    
    return X_train, X_test, y_train, y_test, X
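
The function above fills missing values with the mode of the whole dataset before splitting, so the test rows contribute to the fill values. One possible alternative, sketched here on a made-up toy frame, is to learn the fill value from the training rows only with `SimpleImputer(strategy='most_frequent')`. The names `toy`, `X_tr`, `X_te`, and `imputer` are hypothetical and are not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy stand-in for the listings (values are invented for illustration)
toy = pd.DataFrame({
    'kitchen': ['Shared', 'Own', np.nan, 'Shared', 'Shared', np.nan],
})

# Deterministic split: first four rows for training, last two for testing
X_tr = toy.iloc[:4]
X_te = toy.iloc[4:]

# Learn the most frequent value from the training rows only
imputer = SimpleImputer(strategy='most_frequent')
X_tr_filled = imputer.fit_transform(X_tr)

# Reuse the training mode to fill gaps in the test rows
X_te_filled = imputer.transform(X_te)

print(imputer.statistics_)
print(X_te_filled.ravel())
```

In a full pipeline, such an imputer could sit in front of the encoders so that the fill values are always learned during `fit`.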

In [43]:
X_train, X_test, y_train, y_test, X = preprocess_inputs(data)

In [40]:
X_train

Unnamed: 0,areaSqm,city,furnish,latitude,longitude,propertyType,internet,kitchen,living,pets,shower,smokingInside,toilet
29791,61,Rotterdam,Unfurnished,51.925125,4.486212,Apartment,Yes,Shared,Shared,No,Shared,No,Shared
44827,45,Rotterdam,Unfurnished,51.893369,4.517075,Apartment,Yes,Own,Own,No,Own,No,Own
37089,105,Amsterdam,Furnished,52.376979,4.839116,Apartment,Yes,Own,Own,No,Own,No,Own
13269,20,Delft,Uncarpeted,51.996010,4.352954,Room,Yes,Shared,Shared,By mutual agreement,Shared,Not important,Shared
14654,44,Rotterdam,Furnished,51.891709,4.480317,Apartment,Yes,Shared,Shared,No,Shared,No,Shared
...,...,...,...,...,...,...,...,...,...,...,...,...,...
43723,11,Enschede,Unfurnished,52.234413,6.849043,Room,Yes,Shared,Shared,No,Shared,No,Shared
32511,21,Utrecht,Furnished,52.086425,5.125311,Room,Yes,Shared,Shared,No,Shared,No,Shared
5192,16,Groningen,Furnished,53.229623,6.524759,Room,Yes,Shared,Shared,No,Shared,No,Shared
12172,7,Utrecht,Furnished,52.102119,5.096010,Room,Yes,Shared,,No,Shared,No,Shared


In [41]:
y_train

29791    1195
44827     950
37089    1500
13269     330
14654     845
         ... 
43723     310
32511     435
5192      325
12172     175
33003    1495
Name: rent, Length: 32635, dtype: int64

In [36]:
{column: list(X[column].unique()) for column in X.select_dtypes('object').columns}

{'city': ['Rotterdam',
  'Amsterdam',
  'Assen',
  'Groningen',
  'Zeist',
  'Maastricht',
  'Callantsoog',
  'Alphen aan den Rijn',
  'Tilburg',
  'Enschede',
  'Leeuwarden',
  'Eindhoven',
  'Wageningen',
  'Diemen',
  'Utrecht',
  'Almere',
  'Alkmaar',
  'Harderwijk',
  'Hilversum',
  'Delft',
  'Den Bosch',
  'Stoutenburg',
  'Leiden',
  'Den Haag',
  'Boxtel',
  'Badhoevedorp',
  'Veenendaal',
  'Amstelveen',
  'Nijmegen',
  'Venlo',
  'Zwolle',
  'Ubbena',
  'Arnhem',
  'Leimuiden',
  'Riel',
  'Nieuwegein',
  'Haren Gn',
  'Uitgeest',
  'Beverwijk',
  'Ede',
  'Nijkerk',
  'Amersfoort',
  'Loosdrecht',
  'Apeldoorn',
  'Vaals',
  'Velp',
  'Vlaardingen',
  'Montfoort',
  'Heemstede',
  'Breda',
  'Purmerend',
  'Baarn',
  'Spijkenisse',
  'Deventer',
  'Hengelo',
  'Capelle aan den IJssel',
  'Bovenkarspel',
  'Weesp',
  'Harskamp',
  'Zeeland',
  'Waalre',
  'IJsselstein',
  'Pijnacker',
  'Sittard',
  'Putten',
  'Vlissingen',
  'Haarlem',
  'Rijswijk',
  'Zandvoort',
  'Zutp

In [37]:
X.isna().mean()

areaSqm          0.0
city             0.0
furnish          0.0
latitude         0.0
longitude        0.0
propertyType     0.0
rent             0.0
internet         0.0
kitchen          0.0
living           0.0
pets             0.0
shower           0.0
smokingInside    0.0
toilet           0.0
dtype: float64

### Building Pipeline and Training

In [45]:
{column: len(X[column].unique()) for column in X.select_dtypes('object').columns}

{'city': 737,
 'furnish': 3,
 'propertyType': 5,
 'internet': 2,
 'kitchen': 3,
 'living': 3,
 'pets': 3,
 'shower': 3,
 'smokingInside': 3,
 'toilet': 3}

In [49]:
pd.get_dummies(X['internet'], dtype=int).drop('No', axis=1)

Unnamed: 0,Yes
0,1
1,1
2,1
3,1
4,1
...,...
46617,1
46618,1
46619,1
46620,1


In [53]:
nominal_features = [
    'city',
    'furnish',
    'propertyType',
    'kitchen',
    'living',
    'pets',
    'shower',
    'smokingInside',
    'toilet'
]

binary_transformer = Pipeline(steps=[
    ('ordinal', OrdinalEncoder())
])

nominal_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(sparse_output=False, handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(transformers=[
    ('binary', binary_transformer, ['internet']),
    ('nominal', nominal_transformer, nominal_features)
], remainder='passthrough')

model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', RandomForestRegressor())
])

In [54]:
model.fit(X_train, y_train)

The format of the columns of the 'remainder' transformer in ColumnTransformer.transformers_ will change in version 1.7 to match the format of the other transformers.
At the moment the remainder columns are stored as indices (of type int). With the same ColumnTransformer configuration, in the future they will be stored as column names (of type str).



### Results

In [56]:
y_pred = model.predict(X_test)

In [57]:
y_pred

array([ 444.195     ,  911.9       ,  339.985     , ..., 1334.43333333,
        362.47238095,  447.27      ])

In [62]:
np.sqrt(np.mean((y_test - y_pred)**2))

157.26574575056125

In [63]:
y_test.describe()

count    13987.000000
mean       664.911561
std        413.872062
min          1.000000
25%        390.000000
50%        550.000000
75%        800.000000
max       5000.000000
Name: rent, dtype: float64

In [75]:
r2 = 1 - (np.sum((y_test - y_pred)**2) / np.sum((y_test - y_test.mean())**2))
rmse = np.sqrt(np.mean((y_test - y_pred)**2))
print('RMSE: {:.2f}'.format(rmse))
print('R^2 Score: {:.5f}'.format(r2))

RMSE: 157.27
R^2 Score: 0.85560
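
The manual formulas above can be cross-checked against `sklearn.metrics`, and the RMSE can be compared with a baseline that always predicts the mean rent. The sketch below does this on made-up numbers. `y_true` and `y_hat` are hypothetical arrays, not the notebook's `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Invented rents and predictions, for illustration only
y_true = np.array([500.0, 950.0, 1000.0, 290.0, 475.0])
y_hat = np.array([520.0, 900.0, 950.0, 310.0, 480.0])

# Manual formulas, as used in the notebook
rmse_manual = np.sqrt(np.mean((y_true - y_hat) ** 2))
r2_manual = 1 - np.sum((y_true - y_hat) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Library equivalents
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
r2_sklearn = r2_score(y_true, y_hat)

# Baseline: always predict the mean of the targets
rmse_baseline = np.sqrt(np.mean((y_true - y_true.mean()) ** 2))

print('RMSE (manual / sklearn): {:.2f} / {:.2f}'.format(rmse_manual, rmse_sklearn))
print('R^2  (manual / sklearn): {:.5f} / {:.5f}'.format(r2_manual, r2_sklearn))
print('Baseline RMSE: {:.2f}'.format(rmse_baseline))
```

Since R^2 equals 1 - (RMSE / baseline RMSE)^2, the reported figures are consistent with each other: 1 - (157.27 / 413.87)^2 is roughly 0.856, using the standard deviation from `y_test.describe()` as an approximation of the baseline RMSE.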

