# Hackney Room Prices

This notebook implements a regression model that predicts the going rate for rooms in shared houses in the London borough of Hackney. The data is scraped from [SpareRoom](https://spareroom.co.uk).

The project uses custom-built classes and functions. These are written in a text editor / IDE for convenience, and the `autoreload` extension loaded below ensures that the notebook does not have to be restarted every time this code is altered.

In [1]:
%load_ext autoreload
%autoreload 2

## Load Packages

In [2]:
import pandas as pd
pd.set_option('display.max_columns', None)

## Scraping the Data

In [None]:
from scrape import SpareRoomScraper

scraper = SpareRoomScraper()
df = scraper.get_data()
df.head(60)

In [None]:
df.to_feather('./data/peckham.feather')

## Preprocess

In [26]:
df = pd.read_feather('./data/peckham.feather')
df.head(60)

Unnamed: 0,ad_ref,area,available_in,balcony,bills,broadband,deposit,disabled_access,distance_to_station,furnished,gender,house_type,living_room,max_term,min_term,num_flatmates,parking,postcode,price,url
0,Ad ref# 14705491,London SE15,Now,No,Yes,Yes,£500.00,No,,Furnished,Male preferred,House share,shared,,1 month,8,Yes,SE15 Area info,£145 pw (double),https://www.spareroom.co.uk/flatshare/flatshar...
1,Ad ref# 16057872,Peckham,28 Feb 2022,No,,,,No,0-5 minutes walk away,Furnished,,Flat to rent,,,6 months,Yes,No,SE15 Area info,"£1,100 pcm (whole property)",https://www.spareroom.co.uk/flatshare/flatshar...
2,Ad ref# 15993666,Peckham,28 Feb 2022,No,,,,No,0-5 minutes walk away,Furnished,,Flat to rent,,,6 months,Yes,No,SE15 Area info,"£1,100 pcm (whole property)",https://www.spareroom.co.uk/flatshare/flatshar...
3,Ad ref# 16057268,Peckham,14 Feb 2022,Yes,Yes,Yes,£900.00,No,0-5 minutes walk away,Furnished,Female preferred,Flat share,shared,,1 month,1,Yes,SE15 Area info,£900 pcm (double/en suite),https://www.spareroom.co.uk/flatshare/flatshar...
4,Ad ref# 598455,Peckham Rye,16 Jan 2022,No,Some,Yes,£900.00,No,5-10 minutes walk away,Furnished,Female preferred,Flat share,shared,,2 months,1,No,SE15 Area info,£900 pcm (double),https://www.spareroom.co.uk/flatshare/flatshar...
5,Ad ref# 16056358,Peckham,20 Feb 2022,No,Yes,Yes,£850.00,No,0-5 minutes walk away,Furnished,Male preferred,House share,shared,,6 months,1,Yes,SE15 Area info,"£1,000 pcm (double/en suite)",https://www.spareroom.co.uk/flatshare/flatshar...
6,Ad ref# 8789731,London SE15,Now,Yes,Some,Yes,£675.00,No,10-15 minutes away,Furnished,Males or females,Flat share,No,,6 months,1,Yes,SE15 Area info,£675 pcm (double),https://www.spareroom.co.uk/flatshare/flatshar...
7,Ad ref# 16056195,Nunhead,01 Mar 2022,No,,,,No,,Part Furnished,,Flat to rent,,,12 months,Yes,Yes,SE15 Area info,"£1,400 pcm (whole property)",https://www.spareroom.co.uk/flatshare/flatshar...
8,Ad ref# 15568641,Peckham,03 Feb 2022,,No,Yes,£748.00,,,Furnished,Males or females,Flat share,shared,,,3,,SE15 Area info,£687 pcm (double),https://www.spareroom.co.uk/flatshare/flatshar...
9,Ad ref# 15171013,Peckham Rye,01 Feb 2022,Yes,Yes,Yes,£999.00,,0-5 minutes walk away,Furnished,Males or females,House share,shared,,3 months,1,Yes,SE15 Area info,"£1,099 pcm (double/en suite)",https://www.spareroom.co.uk/flatshare/flatshar...


The preprocessing steps are undertaken by the classes in `preprocessing.py`. These include:

  - The removal of all listings that are whole properties rather than individual rooms
  - The extraction of distinct pieces of information held within a single feature into their own separate features. E.g. the original `price` feature is split into the price value, the rate (pw or pcm) and whether or not the room has an en suite.
  - Transformation of some features into a more useful form. E.g. `availability` is transformed into `available_in`, which is simply an integer giving the number of days between the current day and the room's availability date.
  - Encoding of Yes/No features in a binary format, and other categorical features in a one-hot format where appropriate
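
To make the `price` split described above concrete, here is a minimal sketch. It is not the code in `preprocessing.py`: the helper name `split_price`, the regular expression and the returned keys are all assumptions, based only on the price strings visible in the raw data (e.g. `£1,000 pcm (double/en suite)`).

```python
import re

# Hypothetical helper, not the PriceExtractor from preprocessing.py.
# Matches strings such as "£145 pw (double)" or "£1,000 pcm (double/en suite)".
PRICE_PATTERN = re.compile(r"£([\d,]+(?:\.\d+)?)\s*(pw|pcm)\s*(?:\((.*)\))?")


def split_price(raw):
    """Split a raw price string into value, rate and an en suite flag."""
    match = PRICE_PATTERN.search(raw)
    if match is None:
        return None  # string does not look like a SpareRoom price
    value, rate, room_type = match.groups()
    return {
        "price": float(value.replace(",", "")),  # drop thousands separator
        "rate": rate,  # "pw" (per week) or "pcm" (per calendar month)
        "ensuite": int("en suite" in (room_type or "")),
    }


print(split_price("£1,000 pcm (double/en suite)"))
```

The sketch keeps the weekly and monthly rates separate, as the list above describes; converting both to a common unit would be a further step.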
  
  

In [None]:
from transformers import DataFrameTransformer
t = DataFrameTransformer()
df = t.fit_transform(df)
df.head()

In [None]:
from preprocessing import remove_non_rooms
df = remove_non_rooms(df)
df.head()
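
The body of `remove_non_rooms` is not shown in this notebook. One plausible way to drop whole-property listings is sketched below. The function name `drop_whole_properties` and the filtering rule (looking for "whole property" in the `price` string) are assumptions drawn from the raw rows above, not the actual implementation.

```python
import pandas as pd


def drop_whole_properties(frame):
    """Keep only listings for individual rooms (hypothetical rule)."""
    # Assumption: whole-property adverts are flagged in the price string,
    # as in "£1,100 pcm (whole property)" in the raw data.
    is_whole = frame["price"].str.contains("whole property", case=False, na=False)
    return frame.loc[~is_whole].copy()


sample = pd.DataFrame(
    {
        "house_type": ["House share", "Flat to rent", "Flat share"],
        "price": [
            "£145 pw (double)",
            "£1,100 pcm (whole property)",
            "£900 pcm (double)",
        ],
    }
)
rooms = drop_whole_properties(sample)
print(rooms)
```

The original index is left untouched here, so the surviving rows can still be traced back to the scraped data.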

In [None]:
from preprocessing import PriceExtractor

pe = PriceExtractor()
pe.fit_transform(df).head()

In [None]:
from preprocessing import AvailabilityTransformer

at = AvailabilityTransformer()
at.fit_transform(df).head()
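
For reference, the conversion from an availability string to a number of days could look roughly like this. The helper `days_until_available` is hypothetical, and treating past dates as available immediately is an assumption. `AvailabilityTransformer` may handle these cases differently.

```python
from datetime import date, datetime


def days_until_available(raw, today):
    """Convert 'Now' or a date such as '28 Feb 2022' into a number of days."""
    if raw.strip().lower() == "now":
        return 0
    # "%b" expects English month abbreviations under the default C locale.
    available = datetime.strptime(raw.strip(), "%d %b %Y").date()
    # Assumption: dates already in the past count as available immediately.
    return max((available - today).days, 0)


today = date(2022, 1, 29)  # fixed reference date so the example is reproducible
print(days_until_available("28 Feb 2022", today))
print(days_until_available("Now", today))
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the result reproducible when the notebook is re-run later.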

In [None]:
from preprocessing import TermTransformer

tt = TermTransformer()
tt.fit_transform(df).head()
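
The lease terms arrive as strings such as `6 months`, or are missing altogether. A sketch of how they might be turned into integers follows. The helper name `term_in_months` and the choice of 0 for a missing term are assumptions, not necessarily what `TermTransformer` does.

```python
import re


def term_in_months(raw):
    """Parse '6 months' or '1 month' into an integer number of months."""
    # Assumption: a missing term (None, NaN or unparseable text) becomes 0.
    if not isinstance(raw, str):
        return 0
    match = re.match(r"\s*(\d+)\s*month", raw)
    return int(match.group(1)) if match else 0


print([term_in_months(t) for t in ["1 month", "12 months", None]])
```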

In [None]:
from preprocessing import AdRefExtractor

are = AdRefExtractor()
are.fit_transform(df).head()

In [None]:
from preprocessing import PostcodeExtractor

pe = PostcodeExtractor()
pe.fit_transform(df).head()

In [None]:
from preprocessing import DepositTransformer

dt = DepositTransformer()
dt.fit_transform(df).head()

In [None]:
from preprocessing import TimeToStationExtractor

ttse = TimeToStationExtractor()
ttse.fit_transform(df).head()
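
The `distance_to_station` feature holds ranges such as `5-10 minutes walk away`. One way to reduce a range to a single number is to keep its lower bound, as sketched below. This is an assumption: the hypothetical helper `minutes_to_station` is not taken from `preprocessing.py`, and the midpoint or upper bound would be equally defensible choices.

```python
import re


def minutes_to_station(raw):
    """Return the lower bound of a range such as '5-10 minutes walk away'."""
    if not isinstance(raw, str):
        return None  # missing value; how to impute it is left open here
    match = re.match(r"\s*(\d+)\s*-\s*(\d+)\s*minutes", raw)
    return int(match.group(1)) if match else None


print(minutes_to_station("10-15 minutes away"))
```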

In [None]:
from preprocessing import FeatureEncoder
fe = FeatureEncoder()
fe.fit_transform(df).head()
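
The two kinds of encoding mentioned earlier (binary for Yes/No features, one-hot for other categoricals) can be illustrated with plain pandas. This is only a sketch on a made-up two-column frame. `FeatureEncoder` itself may use different column choices or scikit-learn encoders.

```python
import pandas as pd

# Made-up sample using column names and values seen in the raw data.
sample = pd.DataFrame(
    {
        "balcony": ["No", "Yes", "Yes"],
        "furnished": ["Furnished", "Part Furnished", "Furnished"],
    }
)

# Binary encoding: map Yes/No onto 1/0 (any other value becomes NaN).
sample["balcony"] = sample["balcony"].map({"Yes": 1, "No": 0})

# One-hot encoding: one indicator column per category.
encoded = pd.get_dummies(sample, columns=["furnished"], dtype=int)
print(encoded)
```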

In [None]:
df.head()

In [27]:
from model import build_model
model = build_model()
model.fit(df)

Pipeline(steps=[('remover', Pipeline(steps=[('remover', PropertyRemover())])),
                ('transformer',
                 ColumnTransformer(transformers=[('price', PriceExtractor(),
                                                  ['price']),
                                                 ('available_in',
                                                  AvailabilityTransformer(),
                                                  ['available_in']),
                                                 ('lease_terms',
                                                  TermTransformer(),
                                                  ['max_term', 'min_term']),
                                                 ('ad_ref', AdRefExtractor(),
                                                  ['ad_ref']),
                                                 ('deposit',
                                                  DepositTransformer(),
                                                  ['d
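
The (truncated) output above shows a scikit-learn `Pipeline` wrapping a `ColumnTransformer` that routes each raw column to its own transformer. A stripped-down sketch of that pattern is given below. `DepositToFloat` is a hypothetical stand-in written for this example, not the `DepositTransformer` from `preprocessing.py`, and encoding a missing deposit as 0.0 is an assumption.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


class DepositToFloat(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: '£500.00' -> 500.0, missing -> 0.0."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        # Strip the currency symbol and thousands separators column by column.
        stripped = X.apply(lambda col: col.str.replace(r"[£,]", "", regex=True))
        return stripped.apply(pd.to_numeric, errors="coerce").fillna(0.0)


pipeline = Pipeline(
    steps=[
        (
            "transformer",
            ColumnTransformer(
                transformers=[("deposit", DepositToFloat(), ["deposit"])],
                remainder="drop",  # columns without a transformer are discarded
            ),
        )
    ]
)

sample = pd.DataFrame(
    {"deposit": ["£500.00", None, "£1,250.00"], "area": ["Peckham", "Nunhead", "Peckham"]}
)
result = pipeline.fit_transform(sample)
print(result)
```

Each additional feature would get its own `(name, transformer, columns)` entry in the `transformers` list, which mirrors the structure visible in the printed pipeline.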

In [28]:
pd.DataFrame(model.transform(df)).head(20)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22
1,900.0,1,1,30,0,1,16057268,0,900.0,0,1.0,1.0,1,No,1.0,1,0,1.0,2,0,0,0,0
2,900.0,1,0,1,0,2,598455,0,900.0,5,1.0,0.0,1,No,1.0,1,0,0.0,1,0,0,0,0
3,1000.0,1,1,36,0,6,16056358,0,850.0,0,1.0,0.0,1,No,1.0,1,1,1.0,2,0,1,0,0
6,1099.0,1,1,17,0,3,15171013,0,999.0,0,1.0,1.0,1,,1.0,1,1,1.0,2,0,0,1,0
9,750.0,1,0,28,0,0,16055415,0,0.0,0,1.0,0.0,1,No,1.0,1,0,0.0,2,0,0,1,0
10,725.0,1,0,0,1,6,8566882,0,725.0,5,1.0,0.0,1,No,0.0,1,0,0.0,2,0,0,1,0
12,900.0,1,0,0,1,2,11606833,0,500.0,0,1.0,0.0,1,No,1.0,1,1,0.0,2,0,0,1,0
15,700.0,1,0,0,0,0,16011336,0,500.0,10,0.0,0.0,1,No,1.0,1,1,0.0,2,0,1,0,0
17,780.0,1,0,0,0,12,13761043,0,700.0,5,1.0,0.0,1,No,1.0,1,1,1.0,2,0,0,1,0
20,600.0,1,0,0,0,1,3252724,0,600.0,0,1.0,0.0,1,No,1.0,1,1,1.0,2,0,0,1,0

