# Hackney Room Prices

This notebook implements a regression model that predicts the going rate of rooms in shared houses in the London borough of Hackney. The data is scraped from [SpareRoom](https://spareroom.co.uk).

The project uses custom-built classes and functions. These are written in a text editor or IDE for convenience, and the IPython `autoreload` extension loaded below (with `%autoreload 2`, which reloads all modules before each cell is executed) ensures that the notebook does not have to be restarted every time this code is altered.

In [None]:
%load_ext autoreload
%autoreload 2

## Load Packages

In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)

## Scraping the Data

In [None]:
from scrape import SpareRoomScraper

scraper = SpareRoomScraper()
df = scraper.get_data()
df.head(60)

In [None]:
df.to_feather('./data/peckham.feather')

## Preprocess

In [None]:
df = pd.read_feather('./data/peckham.feather')

The preprocessing steps are undertaken by the `DataFrameTransformer` class. These include:

  - The removal of all listings that are whole properties rather than individual rooms
  - The separation of distinct pieces of information held within a single feature into their own features. E.g. the original `price` feature is split into the price value, the rate (p/w or pcm) and whether or not the room has an ensuite.
  - Transformation of some features into a more useful form. E.g. `availability` is transformed into `available_in`, which is simply an integer giving the number of days between the current day and the room's availability date.
  - Encoding of Yes/No features in a binary format, and other categorical features in a one-hot format where appropriate
  
For more detail, see `transformers.py`.
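
As a rough illustration of the feature-splitting, date and encoding steps listed above, here is a minimal sketch. It is not the code in `transformers.py`: the raw string formats (e.g. `"£180 pw (en suite)"`), the output column names `price_value`, `price_rate` and `ensuite`, and the helpers `split_price` and `preprocess` are all assumptions made for the example. The removal of whole-property listings is left out.

```python
import re

import pandas as pd

# Hypothetical raw listings; the real scraped format may well differ.
raw = pd.DataFrame({
    "price": ["£750 pcm", "£180 pw (en suite)", "£900 pcm (en suite)"],
    "availability": ["2024-03-01", "2024-02-20", "2024-02-15"],
    "living_room": ["Yes", "No", "Yes"],
})


def split_price(text):
    """Split an assumed raw price string into (value, rate, ensuite)."""
    value = int(re.search(r"\d+", text.replace(",", "")).group())
    rate = "pw" if "pw" in text else "pcm"
    ensuite = int("en suite" in text.lower())
    return value, rate, ensuite


def preprocess(df, today):
    """Apply the illustrative transformations and return a new DataFrame."""
    out = df.copy()

    # One raw feature becomes three separate features.
    parts = [split_price(p) for p in out["price"]]
    out["price_value"] = [p[0] for p in parts]
    out["price_rate"] = [p[1] for p in parts]
    out["ensuite"] = [p[2] for p in parts]

    # Days between the reference day and the availability date.
    delta = pd.to_datetime(out["availability"]) - pd.Timestamp(today)
    out["available_in"] = delta.dt.days

    # Yes/No features become 1/0.
    out["living_room"] = out["living_room"].map({"Yes": 1, "No": 0})

    # Remaining categorical features are one-hot encoded.
    out = out.drop(columns=["price", "availability"])
    return pd.get_dummies(out, columns=["price_rate"], dtype=int)


clean = preprocess(raw, today="2024-02-10")
print(clean)
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the result reproducible when the notebook is re-run on a later date.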
  

In [None]:
from transformers import DataFrameTransformer
t = DataFrameTransformer()
df = t.fit_transform(df)
df.head(60)

In [None]:
df.head(20)

In [None]:
df.living_room.value_counts()

In [None]:
df.area.value_counts()