# House Prices

The aim of this project was to build an interactive web app to display house prices in England and Wales, using a heat-map overlay to visualise the differences by region.

##### Initial questions

- What granularity to show average price at? e.g. county, town, postcode, council?
- Could add feature for user to zoom in e.g. initially show at county level, then zoom into postcode level
- Mean or median average price? - allow user to choose, outliers will be interesting as well
- Might be interesting to see range of house prices by area too
    - Min, Max
    - Mean
    - 10%, 25%, 50%, 75%, 90% percentiles
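
As a rough illustration of the range statistics listed above, the following Python sketch groups sale prices by area and computes each figure. The function name `summarise_prices` and the `(area, price)` input shape are assumptions made for this example, not part of the project. The percentiles use the standard library's "inclusive" interpolation, which is only one of several reasonable definitions.

```python
from collections import defaultdict
from statistics import mean, quantiles


def summarise_prices(sales):
    """Group (area, price) pairs by area and compute range statistics.

    `sales` is a hypothetical iterable of (area, price) tuples.
    """
    by_area = defaultdict(list)
    for area, price in sales:
        by_area[area].append(price)

    summary = {}
    for area, prices in by_area.items():
        # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%,
        # so cuts[i] is the (i + 1) * 5 percent point.
        cuts = quantiles(prices, n=20, method="inclusive")
        summary[area] = {
            "min": min(prices),
            "max": max(prices),
            "mean": mean(prices),
            "p10": cuts[1],
            "p25": cuts[4],
            "p50": cuts[9],   # the median
            "p75": cuts[14],
            "p90": cuts[17],
        }
    return summary
```

Note that `statistics.quantiles` needs at least two prices per area, so areas with a single sale would have to be handled separately.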

A next step would be to see how house prices have changed over time. This could be a separate overlay on the map (the user chooses which overlay to view), effectively a different page on the website. The challenge will be how to visualise this change over time by area.

- How far back in time will data go?
- Maybe take snapshots of average price by area every 5 or 10 years
- One simple view would be to allow the user to select a date range, e.g. 1960-2020, then visually show the average price difference between the two dates
    - Allowing this to be controlled with a slider would make it easier to find trends
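
The date-range view described above could be computed along these lines. This is a minimal Python sketch under stated assumptions: the function name `price_change`, the `(area, year, price)` input shape, and the choice of the median as the average are all illustrative, since the notes leave the mean-or-median question open.

```python
from collections import defaultdict
from statistics import median


def price_change(sales, start_year, end_year):
    """Return {area: median price in end_year - median price in start_year}.

    `sales` is a hypothetical iterable of (area, year, price) tuples.
    Areas without sales in both years are left out of the result.
    """
    prices = defaultdict(list)
    for area, year, price in sales:
        if year in (start_year, end_year):
            prices[(area, year)].append(price)

    changes = {}
    for area in {area for area, _ in prices}:
        start = prices.get((area, start_year))
        end = prices.get((area, end_year))
        if start and end:
            changes[area] = median(end) - median(start)
    return changes
```

A slider in the front end would then only need to call a function like this again with the newly selected pair of years.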

##### How to get the data?
Write description here of how I got the data and how it's created/published.

##### Preparing the data

~~~~sql
DROP DATABASE IF EXISTS `houseprices`;
CREATE DATABASE `houseprices`;
USE `houseprices`;

CREATE TABLE `pricepaid` (
    `unique_id` VARCHAR(100),
    `price_paid` DECIMAL,
    `deed_date` DATE,
    `postcode` VARCHAR(8),
    `property_type` VARCHAR(1),
    `new_build` VARCHAR(1),
    `estate_type` VARCHAR(1),
    `saon` VARCHAR(50),
    `paon` VARCHAR(50),
    `street` VARCHAR(50),
    `locality` VARCHAR(50),
    `town` VARCHAR(50),
    `district` VARCHAR(50),
    `county` VARCHAR(50),
    `transaction_category` VARCHAR(1),
    `linked_data_uri` VARCHAR(1),
    PRIMARY KEY (unique_id)
);

SET GLOBAL local_infile=ON;
SET autocommit=0;
SET unique_checks=1;
SET foreign_key_checks=0;

LOAD DATA LOW_PRIORITY 
LOCAL INFILE 'Path/To/Project/pricepaid.csv'
INTO TABLE pricepaid 
CHARACTER SET armscii8
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n' 
(`unique_id`,`price_paid`,`deed_date`,`postcode`,`property_type`,`new_build`,`estate_type`,`saon`,`paon`,`street`,`locality`,`town`,`district`,`county`,`transaction_category`,`linked_data_uri`);
~~~~


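Creating an index:

~~~~sql
-- Template syntax applied to the pricepaid table.
-- The choice of postcode as the indexed column is an assumption.
CREATE INDEX idx_postcode
ON pricepaid (postcode);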
~~~~

The following columns were added to the pricepaid table.

~~~~sql
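-- OutCode: the part of the postcode before the space
substr(postcode, 1, locate(' ', postcode) - 1)
-- OutCode with its digits removed (column name not given in these notes)
regexp_replace(OutCode, '[0-9]+', '')
-- YearBin: bucket built from the Year column
case when (Year < 2005) then '1995 - 2004' when (Year < 2015) then '2005 - 2014' else '2015 +' end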
~~~~
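
To make the three SQL expressions above concrete, the following Python sketch mirrors them for a single row. The function name `derive_columns` and the key `OutCodeLetters` are hypothetical (the notes do not name the second column), and `year` is assumed to be the year of `deed_date`, since the `Year` column is referenced but not defined here.

```python
import re


def derive_columns(postcode, year):
    """Mirror the three SQL column expressions for one row."""
    # substr(postcode, 1, locate(' ', postcode) - 1):
    # everything before the first space, or '' when there is no space.
    space = postcode.find(" ")
    out_code = postcode[:space] if space >= 0 else ""

    # regexp_replace(OutCode, '[0-9]+', ''): drop every run of digits.
    letters = re.sub(r"[0-9]+", "", out_code)

    # case when ... end: bucket the year into three bins.
    if year < 2005:
        year_bin = "1995 - 2004"
    elif year < 2015:
        year_bin = "2005 - 2014"
    else:
        year_bin = "2015 +"

    return {"OutCode": out_code, "OutCodeLetters": letters, "YearBin": year_bin}


# derive_columns("SW1A 1AA", 2010)
# returns {"OutCode": "SW1A", "OutCodeLetters": "SWA", "YearBin": "2005 - 2014"}
```

The example shows that stripping digits keeps any trailing letter of the outcode (`SW1A` becomes `SWA`), which may or may not be the intended grouping.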

Taking a simple random sample of up to 100 observations for each distinct combination of OutCode and YearBin.

~~~~sql
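-- Number the rows in random order within each OutCode/YearBin group,
-- then keep the first 100 of each group.
-- INTO OUTFILE has no LOCAL variant; the file is written on the server host.
SELECT t.* FROM
(SELECT pp.*,
        ROW_NUMBER() OVER (PARTITION BY OutCode, YearBin ORDER BY RAND()) AS SeqNum
 FROM pricepaid pp) t
WHERE t.SeqNum <= 100
INTO OUTFILE 'Path/To/Project/pricepaidsample.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
~~~~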