# Objective

To develop a Python program with two main features: 

1. The ability to classify flood risk for UK postcodes and locations based on a subset of labelled data.
2. The ability to visualise and analyse rainfall data in conjunction with the above tool to present risk information to the user.

## Risk Tool

### Core functionality

The tool provides the following:

1. Task 1: At least one classifier or regressor that assigns postcodes in England to a ten-class flood probability scale, based on the provided labelled samples.

2. Task 2: A regression tool for the median house price of postcodes in England, given sampled data.

3. Task 3: A classifier for historic flooding for postcodes in England, given labelled sample data.

4. Task 4: A regression tool and a classifier that take an arbitrary location and predict its Local Authority and flood risk.

5. Finally, a predicted risk calculated for input postcodes.

In [58]:
import pandas as pd

In [59]:
# Read training data
train = pd.read_csv('resources/postcodes_labelled.csv')
train.head()

| Unnamed: 0 | postcode | easting | northing | soilType | elevation | localAuthority | riskLabel | medianPrice | historicallyFlooded |
|---|---|---|---|---|---|---|---|---|---|
| 0 | OL9 7NS | 390978 | 403269 | Unsurveyed/Urban | 130 | Oldham | 1 | 119100.0 | False |
| 1 | WV13 2LR | 396607 | 298083 | Unsurveyed/Urban | 130 | Walsall | 1 | 84200.0 | False |
| 2 | LS12 1LZ | 427859 | 432937 | Unsurveyed/Urban | 60 | Leeds | 1 | 134900.0 | False |
| 3 | SK15 1TS | 395560 | 397900 | Unsurveyed/Urban | 120 | Tameside | 1 | 170200.0 | False |
| 4 | TS17 9NN | 445771 | 515362 | Unsurveyed/Urban | 20 | Stockton-on-Tees | 1 | 190600.0 | False |


In [60]:
# Read test data
test = pd.read_csv('resources/postcodes_unlabelled.csv')
test.head()

| Unnamed: 0 | postcode | easting | northing | soilType | elevation |
|---|---|---|---|---|---|
| 0 | M34 7QL | 393470 | 394371 | Unsurveyed/Urban | 110 |
| 1 | OL4 3NQ | 395420 | 405669 | Unsurveyed/Urban | 210 |
| 2 | B36 8TE | 411900 | 289400 | Unsurveyed/Urban | 90 |
| 3 | NE16 3AT | 420400 | 562300 | Unsurveyed/Urban | 10 |
| 4 | WS10 8DE | 397726 | 296656 | Unsurveyed/Urban | 140 |


### Task 4: Add Local Authority

In [61]:
from src.task4 import Task4

model_instance = Task4(train)
predictions = model_instance.predict(test)

test['localAuthority'] = predictions
test.head()

| Unnamed: 0 | postcode | easting | northing | soilType | elevation | localAuthority |
|---|---|---|---|---|---|---|
| 0 | M34 7QL | 393470 | 394371 | Unsurveyed/Urban | 110 | Tameside |
| 1 | OL4 3NQ | 395420 | 405669 | Unsurveyed/Urban | 210 | Oldham |
| 2 | B36 8TE | 411900 | 289400 | Unsurveyed/Urban | 90 | Birmingham |
| 3 | NE16 3AT | 420400 | 562300 | Unsurveyed/Urban | 10 | Gateshead |
| 4 | WS10 8DE | 397726 | 296656 | Unsurveyed/Urban | 140 | Walsall |
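
The source does not show how `Task4` turns coordinates into a Local Authority. One plausible approach, sketched here purely as an assumption, is a 1-nearest-neighbour lookup on `easting` and `northing`: each unlabelled point takes the authority of the closest labelled point. The class name `NearestNeighbourAuthority` is hypothetical and is not part of `src.task4`; the coordinates are copied from the sample rows shown earlier.

```python
from math import hypot


class NearestNeighbourAuthority:
    """Hypothetical 1-nearest-neighbour lookup; NOT the real Task4 class."""

    def __init__(self, labelled_rows):
        # labelled_rows: iterable of (easting, northing, local_authority)
        self._rows = list(labelled_rows)
        if not self._rows:
            raise ValueError("need at least one labelled row")

    def predict_one(self, easting, northing):
        # Linear scan for the labelled point at the smallest Euclidean distance.
        nearest = min(
            self._rows,
            key=lambda row: hypot(row[0] - easting, row[1] - northing),
        )
        return nearest[2]

    def predict(self, points):
        # points: iterable of (easting, northing) pairs
        return [self.predict_one(e, n) for e, n in points]


# Labelled rows copied from the training sample shown earlier.
labelled = [
    (390978, 403269, "Oldham"),
    (396607, 298083, "Walsall"),
    (427859, 432937, "Leeds"),
    (395560, 397900, "Tameside"),
    (445771, 515362, "Stockton-on-Tees"),
]
model = NearestNeighbourAuthority(labelled)

# Unlabelled points for M34 7QL, OL4 3NQ and WS10 8DE.
predicted = model.predict([(393470, 394371), (395420, 405669), (397726, 296656)])
print(predicted)
```

With only five labelled rows the lookup is reliable only near those rows; a point whose true authority is absent from the labelled set will always be assigned a wrong one, so coverage of the training data matters more than the distance metric.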


### Task 3: Add Historic Flooding

In [62]:
from src.task3 import Task3

model_instance = Task3(train)
predictions = model_instance.predict(test)

test['historicallyFlooded'] = predictions
test.head()

| Unnamed: 0 | postcode | easting | northing | soilType | elevation | localAuthority | postcode_area | postcode_district | postcode_sector | postcode_unit | historicallyFlooded |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M34 7QL | 393470 | 394371 | Unsurveyed/Urban | 110 | Tameside | M3 | M34 | M347 | M347QL | False |
| 1 | OL4 3NQ | 395420 | 405669 | Unsurveyed/Urban | 210 | Oldham | OL | OL4 | OL43 | OL43NQ | False |
| 2 | B36 8TE | 411900 | 289400 | Unsurveyed/Urban | 90 | Birmingham | B3 | B36 | B368 | B368TE | False |
| 3 | NE16 3AT | 420400 | 562300 | Unsurveyed/Urban | 10 | Gateshead | NE | NE16 | NE163 | NE163AT | False |
| 4 | WS10 8DE | 397726 | 296656 | Unsurveyed/Urban | 140 | Walsall | WS | WS10 | WS108 | WS108DE | False |
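
The output above gains four derived columns (`postcode_area`, `postcode_district`, `postcode_sector`, `postcode_unit`), but the code that builds them is not shown. The sketch below is one way such a split could be written, assuming the conventional postcode structure in which the area is the leading one or two letters. The function name `split_postcode` is hypothetical. Note that for single-letter areas this gives `M` and `B`, whereas the table above shows `M3` and `B3`, which suggests the original code takes the first two characters instead.

```python
import re

# Outward code = area letters + district digits/letter; inward code = sector digit + two unit letters.
_POSTCODE = re.compile(r"^([A-Z]{1,2})(\d[A-Z\d]?)\s*(\d)([A-Z]{2})$")


def split_postcode(postcode):
    """Hypothetical helper: split a UK postcode into hierarchical parts."""
    match = _POSTCODE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised postcode: {postcode!r}")
    area, district_suffix, sector_digit, unit_letters = match.groups()
    district = area + district_suffix
    sector = district + sector_digit
    unit = sector + unit_letters
    return {
        "postcode_area": area,
        "postcode_district": district,
        "postcode_sector": sector,
        "postcode_unit": unit,
    }


parts = split_postcode("OL4 3NQ")
print(parts)
```

Each level is a prefix of the next, so a model can fall back from unit to sector to district to area when a finer-grained group has too few labelled samples.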


### Task 1: Add Flood Risk Label

In [63]:
from src.task1 import Task1

model_instance = Task1(train)
predictions = model_instance.predict(test)

test['RiskLabel'] = predictions
test.head()

| Unnamed: 0 | postcode | easting | northing | soilType | elevation | localAuthority | postcode_area | postcode_district | postcode_sector | postcode_unit | historicallyFlooded | RiskLabel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M34 7QL | 393470 | 394371 | Unsurveyed/Urban | 110 | Tameside | M3 | M34 | M347 | M347QL | False | 1 |
| 1 | OL4 3NQ | 395420 | 405669 | Unsurveyed/Urban | 210 | Oldham | OL | OL4 | OL43 | OL43NQ | False | 1 |
| 2 | B36 8TE | 411900 | 289400 | Unsurveyed/Urban | 90 | Birmingham | B3 | B36 | B368 | B368TE | False | 1 |
| 3 | NE16 3AT | 420400 | 562300 | Unsurveyed/Urban | 10 | Gateshead | NE | NE16 | NE163 | NE163AT | False | 1 |
| 4 | WS10 8DE | 397726 | 296656 | Unsurveyed/Urban | 140 | Walsall | WS | WS10 | WS108 | WS108DE | False | 1 |


### Task 2: Add Median House Price

In [64]:
from src.task2 import Task2

model_instance = Task2(train)
predictions = model_instance.predict(test)

test['medianPrice'] = predictions
test.head()

| Unnamed: 0 | postcode | easting | northing | soilType | elevation | localAuthority | postcode_area | postcode_district | postcode_sector | postcode_unit | historicallyFlooded | RiskLabel | medianPrice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M34 7QL | 393470 | 394371 | Unsurveyed/Urban | 110 | Tameside | M3 | M34 | M347 | M347QL | False | 1 | 178046.966667 |
| 1 | OL4 3NQ | 395420 | 405669 | Unsurveyed/Urban | 210 | Oldham | OL | OL4 | OL43 | OL43NQ | False | 1 | 186691.333333 |
| 2 | B36 8TE | 411900 | 289400 | Unsurveyed/Urban | 90 | Birmingham | B3 | B36 | B368 | B368TE | False | 1 | 234667.083333 |
| 3 | NE16 3AT | 420400 | 562300 | Unsurveyed/Urban | 10 | Gateshead | NE | NE16 | NE163 | NE163AT | False | 1 | 202347.151948 |
| 4 | WS10 8DE | 397726 | 296656 | Unsurveyed/Urban | 140 | Walsall | WS | WS10 | WS108 | WS108DE | False | 1 | 187149.0 |
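
The regression method inside `Task2` is not shown. As a point of comparison only, the sketch below implements a simple baseline that any such model should beat: predict the mean labelled price for the postcode's Local Authority, falling back to the global mean for authorities not seen in training. The class name `GroupMeanPrice` is hypothetical, and the prices are the five labelled rows shown earlier.

```python
from collections import defaultdict
from statistics import mean


class GroupMeanPrice:
    """Hypothetical baseline regressor; NOT the real Task2 class."""

    def __init__(self, labelled_rows):
        # labelled_rows: iterable of (local_authority, median_price)
        rows = list(labelled_rows)
        if not rows:
            raise ValueError("need at least one labelled row")
        by_authority = defaultdict(list)
        for authority, price in rows:
            by_authority[authority].append(price)
        # Mean price per authority, plus a global mean for unseen authorities.
        self._group_mean = {a: mean(p) for a, p in by_authority.items()}
        self._global_mean = mean(price for _, price in rows)

    def predict(self, authorities):
        return [self._group_mean.get(a, self._global_mean) for a in authorities]


# Labelled rows copied from the training sample shown earlier.
labelled_prices = [
    ("Oldham", 119100.0),
    ("Walsall", 84200.0),
    ("Leeds", 134900.0),
    ("Tameside", 170200.0),
    ("Stockton-on-Tees", 190600.0),
]
baseline = GroupMeanPrice(labelled_prices)

# Tameside is in the labelled sample; Gateshead is not, so it gets the global mean.
baseline_prices = baseline.predict(["Tameside", "Gateshead"])
print(baseline_prices)
```

A baseline like this is mainly useful as a yardstick: if a more elaborate regressor does not improve on group means under cross-validation, the extra features are probably not helping.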


### Definition of risk

For this project, flood risk combines the probability of a flooding event with the impact of that event, for which property value is a proxy. We use a risk score defined as

$$ R := 0.05 \times (\textrm{total property value}) \times(\textrm{flood probability}) $$

Here 0.05 is an (arbitrary) estimate of the fraction of a property's value lost when a flood affects it. Potential additional considerations are the number of households impacted and the extent of the local area in which flooding appears likely.
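
As a worked instance of the formula, the sketch below evaluates $R$ for the first test row shown earlier (`medianPrice` 178046.966667, `RiskLabel` 1). The function name `risk_score` and its parameter names are illustrative assumptions. The mapping from the ten-class label to an actual probability is not given in this section, so the example passes the label value straight through, as the notebook's own calculation does.

```python
def risk_score(property_value, flood_probability, loss_fraction=0.05):
    """Hypothetical helper: R = loss_fraction * property value * flood probability."""
    if property_value < 0 or flood_probability < 0:
        raise ValueError("property_value and flood_probability must be non-negative")
    return loss_fraction * property_value * flood_probability


# First test row: medianPrice = 178046.966667, RiskLabel = 1.
# The label is used here in place of a probability, which is an assumption.
first_row_risk = risk_score(178046.966667, 1)
print(round(first_row_risk, 2))  # about 8902.35
```

Because the score is linear in each factor, doubling either the property value or the probability term doubles $R$; the constant 0.05 only rescales all scores and does not change their ranking.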

In [85]:
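# medianPrice stands in for total property value, and the class label
# RiskLabel is used directly in place of a flood probability.
test["Risk"] = 0.05 * test["medianPrice"] * test["RiskLabel"]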
test['Risk']

0        8902.348333
1        9334.566667
2       11733.354167
3       10117.357597
4        9357.450000
            ...     
9995    16326.925000
9996     6189.133333
9997     8623.912500
9998    10543.201190
9999     9257.475000
Name: Risk, Length: 10000, dtype: float64