# DUBLIN HOUSE PRICES
# 02 - Analysing the Data
Earlier, we cleaned the data sourced from the [Property Price Register](https://www.propertypriceregister.ie/website/npsra/pprweb.nsf/page/ppr-home-en) website and pickled the final data frame. We'll import it now, and examine it to see what we can learn.

In [1]:
import pandas as pd
import bokeh.charts as bc
import pickle

In [2]:
# Pickles are binary data, so open the file in binary mode.
with open('Pickles/2016/house_prices_pickle', 'rb') as f:
    house_prices = pickle.load(f)
print house_prices.head()

                                             Address  County       Date  \
0  34 Mountpleasant Terrace, Dublin 6, D06 YC58, ...  Dublin 2016-01-01   
3  2 Brighton Rd, Brighton Hall, Kerrymount, Dubl...  Dublin 2016-01-04   
5  24 Woodstown Meadow, Ballycullen, Dublin 16, D...  Dublin 2016-01-04   
6  28 Belton Park Gardens, Clontarf, Dublin 9, D0...  Dublin 2016-01-04   
9  48A Beaufield Park, Stillorgan, Dublin, A94 XH...  Dublin 2016-01-04   

                             Description FullMarketPrice        Lat       Lon  \
0  Second-Hand Dwelling house /Apartment              No  53.328587 -6.261495   
3  Second-Hand Dwelling house /Apartment              No  53.258165 -6.174641   
5  Second-Hand Dwelling house /Apartment              No  53.273761 -6.327188   
6  Second-Hand Dwelling house /Apartment              No  53.375715 -6.226541   
9  Second-Hand Dwelling house /Apartment              No  53.290123 -6.203367   

  PostCode      Price Size VAT  
0      D06   170000.0       N

In [3]:
house_prices.dtypes

Address                    object
County                     object
Date               datetime64[ns]
Description                object
FullMarketPrice            object
Lat                       float64
Lon                       float64
PostCode                   object
Price                     float64
Size                       object
VAT                        object
dtype: object

In [4]:
house_prices['PostCode'].value_counts()

D15    774
A96    489
D24    472
D12    471
D18    459
D04    457
D06    410
D07    409
A94    403
D11    389
D16    386
D14    371
D13    370
D09    368
D03    308
D08    301
K78    293
D05    254
D01    204
D22    204
D02    125
D10    106
D20     61
D17     50
A98     17
A86      1
Name: PostCode, dtype: int64

### Understanding the Post Codes
As remarked in the earlier notebook, these post codes are more accurately referred to as Eircode Routing Keys. The Dublin keys are easy to recognise, because they're based on the old post codes - D04 for Dublin 4, D02 for Dublin 2, and so on.
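
As a minimal sketch of that pattern, the helper below turns a Dublin routing key back into its old post code. The function name `dublin_district` is hypothetical and not part of the notebook, and the special case for `D6W` (Dublin 6W) reflects my understanding that it is the only Dublin key that is not `D` plus two digits. The snippet is written so that it runs under Python 3.

```python
import re

# Hypothetical helper for illustration; not part of the original notebook.
_DUBLIN_KEY = re.compile(r'^D(\d{2})$')


def dublin_district(routing_key):
    """Return e.g. 'Dublin 4' for 'D04', or None for a non-Dublin key."""
    if routing_key == 'D6W':
        # Assumed to be the one Dublin key that is not D + two digits.
        return 'Dublin 6W'
    match = _DUBLIN_KEY.match(routing_key)
    if match is None:
        return None
    # int() drops the leading zero: '04' becomes 4.
    return 'Dublin {}'.format(int(match.group(1)))


for key in ['D04', 'D02', 'D15', 'A96']:
    print('{} -> {}'.format(key, dublin_district(key)))
```

Keys such as `A96` come back as `None`, which is exactly the gap a lookup table has to fill.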

Outside of Dublin, however, there is no way to intuit what each post code / routing key stands for. For that I'm indebted to **Green Party Councillor Ossian Smyth of Dun Laoghaire**, who has tabulated [all 139 Eircode Keys on his own site](http://www.ossiansmyth.ie/eircode-routing-keys/). We'll use that table to make our post codes more identifiable.

In [5]:
from bs4 import BeautifulSoup
import requests
ossian = requests.get('http://www.ossiansmyth.ie/eircode-routing-keys/')
soup = BeautifulSoup(ossian.text, 'lxml')

table_details = []
content = soup.find('div', {'id':'content'})
article = content.find('article', {'id':'post-603'})

table = article.find('table')
print type(table)

<class 'bs4.element.Tag'>


In [6]:
tr = table.find_all('tr')
table_data = []
for t in tr:
    cells = t.find_all('td')
    temp = [c.text for c in cells]
    table_data.append(temp)
    
print len(table_data)

142


In [7]:
post_code_mapper = {}
# Skip the first three rows, which are not routing keys
# (142 rows less 3 leaves the 139 keys).
for t in table_data[3:]:
    post_code_mapper[t[0]] = t[1]
    
house_prices['PostCodeName'] = house_prices['PostCode'].map(post_code_mapper)
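
A scraped table can contain short rows or stray whitespace, either of which would break or quietly corrupt the mapping. The sketch below shows one defensive way to build the same dictionary and then list any codes that failed to map. The names `build_mapper`, `sample_rows` and the key `X99` are hypothetical, and the sample rows are illustrative rather than taken from the real table. It is written for Python 3.

```python
def build_mapper(rows, header_rows=3):
    """Map routing key -> area name, skipping rows with fewer than two cells."""
    mapper = {}
    for row in rows[header_rows:]:
        if len(row) < 2:
            continue  # not a key/name pair
        key, name = row[0].strip(), row[1].strip()
        if key:
            mapper[key] = name
    return mapper


# Illustrative rows only: three non-data rows, two good rows, one short row.
sample_rows = [[], [], [], ['A94', 'Blackrock'], [' D04 ', 'Dublin 4\n'], ['K78']]
mapper = build_mapper(sample_rows)

# Any code in the data with no entry in the mapper would show up here.
codes_in_data = ['D04', 'A94', 'X99']
unmapped = sorted(set(codes_in_data) - set(mapper))
print(mapper)
print(unmapped)
```

With the real data, `codes_in_data` would be the unique values of the `PostCode` column, and an empty `unmapped` list would confirm that every sale picked up a name.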

In [8]:
codes = house_prices['PostCode'].value_counts()
formatter = "{:15}{:5}{:5}"
print formatter.format('POSTCODE NAME', 'CODE', 'COUNT')
# Look each name up from its own code. Zipping two separate value_counts()
# results can pair the wrong name with a code when two counts tie.
for code, count in codes.iteritems():
    print formatter.format(post_code_mapper.get(code, ''), code, count)

POSTCODE NAME  CODE COUNT
Dublin 15      D15    774
Dun Laoghaire  A96    489
Dublin 24      D24    472
Dublin 12      D12    471
Dublin 18      D18    459
Dublin 4       D04    457
Dublin 6       D06    410
Dublin 7       D07    409
Blackrock      A94    403
Dublin 11      D11    389
Dublin 16      D16    386
Dublin 14      D14    371
Dublin 13      D13    370
Dublin 9       D09    368
Dublin 3       D03    308
Dublin 8       D08    301
Lucan          K78    293
Dublin 5       D05    254
Dublin 1       D01    204
Dublin 22      D22    204
Dublin 2       D02    125
Dublin 10      D10    106
Dublin 20      D20     61
Dublin 17      D17     50
Bray           A98     17
Dunboyne       A86      1


### Breaking Down by Price and Post Code
Now let's look at the distribution of prices by post code in our data set. For the purposes of clarity, we're going to restrict the data to those post codes for which we have over one hundred properties sold, and we're also going to keep only sales under €1.5 million. That cut-off removes a major development that was in the dataset. It sold for over six million Euro, and such a property is out of place among these others.

In [14]:
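post_codes = house_prices['PostCode'].value_counts()

# Keep only post codes with over one hundred sales (this drops the last four).
target = list(post_codes[post_codes > 100].keys())
forGraphing = house_prices[house_prices['PostCode'].isin(target)]
# Keep only sales under 1.5 million Euro.
forGraphing = forGraphing[forGraphing['Price'] < 1500000]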

In [15]:
bc.output_notebook()

In [16]:
price_v_postcode_boxplot = bc.BoxPlot(forGraphing,
                                      values='Price',
                                      label='PostCode',
                                     color = 'skyblue',
                                     whisker_color = 'navy',
                                     marker = 'diamond')

In [17]:
price_v_postcode_boxplot.width = 800
from bokeh.models import NumeralTickFormatter
price_v_postcode_boxplot._yaxis.formatter = NumeralTickFormatter(format = ("€0,000 a"))
price_v_postcode_boxplot.title = 'Price v Post Code, 2016-to-Date'
bc.show(price_v_postcode_boxplot)

### Initial Conclusions
A boxplot is the ideal type of diagram for this data because it can convey so much information in so small a space. The box for each post code shows where the middle half of the price values lie. The bottom whisker covers the cheapest quarter of prices, the top whisker the most expensive quarter. The red diamond outliers are values that are so high or so low relative to the others that they are plotted separately. So, looking at this data:
1. There is a huge range of prices within the post codes themselves.
2. It is possible to buy a very expensive house in half the post codes in the city and environs.
3. Dublin 10 has the lowest house prices in the city, followed by Balbriggan (K32).
4. The neighbouring post codes of Dublin 6 and Dublin 14 have the most expensive houses in the city. We can go further and say that Dublin 6 has the most uniformly expensive houses in the city - there are no high outliers in the D06 boxplot.
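
To make the box, whiskers and outliers described above concrete, here is a minimal sketch that computes them for one list of prices. It assumes the common convention that a value counts as an outlier when it lies more than 1.5 times the interquartile range beyond the box; I have not confirmed that `bokeh.charts` uses exactly this rule. The function names and the prices are made up for illustration, and the snippet is written for Python 3.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def box_summary(values):
    """Quartiles, whisker ends and outliers under the assumed 1.5 x IQR rule."""
    ordered = sorted(values)
    q1, median, q3 = (quantile(ordered, q) for q in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        'q1': q1, 'median': median, 'q3': q3,
        # Whiskers stop at the most extreme values still inside the fences.
        'whisker_low': inside[0], 'whisker_high': inside[-1],
        'outliers': [v for v in ordered if v < low_fence or v > high_fence],
    }


# Made-up prices for a single post code.
prices = [170000, 210000, 250000, 265000, 300000, 320000, 350000, 410000, 1200000]
summary = box_summary(prices)
print(summary)
```

Here the box runs from 250,000 to 350,000, the whiskers reach 170,000 and 410,000, and the 1,200,000 sale is drawn on its own as an outlier.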

### Dublin 15
Dublin 15 has the most sales of any post code in our data (774), which suggests it is the area of highest growth in the city, so it seems reasonable to take a closer look at it.

In [None]:
D15 = house_prices[house_prices.PostCode == 'D15']
print D15.Lat.describe()
print D15.Lon.describe()

In [None]:
D15[['Lat', 'Lon']].to_csv('lat_lon_d15.csv')
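
Before plotting those points, a map usually needs a centre and a bounding box. The sketch below derives both from CSV text laid out the way `to_csv` writes it by default, with an unnamed index column first. The coordinates are invented for illustration and are not rows from the data set, the function name `bounds_and_centre` is hypothetical, and the `lat`/`lng` keys are chosen on the assumption that the result feeds a Google Maps style position literal. It is written for Python 3.

```python
import csv
import io

# Invented coordinates, in the layout DataFrame.to_csv produces by default.
sample_csv = """,Lat,Lon
12,53.3900,-6.3900
57,53.3800,-6.4100
80,53.4000,-6.3700
"""


def bounds_and_centre(csv_text):
    """Return the bounding box and the mean position of the points."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    lats = [float(r['Lat']) for r in rows]
    lons = [float(r['Lon']) for r in rows]
    bounds = {'south': min(lats), 'north': max(lats),
              'west': min(lons), 'east': max(lons)}
    centre = {'lat': sum(lats) / len(lats), 'lng': sum(lons) / len(lons)}
    return bounds, centre


bounds, centre = bounds_and_centre(sample_csv)
print(bounds)
print(centre)
```

With the real file, the text would come from reading `lat_lon_d15.csv` instead of `sample_csv`.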

In [None]:
%%HTML
<div id="map"></div>
<!-- Replace the value of the key parameter with your own API key. -->
<script async defer
src="https://maps.googleapis.com/maps/api/js?key=AIzaSyCkUOdZ5y7hMm0yrcCQoCvLwzdM6M8s5qk&callback=initMap">
</script>
<script src="https://developers.google.com/maps/documentation/javascript/examples/markerclusterer/markerclusterer.js"></script>

