# <center>Capstone Project - The Battle of Neighbourhoods: Welsh Towns Review</center>
## <center>Part 2 - Data Wrangling</center>
### Applied Data Science Capstone by IBM
### Part of our IBM Data Science Professional Certificate

***

## Table of contents
* [The Problem](#section1)
* [Data Wrangling](#section2)
* [Mapping the data](#section3)
    * [1. Analyse Welsh towns / localities data](#section4)
    * [2. Analyse Welsh secondary schools data](#section5)
    * [3. Property Prices](#section6)

## The Problem <a name="section1"></a>

A couple with young children is looking for a safe and quiet place to live. They want a good state school for their children and a small but vibrant town for the family, and would like to settle either in that town or very close to it. They are flexible about location because they both work from home, with only occasional business travel to a city. But where to start? Where are the good schools, and which towns could be nice to live in?

## Data Wrangling<a name="section2"></a>

The notebook labelled 'Part 1' dealt with data collection. A summary of the saved data: <br>

Description | File Name
------------|----------
Geolocated towns/localities in Wales (Population between 2,000 and 20,000): | `towns_geo.csv`
List of rated secondary schools for wrangling and geolocating: | `schools_rated.csv`
Geolocated and trimmed list of 'good' schools: | `schools_geo.csv`
: | `.csv`

Following the data collection described in the notebook 'Part 1', this notebook deals with cleaning and analysing the data. The aim is to learn enough about the data to provide a **solution** to the **problem**. The solution will be presented in the following notebook, number 3.

#### Import required libraries

In [None]:
import pandas as pd
import numpy as np
# from bs4 import BeautifulSoup # this module helps with web scraping
import requests  # this module helps us to download a web page
import geocoder # import the geocoder
from pandas import json_normalize # transform a JSON file into a pandas dataframe (top-level import, pandas >= 1.0)
import folium # map rendering library

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## Mapping the data<a name="section3"></a>

#### 1. Analyse Welsh towns / localities data<a name="section4"></a>

In [None]:
towns = pd.read_csv('towns_geo.csv')
towns.info()
towns.head()

In the previous notebook the data was trimmed by population to 132 towns with between 2,000 and 20,000 inhabitants, so we only need to plot the towns on the map.

In [None]:
# Create centre-point coordinates for Wales using the mean lat-lon from the dataframe:
lat = towns['Latitude'].mean()
lon = towns['Longitude'].mean()
print('Wales mean coordinates /lat - lon/: ',lat,' , ',lon)

Visualise the points on an interactive map using the Folium package: <br>
<a href="https://python-visualization.github.io/folium/">python-visualization.github.io/folium</a> and <a href="https://leafletjs.com/reference-1.6.0.html#circlemarker">leafletjs.com</a> for formatting.

In [None]:
map_w = folium.Map(location=[lat, lon], zoom_start=7)

# add markers to map
for lat_t, lng_t, town, population in zip(towns['Latitude'], towns['Longitude'], towns['Town'], towns['Population']):
    # lat_t/lng_t keep the loop from overwriting the mean `lat` used to centre the maps
    label = '{} population: {}'.format(town, population)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat_t, lng_t],
        radius=5,
        stroke=False,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_w)  
    
map_w

The data plots well on the map, so let's move on to the schools.

#### 2. Analyse Welsh secondary schools data<a name="section5"></a>

In [None]:
df2 = pd.read_csv('schools_rated.csv')
df2.info()
df2.head()

The table contains many columns that we will not use. Let's drop all of them apart from:
`['School_code', 'School_name', 'Local_authority', 'Rating', 'Postcode']`

In [None]:
df2.columns

In [None]:
df2 = df2.drop(['Consortium', 'School Name', 'LA Code', 'Local Authority', 'Sector',
       'Governance - see notes', 'WM Code', 'Welsh Medium Type - see notes',
       'School Type', 'Religious Character', 'Address 1', 'Address 2',
       'Address 3', 'Address 4', 'Phone Number',
       'Pupils - see notes'], axis = 1)
df2.columns
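
An alternative to listing every column to drop is to select only the columns to keep. The sketch below shows the idea on a toy frame; the values and the two extra columns are made up for illustration, and only the five kept column names come from this notebook.

```python
import pandas as pd

# Columns we want to keep, as listed above
keep = ['School_code', 'School_name', 'Local_authority', 'Rating', 'Postcode']

# Toy frame with two extra columns; all values are placeholders
toy = pd.DataFrame({
    'School_code': [1],
    'School_name': ['A'],
    'Local_authority': ['X'],
    'Rating': ['Green/Gwyrdd'],
    'Postcode': ['XX0 0XX'],
    'Sector': ['Secondary'],
    'Phone Number': ['000'],
})

# Selecting by a list of names keeps only those columns, in that order
trimmed = toy[keep]
print(list(trimmed.columns))
```

Selecting by name fails loudly with a `KeyError` if a wanted column is missing, which can be useful when the source file changes.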

In [None]:
df2.head(10)

Trim the list to schools in the better support categories (green and yellow in the Welsh categorisation system) by dropping rows whose `Rating` is `Red/Coch` or `Amber/Oren`.

In [None]:
index_rating = df2[(df2['Rating'] == 'Red/Coch')].index
df2.drop(index_rating, inplace = True)

index_rating = df2[(df2['Rating'] == 'Amber/Oren')].index
df2.drop(index_rating, inplace = True)

df2.info()
df2.head(20)
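
The same filter can also be written as a single boolean mask with `Series.isin`. This is a minimal sketch on a toy frame: the school names are placeholders and the `Green/Gwyrdd` and `Yellow/Melyn` labels are assumed for illustration, since only the red and amber labels appear in this notebook.

```python
import pandas as pd

# Toy stand-in for the schools table (placeholder names, assumed green/yellow labels)
toy = pd.DataFrame({
    'School_name': ['A', 'B', 'C', 'D'],
    'Rating': ['Green/Gwyrdd', 'Red/Coch', 'Amber/Oren', 'Yellow/Melyn'],
})

unwanted = ['Red/Coch', 'Amber/Oren']

# Keep rows whose rating is not in the unwanted list; copy() gives an independent frame
kept = toy[~toy['Rating'].isin(unwanted)].copy()
print(kept)
```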

Now let's geocode the schools based on their postcode to plot on the map.

In [None]:
# Define two empty lists to store the location data, one for latitude, one for longitude:
lati=[]
longi=[]

# Loop through the postcodes to obtain the geolocation. We use ArcGIS because Google is no longer free:
for code in df2['Postcode']:
    g = geocoder.arcgis('{}, Wales, UK'.format(code))
    #print(code, g.latlng)
    while (g.latlng is None):
        g = geocoder.arcgis('{}, Wales, UK'.format(code))
        #print(code, g.latlng)
    latlng = g.latlng
    lati.append(latlng[0])
    longi.append(latlng[1])

print('Coordinates for ', df2.iloc[0,1], ' School: ', lati[0], ',', longi[0])

# Append the coordinates to the dataframe
df2['Latitude'] = lati
df2['Longitude'] = longi

# Check results:
print("Table size: ", df2.shape)
print("Type of data frame objects:")
df2.info()
df2.head()
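
The `while` loop above retries until the service answers, which would never finish for a postcode that cannot be resolved. One possible safeguard is a bounded retry, sketched below. `geocode_with_retry` and `flaky_lookup` are hypothetical helpers written for this illustration, not part of the geocoder package, and the coordinates and postcode are placeholders.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=3, pause_s=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    lookup is any callable returning a [lat, lng] pair or None.
    """
    for attempt in range(max_attempts):
        latlng = lookup(query)
        if latlng is not None:
            return latlng
        time.sleep(pause_s)  # brief pause before the next attempt
    return None  # the caller decides how to handle an unresolved postcode


# Stand-in lookup that fails twice, then succeeds (made-up coordinates)
calls = {'n': 0}

def flaky_lookup(query):
    calls['n'] += 1
    return [51.5, -3.2] if calls['n'] >= 3 else None

result = geocode_with_retry(flaky_lookup, 'XX0 0XX, Wales, UK')
missing = geocode_with_retry(lambda q: None, 'not a postcode')
print(result, missing)
```

With the real service, the lookup could be something like `lambda q: geocoder.arcgis(q).latlng`, and rows that come back as `None` could be inspected by hand before plotting.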

In [None]:
# save the results:
#df2.to_csv('schools_geo.csv', index=False)

In [None]:
schools = pd.read_csv('schools_geo.csv')
schools.info()
schools.head()

Add the 142 schools to the map above using red markers.

In [None]:
# add markers to map
for lat_s, lng_s, school, rating in zip(schools['Latitude'], schools['Longitude'], schools['School_name'], schools['Rating']):
    label = '{}'.format(str(school) + ' support category: ' + str(rating))
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat_s, lng_s],
        radius=2,
        stroke=False,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map_w)  
    
map_w

#### 3. Average property price in Wales per County<a name="section6"></a>

Load the average house prices:

In [None]:
counties = pd.read_csv('Prices_Wales.csv')
counties.head()

Create a plain map:

In [None]:
map_wc = folium.Map(location=[lat, lon], zoom_start=7, tiles='cartodbpositron')

Define the location of the layer data with county boundaries: 

In [None]:
# geo_data="http://geoportal1-ons.opendata.arcgis.com/datasets/687f346f5023410ba86615655ff33ca9_0.geojson" # defines a link to the uk.gov page with the layer
# The linked file has been downloaded and is available here in case the link is broken:
geo_data = "Counties_and_Unitary_Authorities_(December_2016)_Boundaries.geojson" #there is also a csv file with the same name


Format and plot the map:

In [None]:
# add tile layers to the map
tiles = ['stamenwatercolor', 'cartodbpositron', 'openstreetmap', 'stamenterrain']
for tile in tiles:
    folium.TileLayer(tile).add_to(map_wc)

# Add the counties Choropleth:

choropleth = folium.Choropleth(
    geo_data,
    data=counties, # my dataset with prices
    columns=['County', 'Ave_price_2021'], # 'County' is matched against the geojson 'ctyua16nm'; 'Ave_price_2021' sets the fill colour of each county
    key_on='feature.properties.ctyua16nm', # this path holds the county names as strings and should match our 'County' column
    fill_color='BuPu',
    fill_opacity=0.4,
    line_color= 'None',
    legend_name='Average Property Price, Wales 2021',
    nan_fill_color = 'None'
).add_to(map_wc)

# add labels indicating the name of the county
style_function = "font-size: 10px; font-weight: normal"
choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['ctyua16nm'], style=style_function, labels=False)
)
    
# create a layer control
folium.LayerControl().add_to(map_wc)

map_wc
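
The choropleth only colours areas whose `County` value exactly matches the `ctyua16nm` property, so a quick name comparison before plotting can catch spelling differences. The sketch below uses a tiny in-memory GeoJSON stand-in; `unmatched_names` is a hypothetical helper and the misspelt 'Powis' is deliberate.

```python
import json

def unmatched_names(data_names, geojson, prop='ctyua16nm'):
    """Return (names only in the data, names only in the GeoJSON)."""
    geo_names = {f['properties'][prop] for f in geojson['features']}
    data_names = set(data_names)
    return sorted(data_names - geo_names), sorted(geo_names - data_names)

# Tiny stand-in for the boundary file (geometry omitted for brevity)
sample_geojson = json.loads('''{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"ctyua16nm": "Cardiff"}, "geometry": null},
  {"type": "Feature", "properties": {"ctyua16nm": "Powys"}, "geometry": null}
]}''')

only_in_data, only_in_geo = unmatched_names(['Cardiff', 'Powis'], sample_geojson)
print(only_in_data, only_in_geo)
```

With the real files, the first argument could be `counties['County']` and the second the parsed boundary file. If the boundary file covers areas outside the price table, the second list is expected to be non-empty; the first list is the one that signals a mismatch.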

This concludes the data wrangling section.