Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

# Assignment 6 - Thinking about Place

### Nigel de Noronha - Sociology

**Setup code** - imports packages, sets up plotting of maps using `geopandas` Python package.

In [None]:
using DataFrames
using Base.Test
using Plots
import PyPlot
using PyCall

py"""
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

import matplotlib
matplotlib.rcParams['figure.figsize'] = [10, 8]

cov_map = gpd.read_file("cov.geojson")
cov_map = cov_map.to_crs({"init": "epsg:27700"}) # use ONS GB projection

def plot_map(map, df, column, left_on="lsoa11nm", right_on="LSOA", 
             cmap="PuBu", **kwargs):
    df = pd.DataFrame(df)
    map = map.merge(df, left_on=left_on, right_on=right_on)
    ax = map.plot(column=column, edgecolors="black", cmap=cmap, **kwargs)
    vmin = map[column].min()
    vmax = map[column].max()
    fig = ax.get_figure()
    cax = fig.add_axes([0.9, 0.1, 0.03, 0.8])
    sm = plt.cm.ScalarMappable(cmap=cmap, norm=plt.Normalize(vmin, vmax))
    sm._A = []
    fig.colorbar(sm, cax=cax)
"""
py_plot_map = py"plot_map"
cov_map = py"cov_map"

plot_map(map, df, column; args...) = 
    py_plot_map(map, Dict(zip(names(df), DataFrames.columns(df))), String(column); args...)

## 1. Exploring households by type and tenure (45%)

The dataset `hhtenure.csv` holds counts of households by type and tenure.  The column headings are a combination of these two characteristics as outlined in the tables below:

**Table 1 - Household type**

Prefix| Group                         | Household type
---  | ----                           | ---
sp	  | **Households without dependents** | Single person under 65
cpnc  |                               | Couple with no children
oth	  |	                              | Other household
cpdc  | **Households with dependent children** | Couple with dependent children
lpdc  |	                                   | Lone parent with dependent children
othdc |	 	                               | Other household with dependent children
cpndc |	**Households with non-dependent children** | Couple with non-dependent children
lpndc |		                                   | Lone parent with non-dependent children
sp65  |	**Older households**	                   | Single person aged 65 or over
cp65  |		                                   | Couple both aged 65 or over

**Table 2 - Tenure**

Suffix|Tenure type
------|------------
own | Owned with or without a mortgage and shared ownership
sr	| Social rented from the council or a housing association
pr	| Private rented

In [None]:
hhtenure = readtable("hhtenure.csv")
orig_cols = names(hhtenure)
head(hhtenure)

As an example, we can add a new `total` variable giving the total number of households in each LSOA:

In [None]:
# add up total number of households in original columns
hhtenure[:total] = zeros(nrow(hhtenure))
for col in [:spown, :cpncown, :othown, :cpdcown, :lpdcown, :othdcown, 
            :cpndcown, :lpndcown, :sp65own, :cp65own, :spsr, :cpncsr, 
            :othsr, :cpdcsr, :lpdcsr, :othdcsr, :cpndcsr, :lpndcsr, 
            :sp65sr, :cp65sr, :sppr, :cpncpr, :othpr, :cpdcpr, :lpdcpr, 
            :othdcpr, :cpndcpr, :lpndcpr, :sp65pr, :cp65pr]
    hhtenure[:total] += hhtenure[col]
end

**Question 1. a) [15%]** create new columns `ndep` to hold the counts of households without dependents (including those with non-dependent children), `dep` for households with dependent children and `older` for older households. Also create one new column for each of the tenures: `own`, `sr` and `pr`.

*Hint:* you might find the functions `startswith()` and `endswith()` useful to identify columns matching certain prefixes and suffixes; note that you can convert from a `Symbol` to a `String` with the `String()` function.
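As a small illustration of the hint, the sketch below filters a list of column names by suffix and by prefix. The names in `cols` are made up for the example (they only imitate the naming style of `hhtenure.csv`), and `own_cols` and `lpdc_cols` are hypothetical helper names.

```julia
# Hypothetical column names in the same prefix+suffix style as hhtenure.csv
cols = [:spown, :cpdcsr, :lpdcpr, :sp65own, :LSOA]

# Columns whose name ends with the tenure suffix "own"
own_cols = filter(c -> endswith(String(c), "own"), cols)

# Columns whose name starts with the household-type prefix "lpdc"
lpdc_cols = filter(c -> startswith(String(c), "lpdc"), cols)

println(own_cols)
println(lpdc_cols)
```

Note that some prefixes overlap: a test for the prefix `sp` also matches names beginning with `sp65`, so it is worth printing the matched columns before summing them.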

In [None]:
# YOUR CODE HERE

In [None]:
# sum of owners, private-renters and social-renters should match original totals
@test all(hhtenure[:own] + hhtenure[:sr] + hhtenure[:pr] .== hhtenure[:total])

# sum of non-dependent, dependent and older households should match original totals
@test all(hhtenure[:ndep] + hhtenure[:dep] + hhtenure[:older] .== hhtenure[:total])

# check total number of households in each category
@test sum(hhtenure[:dep]) == 39864
@test sum(hhtenure[:ndep]) == 64405
@test sum(hhtenure[:older]) == 24323

# check total number of households in each tenure
@test sum(hhtenure[:own]) == 78630
@test sum(hhtenure[:pr]) == 28048
@test sum(hhtenure[:sr]) == 21914

**Question 1. b) [10%]** calculate and store the proportion of different household groups and tenure in each LSOA in new columns named `prop_dep`, `prop_ndep`, `prop_older`, `prop_own`, `prop_pr` and `prop_sr`. 

*Hint:* use the `total` column created above.

In [None]:
# YOUR CODE HERE

In [None]:
# check all the new columns exist and that they are floating point numbers
@test all(eltype.([hhtenure[:prop_dep], hhtenure[:prop_ndep], hhtenure[:prop_older], 
                   hhtenure[:prop_own], hhtenure[:prop_pr], hhtenure[:prop_sr]]) .== Float64)

# proportions of dep + ndep + older should add to 1.0
@test all(hhtenure[:prop_dep] + hhtenure[:prop_ndep] + hhtenure[:prop_older] .≈ 1.0)

# proportions of own + pr + sr should add to 1.0
@test all(hhtenure[:prop_own] + hhtenure[:prop_pr] + hhtenure[:prop_sr] .≈ 1.0)

# now check the averages across the city match expected results
@test mean(hhtenure[:prop_dep]) ≈ 0.3157783879437741
@test mean(hhtenure[:prop_ndep]) ≈ 0.4929950444337319
@test mean(hhtenure[:prop_older]) ≈ 0.19122656762249396
@test mean(hhtenure[:prop_own]) ≈ 0.6278113067080776
@test mean(hhtenure[:prop_pr]) ≈ 0.21117127643988706
@test mean(hhtenure[:prop_sr]) ≈ 0.1610174168520353

*(Not for credit)* Experiment with visualising your data, e.g. using the code below for the proportion of owner occupiers. Modify the code to plot your other new variables. Do the maps match what you would expect?

In [None]:
plot_map(cov_map, hhtenure, :prop_own)

These new variables reflect the proportion within a single LSOA. We might also be interested in the extent to which particular groups in the population are concentrated in some areas. To measure this we need to calculate a new variable. We can create a standardised likelihood that a member of a particular group lives in an LSOA by taking the share of the group's Coventry-wide total that is found in that LSOA and multiplying it by the number of LSOAs in Coventry (195):

$$
\text{Standardised likelihood} = 195\cdot\frac{\text{Number of group within the LSOA}}{\text{Number of group within Coventry}}
$$

This will give a number where less than 1 can be interpreted as lower than average, 1 as average for Coventry and greater than 1 as higher than average or concentrated.
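To make the formula concrete, here is a minimal sketch on made-up numbers, assuming a toy city with only four areas rather than Coventry's 195 LSOAs. The vector `counts` and the names `n_areas` and `sl` are hypothetical and are not part of the assignment data.

```julia
# Made-up counts of one group in four hypothetical areas
counts = [10, 30, 40, 20]
n_areas = length(counts)

# Standardised likelihood: number of areas times each area's share of the group total
sl = n_areas .* counts ./ sum(counts)

println(sl)
println(sum(sl))
```

In this toy case the second and third areas come out above 1 (the group is concentrated there), the other two come out below 1, and the values sum to the number of areas.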

**Question 1. c) [5%]** calculate new variables `sl_oth`, for the standardised likelihood of other households, `sl_lpdc`, for lone parents with dependent children, `sl_sr` for social rented and `sl_pr` for private rented.

*Hint:* for the last two you can reuse your `sr` and `pr` variables, but for the first two you first need to add the relevant variables to determine the total number of other households and of lone parents with dependent children.

In [None]:
# YOUR CODE HERE

In [None]:
# Standardised likelihoods should sum to the number of LSOAs in Coventry, i.e. 195
@test sum(hhtenure[:sl_oth]) ≈ 195.
@test sum(hhtenure[:sl_lpdc]) ≈ 195.
@test sum(hhtenure[:sl_sr]) ≈ 195.
@test sum(hhtenure[:sl_pr]) ≈ 195.

# Average of each S.L. should be close to 1.0
@test mean(hhtenure[:sl_oth]) ≈ 1.0
@test mean(hhtenure[:sl_lpdc]) ≈ 1.0
@test mean(hhtenure[:sl_sr]) ≈ 1.0
@test mean(hhtenure[:sl_pr]) ≈ 1.0

**Question 1. d) [15%]** identify the number of LSOAs for each variable where these groups are concentrated (i.e. standardised likelihood of > 1) and store them in Julia variables (*not* new columns) `n_oth`, `n_lpdc`, `n_sr` and `n_pr`. 

In [None]:
# YOUR CODE HERE

In [None]:
# Check the new variables are defined and that they are of integer type
@test all(eltype.([n_oth, n_lpdc, n_sr, n_pr]) .== Int64)

Highlight the LSOAs with high concentrations of social renting on a map.

In [None]:
# YOUR CODE HERE

## 2.	Exploring ethnicity and deprivation (55%)

The dataset `ethnicity.csv` holds counts of people by ethnic group and an indicator of the level of deprivation where higher values mean areas are more deprived.  The column headings are explained below:

**Table 3 - ethnicity and area deprivation**

Code |	Description
--|--
wbrit |	White British
wirish |	White Irish
wother |	White other
mixed |	Mixed
indian | Indian
pakist |  Pakistani
bangla | Bangladeshi
chinese	| Chinese
othasian | Asian other
blafrican | Black African
blcarrb	| Black Caribbean
blother	| Black other
other |	Other ethnic group
IMD	 | Index of Multiple Deprivation score (higher means more deprived)

In [None]:
ethnicity = readtable("ethnicity.csv")
head(ethnicity)

**Question 2. a) [10%]**	calculate new columns for the white other, Indian, Pakistani, Bangladeshi and black African ethnic groups to show the standardised likelihood that they are living in each LSOA. Name your columns `sl_wother`, `sl_indian`, `sl_pakist`, `sl_bangla` and `sl_blafrican`. You might like to experiment with plotting your new columns on a map.

In [None]:
# YOUR CODE HERE

In [None]:
@test sum(ethnicity[:sl_wother] .> 1.0) == 67
@test sum(ethnicity[:sl_indian] .> 1.0) == 82
@test sum(ethnicity[:sl_pakist] .> 1.0) == 42
@test sum(ethnicity[:sl_bangla] .> 1.0) == 42
@test sum(ethnicity[:sl_blafrican] .> 1.0) == 62

**Question 2.b) [10%]**	create summary statistics (mean, standard deviation, minimum and maximum) of the standardised likelihoods for each ethnic group

In [None]:
# YOUR CODE HERE

**Question 2. c) [10%]** create a histogram showing the standardised likelihood for each ethnic minority group by LSOA. You may prefer to make a series of histograms, one for each ethnic minority.

In [None]:
# YOUR CODE HERE

Whilst there is heterogeneity within ethnic groups, sociologists have suggested that some groups are more likely to live in deprived areas.

**Question 2. d) [15%]** produce scatterplots between the concentration of each ethnic group and the Index of Multiple Deprivation.  Is there a relationship between each ethnic group and deprivation? Is it positive or negative?

*Hint*: It may be useful to fit a regression line to assess the relationship, e.g. using the `smooth=true` optional argument to `scatter()`, which adds a regression line to the plot.
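If you want a number to go with the picture, the sign of a fitted slope tells you whether the relationship is positive or negative. The sketch below computes an ordinary least-squares line by hand on made-up data. The vectors `x` and `y` are hypothetical stand-ins for a standardised likelihood and an IMD score, and this is only one possible way of summarising the relationship, not a description of what `smooth=true` does internally.

```julia
# Made-up values standing in for a standardised likelihood (x) and an IMD score (y)
x = [0.2, 0.8, 1.0, 1.5, 2.5]
y = [12.0, 18.0, 21.0, 24.0, 35.0]

# Means computed by hand so the sketch only needs Base Julia
mx = sum(x) / length(x)
my = sum(y) / length(y)

# Ordinary least-squares slope and intercept of the line y = intercept + slope * x
slope = sum((x .- mx) .* (y .- my)) / sum((x .- mx) .^ 2)
intercept = my - slope * mx

println("slope = ", slope, ", intercept = ", intercept)
```

A slope above zero indicates that higher values of `x` tend to go with higher values of `y`; a slope below zero indicates the opposite.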

In [None]:
# YOUR CODE HERE

YOUR ANSWER HERE

**Question 2. e) [10%]** Devise your own metric to compare the concentrations of each ethnic minority in LSOAs across Coventry and plot a bar graph to compare the different ethnic groups.

In [None]:
# YOUR CODE HERE

Which minority group is the most concentrated?

YOUR ANSWER HERE

## 3. Bonus Question (no marks)

The file `migqual.csv` contains counts of people by whether they were born in the UK or, if not, by the period in which they arrived.  The first set of data is for all residents, the second for those with level 4 qualifications or above.  The variables in the file are shown in Table 4.

**Table 4 - born here or migrated to the UK by highest education qualification**

Code |	Description
----|---
|*For all residents*
bornhere|	Born in the UK
arrpre61|	Arrived before 1961
arr6170|	Arrived between 1961 and 1970
arr7180|	Arrived between 1971 and 1980
arr8190|	Arrived between 1981 and 1990
arr9100|	Arrived between 1991 and 2000
arr0111|	Arrived between 2001 and 2011
|*For residents with level 4 qualifications or above (classed as a degree)*
l4bornhere|	Born in the UK
l4arrpre61|	Arrived before 1961
l4arr6170|	Arrived between 1961 and 1970
l4arr7180|	Arrived between 1971 and 1980
l4arr8190|	Arrived between 1981 and 1990
l4arr9100|	Arrived between 1991 and 2000
l4arr0111|	Arrived between 2001 and 2011

A column showing the index of multiple deprivation score is also included.

In [None]:
migqual = readtable("migqual.csv")
mig_cols = names(migqual)[2:8]
head(migqual)

For example, we could map the number of residents born in the UK for each LSOA:

In [None]:
plot_map(cov_map, migqual, :bornhere)

**Question 3. a)**	calculate new variables for all residents in each migrant cohort to show the standardised likelihood that they are living in each LSOA

In [None]:
# YOUR CODE HERE

**Question 3. b)**	create summary statistics (mean, standard deviation, minimum and maximum) for each of these groups

In [None]:
# YOUR CODE HERE

**Question 3. c)**	identify the extent to which each migrant cohort is clustered in LSOAs in Coventry. Plot a suitable graph to illustrate your answer.

In [None]:
# YOUR CODE HERE

**Question 3. d)**	calculate a new variable for migrants who arrived since 2001 with a degree level qualification to show the standardised likelihood that they are living in each LSOA, plotting the results on a map.

In [None]:
# YOUR CODE HERE

**Question 3. e)**	produce a scatterplot between the entire 2001 migrant cohort and those with degree level qualifications.

In [None]:
# YOUR CODE HERE

Is there a relationship between them? Is it positive or negative?

YOUR ANSWER HERE

**Question 3. f)**	produce a scatterplot between each of the 2001 migrant cohorts and the index of multiple deprivation score.

In [None]:
# YOUR CODE HERE

Is there a relationship between them? Is it positive or negative?

YOUR ANSWER HERE