# CUSP London Data Dive 2023

## Challenge we chose / Research Questions:
1.  Do common mental health problems (depression and anxiety) cluster in particular geographical areas across the country (UK)?
2.  **Are serious mental health problems (psychosis: schizophrenia and bipolar affective disorders) more prevalent in urban built-up areas?**
3.  Which physical features of the built environment (e.g. green space, air pollution levels, street intersections as a measure of ‘neighbourhood walkability’; this list is not exhaustive) are associated with mental health conditions (common and/or serious mental health conditions)?
4.  **Which social features of the environment (e.g. population density, social disorganisation; this list is not exhaustive) are associated with mental health conditions (common and/or serious mental health conditions)?**
5.  Do mental health issues correlate with known physical health issues?
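Question 1 asks whether conditions cluster geographically. One standard measure of spatial clustering is global Moran's I, I = (n / S0) · Σᵢⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ is each area's deviation from the mean, wᵢⱼ is a spatial weight and S0 is the sum of all weights. The sketch below is illustrative only: it assumes a binary adjacency matrix, which for the real analysis would have to be built from the LSOA boundary files, and the four toy values are hypothetical, not SAMHI data.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for a 1-D array of area values and an n x n spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()                      # deviations from the mean
    s0 = w.sum()                          # sum of all weights
    num = (w * np.outer(z, z)).sum()      # sum over i, j of w_ij * z_i * z_j
    den = (z ** 2).sum()                  # sum of squared deviations
    return (n / s0) * num / den

# Hypothetical example: four areas in a row, each adjacent only to its immediate neighbours
w = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
clustered = [1.0, 1.0, -1.0, -1.0]     # similar values sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]   # every neighbour differs

print(round(morans_i(clustered, w), 3))
print(round(morans_i(alternating, w), 3))
```

Under no spatial pattern the expected value is -1/(n-1), so values well above that suggest clustering of similar values and values below it suggest neighbours tend to differ; a permutation test would still be needed before calling a pattern significant.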

## Data Sources:

1. [Mental Health Index 2011](https://pldr.org/dataset/2noyv/small-area-mental-health-index-samhi)
2. [Population Density Census 2011](https://www.nomisweb.co.uk/census/2011/qs102ew)
3. [Deprivation 2010](https://www.gov.uk/government/statistics/english-indices-of-deprivation-2010)
4. [Statistical Boundary London 2011](https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london)
5. [Household Composition England and Wales 2011](https://www.nomisweb.co.uk/census/2011/ks105ew)
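These sources name the 2011 LSOA code differently (`lsoa11` in SAMHI, `GEO_CODE` in the deprivation table, `LSOA_Code` in the green space table, as the outputs further down show), so they need a common key before they can be combined. A minimal sketch of that join, using a few values copied from those outputs; the short column names `hh_total` and `hh_not_deprived` are hypothetical stand-ins for the long census labels.

```python
import pandas as pd

# Tiny extracts of three tables (values copied from the head() outputs in this notebook)
samhi = pd.DataFrame({
    "lsoa11": ["E01000001", "E01000002", "E01000003"],
    "samhi_index.2011": [-1.73307, -1.704465, -0.92087],
})
dep = pd.DataFrame({
    "GEO_CODE": ["E01000001", "E01000002", "E01000003"],
    "hh_total": [876, 830, 817],          # hypothetical short name for the total households column
    "hh_not_deprived": [488, 490, 235],   # hypothetical short name for "not deprived in any dimension"
})
green = pd.DataFrame({
    "LSOA_Code": ["E01000001", "E01000002", "E01000005"],
    "Area": [133325.8873, 226199.3767, 190745.2936],
})

def to_common_key(df, key):
    """Rename a table's LSOA code column to the shared name 'lsoa_code'."""
    return df.rename(columns={key: "lsoa_code"})

# Inner joins keep only LSOAs present in every table (E01000003 and E01000005 drop out here);
# validate="one_to_one" raises an error if an LSOA code is duplicated on either side
merged = (
    to_common_key(samhi, "lsoa11")
    .merge(to_common_key(dep, "GEO_CODE"), on="lsoa_code", how="inner", validate="one_to_one")
    .merge(to_common_key(green, "LSOA_Code"), on="lsoa_code", how="inner", validate="one_to_one")
)
print(merged)
```

Swapping in `how="left"` would keep every SAMHI row and show unmatched LSOAs as `NaN`, which is a useful coverage check before deciding which rows to drop.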

## Assumption:


## Limitation:


# Import Modules

In [17]:
# Import required libraries

# Visualisation modules
import matplotlib as mpl
%matplotlib inline
import matplotlib.pyplot as plt

# Analysis and geospatial modules
import osmnx as ox
import pandas as pd
import geopandas as gpd
import numpy as np
import contextily as ctx

# Data-reading and file-handling modules
import fiona
import urllib
from urllib.request import urlopen
import csv
import os
import tempfile
import shutil
from pathlib import Path

# Silence warnings to keep the notebook output readable
import warnings
warnings.simplefilter(action='ignore')
ox.__version__

'1.2.2'

# Data Pre-processing

## Population Density

In [19]:
# Read the Small Area Mental Health Index (SAMHI, LSOA level, 2011-2019) from the GitHub repo
url = "https://raw.githubusercontent.com/ListianingrumR/cusp_london_data_dive_2023/main/data/samhi_21_01_v4.00_2011_2019_LSOA.csv"
pd_df= pd.read_csv(url)

pd_df.head(5)

| | lsoa11 | samhi_index.2011 | samhi_dec.2011 | samhi_index.2012 | samhi_dec.2012 | samhi_index.2013 | samhi_dec.2013 | samhi_index.2014 | samhi_dec.2014 | samhi_index.2015 | samhi_dec.2015 | samhi_index.2016 | samhi_dec.2016 | samhi_index.2017 | samhi_dec.2017 | samhi_index.2018 | samhi_dec.2018 | samhi_index.2019 | samhi_dec.2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E01000001 | -1.73307 | 1 | -1.665014 | 1 | -1.729767 | 1 | -1.460658 | 1 | -1.428309 | 1 | -1.507283 | 1 | -1.326553 | 1 | -1.371607 | 1 | -1.183468 | 1 |
| 1 | E01000002 | -1.704465 | 1 | -1.719869 | 1 | -1.783252 | 1 | -1.749144 | 1 | -1.53145 | 1 | -1.456034 | 1 | -1.383528 | 1 | -1.242643 | 1 | -1.18984 | 1 |
| 2 | E01000003 | -0.92087 | 4 | -0.68642 | 5 | -0.357678 | 7 | -0.571222 | 4 | -0.567158 | 4 | -0.574356 | 3 | -0.504734 | 2 | -0.501422 | 2 | -0.580351 | 1 |
| 3 | E01000005 | -1.21824 | 2 | -1.262427 | 2 | -0.951074 | 3 | -0.720833 | 4 | -1.016268 | 2 | -0.679462 | 2 | -0.805106 | 1 | -0.695488 | 1 | -0.89746 | 1 |
| 4 | E01000006 | -1.892813 | 1 | -1.837497 | 1 | -1.784586 | 1 | -1.724196 | 1 | -1.630328 | 1 | -1.537457 | 1 | -1.352359 | 1 | -1.452954 | 1 | -1.237533 | 1 |


In [13]:
pd_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32844 entries, 0 to 32843
Data columns (total 19 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   lsoa11            32844 non-null  object 
 1   samhi_index.2011  32844 non-null  float64
 2   samhi_dec.2011    32844 non-null  int64  
 3   samhi_index.2012  32844 non-null  float64
 4   samhi_dec.2012    32844 non-null  int64  
 5   samhi_index.2013  32844 non-null  float64
 6   samhi_dec.2013    32844 non-null  int64  
 7   samhi_index.2014  32844 non-null  float64
 8   samhi_dec.2014    32844 non-null  int64  
 9   samhi_index.2015  32844 non-null  float64
 10  samhi_dec.2015    32844 non-null  int64  
 11  samhi_index.2016  32844 non-null  float64
 12  samhi_dec.2016    32844 non-null  int64  
 13  samhi_index.2017  32844 non-null  float64
 14  samhi_dec.2017    32844 non-null  int64  
 15  samhi_index.2018  32844 non-null  float64
 16  samhi_dec.2018    32844 non-null  int64 
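SAMHI is stored wide, with one `samhi_index.YYYY` and one `samhi_dec.YYYY` column per year (`samhi_dec` is presumably the decile). For comparisons across years, a long layout with one row per LSOA and year is often easier to work with. A sketch using `pd.wide_to_long`, assuming the `stub.YYYY` column pattern shown above; the two rows and two years are copied from the `head()` output, and on the full data the same call would take `pd_df`.

```python
import pandas as pd

# Two LSOAs and two years copied from the head() output above
wide = pd.DataFrame({
    "lsoa11": ["E01000001", "E01000003"],
    "samhi_index.2011": [-1.73307, -0.92087],
    "samhi_dec.2011": [1, 4],
    "samhi_index.2012": [-1.665014, -0.68642],
    "samhi_dec.2012": [1, 5],
})

# Split each "stub.year" column into a stub column plus a numeric "year" column
long = pd.wide_to_long(
    wide,
    stubnames=["samhi_index", "samhi_dec"],
    i="lsoa11",   # identifier column; must be unique per row
    j="year",     # name of the new column holding the suffix after the separator
    sep=".",
).reset_index()

print(long.sort_values(["lsoa11", "year"]))
```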

## Deprivation

In [20]:
# Read the household deprivation classification (LSOAs and data zones, England, Scotland and Wales) from the GitHub repo
url = "https://raw.githubusercontent.com/ListianingrumR/cusp_london_data_dive_2023/main/data/CLSHHD_LSOADZ_England_Scotland_Wales_Descriptions.csv"
dep_df= pd.read_csv(url)

dep_df.head(5)

| | GEO_CODE | GEO_LABEL | GEO_TYPE | GEO_TYP2 | Deprivation; classification of household [E][S][W] : Total\ Classification of household deprivation - Unit : Households | Deprivation; classification of household [E][S][W] : Household is not deprived in any dimension - Unit : Households | Deprivation; classification of household [E][S][W] : Household is deprived in 1 dimension - Unit : Households | Deprivation; classification of household [E][S][W] : Household is deprived in 2 dimensions - Unit : Households | Deprivation; classification of household [E][S][W] : Household is deprived in 3 dimensions - Unit : Households | Deprivation; classification of household [E][S][W] : Household is deprived in 4 dimensions - Unit : Households |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E01000001 | City of London 001A | Lower Super Output Areas and Data Zones | LSOADZ | 876 | 488 | 314 | 61 | 12 | 1 |
| 1 | E01000002 | City of London 001B | Lower Super Output Areas and Data Zones | LSOADZ | 830 | 490 | 288 | 47 | 4 | 1 |
| 2 | E01000003 | City of London 001C | Lower Super Output Areas and Data Zones | LSOADZ | 817 | 235 | 359 | 169 | 47 | 7 |
| 3 | E01000005 | City of London 001E | Lower Super Output Areas and Data Zones | LSOADZ | 467 | 107 | 187 | 113 | 51 | 9 |
| 4 | E01000006 | Barking and Dagenham 016A | Lower Super Output Areas and Data Zones | LSOADZ | 543 | 198 | 195 | 113 | 35 | 2 |


In [15]:
dep_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41729 entries, 0 to 41728
Data columns (total 10 columns):
 #   Column                                                                                                                   Non-Null Count  Dtype 
---  ------                                                                                                                   --------------  ----- 
 0   GEO_CODE                                                                                                                 41729 non-null  object
 1   GEO_LABEL                                                                                                                41729 non-null  object
 2   GEO_TYPE                                                                                                                 41729 non-null  object
 3   GEO_TYP2                                                                                                                 41729 non-null  obje
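`dep_df` has 41,729 rows against 32,844 in the SAMHI file, which is consistent with it also covering Welsh LSOAs and Scottish data zones (the file name mentions England, Scotland and Wales). A sketch, under that assumption, of keeping only English LSOAs (codes beginning `E01`), giving the six household-count columns short names by position, and deriving the percentage of households deprived in at least one dimension. The short names, the placeholder column labels and the Welsh row are hypothetical; the two English rows are copied from the `head()` output above.

```python
import pandas as pd

# Hypothetical short names for the six household counts, in file order:
# total, then deprived in 0, 1, 2, 3 and 4 dimensions
SHORT_NAMES = ["hh_total", "hh_dep0", "hh_dep1", "hh_dep2", "hh_dep3", "hh_dep4"]

def tidy_deprivation(df):
    """Keep English LSOAs, shorten the count columns and add % of households deprived in >= 1 dimension."""
    out = df.copy()
    out.columns = list(out.columns[:-6]) + SHORT_NAMES          # rename the last six columns by position
    out = out[out["GEO_CODE"].str.startswith("E01")].copy()     # English LSOA codes start with E01
    out["pct_deprived_any"] = 100 * (out["hh_total"] - out["hh_dep0"]) / out["hh_total"]
    return out

# Two English rows from the output above plus one hypothetical Welsh row (W01 prefix)
raw = pd.DataFrame(
    [
        ["E01000001", "City of London 001A", 876, 488, 314, 61, 12, 1],
        ["E01000003", "City of London 001C", 817, 235, 359, 169, 47, 7],
        ["W01000001", "Hypothetical Welsh LSOA", 600, 300, 200, 70, 25, 5],
    ],
    columns=["GEO_CODE", "GEO_LABEL", "c1", "c2", "c3", "c4", "c5", "c6"],  # placeholders for the long labels
)
dep_tidy = tidy_deprivation(raw)
print(dep_tidy[["GEO_CODE", "pct_deprived_any"]].round(2))
```

Checking that the five dimension counts add up to the total (they do for every row in the `head()` output above) is a cheap way to confirm that the positional renaming lined up with the right columns.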

## Demographic

In [10]:
# Read the Green Space Consolidated Data (England, version 2.1) from the GitHub repo; the file is semicolon-separated
url = "https://raw.githubusercontent.com/ListianingrumR/cusp_london_data_dive_2023/main/data/Green%20Space%20Consolidated%20Data%20-%20England%20-%20Version%202.1.csv"
demo_df= pd.read_csv(url, sep=";")

demo_df.head(5)

| | Unnamed: 0 | LSOA_Code | LSOA_Name | MSOA_Code | MSOA_Name | MSOA_Name_House_Of_Commons | LA_Code | LA_Name | LA_Name_For_Readability | Area | ... | Unbuffrd_GOSpace_Area | Buffrd_GOSpace_Area | Unbuffered_GOSpace_Per_Capita | Pop_Area | PopArea_With_GOSpace_Access | Pcnt_PopArea_With_GOSpace_Access | Pcnt_Pop_Without_GOSpace_Access | Pop_Without_GOSpace_Access | GSDI_AvgArea | GSDI_Access |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | E01000001 | City of London 001A | E02000001 | City of London 001 | City of London | E09000001 | City of London | City of London | 133325.8873 | ... | 0.0 | 0.0 | 0.0 | 133325.8873 | 0.0 | 0.0 | 100.0 | 1296.0 | 1 | 1 |
| 1 | 2 | E01000002 | City of London 001B | E02000001 | City of London 001 | City of London | E09000001 | City of London | City of London | 226199.3767 | ... | 0.0 | 80106.3516 | 0.0 | 226199.3767 | 80106.3516 | 35.414046 | 64.585954 | 746.613627 | 1 | 2 |
| 2 | 3 | E01000003 | City of London 001C | E02000001 | City of London 001 | City of London | E09000001 | City of London | City of London | 57305.1083 | ... | 0.0 | 1857.634803 | 0.0 | 57305.1083 | 1857.634803 | 3.241657 | 96.758343 | 1306.237636 | 1 | 1 |
| 3 | 4 | E01000005 | City of London 001E | E02000001 | City of London 001 | City of London | E09000001 | City of London | City of London | 190745.2936 | ... | 0.0 | 0.0 | 0.0 | 190745.2936 | 0.0 | 0.0 | 100.0 | 1121.0 | 1 | 1 |
| 4 | 5 | E01000006 | Barking and Dagenham 016A | E02000017 | Barking and Dagenham 016 | Barking East | E09000002 | Barking and Dagenham | Barking and Dagenham | 144196.9391 | ... | 0.0 | 27146.71785 | 0.0 | 144196.9391 | 27146.71785 | 18.82614 | 81.17386 | 1655.946741 | 1 | 1 |
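Question 3 asks whether green space is associated with mental health. A rank (Spearman) correlation is one simple first look; the sketch computes it as the Pearson correlation of average ranks, so SciPy is not required. The five pairs are copied from the SAMHI 2011 column and the `Pcnt_Pop_Without_GOSpace_Access` column (renamed here) for the five LSOAs that appear in both `head()` outputs. Five areas cannot support any conclusion, so this only illustrates the mechanics, not a result.

```python
import pandas as pd

# The same five LSOAs, values copied from the SAMHI (2011) and green space head() outputs
df = pd.DataFrame({
    "lsoa_code": ["E01000001", "E01000002", "E01000003", "E01000005", "E01000006"],
    "samhi_index_2011": [-1.73307, -1.704465, -0.92087, -1.21824, -1.892813],
    "pct_pop_without_gospace": [100.0, 64.585954, 96.758343, 100.0, 81.17386],
})

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (tied values share their mean rank)."""
    return x.rank().corr(y.rank())

rho = spearman(df["pct_pop_without_gospace"], df["samhi_index_2011"])
print(round(rho, 3))
```

On the full merged data the same function applies unchanged. Because neighbouring LSOAs tend to resemble each other (the clustering asked about in question 1), ordinary p-values for such correlations would likely be too optimistic.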


In [16]:
demo_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32844 entries, 0 to 32843
Data columns (total 31 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   Unnamed: 0                                32844 non-null  int64  
 1   LSOA_Code                                 32844 non-null  object 
 2   LSOA_Name                                 32844 non-null  object 
 3   MSOA_Code                                 32844 non-null  object 
 4   MSOA_Name                                 32844 non-null  object 
 5   MSOA_Name_House_Of_Commons                32844 non-null  object 
 6   LA_Code                                   32844 non-null  object 
 7   LA_Name                                   32844 non-null  object 
 8   LA_Name_For_Readability                   32844 non-null  object 
 9   Area                                      32844 non-null  float64
 10  IMD_st_areasha                    