# Inequality and premature mortality 

This is code for a Python workshop with LSE undergraduate economics students to explore the relationship between inequality and premature mortality. 

------


# Motivation 

OECD figures suggest that the UK has among the highest levels of income inequality in the European Union (as measured by the Gini coefficient), although income inequality is lower than in the United States. 

Data published by Eurostat, the statistical office of the European Union, gives a more positive picture, indicating income inequality in the UK is lower than in several other EU countries although it is slightly higher than the EU average.


<img src="UKinequalitygini.png" alt="UKinequalitygini.png" width="500"/>


### The Deaton Review

In 2019 the IFS, with Professor Angus Deaton, launched the Deaton Review: a 5-year review examining income and wealth inequalities, as well as differences in health outcomes, political power and economic opportunities, in British society and across the world.

The review will attempt to distinguish between inequalities that are beneficial because they give people incentives to strive harder, and those that should be stamped out because they derive from "luck" or "cronyism". 

You can read more here: https://www.ifs.org.uk/inequality/


### Related Literature

Atkinson, A (2015) Inequality – what can be done? Working paper 2. LSE International Inequalities Institute. 
http://www.lse.ac.uk/International-Inequalities/Assets/Documents/Working-Papers/Working-Paper-2-Tony-Atkinson.pdf

Neumayer, E (2016) Inequalities of Income and Inequalities of Longevity: A Cross-Country Study. Available [here](https://ajph.aphapublications.org/doi/pdf/10.2105/AJPH.2015.302849).


Neumayer, E (2017) Regional Inequalities in Premature Mortality in Great Britain. Available [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/AQ3WMC).

------ 

### Structure
This project will consist of two parts: 
1. Premature mortality in London 
2. Inequality of longevity and income in the UK and globally

The second part is based on replicating some work done by Eric Neumayer (LSE). 

# Part 1: Premature mortality in London

------

# Let's first look at London

The aim here is to create a choropleth map using data from a tutorial which can be found <a href="https://towardsdatascience.com/lets-make-a-map-using-geopandas-pandas-and-matplotlib-to-make-a-chloropleth-map-dddc31c1983d">here</a>

Note: there is a mistake in the author's final result as he uses the wrong 'column', and therefore doesn't actually show mortality. 


# Import packages 

You will need to import pandas, numpy and matplotlib in addition to a new package: geopandas. 

We are using geopandas to create choropleth maps, which are essentially geographical heat maps.

#### A word of caution when using geopandas

You need to make sure you carefully follow the installation documentation. 
You can read it <a href="https://geopandas.readthedocs.io/en/latest/install.html">here</a>  

When using conda, create and activate a dedicated environment (such as the 'geo_env' environment described in the installation documentation) so that you don't have any conflicts when using geopandas. 
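Before starting, it can help to confirm that the active environment actually contains the packages listed above. The following is a minimal sketch using only the standard library; the helper name `check_packages` is made up for this example.

```python
import importlib.util

# Packages the workshop relies on (as listed in the text above).
REQUIRED = ["pandas", "numpy", "matplotlib", "geopandas"]


def check_packages(names):
    """Return {package: True/False} depending on whether Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name:<12} {'found' if found else 'MISSING'}")
```

If geopandas is reported as missing, the most likely cause is that the notebook is running in a different environment from the one you installed it into.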

# Import the data 

The data cover multiple years. According to the site, the dataset was last updated 2 months ago (at the time of writing).
You can download the data yourself <a href="https://data.london.gov.uk/dataset/london-borough-profiles">here</a>  
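
As a rough sketch, loading the file with pandas might look like the example below. To keep it self-contained, a small in-memory CSV stands in for the downloaded file: the rows, the column names other than 'Area name', the file name in the comment, and the use of "." as a missing-value marker are all assumptions for illustration, so check them against the real file.

```python
import io

import pandas as pd

# Stand-in for the downloaded file; every value here is made up.
csv_text = """Code,Area name,Male life expectancy,Female life expectancy
X001,Borough A,.,.
X002,Borough B,77.6,82.1
X003,Borough C,82.1,85.1
"""

# In the workshop you would pass the path of the downloaded file instead,
# e.g. pd.read_csv("london-borough-profiles.csv")  (hypothetical file name).
# na_values tells pandas which strings to treat as missing; "." is an assumption.
df = pd.read_csv(io.StringIO(csv_text), na_values=["."])

print(df.head())
print(df.dtypes)
```

Declaring the missing-value marker at load time keeps numeric columns numeric, which avoids surprises later when mapping or regressing on them.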


# Inspect the data 

Try to answer the following questions: 

* How many observations are in the data?
* What are the various columns? List them.
* How many missing observations are there?
* How many unique boroughs are there?
* Rename the columns of interest to simpler names (hint: `df.rename(index=str, columns={'Area name':'borough',....})`)
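
One possible way to approach these questions is sketched below on a tiny made-up dataframe. 'Area name' comes from the hint above; the 'Mortality rate' column and all values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Area name": ["Borough A", "Borough B", "Borough C", "Borough C"],
    "Mortality rate": [5.1, np.nan, 6.3, 6.3],
})

n_obs = len(df)                          # number of observations (rows)
columns = list(df.columns)               # column names
n_missing = int(df.isna().sum().sum())   # total number of missing cells
n_boroughs = df["Area name"].nunique()   # number of distinct boroughs

# rename returns a new DataFrame, so assign it back to keep the change.
df = df.rename(columns={"Area name": "borough", "Mortality rate": "mortality"})

print(n_obs, columns, n_missing, n_boroughs)
print(df.columns.tolist())
```

Note that the number of rows and the number of unique boroughs differ here, which is the kind of discrepancy these checks are meant to surface.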

# Import the shape file

You can download the shapefiles here: https://data.london.gov.uk/dataset/statistical-gis-boundary-files-london

Select the option 'statistical-gis-boundaries-london.zip  (27.34 MB)'
When saving you need to make sure you have the .shx AND .shp files stored together! Make sure to move it to the same directory as all your other files and this notebook for ease of use. 

Run some checks to ensure the data have been imported correctly. 

# Cleaning the dataframes

Now you need to merge your dataframe with your shapefile dataframe. To do this you need to identify a KEY i.e. a column that both dataframes have in common that you will be able to match to. 
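
A minimal sketch of a key-based merge is shown below, using plain pandas dataframes with made-up rows. The key column names 'NAME' and 'borough' are assumptions; with a GeoDataFrame on the left-hand side the same `merge` call should apply. The `indicator=True` option adds a `_merge` column that makes unmatched rows easy to spot.

```python
import pandas as pd

# Made-up tables for illustration only.
shapes = pd.DataFrame({"NAME": ["Borough A", "Borough B", "Borough C"]})
data = pd.DataFrame({
    "borough": ["Borough A", "Borough B ", "Borough D"],  # note the trailing space
    "mortality": [5.1, 4.8, 6.0],
})

# Tidy the key first: stray whitespace is a common reason for failed matches.
data["borough"] = data["borough"].str.strip()

# Left join keeps every shape, even those without matching data.
merged = shapes.merge(
    data, how="left", left_on="NAME", right_on="borough", indicator=True
)

unmatched = merged.loc[merged["_merge"] == "left_only", "NAME"].tolist()
print(merged)
print("No data for:", unmatched)
```

A left join is used here so that every area in the boundary file is kept; areas without data then appear with missing values rather than silently disappearing from the map.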

# Creating the map

Colormap names you can use for the map: https://matplotlib.org/tutorials/colors/colormaps.html

Here you just want to select mortality and map it. 

* Set a variable that will call whatever column we want to visualise on the map
* Set the range for the choropleth
* Create figure and axes for Matplotlib
* Create colorbar as a legend
* Annotate with source of data
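
The steps above can be sketched roughly as follows. This is an illustration on made-up square "boroughs" with made-up values, not the tutorial's exact code; the column name 'mortality', the 'Blues' colormap and the annotation text are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Polygon

# Made-up geometries and values, for illustration only.
merged = gpd.GeoDataFrame(
    {"borough": ["A", "B", "C"], "mortality": [4.8, 5.6, 6.3]},
    geometry=[Polygon([(i, 0), (i + 1, 0), (i + 1, 1), (i, 1)]) for i in range(3)],
)

# Column to visualise, and the value range for the colour scale.
variable = "mortality"
vmin, vmax = merged[variable].min(), merged[variable].max()

# Figure and axes, then the choropleth itself.
fig, ax = plt.subplots(1, figsize=(8, 5))
merged.plot(column=variable, cmap="Blues", linewidth=0.8,
            edgecolor="0.8", vmin=vmin, vmax=vmax, ax=ax)
ax.axis("off")
ax.set_title("Premature mortality (toy data)")

# Colorbar as a legend, built from the same colormap and range as the map.
sm = plt.cm.ScalarMappable(cmap="Blues", norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([])
fig.colorbar(sm, ax=ax)

# Source annotation, positioned relative to the whole figure.
ax.annotate("Source: toy data", xy=(0.1, 0.08), xycoords="figure fraction",
            fontsize=10, color="#555555")
```

Keeping the column name in a single `variable` makes it harder to plot the wrong column by accident, which is the mistake noted earlier in the original tutorial.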


# Part two (a): Cross-country relationship 

This dataset was used to examine the effects of market income inequality (income inequality before taxes and transfers) and income redistribution via taxes and transfers on inequality in longevity.

The corresponding paper can be found here: https://ajph.aphapublications.org/doi/pdf/10.2105/AJPH.2015.302849


# Motivation 
Public policies affect not only health and mortality at the individual level, but also the inequality of longevity, that is, inequality in the number of years lived. 

Example:
* Higher tobacco and alcohol taxes reduce consumption, as do nonfiscal regulatory measures such as restrictions on smoking in closed spaces. 
* This reduces avoidable mortality from lung cancer and liver cirrhosis. 
* More directly, governments implement health and safety regulations, influence total health spending and its allocation, and regulate the coverage of health insurance across individuals. 

All factors that reduce premature deaths also reduce longevity inequality.

# Import the data 

(Note: You have already imported all the relevant packages, so you won't need to do that again)

We can begin simply by importing the data. 

The data can be downloaded from here: https://www.dropbox.com/s/cp1gc60xagx4uj5/Article%20for%20AJPH.xlsx?dl=0

# Inspect the data 

As before, generate some summary stats to describe the data that you have. This should also help you to understand what the dataset is and how you can use it. 

Consider finding the following: 
* Number of countries, and the names of these countries
* Years the data runs from and to 
* Null observations 
* Most unequal country (highest Gini coefficient)
* Most equal country (lowest Gini coefficient) 
* The total columns and what they describe 
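
A possible starting point for these summaries is sketched below on a made-up country-year panel. The column names 'country', 'year' and 'gini' are assumptions, and ranking countries by their average Gini over the period is just one reasonable choice.

```python
import pandas as pd

# Made-up panel for illustration only.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "year":    [2000, 2001, 2000, 2001, 2000, 2001],
    "gini":    [0.30, 0.31, 0.45, None, 0.25, 0.26],
})

countries = sorted(df["country"].unique())                    # country names
year_range = (int(df["year"].min()), int(df["year"].max()))  # first and last year
nulls = df.isna().sum()                                       # missing values per column

# Average Gini per country (missing values are skipped), then the extremes.
mean_gini = df.groupby("country")["gini"].mean()
most_unequal = mean_gini.idxmax()
most_equal = mean_gini.idxmin()

print(len(countries), countries, year_range)
print(nulls)
print("Most unequal:", most_unequal, "| Most equal:", most_equal)
```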

# Estimating the relationship
Now identify the columns between which you expect to find a relationship.

* Create a simple scatter chart to visualise whether there might be a relationship 
* Estimate this using a simple regression 
* Are there any other variables you need to control for in your estimation? 
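
As an illustration of the first two steps, the sketch below draws a scatter chart and fits a straight line with `numpy.polyfit`. The data are simulated, and the variable names and the "true" slope of 10 are assumptions chosen only so the example has something to recover.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Simulated data: longevity inequality rises with the Gini coefficient.
rng = np.random.default_rng(0)
gini = rng.uniform(0.25, 0.50, size=100)
longevity_ineq = 2.0 + 10.0 * gini + rng.normal(0, 0.2, size=100)

# Simple regression: fit y = intercept + slope * x by least squares.
slope, intercept = np.polyfit(gini, longevity_ineq, deg=1)

# Scatter chart with the fitted line on top.
fig, ax = plt.subplots()
ax.scatter(gini, longevity_ineq, alpha=0.6)
xs = np.linspace(gini.min(), gini.max(), 50)
ax.plot(xs, intercept + slope * xs, color="red")
ax.set_xlabel("Gini coefficient")
ax.set_ylabel("Longevity inequality (simulated)")

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A fitted line like this only describes an association; whether other variables need to be controlled for is a separate question.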


# Part two (b): Inequality of longevity and income in the UK
Similarly, there is a relationship between inequality and health outcomes (life expectancy), which can be assessed at a cross-country level. 

We can use additional UK-only data to identify the areas with the highest premature mortality rates. Does this seem to correspond with what you know about income inequality? 


# Import the data 

(Note: You have already imported all the relevant packages, so you won't need to do that again)

We can begin simply by importing the data, which can be downloaded from [here](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/AQ3WMC).

# Inspect the data 

As before, generate some summary stats to describe the data that you have. This should also help you to understand what the dataset is and how you can use it. 

Consider finding the following: 
* Number of local authorities
* Years the data runs from and to 
* Null observations 
* Town with the highest mortality rate (/lowest life expectancy)
* Town with the lowest mortality rate (/highest life expectancy)
* The total columns and what they describe 
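
One way to find the highest and lowest areas is sketched below on made-up data. The column names 'local_authority', 'year' and 'premature_mortality' are assumptions, as is the choice to compare areas in the latest year only.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "local_authority": ["Town A", "Town B", "Town C", "Town A", "Town B", "Town C"],
    "year": [2010, 2010, 2010, 2011, 2011, 2011],
    "premature_mortality": [410.0, 350.0, 290.0, 400.0, 345.0, None],
})

n_authorities = df["local_authority"].nunique()
first_year, last_year = int(df["year"].min()), int(df["year"].max())

# Compare areas within the latest year, dropping rows with no mortality figure.
latest = df[df["year"] == last_year].dropna(subset=["premature_mortality"])
highest = latest.nlargest(1, "premature_mortality")["local_authority"].iloc[0]
lowest = latest.nsmallest(1, "premature_mortality")["local_authority"].iloc[0]

print(n_authorities, first_year, last_year)
print("Highest:", highest, "| Lowest:", lowest)
```

In this toy data Town C has the lowest rate in 2010 but no figure for 2011, so the answer depends on how missing observations are handled.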

# Estimating the relationship
Now identify the columns between which you expect to find a relationship.

* Create a simple scatter chart to visualise whether there might be a relationship 
* Estimate this using a simple regression 
* Are there any other variables you need to control for in your estimation? 
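
To see why the question about controls matters, the sketch below compares a regression with and without a control variable, using ordinary least squares via `numpy.linalg.lstsq`. The data are simulated, and the variable names, coefficients and the helper `ols` are assumptions made for this illustration.

```python
import numpy as np

# Simulated data: income affects both inequality and mortality.
rng = np.random.default_rng(1)
n = 200
income = rng.normal(0, 1, n)                       # control (standardised)
inequality = -0.5 * income + rng.normal(0, 1, n)   # correlated with income
mortality = 1.0 + 2.0 * inequality - 1.5 * income + rng.normal(0, 0.5, n)


def ols(y, *regressors):
    """Least-squares coefficients: intercept first, then one per regressor."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


b_simple = ols(mortality, inequality)            # omits the control
b_control = ols(mortality, inequality, income)   # includes the control

print("Inequality slope without control:", round(b_simple[1], 2))
print("Inequality slope with control:   ", round(b_control[1], 2))
```

In this simulation the slope on inequality is overstated when income is left out, because income is correlated with both variables. Real data may behave differently, but the comparison is a useful habit.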
