# Introduction

> **Note on this report**
>
> This report is written as a series of "Jupyter Notebooks", a form of document that allows execution of scientific and technical Python code. All the code samples are executable, so others can explore and adapt this work online.
>
> In print or PDF, the code samples may be hidden. To view them, see the online version of these Notebooks at
> https://github.com/econandrew/visual-income-distributions-notebooks

The distribution of wealth and income is a subject of long-standing academic interest and, recently, of particular public interest. While much discussion of distribution is anecdotal, objective data on income distribution is available from sources such as censuses, income surveys and taxation records. (Objective and comprehensive information on the distribution of wealth is generally more difficult to obtain, and is not the focus of this project.)

The World Bank hosts a substantial public database of income distribution information in the form of [PovCalNet](http://iresearch.worldbank.org/PovcalNet). PovCalNet is primarily designed as an online interface to replicate the Bank's calculations of the incidence of extreme poverty globally. As a by-product, it provides information on the income (or consumption) distributions of many of the world's economies at multiple points in time. However, this information is generally not easily accessible, a point that has drawn criticism from [others in the development community](https://www.cgdev.org/blog/global-poverty-data-should-be-public-data-set-and-three-requests).

The objective of this project was to make this data easier to access by exposing the distributional component of the PovCalNet source data and allowing visual manipulation of income distributions. Figure 1.1 shows a screenshot of the prototype app in use.

![_**Figure 1.1:** Screenshot of the proof-of-concept web app, showing the aggregate distribution of the EU28 countries in 2015 (excluding Malta which is not in the PovCalNet collection)_](images/screenshot.png)

## Previous efforts to visualise global income distributions

There is a long academic and policy literature involving the construction and visualisation of, and generation of statistics from, global income distributions. For example, ...

Few efforts have targeted a lay audience. Notable amongst these is the ["mountain chart" app](http://www.gapminder.org/tools/#_locale_id=en;&chart-type=mountain) from Gapminder, which visualises a stacked income density for countries on a log income axis. The app uses a two-parameter lognormal approximation, so any variation in the shapes of distributions beyond scale and spread is lost. This choice does, however, enable Gapminder to provide estimates of income distributions back to 1800, using the method of [van Zanden _et al_](http://ideas.repec.org/p/ucg/wpaper/0001.html), which requires only estimates of GDP per capita, Gini index and population for each country-year.
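
To show why two statistics are enough to draw each country's curve, the sketch below inverts two standard lognormal relations, $G = 2\Phi(\sigma/\sqrt{2}) - 1$ and $\text{mean} = e^{\mu + \sigma^2/2}$, where $\Phi$ is the standard normal CDF, and then sums population-weighted curves on a log income axis. It is not Gapminder's code and does not reproduce how Gapminder estimates its inputs; the function names `lognormal_params` and `stacked_log_density` and the sample country figures are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def lognormal_params(mean_income, gini):
    """Return (mu, sigma) of the lognormal with the given mean income and Gini index.

    Inverts two standard lognormal relations:
        gini = 2 * Phi(sigma / sqrt(2)) - 1   (Phi = standard normal CDF)
        mean = exp(mu + sigma**2 / 2)
    """
    if mean_income <= 0 or not 0 < gini < 1:
        raise ValueError("need mean_income > 0 and 0 < gini < 1")
    sigma = math.sqrt(2) * NormalDist().inv_cdf((1 + gini) / 2)
    mu = math.log(mean_income) - sigma ** 2 / 2
    return mu, sigma


def stacked_log_density(countries, log_income):
    """Height of the top of a stacked density chart at one point on a log income axis.

    `countries` is a list of (population, mean_income, gini) tuples. On a log axis
    each lognormal becomes a normal curve N(mu, sigma); scaling it by population
    makes the area under each country's layer equal to its population.
    """
    total = 0.0
    for population, mean_income, gini in countries:
        mu, sigma = lognormal_params(mean_income, gini)
        total += population * NormalDist(mu, sigma).pdf(log_income)
    return total


# Illustrative inputs only, not real data: (population, mean income, Gini)
countries = [(50e6, 20.0, 0.35), (120e6, 4.0, 0.45)]
log_grid = [x / 10 for x in range(-20, 61)]  # natural-log income from -2.0 to 6.0
heights = [stacked_log_density(countries, x) for x in log_grid]
```

Because only $\mu$ and $\sigma$ vary, every country's layer on the log axis is the same symmetric bell shape, shifted and stretched; this is the loss of shape information described above.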

Our World in Data provides a similar, though less detailed, [view of the same data](https://ourworldindata.org/global-economic-inequality) as part of a broader discussion of global inequality.

During the course of this project, the [World Wealth & Income Database](http://wid.world/) (formerly the World Top Incomes Database) expanded the range of visualisations available on its website, focusing particularly on the evolution of inequality statistics.

## Objectives of this project

Compared with previous work described above, we set out to:

1. **Maintain a degree of comparability with PovCalNet and the World Bank's global poverty estimates.** This implied that the distributions should remain close to the survey data itself. So, rather than collapsing each distribution to two statistics (mean and Gini index), we sought to retain much more of the detail of the original distribution as used within PovCalNet, which suggested non-parametric methods of representing the distributions. Moreover, methods of interpolation, extrapolation and aggregation should, where possible, be consistent with those of PovCalNet.

2. **Test a variety of different distribution visualisations.** Probability densities, while quite standard in the academic literature, may not be the most effective way to communicate distribution to non-experts. National statistical organizations, for example, often show histograms instead. Decile shares or averages, or even simple quantile ratios, may be more immediately meaningful to users. This is an empirical question, but to examine it we need the capacity to visualise distributions in more than one way. Therefore distributions must be represented internally in a way that makes this possible.
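
Both objectives point towards keeping a single, detailed internal representation from which several views can be computed. The sketch below illustrates one such representation: mean income plus points on the Lorenz curve, the form in which grouped survey data typically arrive. The Gini index, decile shares, decile averages and quantile ratios are all derived from it. The class `LorenzDistribution` and its methods are hypothetical, and linear interpolation between Lorenz points (everyone in a group receives the group's average income) is a deliberately simple assumption; it is not necessarily how PovCalNet interpolates.

```python
from bisect import bisect_left


class LorenzDistribution:
    """Hypothetical internal representation: mean income plus Lorenz curve points.

    p: cumulative population shares, strictly increasing from 0 to 1
    L: cumulative income shares at the same points, from 0 to 1
    The Lorenz curve is assumed linear between points, i.e. everyone within a
    group receives that group's average income.
    """

    def __init__(self, mean, p, L):
        if len(p) != len(L) or p[0] != 0 or L[0] != 0 or p[-1] != 1 or L[-1] != 1:
            raise ValueError("Lorenz points must run from (0, 0) to (1, 1)")
        self.mean, self.p, self.L = mean, list(p), list(L)

    def _segment(self, q):
        """Index i of the group (p[i-1], p[i]] containing q; q = 0 maps to the first group."""
        if not 0 <= q <= 1:
            raise ValueError("q must lie in [0, 1]")
        return max(bisect_left(self.p, q), 1)

    def lorenz(self, q):
        """Share of total income held by the poorest fraction q of the population."""
        i = self._segment(q)
        p0, p1, L0, L1 = self.p[i - 1], self.p[i], self.L[i - 1], self.L[i]
        return L0 + (L1 - L0) * (q - p0) / (p1 - p0)

    def quantile(self, q):
        """Income at population quantile q: the mean times the Lorenz curve's slope at q."""
        i = self._segment(q)
        return self.mean * (self.L[i] - self.L[i - 1]) / (self.p[i] - self.p[i - 1])

    def gini(self):
        """Gini index: 1 minus twice the area under the piecewise-linear Lorenz curve."""
        pts = list(zip(self.p, self.L))
        area = sum((p1 - p0) * (L0 + L1) / 2 for (p0, L0), (p1, L1) in zip(pts, pts[1:]))
        return 1 - 2 * area

    def decile_shares(self):
        """Share of total income received by each tenth of the population, poorest first."""
        cuts = [self.lorenz(k / 10) for k in range(11)]
        return [upper - lower for lower, upper in zip(cuts, cuts[1:])]

    def decile_means(self):
        """Average income in each decile: income share times mean, over population share 0.1."""
        return [share * self.mean / 0.1 for share in self.decile_shares()]


# Illustrative quintile data only, not drawn from PovCalNet
dist = LorenzDistribution(100.0, [0, 0.2, 0.4, 0.6, 0.8, 1], [0, 0.05, 0.15, 0.30, 0.55, 1])
print(round(dist.gini(), 2))                               # 0.38
print(round(dist.quantile(0.9) / dist.quantile(0.1), 2))   # 9.0
```

Linear interpolation ignores inequality within each group, so the Gini index it yields is a lower bound for the underlying distribution; a smoother interpolation would likely be needed for closer consistency with PovCalNet.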

## Structure of this report

This report is broken into five main chapters (with chapter 2 split into two parts), reflecting the major stages in the data and modelling pipeline from PovCalNet to our online visualization (Figure 1.2).

![_**Figure 1.2:** Data and modelling pipeline of the project_](images/flowchart.png)