# Project 1: Deaths by tuberculosis

by Michel Wermelinger and Tarek Dib, 16 June, 2016

This is the project notebook for Week 1 of The Open University's [_Learn to code for Data Analysis_](http://futurelearn.com/courses/learn-to-code) course.

In 2000, the United Nations set eight Millennium Development Goals (MDGs) to reduce poverty and diseases, improve gender equality and environmental sustainability, etc. Each goal is quantified and time-bound, to be achieved by the end of 2015. Goal 6 is to have halted and started reversing the spread of HIV, malaria and tuberculosis (TB).
TB doesn't make headlines like Ebola, SARS (severe acute respiratory syndrome) and other epidemics, but is far deadlier. For more information, see the World Health Organisation (WHO) page <http://www.who.int/gho/tb/en/>.

Given the population and number of deaths due to TB in some countries during one year, the following questions will be answered: 

- What is the total, maximum, minimum and average number of deaths in that year?
- Which countries have the most and the least deaths?
- What is the death rate (deaths per 100,000 inhabitants) for each country?
- Which countries have the lowest and highest death rate?

The death rate allows for a better comparison of countries with widely different population sizes.
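
As a minimal sketch of how the death rate is derived, the snippet below works through the arithmetic for two countries, using the 2013 figures that appear in the tables of this notebook. The helper `deaths_per_100k` and the dictionary layout are hypothetical, chosen only for illustration; populations are given in thousands, as in the source spreadsheet.

```python
# Figures copied from the tables in this notebook (2013 values).
# Populations are in thousands, as in the source spreadsheet.
countries = {
    'Nigeria': {'population_1000s': 173615, 'tb_deaths': 160000},
    'Djibouti': {'population_1000s': 873, 'tb_deaths': 870},
}

def deaths_per_100k(tb_deaths, population_1000s):
    """Return deaths per 100,000 inhabitants (hypothetical helper)."""
    inhabitants = population_1000s * 1000
    return tb_deaths / inhabitants * 100000

rates = {name: deaths_per_100k(v['tb_deaths'], v['population_1000s'])
         for name, v in countries.items()}

for name, rate in rates.items():
    print(name, round(rate, 1))
```

Nigeria has far more deaths than Djibouti in absolute terms, yet Djibouti's rate comes out slightly higher, which is the kind of difference the absolute numbers hide.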

## The data

The data consists of total population and total number of deaths due to TB (excluding HIV) in 2013 in each country.

The data was taken in July 2015 from <http://apps.who.int/gho/data/node.main.POP107?lang=en> (population) and <http://apps.who.int/gho/data/node.main.593?lang=en> (deaths). The uncertainty bounds of the number of deaths were ignored.

The data was collected into an Excel file, which should be in the same folder as this notebook.

In [1]:
import warnings
warnings.simplefilter('ignore', FutureWarning)  # suppress FutureWarning messages

from pandas import read_excel
# Load the spreadsheet and show its first ten rows.
data = read_excel('WHO POP TB all.xls')
data.head(n=10)

|   | Country | Population (1000s) | TB deaths |
|---|---|---|---|
| 0 | Afghanistan | 30552 | 13000.0 |
| 1 | Albania | 3173 | 20.0 |
| 2 | Algeria | 39208 | 5100.0 |
| 3 | Andorra | 79 | 0.26 |
| 4 | Angola | 21472 | 6900.0 |
| 5 | Antigua and Barbuda | 90 | 1.2 |
| 6 | Argentina | 41446 | 570.0 |
| 7 | Armenia | 2977 | 170.0 |
| 8 | Australia | 23343 | 45.0 |
| 9 | Austria | 8495 | 29.0 |


## The range of the problem

The column of interest is the last one.

In [4]:
tbColumn = data['TB deaths']

The total number of deaths in 2013 is:

In [8]:
tbColumn.sum()

1072677.97

The largest and smallest number of deaths in a single country are:

In [9]:
tbColumn.max()

240000.0

In [10]:
tbColumn.min()

0.0

From 0 to almost a quarter of a million deaths is a huge range. The average number of deaths over all countries in the data is about 5530, while the median is only 315. With such a skewed distribution, the median is probably a more sensible summary of the typical number of TB deaths per country than the mean.

In [11]:
tbColumn.mean()

5529.267886597938

In [12]:
tbColumn.median()

315.0

The median is far lower than the mean. This indicates that some of the countries had a very high number of TB deaths in 2013, pushing the value of the mean up.
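
The effect can be illustrated on a small sample. The sketch below takes the TB deaths of the first ten countries listed in the table above (Afghanistan to Austria), so it is not representative of the full data set; the variable names are chosen for illustration only.

```python
import pandas as pd

# TB deaths for the first ten countries shown in the first table
# (Afghanistan to Austria); a small sample, not the full data set.
sample = pd.Series([13000, 20, 5100, 0.26, 6900, 1.2, 570, 170, 45, 29])

sample_mean = sample.mean()
sample_median = sample.median()
print(sample_mean, sample_median)

# Drop the three largest values and recompute both statistics.
trimmed = sample.sort_values().iloc[:-3]
trimmed_mean = trimmed.mean()
trimmed_median = trimmed.median()
print(trimmed_mean, trimmed_median)
```

In this sample, three countries with thousands of deaths lift the mean above 2500 while the median stays at 107.5. Without them the mean falls to roughly 120, whereas the median only moves to 29.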

## The most affected

To see the most affected countries, the table is sorted in ascending order by the number of TB deaths, which puts those countries in the last rows. The first ten rows, shown here, are the countries with the fewest deaths.

In [23]:
data.sort_values('TB deaths').head(n=10)

|   | Country | Population (1000s) | TB deaths | TB deaths (per 100,000) |
|---|---|---|---|---|
| 147 | San Marino | 31 | 0.0 | 0.0 |
| 125 | Niue | 1 | 0.01 | 1.0 |
| 111 | Monaco | 38 | 0.03 | 0.078947 |
| 3 | Andorra | 79 | 0.26 | 0.329114 |
| 129 | Palau | 21 | 0.36 | 1.714286 |
| 40 | Cook Islands | 21 | 0.41 | 1.952381 |
| 118 | Nauru | 10 | 0.67 | 6.7 |
| 76 | Iceland | 330 | 0.93 | 0.281818 |
| 68 | Grenada | 106 | 1.1 | 1.037736 |
| 5 | Antigua and Barbuda | 90 | 1.2 | 1.333333 |
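
The following sketch shows how sorting relates to the first and last rows, using five rows copied from the first table in this notebook. It relies on `sort_values`, which takes the place of the data frame `sort` method in recent pandas releases, and on `nlargest` as an alternative; the variable names are illustrative assumptions.

```python
import pandas as pd

# Five rows copied from the first table in this notebook.
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'TB deaths': [13000, 20, 5100, 0.26, 6900],
})

# Ascending order: fewest deaths first, most deaths last.
ascending = sample.sort_values('TB deaths')
fewest = ascending.head(n=2)
most = ascending.tail(n=2)

# nlargest returns the most affected rows directly, largest first.
top = sample.nlargest(2, 'TB deaths')
print(top)
```

With an ascending sort, `head` returns the least affected countries and `tail` the most affected ones, while `nlargest` lists the most affected in descending order.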


The sorted table suggests that the number of deaths partly reflects population size: the ten countries with the fewest deaths all have very small populations. To compare the countries on an equal footing, the death rate per 100,000 inhabitants is computed.

In [24]:
populationColumn = data['Population (1000s)']
# Population is in thousands, so deaths per 100,000 = deaths * 100 / population.
data['TB deaths (per 100,000)'] = tbColumn * 100 / populationColumn
data.sort_values('TB deaths (per 100,000)').tail(n=10)

|   | Country | Population (1000s) | TB deaths | TB deaths (per 100,000) |
|---|---|---|---|---|
| 117 | Namibia | 2303 | 1300 | 56.448111 |
| 30 | Cambodia | 15135 | 10000 | 66.072019 |
| 47 | Democratic Republic of the Congo | 67514 | 46000 | 68.134017 |
| 115 | Mozambique | 25834 | 18000 | 69.675621 |
| 71 | Guinea-Bissau | 1704 | 1200 | 70.422535 |
| 158 | Somalia | 10496 | 7700 | 73.36128 |
| 172 | Timor-Leste | 1133 | 990 | 87.378641 |
| 165 | Swaziland | 1250 | 1100 | 88.0 |
| 124 | Nigeria | 173615 | 160000 | 92.157936 |
| 49 | Djibouti | 873 | 870 | 99.656357 |
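
As an alternative to sorting, the countries with the lowest and highest rate can be looked up directly. The sketch below does this with `idxmin` and `idxmax` on four rows copied from the tables in this notebook, so the result holds for this sample only, not for the full data set; the variable names are illustrative assumptions.

```python
import pandas as pd

# Four rows copied from the tables in this notebook.
sample = pd.DataFrame({
    'Country': ['Andorra', 'Iceland', 'Nigeria', 'Djibouti'],
    'Population (1000s)': [79, 330, 173615, 873],
    'TB deaths': [0.26, 0.93, 160000, 870],
})

# Population is in thousands, hence the factor of 100 rather than 100,000.
sample['TB deaths (per 100,000)'] = (
    sample['TB deaths'] * 100 / sample['Population (1000s)'])

# idxmin/idxmax return the index label of the smallest/largest rate.
rate = sample['TB deaths (per 100,000)']
lowest = sample.loc[rate.idxmin(), 'Country']
highest = sample.loc[rate.idxmax(), 'Country']
print(lowest, highest)
```

Within this sample, Iceland has the lowest rate and Djibouti the highest, matching the values shown in the tables.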


## Conclusions

Countries in Sub-Saharan Africa seem to have the highest TB death rates. An estimated 1.07 million people died from TB in 2013. The median shows that half of the countries had 315 deaths or fewer. The much higher mean of more than 5500 deaths indicates that some countries had a very high number. San Marino had no deaths linked to TB, whereas India had the most, with about 240,000. However, taking the population size into account, the least affected were San Marino and Monaco, with fewer than 0.08 deaths per 100,000 inhabitants, and the most affected were Nigeria and Djibouti, with over 90 deaths per 100,000 inhabitants.

One should not forget that most values are estimates. Nevertheless, they convey the message that TB is a major cause of fatalities, and that there is a huge disparity between countries, with several being highly affected.