Estimates of the burden of diseases, injuries, and risk factors, globally and for 21 regions, for 1990 and 2010, from IHME (Institute for Health Metrics and Evaluation).

In [1]:
%matplotlib inline       
import numpy as np
import pandas as pd      # Import 'numpy' and 'pandas' modules

# Set pandas options controlling output format
pd.set_option('display.notebook_repr_html', False)
pd.set_option('display.max_columns', 10)
pd.set_option('display.max_rows', 15)
pd.set_option('display.width', 120)

gbd = pd.read_csv("data/IHME_GBD_2010.csv")  # Load the data
gbd.head()                                   # Prints the first 5 observations

  Country Code Country Name  Year  Age Group     Sex Number of Deaths Death Rate Per 100,000
0          AFG  Afghanistan  1970   0-6 days    Male           19,241             318,292.90
1          AFG  Afghanistan  1970   0-6 days  Female           12,600             219,544.20
2          AFG  Afghanistan  1970   0-6 days    Both           31,840             270,200.70
3          AFG  Afghanistan  1970  7-27 days    Male           15,939              92,701.00
4          AFG  Afghanistan  1970  7-27 days  Female           11,287              68,594.50

In [2]:
gbd.dtypes    # Check the data types of the columns in the dataset

Country Code              object
Country Name              object
Year                       int64
Age Group                 object
Sex                       object
Number of Deaths          object
Death Rate Per 100,000    object
dtype: object

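'Number of Deaths' and 'Death Rate Per 100,000' are read as object (string) columns because their values contain thousands separators. Remove the commas and convert both columns to numeric types (int64 and float64) so that sums and other aggregations give the right results.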

In [3]:
gbd['Number of Deaths'] = gbd['Number of Deaths'].str.replace(',', '')
gbd['Death Rate Per 100,000'] = gbd['Death Rate Per 100,000'].str.replace(',', '')
gbd['Number of Deaths'] = pd.to_numeric(gbd['Number of Deaths'])
gbd['Death Rate Per 100,000'] = pd.to_numeric(gbd['Death Rate Per 100,000'])

gbd.head()

  Country Code Country Name  Year  Age Group     Sex  Number of Deaths  Death Rate Per 100,000
0          AFG  Afghanistan  1970   0-6 days    Male             19241                318292.9
1          AFG  Afghanistan  1970   0-6 days  Female             12600                219544.2
2          AFG  Afghanistan  1970   0-6 days    Both             31840                270200.7
3          AFG  Afghanistan  1970  7-27 days    Male             15939                 92701.0
4          AFG  Afghanistan  1970  7-27 days  Female             11287                 68594.5

In [4]:
gbd.dtypes 

Country Code               object
Country Name               object
Year                        int64
Age Group                  object
Sex                        object
Number of Deaths            int64
Death Rate Per 100,000    float64
dtype: object
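
As an alternative to stripping the commas after loading, `pd.read_csv` accepts a `thousands` argument that parses such values as numbers directly. The sketch below uses a small in-memory sample with the same column layout instead of the real file, and assumes the comma-containing fields are quoted in the CSV.

```python
import io
import pandas as pd

# Two sample rows copied from the head() output above; the real file is not read here.
sample_csv = io.StringIO(
    'Country Code,Country Name,Year,Age Group,Sex,Number of Deaths,"Death Rate Per 100,000"\n'
    'AFG,Afghanistan,1970,0-6 days,Male,"19,241","318,292.90"\n'
    'AFG,Afghanistan,1970,0-6 days,Female,"12,600","219,544.20"\n'
)

# thousands=',' tells the parser to treat ',' inside numeric fields as a digit separator.
sample = pd.read_csv(sample_csv, thousands=',')
print(sample.dtypes)
```

With this option the two columns should come back as int64 and float64 without a separate conversion step.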

In [5]:
# Rename columns
gbd.columns = ['country_code', 'country_name', 'year', 'age_group', 'sex', 'deaths', 'death_rate']
print(gbd.columns)

Index(['country_code', 'country_name', 'year', 'age_group', 'sex', 'deaths', 'death_rate'], dtype='object')


In [7]:
gbd.duplicated().sum()              # Count rows that are exact duplicates of an earlier row

0

In [8]:
grouped_by_age = gbd.groupby('age_group')    
print(grouped_by_age.ngroups)                # Number of groups when grouped by 'age_group'

21


In [9]:
grouped_by_sex = gbd.groupby('sex')
print(grouped_by_sex.ngroups)        # Number of groups, i.e. distinct values of 'sex'
print(grouped_by_sex.count())        # Non-null values in each column of each group
print(grouped_by_sex.size())         # Number of rows in each group (Both, Female and Male)

3
        country_code  country_name   year  age_group  deaths  death_rate
sex                                                                     
Both           19635         19635  19635      19635   19635       19635
Female         19635         19635  19635      19635   19635       19635
Male           19635         19635  19635      19635   19635       19635
sex
Both      19635
Female    19635
Male      19635
dtype: int64
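
Here `count()` and `size()` agree because no column has missing values. The two differ as soon as a value is missing, which the following sketch illustrates with a hypothetical toy frame (not the GBD data).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing 'deaths' value.
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female'],
    'deaths': [10.0, np.nan, 5.0, 7.0],
})
by_sex = toy.groupby('sex')

sizes = by_sex.size()              # rows per group, missing values included
counts = by_sex['deaths'].count()  # non-null 'deaths' values per group
print(sizes)
print(counts)
```

In the toy frame the 'Male' group has two rows but only one non-null value, so `size()` reports 2 and `count()` reports 1.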


In [6]:
grouped_by_sex = gbd.groupby('sex')
deaths_by_sex = grouped_by_sex['deaths'].sum()
deaths_by_sex

sex
Both      474477838
Female    218537465
Male      255940462
Name: deaths, dtype: int64

In [10]:
grouped_by_country = gbd.groupby('country_name')
deaths_by_country = grouped_by_country['deaths'].sum()
deaths_by_country

country_name
Afghanistan            5687183
Albania                 349413
Algeria                3322568
Andorra                   8215
Angola                 3405986
Antigua and Barbuda       8856
Argentina              5021838
                        ...   
Uzbekistan             2997697
Vanuatu                  26107
Venezuela              1891674
Viet Nam               8947408
Yemen                  2697944
Zambia                 2325708
Zimbabwe               2162430
Name: deaths, Length: 187, dtype: int64

In [11]:
grouped_by_age = gbd.groupby('age_group')
deaths_by_age = grouped_by_age['deaths'].sum()
deaths_by_age

age_group
0-6 days        29879130
1-4 years       36682766
10-14 years      4514832
15-19 years      7168091
20-24 years      9373205
25-29 years      9879028
28-364 days     35717036
                 ...    
60-64 years     31214264
65-69 years     36202345
7-27 days       12791656
70-74 years     41089598
75-79 years     41247964
80+ years       75488584
All ages       474477841
Name: deaths, Length: 21, dtype: int64

In [12]:
grouped_by_year = gbd.groupby('year')
deaths_by_year = grouped_by_year['deaths'].sum()
deaths_by_year

year
1970    172866152
1980    175767051
1990    185702142
2000    204052611
2010    210567809
Name: deaths, dtype: int64

In [13]:
# Remove the 'Both' rows, which hold the combined figures for males and females,
# and the 'All ages' rows, which hold the combined figures for all age groups

gbd = gbd.loc[gbd['sex'] != 'Both']
gbd = gbd.loc[gbd['age_group'] != 'All ages']
gbd.head()

  country_code country_name  year    age_group     sex  deaths  death_rate
0          AFG  Afghanistan  1970     0-6 days    Male   19241    318292.9
1          AFG  Afghanistan  1970     0-6 days  Female   12600    219544.2
3          AFG  Afghanistan  1970    7-27 days    Male   15939     92701.0
4          AFG  Afghanistan  1970    7-27 days  Female   11287     68594.5
6          AFG  Afghanistan  1970  28-364 days    Male   37513     15040.1
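
The sums in In [6] and In [10] to In [12] were computed before this filtering step, so they still include the 'Both' and 'All ages' aggregate rows and count each death more than once. A minimal sketch with hypothetical numbers (one country and year, two age groups) shows the effect of aggregating before and after the filter.

```python
import pandas as pd

# Hypothetical numbers: two age groups plus the 'All ages' aggregate,
# each split into Male, Female and the 'Both' aggregate.
toy = pd.DataFrame({
    'age_group': ['0-6 days'] * 3 + ['7-27 days'] * 3 + ['All ages'] * 3,
    'sex': ['Male', 'Female', 'Both'] * 3,
    'deaths': [10, 5, 15, 20, 10, 30, 30, 15, 45],
})

raw_total = toy['deaths'].sum()   # aggregate rows included: each death counted four times

# Keep only the rows that are not aggregates of other rows.
detail = toy.loc[(toy['sex'] != 'Both') & (toy['age_group'] != 'All ages')]
true_total = detail['deaths'].sum()

print(raw_total, true_total)
```

Re-running the grouped sums on the filtered `gbd` would give totals without this double counting.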

In [15]:
grouped_by_country = gbd.groupby('country_name')
print(grouped_by_country.ngroups)                  # Number of countries   

print(gbd['country_name'].nunique())        # Number of non-null unique countries 

187
187


In [16]:
group_by_year = gbd.groupby(['year'])   # Group by Year
group_by_year.head(5)                          # Prints first 5 rows in each year

    country_code country_name  year    age_group     sex  deaths  death_rate
0            AFG  Afghanistan  1970     0-6 days    Male   19241    318292.9
1            AFG  Afghanistan  1970     0-6 days  Female   12600    219544.2
3            AFG  Afghanistan  1970    7-27 days    Male   15939     92701.0
4            AFG  Afghanistan  1970    7-27 days  Female   11287     68594.5
6            AFG  Afghanistan  1970  28-364 days    Male   37513     15040.1
63           AFG  Afghanistan  1980     0-6 days    Male   16152    235901.0
64           AFG  Afghanistan  1980     0-6 days  Female   10838    167053.3
..           ...          ...   ...          ...     ...     ...         ...
193          AFG  Afghanistan  2000    7-27 days  Female    9189     28855.1
195          AFG  Afghanistan  2000  28-364 days    Male   34469      6734.3
252          AFG  Afghanistan  2010     0-6 days    Male   17163    127190.9
253          AFG  Afghanistan  2010     0-6 days  Female   11904     93233.3

In [18]:
group_by_year['deaths'].dtype

year
1970    int64
1980    int64
1990    int64
2000    int64
2010    int64
Name: deaths, dtype: object

In [19]:
grouped_by_age.size()     # Rows per group; grouped_by_age was built in In [11], before the filter in In [13], so 'Both' and 'All ages' rows are still included

age_group
0-6 days       2805
1-4 years      2805
10-14 years    2805
15-19 years    2805
20-24 years    2805
25-29 years    2805
28-364 days    2805
               ... 
60-64 years    2805
65-69 years    2805
7-27 days      2805
70-74 years    2805
75-79 years    2805
80+ years      2805
All ages       2805
Length: 21, dtype: int64

In [20]:
grouped_by_age.get_group('7-27 days')[:10]         # get the first 10 rows of data in one specific group '7-27 days'

    country_code country_name  year  age_group     sex  deaths  death_rate
3            AFG  Afghanistan  1970  7-27 days    Male   15939     92701.0
4            AFG  Afghanistan  1970  7-27 days  Female   11287     68594.5
5            AFG  Afghanistan  1970  7-27 days    Both   27226     80912.5
66           AFG  Afghanistan  1980  7-27 days    Male   12551     63362.9
67           AFG  Afghanistan  1980  7-27 days  Female    9217     48823.2
68           AFG  Afghanistan  1980  7-27 days    Both   21768     56266.3
129          AFG  Afghanistan  1990  7-27 days    Male    8325     39454.3
130          AFG  Afghanistan  1990  7-27 days  Female    6203     30916.8
131          AFG  Afghanistan  1990  7-27 days    Both   14528     35293.1
192          AFG  Afghanistan  2000  7-27 days    Male   12365     36914.4
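
A GroupBy object keeps referring to the DataFrame it was built from, so a filter only takes effect after regrouping. The sketch below demonstrates this with hypothetical data; the names `stale` and `fresh` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data for a single age group.
toy = pd.DataFrame({
    'age_group': ['7-27 days'] * 3,
    'sex': ['Male', 'Female', 'Both'],
    'deaths': [8, 6, 14],
})

stale = toy.groupby('age_group')      # built before filtering
toy = toy.loc[toy['sex'] != 'Both']   # rebinding 'toy' does not update 'stale'
fresh = toy.groupby('age_group')      # rebuilt after filtering

print(stale.size())
print(fresh.size())
```

Rebuilding `grouped_by_age` from the filtered `gbd` would drop the 'Both' rows from the group shown above.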

In [21]:
grouped_by_age.nth(0)            # get the 1st item in each group

            country_code country_name  death_rate  deaths   sex  year
age_group                                                            
0-6 days             AFG  Afghanistan    318292.9   19241  Male  1970
1-4 years            AFG  Afghanistan      4288.2   36694  Male  1970
10-14 years          AFG  Afghanistan       230.8    1723  Male  1970
15-19 years          AFG  Afghanistan       282.9    1816  Male  1970
20-24 years          AFG  Afghanistan       392.9    2240  Male  1970
25-29 years          AFG  Afghanistan       432.4    2050  Male  1970
28-364 days          AFG  Afghanistan     15040.1   37513  Male  1970
...                  ...          ...         ...     ...   ...   ...
60-64 years          AFG  Afghanistan      4229.2    4239  Male  1970
65-69 years          AFG  Afghanistan      5870.4    3968  Male  1970
7-27 days            AFG  Afghanistan     92701.0   15939  Male  1970
70-74 years          AFG  Afghanistan      8509.7    3435  Male  1970
75-79 years         

In [22]:
grouped_by_age.nth(2)            # get the 3rd item in each group

            country_code country_name  death_rate  deaths   sex  year
age_group                                                            
0-6 days             AFG  Afghanistan    270200.7   31840  Both  1970
1-4 years            AFG  Afghanistan      4158.6   69542  Both  1970
10-14 years          AFG  Afghanistan       245.9    3529  Both  1970
15-19 years          AFG  Afghanistan       302.8    3718  Both  1970
20-24 years          AFG  Afghanistan       447.9    4870  Both  1970
25-29 years          AFG  Afghanistan       547.0    4939  Both  1970
28-364 days          AFG  Afghanistan     14242.6   69626  Both  1970
...                  ...          ...         ...     ...   ...   ...
60-64 years          AFG  Afghanistan      4340.1    8752  Both  1970
65-69 years          AFG  Afghanistan      6205.6    8547  Both  1970
7-27 days            AFG  Afghanistan     80912.5   27226  Both  1970
70-74 years          AFG  Afghanistan      9419.7    7839  Both  1970
75-79 years         
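
The layout returned by `nth` depends on the pandas version: the outputs above are indexed by `age_group`, whereas from pandas 2.0 onward `nth` acts as a filter and keeps the original row index. Where a version-independent result is wanted, `head(1)` and `first()` are two possible alternatives, sketched here on hypothetical data.

```python
import pandas as pd

# Hypothetical toy data with two age groups.
toy = pd.DataFrame({
    'age_group': ['0-6 days', '0-6 days', '7-27 days', '7-27 days'],
    'sex': ['Male', 'Female', 'Male', 'Female'],
    'deaths': [10, 5, 20, 10],
})
by_age = toy.groupby('age_group')

first_rows = by_age.head(1)   # first row of each group, original row index kept
first_vals = by_age.first()   # first non-null value per column, indexed by age_group

print(first_rows)
print(first_vals)
```

Note that `first()` skips missing values column by column, so it matches the first row of each group only when that row has no missing entries.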

## Reference and Dataset
In the DataSets section, I used the 'Health' data, i.e., the Global Burden of Disease dataset. It is available at the link below:
https://public.tableau.com/en-us/s/resources