# Life expectancy dataset transformation

The life expectancy dataset can be downloaded from [Our World in Data](https://ourworldindata.org/life-expectancy). The measure is defined as: *Life expectancy at birth is defined as the average number of years that a newborn could expect to live if he or she were to pass through life subject to the age-specific mortality rates of a given period.*

## Import libraries and load data

In [21]:
import pandas as pd
import numpy as np
import json

In [22]:
df = pd.read_csv('life-expectancy.csv', encoding='ISO-8859-1', index_col=False)
df.head()

Unnamed: 0,Entity,Code,Year,Life expectancy (years)
0,Afghanistan,AFG,1950,27.638
1,Afghanistan,AFG,1951,27.878
2,Afghanistan,AFG,1952,28.361
3,Afghanistan,AFG,1953,28.852
4,Afghanistan,AFG,1954,29.35


## Group data by region
We are interested in life expectancy only at the world level and for the individual continents.

In [23]:
regions = ['Africa', 'World', 'Asia', 'Americas', 'Europe', 'Oceania']
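
Because `isin()` silently ignores names that do not occur in the data, it can be worth checking the region list against the `Entity` column before filtering. The following is a minimal sketch, not part of the original notebook; the `sample` frame is a hypothetical stand-in for `df` that reuses a few values from the outputs shown here.

```python
import pandas as pd

# Hypothetical stand-in for df; values are copied from outputs in this notebook.
sample = pd.DataFrame({
    'Entity': ['Afghanistan', 'Africa', 'Americas'],
    'Year': [1950, 1950, 1967],
    'Life expectancy (years)': [27.638, 36.45, 64.116911],
})

regions = ['Africa', 'World', 'Asia', 'Americas', 'Europe', 'Oceania']

# Names in `regions` that never appear in the Entity column would be
# dropped without any warning by isin(), so list them explicitly.
missing = sorted(set(regions) - set(sample['Entity']))
print(missing)
```

With the full dataset loaded as `df`, an empty list would indicate that every requested region is present.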

In [24]:
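# Keep only the rows for the selected regions and group them by region.
gb_regions = df.loc[df['Entity'].isin(regions)].groupby('Entity')
# Sort each region's rows chronologically.
gb_regions_sort = gb_regions.apply(lambda x: x.sort_values(['Year'], ascending=True))
# Keep only the columns needed for plotting.
gb_regions_sort = gb_regions_sort[['Year', 'Life expectancy (years)']]
gb_regions_sort.head(100)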

Unnamed: 0_level_0,Unnamed: 1_level_0,Year,Life expectancy (years)
Entity,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Africa,70,1770,26.400000
Africa,71,1925,26.400000
Africa,72,1950,36.450000
Africa,73,1951,36.712000
Africa,74,1952,37.234000
...,...,...,...
Americas,378,1967,64.116911
Americas,379,1968,64.365153
Americas,380,1969,64.633699
Americas,381,1970,64.925741
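
The same per-region, chronological ordering can probably be obtained without `groupby().apply()` by sorting on both columns at once. This is an alternative sketch rather than the notebook's method. It assumes the column names shown above, uses a hypothetical `sample` frame built from values in the outputs, and produces a single-level `Entity` index instead of the two-level index shown in the table.

```python
import pandas as pd

# Hypothetical stand-in for df, deliberately out of order.
sample = pd.DataFrame({
    'Entity': ['Americas', 'Africa', 'Afghanistan', 'Africa'],
    'Year': [1967, 1951, 1950, 1950],
    'Life expectancy (years)': [64.116911, 36.712, 27.638, 36.45],
})

regions = ['Africa', 'Americas']

# Filter first, then sort by region and year in a single call.
selected = sample.loc[sample['Entity'].isin(regions)]
ordered = selected.sort_values(['Entity', 'Year']).set_index('Entity')

# Grouping by the index level still works, as in the later cells.
by_region = {name: grp['Year'].tolist()
             for name, grp in ordered.groupby(level='Entity')}
print(by_region)
```

Since the transformation cells only call `groupby(level='Entity')`, they should accept a frame shaped like `ordered` as well.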


### Transform to NVD3 format
Transform the data and dump it to JSON for use with NVD3. Each region becomes one object with a `key` and a list of `{x, y}` points under `values`.

In [25]:
lines_nvd = []
for idx, data in gb_regions_sort.groupby(level='Entity'):
    line = {'key' : str(idx)}
    line['values'] = []
    for x, y in zip(data['Year'], data['Life expectancy (years)']):
        line['values'].append({'x': x, 'y': y})
    lines_nvd.append(line)

In [26]:
with open('life-expectancy-nvd.json', 'w') as f:
    json.dump(lines_nvd, f)

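### Transform to ApexCharts format
Transform the data and dump it to JSON for use with ApexCharts. Each region becomes one series with a `name` and a list of `{x, y}` points under `data`.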

In [27]:
series_apex = []
for idx, data in gb_regions_sort.groupby(level='Entity'):
    s_item = {'name' : str(idx)}
    s_item['data'] = []
    for x, y in zip(data['Year'], data['Life expectancy (years)']):
        s_item['data'].append({'x': x, 'y': y})
    series_apex.append(s_item)

In [28]:
with open('life-expectancy-apex.json', 'w') as f:
    json.dump(series_apex, f)
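
The NVD3 and APEX cells above differ only in the two key names, so the loop could be factored into one function. The sketch below is illustrative: `to_xy_series` is a hypothetical helper, not part of the notebook, and `sample` is a small stand-in frame using values from the outputs above.

```python
import json
import pandas as pd

def to_xy_series(frame, label_key, values_key):
    """Hypothetical helper: one dict per Entity holding a list of {'x', 'y'} points."""
    out = []
    for name, grp in frame.groupby('Entity'):
        grp = grp.sort_values('Year')
        # tolist() yields plain Python ints and floats, which json can serialize.
        points = [{'x': x, 'y': y}
                  for x, y in zip(grp['Year'].tolist(),
                                  grp['Life expectancy (years)'].tolist())]
        out.append({label_key: str(name), values_key: points})
    return out

# Stand-in data; values copied from the outputs shown earlier.
sample = pd.DataFrame({
    'Entity': ['Africa', 'Africa', 'Americas'],
    'Year': [1951, 1950, 1967],
    'Life expectancy (years)': [36.712, 36.45, 64.116911],
})

lines_nvd = to_xy_series(sample, 'key', 'values')    # NVD3-style keys
series_apex = to_xy_series(sample, 'name', 'data')   # APEX-style keys
print(json.dumps(lines_nvd[0]))
```

Keeping the key names as parameters means a change to the point format only has to be made in one place.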

### Transform to plotly format
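Transform the data and dump it to JSON for use with Plotly. Each region becomes one trace with parallel `x` and `y` lists; the `World` series is skipped and only years from 1970 onward are kept.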

In [29]:
traces_ptly = []
for idx, data in gb_regions_sort.groupby(level='Entity'):
    if idx != 'World':
        trace = {'name' : str(idx), 'x' : [], 'y' : []}
        for x, y in zip(data['Year'], data['Life expectancy (years)']):
            if x >= 1970:
                trace['x'].append(int(x))
                trace['y'].append(y)
        traces_ptly.append(trace)

In [30]:
with open('life-expectancy-plotly.json', 'w') as f:
    json.dump(traces_ptly, f)
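
The year filter in the Plotly cell can also be written as a boolean mask instead of a per-row `if`. This is a sketch under the same column-name assumptions, with a hypothetical `sample` frame built from values in the outputs above. The sample contains no `World` rows, so the skip is included only to mirror the cell.

```python
import json
import pandas as pd

# Stand-in data; values copied from the outputs shown earlier.
sample = pd.DataFrame({
    'Entity': ['Africa', 'Americas', 'Americas'],
    'Year': [1950, 1969, 1970],
    'Life expectancy (years)': [36.45, 64.633699, 64.925741],
})

traces = []
for name, grp in sample.groupby('Entity'):
    if name == 'World':
        continue  # mirror the cell above: no trace for the world aggregate
    # Boolean mask keeps 1970 and later; a region with no such rows
    # still gets a trace, just with empty x and y lists.
    recent = grp.loc[grp['Year'] >= 1970].sort_values('Year')
    traces.append({
        'name': str(name),
        'x': recent['Year'].tolist(),
        'y': recent['Life expectancy (years)'].tolist(),
    })

print(json.dumps(traces))
```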