This is my final project.

Part 1: Reading in and preprocessing the data - I used a dataframe for this, as it is a flexible and powerful data structure and I learned how to use it in previous tasks of this course.
It is necessary to manually preprocess the data and rename countries that are named differently in Geopandas, so that the names are consistent.

In [None]:
import pandas as pd

data = pd.read_csv('gapminder.csv')


'''
To get consistent country names, I used set operations: I put both lists of names into sets and removed the intersection, so that only the countries that appear in geopandas but not in the .csv (and vice versa) are left.
The mapping is defined manually: it maps the names in data to the names in geopandas, and replace is then used to substitute all names in data with the ones used in geopandas. 
'''


name_mapping = {
    'Western Sahara': 'W. Sahara',
    'United States': 'United States of America',
    'West Bank and Gaza':'Palestine',
    'Bosnia and Herzegovina': 'Bosnia and Herz.',
    'Korea, Rep.': 'South Korea',
    'Slovak Republic': 'Slovakia',
    'Congo, Dem. Rep.': 'Dem. Rep. Congo',
    'Czech Rep.': 'Czechia',
    'Yemen, Rep.': 'Yemen',
    "Cote d'Ivoire": "Côte d'Ivoire",
    'Korea, Dem. Rep.': 'North Korea',
    'South Sudan': 'S. Sudan',
    'Falkland Islands (Malvinas)': 'Falkland Is.',
    'Macedonia, FYR': 'North Macedonia',
    'Swaziland': 'eSwatini',
    'Cape Verde': 'Cabo Verde',
    'Northern Cyprus': 'N. Cyprus',
    'Congo, Rep.': 'Congo',
    'Equatorial Guinea': 'Eq. Guinea'
}

data['Country'] = data['Country'].replace(name_mapping)
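
The set comparison described in the comment above can be sketched as follows. The two name sets here are short, made-up stand-ins (in the notebook they would come from data['Country'] and from the name column of the geopandas world map), so this is only an illustration of the idea, not the exact code that was used.

```python
# Hypothetical, shortened name sets; the real ones would be built from
# data['Country'] and from the 'name' column of the geopandas world map.
csv_names = {"United States", "Czech Rep.", "Germany", "Swaziland"}
map_names = {"United States of America", "Czechia", "Germany", "eSwatini"}

# Names that appear in only one of the two sources.
only_in_csv = csv_names - map_names   # these need an entry in name_mapping
only_in_map = map_names - csv_names   # these are the candidate target names

print(sorted(only_in_csv))
print(sorted(only_in_map))
```

Names that are identical in both sources (here "Germany") drop out, so only the mismatches have to be inspected and mapped by hand.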


Step 2: Creating a world map for the visualisation
For this, I will use the Bokeh library together with Geopandas.

For now, the map is not yet related to the data; it just displays a country name when the user moves the mouse over it. 
Later, this country name can be complemented by the fertility and life expectancy values.

In [None]:
pip install geopandas --user

In [None]:
import geopandas as gpd
from bokeh.io import output_notebook, show
from bokeh.models import GeoJSONDataSource, HoverTool
from bokeh.plotting import figure


'''
To load the world map, I used a dataset that ships with geopandas (naturalearth_lowres).
I converted it into JSON format (as is done in the docs examples) and created a source. 

Then, I create a plot (as in the tutorials) and use .patches to set the source to the world map.
For better visibility, I add black lines between the countries.

Finally, I add a first basic hover tool, which just displays the name of a country once the user 
holds the mouse over it. In one of the next steps, the exact life expectancy and fertility should be displayed
together with the country. 
'''
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
geojson = world.to_json()
geo_source = GeoJSONDataSource(geojson=geojson)

#p = figure(title="Sample World Map", height=800, width=1400)
p = figure(title="Sample World Map", height=500, width=900)

p.patches('xs', 'ys', source=geo_source, line_color='black', line_width=0.5)

hover = HoverTool(tooltips=[('Test', '@name')])
p.add_tools(hover)

output_notebook()
show(p)
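
To make clearer why the hover tool can refer to '@name', here is a small sketch of the GeoJSON structure using only the standard library. The two countries and their coordinates are invented, and the helper property_column is a hypothetical name; it only mimics the idea that each feature's properties end up as columns of the data source.

```python
import json

# A tiny, made-up FeatureCollection with the same shape as world.to_json().
geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "Exampleland"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature",
         "properties": {"name": "Samplestan"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 2], [3, 2], [3, 3], [2, 2]]]}},
    ],
})

def property_column(geojson_text, key):
    """Collect one property across all features (hypothetical helper)."""
    features = json.loads(geojson_text)["features"]
    return [feature["properties"].get(key) for feature in features]

names = property_column(geojson, "name")
print(names)
```

Each entry of the resulting list belongs to one country shape, which is what a tooltip field such as '@name' looks up for the shape under the mouse.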

Step 3: 
- Color the countries depending on life expectancy 
- Show the life expectancy for the country on hover 

For now, both are implemented for a fixed year (2000); in the next steps, the user will be able to choose the year using a slider. 

In [None]:
import pandas as pd
import geopandas as gpd
from bokeh.models import GeoJSONDataSource, HoverTool, LinearColorMapper
from bokeh.plotting import figure, output_notebook, show
from bokeh.palettes import Viridis256
import matplotlib.pyplot as plt
import matplotlib.colors
from bokeh.transform import linear_cmap
from bokeh.palettes import YlOrRd, YlGn

'''
The palette goes from red over 
yellow to green (with green = high life expectancy, yellow = lower and red = lowest). 
Matplotlib has such a palette, called RdYlGn.
It is, however, necessary to convert the values into hex, as the linear_cmap function expects hex values
in its palette parameter. This is done with a for loop that iterates over all colors, converts each one 
to hex and appends the converted color to ryg_palette, which is then returned.
'''
def create_redyellowgreen_palette():
    cmap = plt.get_cmap('RdYlGn', 50)  

    ryg_palette = []
    for i in range(cmap.N):
        color = matplotlib.colors.rgb2hex(cmap(i))
        ryg_palette.append(color)
    return ryg_palette

'''
From now on, it will be necessary to merge the data read in from the csv with the 
predefined map data. This is done by using .merge and specifying the column names 
on which the merge should happen (from the map data: name, which represents the country name; 
from the data: the Country column). 

Then the merged object is converted to a GeoJSONDataSource object again (like before), which can be used
as the source for a Bokeh figure. 

What is new is coloring the countries based on the life expectancy. This is quite simple: like in the third 
Bokeh tutorial, the fill_color attribute can be set to a linear_cmap. For the cmap, I created
the custom palette myself because I didn't find any of the available ones suitable. This happens in the function
create_redyellowgreen_palette(). The low and high are chosen based on the lowest and highest life expectancy in the data.

Finally, I add the life expectancy to the HoverTool too. This is very easy: because of the previous merge of map data and
life expectancy, it is possible to just access it using @lifeExp.

'''
data_year_2000 = data[data['Year'] == 2000]
merged = world.merge(data_year_2000, left_on='name', right_on='Country')
geojson = merged.to_json()
geo_source = GeoJSONDataSource(geojson=geojson)

p = figure(title="World Map Colored by Life Expectancy", height=800, width=1400)

custom_palette = create_redyellowgreen_palette()
lifeexpectancy_color_mapper = linear_cmap(field_name='lifeExp', palette=custom_palette, low=min(data_year_2000['lifeExp']), high=max(data_year_2000['lifeExp']))

patches = p.patches('xs', 'ys', source=geo_source, line_color='black', line_width=0.5,
          fill_color=lifeexpectancy_color_mapper)

hover = HoverTool(tooltips=[('Country', '@name'), ('Life Expectancy', '@lifeExp')])
p.add_tools(hover)

output_notebook()
show(p)
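
The two ideas of this step (converting colors to hex and mapping a value linearly onto a palette) can be illustrated without any plotting library. This is only a rough sketch: rgb_to_hex and pick_color are hypothetical helpers, the three-color palette is a simplification of the 50-color one used above, and the exact edge handling inside Bokeh's color mapper may differ.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple with channels in 0..1 to a hex string."""
    r, g, b = (round(channel * 255) for channel in rgb)
    return f"#{r:02x}{g:02x}{b:02x}"

def pick_color(value, low, high, palette):
    """Map value linearly from [low, high] onto the palette entries."""
    if value <= low:
        return palette[0]
    if value >= high:
        return palette[-1]
    fraction = (value - low) / (high - low)
    return palette[int(fraction * len(palette))]

# Simplified red -> yellow -> green palette (assumption: 3 colors only).
palette = [rgb_to_hex(c) for c in [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]]

# Made-up life expectancy range of 40 to 85 years.
for life_exp in (45, 62.5, 80):
    print(life_exp, pick_color(life_exp, 40, 85, palette))
```

Low values land at the red end and high values at the green end, which is the behaviour the map relies on.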


Step 4: Add a slider


Now, I will add a slider with which the user is able to choose the year.
The map will then be colored according to the selected year, and the value displayed when the user hovers over a country will be the life expectancy for that year.

From now on, not all of the functionality might be visible in the notebook anymore. Instead, the script generates an .html file
which needs to be opened to see the full functionality. This is because I used a JavaScript callback function
to implement the slider (my notebook supports that, but others might not).
It is recommended to open the .html file with Chrome or Firefox.

In [None]:

from bokeh.models import Slider, CustomJS
from bokeh.layouts import column
from bokeh.plotting import show, save, output_file


'''
This function first initializes an empty dictionary geojson_data.
It then iterates through all years and for each year stores
year -> (data for the year merged with world map data) in the 
geojson_data dictionary. This is done so that when the user selects 
a year, the implementation can just set the data source for the map to 
geojson_data[year], which already contains all the data
for that year. The initial year is 1964. 
'''
def initialise_full_geoJson(data):
    geojson_data = {}
    for year in range(1964, 2014):
        data_current_year = data[data['Year'] == year]
        merged_data = world.merge(data_current_year, left_on='name', right_on='Country').to_json()
        geojson_data[year] = merged_data

    geo_source = GeoJSONDataSource(geojson=geojson_data[1964])
    return  geojson_data, geo_source

'''
This code first initializes the geojson_data and the new source. 
Until now, data contained just one year. Now, the data covers 
all years, stored in a dictionary geojson_data with the format 
year -> data-for-this-year
where data-for-this-year contains the world map data merged with the external data
on life expectancies and fertilities, so it can be used directly as a source.

After this, the plot is assigned the new geo_source.

Then, a slider is created, which ranges over all years for which there is data.
To make the plot change when the user moves the slider, it is necessary to 
use a JavaScript callback function. Whenever the slider is moved, this function 
is called. It takes the year the user selected ("slider.value") and changes
the source for the plot to the data of this year. So if the user, for example, selects
2005, the callback function is called and sets the geojson of the source to 
geojson_data[2005]. 

Finally, this slider is added to the plot using "column" and an html file is generated
using output_file().

Since both the coloring and the hover are based on the source, changing the 
source ensures that both refer to the year currently selected by the user.
'''

geojson_data, geo_source_new = initialise_full_geoJson(data)
lifeexpectancy_color_mapper = linear_cmap(field_name='lifeExp', palette=custom_palette, low=30, high=max(data['lifeExp']))

patches = p.patches('xs', 'ys', source=geo_source_new, line_color='black', line_width=0.5,
          fill_color=lifeexpectancy_color_mapper)

slider = Slider(start=1964, end=2013, value=1964, step=1, title='Year')


callback = CustomJS(args=dict(source=geo_source_new, geojson_data=geojson_data, slider=slider), code="""
    var selectedYear = slider.value;
    var YearAsString = selectedYear.toString();
    var YearAsInt = parseInt(YearAsString);
    source.geojson = geojson_data.get(YearAsInt);
""")

slider.js_on_change('value', callback)
layout = column(slider, p)
output_notebook()
show(layout)

# Set the output file before saving, so that save() knows where to write
output_file("LifeExpectancies.html")
save(layout)
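
The logic of the slider callback is a plain dictionary lookup, which can be shown in Python as well. This is a sketch under assumptions: FakeSource is a made-up stand-in for the real data source, the per-year strings are placeholders instead of real GeoJSON, and on_slider_change is a hypothetical name for what the JavaScript callback does.

```python
# Placeholder strings standing in for the per-year GeoJSON built above.
geojson_by_year = {year: f'{{"year": {year}}}' for year in range(1964, 2014)}

class FakeSource:
    """Minimal stand-in for a data source with a geojson attribute."""
    def __init__(self, geojson):
        self.geojson = geojson

def on_slider_change(source, geojson_by_year, selected_year):
    # Swap in the precomputed data for the selected year, if it exists.
    if selected_year in geojson_by_year:
        source.geojson = geojson_by_year[selected_year]

source = FakeSource(geojson_by_year[1964])
on_slider_change(source, geojson_by_year, 2005)
print(source.geojson)
```

Because everything is precomputed per year, the callback itself does no filtering or merging; it only replaces what the source points to.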



Step 5: Implementing an animation

Now, I add a button, which starts an animation when clicked by the user. 
The Animation starts at 1964 and runs until 2013, showing the map colored based on the life expectancy for each year.

In [None]:
from bokeh.models import Button, CustomJS
from bokeh.layouts import layout


'''
This code implements the animation. It adds a Button to the layout and assigns it a callback function that is 
called when the user clicks on the button. 
The animation itself is implemented in this callback function. Basically, it again just changes the source to a different
year. When the callback is called, it calls "setInterval" on animate, which will repeatedly call animate 
every 0.1 s (second parameter, given in milliseconds). Every time animate is called, it gets the current year (stored in the slider),
increases it by 1 and sets the data to this new year. This happens every 0.1 s until the year 2013, where the animation
is stopped using clearInterval().
'''

animation_button = Button(label="Start Animation", button_type="success")
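# geo_source_new is the source of the patches added in Step 4, so this is the one the animation has to update
animation_callback = CustomJS(args=dict(source=geo_source_new, geojson_data=geojson_data, slider=slider, button=animation_button), code="""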
    var speed;

    function animate() {
        var year = slider.value;
        if (year < 2013) {
            slider.value = year + 1;
            source.geojson = geojson_data.get(year+1);
        } else {
            alert('Animation Done!');
            clearInterval(speed);
            button.label = 'Start Animation';
        }
    }
    button.label = 'Animation Running';
    speed = setInterval(animate, 100);
    
""")
animation_button.js_on_click(animation_callback)

layout = column(slider, animation_button, p)
show(layout)
output_file("LifeExpectancies.html")
save(layout)
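
The stepping logic of the animation can be sketched in Python as a generator. The name animation_years is hypothetical, and the timing (one step every 0.1 s) is left out; the sketch only shows which years are visited.

```python
def animation_years(current_year, last_year):
    """Yield the years the animation steps through, one per tick."""
    year = current_year
    while year < last_year:
        year += 1
        yield year

# Starting from a slider position of 2010, three ticks remain.
print(list(animation_years(2010, 2013)))
```

The sketch also makes one property of this design visible: the animation continues from the current slider value, so if the slider is already at the last year, no step is produced.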


Step 6: Now, I will add the same coloring for the fertility. To choose between fertility and life expectancy, a drop-down menu is added.

In [None]:
from bokeh.models import Select


'''
I started by adding the fertility to the hover. Next, a Select element is added with the options
to display Life Expectancy or Fertility. 

After this, a new color mapper is defined for the fertility. The coloring is the opposite of the 
life expectancy: low-fertility countries are colored green, high-fertility countries are colored red. 

Finally, in the callback function of the Select, the color mapper for the map is set to the life expectancy
color mapper when the user selects Life Expectancy, and to the fertility color mapper when the user selects
Fertility. 
'''
#I start by adding the Fertility to the hover (in both cases, only the coloring depends on the selected option)
hover = HoverTool(tooltips=[('Country', '@name'), ('Life Expectancy', '@lifeExp'), ('Fertility', '@Fertility')])
p.add_tools(hover)

#now drop down menu
select = Select(title="Select Data:", value="Life Expectancy", options=["Life Expectancy", "Fertility"])

fertility_color_mapper = linear_cmap(field_name='Fertility', palette=custom_palette, low=max(data['Fertility']), high=min(data['Fertility']))


callback = CustomJS(args=dict(source=geo_source, select=select, fertility_color_mapper=fertility_color_mapper, lifeexpectancy_color_mapper=lifeexpectancy_color_mapper, patches = patches), code="""
    if(select.value === 'Life Expectancy') {
        patches.glyph.fill_color = lifeexpectancy_color_mapper;
    } else if(select.value === 'Fertility') {
        patches.glyph.fill_color = fertility_color_mapper;
    }
""")

select.js_on_change('value', callback)

layout = column(select,slider, animation_button,  p)
output_notebook()
show(layout)
output_file("LifeExp_Fertilities.html")
save(layout)
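
The inverted coloring for fertility can also be thought of as using the same palette in reverse order. The following sketch shows this alternative view in plain Python; it is not the approach used above (which swaps low and high), and pick_color, the color names and the fertility range of 1 to 8 are assumptions made for illustration.

```python
def pick_color(value, low, high, palette):
    """Map value linearly from [low, high] onto the palette entries."""
    if value <= low:
        return palette[0]
    if value >= high:
        return palette[-1]
    fraction = (value - low) / (high - low)
    return palette[int(fraction * len(palette))]

palette = ["red", "yellow", "green"]   # used as-is for life expectancy
reversed_palette = palette[::-1]       # green -> yellow -> red for fertility

# Made-up fertility range of 1 to 8 children per woman.
for fertility in (1.5, 4.5, 7.5):
    print(fertility, pick_color(fertility, 1.0, 8.0, reversed_palette))
```

With the reversed palette, low fertility maps to green and high fertility to red, matching the description in the comment above.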



Step 7: Now, I want to display a plot for each country on click. This plot contains the life expectancy and fertility of this country as a line diagram. In this way, I visualize the relation between life expectancy and fertility for each country and visually answer the research question. 



In [None]:
from bokeh.io import show, output_notebook
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource,  LinearAxis, Range1d, TapTool
from bokeh.layouts import row



'''
This code defines and returns an (initially empty) line plot. line_source is the data 
that is used as the source for the line plot; it is set whenever the user clicks on a country 
(using a callback function). It is initially an empty ColumnDataSource.

The line plot itself contains the year on the x-axis and the fertility and life expectancy on the
y-axes. As the fertility is usually significantly smaller than the life expectancy, I decided
to use different scales for them. This was implemented using the built-in 
attributes .y_range and .extra_y_ranges (they use a Range1d object with the smallest/largest fertility
and life expectancy). So the axis for fertility is displayed on the left side, and the axis for 
life expectancy on the right side. 
'''
def define_lineplot(data):
    line_source = ColumnDataSource({'Year': [], 'Fertility': [], 'LifeExp': []})

    line_plot = figure(title="Fertility and Life Expectancy Over Time", height=300, width=600, x_axis_label='Year', y_axis_label='Fertility', visible=False)
    line_plot.y_range = Range1d(start=min(data['Fertility']), end=max(data['Fertility']))  
    line_plot.extra_y_ranges = {"LifeExp": Range1d(start=min(data['lifeExp']), end=max(data['lifeExp']))} 
    line_plot.add_layout(LinearAxis(y_range_name="LifeExp", axis_label="Life Expectancy"), 'right')

    line_plot.line(x='Year', y='Fertility', source=line_source, legend_label='Fertility', color='red')
    line_plot.line(x='Year', y='LifeExp', source=line_source, legend_label='Life Expectancy', color='green', y_range_name="LifeExp")

    return line_plot, line_source

'''
This method defines and returns the button X, which is displayed when the user opens a lineplot.
When X is clicked, the lineplot is closed again and the worldmap is opened again. This
is done by changing the .visible attribute of the worldmap and all associated items.
Initially the worldmap is displayed, i.e. the close-button is not visible
'''

def define_closebutton(data, line_plot, p, slider, animation_button, select):
    close_button = Button(label="X", width=20, visible = False)
    close_button.js_on_click(CustomJS(args=dict(line_plot=line_plot, world_map=p, slider=slider, animation_button=animation_button, close = close_button,select = select), code="""
        line_plot.visible = false;
        close.visible = false;
        world_map.visible = true;
        slider.visible = true;
        animation_button.visible = true;
        select.visible = true;
    """))
    return close_button


hover = HoverTool(tooltips=[('Country', '@name'), 
                            ('Life Expectancy', '@lifeExp'), 
                            ('Fertility', '@Fertility'),
                            ('Click for Details', 'Click on the country to see a detailed view')])
p.add_tools(hover)
line_plot, line_source = define_lineplot(data)
close_button = define_closebutton(data, line_plot, p, slider, animation_button, select)


'''
This code implements the functionality that, when the user clicks on a country on the map, 
the line diagram for that country, showing its life expectancy and fertility, is displayed.
I (again) used a JavaScript callback function to implement this. 

I get the selected country using geo_source.selected.indices and geo_source.data['name'], which 
gives the country that was clicked on the map. First, I set the plot title, so the user can see which country the 
diagram is for. Then, I parse the data from its JSON format. I create a dictionary
for the data of the line plot and fill it with the Year, Fertility and LifeExp values
of the selected country using .push. To force the changes to be applied every time, 
.change.emit() is used; otherwise there is sometimes a delay. Finally, the plot is set visible,
the close button is set visible, and everything else belonging to the world map is set invisible.
 '''
callback = CustomJS(args=dict(line_source=line_source, line_plot=line_plot, world_map=p, geo_source=geo_source, data=data.to_json(orient='split'), slider=slider, select=select, animation_button=animation_button, close=close_button), code="""
    var indices = geo_source.selected.indices;
    if(indices.length > 0) {
        var country_name = geo_source.data['name'][indices[0]];
        line_plot.title.text = 'Life Expectancy and Fertility for ' + country_name;
        line_plot.change.emit();

        // Filter data for the selected country
        var all_data = JSON.parse(data);
        line_source.data = {'Year': [], 'Fertility': [], 'LifeExp': []};
        for(var i = 0; i < all_data.index.length; i++) {
            if(all_data.data[i][0] === country_name) {
                line_source.data['Year'].push(all_data.data[i][1]);
                line_source.data['Fertility'].push(all_data.data[i][4]);
                line_source.data['LifeExp'].push(all_data.data[i][2]);
            }
        }
        line_source.change.emit();

        line_plot.visible = true;
        close.visible = true;
        world_map.visible = false;
        slider.visible = false;
        animation_button.visible = false;
        select.visible = false;
    }
""")

p.add_tools(TapTool(callback=callback))

layout = column(select, row(close_button, line_plot), slider, animation_button, p)
handle = show(layout, notebook_handle=True)

output_notebook()
output_file("LifeExp_Fertilities.html")
save(layout)
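
The filtering that the tap callback performs on the "split"-oriented data can be sketched in Python. The rows and the four-column layout below are invented (the real csv has a different column layout), and series_for_country is a hypothetical helper; unlike the callback above, it looks up column positions by name instead of using fixed indices.

```python
# Made-up data in the same shape as DataFrame.to_json(orient='split').
split_data = {
    "columns": ["Country", "Year", "lifeExp", "Fertility"],
    "index": [0, 1, 2],
    "data": [
        ["Exampleland", 1964, 60.0, 5.0],
        ["Samplestan", 1964, 55.0, 6.0],
        ["Exampleland", 1965, 61.0, 4.9],
    ],
}

def series_for_country(split_data, country):
    """Collect the line-plot columns for one country (hypothetical helper)."""
    position = {name: i for i, name in enumerate(split_data["columns"])}
    result = {"Year": [], "Fertility": [], "LifeExp": []}
    for row in split_data["data"]:
        if row[position["Country"]] == country:
            result["Year"].append(row[position["Year"]])
            result["Fertility"].append(row[position["Fertility"]])
            result["LifeExp"].append(row[position["lifeExp"]])
    return result

print(series_for_country(split_data, "Exampleland"))
```

Looking up positions by column name is a design choice of this sketch: it keeps working if the column order in the csv changes, whereas fixed indices would silently pick the wrong column.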

Step 8: Now a third map is added, which aims to answer the research question more directly.

The hypothesis is: The life expectancy increases, and the fertility decreases.

This map shows both the life expectancy increase and the fertility decrease within a time interval. The interval can be chosen by the user (like before). Countries for which the hypothesis 
holds true are colored green. Countries for which there was no significant change in both are colored grey. Countries for which the hypothesis is false are colored red (in the precomputation code at the end of this step, contradicting the hypothesis means that either life expectancy decreased significantly or fertility increased significantly).

The first cell contains the implementation of the map. The last cell contains the code for the precomputation of all colors (which are stored in a file). The file is already there (so the precomputation has already taken place); the last cell is just for reference, as it takes around 10 minutes to recompute all values.

Furthermore, a button is added to the original plot. By clicking on the button, the user can look at the hypothesis map. 

In [None]:


def setup_plot(merged):
  '''
  This function takes the merged data (geo data and life expectancies/fertilities). 
  The first step is to merge the colors into the data, i.e. for each country, there 
  should be a color (which says whether it matches the hypothesis or not). This 
  is done for the start year 1964 and the end year 2013 (as these are the initial years). 
  The function selects the row for these years and transposes it, so that 
  in the end the df only has the 2 columns "Country" and "Color" and can be merged with the rest
  of the data. The source of the map is now built from this merged dataframe, and the countries
  are filled with the color in the "Color" column. Later, when the user changes the years,
  this Color column will be changed, which automatically causes the color to change as well.
  '''
  colors = pd.read_csv('colors.csv')
  colors.set_index(['Start_Year', 'End_Year'], inplace=True)
  relevant_data = colors.loc[(1964, 2013)]

  country_colors = relevant_data.drop(columns=['Start_Year', 'End_Year'])
  country_colors_transposed = country_colors.transpose().reset_index()
  country_colors_transposed.columns = ['Country', 'Color']
  merged_map_data = merged.merge(country_colors_transposed, left_on='name', right_on='Country', how='left')

  geo_source = GeoJSONDataSource(geojson=merged_map_data.to_json())
  plot_hypothesis = figure(title="Relationship LifeExpectancy Fertility Map", height=800, width=1400)
  patches = plot_hypothesis.patches('xs', 'ys', source=geo_source, fill_color='Color', line_color='black', line_width=0.5)
  return colors, geo_source, plot_hypothesis, patches

#setup the plot
colors, geo_source, plot_hypothesis, hypothesis_patches = setup_plot(merged)


#create two sliders for the user to select start-year and end-year
slider_start = Slider(start=1964, end=2013, value=1964, step=1, title="Start Year")
slider_end = Slider(start=1964, end=2013, value=2013, step=1, title="End Year")



'''
This is the callback function for the sliders (Both use the same). Whenever a slider is changed,
the callback function gets the years. It then iterates through all countries, and for each
country constructs the key (startYear, endYear) to access the color from colors.to_json(). 
It accesses the color stored in colors using this key, and sets the column "Color" in the geo_source-data
to that color. This automatically updates the color, because the color is displayed based on that column Color
'''
callback = CustomJS(args=dict(source=geo_source, df_json=colors.to_json(), 
                              slider_start=slider_start, slider_end=slider_end), code="""
    // Parse the JSON string to get the color data
    var color_data = JSON.parse(df_json);
    console.log(color_data);

    var data = source.data;
    for (var i = 0; i < data['name'].length; i++) {
        var country = data['name'][i];
        var tuple_key = "(" + slider_start.value.toString() + ", " + slider_end.value.toString() + ")";
        var country_color = color_data[country][tuple_key];
        data['Color'][i] = country_color;
    }
    source.change.emit();
""")

slider_start.js_on_change('value', callback)
slider_end.js_on_change('value', callback)

output_file("Hypothesis.html")

layout = column(slider_start, slider_end, plot_hypothesis)
show(layout, notebook_handle=True)
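
The key that the slider callback builds by string concatenation has the same form as the text representation of a Python tuple. The following sketch shows this; tuple_key is a hypothetical Python counterpart of the JavaScript line, the color_data dictionary is invented, and whether the keys in the real serialized colors have exactly this form is an assumption that depends on how pandas writes the index.

```python
def tuple_key(start_year, end_year):
    """Build the key the same way the JavaScript callback does."""
    return "(" + str(start_year) + ", " + str(end_year) + ")"

# The concatenated string equals the str() form of a Python tuple.
print(tuple_key(1964, 2013) == str((1964, 2013)))

# Made-up color data with the assumed nesting: country -> key -> color.
color_data = {"Exampleland": {"(1964, 2013)": "green", "(1990, 2000)": "grey"}}
color = color_data["Exampleland"][tuple_key(1990, 2000)]
print(color)
```

Note that the space after the comma is part of the key, so a key built without it would not be found.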




In [None]:

#now I also adapt the original plot to include a button which opens the hypothesis plot 
open_hyp_button = Button(label="View Hypothesis", button_type="success")
callback = CustomJS(code="window.open('Hypothesis.html', '_blank')")
open_hyp_button.js_on_click(callback)
output_file("LifeExp_Fertilities.html")

layout = column(open_hyp_button, select, row(close_button, line_plot), slider, animation_button, p)
show(layout, notebook_handle=True)



This is the code for the precomputation - just for reference

In [None]:

def compute_color(data, start_year, end_year, country):
    '''
    This function takes the data, a start year, an end year and a country and computes the color for
    this country in this interval, based on life expectancy and fertility. The idea is that the color is
    green if the data matches the hypothesis, grey if there is no significant change in both and red 
    if the data doesn't match the hypothesis. 

    First, the data for the start year and the end year is filtered, then the differences in life expectancy 
    and fertility are computed. Based on these differences, the color is returned. The function also 
    scales the thresholds by the length of the interval (as a bigger life expectancy change is to be expected
    in 50 years than in 1 year). 
    '''

    data_start_year = data[(data['Year'] == start_year) & (data['Country'] == country)]
    data_end_year = data[(data['Year'] == end_year) & (data['Country'] == country)]

    years_diff = end_year - start_year
    factor = years_diff / 50  # close to 1 for the full range 1964-2013 (49 / 50)

    lifeexp_diff = data_end_year['lifeExp'].values[0] - data_start_year['lifeExp'].values[0]
    fertility_diff = data_end_year['Fertility'].values[0] - data_start_year['Fertility'].values[0]

    if lifeexp_diff > (3*factor) and fertility_diff < (-0.4*factor):
        return 'green'
    elif lifeexp_diff > -(3*factor) and fertility_diff< (0.4*factor):
        return 'grey'
    else:
        return 'red'


    

'''
Here, all colors for all combinations of start year, end year and country are computed.
This is done using a MultiIndex for the rows (built with pandas' built-in .from_product) and the countries for the columns. 

Then, for each combination of (start year, end year) and country, the color 
is computed using the helper function compute_color, and finally the result is stored in a .csv.

This takes around 10 minutes; however, the .csv is already included in the submission, so this 
cell does not have to be executed.
'''

countries = list(set(data['Country']))
multi_index = pd.MultiIndex.from_product([range(1964, 2014), range(1964, 2014)], names=['Start_Year', 'End_Year'])

precomputed_df = pd.DataFrame(index=multi_index, columns=countries)

for (start_year, end_year) in precomputed_df.index:
    if end_year > start_year:  
        for country in countries:
            precomputed_df.at[(start_year, end_year), country] = compute_color(data, start_year, end_year, country) 

precomputed_df.to_csv('colors.csv')
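
To make the thresholds easier to follow, here is a worked example of the classification rule on its own, without the dataframe lookups. The function classify is a hypothetical helper that uses the same thresholds as compute_color (3 years of life expectancy and 0.4 in fertility per 50 years); the input differences are made-up numbers.

```python
def classify(lifeexp_diff, fertility_diff, years_diff):
    """Apply the same thresholds as compute_color to given differences."""
    factor = years_diff / 50
    if lifeexp_diff > 3 * factor and fertility_diff < -0.4 * factor:
        return 'green'
    elif lifeexp_diff > -(3 * factor) and fertility_diff < 0.4 * factor:
        return 'grey'
    else:
        return 'red'

# Over 50 years the thresholds are 3 years and 0.4 children.
print(classify(10.0, -2.0, 50))   # clear increase and clear decrease
print(classify(1.0, 0.1, 50))     # small changes in both
print(classify(-5.0, -1.0, 50))   # life expectancy dropped clearly

# Over 5 years the thresholds shrink to 0.3 years and 0.04 children.
print(classify(0.5, -0.1, 5))
```

The last call shows the effect of the factor: a change that would count as insignificant over 50 years is large enough to count as matching the hypothesis over 5 years.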


