## Life Expectancy and Fertility Analysis

This notebook explores changes in life expectancy and fertility rates across various countries from 1964 to 2013. The goal is to identify trends and understand how different regions have developed over these five decades.


## Importing Necessary Libraries

- `pandas`: For loading and manipulating the dataset.
- `bokeh`: To create interactive visualizations that allow us to explore the data dynamically.


In [2]:
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool, Slider
from bokeh.layouts import column
from bokeh.io import push_notebook, curdoc
from bokeh.transform import factor_cmap
from bokeh.palettes import Category20

output_notebook()

## Data Loading and Cleaning

The data comes from the Gapminder dataset, which includes life expectancy, fertility rates, and population for various countries. Below, we load the data, handle missing values, and prepare it for analysis. Specific steps include:
- Dropping rows with missing life expectancy or fertility data.
- Forward and backward filling population data within each country to handle any temporal gaps.
- Labeling missing `ID` values as `'Unknown'`.


In [12]:
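def load_data(filepath):
    """Load the Gapminder CSV file and clean it for plotting."""
    data = pd.read_csv(filepath)
    # Drop rows that lack either of the two plotted variables.
    data.dropna(subset=['lifeExp', 'Fertility'], inplace=True)
    # Fill population gaps inside each country only. Chaining .bfill() after
    # a grouped .ffill() would back-fill across country boundaries.
    # Assumes rows are ordered by year within each country.
    data['pop'] = data.groupby('Country')['pop'].transform(lambda s: s.ffill().bfill())
    data['ID'] = data['ID'].fillna('Unknown')
    return data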

# Load data
gapminder_data = load_data('gapminder.csv')
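
The difference between filling within each country and filling across the whole column is easy to miss, so here is a minimal sketch on a made-up three-country frame. The helper name `fill_population_per_country` and the toy values are illustrative assumptions, not part of the Gapminder file.

```python
import pandas as pd

NAN = float('nan')

def fill_population_per_country(df):
    """Forward-fill, then backward-fill 'pop' separately inside each country."""
    out = df.copy()
    out['pop'] = out.groupby('Country')['pop'].transform(lambda s: s.ffill().bfill())
    return out

# Toy data (invented values): country B has no population figures at all.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Year': [1964, 1965, 1964, 1965, 1964, 1965],
    'pop': [NAN, 10.0, NAN, NAN, 200.0, NAN],
})

filled = fill_population_per_country(toy)

# For comparison: grouped ffill followed by an ungrouped bfill.
leaked = toy.groupby('Country')['pop'].ffill().bfill()

print(filled)
print(leaked.tolist())
```

In the per-country version, country B stays empty because it has nothing of its own to fill from. In the chained version, B silently inherits the population of country C.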

## Creating the Visualization

Bokeh is used to create an interactive plot of life expectancy versus fertility for each country. The color of each point corresponds to the country's region, which makes regional clusters easy to tell apart. A hover tool shows the country name, population, life expectancy, and fertility for the point under the cursor. The year slider is added in the next section.


In [11]:
def make_plot(source, initial_year):
    """Create the Bokeh scatter plot with a hover tool."""
    # Sort the regions so each one keeps the same color between runs.
    regions = sorted(set(source.data['Region']))
    # Category20 is keyed by palette size from 3 to 20, so clamp the lookup.
    palette = Category20[max(3, min(len(regions), 20))]
    color_mapper = factor_cmap('Region', palette=palette, factors=regions)
    p = figure(title=f"Life Expectancy vs Fertility in {initial_year}", x_axis_label='Fertility', y_axis_label='Life Expectancy', height=600, width=800)
    # Scatter plot with color based on region
    p.scatter('Fertility', 'lifeExp', source=source, color=color_mapper, line_color='black', fill_alpha=0.6)
    # Hover tool showing the underlying values for each point
    hover = HoverTool(tooltips=[("Country", "@Country"), ("Population", "@pop"), ("Life Exp.", "@lifeExp"), ("Fertility", "@Fertility")])
    p.add_tools(hover)
    return p
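
The region-to-color step can be sketched without Bokeh. The helper `region_color_plan` below is a hypothetical name, and the region labels are placeholders. It assumes the `Category20` palette dictionary offers sizes 3 through 20, which is why the requested size is clamped to that range.

```python
def region_color_plan(regions, smallest=3, largest=20):
    """Return the sorted unique regions and a palette size clamped to [smallest, largest]."""
    factors = sorted(set(regions))          # stable order gives stable colors
    size = max(smallest, min(len(factors), largest))
    return factors, size

# Placeholder region labels, not taken from the dataset.
factors, size = region_color_plan(['Europe', 'Asia', 'Europe'])
print(factors, size)

# With Bokeh available, the pair would feed the color mapper like this:
#   factor_cmap('Region', palette=Category20[size], factors=factors)
```

Sorting the factors matters because a bare `set` has no guaranteed order, so the same region could otherwise receive a different color on each run.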

## Interactive Components

A slider lets users select the year to display. As the slider moves, the plot and its title update to show that year's data, which makes it easier to follow trends over time and to see how individual countries or regions have changed. Because the slider callback is written in Python, the plot is embedded as a Bokeh application rather than as a static figure.


In [10]:
# `show` is already imported above; only the application classes are new here.
from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application

def modify_doc(doc):
    initial_year = 1964
    source = ColumnDataSource(gapminder_data[gapminder_data['Year'] == initial_year])
    plot = make_plot(source, initial_year)
    slider = Slider(start=1964, end=2013, value=initial_year, step=1, title="Year")
    
    def update_plot(attr, old, new):
        # update plot based on year
        yr = slider.value
        new_data = gapminder_data[gapminder_data['Year'] == yr]
        source.data = new_data.to_dict(orient='list')
        plot.title.text = f"Life Expectancy vs Fertility in {yr}"
    
    slider.on_change('value', update_plot)
    layout = column(slider, plot)
    doc.add_root(layout)

handler = FunctionHandler(modify_doc)
app = Application(handler)

# notebook_url must match the address this notebook is served from.
show(app, notebook_url="http://localhost:8888")
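
What the slider callback hands to the data source can be checked on its own, without a running Bokeh server. The sketch below uses a hypothetical helper, `rows_for_year`, and invented toy values. It only mirrors the filter-and-convert step of `update_plot`.

```python
import pandas as pd

def rows_for_year(df, year):
    """Return the rows for one year as a dict of equal-length lists."""
    subset = df[df['Year'] == year]
    return subset.to_dict(orient='list')

# Toy data (invented values) with two countries and two years.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'A', 'B'],
    'Year': [1964, 1964, 1965, 1965],
    'lifeExp': [50.0, 60.0, 51.0, 61.0],
    'Fertility': [6.0, 3.0, 5.9, 2.9],
})

payload = rows_for_year(toy, 1965)
empty = rows_for_year(toy, 1900)   # a year with no rows
print(payload)
print(empty)
```

A year with no rows yields empty lists for every column, so the plot would simply show no points rather than raise an error.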

## Conclusion

Stepping the slider from 1964 to 2013 highlights substantial gains in life expectancy across the world, particularly in regions that were lagging in 1964. The visualization also shows that disparities in fertility rates remain, which likely reflects differences in socio-economic development.
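
One way to put numbers behind these visual impressions is to compare regional averages between the first and last year. The sketch below is only an outline under stated assumptions: `regional_change` is a hypothetical helper, the toy frame contains invented values, and the means are unweighted by population.

```python
import pandas as pd

def regional_change(df, first_year, last_year):
    """Change in mean lifeExp and Fertility per region between two years."""
    means = (
        df[df['Year'].isin([first_year, last_year])]
        .groupby(['Region', 'Year'])[['lifeExp', 'Fertility']]
        .mean()
    )
    start = means.xs(first_year, level='Year')
    end = means.xs(last_year, level='Year')
    return end - start

# Toy data (invented values): two countries in region X, one in region Y.
toy = pd.DataFrame({
    'Region': ['X', 'X', 'Y', 'X', 'X', 'Y'],
    'Year': [1964, 1964, 1964, 2013, 2013, 2013],
    'lifeExp': [50.0, 52.0, 70.0, 70.0, 72.0, 80.0],
    'Fertility': [6.0, 5.0, 2.5, 3.0, 2.0, 2.0],
})

change = regional_change(toy, 1964, 2013)
print(change)
```

Calling `regional_change(gapminder_data, 1964, 2013)` on the cleaned dataset would give the corresponding table for the real data, assuming both years are present for every region.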