# The change in population around the world between 1950 and 2021

***The goal of our project is to analyse how the world's population has changed over time. We take a close look at the similarities and differences in population between continents. Moreover, by dividing the population into 14 age groups, we are able to analyse the age distribution in each continent. Lastly, the project presents a forecast of population growth based on the data.***

To run the code, we first import the required libraries:

In [17]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import VBox
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import display
from prophet import Prophet
from IPython.display import clear_output

# Read and clean data

We imported data on the population of the world between 1950 and 2021 from Our World in Data. We cleaned the data so that the final data set contains only the aggregated population of each continent and of the world as a whole. The columns of the data set are the name of the continent, the year, the total population of each continent and of the world, and the population of the different age groups.

The age groups are as follows: Population of children under the age of 1, Population aged 1 to 4 years, Population aged 5 to 9 years, Population aged 10 to 14 years, Population aged 15 to 19 years, Population aged 20 to 29 years, Population aged 30 to 39 years, Population aged 40 to 49 years, Population aged 50 to 59 years, Population aged 60 to 69 years, Population aged 70 to 79 years, Population aged 80 to 89 years, Population aged 90 to 99 years and Population older than 100 years.

Source: https://ourworldindata.org/explorers/population-and-demography?time=earliest..2021&facet=none&pickerSort=asc&hideControls=false&Metric=Population&Sex=Male&Age+group=Total&Projection+Scenario=None&country=Europe+%28UN%29~Asia+%28UN%29~Africa+%28UN%29~Latin+America+and+the+Caribbean+%28UN%29~Oceania+%28UN%29~Northern+America+%28UN%29~OWID_WRL

Global data

In [18]:
pop = pd.read_csv('population-and-demography.csv')

continents = ['Africa (UN)', 'Europe (UN)', 'Asia (UN)', 'World', 'Northern America (UN)', 'Latin America and the Caribbean (UN)', 'Oceania (UN)']

filtered_pop = pop[pop['Country name'].isin(continents)]

drop_these = ['Population at age 1','Population of children under the age of 5', 'Population of children under the age of 15', 'Population under the age of 25', 'Population aged 15 to 64 years', 'Population older than 15 years', 'Population older than 18 years', 'population__all__20_24__records', 'population__all__25_29__records', 'population__all__30_34__records', 'population__all__35_39__records', 'population__all__40_44__records', 'population__all__45_49__records', 'population__all__50_54__records', 'population__all__55_59__records', 'population__all__60_64__records', 'population__all__65_69__records', 'population__all__70_74__records', 'population__all__75_79__records', 'population__all__80_84__records', 'population__all__85_89__records', 'population__all__90_94__records', 'population__all__95_99__records' ]

filtered_pop = filtered_pop.drop(drop_these, axis = 1)


new_index = range(1,505)

filtered_pop.index = new_index

filtered_pop.rename(columns = {'Country name' : 'Continents and World'}, inplace = True)
filtered_pop.head(504)

Unnamed: 0,Continents and World,Year,Population,Population of children under the age of 1,Population aged 1 to 4 years,Population aged 5 to 9 years,Population aged 10 to 14 years,Population aged 15 to 19 years,Population aged 20 to 29 years,Population aged 30 to 39 years,Population aged 40 to 49 years,Population aged 50 to 59 years,Population aged 60 to 69 years,Population aged 70 to 79 years,Population aged 80 to 89 years,Population aged 90 to 99 years,Population older than 100 years
1,Africa (UN),1950,227549260,9393661.0,29672912.0,29653958,25792740,23087212,37677212,27422540,19586988,13178286,7893063,3445317,700903,43862,615.0
2,Africa (UN),1951,232484000,9684508.0,30318004.0,30523140,26205760,23471334,38505124,28050804,19993920,13463632,8015908,3499692,709683,41884,602.0
3,Africa (UN),1952,237586060,9921448.0,31046300.0,31460048,26651612,23855340,39329870,28694808,20406358,13754924,8147429,3558314,718430,40599,582.0
4,Africa (UN),1953,242837440,10167668.0,31877272.0,32368278,27171188,24225864,40120050,29340532,20827816,14056554,8291982,3621039,728561,40087,557.0
5,Africa (UN),1954,248244770,10409928.0,32848608.0,33192114,27802460,24585024,40875776,29989164,21257012,14365597,8451841,3685246,741198,40304,499.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
500,World,2017,7599822300,139420590.0,550940160.0,668653200,631596500,607785800,1207877900,1092995100,953971900,772621950,549785660,285532670,119566930,18615136,459150.0
501,World,2018,7683790000,137690030.0,550970400.0,674324740,638451460,611398340,1204281300,1114043900,958684500,788847800,566538940,295304100,123199544,19578904,475921.0
502,World,2019,7764951000,135471330.0,549401300.0,678417150,645915800,615536000,1200389900,1133424100,963090600,809066200,579960400,306498750,126700504,20575482,503572.0
503,World,2020,7840953000,133345180.0,545801000.0,681410500,653182000,619493500,1196254600,1151358000,969082800,830432400,590299300,318539100,129707720,21499228,547543.0
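
The cleaning cell above sets the index with a hard-coded `range(1,505)`, which only works while the filtered frame has exactly 504 rows. As a minimal sketch of an alternative, the same steps can be chained and the index rebuilt from the data itself. The frame `raw` and its numbers are made-up placeholders that only mimic the layout of the CSV file.

```python
import pandas as pd

# Made-up rows that mimic the layout of the CSV file; the numbers are placeholders
raw = pd.DataFrame({
    'Country name': ['Africa (UN)', 'Kenya', 'World', 'Africa (UN)', 'Kenya', 'World'],
    'Year': [1950, 1950, 1950, 1951, 1951, 1951],
    'Population': [10, 1, 100, 11, 2, 103],
    'Population at age 1': [1, 0, 9, 1, 0, 9],
})

regions = ['Africa (UN)', 'World']

cleaned = (
    raw[raw['Country name'].isin(regions)]       # keep the aggregated regions only
    .drop(columns=['Population at age 1'])       # drop columns that are not needed
    .rename(columns={'Country name': 'Continents and World'})
    .reset_index(drop=True)                      # fresh index with the correct length
)
cleaned.index = cleaned.index + 1                # start at 1, as in the notebook

# Sanity check: one row per region and year is expected
expected_rows = len(regions) * cleaned['Year'].nunique()
assert len(cleaned) == expected_rows
print(cleaned)
```

With this pattern the index length always follows the data, so adding a region or a year does not require changing a constant.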


## **Plots**

In this section, the data has been presented in various graphs.

## 1.1 :  Global population

The following interactive figure presents the total population over time. The world's population is clearly rising, and the upward trend applies to all continents as well. To take a closer look at the continents, click 'World' in the legend to hide it. Europe shows an interesting deviation from the trend: in the late 1990s and early 2000s its population dropped.

In [19]:
# Selecting only the desired columns
selected_columns = filtered_pop[['Continents and World', 'Year', 'Population']]

# Create an interactive line plot using Plotly Express
fig = px.line(selected_columns, x='Year', y='Population', color='Continents and World',
              title='Population in the World and the continents in years 1950-2021',
              labels={'Population': 'Population in billion', 'Year': 'Year'})

# Adjust layout parameters
fig.update_layout(
    title={'text': 'Population in the World and the continents in years 1950-2021', 'font': {'size': 20}},
    legend={'font': {'size': 10}},
    width=900,  # Adjust width of the plot
    height=600,  # Adjust height of the plot
)
# Show the interactive plot
fig.show()
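
A decline such as the one described for Europe can also be located numerically rather than read off the plot. The sketch below is one possible way to do it. The helper name `years_of_decline` is hypothetical and the example values are placeholders, not figures from the data set.

```python
import pandas as pd

def years_of_decline(population_by_year: pd.Series) -> list:
    """Return the years in which the population is lower than in the previous year."""
    change = population_by_year.sort_index().diff()  # year-over-year difference
    return change[change < 0].index.tolist()

# Placeholder values indexed by year, NOT taken from the data set
example = pd.Series({1995: 100, 1996: 101, 1997: 100, 1998: 99, 1999: 99, 2000: 102})
declining = years_of_decline(example)
print(declining)
```

On the notebook's data, the input series could be built with `filtered_pop[filtered_pop['Continents and World'] == 'Europe (UN)'].set_index('Year')['Population']`.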

## 1.2: Population in detail 

We made an interactive table that shows the population of each age group between 1950 and 2021. The data is aggregated into 6 continents plus the world, and the table is updated based on the selected year and age group.

Asia accounts for approximately half or more of the world's population in most years. It has the largest population among the continents.
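
The share of each continent in the world total can be computed directly from a frame laid out like `filtered_pop`. This is a minimal sketch: the function name `share_of_world` is hypothetical and the toy numbers are placeholders chosen only to make the arithmetic easy to follow.

```python
import pandas as pd

def share_of_world(df: pd.DataFrame, year: int, column: str = 'Population') -> pd.Series:
    """Return each continent's share (in percent) of the 'World' row for one year."""
    rows = df[df['Year'] == year].set_index('Continents and World')[column]
    world_total = rows['World']
    return (rows.drop('World') / world_total * 100).round(1)

# Placeholder numbers, NOT taken from the data set
toy = pd.DataFrame({
    'Continents and World': ['Asia (UN)', 'Europe (UN)', 'World'],
    'Year': [2000, 2000, 2000],
    'Population': [60, 15, 100],
})
shares = share_of_world(toy, 2000)
print(shares)
```

Calling `share_of_world(filtered_pop, 2021)` on the notebook's data would give the corresponding percentages for 2021.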




In [20]:

years = sorted(filtered_pop['Year'].unique())
age_groups = filtered_pop.columns[3:] 

# Container for the interactive output (the table)
output_container = widgets.Output()

# Create dropdown widgets for selecting the year and age group
year_dropdown = widgets.Dropdown(options=years, description='Year:')
age_group_dropdown = widgets.Dropdown(options=age_groups, description='Age Group:')

def create_population_table(selected_year, selected_age_group):
    with output_container:
        clear_output(wait=True)  
        
        # Filter the DataFrame for the selected year
        filtered_data = filtered_pop[filtered_pop['Year'] == selected_year]
        
        final_df = filtered_data[['Continents and World', selected_age_group]]
        
        final_df = final_df.reset_index(drop=True)
        
        display(final_df)

# Handler for dropdown change events
def on_dropdown_change(change):
    selected_year = year_dropdown.value
    selected_age_group = age_group_dropdown.value
    create_population_table(selected_year, selected_age_group)

# Attach the handler to the dropdowns; names='value' makes it fire only when the selection changes
year_dropdown.observe(on_dropdown_change, names='value')
age_group_dropdown.observe(on_dropdown_change, names='value')

# Display the widgets and the output container
display(year_dropdown, age_group_dropdown, output_container)

create_population_table(years[0], age_groups[0])


## 2.1: Population of children under the age of 1

The following plot shows the number of newborns (children under the age of 1) in each continent.

Europe shows a declining trend in the population of children under the age of 1 from 1950 to 2021, which may imply that Europe is facing a decrease in its fertility rate.

The number of children under the age of 1 in Northern America drops sharply from 1965 to 1975, which could be due to the Vietnam War.


In [21]:
# Filter the DataFrame to select data related to other continents
continents_data = filtered_pop[~filtered_pop['Continents and World'].isin(['World'])]

# Column to plot
column_to_plot = 'Population of children under the age of 1'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population of children under the age of 1 in every continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Year': 'Year', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()


## 2.2: Population aged 20 to 29 years

This plot represents the data for young people (20 to 29 years) in each continent.

Africa's young adult population is clearly increasing rapidly. In Asia, by contrast, the young adult population has started to decline after a long period of continuous growth. This might be explained by China's "one child policy", which was in force from 1979 until it was abolished in 2015.

Finally, the European young adult population is decreasing. This may be a sign of decreasing fertility within European countries, but also of a growing share of people who do not wish to have children. Moreover, the issue is not that recent, as this population started to decline in 1986.

In [22]:
# Filter the DataFrame to select data related to other continents
continents_data = filtered_pop[~filtered_pop['Continents and World'].isin(['World'])]

# Column to plot
column_to_plot = 'Population aged 20 to 29 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 20 to 29 in every continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Year': 'Year', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.3: Population aged 50 to 59 years

This plot shows the growth of the population aged 50 to 59 years in each continent.

In Africa, Asia, Oceania and Latin America, the population aged 50 to 59 years is clearly expanding, whereas in Europe it has been basically stagnant for 10 years and has decreased slightly in the last few years. The same thing is happening in Northern America. This could be driven by the fact that the populations of these continents are getting older and that the older cohorts are not being replaced by younger ones.

In [23]:
# Filter the DataFrame to select data related to other continents
continents_data = filtered_pop[~filtered_pop['Continents and World'].isin(['World'])]

# Column to plot
column_to_plot = 'Population aged 50 to 59 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 50 to 59 in every continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Year': 'Year', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.4: Population aged 80 to 89 years

The following plot shows the number of people aged between 80 and 89 years in each continent.

Because Asia is the most populous continent, it is not surprising to find its curve above the others. We also notice that Europe has a large number of people between 80 and 89 years. This could be explained by rising life expectancy, which increases the number of people reaching "very old" ages.

We also note that Africa is strongly underrepresented in this plot. Africa has a very young population, so a forecast would probably show a quick increase of this age group over the following decades.

In [24]:
# Filter the DataFrame to select data related to other continents
continents_data = filtered_pop[~filtered_pop['Continents and World'].isin(['World'])]

# Column to plot
column_to_plot = 'Population aged 80 to 89 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 80 to 89 in every continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Year': 'Year', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 3 Population distribution by age group among the continents between 1950 and 2021

**To run the following cell, ipywidgets must be installed. Run this in your terminal: pip install ipywidgets**


The following interactive bar plot shows, for a selected year and age group, the population of that age group in each continent.

We observed that the population aged 20 to 29 makes up the largest share among the age groups in the 1950s, whereas the share of the older groups has increased by the 2020s, which could be a result of medical advances and the resulting increase in life expectancy.

Europe and Northern America have a lower proportion of population aged 1 to 39 than the other continents, and this proportion was lower still in 2021 than in previous years, which may indicate that they will encounter problems with an ageing population.
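
The statement about the largest age group can be checked by converting one row of the data into percentage shares. The sketch below does this for Africa in 1950, with the values copied from the table printed earlier in this notebook. The variable names are our own.

```python
import pandas as pd

# Africa (UN), 1950: values copied from the table printed earlier in this notebook
africa_1950 = pd.Series({
    'Population of children under the age of 1': 9393661,
    'Population aged 1 to 4 years': 29672912,
    'Population aged 5 to 9 years': 29653958,
    'Population aged 10 to 14 years': 25792740,
    'Population aged 15 to 19 years': 23087212,
    'Population aged 20 to 29 years': 37677212,
    'Population aged 30 to 39 years': 27422540,
    'Population aged 40 to 49 years': 19586988,
    'Population aged 50 to 59 years': 13178286,
    'Population aged 60 to 69 years': 7893063,
    'Population aged 70 to 79 years': 3445317,
    'Population aged 80 to 89 years': 700903,
    'Population aged 90 to 99 years': 43862,
    'Population older than 100 years': 615,
})

# Share of each age group in percent of the summed age groups
age_shares = africa_1950 / africa_1950.sum() * 100
largest_group = age_shares.idxmax()

print(age_shares.round(2))
print('Largest group:', largest_group)
```

Note that the age groups do not all span the same number of years (1, 4, 5 or 10 years), so a wider group such as 20 to 29 tends to have a larger share than a narrower one such as 15 to 19. This should be kept in mind when comparing shares across groups.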

In [25]:
# Assuming filtered_pop is already defined and loaded
filtered_pop_without_World = filtered_pop[filtered_pop['Continents and World'] != 'World']

years = sorted(filtered_pop_without_World['Year'].unique())
age_groups = filtered_pop_without_World.columns[3:]

# Define a color map for each region
color_map = {
    'Africa (UN)': 'green',
    'Europe (UN)': 'blue',
    'Asia (UN)': 'red',
    'Northern America (UN)': 'orange',
    'Latin America and the Caribbean (UN)': 'purple',
    'Oceania (UN)': 'brown'
}

def create_histogram_plot(selected_year, selected_age_group):
    year_data = filtered_pop_without_World[filtered_pop_without_World['Year'] == selected_year]
    age_group_data = year_data[['Continents and World', selected_age_group]]
    
    fig = go.Figure()

    for continent, color in color_map.items():
        continent_data = age_group_data[age_group_data['Continents and World'] == continent]
        fig.add_trace(go.Bar(
            x=continent_data['Continents and World'],
            y=continent_data[selected_age_group],
            name=continent,
            marker=dict(color=color)
        ))

    fig.update_layout(
        title_text=f'Population Distribution for {selected_age_group} in {selected_year}',
        xaxis_title='Continent',
        yaxis_title='Population'
    )
    fig.show()

# Widgets for selecting year and age group
year_dropdown = widgets.Dropdown(
    options=years,
    value=years[0],
    description='Year:'
)

age_group_dropdown = widgets.Dropdown(
    options=age_groups,
    value=age_groups[0],
    description='Age Group:'
)

def update_plot(change):
    create_histogram_plot(year_dropdown.value, age_group_dropdown.value)

year_dropdown.observe(update_plot, names='value')
age_group_dropdown.observe(update_plot, names='value')

# Display the widgets
display(VBox([year_dropdown, age_group_dropdown]))
create_histogram_plot(years[0], age_groups[0])


## 4 Population distribution among the continents between 1950 and 2021

The following pie chart presents the distribution of the population (in percent) between the continents across the years. Because the chart is interactive, one can select a year and see the distribution for that year.

In [26]:
# Get unique years from the DataFrame
years = sorted(filtered_pop['Year'].unique())

# Define a fixed color mapping for each continent
color_mapping = {
    'Africa (UN)': 'green',
    'Europe (UN)': 'blue',
    'Asia (UN)': 'red',
    'Northern America (UN)': 'orange',
    'Latin America and the Caribbean (UN)': 'purple',
    'Oceania (UN)': 'brown'
}

# Define a function to create the pie chart based on the selected year
def create_pie_chart(selected_year):
    # Filter data for the selected year
    year_data = filtered_pop[filtered_pop['Year'] == selected_year]

    # Exclude data for 'World'
    continent_data = year_data[year_data['Continents and World'] != 'World']

    # Calculate the total population for each continent
    continent_population = continent_data.groupby('Continents and World')['Population'].sum().reset_index()

    # Create a pie chart with fixed colors
    fig = go.Figure(data=[go.Pie(
        labels=continent_population['Continents and World'],
        values=continent_population['Population'],
        marker=dict(colors=[color_mapping[continent] for continent in continent_population['Continents and World']])
    )])
    fig.update_layout(title=f'Population Distribution for {selected_year}')
    fig.show()

# Create a dropdown menu widget for selecting the year
year_dropdown = widgets.Dropdown(options=years, description='Select Year:')

# Define a function to handle the dropdown menu change event
def on_year_change(change):
    selected_year = change.new
    create_pie_chart(selected_year)

# Attach the function to the dropdown menu's change event
year_dropdown.observe(on_year_change, names='value')

# Display the dropdown menu
display(year_dropdown)

# Create the initial pie chart for the first year in the dataset
create_pie_chart(years[0])


## **Calculations**

We calculated the total growth rate between 1950 and 2021 for the world and for each continent.

In [27]:

# Specify the beginning and ending years
beginning_year = 1950
ending_year = 2021

# Create a DataFrame to store growth rates
growth_rates = {'Continent': [], 'Growth Rate (%)': []}

# Calculate the growth rate for each continent
for continent in filtered_pop['Continents and World'].unique():
    # Filter the DataFrame to select data for the beginning and ending years for the continent
    continent_data = filtered_pop[filtered_pop['Continents and World'] == continent]
    beginning_population = continent_data.loc[continent_data['Year'] == beginning_year, 'Population'].iloc[0]
    ending_population = continent_data.loc[continent_data['Year'] == ending_year, 'Population'].iloc[0]

    # Calculate the total growth over the whole period
    total_growth = (ending_population / beginning_population) - 1

    # Convert the growth rate to percentage
    growth_rate_percentage = total_growth * 100

    # Append the growth rate to the growth_rates dictionary
    growth_rates['Continent'].append(continent)
    growth_rates['Growth Rate (%)'].append(growth_rate_percentage)

# Convert the growth rates dictionary to a DataFrame
growth_rates_df = pd.DataFrame(growth_rates)

# Display the growth rates DataFrame
print(growth_rates_df)


                              Continent  Growth Rate (%)
0                           Africa (UN)       512.472394
1                             Asia (UN)       240.421434
2                           Europe (UN)        35.554729
3  Latin America and the Caribbean (UN)       289.755444
4                 Northern America (UN)       131.525984
5                          Oceania (UN)       253.737450
6                                 World       216.457623


Next, we calculated the average (compound) annual growth rate for the world and the continents.

In [28]:
# Specify the beginning and ending years
beginning_year = 1950
ending_year = 2021

# Create a DataFrame to store growth rates
growth_rates = {'Continent': [], 'Growth Rate (%)': []}

# Calculate the growth rate for each continent
for continent in filtered_pop['Continents and World'].unique():
    # Filter the DataFrame to select data for the beginning and ending years for the continent
    continent_data = filtered_pop[filtered_pop['Continents and World'] == continent]
    beginning_population = continent_data.loc[continent_data['Year'] == beginning_year, 'Population'].iloc[0]
    ending_population = continent_data.loc[continent_data['Year'] == ending_year, 'Population'].iloc[0]

    # Calculate the number of years between the beginning and ending years
    number_of_years = ending_year - beginning_year

    # Calculate the compound annual growth rate (CAGR)
    cagr = (ending_population / beginning_population) ** (1 / number_of_years) - 1

    # Convert the growth rate to percentage
    growth_rate_percentage = cagr * 100

    # Append the growth rate to the growth_rates dictionary
    growth_rates['Continent'].append(continent)
    growth_rates['Growth Rate (%)'].append(growth_rate_percentage)


# Convert the growth rates dictionary to a DataFrame
growth_rates_df = pd.DataFrame(growth_rates)

# Display the growth rates DataFrame
print(growth_rates_df)


                              Continent  Growth Rate (%)
0                           Africa (UN)         2.585440
1                             Asia (UN)         1.740343
2                           Europe (UN)         0.429377
3  Latin America and the Caribbean (UN)         1.934458
4                 Northern America (UN)         1.189444
5                          Oceania (UN)         1.795341
6                                 World         1.635797
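
The two tables above are linked: the annual rate is the total growth compounded over 2021 - 1950 = 71 years. As a small consistency check, the sketch below recomputes the annual rates from the total growth rates printed above. The function name `annualised` is our own.

```python
# Total growth 1950-2021 in percent, copied from the first output table
total_growth_pct = {
    'Africa (UN)': 512.472394,
    'Asia (UN)': 240.421434,
    'Europe (UN)': 35.554729,
    'Latin America and the Caribbean (UN)': 289.755444,
    'Northern America (UN)': 131.525984,
    'Oceania (UN)': 253.737450,
    'World': 216.457623,
}

# Average annual growth in percent, copied from the second output table
reported_annual_pct = {
    'Africa (UN)': 2.585440,
    'Asia (UN)': 1.740343,
    'Europe (UN)': 0.429377,
    'Latin America and the Caribbean (UN)': 1.934458,
    'Northern America (UN)': 1.189444,
    'Oceania (UN)': 1.795341,
    'World': 1.635797,
}

number_of_years = 2021 - 1950

def annualised(total_pct: float, n_years: int) -> float:
    """Convert total growth (in percent) into a compound annual growth rate (in percent)."""
    return ((1 + total_pct / 100) ** (1 / n_years) - 1) * 100

recomputed = {region: annualised(pct, number_of_years)
              for region, pct in total_growth_pct.items()}

for region, value in recomputed.items():
    print(f'{region}: {value:.6f} (reported: {reported_annual_pct[region]:.6f})')
```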


## **Forecast**

**To run the following cell, prophet must be installed. Run this in your terminal: pip install prophet**


The following figure presents a forecast of the world's population and the continents' populations until 2100. The forecast is produced with Prophet, fitted separately to the historical population series of each continent and of the world.

In [29]:
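# Prepare DataFrame for Prophet: 'ds' must hold dates and 'y' the values to forecast
pop_prophet = filtered_pop.rename(columns={'Year': 'ds', 'Population': 'y'})
pop_prophet['ds'] = pd.to_datetime(pop_prophet['ds'].astype(str), format='%Y')  # 1950 -> 1950-01-01

# Define a function to create forecasts for each continent
def create_forecast(continent):
    # Filter data for the continent and keep only the columns Prophet needs
    continent_data = pop_prophet.loc[pop_prophet['Continents and World'] == continent, ['ds', 'y']]
    
    # Initialize Prophet model
    model = Prophet()
    
    # Fit the model
    model.fit(continent_data)
    
    # Make future DataFrame: one row per year from the last observed year up to 2100
    last_year = int(continent_data['ds'].dt.year.max())
    future = model.make_future_dataframe(periods=2100 - last_year, freq='YS')
    
    # Make forecast
    forecast = model.predict(future)
    
    return forecast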
# Create forecasts for each continent
forecasts = {}
for continent in filtered_pop['Continents and World'].unique():
    forecasts[continent] = create_forecast(continent)

# Plot the forecasts
fig = go.Figure()
for continent, forecast in forecasts.items():
    fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat'], mode='lines', name=continent))

fig.update_layout(title='Population Forecast until 2100',
                  xaxis_title='Year',
                  yaxis_title='Population',
                  legend_title='Continent')

fig.show()




The forecast extrapolates a linear trend from the historical population data.

However, this does not take into account sudden changes that could affect population growth, e.g. wars, pandemics or economic shocks.

A linear model is not the best way to forecast population. A more advanced model could be used.
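
To illustrate why a linear trend can be a poor fit for population data, the sketch below compares a straight-line extrapolation with a constant-growth-rate (exponential) extrapolation. It uses plain NumPy least squares, not Prophet, and the series is synthetic (exactly 2% growth per year), so it only illustrates the difference between the two model shapes. The function names are hypothetical.

```python
import numpy as np

def linear_forecast(years, population, target_year):
    """Fit population = a * t + b by least squares and evaluate at target_year."""
    t = np.asarray(years, dtype=float) - years[0]   # centre the time axis for stability
    slope, intercept = np.polyfit(t, population, deg=1)
    return float(slope * (target_year - years[0]) + intercept)

def exponential_forecast(years, population, target_year):
    """Fit log(population) = a * t + b, i.e. a constant growth rate, and evaluate."""
    t = np.asarray(years, dtype=float) - years[0]
    slope, intercept = np.polyfit(t, np.log(population), deg=1)
    return float(np.exp(slope * (target_year - years[0]) + intercept))

# Synthetic series: 1,000,000 people in 1950 growing by exactly 2% per year
years = np.arange(1950, 2022)
synthetic = 1_000_000 * 1.02 ** (years - 1950)

linear_2100 = linear_forecast(years, synthetic, 2100)
exponential_2100 = exponential_forecast(years, synthetic, 2100)
true_2100 = 1_000_000 * 1.02 ** (2100 - 1950)

print(f'linear:      {linear_2100:,.0f}')
print(f'exponential: {exponential_2100:,.0f}')
print(f'true value:  {true_2100:,.0f}')
```

For a series that grows at a constant rate, the straight line underestimates the value in 2100, while the log-linear fit recovers it. Real populations follow neither shape exactly, so this is only a starting point for choosing a better model.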

# Conclusion

Throughout this project we tried to show the evolution of the population, both for the world as a whole and in detail for each continent.

We started by importing the data and cleaning it so that we only kept the relevant data. From there we plotted it in order to see the global trend of population growth for the world and the continents.

Then, we looked in detail at some specific age groups of the population. We decided to keep only children under the age of 1 and people aged 20 to 29, 50 to 59 and 80 to 89. From there, we can clearly see that in some continents, such as Europe, the population is getting older and fewer children are being born. This could lead to a problem in the upcoming decades, because the population will not renew itself and it may become difficult to maintain economic growth with fewer workers.

After that, we showed in a bar chart the evolution of the age groups through time for all continents. It confirms the observation above: the "average age" increases in some continents, whereas others, like Africa, see their population become younger.
In addition, we plotted the population distribution in order to show the difference in growth between the continents.

Next, we calculated the population growth rates, which are consistent with our previous conclusions.

Finally, we made a forecast of population growth until 2100 to see the global trend of the population.