# Population around the world between 1950 and 2021

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code.
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

In [181]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import display
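# Prophet is optional: it is only needed for the forecast cell further below
try:
    from prophet import Prophet
except ImportError:
    Prophet = None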



# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject




# Read and clean data

We imported data on the population of the world between 1950 and 2021 and kept only the rows for the continents and for the world as a whole.
For the columns, we chose to keep the total population of each region and the population of each non-overlapping age group.

Global data

In [22]:
# Load the population data set
pop = pd.read_csv('population-and-demography.csv')

# Regions to keep: the UN continental aggregates and the world as a whole
continents = ['Africa (UN)', 'Europe (UN)', 'Asia (UN)', 'World', 'Northern America (UN)', 'Latin America and the Caribbean (UN)', 'Oceania (UN)']

filtered_pop = pop[pop['Country name'].isin(continents)]

# Drop cumulative and overlapping age-group columns so that the remaining age groups do not overlap
drop_these = ['Population at age 1','Population of children under the age of 5', 'Population of children under the age of 15', 'Population under the age of 25', 'Population aged 15 to 64 years', 'Population older than 15 years', 'Population older than 18 years', 'population__all__20_24__records', 'population__all__25_29__records', 'population__all__30_34__records', 'population__all__35_39__records', 'population__all__40_44__records', 'population__all__45_49__records', 'population__all__50_54__records', 'population__all__55_59__records', 'population__all__60_64__records', 'population__all__65_69__records', 'population__all__70_74__records', 'population__all__75_79__records', 'population__all__80_84__records', 'population__all__85_89__records', 'population__all__90_94__records', 'population__all__95_99__records' ]

filtered_pop = filtered_pop.drop(drop_these, axis=1)

# Number the rows from 1 (7 regions x 72 years = 504 rows) instead of hard-coding the length
filtered_pop.index = range(1, len(filtered_pop) + 1)

filtered_pop = filtered_pop.rename(columns={'Country name': 'Continents and World'})
filtered_pop

| | Continents and World | Year | Population | Population of children under the age of 1 | Population aged 1 to 4 years | Population aged 5 to 9 years | Population aged 10 to 14 years | Population aged 15 to 19 years | Population aged 20 to 29 years | Population aged 30 to 39 years | Population aged 40 to 49 years | Population aged 50 to 59 years | Population aged 60 to 69 years | Population aged 70 to 79 years | Population aged 80 to 89 years | Population aged 90 to 99 years | Population older than 100 years |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Africa (UN) | 1950 | 227549260 | 9393661.0 | 29672912.0 | 29653958 | 25792740 | 23087212 | 37677212 | 27422540 | 19586988 | 13178286 | 7893063 | 3445317 | 700903 | 43862 | 615.0 |
| 2 | Africa (UN) | 1951 | 232484000 | 9684508.0 | 30318004.0 | 30523140 | 26205760 | 23471334 | 38505124 | 28050804 | 19993920 | 13463632 | 8015908 | 3499692 | 709683 | 41884 | 602.0 |
| 3 | Africa (UN) | 1952 | 237586060 | 9921448.0 | 31046300.0 | 31460048 | 26651612 | 23855340 | 39329870 | 28694808 | 20406358 | 13754924 | 8147429 | 3558314 | 718430 | 40599 | 582.0 |
| 4 | Africa (UN) | 1953 | 242837440 | 10167668.0 | 31877272.0 | 32368278 | 27171188 | 24225864 | 40120050 | 29340532 | 20827816 | 14056554 | 8291982 | 3621039 | 728561 | 40087 | 557.0 |
| 5 | Africa (UN) | 1954 | 248244770 | 10409928.0 | 32848608.0 | 33192114 | 27802460 | 24585024 | 40875776 | 29989164 | 21257012 | 14365597 | 8451841 | 3685246 | 741198 | 40304 | 499.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 500 | World | 2017 | 7599822300 | 139420590.0 | 550940160.0 | 668653200 | 631596500 | 607785800 | 1207877900 | 1092995100 | 953971900 | 772621950 | 549785660 | 285532670 | 119566930 | 18615136 | 459150.0 |
| 501 | World | 2018 | 7683790000 | 137690030.0 | 550970400.0 | 674324740 | 638451460 | 611398340 | 1204281300 | 1114043900 | 958684500 | 788847800 | 566538940 | 295304100 | 123199544 | 19578904 | 475921.0 |
| 502 | World | 2019 | 7764951000 | 135471330.0 | 549401300.0 | 678417150 | 645915800 | 615536000 | 1200389900 | 1133424100 | 963090600 | 809066200 | 579960400 | 306498750 | 126700504 | 20575482 | 503572.0 |
| 503 | World | 2020 | 7840953000 | 133345180.0 | 545801000.0 | 681410500 | 653182000 | 619493500 | 1196254600 | 1151358000 | 969082800 | 830432400 | 590299300 | 318539100 | 129707720 | 21499228 | 547543.0 |
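The cleaning step assumes that every region covers every year from 1950 to 2021 exactly once (72 rows per region, 504 in total). A quick check of the year range and row count per region can catch a missing or duplicated year before it distorts the plots. A minimal sketch, assuming the cleaned `filtered_pop` layout; `coverage_by_region` and `has_duplicate_region_years` are hypothetical helper names:

```python
import pandas as pd


def coverage_by_region(df: pd.DataFrame, region_col: str = 'Continents and World') -> pd.DataFrame:
    """First year, last year and number of rows for each region."""
    return df.groupby(region_col)['Year'].agg(['min', 'max', 'count'])


def has_duplicate_region_years(df: pd.DataFrame, region_col: str = 'Continents and World') -> bool:
    """True if some region appears more than once in the same year."""
    return bool(df.duplicated([region_col, 'Year']).any())


# Illustrative input with made-up values (not the project data)
example = pd.DataFrame({
    'Continents and World': ['Africa (UN)', 'Africa (UN)', 'World', 'World'],
    'Year': [1950, 1951, 1950, 1951],
    'Population': [1, 2, 3, 4],
})
print(coverage_by_region(example))
print(has_duplicate_region_years(example))
```

On `filtered_pop`, every region should then show `min` 1950, `max` 2021 and `count` 72, and `has_duplicate_region_years(filtered_pop)` should be `False`.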


Female and male population data

In [40]:
female_data = pd.read_csv('population-and-demography female data.csv')
male_data = pd.read_csv('population-and-demography for male.csv')

# Match rows on both region and year: merging on 'Year' alone would pair every region with every other region
merged_data = pd.merge(female_data, male_data, on=['Country name', 'Year'], how='inner')

selected_data = merged_data[['Country name', 'Year', 'Female population', 'Male population']]

continents = ['Africa (UN)', 'Europe (UN)', 'Asia (UN)', 'World', 'Northern America (UN)', 'Latin America and the Caribbean (UN)', 'Oceania (UN)']

# .copy() gives an independent DataFrame, which avoids the SettingWithCopyWarning when renaming
filtered_selected_data = selected_data[selected_data['Country name'].isin(continents)].copy()

filtered_selected_data = filtered_selected_data.rename(columns={'Country name': 'Continents and World'})

filtered_selected_data = filtered_selected_data.set_index('Continents and World')


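Why the merge keys matter: when the same year appears once per region in both files, merging on `'Year'` alone matches every female row with every male row of that year. The sketch below shows this with two made-up regions; the `validate='one_to_one'` argument of `pd.merge` makes pandas raise a `MergeError` if a key combination is unexpectedly duplicated.

```python
import pandas as pd

# Two tiny illustrative tables (made-up numbers, not the project data)
female = pd.DataFrame({'Country name': ['A', 'B'], 'Year': [2000, 2000], 'Female population': [10, 20]})
male = pd.DataFrame({'Country name': ['A', 'B'], 'Year': [2000, 2000], 'Male population': [11, 19]})

# Merging on 'Year' only pairs every region with every region: 2 x 2 = 4 rows
year_only = pd.merge(female, male, on='Year', how='inner')

# Merging on both keys gives one row per region and year: 2 rows
both_keys = pd.merge(female, male, on=['Country name', 'Year'], how='inner')

# validate='one_to_one' raises a MergeError if a (region, year) pair is duplicated on either side
checked = pd.merge(female, male, on=['Country name', 'Year'], how='inner', validate='one_to_one')

print(len(year_only), len(both_keys))
```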


## **Plots**

We plot the data to make it easier to read and to compare the continents.

## 1: Total population

## 1.1: Global population

In [183]:
# Selecting only the desired columns
selected_columns = filtered_pop[['Continents and World', 'Year', 'Population']]

# Create an interactive line plot using Plotly Express (values are numbers of people, not billions)
fig = px.line(selected_columns, x='Year', y='Population', color='Continents and World',
              title='Population growth in the world and in each continent between 1950 and 2021',
              labels={'Population': 'Population'})

# Show the interactive plot
fig.show()


## 1.2: Male and female population
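The female and male series prepared in `filtered_selected_data` can be summarised by the sex ratio, i.e. the number of males per 100 females. A minimal sketch; `males_per_100_females` is a hypothetical helper and the example numbers are made up:

```python
import pandas as pd


def males_per_100_females(df: pd.DataFrame) -> pd.Series:
    """Number of males per 100 females for every row of `df`.

    Assumes 'Female population' and 'Male population' columns,
    as in `filtered_selected_data`.
    """
    return df['Male population'] / df['Female population'] * 100


# Illustrative input with made-up values (not the project data)
example = pd.DataFrame(
    {'Year': [2000, 2000], 'Female population': [200, 500], 'Male population': [210, 480]},
    index=pd.Index(['Region A', 'Region B'], name='Continents and World'),
)
example['Males per 100 females'] = males_per_100_females(example)
print(example)
```

After adding the column to `filtered_selected_data` in the same way, `reset_index()` turns 'Continents and World' back into a column, so the result can be plotted with `px.line(..., x='Year', y='Males per 100 females', color='Continents and World')`.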

## 2: Population by age group

## 2.1: Population of children under the age of 1

In [184]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population of children under the age of 1'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population of children under the age of 1 in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()


## 2.2: Population aged 1 to 4 years

In [185]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 1 to 4 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 1 to 4 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()


## 2.3: Population aged 5 to 9 years

In [186]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 5 to 9 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 5 to 9 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.4: Population aged 10 to 14 years

In [187]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 10 to 14 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 10 to 14 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.5: Population aged 15 to 19 years

In [188]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 15 to 19 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 15 to 19 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.6: Population aged 20 to 29 years

In [189]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 20 to 29 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 20 to 29 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.7: Population aged 30 to 39 years 

In [190]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 30 to 39 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 30 to 39 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.8: Population aged 40 to 49 years

In [191]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 40 to 49 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 40 to 49 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.9: Population aged 50 to 59 years

In [192]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 50 to 59 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 50 to 59 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.10: Population aged 60 to 69 years

In [193]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 60 to 69 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 60 to 69 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.11: Population aged 70 to 79 years

In [194]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 70 to 79 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 70 to 79 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.12: Population aged 80 to 89 years

In [195]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 80 to 89 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 80 to 89 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.13: Population aged 90 to 99 years

In [196]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population aged 90 to 99 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population aged 90 to 99 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()

## 2.14: Population older than 100 years

In [197]:
# Exclude the 'World' aggregate so that only the continents are compared
continents_data = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Age-group column to plot
column_to_plot = 'Population older than 100 years'

# Create an interactive line plot with one curve per continent using Plotly Express
fig = px.line(continents_data, x='Year', y=column_to_plot, color='Continents and World',
              title='Population older than 100 years in each continent between 1950 and 2021',
              labels={column_to_plot: 'Population', 'Continents and World': 'Continent'})

# Show the interactive line plot
fig.show()
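Sections 2.1 to 2.14 repeat the same cell with a different column. In the spirit of the reusable-function note at the top, the repeated part can be factored out. A minimal sketch; `age_group_wide` is a hypothetical helper that reshapes one age-group column into one column per continent, leaving out the 'World' rows:

```python
import pandas as pd


def age_group_wide(df: pd.DataFrame, column: str,
                   region_col: str = 'Continents and World') -> pd.DataFrame:
    """Reshape one age-group column to a Year x continent table, leaving out 'World'."""
    # Keep the continents only, as in sections 2.1 to 2.14
    continents_only = df[df[region_col] != 'World']
    # One row per year, one column per continent
    return continents_only.pivot(index='Year', columns=region_col, values=column)


# Illustrative input with made-up values (not the project data)
example = pd.DataFrame({
    'Continents and World': ['Africa (UN)', 'World', 'Africa (UN)', 'World'],
    'Year': [1950, 1950, 1951, 1951],
    'Population aged 1 to 4 years': [1, 5, 2, 6],
})
print(age_group_wide(example, 'Population aged 1 to 4 years'))
```

Plotly Express also accepts wide-form tables like this one (in recent versions), so looping over `filtered_pop.columns[3:]` and calling `px.line(age_group_wide(filtered_pop, col), title=col).show()` for each column should draw all fourteen charts.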

## 3: Population distribution by year

In [198]:
# Sorted list of the years available in the data
years = sorted(filtered_pop['Year'].unique())

# Output area, so that each new selection replaces the previous chart instead of stacking below it
pie_output = widgets.Output()

# Define a function to create the pie chart for the selected year
def create_pie_chart(selected_year):
    # Keep the selected year and exclude the 'World' aggregate (one row per continent remains)
    continent_data = filtered_pop[(filtered_pop['Year'] == selected_year)
                                  & (filtered_pop['Continents and World'] != 'World')]

    # Create a pie chart of the continents' populations
    fig = go.Figure(data=[go.Pie(labels=continent_data['Continents and World'],
                                 values=continent_data['Population'])])
    fig.update_layout(title=f'Population distribution by continent in {selected_year}')

    # Redraw inside the output area
    with pie_output:
        pie_output.clear_output(wait=True)
        fig.show()

# Create a dropdown menu widget for selecting the year
year_dropdown = widgets.Dropdown(options=years, value=years[0], description='Select Year:')

# Redraw the pie chart whenever a new year is selected
def on_year_change(change):
    create_pie_chart(change.new)

year_dropdown.observe(on_year_change, names='value')

# Display the dropdown menu and the initial pie chart for the first year in the dataset
display(year_dropdown, pie_output)
create_pie_chart(years[0])

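The pie chart lets Plotly turn the continental populations into percentages. The same shares can be tabulated for every year at once, and comparing the sum of the continents with the 'World' row is a cheap consistency check on the regional aggregates. A minimal sketch; both helper names are hypothetical and assume the `filtered_pop` layout:

```python
import pandas as pd


def continent_shares(df: pd.DataFrame, region_col: str = 'Continents and World') -> pd.DataFrame:
    """Each continent's share (%) of the summed continental population, one row per year."""
    continents_only = df[df[region_col] != 'World']
    wide = continents_only.pivot(index='Year', columns=region_col, values='Population')
    # Divide each row by its total so that every row sums to 100
    return wide.div(wide.sum(axis=1), axis=0) * 100


def continents_vs_world(df: pd.DataFrame, region_col: str = 'Continents and World') -> pd.Series:
    """Ratio of the summed continents to the 'World' row per year (1.0 means they match)."""
    continents_total = df[df[region_col] != 'World'].groupby('Year')['Population'].sum()
    world = df[df[region_col] == 'World'].set_index('Year')['Population']
    return continents_total / world


# Illustrative input with made-up values (not the project data)
example = pd.DataFrame({
    'Continents and World': ['A', 'B', 'World', 'A', 'B', 'World'],
    'Year': [2000, 2000, 2000, 2001, 2001, 2001],
    'Population': [30, 70, 100, 40, 60, 110],
})
print(continent_shares(example))
print(continents_vs_world(example))
```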

In [42]:
# Exclude the 'World' aggregate so that only the continents are compared
filtered_pop_without_World = filtered_pop[filtered_pop['Continents and World'] != 'World']

# Years available, and the age-group columns (everything after 'Continents and World', 'Year' and 'Population')
years = sorted(filtered_pop_without_World['Year'].unique())
age_groups = filtered_pop_without_World.columns[3:]

# Output area, so that each new selection replaces the previous chart
bar_output = widgets.Output()

# Function to create and display the stacked bar plot
def create_stacked_bar_plot(selected_year):
    # One row per continent, one column per age group, for the selected year
    year_data = (filtered_pop_without_World[filtered_pop_without_World['Year'] == selected_year]
                 .set_index('Continents and World')[age_groups])

    # Share (%) of each age group in the continent's population (the sum of all age groups)
    shares = year_data.div(year_data.sum(axis=1), axis=0) * 100

    # Add one bar trace per age group
    fig = go.Figure()
    for age_group in age_groups:
        fig.add_trace(go.Bar(x=shares.index, y=shares[age_group], name=age_group))

    # Stack the bar traces on top of each other
    fig.update_layout(
        barmode='stack',
        title_text=f'Population distribution by age group in {selected_year}',
        xaxis_title='Continent',
        yaxis_title='Percentage of total population'
    )

    # Redraw inside the output area
    with bar_output:
        bar_output.clear_output(wait=True)
        fig.show()

# Dropdown widget for year selection
year_dropdown = widgets.Dropdown(options=years, value=years[0], description='Year:')

# Update the plot whenever a new year is selected
def update_plot(change):
    create_stacked_bar_plot(change.new)

year_dropdown.observe(update_plot, names='value')

# Display the widget and the initial plot
display(widgets.VBox([year_dropdown, bar_output]))
create_stacked_bar_plot(years[0])

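The percentages in the stacked bar plot use the sum of the age groups as each continent's total. That is only a sound denominator if the kept age groups add up to the reported `Population`. For Africa in 1950, the age-group values in the cleaned table above sum to 227,549,269 against a reported 227,549,260. A minimal sketch of a check over all rows; `age_group_gap` is a hypothetical helper:

```python
import pandas as pd


def age_group_gap(df: pd.DataFrame, age_cols, total_col: str = 'Population') -> pd.Series:
    """Difference between the summed age groups and the reported total, in % of the total."""
    summed = df[list(age_cols)].sum(axis=1)
    return (summed - df[total_col]) / df[total_col] * 100


# Illustrative input with made-up values: the second row's age groups miss 2 people
example = pd.DataFrame({'Population': [100, 200], 'a': [40, 100], 'b': [60, 98]})
print(age_group_gap(example, ['a', 'b']))
```

On the project data, `age_group_gap(filtered_pop, filtered_pop.columns[3:]).abs().max()` gives the largest relative gap over all regions and years.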

## **Calculations**

Total population growth (in percent) between 1950 and 2021 for the world and each continent
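Writing $P_{1950}$ and $P_{2021}$ for a region's population in the first and last year (notation introduced here for illustration), the cell below computes the total, or cumulative, growth over the whole period, not an annual rate:

$$
G = \frac{P_{2021}}{P_{1950}} - 1
$$

For example, Africa's 512.47 % means $P_{2021}/P_{1950} \approx 6.12$, so its population grew about six-fold, while Europe's 35.55 % corresponds to a factor of about 1.36.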

In [199]:

# Specify the beginning and ending years
beginning_year = 1950
ending_year = 2021

# Dictionary collecting the results; it is converted to a DataFrame below
growth_rates = {'Continent': [], 'Growth Rate (%)': []}

# Calculate the total growth for each region
for continent in filtered_pop['Continents and World'].unique():
    # Filter the DataFrame to select data for the beginning and ending years for the continent
    continent_data = filtered_pop[filtered_pop['Continents and World'] == continent]
    beginning_population = continent_data.loc[continent_data['Year'] == beginning_year, 'Population'].iloc[0]
    ending_population = continent_data.loc[continent_data['Year'] == ending_year, 'Population'].iloc[0]

    # Total (cumulative) growth over the whole period; this is not an annual rate
    total_growth = (ending_population / beginning_population) - 1

    # Convert the growth rate to percentage
    growth_rate_percentage = total_growth * 100

    # Append the growth rate to the growth_rates dictionary
    growth_rates['Continent'].append(continent)
    growth_rates['Growth Rate (%)'].append(growth_rate_percentage)

# Convert the growth rates dictionary to a DataFrame
growth_rates_df = pd.DataFrame(growth_rates)

# Display the growth rates DataFrame
print(growth_rates_df)


                              Continent  Growth Rate (%)
0                           Africa (UN)       512.472394
1                             Asia (UN)       240.421434
2                           Europe (UN)        35.554729
3  Latin America and the Caribbean (UN)       289.755444
4                 Northern America (UN)       131.525984
5                          Oceania (UN)       253.737450
6                                 World       216.457623


Average annual (compound) growth rate between 1950 and 2021 for the world and each continent
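Writing $P_{1950}$ and $P_{2021}$ for a region's population in the first and last year, the average annual growth rate $g$ computed below is the constant yearly rate that turns $P_{1950}$ into $P_{2021}$ over $2021 - 1950 = 71$ years:

$$
P_{2021} = P_{1950}\,(1+g)^{71}
\quad\Longrightarrow\quad
g = \left(\frac{P_{2021}}{P_{1950}}\right)^{1/71} - 1
$$

As a check for the world: the total growth of 216.457623 % printed above gives $P_{2021}/P_{1950} = 3.16457623$, and $3.16457623^{1/71} - 1 \approx 0.016358$, about 1.636 % per year, in line with the 1.635797 % printed below.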

In [200]:
# Specify the beginning and ending years
beginning_year = 1950
ending_year = 2021

# Dictionary collecting the results; it is converted to a DataFrame below
growth_rates = {'Continent': [], 'Growth Rate (%)': []}

# Calculate the growth rate for each continent
for continent in filtered_pop['Continents and World'].unique():
    # Filter the DataFrame to select data for the beginning and ending years for the continent
    continent_data = filtered_pop[filtered_pop['Continents and World'] == continent]
    beginning_population = continent_data.loc[continent_data['Year'] == beginning_year, 'Population'].iloc[0]
    ending_population = continent_data.loc[continent_data['Year'] == ending_year, 'Population'].iloc[0]

    # Calculate the number of years between the beginning and ending years
    number_of_years = ending_year - beginning_year

    # Calculate the compound annual growth rate (CAGR)
    cagr = (ending_population / beginning_population) ** (1 / number_of_years) - 1

    # Convert the growth rate to percentage
    growth_rate_percentage = cagr * 100

    # Append the growth rate to the growth_rates dictionary
    growth_rates['Continent'].append(continent)
    growth_rates['Growth Rate (%)'].append(growth_rate_percentage)


# Convert the growth rates dictionary to a DataFrame
growth_rates_df = pd.DataFrame(growth_rates)

# Display the growth rates DataFrame
print(growth_rates_df)


                              Continent  Growth Rate (%)
0                           Africa (UN)         2.585440
1                             Asia (UN)         1.740343
2                           Europe (UN)         0.429377
3  Latin America and the Caribbean (UN)         1.934458
4                 Northern America (UN)         1.189444
5                          Oceania (UN)         1.795341
6                                 World         1.635797
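The two loops above differ only in the final formula, so both rates can also be computed in one vectorised step by pivoting to one column per year. A minimal sketch; `growth_table` is a hypothetical helper that assumes one row per region and year, as in `filtered_pop`:

```python
import pandas as pd


def growth_table(df: pd.DataFrame, start: int, end: int,
                 region_col: str = 'Continents and World') -> pd.DataFrame:
    """Total growth (%) and compound annual growth rate (%) between `start` and `end` per region."""
    # One row per region, one column per year
    wide = df.pivot(index=region_col, columns='Year', values='Population')
    ratio = wide[end] / wide[start]
    n_years = end - start
    return pd.DataFrame({
        'Total growth (%)': (ratio - 1) * 100,
        'CAGR (%)': (ratio ** (1 / n_years) - 1) * 100,
    })


# Illustrative input with made-up values: A grows by 10 % per year for two years, B is flat
example = pd.DataFrame({
    'Continents and World': ['A', 'A', 'B', 'B'],
    'Year': [2000, 2002, 2000, 2002],
    'Population': [100, 121, 50, 50],
})
print(growth_table(example, 2000, 2002))
```

`growth_table(filtered_pop, 1950, 2021)` should reproduce both columns printed above.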


**To run the following cell, you need to install Prophet first, for example from a terminal: `pip install prophet`**

**If you do not want to install it, skip this cell. It forecasts the population of the world and of each continent until 2050.**

In [201]:
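# Prophet is only needed for this cell (install it with: pip install prophet)
from prophet import Prophet

# Last year of the forecast (matches the text above and the plot title)
forecast_end = 2050

# Prepare the data for Prophet: 'ds' holds the dates and 'y' the values to forecast.
# Each integer year is converted explicitly to 1 January of that year.
pop_prophet = filtered_pop[['Continents and World', 'Year', 'Population']].copy()
pop_prophet['ds'] = pd.to_datetime(pop_prophet['Year'].astype(str), format='%Y')
pop_prophet['y'] = pop_prophet['Population']

# Define a function to create the forecast for one region
def create_forecast(continent):
    # Keep only the dates and values for this region
    continent_data = pop_prophet.loc[pop_prophet['Continents and World'] == continent, ['ds', 'y']]

    # With one observation per year there is no within-year pattern to learn, so seasonality is switched off
    model = Prophet(yearly_seasonality=False, weekly_seasonality=False, daily_seasonality=False)

    # Fit the model
    model.fit(continent_data)

    # Add one date per year (1 January, like the history) up to forecast_end
    last_year = int(continent_data['ds'].dt.year.max())
    future = model.make_future_dataframe(periods=forecast_end - last_year, freq='YS')

    # Forecast for the history and the future dates
    return model.predict(future)

# Create forecasts for each region
forecasts = {}
for continent in filtered_pop['Continents and World'].unique():
    forecasts[continent] = create_forecast(continent)

# Plot the forecasts
fig = go.Figure()
for continent, forecast in forecasts.items():
    fig.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat'], mode='lines', name=continent))

fig.update_layout(title=f'Population forecast until {forecast_end}',
                  xaxis_title='Year',
                  yaxis_title='Population',
                  legend_title='Continent')

fig.show()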

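The `forecasts` dictionary built above holds one Prophet forecast table per region, with the columns `ds`, `yhat`, `yhat_lower` and `yhat_upper`. To compare regions in a single year, for example 2050, the matching rows can be collected into one table. A minimal sketch; `forecast_for_year` is a hypothetical helper, and the example data are made up:

```python
import pandas as pd


def forecast_for_year(forecasts: dict, year: int) -> pd.DataFrame:
    """Point forecast and uncertainty interval for `year`, one row per region.

    Expects each value of `forecasts` to be a Prophet forecast table with a
    datetime 'ds' column and 'yhat', 'yhat_lower', 'yhat_upper' columns.
    Raises IndexError if a region's table has no date in `year`.
    """
    rows = {}
    for region, forecast in forecasts.items():
        # First row whose date falls in the requested year
        in_year = forecast[forecast['ds'].dt.year == year]
        rows[region] = in_year[['yhat', 'yhat_lower', 'yhat_upper']].iloc[0]
    # Regions become the rows of the result
    return pd.DataFrame(rows).T


# Illustrative forecast table with made-up values (not Prophet output)
example = {
    'World': pd.DataFrame({
        'ds': pd.to_datetime(['2049-01-01', '2050-01-01']),
        'yhat': [1.0, 2.0],
        'yhat_lower': [0.5, 1.5],
        'yhat_upper': [1.5, 2.5],
    })
}
print(forecast_for_year(example, 2050))
```

On the project data, `forecast_for_year(forecasts, 2050)` would list the 2050 forecast for every region next to its interval.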

# Merge data sets

Below, we combine the loaded data sets. Recall the illustration of an (inner) **merge**:

In [202]:
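# Venn diagrams need the matplotlib-venn package (pip install matplotlib-venn)
from matplotlib_venn import venn2

plt.figure(figsize=(15, 7))
v = venn2(subsets=(4, 4, 10), set_labels=('Data X', 'Data Y'))
# Label the three regions: only in X, only in Y, and in both
v.get_label_by_id('100').set_text('dropped')
v.get_label_by_id('010').set_text('dropped')
v.get_label_by_id('110').set_text('included')
plt.show()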


Here, the inner merge drops the elements that appear in only one of the data sets X and Y. A left join would instead keep all observations in X and add only the matching observations from Y.

Make sure that your resulting data sets have the correct number of rows and columns. That is, be clear about which observations are thrown away. 
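One way to see exactly which observations an inner merge throws away is an outer merge with `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only` or `both`. A minimal sketch with two made-up tables X and Y:

```python
import pandas as pd

# Illustrative tables: key 'k' takes the values 1 to 3 in X and 2 to 4 in Y (made-up data)
x = pd.DataFrame({'k': [1, 2, 3], 'x_val': ['a', 'b', 'c']})
y = pd.DataFrame({'k': [2, 3, 4], 'y_val': ['B', 'C', 'D']})

# An outer merge keeps everything; indicator=True records where each row came from
audit = pd.merge(x, y, on='k', how='outer', indicator=True)
print(audit['_merge'].value_counts())

# Keys an inner merge would drop, from each side
dropped_from_x = audit.loc[audit['_merge'] == 'left_only', 'k'].tolist()
dropped_from_y = audit.loc[audit['_merge'] == 'right_only', 'k'].tolist()
print(dropped_from_x, dropped_from_y)
```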

**Note:** Don't make Venn diagrams in your own data project. It is just for exposition. 

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 
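One possible aggregation is the mean, minimum and maximum population of each region over 1950 to 2021. A minimal sketch, assuming the `filtered_pop` layout; `population_summary` is a hypothetical helper that reports the figures in millions:

```python
import pandas as pd


def population_summary(df: pd.DataFrame, region_col: str = 'Continents and World') -> pd.DataFrame:
    """Mean, minimum and maximum population per region, in millions (rounded to 0.1)."""
    return (df.groupby(region_col)['Population']
              .agg(['mean', 'min', 'max'])
              .div(1e6)
              .round(1))


# Illustrative input with made-up values (not the project data)
example = pd.DataFrame({
    'Continents and World': ['A', 'A', 'B'],
    'Year': [2000, 2001, 2000],
    'Population': [100e6, 300e6, 50e6],
})
print(population_summary(example))
```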

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.