# Semester Project: Data Visualization
The data is prepared and processed in this Jupyter notebook. From these data sources, which I load into pandas DataFrames, I build Plotly charts, among other things. I use these charts for:

   - A presentation in the storytelling tool Prezi (http://prezi.com)
   - A ``Shiny for Python`` dashboard (can be tested independently using this notebook)

This forms the basis of my semester project for the data visualization course. For the statistics part, I will use a completely new dataset (World Values Survey Wave 7, 2017-2021) to apply the four required methods.

# Data: Our World in Data (OWID)
API documentation: https://docs.owid.io/projects/etl/api/#owid-catalog

## Inspiration and review of possible sources
* **Datapoints used to train:** https://ourworldindata.org/grapher/artificial-intelligence-number-training-datapoints
----
* **Annual patent applications (not detailed on technology):** https://ourworldindata.org/grapher/annual-patent-applications
* **Annual granted patents related to AI:** https://ourworldindata.org/grapher/artificial-intelligence-granted-patents-by-industry
* **Annual working hours:** https://ourworldindata.org/grapher/annual-working-hours-per-worker
* **Annual articles published in scientific and technical journals:** https://ourworldindata.org/grapher/scientific-publications-per-million
* **Research and development spending as a share of GDP:** https://ourworldindata.org/grapher/research-spending-gdp
* **Researchers in R&D per million people vs. GDP per capita:** https://ourworldindata.org/grapher/researchers-in-rd-per-million-people-vs-gdp-pc
----
* **Population:** https://ourworldindata.org/grapher/population
* **Population with UN projections 2100:** https://ourworldindata.org/grapher/population-with-un-projections
* **Median age:** https://ourworldindata.org/grapher/median-age
* **Female population by age:** https://ourworldindata.org/grapher/female-population-by-age-group
* **Male population by age:** https://ourworldindata.org/grapher/male-population-by-age-group
* **Population young, working-age, elderly:** https://ourworldindata.org/grapher/population-young-working-elderly-with-projections
----
* **Fertility rate:** https://ourworldindata.org/grapher/children-per-woman-un
* **Population by age group:** https://ourworldindata.org/grapher/population-by-age-group-with-projections
* **Age dependency breakdown by young and old:** https://ourworldindata.org/grapher/age-dependency-breakdown
----
* **Public health expenditure as a share of GDP:** https://ourworldindata.org/grapher/public-health-expenditure-share-gdp

## Humanity

In [1]:
from owid.catalog import charts
import plotly.express as px
import pandas as pd
import plotly.graph_objects as go

population_df = charts.get_data("population")

In [2]:
population_df.head()

Unnamed: 0,entities,years,population
0,Afghanistan,-10000,14737
1,Afghanistan,-9000,20405
2,Afghanistan,-8000,28253
3,Afghanistan,-7000,39120
4,Afghanistan,-6000,54166


In [3]:
population_df["entities"].unique()

array(['Afghanistan', 'Africa', 'Africa (UN)', 'Akrotiri and Dhekelia',
       'Albania', 'Algeria', 'American Samoa', 'Americas (UN)', 'Andorra',
       'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina',
       'Armenia', 'Aruba', 'Asia', 'Asia (UN)',
       'Asia (excl. China and India)', 'Australia', 'Austria',
       'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados',
       'Belarus', 'Belgium', 'Belize', 'Benin', 'Bermuda', 'Bhutan',
       'Bolivia', 'Bonaire Sint Eustatius and Saba',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
       'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
       'Cayman Islands', 'Central African Republic', 'Chad', 'Chile',
       'China', 'Colombia', 'Comoros', 'Congo', 'Cook Islands',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao',
       'Cyprus', 'Czechia', 'Czechoslovakia',
       'Democratic Republic of Congo', 'Denmar

In [4]:
population_df

Unnamed: 0,entities,years,population
0,Afghanistan,-10000,14737
1,Afghanistan,-9000,20405
2,Afghanistan,-8000,28253
3,Afghanistan,-7000,39120
4,Afghanistan,-6000,54166
...,...,...,...
59172,Zimbabwe,2019,15271330
59173,Zimbabwe,2020,15526837
59174,Zimbabwe,2021,15797165
59175,Zimbabwe,2022,16069010


In [5]:
weltbevoelkerung = population_df[population_df['entities'] == 'World'].copy()

In [6]:
weltbevoelkerung

Unnamed: 0,entities,years,population
58145,World,-10000,4501152
58146,World,-9000,5687125
58147,World,-8000,7314623
58148,World,-7000,9651703
58149,World,-6000,13278309
...,...,...,...
58266,World,2019,7811293646
58267,World,2020,7887001253
58268,World,2021,7954448327
58269,World,2022,8021407128


In [7]:
# Convert to billions
weltbevoelkerung['population_billions'] = weltbevoelkerung['population'] / 1_000_000_000

# Create the Plotly chart with a dark theme
fig = px.line(weltbevoelkerung, 
              x='years', 
              y='population_billions',
              title='Weltbevölkerung über die Zeit',
              template='plotly_dark',
              color_discrete_sequence=px.colors.sequential.Sunsetdark)

# Adjust axis labels and layout
fig.update_layout(
    xaxis_title="Jahr",
    yaxis_title="Bevölkerung (Milliarden)",
    xaxis=dict(
        range=[-10000, 2023],
        tickvals=[-10000, -4000, 0, 2023],  # Specific years on the x-axis
        ticktext=['-10000', '-4000', '0', '2023']  # Labels for these years
    )
)

# Add annotations for the specific years
for year in [-10000, -4000, 0, 2023]:
    year_data = weltbevoelkerung[weltbevoelkerung['years'] == year]
    if not year_data.empty:
        pop_value = year_data.iloc[0]['population_billions']
        fig.add_annotation(
            x=year,
            y=pop_value,
            text=f"{pop_value:.3f}Mrd",
            showarrow=True,
            arrowhead=1,
            yshift=-2,
            font=dict(color='white')
        )

# Show the chart
fig.show()
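
The annotation loop above silently skips any year that has no exact match in `years`. A minimal sketch of a fallback that picks the closest available year instead. The helper name `nearest_year_row` is hypothetical and not part of any library; the sample values are rounded from the outputs shown above.

```python
import pandas as pd


def nearest_year_row(df: pd.DataFrame, year: int, year_col: str = "years") -> pd.Series:
    """Return the row whose year is closest to the requested one (hypothetical helper)."""
    distances = (df[year_col] - year).abs()
    return df.loc[distances.idxmin()]


# Small sample in the same shape as `weltbevoelkerung` (values rounded from the outputs above)
sample = pd.DataFrame({
    "years": [-10000, -9000, -8000, 2021, 2022],
    "population_billions": [0.0045, 0.0057, 0.0073, 7.954, 8.021],
})

# 2023 is not in the sample, so the closest year is used instead
row = nearest_year_row(sample, 2023)
print(int(row["years"]), row["population_billions"])
```

Whether a nearest-year label is acceptable depends on the chart; for a presentation it may be better to print the year that was actually used next to the value.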

In [8]:
china = population_df[population_df['entities'] == 'China'].copy()

In [9]:
china[china['years'] == 2023]

Unnamed: 0,entities,years,population
10915,China,2023,1422584878


## Population projection: UN medium scenario

In [None]:
population_projections = charts.get_data("population-with-un-projections")

In [None]:
population_projections[(population_projections['entities'] == 'China') & (population_projections['years'] == 2023)]

Unnamed: 0,entities,years,population__sex_all__age_all__variant_estimates,population__sex_all__age_all__variant_medium
6868,China,2023,1422585000.0,


In [12]:
population_projections.head()

Unnamed: 0,entities,years,population__sex_all__age_all__variant_estimates,population__sex_all__age_all__variant_medium
0,Afghanistan,1950,7776133.0,
1,Afghanistan,1951,7879295.0,
2,Afghanistan,1952,7987737.0,
3,Afghanistan,1953,8096656.0,
4,Afghanistan,1954,8207910.0,


In [13]:
population_projections["entities"].unique()

array(['Afghanistan', 'Africa (UN)', 'Albania', 'Algeria',
       'American Samoa', 'Americas (UN)', 'Andorra', 'Angola', 'Anguilla',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
       'Asia (UN)', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas',
       'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
       'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
       'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina',
       'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Cape Verde', 'Cayman Islands',
       'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia',
       'Comoros', 'Congo', 'Cook Islands', 'Costa Rica', "Cote d'Ivoire",
       'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czechia',
       'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica',
       'Dominican Republic', 'East Timor', 'Ecuador', 'Egypt',
       'El Salva

In [14]:
population_projections_world = population_projections[population_projections['entities'] == 'World'].copy()
population_projections_world.head()

Unnamed: 0,entities,years,population__sex_all__age_all__variant_estimates,population__sex_all__age_all__variant_medium
38052,World,1950,2493093000.0,
38053,World,1951,2536927000.0,
38054,World,1952,2584086000.0,
38055,World,1953,2634106000.0,
38056,World,1954,2685895000.0,


In [15]:
# Convert to billions
population_projections_world['population_billions'] = round(population_projections_world['population__sex_all__age_all__variant_estimates'] / 1_000_000_000, 2)
population_projections_world.head()

Unnamed: 0,entities,years,population__sex_all__age_all__variant_estimates,population__sex_all__age_all__variant_medium,population_billions
38052,World,1950,2493093000.0,,2.49
38053,World,1951,2536927000.0,,2.54
38054,World,1952,2584086000.0,,2.58
38055,World,1953,2634106000.0,,2.63
38056,World,1954,2685895000.0,,2.69


In [16]:
import plotly.graph_objects as go

# Prepare the data as before
historical_data = population_projections_world[population_projections_world['years'] <= 2024].copy()
historical_data['population_billions'] = historical_data['population__sex_all__age_all__variant_estimates'] / 1_000_000_000

forecast_data = population_projections_world[population_projections_world['years'] >= 2024].copy()
forecast_data['population_billions'] = forecast_data['population__sex_all__age_all__variant_medium'] / 1_000_000_000

# Build the figure
fig = go.Figure()

# Historical data (strong green)
fig.add_trace(go.Scatter(
    x=historical_data['years'],
    y=historical_data['population_billions'],
    name='Historische Daten',
    line=dict(color='#2ecc71', width=3)
))

# Forecast data (pale, dashed green)
fig.add_trace(go.Scatter(
    x=forecast_data['years'],
    y=forecast_data['population_billions'],
    name='Prognose',
    line=dict(color='rgba(46, 204, 113, 0.3)', width=2, dash='dash')
))

# Adjust the layout
fig.update_layout(
    title={
        'text': 'Weltbevölkerung: Historie und Prognose',
        'y': 0.95,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': {'size': 24}
    },
    xaxis_title="Jahr",
    yaxis_title="Bevölkerung (Milliarden)",
    template='plotly_dark',
    showlegend=True,
    legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="left",
        x=0.01
    ),
    height=800,
    width=1200,
    xaxis=dict(
        range=[1950, 2100],
        dtick=25,
        ticks='outside',        # Tick marks point outwards
        ticklen=10,            # Length of the tick marks
        tickfont=dict(size=12),
        tickangle=0,           # No rotation of the year labels
        title_standoff=25,     # More space between axis title and ticks
        side='bottom'          # Axis positioned at the bottom
    ),
    yaxis=dict(
        range=[2, 11],
        dtick=1,
        gridwidth=0.5,
    ),
    plot_bgcolor='rgba(0,0,0,0.95)',
    paper_bgcolor='rgba(0,0,0,0.95)',
)

# Specific years to annotate
years_to_annotate = [1950, 2000, 2023]
y_shifts = [20, -30, 30]  # Different vertical shifts for better readability
x_shifts = [50, 50, 50]   # Horizontal shift that moves each label to the right of its data point

for year, y_shift, x_shift in zip(years_to_annotate, y_shifts, x_shifts):
    data_point = historical_data[historical_data['years'] == year]
    if not data_point.empty:
        fig.add_annotation(
            x=year,
            y=data_point['population_billions'].iloc[0],
            text=f"{year}: {data_point['population_billions'].iloc[0]:.2f} Mrd.",
            showarrow=False,
            font=dict(size=12, color='white'),
            bgcolor='rgba(0,0,0,0.7)',
            bordercolor='#2ecc71',
            borderwidth=1,
            yshift=y_shift,
            xshift=x_shift,     # Horizontal shift
            align='left',       # Text alignment inside the box
            xanchor='left'      # Box alignment relative to the data point
        )

# Maximum of the forecast, centered
max_point = forecast_data.loc[forecast_data['population_billions'].idxmax()]
fig.add_annotation(
    x=max_point['years'],
    y=max_point['population_billions'],
    text=f"Maximum ({int(max_point['years'])}): {max_point['population_billions']:.2f} Mrd.",
    showarrow=True,
    arrowhead=1,
    font=dict(size=12, color='white'),
    bgcolor='rgba(0,0,0,0.7)',
    bordercolor='rgba(46, 204, 113, 0.3)',
    borderwidth=1,
    yshift=5,        # Vertical shift upwards
    xshift=0,         # No horizontal shift
    align='center',   # Center the text inside the box
    xanchor='center'  # Center the box on the data point
)

# Show the figure
fig.show()
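
The two traces above are built from separate columns. If the estimates column ends in one year and the medium variant only starts in the next, the solid and the dashed line may not touch. The sketch below shows one way to bridge that gap, assuming (as the China 2023 row above suggests, where the medium column is empty) that each year carries a value in only one of the two columns. The column names `estimates` and `medium` are shortened placeholders and the numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame: placeholder column names and made-up values
df = pd.DataFrame({
    "years": [2022, 2023, 2024, 2025],
    "estimates": [8.0, 8.1, np.nan, np.nan],
    "medium": [np.nan, np.nan, 8.2, 8.3],
})

historical = df.dropna(subset=["estimates"])
forecast = df.dropna(subset=["medium"])

# Prepend the last historical point so the dashed forecast line starts where the solid line ends
bridge = historical.tail(1).assign(medium=lambda d: d["estimates"])
forecast_connected = pd.concat([bridge, forecast], ignore_index=True)

print(forecast_connected[["years", "medium"]])
```

Plotting `forecast_connected` instead of the raw forecast rows would make the dashed line start at the last estimate. The bridging point is a drawing aid only and should not be read as a projected value.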

## Example: China

In [52]:
population_projections = charts.get_data("population-with-un-projections")
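
The chart `population-with-un-projections` was already loaded in an earlier cell, so this call fetches the same data again. A minimal sketch of memoizing a loader with `functools.lru_cache`. Here `fake_loader` is a hypothetical stand-in so the example runs offline; in the notebook it would presumably be replaced by `charts.get_data`.

```python
from functools import lru_cache

call_log = []  # records how often the underlying loader really runs


def fake_loader(slug: str) -> dict:
    """Stand-in for a network call; hypothetical, for illustration only."""
    call_log.append(slug)
    return {"slug": slug}


@lru_cache(maxsize=None)
def load_chart(slug: str) -> dict:
    """Load a chart once per slug and reuse the result afterwards."""
    return fake_loader(slug)


first = load_chart("population-with-un-projections")
second = load_chart("population-with-un-projections")  # served from the cache

print(len(call_log))
```

Because the cached object is shared between callers, it should be copied before it is modified.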

In [53]:
population_projections_china = population_projections[population_projections['entities'] == 'China'].copy()

In [54]:
# Convert to billions
population_projections_china['population_billions'] = round(population_projections_china['population__sex_all__age_all__variant_estimates'] / 1_000_000_000, 2)
population_projections_china.head()

Unnamed: 0,entities,years,population__sex_all__age_all__variant_estimates,population__sex_all__age_all__variant_medium,population_billions
6795,China,1950,544044304.0,,0.54
6796,China,1951,553758189.0,,0.55
6797,China,1952,565131382.0,,0.57
6798,China,1953,577557087.0,,0.58
6799,China,1954,590113816.0,,0.59


In [58]:
import plotly.graph_objects as go

# --- Start of streamlined plotting code ---

# Define known column names
hist_pop_col = 'population__sex_all__age_all__variant_estimates'
forecast_pop_col = 'population__sex_all__age_all__variant_medium'

# Prepare data directly
historical_data = population_projections_china[population_projections_china['years'] <= 2024].copy()
historical_data['population_billions'] = historical_data[hist_pop_col] / 1_000_000_000
historical_data.dropna(subset=['population_billions'], inplace=True)

forecast_data = population_projections_china[population_projections_china['years'] >= 2024].copy()
forecast_data['population_billions'] = forecast_data[forecast_pop_col] / 1_000_000_000
forecast_data.dropna(subset=['population_billions'], inplace=True)

# Create figure
fig = go.Figure()

# Add historical data trace (if data exists)
if not historical_data.empty:
    fig.add_trace(go.Scatter(
        x=historical_data['years'],
        y=historical_data['population_billions'],
        name='Historische Daten (China)',
        line=dict(color='#2ecc71', width=3)
    ))

# Add forecast data trace (if data exists)
if not forecast_data.empty:
    fig.add_trace(go.Scatter(
        x=forecast_data['years'],
        y=forecast_data['population_billions'],
        name='Prognose (China)',
        line=dict(color='rgba(46, 204, 113, 0.3)', width=2, dash='dash')
    ))

# Configure layout
fig.update_layout(
    title={
        'text': 'Bevölkerung China: Historie und Prognose',
        'y': 0.95, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top',
        'font': {'size': 24}
    },
    xaxis_title="Jahr",
    yaxis_title="Bevölkerung (Milliarden)",
    template='plotly_dark',
    showlegend=True,
    legend=dict(yanchor="top", y=0.99, xanchor="left", x=0.01),
    height=800,
    width=1200,
    xaxis=dict(
        range=[1950, 2100], dtick=25, ticks='outside', ticklen=10,
        tickfont=dict(size=12), tickangle=0, title_standoff=25, side='bottom'
    ),
    yaxis=dict(range=[0, 1.6], dtick=0.2, gridwidth=0.5),
    plot_bgcolor='rgba(0,0,0,0.95)',
    paper_bgcolor='rgba(0,0,0,0.95)',
)

# Add historical annotations
if not historical_data.empty:
    years_to_annotate_hist = [1950, 2000, 2023] # 2023 is taken from the historical estimates
    y_shifts_hist = [20, -30, 30]
    x_shifts_hist = [50, 50, 50] # Same horizontal shift for every label
    for year, y_shift, x_shift in zip(years_to_annotate_hist, y_shifts_hist, x_shifts_hist):
        data_point = historical_data[historical_data['years'] == year]
        if not data_point.empty and pd.notna(data_point['population_billions'].iloc[0]):
            fig.add_annotation(
                x=year, y=data_point['population_billions'].iloc[0],
                text=f"{year}: {data_point['population_billions'].iloc[0]:.2f} Mrd.",
                showarrow=False, font=dict(size=12, color='white'),
                bgcolor='rgba(0,0,0,0.7)', bordercolor='#2ecc71', borderwidth=1,
                yshift=y_shift, xshift=x_shift, align='left', xanchor='left'
            )

# Add forecast annotation for 2100
if not forecast_data.empty:
    year_2100_data = forecast_data[forecast_data['years'] == 2100]
    if not year_2100_data.empty and pd.notna(year_2100_data['population_billions'].iloc[0]):
        pop_2100 = year_2100_data['population_billions'].iloc[0]
        fig.add_annotation(
            x=2100, y=pop_2100,
            text=f"2100: {pop_2100:.2f} Mrd.",
            showarrow=False, font=dict(size=12, color='white'),
            bgcolor='rgba(0,0,0,0.7)', bordercolor='rgba(46, 204, 113, 0.3)', borderwidth=1, # Use forecast color
            yshift=15,      # Position adjustment
            xshift=-50,     # Position adjustment (negative for left alignment)
            align='right',  # Text alignment
            xanchor='right' # Anchor point
        )


# --- REMOVED Maximum point annotation block ---
# if not forecast_data.empty and forecast_data['population_billions'].notna().any():
#    try:
#        max_point_idx = forecast_data['population_billions'].idxmax()
#        max_point = forecast_data.loc[max_point_idx]
#        fig.add_annotation(
#            x=max_point['years'], y=max_point['population_billions'],
#            text=f"Maximum ({int(max_point['years'])}): {max_point['population_billions']:.2f} Mrd.",
#            showarrow=True, arrowhead=1, font=dict(size=12, color='white'),
#            bgcolor='rgba(0,0,0,0.7)', bordercolor='rgba(46, 204, 113, 0.3)', borderwidth=1,
#            yshift=5, xshift=0, align='center', xanchor='center'
#        )
#    except ValueError:
#         pass

# Show plot
fig.show()

# --- End of streamlined plotting code ---

**Conclusion:** China's projected population is more or less heading back towards its 1950 level.  
**Open question:** How could I show this graphically and in a convincing comparison? --> Working-age population of countries

# Working-age data

In [75]:
working_age_df = charts.get_data("population-young-working-elderly-with-projections")
working_age_df.head()

Unnamed: 0,entities,years,population__sex_all__age_65plus__variant_estimates,population__sex_all__age_65plus__variant_medium,population__sex_all__age_15_64__variant_estimates,population__sex_all__age_15_64__variant_medium,population__sex_all__age_0_14__variant_estimates,population__sex_all__age_0_14__variant_medium
0,Afghanistan,1950,221568.0,,4362892.0,,3191673.0,
1,Afghanistan,1951,225314.0,,4417702.0,,3236279.0,
2,Afghanistan,1952,228844.0,,4474793.0,,3284100.0,
3,Afghanistan,1953,232091.0,,4531190.0,,3333375.0,
4,Afghanistan,1954,235121.0,,4586963.0,,3385826.0,


In [78]:
import pandas as pd
import numpy as np

# --- Start of Code ---

# Define target entities and years
target_entities = ['China', 'United States']
start_year = 2023
end_year = 2050

# Filter for relevant entities and years
df_filtered = working_age_df[
    (working_age_df['entities'].isin(target_entities)) &
    (working_age_df['years'] >= start_year) &
    (working_age_df['years'] <= end_year)
].copy()

# Define column names
est_col = 'population__sex_all__age_15_64__variant_estimates'
med_col = 'population__sex_all__age_15_64__variant_medium'

# --- Data Preparation ---
# Create a unified population column (using estimates for 2023, medium for forecast)
# Convert to millions directly
df_filtered['working_pop_millions'] = np.where(
    df_filtered['years'] == start_year,
    df_filtered[est_col] / 1_000_000, # Use estimates for start year
    df_filtered[med_col] / 1_000_000  # Use medium forecast for subsequent years
)

# Sort for accurate cumulative calculation
df_filtered.sort_values(by=['entities', 'years'], inplace=True)

# Calculate cumulative change since start_year for each country
# Group by country, calculate diff, then cumulative sum
df_filtered['yearly_change'] = df_filtered.groupby('entities')['working_pop_millions'].diff()
df_filtered['cumulative_change_millions'] = df_filtered.groupby('entities')['yearly_change'].cumsum().fillna(0)

# Select and pivot the table
comparison_df = df_filtered.pivot_table(
    index='years',
    columns='entities',
    values=['working_pop_millions', 'cumulative_change_millions']
)

# Flatten the multi-level column index for better readability
comparison_df.columns = [f"{col[1]}_{col[0].replace('_millions', '').replace('working_pop','Pop').replace('cumulative_change','Cumul_Change')}" for col in comparison_df.columns]

# Reorder columns for clarity (optional)
comparison_df = comparison_df[[
    'China_Pop', 'China_Cumul_Change', 
    'United States_Pop', 'United States_Cumul_Change'
]]

# Round the values for display
comparison_df = comparison_df.round(2)

# Display the resulting DataFrame
print(f"Working-Age Population Comparison (Millions): {start_year}-{end_year}")
comparison_df

# --- End of Code ---

Working-Age Population Comparison (Millions): 2023-2050


Unnamed: 0_level_0,China_Pop,China_Cumul_Change,United States_Pop,United States_Cumul_Change
years,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2023,982.93,0.0,223.17,0.0
2024,983.97,1.05,223.65,0.48
2025,986.75,3.83,224.04,0.87
2026,989.97,7.05,224.27,1.1
2027,990.48,7.56,224.47,1.3
2028,985.07,2.15,224.7,1.53
2029,978.17,-4.75,224.98,1.81
2030,971.98,-10.95,225.36,2.19
2031,965.74,-17.18,225.85,2.69
2032,960.67,-22.26,226.38,3.21


In 2050, ``China has a cumulative change of -237 million working-age people``, which is roughly the entire working-age population of the United States in 2023 (about 223 million). Crazy.
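
The cumulative change in the table is the running sum of yearly differences, which is the same as subtracting each country's start-year value. A minimal sketch of that shortcut on three rounded rows per country from the table above; the short column name `pop` is a placeholder.

```python
import pandas as pd

# Rounded values taken from the comparison table (millions of people aged 15-64)
df = pd.DataFrame({
    "entities": ["China"] * 3 + ["United States"] * 3,
    "years": [2023, 2024, 2025] * 2,
    "pop": [982.93, 983.97, 986.75, 223.17, 223.65, 224.04],
})

df = df.sort_values(["entities", "years"])

# Variant used in the notebook: yearly differences, then a running sum per country
yearly_change = df.groupby("entities")["pop"].diff()
via_cumsum = yearly_change.groupby(df["entities"]).cumsum().fillna(0)

# Shortcut: subtract the first (start-year) value of each country
via_baseline = df["pop"] - df.groupby("entities")["pop"].transform("first")

# Largest deviation between both variants (floating-point noise only)
print((via_cumsum - via_baseline).abs().max())
```

Small deviations from the table (for example in the second decimal) are to be expected here, presumably because the notebook computes the change from unrounded values and rounds afterwards.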

## Fertility rate

In [17]:
fertility_rate = charts.get_data("children-per-woman-un")
fertility_rate.head()

Unnamed: 0,entities,years,children_per_woman_un
0,Afghanistan,1950,7.248
1,Afghanistan,1951,7.26
2,Afghanistan,1952,7.26
3,Afghanistan,1953,7.266
4,Afghanistan,1954,7.254


In [18]:
fertility_rate[fertility_rate['entities'] == 'China']

Unnamed: 0,entities,years,children_per_woman_un
3256,China,1950,5.813
3257,China,1951,5.699
3258,China,1952,6.472
3259,China,1953,6.042
3260,China,1954,6.278
...,...,...,...
3325,China,2019,1.496
3326,China,2020,1.236
3327,China,2021,1.117
3328,China,2022,1.034


In [19]:
fertility_rate.describe()

Unnamed: 0,years,children_per_woman_un
count,18722.0,18722.0
mean,1986.5,3.961614
std,21.36058,2.00587
min,1950.0,0.662
25%,1968.0,2.13125
50%,1986.5,3.521
75%,2005.0,5.89975
max,2023.0,8.864


In [20]:
# Compute the decline in fertility rates for each country
def calculate_fertility_decline(df):
    # Group by country
    countries = df['entities'].unique()
    decline_data = []
    
    for country in countries:
        country_data = df[df['entities'] == country].sort_values('years')
        
        # Check whether enough data is available
        if len(country_data) > 10:  # More than 10 data points for a meaningful analysis
            # Earliest and latest available value
            earliest = country_data.iloc[0]
            latest = country_data.iloc[-1]
            
            # Compute the absolute and percentage decline
            absolute_decline = earliest['children_per_woman_un'] - latest['children_per_woman_un']
            percent_decline = (absolute_decline / earliest['children_per_woman_un']) * 100
            
            # Time span
            year_span = latest['years'] - earliest['years']

            # Average annual decline
            annual_decline = absolute_decline / year_span if year_span > 0 else 0
            
            decline_data.append({
                'country': country,
                'start_year': earliest['years'],
                'end_year': latest['years'],
                'start_rate': earliest['children_per_woman_un'],
                'end_rate': latest['children_per_woman_un'],
                'absolute_decline': absolute_decline,
                'percent_decline': percent_decline,
                'year_span': year_span,
                'annual_decline': annual_decline
            })
    
    return pd.DataFrame(decline_data)

# Compute the decline
decline_df = calculate_fertility_decline(fertility_rate)
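
The loop above can also be written without iterating over countries. A minimal sketch using a grouped aggregation, assuming the same column names as `fertility_rate`; the four sample rows are copied from the outputs in this notebook, and the minimum-data-points filter is left out for brevity.

```python
import pandas as pd

# Four rows copied from the outputs shown in this notebook
sample = pd.DataFrame({
    "entities": ["Taiwan", "Taiwan", "South Korea", "South Korea"],
    "years": [1950, 2023, 1950, 2023],
    "children_per_woman_un": [6.481, 0.87, 6.063, 0.72],
})

# Sort first so that "first" and "last" mean earliest and latest year
ordered = sample.sort_values(["entities", "years"])
summary = ordered.groupby("entities").agg(
    start_year=("years", "first"),
    end_year=("years", "last"),
    start_rate=("children_per_woman_un", "first"),
    end_rate=("children_per_woman_un", "last"),
)

summary["absolute_decline"] = summary["start_rate"] - summary["end_rate"]
summary["percent_decline"] = summary["absolute_decline"] / summary["start_rate"] * 100

print(summary.round(3))
```

This is only an alternative formulation; the results should match `calculate_fertility_decline` for entities that pass its data-point check.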

In [21]:
# Sort by absolute decline (descending)
top_absolute_decline = decline_df.sort_values('absolute_decline', ascending=False).head(5)
print("Top 5 Länder mit dem größten absoluten Rückgang der Geburtenrate:")
top_absolute_decline[['country', 'start_year', 'end_year', 'start_rate', 'end_rate', 'absolute_decline']]

Top 5 Länder mit dem größten absoluten Rückgang der Geburtenrate:


Unnamed: 0,country,start_year,end_year,start_rate,end_rate,absolute_decline
221,Taiwan,1950,2023,6.481,0.87,5.611
116,Kuwait,1950,2023,7.115,1.525,5.59
247,Wallis and Futuna,1950,2023,6.947,1.41,5.537
196,Saint Vincent and the Grenadines,1950,2023,7.285,1.775,5.51
17,Bahrain,1950,2023,7.29,1.824,5.466


In [22]:
# Sort by percentage decline (descending)
top_percent_decline = decline_df.sort_values('percent_decline', ascending=False).head(5)
print("\nTop 5 Länder mit dem größten prozentualen Rückgang der Geburtenrate:")
top_percent_decline[['country', 'start_year', 'end_year', 'start_rate', 'end_rate', 'percent_decline']]


Top 5 Länder mit dem größten prozentualen Rückgang der Geburtenrate:


Unnamed: 0,country,start_year,end_year,start_rate,end_rate,percent_decline
212,South Korea,1950,2023,6.063,0.72,88.124691
134,Macao,1950,2023,5.554,0.662,88.080663
221,Taiwan,1950,2023,6.481,0.87,86.576146
205,Singapore,1950,2023,6.225,0.943,84.851406
97,Hong Kong,1950,2023,4.255,0.717,83.149236


In [23]:
# Sort by annual decline (descending)
top_annual_decline = decline_df.sort_values('annual_decline', ascending=False).head(5)
print("\nTop 5 Länder mit dem schnellsten jährlichen Rückgang der Geburtenrate:")
top_annual_decline[['country', 'start_year', 'end_year', 'start_rate', 'end_rate', 'annual_decline']]


Top 5 Länder mit dem schnellsten jährlichen Rückgang der Geburtenrate:


Unnamed: 0,country,start_year,end_year,start_rate,end_rate,annual_decline
221,Taiwan,1950,2023,6.481,0.87,0.076863
116,Kuwait,1950,2023,7.115,1.525,0.076575
247,Wallis and Futuna,1950,2023,6.947,1.41,0.075849
196,Saint Vincent and the Grenadines,1950,2023,7.285,1.775,0.075479
17,Bahrain,1950,2023,7.29,1.824,0.074877
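
This table lists the same five entities in the same order as the absolute-decline table. Every row shown spans 1950 to 2023, so the annual decline is simply the absolute decline divided by 73, and dividing by a constant does not change the ranking. A small worked check with three values copied from the tables above:

```python
# Values copied from the absolute-decline table above
absolute_decline = {"Taiwan": 5.611, "Kuwait": 5.59, "Bahrain": 5.466}
year_span = 2023 - 1950  # 73 years for every row shown

annual_decline = {name: value / year_span for name, value in absolute_decline.items()}

# Rank the entities by both measures, largest decline first
ranking_absolute = sorted(absolute_decline, key=absolute_decline.get, reverse=True)
ranking_annual = sorted(annual_decline, key=annual_decline.get, reverse=True)

print(annual_decline)
print(ranking_absolute == ranking_annual)
```

The annual figure only adds information when entities cover different time spans.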


### Top 5 lowest fertility rates

In [24]:
# Filter the data for the year 2023
fertility_2023 = fertility_rate[fertility_rate['years'] == 2023].copy()

# Sort by fertility rate (ascending)
lowest_fertility_2023 = fertility_2023.sort_values('children_per_woman_un').head(5)

# Print the results
print("Top 5 Länder mit den niedrigsten Geburtenraten im Jahr 2023:")
for i, (index, row) in enumerate(lowest_fertility_2023.iterrows(), 1):
    print(f"{i}. {row['entities']}: {row['children_per_woman_un']:.2f} Kinder pro Frau")

# Define a consistent color palette for the countries
top5_lowest_countries = lowest_fertility_2023['entities'].tolist()
color_palette = px.colors.qualitative.Bold[:len(top5_lowest_countries)]
color_map = dict(zip(top5_lowest_countries, color_palette))

# Visualize the top 5 countries with the lowest fertility rates
fig = px.bar(
    lowest_fertility_2023,
    x='entities',
    y='children_per_woman_un',
    color='entities',  # Color by country
    title='Länder mit den niedrigsten Geburtenraten (2023)',
    template='plotly_dark',
    color_discrete_map=color_map  # Use consistent colors
)

# Adjust the layout
fig.update_layout(
    title={
        'text': 'Länder mit den niedrigsten Geburtenraten (2023)',
        'y': 0.95,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': {'size': 24}
    },
    xaxis_title="Land",
    yaxis_title="Geburtenrate pro Frau",
    height=600,
    width=1000,
    showlegend=False  # Hide the legend because the countries are already on the x-axis
)

# Show the values above the bars
fig.update_traces(
    texttemplate='%{y:.2f}',
    textposition='outside'
)

# Start the y-axis at 0
fig.update_yaxes(range=[0, max(lowest_fertility_2023['children_per_woman_un']) * 1.1])

# Show the figure
fig.show()

# Additionally: development over time for these 5 countries
historical_data = fertility_rate[fertility_rate['entities'].isin(top5_lowest_countries)]

# Line chart of the historical development
fig_historical = px.line(
    historical_data,
    x='years',
    y='children_per_woman_un',
    color='entities',
    title='Historische Entwicklung der Geburtenraten in Ländern mit den niedrigsten Werten (2023)',
    template='plotly_dark',
    color_discrete_map=color_map  # Same color mapping as in the bar chart
)

# Adjust the layout
fig_historical.update_layout(
    title={
        'text': 'Historische Entwicklung der Geburtenraten',
        'y': 0.95,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': {'size': 24}
    },
    xaxis_title="Jahr",
    yaxis_title="Geburtenrate pro Frau",
    legend_title="Land",
    height=600,
    width=1000
)

# Show the figure
fig_historical.show()

Top 5 Länder mit den niedrigsten Geburtenraten im Jahr 2023:
1. Macao: 0.66 Kinder pro Frau
2. Hong Kong: 0.72 Kinder pro Frau
3. South Korea: 0.72 Kinder pro Frau
4. Saint Barthelemy: 0.80 Kinder pro Frau
5. Taiwan: 0.87 Kinder pro Frau
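
Hong Kong and South Korea both print as 0.72 because the output is rounded to two decimals; the unrounded values shown elsewhere in this notebook (0.717 and 0.72) explain their order. As a side note, `DataFrame.nsmallest` gives the same selection as `sort_values(...).head(n)` for a numeric column. A minimal sketch with 2023 values copied from the notebook outputs:

```python
import pandas as pd

# 2023 values as they appear in the outputs of this notebook
rates_2023 = pd.DataFrame({
    "entities": ["Singapore", "Taiwan", "Macao", "South Korea", "Hong Kong"],
    "children_per_woman_un": [0.943, 0.87, 0.662, 0.72, 0.717],
})

# Three lowest rates, already sorted in ascending order
lowest3 = rates_2023.nsmallest(3, "children_per_woman_un")
print(lowest3["entities"].tolist())
```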


### Top 2 lowest, top 2 highest, and median fertility rate

In [46]:
# Filter the data for the year 2023
fertility_2023 = fertility_rate[fertility_rate['years'] == 2023].copy()

# Sort by fertility rate (ascending for the lowest, descending for the highest)
lowest_fertility_2023 = fertility_2023.sort_values('children_per_woman_un').head(2)
highest_fertility_2023 = fertility_2023.sort_values('children_per_woman_un', ascending=False).head(2)

# Compute the median and find the country closest to the median fertility rate
median_fertility = fertility_2023['children_per_woman_un'].median()
median_country_idx = (fertility_2023['children_per_woman_un'] - median_fertility).abs().idxmin()
median_country = fertility_2023.loc[median_country_idx:median_country_idx].copy()

# Combine the data
comparison_data = pd.concat([lowest_fertility_2023, median_country, highest_fertility_2023])

# Add a category
comparison_data['category'] = ['Niedrigste', 'Niedrigste', 'Median', 'Höchste', 'Höchste']

# Print the results
print("Top 2 Länder mit den niedrigsten Geburtenraten im Jahr 2023:")
for i, (index, row) in enumerate(lowest_fertility_2023.iterrows(), 1):
    print(f"{i}. {row['entities']}: {row['children_per_woman_un']:.2f} Kinder pro Frau")

print("\nLand mit der Median-Geburtenrate im Jahr 2023:")
print(f"{median_country['entities'].iloc[0]}: {median_country['children_per_woman_un'].iloc[0]:.2f} Kinder pro Frau")

print("\nTop 2 Länder mit den höchsten Geburtenraten im Jahr 2023:")
for i, (index, row) in enumerate(highest_fertility_2023.iterrows(), 1):
    print(f"{i}. {row['entities']}: {row['children_per_woman_un']:.2f} Kinder pro Frau")

# Define a consistent color palette for the countries
selected_countries = comparison_data['entities'].tolist()
color_palette = px.colors.qualitative.Bold[:len(selected_countries)]
color_map = dict(zip(selected_countries, color_palette))

# Visualize the comparison data
fig = px.bar(
    comparison_data,
    x='entities',
    y='children_per_woman_un',
    color='entities',  # Color by country rather than by category
    title='2 Tiefst- / Median / 2 Höchstwerte (2023)',
    template='plotly_dark',
    color_discrete_map=color_map  # Use consistent colors
)

# Adjust the layout
fig.update_layout(
    title={
        'text': '2 Tiefst- / Median / 2 Höchstwerte (2023)',
        'y': 0.95,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': {'size': 24}
    },
    xaxis_title="Land",
    yaxis_title="Geburtenrate pro Frau",
    legend_title="Land",
    height=600,
    width=1000,
    showlegend=False  # Hide the legend in the bar chart; the countries are already on the x-axis
)

# Show the values above the bars
fig.update_traces(
    texttemplate='%{y:.2f}',
    textposition='outside'
)

# Start the y-axis at 0
fig.update_yaxes(range=[0, max(comparison_data['children_per_woman_un']) * 1.1])

# Show the figure
fig.show()

# Historical development of these 5 countries
historical_data = fertility_rate[fertility_rate['entities'].isin(selected_countries)]

# Line chart of the historical development
fig_historical = px.line(
    historical_data,
    x='years',
    y='children_per_woman_un',
    color='entities',
    title='Historische Entwicklung',
    template='plotly_dark',
    color_discrete_map=color_map  # Same color mapping as in the bar chart
)

# Adjust the layout
fig_historical.update_layout(
    title={
        'text': 'Historische Entwicklung',
        'y': 0.95,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': {'size': 24}
    },
    xaxis_title="Jahr",
    yaxis_title="Geburtenrate pro Frau",
    legend_title="Land",
    height=700,
    width=1200
)

# Show the figure
fig_historical.show()

Top 2 Länder mit den niedrigsten Geburtenraten im Jahr 2023:
1. Macao: 0.66 Kinder pro Frau
2. Hong Kong: 0.72 Kinder pro Frau

Land mit der Median-Geburtenrate im Jahr 2023:
Sri Lanka: 1.97 Kinder pro Frau

Top 2 Länder mit den höchsten Geburtenraten im Jahr 2023:
1. Somalia: 6.13 Kinder pro Frau
2. Chad: 6.12 Kinder pro Frau


### World TFR

In [26]:
fertility_rate_world = fertility_rate[fertility_rate['entities'] == 'World']
fertility_rate_world.head()

Unnamed: 0,entities,years,children_per_woman_un
18426,World,1950,4.852
18427,World,1951,4.816
18428,World,1952,5.001
18429,World,1953,4.922
18430,World,1954,4.998


In [43]:
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd  # In case it has not been imported yet

# Assumption: the DataFrame is called 'fertility_rate' and contains the data
# Filter the data for the world and Switzerland
fertility_rate_world = fertility_rate[fertility_rate['entities'] == 'World'].copy()
fertility_rate_switzerland = fertility_rate[fertility_rate['entities'] == 'Switzerland'].copy()

# Create the Plotly chart for the world
fig = px.line(fertility_rate_world,
              x='years',
              y='children_per_woman_un',
              template='plotly_dark',
              labels={'children_per_woman_un': 'TFR Welt'},
              color_discrete_sequence=px.colors.sequential.Sunsetdark)

# Add the line for Switzerland
fig.add_trace(go.Scatter(x=fertility_rate_switzerland['years'],
                         y=fertility_rate_switzerland['children_per_woman_un'],
                         mode='lines',
                         name='Schweiz',  # Name is not displayed because the legend is off
                         line=dict(color='lightblue')))  # Custom color for Switzerland

# Adjust axis labels and layout
fig.update_layout(
    title={
        'text': 'Gesamtfruchtbarkeitsrate: Welt vs. Schweiz',
        'y': 0.95,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font': {'size': 24}
    },
    xaxis_title="Jahr",
    yaxis_title="Kinder pro Frau",
    template='plotly_dark',
    height=800,
    width=1200,
    xaxis=dict(
        ticks='outside',
        ticklen=10,
        tickfont=dict(size=12),
        title_standoff=25,
        side='bottom'
    ),
    showlegend=False  # Remove the legend
)

# Specific years to annotate
years_to_annotate = [1963, 2000, 2010, 2023]

# One entry per series: (data, label prefix, color, vertical shift in pixels).
# World labels are shifted up (+30), Swiss labels down (-30).
annotation_series = [
    (fertility_rate_world, 'Welt', '#fb9f3a', 30),
    (fertility_rate_switzerland, 'CH', 'lightblue', -30),
]

# Add the annotations for both series with the same styling
for data, label, color, y_shift in annotation_series:
    for year in years_to_annotate:
        data_point = data[data['years'] == year]
        if data_point.empty:
            continue  # Skip years without data for this series
        value = data_point['children_per_woman_un'].iloc[0]
        fig.add_annotation(
            x=year,
            y=value,
            text=f"{label} {year}: {value:.2f}",
            showarrow=True,
            arrowhead=2,
            arrowsize=1,
            arrowwidth=2,
            arrowcolor=color,        # Arrow color of the series
            font=dict(size=12, color='white'),
            bgcolor='rgba(0,0,0,0.7)',
            bordercolor=color,       # Border color of the series
            borderwidth=1,
            yshift=y_shift,
            xshift=0,
            align='center',
            xanchor='center',
            ax=0,
            ay=-y_shift              # Label sits above the point for World, below for Switzerland
        )


# Show the chart
fig.show()

In [28]:
import math

# Given starting population (China); not used below, because the halving time
# in this model does not depend on the starting size
population_initial = 1422584937

# Assumptions
generation_time = 25  # Years per generation

# Compute the number of generations and years until the population halves
def halbierungszeit(TFR):
    # Simplified model: each generation is TFR / 2 times the size of the previous one
    if not 0 < TFR < 2:
        raise ValueError("TFR must be between 0 and 2 (exclusive) for the population to halve")
    R = TFR / 2.0
    n = math.log(0.5) / math.log(R)
    years = n * generation_time
    return n, years

# Example 1: TFR = 1.8
TFR_1 = 1.8
n1, years1 = halbierungszeit(TFR_1)
print("Bei TFR =", TFR_1)
print("Anzahl Generationen bis Halbierung:", n1)
print("Halbierungszeit in Jahren:", years1)

# Example 2: TFR = 0.8
TFR_2 = 0.8
n2, years2 = halbierungszeit(TFR_2)
print("\nBei TFR =", TFR_2)
print("Anzahl Generationen bis Halbierung:", n2)
print("Halbierungszeit in Jahren:", years2)


Bei TFR = 1.8
Anzahl Generationen bis Halbierung: 6.578813478960585
Halbierungszeit in Jahren: 164.47033697401463

Bei TFR = 0.8
Anzahl Generationen bis Halbierung: 0.7564707973660301
Halbierungszeit in Jahren: 18.91176993415075
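
The halving times printed above follow from a simple geometric model. It assumes, as `halbierungszeit` does implicitly, that each generation is $\mathrm{TFR}/2$ times the size of the previous one and that mortality, migration, and age structure can be ignored. With $N_0$ the starting population, $N_n$ the population after $n$ generations, and $T$ the generation time (25 years in the cell above):

$$
N_n = N_0 \left(\frac{\mathrm{TFR}}{2}\right)^{n},
\qquad
N_n = \frac{N_0}{2}
\;\Longrightarrow\;
n = \frac{\ln 0.5}{\ln\left(\mathrm{TFR}/2\right)},
\qquad
t_{1/2} = n \cdot T
$$

The starting population $N_0$ cancels out, so the halving time depends only on the TFR and the generation time. The expression is only meaningful for $0 < \mathrm{TFR} < 2$; at $\mathrm{TFR} = 2$ the denominator is zero and the population in this model stays constant.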


**Conclusion**

- To account for all factors (age distribution, mortality, migration, age-specific fertility, etc.), there is no way around a `cohort-component model (or Leslie matrix model)`.
- **No simple closed-form equation exists, because reality is too complex.** Instead, the model is iterated over multiple time steps, often in 1- or 5-year intervals.
- Institutions such as the UN, the World Bank, and national statistical offices use exactly this kind of model to produce population projections up to 2100 and beyond.
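
To make the idea of iterating a Leslie-matrix model concrete, here is a minimal sketch with three age classes and one time step of 25 years. The function name `leslie_step`, the age classes, and all fertility and survival numbers are illustrative assumptions, not UN or OWID values. Migration is left out entirely.

```python
def leslie_step(population, fertility, survival):
    """Advance an age-structured population by one time step.

    population: counts per age class, youngest first
    fertility:  offspring per individual and step, per age class
    survival:   survival[i] is the share of class i that reaches class i + 1
    """
    # First row of the Leslie matrix: births from all age classes
    births = sum(f * n for f, n in zip(fertility, population))
    # Sub-diagonal: survivors move up one age class; the oldest class drops out
    aged = [s * n for s, n in zip(survival, population[:-1])]
    return [births] + aged


# Hypothetical example with the classes 0-24, 25-49, and 50+ (illustrative numbers only)
population = [100.0, 100.0, 100.0]
fertility = [0.0, 0.9, 0.0]  # 0.9 children per person, roughly a TFR of 1.8 if half are women
survival = [0.98, 0.95]

history = [population]
for _ in range(4):
    history.append(leslie_step(history[-1], fertility, survival))

for step, classes in enumerate(history):
    print(f"Year {step * 25}: classes={[round(n, 1) for n in classes]}, total={sum(classes):.1f}")
```

Real cohort-component models use many more age classes, separate rates by sex, and time-varying inputs, but the iteration step has the same shape.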

## Age distribution

In [29]:
median_df = charts.get_data("median-age")
popl_by_age_with_projections_df = charts.get_data("population-by-age-group-with-projections")
male_popl_by_age_df = charts.get_data("male-population-by-age-group")
female_popl_by_age_df = charts.get_data("female-population-by-age-group")

In [30]:
median_df.head()

Unnamed: 0,entities,years,median_age__sex_all__age_all__variant_estimates,median_age__sex_all__age_all__variant_medium
0,Afghanistan,1950,18.395,
1,Afghanistan,1951,18.37,
2,Afghanistan,1952,18.333,
3,Afghanistan,1953,18.289,
4,Afghanistan,1954,18.239,


In [31]:
popl_by_age_with_projections_df.head()

Unnamed: 0,entities,years,population__sex_all__age_all__variant_estimates,population__sex_all__age_all__variant_medium,population__sex_all__age_65plus__variant_estimates,population__sex_all__age_65plus__variant_medium,population__sex_all__age_25_64__variant_estimates,population__sex_all__age_25_64__variant_medium,population__sex_all__age_0_24__variant_medium,population__sex_all__age_0_24__variant_estimates,population__sex_all__age_0_14__variant_estimates,population__sex_all__age_0_14__variant_medium,population__sex_all__age_0_4__variant_estimates,population__sex_all__age_0_4__variant_medium
0,Afghanistan,1950,7776133.0,,221568.0,,2881732.0,,,4672833.0,3191673.0,,1300029.0,
1,Afghanistan,1951,7879295.0,,225314.0,,2914311.0,,,4739670.0,3236279.0,,1304860.0,
2,Afghanistan,1952,7987737.0,,228844.0,,2948420.0,,,4810473.0,3284100.0,,1312384.0,
3,Afghanistan,1953,8096656.0,,232091.0,,2982312.0,,,4882253.0,3333375.0,,1324539.0,
4,Afghanistan,1954,8207910.0,,235121.0,,3016433.0,,,4956356.0,3385826.0,,1342583.0,


---
### Median


In [33]:
africa_df = median_df[median_df['entities'] == 'Africa (UN)']
africa_df[africa_df['years'] > 2020]

Unnamed: 0,entities,years,median_age__sex_all__age_all__variant_estimates,median_age__sex_all__age_all__variant_medium
222,Africa (UN),2021,18.769,
223,Africa (UN),2022,18.894,
224,Africa (UN),2023,19.032,
225,Africa (UN),2024,,19.174
226,Africa (UN),2025,,19.321
...,...,...,...,...
297,Africa (UN),2096,,34.329
298,Africa (UN),2097,,34.516
299,Africa (UN),2098,,34.700
300,Africa (UN),2099,,34.883


In [34]:
median_df['years'].unique()

array([1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960,
       1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971,
       1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982,
       1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993,
       1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,
       2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015,
       2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026,
       2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035, 2036, 2037,
       2038, 2039, 2040, 2041, 2042, 2043, 2044, 2045, 2046, 2047, 2048,
       2049, 2050, 2051, 2052, 2053, 2054, 2055, 2056, 2057, 2058, 2059,
       2060, 2061, 2062, 2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070,
       2071, 2072, 2073, 2074, 2075, 2076, 2077, 2078, 2079, 2080, 2081,
       2082, 2083, 2084, 2085, 2086, 2087, 2088, 2089, 2090, 2091, 2092,
       2093, 2094, 2095, 2096, 2097, 2098, 2099, 21

In [35]:
median_df['entities'].unique()

array(['Afghanistan', 'Africa (UN)', 'Albania', 'Algeria',
       'American Samoa', 'Andorra', 'Angola', 'Anguilla',
       'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
       'Asia (UN)', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas',
       'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium',
       'Belize', 'Benin', 'Bermuda', 'Bhutan', 'Bolivia',
       'Bonaire Sint Eustatius and Saba', 'Bosnia and Herzegovina',
       'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon',
       'Canada', 'Cape Verde', 'Cayman Islands',
       'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia',
       'Comoros', 'Congo', 'Cook Islands', 'Costa Rica', "Cote d'Ivoire",
       'Croatia', 'Cuba', 'Curacao', 'Cyprus', 'Czechia',
       'Democratic Republic of Congo', 'Denmark', 'Djibouti', 'Dominica',
       'Dominican Republic', 'East Timor', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial

In [36]:
import plotly.express as px
import pandas as pd
import numpy as np

# Filter the desired regions
regions = ["Asia (UN)", "Europe (UN)", "United States", "Africa (UN)"]
filtered_df = median_df[median_df["entities"].isin(regions)].copy()

# The columns holding the median age (estimates and medium projection)
estimates_column = "median_age__sex_all__age_all__variant_estimates"
medium_column = "median_age__sex_all__age_all__variant_medium"

# Create a new column that combines both: estimates where available, otherwise the projection
filtered_df["median_age_combined"] = np.where(
    filtered_df[estimates_column].notna(),
    filtered_df[estimates_column],
    filtered_df[medium_column]
)

# Create a text column that adds "Projected" for years after 2023
filtered_df["custom_text"] = filtered_df.apply(
    lambda row: f"{row['median_age_combined']:.1f}<br><span style='color:#edeaa8;font-size:12px'>Projected</span>" 
    if row["years"] > 2023 
    else f"{row['median_age_combined']:.1f}", 
    axis=1
)

# Create the animated bar chart
fig = px.bar(
    filtered_df,
    x="entities",
    y="median_age_combined",
    color="entities",
    animation_frame="years",
    range_y=[0, filtered_df["median_age_combined"].max() * 1.1],
    title="Medianes Alter nach Region (1950-2100)",
    labels={
        "entities": "Region", 
        "median_age_combined": "Medianes Alter (Jahre)",
        "years": "Jahr"
    },
    text="custom_text"  # Use the custom text column
)

# Adjust the layout
fig.update_layout(
    template="plotly_dark",
    legend_title="Region",
    font=dict(size=14),
    height=600,
    width=800
)

# Adjust the text position
fig.update_traces(textposition="outside")

# Adjust the animation speed (durations in milliseconds)
fig.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 100
fig.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 50

# Show the figure
fig.show()
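
The cell above merges the estimate and projection columns with `np.where` and marks projected years with a hard-coded `years > 2023` check. As a possible alternative, the sketch below uses `Series.fillna` for the merge and derives the projection flag from the data itself. The small DataFrame, its shortened column names, and its values are made up for illustration only.

```python
import pandas as pd

# Illustrative toy data (not OWID values): estimates end in 2023, projections start in 2024
toy_df = pd.DataFrame({
    "years": [2022, 2023, 2024, 2025],
    "estimates": [18.9, 19.0, None, None],
    "medium": [None, None, 19.2, 19.3],
})

# Take the estimate where it exists, otherwise fall back to the medium projection
toy_df["median_age_combined"] = toy_df["estimates"].fillna(toy_df["medium"])

# A row counts as projected when it has no estimate, independent of a fixed cut-off year
toy_df["is_projection"] = toy_df["estimates"].isna()

print(toy_df)
```

Deriving the flag from the missing estimates means the label keeps working if a later data release moves the boundary between estimates and projections.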

In [37]:
import plotly.express as px
import pandas as pd

# Filter the desired regions
regions = ["Asia (UN)", "Europe (UN)", "United States", "Africa (UN)"]
filtered_df = median_df[median_df["entities"].isin(regions)]

# The column holding the median age (estimates only, so the lines end in 2023)
median_column = "median_age__sex_all__age_all__variant_estimates"

# Create the line chart
fig = px.line(
    filtered_df,
    x="years",
    y=median_column,
    color="entities",
    title="Entwicklung des medianen Alters nach Region (1950-2023)",
    labels={
        "years": "Jahr", 
        median_column: "Medianes Alter (Jahre)",
        "entities": "Region"
    },
    color_discrete_sequence=px.colors.qualitative.Bold,
    markers=True,  # Marker for each data point
    line_shape="spline"  # Smoother lines
)

# Adjust the layout
fig.update_layout(
    template="plotly_dark",
    legend_title="Region",
    font=dict(size=14),
    height=600,
    width=1000,
    hovermode="x unified"  # Show all values for a year on hover
)

# Adjust the y-axis
fig.update_yaxes(range=[0, filtered_df[median_column].max() * 1.1])

# Show the figure
fig.show()

## Cost of illness