# Population Decline
## An exploratory data analysis

In [450]:
import pandas as pd
import matplotlib.pyplot as plt
from linearmodels import PanelOLS
import statsmodels.api as sm
import plotly.express as px
import plotly.graph_objects as go

The world's population has grown rapidly for a long time, but this trend is beginning to slow. In parts of the world, most recently China, population decline has become an issue of great concern. Using data from 1955 to 2020, this analysis explores global population trends and tries to shed light on important factors in population growth and decline.

## Graphical Analysis

In [451]:
path = "population.csv"
df = pd.read_csv(path)
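df = df.drop(columns=["Unnamed: 0", "GlobalRank", "Country%OfWorldPop"])
# .copy() avoids a SettingWithCopyWarning when a column is added to the filtered frame
df = df[df["Continent"] != "North America"].copy()
# Net migration as a percentage of total population
df["Migrants%"] = (df["Migrants(net)"] / df["Population"]) * 100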
df.head()

| | Continent | Country | Year | Population | Yearly%Change | YearlyChange | Migrants(net) | MedianAge | FertilityRate | UrbanPop% | UrbanPopulation | Migrants% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Asia | Afghanistan | 1955 | 8270991 | 1.3 | 103775 | -4000 | 19.2 | 7.45 | 7.1 | 587818 | -0.048362 |
| 1 | Asia | Afghanistan | 1960 | 8996973 | 1.7 | 145196 | -4000 | 18.8 | 7.45 | 8.4 | 755797 | -0.044459 |
| 2 | Asia | Afghanistan | 1965 | 9956320 | 2.05 | 191869 | -4000 | 18.4 | 7.45 | 9.9 | 984350 | -0.040175 |
| 3 | Asia | Afghanistan | 1970 | 11173642 | 2.33 | 243464 | -4000 | 17.9 | 7.45 | 11.6 | 1295433 | -0.035799 |
| 4 | Asia | Afghanistan | 1975 | 12689160 | 2.58 | 303104 | -4000 | 17.3 | 7.45 | 13.5 | 1717422 | -0.031523 |
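
The rows come in five-year steps, and the `YearlyChange` and `Yearly%Change` columns appear to be annualized over each step. This is an inference from the displayed values, not something the dataset documents. The short check below is a sketch under that assumption: it recomputes both figures for Afghanistan in 1960 from the two population values in the output above, taking the yearly percentage to be a compound growth rate.

```python
# Population figures copied from the Afghanistan rows for 1955 and 1960.
pop_1955, pop_1960 = 8_270_991, 8_996_973
years = 5  # assumed spacing between observations

# Average absolute change per year over the five-year step.
yearly_change = (pop_1960 - pop_1955) / years

# Compound annual growth rate, expressed as a percentage.
yearly_pct = ((pop_1960 / pop_1955) ** (1 / years) - 1) * 100

print(round(yearly_change), round(yearly_pct, 2))
```

Both results line up with the 1960 row (145196 and 1.7), which suggests the growth figures describe the preceding five-year period rather than a single calendar year.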


In [452]:
# Total population per continent and year, kept as a tidy frame for plotting
continentalpop = df.groupby(["Continent", "Year"], as_index=False)["Population"].sum()
fig = px.line(
    data_frame=continentalpop,
    x="Year",
    y="Population",
    color="Continent",
    title="World Population Over Time")
fig.show()

This is the population paradigm we're familiar with: steady, predictable growth largely carried by developing nations, with slower but still positive growth elsewhere. Looking more closely, however, we can see an impending problem.

In [453]:
fig = px.violin(data_frame=df[((df["Year"] == 2020)|(df["Year"] == 1955))], x= "Continent", y='Yearly%Change', color = "Year")
fig.show()

Since the middle of the 20th century, the rate of population growth across most continents has crept downwards, and entire continents may soon begin to experience a decline in total population. Looking at certain individual countries, the situation becomes even clearer.

In [454]:
# Natural change: remove net migration from the yearly population change
df["PopLessMigration"] = df["YearlyChange"] - df["Migrants(net)"]
# Keep country-years with natural population decline this millennium (Syria excluded)
df1 = df[(df["PopLessMigration"] < -5000) & (df["Yearly%Change"] < 0) & (df["Year"] > 1999) & (df["Country"] != "Syria")]
declinelist = df1["Country"].unique()
fig = px.line(data_frame= df[df['Country'].isin(declinelist)], y = "Yearly%Change", x = "Year", color="Continent", hover_name = "Country")
fig.show()

Examining those countries that have already experienced decline, we can see several patterns:
- Most countries already experiencing population decline are in Europe
- These natural declines happen at a very slow pace, with a gradual decline in population growth rate over time
- The trend does not show any clear indications of reversing

From these, we can draw several conclusions:
- Population decline can occur as a result of endogenous factors
- These endogenous factors are prevalent in Europe and Japan
- These endogenous factors are not typically sensitive to external shocks
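
To make the notion of natural decline concrete, here is a minimal sketch of the filter on a toy table. The three countries and all of their figures are invented for illustration; only the column names and thresholds mirror the analysis above.

```python
import pandas as pd

# Toy figures (invented for illustration, not taken from the dataset).
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Year": [2020, 2020, 2020],
    "YearlyChange": [-40000, 15000, -2000],
    "Migrants(net)": [-10000, 30000, -20000],
    "Yearly%Change": [-0.4, 0.1, -0.05],
})

# Natural change = total yearly change minus net migration.
toy["PopLessMigration"] = toy["YearlyChange"] - toy["Migrants(net)"]

# Same conditions as the analysis: natural loss beyond 5,000 people,
# a shrinking total population, and a year after 1999.
natural_decline = toy[
    (toy["PopLessMigration"] < -5000)
    & (toy["Yearly%Change"] < 0)
    & (toy["Year"] > 1999)
]
decline_countries = natural_decline["Country"].unique().tolist()
print(decline_countries)
```

Only country A passes. Country B has a natural loss that immigration more than offsets, so its total population still grows, while country C shrinks purely through emigration and is therefore not counted as a natural decline.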

Having established that, let's examine what these factors could be, why they are prevalent in Europe and Japan and not elsewhere, and how they could potentially be reversed.

In [455]:
df = df[(df["UrbanPop%"] != 0) & (df["MedianAge"] != 0) & (df["UrbanPop%"] < 100)]
fig = px.violin(data_frame=df[df["Year"] == 2020], y=["UrbanPop%", "MedianAge"], color= (df[df["Year"] == 2020]['Country'].isin(declinelist).astype('bool')), labels= {"color": "Pop Decline"}, range_y= [0,100] )
fig.show()


The countries that have experienced population decline are, on average, older and more urbanized than those that have not. Age, while highly correlated with decline, is problematic for establishing causality, since low fertility itself raises the median age. It does, however, raise an interesting question: some countries nearly as old, and even more urbanized, have not experienced population decline. Why?

In [456]:
fig = px.violin(data_frame=df[df["Year"] == 2015], y=["Migrants%"], color= (df[df["Year"] == 2015]['Country'].isin(declinelist).astype('bool')), labels= {"color": "Pop Decline"}, range_y= [-2.5,2.5] )
fig.show()

In combination with the previous graph, this could provide a more complete picture of the factors at play. The median country in decline had net emigration, and the third quartile of the decline group took in only half as many immigrants, as a percentage of population, as the third quartile of the non-decline group.
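
The quartiles read off the violin plot can also be computed directly. The sketch below shows one way to do so with `groupby` and `quantile`; the values and the `Group` column are invented for illustration and are not the dataset's figures.

```python
import pandas as pd

# Invented values; "Group" marks whether a country is in the declining set.
toy = pd.DataFrame({
    "Group": ["decline"] * 5 + ["no decline"] * 5,
    "Migrants%": [-0.4, -0.2, -0.1, 0.1, 0.2, -0.2, 0.0, 0.1, 0.3, 0.6],
})

# One row per group, one column per quantile (0.25, 0.5, 0.75).
quartiles = (
    toy.groupby("Group")["Migrants%"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(quartiles)
```

On the real data, the same call could be applied to the 2015 rows after adding a column from `df["Country"].isin(declinelist)`.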

## Modeling

Let's run a regression on the panel data we have to examine whether the above analysis holds up.

In [457]:
df['Year'] = pd.to_datetime(df["Year"], format = '%Y')
df = df.set_index(['Country', 'Year'])
df = df[df["FertilityRate"] != 0]

In [None]:
exog_vars = ["UrbanPop%", "Migrants%"]
exog = sm.add_constant(df[exog_vars])
mod = PanelOLS(dependent= df["FertilityRate"], exog = exog, entity_effects= True, time_effects = True, drop_absorbed= True).fit()
print(mod)

```
                          PanelOLS Estimation Summary                           
================================================================================
Dep. Variable:          FertilityRate   R-squared:                        0.0390
Estimator:                   PanelOLS   R-squared (Between):              0.2849
No. Observations:                3580   R-squared (Within):               0.2302
Date:                Thu, Feb 09 2023   R-squared (Overall):              0.2630
Time:                        14:30:25   Log-likelihood                   -3757.8
Cov. Estimator:            Unadjusted                                           
                                        F-statistic:                      68.254
Entities:                         201   P-value                           0.0000
Avg Obs:                       17.811   Distribution:                  F(2,3360)
Min Obs:                      10.0000                                           
Max Obs:                       18.000   F-statistic (robust):             68.254
                                        P-value                           0.0000
Time periods:                      18   Distribution:                  F(2,3360)
Avg Obs:                       198.89                                           
Min Obs:                       196.00                                           
Max Obs:                       201.00                                           
                                                                                
                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
const          4.9646     0.0940     52.827     0.0000      4.7804      5.1489
UrbanPop%     -0.0207     0.0018    -11.328     0.0000     -0.0243     -0.0172
Migrants%      0.0426     0.0171     2.4859     0.0130      0.0090      0.0762
==============================================================================

F-test for Poolability: 58.878
P-value: 0.0000
Distribution: F(217,3360)

Included effects: Entity, Time
```

With both entity and time effects included, the result seems, on the surface, to agree with the previous analysis. Urbanization is negatively correlated with fertility rate (p < 0.001), and net migration as a percentage of the population is positively correlated with it (p < 0.05).
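
For intuition about what the entity and time effects do, the sketch below applies the two-way "within" transformation by hand on a synthetic, balanced panel: each variable has its country mean and its year mean subtracted, and the grand mean added back, before the slope is estimated. The data, the variable names, and the true slope of -0.02 are all assumptions made for the illustration; this is not how `linearmodels` is implemented internally, and simple demeaning like this is only exact when the panel is balanced, which the real dataset is not.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_entities, n_periods, true_slope = 30, 10, -0.02

# Synthetic balanced panel (invented data): every entity appears in every period.
panel = pd.MultiIndex.from_product(
    [range(n_entities), range(n_periods)], names=["entity", "time"]
).to_frame(index=False)

# Entity and time effects; x is deliberately correlated with the entity effect.
entity_fx = rng.normal(0, 1, n_entities)[panel["entity"].to_numpy()]
time_fx = rng.normal(0, 1, n_periods)[panel["time"].to_numpy()]
panel["x"] = rng.uniform(0, 100, len(panel)) + 10 * entity_fx
panel["y"] = (
    true_slope * panel["x"] + entity_fx + time_fx + rng.normal(0, 0.1, len(panel))
)

def two_way_demean(frame, col):
    """Subtract entity and time means, then add back the grand mean."""
    return (
        frame[col]
        - frame.groupby("entity")[col].transform("mean")
        - frame.groupby("time")[col].transform("mean")
        + frame[col].mean()
    )

x_w = two_way_demean(panel, "x")
y_w = two_way_demean(panel, "y")

# Slope from regressing demeaned y on demeaned x (no intercept needed).
slope_fe = float((x_w * y_w).sum() / (x_w ** 2).sum())

# Pooled OLS slope for comparison; it ignores the entity and time effects.
slope_pooled = float(np.polyfit(panel["x"], panel["y"], 1)[0])
print(f"within: {slope_fe:.4f}, pooled: {slope_pooled:.4f}")
```

Because `x` is correlated with the entity effect in this setup, the pooled slope will typically drift away from the true value, while the within estimate stays close to it. That is the reason for including both effects in the regression above.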

Let's take a look at a few examples and see if these assumptions hold up.

In [474]:
Countries = ['Japan', 'Greece', 'United States', 'Canada']
years = ['2020'] * 4
variables = ["Yearly%Change", "UrbanPop%", "Migrants%", "FertilityRate"]
# Pair each country with the year 2020 to index the (Country, Year) MultiIndex
lookup = zip(Countries, years)
print(df.loc[lookup][variables])

                          Yearly%Change  UrbanPop%  Migrants%  FertilityRate
Country       Year                                                          
Japan         2020-01-01          -0.30       91.8   0.056580           1.37
Greece        2020-01-01          -0.48       84.9  -0.153506           1.30
United States 2020-01-01           0.59       82.8   0.288459           1.78
Canada        2020-01-01           0.89       81.3   0.641278           1.53


This simple comparison seems consistent with the model: urbanization goes hand in hand with low fertility, but its effect on population growth can be counteracted through immigration.

In [484]:
fig = px.line(data_frame= df.loc["China"], x = df.loc['China'].index.get_level_values(0), y = variables, log_y = True)
fig.show()

We can clearly see China heading for a situation similar to Japan's: rapidly increasing urbanization, negative migration, and falling fertility rates. Given these trends, it should come as no surprise that two years after our data ends, China experienced its first population decline in six decades.

## Limitations

The simple model used here likely suffers from omitted variable bias: no measure of wealth was included in the regression, so urbanization is probably acting as a proxy for overall wealth. Additionally, the fixed effects cannot capture cultural trends that shift within a country over this timeframe, such as the growing share of irreligious people in many of the countries studied. Entity effects absorb only country traits that stay constant over time, and time effects absorb only shocks common to all countries. Further research into this topic should include several other factors.
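
The omitted variable concern can be illustrated with a small simulation. The sketch below assumes a deliberately extreme, hypothetical data-generating process in which wealth drives both urbanization and fertility, and urbanization has no direct effect at all. All coefficients and variable names are invented; nothing here is estimated from the dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical process: wealth drives both urbanization and fertility;
# urbanization itself has no direct effect on fertility.
wealth = rng.normal(0, 1, n)
urban = 0.8 * wealth + rng.normal(0, 0.6, n)
fertility = -1.0 * wealth + rng.normal(0, 0.5, n)

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = ols(fertility, urban)          # wealth omitted
full = ols(fertility, urban, wealth)   # wealth included

print(f"urban coefficient, wealth omitted:  {short[1]:.3f}")
print(f"urban coefficient, wealth included: {full[1]:.3f}")
```

With wealth left out, urbanization picks up a strongly negative coefficient even though it has no effect in the simulated process; once wealth is included, the coefficient falls to roughly zero. The real relationship is unlikely to be this stark, but the same mechanism could inflate the urbanization estimate reported above.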

## Conclusion

While predicting future trends is difficult, even with currently available data a paradigm shift in global population dynamics is evident. Europe and Japan are 'leading the pack' when it comes to declining population, but several other continents are currently where Europe was last century, and are poised to follow in its footsteps, perhaps even more quickly than the Europeans themselves did.

Places that have urbanized rapidly in the past few decades will likely soon begin to experience the same symptoms felt by Europe today, perhaps to an even greater degree. For nations looking to counteract this decline, North America offers a solution: immigration seems to be an effective counter to natural population decline. Whether the low- and middle-income countries soon coming head to head with demographic issues will be able to use it as effectively as the US and Canada remains to be seen.

Thank you for reading!