# Algeria Population by State (Wilaya) from 1861 to 2020

***By Souames Mohamed Annis & Bousselat Ahmed Moncef***

This notebook builds an animated choropleth of the population of Algeria's 48 states (wilayas) from 1861 (the colonial era) to 2020 (the last full year at the time of writing).

## Libraries Used:
- BeautifulSoup
- Requests
- Geopandas
- Pandas
- Scipy
- Matplotlib
- Moviepy

## Data Sources:
- ONS: the national statistics office provides population data per state for three censuses: 1987, 1998, and 2008.
- World Bank: total Algerian population from 1960 to 2020.
- Persee: French archives for the total Algerian population before 1960, back to 1861.


## Method:

The idea is to interpolate between census years. For years where we have the total population but not the population by state, we apply each state's share of the total, taken from the closest census (1987 or 2008).

These shares are calculated from the ONS data, since it is the most reliable source for the period 1987-2008.

Example: to approximate the population of Tamanrasset in 2010, we take Tamanrasset's share of the national population in 2008 (ONS data) and multiply it by the total population in 2010 (World Bank data).

The drawback of this technique is that it assumes each state's share stays constant outside the census range, ignoring how the distribution shifted over the years. It is still a workable approximation.
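
The share-based estimate can be sketched in a few lines of plain Python. This is a minimal illustration rather than the notebook's own code: the function names are hypothetical, and every population figure below is a made-up placeholder, not an ONS or World Bank value.

```python
def reference_census(year, first=1987, last=2008):
    """Pick the census whose shares are reused for a year outside the census range."""
    if year < first:
        return first
    if year > last:
        return last
    raise ValueError("years inside the census range are interpolated instead")


def estimate_from_share(state_pop_ref, total_pop_ref, total_pop_target):
    """Scale the target year's national total by the state's reference-year share."""
    # Integer arithmetic keeps the result a whole number of people.
    return (state_pop_ref * total_pop_target) // total_pop_ref


# Hypothetical figures, for illustration only.
state_2008 = 200_000       # one state's population at the 2008 census
total_2008 = 40_000_000    # national total in 2008
total_2010 = 42_000_000    # national total in 2010

print(reference_census(2010))                                   # 2008
print(estimate_from_share(state_2008, total_2008, total_2010))  # 210000
```

Here the state holds 0.5% of the 2008 total, so it is assigned 0.5% of the 2010 total.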

In [None]:
!pip install geopandas bs4 requests

In [3]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import geopandas as gpd
import numpy as np
import scipy as scp
import matplotlib.pyplot as plt
from scipy import interpolate
import os
from moviepy.editor import ImageSequenceClip

Let's download a GeoJSON file of all states (58 under the new borders). We will then convert it to the former 48 states using GeoPandas' `dissolve`.

In [5]:
!wget https://raw.githubusercontent.com/fr33dz/Algeria-geojson/master/all-wilayas.geojson -O "all-wilayas.geojson"

--2021-10-26 13:30:53--  https://raw.githubusercontent.com/fr33dz/Algeria-geojson/master/all-wilayas.geojson
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22768492 (22M) [text/plain]
Saving to: ‘all-wilayas.geojson’


2021-10-26 13:30:53 (137 MB/s) - ‘all-wilayas.geojson’ saved [22768492/22768492]



## Building 48 wilayas geojson

In [6]:
def build_gpd(geojson):
    # Convert algeria's new 58 states to the standard 48 states
    algeria_48 = geojson.copy()
    algeria_48 = algeria_48.drop(['name_ar', 'name_ber', 'density'], axis="columns")
    renames = {
        "Timimoune": "Adrar",
        "Bordj Badji Mokhtar": "Adrar",
        "In Guezzam": "Tamanrasset",
        "In Salah": "Tamanrasset",
        "Djanet": "Illizi",
        "Béni Abbès": "Béchar",
        "El Menia": "Ghardaia",
        "Ouled Djellal": "Biskra",
        "El M'Ghair": "El Oued",
        "Touggourt": "Ouargla"
    } 
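    # Relabel each new wilaya with the name of the wilaya it was split from
    algeria_48['name'] = algeria_48['name'].replace(renames)

    # A small buffer makes neighbouring polygons overlap slightly so they merge cleanly
    algeria_48["geometry"] = algeria_48['geometry'].buffer(0.0001)
    # Merge all polygons that now share a name into a single geometry
    algeria_48 = algeria_48.dissolve(by='name', as_index=False)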

    algeria_48["city_code"] = pd.to_numeric(algeria_48["city_code"])

    algeria_48 = algeria_48.set_index("city_code").sort_index()
    algeria_48 = algeria_48.rename(columns={"name": "wilaya"})
    
    return algeria_48

In [7]:
algeria = gpd.read_file('all-wilayas.geojson')

In [33]:
algeria_48 = build_gpd(algeria)
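
As a quick sanity check on the mapping used in `build_gpd`, the sketch below confirms that folding the ten new wilayas into their parents takes 58 names down to 48. The `renames` dictionary is copied from the function above; the helper name `merged_count` is an assumption made for illustration, and the check assumes every key actually appears in the GeoJSON file.

```python
renames = {
    "Timimoune": "Adrar",
    "Bordj Badji Mokhtar": "Adrar",
    "In Guezzam": "Tamanrasset",
    "In Salah": "Tamanrasset",
    "Djanet": "Illizi",
    "Béni Abbès": "Béchar",
    "El Menia": "Ghardaia",
    "Ouled Djellal": "Biskra",
    "El M'Ghair": "El Oued",
    "Touggourt": "Ouargla",
}


def merged_count(n_new, mapping):
    """Number of units left once each key in `mapping` is folded into its parent."""
    # A parent that is itself remapped would make the simple subtraction wrong.
    assert not set(mapping) & set(mapping.values()), "a parent is also remapped"
    return n_new - len(mapping)


print(merged_count(58, renames))          # 48
print(sorted(set(renames.values())))      # the parent wilayas that absorb new ones
```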

# Scraping data from Wikipedia (1987-2008)

A Wikipedia page lists ONS population figures per state for three censuses: 1987, 1998, and 2008. These are official numbers and quite reliable.

For the years between censuses, we interpolate. Note that the code below passes `kind='quadratic'` to `interp1d`, so the curve through the three census points is quadratic rather than piecewise linear.
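
The simplest form of this idea, piecewise-linear interpolation, can be sketched in pure Python. It is an illustration only: the notebook itself relies on `scipy.interpolate.interp1d`, the function name `interpolate_linear` is hypothetical, and the census figures below are made up.

```python
def interpolate_linear(census, year):
    """Linearly interpolate a population for `year` from a {census_year: population} dict."""
    years = sorted(census)
    if not years[0] <= year <= years[-1]:
        raise ValueError("year is outside the census range")
    # Find the pair of consecutive censuses that brackets the requested year.
    for left, right in zip(years, years[1:]):
        if left <= year <= right:
            growth = census[right] - census[left]
            return census[left] + (year - left) * growth / (right - left)


# Hypothetical census counts for one state.
census = {1987: 100_000, 1998: 155_000, 2008: 205_000}

print(interpolate_linear(census, 1987))  # 100000.0
print(interpolate_linear(census, 1992))  # 125000.0
print(interpolate_linear(census, 2003))  # 180000.0
```

Each estimate lies on the straight line joining the two nearest censuses, so census years themselves are reproduced exactly.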



In [4]:
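from io import StringIO

def scrape_algeria_pop():
    # Fetch the Wikipedia page and parse its first "wikitable" into a DataFrame
    base_url = "https://en.wikipedia.org/wiki/List_of_Algerian_provinces_by_population"
    page = requests.get(base_url).text
    soup = BeautifulSoup(page, "html.parser")
    table = soup.find('table', class_="wikitable")
    # Strip any literal &nbsp; entities, then wrap the HTML in StringIO,
    # since recent pandas versions deprecate passing raw HTML strings
    html = str(table).replace("&nbsp;", "")
    df = pd.read_html(StringIO(html))
    return df[0]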

def clean_df(df):
    temp = df.copy()
    # "Adrar Province" -> "Adrar"
    temp["Name"] = temp["Name"].str.replace("Province", "", regex=False).str.strip()
    temp = temp.rename(
        {
            "Name": "wilaya",
            "1987 census": "1987",
            "1998 census": "1998",
            "2008 census": "2008",
        },
        axis=1
    )
    # Drop the last row (assumed to be the table's total row)
    temp = temp.iloc[:-1].copy()

    cols = ["1987", "1998", "2008"]
    temp[cols] = temp[cols].astype("int")

    return temp

def generate_other_years(df, start, end):
    # Interpolate the years between censuses (`end` is exclusive)
    census_years = [1987, 1998, 2008]
    # One row per wilaya, one column per census
    true_y = df[['1987', '1998', '2008']].values

    # interp1d works along the last axis, so a single call covers every wilaya.
    # kind='quadratic' fits a second-order spline through the three census points.
    f = interpolate.interp1d(census_years, true_y, kind='quadratic')
    for y in range(start, end):
        df[str(y)] = f(y)

    return df

In [24]:
# data_1 is a df for years between 1987 and 2008
data_1 = scrape_algeria_pop()
data_1 = generate_other_years(clean_df(data_1),1987,2008)
data_1["Wilaya"] = pd.to_numeric(data_1["Wilaya"])
data_1 = data_1.rename(columns={"Wilaya": "city_code"}).drop(['wilaya'], axis="columns")


## Generate total population between 1861-2020:

We compiled the total population for each year from 1861 to 2020 by hand, using World Bank data and French archives.

The resulting file, `total_population.csv`, is available in the GitHub repository.

We then use each state's share of the total to deduce its population.

In [19]:
# NOTE: from this cell on, `algeria_48` is expected to be in long format: one row per
# (city_code, wilaya, year) with a `population` column and years stored as strings.
# The cell that reshapes it is not shown in this notebook. Likewise, `algeria_pop` is
# not defined in the cells shown; it is assumed to hold the yearly totals from
# `total_population.csv`, with `year` and `population` columns.
geometries = algeria_48[["wilaya", "geometry"]].drop_duplicates().sort_values("wilaya")

# Each wilaya's share of the national population, per year
shares = (
    algeria_48.groupby(by=["city_code", "wilaya", "year"]).population.mean()
    / algeria_48.groupby(by=["year"]).population.sum()
)

def extend_with_shares(frame, reference_year, years):
    # Append estimated rows for `years`: national total times the reference-year shares
    is_reference = shares.index.get_level_values("year") == reference_year
    reference_shares = shares[is_reference].droplevel("year")

    for y in years:
        total = algeria_pop[algeria_pop["year"] == y]["population"].iloc[0]
        pop = (reference_shares * total).astype(int).to_frame().reset_index()
        pop["year"] = y

        pop = pop.merge(geometries, on="wilaya", how="left")
        pop = pop[["city_code", "wilaya", "geometry", "year", "population"]]
        frame = gpd.GeoDataFrame(pd.concat([frame, pop], ignore_index=True))

    return frame

a = algeria_48.copy()
a = extend_with_shares(a, "1987", range(1861, 1987))  # before the first census
a = extend_with_shares(a, "2008", range(2009, 2021))  # after the last census

a["year"] = a.year.astype(int)

a = a.sort_values(["year", "city_code"]).reset_index(drop=True)


## Final geo dataframe

If you have trouble running the code, a CSV file with all the generated data (`pop_wilaya_generated.csv`) is available.



In [25]:
data = pd.read_csv("pop_wilaya_generated.csv")
data.head()

|   | city_code | wilaya | year | population |
|---|---|---|---|---|
| 0 | 1 | Adrar | 1861 | 28017 |
| 1 | 2 | Chlef | 1861 | 87786 |
| 2 | 3 | Laghouat | 1861 | 27791 |
| 3 | 4 | Oum El Bouaghi | 1861 | 52007 |
| 4 | 5 | Batna | 1861 | 97775 |


In [35]:
algeria_data = algeria_48.merge(data, on="city_code", how="left")

In [36]:
algeria_data.head()

|   | city_code | wilaya_x | geometry | wilaya_y | year | population |
|---|---|---|---|---|---|---|
| 0 | 1 | Adrar | POLYGON ((1.42706 23.99752, 1.42854 23.95840, ... | Adrar | 1861 | 28017 |
| 1 | 1 | Adrar | POLYGON ((1.42706 23.99752, 1.42854 23.95840, ... | Adrar | 1862 | 28600 |
| 2 | 1 | Adrar | POLYGON ((1.42706 23.99752, 1.42854 23.95840, ... | Adrar | 1863 | 29183 |
| 3 | 1 | Adrar | POLYGON ((1.42706 23.99752, 1.42854 23.95840, ... | Adrar | 1864 | 29767 |
| 4 | 1 | Adrar | POLYGON ((1.42706 23.99752, 1.42854 23.95840, ... | Adrar | 1865 | 30350 |


# Create animated choropleth

We use matplotlib and geopandas to draw one choropleth per year, save each as a JPEG image, and then assemble the images into an MP4 with moviepy.
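
Before handing the frames to moviepy, it can help to build the ordered frame list and check the resulting clip length. The sketch below uses only the standard library. The helper name `frame_paths` is hypothetical, and the `images/pop_<year>.jpg` naming is assumed from the cells in this section.

```python
import os


def frame_paths(start, end, folder="images"):
    """Return one image path per year, in chronological order (`end` inclusive)."""
    return [os.path.join(folder, f"pop_{year}.jpg") for year in range(start, end + 1)]


frames = frame_paths(1861, 2020)
fps = 10

print(len(frames))        # 160
print(len(frames) / fps)  # 16.0 (clip length in seconds)

# Report frames that have not been rendered yet, so a gap is caught before encoding.
missing = [path for path in frames if not os.path.exists(path)]
print(f"{len(missing)} frame(s) missing")
```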

In [38]:
!rm -rdf images
!mkdir images

In [42]:
vmax = algeria_data.population.max()
vmin = algeria_data.population.min()

In [43]:
for y in range(1861, 2021):
    # One map per year; vmin/vmax are shared so colors stay comparable across frames
    ax = algeria_data[algeria_data.year == y].plot(
        figsize=(10, 10), column="population", cmap='plasma',
        vmin=vmin, vmax=vmax, legend=True,
    )
    ax.set_title(f"Algeria's Population in {y}")
    ax.annotate('© Souames Mohamed Annis & Bousselat Ahmed Moncef - Algeria told by data (Github)', (0, 0), (0, -20), xycoords='axes fraction', textcoords='offset points', va='top')
    # Relative path, so the frames land in the `images` folder created above
    filepath = os.path.join('images', f"pop_{y}.jpg")
    chart = ax.get_figure()
    chart.savefig(filepath, dpi=200)
    plt.close(chart)

In [44]:
from moviepy.editor import ImageSequenceClip
# Collect the 160 yearly frames in chronological order
images = [f'images/pop_{year}.jpg' for year in range(1861, 2021)]

# 10 frames per second gives a 16-second clip
clip = ImageSequenceClip(images, fps=10)
clip.write_videofile('pop_new_plasma_10.mp4')

[MoviePy] >>>> Building video pop_new_plasma_10.mp4
[MoviePy] Writing video pop_new_plasma_10.mp4


100%|██████████| 160/160 [00:16<00:00,  9.64it/s]


[MoviePy] Done.
[MoviePy] >>>> Video ready: pop_new_plasma_10.mp4 

