# Demographic Research Methods and the PyCCM Library
## Computer Lab 4: Combining previous labs into a projection
## Instructor: Jiani Yan
## Date: October 30th, 2025
---

In this class, we are going to combine the previous labs into a population projection in Python. As before, Section 1 is the most important. We will try to finish the first two sections in class, but later sections may need to be done as 'class homework'. Solutions for all sections will be presented either at the end of the class, time permitting, or at the start of the second day.

Every ~15 minutes we'll stretch our legs, drink some water, and then live code up some answers so everybody can catch up. Relax, you're doing great!

In [None]:
import numpy as np
from pathlib import Path
import os

src_path = Path.cwd().parents[2] / 'src'
data_path = Path.cwd().parents[2] / 'data'

print(src_path)
os.chdir(src_path)

In [1]:
import numpy as np
import pandas as pd

from data_loaders import load_all_data
from abridger import unabridge_df
from fertility import compute_asfr, get_target_params
from mortality import make_lifetable
from helpers import _collapse_defunciones_01_24_to_04, _tfr_from_asfr_df, _smooth_tfr


## 1. Section 1: Loading data and unabridging

---

### 1.1 Load in the conteos.rds file, using the data_loaders module or otherwise.

In [1]:
# <Your answer goes here>

### 1.2 Filter for your favorite DPTO. Let's focus on FEMALES first. Obtain one array/series each for population counts, deaths, and fertility.

In [2]:
# <Your answer goes here>

### 1.3 Unabridge each of these series using tools from PyCCM.

In [3]:
# <Your answer goes here>

### 1.4 Use the code from Labs 2 and 3 to calculate survivorship ratios and fertility rates just like before, but this time on the unabridged data.

In [4]:
# <Your answer goes here>
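
As a rough illustration of step 1.4, the sketch below computes survivorship ratios and ASFRs on toy single-year data. All numbers and variable names (`lx`, `births`, `women`) are invented for illustration and do not come from `conteos.rds`. The ratio `l_{x+1} / l_x` is the form used in the Section 3 code of this lab; your Lab 2 definition may differ.

```python
import numpy as np

# Toy single-year inputs (illustrative numbers, not from conteos.rds).
lx = np.array([100000.0, 99000.0, 98500.0, 98000.0])  # survivors to exact age x
births = np.array([0.0, 50.0, 80.0, 0.0])              # births by age of mother
women = np.array([1000.0, 1000.0, 1000.0, 1000.0])     # female exposure by age

# Survivorship ratio from age x to x+1: l_{x+1} / l_x.
surv = lx[1:] / lx[:-1]

# Age-specific fertility rate: births per woman at each age.
# np.divide with `where` avoids dividing by zero exposure.
asfr = np.divide(births, women, out=np.zeros_like(births), where=women > 0)

# With single-year ages, the TFR is the plain sum of the ASFRs.
tfr = asfr.sum()
```

With abridged (5-year) ages the TFR would need the ASFR sum multiplied by the interval width, which is one reason to unabridge first.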

### 1.5 This should give you the three things you need to create a closed, female-only CCM:

* Counts
* Survivorship ratios
* ASFRs.

## 2. Section 2: Building our first Leslie Matrices

---

### 2.1 Define an empty numpy array of the appropriate size.

In [5]:
# <Your answer goes here>

### 2.2 Survive your ASFRs as defined in the lecture

In [6]:
# <Your answer goes here>

### 2.3 Position this corrected array into the appropriate place on the first row of the Leslie Matrix (noting early 0s)

In [7]:
# <Your answer goes here>

### 2.4 Position this updated array into the appropriate place on the first row of the Leslie Matrix (noting early 0s)

In [8]:
# <Your answer goes here>

### 2.5 Take your survivorship ratios. These don't need adjusting (here). Position them on the lower off-diagonal (i.e. $A_{i+1,\,i}$ for all $i \in \{0, \dots, 90\}$).

In [9]:
# <Your answer goes here>

### 2.6 Multiply this matrix by your population count array.

In [10]:
# <Your answer goes here>
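
To make the layout of steps 2.1 to 2.6 concrete, here is a minimal sketch with a 4-age toy population. The values of `surv` and `first_row` are hypothetical. `first_row` stands in for whatever survived ASFRs you computed in 2.2. The oldest age group simply dies out in this toy; an open-ended last age group would also need an entry at `A[-1, -1]`.

```python
import numpy as np

n_ages = 4
surv = np.array([0.99, 0.98, 0.97])         # hypothetical S_0, S_1, S_2
first_row = np.array([0.0, 0.2, 0.3, 0.0])  # hypothetical survived fertility (early 0s)

# 2.1: empty square matrix, one row and column per age.
A = np.zeros((n_ages, n_ages))

# 2.3: fertility terms go on the first row.
A[0, :] = first_row

# 2.5: survivorship ratios go on the lower off-diagonal, A[i+1, i] = S_i.
idx = np.arange(n_ages - 1)
A[idx + 1, idx] = surv

# 2.6: one projection step is a matrix-vector product.
pop = np.array([100.0, 90.0, 80.0, 70.0])
next_pop = A @ pop
```

The first element of `next_pop` is the number of births (0.2 × 90 + 0.3 × 80), and every other element is the previous age group after survival.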

### 2.7 In a loop or otherwise (e.g. by raising the Leslie matrix to a power), create multi-year projections.

In [11]:
# <Your answer goes here>
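
The two routes mentioned in 2.7 can be compared on a toy matrix. The sketch below assumes a time-constant Leslie matrix with made-up entries. It shows that looping year by year gives the same final population as applying the matrix power once.

```python
import numpy as np

# Hypothetical 4-age Leslie matrix and starting population.
A = np.array([
    [0.00, 0.20, 0.30, 0.00],
    [0.99, 0.00, 0.00, 0.00],
    [0.00, 0.98, 0.00, 0.00],
    [0.00, 0.00, 0.97, 0.00],
])
pop0 = np.array([100.0, 90.0, 80.0, 70.0])
years = 5

# Route 1: loop, keeping every intermediate year.
projections = np.zeros((years + 1, pop0.size))
projections[0] = pop0
for t in range(years):
    projections[t + 1] = A @ projections[t]

# Route 2: jump straight to the final year with a matrix power.
direct = np.linalg.matrix_power(A, years) @ pop0
```

The loop is usually the more flexible choice, because it still works once the matrix changes from year to year (as it will in Section 6), whereas the matrix power only applies when rates are held fixed.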

## 3. Section 3: Repeat this exercise, but for Males. Note that here we need an MF Leslie matrix to survive the women giving birth to males. We also need to adjust for the SRB!

In [12]:

print("\n" + "="*60)
print("Section 3: Building Leslie Matrix for MALES")
print("="*60)

# Get MALE data (SEXO = 1)
df_male = df_dpto[df_dpto['SEXO'] == 1.0].copy()

df_pop_m = df_male[(df_male['VARIABLE'] == 'poblacion_total') &
                   (df_male['FUENTE'] == f'censo_{year}')].copy()

df_deaths_m = df_male[(df_male['VARIABLE'] == 'defunciones') &
                      (df_male['FUENTE'] == f'censo_{year}')].copy()

# Unabridge male data
df_deaths_m_fixed = _collapse_defunciones_01_24_to_04(df_deaths_m)

df_pop_m_1yr = unabridge_df(
    df_pop_m,
    series_keys=['DPTO_NOMBRE', 'ANO', 'SEXO', 'VARIABLE', 'FUENTE'],
    value_col='VALOR',
    ridge=1e-6
)

df_deaths_m_1yr = unabridge_df(
    df_deaths_m_fixed,
    series_keys=['DPTO_NOMBRE', 'ANO', 'SEXO', 'VARIABLE', 'FUENTE'],
    value_col='VALOR',
    ridge=1e-6
)

# Build male life table
pop_m_by_age = df_pop_m_1yr.groupby('EDAD')['VALOR'].sum()
deaths_m_by_age = df_deaths_m_1yr.groupby('EDAD')['VALOR'].sum()

# Filter out NaN ages and align to expected age range
pop_m_by_age = pop_m_by_age[pop_m_by_age.index.notna()]
deaths_m_by_age = deaths_m_by_age[deaths_m_by_age.index.notna()]

# Reindex to ensure we have all ages
pop_m_by_age = pop_m_by_age.reindex(ages_1yr, fill_value=0.0)
deaths_m_by_age = deaths_m_by_age.reindex(ages_1yr, fill_value=0.0)

lt_male = make_lifetable(
    ages=pop_m_by_age.index,  # Pass the age index
    population=pop_m_by_age.values,
    deaths=deaths_m_by_age.values,
    radix=100000,
    use_ma=True,
    ma_window=5
)

print("\n✓ Male life table calculated:")
print(f"  Male life expectancy (e_0): {lt_male.iloc[0]['ex']:.2f} years")

# Calculate male survivorship ratios
survivorship_m = np.zeros(len(ages_1yr))
lx_values_m = lt_male['lx'].values

for i in range(len(ages_1yr) - 1):
    if lx_values_m[i] > 0:
        survivorship_m[i] = lx_values_m[i + 1] / lx_values_m[i]
    else:
        survivorship_m[i] = 0.0

ex_last_m = lt_male.iloc[-1]['ex']
if ex_last_m > 0:
    survivorship_m[-1] = np.exp(-1.0 / ex_last_m)
else:
    survivorship_m[-1] = 0.0

# Create male population vector
population_m = pop_m_by_age.reindex(ages_1yr, fill_value=0.0).values

print(f"  Total male population: {population_m.sum():,.0f}")
print(f"  Infant survival (age 0→1): {survivorship_m[0]:.4f}")

# Build male Leslie matrix
# Note: Males are born from FEMALE fertility, so first row uses female ASFR
leslie_matrix_M = np.zeros((n_ages, n_ages))

# Male births from female fertility (proportion male = SRB / (1 + SRB))
prop_male = srb / (1.0 + srb)

# Survived ASFR for male births
survived_asfr_m = np.zeros(n_ages)
for i, age_str in enumerate(reproductive_ages):
    age_idx = int(age_str)
    if age_idx < n_ages - 1:
        survived_asfr_m[age_idx] = (
            asfr_values[i] * prop_male *
            (survivorship_m[age_idx] + survivorship_m[age_idx + 1]) / 2
        )
    else:
        survived_asfr_m[age_idx] = asfr_values[i] * prop_male * survivorship_m[age_idx]

# Place in first row
leslie_matrix_M[0, :] = survived_asfr_m

# Male survivorship on sub-diagonal
for i in range(n_ages - 1):
    leslie_matrix_M[i + 1, i] = survivorship_m[i]

leslie_matrix_M[-1, -1] = survivorship_m[-1]

print("\n✓ Male Leslie Matrix complete")
print(f"  Proportion male births: {prop_male:.3f}")

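# Project males.
# Male births are produced by women, so the fertility row must be applied to the
# FEMALE population, not the male one. `projections_f` is an assumed name for the
# (projection_years + 1, n_ages) female projection array from Section 2; rename
# it to match your own variable.
survival_M = leslie_matrix_M.copy()
survival_M[0, :] = 0.0  # keep only the male survival transitions

projections_m = np.zeros((projection_years + 1, n_ages))
projections_m[0, :] = population_m

for t in range(projection_years):
    # Existing males age forward using male survivorship only
    projections_m[t + 1, :] = survival_M @ projections_m[t, :]
    # New male births come from the female population at reproductive ages
    projections_m[t + 1, 0] += survived_asfr_m @ projections_f[t, :]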

total_pop_m = projections_m.sum(axis=1)

print("\n✓ Male projections complete:")
print(f"  Start year {year}:        {total_pop_m[0]:,.0f}")
print(f"  End year {year+projection_years}:          {total_pop_m[-1]:,.0f}")


## 4. Section 4: Combine projections from Males and Females to give a total population projection for multiple years. Visualise this.

In [13]:
# <Your answer goes here>
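
A minimal sketch of the combining step, assuming each sex has a projection array of shape `(years + 1, n_ages)`. The names `proj_f` and `proj_m` and all values are invented for illustration.

```python
import numpy as np

# Hypothetical projection arrays: rows are years, columns are ages.
proj_f = np.array([[100.0, 90.0], [102.0, 91.0], [104.0, 93.0]])
proj_m = np.array([[105.0, 88.0], [106.0, 90.0], [108.0, 91.0]])

# Total population by year and age is the element-wise sum of the two sexes.
proj_total = proj_f + proj_m

# Summing across ages (axis=1) gives one total per projection year.
total_by_year = proj_total.sum(axis=1)
```

For the visualisation, `total_by_year` can be plotted against the projection years with any plotting library, for example a simple line plot in matplotlib.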

## 5. Section 5: Add half of the migration array to the exposures for both mortality and fertility, male and female.

In [14]:
# <Your answer goes here>
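
One way to read this step: migrants who arrive (or leave) during the interval are present for roughly half of it on average, so half of the net migration is added to the exposure before rates are calculated. The sketch below uses invented numbers and hypothetical names (`pop`, `net_mig`, `deaths`). It shows the mortality case; the fertility case is the same with births in the numerator and female exposure in the denominator.

```python
import numpy as np

pop = np.array([1000.0, 950.0, 900.0])   # population by age
net_mig = np.array([20.0, -10.0, 40.0])  # net migrants by age (may be negative)
deaths = np.array([10.0, 2.0, 3.0])      # deaths by age

# Half of the net migrants count toward person-years of exposure.
exposure = pop + 0.5 * net_mig

# Death rates on the adjusted exposure; guard against zero exposure.
mx = np.divide(deaths, exposure, out=np.zeros_like(deaths), where=exposure > 0)
```

Net in-migration raises the denominator and lowers the rate slightly; net out-migration does the opposite.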

## 6. Section 6: Incorporate MA mortality smoothing and target mortality improvements/fertility adjustments over time.

In [15]:
# <Your answer goes here>
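
For the smoothing part, `make_lifetable` is already called with `use_ma=True` and `ma_window=5` in Section 3. The sketch below is not PyCCM's implementation. It is a hand-rolled centered moving average, with a hypothetical helper `moving_average` and made-up rates, to show what such a smoother does to a noisy rate schedule.

```python
import numpy as np

def moving_average(values, window=5):
    """Centered moving average with edge padding (window should be odd)."""
    half = window // 2
    # Repeat the first and last values so the output keeps the input length.
    padded = np.pad(values, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical noisy death rates by single year of age.
mx = np.array([0.010, 0.002, 0.001, 0.004, 0.001, 0.002, 0.003])
mx_smooth = moving_average(mx, window=5)
```

Edge padding is only one of several ways to treat the youngest and oldest ages, and it can distort infant mortality, so it is worth checking how the first few ages change after smoothing.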

## 7. Section 7: Account for Omissions. Congratulations; you've just replicated PyCCM :)

In [16]:
# <Your answer goes here>