# Notebook 03 — Reporting & Export

## Purpose
This notebook takes the cleaned indicator datasets and produces:
1. Summary tables per AU WGYD indicator
2. A combined master export file for Power BI
3. A printed indicator summary report

## Why this matters
In M&E pipelines, the final step is usually packaging analysis results
into formats that non-technical stakeholders can use: programme
managers, donors, and leadership teams. This notebook automates that
reporting step.


In [1]:
# Import libraries
import pandas as pd
import os

# ── File paths ──────────────────────────────────────────────
# __file__ is not defined inside a notebook, so os.path.dirname(__file__)
# fails here. We use the current working directory instead, which assumes
# the kernel was started in the notebook's own folder.

BASE_DIR = os.getcwd()
CLEANED_DIR = os.path.join(BASE_DIR, "cleaned")
EXPORT_DIR  = os.path.join(BASE_DIR, "cleaned", "exports")

# Create exports folder if it doesn't exist
os.makedirs(EXPORT_DIR, exist_ok=True)

print("Base directory  :", BASE_DIR)
print("Cleaned data    :", CLEANED_DIR)
print("Export folder   :", EXPORT_DIR)
print("Export folder ready ✓")


Base directory  : c:\Users\wanji\Downloads\TEE\Doing Data Analysis\Tasks\01_data
Cleaned data    : c:\Users\wanji\Downloads\TEE\Doing Data Analysis\Tasks\01_data\cleaned
Export folder   : c:\Users\wanji\Downloads\TEE\Doing Data Analysis\Tasks\01_data\cleaned\exports
Export folder ready ✓


## Step 1 — Load All Cleaned Datasets

We load all 5 cleaned CSV files into separate DataFrames.
Each file represents one AU WGYD indicator.

We then print the shape of each DataFrame, as (rows, columns),
to confirm that every file loaded as expected.
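
A shape check confirms that a file loaded, but not that it has the columns later steps rely on. The sketch below shows one way to fail early when a column is missing. The helper name `load_indicator` and the `REQUIRED_COLUMNS` set are hypothetical; only `au_region` and `value` are known to be used later in this notebook, so treat the set as an assumption to extend.

```python
import io
import pandas as pd

# Assumption: only these two columns are known to be needed downstream.
REQUIRED_COLUMNS = {"au_region", "value"}

def load_indicator(source, name):
    """Read one cleaned indicator CSV and raise if a required column is missing."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"{name}: missing columns {sorted(missing)}")
    return df

# Tiny in-memory stand-in for a cleaned file (illustrative data only)
sample_csv = io.StringIO("au_region,value\nEast Africa,61.2\nNorth Africa,21.7\n")
sample = load_indicator(sample_csv, "sample_indicator")
print(sample.shape)  # (2, 2)
```

In the notebook itself, `source` would be a path such as `os.path.join(CLEANED_DIR, "female_lfp_clean.csv")`.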


In [2]:
# Load all cleaned datasets
female_lfp      = pd.read_csv(os.path.join(CLEANED_DIR, "female_lfp_clean.csv"))
girls_enrol     = pd.read_csv(os.path.join(CLEANED_DIR, "girls_secondary_enrol_clean.csv"))
women_parl      = pd.read_csv(os.path.join(CLEANED_DIR, "women_in_parliament_clean.csv"))
youth_unemp_f   = pd.read_csv(os.path.join(CLEANED_DIR, "youth_unemployment_f_clean.csv"))
youth_unemp_m   = pd.read_csv(os.path.join(CLEANED_DIR, "youth_unemployment_m_clean.csv"))

# Confirm load
datasets = {
    "female_lfp"     : female_lfp,
    "girls_enrol"    : girls_enrol,
    "women_parl"     : women_parl,
    "youth_unemp_f"  : youth_unemp_f,
    "youth_unemp_m"  : youth_unemp_m
}

for name, df in datasets.items():
    print(f"{name:20s} → {df.shape[0]} rows, {df.shape[1]} columns")


female_lfp           → 486 rows, 6 columns
girls_enrol          → 486 rows, 6 columns
women_parl           → 486 rows, 6 columns
youth_unemp_f        → 486 rows, 6 columns
youth_unemp_m        → 486 rows, 6 columns


## Step 2 — Build Regional Summary Tables

For each indicator, we calculate:
- The **average value** per AU region across all years
- The **minimum** and **maximum** values per region
- The **number of non-missing records** per region (a data completeness check)

This is a summary table layout commonly used in M&E progress reports.
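
As a small illustration on toy data (the values below are made up, not taken from the indicator files), the same named aggregation can be run on four rows. Note that `"count"` counts only non-missing values, which is why it can serve as a completeness check.

```python
import numpy as np
import pandas as pd

# Toy data: North Africa has one missing value
toy = pd.DataFrame({
    "au_region": ["East Africa", "East Africa", "North Africa", "North Africa"],
    "value": [60.0, 62.0, 20.0, np.nan],
})

toy_summary = toy.groupby("au_region")["value"].agg(
    avg_value="mean",
    min_value="min",
    max_value="max",
    record_count="count",   # NaN rows are excluded from the count
).reset_index()

print(toy_summary.to_string(index=False))
```

East Africa ends up with a record count of 2 and North Africa with 1, even though both regions have two rows in the input.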


In [3]:
# Build regional summary for each indicator
def regional_summary(df, indicator_name):
    """
    Groups a cleaned indicator DataFrame by AU region
    and calculates key summary statistics.
    """
    summary = df.groupby("au_region")["value"].agg(
        avg_value   = "mean",
        min_value   = "min",
        max_value   = "max",
        record_count= "count"
    ).reset_index()

    summary["avg_value"]  = summary["avg_value"].round(2)
    summary["min_value"]  = summary["min_value"].round(2)
    summary["max_value"]  = summary["max_value"].round(2)
    summary["indicator"]  = indicator_name

    return summary

# Run for all 5 indicators
summary_lfp      = regional_summary(female_lfp,    "female_lfp")
summary_enrol    = regional_summary(girls_enrol,   "girls_secondary_enrol")
summary_parl     = regional_summary(women_parl,    "women_in_parliament")
summary_unemp_f  = regional_summary(youth_unemp_f, "youth_unemployment_f")
summary_unemp_m  = regional_summary(youth_unemp_m, "youth_unemployment_m")

# Preview the first summary table
print("=== Female Labour Force Participation — Regional Summary ===")
print(summary_lfp.to_string(index=False))


=== Female Labour Force Participation — Regional Summary ===
      au_region  avg_value  min_value  max_value  record_count  indicator
 Central Africa      55.45      22.57      75.89            81 female_lfp
    East Africa      61.16      18.55      84.63           117 female_lfp
   North Africa      21.71      14.48      31.66            44 female_lfp
          Other      35.83      33.37      36.62             9 female_lfp
Southern Africa      58.36      48.61      78.56            81 female_lfp
    West Africa      57.59      26.80      81.88           144 female_lfp


## Step 3 — Combine All Summaries Into One Master Table

We stack all 5 regional summary tables into a single DataFrame.

This is the **master indicator report table** — one file that contains
all indicators, all regions, all summary statistics.

This is what gets exported to CSV and loaded into Power BI or
shared with programme managers as a standalone report.
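
When stacking tables like this, it can help to confirm that each indicator and region pair appears exactly once. The following is a minimal sketch on two toy summary tables; the names `summary_a`, `summary_b`, and the indicator labels are hypothetical stand-ins for the real summaries.

```python
import pandas as pd

# Two toy tables shaped like the output of a per-indicator regional summary
summary_a = pd.DataFrame({
    "au_region": ["West Africa", "East Africa"],
    "avg_value": [57.59, 61.16],
    "indicator": "indicator_a",
})
summary_b = pd.DataFrame({
    "au_region": ["West Africa", "East Africa"],
    "avg_value": [8.54, 12.30],
    "indicator": "indicator_b",
})

stacked = pd.concat([summary_a, summary_b], ignore_index=True)
stacked = stacked.sort_values(["indicator", "au_region"]).reset_index(drop=True)

# Each (indicator, region) pair should appear exactly once
has_duplicates = stacked.duplicated(["indicator", "au_region"]).any()
print(len(stacked), "rows, duplicates:", has_duplicates)
```

A duplicated pair would usually mean the same summary was stacked twice or two indicators share a label.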


In [4]:
# Stack all 5 summary tables into one master DataFrame
master_summary = pd.concat([
    summary_lfp,
    summary_enrol,
    summary_parl,
    summary_unemp_f,
    summary_unemp_m
], ignore_index=True)

# Reorder columns for clean reporting
master_summary = master_summary[[
    "indicator",
    "au_region",
    "avg_value",
    "min_value",
    "max_value",
    "record_count"
]]

# Sort by indicator then region
master_summary = master_summary.sort_values(
    ["indicator", "au_region"]
).reset_index(drop=True)

print(f"Master summary table: {master_summary.shape[0]} rows, {master_summary.shape[1]} columns")
print()
print(master_summary.to_string(index=False))


Master summary table: 30 rows, 6 columns

            indicator       au_region  avg_value  min_value  max_value  record_count
           female_lfp  Central Africa      55.45      22.57      75.89            81
           female_lfp     East Africa      61.16      18.55      84.63           117
           female_lfp    North Africa      21.71      14.48      31.66            44
           female_lfp           Other      35.83      33.37      36.62             9
           female_lfp Southern Africa      58.36      48.61      78.56            81
           female_lfp     West Africa      57.59      26.80      81.88           144
girls_secondary_enrol  Central Africa      40.33      11.04      95.13            23
girls_secondary_enrol     East Africa      57.55       2.76     113.52            56
girls_secondary_enrol    North Africa      73.50      40.47     106.15            22
girls_secondary_enrol           Other        NaN        NaN        NaN             0
... (output truncated)

## Step 4 — Export Master Summary to CSV

We export the master summary table to the `cleaned/exports/` folder.

This CSV is the file you load into Power BI as a reporting layer.
It is pre-aggregated, so dashboards need no further transformation.
Region and indicator pairs with no data (for example `Other` for girls'
secondary enrolment) are written as empty cells.
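
Reading the file back and checking the row count catches a failed write, but not silent changes in values or types. One stricter option, sketched here on a toy table written to a temporary folder, is to compare the reloaded frame with the original using `pandas.testing.assert_frame_equal`. The table contents and file name are illustrative assumptions.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Toy master table, including one region with no data
toy_master = pd.DataFrame({
    "indicator": ["indicator_a", "indicator_a"],
    "au_region": ["East Africa", "Other"],
    "avg_value": [61.16, np.nan],
    "record_count": [117, 0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_master_summary.csv")
    toy_master.to_csv(toy_path, index=False)
    reloaded = pd.read_csv(toy_path)

# Raises AssertionError if values, column order, or dtypes differ
assert_frame_equal(toy_master, reloaded)
print("Round trip OK:", reloaded.shape)
```

Missing values are written as empty cells and come back as `NaN`, so they compare as equal here.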


In [5]:
# Export master summary table to CSV
export_path = os.path.join(EXPORT_DIR, "master_indicator_summary.csv")
master_summary.to_csv(export_path, index=False)

print("Master summary exported to:")
print(f"  {export_path}")
print()

# Confirm file was written by reading it back
verify = pd.read_csv(export_path)
print(f"Verification — rows loaded back: {len(verify)}")
print(f"Verification — columns: {list(verify.columns)}")


Master summary exported to:
  c:\Users\wanji\Downloads\TEE\Doing Data Analysis\Tasks\01_data\cleaned\exports\master_indicator_summary.csv

Verification — rows loaded back: 30
Verification — columns: ['indicator', 'au_region', 'avg_value', 'min_value', 'max_value', 'record_count']


## Step 5 — Print Indicator Report

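We print a formatted summary report for each indicator.

This is what you would include in a programme report or donor update:
a plain-English summary of each indicator across regions, flagging the
regions with the highest and lowest average values. Highest does not
always mean best: for the youth unemployment indicators, a higher value
is a worse outcome. The "continental average" shown is the unweighted
mean of the regional averages, including the `Other` group.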
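
The report's `cont_avg` is computed as the plain mean of the regional averages, so a region with 9 records counts as much as one with 144. The sketch below contrasts that with a record-weighted average, using the female labour force participation figures printed in the Step 2 summary. The weighted figure is an approximation, since it is rebuilt from rounded regional averages rather than the underlying records.

```python
import pandas as pd

# Regional averages and record counts from the female_lfp summary table
lfp = pd.DataFrame({
    "au_region": ["Central Africa", "East Africa", "North Africa",
                  "Other", "Southern Africa", "West Africa"],
    "avg_value": [55.45, 61.16, 21.71, 35.83, 58.36, 57.59],
    "record_count": [81, 117, 44, 9, 81, 144],
})

# Unweighted: every region counts equally (what the report prints)
unweighted = lfp["avg_value"].mean()

# Weighted: each region counts in proportion to its number of records
weighted = (lfp["avg_value"] * lfp["record_count"]).sum() / lfp["record_count"].sum()

print(f"Mean of regional averages : {unweighted:.2f}")  # 48.35
print(f"Record-weighted average   : {weighted:.2f}")    # 54.51
```

Which figure suits a report depends on the question being asked; a population-weighted average would need population data that this notebook does not load.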


In [6]:
# Print a formatted indicator report
indicators = master_summary["indicator"].unique()

print("=" * 60)
print("  AU WGYD INDICATOR REPORT — AFRICA GENDER & YOUTH TRACKER")
print("=" * 60)

for ind in sorted(indicators):
    subset = master_summary[master_summary["indicator"] == ind]
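    # idxmax/idxmin skip NaN, so regions with no data are ignored
    top    = subset.loc[subset["avg_value"].idxmax(), "au_region"]
    bottom = subset.loc[subset["avg_value"].idxmin(), "au_region"]
    # Unweighted mean of the regional averages (not weighted by record count)
    cont_avg = subset["avg_value"].mean().round(2)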

    print(f"\nIndicator : {ind}")
    print(f"  Continental average : {cont_avg}")
    print(f"  Highest region      : {top}  ({subset['avg_value'].max():.2f})")
    print(f"  Lowest region       : {bottom}  ({subset['avg_value'].min():.2f})")
    print("-" * 60)


  AU WGYD INDICATOR REPORT — AFRICA GENDER & YOUTH TRACKER

Indicator : female_lfp
  Continental average : 48.35
  Highest region      : East Africa  (61.16)
  Lowest region       : North Africa  (21.71)
------------------------------------------------------------

Indicator : girls_secondary_enrol
  Continental average : 59.56
  Highest region      : Southern Africa  (73.95)
  Lowest region       : Central Africa  (40.33)
------------------------------------------------------------

Indicator : women_in_parliament
  Continental average : 21.84
  Highest region      : East Africa  (28.55)
  Lowest region       : Other  (16.07)
------------------------------------------------------------

Indicator : youth_unemployment_f
  Continental average : 30.43
  Highest region      : Other  (70.56)
  Lowest region       : West Africa  (8.54)
------------------------------------------------------------

Indicator : youth_unemployment_m
  Continental average : 22.56
  Highest region      : Other  (output truncated)

## Step 6 — Export Individual Indicator Files

In addition to the master summary, we export one CSV per indicator.

This mirrors a common M&E reporting arrangement: each programme
team gets its own indicator file, while the master file goes to
central reporting and leadership.
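
If a record of what was written is useful, the export loop can also return a small manifest. This is a sketch only: `export_with_manifest` is a hypothetical helper, and the toy frames and temporary folder stand in for the real indicator DataFrames and `EXPORT_DIR`.

```python
import os
import tempfile

import pandas as pd

def export_with_manifest(frames, out_dir):
    """Write each DataFrame to <name>_export.csv and return a manifest table."""
    os.makedirs(out_dir, exist_ok=True)
    records = []
    for name, df in frames.items():
        path = os.path.join(out_dir, f"{name}_export.csv")
        df.to_csv(path, index=False)
        records.append({
            "file": os.path.basename(path),
            "rows": len(df),
            "size_bytes": os.path.getsize(path),
        })
    return pd.DataFrame(records)

# Toy frames standing in for the cleaned indicator datasets
toy_frames = {
    "indicator_a": pd.DataFrame({"au_region": ["East Africa", "West Africa"], "value": [61.2, 57.6]}),
    "indicator_b": pd.DataFrame({"au_region": ["North Africa"], "value": [21.7]}),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    manifest = export_with_manifest(toy_frames, tmp_dir)

print(manifest.to_string(index=False))
```

The manifest could itself be saved alongside the exports so recipients can check row counts against what they receive.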


In [7]:
# Export one CSV per indicator
export_map = {
    "female_lfp"            : female_lfp,
    "girls_secondary_enrol" : girls_enrol,
    "women_in_parliament"   : women_parl,
    "youth_unemployment_f"  : youth_unemp_f,
    "youth_unemployment_m"  : youth_unemp_m
}

for name, df in export_map.items():
    file_path = os.path.join(EXPORT_DIR, f"{name}_export.csv")
    df.to_csv(file_path, index=False)
    print(f"Exported: {name}_export.csv  ({len(df)} rows)")

print()
print("All indicator files exported successfully ✓")


Exported: female_lfp_export.csv  (486 rows)
Exported: girls_secondary_enrol_export.csv  (486 rows)
Exported: women_in_parliament_export.csv  (486 rows)
Exported: youth_unemployment_f_export.csv  (486 rows)
Exported: youth_unemployment_m_export.csv  (486 rows)

All indicator files exported successfully ✓


In [8]:
# Final verification — list all files in the exports folder
print("=" * 60)
print("  EXPORT FOLDER CONTENTS — FINAL VERIFICATION")
print("=" * 60)

export_files = os.listdir(EXPORT_DIR)
export_files.sort()

for f in export_files:
    file_path = os.path.join(EXPORT_DIR, f)
    size_kb   = round(os.path.getsize(file_path) / 1024, 2)
    print(f"  {f:45s} {size_kb} KB")

print()
print(f"Total files exported: {len(export_files)}")
print()
print("Pipeline complete. Ready for Power BI ingestion ✓")


  EXPORT FOLDER CONTENTS — FINAL VERIFICATION
  female_lfp_export.csv                         23.74 KB
  girls_secondary_enrol_export.csv              29.33 KB
  master_indicator_summary.csv                  1.59 KB
  women_in_parliament_export.csv                31.98 KB
  youth_unemployment_f_export.csv               28.27 KB
  youth_unemployment_m_export.csv               28.24 KB

Total files exported: 6

Pipeline complete. Ready for Power BI ingestion ✓
