# PMTCT Longitudinal Data Analysis
## Prevention of Mother-to-Child Transmission Programme

---

### Executive Summary

This notebook presents a comprehensive analysis of **3,092 records** from Zimbabwe's PMTCT programme, examining two distinct cohorts:

1. **1,211 HIV-positive children without traceable mothers**
2. **1,881 mother-baby pairs with complete linkage data**

The analysis reveals critical gaps in the care cascade while highlighting successes in same-day ART initiation and viral suppression.

---

**Author**: PMTCT Data Analysis Team  
**Date**: February 2026  
**Data Source**: Zimbabwe PMTCT Programme Monitoring System

---

## Table of Contents

1. [Setup and Data Loading](#1-setup-and-data-loading)
2. [Cohort 1: Children Without Traceable Mothers](#2-cohort-1-children-without-traceable-mothers)
   - 2.1 Demographic Profile
   - 2.2 Age at HIV Testing
   - 2.3 Treatment Cascade
   - 2.4 Time to ART Initiation
   - 2.5 Geographic Distribution
   - 2.6 Temporal Trends
3. [Cohort 2: Mother-Baby Pairs](#3-cohort-2-mother-baby-pairs)
   - 3.1 Maternal Demographics
   - 3.2 ART Initiation
   - 3.3 Viral Load Suppression
   - 3.4 Infant Outcomes
   - 3.5 MTCT Rate
4. [Comparative Analysis](#4-comparative-analysis)
5. [Key Findings and Recommendations](#5-key-findings-and-recommendations)

---

## 1. Setup and Data Loading

First, we import the necessary libraries and load both datasets.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set visualization style for professional-looking plots
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (14, 6)  # Default figure size
plt.rcParams['font.size'] = 10  # Default font size

print("✅ Libraries imported successfully!")
print(f"   - pandas version: {pd.__version__}")
print(f"   - numpy version: {np.__version__}")
print(f"   - matplotlib version: {plt.matplotlib.__version__}")

In [None]:
# Load the datasets
# NOTE: Update the file paths if running from a different directory

print("Loading datasets...\n")

# Dataset 1: Children without traceable mothers
# This dataset contains HIV-positive children who have been identified
# but have no documented connection to their mothers
no_mother = pd.read_csv('DATA_SET_WITH_NO_TRACEABLE_MOTHER.csv')
no_mother['cohort'] = 'No Traceable Mother'

# Dataset 2: Mother-baby pairs with complete data
# This represents the PMTCT program as designed - full mother-baby linkage
with_mother = pd.read_csv('DATA_SET_WITH_TRACE_OF_THE_MOTHER.csv')
with_mother['cohort'] = 'With Mother Data'

# Display dataset sizes
print("="*70)
print("DATA LOADING SUMMARY")
print("="*70)
print(f"\n📊 Dataset 1 (No Traceable Mother): {len(no_mother):,} records")
print(f"   - Columns: {len(no_mother.columns)}")
print(f"   - Memory usage: {no_mother.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

print(f"\n📊 Dataset 2 (With Mother Data): {len(with_mother):,} records")
print(f"   - Columns: {len(with_mother.columns)}")
print(f"   - Memory usage: {with_mother.memory_usage(deep=True).sum() / 1024**2:.2f} MB")

print(f"\n📊 TOTAL RECORDS: {len(no_mother) + len(with_mother):,}")
print("="*70)

# Display first few rows of each dataset
print("\n📋 Sample from Dataset 1 (No Traceable Mother):")
display(no_mother.head(3))

print("\n📋 Sample from Dataset 2 (With Mother Data):")
display(with_mother.head(3))

### Data Preparation: Converting Date Columns

We need to convert string dates to datetime format for time-based calculations.
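
Because `errors='coerce'` silently turns unparseable strings into `NaT`, it can help to count how many values are lost in each column during conversion. The following is a minimal sketch on made-up values; the helper name `parse_dates_with_report` and the sample strings are illustrative assumptions, not part of the programme data or the original notebook.

```python
import pandas as pd

def parse_dates_with_report(df, columns):
    """Parse day-first date strings in place and count values that failed to parse.

    Hypothetical helper for illustration only.
    """
    report = {}
    for col in columns:
        if col not in df.columns:
            continue  # skip columns that are absent from this dataset
        had_value = df[col].notna()
        parsed = pd.to_datetime(df[col], errors='coerce', dayfirst=True)
        # A value that was present but is now NaT could not be parsed
        report[col] = int((had_value & parsed.isna()).sum())
        df[col] = parsed
    return report

# Made-up sample: one valid day-first date, one invalid string, one missing value
sample = pd.DataFrame({'infant_date_of_birth': ['03/02/2021', 'not a date', None]})
failed = parse_dates_with_report(sample, ['infant_date_of_birth'])
print(failed)
```

A non-zero count for a column suggests inspecting the raw strings before relying on any time-based calculation that uses it.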

In [None]:
# Convert date columns for Dataset 1 (No Traceable Mother)
print("Converting date columns to datetime format...\n")

date_cols_no_mother = ['infant_date_of_birth', 'infant_hiv_test_date', 
                       'infant_date_of_art_initiation', 'infant_date_of_art_enrolment']

for col in date_cols_no_mother:
    if col in no_mother.columns:
        no_mother[col] = pd.to_datetime(no_mother[col], errors='coerce', dayfirst=True)
        print(f"✅ Converted: {col}")

# Convert date columns for Dataset 2 (With Mother Data)
date_cols_with_mother = ['mother_date_of_birth', 'date_of_anc_booking', 'mother_date_of_hiv_test',
                         'date_mother_tested_positive', 'mother_date_of_art_initiation',
                         'mother_date_of_art_enrolment', 'infant_date_of_birth', 'date_of_delivery', 
                         'mother_date_of_viral_load', 'infant_hiv_test_date', 
                         'infant_date_of_art_enrolment', 'date_of_last_known_mensural_period']

for col in date_cols_with_mother:
    if col in with_mother.columns:
        with_mother[col] = pd.to_datetime(with_mother[col], errors='coerce', dayfirst=True)
        print(f"✅ Converted: {col}")

print("\n✅ Date conversion complete!")

---

## 2. Cohort 1: Children Without Traceable Mothers

This cohort represents **1,211 HIV-positive children** who have been identified in the health system but have **no documented connection to their mothers**.

### Why This Matters:

Children without maternal linkage face multiple challenges:
- **Medical history**: No maternal HIV history to guide treatment
- **Social support**: May be orphaned or abandoned
- **Adherence**: May lack consistent caregivers
- **System gaps**: Indicates breakdown in mother-baby pair registration

### Possible Reasons for Missing Maternal Data:
1. **Maternal death** (potentially from HIV/AIDS or delivery complications)
2. **Child abandonment**
3. **Documentation system failures** (lost records, poor data quality)
4. **Inter-facility transfers** without proper linkage
5. **Children being cared for by relatives** without formal guardianship

---

For the complete analysis, please run the full Python script provided separately. This notebook provides the framework and key sections.

The complete analysis includes:
- Detailed demographic analysis
- Age at testing breakdowns
- Treatment cascade visualization
- Geographic distribution by facility
- Temporal trends
- Mother-baby pair analysis
- Viral load suppression rates
- MTCT outcomes
- Comparative analysis between cohorts
- Key findings and recommendations
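
As one illustration of the items above, age at testing and time to ART initiation can be derived from the date columns converted earlier. The sketch below uses made-up rows and assumes the three date columns are already in datetime format; the derived column names (`age_at_test_days`, `days_to_art`, `same_day_art`) are illustrative choices, not names confirmed from the full script.

```python
import pandas as pd

# Made-up rows standing in for the no-traceable-mother dataset
toy = pd.DataFrame({
    'infant_date_of_birth': pd.to_datetime(['2021-01-10', '2020-06-01', '2019-03-15']),
    'infant_hiv_test_date': pd.to_datetime(['2021-03-01', '2021-06-01', '2020-03-15']),
    'infant_date_of_art_initiation': pd.to_datetime(['2021-03-01', '2021-06-15', None]),
})

# Age at HIV test, in days
toy['age_at_test_days'] = (
    toy['infant_hiv_test_date'] - toy['infant_date_of_birth']
).dt.days

# Days from HIV test to ART initiation; NaN where no initiation date is recorded
toy['days_to_art'] = (
    toy['infant_date_of_art_initiation'] - toy['infant_hiv_test_date']
).dt.days

# Same-day initiation: test and ART start fall on the same date
toy['same_day_art'] = toy['days_to_art'] == 0

# Restrict the denominator to children with a recorded initiation date
initiated = toy['days_to_art'].notna()
same_day_pct = toy.loc[initiated, 'same_day_art'].mean() * 100
print(f"Same-day ART initiation among initiated: {same_day_pct:.1f}%")
```

Restricting the denominator to children with a recorded initiation date is one possible choice; using all children who tested positive instead would give a lower figure, so the definition should be stated alongside any reported rate.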

---

## Running the Complete Analysis

To run the full analysis with all visualizations:

```python
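# Run the complete analysis script in the current namespace,
# closing the file handle once the source has been read
with open('pmtct_analysis.py', encoding='utf-8') as f:
    exec(f.read())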
```

Or import and run specific functions:

```python
from pmtct_analysis import (
    load_and_prepare_data,
    analyze_orphan_cohort,
    analyze_mother_baby_pairs,
    comparative_analysis,
    create_visualizations
)

# Load data
no_mother_df, with_mother_df = load_and_prepare_data()

# Run analyses
no_mother_df = analyze_orphan_cohort(no_mother_df)
with_mother_df = analyze_mother_baby_pairs(with_mother_df)

# Compare cohorts
comparative_analysis(no_mother_df, with_mother_df)

# Create visualizations
create_visualizations(no_mother_df, with_mother_df)
```

---