# Unveiling Gen Z: A Journey Through Data (1997–2012)

In the quiet echoes of 1997, a generation began to emerge, moulded by a rapidly changing world. Born between 1997 and 2012, this cohort, often labelled Gen Z, has grown up amid seismic shifts in technology, education, and societal norms. But who are they, and what defines their place in Kenya’s socio-economic landscape? Join us as we unravel their story through a data-driven narrative.

## 1. Defining the Scope: Key Metrics of Focus
Gen Z stands apart, shaped by access to information, economic challenges, and social transformation. To understand them, our analysis centres on four critical dimensions:

- **Education**: Access, attainment, and quality.
- **Economic Participation**: Employment-to-population ratios, unemployment rates, and GDP trends.
- **Population Dynamics**: Census data, age distribution, and gender disparities.
- **Health and Social Indicators**: Youth health and associated societal factors.

## 2. Structuring the Analysis

### A. Population and Demographics
**Objective:** To paint a demographic portrait of Gen Z.

**Approach:**
- Utilise data from the **2019 Population Census** and the **population by age, sex, and educational attainment** dataset.
- Quantify Gen Z’s share in Kenya’s total population.
- Compare rural versus urban distribution.

**Questions to Answer:**
- How rapidly is this generation growing?
- Where do they live, and how is gender distributed across rural and urban areas?

### B. Education
**Objective:** To assess educational achievements and challenges.

**Approach:**
- Analyse data from **qc_education_ken.csv** and **education_ken.csv**.
- Explore enrollment rates, completion trends, and gender disparities.
- Identify barriers to access and quality issues.

**Questions to Answer:**
- Are educational outcomes improving over time?
- What factors impede progress?

### C. Economic Participation
**Objective:** To evaluate Gen Z’s integration into the economy.

**Approach:**
- Analyse employment data using **SLEMPTOTLSPZSKEN** (employment-to-population ratio) and **SLUEM1524ZSKEN** (youth unemployment rate).
- Correlate trends with **Real GDP per capita** from **GDP.xls**.

**Questions to Answer:**
- Does economic growth translate to youth employment?
- Are structural barriers contributing to unemployment?

### D. Health and Social Indicators
**Objective:** To examine the health and societal challenges faced by Gen Z.

**Approach:**
- Use **KDHS-tables.xlsx** to investigate health trends and challenges.
- Correlate health outcomes with education and economic factors.

**Questions to Answer:**
- How do health challenges shape their socio-economic potential?
- What interventions are needed?

## 3. Conducting Data-Driven Analysis
Each dataset will be meticulously processed:
- **Data Cleaning**: Address missing values, standardise formats.
- **Visualisation**: Craft descriptive charts (e.g., histograms, line graphs).
- **Statistical Analysis**: Perform regression or correlation analyses to uncover relationships.
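A minimal sketch of the correlation-and-regression step follows. It assumes each indicator has already been reduced to a pandas Series indexed by year. The helper `correlate_series` and the toy numbers are hypothetical and only show the mechanics: aligning the two series on their common years, then computing a Pearson correlation and an ordinary least-squares slope.

```python
import numpy as np
import pandas as pd


def correlate_series(x: pd.Series, y: pd.Series) -> dict:
    """Align two year-indexed series and return Pearson r plus an OLS fit of y on x."""
    # Keep only the years that are present and non-missing in both series
    aligned = pd.concat({"x": x, "y": y}, axis=1).dropna()
    if len(aligned) < 3:
        raise ValueError("need at least three overlapping years")
    r = aligned["x"].corr(aligned["y"])  # Pearson correlation by default
    slope, intercept = np.polyfit(aligned["x"], aligned["y"], deg=1)
    return {"n": len(aligned), "pearson_r": r, "slope": slope, "intercept": intercept}


# Toy, hypothetical series indexed by year (not real Kenyan data)
gdp_pc = pd.Series([100.0, 104.0, 108.0, 112.0, 116.0], index=range(2015, 2020))
youth_unemp = pd.Series([10.0, 9.5, 9.0, 8.5, np.nan], index=range(2015, 2020))

result = correlate_series(gdp_pc, youth_unemp)
print(result)
```

With only a handful of annual observations, a strong correlation is weak evidence on its own. It also does not show that growth causes, or fails to cause, changes in youth unemployment.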

## 4. Drawing Conclusions
- **Population Trends**: Urbanisation, gender composition, and growth.
- **Education Trends**: Gains and gaps in access and quality.
- **Economic Integration**: Aligning economic growth with employment.
- **Health and Well-being**: Connections between health, education, and economic potential.

## 5. Visual Storytelling
Effective visuals will enhance the narrative:
- **Pie Charts**: Demographic splits.
- **Line Graphs**: Trends in education, employment, and GDP.
- **Heatmaps**: Inter-variable correlations.

## 6. The Final Report
### Introduction:
In a nation balancing tradition and modernity, Gen Z represents a bridge to the future. This report dives into their lives, ambitions, and hurdles.

### Key Findings:
- **Population Dynamics**: Gen Z accounts for a significant portion of Kenya’s population, with unique rural-urban dynamics.
- **Education**: Marked improvements in enrollment but persistent quality gaps.
- **Economic Participation**: Youth unemployment remains a challenge despite GDP growth.
- **Health Indicators**: Alarming trends in youth health require urgent attention.

### Implications:
Understanding Gen Z is vital for policymakers, educators, and businesses. Their future defines Kenya’s trajectory.

### Recommendations:
- **Education Reforms**: Prioritise quality and accessibility.
- **Youth Employment Programs**: Align skills training with market demands.
- **Health Interventions**: Address youth-specific health concerns through targeted programs.

**The data speaks, but it’s up to us to act. Gen Z is not just a generation; they are Kenya’s tomorrow.**




One practical way to explore, clean, and prepare large datasets efficiently is an ETL (Extract, Transform, Load) pipeline built from modular, reusable functions. The framework below uses Python with pandas, numpy, and pyarrow for data processing.

**Step 1: Set Up the ETL Pipeline**
- Extract: Load the data from multiple sources.
- Transform: Perform cleaning and preprocessing using reusable functions.
- Load: Store the cleaned datasets into a final location for analysis.

**Step 2: Framework for the Pipeline**
1. Import libraries and set up the environment.
2. Define helper functions for:
   - Loading multiple datasets
   - Exploring data
   - Cleaning common issues (e.g., missing values, outliers)
3. Execute the pipeline.

**Step 3: Implement the Pipeline**
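Below is a minimal sketch of Steps 1 to 3. It assumes the inputs are local CSV or Excel files. The function names (`extract`, `explore`, `transform`, `load`, `run_pipeline`) are hypothetical. Writing Parquet assumes pyarrow (or fastparquet) is installed, and passing `fmt="csv"` avoids that dependency.

```python
from pathlib import Path

import pandas as pd

# Map file extensions to pandas readers (reading .xls also needs xlrd)
READERS = {".csv": pd.read_csv, ".xlsx": pd.read_excel, ".xls": pd.read_excel}


def extract(paths):
    """Load every supported file into a dict keyed by file stem."""
    frames = {}
    for path in map(Path, paths):
        reader = READERS.get(path.suffix.lower())
        if reader is None:
            print(f"Skipping unsupported file: {path.name}")
            continue
        frames[path.stem] = reader(path)
    return frames


def explore(name, df):
    """Print a compact profile: shape, dtypes and missing values per column."""
    print(f"--- {name}: {df.shape[0]} rows x {df.shape[1]} columns")
    print(pd.DataFrame({"dtype": df.dtypes.astype(str), "missing": df.isna().sum()}))


def transform(df):
    """Generic cleaning: tidy column names, drop fully empty rows and duplicates."""
    out = df.copy()
    out.columns = (
        out.columns.astype(str).str.strip().str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True).str.strip("_")
    )
    return out.dropna(how="all").drop_duplicates()


def load(frames, out_dir, fmt="parquet"):
    """Write each cleaned frame to out_dir and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in frames.items():
        target = out_dir / f"{name}.{fmt}"
        if fmt == "parquet":
            df.to_parquet(target, index=False)  # requires pyarrow or fastparquet
        else:
            df.to_csv(target, index=False)
        written.append(target)
    return written


def run_pipeline(paths, out_dir, fmt="parquet"):
    """Extract, profile, clean and store every dataset in one pass."""
    cleaned = {}
    for name, df in extract(paths).items():
        explore(name, df)
        cleaned[name] = transform(df)
    return load(cleaned, out_dir, fmt=fmt)
```

For example, `run_pipeline(Path("data").glob("*"), "clean")` would process every CSV or Excel file in a hypothetical `data/` folder. Dataset-specific fixes belong in their own functions, layered on top of the generic `transform`.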

**Step 4: Extend with Advanced Cleaning**
For each dataset, you can add specialised cleaning functions like:

1. Outlier Detection
2. Data Type Validation
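Both checks can be sketched with the interquartile-range (IQR) rule for outliers and `pd.to_numeric(..., errors="coerce")` for type validation. The IQR rule with k = 1.5 is a common convention, not something these datasets prescribe. The function names and toy values are hypothetical.

```python
import pandas as pd


def flag_iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])  # NaN values are ignored
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def validate_numeric(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Coerce columns to numeric and report how many values could not be parsed."""
    out = df.copy()
    for col in columns:
        converted = pd.to_numeric(out[col], errors="coerce")
        # Entries that were present but became NaN during conversion
        failed = converted.isna() & out[col].notna()
        if failed.any():
            print(f"{col}: {int(failed.sum())} non-numeric value(s) set to NaN")
        out[col] = converted
    return out


# Toy example: one text placeholder and one extreme value
toy = pd.DataFrame({"value": ["7.8", "8.1", "n/a", "7.9", "8.0", "45.0"]})
toy = validate_numeric(toy, ["value"])
toy["is_outlier"] = flag_iqr_outliers(toy["value"])
print(toy)
```

Flagging rather than dropping keeps the original values intact, so each flagged point can be reviewed before deciding whether it is an error or a genuine extreme.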

**Step 5: Leverage Cloud Tools (Optional)**
- Pipeline Orchestration: Use tools like Apache Airflow or Prefect to schedule and monitor pipelines, or AWS Glue for managed, serverless ETL.
- Distributed Processing: For very large datasets, use Dask or PySpark.


This approach ensures:

- Modular and Reusable Code: Functions for loading, cleaning, and exploring are generic.
- Efficient Workflow: Multiple datasets pass through the same functions in a single run.
- Original Data Preserved: No unnecessary transformations; cleaning is targeted.


**A. Population and Demographics**

To analyze the demographic profile of Generation Z (born between 1997 and 2012) in Kenya, we'll use two datasets: *population_census_2019.xlsx* and *population_by_age_sex_and_educational_attainment_in_2020.csv*.

*Our objectives are to:*

- Quantify Generation Z's proportion of Kenya's total population.
- Examine the rural versus urban distribution.
- Analyze gender distribution across different areas.


*Steps for Analysis:*

1. Load the Data:

- Import necessary libraries.
- Load the datasets into DataFrames.

2. Data Preparation:

- Filter the data to the age groups covering people born between 1997 and 2012 (roughly ages 7–22 in 2019 and 8–23 in 2020, taking age as survey year minus birth year).
- Aggregate population counts by age, sex, and area (rural/urban).

3. Analysis:

- Calculate the total population of Generation Z.
- Determine the percentage of Generation Z in the total population.
- Analyze rural versus urban distribution.
- Examine gender distribution within rural and urban areas.

4. Visualization:

- Create bar charts to visualize the distribution by area and gender.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```


```python
# Load the population census data
population_census = pd.read_excel("population census 2019.xlsx")

# Display the columns and first few rows to understand the structure
print("Columns and first few rows in population census 2019.xlsx:")
print(population_census.columns)
print(population_census.head())
```

```
Columns and first few rows in population census 2019.xlsx:
Index(['ADM0_NAME', 'ADM0_PCODE', 'F_TL', 'M_TL', 'T_TL', 'F_00_04', 'F_05_09',
       'F_10_14', 'F_15_19', 'F_20_24', 'F_25_29', 'F_30_34', 'F_35_39',
       'F_40_44', 'F_45_49', 'F_50_54', 'F_55_59', 'F_60_64', 'F_65_69',
       'F_70_74', 'F_75_79', 'F_80_84', 'F_85_89', 'F_90_94', 'F_95_99',
       'F_100Plus', 'F_Unstated', 'M_00_04', 'M_05_09', 'M_10_14', 'M_15_19',
       'M_20_24', 'M_25_29', 'M_30_34', 'M_35_39', 'M_40_44', 'M_45_49',
       'M_50_54', 'M_55_59', 'M_60_64', 'M_65_69', 'M_70_74', 'M_75_79',
       'M_80_84', 'M_85_89', 'M_90_94', 'M_95_99', 'M_100Plus', 'M_Unstated',
       'T_00_04', 'T_05_09', 'T_10_14', 'T_15_19', 'T_20_24', 'T_25_29',
       'T_30_34', 'T_35_39', 'T_40_44', 'T_45_49', 'T_50_54', 'T_55_59',
       'T_60_64', 'T_65_69', 'T_70_74', 'T_75_79', 'T_80_84', 'T_85_89',
       'T_90_94', 'T_95_99', 'T_100Plus', 'T_Unstated'],
      dtype='object')
  ADM0_NAME ADM0_PCODE      F_TL      M_TL
```

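The census counts come in five-year age bands (`T_05_09`, `T_10_14`, ..., with `T_TL` as the total), so Gen Z cannot be selected exactly. Taking age as 2019 minus birth year and ignoring month of birth, the 1997–2012 cohort was roughly 7–22 years old at the census. The sketch below adds the full 10–14 and 15–19 bands to three-fifths of the 05–09 band (ages 7–9) and three-fifths of the 20–24 band (ages 20–22). It assumes ages are spread evenly within each band, which is the main approximation. The function name and toy numbers are hypothetical.

```python
import pandas as pd

# Fraction of each five-year band that falls inside ages 7-22 (Gen Z in 2019),
# assuming a uniform age distribution within each band.
GEN_Z_BAND_WEIGHTS = {"05_09": 3 / 5, "10_14": 1.0, "15_19": 1.0, "20_24": 3 / 5}


def gen_z_summary(census: pd.DataFrame, prefix: str = "T") -> pd.DataFrame:
    """Estimate Gen Z headcount and share for each row; prefix is 'T', 'F' or 'M'."""
    gen_z = sum(
        census[f"{prefix}_{band}"] * weight
        for band, weight in GEN_Z_BAND_WEIGHTS.items()
    )
    total = census[f"{prefix}_TL"]
    return pd.DataFrame({"gen_z": gen_z, "total": total, "share_pct": 100 * gen_z / total})


# Toy numbers for illustration only (not census figures)
toy = pd.DataFrame({"T_05_09": [500], "T_10_14": [400], "T_15_19": [300],
                    "T_20_24": [500], "T_TL": [5000]})
summary = gen_z_summary(toy)
print(summary)
```

Applied to `population_census`, calling the function with `prefix="F"` and `prefix="M"` gives the female and male Gen Z estimates, from which a gender split follows. Whether `T_TL` should exclude `T_Unstated` is a judgment call worth stating in the report.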
```python
# Load the education data
education_data = pd.read_csv("education_ken.csv")

# Display the columns and first few rows to understand the structure
print("\nColumns and first few rows in education_ken.csv:")
print(education_data.columns)
print(education_data.head())
```

```
Columns and first few rows in education_ken.csv:
Index(['Country Name', 'Country ISO3', 'Year', 'Indicator Name',
       'Indicator Code', 'Value'],
      dtype='object')
    Country Name   Country ISO3        Year  \
0  #country+name  #country+code  #date+year   
1          Kenya            KEN        2010   
2          Kenya            KEN        2005   
3          Kenya            KEN        2000   
4          Kenya            KEN        1995   

                                      Indicator Name       Indicator Code  \
0                                    #indicator+name      #indicator+code   
1  Barro-Lee: Percentage of female population age...  BAR.NOED.1519.FE.ZS   
2  Barro-Lee: Percentage of female population age...  BAR.NOED.1519.FE.ZS   
3  Barro-Lee: Percentage of female population age...  BAR.NOED.1519.FE.ZS   
4  Barro-Lee: Percentage of female population age...  BAR.NOED.1519.FE.ZS   

                  Value  
0  #indicator+value+num  
1                   7.8  
2   
```

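The first data row of education_ken.csv holds HXL hashtags (`#country+name`, `#date+year`, ...) rather than values, which is why `Year` and `Value` load as text. The minimal sketch below drops that row on read and reshapes the long table so that each indicator code becomes its own column. The wide layout and the function name are assumptions about how later steps will use the data. The sample reuses the 2010 row shown above, with the indicator name shortened.

```python
import io

import pandas as pd


def load_hdx_indicators(source) -> pd.DataFrame:
    """Read an HDX-style long CSV, skip the HXL tag row, and pivot to one column per indicator."""
    # Line 0 is the header; line 1 is the HXL hashtag row, so skip it
    df = pd.read_csv(source, skiprows=[1])
    df["Year"] = pd.to_numeric(df["Year"], errors="coerce")
    df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
    df = df.dropna(subset=["Year", "Value"]).astype({"Year": int})
    return df.pivot_table(index="Year", columns="Indicator Code", values="Value", aggfunc="mean")


sample = io.StringIO(
    "Country Name,Country ISO3,Year,Indicator Name,Indicator Code,Value\n"
    "#country+name,#country+code,#date+year,#indicator+name,#indicator+code,#indicator+value+num\n"
    "Kenya,KEN,2010,Barro-Lee indicator (name shortened),BAR.NOED.1519.FE.ZS,7.8\n"
)
wide = load_hdx_indicators(sample)
print(wide)
```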
```python
# Load the employment data
employment_data = pd.read_csv("SLEMPTOTLSPZSKEN- Employment to Population Ratio for Kenya, Percent, Annual, Not Seasonally Adjusted - FRED Graph.csv")

# Display the columns and first few rows to understand the structure
print("\nColumns and first few rows in SLEMPTOTLSPZSKEN- Employment to Population Ratio for Kenya, Percent, Annual, Not Seasonally Adjusted - FRED Graph.csv:")
print(employment_data.columns)
print(employment_data.head())
```

```
Columns and first few rows in SLEMPTOTLSPZSKEN- Employment to Population Ratio for Kenya, Percent, Annual, Not Seasonally Adjusted - FRED Graph.csv:
Index(['Frequency: Annual', 'Unnamed: 1'], dtype='object')
  Frequency: Annual        Unnamed: 1
0  observation_date  SLEMPTOTLSPZSKEN
1        1991-01-01            71.183
2        1992-01-01            70.938
3        1993-01-01            70.708
4        1994-01-01            70.597
```

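Both FRED downloads load with `Frequency: Annual` as the header. The real column names (`observation_date` and the series ID) therefore end up in row 0, and every value is stored as text. The minimal sketch below treats the second line as the header and returns a numeric series indexed by year. It assumes the layout shown in the output above, and `read_fred_export` is a hypothetical helper. `pd.read_excel` accepts the same `header` argument, so passing `reader=pd.read_excel` should handle the .xlsx file.

```python
import io

import pandas as pd


def read_fred_export(source, reader=pd.read_csv) -> pd.Series:
    """Parse a FRED download whose first line is 'Frequency: Annual'."""
    # header=1 uses the second line (observation_date, <series id>) as column names
    df = reader(source, header=1)
    series_id = df.columns[1]
    dates = pd.to_datetime(df["observation_date"])
    values = pd.to_numeric(df[series_id], errors="coerce")
    return pd.Series(values.to_numpy(), index=dates.dt.year, name=series_id)


# First rows of the employment file, as printed above
sample = io.StringIO(
    "Frequency: Annual,\n"
    "observation_date,SLEMPTOTLSPZSKEN\n"
    "1991-01-01,71.183\n"
    "1992-01-01,70.938\n"
)
employment = read_fred_export(sample)
print(employment)
```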

```python
# Load the youth unemployment data
youth_unemployment_data = pd.read_excel("SLUEM1524ZSKEN-Youth Unemployment Rate for Kenya, Percent, Annual, Not Seasonally Adjusted.xlsx")

# Display the columns and first few rows to understand the structure
print("\nColumns and first few rows in SLUEM1524ZSKEN-Youth Unemployment Rate for Kenya, Percent, Annual, Not Seasonally Adjusted.xlsx:")
print(youth_unemployment_data.columns)
print(youth_unemployment_data.head())
```

```
Columns and first few rows in SLUEM1524ZSKEN-Youth Unemployment Rate for Kenya, Percent, Annual, Not Seasonally Adjusted.xlsx:
Index(['Frequency: Annual', 'Unnamed: 1'], dtype='object')
     Frequency: Annual      Unnamed: 1
0     observation_date  SLUEM1524ZSKEN
1  1991-01-01 00:00:00           6.287
2  1992-01-01 00:00:00           6.657
3  1993-01-01 00:00:00           6.953
4  1994-01-01 00:00:00           6.946
```

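The youth unemployment series covers ages 15–24, so in most years it only partly reflects Gen Z. Taking age as year minus birth year, the first Gen Z members (born 1997) enter the 15–24 band in 2012. The band consists entirely of people born 1997–2012 only from 2021 to 2027. The sketch below joins the two FRED series on year and tags each year with that overlap. The function names are hypothetical, and the sample values are the 1991–1994 rows printed above.

```python
import pandas as pd


def gen_z_overlap(year: int, age_lo: int = 15, age_hi: int = 24,
                  born_lo: int = 1997, born_hi: int = 2012) -> float:
    """Fraction of single-year ages in [age_lo, age_hi] whose birth year is in the cohort."""
    ages = range(age_lo, age_hi + 1)
    in_cohort = sum(born_lo <= year - age <= born_hi for age in ages)
    return in_cohort / len(ages)


def combine_labour_series(employment: pd.Series, youth_unemp: pd.Series) -> pd.DataFrame:
    """Inner-join two year-indexed series and tag each year with its Gen Z overlap."""
    df = pd.concat({"employment_ratio": employment, "youth_unemployment": youth_unemp},
                   axis=1, join="inner")
    df["gen_z_share_of_15_24"] = [gen_z_overlap(y) for y in df.index]
    return df


# 1991-1994 values as printed above; no 15-24-year-old is Gen Z before 2012
employment = pd.Series([71.183, 70.938, 70.708, 70.597], index=range(1991, 1995))
youth_unemp = pd.Series([6.287, 6.657, 6.953, 6.946], index=range(1991, 1995))
labour = combine_labour_series(employment, youth_unemp)
print(labour)
print(gen_z_overlap(2012), gen_z_overlap(2021))
```

Filtering on `gen_z_share_of_15_24` (for example, keeping years where it is at least 0.5) is one way to keep the correlation analysis focused on years in which the youth figures mostly describe Gen Z.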

```python
# Load the population by age, sex, and educational attainment data
population_age_sex_education = pd.read_csv("population_by_age_sex_and_educationa_attainment_in_2020.csv")

# Display the columns and first few rows to understand the structure
print("\nColumns and first few rows in population_by_age_sex_and_educational_attainment_in_2020.csv:")
print(population_age_sex_education.columns)
print(population_age_sex_education.head())
```

```
Columns and first few rows in population_by_age_sex_and_educational_attainment_in_2020.csv:
Index(['Age_Bracket', 'No_Education_-_Male', 'Primary_-_Male',
       'Secondary_-_Male', 'Tertiary_-_Male', 'No_Education_-_Female',
       'Primary_-_Female', 'Secondary_-_Female', 'Tertiary_-_Female',
       'OBJECTID'],
      dtype='object')
  Age_Bracket  No_Education_-_Male  Primary_-_Male  Secondary_-_Male  \
0        100+            -0.030017       -0.016887         -0.001247   
1       95-99            -0.382403       -0.285781         -0.026988   
2       90-94            -2.548827       -2.515124         -0.303471   
3       85-89            -9.021490      -11.633843         -1.779124   
4       80-84           -18.094175      -24.007820         -7.596289   

   Tertiary_-_Male  No_Education_-_Female  Primary_-_Female  \
0        -0.000230               0.089101          0.006546   
1        -0.004828               1.180810          0.147924   
2        -0.052578               7.1601
```
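In this file the male columns are negative. That sign convention is typical of population-pyramid data, so the male values need to be made positive before any summing. The sketch below does this and weights the brackets that overlap Gen Z in 2020 (roughly ages 8–23): two-fifths of 5–9, all of 10–14 and 15–19, and four-fifths of 20–24. It assumes ages are spread evenly within each bracket and that labels are written like "5-9" and "10-14". The output above only shows "95-99" and "100+", so the real labels should be checked. The units of the values are not stated in this excerpt, and the function name and toy data are hypothetical.

```python
import pandas as pd

# Share of each five-year bracket inside ages 8-23 (Gen Z in 2020), assuming a
# uniform age distribution within each bracket. Label format is an assumption.
BRACKET_WEIGHTS_2020 = {"5-9": 2 / 5, "10-14": 1.0, "15-19": 1.0, "20-24": 4 / 5}


def gen_z_education_2020(df: pd.DataFrame) -> pd.Series:
    """Weighted Gen Z totals per education/sex column, with male values made positive."""
    value_cols = [c for c in df.columns if c.endswith(("_-_Male", "_-_Female"))]
    counts = df.set_index("Age_Bracket")[value_cols].abs()  # undo the pyramid sign convention
    weights = pd.Series(BRACKET_WEIGHTS_2020)
    # Keep only brackets present in both the data and the weight table
    common = counts.index.intersection(weights.index)
    return counts.loc[common].mul(weights[common], axis=0).sum()


# Toy data for illustration only
toy = pd.DataFrame({
    "Age_Bracket": ["5-9", "10-14", "15-19", "20-24", "25-29"],
    "Primary_-_Male": [-10.0] * 5,
    "Primary_-_Female": [5.0] * 5,
    "OBJECTID": [1, 2, 3, 4, 5],
})
result_2020 = gen_z_education_2020(toy)
print(result_2020)
```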