# Understanding PMKVY KPIs: From Participation to Outcomes


PMKVY training data follows a pipeline, not a flat process, and each KPI represents one stage of that pipeline. Read in isolation, the KPIs can mislead; read together, they show where candidates are lost.

1. Total Enrolled

This is the entry point of the system.
It reflects mobilization success, outreach efficiency, and demand generation.

High enrolment alone does not indicate program effectiveness. It only shows how many candidates entered the system.

2. Total Trained

This counts the candidates who actually underwent training.

The gap between Enrolled and Trained captures early-stage drop-offs due to:

Candidate attrition

Batch cancellations

Administrative or attendance issues

3. Total Assessed

Assessment is the first quality gate.

A lower Assessed count compared to Trained often signals:

Delays in assessment scheduling

Non-eligibility for assessment

Training completion issues

This metric reflects operational coordination between training providers and assessment agencies.

4. Total Certified

Certification represents formal recognition of skill competency.

The drop from Assessed to Certified is critical.
It reflects:

Assessment outcomes

Skill quality

Alignment between training content and assessment standards

This is where program quality becomes visible.

5. Total Placed (Reported)

Placement is the outcome metric, but also the most sensitive one.

Reported placements may vary due to:

Self-employment not being captured consistently

Informal employment

Reporting delays

Placement numbers should always be interpreted in relation to Certified, not Enrolled.
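Because each stage is drawn from the one before it, the pipeline can be read as a sequence of drop-offs. A minimal sketch of how stage-to-stage leakage might be tabulated; the counts are made up for illustration and are not PMKVY figures, and `stage_dropoffs` is a hypothetical helper:

```python
# Illustrative stage totals only; NOT figures from the PMKVY dataset
stages = ["Enrolled", "Trained", "Assessed", "Certified", "Placed"]
counts = [1000, 900, 850, 780, 240]

def stage_dropoffs(stages, counts):
    """Return (from_stage, to_stage, candidates_lost, pct_lost) for each consecutive pair."""
    rows = []
    for i in range(1, len(stages)):
        prev_count, next_count = counts[i - 1], counts[i]
        lost = prev_count - next_count
        # Percentage lost relative to the previous stage; 0 if that stage is empty
        pct = (lost / prev_count * 100) if prev_count else 0.0
        rows.append((stages[i - 1], stages[i], lost, round(pct, 1)))
    return rows

for row in stage_dropoffs(stages, counts):
    print(row)
```

The last transition (Certified to Placed) is measured against Certified, in line with the rule that placement is interpreted relative to certification.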

Conversion Rates (Why They Matter More Than Absolute Numbers)

Training Completion Rate (TCR)

TCR = Trained / Enrolled

Measures how effectively enrolled candidates are converted into trained candidates.

Low TCR indicates early-stage leakage.

Assessment Conversion Rate (ACR)

ACR = Assessed / Trained

Shows how smoothly training transitions into assessment.

Operational inefficiencies surface here.

Certification Success Rate (CSR)

CSR = Certified / Assessed

This is a quality indicator.

A low CSR often reflects:

Poor training quality

Assessment–training mismatch

Enrolment to Certification Rate (ECR)

ECR = Certified / Enrolled

This is a system efficiency metric.

It answers: of everyone who entered the system, how many emerged certified?

Placement Rate

Placement Rate = Placed / Certified

This must never be calculated against Enrolment, because placement depends on certification, not registration.

This metric reflects market linkage, not mobilization success.
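The five rates can be computed together from the stage totals. A minimal sketch; the function name `conversion_rates` and the sample counts are illustrative assumptions, not part of the original analysis. It also checks that ECR equals TCR × ACR × CSR, which means that, as long as no stage count exceeds the one before it, ECR can never be higher than any single stage rate:

```python
import math

def conversion_rates(enrolled, trained, assessed, certified, placed):
    """Return the five conversion rates in percent; a zero denominator yields 0."""
    def pct(num, den):
        return num / den * 100 if den else 0.0

    return {
        "TCR": pct(trained, enrolled),
        "ACR": pct(assessed, trained),
        "CSR": pct(certified, assessed),
        "ECR": pct(certified, enrolled),
        "Placement Rate": pct(placed, certified),  # denominator is Certified, not Enrolled
    }

# Hypothetical stage totals, for illustration only
r = conversion_rates(enrolled=1000, trained=900, assessed=850, certified=780, placed=240)
print({name: round(value, 1) for name, value in r.items()})

# ECR is the product of the three stage rates (as fractions)
product = (r["TCR"] / 100) * (r["ACR"] / 100) * (r["CSR"] / 100) * 100
print(math.isclose(product, r["ECR"]))
```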

Key Insight

High enrolment with low certification or placement is not success.
Strong certification and placement with moderate enrolment often reflects better program quality.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")

In [None]:
df = pd.read_csv("PMKVY-210422.csv")
df.head()

In [None]:
# Only the last expression in a cell is displayed, so print both explicitly
print(df.shape)
print(df.columns.tolist())

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display, Markdown

In [None]:
df.fillna(0, inplace=True)

In [None]:
import plotly.graph_objects as go

# ---- Aggregate funnel values ----
enrolled  = df["Enrolled"].sum()
trained   = df["Trained"].sum()
assessed  = df["Assessed"].sum()
certified = df["Certified"].sum()
placed    = df["Reported Placed"].sum()

# ---- Funnel stages ----
stages = [
    "Enrolled",
    "Trained",
    "Assessed",
    "Certified",
    "Placed"
]

values = [
    enrolled,
    trained,
    assessed,
    certified,
    placed
]

# ---- Create funnel ----
fig = go.Figure(go.Funnel(
    y=stages,
    x=values,
    textinfo="value+percent initial",
    hovertemplate="<b>%{y}</b><br>Count: %{x}<extra></extra>"
))

fig.update_layout(
    title="PMKVY Training Funnel: Candidate Flow",
    template="plotly_white",
    height=450
)

fig.show()

In [None]:
import plotly.graph_objects as go
# ---- Conversion rates ----
TCR = (trained / enrolled * 100) if enrolled else 0
ACR = (assessed / trained * 100) if trained else 0
CSR = (certified / assessed * 100) if assessed else 0
ECR = (certified / enrolled * 100) if enrolled else 0
PR  = (placed / certified * 100) if certified else 0

# ---- Funnel labels & values ----
stages = [
    "Training Completion Rate (TCR)",
    "Assessment Conversion Rate (ACR)",
    "Certification Success Rate (CSR)",
    "Enrolment → Certification Rate (ECR)",
    "Placement Rate"
]

values = [TCR, ACR, CSR, ECR, PR]

# ---- Funnel chart ----
fig = go.Figure(go.Funnel(
    y=stages,
    x=values,
    textinfo="value",
    texttemplate="%{x:.1f}%",
    hovertemplate="<b>%{y}</b><br>Rate: %{x:.2f}%<extra></extra>"
))

fig.update_layout(
    title="PMKVY Conversion Funnel: Efficiency Across Stages",
    xaxis_title="Percentage (%)",
    template="plotly_white",
    height=450
)

fig.show()

### Interpretation of Conversion Funnel

This funnel shows efficiency at each stage, not volume.

A high TCR indicates effective mobilization-to-training conversion

A weak ACR signals assessment bottlenecks or eligibility issues

CSR reflects training quality and alignment with assessment standards

ECR captures overall system efficiency

Placement Rate reflects market linkage and employability, not outreach

Even when enrolment is high, low downstream rates indicate where corrective action is required.

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# ---------- COMPUTE STATE-LEVEL RATES ----------
state_summary = []

for state in df["TCState"].unique():
    d = df[df["TCState"] == state]

    enrolled  = d["Enrolled"].sum()
    trained   = d["Trained"].sum()
    assessed  = d["Assessed"].sum()
    certified = d["Certified"].sum()
    placed    = d["Reported Placed"].sum()

    if enrolled == 0:
        continue

    TCR = (trained / enrolled * 100) if enrolled else 0
    ACR = (assessed / trained * 100) if trained else 0
    CSR = (certified / assessed * 100) if assessed else 0
    ECR = (certified / enrolled * 100) if enrolled else 0
    PR  = (placed / certified * 100) if certified else 0

    state_summary.append({
        "State": state,
        "TCR": TCR,
        "ACR": ACR,
        "CSR": CSR,
        "ECR": ECR,
        "PR": PR
    })

state_df = pd.DataFrame(state_summary)

# ---------- IDENTIFY BEST & WORST ----------
best_state  = state_df.loc[state_df["ECR"].idxmax()]
worst_state = state_df.loc[state_df["ECR"].idxmin()]

stages = [
    "Training Completion Rate (TCR)",
    "Assessment Conversion Rate (ACR)",
    "Certification Success Rate (CSR)",
    "Enrolment → Certification Rate (ECR)",
    "Placement Rate"
]

# ---------- CREATE SIDE-BY-SIDE FUNNELS ----------
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=[
        f"Best Performing State: {best_state['State']}",
        f"Worst Performing State: {worst_state['State']}"
    ],
    specs=[[{"type": "funnel"}, {"type": "funnel"}]]
)

fig.add_trace(
    go.Funnel(
        y=stages,
        x=[best_state["TCR"], best_state["ACR"], best_state["CSR"], best_state["ECR"], best_state["PR"]],
        texttemplate="%{x:.1f}%",
        hovertemplate="<b>%{y}</b><br>%{x:.2f}%<extra></extra>"
    ),
    row=1, col=1
)

fig.add_trace(
    go.Funnel(
        y=stages,
        x=[worst_state["TCR"], worst_state["ACR"], worst_state["CSR"], worst_state["ECR"], worst_state["PR"]],
        texttemplate="%{x:.1f}%",
        hovertemplate="<b>%{y}</b><br>%{x:.2f}%<extra></extra>"
    ),
    row=1, col=2
)

fig.update_layout(
    title="PMKVY Conversion Funnel: Best vs Worst Performing States (by ECR)",
    template="plotly_white",
    height=500
)

fig.show()

### Best vs Worst State Comparison: Key Insight

This comparison isolates system efficiency, not scale.

The best-performing state maintains relatively stable conversion across stages

The worst-performing state shows sharp drop-offs, indicating structural issues

Differences often emerge at assessment and certification stages rather than enrolment
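The per-state loops used in this notebook can also be written as a single `groupby` aggregation. A sketch on a tiny made-up DataFrame that reuses the notebook's column names (`TCState`, `Enrolled`, `Trained`, `Assessed`, `Certified`, `Reported Placed`); it assumes all count columns are numeric, and `safe_pct` is a hypothetical helper that mirrors the loops' "0 when the denominator is 0" convention:

```python
import pandas as pd

# Tiny illustrative frame; values are made up, not taken from PMKVY-210422.csv
df_demo = pd.DataFrame({
    "TCState":         ["A", "A", "B", "B"],
    "Enrolled":        [100, 50, 80, 0],
    "Trained":         [90, 45, 60, 0],
    "Assessed":        [85, 40, 50, 0],
    "Certified":       [80, 36, 40, 0],
    "Reported Placed": [30, 10, 12, 0],
})

count_cols = ["Enrolled", "Trained", "Assessed", "Certified", "Reported Placed"]
totals = df_demo.groupby("TCState")[count_cols].sum()
totals = totals[totals["Enrolled"] > 0]  # same effect as `if enrolled == 0: continue`

def safe_pct(num, den):
    # Division by NaN (where den == 0) gives NaN, which is then replaced by 0
    return (num / den.where(den != 0) * 100).fillna(0)

state_rates = pd.DataFrame({
    "TCR": safe_pct(totals["Trained"], totals["Enrolled"]),
    "ACR": safe_pct(totals["Assessed"], totals["Trained"]),
    "CSR": safe_pct(totals["Certified"], totals["Assessed"]),
    "ECR": safe_pct(totals["Certified"], totals["Enrolled"]),
    "PR":  safe_pct(totals["Reported Placed"], totals["Certified"]),
}).reset_index().rename(columns={"TCState": "State"})

print(state_rates.round(1))
```

The result has the same columns as `state_df`, so `idxmax`/`idxmin` on `ECR` would work unchanged.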

In [None]:
# ---------- IDENTIFY WEAKEST STAGE PER STATE ----------

weak_stage_summary = []

for state in df["TCState"].unique():
    d = df[df["TCState"] == state]

    enrolled  = d["Enrolled"].sum()
    trained   = d["Trained"].sum()
    assessed  = d["Assessed"].sum()
    certified = d["Certified"].sum()
    placed    = d["Reported Placed"].sum()

    if enrolled == 0:
        continue

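    # Compare stage-wise rates only. ECR is left out because it equals
    # TCR × ACR × CSR, so in a consistent funnel it can never exceed any of
    # them and would be flagged as "weakest" unless Placement Rate is lower.
    rates = {
        "TCR": (trained / enrolled * 100) if enrolled else 0,
        "ACR": (assessed / trained * 100) if trained else 0,
        "CSR": (certified / assessed * 100) if assessed else 0,
        "Placement Rate": (placed / certified * 100) if certified else 0
    }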

    weakest_stage = min(rates, key=rates.get)

    weak_stage_summary.append({
        "State": state,
        "Weakest Stage": weakest_stage,
        "Weakest Rate (%)": round(rates[weakest_stage], 2)
    })

weak_stage_df = pd.DataFrame(weak_stage_summary)
weak_stage_df.head()

This table identifies the single weakest conversion stage for each state.
Instead of reacting to low placement or certification numbers, this approach helps target root causes in the training pipeline.
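Because ECR is cumulative (Certified / Enrolled = TCR × ACR × CSR), a `min()` over a dictionary that includes ECR can never return TCR, ACR or CSR. A sketch with hypothetical state-level rates (not PMKVY figures) showing how including ECR hides the genuinely weakest single transition:

```python
# Hypothetical state-level rates in percent (illustrative only)
tcr, acr, csr, pr = 95.0, 70.0, 92.0, 75.0
ecr = tcr * acr * csr / 100**2  # cumulative rate: compounds all three stages

with_ecr = {"TCR": tcr, "ACR": acr, "CSR": csr, "ECR": ecr, "Placement Rate": pr}
stage_only = {"TCR": tcr, "ACR": acr, "CSR": csr, "Placement Rate": pr}

print(min(with_ecr, key=with_ecr.get))      # ECR: always at least as low as any stage it compounds
print(min(stage_only, key=stage_only.get))  # ACR: the weakest individual transition
```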

In [None]:
# ---------- POLICY RECOMMENDATION ENGINE ----------

def policy_recommendation(weak_stage):
    recommendations = {
        "TCR": "Strengthen candidate mobilization screening, attendance monitoring, and early-stage retention mechanisms.",
        "ACR": "Improve coordination between training providers and assessment agencies; reduce assessment scheduling delays.",
        "CSR": "Review training quality, trainer certification, and alignment with assessment standards.",
        "ECR": "Conduct end-to-end process audits to reduce cumulative leakage across stages.",
        "Placement Rate": "Strengthen industry linkage, post-certification support, and tracking of self-employment outcomes."
    }
    return recommendations.get(weak_stage, "Review program implementation holistically.")

# Example for one state
example_state = weak_stage_df.iloc[0]

print(f"State: {example_state['State']}")
print(f"Weakest Stage: {example_state['Weakest Stage']}")
print(f"Recommendation: {policy_recommendation(example_state['Weakest Stage'])}")

### Policy Lens
The weakest stage indicates where targeted intervention is likely to yield the largest marginal improvement.
Addressing downstream outcomes without fixing upstream leakage is unlikely to improve overall program effectiveness.

In [None]:
# ---------- TOP 5 vs BOTTOM 5 STATES (BY ECR) ----------

top5 = state_df.sort_values("ECR", ascending=False).head(5)
bottom5 = state_df.sort_values("ECR").head(5)

comparison_df = pd.concat([
    top5.assign(Group="Top 5 States"),
    bottom5.assign(Group="Bottom 5 States")
])

comparison_df

In [None]:
import plotly.express as px

fig = px.box(
    comparison_df,
    x="Group",
    y="ECR",
    points="all",
    title="PMKVY System Efficiency: Top 5 vs Bottom 5 States (ECR)",
    labels={"ECR": "Enrolment → Certification Rate (%)"}
)

fig.update_layout(template="plotly_white", height=450)
fig.show()

This comparison shows that performance differences are systemic, not marginal.
Top-performing states consistently retain candidates across stages, while bottom-performing states experience compounding leakage.

## Hypothesis Test

Is Certification Success Associated with Placement Outcomes?

In PMKVY, certification is often treated as a success milestone.
But from a policy and program perspective, the real question is:

Does higher certification success actually translate into better placement outcomes?

If certification and placement are weakly related, then improving certification alone will not improve employment outcomes.

🧠 Hypothesis Statement

Null Hypothesis (H₀):
There is no monotonic association between Certification Success Rate (CSR) and Placement Rate at the state level (Spearman's ρ = 0).

Alternative Hypothesis (H₁):
States with higher Certification Success Rates tend to have higher Placement Rates (ρ > 0).

📐 Why Spearman Correlation?

State-level rates are not guaranteed to be normally distributed

Relationship may be monotonic but not linear

Spearman correlation works on ranks, so it is robust to outliers and captures any monotonic relationship
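Spearman's ρ is Pearson's correlation computed on the ranks of each variable. A sketch with made-up state-level values (not PMKVY results) that checks this with `scipy.stats.spearmanr`, `pearsonr` and `rankdata`, and shows the `alternative="greater"` option (SciPy 1.7 or later), which matches a directional H₁ of ρ > 0; `spearmanr` is two-sided by default:

```python
import numpy as np
from scipy import stats

# Illustrative state-level values only (not PMKVY results)
csr = np.array([0.70, 0.72, 0.78, 0.80, 0.85, 0.88, 0.91, 0.95])
placement = np.array([0.10, 0.14, 0.12, 0.20, 0.18, 0.30, 0.28, 0.45])

# Spearman's rho equals Pearson's r on the ranks of each variable
rho, p_two_sided = stats.spearmanr(csr, placement)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(csr), stats.rankdata(placement))

# One-sided test for a positive association
_, p_greater = stats.spearmanr(csr, placement, alternative="greater")

print(f"Spearman rho: {rho:.3f} | Pearson on ranks: {r_on_ranks:.3f}")
print(f"two-sided p: {p_two_sided:.4f} | one-sided p (rho > 0): {p_greater:.4f}")
```

When ρ is positive, the one-sided p-value is half the two-sided one; when ρ is negative, a one-sided test for ρ > 0 gives an even larger p-value.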

In [None]:
state_perf = []

for state in df["TCState"].unique():
    d = df[df["TCState"] == state]

    assessed  = d["Assessed"].sum()
    certified = d["Certified"].sum()
    placed    = d["Reported Placed"].sum()

    if assessed > 0 and certified > 0:
        CSR = certified / assessed
        PR  = placed / certified
        state_perf.append([CSR, PR])

state_perf = pd.DataFrame(state_perf, columns=["CSR", "PlacementRate"])

In [None]:
from scipy import stats

In [None]:
state_perf = []

for state in df["TCState"].unique():
    d = df[df["TCState"] == state]

    assessed  = d["Assessed"].sum()
    certified = d["Certified"].sum()
    placed    = d["Reported Placed"].sum()

    if assessed > 0 and certified > 0:
        CSR = certified / assessed
        PR  = placed / certified

        state_perf.append({
            "State": state,
            "CSR": CSR,
            "Placement_Rate": PR
        })

state_perf_df = pd.DataFrame(state_perf)
state_perf_df.head()

In [None]:
corr, p_value = stats.spearmanr(
    state_perf_df["CSR"],
    state_perf_df["Placement_Rate"]
)

print(f"Spearman Correlation: {corr:.3f}")
print(f"p-value: {p_value:.4f}")

In [None]:
import plotly.express as px

fig = px.scatter(
    state_perf_df,
    x="CSR",
    y="Placement_Rate",
    hover_name="State",
    trendline="ols",  # the OLS trendline requires the statsmodels package
    title="Relationship Between Certification Success and Placement Outcomes",
    labels={
        "CSR": "Certification Success Rate",
        "Placement_Rate": "Placement Rate"
    }
)

fig.update_layout(
    template="plotly_white",
    height=450
)

fig.show()

Each point is a state

An upward-sloping trendline would indicate a positive association; a flat or downward slope indicates none

Scatter shows variation beyond certification alone

This reinforces that certification is necessary but not sufficient.

📌 Interpretation of Results

The Spearman correlation between Certification Success Rate (CSR) and Placement Rate is -0.079, with a p-value of 0.6458.

This indicates no statistically significant association between certification success and placement outcomes at the state level. The correlation is weak and close to zero, and the high p-value means the data are consistent with no association, so the null hypothesis cannot be rejected.

In practical terms, states with higher certification success do not consistently achieve higher placement rates.

🎯 What This Means for PMKVY Performance

This result highlights an important programmatic insight:

Certification is a necessary credential, but it is not sufficient to guarantee employment outcomes.

Placement performance is likely influenced by factors beyond training quality, such as:

Strength of industry linkages

Local labor market conditions

Effectiveness of placement tracking

Self-employment and informal employment not fully captured in reporting

Improving certification rates alone is therefore unlikely to yield proportional gains in placement.

🏛️ Policy & Program Impact

Program design should decouple certification targets from placement expectations.

States with strong certification outcomes but weak placement require targeted market linkage interventions, not further training reforms.

Placement strategies should focus on:

Employer engagement

Post-certification support

Improved tracking of informal and self-employment outcomes

This finding supports a shift from output-driven metrics to outcome-driven program design.

## Are conversion rates different across stages?

(Where is the system weakest overall?)



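Are the differences between TCR, ACR, and CSR statistically significant? (ECR is excluded because it is the product of these three rates, and Placement Rate is analysed separately.)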

Hypotheses

H₀: Conversion rates are identically distributed across the three stages (no systematic stage effect within states)

H₁: At least one stage differs systematically from the others
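The Friedman test treats each state as a block, ranks the stages within that state, and asks whether some stage ranks consistently lower. A sketch on made-up rates (not PMKVY data) that computes the textbook statistic Q = 12 / (n·k·(k+1)) · Σ R_j² − 3·n·(k+1) by hand and compares it with `scipy.stats.friedmanchisquare`; the two agree when there are no ties within a row:

```python
import numpy as np
from scipy import stats

# Rows = states (blocks), columns = stages (TCR, ACR, CSR); illustrative values only
rates = np.array([
    [0.95, 0.80, 0.90],
    [0.92, 0.75, 0.88],
    [0.97, 0.85, 0.83],
    [0.90, 0.70, 0.86],
    [0.94, 0.78, 0.91],
])
n, k = rates.shape

# Rank stages within each state (1 = lowest rate), then sum the ranks per stage
ranks = np.apply_along_axis(stats.rankdata, 1, rates)
rank_sums = ranks.sum(axis=0)

q_manual = 12.0 / (n * k * (k + 1)) * np.sum(rank_sums**2) - 3 * n * (k + 1)
q_scipy, p_value = stats.friedmanchisquare(*rates.T)  # one 1-D array per stage

print("rank sums (TCR, ACR, CSR):", rank_sums)  # the lowest sum marks the weakest stage
print(round(q_manual, 4), round(q_scipy, 4), round(p_value, 4))
```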

In [None]:
from scipy.stats import friedmanchisquare

stage_rates = []

for state in df["TCState"].unique():
    d = df[df["TCState"] == state]

    enrolled = d["Enrolled"].sum()
    trained  = d["Trained"].sum()
    assessed = d["Assessed"].sum()
    certified = d["Certified"].sum()

    if enrolled > 0 and trained > 0 and assessed > 0:
        stage_rates.append([
            trained / enrolled,        # TCR
            assessed / trained,        # ACR
            certified / assessed       # CSR
        ])

stage_df = pd.DataFrame(stage_rates, columns=["TCR", "ACR", "CSR"])

stat, p_value = friedmanchisquare(
    stage_df["TCR"],
    stage_df["ACR"],
    stage_df["CSR"]
)

print("Friedman p-value:", p_value)

📌 Interpretation of Friedman Test Result

The Friedman test yields a p-value of 4.34 × 10⁻¹¹, which is far below the conventional significance threshold of 0.05.

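This provides strong statistical evidence to reject the null hypothesis of no difference in conversion rates across stages (TCR, ACR, and CSR). In other words, the differences between stages in the PMKVY training funnel are systematic and very unlikely to be due to random variation alone. The test is omnibus: it shows that the stages differ, not which stage is lowest, so identifying the weakest stage requires pairwise follow-up comparisons or inspection of the per-stage rates.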

🎯 What This Means for the PMKVY Training Pipeline

The result confirms that candidate drop-offs are stage-specific, not uniform across the system.

Some stages consistently perform worse than others across states

Leakage points are structurally embedded in the process

Treating the training pipeline as a single process masks critical inefficiencies

This validates the need for targeted interventions rather than blanket program reforms.
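Because Friedman does not say which stages differ, one common follow-up (a swapped-in technique, not part of the original analysis) is pairwise Wilcoxon signed-rank tests with a Bonferroni correction. A sketch on made-up paired state-level rates:

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Paired rates per state (illustrative values only, not PMKVY data)
stage_rates = {
    "TCR": np.array([0.95, 0.92, 0.97, 0.90, 0.94, 0.93, 0.96, 0.91]),
    "ACR": np.array([0.80, 0.75, 0.85, 0.70, 0.78, 0.82, 0.77, 0.73]),
    "CSR": np.array([0.90, 0.88, 0.83, 0.86, 0.91, 0.89, 0.92, 0.87]),
}

pairs = list(combinations(stage_rates, 2))
results = []
for a, b in pairs:
    # Signed-rank test on the per-state differences between two stages
    _, p = stats.wilcoxon(stage_rates[a], stage_rates[b])
    p_adj = min(p * len(pairs), 1.0)  # Bonferroni: scale by the number of comparisons
    results.append((a, b, p, p_adj))
    print(f"{a} vs {b}: raw p = {p:.4f}, Bonferroni p = {p_adj:.4f}")
```

Bonferroni is deliberately conservative; with only three comparisons it costs little power.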

🏛️ Programmatic Implications

Interventions should be designed stage-wise, not end-to-end only

Resources should be focused on stages with the lowest conversion rates

Monitoring frameworks should track stage-level performance, not just overall outcomes

Ignoring these differences risks leaving the weakest parts of the pipeline unaddressed.

# Are placements unusually volatile across states?

(Consistency vs unpredictability)

Is placement performance more volatile than certification performance?

In [81]:
state_variability = []

for state in df["TCState"].unique():
    d = df[df["TCState"] == state]

    assessed  = d["Assessed"].sum()
    certified = d["Certified"].sum()
    placed    = d["Reported Placed"].sum()

    if assessed > 0 and certified > 0:
        CSR = certified / assessed
        PR  = placed / certified

        state_variability.append([CSR, PR])

var_df = pd.DataFrame(state_variability, columns=["CSR", "PlacementRate"])

print("CSR Std Dev:", round(var_df["CSR"].std(), 3))
print("Placement Rate Std Dev:", round(var_df["PlacementRate"].std(), 3))

CSR Std Dev: 0.057
Placement Rate Std Dev: 0.121


📌 Interpretation of Volatility Results

The standard deviation of the Certification Success Rate (CSR) across states is 0.057, while the standard deviation of the Placement Rate is 0.121.

This indicates that placement performance is more than twice as volatile as certification performance across states.

In statistical terms, certification outcomes are relatively stable and consistent, whereas placement outcomes vary widely from state to state.

🎯 What This Means for PMKVY Outcomes

Certification performance reflects internal program processes such as training quality, assessment alignment, and institutional controls, which are likely to be more standardized across states.

Placement performance, by contrast, is influenced by external and heterogeneous factors, including:

Local labor market conditions

Industry presence and demand

Strength of employer linkages

Effectiveness of placement tracking

Informal and self-employment dynamics

As a result, placement outcomes are inherently less predictable and more uneven.

🏛️ Programmatic Insight

This finding explains why:

Improvements in certification do not automatically translate into uniform placement gains

A one-size-fits-all placement strategy is unlikely to succeed

States require context-specific employment interventions

Monitoring placement performance using national benchmarks alone may therefore be misleading.

🧠 Analytical Takeaway

The higher volatility in placement rates suggests that placement is not merely the final stage of the training pipeline, but a distinct outcome shaped by market forces and institutional linkages.

This reinforces the need to:

Evaluate placement separately from training performance

Design differentiated placement strategies by state or region

Interpret placement figures with caution, especially in cross-state comparisons

In [82]:
# ---- Compute mean & std ----
csr_mean = var_df["CSR"].mean()
csr_std  = var_df["CSR"].std()

pr_mean = var_df["PlacementRate"].mean()
pr_std  = var_df["PlacementRate"].std()

# ---- Coefficient of Variation ----
csr_cv = csr_std / csr_mean
pr_cv  = pr_std / pr_mean

print(f"CSR CV: {csr_cv:.2f}")
print(f"Placement Rate CV: {pr_cv:.2f}")

CSR CV: 0.06
Placement Rate CV: 0.48


📌 Interpretation of Coefficient of Variation Results

The Coefficient of Variation (CV) for Certification Success Rate (CSR) is 0.06, while the CV for Placement Rate is 0.48.

This means that placement outcomes are roughly eight times (0.48 / 0.06) more volatile, relative to their mean, than certification outcomes across states.

In practical terms, certification performance is highly stable and predictable, whereas placement performance varies widely from state to state.
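The coefficient of variation is the standard deviation divided by the mean, so two groups with the same absolute spread can have very different CVs if their means differ. A small sketch with made-up placement rates; `coefficient_of_variation` is a hypothetical helper that uses the sample standard deviation (`ddof=1`), which is also what pandas `.std()` uses by default:

```python
import numpy as np

def coefficient_of_variation(values):
    """Sample standard deviation (ddof=1) divided by the mean."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

# Same absolute spread, different levels (illustrative numbers only)
low_mean = [0.10, 0.15, 0.20]
high_mean = [0.30, 0.35, 0.40]

cv_low = coefficient_of_variation(low_mean)
cv_high = coefficient_of_variation(high_mean)

print(round(cv_low, 3))   # larger: same spread divided by a smaller mean
print(round(cv_high, 3))
```

A high CV can therefore come from a wide spread, from a low mean, or from both; checking the standard deviation alongside the CV separates the two.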

🎯 What This Reveals About the PMKVY System

Low CSR volatility (CV = 0.06) suggests that training quality and assessment processes are relatively standardized and consistently implemented nationwide.

High placement volatility (CV = 0.48) reflects strong dependence on external factors such as:

Local labor market conditions

Industry presence and demand

Strength of placement and employer linkage mechanisms

Variations in reporting and tracking of employment outcomes

This supports the view that placement is not simply the final step of the training pipeline, but a separate outcome domain influenced by market dynamics.

🏛️ Monitoring & Policy Implications

Placement should not be monitored using uniform national benchmarks.
High volatility implies that a single target masks regional and structural disparities.

Certification can be monitored through centralized quality controls, but placement requires context-sensitive strategies.

States with acceptable certification performance but volatile placement outcomes should be prioritized for:

Industry engagement interventions

Strengthened post-certification support

Improved placement tracking mechanisms

🧠 Strategic Insight

Low variability in certification reflects program control.
High variability in placement reflects market exposure.

# Does Volatility Differ by Region?
Why this matters

If volatility clusters by region, that would suggest regional labor markets, rather than training systems, are driving placement outcomes.

In [83]:
region_map = {
    "Jammu and Kashmir": "North",
    "Punjab": "North",
    "Haryana": "North",
    "Delhi": "North",
    "Uttar Pradesh": "North",
    "Rajasthan": "North",

    "Maharashtra": "West",
    "Gujarat": "West",
    "Goa": "West",

    "Tamil Nadu": "South",
    "Karnataka": "South",
    "Kerala": "South",
    "Andhra Pradesh": "South",
    "Telangana": "South",

    "West Bengal": "East",
    "Odisha": "East",
    "Bihar": "East",
    "Jharkhand": "East",

    "Assam": "North East",
    "Meghalaya": "North East",
    "Manipur": "North East"
}

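# States not listed in region_map become NaN and are silently
# excluded from the regional groupby that follows
state_perf_df["Region"] = state_perf_df["State"].map(region_map)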

In [84]:
region_volatility = state_perf_df.groupby("Region")["Placement_Rate"].agg(["mean", "std"])
region_volatility["CV"] = region_volatility["std"] / region_volatility["mean"]

region_volatility.sort_values("CV", ascending=False)

| Region | Mean placement rate | Std dev | CV |
|---|---|---|---|
| West | 0.140274 | 0.070918 | 0.505565 |
| North East | 0.216006 | 0.099929 | 0.462621 |
| South | 0.254167 | 0.102283 | 0.402425 |
| East | 0.20997 | 0.071149 | 0.338855 |
| North | 0.277442 | 0.08801 | 0.317218 |


📌 Interpretation: Regional Volatility in Placement Outcomes

The coefficient of variation (CV) highlights substantial regional differences in the stability of placement outcomes under PMKVY.

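The West region shows the highest relative volatility in placement outcomes (CV ≈ 0.51). This is driven largely by its low mean placement rate (≈ 0.14, the lowest of the five regions) rather than by an unusually wide spread: its standard deviation (≈ 0.071) is almost identical to the East's. Placement outcomes across West-region states are therefore inconsistent relative to a low baseline.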

The North East also exhibits very high volatility (CV ‚âà 0.46), suggesting uneven labor absorption capacity and fragile market linkages.

The South, while having a comparatively higher average placement rate, still shows moderate volatility (CV ‚âà 0.40), implying that strong outcomes are not uniformly distributed across states.

The East and North regions demonstrate lower volatility (CV ‚âà 0.34 and 0.32, respectively), indicating relatively more predictable placement outcomes, even if average performance differs.

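Overall, regions with higher average placement rates do not necessarily exhibit lower volatility, reinforcing that average performance and stability are distinct dimensions of program success. Because each region contains at most six mapped states, these standard deviations rest on very few observations and should be read as indicative rather than definitive.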

🎯 Key Analytical Insight

Placement outcomes under PMKVY are not uniformly unstable nationwide.
Instead, volatility is region-specific, reflecting differences in:

Regional labor market structure

Industry concentration and demand

Migration patterns

Effectiveness of state-level placement and employer engagement mechanisms

This is consistent with placement performance being shaped more by regional market dynamics than by training system quality alone.

🏛️ Monitoring & Program Implications
1. Avoid Uniform National Placement Benchmarks

High regional volatility implies that national averages mask meaningful regional risks. Placement targets should be region-adjusted, not centrally imposed.

2. Treat High-Volatility Regions as “Risk Zones”

Regions such as the West and North East should be flagged for:

Deeper labor market diagnostics

Strengthened employer partnerships

Improved post-certification support mechanisms

3. Use Stability as a Monitoring Signal

Regions with lower CVs (North, East) demonstrate more predictable outcomes and can serve as operational benchmarks, even if their mean placement rates are moderate.

4. Separate Performance from Predictability

Monitoring frameworks should track:

Mean placement rate (how good outcomes are)

Coefficient of variation (how reliable outcomes are)

Both dimensions are necessary for evidence-based decision-making.
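Tracking level and reliability side by side can be done in a single aggregation. A sketch on a made-up state-level frame (not PMKVY results); the `min_states` threshold of 4 is an arbitrary illustrative choice, used only to flag groups whose CV rests on very few states:

```python
import pandas as pd

# Illustrative state-level placement rates (made up, not PMKVY results)
perf = pd.DataFrame({
    "Region": ["North", "North", "North", "North", "West", "West", "West"],
    "Placement_Rate": [0.25, 0.30, 0.28, 0.22, 0.08, 0.20, 0.14],
})

min_states = 4  # hypothetical threshold for a "stable enough" CV estimate

monitor = perf.groupby("Region")["Placement_Rate"].agg(["count", "mean", "std"])
monitor["CV"] = monitor["std"] / monitor["mean"]        # reliability of outcomes
monitor["few_states"] = monitor["count"] < min_states  # flag thin evidence

print(monitor.round(3))
```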

🧠 Strategic Takeaway (one line)

Placement under PMKVY is not just uneven; it is regionally unstable, and volatility itself is a critical signal for where monitoring and intervention should be prioritized.

In [85]:
training_volatility = []

for ttype in df["TrainingType"].unique():
    d = df[df["TrainingType"] == ttype]

    assessed  = d["Assessed"].sum()
    certified = d["Certified"].sum()
    placed    = d["Reported Placed"].sum()

    if assessed > 0 and certified > 0:
        CSR = certified / assessed
        PR  = placed / certified

        training_volatility.append({
            "TrainingType": ttype,
            "CSR": CSR,
            "PlacementRate": PR
        })

tv_df = pd.DataFrame(training_volatility)

tv_summary = tv_df[["CSR", "PlacementRate"]].agg(["mean", "std"])
tv_summary.loc["CV"] = tv_summary.loc["std"] / tv_summary.loc["mean"]

tv_summary

| Statistic | CSR | PlacementRate |
|---|---|---|
| mean | 0.808548 | 0.345013 |
| std | 0.18879 | 0.298817 |
| CV | 0.233492 | 0.866102 |


📌 Interpretation: Volatility by Training Type

The coefficient of variation (CV) for Certification Success Rate (CSR) across training types is 0.23, while the CV for Placement Rate is 0.87.

This indicates that placement outcomes across training types are nearly four times more volatile than certification outcomes.

In contrast, certification performance is relatively stable across training categories, suggesting that training delivery and assessment processes are broadly standardized regardless of training type.

🎯 What This Reveals About Training-Type Performance

Low CSR volatility (CV = 0.23) implies that most training types achieve comparable certification outcomes once candidates reach the assessment stage.

Very high placement volatility (CV = 0.87) shows that employability varies dramatically by training type, reflecting differences in:

Market demand for specific skills

Industry absorption capacity

Alignment of training content with local employment opportunities

Viability of self-employment pathways

This suggests that not all training types are equally employable, even if certification outcomes appear strong.

🏛️ Program & Monitoring Implications

Certification should not be used as a proxy for employability
High certification rates across training types can mask substantial variation in labor market outcomes.

Training types require differentiated placement strategies
High-volatility training categories should be reviewed for:

Demand saturation

Outdated occupational standards

Weak employer linkages

Introduce demand-weighted monitoring
Monitoring frameworks should evaluate training types on:

Certification stability

Placement volatility

Market relevance indicators

Scale with caution
Training types with high placement volatility should not be scaled uniformly without demand validation.

🧠 Strategic Insight

Certification reflects the supply-side efficiency of training.
Placement volatility exposes demand-side uncertainty.