# Part 5 – Reflection

---

## 1. What assumptions in my solution are weakest?

### "All deals have reached a terminal state" — HIGH
The dataset contains only Won/Lost outcomes. In reality, a significant portion of the pipeline is *open*: deals that haven't resolved yet. Analyzing only closed deals introduces a censoring (selection) bias: recent quarters can look worse than they are because their still-open deals, some of which will be won, aren't counted. In production, I'd need a separate analysis track for open pipeline with time-based risk scoring.

### "Structured CRM fields are sufficient to diagnose win rate drivers" — HIGH
Part 3 showed that deal characteristics barely predict outcomes (~55% accuracy, AUC 0.48). Even adding RSFS only lifts accuracy to ~57%. The most predictive signals in real B2B sales (qualitative deal notes, email/calendar engagement, competitive intel, multi-threading) aren't in this dataset. The implication is that the biggest ROI comes from *capturing better data*, not from better models on existing data.

### "Co-lead partnerships will improve outcomes" — HIGH
The co-lead system assumes pairing a low-RSFS rep with a high-RSFS co-lead will lift win probability by ~23pp. But the evidence is correlational — high-RSFS reps win more in their strong industries when working deals solo. We can't prove that advisory involvement produces the same lift. The $1.1M revenue estimate is an upper bound; I'd validate with a controlled experiment before scaling.

### "Historical patterns will persist" — MEDIUM
RSFS and SMI are backward-looking. If the market shifts (new competitor, economic change), historical patterns become unreliable. The weekly retrain cadence (Part 4) helps detect drift, but any historical model fundamentally lags reality.

### "25 sales reps are comparable" — MEDIUM
RSFS treats all reps as having equally meaningful track records. Without tenure and territory data, it captures correlation (which reps win where) but can't distinguish skill from circumstance.

---

## 2. What would break in real-world production?

### Data quality
Real CRM data is messy: fields are optional, inconsistently filled, and sometimes gamed. In production, I'd need data quality scores per record, imputation strategies, and anomaly detection on the input data itself.
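
As a minimal sketch of a per-record quality score, the function below measures field completeness only. The field names in `REQUIRED_FIELDS` are hypothetical placeholders, not the dataset's actual schema, and a real score would also need validity and consistency checks.

```python
# Hypothetical required fields; swap in the real CRM schema.
REQUIRED_FIELDS = ("industry", "deal_amount", "lead_source", "close_date")


def quality_score(record, fields=REQUIRED_FIELDS):
    """Fraction of required fields that are present and non-blank (0.0 to 1.0)."""

    def filled(value):
        # None and whitespace-only strings count as missing; 0 counts as filled.
        return value is not None and str(value).strip() != ""

    return sum(filled(record.get(f)) for f in fields) / len(fields)


# Illustrative record with a blank lead source and a missing close date.
sample = {"industry": "FinTech", "deal_amount": 12000, "lead_source": "  ", "close_date": None}
sample_score = quality_score(sample)
```

Records below some cutoff could be excluded from RSFS inputs or routed to imputation, with the cutoff itself tuned against downstream error.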

### RSFS circularity at scale
Currently RSFS includes the deal being scored in its own calculation. With ~40 deals per (rep, industry) combo, each deal contributes about 1/40, or ~2.5%, of its own score, which is small. But for a new rep with only 5 deals, the deal being scored makes up 20% of its own score, and the circularity dominates. In production, I'd compute RSFS from prior-period data only and enforce minimum sample thresholds with graceful fallbacks.
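
A sketch of the prior-period computation with fallbacks, under stated assumptions: it treats RSFS as a plain (rep, industry) win rate, which simplifies the actual metric, and the `MIN_DEALS` threshold, the record keys, and the sample history are all invented for illustration.

```python
from datetime import date

MIN_DEALS = 10  # hypothetical minimum sample threshold


def prior_period_rsfs(deals, rep, industry, as_of, min_deals=MIN_DEALS):
    """Return (win_rate, source) using only deals closed before `as_of`.

    Assumes RSFS reduces to a (rep, industry) win rate. Falls back to the
    industry-wide rate, then the overall rate, when the sample is too small.
    """
    prior = [d for d in deals if d["closed"] < as_of]

    def rate(rows):
        return sum(r["won"] for r in rows) / len(rows) if rows else None

    pair = [d for d in prior if d["rep"] == rep and d["industry"] == industry]
    if len(pair) >= min_deals:
        return rate(pair), "rep_industry"

    same_industry = [d for d in prior if d["industry"] == industry]
    if len(same_industry) >= min_deals:
        return rate(same_industry), "industry"

    return rate(prior), "overall"


# Hypothetical history: rep A has 12 prior FinTech deals (9 won), rep B has 3 (0 won).
history = (
    [{"rep": "A", "industry": "FinTech", "closed": date(2024, 1, 15), "won": i < 9} for i in range(12)]
    + [{"rep": "B", "industry": "FinTech", "closed": date(2024, 2, 1), "won": False} for _ in range(3)]
    # A later win that must not leak into a score computed as of 1 April.
    + [{"rep": "A", "industry": "FinTech", "closed": date(2024, 5, 1), "won": True}]
)

score_a = prior_period_rsfs(history, "A", "FinTech", as_of=date(2024, 4, 1))
score_b = prior_period_rsfs(history, "B", "FinTech", as_of=date(2024, 4, 1))
```

Returning the `source` label alongside the rate lets the UI show when a score is a fallback rather than the rep's own track record.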

### Co-lead system adoption
The biggest operational risk. Reps may view co-leads as a threat rather than support, requiring careful framing as "strategic advisor" with shared comp credit. Attribution ambiguity (who gets credit for co-led wins?) could undermine adoption if comp structures aren't adjusted. And if more deals get flagged RED after a market shift, co-lead workload could spike past sustainable levels without dynamic caps.

### Seasonality and external context
The Q1 2024 decline might partly reflect normal seasonality (Q1 slowdown after Q4 budget flush) rather than a structural problem. Without external context — competitor moves, pricing changes, reorgs — the system risks flagging normal variation as anomalies.
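
One cheap check, sketched below, is to compare a quarter against the same quarter a year earlier as well as against the previous quarter. The win rates in `rates` are made-up numbers, and the sketch assumes at least a year of history is available, which this dataset may not provide.

```python
def compare_quarter(win_rates, quarter):
    """Return (quarter-over-quarter, year-over-year) win-rate change in points.

    `win_rates` maps labels like "2024Q1" to win rates in [0, 1]. A missing
    comparison quarter yields None for that change.
    """
    year, q = int(quarter[:4]), int(quarter[5])
    prev_quarter = f"{year - 1}Q4" if q == 1 else f"{year}Q{q - 1}"
    prev_year = f"{year - 1}Q{q}"
    current = win_rates[quarter]

    def delta(key):
        return round((current - win_rates[key]) * 100, 1) if key in win_rates else None

    return delta(prev_quarter), delta(prev_year)


# Hypothetical win rates, for illustration only.
rates = {"2023Q1": 0.44, "2023Q4": 0.48, "2024Q1": 0.43}
qoq, yoy = compare_quarter(rates, "2024Q1")
```

In this invented example the drop is 5 points against Q4 but only 1 point against the prior Q1, which is the pattern a seasonal effect would produce.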

---

## 3. What would I build next if given 1 month?

### Week 1-2: Enrich the data model
- Integrate CRM activity data (emails sent, meetings held, calls logged per deal)
- Build engagement scoring: are deals getting *activity* or sitting idle?
- This would likely be the highest-ROI improvement: Part 3 showed structured fields have near-zero predictive power, so the missing signal is most likely in engagement data the current CRM fields don't capture

### Week 2-3: Implement the co-lead matching system
Part 4 designed the Co-Lead Matching Engine; this phase is about building it:
- Webhook-triggered RSFS computation on deal creation/assignment
- Auto-assign co-leads via top-3 round-robin with workload constraints
- Simple UI (Slack bot or browser extension) showing RSFS and co-lead status
- Run as a **controlled rollout** — co-leads on half of RED deals only, to measure actual lift vs. the ~23pp historical estimate
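
The top-3 round-robin with workload constraints could look roughly like the sketch below. The function name, the `max_active` cap of 5, and the rep ids are assumptions for illustration; Part 4's actual design may differ.

```python
def assign_co_lead(candidates, workload, cursor, max_active=5):
    """Pick the next co-lead from the top-3 candidates in round-robin order.

    candidates: rep ids sorted by descending RSFS for the deal's industry.
    workload:   dict of rep id -> active co-led deals (updated on assignment).
    cursor:     position in the top-3 rotation to start from.
    Returns (rep id or None, next cursor). None means everyone is at the cap.
    """
    top = candidates[:3]
    for step in range(len(top)):
        idx = (cursor + step) % len(top)
        rep = top[idx]
        if workload.get(rep, 0) < max_active:
            workload[rep] = workload.get(rep, 0) + 1
            return rep, (idx + 1) % len(top)
    return None, cursor


# Hypothetical ranking for one industry; r2 is already at the workload cap.
ranked = ["r1", "r2", "r3", "r4"]
active = {"r2": 5}
first, cursor = assign_co_lead(ranked, active, cursor=0)
second, cursor = assign_co_lead(ranked, active, cursor)
third, cursor = assign_co_lead(ranked, active, cursor)
```

Here the rotation skips the capped rep and alternates between the other two. A `None` result is the signal to escalate or leave the deal without a co-lead rather than overload someone.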

### Week 3-4: Build the adaptive alert system
- Implement alert engine from Part 4 with configurable thresholds
- Add feedback mechanism: CRO marks alerts "useful" / "not useful" to tune sensitivity
- Ship weekly digest email and monthly rep audit report

### If there's time: Natural language insight generation
Use an LLM to convert metric outputs into plain-English summaries for non-technical stakeholders — this is where LLMs add genuine value, not in the analysis itself, but in making the output accessible.

---

## 4. What part of my solution am I least confident about?

### The magnitude of co-lead impact
I'm confident in the direction (better rep-industry matching helps) but not the magnitude. The ~23pp lift is extrapolated from solo performance data — we've never observed co-lead dynamics. The actual lift could range from near-zero (co-leads don't engage deeply enough) to above estimate (knowledge transfer accelerates development). Until a controlled experiment runs, the revenue figure is a hypothesis.
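
To make "hypothesis" concrete, here is a sketch of how I'd read the experiment: a difference in win rates between co-led and solo RED deals with a normal-approximation 95% interval. The arm sizes and win counts are invented, not results.

```python
import math


def lift_with_ci(treat_wins, treat_n, ctrl_wins, ctrl_n, z=1.96):
    """Difference in win rates (treatment minus control) with an approximate 95% CI."""
    p_t = treat_wins / treat_n
    p_c = ctrl_wins / ctrl_n
    diff = p_t - p_c
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_t * (1 - p_t) / treat_n + p_c * (1 - p_c) / ctrl_n)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical outcome: 33/60 co-led RED deals won vs. 24/60 solo RED deals.
lift, (low, high) = lift_with_ci(33, 60, 24, 60)
```

With 60 deals per arm, an observed 15pp lift gives an interval that spans both zero and the ~23pp estimate, so a rollout of that size could neither confirm nor rule out the historical figure.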

### Whether CRM-only models can drive decisions
A model at ~57% accuracy and AUC 0.59 won't change how decisions are made on its own. I chose logistic regression deliberately for interpretability, but the honest assessment is that individual deal prediction with these features is barely better than a coin flip. The value is in pattern-level insights (50pp rep-industry spread, 11pp RED vs. GREEN gap) and the diagnostic reframing — weak accuracy points to where data investment should go, which is itself the most actionable finding.

### RSFS in small samples
With ~20-30 deals per (rep, industry) combination, the standard error of a win rate near 50% is roughly 10pp, so period-to-period rates can swing 15-20pp from sampling noise alone. The current point estimates work for a proof of concept, but production would need Bayesian smoothing and confidence intervals.
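
A sketch of both pieces using only the standard library: shrinkage toward a prior rate (the posterior mean of a Beta-binomial model) and a Wilson score interval. The `prior_strength` of 20 pseudo-deals, the 45% prior rate, and the 15-of-25 example are illustrative assumptions.

```python
import math


def smoothed_rate(wins, n, prior_rate, prior_strength=20):
    """Shrink an observed win rate toward `prior_rate`.

    Equivalent to the posterior mean under a Beta prior worth
    `prior_strength` pseudo-deals at `prior_rate`.
    """
    return (wins + prior_rate * prior_strength) / (n + prior_strength)


def wilson_interval(wins, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)


# Hypothetical (rep, industry) cell: 15 wins in 25 deals, 45% overall win rate.
raw = 15 / 25
shrunk = smoothed_rate(15, 25, prior_rate=0.45)
lo, hi = wilson_interval(15, 25)
```

For this cell the raw 60% shrinks to about 53%, and the interval runs from roughly 41% to 77%, wide enough that a RED or GREEN label should not rest on the point estimate.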

**A note on this dataset:** The data appears synthetically generated with relatively uniform distributions. In a real engagement, I'd validate every insight with the sales team — the patterns above may be more or less pronounced with real-world data.

---

## What I learned

The value of a data science system in sales intelligence is not in model accuracy — it's in the framework for asking the right questions. The CRO doesn't need an AUC of 0.95. They need:

1. A decomposition framework ("where exactly is the problem?")
2. A prioritization framework ("what should I focus on first?")
3. A monitoring framework ("how will I know if it's getting better or worse?")

The custom metrics, driver analysis, and alert system design serve these three purposes — even when the underlying model isn't highly accurate. That's the difference between data science as "building models" and data science as "building decision systems."