# Lesson 16: Final Capstone Project
---
**Choose one real dataset and business problem:**
* Retail Sales Dashboard
* HR Attrition Analysis
* Marketing Funnel Performance
* Financial KPI Tracker
* Survey Sentiment and NPS Analysis

**Deliverables:**
* Cleaned dataset
* Data transformation notebook
* 3–5 meaningful visualizations
* Insightful summary (Markdown / PowerPoint)
* GitHub repo or downloadable PDF

### Purpose

Build and deliver a real-world data project from start to finish, using everything you’ve learned:

- Python foundations
- Data cleaning
- Pandas, NumPy
- Visualization (Matplotlib, Seaborn)
- (Optional) Predictive modeling

### Project Options

Choose one dataset and business question:

| Project                       | Dataset Type           | Goal                                      |
| ----------------------------- | ---------------------- | ----------------------------------------- |
| **Retail Sales Dashboard** | Product sales (CSV)    | Analyze revenue, seasonality, products    |
| **HR Attrition Analysis**  | Employee data          | Who is leaving and why?                   |
| **Marketing Funnel KPIs**  | Campaign tracking      | Track conversions, drop-off points        |
| **Financial KPI Tracker**  | Revenue & cost trends  | Visualize and forecast KPIs               |
| **Survey NPS + Sentiment** | Survey + open feedback | Clean text + analyze satisfaction metrics |

### Final Deliverables

| Component        | Required? | Format                           |
| ---------------- | --------- | -------------------------------- |
| Cleaned Dataset  | Yes       | CSV or processing code           |
| Data Notebook    | Yes       | `.ipynb` (Jupyter/Colab)         |
| Visuals          | Yes       | Seaborn / Matplotlib charts      |
| Insight Summary  | Yes       | Markdown or Slides               |
| Predictive Model | Optional  | Linear regression (scikit-learn) |
| Submission       | Yes       | GitHub repo or ZIP folder        |

### Workflow Checklist

### STEP 1: Load & Explore

- [ ] Read dataset with `pd.read_csv()`
- [ ] Run `.info()` and `.describe()`
- [ ] Check nulls, dtypes, duplicates
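
The checks in this step can be sketched as follows. This is a minimal example, not a required template: the inline CSV and its column names (`order_id`, `order_date`, `region`, `revenue`) are made up for illustration, and in your project you would pass your own file path to `pd.read_csv()` instead.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for your project's CSV file.
raw_csv = io.StringIO(
    "order_id,order_date,region,revenue\n"
    "1,2024-01-05,North,120.5\n"
    "2,2024-01-06,South,\n"
    "3,2024-01-06,North,98.0\n"
    "3,2024-01-06,North,98.0\n"
)

df = pd.read_csv(raw_csv)

df.info()             # column names, dtypes, and non-null counts
print(df.describe())  # summary statistics for numeric columns

null_counts = df.isna().sum()                # missing values per column
duplicate_rows = int(df.duplicated().sum())  # rows that repeat an earlier row

print(null_counts)
print("Duplicate rows:", duplicate_rows)
```

Note that `order_date` is read as text here; converting it to a datetime belongs to the cleaning step.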

### STEP 2: Clean Data

- [ ] Handle missing values (drop or fill)
- [ ] Rename columns
- [ ] Convert dates
- [ ] Create new columns if needed
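
One possible way to work through this checklist is shown below. The small DataFrame and its columns are invented for the example, and the choices made here (median fill for revenue, an `"Unknown"` label for missing regions, dropping rows with unparseable dates) are assumptions; pick whatever is defensible for your dataset and say why in your notebook.

```python
import pandas as pd

# Hypothetical raw data with messy column names, a bad date, and gaps.
raw = pd.DataFrame({
    "Order Date": ["2024-01-05", "2024-01-06", "not a date", "2024-02-10"],
    "Region ": ["North", "South", "North", None],
    "Revenue": [120.5, None, 98.0, 75.0],
})

# Rename columns: trim whitespace, lowercase, use underscores.
clean = raw.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

# Convert dates; values that cannot be parsed become NaT.
clean["order_date"] = pd.to_datetime(clean["order_date"], errors="coerce")

# Handle missing values: drop rows without a usable date, fill the rest.
clean = clean.dropna(subset=["order_date"])
clean["revenue"] = clean["revenue"].fillna(clean["revenue"].median())
clean["region"] = clean["region"].fillna("Unknown")

# Create a new column for later grouping by month.
clean["order_month"] = clean["order_date"].dt.to_period("M").astype(str)

print(clean)
```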

### STEP 3: Analyze & Transform

- [ ] Aggregations: `.groupby()`, `.agg()`
- [ ] Feature engineering: `.apply()`, `.map()`
- [ ] Sort, filter, slice
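
As a rough illustration of these operations, the sketch below aggregates revenue by region, derives two new columns, and filters the result. The data, the `segment_code` mapping, and the 100-unit threshold for a "large" order are all hypothetical.

```python
import pandas as pd

# Hypothetical cleaned sales data.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "West"],
    "segment_code": ["C", "B", "C", "C", "B"],
    "revenue": [120.0, 80.0, 100.0, 60.0, 200.0],
})

# Aggregations: one row per region, sorted by total revenue.
summary = (
    sales.groupby("region")
    .agg(
        total_revenue=("revenue", "sum"),
        avg_revenue=("revenue", "mean"),
        orders=("revenue", "count"),
    )
    .sort_values("total_revenue", ascending=False)
)

# Feature engineering: .map() for lookups, .apply() for custom rules.
sales["segment"] = sales["segment_code"].map({"C": "Consumer", "B": "Business"})
sales["order_size"] = sales["revenue"].apply(
    lambda r: "large" if r >= 100 else "small"
)

# Filter: large orders in the North region only.
large_north = sales[(sales["region"] == "North") & (sales["order_size"] == "large")]

print(summary)
print(large_north)
```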

### STEP 4: Visualize

- [ ] Distribution plots (hist, box, violin)
- [ ] Relationships (scatter, lmplot)
- [ ] Group comparisons (bar, line, heatmap)
- [ ] Time trends (resample, rolling)
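
The sketch below covers two items from this list, a distribution plot and a time trend, using Matplotlib only. The revenue series is randomly generated for illustration, and the 7-day window is an arbitrary choice. The `matplotlib.use("Agg")` line is only needed when running outside a notebook; Seaborn equivalents such as `sns.histplot` or `sns.lineplot` would work just as well.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this in Jupyter/Colab

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily revenue for 90 days (synthetic data).
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=90, freq="D")
daily = pd.DataFrame({"revenue": rng.normal(100, 15, size=90)}, index=dates)

# Time trends: weekly totals and a 7-day rolling average.
weekly = daily["revenue"].resample("W").sum()
rolling = daily["revenue"].rolling(window=7).mean()

fig, (ax_dist, ax_trend) = plt.subplots(1, 2, figsize=(10, 4))

# Distribution plot.
ax_dist.hist(daily["revenue"], bins=15)
ax_dist.set_title("Distribution of daily revenue")
ax_dist.set_xlabel("Revenue")
ax_dist.set_ylabel("Number of days")

# Time trend with the smoothed line on top of the raw series.
ax_trend.plot(daily.index, daily["revenue"], alpha=0.4, label="Daily")
ax_trend.plot(rolling.index, rolling, label="7-day rolling mean")
ax_trend.set_title("Daily revenue over time")
ax_trend.set_xlabel("Date")
ax_trend.set_ylabel("Revenue")
ax_trend.legend()

fig.tight_layout()
```

The first six values of `rolling` are empty because a 7-day window needs seven observations before it can produce a mean.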

### STEP 5 (Optional): Predict

- [ ] Split into X, y
- [ ] Train/test split
- [ ] LinearRegression fit + predict
- [ ] Evaluate with R², MAE
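
A minimal version of the optional modeling step might look like this. The data is synthetic (revenue generated as a noisy linear function of a made-up `ad_spend` column), so the scores say nothing about what you should expect on a real dataset; the 80/20 split and the fixed `random_state` are also just illustrative defaults.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: revenue depends linearly on ad spend, plus noise.
rng = np.random.default_rng(42)
ad_spend = rng.uniform(0, 100, size=200)
data = pd.DataFrame({
    "ad_spend": ad_spend,
    "revenue": 50 + 3 * ad_spend + rng.normal(0, 10, size=200),
})

# Split into features (X) and target (y).
X = data[["ad_spend"]]
y = data["revenue"]

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on the training set, predict on the held-out set.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on data the model has not seen.
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R²: {r2:.3f}  MAE: {mae:.2f}")
```

Reporting MAE alongside R² is useful because MAE is in the same units as the target, which makes it easier to explain in the insight summary.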

### STEP 6: Communicate

- [ ] Write insights in Markdown or PowerPoint
- [ ] Add titles to all charts
- [ ] Answer business question(s)

### Example Visuals to Include

| Chart Type  | Shows                                     |
| ----------- | ----------------------------------------- |
| Line Plot   | Trends over time                          |
| Bar Plot    | Group comparisons (e.g., region, segment) |
| Boxplot     | Distribution across categories            |
| Heatmap     | Correlation matrix                        |
| Scatterplot | Relationship between 2 variables          |
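
As one example from the table, a correlation heatmap could be drawn roughly as follows. The `units`, `price`, and `revenue` columns are synthetic, and the color map and annotation format are stylistic assumptions rather than requirements.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this in Jupyter/Colab

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic numeric data; revenue is derived from units and price.
rng = np.random.default_rng(1)
orders = pd.DataFrame({
    "units": rng.integers(1, 20, size=100),
    "price": rng.uniform(5, 50, size=100),
})
orders["revenue"] = orders["units"] * orders["price"]

# Pairwise Pearson correlations between the numeric columns.
corr = orders.corr()

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between numeric columns")
fig.tight_layout()
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the color scale comparable between heatmaps drawn from different subsets of the data.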

### Insight Summary Template

You can deliver insights as:

- Markdown cell summary (in your `.ipynb`)
- PowerPoint slides
- README in your GitHub repo

Sample structure: