
# ITI Students Analytics — Simulated Dataset (12,000+ records)

This notebook analyzes a **synthetic** dataset representing students at the *Information Technology Institute (Egypt)* across multiple **tracks** (Data Science, Power BI Development, Data Analysis, Data Engineering, Software Testing, Web Application Development, AI Engineering, Cybersecurity, Cloud Engineering).

It covers metrics such as **attendance, certificates, exam scores, freelancing income, projects delivered, and employment status**.

> 📦 Data file: `iti_DATASET.csv`  




## Data Dictionary (selected)

- `student_id`: unique integer id
- `full_name`: student full name (Egyptian)
- `gender`: M/F
- `age`: student age (18–40)
- `governorate`: Egyptian governorate
- `admission_cohort`: monthly cohort start date (2022-01 .. 2025-09)
- `track`: learning track
- `attendance_rate`: [0,1] fraction of sessions attended
- `sessions_attended`, `total_sessions`: detailed attendance
- `exams_score`: 0–100
- `certificates_count`: count of finished certificates
- `certificate_providers`: comma-separated providers when present
- `freelancing_income_usd_total`: total income (log-normal, sparse)
- `freelancing_income_usd_monthly_est`: rough monthly estimate (if freelancing)
- `projects_delivered`: project count during/after program
- `employment_status`: Unemployed/Intern/Freelancer/Part-time/Full-time/Further Study
- `internship_company`: company name, present only if employed
- `mentor_name`: assigned mentor
- `warnings_count`: disciplinary/attendance warnings
- `dropout_flag`: 1 if dropped out
- `graduated_flag`: 1 if graduated
- `graduation_date`: date if graduated
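
The dictionary above can be turned into a quick load-and-validate step. The following is a minimal sketch, assuming the CSV uses exactly the column names listed; the helper names `load_students` and `check_ranges` are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Date-like columns named in the data dictionary.
DATE_COLUMNS = ["admission_cohort", "graduation_date"]

# Documented value ranges from the data dictionary (inclusive bounds).
RANGES = {
    "age": (18, 40),
    "attendance_rate": (0, 1),
    "exams_score": (0, 100),
}


def load_students(source):
    """Read the CSV and parse date columns; unparseable dates become NaT."""
    df = pd.read_csv(source)
    for col in DATE_COLUMNS:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], errors="coerce")
    return df


def check_ranges(df):
    """Count non-missing values that fall outside the documented ranges."""
    violations = {}
    for col, (low, high) in RANGES.items():
        present = df[col].notna()
        in_range = df[col].between(low, high)
        violations[col] = int((present & ~in_range).sum())
    return violations
```

In the notebook this would be called as `df = load_students("iti_DATASET.csv")`, followed by `check_ranges(df)`. Missing values are deliberately not counted as range violations, since they are covered by the data quality question (Q7).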



## Business Questions

1. **Placement & Readiness**
   - Q1: Which **tracks** have the **highest placement** (Intern/Part-time/Full-time)?
   - Q2: What's the **relationship** between **attendance** and **exam scores**?
   - Q3: What **attendance threshold** best predicts **graduation**?

2. **Freelancing & Certifications**
   - Q4: Do **certificates** correlate with **freelancing income**?
   - Q5: Which **certificate providers** are common among **high earners**?

3. **Operations & Quality**
   - Q6: Which cohorts/governorates show **higher dropout** or **warnings**?
   - Q7: **Data quality** check: missing values % per column, duplicate students, outliers.

4. **Program Design**
   - Q8: Is **projects_delivered** a stronger predictor of **employment** than exam scores?
   - Q9: What is the **optimal track mix** for future cohorts to maximize employment?

5. **KPI Dashboards**
   - Q10: Build quick KPIs: **Graduation Rate**, **Employment Rate**, **Avg Exam**, **Avg Attendance**, **Freelancers %**, **Avg Freelance Income** by track.
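
As a starting point for Q1 and Q10, the per-track KPIs can be sketched with a single `groupby`. Several definitions here are assumptions rather than anything the notebook fixes: "employment" reuses Q1's placement definition (Intern/Part-time/Full-time), freelancers are identified by `employment_status == "Freelancer"`, and average freelance income is taken over students with positive income only. The function name `track_kpis` is hypothetical.

```python
import pandas as pd

# Assumption: placement follows Q1 (Intern / Part-time / Full-time).
PLACED_STATUSES = ["Intern", "Part-time", "Full-time"]


def track_kpis(df):
    """Return one row of KPIs per track, sorted by employment rate."""
    work = df.assign(
        placed=df["employment_status"].isin(PLACED_STATUSES),
        freelancer=df["employment_status"].eq("Freelancer"),
    )
    kpis = work.groupby("track").agg(
        students=("student_id", "count"),
        graduation_rate=("graduated_flag", "mean"),
        employment_rate=("placed", "mean"),
        avg_exam=("exams_score", "mean"),
        avg_attendance=("attendance_rate", "mean"),
        freelancers_share=("freelancer", "mean"),
    )
    # Assumption: average income only over students who earned something,
    # because the income column is described as sparse.
    earners = work[work["freelancing_income_usd_total"] > 0]
    kpis["avg_freelance_income"] = earners.groupby("track")[
        "freelancing_income_usd_total"
    ].mean()
    return kpis.sort_values("employment_rate", ascending=False)
```

Rates are returned as fractions in [0, 1]; multiply by 100 for dashboard display. Tracks with no earning freelancers get `NaN` for `avg_freelance_income` rather than zero, so they are not mistaken for tracks with low income.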


### Data Quality & Outliers
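
The three checks named in Q7 (missing values per column, duplicate students, outliers) can be sketched as small helpers. This is an illustrative outline, not the notebook's own implementation: the function names are hypothetical, and the 1.5 × IQR rule is an assumed outlier criterion.

```python
import pandas as pd


def missing_pct(df):
    """Percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).sort_values(ascending=False)


def duplicate_students(df, key="student_id"):
    """All rows whose key value appears more than once."""
    return df[df.duplicated(subset=key, keep=False)]


def iqr_outliers(series, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]; missing values are ignored."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]
```

Because `freelancing_income_usd_total` is described as log-normal and sparse, applying `iqr_outliers` to its raw values will likely flag most earners; running it on the log of the positive values may be more informative.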

### Exploratory Data Analysis (EDA)
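
A first pass at Q2 and Q3 might look like the sketch below. It assumes a Pearson correlation for the attendance/exam relationship and uses Youden's J statistic (true positive rate minus false positive rate) to pick an attendance cutoff; both choices are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def attendance_exam_corr(df):
    """Pearson correlation between attendance rate and exam score."""
    return df["attendance_rate"].corr(df["exams_score"])


def best_attendance_threshold(df, candidates=None):
    """Score each cutoff t by Youden's J for the rule `attendance_rate >= t`.

    Returns (best_row, full_table). Evaluated in-sample, with no holdout.
    """
    if candidates is None:
        # Assumed grid: 0.50 to 0.95 in steps of 0.05.
        candidates = np.round(np.arange(0.50, 0.96, 0.05), 2)
    data = df[["attendance_rate", "graduated_flag"]].dropna()
    graduated = data["graduated_flag"].astype(bool)
    rows = []
    for t in candidates:
        predicted = data["attendance_rate"] >= t
        tpr = (predicted & graduated).sum() / max(graduated.sum(), 1)
        fpr = (predicted & ~graduated).sum() / max((~graduated).sum(), 1)
        rows.append(
            {"threshold": float(t), "tpr": tpr, "fpr": fpr, "youden_j": tpr - fpr}
        )
    table = pd.DataFrame(rows)
    return table.loc[table["youden_j"].idxmax()], table
```

When several cutoffs tie on J, the lowest one in the grid is returned. Since the threshold is chosen and scored on the same rows, it is best read as descriptive rather than as a validated predictor.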