| Repository | What it does | Stack |
|---|---|---|
| etl-pipeline-testing | PySpark ETL pipeline with schema validation, business rules, and Xray-format test cases | PySpark, Pytest |
| datalake-regression-tests | Medallion architecture regression harness (Bronze→Silver→Gold) with a DataFrame comparator that detects schema drift | PySpark, Pytest |
| sql-data-validation | 20 SQL data quality checks (nulls, FK integrity, value ranges, business rules) with an HTML report | Python, SQLite |
| anomaly-dashboard | SQLite-backed anomaly tracker with a generated HTML dashboard (a lightweight Jira/Xray alternative) | Python, SQLite |
| retail-sales-analysis | ETL + KPI analysis on retail sales data, segmented by category, region, age, gender | Pandas, Matplotlib |
| sales-forecasting | Monthly revenue forecasting with ARIMA: stationarity testing, seasonal decomposition, AIC-based order selection | Statsmodels, Pandas |
| customer-segmentation-rfm | RFM scoring + K-Means clustering on 400 customers: elbow method, silhouette score, revenue breakdown by segment | Scikit-learn, Pandas |
| customer-churn-prediction | Churn prediction with logistic regression: feature encoding, model evaluation, feature importance | Scikit-learn, Pandas |
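
To give a flavour of the checks in sql-data-validation, here is a minimal sketch of three of the check types listed above (nulls, FK integrity, value ranges) using only Python's built-in `sqlite3` module. The `customers`/`orders` schema, the check names, and the helpers `run_checks` and `build_demo_db` are illustrative assumptions, not the repository's actual code.

```python
import sqlite3

# Hypothetical schema and check names, chosen for illustration only.
# Each check is a query that counts violating rows; 0 means the check passes.
CHECKS = {
    "null_customer_id": "SELECT COUNT(*) FROM orders WHERE customer_id IS NULL",
    "orphan_orders": """
        SELECT COUNT(*) FROM orders o
        LEFT JOIN customers c ON c.id = o.customer_id
        WHERE o.customer_id IS NOT NULL AND c.id IS NULL
    """,
    "amount_out_of_range": "SELECT COUNT(*) FROM orders WHERE amount < 0",
}


def run_checks(conn):
    """Run every check and return {check_name: number_of_violating_rows}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}


def build_demo_db():
    """Create an in-memory database with one deliberate violation per check."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Linus');
        INSERT INTO orders VALUES
            (10, 1, 25.0),     -- valid row
            (11, NULL, 12.5),  -- null customer_id
            (12, 99, 40.0),    -- customer 99 does not exist
            (13, 2, -5.0);     -- negative amount
    """)
    return conn


if __name__ == "__main__":
    for name, violations in run_checks(build_demo_db()).items():
        status = "PASS" if violations == 0 else "FAIL"
        print(f"{status} {name}: {violations} violating row(s)")
```

Writing each check as a query that counts violations keeps the rules declarative, so adding a check means adding one entry to the dictionary, and the same results could feed an HTML report.
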

| Area | Tools |
|---|---|
| Languages | Python, SQL |
| Data | Pandas, NumPy, PySpark |
| ML | Scikit-learn, Statsmodels |
| BI | Power BI, DAX, Power Query |
| Testing | Pytest, pytest-cov |
| Databases | PostgreSQL, MySQL, SQLite |

- Orange Maroc (Feb–Aug 2025) — built 15+ Power BI dashboards for procurement KPIs and supplier performance; automated Excel/Power Query reporting pipelines
- EMSI — Engineering degree in Computer Science & Networks (MIAGE), specialized in Data Engineering and BI
- Self-studying ML/DL through Scikit-learn, Keras, and hands-on projects