Fares KnightBytePy

Hi 👋, I'm Fares

Data Science Master's Student @ McGill | Applied ML & AI

🔭 I’m currently working on several projects (see below)! I just finished a RAG agent that summarizes the YouTube videos on my watch list =)
🌱 I’m currently working through Aurélien Géron's Hands-On Machine Learning with Scikit-Learn and PyTorch (on chapter 12 - CNNs) and plan to read AI Engineering by Chip Huyen next! Please send me AI/ML book suggestions; I would love to read them and continuously evolve.
👯 I’m looking to collaborate on a Data Science/ML education game I'm building that takes a player from complete beginner to Data Scientist. Please reach out if you are a Sound Engineer, Art Designer, or just have fun ideas or levels to collaborate on.
💬 Ask me about the book I'm currently reading, or "Why is Kafka fast?" (Hint: it uses sequential I/O instead of random I/O by appending to the end of a log), or ask me "WHAT IS BACKPROPAGATION FARES!?" so I can whip out my basketball shooting example.
⚡ Fun fact: I played collegiate esports!

Languages and Tools:

Actively looking for Data Science / ML roles (Intern / New Grad 2026)

🧠 What I Do

End-to-end ML: data → features → model → dashboard → decision
Forecasting & risk modeling in high-stakes settings (retail demand, credit risk, portfolio allocation)
Voice-of-customer NLP at scale (6M+ Steam reviews → topic clusters → player segments)
Explainable ML for non-technical stakeholders (SHAP, what-if simulation)
Reproducible delivery (Docker, VS Code Dev Containers)

🔥 Featured Projects

These are built like products, not class assignments.

1. Customer Churn Intelligence Dashboard

What it is

Churn model using SMOTE + XGBoost (F1 on churners = 0.88)
Flags high-risk customers and simulates “what if we give this segment a 10% discount?”
Delivered in Streamlit so non-technical teams can use it
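
A minimal sketch of the what-if idea, assuming a toy logistic scorer in place of the trained XGBoost model. The coefficients, feature names, and the simulate_discount helper are hypothetical and exist only to show the mechanics: re-score the same customers with a discounted price and compare average risk.

```python
import math

# Hypothetical coefficients for a toy logistic churn scorer.
# Illustrative only: these are NOT taken from the trained model in the repo.
COEFS = {"intercept": -2.0, "monthly_charge": 0.03, "tenure_months": -0.05}


def churn_probability(customer):
    """Score one customer dict with the toy logistic model."""
    z = (
        COEFS["intercept"]
        + COEFS["monthly_charge"] * customer["monthly_charge"]
        + COEFS["tenure_months"] * customer["tenure_months"]
    )
    return 1.0 / (1.0 + math.exp(-z))


def simulate_discount(segment, discount=0.10):
    """Return (mean churn probability before, after) applying a price cut."""
    before = [churn_probability(c) for c in segment]
    # Copy each customer with the discounted charge; other features unchanged.
    discounted = [
        {**c, "monthly_charge": c["monthly_charge"] * (1 - discount)}
        for c in segment
    ]
    after = [churn_probability(c) for c in discounted]
    return sum(before) / len(before), sum(after) / len(after)


# Made-up segment of two customers.
segment = [
    {"monthly_charge": 90.0, "tenure_months": 3},
    {"monthly_charge": 70.0, "tenure_months": 12},
]
before, after = simulate_discount(segment)
```

In a real dashboard the scorer would be the fitted model's probability output, and the comparison would be shown per segment rather than as a single pair of numbers.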

Why it matters

It's not “here’s a model,” it’s “here’s who to save today and how.”

Repo: customer-churn-streamlit

2. Hybrid Time Series Sales Forecasting - TensorFlow/Keras (LSTM) | Prophet | XGBoost | Hybrid Ensemble

What it is

Hybrid LSTM + Prophet + XGBoost forecaster on ~421k rows of Walmart-style weekly sales
Handles promo spikes, seasonal effects, macro signals

How it's built

Leakage-safe time-aware CV
Rolling / lag features
Holiday & promo awareness
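
A rough sketch of the first two points, assuming an expanding-window scheme and plain Python lists. The function names and the sales numbers are made up for illustration and are not the repo's actual code. The key property is that lag features only look backward and every test index comes after every train index.

```python
def add_lag_features(sales, lags=(1, 2)):
    """Build one row per week using past values only.

    The first max(lags) weeks are dropped because their lags do not exist.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(sales)):
        row = {"target": sales[t]}
        for lag in lags:
            row[f"lag_{lag}"] = sales[t - lag]
        rows.append(row)
    return rows


def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_idx, test_idx); train always ends before test begins."""
    for k in range(n_splits):
        test_start = n_samples - (n_splits - k) * test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested splits")
        train_idx = list(range(test_start))
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


# Made-up weekly sales series.
weekly_sales = [10, 12, 11, 15, 14, 18, 17, 21]
rows = add_lag_features(weekly_sales)
splits = list(expanding_window_splits(len(rows), n_splits=2, test_size=2))
```

scikit-learn's TimeSeriesSplit provides a similar split scheme out of the box; the hand-rolled version here just makes the no-future-data rule explicit.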

Why it matters

Helps planners avoid stockouts during high-demand weeks

Repo: chronosblend-forecasting

3. Credit Risk Default Pipeline

What it is

Default probability model with gradient boosting
SHAP explanations: “here’s exactly why this borrower is high risk”
Outputs risk tiers underwriter teams can defend to compliance

Why it matters

Turns “black box says no” into transparent, auditable reasoning

Repo: credit-risk-scoring

4. Steam Reviews Segmentation & Topic Mining

What it is

Analyzed ~6,000,000 Steam game reviews to understand player sentiment, pain points, and engagement drivers
NLP pipeline: text cleaning → TF-IDF → NMF topic modeling → SentenceTransformer embeddings → clustering
Generated player segments (“performance complainers”, “balance grinders”, “content hunters”) a studio could target
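
To make the TF-IDF step of the pipeline concrete, here is a minimal pure-Python sketch. It assumes whitespace tokenization and the textbook weighting tf * log(N / df), which differs from the smoothed variant scikit-learn uses by default. The tfidf helper and the three reviews are made up for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            vectors.append({})  # empty review: nothing to weight
            continue
        counts = Counter(tokens)
        vectors.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Made-up reviews, not real Steam data.
reviews = [
    "game crashes on launch",
    "game balance feels unfair",
    "great game more content please",
]
vectors = tfidf(reviews)
```

A word like "game" that appears in every review gets weight zero, while "crashes" stands out in the first review. That is what lets the later NMF and clustering steps pick up on distinct complaints rather than common filler.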

Why it matters

It's automated voice-of-customer analytics: you can point this at any product with reviews and instantly know what to fix first

Repo: steam-reviews-segmentation

5. State of the Art - Digital Twin Project using PyTorch | Transformers | MCP | RAG (In Progress)

What it is

Designed a digital twin that simulates customer response to discounts and campaigns; estimated heterogeneous treatment effects and individualized treatment rules using T-/X-/DR-learners and causal forests.
Learns budget-constrained policies via contextual bandits and offline RL (CQL/BCQ); validated with IPS/DR/SNIPS showing higher uplift@k and lower regret vs business-as-usual and strong XGBoost baselines.
Productionized with Spark/Delta pipelines, feature store, MLflow model registry, FastAPI microservices, Docker/Kubernetes, CI/CD, and drift/data-quality monitoring (Evidently/Great Expectations).
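
For readers unfamiliar with the off-policy estimators named above, here is a minimal sketch of IPS and SNIPS on a tiny made-up log. The function names, argument names, and numbers are assumptions for illustration, not the project's code. Each logged record needs the observed reward, the probability the logging policy gave the logged action, and the probability the new (target) policy would give that same action.

```python
def importance_weights(target_probs, logging_probs):
    """Ratio of target to logging probability for each logged action."""
    return [p / q for p, q in zip(target_probs, logging_probs)]


def ips(rewards, target_probs, logging_probs):
    """Inverse propensity scoring: mean of weight * reward."""
    weights = importance_weights(target_probs, logging_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def snips(rewards, target_probs, logging_probs):
    """Self-normalized IPS: divide by the sum of weights instead of n."""
    weights = importance_weights(target_probs, logging_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Made-up log: the logging policy chose uniformly between two actions,
# and the target policy prefers the action that was rewarded.
rewards = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5, 0.5, 0.5, 0.5]
target_probs = [0.8, 0.2, 0.8, 0.2]

ips_value = ips(rewards, target_probs, logging_probs)
snips_value = snips(rewards, target_probs, logging_probs)
logged_value = sum(rewards) / len(rewards)
```

On this toy log both estimators give about 0.8 against a logged average of 0.5. SNIPS trades a small bias for lower variance, which tends to matter when some logging probabilities are very small.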

Status: Private / request access

🛠 Tech Stack

Core        : Python, SQL, Pandas, NumPy
Modeling    : Scikit-Learn, XGBoost, LightGBM, CatBoost
Deep / TS   : PyTorch, Keras (LSTM), Prophet, time-aware CV
NLP         : TF-IDF, NMF topic modeling, SentenceTransformer embeddings, clustering
Apps        : Streamlit dashboards, FastAPI services
Infra       : Docker, VS Code Dev Containers
Explainability : SHAP / model interpretability
