KnightBytePy/README.md

Hi 👋, I'm Fares

Data Scientist Master's @ McGill | Applied ML & AI

  • 🔭 I’m currently working on many things (check them out below!). I just finished a RAG agent that summarizes the YouTube videos on my watch list =)

  • 🌱 I’m currently working through Aurélien Géron's Hands-On Machine Learning with Scikit-Learn and PyTorch (on chapter 12 - CNNs) and plan to read AI Engineering by Chip Huyen next! Please send me AI/ML book suggestions. I would love to read them and continuously evolve.

  • 👯 I’m looking for collaborators on a Data Science/ML education game I'm building that takes a player from complete beginner to Data Scientist. Please reach out if you are a Sound Engineer, Art Designer, or just have fun ideas or levels to collaborate on.

  • 💬 Ask me about the book I'm currently reading, or "Why is Kafka fast?" (Hint: it uses sequential I/O instead of random I/O and only appends to the end of a log.) Or ask me "WHAT IS BACKPROPAGATION, FARES!?" so I can whip out my basketball shooting example.

  • ⚡ Fun fact: I played collegiate esports!

Languages and Tools:

AWS, Azure, Cassandra, Docker, Figma, GCP, Kafka, Kubernetes, MySQL, pandas, Python, PyTorch, scikit-learn, seaborn, TensorFlow, Unity, Unreal

Actively looking for Data Science / ML roles (Intern / New Grad 2026)


🧠 What I Do

  • End-to-end ML: data → features → model → dashboard → decision
  • Forecasting & risk modeling in high-stakes settings (retail demand, credit risk, portfolio allocation)
  • Voice-of-customer NLP at scale (6M+ Steam reviews → topic clusters → player segments)
  • Explainable ML for non-technical stakeholders (SHAP, what-if simulation)
  • Reproducible delivery (Docker, VS Code Dev Containers)

🔥 Featured Projects

These are built like products, not class assignments.

1. Customer Churn Intelligence Dashboard

What it is

  • Churn model using SMOTE + XGBoost (F1 on churners = 0.88)
  • Flags high-risk customers and simulates “what if we give this segment a 10% discount?”
  • Delivered in Streamlit so non-technical teams can use it
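
The oversampling idea behind SMOTE can be sketched in a few lines. This is a simplified, standard-library-only illustration, not the repo's code and not the imbalanced-learn implementation; the function name `smote_like_oversample` and the toy `churners` points are made up for the example.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class (churner) feature vectors.
churners = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(churners, n_new=6)
```

Synthetic rows like these should only ever be added to the training split; evaluating on them would inflate the reported F1.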

Why it matters

  • It's not “here’s a model,” it’s “here’s who to save today and how.”

Repo: customer-churn-streamlit


2. Hybrid Time Series Sales Forecasting - TensorFlow/Keras (LSTM) | Prophet | XGBoost | Hybrid Ensemble

What it is

  • Hybrid LSTM + Prophet + XGBoost forecaster on ~421k rows of Walmart-style weekly sales
  • Handles promo spikes, seasonal effects, macro signals

How it's built

  • Leakage-safe time-aware CV
  • Rolling / lag features
  • Holiday & promo awareness
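
A rough sketch of what "leakage-safe" means for the first two bullets: every test week comes after every training week, and lag or rolling features only look backwards. The helper names (`expanding_window_splits`, `add_lag_and_rolling`) and the toy `weekly_sales` series are assumptions for illustration, not the repo's actual functions or data.

```python
def expanding_window_splits(n_rows, n_folds, test_size):
    """Yield (train_idx, test_idx) pairs where every test index comes
    strictly after every train index, so no future rows leak into training."""
    for fold in range(n_folds):
        test_end = n_rows - (n_folds - 1 - fold) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            raise ValueError("not enough rows for the requested folds")
        yield list(range(test_start)), list(range(test_start, test_end))

def add_lag_and_rolling(sales, lag=1, window=3):
    """Return per-week features built only from earlier weeks."""
    rows = []
    for t in range(len(sales)):
        lag_value = sales[t - lag] if t - lag >= 0 else None
        past = sales[max(0, t - window):t]  # excludes week t itself
        rolling_mean = sum(past) / len(past) if past else None
        rows.append({"sales": sales[t], "lag": lag_value, "rolling_mean": rolling_mean})
    return rows

# Hypothetical weekly sales with one promo spike.
weekly_sales = [10, 12, 15, 11, 30, 14, 13, 16]
splits = list(expanding_window_splits(len(weekly_sales), n_folds=3, test_size=2))
features = add_lag_and_rolling(weekly_sales)
```

The rolling mean deliberately excludes the current week; including it would leak the target into its own feature.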

Why it matters

  • Helps planners avoid stockouts during high-demand weeks

Repo: chronosblend-forecasting


3. Credit Risk Default Pipeline

What it is

  • Default probability model with gradient boosting
  • SHAP explanations: “here’s exactly why this borrower is high risk”
  • Outputs risk tiers underwriter teams can defend to compliance
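
To give a feel for what a per-borrower explanation looks like, here is a deliberately simplified sketch using a linear score, where each feature's contribution is its weight times its distance from a baseline applicant. It does not use the SHAP library or the project's gradient-boosting model; the feature names, weights, and values are all hypothetical.

```python
def linear_attributions(weights, baseline, applicant):
    """Per-feature contribution to a linear score relative to a baseline
    (average) applicant: weight * (value - baseline value)."""
    return {name: weights[name] * (applicant[name] - baseline[name]) for name in weights}

def score(weights, row):
    """Linear risk score: weighted sum of the feature values."""
    return sum(weights[name] * row[name] for name in weights)

# Hypothetical model weights and applicants.
weights = {"debt_to_income": 2.0, "late_payments": 0.5, "years_employed": -0.1}
baseline = {"debt_to_income": 0.3, "late_payments": 1.0, "years_employed": 5.0}
applicant = {"debt_to_income": 0.6, "late_payments": 4.0, "years_employed": 1.0}

contrib = linear_attributions(weights, baseline, applicant)
# The feature that moves this applicant's score furthest from the baseline.
top_driver = max(contrib, key=lambda name: abs(contrib[name]))
```

The contributions add up to the gap between the applicant's score and the baseline score, which is the same additivity property that makes SHAP-style explanations auditable.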

Why it matters

  • Turns “black box says no” into transparent, auditable reasoning

Repo: credit-risk-scoring


4. Steam Reviews Segmentation & Topic Mining

What it is

  • Analyzed ~6,000,000 Steam game reviews to understand player sentiment, pain points, and engagement drivers
  • NLP pipeline: text cleaning → TF-IDF → NMF topic modeling → SentenceTransformer embeddings → clustering
  • Generated player segments (“performance complainers”, “balance grinders”, “content hunters”) a studio could target
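
The TF-IDF step of that pipeline can be illustrated with the textbook formula (tf = count / document length, idf = log(N / document frequency)). This is a standard-library sketch on three invented reviews, not the repo's pipeline; library implementations such as scikit-learn's use a smoothed idf and will give different numbers.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document using
    tf = count / doc length and idf = log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical reviews.
reviews = [
    "game crashes on launch",
    "game balance feels unfair",
    "game needs more content",
]
weights = tf_idf(reviews)
```

A word that appears in every review ("game") gets weight zero, while distinctive words ("crashes", "balance", "content") carry the signal that topic modeling and clustering then pick up.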

Why it matters

  • It's automated voice-of-customer analytics: you can point this at any product with reviews and instantly know what to fix first

Repo: steam-reviews-segmentation


5. State of the Art - Digital Twin Project using PyTorch | Transformers | MCP | RAG (In Progress)

What it is

  • Designed a digital twin that simulates customer response to discounts and campaigns; estimated heterogeneous treatment effects and individualized treatment rules using T-/X-/DR-learners and causal forests.
  • Learns budget-constrained policies via contextual bandits and offline RL (CQL/BCQ); validated with IPS/DR/SNIPS showing higher uplift@k and lower regret vs business-as-usual and strong XGBoost baselines.
  • Productionized with Spark/Delta pipelines, feature store, MLflow model registry, FastAPI microservices, Docker/Kubernetes, CI/CD, and drift/data-quality monitoring (Evidently/Great Expectations).
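
For readers unfamiliar with the off-policy estimators named above, a minimal sketch of IPS and SNIPS follows. The log format, the `always_discount` policy, and the numbers are invented for illustration and are not taken from the private project.

```python
def ips_and_snips(logs, target_policy):
    """Estimate the value of target_policy from logged
    (context, action, reward, logging_propensity) tuples.

    IPS   = mean(w * r)
    SNIPS = sum(w * r) / sum(w), with w = pi_target(a|x) / pi_logging(a|x)
    """
    weighted_rewards, weights = 0.0, 0.0
    for context, action, reward, propensity in logs:
        w = target_policy(context, action) / propensity
        weighted_rewards += w * reward
        weights += w
    ips = weighted_rewards / len(logs)
    snips = weighted_rewards / weights if weights > 0 else 0.0
    return ips, snips

def always_discount(context, action):
    """Hypothetical target policy: always offer the discount."""
    return 1.0 if action == "discount" else 0.0

# Hypothetical logged interactions: (context, action, reward, propensity).
logs = [
    ("loyal", "discount", 1.0, 0.25),
    ("loyal", "none", 0.0, 0.75),
    ("new", "discount", 0.0, 0.5),
    ("new", "none", 1.0, 0.5),
]
ips, snips = ips_and_snips(logs, always_discount)
```

SNIPS normalizes by the sum of importance weights instead of the sample count, which usually trades a small bias for lower variance when propensities are small.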

Status: Private / request access


🛠 Tech Stack

Core        : Python, SQL, Pandas, NumPy
Modeling    : Scikit-Learn, XGBoost, LightGBM, CatBoost
Deep / TS   : PyTorch, Keras (LSTM), Prophet, time-aware CV
NLP         : TF-IDF, NMF topic modeling, SentenceTransformer embeddings, clustering
Apps        : Streamlit dashboards, FastAPI services
Infra       : Docker, VS Code Dev Containers
Explainability : SHAP / model interpretability


