Companion code for tutorials at largo.dev.
Practical, code-heavy implementations of ML concepts—embeddings, transformers, time series, edge deployment, and more.
- Sentence Embeddings from Scratch — Build a BiLSTM sentence encoder with PyTorch. Covers tokenization, embedding layers, and pooling strategies; a minimal encoder sketch appears after the failure-prediction notes below.
- Hard Drive Failure Prediction — Predict drive failures using Backblaze SMART data (24.8M records, Q4 2023). Compares XGBoost, LSTM, Transformer, and Mamba architectures.
| Model | Test AUC | F2 | Recall | Notes |
|---|---|---|---|---|
| XGBoost | 0.920 | 0.095 | 54% | Feature engineering wins |
| Transformer | 0.916 | 0.006 | 80% | Conv1D preprocessing |
| LSTM | 0.907 | 0.006 | 78% | Bidirectional |
| Mamba (SSM) | 0.901 | 0.017 | 69% | Linear complexity O(n) |

Key techniques: balanced sampling, GPU acceleration with cuDF, Conv1D preprocessing, and threshold optimization for extreme class imbalance (0.01% positive rate).
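As a rough illustration of the thresholding step, the sketch below picks the decision threshold that maximizes F2 on held-out scores. `best_f2_threshold` and the synthetic data are illustrative assumptions, not code from this repo:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f2_threshold(y_true, y_score, beta=2.0):
    """Pick the decision threshold that maximizes F-beta (beta=2 favors recall)."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one more entry than thresholds; drop the extra point
    p, r = precision[:-1], recall[:-1]
    fbeta = (1 + beta**2) * p * r / np.clip(beta**2 * p + r, 1e-12, None)
    best = np.argmax(fbeta)
    return thresholds[best], fbeta[best]

# Toy usage with a ~0.01% positive rate (hypothetical scores from any model)
rng = np.random.default_rng(0)
y = (rng.random(1_000_000) < 1e-4).astype(int)
scores = np.clip(rng.normal(0.1 + 0.6 * y, 0.15), 0.0, 1.0)
thr, f2 = best_f2_threshold(y, scores)
print(f"threshold={thr:.3f}  F2={f2:.3f}")
```

Sweeping the threshold this way matters far more than usual at a 0.01% positive rate: the default 0.5 cutoff predicts almost no positives, so F2 is effectively determined by where you place the cut.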
New: `model_mamba.py` implements Mamba SSM using the official `mamba-ssm` library. All neural models achieve ~0.90 AUC (good ranking), but XGBoost with engineered features achieves a 10x better F2.
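For reference, the official `mamba-ssm` package exposes a `Mamba` block that maps `(batch, length, d_model)` to the same shape; a minimal smoke test (the library's kernels expect a CUDA device) might look like:

```python
import torch
from mamba_ssm import Mamba  # official mamba-ssm package

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")  # kernels require CUDA tensors
block = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")
y = block(x)
assert y.shape == x.shape  # sequence-to-sequence, linear in sequence length
```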
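And for the sentence-embeddings tutorial listed above, here is a minimal sketch of a BiLSTM encoder with masked mean pooling, assuming integer token ids with 0 as padding. The class name and hyperparameters are illustrative, not the tutorial's exact code:

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Token ids -> fixed-size sentence vector via BiLSTM + masked mean pooling."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, token_ids, lengths):
        emb = self.embedding(token_ids)  # (B, T, E)
        packed = nn.utils.rnn.pack_padded_sequence(
            emb, lengths.cpu(), batch_first=True, enforce_sorted=False)
        out, _ = self.lstm(packed)
        out, _ = nn.utils.rnn.pad_packed_sequence(
            out, batch_first=True, total_length=token_ids.size(1))  # (B, T, 2H)
        mask = (token_ids != self.pad_idx).unsqueeze(-1).float()  # ignore padding
        return (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

encoder = BiLSTMEncoder(vocab_size=10_000)
ids = torch.randint(1, 10_000, (4, 12))             # 4 sentences, 12 tokens each
vectors = encoder(ids, lengths=torch.full((4,), 12))
print(vectors.shape)  # torch.Size([4, 512]) -- 2 * hidden_dim
```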
```bash
# Clone the repo
git clone https://github.com/StoliRocks/largo-tutorials.git
cd largo-tutorials

# Install common dependencies
pip install -r requirements.txt

# Or install dependencies for a specific tutorial
pip install -r embeddings/sentence-embeddings-from-scratch/requirements.txt
```

Steven W. White — largo.dev · LinkedIn
License: MIT