This repository contains both major assessments completed for the Machine Learning & Parallel Computing (ITS66604) module.
Across these projects, I explored the foundations of supercomputing, GPU architecture, parallel processing, and machine learning workflows, applying them to real-world analytical tasks.
The repository includes:
- High-Performance Computing (HPC) Supercomputer Analysis
- Machine Learning Classification Project (Weather Extreme Detection)
From the Machine Learning project:
- Exploratory Data Analysis (EDA)
- Data preprocessing & cleaning
- Handling anomalies and missing values
- Label engineering (custom rule-based classification)
- Feature engineering (interaction features, distractor features)
- Gaussian noise injection for robustness
- Class imbalance handling (oversampling)
- Encoding & scaling techniques (LabelEncoder, StandardScaler)
- Machine learning workflows using:
  - Decision Tree
  - Random Forest
- Model evaluation using:
  - Confusion matrix
  - Precision/Recall/F1-score
  - ROC curve & AUC scores (Random Forest AUC = 0.99, excellent performance)
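
The preprocessing steps listed above (label encoding, Gaussian noise injection, oversampling, scaling) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual notebook code: the helper names `add_gaussian_noise` and `random_oversample`, the noise scale of 0.05, and the toy dataset are all assumptions made for the example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(42)

# Hypothetical toy data: 3 numeric features with imbalanced string labels.
X = rng.normal(size=(200, 3))
labels = np.array(["Normal"] * 180 + ["Extreme"] * 20)

# Encode string labels as integers (LabelEncoder sorts classes alphabetically).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)  # "Extreme" -> 0, "Normal" -> 1


def add_gaussian_noise(features, scale=0.05, rng=rng):
    """Return a copy of `features` with zero-mean Gaussian noise added.

    The noise standard deviation is `scale` times each column's own
    standard deviation (an assumed, illustrative choice).
    """
    sigma = scale * features.std(axis=0)
    return features + rng.normal(0.0, 1.0, size=features.shape) * sigma


def random_oversample(features, targets, rng=rng):
    """Resample each class with replacement up to the majority class size."""
    classes, counts = np.unique(targets, return_counts=True)
    target_n = counts.max()
    idx_parts = []
    for cls, n in zip(classes, counts):
        cls_idx = np.flatnonzero(targets == cls)
        extra = rng.choice(cls_idx, size=target_n - n, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    return features[idx], targets[idx]


X_noisy = add_gaussian_noise(X)
X_bal, y_bal = random_oversample(X_noisy, y)
X_scaled = StandardScaler().fit_transform(X_bal)

print(X_scaled.shape, np.bincount(y_bal))
```

In a real workflow, oversampling and scaler fitting would normally be applied to the training split only, so that duplicated rows and test-set statistics do not leak into evaluation.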
From the Supercomputer project:
- Supercomputer architecture (NVIDIA DGX A100 + SuperPOD)
- Parallel GPU processing (CUDA cores, Tensor Cores)
- NVLink + NVSwitch high-speed interconnect
- Distributed computing & InfiniBand networking
- GPU virtualization (MIG)
- HPC benchmark interpretation (HPL, HPCG, MLPerf)
- Storage hierarchy & parallel file systems
- Energy-efficient computing (Green500 insights)
- Use cases:
  - AI training at scale
  - Autonomous driving simulation
  - Weather & climate modeling
  - Scientific computing & medical research
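
As a small illustration of the benchmark interpretation mentioned above, two commonly quoted figures can be derived from an HPL result: the ratio of sustained to theoretical peak performance (Rmax / Rpeak), and the energy efficiency in GFLOPS per watt, which is the metric the Green500 list ranks by. The function names and the input numbers below are hypothetical and are not measurements of any real system.

```python
def hpl_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of theoretical peak performance sustained on HPL."""
    return rmax_tflops / rpeak_tflops


def gflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Energy efficiency: sustained GFLOPS divided by power draw in watts."""
    rmax_gflops = rmax_tflops * 1000.0
    power_w = power_kw * 1000.0
    return rmax_gflops / power_w


# Hypothetical system: 800 TFLOPS sustained, 1000 TFLOPS peak, 40 kW.
efficiency = hpl_efficiency(800.0, 1000.0)
green = gflops_per_watt(800.0, 40.0)
print(f"HPL efficiency: {efficiency:.0%}, {green:.1f} GFLOPS/W")
```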
A complete study of HPC design, including:
- GPU/CPU topology
- NVLink / NVSwitch connectivity
- InfiniBand networking
- Multi-node scaling
- Power & cooling strategies
- AI-accelerated workloads
Location: /Supercomputer/
A full ML pipeline to classify Normal vs Extreme weather using global meteorological data.
Includes:
- 40+ weather features
- Full preprocessing pipeline
- Noise injection & advanced feature engineering
- Oversampling to fix class imbalance
- Decision Tree vs Random Forest comparison
- Up to 95% accuracy and an AUC of 0.99
Location: /Extreme Weather Classification/
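
A Decision Tree vs Random Forest comparison of the kind described above might look like the sketch below. It uses a synthetic, imbalanced dataset from `make_classification` as a stand-in for the weather data, and the hyperparameters (`max_depth=5`, `n_estimators=100`) are illustrative assumptions, so the scores it prints are unrelated to the results reported for this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the weather dataset: 20 features, ~20% positives.
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=8,
    weights=[0.8, 0.2],
    random_state=0,
)

# Stratified split keeps the class ratio the same in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC is computed from the predicted probability of the positive class.
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, proba),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}, "
          f"AUC={results[name]['auc']:.3f}")
```

Reporting AUC alongside accuracy is useful on imbalanced data, where a model can reach high accuracy simply by predicting the majority class.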
- Machine Learning (scikit-learn)
- Data cleaning & feature engineering
- EDA visualization (Seaborn, Matplotlib)
- GPU architecture & parallel computing
- HPC system design & benchmarking
- Jupyter/Colab development
- CSV data processing at scale
- Performance comparison & interpretation
- Model explainability (feature importance)
- Handling real-world ML challenges (noise, imbalance, anomalies)
- Technical reporting
- Academic poster design
- Structured documentation
- Scientific analysis & evaluation
Thank you for exploring my MLPC repository!
