A comprehensive collection of technical projects demonstrating end-to-end expertise in Machine Learning, Quantitative Finance, and Full-Stack Data Engineering.
This repository serves as a centralized portfolio containing six production-grade projects. Each directory represents a standalone application or research pipeline, complete with source code, documentation, and rigorous performance analysis.
| Project | Domain | Tech Stack | Key Impact |
|---|---|---|---|
| Solar Energy Forecasting | Smart Grid / Energy | XGBoost, LSTM, Optuna | Reduced MAPE significantly vs. ARIMA baselines; engineered hybrid forecasting models for prosumer consumption/production. |
| CNN Image Classification | Computer Vision | PyTorch, torchvision | Achieved 84.6% accuracy on CIFAR-10 using a custom 3-layer CNN with adaptive pooling and augmentation pipelines. |
| Loan Approval Prediction | FinTech / Risk | Scikit-Learn, SHAP, Random Forest | Built an automated underwriting system with 99.25% precision and 100% recall; integrated SHAP for regulatory explainability. |
| TV Show Analytics | Data Mining | SciPy, BeautifulSoup, Requests | End-to-end scraper for 200+ shows; applied Kruskal-Wallis and robust regression to debunk "Golden Age" TV myths. |
| Portfolio Risk Modeling | Quant Finance | R, GARCH, quadprog | Implemented mean-variance (Markowitz) optimization and dynamic volatility forecasting using GARCH(1,1). |
| UniBooks System | DBMS | MS Access, VBA, SQL | Designed a normalized relational database with RBAC security and automated inventory tracking triggers. |
- Challenge: Mitigate energy imbalance costs in smart grids by predicting erratic prosumer behavior.
- Solution: Developed a comparative pipeline using Gradient Boosting (XGBoost) and Recurrent Neural Networks (LSTM).
- Highlights:
- Automated hyperparameter tuning via Optuna (Bayesian Optimization).
- Implemented 5-fold `TimeSeriesSplit` cross-validation to prevent look-ahead bias.
- Artifacts: Full technical report (`.pdf`) and production-ready Python scripts.
- 👉 View Project
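The look-ahead safeguard above can be sketched in plain Python. This is a minimal, illustrative version of the walk-forward splitting that `sklearn.model_selection.TimeSeriesSplit` performs; fold counts and sizes here are assumptions, not the project's exact configuration:

```python
# Walk-forward (time-series) cross-validation sketch: each validation fold
# lies strictly AFTER its training window, so the model never sees the future.
def time_series_splits(n_samples, n_splits=5):
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, fold * k))                          # expanding window
        valid = list(range(fold * k, min(fold * (k + 1), n_samples)))
        yield train, valid

for train_idx, valid_idx in time_series_splits(12, n_splits=5):
    # Training indices always precede validation indices: no look-ahead bias.
    assert max(train_idx) < min(valid_idx)
```

Unlike shuffled k-fold, the training window only ever grows forward in time, mirroring how the forecaster would actually be deployed.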
- Challenge: Automate loan eligibility while minimizing financial risk and maintaining interpretability.
- Solution: A Random Forest classifier tuned for high precision in the "Safe-to-Approve" band.
- Highlights:
- Feature Engineering: Created high-impact ratios (e.g., Debt-to-Income, Asset Liquidity).
- Governance: Utilized SHAP (SHapley Additive exPlanations) to audit model decisions for bias.
- Performance: Achieved ROC-AUC of 0.999 on the test set.
- 👉 View Project
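The headline precision/recall figures come down to counting approval outcomes. A minimal sketch of how those two metrics are computed for the "approve" class (the toy labels below are illustrative, not the project's data):

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: one false approval, no missed approvals.
p, r = precision_recall([1, 1, 0, 0, 1], [1, 1, 1, 0, 1])
```

In underwriting, precision governs how many bad loans slip into the approved pool, while recall measures how many creditworthy applicants are captured, which is why the model was tuned for the high-precision band.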
- Challenge: Model portfolio risk beyond simple standard deviation in volatile markets.
- Solution: A statistical framework combining Modern Portfolio Theory (MPT) with time-series econometrics.
- Highlights:
- Convex Optimization: Calculated Global Minimum Variance (GMV) and Tangency portfolios using quadratic programming.
- Volatility Modeling: Integrated GARCH(1,1) to capture volatility clustering and "fat tails" in asset returns.
- Backtesting: Rolling-window analysis to validate Value-at-Risk (VaR) estimations.
- 👉 View Project
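For the unconstrained case, the Global Minimum Variance weights have a closed form, w = Σ⁻¹**1** / (**1**ᵀΣ⁻¹**1**). A sketch with an assumed two-asset covariance matrix; the project itself uses R's `quadprog`, which also handles constraints (e.g., no short sales) that this shortcut omits:

```python
import numpy as np

def gmv_weights(cov):
    """Closed-form GMV weights: solve cov @ x = 1, then normalize to sum to 1."""
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

# Illustrative annualized covariance matrix for two assets (assumption).
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
w = gmv_weights(cov)  # weights sum to 1; lower-variance asset gets more weight
```

The lower-variance asset receives the larger allocation, which is exactly the behavior the GMV portfolio is designed to exhibit.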
- Challenge: Implement a robust vision pipeline from scratch without relying on pre-trained models.
- Solution: Designed a custom 3-layer Convolutional Neural Network (CNN) for the CIFAR-10 dataset.
- Highlights:
- Architecture: Utilized `Conv2d` blocks with Batch Normalization and Max Pooling; integrated Dropout to prevent overfitting.
- Augmentation: Applied random rotations and horizontal flips to improve generalization.
- Result: Achieved 84.6% accuracy, with strong performance on mechanical classes (Cars/Trucks).
- 👉 View Project
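The interplay of convolution and pooling determines how the 32×32 CIFAR-10 feature maps shrink through the network. A small sketch of the standard output-size arithmetic; the kernel, stride, and padding values below are plausible assumptions, not the project's exact hyperparameters:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard Conv/Pool output-size formula: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32  # CIFAR-10 images are 32x32
for _ in range(3):                               # three Conv -> MaxPool blocks
    size = conv_out(size, kernel=3, padding=1)   # 3x3 conv with 'same' padding
    size = conv_out(size, kernel=2, stride=2)    # 2x2 max pool halves each side
# After three blocks: 32 -> 16 -> 8 -> 4
```

Tracking this arithmetic is what lets the final feature map (here 4×4) be flattened into a correctly sized linear classifier head, or handed to adaptive pooling when input sizes vary.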
- Challenge: Validate cultural theories ("Golden Age of TV") using real-world unstructured data.
- Solution: A dual-phase pipeline: Automated Scraper (Python/Requests) + Statistical Inference (SciPy).
- Highlights:
- Data Engineering: Built a resilient scraper to harvest metadata for 200+ shows, handling retries and rate limiting.
- Inference: Applied non-parametric tests (Mann-Whitney U, Kruskal-Wallis) to handle non-normal rating distributions.
- Insight: Disproved "Longer is Better" myths using robust regression analysis.
- 👉 View Project
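The non-parametric comparison above can be sketched with SciPy's `stats.kruskal`, which tests whether samples share a common distribution without assuming normality. The rating samples below are made up for illustration; they are not the scraped data:

```python
from scipy import stats

# Hypothetical IMDb-style rating samples for three TV "eras" (assumed data).
era_a = [8.1, 7.9, 8.4, 8.0, 7.7]
era_b = [8.2, 8.0, 8.3, 7.8, 8.1]
era_c = [7.5, 7.9, 8.0, 7.6, 7.8]

# Kruskal-Wallis H-test: rank-based, so skewed/non-normal ratings are fine.
h_stat, p_value = stats.kruskal(era_a, era_b, era_c)
```

Because the test operates on ranks rather than raw values, a handful of outlier ratings cannot dominate the result, which matters for heavy-tailed review data.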
- Challenge: Replace manual bookstore tracking with a scalable, atomic transaction system.
- Solution: A relational database system built with MS Access and VBA automation.
- Highlights:
- Schema Design: 3NF Normalized database ensuring data integrity across Inventory, Sales, and Procurement.
- Automation: VBA triggers for real-time stock level checks (`Inventory < Order_Qty` logic).
- Analytics: SQL-driven dashboards for "Best Sellers" and monthly revenue tracking.
- 👉 View Project
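The reorder check can be expressed as a plain SQL trigger. Here is an illustrative translation of the `Inventory < Order_Qty` logic into SQLite (via Python's standard library); the table and column names are assumptions, not the project's Access schema:

```python
import sqlite3

# In-memory database standing in for the Access backend (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (
    book_id      INTEGER PRIMARY KEY,
    stock        INTEGER,
    order_qty    INTEGER,
    reorder_flag INTEGER DEFAULT 0
);
-- Flag any item whose stock drops below its reorder quantity.
CREATE TRIGGER low_stock AFTER UPDATE OF stock ON Inventory
WHEN NEW.stock < NEW.order_qty
BEGIN
    UPDATE Inventory SET reorder_flag = 1 WHERE book_id = NEW.book_id;
END;
""")
conn.execute("INSERT INTO Inventory VALUES (1, 10, 5, 0)")
conn.execute("UPDATE Inventory SET stock = 3 WHERE book_id = 1")  # 3 < 5 fires trigger
flag = conn.execute("SELECT reorder_flag FROM Inventory WHERE book_id = 1").fetchone()[0]
```

Moving the check into the database layer (rather than application code) is what makes the inventory update atomic: no sale can decrement stock without the reorder rule being evaluated in the same transaction.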
Each project is self-contained. To run a specific project:
- Navigate to the project folder.
- Read the local `README.md` for specific dependency installation (e.g., `pip install -r requirements.txt` or R library installation).
- Launch the corresponding Jupyter Notebook (`.ipynb`) or R script (`.R`).
```bash
# Example: Cloning the repo
git clone https://github.com/andyhu11/Project-Documentation.git
cd Project-Documentation
```
This repository is licensed under the MIT License. See individual project folders for specific third-party attributions.
- LinkedIn: Andy
- Portfolio: github.com/andyhu11/Project-Documentation
- Email: jiahuiapply26@163.com