Course: WIE 3007 – Data Mining
Semester: 2025/2026 – Semester 1
Institution: Universiti Malaya, Faculty of Computer Science and Information Technology
Due: Week 13
| Name | Matric No. | GitHub Username | Role |
|---|---|---|---|
| Ryan Chin Jian Hwa | 23005233 | @wrynaft | Model Evaluation |
| Liew Jin Sze | 23005226 | @jinsze | Data Modelling |
| Kueh Pang Lang | 23005227 | @pang-lang | EDA |
| Koay Khoon Lyn | 23005235 | @khoonlyn913 | Dataset Simulation |
| Maxwell Jared Daniel | 22002648 | @oatmeal2211 | Feature Engineering |
This project applies data mining and AI-enhanced analytics to the financial/business domain using Generative AI (GenAI), Large Language Models (LLMs), and Small Language Models (SLMs). It covers dataset simulation, feature engineering, predictive modelling, and AI-assisted model interpretation.
- Apply data-mining workflows in real financial/business contexts
- Integrate GenAI/LLMs/SLMs into data analysis
- Build and evaluate predictive models
- Collaborate effectively using professional GitHub practices
- Simulate 1000+ financial/business-related records
- Use GenAI to create realistic numerical and textual patterns
- Apply LLMs/SLMs for feature extraction (sentiment analysis, risk categorization, customer segmentation)
- Develop classification/regression models using:
  - Random Forest
  - Logistic Regression
  - XGBoost
  - Neural Networks
- Utilize AI tools for text-based feature engineering
- Compare results across at least two models
- Evaluate using appropriate metrics (Accuracy, F1-score, and ROC-AUC for classification; RMSE for regression)
- Use LLMs to summarize findings and provide insights
- Interpret feature importance
- 5–7 page comprehensive report
- Includes: objectives, dataset details, EDA, feature engineering, modelling, results, business insights
- AI usage disclosure and GitHub contribution summary
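
The simulation and evaluation requirements listed above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the record fields (`income`, `debt_ratio`, `default`), the distributions, and the two stand-in "models" (a debt-ratio score and a constant base-rate baseline) are all hypothetical, chosen only to show the shape of the workflow. It uses only the Python standard library so it runs without extra dependencies; a real submission would more likely rely on pandas and scikit-learn.

```python
import math
import random


def simulate_records(n=1000, seed=42):
    """Simulate hypothetical loan records; every field name is illustrative."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        income = rng.lognormvariate(8.5, 0.5)   # assumed monthly income
        debt_ratio = rng.betavariate(2, 5)      # assumed share of income owed
        # Assumed relationship: default risk rises with debt, falls with income.
        logit = -2.0 + 4.0 * debt_ratio - 0.0001 * income
        p_default = 1 / (1 + math.exp(-logit))
        records.append({
            "customer_id": i,
            "income": income,
            "debt_ratio": debt_ratio,
            "default": int(rng.random() < p_default),
        })
    return records


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


records = simulate_records()
y = [r["default"] for r in records]
base_rate = sum(y) / len(y)

# Two hypothetical scorers standing in for trained models.
candidates = {
    "debt_ratio_score": [r["debt_ratio"] for r in records],
    "base_rate_baseline": [base_rate] * len(records),
}

for name, scores in candidates.items():
    preds = [int(s >= 0.5) for s in scores]  # fixed 0.5 threshold, an assumption
    # RMSE is a regression metric; it is applied to the scores here
    # only to show the calculation.
    print(
        f"{name}: acc={accuracy(y, preds):.3f} f1={f1_score(y, preds):.3f} "
        f"auc={roc_auc(y, scores):.3f} rmse={rmse(y, scores):.3f}"
    )
```

Keeping a trivial baseline in the comparison makes it easier to tell whether a candidate model adds anything beyond the base rate, which is one way to satisfy the "at least two models" requirement meaningfully.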