Welcome to the Data Mining 2 project repository for the academic year 2023/2024! 🎓 This repository serves as a structured guide to the challenges outlined in the DM2 course, from understanding the datasets to applying advanced machine learning techniques.
- 🌟 Explore, clean, and preprocess tabular and time-series datasets.
- ✨ Generate meaningful features or variables.
- 🕵️‍♂️ Discover motifs and anomalies, and prepare data for clustering and classification.
- 🔍 Find motifs and discords in time series data.
- 🔗 Apply clustering with various algorithms and dimensionality reduction techniques.
- 🏆 Solve classification tasks using advanced techniques like DTW, Shapelets, and CNN/RNN.
- ❗ Outlier Detection: Use density-based and angle-based methods, and visualize the results.
- ⚖️ Imbalanced Learning: Address class imbalance with undersampling and oversampling techniques.
- 🤖 Apply Logistic Regression, SVMs, Neural Networks, Gradient Boosting, and Ensemble Methods.
- 📈 Regression Analysis: Implement advanced non-linear regression.
- 💡 Explainability: Use tools like SHAP, LIME, or Counterfactual Explainers to make your models transparent.
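💻 As a rough illustration of the DTW-based classification mentioned above, here is a minimal pure-Python sketch of dynamic time warping with a 1-nearest-neighbor rule. The function names `dtw_distance` and `nearest_label`, the absolute-difference cost, and the toy series are assumptions made for this example, not part of the project code. In practice a dedicated time-series library would be far faster.

```python
import math


def dtw_distance(a, b):
    """Unconstrained DTW between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_label(query, train):
    """1-NN classification: return the label of the training series closest under DTW.

    `train` is a list of (series, label) pairs (hypothetical format for this sketch).
    """
    return min(train, key=lambda pair: dtw_distance(query, pair[0]))[1]


# Toy data, invented purely for illustration.
train = [([1, 2, 3], "rising"), ([3, 2, 1], "falling")]
print(nearest_label([1, 2, 3, 3], train))
```

Because DTW allows one point to align with several, the query `[1, 2, 3, 3]` matches `[1, 2, 3]` exactly even though the lengths differ, which is why it is a common baseline for time series of unequal or shifted shape.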
- Tabular Dataset: Over 100k records spanning 114 genre classes, with additional artist information.
- Time Series Dataset: Spectral centroids from song audio files (~10k time series).
💾 Note: Preprocess and explore the datasets before diving into analysis.
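💻 One preprocessing step often applied to time series before distance-based analysis is z-normalization, which rescales each series to zero mean and unit variance. The sketch below is only an assumption about what preprocessing might look like here; the helper name `z_normalize`, the `eps` threshold, and the sample values are hypothetical and not taken from the project.

```python
import statistics


def z_normalize(series, eps=1e-8):
    """Rescale a numeric sequence to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # A (near-)constant series has no shape to preserve; map it to zeros
    # instead of dividing by a vanishing standard deviation.
    if std < eps:
        return [0.0 for _ in series]
    return [(x - mean) / std for x in series]


# Made-up values standing in for one spectral-centroid series.
sample = [1200.0, 1350.0, 1500.0, 1425.0]
normalized = z_normalize(sample)
print(normalized)
```

Normalizing each series independently makes comparisons depend on shape rather than absolute level, which is usually what motif discovery and shape-based classifiers need. Whether that is appropriate for a given task is a judgment call.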
- 📘 Module Reports: Detailed analysis and results for each module.
- 📊 Visualizations: Highlight key insights through clustering plots, classification performance graphs, etc.
Let’s turn data into golden insights! 🏆