Welcome! Here, you'll find a collection of Google Colab notebooks used on article that explore and optimize clustering techniques, specifically tailored for applications using unsupervised learning methods. Each notebook contains code and detailed explanations, making it suitable for both beginners and experienced data scientists.
- Clone this repository:
git clone https://github.com/tsa2ufpe/trace-clustering-article.git
File: param_search.ipynb
This notebook provides a creative approach to generating a sequence of numbers for tuning the min_cluster_size parameter in HDBSCAN. The values are dynamically adjusted based on a starting limit and adaptive learning rates, creating a progressively narrowing search space.
Key Features:
- Dynamically calculates a range of parameter values.
- Adapts sil rates over iterations for efficient optimization.
- Ideal for experimentation and fine-tuning clustering parameters.
📌 Use this notebook to optimize clustering method parameters and ensure effective clustering results.
Files:
clustering_experiment_v2_c1.ipynbclustering_experiment_v2_c2.ipynbclustering_experiment_v2_c3.ipynb
These notebooks guide you through the four key steps of clustering analysis, as described in cutting-edge research:
Identify the most significant attributes that distinguish patterns across clusters. Proper feature selection simplifies analysis and enhances clustering outcomes.
Select the optimal clustering algorithm and proximity measure to structure your data effectively. This ensures the accuracy and relevance of the formed clusters.
Validate the generated clusters using:
- Structural Metrics: Analyze cluster structure directly.
- Behavioral Metrics: Evaluate process mining metrics (e.g., fitness, precision, f-score, simplicity).
- Quantitative Business Metrics: Assess frequency and lift of key activities within clusters.
- Qualitative Business Metrics: Interpret results with domain experts to identify actionable insights.
Dive deep into the insights derived from clustering, identifying behaviors and trends aligned with business goals.
📌 Leverage these notebooks to conduct robust clustering experiments and derive meaningful insights.
param_search.ipynb: Generate a parameter search space for HDBSCAN.clustering_experiment_v2_c1.ipynb: First step of clustering methodology (Feature Selection).clustering_experiment_v2_c2.ipynb: Algorithm design and validation techniques.clustering_experiment_v2_c3.ipynb: Advanced validation and business interpretation.