# **Practicum 1**

In this practicum we study **HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)**, a density-based _clustering_ method that is more robust than `DBSCAN`. By building a hierarchy of clusterings, `HDBSCAN` overcomes the limitation of `DBSCAN`'s sensitive `eps` parameter and adapts to data whose clusters have different densities. The practicum focuses on exploring _clustering_ results on synthetic datasets and on the effect of key _hyperparameters_ such as `min_cluster_size`, `min_samples`, and `cut_distance`, so that you come away understanding how `HDBSCAN` separates _clusters_, identifies _noise_, and adapts to complex data structures.

### **Environment Setup**

In [2]:
# Install the standalone hdbscan package (scikit-learn >= 1.3 also provides sklearn.cluster.HDBSCAN)
%pip install hdbscan

# Import modules
import hdbscan
import matplotlib.pyplot as plt
import numpy as np

from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

Collecting hdbscan
  Using cached hdbscan-0.8.40.tar.gz (6.9 MB)
Building wheels for collected packages: hdbscan
  Building wheel for hdbscan (pyproject.toml): finished with status 'error'
Failed to build hdbscan

  error: subprocess-exited-with-error

  × Building wheel for hdbscan (pyproject.toml) did not run successfully.
  │ exit code: 1

ModuleNotFoundError: No module named 'hdbscan'

### **Step 2: Define the Visualization Function**

Run this cell to define a helper function that plots the _clustering_ results, using a different color for each cluster.

In [None]:
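def plot(X, labels, probabilities=None, parameters=None, ground_truth=False, ax=None):
    """Plot 2D points colored by cluster label; noise (label -1) is drawn as black crosses."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 4))
    # Fall back to a single cluster and full membership strength when not provided
    labels = np.asarray(labels) if labels is not None else np.ones(X.shape[0])
    probabilities = probabilities if probabilities is not None else np.ones(X.shape[0])
    unique_labels = set(labels)
    # One color per distinct label, sampled evenly from the Spectral colormap
    colors = [plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))]
    # Per-point membership probability, used below to scale the marker size
    proba_map = {idx: probabilities[idx] for idx in range(len(labels))}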
    
    for k, col in zip(unique_labels, colors):
        if k == -1:
            col = [0, 0, 0, 1]  # black for noise
        class_index = (labels == k).nonzero()[0]
        for ci in class_index:
            ax.plot(
                X[ci, 0],
                X[ci, 1],
                "x" if k == -1 else "o",
                markerfacecolor=tuple(col),
                markeredgecolor="k",
                markersize=4 if k == -1 else 1 + 5 * proba_map[ci],
            )
    n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
    preamble = "True" if ground_truth else "Estimated"
    title = f"{preamble} number of clusters: {n_clusters_}"
    if parameters is not None:
        parameters_str = ", ".join(f"{k}={v}" for k, v in parameters.items())
        title += f" | {parameters_str}"
    ax.set_title(title)
    plt.tight_layout()
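The function above issues one `ax.plot` call per point, which can become slow on larger datasets. As an optional alternative, the sketch below draws each label with a single `ax.scatter` call and then tries it on ground-truth labels from `make_blobs`. The name `plot_fast`, the blob settings, and the returned values are hypothetical choices for illustration, not part of the original practicum.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_blobs


def plot_fast(X, labels, probabilities=None, ax=None):
    """Hypothetical vectorized variant of plot(): one scatter call per label."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 4))
    labels = np.asarray(labels)
    if probabilities is None:
        probabilities = np.ones(X.shape[0])
    probabilities = np.asarray(probabilities)

    unique_labels = sorted(set(labels.tolist()))
    colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))

    for k, col in zip(unique_labels, colors):
        mask = labels == k
        if k == -1:
            # Noise points: black crosses of fixed size
            ax.scatter(X[mask, 0], X[mask, 1], marker="x", color="k", s=16)
        else:
            # scatter sizes are areas (points^2), so square the marker diameter
            sizes = (1 + 5 * probabilities[mask]) ** 2
            ax.scatter(
                X[mask, 0], X[mask, 1],
                marker="o", color=tuple(col), edgecolors="k", s=sizes,
            )

    n_clusters = len(unique_labels) - (1 if -1 in unique_labels else 0)
    ax.set_title(f"Number of clusters: {n_clusters}")
    return ax, n_clusters


# Usage: three synthetic blobs with different spreads, plotted with their true labels
centers = [[1, 1], [-1, -1], [1.5, -1.5]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=[0.4, 0.1, 0.75], random_state=0
)
ax, n_clusters = plot_fast(X, labels_true)
```

The visual result should be close to that of `plot`, although marker sizes may not match exactly because `scatter` takes areas while `plot` takes diameters.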