<h1>DBSCAN</h1>
<h2>Penjelasan Singkat dan Pseudocode</h2>
<p>DBSCAN adalah sebuah algoritma clustering berdasarkan kerapatan pada data. DBSCAN sendiri adalah singkatan dari <i>Density-based spatial clustering of applications with noise</i>. DBSCAN mengelompokkan instance-instance yang dekat menjadi satu kelompok berdasarkan jarak maksimum yang disebut epsilon. Sebuah cluster minimal memiliki anggota sebanyak MinPts.<p>
<p>Pseudocode algoritma DBSCAN adalah sebagai berikut:<p>
<ol>
    <li>Hitung jarak antar masing-masing instance. Buatlah matriks jarak dari setiap instance.</li>
    <li>Untuk setiap instance, tentukan instance mana saja yang termasuk dalam neighborhoodnya: yakni instance mana saja yang memiliki jarak kurang dari epsilon dengan instance yang sedang ditentukan neighborhoodnya.</li>
    <li>Leburkan setiap neighborhood yang memiliki irisan keanggotaan.</li>
</ol>

<h2>Source Code</h2>

Jarak yang digunakan pada tugas ini adalah Euclidean distance.

In [15]:
from math import sqrt

def euclidean_distance(instance1, instance2):
    return sqrt((instance1[0]-instance2[0])**2 + (instance1[1]-instance2[1])**2 + (instance1[2]-instance2[2])**2
        + (instance1[3]-instance2[3])**2)

Fungsi di bawah adalah mengembalikan sebuah matriks jarak. Masukannya berupa array yang berisi nilai atribut masing-masing elemen: [[atr11, atr21, atr31, atr41], [atr12, atr22, atr32, atr42], . . . .]

In [16]:
def make_distance_matrix(array):
    distance_matrix = []
    for i in range(0, len(array)):
        distance_matrix.append([])
        for j in range (0, len(array)):
            distance_matrix[i].append(euclidean_distance(array[i], array[j]))
    return distance_matrix

Fungsi di bawah menghasilkan neighborhood-neighborhood yang terbentuk berdasarkan matriks jarak dan radius neighborhood (epsilon)

In [17]:
def make_neighborhoods(distance_matrix, epsilon):
    neighborhoods = []
    for i in range(0, len(distance_matrix)):
        neighborhoods.append([])
        for j in range (0, len(distance_matrix[i])):
            if (distance_matrix[i][j] <= epsilon):
                neighborhoods[i].append(j)
    return neighborhoods

Fungsi di bawah melakukan peleburan pada neighborhood-neighborhood yang beririsan.

In [18]:
def merge_neighborhoods(neighborhoods, min_pts):
    merged_neighborhoods = []
    for i in range(0, len(neighborhoods)):
        if (len(neighborhoods[i]) >= min_pts):
            if (len(merged_neighborhoods) == 0):
                merged_neighborhoods.append(neighborhoods[i])
            else:
                already_listed = False
                first_match_index = -1
                for j in range(0, len(merged_neighborhoods)):
                    if (not set(merged_neighborhoods[j]).isdisjoint(set(neighborhoods[i]))):
                        already_listed = True
                        if (first_match_index == -1):
                            merged_neighborhoods[j] = list(set(merged_neighborhoods[j]).union(set(neighborhoods[i])))
                            first_match_index = j
                        else:
                            merged_neighborhoods[first_match_index] = list(set(merged_neighborhoods[first_match_index]).union(set(neighborhoods[i])))
                            merged_neighborhoods[j] = []
                    else:
                        pass
                if (not already_listed):
                    merged_neighborhoods.append(neighborhoods[i])
        else:
            pass
    i = len(merged_neighborhoods)-1
    while (i >= 0):
        if (len(merged_neighborhoods[i]) == 0):
            del merged_neighborhoods[i]
        i = i-1
    return merged_neighborhoods

Berikut adalah fungsi prediksi DBSCAN.

In [19]:
def DBSCAN_predict(data, epsilon, min_pts):
    distance_matrix = make_distance_matrix(data)
    neighborhoods = make_neighborhoods(distance_matrix, epsilon)
    merged_clusters = merge_neighborhoods(neighborhoods, min_pts)
    labels = []
    for i in range(0, len(data)):
        labels.append(-1)
    for i in range(0, len(merged_clusters)):
        for j in range(0, len(merged_clusters[i])):
            labels[merged_clusters[i][j]] = i
    return labels

<h2>Clustering pada Dataset Iris</h2>

In [20]:
from sklearn import datasets
iris = datasets.load_iris()

In [21]:
from sklearn.metrics import accuracy_score
accuracy_score(iris.target, DBSCAN_predict(iris.data, 0.7, 2))

0.68000000000000005

<h2>Pembagian Tugas Kelompok</h2>
<ul>
    <li>13515032 Helena Suzane Graciella Ringoringo: DBSCAN</li>
    <li>13515046 Lathifah Nurrahmah: K-Means dan K-Medoids</li>
    <li>13515098 Aya Aurora Rimbamorani: Agglomerative Clustering</li>
</ul>