<a href="https://colab.research.google.com/github/Dimas0824/Machine_Learning/blob/main/Jobsheet_6/Week7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Practicum 4

In this experiment we run the three approaches discussed so far (Annoy, FAISS, and HNSW) on the same data and compare their results.

In [6]:
!pip install annoy hnswlib faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.12.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.1 kB)
Downloading faiss_cpu-1.12.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (31.4 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.12.0


In [7]:
import numpy as np
import time
from annoy import AnnoyIndex
import faiss
import hnswlib

# ===============================
# 1. Create a dataset of 1 million 5-D points
# ===============================
n_data = 1_000_000   # try 100_000 first if RAM is limited
dim = 5
X = np.random.random((n_data, dim)).astype(np.float32)

# Query point
query = np.random.random((1, dim)).astype(np.float32)
k = 10

# ===============================
# 2. Annoy
# ===============================
print("=== Annoy ===")
ann_index = AnnoyIndex(dim, 'euclidean')

start = time.time()
for i in range(n_data):
    ann_index.add_item(i, X[i])
ann_index.build(10)  # 10 trees
build_time = time.time() - start

start = time.time()
neighbors = ann_index.get_nns_by_vector(query[0], k, include_distances=True)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", neighbors[0][:5], "...")

# ===============================
# 3. FAISS (Flat Index)
# ===============================
print("\n=== FAISS (IndexFlatL2) ===")
faiss_index = faiss.IndexFlatL2(dim)

start = time.time()
faiss_index.add(X)
build_time = time.time() - start

start = time.time()
distances, indices = faiss_index.search(query, k)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", indices[0][:5], "...")

# ===============================
# 4. HNSW (hnswlib)
# ===============================
print("\n=== HNSW (hnswlib) ===")
hnsw_index = hnswlib.Index(space='l2', dim=dim)

start = time.time()
hnsw_index.init_index(max_elements=n_data, ef_construction=200, M=16)
hnsw_index.add_items(X)
build_time = time.time() - start

hnsw_index.set_ef(50)

start = time.time()
labels, distances = hnsw_index.knn_query(query, k=k)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", labels[0][:5], "...")


=== Annoy ===
Build time: 28.73296308517456 detik
Query time: 0.0002200603485107422 detik
Neighbors: [817987, 20461, 285934, 962481, 29827] ...

=== FAISS (IndexFlatL2) ===
Build time: 0.014519929885864258 detik
Query time: 0.006082057952880859 detik
Neighbors: [817987  20461 285934 962481  29827] ...

=== HNSW (hnswlib) ===
Build time: 172.63050293922424 detik
Query time: 0.0002682209014892578 detik
Neighbors: [817987  20461 285934 962481  29827] ...
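
The query times above come from a single call lasting a fraction of a millisecond, so they are sensitive to timer resolution and background noise. A minimal sketch of a more stable measurement follows, assuming it is acceptable to repeat each query many times and report the median. The helper name `time_call` and the stand-in workload are hypothetical and not part of the original notebook.

```python
import statistics
import time


def time_call(fn, repeats=100):
    """Return the median wall-clock time (in seconds) over `repeats` calls of fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload (hypothetical). In the notebook this could instead be
# something like: lambda: faiss_index.search(query, k)
median_s = time_call(lambda: sorted(range(1000)), repeats=50)
print(f"median query time: {median_s:.6f} s")
```

The median is used rather than the mean so that an occasional slow call (for example a first call that warms up caches) does not dominate the result.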


Repeat the experiment with different distance metrics. Record the results in a table of your own, as in Practicum 1.
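
When switching metrics, each library selects the metric at index creation: Annoy takes a metric string such as `'euclidean'`, `'manhattan'`, or `'angular'`; hnswlib takes `space='l2'`, `'ip'`, or `'cosine'`; and FAISS offers `IndexFlatIP` alongside `IndexFlatL2`. To judge the approximate results under each metric, an exact reference is useful. The sketch below is a plain-Python brute-force search, not the notebook's method; the names `METRICS` and `exact_knn` are hypothetical, and the small random dataset is only for illustration.

```python
import heapq
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # Assumes neither vector is all zeros (the norm would be 0).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


METRICS = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine_distance}


def exact_knn(data, query, k, metric):
    """Return the indices of the k points closest to `query`, nearest first."""
    dist = METRICS[metric]
    return heapq.nsmallest(k, range(len(data)), key=lambda i: dist(data[i], query))


random.seed(0)
data = [[random.random() for _ in range(5)] for _ in range(1000)]
query = [random.random() for _ in range(5)]
for name in METRICS:
    print(name, exact_knn(data, query, 5, name))
```

Different metrics can rank the same points differently, so the neighbor IDs should only be compared between indexes built with the same metric.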

| Dataset | Algorithm | Build Time (s) | Query Time (s) | Neighbor Quality (Top-5 IDs) | Notes |
|----------|------------|--------------------|---------------------|-----------------------------|------------------|
| **1 million points (5D)** | **Annoy** | 22.5747 | **0.000198** | [701916, 852514, 242199, 884110, 691326] | Moderate build time, fastest query, high accuracy |
|  | **FAISS (IndexFlatL2)** | **0.0077** | 0.006384 | [701916, 852514, 242199, 884110, 691326] | Very fast build; query about 30x slower than Annoy |
|  | **HNSW (hnswlib)** | 189.6506 | 0.000372 | [701916, 852514, 242199, 884110, 691326] | Very slow build, fast and accurate query |
| **500 thousand points (5D)** | **Annoy** | 15.6125 | **0.000210** | [237644, 244818, 269372, 240874, 321280] | Fairly slow build, very fast query |
|  | **FAISS (IndexFlatL2)** | **0.0044** | 0.003382 | [237644, 244818, 269372, 240874, 321280] | Extremely fast build, reasonably fast query |
|  | **HNSW (hnswlib)** | 82.9095 | 0.000299 | [237644, 244818, 269372, 240874, 321280] | Heavy build, fast query |
| **100 thousand points (5D)** | **Annoy** | 1.6402 | 0.000203 | [75994, 50595, 4296, 60502, 87975] | Light build, very fast query |
|  | **FAISS (IndexFlatL2)** | **0.000875** | 0.000880 | [75994, 50595, 4296, 60502, 87975] | Fastest build, fast query |
|  | **HNSW (hnswlib)** | 12.7151 | **0.000171** | [75994, 50595, 4296, 60502, 87975] | Heavier build, fastest query |
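
Neighbor quality in the table is judged by eye from the returned IDs. One common way to turn that into a number is recall@k: the fraction of the exact top-k neighbors that an approximate index also returned. The sketch below treats the `IndexFlatL2` result as the exact reference, since a flat index searches exhaustively. The function name `recall_at_k` is hypothetical, and the ID lists are copied from the 1-million-point rows of the table.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbours that the approximate result recovered."""
    exact_top = set(exact_ids[:k])
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / k


# Top-5 IDs from the 1-million-point rows of the results table
exact = [701916, 852514, 242199, 884110, 691326]  # FAISS IndexFlatL2 (exhaustive)
annoy = [701916, 852514, 242199, 884110, 691326]
hnsw = [701916, 852514, 242199, 884110, 691326]

print("Annoy recall@5:", recall_at_k(annoy, exact, 5))
print("HNSW recall@5:", recall_at_k(hnsw, exact, 5))
```

For this run both approximate indexes reach a recall@5 of 1.0. A single query on low-dimensional data is a weak test, so averaging recall over many queries would give a fairer picture.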

In [1]:
import numpy as np
import time
from annoy import AnnoyIndex
import faiss
import hnswlib

# ===============================
# 1. Create a dataset of 500K 5-D points
# ===============================
n_data = 500_000   # try 100_000 first if RAM is limited
dim = 5
X = np.random.random((n_data, dim)).astype(np.float32)

# Query point
query = np.random.random((1, dim)).astype(np.float32)
k = 10

# ===============================
# 2. Annoy
# ===============================
print("=== Annoy ===")
ann_index = AnnoyIndex(dim, 'euclidean')

start = time.time()
for i in range(n_data):
    ann_index.add_item(i, X[i])
ann_index.build(10)  # 10 trees
build_time = time.time() - start

start = time.time()
neighbors = ann_index.get_nns_by_vector(query[0], k, include_distances=True)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", neighbors[0][:5], "...")

# ===============================
# 3. FAISS (Flat Index)
# ===============================
print("\n=== FAISS (IndexFlatL2) ===")
faiss_index = faiss.IndexFlatL2(dim)

start = time.time()
faiss_index.add(X)
build_time = time.time() - start

start = time.time()
distances, indices = faiss_index.search(query, k)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", indices[0][:5], "...")

# ===============================
# 4. HNSW (hnswlib)
# ===============================
print("\n=== HNSW (hnswlib) ===")
hnsw_index = hnswlib.Index(space='l2', dim=dim)

start = time.time()
hnsw_index.init_index(max_elements=n_data, ef_construction=200, M=16)
hnsw_index.add_items(X)
build_time = time.time() - start

hnsw_index.set_ef(50)

start = time.time()
labels, distances = hnsw_index.knn_query(query, k=k)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", labels[0][:5], "...")


=== Annoy ===
Build time: 15.61251711845398 detik
Query time: 0.00020956993103027344 detik
Neighbors: [237644, 244818, 269372, 240874, 321280] ...

=== FAISS (IndexFlatL2) ===
Build time: 0.004393100738525391 detik
Query time: 0.0033822059631347656 detik
Neighbors: [237644 244818 269372 240874 321280] ...

=== HNSW (hnswlib) ===
Build time: 82.90947318077087 detik
Query time: 0.0002994537353515625 detik
Neighbors: [237644 244818 269372 240874 321280] ...


In [17]:
import numpy as np
import time
from annoy import AnnoyIndex
import faiss
import hnswlib

# ===============================
# 1. Create a dataset of 100K 5-D points
# ===============================
n_data = 100_000   # smallest dataset size in this comparison
dim = 5
X = np.random.random((n_data, dim)).astype(np.float32)

# Query point
query = np.random.random((1, dim)).astype(np.float32)
k = 10

# ===============================
# 2. Annoy
# ===============================
print("=== Annoy ===")
ann_index = AnnoyIndex(dim, 'euclidean')

start = time.time()
for i in range(n_data):
    ann_index.add_item(i, X[i])
ann_index.build(10)  # 10 trees
build_time = time.time() - start

start = time.time()
neighbors = ann_index.get_nns_by_vector(query[0], k, include_distances=True)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", neighbors[0][:5], "...")

# ===============================
# 3. FAISS (Flat Index)
# ===============================
print("\n=== FAISS (IndexFlatL2) ===")
faiss_index = faiss.IndexFlatL2(dim)

start = time.time()
faiss_index.add(X)
build_time = time.time() - start

start = time.time()
distances, indices = faiss_index.search(query, k)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", indices[0][:5], "...")

# ===============================
# 4. HNSW (hnswlib)
# ===============================
print("\n=== HNSW (hnswlib) ===")
hnsw_index = hnswlib.Index(space='l2', dim=dim)

start = time.time()
hnsw_index.init_index(max_elements=n_data, ef_construction=200, M=16)
hnsw_index.add_items(X)
build_time = time.time() - start

hnsw_index.set_ef(50)

start = time.time()
labels, distances = hnsw_index.knn_query(query, k=k)
query_time = time.time() - start

print("Build time:", build_time, "detik")
print("Query time:", query_time, "detik")
print("Neighbors:", labels[0][:5], "...")


=== Annoy ===
Build time: 1.6402196884155273 detik
Query time: 0.0002028942108154297 detik
Neighbors: [75994, 50595, 4296, 60502, 87975] ...

=== FAISS (IndexFlatL2) ===
Build time: 0.0008745193481445312 detik
Query time: 0.0008797645568847656 detik
Neighbors: [75994 50595  4296 60502 87975] ...

=== HNSW (hnswlib) ===
Build time: 12.71508264541626 detik
Query time: 0.00017070770263671875 detik
Neighbors: [75994 50595  4296 60502 87975] ...
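
To see how build cost scales with dataset size, the build times recorded in the results table earlier in this notebook can be compared directly. The short sketch below only does that arithmetic; the dictionary layout and the helper name `growth_factor` are hypothetical, and the numbers are copied from the table rather than measured again.

```python
# Build times in seconds, copied from the results table (key = number of points)
build_times = {
    "Annoy": {100_000: 1.6402, 500_000: 15.6125, 1_000_000: 22.5747},
    "FAISS (IndexFlatL2)": {100_000: 0.000875, 500_000: 0.0044, 1_000_000: 0.0077},
    "HNSW (hnswlib)": {100_000: 12.7151, 500_000: 82.9095, 1_000_000: 189.6506},
}


def growth_factor(times, small, large):
    """How many times longer the build took at size `large` than at size `small`."""
    return times[large] / times[small]


for name, times in build_times.items():
    factor = growth_factor(times, 100_000, 1_000_000)
    print(f"{name}: build time grows {factor:.1f}x for 10x more data")
```

A factor near 10 indicates roughly linear scaling, while a noticeably larger factor suggests the build cost grows faster than the data. These are single runs, so the factors are indicative only.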
