# Jaccard Index

- The __Jaccard index__, also known as __Intersection over Union__ (IoU) or the __Jaccard similarity coefficient__, is a statistic used for gauging the __similarity__ and __diversity__ of sample sets.

- It was originally given the French name *coefficient de communauté* by Paul Jaccard.

- The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$ J(A, B) = \frac {|A \cap B|} {|A \cup B|} = \frac {|A \cap B|} {|A| + |B| - |A \cap B|} $$
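The definition can be checked directly with Python's built-in sets. This is a minimal sketch: the helper name `jaccard` and the sample sets `A` and `B` are illustrative, and returning 1.0 for two empty sets is an assumed convention (the formula itself is undefined there).

```python
def jaccard(a, b):
    """Jaccard index of two finite sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)

A = {1, 2, 3, 4}
B = {3, 4, 5}

# Direct form: 2 shared elements out of 5 distinct elements.
print(jaccard(A, B))

# Inclusion-exclusion form of the denominator: |A| + |B| - |A ∩ B|.
print(len(A & B) / (len(A) + len(B) - len(A & B)))
```

Both lines print the same value, which shows that the two forms of the denominator agree.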

In [3]:
import numpy as np
import pandas as pd

In [54]:
# Suppose we have the actual labels (y) and the model's predicted labels (yp)
y = np.array([1, 1, 1, 0, 1])
yp = np.array([0, 1, 1, 1, 1])

<hr>

### Manual Jaccard Calculation

$\displaystyle J(y, yp) = \frac {|y \cap yp|} {|y \cup yp|} = \frac {|y \cap yp|} {|y| + |yp| - |y \cap yp|} $

- $\displaystyle |y \cap yp| = $ the number of positions where both y and yp equal 1

- $\displaystyle |y \cup yp| = $ the number of positions where y or yp (or both) equals 1

In [57]:
# 3 positions where both y and yp are 1, out of 5 positions where at least one is 1
j = 3 / 5
j

0.6
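The counts 3 and 5 used above can also be derived from the arrays rather than read off by eye. The sketch below assumes binary labels with 1 as the positive class; the names `intersection`, `union` and `j_manual` are illustrative.

```python
import numpy as np

y = np.array([1, 1, 1, 0, 1])
yp = np.array([0, 1, 1, 1, 1])

# Positions where both the actual and predicted label are 1.
intersection = np.sum((y == 1) & (yp == 1))

# Positions where at least one of the two labels is 1.
union = np.sum((y == 1) | (yp == 1))

j_manual = intersection / union
print(intersection, union, j_manual)
```

Positions where both arrays are 0 would be counted in neither the intersection nor the union, so agreeing on the negative class does not raise the score.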

<hr>

### Jaccard Using Sklearn

In [58]:
from sklearn.metrics import jaccard_score

jaccard_score(y, yp)

0.6

In [62]:
y_true = np.array([[0, 1, 1], [1, 1, 0]])
y_pred = np.array([[1, 1, 1], [1, 0, 0]])

# binary case: only two target values, e.g. 0/1, False/True, No/Yes
print(jaccard_score(y_true[0], y_pred[0]))

# multilabel case: targets are 2-D (one column per label); average=None returns one score per label
print(jaccard_score(y_true, y_pred, average=None))

0.6666666666666666
[0.5 0.5 1. ]


In [63]:
# multiclass case: more than two target values/categories; average=None returns one score per class
y_pred = [0, 2, 1, 2]
y_true = [0, 1, 2, 2]
jaccard_score(y_true, y_pred, average=None)

array([1.        , 0.        , 0.33333333])
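To see where the per-class scores come from, each class can be treated one-vs-rest: collect the positions where that class is the true label and the positions where it is predicted, then apply the set formula. This is a plain-Python sketch, not scikit-learn's implementation; the helper name `per_class_jaccard` is hypothetical.

```python
def per_class_jaccard(y_true, y_pred):
    """Return {class: Jaccard score}, treating each class one-vs-rest."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        true_idx = {i for i, v in enumerate(y_true) if v == c}
        pred_idx = {i for i, v in enumerate(y_pred) if v == c}
        # The union is never empty: c appears in y_true or y_pred.
        scores[c] = len(true_idx & pred_idx) / len(true_idx | pred_idx)
    return scores

y_pred = [0, 2, 1, 2]
y_true = [0, 1, 2, 2]

scores = per_class_jaccard(y_true, y_pred)
print(scores)

# Unweighted mean over classes, the quantity average='macro' is meant to report.
print(sum(scores.values()) / len(scores))
```

Class 2, for example, is the true label at positions 2 and 3 and the predicted label at positions 1 and 3, so its score is 1 shared position out of 3 distinct positions.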