# Jaccard Coefficient Calculation using Scikit-learn

This notebook computes pairwise Jaccard coefficients for a small table of pathological test results using scikit-learn's `jaccard_score`.

## Problem Statement

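Calculate the Jaccard coefficient between each pair of the three individuals below, based on their symptoms and pathological test results. The Gender column is listed for reference only; the code below leaves it out of the comparison.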

| Name | Gender | Fever | Cough | Test-1 | Test-2 | Test-3 | Test-4 |
|------|--------|-------|-------|--------|--------|--------|--------|
| Jack | M      | Y     | N     | P      | N      | N      | A      |
| Mary | F      | Y     | N     | P      | A      | P      | N      |
| Jim  | M      | Y     | P     | N      | N      | N      | A      |

In [1]:
import numpy as np
import pandas as pd
from sklearn.metrics import jaccard_score

In [21]:
# Create the data as a DataFrame
data = {
    'Name': ['Jack', 'Mary', 'Jim'],
    'Fever': ['Y', 'Y', 'Y'],
    'Cough': ['N', 'N', 'P'],
    'Test-1': ['P', 'P', 'N'],
    'Test-2': ['N', 'A', 'N'],
    'Test-3': ['N', 'P', 'N'],
    'Test-4': ['A', 'N', 'A']
}

df = pd.DataFrame(data)
print("Original Data:")
df

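Original Data:

|   | Name | Fever | Cough | Test-1 | Test-2 | Test-3 | Test-4 |
|---|------|-------|-------|--------|--------|--------|--------|
| 0 | Jack | Y     | N     | P      | N      | N      | A      |
| 1 | Mary | Y     | N     | P      | A      | P      | N      |
| 2 | Jim  | Y     | P     | N      | N      | N      | A      |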


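With its default settings (`average='binary'`, `pos_label=1`), scikit-learn's `jaccard_score` expects binary labels and treats `1` as the positive class. The next cell therefore encodes the values `Y` and `P` as 1 and the values `N` and `A` as 0.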
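
To illustrate why the 0/1 encoding matters, here is a minimal sketch using two toy vectors, `a` and `b`, which are made up for illustration and are not part of the dataset. It relies on the default `average='binary'` and `pos_label=1` of `jaccard_score`: only positions where at least one vector is 1 contribute, so positions where both vectors are 0 should leave the score unchanged.

```python
from sklearn.metrics import jaccard_score

# Toy vectors (illustrative only, not taken from the dataset)
a = [1, 1, 0, 0]
b = [1, 0, 0, 0]

# One position where both are 1, two positions where at least one is 1
score_short = jaccard_score(a, b)

# Append three positions where both vectors are 0
score_padded = jaccard_score(a + [0, 0, 0], b + [0, 0, 0])

print(f"short:  {score_short:.2f}")
print(f"padded: {score_padded:.2f}")
```

Both calls should report the same value, which is why mapping `N` and `A` to 0 makes those outcomes count only when the other person has a 1 in the same position.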

In [22]:
df_encoded = df.copy()
features = ['Fever', 'Cough', 'Test-1', 'Test-2', 'Test-3', 'Test-4']
# Y and P are encoded as 1; N and A are encoded as 0
encoding_map = {'Y': 1, 'P': 1, 'N': 0, 'A': 0}

# Convert the categorical values to 0/1
for column in features:
    df_encoded[column] = df_encoded[column].map(encoding_map)

print("Encoded Data:")
df_encoded

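Encoded Data:

|   | Name | Fever | Cough | Test-1 | Test-2 | Test-3 | Test-4 |
|---|------|-------|-------|--------|--------|--------|--------|
| 0 | Jack | 1     | 0     | 1      | 0      | 0      | 0      |
| 1 | Mary | 1     | 0     | 1      | 0      | 1      | 0      |
| 2 | Jim  | 1     | 1     | 0      | 0      | 0      | 0      |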


In [25]:
# Index by name so each person's feature vector can be looked up directly
by_name = df_encoded.set_index('Name')[features]
jack_encoded = by_name.loc['Jack'].to_numpy()
mary_encoded = by_name.loc['Mary'].to_numpy()
jim_encoded = by_name.loc['Jim'].to_numpy()

print("Encoded vectors:")
print(f"Jack: {jack_encoded}")
print(f"Mary: {mary_encoded}")
print(f"Jim:  {jim_encoded}")

Encoded vectors:
Jack: [1 0 1 0 0 0]
Mary: [1 0 1 0 1 0]
Jim:  [1 1 0 0 0 0]


## Jaccard Scores

In [27]:
# (Jack, Mary)
jaccard_jack_mary = jaccard_score(jack_encoded, mary_encoded)
print(f"Jaccard(Jack, Mary) = {jaccard_jack_mary:.2f}")

# (Jack, Jim)
jaccard_jack_jim = jaccard_score(jack_encoded, jim_encoded)
print(f"Jaccard(Jack, Jim) = {jaccard_jack_jim:.2f}")

# (Jim, Mary)
jaccard_jim_mary = jaccard_score(jim_encoded, mary_encoded)
print(f"Jaccard(Jim, Mary) = {jaccard_jim_mary:.2f}")

Jaccard(Jack, Mary) = 0.67
Jaccard(Jack, Jim) = 0.33
Jaccard(Jim, Mary) = 0.25
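
As a cross-check, the same values can be reproduced from the usual definition for binary vectors, J = M11 / (M11 + M10 + M01), where M11 counts positions where both vectors are 1 and M10 and M01 count positions where exactly one is 1. The sketch below uses only the standard library; the helper `jaccard_by_hand` and the `vectors` dictionary are hypothetical names introduced here, and the vectors are copied from the encoded output above.

```python
from itertools import combinations

# Encoded vectors copied from the notebook output
# (order: Fever, Cough, Test-1, Test-2, Test-3, Test-4)
vectors = {
    "Jack": [1, 0, 1, 0, 0, 0],
    "Mary": [1, 0, 1, 0, 1, 0],
    "Jim": [1, 1, 0, 0, 0, 0],
}


def jaccard_by_hand(a, b):
    """Jaccard coefficient of two equal-length 0/1 vectors (hypothetical helper)."""
    # M11: positions where both vectors are 1
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    # M11 + M10 + M01: positions where at least one vector is 1
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumed convention: two all-zero vectors score 0.0
    return both / either if either else 0.0


manual_scores = {
    (p, q): jaccard_by_hand(vectors[p], vectors[q])
    for p, q in combinations(vectors, 2)
}

for (p, q), score in manual_scores.items():
    print(f"Jaccard({p}, {q}) = {score:.2f}")
```

For Jack and Mary, for instance, two positions are 1 in both vectors (Fever, Test-1) and three positions are 1 in at least one (Fever, Test-1, Test-3), giving 2/3, which matches the 0.67 reported by `jaccard_score`.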
