Given a dataframe with the following format:

| Name        | Source | Cluster |
|-------------|--------|---------|
| Ricky Rubio | Club   | 0       |
| Marc Gasol  | Club   | 5       |
| Ricky Rubio | FIBA   | 1       |

Generate a confusion matrix that maps each player's Club cluster (rows) to their FIBA cluster (columns), for example:

| Club\FIBA | 0 | 1 | 2 |
|-----------|---|---|---|
| 0         | 5 | 2 | 0 |
| 1         | 1 | 8 | 3 |
| 2         | 0 | 4 | 6 |

In [1]:
import pandas as pd

players = (('A', 0, 0), ('B', 0, 1), ('C', 1, 1), ('D', 1, 2), ('E', 2, 0), ('F', 2, 1), ('G', 1, 1))
data = []
for p in players:
    data.append([p[0], 'Club', p[1]])
    data.append([p[0], 'FIBA', p[2]])
data.append(['H', 'Club', 1])
           
df = pd.DataFrame(data, columns=('Name', 'Source', 'Cluster'))
df = df.sample(frac=1).reset_index(drop=True) # Shuffle (https://stackoverflow.com/questions/29576430/shuffle-dataframe-rows)
df

Unnamed: 0,Name,Source,Cluster
0,E,Club,2
1,H,Club,1
2,A,FIBA,0
3,C,FIBA,1
4,D,Club,1
5,F,FIBA,1
6,G,Club,1
7,B,Club,0
8,D,FIBA,2
9,F,Club,2


In [2]:
# Keep only players with at least two rows (assumes at most one row per source).
s = df['Name'].value_counts()
names_to_keep = s[s >= 2].index

filt_df = df[df['Name'].isin(names_to_keep)]
filt_df

Unnamed: 0,Name,Source,Cluster
0,E,Club,2
2,A,FIBA,0
3,C,FIBA,1
4,D,Club,1
5,F,FIBA,1
6,G,Club,1
7,B,Club,0
8,D,FIBA,2
9,F,Club,2
10,G,FIBA,1


In [3]:
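# Sort both subsets by name so that row i refers to the same player in each.
club_df = filt_df[filt_df['Source'] == 'Club'].sort_values('Name')
fiba_df = filt_df[filt_df['Source'] == 'FIBA'].sort_values('Name')
# Compare as plain lists so that a length mismatch also counts as a mismatch.
if club_df['Name'].tolist() != fiba_df['Name'].tolist():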
    print("Some names do not match!")
else:
    print("All names match! Ready to create confusion matrix.")

All names match! Ready to create confusion matrix.
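
When the names do not match, it helps to know which players lack a counterpart. The following is a minimal sketch, assuming each (Name, Source) pair occurs at most once (`pivot` raises a `ValueError` otherwise). The toy data and the names `wide` and `unmatched` are illustrative and not part of the notebook above:

```python
import pandas as pd

# Toy data in the same Name/Source/Cluster layout; 'H' has no FIBA row.
toy = pd.DataFrame(
    [('A', 'Club', 0), ('A', 'FIBA', 0),
     ('B', 'Club', 0), ('B', 'FIBA', 1),
     ('H', 'Club', 1)],
    columns=('Name', 'Source', 'Cluster'),
)

# One row per player, one column per source; a missing entry becomes NaN.
wide = toy.pivot(index='Name', columns='Source', values='Cluster')

# Players that are missing a cluster in at least one source.
unmatched = wide[wide.isna().any(axis=1)].index.tolist()
print(unmatched)
```

Because the wide table is keyed by name, it also removes the need to sort the two subsets into the same order before comparing them.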


In [4]:
from sklearn.metrics import confusion_matrix
# First argument gives the rows (Club clusters), second gives the columns (FIBA clusters).
cm = pd.DataFrame(confusion_matrix(club_df['Cluster'].values, fiba_df['Cluster'].values))
cm

Unnamed: 0,0,1,2
0,1,1,0
1,0,2,1
2,1,1,0
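
As an alternative sketch, the same counts can be produced with pandas alone, with the axes labelled by source. This swaps `sklearn.metrics.confusion_matrix` for `DataFrame.pivot` plus `pd.crosstab`, and assumes each (Name, Source) pair occurs at most once. The names `rows`, `wide`, and `cm_labeled` are illustrative:

```python
import pandas as pd

# Same example players as above: (name, Club cluster, FIBA cluster).
players = (('A', 0, 0), ('B', 0, 1), ('C', 1, 1), ('D', 1, 2),
           ('E', 2, 0), ('F', 2, 1), ('G', 1, 1))
rows = []
for name, club, fiba in players:
    rows.append([name, 'Club', club])
    rows.append([name, 'FIBA', fiba])
rows.append(['H', 'Club', 1])  # 'H' has no FIBA entry
df = pd.DataFrame(rows, columns=('Name', 'Source', 'Cluster'))

# One row per player; drop players missing either source, then restore ints.
wide = df.pivot(index='Name', columns='Source', values='Cluster').dropna().astype(int)

# Rows are Club clusters, columns are FIBA clusters.
cm_labeled = pd.crosstab(wide['Club'], wide['FIBA'])
print(cm_labeled)
```

One difference to keep in mind: `pd.crosstab` only creates rows and columns for cluster labels that actually occur, so the result is not guaranteed to be square when a cluster is absent from one source.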
