# Analyze Prediction Results
This notebook takes the output of `predict-new-glasses` and selects a few representative predictions.

In [1]:
import pandas as pd

## Load in the Predictions
`new_glasses.csv` contains the candidate compositions and the certainty of the ML prediction for each one.

In [2]:
data = pd.read_csv('new_glasses.csv').iloc[:,:-1] # Last column is a bug

Compute the composition of each entry as a string

In [3]:
elems = data.columns[:-4]

In [4]:
data['composition'] = data[elems].apply(lambda row: ''.join('%s%d' % (e, round(x * 100)) for e, x in row.items() if x > 0), axis=1)  # round() so that e.g. 28.999... is not truncated to 28
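
As a quick check of the string construction, the same expression can be run on a toy table. This is only a sketch: the two-row frame, its fractions, and the helper name `to_composition` are made up for illustration and are not taken from `new_glasses.csv`. It rounds before formatting because `%d` truncates toward zero, and a product such as `0.29 * 100` evaluates to slightly less than 29 in floating point.

```python
import pandas as pd

# Hypothetical toy fractions, not values from new_glasses.csv
toy = pd.DataFrame({'Ge': [0.26, 0.0], 'Hf': [0.74, 0.5], 'Ni': [0.0, 0.5]})

def to_composition(row):
    """Join element symbols with integer percentages, skipping absent elements."""
    return ''.join('%s%d' % (el, round(frac * 100))
                   for el, frac in row.items() if frac > 0)

toy_compositions = toy.apply(to_composition, axis=1)
print(toy_compositions.tolist())  # ['Ge26Hf74', 'Hf50Ni50']
```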

Compute the chemical system (the alphabetically sorted elements present in each entry)

In [5]:
data['system'] = data[elems].apply(lambda row: '-'.join(sorted(e for e, x in row.items() if x > 0)), axis=1)
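
The point of sorting the element symbols is that the system label does not depend on column order or on the fractions, so every composition in the same system gets the same key. A minimal sketch on made-up data (the frame and the helper name `to_system` are hypothetical):

```python
import pandas as pd

# Hypothetical toy fractions; columns are deliberately not in alphabetical order
toy = pd.DataFrame({'Hf': [0.74, 0.26], 'Ge': [0.26, 0.74], 'Ni': [0.0, 0.0]})

def to_system(row):
    """Return the sorted, dash-joined symbols of the elements that are present."""
    return '-'.join(sorted(el for el, frac in row.items() if frac > 0))

toy_systems = toy.apply(to_system, axis=1)
print(toy_systems.tolist())  # ['Ge-Hf', 'Ge-Hf']
```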

In [6]:
data.head(5)

Unnamed: 0,B,Mg,Al,Si,P,Ca,Sc,Ti,V,Cr,...,Pt,Au,Pb,U,gfa_measured,gfa_predicted,P(gfa=AM)_measured,P(gfa=AM)_predicted,composition,system
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,,0.0,,0.965,Ge26Hf74,Ge-Hf
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,,0.0,,0.955,Ge26Gd74,Gd-Ge
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,,0.0,,0.96,Ge28Hf72,Ge-Hf
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,,0.0,,0.955,Ge28Gd72,Gd-Ge
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,,0.0,,0.955,Ge30Hf70,Ge-Hf


## Determine the Top Picks
Get the entries with the highest certainty, keeping only one per system

In [8]:
top_alloys = data.sort_values('P(gfa=AM)_predicted', ascending=False).drop_duplicates('system', keep='first')
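
The sort-then-deduplicate pattern can be seen on a small example. The frame below is hypothetical (placeholder systems `A-B` and `A-C`, and a made-up `score` column standing in for the predicted probability). After sorting in descending order, `drop_duplicates` with `keep='first'` retains the highest-scoring row of each system.

```python
import pandas as pd

# Hypothetical toy data; 'score' stands in for P(gfa=AM)_predicted
toy = pd.DataFrame({
    'composition': ['A20B80', 'A50B50', 'A30C70', 'A60C40'],
    'system': ['A-B', 'A-B', 'A-C', 'A-C'],
    'score': [0.70, 0.90, 0.95, 0.60],
})

# Sort best-first, then keep the first (i.e. best) row seen for each system
best = toy.sort_values('score', ascending=False).drop_duplicates('system', keep='first')
print(best['composition'].tolist())  # ['A30C70', 'A50B50']
```

When several rows share the same score, the default sort does not guarantee their relative order, so which row survives can vary. Passing `kind='stable'` to `sort_values` is one way to make tie-breaking follow the original row order.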

Print the top 8 alloys

In [9]:
top_alloys.head(8)[['composition', 'system', 'P(gfa=AM)_predicted']]

Unnamed: 0,composition,system,P(gfa=AM)_predicted
11260,Cr34Ni38Hf28,Cr-Hf-Ni,1.0
56755,Fe28Ni20Hf52,Fe-Hf-Ni,1.0
51362,Co32Ni34Hf34,Co-Hf-Ni,1.0
49972,Co40Zr24Rh36,Co-Rh-Zr,1.0
60526,Co54Zr24Nb22,Co-Nb-Zr,1.0
59746,Fe24Co28Hf48,Co-Fe-Hf,1.0
61692,Si34Y22Hf44,Hf-Si-Y,1.0
52599,Ni32Zr28Hf40,Hf-Ni-Zr,1.0


Most of the top 8 contain Hf. To give some more variety, let's filter out the entries that contain Hf.

In [10]:
top_alloys.query('Hf == 0').head(8)[['composition', 'system', 'P(gfa=AM)_predicted']]

Unnamed: 0,composition,system,P(gfa=AM)_predicted
49972,Co40Zr24Rh36,Co-Rh-Zr,1.0
60526,Co54Zr24Nb22,Co-Nb-Zr,1.0
54069,Ni44Ta26W30,Ni-Ta-W,1.0
21039,Fe26Co30Zr44,Co-Fe-Zr,1.0
3720,B20Fe54Sm26,B-Fe-Sm,1.0
61597,Co40Zr22Ru38,Co-Ru-Zr,1.0
51215,Cr28Co38Zr34,Co-Cr-Zr,1.0
8091,Ni52Zr6W42,Ni-W-Zr,0.995
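
The Hf filter works because each element is a column of atomic fractions, so `query('Hf == 0')` keeps the rows in which Hf is absent. A minimal sketch on made-up data (the frame and the `label` column are hypothetical) shows that it matches an explicit boolean mask:

```python
import pandas as pd

# Hypothetical toy fractions; 'label' is a placeholder, not a real composition
toy = pd.DataFrame({
    'Hf': [0.28, 0.0, 0.0],
    'Zr': [0.0, 0.24, 0.44],
    'label': ['with_hf', 'no_hf_1', 'no_hf_2'],
})

no_hf_query = toy.query('Hf == 0')   # string expression evaluated against the columns
no_hf_mask = toy[toy['Hf'] == 0]     # equivalent boolean-mask selection
print(no_hf_query['label'].tolist())  # ['no_hf_1', 'no_hf_2']
```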


Most of these alloys differ from those reported in the paper. The difference can be attributed to the randomness inherent in decision trees and to differences between the version of Magpie in the SI and the one originally used for the paper.