# Analysis of validate-mappings results

This notebook analyses the results of running the validate-mappings command with multiple models. The command
was executed with the predicate hidden, and each model was asked to evaluate the mapping as if it were an exact match.

The notebook produces:

- pivoted tables that can be browsed in Google Sheets
- a score for each model, based on how often it agreed with the curator-assigned predicates

In [5]:
from utils import load_all, read_mappings

In [6]:
import pandas as pd

## Existing Curated GO Mappings

These were extracted from the edit version of GO. They include curator calls as
to the relationship between GO terms and RHEA.

Note that many are "uncommitted" and left as oio:hasDbXref.

In [7]:
# Read existing curated GO mappings (includes predicates) from SSSOM
mappings = read_mappings()
mappings_df = pd.DataFrame(mappings)
mappings_df = mappings_df[["subject_id", "object_id", "predicate_id"]]
mappings_df

Unnamed: 0,subject_id,object_id,predicate_id
0,GO:0000010,EC:2.5.1.30,oio:hasDbXref
1,GO:0000010,MetaCyc:TRANS-HEXAPRENYLTRANSTRANSFERASE-RXN,oio:hasDbXref
2,GO:0000010,RHEA:27794,oio:hasDbXref
3,GO:0000016,EC:3.2.1.108,oio:hasDbXref
4,GO:0000016,MetaCyc:BETAGALACTOSID-RXN,oio:hasDbXref
...,...,...,...
20375,results_in_transport_along,RO:0002341,oio:hasDbXref
20376,results_in_transport_to_from_or_in,RO:0002344,oio:hasDbXref
20377,starts_during,RO:0002091,oio:hasDbXref
20378,starts_with,RO:0002224,oio:hasDbXref
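
To see how many mappings are uncommitted, the predicates can be tallied. The following is a minimal sketch on a toy stand-in for `mappings_df` (four rows copied from the tables in this notebook; `toy_mappings` is a hypothetical name, not part of the original analysis):

```python
import pandas as pd

# Toy stand-in for mappings_df; the real table comes from read_mappings().
toy_mappings = pd.DataFrame({
    "subject_id": ["GO:0000010", "GO:0000016", "GO:0000253", "GO:1990930"],
    "object_id": ["RHEA:27794", "RHEA:10076", "RHEA:18409", "RHEA:49516"],
    "predicate_id": ["oio:hasDbXref", "oio:hasDbXref",
                     "skos:narrowMatch", "skos:exactMatch"],
})

# Count mappings per predicate.
predicate_counts = toy_mappings["predicate_id"].value_counts()

# Fraction of mappings left uncommitted as oio:hasDbXref.
uncommitted_fraction = (toy_mappings["predicate_id"] == "oio:hasDbXref").mean()

print(predicate_counts)
print(uncommitted_fraction)
```

Running the same two expressions on the real `mappings_df` would give the actual proportion of uncommitted mappings.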


## Load and process the results of validate-mappings

The validate-mappings runs must have been executed beforehand; their results are checked in to GitHub.

In [18]:
df = load_all()
# remove column predicate_id, as this was masked and is not the true predicate
df = df.drop(columns=['predicate_id'])
df

Unnamed: 0,subject_id,subject_info,object_id,object_info,problem,info,confidence,suggested_predicate,suggested_modifications,model
0,GO:0000010,Name: trans-hexaprenyltranstransferase activit...,RHEA:27794,"Name: (2E,6E)-farnesyl diphosphate + 4 isopent...",True,The subject is an enzymatic activity while the...,1.0,A skos:relatedMatch or skos:broadMatch predica...,;,claude-3-opus-generic
1,GO:0000016,Name: lactase activity Definition: Catalysis o...,RHEA:10076,Name: H2O + lactose = beta-D-galactose + D-glu...,True,"The SUBJECT is an enzymatic activity, while th...",1.0,skos:closeMatch or skos:relatedMatch may be mo...,;,claude-3-opus-generic
2,GO:0000034,Name: adenine deaminase activity Definition: C...,RHEA:23688,Name: adenine + H(+) + H2O = hypoxanthine + NH...,False,The mapping looks accurate. The subject descri...,1.0,,;,claude-3-opus-generic
3,GO:0000104,Name: succinate dehydrogenase activity Definit...,RHEA:16357,Name: A + succinate = AH2 + fumarate Definitio...,True,The mapping is incorrect. The SUBJECT refers t...,1.0,skos:closeMatch,;,claude-3-opus-generic
4,GO:0000107,Name: imidazoleglycerol-phosphate synthase act...,RHEA:24793,Name: 5-[(5-phospho-1-deoxy-D-ribulos-1-ylimin...,True,The SUBJECT is an enzyme activity while the OB...,1.0,skos:closeMatch,;,claude-3-opus-generic
...,...,...,...,...,...,...,...,...,...,...
17787,GO:0047676,Name: arachidonate-CoA ligase activity Definit...,RHEA:19713,"Name: (5Z,8Z,11Z,14Z)-eicosatetraenoate + ATP ...",False,The SUBJECT 'arachidonate-CoA ligase activity'...,1.0,,;,gpt-4-generic
17788,GO:0047677,Name: arachidonate 8(R)-lipoxygenase activity ...,RHEA:14985,"Name: (5Z,8Z,11Z,14Z)-eicosatetraenoate + O2 =...",False,The mapping between the SUBJECT and OBJECT is ...,1.0,,;,gpt-4-generic
17789,GO:0047678,Name: arginine 2-monooxygenase activity Defini...,RHEA:10548,Name: L-arginine + O2 = 4-guanidinobutanamide ...,False,The SUBJECT's definition exactly describes the...,1.0,,;,gpt-4-generic
17790,GO:0047679,Name: arginine racemase activity Definition: C...,RHEA:18069,Name: L-arginine = D-arginine Definition: None...,False,"The SUBJECT 'arginine racemase activity', defi...",1.0,,;,gpt-4-generic


## Group results by distinct mappings

The results contain the same subject-object pair multiple times, once for each model.

We group these and calculate, for each pair, the number of models that flagged a problem (`problem_sum`) and the average confidence (`confidence_avg`).

In [19]:
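# Per-pair aggregates: number of models flagging a problem, and mean confidence
df_grouped = df.groupby(['subject_id', 'object_id']).agg(
    problem_sum=('problem', 'sum'), confidence_avg=('confidence', 'mean')).reset_index()
# Re-attach the per-model rows, then the curated predicate (left join keeps unmatched pairs)
df_grouped = df_grouped.merge(df, on=['subject_id', 'object_id'])
df_grouped = df_grouped.merge(mappings_df, on=['subject_id', 'object_id'], how='left')
df_grouped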

Unnamed: 0,subject_id,object_id,problem_sum,confidence_avg,subject_info,object_info,problem,info,confidence,suggested_predicate,suggested_modifications,model,predicate_id
0,GO:0000010,RHEA:27794,1,1.0,Name: trans-hexaprenyltranstransferase activit...,"Name: (2E,6E)-farnesyl diphosphate + 4 isopent...",True,The subject is an enzymatic activity while the...,1.0,A skos:relatedMatch or skos:broadMatch predica...,;,claude-3-opus-generic,oio:hasDbXref
1,GO:0000010,RHEA:27794,1,1.0,Name: trans-hexaprenyltranstransferase activit...,"Name: (2E,6E)-farnesyl diphosphate + 4 isopent...",False,The SUBJECT is an enzyme activity that catalyz...,1.0,,;,gpt4full-generic,oio:hasDbXref
2,GO:0000010,RHEA:27794,1,1.0,Name: trans-hexaprenyltranstransferase activit...,"Name: (2E,6E)-farnesyl diphosphate + 4 isopent...",False,"The subject and object are equivalent, they bo...",1.0,The predicate 'skos:exactMatch' is typically u...,;,groq-llama-generic,oio:hasDbXref
3,GO:0000010,RHEA:27794,1,1.0,Name: trans-hexaprenyltranstransferase activit...,"Name: (2E,6E)-farnesyl diphosphate + 4 isopent...",False,The mapping is scientifically accurate. The su...,1.0,,Remove the reactant and product names from the...,mixtral-prompted,oio:hasDbXref
4,GO:0000010,RHEA:27794,1,1.0,Name: trans-hexaprenyltranstransferase activit...,"Name: (2E,6E)-farnesyl diphosphate + 4 isopent...",False,The mapping is accurate and the reaction subst...,1.0,,;,gpt3t-prompted,oio:hasDbXref
...,...,...,...,...,...,...,...,...,...,...,...,...,...
17787,GO:1990699,RHEA:45340,True,1.0,Name: palmitoleyl hydrolase activity Definitio...,Name: [Wnt protein]-O-(9Z)-hexadecenoyl-L-seri...,True,The proposed mapping is incorrect as the enzym...,1.0,,;,gpt3t-prompted,skos:narrowMatch
17788,GO:1990738,RHEA:10944,False,1.0,Name: pseudouridine 5'-phosphatase activity De...,Name: H2O + psi-UMP = phosphate + pseudouridin...,False,The mapping is accurate as the reaction cataly...,1.0,,;,gpt3t-prompted,oio:hasDbXref
17789,GO:1990817,RHEA:11332,False,1.0,Name: poly(A) RNA polymerase activity Definiti...,Name: ATP + RNA(n) = diphosphate + RNA(n)-3'-a...,False,The mapping is accurate based on the provided ...,1.0,,;,gpt3t-prompted,oio:hasDbXref
17790,GO:1990930,RHEA:49516,False,1.0,Name: mRNA N1-methyladenosine dioxygenase acti...,Name: 2-oxoglutarate + an N(1)-methyladenosine...,False,The mapping is accurate with high confidence a...,1.0,,;,gpt3t-prompted,skos:exactMatch
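
The group-then-merge step can be checked on a small example. This is a sketch with invented identifiers and model names (`GO:A`, `RHEA:1`, `m1` and so on are placeholders), showing that the per-pair aggregates are repeated on every per-model row after the merge:

```python
import pandas as pd

# Toy per-model results: pair A is judged by three models, pair B by two.
toy = pd.DataFrame({
    "subject_id": ["GO:A", "GO:A", "GO:A", "GO:B", "GO:B"],
    "object_id": ["RHEA:1", "RHEA:1", "RHEA:1", "RHEA:2", "RHEA:2"],
    "model": ["m1", "m2", "m3", "m1", "m2"],
    "problem": [True, False, True, False, False],
    "confidence": [1.0, 0.5, 0.9, 1.0, 1.0],
})

# One row per pair: summing booleans counts the models that flagged a problem.
per_pair = toy.groupby(["subject_id", "object_id"]).agg(
    problem_sum=("problem", "sum"),
    confidence_avg=("confidence", "mean"),
).reset_index()

# Merging back restores one row per (pair, model), carrying the aggregates.
merged = per_pair.merge(toy, on=["subject_id", "object_id"])

print(per_pair)
print(merged[["subject_id", "model", "problem_sum", "confidence_avg"]])
```

Because the aggregates are repeated on each per-model row, any later count over `df_grouped` counts model judgements rather than distinct mappings.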


In [20]:
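# EXPLORE: rows whose curated predicate is committed (anything other than oio:hasDbXref)
# Note: rows with no curated predicate (NaN) also pass this filter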
df_grouped[df_grouped["predicate_id"] != "oio:hasDbXref"]

Unnamed: 0,subject_id,object_id,problem_sum,confidence_avg,subject_info,object_info,problem,info,confidence,suggested_predicate,suggested_modifications,model,predicate_id
78,GO:0000253,RHEA:18409,4,0.85,Name: 3-keto sterol reductase activity Definit...,Name: 4alpha-methyl-5alpha-cholest-7-en-3beta-...,True,The SUBJECT is an enzyme activity defined by a...,1.0,"skos:narrowMatch may be more appropriate, as t...",;,claude-3-opus-generic,skos:narrowMatch
79,GO:0000253,RHEA:18409,4,0.85,Name: 3-keto sterol reductase activity Definit...,Name: 4alpha-methyl-5alpha-cholest-7-en-3beta-...,True,While both SUBJECT and OBJECT refer to catalys...,1.0,skos:closeMatch,None; A more general term describing the activ...,gpt4full-generic,skos:narrowMatch
80,GO:0000253,RHEA:18409,4,0.85,Name: 3-keto sterol reductase activity Definit...,Name: 4alpha-methyl-5alpha-cholest-7-en-3beta-...,True,The subject and object do not have the same me...,0.1,predicate_modifications could be 'catalyzes' o...,3-keto sterol reductase activity; 4alpha-methy...,groq-llama-generic,skos:narrowMatch
81,GO:0000253,RHEA:18409,4,0.85,Name: 3-keto sterol reductase activity Definit...,Name: 4alpha-methyl-5alpha-cholest-7-en-3beta-...,False,The provided mapping between '3-keto sterol re...,1.0,skos:exactMatch,None; None,mixtral-prompted,skos:narrowMatch
82,GO:0000253,RHEA:18409,4,0.85,Name: 3-keto sterol reductase activity Definit...,Name: 4alpha-methyl-5alpha-cholest-7-en-3beta-...,False,The mapping is accurate as the reaction cataly...,1.0,,;,gpt3t-prompted,skos:narrowMatch
...,...,...,...,...,...,...,...,...,...,...,...,...,...
17767,GO:0180009,RHEA:71075,True,1.00,Name: broad specificity neutral L-amino acid:b...,Name: None Definition: None Relationships:,True,The definitions of the SUBJECT and OBJECT do n...,1.0,,;,gpt3t-prompted,skos:narrowMatch
17783,GO:1990412,RHEA:42696,True,1.00,Name: hercynylselenocysteine lyase activity (s...,Name: None Definition: None Relationships:,True,The proposed mapping is incorrect as the enzym...,1.0,,None; None,gpt3t-prompted,
17787,GO:1990699,RHEA:45340,True,1.00,Name: palmitoleyl hydrolase activity Definitio...,Name: [Wnt protein]-O-(9Z)-hexadecenoyl-L-seri...,True,The proposed mapping is incorrect as the enzym...,1.0,,;,gpt3t-prompted,skos:narrowMatch
17790,GO:1990930,RHEA:49516,False,1.00,Name: mRNA N1-methyladenosine dioxygenase acti...,Name: 2-oxoglutarate + an N(1)-methyladenosine...,False,The mapping is accurate with high confidence a...,1.0,,;,gpt3t-prompted,skos:exactMatch


In [21]:
# EXPLORE: mappings flagged as problematic by more than 5 models, with average confidence >= 0.7
df_grouped_filtered = df_grouped[(df_grouped['problem_sum'] > 5) & (df_grouped['confidence_avg'] >= 0.7)]
df_grouped_filtered

Unnamed: 0,subject_id,object_id,problem_sum,confidence_avg,subject_info,object_info,problem,info,confidence,suggested_predicate,suggested_modifications,model,predicate_id
390,GO:0003842,RHEA:24882,6,0.7,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,Name: (S)-1-pyrroline-5-carboxylate + 2 H2O + ...,True,The subject and object are not an exact match....,1.0,skos:closeMatch,;,claude-3-opus-generic,skos:broadMatch
391,GO:0003842,RHEA:24882,6,0.7,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,Name: (S)-1-pyrroline-5-carboxylate + 2 H2O + ...,True,The SUBJECT and OBJECT are not identically mat...,1.0,No modification needed,No modification needed; Change 'NADP(+) = H(+)...,gpt4full-generic,skos:broadMatch
392,GO:0003842,RHEA:24882,6,0.7,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,Name: (S)-1-pyrroline-5-carboxylate + 2 H2O + ...,True,The predicate 'skos:exactMatch' is incorrect a...,0.1,Change 'skos:exactMatch' to 'skos:closeMatch' ...,None; Change '(S)-1-pyrroline-5-carboxylate' t...,groq-llama-generic,skos:broadMatch
393,GO:0003842,RHEA:24882,6,0.7,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,Name: (S)-1-pyrroline-5-carboxylate + 2 H2O + ...,True,The subject and object do not represent the sa...,0.1,Consider changing the predicate to a more appr...,; Consider changing the object to: 1-pyrroline...,mixtral-prompted,skos:broadMatch
394,GO:0003842,RHEA:24882,6,0.7,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,Name: (S)-1-pyrroline-5-carboxylate + 2 H2O + ...,True,The reaction substrates and products do not ma...,1.0,,;,gpt3t-prompted,skos:broadMatch
...,...,...,...,...,...,...,...,...,...,...,...,...,...
11713,GO:0043774,RHEA:42332,6,0.7,Name: coenzyme F420-2 alpha-glutamyl ligase ac...,Name: ATP + L-glutamate + oxidized coenzyme F4...,True,The predicate skos:exactMatch suggests that th...,1.0,,NA; Change ATP to GTP and ADP to GDP in the ob...,gpt4full-generic,oio:hasDbXref
11714,GO:0043774,RHEA:42332,6,0.7,Name: coenzyme F420-2 alpha-glutamyl ligase ac...,Name: ATP + L-glutamate + oxidized coenzyme F4...,True,The SUBJECT and OBJECT do not have an exact ma...,0.1,Change to 'skos:relatedMatch' or a similar pre...,None; Change to 'Formation of coenzyme F420-3 ...,groq-llama-generic,oio:hasDbXref
11715,GO:0043774,RHEA:42332,6,0.7,Name: coenzyme F420-2 alpha-glutamyl ligase ac...,Name: ATP + L-glutamate + oxidized coenzyme F4...,True,The subject and object do not represent the sa...,0.1,,None; Replace ATP with GTP to match the subject,mixtral-prompted,oio:hasDbXref
11716,GO:0043774,RHEA:42332,6,0.7,Name: coenzyme F420-2 alpha-glutamyl ligase ac...,Name: ATP + L-glutamate + oxidized coenzyme F4...,True,The proposed mapping is incorrect. The reactio...,1.0,,;,gpt3t-prompted,oio:hasDbXref


In [24]:
# EXPLORE: curated exact matches flagged as problematic by more than 4 models
df_grouped[(df_grouped['problem_sum'] > 4) & (df_grouped['predicate_id'] == "skos:exactMatch")]

Unnamed: 0,subject_id,object_id,problem_sum,confidence_avg,subject_info,object_info,problem,info,confidence,suggested_predicate,suggested_modifications,model,predicate_id
3288,GO:0004740,RHEA:23052,5,0.7,Name: pyruvate dehydrogenase (acetyl-transferr...,Name: ATP + L-seryl-[pyruvate dehydrogenase E1...,True,The subject and object do not represent the sa...,1.0,skos:narrowMatch or skos:broadMatch may be mor...,;,claude-3-opus-generic,skos:exactMatch
3289,GO:0004740,RHEA:23052,5,0.7,Name: pyruvate dehydrogenase (acetyl-transferr...,Name: ATP + L-seryl-[pyruvate dehydrogenase E1...,True,The two entities represent different biochemic...,1.0,skos:narrowMatch or skos:closeMatch,N/A; A more general term or reaction involving...,gpt4full-generic,skos:exactMatch
3290,GO:0004740,RHEA:23052,5,0.7,Name: pyruvate dehydrogenase (acetyl-transferr...,Name: ATP + L-seryl-[pyruvate dehydrogenase E1...,True,The SUBJECT and OBJECT do not represent the sa...,0.1,"Consider using a more appropriate predicate, s...",None; Modify the OBJECT to match the SUBJECT's...,groq-llama-generic,skos:exactMatch
3291,GO:0004740,RHEA:23052,5,0.7,Name: pyruvate dehydrogenase (acetyl-transferr...,Name: ATP + L-seryl-[pyruvate dehydrogenase E1...,True,The SUBJECT and OBJECT do not represent the sa...,0.1,skos:closeMatch,; ATP + L-seryl-[pyruvate dehydrogenase E1 alp...,mixtral-prompted,skos:exactMatch
3292,GO:0004740,RHEA:23052,5,0.7,Name: pyruvate dehydrogenase (acetyl-transferr...,Name: ATP + L-seryl-[pyruvate dehydrogenase E1...,False,The mapping is correct as the reaction catalyz...,1.0,,;,gpt3t-prompted,skos:exactMatch
3293,GO:0004740,RHEA:23052,5,0.7,Name: pyruvate dehydrogenase (acetyl-transferr...,Name: ATP + L-seryl-[pyruvate dehydrogenase E1...,True,The SUBJECT describes the enzyme activity of p...,1.0,skos:closeMatch,;,gpt-4-generic,skos:exactMatch
12495,GO:0046932,RHEA:58158,5,0.82,Name: sodium-transporting ATP synthase activit...,Name: None Definition: None Relationships:,True,"The OBJECT is completely missing, so there is ...",1.0,,;,claude-3-opus-generic,skos:exactMatch
12496,GO:0046932,RHEA:58158,5,0.82,Name: sodium-transporting ATP synthase activit...,Name: None Definition: None Relationships:,True,The OBJECT details are missing. There is nothi...,1.0,,; Needs some definitions and relationships att...,gpt4full-generic,skos:exactMatch
12497,GO:0046932,RHEA:58158,5,0.82,Name: sodium-transporting ATP synthase activit...,Name: None Definition: None Relationships:,True,"The subject is described in great detail, whil...",0.1,Change the predicate to a more appropriate one...,None; Provide a name and definition for the ob...,groq-llama-generic,skos:exactMatch
12498,GO:0046932,RHEA:58158,5,0.82,Name: sodium-transporting ATP synthase activit...,Name: None Definition: None Relationships:,True,The proposed mapping between a specific enzyme...,1.0,,;,gpt3t-prompted,skos:exactMatch


In [25]:
# Store results
df_grouped.to_csv('output/unpivoted.tsv', index=False, sep="\t")

## Consensus analysis

Here we calculate whether each model agreed with the curator as to the predicate.

We first record whether the curated predicate was exact or not (note that oio:hasDbXref is uncommitted to either).

We then assign a consensus value: a model agrees with the curator if it flags a problem on an inexact mapping, or flags no problem on an exact mapping.

In [40]:
df_grouped.loc[(df_grouped['predicate_id'] == "skos:exactMatch"), 'exact'] = True
df_grouped.loc[(df_grouped['predicate_id'] != "skos:exactMatch") & (df_grouped['predicate_id'] != "oio:hasDbXref"), 'inexact'] = True

In [42]:
# A model agrees with the curator when it flags a problem on an inexact mapping,
# or flags no problem on an exact mapping; uncommitted rows keep consensus = NaN
df_grouped.loc[(df_grouped['problem'] == True) & (df_grouped['exact'] == True), 'consensus'] = False
df_grouped.loc[(df_grouped['problem'] == True) & (df_grouped['inexact'] == True), 'consensus'] = True
df_grouped.loc[(df_grouped['problem'] == False) & (df_grouped['exact'] == True), 'consensus'] = True
df_grouped.loc[(df_grouped['problem'] == False) & (df_grouped['inexact'] == True), 'consensus'] = False
df_grouped[df_grouped['consensus'] == False]

Unnamed: 0,subject_id,object_id,problem_sum,confidence_avg,subject_info,object_info,problem,info,confidence,suggested_predicate,suggested_modifications,model,predicate_id,exact,inexact,consensus
81,GO:0000253,RHEA:18409,4,0.850000,Name: 3-keto sterol reductase activity Definit...,Name: 4alpha-methyl-5alpha-cholest-7-en-3beta-...,False,The provided mapping between '3-keto sterol re...,1.0,skos:exactMatch,None; None,mixtral-prompted,skos:narrowMatch,,True,False
82,GO:0000253,RHEA:18409,4,0.850000,Name: 3-keto sterol reductase activity Definit...,Name: 4alpha-methyl-5alpha-cholest-7-en-3beta-...,False,The mapping is accurate as the reaction cataly...,1.0,,;,gpt3t-prompted,skos:narrowMatch,,True,False
228,GO:0001733,RHEA:20613,2,0.833333,Name: galactosylceramide sulfotransferase acti...,Name: 3'-phosphoadenylyl sulfate + a beta-D-ga...,True,The mapping is incorrect. The SUBJECT is an en...,1.0,The appropriate predicate would be something l...,;,claude-3-opus-generic,skos:exactMatch,True,,False
232,GO:0001733,RHEA:20613,2,0.833333,Name: galactosylceramide sulfotransferase acti...,Name: 3'-phosphoadenylyl sulfate + a beta-D-ga...,True,The proposed skos:exactMatch relationship is n...,1.0,,;,gpt3t-prompted,skos:exactMatch,True,,False
236,GO:0001733,RHEA:41744,3,1.000000,Name: galactosylceramide sulfotransferase acti...,Name: 3'-phosphoadenylyl sulfate + a 1-O-alkyl...,False,The SUBJECT and OBJECT are biochemical reactio...,1.0,The predicate skos:exactMatch is correct becau...,;,groq-llama-generic,skos:narrowMatch,,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17724,GO:0140868,RHEA:31399,False,1.000000,"Name: 4,4'-diapophytoene desaturase (4,4'-diap...","Name: all-trans-4,4'-diapophytofluene + FAD + ...",False,The mapping is accurate as both the subject an...,1.0,,;,gpt3t-prompted,skos:narrowMatch,,True,False
17725,GO:0140868,RHEA:31403,False,0.500000,"Name: 4,4'-diapophytoene desaturase (4,4'-diap...","Name: all-trans-4,4'-diapo-zeta-carotene + FAD...",False,The proposed skos:exactMatch mapping is accura...,0.5,,;,gpt3t-prompted,skos:narrowMatch,,True,False
17726,GO:0140868,RHEA:31407,False,1.000000,"Name: 4,4'-diapophytoene desaturase (4,4'-diap...","Name: all-trans-4,4'-diaponeurosporene + FAD +...",False,The mapping is accurate as both the subject an...,1.0,,;,gpt3t-prompted,skos:narrowMatch,,True,False
17731,GO:0140932,RHEA:65388,False,1.000000,Name: 5'-(N(7)-methyl 5'-triphosphoguanosine)-...,Name: a 5'-end (N(7)-methyl 5'-triphosphoguano...,False,The mapping is accurate. The reaction descript...,1.0,,;,gpt3t-prompted,skos:narrowMatch,,True,False
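
The consensus rule used in this section amounts to a small truth table. The sketch below writes it as a standalone function; `consensus` is a hypothetical helper, not part of the notebook. One deliberate difference, which is an assumption of this sketch: a missing predicate is treated as uncommitted, whereas the cells above would count a row with no curated predicate as inexact.

```python
import pandas as pd

def consensus(problem, predicate_id):
    """Return True/False for agreement with the curator, or None if uncommitted.

    Hypothetical helper: the models were asked to judge each mapping as if it
    were an exact match, so flagging a problem agrees with an inexact curated
    predicate and disagrees with an exact one.
    """
    # Assumption: missing predicates are handled like oio:hasDbXref.
    if pd.isna(predicate_id) or predicate_id == "oio:hasDbXref":
        return None
    curator_exact = predicate_id == "skos:exactMatch"
    return bool(problem) != curator_exact

# All four committed combinations, plus an uncommitted row.
for problem, predicate in [
    (True, "skos:exactMatch"),
    (True, "skos:narrowMatch"),
    (False, "skos:exactMatch"),
    (False, "skos:narrowMatch"),
    (True, "oio:hasDbXref"),
]:
    print(problem, predicate, consensus(problem, predicate))
```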


In [43]:
committed = df_grouped[df_grouped['predicate_id'] != "oio:hasDbXref"]
committed.to_csv('output/committed.tsv', index=False, sep="\t")

### Ranking models by score

Here, the score is the fraction of a model's judgements on committed mappings in which the model agreed with the curator as to the predicate.

In [45]:
summary = committed.groupby(['model']).agg(score = ('consensus', 'mean')).reset_index()
summary

Unnamed: 0,model,score
0,claude-3-opus-generic,0.827068
1,gpt-4-generic,0.50365
2,gpt3t-prompted,0.429787
3,gpt4full-generic,0.578231
4,groq-llama-generic,0.691667
5,mixtral-prompted,0.319588
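
Because the score is a mean, it helps to report how many judgements each mean is based on; the result tables above suggest that not every model was run on the same set of mappings. A minimal sketch on invented data (`m1`, `m2` and the values are placeholders), with consensus encoded as 1.0/0.0 and NaN for rows that received no consensus value:

```python
import pandas as pd

# Toy consensus values: 1.0 = agreed, 0.0 = disagreed, NaN = not assigned.
toy_committed = pd.DataFrame({
    "model": ["m1", "m1", "m1", "m2", "m2", "m2"],
    "consensus": [1.0, 1.0, 0.0, 1.0, 0.0, float("nan")],
})

# mean and count both skip NaN, so n is the number of rows behind each score.
toy_summary = toy_committed.groupby("model").agg(
    score=("consensus", "mean"),
    n=("consensus", "count"),
).reset_index()

print(toy_summary)
```

A score based on a handful of rows deserves less weight than one based on hundreds, so the `n` column is a useful companion to the ranking.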


A big win for Claude-3-Opus (0.83)! This is a new model, and anecdotally many users report that it outperforms GPT-4.

However, we should interpret these results cautiously: other models may be correctly flagging problematic
mappings that curators marked as exact.

## Create a pivot table

For ease of browsing, we make a pivoted (unmelted) table in which each model has a block of columns.

We then place the table [here](https://docs.google.com/spreadsheets/d/1yFD24UT_Bc76mbwAie65KQEqfEVVIeFljrUy0HpHhNI/edit#gid=293565601)

In [26]:
df_grouped_selected = df_grouped[['subject_id', 'predicate_id', 'object_id', 'problem_sum', 'confidence_avg', 'subject_info', 'object_info', 'model', 'info', 'suggested_modifications', 'suggested_predicate']]

In [27]:
df_pivot = df_grouped_selected.pivot_table(index=['subject_id', 'predicate_id', 'object_id', 'problem_sum', 'confidence_avg', 'subject_info', 'object_info'],
                                  columns='model',
                                  aggfunc='first').fillna(0)
df_pivot

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,info,info,info,info,info,info,suggested_modifications,suggested_modifications,suggested_modifications,suggested_modifications,suggested_modifications,suggested_modifications,suggested_predicate,suggested_predicate,suggested_predicate,suggested_predicate,suggested_predicate,suggested_predicate
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,model,claude-3-opus-generic,gpt-4-generic,gpt3t-prompted,gpt4full-generic,groq-llama-generic,mixtral-prompted,claude-3-opus-generic,gpt-4-generic,gpt3t-prompted,gpt4full-generic,groq-llama-generic,mixtral-prompted,claude-3-opus-generic,gpt-4-generic,gpt3t-prompted,gpt4full-generic,groq-llama-generic,mixtral-prompted
subject_id,predicate_id,object_id,problem_sum,confidence_avg,subject_info,object_info,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2,Unnamed: 24_level_2
GO:0000010,oio:hasDbXref,RHEA:27794,1,1.000000,"Name: trans-hexaprenyltranstransferase activity Definition: Catalysis of the reaction: (2E,6E)-farnesyl diphosphate + 4 isopentenyl diphosphate = 4 diphosphate + all-trans-heptaprenyl diphosphate. Relationships:","Name: (2E,6E)-farnesyl diphosphate + 4 isopentenyl diphosphate = all-trans-heptaprenyl diphosphate + 4 diphosphate Definition: None Relationships:",The subject is an enzymatic activity while the...,The SUBJECT's name and definition directly des...,The mapping is accurate and the reaction subst...,The SUBJECT is an enzyme activity that catalyz...,"The subject and object are equivalent, they bo...",The mapping is scientifically accurate. The su...,;,;,;,;,;,Remove the reactant and product names from the...,A skos:relatedMatch or skos:broadMatch predica...,,,,The predicate 'skos:exactMatch' is typically u...,0
GO:0000016,oio:hasDbXref,RHEA:10076,2,0.850000,Name: lactase activity Definition: Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose. Relationships:,Name: H2O + lactose = beta-D-galactose + D-glucose Definition: None Relationships:,"The SUBJECT is an enzymatic activity, while th...",The equation H2O + lactose = beta-D-galactose ...,The mapping is correct as the reactions are eq...,The mappings are correct. The lactase activity...,"The subject and object are very similar, but n...",The proposed mapping between 'lactase activity...,;,; Consider adjusting 'beta-D-galactose' to 'D-...,;,;,None; Consider changing to 'Lactase-catalyzed ...,None; Rename 'H2O + lactose = D-galactose + D-...,skos:closeMatch or skos:relatedMatch may be mo...,,,,"Consider changing to skos:closeMatch, as the s...",0
GO:0000034,oio:hasDbXref,RHEA:23688,3,0.683333,Name: adenine deaminase activity Definition: Catalysis of the reaction: adenine + H2O = hypoxanthine + NH3. Relationships:,Name: adenine + H(+) + H2O = hypoxanthine + NH4(+) Definition: None Relationships:,The mapping looks accurate. The subject descri...,The chemical equation given as the object cont...,The mapping is accurate as both the SUBJECT an...,The OBJECT seems to be a representation of the...,Failed to parse JSON,The SUBJECT and OBJECT do not represent the sa...,;,; Modify to 'adenine + H2O = hypoxanthine + NH3',;,"; Change the object to 'adenine deaminase', wh...",;,; Replace NH4(+) with NH3,,,,,0,
GO:0000104,oio:hasDbXref,RHEA:16357,3,0.683333,Name: succinate dehydrogenase activity Definition: Catalysis of the reaction: succinate + acceptor = fumarate + reduced acceptor. Relationships:,Name: A + succinate = AH2 + fumarate Definition: None Relationships:,The mapping is incorrect. The SUBJECT refers t...,The SUBJECT describes an enzymatic activity in...,The SUBJECT represents the enzyme activity of ...,The SUBJECT 'succinate dehydrogenase activity'...,The subject and object do not have an exact ma...,Failed to parse JSON,;,;,;,;,succinate dehydrogenase; the reaction: A + suc...,;,skos:closeMatch,,,,skos:relatedMatch,0
GO:0000107,oio:hasDbXref,RHEA:24793,3,1.000000,Name: imidazoleglycerol-phosphate synthase activity Definition: Catalysis of the reaction: phosphoribulosylformimino-AICAR-P + L-glutamine = D-erythro-imidazole-glycerol-phosphate + aminoimidazole carboxamide ribonucleotide + L-glutamate + 2 H+. Relationships:,Name: 5-[(5-phospho-1-deoxy-D-ribulos-1-ylimino)methylamino]-1-(5-phospho-beta-D-ribosyl)imidazole-4-carboxamide + L-glutamine = 5-amino-1-(5-phospho-beta-D-ribosyl)imidazole-4-carboxamide + D-erythro-1-(imidazol-4-yl)glycerol 3-phosphate + H(+) + L-glutamate Definition: None Relationships:,The SUBJECT is an enzyme activity while the OB...,The SUBJECT and OBJECT descriptions correctly ...,The proposed skos:exactMatch relationship is i...,The SUBJECT and OBJECT are not an exact match....,The subject and object are chemical compounds ...,The given mapping is valid. The subject and ob...,;,;,;,None; None,None; None,None; None,skos:closeMatch,,,Change the predicate to 'is involved in' inste...,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
GO:1990699,skos:narrowMatch,RHEA:45340,1,1.000000,"Name: palmitoleyl hydrolase activity Definition: Catalysis of a hydrolase reaction that removes a palmitoleyl moiety, a 16-carbon monounsaturated fatty acid (C16:1), from some substrate. Relationships:",Name: [Wnt protein]-O-(9Z)-hexadecenoyl-L-serine + H2O = (9Z)-hexadecenoate + [Wnt protein]-L-serine + H(+) Definition: None Relationships:,0,0,The proposed mapping is incorrect as the enzym...,0,0,0,0,0,;,0,0,0,0,0,,0,0,0
GO:1990738,oio:hasDbXref,RHEA:10944,0,1.000000,Name: pseudouridine 5'-phosphatase activity Definition: Catalysis of the reaction: pseudouridine 5'-phosphate + H2O = pseudouridine + phosphate. Relationships:,Name: H2O + psi-UMP = phosphate + pseudouridine Definition: None Relationships:,0,0,The mapping is accurate as the reaction cataly...,0,0,0,0,0,;,0,0,0,0,0,,0,0,0
GO:1990817,oio:hasDbXref,RHEA:11332,0,1.000000,"Name: poly(A) RNA polymerase activity Definition: Catalysis of the reaction: ATP + RNA(n) = diphosphate + RNA(n)-3'-adenine ribonucleotide. The primer may be an RNA or DNA fragment, or oligo(A) bearing a 3'-OH terminal group. Relationships:",Name: ATP + RNA(n) = diphosphate + RNA(n)-3'-adenine ribonucleotide Definition: None Relationships:,0,0,The mapping is accurate based on the provided ...,0,0,0,0,0,;,0,0,0,0,0,,0,0,0
GO:1990930,skos:exactMatch,RHEA:49516,0,1.000000,"Name: mRNA N1-methyladenosine dioxygenase activity Definition: Catalysis of the oxidative demethylation of N1-methyladenosine RNA, with concomitant decarboxylation of 2-oxoglutarate and releases oxidized methyl group on N1-methyladenosine as formaldehyde. Relationships:",Name: 2-oxoglutarate + an N(1)-methyladenosine in mRNA + O2 = an adenosine in mRNA + CO2 + formaldehyde + succinate Definition: None Relationships:,0,0,The mapping is accurate with high confidence a...,0,0,0,0,0,;,0,0,0,0,0,,0,0,0


In [28]:
# Flatten the (field, model) column MultiIndex into single names of the form "<model>_<field>"
df_pivot.columns = df_pivot.columns.map(lambda x: '{0[1]}_{0[0]}'.format(x))
#df_pivot.columns = df_pivot.columns.get_level_values(0)+'_'+df_pivot.columns.get_level_values(1)
df_pivot

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,claude-3-opus-generic_info,gpt-4-generic_info,gpt3t-prompted_info,gpt4full-generic_info,groq-llama-generic_info,mixtral-prompted_info,claude-3-opus-generic_suggested_modifications,gpt-4-generic_suggested_modifications,gpt3t-prompted_suggested_modifications,gpt4full-generic_suggested_modifications,groq-llama-generic_suggested_modifications,mixtral-prompted_suggested_modifications,claude-3-opus-generic_suggested_predicate,gpt-4-generic_suggested_predicate,gpt3t-prompted_suggested_predicate,gpt4full-generic_suggested_predicate,groq-llama-generic_suggested_predicate,mixtral-prompted_suggested_predicate
subject_id,predicate_id,object_id,problem_sum,confidence_avg,subject_info,object_info,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
GO:0000010,oio:hasDbXref,RHEA:27794,1,1.000000,"Name: trans-hexaprenyltranstransferase activity Definition: Catalysis of the reaction: (2E,6E)-farnesyl diphosphate + 4 isopentenyl diphosphate = 4 diphosphate + all-trans-heptaprenyl diphosphate. Relationships:","Name: (2E,6E)-farnesyl diphosphate + 4 isopentenyl diphosphate = all-trans-heptaprenyl diphosphate + 4 diphosphate Definition: None Relationships:",The subject is an enzymatic activity while the...,The SUBJECT's name and definition directly des...,The mapping is accurate and the reaction subst...,The SUBJECT is an enzyme activity that catalyz...,"The subject and object are equivalent, they bo...",The mapping is scientifically accurate. The su...,;,;,;,;,;,Remove the reactant and product names from the...,A skos:relatedMatch or skos:broadMatch predica...,,,,The predicate 'skos:exactMatch' is typically u...,0
GO:0000016,oio:hasDbXref,RHEA:10076,2,0.850000,Name: lactase activity Definition: Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose. Relationships:,Name: H2O + lactose = beta-D-galactose + D-glucose Definition: None Relationships:,"The SUBJECT is an enzymatic activity, while th...",The equation H2O + lactose = beta-D-galactose ...,The mapping is correct as the reactions are eq...,The mappings are correct. The lactase activity...,"The subject and object are very similar, but n...",The proposed mapping between 'lactase activity...,;,; Consider adjusting 'beta-D-galactose' to 'D-...,;,;,None; Consider changing to 'Lactase-catalyzed ...,None; Rename 'H2O + lactose = D-galactose + D-...,skos:closeMatch or skos:relatedMatch may be mo...,,,,"Consider changing to skos:closeMatch, as the s...",0
GO:0000034,oio:hasDbXref,RHEA:23688,3,0.683333,Name: adenine deaminase activity Definition: Catalysis of the reaction: adenine + H2O = hypoxanthine + NH3. Relationships:,Name: adenine + H(+) + H2O = hypoxanthine + NH4(+) Definition: None Relationships:,The mapping looks accurate. The subject descri...,The chemical equation given as the object cont...,The mapping is accurate as both the SUBJECT an...,The OBJECT seems to be a representation of the...,Failed to parse JSON,The SUBJECT and OBJECT do not represent the sa...,;,; Modify to 'adenine + H2O = hypoxanthine + NH3',;,"; Change the object to 'adenine deaminase', wh...",;,; Replace NH4(+) with NH3,,,,,0,
GO:0000104,oio:hasDbXref,RHEA:16357,3,0.683333,Name: succinate dehydrogenase activity Definition: Catalysis of the reaction: succinate + acceptor = fumarate + reduced acceptor. Relationships:,Name: A + succinate = AH2 + fumarate Definition: None Relationships:,The mapping is incorrect. The SUBJECT refers t...,The SUBJECT describes an enzymatic activity in...,The SUBJECT represents the enzyme activity of ...,The SUBJECT 'succinate dehydrogenase activity'...,The subject and object do not have an exact ma...,Failed to parse JSON,;,;,;,;,succinate dehydrogenase; the reaction: A + suc...,;,skos:closeMatch,,,,skos:relatedMatch,0
GO:0000107,oio:hasDbXref,RHEA:24793,3,1.000000,Name: imidazoleglycerol-phosphate synthase activity Definition: Catalysis of the reaction: phosphoribulosylformimino-AICAR-P + L-glutamine = D-erythro-imidazole-glycerol-phosphate + aminoimidazole carboxamide ribonucleotide + L-glutamate + 2 H+. Relationships:,Name: 5-[(5-phospho-1-deoxy-D-ribulos-1-ylimino)methylamino]-1-(5-phospho-beta-D-ribosyl)imidazole-4-carboxamide + L-glutamine = 5-amino-1-(5-phospho-beta-D-ribosyl)imidazole-4-carboxamide + D-erythro-1-(imidazol-4-yl)glycerol 3-phosphate + H(+) + L-glutamate Definition: None Relationships:,The SUBJECT is an enzyme activity while the OB...,The SUBJECT and OBJECT descriptions correctly ...,The proposed skos:exactMatch relationship is i...,The SUBJECT and OBJECT are not an exact match....,The subject and object are chemical compounds ...,The given mapping is valid. The subject and ob...,;,;,;,None; None,None; None,None; None,skos:closeMatch,,,Change the predicate to 'is involved in' inste...,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
GO:1990699,skos:narrowMatch,RHEA:45340,1,1.000000,"Name: palmitoleyl hydrolase activity Definition: Catalysis of a hydrolase reaction that removes a palmitoleyl moiety, a 16-carbon monounsaturated fatty acid (C16:1), from some substrate. Relationships:",Name: [Wnt protein]-O-(9Z)-hexadecenoyl-L-serine + H2O = (9Z)-hexadecenoate + [Wnt protein]-L-serine + H(+) Definition: None Relationships:,0,0,The proposed mapping is incorrect as the enzym...,0,0,0,0,0,;,0,0,0,0,0,,0,0,0
GO:1990738,oio:hasDbXref,RHEA:10944,0,1.000000,Name: pseudouridine 5'-phosphatase activity Definition: Catalysis of the reaction: pseudouridine 5'-phosphate + H2O = pseudouridine + phosphate. Relationships:,Name: H2O + psi-UMP = phosphate + pseudouridine Definition: None Relationships:,0,0,The mapping is accurate as the reaction cataly...,0,0,0,0,0,;,0,0,0,0,0,,0,0,0
GO:1990817,oio:hasDbXref,RHEA:11332,0,1.000000,"Name: poly(A) RNA polymerase activity Definition: Catalysis of the reaction: ATP + RNA(n) = diphosphate + RNA(n)-3'-adenine ribonucleotide. The primer may be an RNA or DNA fragment, or oligo(A) bearing a 3'-OH terminal group. Relationships:",Name: ATP + RNA(n) = diphosphate + RNA(n)-3'-adenine ribonucleotide Definition: None Relationships:,0,0,The mapping is accurate based on the provided ...,0,0,0,0,0,;,0,0,0,0,0,,0,0,0
GO:1990930,skos:exactMatch,RHEA:49516,0,1.000000,"Name: mRNA N1-methyladenosine dioxygenase activity Definition: Catalysis of the oxidative demethylation of N1-methyladenosine RNA, with concomitant decarboxylation of 2-oxoglutarate and releases oxidized methyl group on N1-methyladenosine as formaldehyde. Relationships:",Name: 2-oxoglutarate + an N(1)-methyladenosine in mRNA + O2 = an adenosine in mRNA + CO2 + formaldehyde + succinate Definition: None Relationships:,0,0,The mapping is accurate with high confidence a...,0,0,0,0,0,;,0,0,0,0,0,,0,0,0


In [29]:
df_pivot.reset_index(col_level=1, inplace=True)
df_pivot

Unnamed: 0,subject_id,predicate_id,object_id,problem_sum,confidence_avg,subject_info,object_info,claude-3-opus-generic_info,gpt-4-generic_info,gpt3t-prompted_info,...,gpt3t-prompted_suggested_modifications,gpt4full-generic_suggested_modifications,groq-llama-generic_suggested_modifications,mixtral-prompted_suggested_modifications,claude-3-opus-generic_suggested_predicate,gpt-4-generic_suggested_predicate,gpt3t-prompted_suggested_predicate,gpt4full-generic_suggested_predicate,groq-llama-generic_suggested_predicate,mixtral-prompted_suggested_predicate
0,GO:0000010,oio:hasDbXref,RHEA:27794,1,1.000000,Name: trans-hexaprenyltranstransferase activit...,"Name: (2E,6E)-farnesyl diphosphate + 4 isopent...",The subject is an enzymatic activity while the...,The SUBJECT's name and definition directly des...,The mapping is accurate and the reaction subst...,...,;,;,;,Remove the reactant and product names from the...,A skos:relatedMatch or skos:broadMatch predica...,,,,The predicate 'skos:exactMatch' is typically u...,0
1,GO:0000016,oio:hasDbXref,RHEA:10076,2,0.850000,Name: lactase activity Definition: Catalysis o...,Name: H2O + lactose = beta-D-galactose + D-glu...,"The SUBJECT is an enzymatic activity, while th...",The equation H2O + lactose = beta-D-galactose ...,The mapping is correct as the reactions are eq...,...,;,;,None; Consider changing to 'Lactase-catalyzed ...,None; Rename 'H2O + lactose = D-galactose + D-...,skos:closeMatch or skos:relatedMatch may be mo...,,,,"Consider changing to skos:closeMatch, as the s...",0
2,GO:0000034,oio:hasDbXref,RHEA:23688,3,0.683333,Name: adenine deaminase activity Definition: C...,Name: adenine + H(+) + H2O = hypoxanthine + NH...,The mapping looks accurate. The subject descri...,The chemical equation given as the object cont...,The mapping is accurate as both the SUBJECT an...,...,;,"; Change the object to 'adenine deaminase', wh...",;,; Replace NH4(+) with NH3,,,,,0,
3,GO:0000104,oio:hasDbXref,RHEA:16357,3,0.683333,Name: succinate dehydrogenase activity Definit...,Name: A + succinate = AH2 + fumarate Definitio...,The mapping is incorrect. The SUBJECT refers t...,The SUBJECT describes an enzymatic activity in...,The SUBJECT represents the enzyme activity of ...,...,;,;,succinate dehydrogenase; the reaction: A + suc...,;,skos:closeMatch,,,,skos:relatedMatch,0
4,GO:0000107,oio:hasDbXref,RHEA:24793,3,1.000000,Name: imidazoleglycerol-phosphate synthase act...,Name: 5-[(5-phospho-1-deoxy-D-ribulos-1-ylimin...,The SUBJECT is an enzyme activity while the OB...,The SUBJECT and OBJECT descriptions correctly ...,The proposed skos:exactMatch relationship is i...,...,;,None; None,None; None,None; None,skos:closeMatch,,,Change the predicate to 'is involved in' inste...,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4460,GO:1990699,skos:narrowMatch,RHEA:45340,1,1.000000,Name: palmitoleyl hydrolase activity Definitio...,Name: [Wnt protein]-O-(9Z)-hexadecenoyl-L-seri...,0,0,The proposed mapping is incorrect as the enzym...,...,;,0,0,0,0,0,,0,0,0
4461,GO:1990738,oio:hasDbXref,RHEA:10944,0,1.000000,Name: pseudouridine 5'-phosphatase activity De...,Name: H2O + psi-UMP = phosphate + pseudouridin...,0,0,The mapping is accurate as the reaction cataly...,...,;,0,0,0,0,0,,0,0,0
4462,GO:1990817,oio:hasDbXref,RHEA:11332,0,1.000000,Name: poly(A) RNA polymerase activity Definiti...,Name: ATP + RNA(n) = diphosphate + RNA(n)-3'-a...,0,0,The mapping is accurate based on the provided ...,...,;,0,0,0,0,0,,0,0,0
4463,GO:1990930,skos:exactMatch,RHEA:49516,0,1.000000,Name: mRNA N1-methyladenosine dioxygenase acti...,Name: 2-oxoglutarate + an N(1)-methyladenosine...,0,0,The mapping is accurate with high confidence a...,...,;,0,0,0,0,0,,0,0,0


In [30]:
df_pivot.to_csv('output/pivoted.tsv', index=False, sep="\t")
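
The export above depends on the `output/` directory already existing, because pandas does not create missing parent directories. Below is a minimal sketch of a safer write followed by a read-back check. It assumes a toy stand-in for `df_pivot` and a temporary directory, both of which are illustrative and not part of the notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for df_pivot (illustrative; the real frame has many more columns).
toy = pd.DataFrame({
    "subject_id": ["GO:0000010", "GO:0000016"],
    "object_id": ["RHEA:27794", "RHEA:10076"],
    "problem_sum": [1, 2],
})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical location; the notebook writes to 'output/pivoted.tsv'.
    out_dir = Path(tmp) / "output"
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    out_path = out_dir / "pivoted.tsv"

    # Same call shape as the notebook cell: tab-separated, no index column.
    toy.to_csv(out_path, index=False, sep="\t")

    # Read the file back to confirm the columns and rows survived the round trip.
    round_trip = pd.read_csv(out_path, sep="\t")
```

Reading the file back with the same `sep="\t"` is a cheap way to confirm the sheet will open with the expected columns before it is uploaded elsewhere.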

In [20]:
df_pivot[df_pivot['problem_sum'] > 5]

Unnamed: 0,subject_id,object_id,problem_sum,confidence_avg,subject_info,object_info,claude-3-opus-generic_info,gpt-4-generic_info,gpt3t-prompted_info,gpt4full-generic_info,...,gpt3t-prompted_suggested_modifications,gpt4full-generic_suggested_modifications,groq-llama-generic_suggested_modifications,mixtral-prompted_suggested_modifications,claude-3-opus-generic_suggested_predicate,gpt-4-generic_suggested_predicate,gpt3t-prompted_suggested_predicate,gpt4full-generic_suggested_predicate,groq-llama-generic_suggested_predicate,mixtral-prompted_suggested_predicate
0,GO:0003842,RHEA:24882,6,0.7,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,Name: (S)-1-pyrroline-5-carboxylate + 2 H2O + ...,The subject and object are not an exact match....,The chemical reactions and cofactors do not ma...,The reaction substrates and products do not ma...,The SUBJECT and OBJECT are not identically mat...,...,;,No modification needed; Change 'NADP(+) = H(+)...,None; Change '(S)-1-pyrroline-5-carboxylate' t...,; Consider changing the object to: 1-pyrroline...,skos:closeMatch,,,No modification needed,Change 'skos:exactMatch' to 'skos:closeMatch' ...,Consider changing the predicate to a more appr...
1,GO:0003848,RHEA:11412,6,0.7,Name: 2-amino-4-hydroxy-6-hydroxymethyldihydro...,"Name: 6-hydroxymethyl-7,8-dihydropterin + ATP ...",The SUBJECT and OBJECT are not an exact match....,The SUBJECT and OBJECT describe similar but no...,The SUBJECT and OBJECT do not have identical r...,"The subject and object are close, but not exac...",...,;,; The name of the object should match its reac...,2-amino-4-hydroxy-6-hydroxymethyldihydropterid...,None; Add 2-amino-4-hydroxy substituents to th...,skos:closeMatch may be more appropriate than s...,,,skos:closeMatch,skos:narrowerMatch,Change to skos:closeMatch or a similar predica...
2,GO:0004051,RHEA:32307,6,0.7,Name: arachidonate 5-lipoxygenase activity Def...,"Name: (5Z,8Z,11Z,14Z)-eicosatetraenoate + O2 =...",The SUBJECT describes the catalysis of a react...,The SUBJECT refers to the enzyme activity (ara...,The proposed mapping is not accurate as the re...,The SUBJECT and OBJECT descriptions refer to d...,...,;,No modifications needed; Update with correct O...,"arachidonate 5-lipoxygenase; (5Z,8Z,11Z,14Z)-e...",None; Change the name to 'arachidonate + O2 = ...,A more appropriate predicate could be skos:rel...,skos:closeMatch,,Change predicate to 'skos:closeMatch' or 'skos...,skos:relatedMatch,
3,GO:0004164,RHEA:36415,6,0.7,Name: diphthine synthase activity Definition: ...,Name: 2-[(3S)-amino-3-carboxypropyl]-L-histidy...,The subject is an enzymatic activity while the...,The SUBJECT describes an enzyme activity (diph...,The proposed mapping between the SUBJECT and O...,The definition of the SUBJECT and the name of ...,...,;,no modifications; consider a different OBJECT ...,None; None,None; None,skos:closeMatch,,,consider a looser predicate such as 'related' ...,The predicate 'skos:exactMatch' is not appropr...,
4,GO:0004416,RHEA:25245,6,0.7,Name: hydroxyacylglutathione hydrolase activit...,Name: (R)-S-lactoylglutathione + H2O = (R)-lac...,The subject is an enzyme activity while the ob...,The SUBJECT describes the enzymatic activity i...,The SUBJECT and OBJECT have similar reactions ...,The SUBJECT and OBJECT aren't exact matches. W...,...,;,None; Include the name of the enzyme catalyzin...,hydroxyacylglutathione hydrolase activity; R-S...,None; Change the reactants and products to mat...,A more appropriate relationship may be 'cataly...,skos:closeMatch,,Change skos:exactMatch to skos:closeMatch,skos:relatedMatch,skos:closeMatch instead of skos:exactMatch
5,GO:0004453,RHEA:16393,6,0.7,Name: juvenile-hormone esterase activity Defin...,Name: H2O + juvenile hormone II = H(+) + juven...,The subject refers to the enzymatic activity o...,The SUBJECT describes an enzyme activity (juve...,The proposed SKOS:exactMatch relationship is i...,Even though both SUBJECT and OBJECT involve re...,...,;,;,None; None,None; Modify the OBJECT to explicitly state th...,skos:closeMatch,Change to 'relatedTo' or 'hasOutput',,skos:closeMatch,skos:exactMatch -> skos:relatedMatch,The PREDICATE skos:exactMatch is not appropria...
6,GO:0004741,RHEA:12669,6,0.7,Name: [pyruvate dehydrogenase (lipoamide)] pho...,Name: H2O + O-phospho-L-seryl-[pyruvate dehydr...,The subject refers to the enzyme activity that...,The subject describes the phosphatase activity...,The proposed mapping between 'pyruvate dehydro...,The SUBJECT and OBJECT are not identical. The ...,...,;,;,None; Modify the OBJECT to describe a reaction...,None; None,skos:closeMatch,,,skos:closeMatch,Modify the predicate to a more appropriate one...,"Replace with a more suitable predicate, such a..."
7,GO:0008353,RHEA:10216,6,0.7,Name: RNA polymerase II CTD heptapeptide repea...,Name: [DNA-directed RNA polymerase] + ATP = AD...,The subject is more specific than the object. ...,The SUBJECT describes an enzyme activity speci...,The proposed exact match between 'RNA polymera...,The SUBJECT and OBJECT are not exact matches a...,...,;,; Should specify the RNA polymerase II large s...,None; Change to 'Phosphorylated RNA polymerase...,; Name: [DNA-directed RNA polymerase] + ATP = ...,skos:closeMatch,,,Replace skos:exactMatch with a more appropriat...,Change to 'skos:related' or 'rdfs:seeAlso' to ...,skos:closeMatch
8,GO:0008422,RHEA:69655,6,0.766667,Name: beta-glucosidase activity Definition: Ca...,Name: H2O + quercetin 3-O-beta-D-glucoside = b...,The subject refers to the general enzymatic ac...,"The SUBJECT describes an enzymatic activity, s...",The mapping is incorrect. The SUBJECT involves...,The SUBJECT and OBJECT are not exact matches. ...,...,;,;,None; Consider changing the object to 'Beta-D-...,None; None,skos:narrowMatch,change to 'relatedMatch' or specify the relati...,,Replace skos:exactMatch with skos:closeMatch,Consider changing the predicate to 'dc:describ...,The predicate skos:closeMatch might be more ap...
9,GO:0008824,RHEA:11120,6,0.7,Name: cyanate hydratase activity Definition: C...,Name: cyanate + 3 H(+) + hydrogencarbonate = 2...,The subject is an enzyme activity while the ob...,The SUBJECT refers to the enzymatic activity t...,The proposed exact match between 'cyanate hydr...,The subject and object do not match exactly as...,...,;,; Should refer to the same chemical reaction a...,None; Change to 'Chemical Reaction: cyanate + ...,None; Modify the object to represent the same ...,skos:closeMatch,,,,Consider changing to 'dct:isVersionOf' or 'dct...,0


In [42]:
df_pivot[df_pivot['problem_sum'] > 4]

model,index,subject_id,object_id,problem_sum,confidence_avg,subject_info,object_info,claude-3-opus-generic,gpt-4-generic,gpt3t-prompted,...,gpt3t-prompted.1,gpt4full-generic,groq-llama-generic,mixtral-prompted,claude-3-opus-generic.1,gpt-4-generic.1,gpt3t-prompted.2,gpt4full-generic.1,groq-llama-generic.1,mixtral-prompted.1
14,14,GO:0000254,RHEA:55220,5,0.683333,Name: C-4 methylsterol oxidase activity\nDefin...,"Name: 4,4-dimethyl-5alpha-cholest-7-en-3beta-o...",1.0,1.0,1.0,...,;,"; 4,4-dimethyl-5-alpha-cholesta-8,24-dien-3-be...",None; None,;,A more appropriate predicate may be 'catalyzes...,,,,skos:closeMatch or a similar predicate would b...,0
23,23,GO:0000514,RHEA:70967,5,0.85,"Name: 3-sulfino-L-alanine: proton, glutamate a...",Name: None\nDefinition: None\nRelationships:,1.0,1.0,1.0,...,;,;,Revise the subject to be a clear and defined e...,None; None,,,,,,
37,37,GO:0001730,RHEA:34407,5,0.616667,Name: 2'-5'-oligoadenylate synthetase activity...,Name: 3 ATP = 5'-triphosphoadenylyl-(2'->5')-a...,0.5,1.0,1.0,...,;,; Define the process catalyzed by 2'-5'-oligoa...,2'-5'-oligoadenylate synthetase activity; 5'-t...,None; Change to '2 ATP = pppA(2'p5'A)n oligome...,skos:closeMatch,,,replace skos:exactMatch with a predicate denot...,skos:narrowMatch,skos:closeMatch
61,61,GO:0003838,RHEA:21128,5,0.85,Name: sterol 24-C-methyltransferase activity\n...,Name: S-adenosyl-L-methionine + zymosterol = f...,1.0,1.0,1.0,...,;,None; Repurpose as a chemical reaction related...,None; Modify to a general description of a che...,;,skos:relatedMatch,,,skos:closeMatch,Change to 'skos:relatedMatch' or 'skos:broadMa...,None needed as skos:exactMatch is appropriate ...
65,65,GO:0003842,RHEA:24882,6,0.7,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,Name: (S)-1-pyrroline-5-carboxylate + 2 H2O + ...,1.0,1.0,1.0,...,;,No modification needed; Change 'NADP(+) = H(+)...,None; Change '(S)-1-pyrroline-5-carboxylate' t...,; Consider changing the object to: 1-pyrroline...,skos:closeMatch,,,No modification needed,Change 'skos:exactMatch' to 'skos:closeMatch' ...,Consider changing the predicate to a more appr...
66,66,GO:0003842,RHEA:24943,5,0.683333,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,"Name: (3R,5S)-1-pyrroline-3-hydroxy-5-carboxyl...",1.0,1.0,1.0,...,;,None; None,None; Modify the object to better match the su...,;,skos:relatedMatch,skos:closeMatch,,Replace skos:exactMatch with skos:closeMatch,Change the predicate to 'skos:relatedMatch' or...,0
67,67,GO:0003842,RHEA:28234,5,0.85,Name: 1-pyrroline-5-carboxylate dehydrogenase ...,Name: L-glutamate 5-semialdehyde = (S)-1-pyrro...,1.0,1.0,1.0,...,;,; Change OBJECT to match the L-glutamate trans...,1-pyrroline-5-carboxylate dehydrogenase activi...,; Remove the equals sign from the beginning of...,skos:relatedMatch,skos:closeMatch,,Change to skos:closeMatch,skos:relatedMatch,
74,74,GO:0003848,RHEA:11412,6,0.7,Name: 2-amino-4-hydroxy-6-hydroxymethyldihydro...,"Name: 6-hydroxymethyl-7,8-dihydropterin + ATP ...",1.0,1.0,1.0,...,;,; The name of the object should match its reac...,2-amino-4-hydroxy-6-hydroxymethyldihydropterid...,None; Add 2-amino-4-hydroxy substituents to th...,skos:closeMatch may be more appropriate than s...,,,skos:closeMatch,skos:narrowerMatch,Change to skos:closeMatch or a similar predica...
77,77,GO:0003851,RHEA:10856,5,0.85,Name: 2-hydroxyacylsphingosine 1-beta-galactos...,Name: a N-(2-hydroxyacyl)sphing-4-enine + UDP-...,1.0,1.0,1.0,...,;,No modifications suggested for SUBJECT; Change...,2-hydroxyacylsphingosine 1-beta-galactosyltran...,None; None,skos:closeMatch,Change 'skos:exactMatch' to a less exact relat...,,Change skos:exactMatch to a predicate that mea...,skos:relatedMatch,0
117,117,GO:0003935,RHEA:23704,5,0.7,Name: GTP cyclohydrolase II activity\nDefiniti...,"Name: GTP + 4 H2O = 2,5-diamino-6-hydroxy-4-(5...",1.0,1.0,1.0,...,;,None; Change '4 H2O' to '3 H2O' in the object ...,None; Change 4 H2O to 3 H2O in the OBJECT's re...,"None; GTP + 4 H2O = 2,5-diamino-6-hydroxy-4-(5...",skos:closeMatch,,,,,0
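
The two filters above use different cut-offs on `problem_sum` (`> 5` and `> 4`). A quick tally of pairs per problem count can help when choosing such a threshold. The sketch below uses a small toy frame rather than the real `df_pivot`, and it assumes `problem_sum` counts how many of the six models flagged a problem. The helper `flagged` is hypothetical.

```python
import pandas as pd

# Toy stand-in for df_pivot; only the columns needed for filtering.
toy = pd.DataFrame({
    "subject_id": ["GO:0000010", "GO:0000016", "GO:0000034", "GO:0003842"],
    "object_id": ["RHEA:27794", "RHEA:10076", "RHEA:23688", "RHEA:24882"],
    "problem_sum": [1, 2, 3, 6],
})

# Number of pairs at each problem count, ordered by count value.
counts = toy["problem_sum"].value_counts().sort_index()


def flagged(df: pd.DataFrame, min_models: int) -> pd.DataFrame:
    """Return rows where at least `min_models` models reported a problem."""
    return df[df["problem_sum"] >= min_models]


# For integer counts, `>= 6` selects the same rows as `> 5`.
unanimous = flagged(toy, 6)
```

Naming the threshold as a parameter makes it easier to compare the "all models agree" set with looser cut-offs without repeating the boolean mask.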


In [46]:
import pandas as pd
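def show(df: pd.DataFrame):
    """
    Print a plain-text report for each subject/object pair.

    For every row this prints the pair identifiers, the subject and object
    descriptions, the problem count and average confidence, and then every
    remaining column as a bullet.

    :param df: pivoted results, one row per mapping
    :return: None
    """
    # Columns already printed in the header of each report
    header_cols = ['subject_id', 'object_id', 'subject_info', 'object_info', 'problem_sum', 'confidence_avg']
    for _, row in df.iterrows():
        print(f"# Pair: {row['subject_id']} - {row['object_id']}")
        print("\nSUBJECT:\n")
        print(row['subject_info'])
        print("\nOBJECT:\n")
        print(row['object_info'])
        print(f"\nPROBLEMS: {row['problem_sum']} (confidence: {row['confidence_avg']})")
        # Remaining columns hold the per-model outputs
        for k, v in row.items():
            if k not in header_cols:
                print(f"* {k}: {v}")
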
show(df_pivot[df_pivot['problem_sum'] > 5])

# Pair: GO:0003842 - RHEA:24882

SUBJECT:

Name: 1-pyrroline-5-carboxylate dehydrogenase activity
Definition: H2O + L-glutamate 5-semialdehyde + NAD+ = 2 H+ + L-glutamate + NADH.
Relationships:

OBJECT:

Name: (S)-1-pyrroline-5-carboxylate + 2 H2O + NADP(+) = H(+) + L-glutamate + NADPH
Definition: None
Relationships:

PROBLEMS: 6 (confidence: 0.7000000000000001)
* index: 65
* claude-3-opus-generic: 1.0
* gpt-4-generic: 1.0
* gpt3t-prompted: 1.0
* gpt4full-generic: 1.0
* groq-llama-generic: 0.1
* mixtral-prompted: 0.1
* claude-3-opus-generic: The subject and object are not an exact match. The subject describes the activity of the enzyme 1-pyrroline-5-carboxylate dehydrogenase, which catalyzes a reaction involving L-glutamate 5-semialdehyde, NAD+, L-glutamate, and NADH. The object, on the other hand, describes a chemical reaction involving (S)-1-pyrroline-5-carboxylate, H2O, NADP+, H+, L-glutamate, and NADPH. While the reactions are related, they differ in substrates and products.
* gpt-