# Full Text Search + Extraction, pt ii

In this notebook I reconnect full text search (FTS) with the GPT extraction function and evaluate the whole pipeline across single- and multi-group studies.

To improve selection of the correct chunk, we'll use the following logic:

- Attempt to use a heuristic to narrow down the chunk, and search across the entire document as a fallback
- Feed one section to GPT and, if nothing is returned, try the next-nearest section
- Evaluate using various output "cleaning rules"
- Try GPT-4 as well to see whether it improves results
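
The "try the next-nearest section" step can be sketched as a small loop. This is only an illustration of the idea, not the code inside `search_extract`: `extract_with_fallback`, `toy_extract`, and `max_attempts` are hypothetical names, and the toy extractor stands in for the GPT call.

```python
from typing import Callable, Optional, Sequence


def extract_with_fallback(
    ranked_sections: Sequence[str],
    extract: Callable[[str], Optional[dict]],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Try sections in ranked order until one yields a non-empty result."""
    for section in ranked_sections[:max_attempts]:
        result = extract(section)
        if result:  # None or an empty dict counts as "nothing returned"
            return result
    return None


def toy_extract(section: str) -> Optional[dict]:
    """Hypothetical stand-in for the GPT call: finds '<number> participants'."""
    words = section.split()
    for number, noun in zip(words, words[1:]):
        if number.isdigit() and noun.startswith("participant"):
            return {"count": int(number)}
    return None


# Sections are assumed to be sorted by similarity to the query already.
sections = [
    "We thank the funding agency for its support.",  # nearest chunk, no answer
    "A total of 24 participants were recruited.",    # next-nearest chunk
]
result = extract_with_fallback(sections, toy_extract)
print(result)
```

Capping the number of attempts keeps the cost bounded: each extra attempt is another model call, so a small `max_attempts` trades recall for API usage.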

In [1]:
import pandas as pd
import numpy as np
from labelrepo.projects.participant_demographics import get_participant_demographics

# Load annotations
subgroups = get_participant_demographics(include_locations=True)

# Load multi group as well
jerome_pd = subgroups[(subgroups.project_name == 'participant_demographics') & \
                      (subgroups.annotator_name == 'Jerome_Dockes')]

# Subset annotation df to only include studies with body annotations
subset_cols = ['count', 'diagnosis', 'group_name', 'subgroup_name', 'male count',
       'female count', 'age mean', 'age minimum', 'age maximum',
       'age median', 'pmcid']
jerome_pd_subset = jerome_pd[subset_cols].sort_values('pmcid')

## Search and extract across all documents

In [2]:
import openai
import pickle

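# Read the API key and the cached embeddings, closing each file after use
with open('/home/zorro/.keys/open_ai.key') as f:
    openai.api_key = f.read().strip()
with open('data/all_embeddings.pkl', 'rb') as f:
    all_embeddings = pickle.load(f)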

In [3]:
from extract import search_extract
from templates import ZERO_SHOT_MULTI_GROUP
from evaluate import evaluate_predictions, clean_predictions

query = 'How many participants or subjects were recruited for this study?' 

### No heuristic

In [4]:
# predictions_full_search = search_extract(all_embeddings, query, **ZERO_SHOT_MULTI_GROUP, num_workers=3)
# predictions_full_search.to_csv('data/predictions_full_search.csv', index=False)
predictions_full_search = pd.read_csv('data/predictions_full_search.csv')
# Clean predictions
clean_preds = clean_predictions(predictions_full_search)

In [5]:
ix_corr_n_groups, ix_more_groups, ix_less_groups = evaluate_predictions(clean_preds, jerome_pd_subset)

Exact match # of groups: 0.84
 More groups predicted: 0.09
 Less groups predicted: 0.07
 Missing pmcids: set()

Column wise comparison of predictions and annotations (error):

{'age maximum': 0.31,
 'age mean': 0.32,
 'age median': 0.21,
 'age minimum': 0.32,
 'count': 0.31,
 'diagnosis': 0.45,
 'female count': 0.37,
 'group_name': 0.21,
 'male count': 0.33,
 'subgroup_name': 0.98}

Percentage response given by pmcid:

{'age maximum': 0.7,
 'age mean': 0.75,
 'age median': 0.04,
 'age minimum': 0.7,
 'count': 1.0,
 'female count': 0.73,
 'male count': 0.78}

Summed Mean percentage error:

{'age maximum': 0.0,
 'age mean': 0.14,
 'age median': 0.0,
 'age minimum': 0.0,
 'count': 0.13,
 'female count': 0.06,
 'male count': 0.1}

Averaged Mean percentage error:

{'age maximum': 0.06,
 'age mean': 0.18,
 'age median': 0.0,
 'age minimum': 0.08,
 'count': 0.17,
 'female count': 0.09,
 'male count': 0.14}
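
For readers unfamiliar with the group-count lines at the top of this output, here is one plausible way such proportions could be computed. It is a sketch under assumptions, not the code in `evaluate_predictions`: it assumes one row per group keyed by pmcid, and that documents with no prediction are reported separately rather than counted as errors. `compare_group_counts` and the ids below are made up for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def compare_group_counts(
    pred_ids: Iterable[Hashable], true_ids: Iterable[Hashable]
) -> Dict[str, object]:
    """Compare the number of groups per document in predictions vs annotations."""
    n_pred = Counter(pred_ids)  # groups predicted per document
    n_true = Counter(true_ids)  # groups annotated per document
    missing = set(n_true) - set(n_pred)  # annotated documents with no prediction
    shared = set(n_true) & set(n_pred)
    diffs = [n_pred[i] - n_true[i] for i in shared]
    total = max(len(diffs), 1)  # avoid dividing by zero on empty input
    return {
        "exact": sum(d == 0 for d in diffs) / total,
        "more": sum(d > 0 for d in diffs) / total,
        "fewer": sum(d < 0 for d in diffs) / total,
        "missing": missing,
    }


# Toy ids, one entry per group row (not real pmcids).
true_ids = [101, 101, 102, 103, 104]
pred_ids = [101, 101, 102, 102, 103]
summary = compare_group_counts(pred_ids, true_ids)
print(summary)
```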


### Heuristic - methods

Uses a heuristic to find the Methods section, then runs full text search within it using the query.
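
As a rough illustration of what a section heuristic of this kind can look like, the sketch below keeps only sections whose heading looks like a Methods heading and falls back to the whole document when none match. This is an assumption about the general approach, not the logic behind `heuristic_strategy='methods'`; `select_sections` and `METHODS_HEADING` are hypothetical, and the pattern would need tuning on real papers.

```python
import re
from typing import List, Tuple

Section = Tuple[str, str]  # (heading, body)

# Hypothetical heading pattern; chosen for illustration only.
METHODS_HEADING = re.compile(r"\b(methods?|participants|subjects)\b", re.IGNORECASE)


def select_sections(
    sections: List[Section], pattern: re.Pattern = METHODS_HEADING
) -> List[Section]:
    """Keep sections whose heading matches; otherwise return every section."""
    matched = [(heading, body) for heading, body in sections if pattern.search(heading)]
    return matched if matched else list(sections)


doc = [
    ("Introduction", "Prior work has shown..."),
    ("2. Materials and Methods", "We recruited 20 subjects."),
    ("Results", "Activation was observed..."),
]
no_methods_doc = [
    ("Abstract", "..."),
    ("Results", "Thirty participants completed the task."),
]

methods_only = select_sections(doc)
whole_document = select_sections(no_methods_doc)
print([heading for heading, _ in methods_only])
```

Note that this fallback only triggers when no heading matches. It does not help when a Methods section exists but the demographic details are reported elsewhere.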

In [7]:
predictions_methods_fts = search_extract(all_embeddings, query, heuristic_strategy='methods', **ZERO_SHOT_MULTI_GROUP, num_workers=3)
predictions_methods_fts.to_csv('data/predictions_methods_fts.csv', index=False)
# predictions_methods_fts = pd.read_csv('data/predictions_methods_fts.csv')
# Clean predictions
predictions_methods_fts_clean = clean_predictions(predictions_methods_fts)

100%|█████████████████████████████████████████| 153/153 [02:51<00:00,  1.12s/it]


In [8]:
ix_corr_n_groups, ix_more_groups, ix_less_groups = evaluate_predictions(predictions_methods_fts_clean, jerome_pd_subset)

Exact match # of groups: 0.82
 More groups predicted: 0.11
 Less groups predicted: 0.07
 Missing pmcids: {5416685, 4352055}

Column wise comparison of predictions and annotations (error):

{'age maximum': 0.29,
 'age mean': 0.31,
 'age median': 0.17,
 'age minimum': 0.29,
 'count': 0.32,
 'diagnosis': 0.45,
 'female count': 0.37,
 'group_name': 0.22,
 'male count': 0.33,
 'subgroup_name': 0.99}

Percentage response given by pmcid:

{'age maximum': 0.71,
 'age mean': 0.77,
 'age median': 0.04,
 'age minimum': 0.72,
 'count': 1.0,
 'female count': 0.73,
 'male count': 0.77}

Summed Mean percentage error:

{'age maximum': 0.0,
 'age mean': 0.14,
 'age median': 0.0,
 'age minimum': 0.0,
 'count': 0.15,
 'female count': 0.06,
 'male count': 0.1}

Averaged Mean percentage error:

{'age maximum': 0.07,
 'age mean': 0.18,
 'age median': 0.0,
 'age minimum': 0.08,
 'count': 0.17,
 'female count': 0.09,
 'male count': 0.13}


### Heuristic - demographics section

In [9]:
predictions_demographics_fts = search_extract(all_embeddings, query, heuristic_strategy='demographics', **ZERO_SHOT_MULTI_GROUP, num_workers=3)
predictions_demographics_fts.to_csv('data/predictions_demographics_fts.csv', index=False)
# predictions_demographics_fts = pd.read_csv('data/predictions_demographics_fts.csv')
# Clean predictions
predictions_demographics_fts_clean = clean_predictions(predictions_demographics_fts)

100%|█████████████████████████████████████████| 153/153 [02:53<00:00,  1.13s/it]


In [10]:
ix_corr_n_groups, ix_more_groups, ix_less_groups = evaluate_predictions(predictions_demographics_fts_clean, jerome_pd_subset)

Exact match # of groups: 0.81
 More groups predicted: 0.12
 Less groups predicted: 0.07
 Missing pmcids: {5460048, 5416685, 4352055}

Column wise comparison of predictions and annotations (error):

{'age maximum': 0.3,
 'age mean': 0.32,
 'age median': 0.2,
 'age minimum': 0.31,
 'count': 0.32,
 'diagnosis': 0.48,
 'female count': 0.38,
 'group_name': 0.23,
 'male count': 0.34,
 'subgroup_name': 0.99}

Percentage response given by pmcid:

{'age maximum': 0.68,
 'age mean': 0.75,
 'age median': 0.04,
 'age minimum': 0.69,
 'count': 1.0,
 'female count': 0.72,
 'male count': 0.78}

Summed Mean percentage error:

{'age maximum': 0.0,
 'age mean': 0.14,
 'age median': 0.0,
 'age minimum': 0.01,
 'count': 0.15,
 'female count': 0.07,
 'male count': 0.1}

Averaged Mean percentage error:

{'age maximum': 0.04,
 'age mean': 0.17,
 'age median': 0.0,
 'age minimum': 0.06,
 'count': 0.23,
 'female count': 0.08,
 'male count': 0.13}


### GPT4 - No heuristic

In [None]:
predictions_gpt4 = search_extract(all_embeddings, query, **ZERO_SHOT_MULTI_GROUP, num_workers=1, model_name='gpt-4')
# Clean predictions
predictions_gpt4 = clean_predictions(predictions_gpt4)

  3%|█▍                                         | 5/153 [01:03<27:42, 11.23s/it]

## Conclusion

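The heuristics don't seem to improve things much: the exact group-count match goes from 0.84 with no heuristic to 0.82 (methods) and 0.81 (demographics), and the column-wise errors are nearly unchanged. They do increase the chance that we don't find anything at all (2 and 3 missing pmcids respectively, versus none for the full search), which happens when the info is in the Results section.

We could modify the heuristic to fall back onto the entire document if extraction comes back null, but since it doesn't improve predictions much otherwise, I'm not sure it's worthwhile.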
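
If the fallback were ever wanted, a cheap variant would be to backfill from the no-heuristic predictions that are already saved, rather than re-running extraction. The sketch below shows the idea on plain lists of row dicts; `backfill_missing` is a hypothetical helper, the rows are toy data, and it assumes both prediction tables carry a `pmcid` column.

```python
from typing import Dict, List

Row = Dict[str, object]


def backfill_missing(
    heuristic_rows: List[Row], full_rows: List[Row], key: str = "pmcid"
) -> List[Row]:
    """Add full-search rows only for documents the heuristic run missed."""
    found = {row[key] for row in heuristic_rows}
    extra = [row for row in full_rows if row[key] not in found]
    return heuristic_rows + extra


# Toy rows (not real predictions): document 2 is missing from the heuristic run.
heuristic_rows = [{"pmcid": 1, "count": 20}]
full_rows = [
    {"pmcid": 1, "count": 22},
    {"pmcid": 2, "count": 35},
    {"pmcid": 2, "count": 30},
]
merged = backfill_missing(heuristic_rows, full_rows)
print(merged)
```

With DataFrames the same idea would be `pd.concat([heuristic, full[~full['pmcid'].isin(heuristic['pmcid'])]])`. Heuristic rows are kept untouched for documents that already have them, so this only changes the "Missing pmcids" set.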

# Manual results revision