# Check Test Predictions

## Ideas for Improvements

### Prompting with DeBERTa v3
- Can a more explicit prompt help the model?

### First fine-tune the last layer, then the full model

### Larger Models
- Try out Llama Instruct and Mistral; this needs new data loader code.

#### Inference
- Inference can be run layer by layer: https://www.kaggle.com/competitions/kaggle-llm-science-exam/discussion/440620

#### Training
- Freeze the lower layers of the model
    - Can we cycle through training the upper and the lower half of a model?
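
The cycling question above could be explored with a small schedule helper. The sketch below is only an illustration under stated assumptions: `half_for_epoch` and `set_trainable` are hypothetical names, the even/odd alternation is one arbitrary choice, and the code assumes nothing more than that each layer exposes `.parameters()` yielding objects with a `requires_grad` attribute, as PyTorch modules do.

```python
from typing import Iterable, Sequence


def half_for_epoch(epoch: int, n_layers: int) -> range:
    """Return the layer indices to train in this epoch.

    Assumption: even epochs train the upper half (closest to the output),
    odd epochs train the lower half.
    """
    mid = n_layers // 2
    return range(mid, n_layers) if epoch % 2 == 0 else range(0, mid)


def set_trainable(layers: Sequence, active: Iterable[int]) -> None:
    """Freeze every layer except those whose index is in `active`."""
    active = set(active)
    for idx, layer in enumerate(layers):
        for param in layer.parameters():
            # Duck-typed: works for any parameter object with `requires_grad`.
            param.requires_grad = idx in active
```

In a training loop this would be called once per epoch, e.g. `set_trainable(layers, half_for_epoch(epoch, len(layers)))`. For a Hugging Face DeBERTa model, `layers` would probably be `model.deberta.encoder.layer`, but that attribute path is an assumption to check against the loaded model.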

In [2]:
import polars as pl
import numpy as np
from pathlib import Path

In [3]:
run_name = 'deberta_v2_xlarge'
folder_path = Path(f'./data/output/{run_name}')

test_preds = {}
for test_name in ['test', 'daniel', 'osmu_sci_6k', 'mgoksu']:
    file_path = folder_path/f'{test_name}.parquet'
    test_preds[test_name] = pl.read_parquet(file_path)
    print(f'Loaded {test_preds[test_name].shape[0]:,} predictions for test_name {test_name}')

Loaded 200 predictions for test_name test
Loaded 500 predictions for test_name daniel
Loaded 500 predictions for test_name osmu_sci_6k
Loaded 500 predictions for test_name mgoksu


In [4]:
np.array(test_preds['osmu_sci_6k'].columns)

array(['question', 'A', 'C', 'B', 'D', 'E', 'answer', 'passage_id_A_1',
       'answer_passage_score_A_1', 'titled_passage_A_1', 'passage_id_A_2',
       'answer_passage_score_A_2', 'titled_passage_A_2', 'passage_id_B_1',
       'answer_passage_score_B_1', 'titled_passage_B_1', 'passage_id_B_2',
       'answer_passage_score_B_2', 'titled_passage_B_2', 'passage_id_C_1',
       'answer_passage_score_C_1', 'titled_passage_C_1', 'passage_id_C_2',
       'answer_passage_score_C_2', 'titled_passage_C_2', 'passage_id_D_1',
       'answer_passage_score_D_1', 'titled_passage_D_1', 'passage_id_D_2',
       'answer_passage_score_D_2', 'titled_passage_D_2', 'passage_id_E_1',
       'answer_passage_score_E_1', 'titled_passage_E_1', 'passage_id_E_2',
       'answer_passage_score_E_2', 'titled_passage_E_2',
       'best_answer_match', 'second_answer_context', 'answer_score',
       'best_answer_letter', 'best_answer_id', 'second_answer_id',
       'question_ctx_id', 'titled_passage', 'all_prediction'

## Idea Log
- Easy: present question and answer together for answer context matching.
- Hard: use an LLM to highlight the differences between the answers.
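
A minimal sketch of the easy idea: build one retrieval query per answer option by joining the question and the answer text, so the context match sees both. `build_match_queries` is a hypothetical helper, and the sample question and answers are made up for illustration. How the resulting queries are embedded or scored is left open.

```python
def build_match_queries(question: str, answers: dict[str, str]) -> dict[str, str]:
    """Return one query string per answer letter: question followed by answer."""
    return {
        letter: f"{question.strip()} {answer.strip()}"
        for letter, answer in answers.items()
    }


# Made-up example row; real rows would come from the 'question' and 'A'..'E' columns.
queries = build_match_queries(
    "Which particle carries the electromagnetic force?",
    {"A": "The photon", "B": "The gluon"},
)
print(queries["A"])
```

Short answers such as "The photon" carry little context on their own, which is why joining them with the question may give the passage matcher more to work with.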

## Test Check Log
1. The 3 answers picked are all true, but 2 are more specific, and the top one is named first in the context.
2. Picked a true answer that is less specific.
3. IVCD: The context is missing specific information; the question ctx is the same as answer ctx 2.
4. Partial fractions in complex analysis: Parsing problem; `frac1` needs to be replaced with `1/`.
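
The parsing fix from item 4 might look like the sketch below. The exact shape of the artifact is not recorded in these notes, so this assumes the literal token `frac` (optionally with a leading backslash) directly followed by a single-digit numerator, as in `frac1z`. `repair_frac` is a hypothetical helper name.

```python
import re

# Assumption: the artifact is "frac" (optionally "\frac") followed by one digit.
# The lookbehind avoids touching words that merely contain "frac".
FRAC_ARTIFACT = re.compile(r"(?<![A-Za-z])\\?frac\s*(\d)")


def repair_frac(text: str) -> str:
    """Rewrite 'frac<digit>' remnants as '<digit>/'."""
    return FRAC_ARTIFACT.sub(r"\1/", text)


print(repair_frac("the residue of frac1z at zero"))
```

Multi-digit numerators or braces that survived parsing would need a broader pattern. It is worth inspecting the affected passages before applying any replacement in bulk.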


## Osmu Check Log
1. Led astray by the answer match; the question passage had the right info. The answers are short and don't contain the context of the question.
2. Selected the other true but less specific answer.
3. Selected the other true but less specific answer.


In [5]:
pl.Config(fmt_str_lengths=2000)
comp_cols = ['question', 'A', 'B', 'C', 'D', 'E', 'answer', 'all_prediction', 
             'best_answer_letter', 'best_answer_match', 'second_answer_context', 'titled_passage']

In [9]:
test_preds['test'].filter(pl.col('best_answer_match') == pl.col('titled_passage'))[comp_cols].shape

(10, 12)

In [1]:
#test_preds['osmu_sci_6k'].filter(pl.col('answer') != pl.col('prediction'))[comp_cols]

In [8]:
wiki_passages = pl.read_parquet('./data/wiki_with_category_v_4.parquet')

In [11]:
wiki_passages.filter(pl.col('title') == 'Intraventricular block')[['title', 'section_title']]

title,section_title
str,str
"""Intraventricular block""","""Summary"""
"""Intraventricular block""","""Types"""
"""Intraventricular block""","""Intraventricular conduction delay"""
