<a href="https://colab.research.google.com/github/automix-llm/automix/blob/main/colabs/Step3_MetaVerify.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AutoMix: Solving the task

- This is the third and final step of the process. This step assumes we have access to verifier confidence. We run inference on both the 13b and 70b models for all tasks. Note that in practice we don't have to run inference on both models for every query; doing so here is just for ease of implementation.

*Note: The outputs of this step are provided [here](data/automix_release_with_decision.jsonl).*
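To illustrate the point that both models need not be run on every query, here is a minimal sketch of lazy routing. The callables `slm`, `llm`, and `verifier`, and the helper `answer_with_routing`, are hypothetical stand-ins and not part of the `automix` library; the strict `<` comparison is also an assumption.

```python
def answer_with_routing(question, slm, llm, verifier, threshold):
    """Call the small model first; call the large model only if the verifier is unsure."""
    draft = slm(question)
    confidence = verifier(question, draft)
    if confidence < threshold:           # low confidence: pay for the large model
        return llm(question), True
    return draft, False                  # high confidence: keep the small model's answer


# Toy stand-ins (hypothetical) that count how often each model is called.
calls = {"slm": 0, "llm": 0}

def toy_slm(question):
    calls["slm"] += 1
    return f"slm answer to {question!r}"

def toy_llm(question):
    calls["llm"] += 1
    return f"llm answer to {question!r}"

def toy_verifier(question, draft):
    return 0.9 if "easy" in question else 0.2

easy_answer, easy_routed = answer_with_routing("easy question", toy_slm, toy_llm, toy_verifier, 0.625)
hard_answer, hard_routed = answer_with_routing("hard question", toy_slm, toy_llm, toy_verifier, 0.625)
print(easy_routed, hard_routed, calls)
```

In this toy run the large model is called once for two queries, which is where the cost saving comes from.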

## TODOs:
1. Data-wise results -> @Pranjal
2. Beautify the notebook a bit more
3. Possibly provide a visualization of the routing decisions

In [1]:
import json
import pandas as pd
import numpy as np
from automix import POMDP, Threshold, SelfConsistency, Automix

## Read and Split Data

In [22]:
# TODO: @Pranjal, please change the file path
llama2_outputs = pd.read_json("data/automix_release_with_decision.jsonl", lines=True, orient="records")
train_outputs = llama2_outputs[llama2_outputs['split'] == 'train']
# The 'val' split is used as the held-out evaluation set
test_outputs  = llama2_outputs[llama2_outputs['split'] == 'val']

## Instantiate AutoMix Object

In [23]:
# Create separate AutoMix variants
threshold_variant = Automix(Threshold(num_bins=8))
sc_variant = Automix(SelfConsistency(num_bins=8))
pomdp_variant = Automix(POMDP(num_bins=8))

## Train AutoMix

In [24]:
threshold_variant.train(train_outputs)
sc_variant.train(train_outputs)
pomdp_variant.train(train_outputs)

0.625 0.08105463689324811
0.5 0.07049058864861137
((0, 0, 1, 1, 0, 0, 0, 0, 0), 0) 0.19449230430671569
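Each line printed by `train` starts with the selected parameter (0.625 and 0.5 are multiples of 1/8, matching `num_bins=8`). One plausible reading is that training runs a grid search over binned thresholds and keeps the one with the best cost-adjusted score, and that the second number is that score. The sketch below shows such a search under stated assumptions: the cost model (small model cost 1, large model cost 50), the strict `<` routing rule, and the helper names `ibc_lift` and `best_threshold` are all hypothetical and may differ from what the library actually does.

```python
import numpy as np

def ibc_lift(perf_slm, cost_slm, perf_llm, cost_llm, perf_mix, cost_mix):
    """Relative gain of the mixed system's performance-per-cost slope over the always-LLM slope."""
    base_slope = (perf_llm - perf_slm) / (cost_llm - cost_slm)
    mix_slope = (perf_mix - perf_slm) / (cost_mix - cost_slm)
    return (mix_slope - base_slope) / base_slope

def best_threshold(conf, slm_perf, llm_perf, slm_cost=1.0, llm_cost=50.0, num_bins=8):
    """Grid-search thresholds k/num_bins and return (threshold, lift) with the highest lift."""
    conf, slm_perf, llm_perf = map(np.asarray, (conf, slm_perf, llm_perf))
    best = (None, -np.inf)
    for k in range(1, num_bins + 1):
        t = k / num_bins
        route = conf < t                       # queries sent to the large model
        if not route.any():
            continue                           # nothing routed: slope is undefined
        perf_mix = np.where(route, llm_perf, slm_perf).mean()
        cost_mix = slm_cost + llm_cost * route.mean()
        lift = ibc_lift(slm_perf.mean(), slm_cost, llm_perf.mean(), llm_cost, perf_mix, cost_mix)
        if lift > best[1]:                     # ties keep the smaller threshold
            best = (t, lift)
    return best

# Toy data (illustrative only, not taken from the AutoMix release).
conf     = [0.0, 0.25, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
slm_perf = [0, 0, 0, 1, 1, 1, 1, 1]
llm_perf = [0, 1, 1, 1, 1, 1, 1, 1]
threshold, lift = best_threshold(conf, slm_perf, llm_perf)
print(threshold, lift)
```

On this toy data the search settles on the threshold that routes exactly the queries the large model can fix; routing more than that only adds cost.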


## Visualize Learned Parameters

In [25]:
print(f'Automix w/ Thresholding : Verifier Confidence below {threshold_variant.best_param} are routed to LLM')
print(f'Automix w/ Self-Consistency : Verifier Confidence below {sc_variant.best_param} are routed to LLM')

Automix w/ Thresholding : Verifier Confidence below 0.625 are routed to LLM
Automix w/ Self-Consistency : Verifier Confidence below 0.5 are routed to LLM


In [26]:
# TODO: Improve Visualization?
# TODO: @Pranjal: Visualization directly in library
print(f'Automix w/ POMDP : Verifier confidences routed to LLM: {", ".join(str(i * (1 / 8)) for i, x in enumerate(pomdp_variant.best_param[0]) if x == 1)}')

Automix w/ POMDP : Verifier confidences routed to LLM: 0.25, 0.375
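The decoding in the cell above hard-codes `1/8`, which only matches `num_bins=8`. A small helper makes the mapping from the learned binary mask to confidence values explicit. `routed_confidences` is a hypothetical name, not a library function, and the mask below is copied from the first element of the tuple printed during training.

```python
def routed_confidences(mask, num_bins):
    """Return the confidence value i / num_bins for every position i flagged with 1."""
    return [i / num_bins for i, flag in enumerate(mask) if flag == 1]

# Mask printed by pomdp_variant.train(...) in this notebook
mask = (0, 0, 1, 1, 0, 0, 0, 0, 0)
routed = routed_confidences(mask, num_bins=8)
print(", ".join(str(c) for c in routed))
```

Unlike the two threshold-based variants, the flagged values need not form a contiguous range starting at 0, so the POMDP variant can choose not to route the lowest-confidence queries.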


## Run Inference on Test Set

In [36]:
threshold_results = threshold_variant.evaluate(test_outputs, return_dict = True)
sc_results = sc_variant.evaluate(test_outputs, return_dict = True)
pomdp_results = pomdp_variant.evaluate(test_outputs, return_dict = True)

In [None]:
# evaluate() also returns the per-example meta-verifier decisions ('route_to_llm');
# drop them so that only the summary metrics are tabulated below

del threshold_results['route_to_llm']
del sc_results['route_to_llm']
del pomdp_results['route_to_llm']

## Display the Results

In [41]:
# Collect the results into a pandas DataFrame (one row per variant)
data = {
    'Threshold Variant': threshold_results,
    'SC Variant': sc_results,
    'POMDP Variant': pomdp_results,
}
df = pd.DataFrame(data).transpose()
print(df)

                   ibc_lift  automix_slm_slope  avg_performance   avg_cost
Threshold Variant  0.086537           0.002572         0.401705  31.541555
SC Variant         0.099600           0.002603         0.384088  24.410953
POMDP Variant      0.051105           0.002488         0.353582  13.231213
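A quick consistency check helps in reading the `ibc_lift` and `automix_slm_slope` columns. Assuming `ibc_lift` is the relative gain of `automix_slm_slope` over the slope of the always-LLM baseline, that is `lift = (slope - baseline) / baseline`, every row should imply the same baseline slope. This relation is an assumption inferred from the column names, not something the notebook states; the numbers are copied from the table above.

```python
# (ibc_lift, automix_slm_slope) copied from the results table
table = {
    "Threshold Variant": (0.086537, 0.002572),
    "SC Variant": (0.099600, 0.002603),
    "POMDP Variant": (0.051105, 0.002488),
}

def implied_baseline_slope(lift, slope):
    # Assumed relation: lift = (slope - baseline) / baseline
    return slope / (1.0 + lift)

baselines = {
    name: implied_baseline_slope(lift, slope)
    for name, (lift, slope) in table.items()
}
for name, baseline in baselines.items():
    print(f"{name:18s} implied baseline slope = {baseline:.6f}")
```

The three implied values agree to within rounding (about 0.00237), which supports this reading: each variant gains more performance per unit of extra cost than routing every query to the LLM, with the SC variant showing the largest lift on this split.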
