<a href="https://colab.research.google.com/github/automix-llm/automix/blob/main/colabs/Step3_MetaVerify.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AutoMix: Solving the task

- This is the third and final step of the process. It assumes we already have access to the verifier confidences. We run inference on both the 13b and 70b models for all tasks. Note that in practice we don't have to run inference on both models for every query; we do so here only for ease of implementation.
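
The remark about not needing both models can be illustrated with a minimal sketch. The callables `small_model`, `large_model`, and `verifier` below are hypothetical stand-ins (they are not part of the `automix` package), and the fixed `threshold` is only an example value. The point is that the larger model is invoked only when the verifier confidence is low:

```python
from typing import Callable, Tuple


def route_query(
    query: str,
    small_model: Callable[[str], str],
    large_model: Callable[[str], str],
    verifier: Callable[[str, str], float],
    threshold: float = 0.5,
) -> Tuple[str, bool]:
    """Return (answer, used_large_model) for a single query.

    All three callables are hypothetical placeholders for real model calls.
    """
    draft = small_model(query)           # always run the cheap model first
    confidence = verifier(query, draft)  # confidence in the draft, in [0, 1]
    if confidence < threshold:
        # Low confidence: pay for the larger model.
        return large_model(query), True
    # High confidence: keep the cheap answer; the large model is never called.
    return draft, False


# Toy stand-ins so the sketch runs end to end.
calls = {"large": 0}


def toy_small(q: str) -> str:
    return "small:" + q


def toy_large(q: str) -> str:
    calls["large"] += 1
    return "large:" + q


def toy_verifier(q: str, answer: str) -> float:
    return 0.9 if "easy" in q else 0.1


easy_answer = route_query("easy question", toy_small, toy_large, toy_verifier)
hard_answer = route_query("hard question", toy_small, toy_large, toy_verifier)
```

In this toy run only the second query reaches `toy_large`, which is where the cost saving comes from.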

*Note: The outputs of this step are provided [here](data/automix_release_with_decision.jsonl).*

In [2]:
import json
import pandas as pd
import numpy as np
from scipy.ndimage import gaussian_filter

from automix import POMDP, Threshold, SelfConsistency, Automix

## Read and Split Data

In [28]:
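# Load the released outputs (one JSON record per line) and split them by the 'split' column
llama2_outputs = pd.read_json("../data/automix_release_with_decision.jsonl", lines=True, orient="records")
train_outputs = llama2_outputs[llama2_outputs['split'] == 'train']
# The 'val' split is used as the held-out test set in this notebook
test_outputs  = llama2_outputs[llama2_outputs['split'] == 'val']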

In [29]:
# Let's just use the CNLI dataset
train_outputs = train_outputs[train_outputs['dataset'] == 'cnli']
test_outputs  = test_outputs[test_outputs['dataset'] == 'cnli']

## Instantiate AutoMix Object

In [30]:
# Create separate AutoMix variants
threshold_variant = Automix(Threshold(num_bins=8))
sc_variant = Automix(SelfConsistency(num_bins=8))
pomdp_variant = Automix(POMDP(num_bins=8))

## Train AutoMix

In [31]:
threshold_variant.train(train_outputs)
sc_variant.train(train_outputs)
pomdp_variant.train(train_outputs)

## Visualize Learned Parameters

In [32]:
print(f'Automix w/ Thresholding : Verifier confidences below {threshold_variant.best_param} are routed to the LLM')
print(f'Automix w/ Self-Consistency : Verifier confidences below {sc_variant.best_param} are routed to the LLM')

Automix w/ Thresholding : Verifier confidences below 1.0 are routed to the LLM
Automix w/ Self-Consistency : Verifier confidences below 0.5 are routed to the LLM


In [37]:
# best_param[0] is treated here as a 0/1 flag per confidence bin; index i maps to confidence i/8
routed = [str(i * (1 / 8)) for i, x in enumerate(pomdp_variant.best_param[0]) if x == 1]
print(f'Automix w/ POMDP : Verifier confidences routed to the LLM are: {", ".join(routed)}')

Automix w/ POMDP : Verifier confidences routed to the LLM are: 0.625, 0.75, 0.875, 1.0
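
The decoding in the cell above can be written as a small helper that does not hard-code the number of bins. This is only a sketch: `routed_confidences` and `toy_mask` are hypothetical names, and the mask is an illustrative example, not the actual value of `pomdp_variant.best_param[0]`. It assumes entry `i` of the mask corresponds to confidence `i / num_bins`, matching the bin-index-to-confidence expression used above:

```python
def routed_confidences(mask, num_bins=8):
    """Map a 0/1 routing mask over confidence bins to confidence values.

    Assumes entry i of the mask stands for confidence i / num_bins.
    """
    return [i / num_bins for i, flag in enumerate(mask) if flag == 1]


# Illustrative mask with 9 entries (confidences 0/8 through 8/8).
toy_mask = [0, 0, 0, 0, 0, 1, 1, 1, 1]
levels = routed_confidences(toy_mask)
print(levels)  # [0.625, 0.75, 0.875, 1.0]
```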


## Run Inference on Test Set

In [34]:
threshold_results = threshold_variant.evaluate(test_outputs, return_dict=True)
sc_results = sc_variant.evaluate(test_outputs, return_dict=True)
pomdp_results = pomdp_variant.evaluate(test_outputs, return_dict=True)

## Display the Results

In [35]:
# Collect the metrics of each variant into a DataFrame (one row per variant)
data = {
    'Threshold Variant': threshold_results,
    'SC Variant': sc_results,
    'POMDP Variant': pomdp_results,
}
df = pd.DataFrame(data).transpose()
print(df)

                   ibc_lift  automix_slm_slope  avg_performance   avg_cost
Threshold Variant -0.039216           0.003025         0.555448  52.000000
SC Variant        -0.169584           0.002615         0.522662  47.467695
POMDP Variant      0.882127           0.005926         0.433944   6.532305
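
To help read the `ibc_lift` column, here is a minimal sketch of an incremental-benefit-per-cost (IBC) lift calculation. It assumes the common definition in which a method's performance gain per unit of extra cost over the small model (SLM) is compared against the same ratio for always using the large model (LLM). The function names and all numbers below are hypothetical, and this is not the `automix` implementation:

```python
def ibc(perf, cost, slm_perf, slm_cost):
    """Performance gained per unit of extra cost, relative to the SLM alone."""
    return (perf - slm_perf) / (cost - slm_cost)


def ibc_lift(method_perf, method_cost, slm_perf, slm_cost, llm_perf, llm_cost):
    """Relative improvement of the method's IBC over the always-LLM baseline."""
    base = ibc(llm_perf, llm_cost, slm_perf, slm_cost)
    method = ibc(method_perf, method_cost, slm_perf, slm_cost)
    return (method - base) / base


# Hypothetical numbers, chosen only to exercise the formula.
slm_perf, slm_cost = 0.40, 1.0
llm_perf, llm_cost = 0.60, 50.0

lift = ibc_lift(0.50, 20.0, slm_perf, slm_cost, llm_perf, llm_cost)
always_llm = ibc_lift(llm_perf, llm_cost, slm_perf, slm_cost, llm_perf, llm_cost)
print(round(lift, 4), always_llm)
```

Under this reading, a lift of zero means the method is no more cost-efficient than always calling the LLM, a positive lift means more benefit per unit of cost, and a negative lift means less.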
