This notebook shows how to use Smoothie. 

What you will need for a given task:
* A list of sample inputs (`test_inputs`). In this tutorial, we load these from a jsonl file in `smoothie_data`.
* A set of models to route among, or more precisely their generations for `test_inputs`. In this tutorial, each model's generations were saved beforehand in a separate JSON file; we load and stack them into a numpy array `test_generations` of shape `n_samples` x `n_models`.
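As a rough illustration of the shapes these two inputs are expected to have, the sketch below builds toy stand-ins. The strings and the names `toy_models`, `model_a`, and `model_b` are hypothetical placeholders, not data from `smoothie_data`.

```python
import numpy as np

# Hypothetical toy data: 3 inputs and 2 models (placeholder names).
test_inputs = ["article one", "article two", "article three"]
toy_models = ["model_a", "model_b"]

# One generation per (sample, model) pair: row = sample, column = model.
test_generations = np.array([
    ["summary 1a", "summary 1b"],
    ["summary 2a", "summary 2b"],
    ["summary 3a", "summary 3b"],
])

n_samples, n_models = test_generations.shape
assert n_samples == len(test_inputs)
assert n_models == len(toy_models)
```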


We will walk through an example on CNN/DailyMail. To follow along, download `smoothie_data` from Hugging Face, `cd` into the directory, and run `git lfs pull`.

If you are interested in the mathematical details of the Smoothie algorithm, see `algorithm.ipynb`.

In [1]:
import jsonlines
import json 
import numpy as np
from sentence_transformers import SentenceTransformer
from fastembed import TextEmbedding
from sklearn.neighbors import NearestNeighbors

import sys 
sys.path.append("..")
from src.model import Smoothie



Load and format data

In [2]:
# load test_inputs for the task 
with jsonlines.open("tutorial_data/datasets/cnn_dailymail_test.jsonl") as file: 
    test_dataset = list(file.iter())
test_inputs = [sample['embedding_input'] for sample in test_dataset] # get the raw inputs for the task (no formatting)

n_samples = len(test_inputs)

In [3]:
# load test_generations, numpy array (n_samples x n_models) of generations

models = ["mistral-7b", "llama-2-7b", "vicuna-7b", "gemma-7b", "nous-capybara"]
n_models = len(models)
test_generations = []
for model in models:
    predictions_path = f"tutorial_data/generations/cnn_dailymail/{model}_test.json"
    with open(predictions_path, "r") as f:
        test_generations.append(json.load(f)['generations'])

test_generations = np.array(test_generations).T
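The transpose at the end of the cell above matters because the loop collects one list per model. A minimal sketch with hypothetical strings shows how the axes are swapped so that each row corresponds to a sample:

```python
import numpy as np

# Hypothetical per-model lists, shaped like the result of the loading loop:
per_model = [
    ["m0 on sample 0", "m0 on sample 1", "m0 on sample 2"],  # model 0
    ["m1 on sample 0", "m1 on sample 1", "m1 on sample 2"],  # model 1
]

stacked = np.array(per_model)   # shape (n_models, n_samples)
generations = stacked.T         # shape (n_samples, n_models)

# Row i now holds every model's output for sample i.
print(generations.shape)
print(generations[1])
```

This assumes every model produced exactly one generation per sample; lists of unequal length would not form a 2-D array.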

In [4]:
# embed test_inputs for sample-dependent routing
# smoothie-dependent runs KNN on these embeddings to pick which samples are used to learn the Smoothie weights for a given test sample

model_name = "all-mpnet-base-v2"
model = SentenceTransformer(model_name)

test_input_embeddings = model.encode(test_inputs)




In [7]:
# embed test_generations --- these are the embeddings used in the main Smoothie algorithm
def clean_generation(generation: str):
    """
    Extracts a generation from the full output of the model.
    """
    generation = generation.replace("<pad>", "")
    generation = generation.replace("<s>", "")
    generation = generation.replace("</s>", "")
    generation = generation.replace("</eos>", "")
    generation = generation.replace("\\n", "\n")
    return generation.strip().split("\n")[0]

cleaned_test_generations = np.array([clean_generation(gen) for gens_per_sample in test_generations for gen in gens_per_sample])

embedding_model = TextEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    # providers=["CUDAExecutionProvider"],  # Uncomment for GPU
    providers=["CPUExecutionProvider"],
)
smoothie_embeddings = np.array(list(embedding_model.embed(cleaned_test_generations))).reshape(n_samples, n_models, -1)
embed_dim = smoothie_embeddings.shape[2]
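To make the cleaning and reshaping steps concrete, here is a small sketch. It repeats the logic of `clean_generation` so that it runs on its own, applies it to a hypothetical raw output, and uses made-up numeric vectors in place of real embeddings to check that flattening row by row and reshaping keeps each vector aligned with its (sample, model) pair.

```python
import numpy as np

def clean_generation(generation: str) -> str:
    """Same logic as the notebook's helper, repeated so this runs standalone."""
    for token in ("<pad>", "<s>", "</s>", "</eos>"):
        generation = generation.replace(token, "")
    generation = generation.replace("\\n", "\n")  # literal backslash-n -> newline
    return generation.strip().split("\n")[0]      # keep only the first line

# Hypothetical raw model output with special tokens and an escaped newline.
raw = "<s> First sentence of the summary.\\nA second line that is dropped.</s>"
cleaned = clean_generation(raw)

# Stand-in "embeddings": the vector for flat position p is filled with p.
n_samples, n_models, dim = 3, 2, 4
flat_ids = np.arange(n_samples * n_models)
fake_embeddings = np.repeat(flat_ids[:, None], dim, axis=1).astype(float)

# Same reshape as in the notebook: grid[i, j] belongs to sample i, model j.
grid = fake_embeddings.reshape(n_samples, n_models, -1)
```

In this toy setup, `grid[i, j]` is filled with `i * n_models + j`, which is the position that pair occupied in the flattened list.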



Use either smoothie-dependent or smoothie-independent (only run one of the two cells below!)

In [8]:
# Code for smoothie-dependent
# produces smoothie_dataset_weights, an n_samples x n_models numpy array of scores for each generation in test_generations
# for smoothie-dependent, each row of weights is different 

# adjust n_neighbors as you wish
nbrs = NearestNeighbors(n_neighbors=20, algorithm="auto")
nbrs.fit(test_input_embeddings)
_, test_indices = nbrs.kneighbors(test_input_embeddings)

smoothie_dataset_weights = []
for sample_idx in range(n_samples):
    embs_per_sample = smoothie_embeddings[test_indices[sample_idx]]
    smoothie = Smoothie(n_voters=n_models, dim=embed_dim)
    smoothie.fit(embs_per_sample)
    smoothie_dataset_weights.append(smoothie.theta)

smoothie_dataset_weights = np.array(smoothie_dataset_weights)
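The indexing step in the smoothie-dependent loop can be sketched without the `Smoothie` class. The example below uses random stand-in arrays and, as a swapped-in technique, a brute-force NumPy nearest-neighbor search in place of scikit-learn's `NearestNeighbors`; all sizes and values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_models, dim, k = 6, 3, 4, 2

input_embs = rng.normal(size=(n_samples, 5))            # stand-in for test_input_embeddings
gen_embs = rng.normal(size=(n_samples, n_models, dim))  # stand-in for smoothie_embeddings

# Pairwise Euclidean distances between inputs, then the k closest per row.
diffs = input_embs[:, None, :] - input_embs[None, :, :]
dists = np.linalg.norm(diffs, axis=-1)
neighbor_idx = np.argsort(dists, axis=1)[:, :k]         # shape (n_samples, k)

# Integer-array indexing pulls every model's embedding for each neighbor.
embs_for_sample_0 = gen_embs[neighbor_idx[0]]           # shape (k, n_models, dim)
```

Because the query set is the same as the indexed set, each sample's closest neighbor here is the sample itself, so its own generations are part of the neighborhood that the weights are fit on.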



In [9]:
# Code for smoothie-independent
# every row of smoothie_dataset_weights is the same: a single set of Smoothie weights is fit on the whole dataset.
smoothie = Smoothie(n_voters=n_models, dim=embed_dim)
smoothie.fit(smoothie_embeddings)
smoothie_dataset_weights = np.tile(smoothie.theta, (n_samples, 1))
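A quick sketch of what `np.tile` does in the smoothie-independent case, using hypothetical weights for three models rather than a fitted `smoothie.theta`:

```python
import numpy as np

theta = np.array([0.2, 0.7, 0.1])          # hypothetical weights for 3 models
n_samples = 4

# Repeat the single weight vector once per sample.
weights = np.tile(theta, (n_samples, 1))   # shape (n_samples, n_models)

# Every row is identical, so the arg-max model is the same for every sample.
best_per_sample = weights.argmax(axis=1)
```

Since all rows match, smoothie-independent routing sends every sample to the same model.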


Select samples according to smoothie weights

In [10]:
# finally, select samples according to smoothie weights

routed_texts = []
routed_models = []

for sample_idx in range(n_samples):
    max_idx = smoothie_dataset_weights[sample_idx].argmax()
    text = test_generations[sample_idx][max_idx]
    routed_texts.append(text)
    routed_models.append(models[max_idx])
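The selection loop above picks, for each sample, the generation whose model has the largest weight. A vectorized sketch of the same idea on toy data (the model names, texts, and weights are hypothetical):

```python
import numpy as np

toy_models = ["model_a", "model_b", "model_c"]
generations = np.array([
    ["a0", "b0", "c0"],   # sample 0
    ["a1", "b1", "c1"],   # sample 1
])
weights = np.array([
    [0.1, 0.8, 0.1],      # sample 0 favors model_b
    [0.6, 0.3, 0.1],      # sample 1 favors model_a
])

best = weights.argmax(axis=1)                           # best model index per sample
routed_texts = generations[np.arange(len(best)), best]  # pick one text per row
routed_models = [toy_models[i] for i in best]
```

The explicit loop in the notebook and this indexed form should give the same result; the loop is arguably easier to read, while the indexed form avoids Python-level iteration.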

In [11]:
routed_texts

['Justin Rose, a top-five ranked golfer for the past three years, has been struggling with his form recently. He has spent the past two weeks practicing and it seems to have paid off as he had a good round at the Shell Houston Open. He is confident that he is improving and is looking forward to the Masters, where he has had success in',
 'Lewis Ferguson, an 18-year-old jockey, survived a spectacular fall from Merrion Square at Wincanton on Wednesday. The fall has been watched by hundreds of thousands of people online and Ferguson was mucking out the stables as usual on Thursday morning. He said he was',
 'The East of England Ambulance Service received a call about a person who was run over by a car in Epping Forest, but when the ambulances arrived, they found out that the "victim" was actually a squirrel. The service also received calls about a man who dropped his burger and it was "bleeding," a woman who a',
 "A couple in China who were unable to get married due to the girlfriend's ho

In [12]:
routed_models

['vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicuna-7b',
 'vicu