In [1]:
%load_ext autoreload
%autoreload 2

# Generation Example

This example compares two models on free-text generation: each model continues a prompt on its own, without being given concrete continuation options to choose from.

## Loading Models

In [1]:
import os
from perspectival.model import Transformer

# Set your HuggingFace access token with: `export HUGGINGFACE_TOKEN='your_token_here'`
# (To obtain the access token, see https://huggingface.co/docs/hub/security-tokens)
# Alternatively, simply use other models ;)
model = Transformer('apple/OpenELM-270M', model_kwargs={'trust_remote_code': True}, lazy_loading=False,
                    tokenizer_kwargs={'token': os.getenv("HUGGINGFACE_TOKEN")})
model2 = Transformer('apple/OpenELM-270M-Instruct', model_kwargs={'trust_remote_code': True},
                     tokenizer_kwargs={'token': os.getenv("HUGGINGFACE_TOKEN")})
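
Note that `os.getenv` returns `None` when the variable is not set, so a missing token only shows up later as a download or authentication error. A minimal sketch of a helper that makes this visible early is shown below. The name `read_hf_token` is hypothetical and not part of `perspectival`, and the sketch assumes that passing `None` as the token simply means "no explicit token".

```python
import os
import warnings


def read_hf_token(env_var="HUGGINGFACE_TOKEN"):
    """Return the access token stored in `env_var`, or None if it is missing.

    Hypothetical helper for this notebook; not part of perspectival.
    """
    # Treat an unset variable and a whitespace-only value the same way.
    token = (os.getenv(env_var) or "").strip()
    if not token:
        warnings.warn(
            f"{env_var} is not set; gated models may fail to download."
        )
        return None
    return token


# Possible usage (assumption: the value is passed on unchanged):
# tokenizer_kwargs = {'token': read_hf_token()}
```

The helper only warns instead of raising, so the notebook still runs with models that do not require a token.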

## Set up an Experiment

We load the HellaSwag dataset, but use only the prompt part of each item.

In [2]:
from perspectival.loader import load_hellaswag
from perspectival.experiment import Experiment

dataset, features = load_hellaswag()
experiment = Experiment(dataset=dataset, name='Generation Example', features=features)

In [3]:
# Optional: Select a random subset of the dataset for quicker processing
# (Note that computing text continuations is quite time-intensive!)
experiment = experiment.sample(num=10)

## Computing Features

In [4]:
experiment.compute_continuation_disagreement(models=[model, model2])

Computing text continuations ...


100%|████████████████████████████████████████████████████████████| 10/10 [00:21<00:00,  2.13s/it]


Computing text continuations ...


100%|████████████████████████████████████████████████████████████| 10/10 [00:20<00:00,  2.02s/it]


Computing option log likelihoods ...


  0%|                                                                     | 0/10 [00:00<?, ?it/s]Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
100%|████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.87it/s]


Computing option log likelihoods ...


  0%|                                                                     | 0/10 [00:00<?, ?it/s]Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
100%|████████████████████████████████████████████████████████████| 10/10 [00:02<00:00,  3.99it/s]
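
The log above suggests two passes per model: first each model generates a text continuation, then log likelihoods are computed. How `perspectival` combines these numbers into the `ContinuationDisagreement` feature is not shown in this notebook. The sketch below is therefore only one plausible scoring rule, stated as an assumption: it measures how strongly each model prefers its own continuation over the other model's. The function name and all numbers are made up for illustration.

```python
def self_preference_disagreement(ll_a_own, ll_a_other, ll_b_own, ll_b_other):
    """Hypothetical disagreement score; NOT necessarily what perspectival computes.

    ll_a_own:   log likelihood under model A of A's own continuation
    ll_a_other: log likelihood under model A of B's continuation
    ll_b_own:   log likelihood under model B of B's own continuation
    ll_b_other: log likelihood under model B of A's continuation
    """
    # Each bracket is zero when a model finds both continuations equally likely.
    return (ll_a_own - ll_a_other) + (ll_b_own - ll_b_other)


# Made-up log likelihoods for two items (illustration only).
items = {
    "item_1": (-5.0, -9.0, -4.0, -6.0),  # both models prefer their own text
    "item_2": (-5.0, -5.0, -4.0, -4.0),  # no preference either way
}
scores = {name: self_preference_disagreement(*lls) for name, lls in items.items()}

# Sort ascending, so the items with the highest score come last.
ranked = sorted(scores, key=scores.get)
```

Under this rule a score of zero means the models are indifferent between the two continuations, and larger values mean each model finds the other's text less likely than its own.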


## Exploring Results

In [5]:
# Show the three items with the highest disagreement scores
scores = experiment.get_feature('ContinuationDisagreement', models=(model.name, model2.name)).values
samples = experiment.sample(num=3, sampling_method='last', ordering_scores=scores)
samples.display_items()

ITEM (train_12630)
"""Polishing shoes: Shoe polish supplies in a store, then people in a classroom get instructions about shoe polish. The teacher demonstrates to polish a boot using a tissue. Then"""
Options: [', a man polish with nails and polish a boot.', ', the woman screws the an shoes to polish an sneakers shoe.', ', two people polish the boot using a croquet mallet to demonstrate how to properly polish and maintain polish.', ', a girl polish the shoe of a woman using a cloth.']

FEATURES
GroundTruth 3
TextContinuation apple/OpenELM-270M  the students get to work.

The teacher shows the students how to use a polishing brush. The students get to work.

The teacher shows the students how to use a polishing brush. The students get to work.

The teacher shows the students how to use a polishing br
TextContinuation apple/OpenELM-270M-Instruct  the students practice polishing shoes at home.
Teaching shoe polish: Students practice polishing shoes at home.
Teaching shoe polish supplies: 