<a href="https://colab.research.google.com/github/DanielWarfield1/MLWritingAndResearch/blob/main/SpeculativeSampling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Speculative Sampling
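Implementing a naive, greedy approach for demonstration purposes: a small draft model (`google/flan-t5-large`) proposes a few tokens, and a larger target model (`google/flan-t5-xl`) checks them all in one forward pass, keeping the draft tokens it agrees with.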

Requires a high-RAM instance (the `google/flan-t5-xl` checkpoint alone is about 11.4 GB).
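Before loading any models, here is the idea in miniature. This is a sketch, not the notebook's code: `toy_target` and `toy_draft` are hypothetical stand-ins for the two T5 models (plain functions mapping a token prefix to the next token id), and decoding is greedy, as in this notebook. The draft proposes `k` tokens, the target's predictions for all of those positions are compared against them, matching tokens are kept, and the target's own token is appended at the first mismatch (or as a bonus token if everything matched). Since every kept token is one the target would have chosen itself, the output should be identical to target-only greedy decoding, just with fewer target passes.

```python
# Hypothetical toy "models": each maps a token prefix (list of ints) to the next token id.
def toy_target(prefix):
    return (sum(prefix) * 7 + 3) % 11


def toy_draft(prefix):
    # agrees with the target except when the prefix length is a multiple of 4
    guess = toy_target(prefix)
    return (guess + 1) % 11 if len(prefix) % 4 == 0 else guess


def greedy_decode(model_fn, prefix, n_new):
    """Ordinary greedy decoding: one model call per new token."""
    out = list(prefix)
    for _ in range(n_new):
        out.append(model_fn(out))
    return out


def speculative_greedy(draft_fn, target_fn, prefix, n_new, k=5):
    """Greedy draft-then-verify decoding. Returns (tokens, number of target passes)."""
    out = list(prefix)
    target_passes = 0
    while len(out) - len(prefix) < n_new:
        n = len(out)
        # 1. the draft proposes k tokens, one at a time
        proposal = list(out)
        for _ in range(k):
            proposal.append(draft_fn(proposal))
        # 2. the target predicts positions n..n+k; a real decoder does this in
        #    ONE forward pass over `proposal`, emulated here with k+1 calls
        preds = [target_fn(proposal[:n + i]) for i in range(k + 1)]
        target_passes += 1
        # 3. keep draft tokens while they match, then take the target's token
        #    (a correction on mismatch, or a bonus token if all k matched)
        accepted = 0
        while accepted < k and proposal[n + accepted] == preds[accepted]:
            accepted += 1
        out = proposal[:n + accepted] + [preds[accepted]]
    return out[:len(prefix) + n_new], target_passes


tokens, passes = speculative_greedy(toy_draft, toy_target, [1], n_new=20)
print(tokens == greedy_decode(toy_target, [1], 20))  # True
print(passes)  # 5 target passes instead of 20
```

Here the toy draft is wrong at every fourth position, so each target pass keeps 4 tokens; with a perfect draft each pass would keep `k + 1`.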

In [3]:
!pip install sentencepiece

Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99


In [24]:
"""Loading the draft model
"""

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

#loading the draft model
draft = "google/flan-t5-large"
draft_tokenizer = T5Tokenizer.from_pretrained(draft)
draft_model = T5ForConditionalGeneration.from_pretrained(draft)

#generating a sample response to make sure everything's working.
prompt = "Question: What is the geological compesition of the Moon? Explain. \nAnswer:"
input_ids = draft_tokenizer(prompt, return_tensors="pt").input_ids
outputs = draft_model.generate(input_ids)
print(draft_tokenizer.decode(outputs[0]))

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


<pad> The Moon is a crater.</s>


In [6]:
"""Loading the target model
"""

#loading the target model
target = "google/flan-t5-xl"
target_tokenizer = T5Tokenizer.from_pretrained(target)
target_model = T5ForConditionalGeneration.from_pretrained(target)

#generating a sample response to make sure everything's working.
input_ids = target_tokenizer(prompt, return_tensors="pt").input_ids
outputs = target_model.generate(input_ids)
print(target_tokenizer.decode(outputs[0]))

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


<pad>The Moon is composed of rock and ice. Rock and ice are the same thing.


In [14]:
"""Ensuring the tokenizers are identical.
For speculative sampling to work, the draft and target models must tokenize
text identically, because the draft's token ids are checked directly against
the target's predictions. This is a sanity check to make sure they do.
"""

#tokenizing a deliberately odd test sequence with both tokenizers
tokenizer_test = "this, is, some [text] for 1234comparing, tokenizers adoihayyuz"
ex1 = target_tokenizer(tokenizer_test, return_tensors="pt").input_ids
ex2 = draft_tokenizer(tokenizer_test, return_tensors="pt").input_ids

#zero means all tokenized values are the same, so the tokenizers are
#more than likely identical
print((ex1-ex2).abs().max())

tensor(0)
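The check above compares the encoding of a single string, and the elementwise subtraction also assumes both encodings have the same length (if they don't, it would normally raise a shape error rather than report a mismatch). A stricter sketch compares the full token-to-id mappings. It assumes each vocabulary is a plain `dict` like the one returned by a Hugging Face tokenizer's `get_vocab()`; `vocab_mismatches` is a hypothetical helper name, and the toy vocabularies reuse the ids for "the" (8) and "Moon" (9023) seen later in this notebook.

```python
def vocab_mismatches(vocab_a, vocab_b, limit=10):
    """Return up to `limit` (token, id_in_a, id_in_b) triples that differ.

    A token missing from one vocabulary shows up with None as its id there.
    """
    mismatches = []
    for token in sorted(set(vocab_a) | set(vocab_b)):
        id_a, id_b = vocab_a.get(token), vocab_b.get(token)
        if id_a != id_b:
            mismatches.append((token, id_a, id_b))
            if len(mismatches) == limit:
                break
    return mismatches


# toy vocabularies standing in for tokenizer.get_vocab() output
same = {"▁the": 8, "▁Moon": 9023}
print(vocab_mismatches(same, dict(same)))               # []
print(vocab_mismatches(same, {"▁the": 8, "▁Moon": 1}))  # [('▁Moon', 9023, 1)]
```

In the notebook this would be called as `vocab_mismatches(draft_tokenizer.get_vocab(), target_tokenizer.get_vocab())`, and an empty list means every token maps to the same id in both.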


An example of next-word prediction using Hugging Face and GPT-2: https://jamesmccaffrey.wordpress.com/2021/10/21/a-predict-next-word-example-using-hugging-face-and-gpt-2/

T5, however, is an encoder-decoder model, so the prompt is passed to the encoder (`input_ids`) and the tokens generated so far are passed to the decoder (`decoder_input_ids`). Here's an example. https://huggingface.co/docs/transformers/model_doc/t5#:~:text=from%20transformers%20import%20AutoTokenizer%2C%20T5Model,decoder_input_ids)%0A%3E%3E%3E%20last_hidden_states%20%3D%20outputs.last_hidden_state
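On the decoder side, T5 expects the target sequence shifted one position to the right, with the decoder start token in front; for T5 that start token is id 0, the same id as `<pad>`, which is why generations in this notebook begin with `<pad>`. The list-based sketch below mirrors, as far as I can tell, what the `_shift_right` helper in the Hugging Face T5 implementation does on tensors; the function name `shift_right` and its defaults are assumptions for illustration. It also shows why decoding starts from a single token: the encoding of an empty string is just `</s>` (id 1), and shifting it leaves only the start token.

```python
def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """Build decoder inputs from target ids: prepend the start token, drop the last id."""
    shifted = [decoder_start_token_id] + list(labels[:-1])
    # -100 marks ignored label positions in Hugging Face training code;
    # it is not a valid token id, so it is replaced with padding
    return [pad_token_id if t == -100 else t for t in shifted]


print(shift_right([1]))          # [0]
print(shift_right([316, 3, 1]))  # [0, 316, 3]
```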


In [161]:
"""Performing Speculative Sampling (naive, greedy version)
"""

import torch

#initializing the decoder input with only the decoder start token (id 0, shown as <pad>).
#this is updated each loop with validated generations
decoder_ids = torch.tensor([[draft_model.config.decoder_start_token_id]])

#defining input. T5 is an encoder-decoder model, so input and output are handled separately
input_ids = draft_tokenizer("Translate to German \n Battle not with monsters, lest ye become a monster, and if you gaze into the abyss, the abyss gazes also into you.", return_tensors="pt").input_ids

#defining the number of draft generations per iteration
k = 5

#keeps track of generation information, for later printouts
generated = []

#Generating Text
for iteration in range(15):
    print('========== Speculative Sampling Iteration {} =========='.format(iteration))

    #creating a holding place for the generated draft
    decoder_ids_draft = decoder_ids.clone()

    before_text = draft_tokenizer.decode(decoder_ids_draft[0])
    initial_length = decoder_ids.shape[1]

    #generating the draft: k greedy steps with the small model
    for i in range(k):

        #predicting the next token with the draft model
        with torch.no_grad():
            logits = draft_model(input_ids=input_ids, decoder_input_ids=decoder_ids_draft).logits
            genid = torch.argmax(logits, dim=2)[0][-1]

        #appending the generated id to the draft, as a (1, 1) tensor
        genid = genid.view(1, 1)
        decoder_ids_draft = torch.cat((decoder_ids_draft, genid), 1)

    print('=== Draft Generation')
    current_draft = draft_tokenizer.decode(decoder_ids_draft[0])
    print('generated draft tokens: {}'.format(decoder_ids_draft))
    print('generated draft text: {}'.format(current_draft))

    #Generating all next token predictions with the target in a single forward pass.
    #genids[j] is the target's prediction for position j+1
    with torch.no_grad():
        logits = target_model(input_ids=input_ids, decoder_input_ids=decoder_ids_draft).logits
    genids = torch.argmax(logits, dim=2)[0]
    print('=== Target Generation')
    current_target = draft_tokenizer.decode(genids)
    print('generated target tokens: {}'.format(genids))
    print('generated target text: {}'.format(current_target))

    #genids are next-token predictions, so the start token is re-added in front
    first_token = decoder_ids[0][:1]

    #checking draft against target
    for i, (dv, tv) in enumerate(zip(decoder_ids_draft[0, 1:], genids[:-1])):
        #target does not agree with the draft: keep the agreed tokens plus
        #the target's own prediction at the first disagreement
        if dv != tv:
            decoder_ids = torch.cat((first_token, genids[:i+1]), 0)
            break
    else:
        #no disagreements: keep every draft token plus the target's extra prediction
        decoder_ids = torch.cat((first_token, genids), 0)

    print('=== Validated Generation')
    current_target = draft_tokenizer.decode(decoder_ids)
    print('generated target tokens: {}'.format(decoder_ids))
    print('generated target text: {}'.format(current_target))

    #adding the batch dimension back so the shape is (1, sequence_length)
    decoder_ids = decoder_ids.unsqueeze(0)

    #logging
    numgen = decoder_ids.shape[1] - initial_length
    generated.append({'tokens generated': numgen, 'text before': before_text, 'text after': current_target})

    #printing
    print('\nsummary:')
    print('Number of tokens generated: {}'.format(numgen))
    print('Before: {}'.format(before_text))
    print('After: {}'.format(current_target))
    print('')

    #stopping once the end-of-sequence token has been validated
    if (decoder_ids == draft_tokenizer.eos_token_id).any():
        break

=== Draft Generation
generated draft tokens: tensor([[    0,   316, 20256,    15,   311,   181]])
generated draft text: <pad>Die Kampfe nicht mit
=== Target Generation
generated target tokens: tensor([316,   3,  15, 181, 181, 177])
generated target text: Die e mit mit den
=== Validated Generation
generated target tokens: tensor([  0, 316,   3])
generated target text: <pad>Die

summary:
Number of tokens generated: 2
Before: <pad>
After: <pad>Die

=== Draft Generation
generated draft tokens: tensor([[    0,   316,     3,     2, 25231,     3,   547,   289]])
generated draft text: <pad> Die <unk>ffentlichkeit hat sich
=== Target Generation
generated target tokens: tensor([ 316,    3, 9465,   40,    3,  547,  289,    3])
generated target text: Die Brul hat sich
=== Validated Generation
generated target tokens: tensor([   0,  316,    3, 9465])
generated target text: <pad>Die Bru

summary:
Number of tokens generated: 1
Before: <pad>Die
After: <pad>Die Bru

=== Draft Generation
generated draft
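The acceptance logic in the loop can be isolated as a pure function on lists, which makes it easy to check against the log. `verify_greedy` is a hypothetical helper written for this sketch, not part of the notebook. It assumes, as in the loop, that `draft_ids` starts with the already validated prefix and that `target_preds[j]` is the target's argmax for position `j + 1`. Feeding it the token ids printed in the first two iterations reproduces the validated sequences shown there.

```python
def verify_greedy(draft_ids, target_preds):
    """Return the validated sequence for one greedy speculative step.

    draft_ids[j + 1] is the draft's token at position j + 1 and target_preds[j]
    is the target's prediction for that same position (equal-length lists).
    """
    accepted = [draft_ids[0]]  # the decoder start token
    for j in range(len(draft_ids) - 1):
        if draft_ids[j + 1] != target_preds[j]:
            accepted.append(target_preds[j])  # the target's correction
            return accepted
        accepted.append(draft_ids[j + 1])
    accepted.append(target_preds[-1])  # bonus token when every draft token matched
    return accepted


# ids copied from the first two iterations printed above
print(verify_greedy([0, 316, 20256, 15, 311, 181], [316, 3, 15, 181, 181, 177]))
# [0, 316, 3]
print(verify_greedy([0, 316, 3, 2, 25231, 3, 547, 289], [316, 3, 9465, 40, 3, 547, 289, 3]))
# [0, 316, 3, 9465]
```

Because each step always appends one target token, a single target pass yields at least one new token and at most `k + 1`.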

In [162]:
import pandas as pd
print('assumed average speedup: {}')
pd.DataFrame(generated)

| | tokens generated | text before | text after |
|---|---|---|---|
| 0 | 2 | `<pad>` | `<pad>Die` |
| 1 | 1 | `<pad>Die` | `<pad>Die Bru` |
| 2 | 2 | `<pad>Die Bru` | `<pad>Die Brutto` |
| 3 | 1 | `<pad>Die Brutto` | `<pad>Die Bruttos` |
| 4 | 1 | `<pad>Die Bruttos` | `<pad>Die Bruttose` |
| 5 | 1 | `<pad>Die Bruttose` | `<pad>Die Bruttoseit` |
| 6 | 1 | `<pad>Die Bruttoseit` | `<pad>Die Bruttoseiten` |
| 7 | 1 | `<pad>Die Bruttoseiten` | `<pad>Die Bruttoseiten` |
| 8 | 5 | `<pad>Die Bruttoseiten` | `<pad>Die Bruttoseiten kämpfen nicht mit dem` |
| 9 | 1 | `<pad>Die Bruttoseiten kämpfen nicht mit dem` | `<pad>Die Bruttoseiten kämpfen nicht mit dem Ge` |
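The speedup figure is only an estimate: it counts tokens kept per target pass and ignores the `k` draft passes, as well as the extra cost of a target pass over more positions. The ten rows shown average 16 / 10 = 1.6 tokens per target pass. For comparison, a common simplifying assumption in analyses of speculative decoding is that each draft token is accepted independently with probability α; the expected number of tokens per target pass with `k` draft tokens is then the geometric sum 1 + α + ... + α^k = (1 - α^(k+1)) / (1 - α). The helper names below are hypothetical.

```python
def mean_tokens_per_pass(tokens_generated):
    """Average number of tokens kept per iteration; each iteration is one target pass."""
    return sum(tokens_generated) / len(tokens_generated)


def expected_tokens_per_pass(alpha, k):
    """Expected tokens per target pass if each of the k draft tokens is accepted
    independently with probability alpha (a simplifying assumption)."""
    if alpha == 1:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


# the 'tokens generated' column from the ten rows shown above
logged = [2, 1, 2, 1, 1, 1, 1, 1, 5, 1]
print(mean_tokens_per_pass(logged))        # 1.6
print(expected_tokens_per_pass(0.5, k=5))  # 1.96875
```

With α = 0 the formula gives 1 token per pass (no gain over plain greedy decoding), and with α = 1 it gives `k + 1`, matching the bounds of the loop above.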


In [174]:
prompt = "What is the geological compesition of the Moon?"
ids = draft_tokenizer(prompt, return_tensors="pt").input_ids
#decoding each token id on its own to see how the prompt is split into tokens
tokens = [draft_tokenizer.decode(token_id) for token_id in ids[0]]

In [181]:
print('Prompt: "{}"'.format(prompt))
print('Tokens: {}'.format(tokens))
print('TokenIds: {}'.format(ids.tolist()[0]))

Prompt: "What is the geological compesition of the Moon?"
Tokens: ['What', 'is', 'the', 'ge', 'ological', 'comp', 'e', 's', 'ition', 'of', 'the', 'Moon', '?', '</s>']
TokenIds: [363, 19, 8, 873, 4478, 2890, 15, 7, 4749, 13, 8, 9023, 58, 1]

