If you're opening this notebook on Colab, you will probably need to install 🤗 Tokenizers. Run the following cell to do so.


In [14]:
%pip install tokenizers



If you're opening this notebook locally, make sure your environment has 🤗 Tokenizers installed.

## Prepare the dataset

In [None]:
# Create the data/ directory (if it does not exist yet), download raw WikiText-103, then unzip the archive
!mkdir -p data
!wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip -P data
!unzip data/wikitext-103-raw-v1.zip -d data

--2023-11-22 14:16:37--  https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.108.13, 54.231.134.168, 52.216.43.160, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.108.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 191984949 (183M) [application/zip]
Saving to: ‘data/wikitext-103-raw-v1.zip’


2023-11-22 14:16:44 (31.4 MB/s) - ‘data/wikitext-103-raw-v1.zip’ saved [191984949/191984949]

Archive:  data/wikitext-103-raw-v1.zip
   creating: data/wikitext-103-raw/
  inflating: data/wikitext-103-raw/wiki.test.raw  
  inflating: data/wikitext-103-raw/wiki.valid.raw  
  inflating: data/wikitext-103-raw/wiki.train.raw  


## Tokenizer from scratch

First, BERT relies on WordPiece, so we instantiate a new Tokenizer with this model:

In [None]:
from tokenizers import Tokenizer
from tokenizers.models import WordPiece

bert_tokenizer = Tokenizer(WordPiece())
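
To give an intuition for what the WordPiece model does once it has a vocabulary, here is a minimal pure-Python sketch of greedy longest-match-first splitting. It is only an illustration under simplifying assumptions: the function name `wordpiece_encode` and the toy vocabulary are made up for this example, and the real implementation in 🤗 Tokenizers is written in Rust and handles more cases.

```python
def wordpiece_encode(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word becomes the unknown token.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, not the one trained later in this notebook.
toy_vocab = {"tok", "##eni", "##zer", "##s", "library"}
print(wordpiece_encode("tokenizers", toy_vocab))  # ['tok', '##eni', '##zer', '##s']
print(wordpiece_encode("xyz", toy_vocab))         # ['[UNK]']
```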

Then, we know that BERT preprocesses text by removing accents and lowercasing. We also apply a Unicode normalizer (NFD) first, so that accents are split off as separate characters that can be stripped:

In [None]:
from tokenizers import normalizers
from tokenizers.normalizers import Lowercase, NFD, StripAccents

bert_tokenizer.normalizer = normalizers.Sequence([NFD(), Lowercase(), StripAccents()])
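
The effect of this normalizer chain can be approximated with the standard library alone. The sketch below is a rough stand-in, not the library's own code: the helper name `normalize` is hypothetical, and it assumes that stripping accents amounts to dropping Unicode combining marks after NFD decomposition.

```python
import unicodedata


def normalize(text):
    """Approximate NFD -> lowercase -> strip accents with the stdlib."""
    # NFD splits an accented character into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    lowered = decomposed.lower()
    # Combining marks have the Unicode category "Mn"; dropping them removes accents.
    return "".join(ch for ch in lowered if unicodedata.category(ch) != "Mn")


print(normalize("Héllò hôw are ü?"))  # hello how are u?
```

The order matters: without the NFD step, an accented letter stays a single code point and there is no separate mark to remove.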

The pre-tokenizer is just splitting on whitespace and punctuation:

In [None]:
from tokenizers.pre_tokenizers import Whitespace

bert_tokenizer.pre_tokenizer = Whitespace()
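
According to the library documentation, the `Whitespace` pre-tokenizer splits with the regular expression `\w+|[^\w\s]+`. The sketch below applies that pattern with Python's `re` module to show the resulting pieces and their character offsets. The helper name `pre_tokenize` is made up for this example, and small differences between Python's and Rust's regex engines on unusual Unicode input are possible.

```python
import re

# Runs of word characters, or runs of characters that are neither word nor space.
WHITESPACE_PATTERN = re.compile(r"\w+|[^\w\s]+")


def pre_tokenize(text):
    """Return (piece, (start, end)) pairs, similar in spirit to pre-tokenizer output."""
    return [(m.group(), (m.start(), m.end())) for m in WHITESPACE_PATTERN.finditer(text)]


for piece, offsets in pre_tokenize("Hello, y'all!"):
    print(piece, offsets)
```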

And the post-processing uses the template we saw in the previous section:

In [None]:
from tokenizers.processors import TemplateProcessing

bert_tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[
        ("[CLS]", 1),
        ("[SEP]", 2),
    ],
)
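
To make the template concrete, here is a small pure-Python sketch of what it produces for a single sequence and for a pair. The function `apply_template` is a hypothetical helper written for this illustration. It hard-codes the BERT layout above, where `:1` in the template assigns type ID 1 to the second sequence and its closing `[SEP]`.

```python
def apply_template(tokens_a, tokens_b=None):
    """Build tokens and type IDs following "[CLS] $A [SEP] $B:1 [SEP]:1"."""
    tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
    type_ids = [0] * len(tokens)
    if tokens_b is not None:
        # The second sequence and its [SEP] get type ID 1.
        tokens += list(tokens_b) + ["[SEP]"]
        type_ids += [1] * (len(tokens_b) + 1)
    return tokens, type_ids


print(apply_template(["hello"]))
# (['[CLS]', 'hello', '[SEP]'], [0, 0, 0])
print(apply_template(["hello"], ["world", "!"]))
# (['[CLS]', 'hello', '[SEP]', 'world', '!', '[SEP]'], [0, 0, 0, 1, 1, 1])
```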

We can now train this tokenizer on WikiText, as in the [Quicktour](https://huggingface.co/docs/tokenizers/python/latest/quicktour.html):

In [15]:
from tokenizers.trainers import WordPieceTrainer

trainer = WordPieceTrainer(
    vocab_size=30522, special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"]
)

files = [f"data/wikitext-103-raw/wiki.{split}.raw" for split in ["test", "train", "valid"]]
bert_tokenizer.train(files, trainer)

model_files = bert_tokenizer.model.save("data", "bert-wiki")
bert_tokenizer.model = WordPiece.from_file(*model_files, unk_token="[UNK]")

bert_tokenizer.save("data/bert-wiki.json")

### Decoding

On top of encoding the input texts, a `Tokenizer` also has an API for decoding, that is converting IDs generated by your model back to a text. This is done by the methods `decode()` (for one predicted text) and `decode_batch()` (for a batch of predictions).

The decoder will first convert the IDs back to tokens (using the tokenizer’s vocabulary) and remove all special tokens, then join those tokens with spaces:

In [16]:
output = bert_tokenizer.encode("Hello, y'all! How are you 😁 ?")
print(output.ids)
# [1, 27462, 16, 67, 11, 7323, 5, 7510, 7268, 7989, 0, 35, 2]

bert_tokenizer.decode([1, 27462, 16, 67, 11, 7323, 5, 7510, 7268, 7989, 0, 35, 2])
# "hello , y ' all ! how are you ?"

[1, 27462, 16, 67, 11, 7323, 5, 7510, 7268, 7989, 0, 35, 2]


"hello , y ' all ! how are you ?"

If you used a model that added special characters to represent subtokens of a given “word” (like the `"##"` in WordPiece), you will need to customize the decoder to treat them properly. If we take our previous `bert_tokenizer`, for instance, the default decoding will give:

In [17]:
output = bert_tokenizer.encode("Welcome to the 🤗 Tokenizers library.")
print(output.tokens)
# ["[CLS]", "welcome", "to", "the", "[UNK]", "tok", "##eni", "##zer", "##s", "library", ".", "[SEP]"]

bert_tokenizer.decode(output.ids)
# "welcome to the tok ##eni ##zer ##s library ."

['[CLS]', 'welcome', 'to', 'the', '[UNK]', 'tok', '##eni', '##zer', '##s', 'library', '.', '[SEP]']


'welcome to the tok ##eni ##zer ##s library .'
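
The default behaviour described above (drop special tokens, join the rest with spaces) can be mimicked in a few lines. This is only a sketch: `naive_decode` is a hypothetical helper, and it starts from the token strings printed above rather than from IDs, so that no trained vocabulary is needed.

```python
def naive_decode(tokens, special_tokens):
    """Drop special tokens and join what remains with single spaces."""
    kept = [token for token in tokens if token not in special_tokens]
    return " ".join(kept)


tokens = ["[CLS]", "welcome", "to", "the", "[UNK]", "tok", "##eni", "##zer",
          "##s", "library", ".", "[SEP]"]
special = {"[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"}
print(naive_decode(tokens, special))
# welcome to the tok ##eni ##zer ##s library .
```

Nothing in this procedure knows about the `"##"` prefix, which is why the word pieces stay separated.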

But by changing it to a proper decoder, we get:

In [18]:
from tokenizers import decoders

bert_tokenizer.decoder = decoders.WordPiece()
bert_tokenizer.decode(output.ids)
# "welcome to the tokenizers library."

'welcome to the tokenizers library.'
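
As a rough illustration of what a WordPiece-aware decoder has to do, the sketch below glues `"##"` pieces onto the previous token and removes the space before a few punctuation marks. The helper `wordpiece_decode` is hypothetical, and its cleanup rule is a simplification assumed for this example rather than the exact set of rules applied by `decoders.WordPiece()`.

```python
def wordpiece_decode(tokens, prefix="##"):
    """Merge continuation pieces and tidy spacing before punctuation."""
    text = ""
    for token in tokens:
        if token.startswith(prefix):
            # Continuation piece: append without a space and without the prefix.
            text += token[len(prefix):]
        elif text:
            text += " " + token
        else:
            text = token
    # Simplified cleanup: no space before common punctuation marks.
    for mark in (".", ",", "!", "?"):
        text = text.replace(" " + mark, mark)
    return text


pieces = ["welcome", "to", "the", "tok", "##eni", "##zer", "##s", "library", "."]
print(wordpiece_decode(pieces))  # welcome to the tokenizers library.
```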

In [27]:
output = bert_tokenizer.encode("From fairest creatures we desire increase, That thereby beauty’s rose might never die, But as the riper should by time decrease, His tender heir mught bear his memeory: But thou, contracted to thine own bright eyes, Feed’st thy light’st flame with self-substantial fuel, Making a famine where abundance lies, Thyself thy foe, to thy sweet self too cruel. Thou that art now the world’s fresh ornament And only herald to the gaudy spring, Within thine own bud buriest thy content And, tender churl, makest waste in niggarding.Pity the world, or else this glutton be, To eat the world’s due, by the grave and thee.")
print(output.tokens)

['[CLS]', 'from', 'faire', '##st', 'creatures', 'we', 'desire', 'increase', ',', 'that', 'thereby', 'beauty', '’', 's', 'rose', 'might', 'never', 'die', ',', 'but', 'as', 'the', 'rip', '##er', 'should', 'by', 'time', 'decrease', ',', 'his', 'tender', 'heir', 'mugh', '##t', 'bear', 'his', 'mem', '##eo', '##ry', ':', 'but', 'th', '##ou', ',', 'contracted', 'to', 'thin', '##e', 'own', 'bright', 'eyes', ',', 'feed', '’', 'st', 'thy', 'light', '’', 'st', 'flame', 'with', 'self', '-', 'substantial', 'fuel', ',', 'making', 'a', 'famine', 'where', 'abundance', 'lies', ',', 'thy', '##self', 'thy', 'fo', '##e', ',', 'to', 'thy', 'sweet', 'self', 'too', 'cruel', '.', 'th', '##ou', 'that', 'art', 'now', 'the', 'world', '’', 's', 'fresh', 'ornament', 'and', 'only', 'herald', 'to', 'the', 'gau', '##dy', 'spring', ',', 'within', 'thin', '##e', 'own', 'bud', 'bur', '##iest', 'thy', 'content', 'and', ',', 'tender', 'chur', '##l', ',', 'makes', '##t', 'waste', 'in', 'nig', '##gard', '##ing', '.', 'pity'

In [22]:
%pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 14.0 MB/s eta 0:00:00
Installing collected packages: tiktoken
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires openai, which is not installed.
Successfully installed tiktoken-0.5.1


In [24]:
import tiktoken
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [28]:
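# Caution: output.ids were produced by bert_tokenizer, so decoding them with the
# tiktoken vocabulary will not reproduce the original text.
encoding.decode(output.ids)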



In [1]:
# !git lfs clone https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ
!git lfs clone https://huggingface.co/KoboldAI/LLaMA2-13B-Tiefighter
!git lfs clone https://huggingface.co/teknium/Mistral-Trismegistus-7B

          with new flags from 'git clone'

'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into 'LLaMA2-13B-Tiefighter'...
remote: Enumerating objects: 36, done.
remote: Counting objects: 100% (32/32), done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 36 (delta 10), reused 0 (delta 0), pack-reused 4
Unpacking objects: 100% (36/36), 10.28 KiB | 1.28 MiB/s, done.


## How to generate text: using different decoding methods for language generation with Transformers

In [5]:
# !pip -qqq install bitsandbytes accelerate
# !pip install auto-gptq
!pip install optimum

Collecting optimum
  Downloading optimum-1.14.1-py3-none-any.whl (399 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 399.9/399.9 kB 6.8 MB/s eta 0:00:00
Collecting coloredlogs (from optimum)
  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 46.0/46.0 kB 3.3 MB/s eta 0:00:00
Collecting humanfriendly>=9.1 (from coloredlogs->optimum)
  Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 86.8/86.8 kB 6.7 MB/s eta 0:00:00
Installing collected packages: humanfriendly, coloredlogs, optimum
Successfully installed coloredlogs-15.0.1 humanfriendly-10.0 optimum-1.14.1


In [2]:
import math
import random
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
import accelerate
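
The generation cell that follows passes `temperature`, `top_k`, and `top_p` together with `do_sample=True`. To show how these three settings interact, here is a minimal pure-Python sketch of one sampling step over a list of logits. It is an illustration under simplifying assumptions: `sample_next_token` is a hypothetical helper, and the actual Transformers implementation works on tensors and may differ in details such as tie handling.

```python
import math
import random


def sample_next_token(logits, temperature=0.7, top_k=40, top_p=0.95, rng=random):
    """Pick one token index from raw logits (simplified illustration)."""
    # 1. Temperature: values below 1 sharpen the distribution.
    scaled = [logit / temperature for logit in logits]

    # 2. Softmax, shifted by the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [value / total for value in exps]

    # 3. Top-k: keep only the k most likely indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept_mass = sum(probs[i] for i in ranked)

    # 4. Top-p: keep the smallest prefix whose renormalized cumulative
    #    probability reaches top_p.
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    # 5. Sample from the surviving indices, weighted by probability.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


rng = random.Random(0)
print(sample_next_token([10.0, 0.0, 0.0, 0.0], rng=rng))     # 0
print(sample_next_token([1.0, 3.0, 2.0], top_k=1, rng=rng))  # 1
```

In the first call one logit dominates, so the nucleus contains a single index. In the second, `top_k=1` reduces sampling to picking the most likely token.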

In [None]:


model_name_or_path = "./LLaMA2-13B-Tiefighter"
# To use a different branch, change revision
# For example: revision="main"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=True,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "write the plot of an action movie that takes place in 1492 AD"
prompt_template=f'''A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:

'''
print("\n\n*** Generate:")

# Move the inputs to the device that holds the model's first parameters
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.to(model.device)
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]



*** Generate:




<s> A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: write the plot of an action movie that takes place in 1492 AD ASSISTANT:

In 1492 AD, the Spanish monarchs, Isabella I and Ferdinand II, have just completed the Reconquista, reclaiming the Iberian Peninsula from Muslim rule. As they look towards expanding their empire, they dispatch the greatest naval fleet ever assembled, under the command of Christopher Columbus. Their mission: to sail west until they reach the Indies and claim its vast riches for Spain.

Meanwhile, a secret society of Templar Knights, known as the Order of Sword and Shield, has been gathering intelligence about the upcoming expedition. They learn that Columbus is actually a double agent, working for both the Spanish crown and their own order. His true intentions are to not only find the Indies but also to uncover a long-lost temple rumored to hold an anc