# Intro

A great course on Large Language Models (LLMs), [LLM101n](https://github.com/karpathy/LLM101n?tab=readme-ov-file) by Andrej Karpathy, is currently under development.

The field of Natural Language Processing (NLP) has a lot more to offer than just "Language Models" that generate text, but to keep things simple we will focus on that part.

## A full Language Model example
We will use a "Small Language Model" because it runs easily on a CPU or in Colab, which makes it handy for some fun first experiments. There is no agreed threshold for when a model counts as "large"; for now, let's call models with more than about 7 billion parameters "Large Language Models".
Model Source: https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct
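
To get a feeling for why a small model is convenient on a CPU, here is a rough back-of-the-envelope sketch of how much memory the weights alone need. It assumes 4 bytes per parameter for float32 and 2 bytes for float16, takes the 135M figure from the model name, and uses a hypothetical 7B model for comparison. Activations and any caches are ignored, so real usage will be higher.

```python
# Rough weight-only memory estimate: parameters x bytes per parameter.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str = "float32") -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


models = {
    "SmolLM-135M-Instruct": 135e6,    # parameter count taken from the model name
    "7B model (hypothetical)": 7e9,   # the informal "large" threshold used above
}

for name, n in models.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{name:26s} {dtype:8s} ~{weight_memory_gb(n, dtype):6.2f} GB")
```

Under these assumptions the small model needs roughly half a gigabyte in float32, while a 7B model needs about 14 GB even in float16.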

In [None]:
%pip install transformers

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M-Instruct"
device = "cpu"  # use "cuda" to run on a GPU

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Wrap the user message in the model's chat format (returned as a string).
messages = [{"role": "user", "content": "What is the capital of France."}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False)
print(input_text)

# Convert the prompt to token ids, generate up to 50 new tokens, and decode the result.
inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=50, temperature=0.2, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0]))

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<|im_start|>user
What is the capital of France.<|im_end|>

<|im_start|>user
What is the capital of France.<|im_end|>
<|im_start|>assistant
The capital of France! While the capital city of France is Paris, the country's capital is also a popular choice for tourists. Here's a breakdown of the capital city:

**Paris, the City of Light**
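
The answer stops abruptly because of `max_new_tokens=50`. The other arguments, `temperature=0.2`, `top_p=0.9` and `do_sample=True`, control how each next token is picked. The following is a minimal pure-Python sketch of the general idea behind temperature scaling and top-p (nucleus) filtering. It is not the library's exact implementation, and `toy_logits` is a made-up score list for a 4-token vocabulary.

```python
import math
import random


def sample_next_token(logits, temperature=0.2, top_p=0.9, rng=random):
    """Sample one token id from raw scores using temperature and top-p filtering."""
    # 1. Temperature: divide the scores before the softmax; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # 2. Top-p: keep the smallest set of most likely tokens whose total probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # 3. Sample among the kept tokens (random.choices normalizes the weights itself).
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
random.seed(0)
print([sample_next_token(toy_logits) for _ in range(10)])
```

With these toy scores and a temperature of 0.2, the top token holds more than 99% of the probability, so it is the only one that survives the top-p filter and every sample is token 0. Raising the temperature towards 1.0 flattens the distribution and lets other tokens through.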



## Breaking it down step by step

### Tokenization

In [2]:
print(tokenizer)

GPT2TokenizerFast(name_or_path='HuggingFaceTB/SmolLM-135M-Instruct', vocab_size=49152, model_max_length=2048, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|im_start|>', 'eos_token': '<|im_end|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|im_end|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	3: AddedToken("<repo_name>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	4: AddedToken("<reponame>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	5: AddedToken("<file_sep>
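
The special tokens `<|im_start|>` and `<|im_end|>` listed above are the same markers that wrap the user message in the prompt printed in the first example. As a rough illustration, the function below is a hand-written approximation of that formatting, inferred only from the printed prompt. The real template is stored with the tokenizer and applied by `apply_chat_template`; the function name `render_chatml` is made up for this sketch.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Hand-written approximation of the chat formatting seen in the printed prompt."""
    text = ""
    for message in messages:
        # Each turn is wrapped in the <|im_start|> / <|im_end|> special tokens.
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues directly with its answer.
        text += "<|im_start|>assistant\n"
    return text


messages = [{"role": "user", "content": "What is the capital of France."}]
print(render_chatml(messages))
print(render_chatml(messages, add_generation_prompt=True))
```

In the first example the prompt ended after the user turn, so the model had to generate the `<|im_start|>assistant` header itself. Passing `add_generation_prompt=True` to `apply_chat_template` is the usual way to append that header up front.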

In [7]:
print(tokenizer("Hello"))
print(tokenizer("Hello,"))
print(tokenizer("Hello, World"))
print(tokenizer("Hello, World!"))

{'input_ids': [19556], 'attention_mask': [1]}
{'input_ids': [19556, 28], 'attention_mask': [1, 1]}
{'input_ids': [19556, 28, 2260], 'attention_mask': [1, 1, 1]}
{'input_ids': [19556, 28, 2260, 17], 'attention_mask': [1, 1, 1, 1]}
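
Each longer input above simply adds one more id, which suggests that `Hello`, `,`, ` World` (with its leading space) and `!` are each a single token. The toy sketch below reproduces that mapping with a greedy longest-match lookup over a four-entry vocabulary. The ids are read off the output above, the token strings are inferred rather than looked up, and the lookup is a simplification, not the byte-pair-encoding procedure the real tokenizer uses.

```python
# Token ids read off the outputs above; the token strings are inferred, not looked up.
TOY_VOCAB = {"Hello": 19556, ",": 28, " World": 2260, "!": 17}


def toy_encode(text, vocab=TOY_VOCAB):
    """Greedy longest-match lookup over a tiny vocabulary (not real BPE)."""
    ids, pos = [], 0
    while pos < len(text):
        # Try the longest remaining substring first, then shorter ones.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos:]!r}")
    return ids


def toy_decode(ids, vocab=TOY_VOCAB):
    """Map ids back to their strings and join them."""
    id_to_piece = {i: piece for piece, i in vocab.items()}
    return "".join(id_to_piece[i] for i in ids)


for text in ["Hello", "Hello,", "Hello, World", "Hello, World!"]:
    ids = toy_encode(text)
    print(ids, repr(toy_decode(ids)))
```

Anything outside the four known pieces raises an error here, whereas a real subword tokenizer falls back to smaller pieces so that any input can be encoded.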
