# LLM Basics

## 1. Tokenizers

### 1.1 What is a tokenizer?

A tokenizer breaks text into smaller units called tokens, which can be words, subwords, or characters, and maps each token to an integer ID. Language models operate on these IDs rather than on the raw text.

![](../obsidian/Excalidraw/Tokenizers.svg)
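
To make the three granularities concrete, here is a minimal sketch in plain Python. The toy vocabulary and the greedy longest-match strategy are assumptions chosen for illustration; real tokenizers learn their vocabulary from data and use more elaborate algorithms such as BPE.

```python
def word_tokens(text):
    """Split on whitespace: one token per word."""
    return text.split()


def char_tokens(text):
    """One token per character (spaces included)."""
    return list(text)


def subword_tokens(word, vocab):
    """Greedy longest-match split of a single word into known pieces.

    Characters that match no vocabulary entry become single-character tokens.
    """
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, shrinking until one is in the vocab.
        for end in range(len(word), start, -1):
            if word[start:end] in vocab or end == start + 1:
                pieces.append(word[start:end])
                start = end
                break
    return pieces


# Hypothetical toy vocabulary, chosen only for this illustration.
TOY_VOCAB = {"token", "izer", "s"}

print(word_tokens("tokenizers split text"))
print(char_tokens("token"))
print(subword_tokens("tokenizers", TOY_VOCAB))
```

Subword tokenization sits between the two extremes: frequent fragments get their own token, while rare or unknown words can still be represented by falling back to smaller pieces.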

### 1.2 Locating Pre-Trained Tokenizers

You can find the matching tokenizer for most open-source LLMs on the [Hugging Face model hub](https://huggingface.co/models).

For instance, to download the tokenizer for the [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) model, use the following code:

In [1]:
from tokenizers import Tokenizer

# Download tokenizer from Hugging Face
tokenizer = Tokenizer.from_pretrained("deepseek-ai/DeepSeek-R1")

# Encode a sample text
tokens = tokenizer.encode("This is a sample")
print(f"IDs: {tokens.ids}")
print(f"Tokens: {tokens.tokens}")

IDs: [2337, 344, 260, 6810]
Tokens: ['This', 'Ġis', 'Ġa', 'Ġsample']
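
The `Ġ` at the start of some tokens is how byte-level BPE tokenizers (following the GPT-2 convention) display a leading space, so `Ġis` stands for " is". The sketch below reverses that mapping for the tokens printed above. It is a simplification that only handles the space marker; the helper name `detokenize` is hypothetical.

```python
# Tokens copied from the output above.
tokens = ["This", "Ġis", "Ġa", "Ġsample"]


def detokenize(tokens):
    """Rebuild text from byte-level BPE tokens that contain only plain ASCII.

    Simplification: only the space marker "Ġ" is handled; markers for other
    bytes (newlines, non-ASCII characters) are not covered here.
    """
    return "".join(tokens).replace("Ġ", " ")


print(detokenize(tokens))
```

In practice you would call `tokenizer.decode(tokens.ids)` instead, which handles all byte markers and special tokens.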


### 1.3 Tokenizers and Languages

A tokenizer's vocabulary is typically trained before the model itself, often on only a subset of the available data. As a result, each model comes with its own unique "vocabulary."

This means a tokenizer may perform significantly worse on languages or text types that were missing from its training data: unfamiliar text is split into many short fragments instead of a few meaningful tokens.
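
One rough way to quantify this is the average number of tokens per word: the higher the value, the more the tokenizer has to fragment the text. The sketch below uses two stand-in tokenizers (not real models) to show the calculation; the function names are hypothetical.

```python
def tokens_per_word(text, tokenize):
    """Average number of tokens produced per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return len(tokenize(text)) / len(words)


# Stand-in tokenizers for illustration only (not real models).
def well_matched(text):
    # Knows every word: one token per word.
    return text.split()


def poorly_matched(text):
    # Knows no words: falls back to one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


sample = "Die einzige Weisheit"
print(tokens_per_word(sample, well_matched))
print(tokens_per_word(sample, poorly_matched))
```

To apply this to a real tokenizer, pass a function such as `lambda t: tokenizer.encode(t).tokens`. Note that some tokenizers add special tokens (for example at the start and end of the input), which slightly inflates the count for short texts.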

Below, we explore several tokenizers:

- [`deepseek-ai/DeepSeek-R1`](https://huggingface.co/deepseek-ai/DeepSeek-R1)
- [`google-bert/bert-base-uncased`](https://huggingface.co/google-bert/bert-base-uncased)
- [`deepseek-ai/DeepSeek-Coder-V2-Instruct`](https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct)
- [`intfloat/multilingual-e5-large`](https://huggingface.co/intfloat/multilingual-e5-large)

The code below visualizes these tokenizers:

In [2]:
from utils import visualize_tokens

visualize_tokens("This is a sample", tokenizer_name="deepseek-ai/DeepSeek-R1")
visualize_tokens("This is a sample", tokenizer_name="google-bert/bert-base-uncased")
visualize_tokens("This is a sample", tokenizer_name="deepseek-ai/DeepSeek-Coder-V2-Instruct")
visualize_tokens("This is a sample", tokenizer_name="intfloat/multilingual-e5-large")


### 1.4 Your Task

Determine which tokenizers were trained on German text and which were not. Then repeat the analysis with source code as the input text.

In [3]:
# Evaluate the tokenizers using a variety of German texts to identify which model has been trained on German data.
# Likewise, test the tokenizers on code samples to determine which model effectively handles code.


In [4]:
code = """class Foo():
    def __init__(self):
        pass
"""

visualize_tokens(code, tokenizer_name="deepseek-ai/DeepSeek-R1")
visualize_tokens(code, tokenizer_name="google-bert/bert-base-uncased")
visualize_tokens(code, tokenizer_name="deepseek-ai/DeepSeek-Coder-V2-Instruct")
visualize_tokens(code, tokenizer_name="intfloat/multilingual-e5-large")

In [5]:
german = """Die einzige Weisheit, die man wirklich besitzt, ist die, die man anderen vermittelt."""

visualize_tokens(german, tokenizer_name="deepseek-ai/DeepSeek-R1")
visualize_tokens(german, tokenizer_name="google-bert/bert-base-uncased")
visualize_tokens(german, tokenizer_name="deepseek-ai/DeepSeek-Coder-V2-Instruct")
visualize_tokens(german, tokenizer_name="intfloat/multilingual-e5-large")

## 2. Ollama and LLM hosting

### 2.1 What is Ollama?

Ollama is a platform designed to simplify the hosting and deployment of large language models, making it easier for developers to integrate powerful AI capabilities into their applications.

To verify that Ollama is running properly, use the following command:

```bash
ollama -v
```

If the command doesn't return a version number, or warns that it cannot connect to a running Ollama instance, start the Ollama server (for example with `ollama serve`) before proceeding.
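
The same check can be done from Python, which is handy inside a notebook. The sketch below queries the server's `/api/version` endpoint using only the standard library. It assumes the default address `http://localhost:11434`; the helper name `ollama_server_version` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def ollama_server_version(base_url="http://localhost:11434", timeout=2.0):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
    return data.get("version") if isinstance(data, dict) else None


version = ollama_server_version()
if version:
    print(f"Ollama server version: {version}")
else:
    print("Ollama server not reachable")
```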

### 2.2 Hosting a Local LLM

Below is an example demonstrating how to download and run Hugging Face's [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) locally on your PC.

<img src="https://cdn-uploads.huggingface.co/production/uploads/61c141342aac764ce1654e43/oWWfzW4RbWkVIo7f-5444.png" alt="SmolLM2 Image" style="width:300px;">



In [6]:
from ollama import Client

MODEL = "smollm2:360m"
# Initialize the Ollama client
client = Client(host="http://localhost:11434")

In [7]:
# Download the model and list the downloaded models
client.pull(MODEL)
client.list()

ListResponse(models=[Model(model='smollm2:360m', modified_at=datetime.datetime(2025, 2, 28, 13, 13, 15, 71271, tzinfo=TzInfo(+01:00)), digest='297281b699fc51376006233ca400cd664c4f7b80ed88a47ef084f1e4b089803b', size=725566512, details=ModelDetails(parent_model='', format='gguf', family='llama', families=['llama'], parameter_size='361.82M', quantization_level='F16')), Model(model='all-minilm:33m', modified_at=datetime.datetime(2025, 2, 27, 14, 30, 20, 377122, tzinfo=TzInfo(+01:00)), digest='4f5da3bd944d9ad1cd3acc7d065ee54367a4c703f51fb6295bd8bc5007ed0c4a', size=67319908, details=ModelDetails(parent_model='', format='gguf', family='bert', families=['bert'], parameter_size='33M', quantization_level='F16')), Model(model='qwen2.5-coder:0.5b-instruct-q4_K_M', modified_at=datetime.datetime(2025, 2, 27, 12, 56, 46, 212520, tzinfo=TzInfo(+01:00)), digest='b0b7a69e69028a52e977165edcfd1e5b23476bd8fcdb99c65add8d3e260ac0ce', size=397821474, details=ModelDetails(parent_model='', format='gguf', family

In [8]:
# You can now chat with the model by sending a message to the server.

message = {"role": "user", "content": "What are you?"}

for part in client.chat(model=MODEL, messages=[message], stream=True, keep_alive=30):
    print(part["message"]["content"], end="", flush=True)

I am a text-based model for conversational AI, trained with the concept of language understanding and generation capabilities. I have been designed to process and respond to natural language inputs in a way that is understandable and helpful. I'm here to assist users like you by offering information on various topics through structured text input and providing clear responses when needed.

## 3. LangChain and LLM Integration

<img src="https://opensource.muenchen.de/logo/langchain.jpg" alt="Langchain Logo" style="width:300px;">  

LangChain is a framework that abstracts over language model providers, so the same code can work with different LLM backends, including:

- Ollama  
- Claude  
- OpenAI  

In addition to model integration, LangChain offers a range of prebuilt components for common use cases, such as search and agent-based interactions.

To use Ollama with LangChain, use the `ChatOllama` class from the `langchain_ollama` package.

In [9]:
from langchain_ollama import ChatOllama
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

model = ChatOllama(model=MODEL, base_url="http://localhost:11434")

messages = [SystemMessage("You are a helpful assistant"), HumanMessage("What are you?")]

result = model.invoke(messages)
result.pretty_print()


I'm a text-based AI language model. I can process and analyze written texts, including understanding context, detecting emotions, identifying patterns, and generating responses in various formats such as sentences or paragraphs.


### 3.1 Your Task: Chatbot with memory

Use LangChain's [automatic history management](https://python.langchain.com/docs/how_to/chatbots_memory/#automatic-history-management) to give your bot a conversation history, so it can refer back to earlier messages when responding.
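
Before wiring this up, it may help to see what "memory" amounts to: the earlier messages are stored and sent along with each new request. The sketch below shows the idea in plain Python with a stand-in model. `ChatSession` and `fake_model` are hypothetical names for illustration, not part of LangChain or LangGraph.

```python
class ChatSession:
    """Keeps the running message history and resends it on every turn."""

    def __init__(self, model_fn):
        self.model_fn = model_fn  # callable: list of messages -> reply text
        self.history = []

    def ask(self, user_text):
        # Store the new user message, then give the model the full history.
        self.history.append({"role": "user", "content": user_text})
        reply = self.model_fn(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply


# Hypothetical stand-in for an LLM: reports how many user messages it has seen.
def fake_model(messages):
    user_turns = [m for m in messages if m["role"] == "user"]
    return f"I have seen {len(user_turns)} user message(s)."


session = ChatSession(fake_model)
print(session.ask("Give me a hello world example in python"))
print(session.ask("What did I just ask you?"))
```

A checkpointer plays the role of `self.history` here: it stores the messages per conversation so that each new call sees the earlier turns.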

In [10]:
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# your code goes here
workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + state["messages"]
    response = model.invoke(messages)
    return {"messages": response}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

In [11]:
result = app.invoke(
    {"messages": [HumanMessage(content="Give me a hello world example in python")]},
    config={"configurable": {"thread_id": "1"}},
)
result['messages'][-1].pretty_print()


Here is a simple "Hello, World!" program written in Python:

```python
# Hello, World! Program
print("Hello, World!")
```

This code will output the following on your screen:

```
Hello, World!
```

If you want to print multiple lines, you can use a `for` loop and add newlines:

```python
# Hello, World! Program with printing multiple lines
for i in range(3):
    print("Hello, World!")
print("\nThis is the third line")
```


In [12]:
result = app.invoke(
    {"messages": [HumanMessage(content="What did I just ask you?")]},
    config={"configurable": {"thread_id": "1"}},
)
result['messages'][-1].pretty_print()


You asked me to write a hello world example in Python. Here it is:

```python
# Hello, World! Program
print("Hello, World!")
```

I provided the code as an example of writing a "Hello, World!" program in Python, which prints "Hello, World!" followed by a newline at the end of the output.


## 4. Continue

You can now use our [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) instance as the backend for the Continue plugin.

To set it up, follow these steps:

1. Open the Continue settings:  
   ![](./media/continue/continue_settings.png) ![](./media/continue/continue_file.png)
   
2. Copy the contents of [`example_config.json`](./example_config.json).  

3. Paste them into the "Configuration" file in Continue.

Once configured, you should be able to use the chat functionality of the Continue plugin.

### 4.1 Tab Auto Completions  

To enable tab auto-completions (Ghost Tab) in your editor, you'll need to download a tab auto-completion model.  

For this setup, we'll use the [Qwen2.5-Coder](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF) model.  

Once the model is downloaded, you should start seeing code completion suggestions directly in your editor.  

<video width="480" height="320" controls>  
  <source src="./media/continue/example.mp4" type="video/mp4">  
</video>  

In [13]:
from ollama import Client

# Download the model
client = Client(host="http://localhost:11434")
client.pull("qwen2.5-coder:0.5b-instruct-q4_K_M") 

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [14]:
# Generate Fibonacci numbers up to a maximum of 500
def fibonacci():

SyntaxError: incomplete input (2082750493.py, line 2)
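
The cell above is a stub: its body is left empty, presumably so the auto-completion model can suggest one, which is why running it as-is raises a `SyntaxError`. For comparison, a hand-written version might look like the sketch below. The `limit` parameter is an assumption added for illustration and is not part of the original stub.

```python
def fibonacci(limit=500):
    """Return all Fibonacci numbers less than or equal to `limit`."""
    numbers = []
    a, b = 0, 1
    while a <= limit:
        numbers.append(a)
        a, b = b, a + b
    return numbers


print(fibonacci())
```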