In [None]:
%%capture
!pip install chromadb
!pip install -U bitsandbytes
!pip install llama-cpp-python
!pip install rank_bm25 nltk

# Objectives

Build a simple RAG (retrieval-augmented generation) pipeline for question answering, based on a lightweight quantized Llama 3.2 1B model. The goal is to accurately answer questions about the Warhammer 40K rules.

## Overview

First, we will import Llama 3.2, try out some prompt templating, and chat with the model.

Secondly, we will experiment with [ChromaDB](https://docs.trychroma.com/getting-started) and build a first RAG.

Finally, we will be using the headers and BM25 to try and improve the retriever.

# Imports

In [None]:
import chromadb
import json
import uuid

from llama_cpp import Llama
from transformers import AutoModelForCausalLM, AutoTokenizer
from jinja2 import Template


from rank_bm25 import BM25Okapi
import nltk
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
nltk.download('stopwords')

import spacy

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


# Large Language Model

## CPU implementation

In [None]:
llm = Llama.from_pretrained(
      repo_id="bartowski/Llama-3.2-1B-Instruct-GGUF",
      filename="*Q8_0.gguf",
      verbose=False,
      n_ctx=25000,
)

Llama-3.2-1B-Instruct-Q8_0.gguf:   0%|          | 0.00/1.32G [00:00<?, ?B/s]

llama_new_context_with_model: n_ctx_per_seq (25024) < n_ctx_train (131072) -- the full capacity of the model will not be utilized


In [None]:
def llama_cpp_generate(prompt):
  output = llm(prompt, max_tokens=300)
  return output["choices"][0]["text"]

llama_cpp_generate("What is the capital of France ?")

" Paris. The capital of France is the country where the Eiffel Tower is located, and the country is where French cuisine is famous for its delicious dishes such as escargots, croissants, and baguettes. The capital of France is the most populous city in the country, with over 2.1 million inhabitants.\nThe capital of France is a city located in the country's central region, in the department of Seine-Saint-Denis. The city is known for its rich history, cultural institutions, and its famous landmarks like the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral. The capital of France is a hub for business and international relations, with several international airports and financial institutions located within the city.\nSome interesting facts about the capital of France include:\nIt is the birthplace of many famous French artists such as Claude Monet, Pierre-Auguste Renoir, and Henri de Toulouse-Lautrec.\nThe capital of France is home to the world's largest shopping mall, the Galeri

## GPU Implementation

In [None]:
# Import model
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct")
model = AutoModelForCausalLM.from_pretrained("unsloth/Llama-3.2-1B-Instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/54.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/927 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/184 [00:00<?, ?B/s]

In [None]:
def llm_complete(prompt, max_tokens=300):
    # Tokenize the prompt and move the tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    prompt_length = inputs["input_ids"].shape[1]
    # max_new_tokens bounds only the generated tokens, not prompt + generation
    outputs = model.generate(**inputs, max_new_tokens=max_tokens)
    generated_tokens = outputs[0][prompt_length:]  # Exclude the input prompt tokens
    answer = tokenizer.decode(generated_tokens, skip_special_tokens=True)
    return answer

## Initial LLM experiments

### Prompt Template

In [None]:
prompt_template = Template(
    """
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>
    {{role}}<|eot_id|><|start_header_id|>user<|end_header_id|>
    {{input}}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    """
)
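
Because the template above is a triple-quoted string, the rendered prompt keeps the leading newline and the four-space indentation of every line. The model copes with this, as the outputs below show, but it can help to see the layout without the extra whitespace. The following is a minimal sketch in plain Python, assuming the Llama 3 instruct layout (special tokens followed by a blank line after each header); `build_llama3_prompt` is a hypothetical helper, not part of the notebook. Depending on the runtime, the beginning-of-text token may also be added automatically, which is worth checking before relying on either version.

```python
def build_llama3_prompt(role: str, user_input: str) -> str:
    """Assemble a single-turn Llama 3 instruct prompt without stray indentation.

    Hypothetical helper for illustration; the notebook itself uses a Jinja template.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{role}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|>"
        # The prompt ends with an open assistant header so the model writes the reply
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


example_prompt = build_llama3_prompt("you are a clown for children", "tell me a joke !")
print(example_prompt)
```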

### Playing with the LLM

In [None]:
input = {'role':'you are a depressed miserable sad clown' ,'input': 'tell me a joke !'}
input_2 = {'role':'you are a clown for children' ,'input': 'tell me a joke !'}

llama.cpp

In [None]:
prompt = prompt_template.render(role=input["role"], input=input["input"])
llama_cpp_generate(prompt)

" *sigh* Ah, okay... *picks up a tattered clown costume* Here's one... *clears throat*\n\nWhy did the clown resign from the circus?\n\n*pauses* Because he was tired of working for peanuts. *shrugs*"

In [None]:
prompt_2 = prompt_template.render(role=input_2["role"], input=input_2["input"])
llama_cpp_generate(prompt_2)

' *squirts water from flower on lapel*\nwhy did the clown resign from the circus?\n*pauses for comedic effect*\n\nBecause he was tired of working for peanuts!'

Unsloth Llama 3.2

In [None]:
llm_complete(prompt)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' *sigh* okay... *sniffle*... why did the clown resign from the circus?... *pauses to wipe away tears*... because he was tired of working for peanuts... *sigh*...'

In [None]:
llm_complete(prompt_2)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' *squirts water from flower on lapel*\n\nWhy did the clown resign from the circus? \n\n(wait for it...)\n\nBecause he was tired of working for peanuts!'

What do you think?

Both personas tell the same joke, but the delivery changes noticeably with the system role, so the role matters.

# Retriever

In [None]:
import chromadb.utils.embedding_functions as embedding_functions

In [None]:
%%capture
client = chromadb.Client()
stf_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")

In [None]:
collection = client.create_collection(name="warhammer_40k",
                                      metadata={"hnsw:space": "cosine"},
                                      embedding_function=stf_function)
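
The `"hnsw:space": "cosine"` setting asks the collection to rank chunks by cosine distance, that is, one minus the cosine similarity between the query embedding and each chunk embedding. As a rough illustration, here is a pure-Python sketch on toy two-dimensional vectors; the vectors and the `cosine_distance` helper are assumptions made for the example, not real embeddings or ChromaDB code.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy vectors standing in for embeddings (hypothetical values)
query_vec = [1.0, 0.0]
toy_docs = {
    "same_direction": [2.0, 0.0],  # same direction, different length
    "orthogonal": [0.0, 3.0],      # unrelated direction
    "close": [1.0, 1.0],           # 45 degrees away from the query
}

# Smaller distance means more similar, so sort in ascending order
ranked = sorted(toy_docs, key=lambda name: cosine_distance(query_vec, toy_docs[name]))
print(ranked)
```

Only the direction of the vectors matters here: `same_direction` is twice as long as the query yet still has a distance of zero.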

In [None]:
!unzip /content/my_first_rag_data.zip

Archive:  /content/my_first_rag_data.zip
  inflating: full_text.md            
  inflating: chunks.json             
  inflating: extracted_data.zip      


In [None]:
with open('chunks.json', 'r') as f:
    chunks = json.load(f)

In [None]:
# Adding the documents to the collection
collection.add(
    documents=[doc["page_content"] for doc in chunks],
    ids=[str(i) for i in range(len(chunks))],
    metadatas=[doc["metadata"] for doc in chunks]
)

In [None]:
question = "What is a visible unit ?"

# Performing a query
collection.query(
    query_texts=[question],
    n_results=4
)

{'ids': [['7', '30', '5', '6']],
 'embeddings': None,
 'documents': [["- **Model Visible:** If any part of a model can be seen, it is visible.  \n- **Unit Visible:** If any model in a unit is visible, that model's unit is visible.  \n- **Model Fully Visible:** If every Warhammer 40,000 battles are fought across all manner of grim and perilous landscapes, often strewn with ruins, wreckage and other obstacles your forces must navigate while they fight.  \n#### - Unit Fully Visible: If Every Model In A Unit Is Fully Visible, That Unit Is",
   '- **Unit Fully Visible:** If every model in a unit is fully visible, that unit is fully visible.  \n#### Hints And Tips  \nDice Rolling',
   'MODEL FULLY VISIBLE\nIf every part of another model that is facing the observing model can be seen from any part of the observing model, then that other model is said Every Warhammer 40,000 unit has a datasheet, reflecting the characteristics and abilities they can draw upon in battle.  \nUnit Visible  \n#### 

Next, we create a retrieval function that wraps the ChromaDB query and returns the results in a more convenient format.

In [None]:
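# Returns list[dict]: one dictionary per chunk, with its header metadata and its text
def retrieve(question, n_results=5):
    res_query = collection.query(
        query_texts=[question],
        n_results=n_results
    )

    # Results are nested per query text; we sent a single query, hence the [0].
    # zip() also stays safe when the collection returns fewer than n_results chunks.
    chunk_list = [{"header": metadata, "text": document}
                  for metadata, document in zip(res_query["metadatas"][0],
                                                res_query["documents"][0])]

    return chunk_list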
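
To see the reshaping in isolation, here is a small sketch that runs against a hand-written dictionary mimicking the nested shape of the query output shown earlier. The outer list has one entry per query text, which is why `retrieve` indexes `[0]`. The `mock_result` values and the `to_chunk_list` name are made up for the example.

```python
# Hand-written stand-in for the dictionary returned by collection.query (hypothetical values)
mock_result = {
    "ids": [["7", "30"]],
    "documents": [["first chunk text", "second chunk text"]],
    "metadatas": [[{"Header 1": "Core Concepts"}, {"Header 1": "Stratagems"}]],
}


def to_chunk_list(res_query):
    """Flatten the results of the first (and only) query into a list of chunk dicts."""
    metadatas = res_query["metadatas"][0]
    documents = res_query["documents"][0]
    return [{"header": m, "text": d} for m, d in zip(metadatas, documents)]


chunk_list = to_chunk_list(mock_result)
for chunk in chunk_list:
    print(chunk)
```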

In [None]:
for chunk in retrieve(question):
  print(chunk)

{'header': {'Header 1': 'Introduction', 'Header 2': 'Terrain Features (Pg 44-52)'}, 'text': "- **Model Visible:** If any part of a model can be seen, it is visible.  \n- **Unit Visible:** If any model in a unit is visible, that model's unit is visible.  \n- **Model Fully Visible:** If every Warhammer 40,000 battles are fought across all manner of grim and perilous landscapes, often strewn with ruins, wreckage and other obstacles your forces must navigate while they fight.  \n#### - Unit Fully Visible: If Every Model In A Unit Is Fully Visible, That Unit Is"}
{'header': {'Header 1': 'Core Concepts', 'Header 2': 'Determining Visibility'}, 'text': '- **Unit Fully Visible:** If every model in a unit is fully visible, that unit is fully visible.  \n#### Hints And Tips  \nDice Rolling'}
{'header': {'Header 1': 'Introduction', 'Header 2': 'Datasheets And Unit Abilities (Pg 37-39)'}, 'text': 'MODEL FULLY VISIBLE\nIf every part of another model that is facing the observing model can be seen fro

## RAG template

Create a RAG template in Jinja

In [None]:
# The template loops over the retrieved chunks and prints each chunk's headers and text
rag_template = Template(
  """
  <|begin_of_text|>

  <|start_header_id|>system<|end_header_id|>
  {{role}}
  <|eot_id|>

  <|start_header_id|>user<|end_header_id|>

  Use the context and only the context to answer the following question:

  Question:

  {{question}}

  Context:
  {% for chunk in chunks %}
  {% for key, value in chunk.header.items() %}{{ key }}: {{ value  }} {% endfor %}
  text: {{chunk.text}}
  {% endfor %}
  <|eot_id|>

  <|start_header_id|>assistant<|end_header_id|>
  """
)

In [None]:
print(rag_template.render(**{
    'role': 'you are an experienced wargame player',
    'question': "What is a visible unit ?",
    'chunks':[{'header': {'header1':'toto'},'text':'ctx1'},{'header': {'header1':'tato', 'header2':'tato'},'text':'ctx2'},{'header': {'header1':'tato'}, 'text':'ctx3'}]
}))


  <|begin_of_text|>

  <|start_header_id|>system<|end_header_id|>
  you are an experienced wargame player
  <|eot_id|>

  <|start_header_id|>user<|end_header_id|>

  Use the context and only the context to answer the following question:

  Question:

  What is a visible unit ?

  Context:
  
  header1: toto 
  text: ctx1
  
  header1: tato header2: tato 
  text: ctx2
  
  header1: tato 
  text: ctx3
  
  <|eot_id|>
  
  <|start_header_id|>assistant<|end_header_id|>
  


Create a function that builds the prompt from the question and the retrieved chunks.

In [None]:
# returns a prompt
def prompt_generation(question, chunks):
    return rag_template.render(**{
      'role': 'you are an experienced wargame player',
      'question': question,
      'chunks': chunks
    })

In [None]:
print(prompt_generation(question, retrieve(question)))


  <|begin_of_text|>

  <|start_header_id|>system<|end_header_id|>
  you are an experienced wargame player
  <|eot_id|>

  <|start_header_id|>user<|end_header_id|>

  Use the context and only the context to answer the following question:

  Question:

  What is a visible unit ?

  Context:
  
  Header 1: Introduction Header 2: Terrain Features (Pg 44-52) 
  text: - **Model Visible:** If any part of a model can be seen, it is visible.  
- **Unit Visible:** If any model in a unit is visible, that model's unit is visible.  
- **Model Fully Visible:** If every Warhammer 40,000 battles are fought across all manner of grim and perilous landscapes, often strewn with ruins, wreckage and other obstacles your forces must navigate while they fight.  
#### - Unit Fully Visible: If Every Model In A Unit Is Fully Visible, That Unit Is
  
  Header 1: Core Concepts Header 2: Determining Visibility 
  text: - **Unit Fully Visible:** If every model in a unit is fully visible, that unit is fully visible.

# Full RAG

Create functions to perform the full RAG pipeline. You may create one function for the CPU and another one for the GPU.

In [None]:
question_0 = "What is a visible unit ?"
question_1 = 'What are the limitations associated to the advance mouvement rule ?'
question_2 = 'Is there a stratagem that can be used to reroll a failed dice role?'
question_3 = 'Explain the Comand Re-roll stratagem'

In [None]:
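def full_rag_cpu(question, n_results=5):
  # Pass n_results through to the retriever (it was previously ignored, so 5 chunks were always used)
  prompt = prompt_generation(question, retrieve(question, n_results=n_results))
  answer = llama_cpp_generate(prompt)
  return answer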

In [None]:
full_rag_cpu(question_0)

' You are looking for a definition of a "visible unit" in the context of the Warhammer 40,000 universe. Based on the provided context, a visible unit is a unit that can be seen by an observer, meaning that every part of the unit can be seen from the observer\'s position.'

In [None]:
full_rag_cpu(question_1)

' The limitation associated with the Advance movement rule is:\n\n* Models cannot be moved within Engagement Range of enemy models during the Advance phase.\n\nIn other words, if a model wants to advance, it can only move up to the distance specified on the Advance roll (4 in this case) and cannot move within the Engagement Range of any enemy models.'

In [None]:
full_rag_cpu(question_2)

' You can balance another dice on top of a cocked dice without it sliding off.'

In [None]:
full_rag_cpu(question_3)

' the command re-roll stratagem allows a commander to re-roll a dice roll once, after all modifiers have been applied.'

<hr>

In [None]:
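def full_rag_gpu(question, n_results=5):
    # Pass n_results through to the retriever (it was previously ignored, so 5 chunks were always used)
    prompt = prompt_generation(question, retrieve(question, n_results=n_results))
    answer = llm_complete(prompt)
    return answer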
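
The CPU and GPU functions differ only in the generation call, so one possible design is a single pipeline parameterized by its three steps. The sketch below illustrates the idea with stub functions so that it runs without a model or a vector store; `make_full_rag` and the `stub_*` names are hypothetical.

```python
def make_full_rag(retrieve_fn, prompt_fn, generate_fn):
    """Build a RAG function from a retriever, a prompt builder and a generator."""
    def full_rag(question, n_results=5):
        chunks = retrieve_fn(question, n_results)
        prompt = prompt_fn(question, chunks)
        return generate_fn(prompt)
    return full_rag


# Stubs standing in for the retriever, the prompt template and the LLM
def stub_retrieve(question, n_results):
    return [{"header": {"Header 1": "Stub"}, "text": f"chunk {i}"} for i in range(n_results)]


def stub_prompt(question, chunks):
    context = "\n".join(chunk["text"] for chunk in chunks)
    return f"Question: {question}\nContext:\n{context}"


def stub_generate(prompt):
    return f"(stub answer for a prompt of {len(prompt.splitlines())} lines)"


stub_rag = make_full_rag(stub_retrieve, stub_prompt, stub_generate)
stub_answer = stub_rag("What is a visible unit ?", n_results=2)
print(stub_answer)
```

With the notebook's own functions, the same factory could be called as `make_full_rag(retrieve, prompt_generation, llama_cpp_generate)` for the CPU model, or with `llm_complete` as the generator for the Transformers model.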

In [None]:
full_rag_gpu(question_0)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' a unit is fully visible if every part of it can be seen from any part of the observing model, without any other models or terrain features blocking the view.'

In [None]:
full_rag_gpu(question_1)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' The limitations associated with the "Advance Move" rule are:\n\n- A unit can only move up to M+D6 inches.\n- The unit cannot shoot or charge this turn.\n- The unit cannot move within Engagement Range of any enemy models.'

In [None]:
full_rag_gpu(question_2)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' You can reroll a failed dice roll by using the "Cocked Dice" rule, which states that unless the dice is flat after it has been rolled, or unless you can balance another dice, you must re-roll it.'

In [None]:
full_rag_gpu(question_3)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


" the command re-roll stratagem allows a commander to re-roll dice rolls that were made by adding multiple dice together. This can be used to re-roll dice rolls that were made by adding 2D6, 3D6, etc. \n\nHowever, it's worth noting that re-rolling a dice roll that was made by adding multiple dice together will only be done once, and will not be able to be re-done."

# Adding a reranker
The results might not be satisfactory for some questions.

In order to better use the header extraction, we will rerank the chunks using BM25 over the headers.

#### Tokenization

In [None]:
eng_stopwords_set = set(stopwords.words('english'))
nlp = spacy.load("en_core_web_sm")

In [None]:
def tokenize_without_stopwords(s, stopwords_set=set()):
  return [token.text for token in nlp(s) if token.text not in stopwords_set]

# Testing tokenization
test_tokenized = tokenize_without_stopwords(question_2, eng_stopwords_set)
test_tokenized

['Is', 'stratagem', 'used', 'reroll', 'failed', 'dice', 'role', '?']
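
Note that `'Is'` and `'?'` survive in the output above: the NLTK stopword list is lowercase and the comparison is case-sensitive, and punctuation is not part of the list. One way to handle both is to lowercase tokens before the lookup and to keep only word-like tokens. The sketch below shows the idea with a regex tokenizer and a tiny hand-written stopword set, both of which are stand-ins (assumptions) for spaCy and the NLTK list.

```python
import re

# Tiny hand-written subset, standing in for the NLTK English stopword list (hypothetical)
TOY_STOPWORDS = frozenset({"is", "there", "a", "that", "can", "be", "to"})


def simple_tokenize(text, stopwords=frozenset()):
    """Split on word-like runs and drop stopwords, ignoring case."""
    tokens = re.findall(r"[A-Za-z0-9'-]+", text)  # punctuation such as '?' is not matched
    return [token for token in tokens if token.lower() not in stopwords]


toy_tokens = simple_tokenize(
    "Is there a stratagem that can be used to reroll a failed dice role?",
    TOY_STOPWORDS,
)
print(toy_tokens)
```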

#### Stemming

In [None]:
stemmer = SnowballStemmer(language="english")

# returns list[str]
def stem(l_tokens):
  return [stemmer.stem(token) for token in l_tokens]

test_stemmed = stem(test_tokenized)
test_stemmed

['is', 'stratagem', 'use', 'rerol', 'fail', 'dice', 'role', '?']

#### Preprocessing

In [None]:
def metadata_preprocessing(metadata):
    # metadata: list[str]
    tokenized = [tokenize_without_stopwords(s) for s in metadata]
    preprocessed_headers = [stem(l_tokens) for l_tokens in tokenized]
    return preprocessed_headers # list[list[str]]

def query_preprocessing(query):
    # query: str
    tokenized = tokenize_without_stopwords(query)
    preprocessed_queries = stem(tokenized)
    return preprocessed_queries # list[str]

#### BM25 engine

In [None]:
def rerank_chunks(question, chunks, n_results=5):
    headers = ['#'.join(chunk["header"].values()) for chunk in chunks] # list[str]
    preprocessed_headers = metadata_preprocessing(headers)
    preprocessed_query = query_preprocessing(question)

    bm25_model = BM25Okapi(preprocessed_headers)

    scores = bm25_model.get_scores(preprocessed_query)
    chunks_score_list = list(zip(chunks, scores)) # list[ tuple( dict[str:str], float ) ]
    chunks_score_list.sort(key=lambda x: x[1], reverse=True)

    best_chunks = chunks_score_list[:n_results]

    return best_chunks # list[ tuple( dict[str:str], float ) ], returns best chunks and their score
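
To make the scoring less of a black box, here is a minimal pure-Python sketch of an Okapi BM25 scorer over tokenized headers. It uses the textbook term-frequency formula with a non-negative idf variant; the `rank_bm25` implementation may compute idf differently, so absolute scores are not expected to match the ones printed in this notebook. The function name, the toy headers and the parameter defaults are assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document of the corpus against the query tokens."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Number of documents that contain each term
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Toy stemmed headers (hypothetical)
toy_headers = [["core", "concept", "determin", "visibl"], ["stratagem"], ["movement", "phase"]]
toy_scores = bm25_scores(["visibl", "unit"], toy_headers)
print(toy_scores)
```

A header that shares no token with the query scores exactly zero in this sketch, which is one plausible reason for the `0.0` scores that appear in the tests below when the question's wording does not overlap with any header.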

#### Tests

In [None]:
retrieved_chunks = retrieve(question_0, n_results=100) # list[dict[str:str]]
reranked_chunks = rerank_chunks(question_0, retrieved_chunks, n_results=10) # list[ tuple( dict[str:str],float ) ]
for rr_chk in reranked_chunks:
    print('score: {}'.format(rr_chk[1]))
    print('header: {}'.format('#'.join(rr_chk[0]['header'].values())))
    print('text: {}'.format(rr_chk[0]['text']))
    print('-'*100)

score: 3.5217102384768935
header: Core Concepts#Determining Visibility
text: - **Unit Fully Visible:** If every model in a unit is fully visible, that unit is fully visible.  
#### Hints And Tips  
Dice Rolling
----------------------------------------------------------------------------------------------------
score: 3.5217102384768935
header: Core Concepts#Determining Visibility
text: #### Model Fully Visible  
If every part of another model that is facing the observing model can be seen from any part of the observing model, then that other model is said to be fully visible to the observing model, i.e. the observing model has line of sight to all parts of the other model that are facing it, without any other models or terrain features blocking visibility to any of those parts.  
#### Unit Fully Visible  
If every model in a unit is fully visible to an observing model, then that unit is fully visible to that observing model. For the purposes of determining if an enemy unit is fully vis

In [None]:
retrieved_chunks = retrieve(question_1, n_results=100) # list[dict[str:str]]
reranked_chunks = rerank_chunks(question_1, retrieved_chunks, n_results=10) # list[ tuple( dict[str:str],float ) ]
for rr_chk in reranked_chunks:
    print('score: {}'.format(rr_chk[1]))
    print('header: {}'.format('#'.join(rr_chk[0]['header'].values())))
    print('text: {}'.format(rr_chk[0]['text']))
    print('-'*100)

score: 0.0
header: Movement Phase#1. Move Units
text: 4 5  
#### Advance Moves  
When a unit Advances, make an Advance roll for that unit by rolling one D6. Add the result in inches to the Move characteristic of each model in that unit until the end of the phase. Each model in that unit can then make an Advance move by moving a distance in inches less than or equal to this total, but no model can be moved within Engagement Range of enemy models. A unit cannot shoot or declare a charge in the same turn that it Advanced.  
- **Advance Move:** Models move up to M+D6". - Cannot move within Engagement Range of any enemy models.  
- Units that Advance cannot shoot or charge this turn.  
#### Fall Back Moves
----------------------------------------------------------------------------------------------------
score: 0.0
header: Fight#3. Consolidate
text: - **Consolidation Move:** Up to 3". - Every model that moves must end closer to the closest enemy model, and in base-to-base contact with an e

In [None]:
retrieved_chunks = retrieve(question_2, n_results=100)
reranked_chunks = rerank_chunks(question_2, retrieved_chunks, n_results=10)
for rr_chk in reranked_chunks:
    print('score: {}'.format(rr_chk[1]))
    print('header: {}'.format('#'.join(rr_chk[0]['header'].values())))
    print('text: {}'.format(rr_chk[0]['text']))
    print('-'*100)

score: 3.2838594773675993
header: Stratagems
text: BATTLE TACTIC: These Stratagems bolster a unit's efficacy in battle, boosting their attacks or defensive capabilities at a critical moment.  
EPIC DEED: These Stratagems are used by individual models or units to perform mighty feats of heroism.  
STRATEGIC PLOY: These Stratagems enable units to gain new strategic insights, granting them a small but valuable window of opportunity.  
WARGEAR: These Stratagems represent the effects of using specialised items of equipment in battle.  
#### Command Re-Roll  
CORE - BATTLE TACTIC STRATAGEM
A great commander can bend even the vagaries of fate and fortune to their will, the better to ensure victory.  
1CP 2CP
----------------------------------------------------------------------------------------------------
score: 3.2838594773675993
header: Stratagems
text: Command points can be spent during the battle to use Stratagems. All players can use the Core Stratagems presented here. Additional Strat

In [None]:
question_3 = 'Explain the Comand Re-roll stratagem'
retrieved_chunks = retrieve(question_3, n_results=100)
reranked_chunks = rerank_chunks(question_3, retrieved_chunks, n_results=10)
for rr_chk in reranked_chunks:
    print('score: {}'.format(rr_chk[1]))
    print('header: {}'.format('#'.join(rr_chk[0]['header'].values())))
    print('text: {}'.format(rr_chk[0]['text']))
    print('-'*100)

score: 3.7585579782077483
header: The Battle Round#Movement Phase
text: Your units manoeuvre across the battlefield and reinforcements enter the fray.  
SHOOTING PHASE
Your units fire their ranged weapons at the foe.  
CHARGE PHASE
Your units charge forward to battle at close quarters.  
Both players' units pile in and attack with melee weapons.  
Once a player's turn has ended, their opponent then starts their turn. Once both players have completed a turn, the battle round has been completed and the next one begins, and so on, until the battle ends.  
3 4 5
----------------------------------------------------------------------------------------------------
score: 3.275511601767367
header: Stratagems
text: BATTLE TACTIC: These Stratagems bolster a unit's efficacy in battle, boosting their attacks or defensive capabilities at a critical moment.  
EPIC DEED: These Stratagems are used by individual models or units to perform mighty feats of heroism.  
STRATEGIC PLOY: These Stratagems enab

## RAG with reranker

In [None]:
history = [] # list[dict[str:str]]

def full_rag_reranker_cpu(question, n_results=5):
  retrieved_chunks = retrieve(question, n_results=100) # list[dict[str:str]]
  reranked_chunks_with_scores = rerank_chunks(question, retrieved_chunks, n_results=n_results) # list[ tuple( dict[str:str],float ) ]
  reranked_chunks = [tuple_chk_score[0] for tuple_chk_score in reranked_chunks_with_scores]
  prompt = prompt_generation(question,reranked_chunks)
  answer = llama_cpp_generate(prompt)
  return answer

In [None]:
question = "How can I win the game?"

Testing with default 5 chunks as context

In [None]:
full_rag_cpu(question)

' The key to winning the game of Warhammer 40,000 is to score more Victory points than your opponent through various objectives such as recovering vital relics, capturing enemy strongholds, or eliminating the opposing Warlord. The game is broken into different phases during which players move, shoot, and fight with their miniatures.'

In [None]:
full_rag_reranker_cpu(question)

' You are the player of the game. To win the game, you need to defeat your opponent in a series of battle rounds. The game starts with the "Introduction" phase, where both players receive their armies and the mission for the battle round is revealed. You must then move, shoot, and fight with your units, with both players taking individual turns.\n\nTo determine who wins, each mission will specify a victory condition, and you need to achieve this condition to win the game. If you achieve the condition, you win the battle round. If you fail, the battle is a draw.\n\nTo resolve the battle, you can choose how to proceed, and if you succeed, you win the battle round. If you fail, you can choose how to proceed, and if you succeed, the battle continues.'

<hr>

In [None]:
def full_rag_reranker_gpu(question, n_results=5):
  retrieved_chunks = retrieve(question, n_results=100) # list[dict[str:str]]
  reranked_chunks_with_scores = rerank_chunks(question, retrieved_chunks, n_results=n_results) # list[ tuple( dict[str:str],float ) ]
  reranked_chunks = [tuple_chk_score[0] for tuple_chk_score in reranked_chunks_with_scores]
  prompt = prompt_generation(question,reranked_chunks)
  answer = llm_complete(prompt)
  return answer

In [None]:
full_rag_gpu(question)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' To win the game, you need to determine what the primary objective is. Each mission will tell you what you need to do to win the game. If neither player manages to achieve a victory then the game is considered to be a draw.'

In [None]:
full_rag_reranker_gpu(question)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' You want to win the game of Warhammer 40,000. To do so, you need to score more Victory points than your opponent. The game is broken into battle rounds, and each battle round consists of two phases: Movement Phase and Shooting Phase. During the Shooting Phase, your units fire their ranged weapons at the enemy. During the Charge Phase, your units charge forward to battle at close quarters. Once both players have completed their turns, the battle round has been completed and the next one begins. You need to move, shoot, and fight with your miniatures to achieve victory.\n\nTo win the game, you need to achieve various objectives, such as recovering vital relics, capturing enemy strongholds, or eliminating the opposing Warlord. The game is played in a series of battle rounds, and you need to manage your resources strategically to outmaneuver your opponent.\n\nYou can use the mission to determine what you need to do to win the game. If you and your opponent are unable to achieve a victory

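With the GPU model, the answer from full_rag_reranker_gpu is more detailed and more relevant than the one from full_rag_gpu: it mentions Victory points, the phases of a battle round, and example objectives.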

# Chatting with the rulebook

At this point, we have built a RAG that handles a single question/answer turn. For some applications, it can be useful to allow multi-turn conversations with the documents.

In [None]:
history = [] # list[dict[str:str]], contains the chat history

In [None]:
# The template loops over the chat history first, then over the retrieved chunks
chat_rag_template = Template(
  """
  <|begin_of_text|>

  <|start_header_id|>system<|end_header_id|>
  {{role}}
  <|eot_id|>

  {% for chat in history %}
  <|start_header_id|>user<|end_header_id|>
  {{chat.h_question}}
  <|eot_id|>

  <|start_header_id|>assistant<|end_header_id|>
  {{chat.h_answer}}
  <|eot_id|>
  {% endfor %}

  <|start_header_id|>user<|end_header_id|>

  Use the context and only the context to answer the following question:

  Question:

  {{question}}

  Context:
  {% for chunk in chunks %}
  {% for key, value in chunk.header.items() %}{{ key }}: {{ value  }} {% endfor %}
  text: {{chunk.text}}
  {% endfor %}
  <|eot_id|>

  <|start_header_id|>assistant<|end_header_id|>
  """
)

In [None]:
def chat_prompt_generation(question, chunks, history):
    return chat_rag_template.render(**{
    'role': 'you are an experienced wargame player',
    'question': question, # str
    'chunks': chunks, # list[dict[str:str]]
    'history': history # list[dict[str:str]]
    })

In [None]:
def chat_with_rag_gpu(question, n_results=5, clear_history=False):
  if clear_history: history.clear()

  retrieved_chunks = retrieve(question, n_results=100) # list[dict[str:str]]
  reranked_chunks_with_scores = rerank_chunks(question, retrieved_chunks, n_results=n_results) # list[ tuple( dict[str:str],float ) ]
  reranked_chunks = [tuple_chk_score[0] for tuple_chk_score in reranked_chunks_with_scores]
  prompt = chat_prompt_generation(question, reranked_chunks, history)
  answer = llm_complete(prompt)

  # Store only the question and the answer (not the retrieved chunks) so the prompt stays within the token limit
  current_chat = {"h_question":question, "h_answer":answer}
  history.append(current_chat)

  return answer
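
Even when only questions and answers are stored, the history grows with every turn and will eventually exceed the context window. One simple mitigation is to keep only the most recent turns when rendering the prompt. The sketch below assumes the same `h_question` / `h_answer` dictionaries as above; `truncate_history` and `max_turns` are hypothetical names introduced for the example.

```python
def truncate_history(history, max_turns=3):
    """Return the most recent `max_turns` question/answer pairs, oldest first."""
    if max_turns <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-max_turns:]


# Toy history with five turns (hypothetical content)
toy_history = [{"h_question": f"q{i}", "h_answer": f"a{i}"} for i in range(5)]
recent_history = truncate_history(toy_history, max_turns=2)
print(recent_history)
```

In the chat function, this would amount to passing `truncate_history(history)` instead of `history` to `chat_prompt_generation`. Counting tokens rather than turns would be more precise, at the cost of an extra tokenizer call.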

In [None]:
chat_with_rag_gpu(question, clear_history=True)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


" You want to win the game of Warhammer 40,000. Here are the steps to do so:\n\n1.  **Manage your resources**: Both players must muster strategic resources, which includes unit points, ammunition, and any other necessary supplies. You'll need to test your units in the **Movement Phase** to move them across the battlefield and **Shoot** to fire your ranged weapons. You'll also need to **Charge** to engage in close combat.\n2.  **Engage in close combat**: Once your turn has ended, both players' units will pile in and attack with melee weapons. You'll need to resolve this phase by determining the outcome of the combat.\n3.  **Resolve the combat**: The outcome of the combat will depend on the specific rules of the game and the units involved. You can use the **Shoot** phase to fire your ranged weapons, the **Movement Phase** to move your units, and the **Charge Phase** to engage in close combat.\n4.  **Repeat the process**: Both players will take their turns, and the battle will continue u

In [None]:
chat_with_rag_gpu("What are the victory conditions ?")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' The victory conditions in Warhammer 40,000 are:\n\n1.  **Destroy the enemy army**: One player must have all their units destroyed.\n2.  **Capture and Control**: The player must control a certain number of objective markers, with the goal of scoring a specific number of Victory Points (VP).'

In [None]:
chat_with_rag_gpu("What objective markers are there ?")

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


' Objective markers are the locations of the battlefield where the players are trying to secure valuable resources and strategic points. They are represented on the mission map and are marked with a special icon. To control an objective marker, a player must move their models within range of it.'