<img src="https://huggingface.co/datasets/PleIAs/RAG-Resources/resolve/main/spqrllm.png">

<h1 align="center"><i>A demo with Pleias-Nano</i></h1>

What if your LLM's knowledge were stuck in antiquity?

SPQR LLM is a RAG experiment built on most of the ancient Greek and Roman literature digitized by the Perseus project. For every question, ancient or modern, the model strives to build an answer relying only on sources written before late antiquity, most of them from the period 450 BC to 200 AD.

This experiment showcases the new capabilities of the Pleias knowledge retrieval models, <a href="https://huggingface.co/PleIAs/Pleias-Nano">Nano</a> and <a href="https://huggingface.co/PleIAs/Pleias-Pico">Pico</a>, including built-in support for source analysis, references and text grounding. This notebook uses the larger model, Nano (1.21B parameters), which still runs comfortably on a free Colab GPU (T4).

Pleias models are trained only on open data, which makes them *historical* models trained on a large amount of sources published before 1990. They are expected to be more familiar with ancient texts than any other LLM of similar size.

This multilingual variant can work in Latin! It also handles a handful of other European languages (Italian, French, German), although their source coverage is more limited than for English or Latin.

## Loading the dataset

As usual on Colab, we start by installing the dependencies.

In [None]:
# Setup
!pip install lancedb pandas langchain langchain-community pypdf tiktoken sentence_transformers tantivy==0.20.1

We start by loading the vector dataset: a zipped LanceDB project containing bge-m3 embeddings for 2,700 ancient works in English translation. Future versions will include more languages.

In [None]:
!wget "https://huggingface.co/datasets/PleIAs/RAG-Resources/resolve/main/latin-greek-multilingual.zip?download=true" -O latin_greek_text.lance.zip
!unzip latin_greek_text.lance.zip
!mv dinum latin_greek_text_db/

We load the dataset using lancedb.

In [None]:
import os
import getpass
import pandas as pd

import lancedb

from langchain_community.vectorstores import LanceDB
from langchain.embeddings import HuggingFaceEmbeddings
from lancedb.embeddings import get_registry
from lancedb.pydantic import Vector, LanceModel

db = lancedb.connect("latin_greek_text_db")

lancedb_table = db.open_table("latin_greek_text")

lancedb_table

LanceTable(connection=LanceDBConnection(/content/latin_greek_text_db), name="latin_greek_text")

We can now run our first query:

In [None]:
lancedb_table.search("What is the best season for fruits?").limit(10).to_pandas()

Unnamed: 0,hash_id,identifier,title,author,language,order_text,text,vector,_distance
0,81543aa4556aec77,tlg0007.tlg112.perseus-eng2_7,Symposiacs,Plutarch,eng,42,For both the summer and the early autumn bear ...,"[-0.008282402, 0.015326464, -0.023731133, -0.0...",0.724141
1,ec7bf192ec9d7580,phi0978.phi001.perseus-eng1_72,The Natural History,Pliny the Elder,eng,1,This is the proper time for gathering fruit; t...,"[0.012949162, 0.030704195, -0.043624897, 0.024...",0.747091
2,cad93af6e9621953,tlg0007.tlg125.perseus-eng2_0,Plutarch's Natural Questions,Plutarch,eng,7,Or is it because heat fighting against cold ca...,"[-0.018504128, 0.04148131, -0.0421267, -0.0017...",0.792306
3,56386670ef04cf5d,phi0978.phi001.perseus-eng1_61,The Natural History,Pliny the Elder,eng,13,The fruits which fall most readily before they...,"[0.012798937, 0.04470038, -0.036639236, -0.000...",0.807795
4,225cc3cb2d5649a7,phi0978.phi001.perseus-eng1_61,The Natural History,Pliny the Elder,eng,9,"The terebinth, the maple, and the ash produce ...","[-0.0147363, 0.044617243, -0.026414355, 0.0011...",0.812626
5,70580de03691ddd3,phi0978.phi001.perseus-eng1_65,The Natural History,Pliny the Elder,eng,33,"However, there are certain fixed periods of th...","[-0.0076794056, 0.014454386, -0.030387977, 0.0...",0.821087
6,42748fad4dc9494c,tlg0008.tlg001.perseus-eng2_4,The Deipnosophists,Athenaeus,eng,64,"And moreover, that of this fruit those which a...","[0.026232745, 0.04473641, -0.03572028, 0.00964...",0.822074
7,a133de7529136c19,phi0845.phi002.perseus-eng1_7,"On Agriculture, Books 1-9",Columella,eng,59,"The carob-tree, which some people called Cerat...","[0.03852989, 0.015690928, -0.015016179, 0.0233...",0.82418
8,c3f6bce355cb9754,phi0978.phi001.perseus-eng1_61,The Natural History,Pliny the Elder,eng,20,"This, however, is constantly to be witnessed i...","[0.019465365, 0.037772123, -0.05108791, -0.012...",0.825922
9,9aeeed73cefc3fa7,phi0978.phi001.perseus-eng1_63,The Natural History,Pliny the Elder,eng,33,Those trees which are the slowest in bringing ...,"[0.020600379, 0.050828516, -0.045996543, 0.001...",0.837583


LanceDB also supports hybrid queries, which combine vector search with keyword matching. They can yield better results on collections of texts that use a specialized vocabulary, where you don't really want to substitute one word for another. For this specific project, it is better overall to stick to embeddings, since we want to maximize semantic proximity regardless of the words used.

In [None]:
lancedb_table.search("What is the best season for fruits?", query_type="hybrid").limit(10).to_pandas()

Unnamed: 0,hash_id,identifier,title,author,language,order_text,text,vector,_relevance_score
0,42748fad4dc9494c,tlg0008.tlg001.perseus-eng2_4,The Deipnosophists,Athenaeus,eng,64,"And moreover, that of this fruit those which a...","[0.026232745, 0.04473641, -0.03572028, 0.00964...",0.029211
1,81543aa4556aec77,tlg0007.tlg112.perseus-eng2_7,Symposiacs,Plutarch,eng,42,For both the summer and the early autumn bear ...,"[-0.008282402, 0.015326464, -0.023731133, -0.0...",0.016393
2,45dbc2acb097638f,tlg0008.tlg001.perseus-eng2_4,The Deipnosophists,Athenaeus,eng,66,"But, as I said before, a corrosive juice is en...","[0.018622678, 0.015837086, -0.04330039, 0.0029...",0.016393
3,ec7bf192ec9d7580,phi0978.phi001.perseus-eng1_72,The Natural History,Pliny the Elder,eng,1,This is the proper time for gathering fruit; t...,"[0.012949162, 0.030704195, -0.043624897, 0.024...",0.016129
4,1d6bfffbca2cf9b4,tlg0526.tlg001.perseus-eng2_9,Antiquities of the Jews,Flavius Josephus,eng,28,"He that plants a piece of land, the trees of w...","[0.00571946, 0.037307184, -0.015916698, 0.0448...",0.016129
5,cad93af6e9621953,tlg0007.tlg125.perseus-eng2_0,Plutarch's Natural Questions,Plutarch,eng,7,Or is it because heat fighting against cold ca...,"[-0.018504128, 0.04148131, -0.0421267, -0.0017...",0.015873
6,e0b2c47f9c568ed9,tlg0007.tlg084a.perseus-eng4_0,Roman Questions,Plutarch,eng,44,Or were those adorations paid to the infernal ...,"[-0.016109755, 0.01600607, -0.017857442, 0.020...",0.015873
7,56386670ef04cf5d,phi0978.phi001.perseus-eng1_61,The Natural History,Pliny the Elder,eng,13,The fruits which fall most readily before they...,"[0.012798937, 0.04470038, -0.036639236, -0.000...",0.015625
8,b55fb58480732273,tlg0008.tlg001.perseus-eng2_4,The Deipnosophists,Athenaeus,eng,62,"But Mnesitheus the Athenian, in his treatise o...","[0.047370914, 0.046967465, 0.0028742205, 0.014...",0.015625
9,225cc3cb2d5649a7,phi0978.phi001.perseus-eng1_61,The Natural History,Pliny the Elder,eng,9,"The terebinth, the maple, and the ash produce ...","[-0.0147363, 0.044617243, -0.026414355, 0.0011...",0.015385
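
The `_relevance_score` values above look consistent with reciprocal rank fusion (RRF), in which each result list contributes 1/(k + rank) to a document's score: 0.016393 ≈ 1/61 and 0.016129 ≈ 1/62, which would correspond to k = 60. Assuming that this is how the vector and keyword rankings are merged here, the idea can be sketched in plain Python. The function name and the keyword ranking below are hypothetical and only serve as an illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one scored list.

    Each list adds 1 / (k + rank) to a document's score, with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Top three ids from the vector search shown earlier
vector_hits = ["81543aa4556aec77", "ec7bf192ec9d7580", "cad93af6e9621953"]
# Hypothetical keyword ranking, invented for this example
keyword_hits = ["45dbc2acb097638f", "1d6bfffbca2cf9b4", "81543aa4556aec77"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}  {score:.6f}")
```

A passage that appears in both lists accumulates two contributions, which is why it can overtake passages that rank first in only one list.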


To restrict the search to one language, add a `where` filter on the `language` column:

In [None]:
language = 'lat'

lancedb_table.search("Quid in Graecia visitare possum?", query_type="vector").where(f"language = '{language}'", prefilter=True).limit(10).to_pandas()

Unnamed: 0,hash_id,identifier,title,author,language,order_text,text,vector,_distance
0,403df54a6f1610a8,phi0474.phi056.perseus-lat1_4,Epistulae ad Familiares,M. Tullius Cicero,lat,37,"R. debeat. memini cum mihi desipere videbare, ...","[0.024182506, 0.049480297, -0.0003535076, 0.00...",0.915521
1,df099218e4c29a5e,phi1254.phi001.perseus-lat2_7,Noctes Atticae,Aulus Gellius,lat,50,CUM Delphos ad Pythia conventumque totius ferm...,"[0.01610997, 0.03416661, -0.033653818, -0.0151...",0.931729
2,48a2332011694a57,phi0474.phi057.perseus-lat1_7,Letters to Atticus,M. Tullius Cicero,lat,20,quin nunc ipsum minime offendisses eius causam...,"[-0.02027511, 0.038216908, -0.024814293, -0.01...",0.934681
3,d898ddbe7d1030d5,phi1056.phi001.perseus-lat1_3,De Architectura,Vitruvius Pollio,lat,8,In his oecis fiunt virilia convivia; non enim ...,"[-0.0017539985, -0.0037579937, -0.050707676, 0...",0.94414
4,4cb9d9f6a6d7f437,phi0474.phi057.perseus-lat1_1,Letters to Atticus,M. Tullius Cicero,lat,49,Maias. eo die pueri tui mihi a te litteras red...,"[0.01593717, 0.030880643, -0.041746616, 0.0224...",0.946214
5,fab55dad2d4fbbb9,phi0474.phi039.perseus-lat1_0,Brutus,M. Tullius Cicero,lat,13,"Cum idem placuisset illis, tum in pratulo prop...","[0.019726437, 0.023641659, -0.0016004019, 0.02...",0.948754
6,59499e177423a04f,phi0474.phi043.perseus-lat1_0,De Republica,M. Tullius Cicero,lat,61,Multa etiam ad luxuriam invitamenta pernicios...,"[0.010349509, 0.03541501, -0.03730126, 0.00274...",0.949987
7,acf6aecbdfbe3527,stoa0255.stoa011.perseus-lat2_0,De Otio,Seneca,lat,13,cum ipse dicat Epicurus aliquando se recessuru...,"[-0.031201635, 0.01622868, 0.012767613, -0.003...",0.952243
8,2ca1a24601028a44,phi0860.phi001.perseus-lat2_2,Historiae Alexandri Magni,"Curtius Rufus, Quintus",lat,41,"Nemo e vobis fastidium Macedonum, nemo vultum ...","[0.0030971675, 0.011397132, -0.05461492, -0.01...",0.96245
9,c39b71e30f98d7b1,phi2331.phi001.perseus-lat2_0,De vita Hadriani,,lat,20,Thence he visited the provinces of Moesia and ...,"[0.025629438, 0.026900563, -0.028726831, 0.044...",0.963596
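
Because the filter is built by interpolating a string into a SQL-like expression, it can be worth validating the language code first. The helper below is a minimal sketch, not part of the original notebook: `language_filter` is a hypothetical name, and the three-lowercase-letter rule is an assumption based on the codes used here (`eng`, `lat`, `ita`).

```python
import re

def language_filter(language: str) -> str:
    """Return a filter string such as "language = 'lat'".

    Only three lowercase ASCII letters are accepted, which keeps quotes
    and other SQL syntax out of the filter expression.
    """
    if not re.fullmatch(r"[a-z]{3}", language):
        raise ValueError(f"Unexpected language code: {language!r}")
    return f"language = '{language}'"

print(language_filter("lat"))
```

The returned string can then be passed to the search as `.where(language_filter(language), prefilter=True)`.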


## Connecting Pleias-Nano

We load the model from Hugging Face.

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from tokenizers.decoders import BPEDecoder

model_name = "PleIAs/Pleias-Nano"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set the device to GPU if available, otherwise use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(65536, 2048)
    (layers): ModuleList(
      (0-21): 22 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear(in_features=2048, out_features=512, bias=False)
          (v_proj): Linear(in_features=2048, out_features=512, bias=False)
          (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (up_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (down_proj): Linear(in_features=6144, out_features=2048, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((2048,), eps=1e-05)
      )
    )
    (norm): 

As the printout shows, this is a Llama-type model (an architecture that is itself close to GPT-NeoX, from which Llama borrows features such as rotary position embeddings). It nonetheless has many specific features that you won't find in other models, including an entirely new tokenizer with a 65k-token vocabulary.
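
As a rough sanity check, the layer shapes in the printout can be turned into a parameter count. The sketch below only uses the dimensions visible above; since the printout is cut off after the decoder layers, the final norm and the output head are left out, which is an assumption of this estimate.

```python
# Dimensions read off the printed architecture
vocab_size, hidden = 65536, 2048
kv_dim, mlp_dim, n_layers = 512, 6144, 22

embedding = vocab_size * hidden
# q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
attention = 2 * hidden * hidden + 2 * hidden * kv_dim
# gate_proj, up_proj and down_proj each hold hidden x mlp_dim weights
mlp = 3 * hidden * mlp_dim
# Two RMSNorm weight vectors per decoder layer
norms = 2 * hidden

per_layer = attention + mlp + norms
total = embedding + n_layers * per_layer

print(f"per layer:          {per_layer:,}")
print(f"embedding + layers: {total:,}")
```

The total comes to roughly 1.2 billion parameters, in line with the 1.21B figure quoted for Nano.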

We now prepare several functions for RAG generation. Note that, for now, the Pleias RAG model works best with deterministic generation (temperature = 0, here obtained with `do_sample=False`). Concretely, a given query always yields the same answer unless you rephrase it a bit.

In [None]:
import torch
from typing import List, Optional, Union

class OptimizedGenerator:
    def __init__(self, model, tokenizer, device="cuda"):
        self.model = model.to(device)
        self.tokenizer = tokenizer
        self.device = device

        # Ensure tokenizer has proper padding settings
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        if self.tokenizer.pad_token_id is None:
            self.tokenizer.pad_token_id = 1

    @torch.inference_mode()
    def generate_single(
        self,
        text: str,
        max_new_tokens: int = 1500,
    ) -> str:
        """Generate text for a single input"""
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            padding=True,
        ).to(self.device)

        # Greedy decoding (do_sample=False) keeps generation deterministic.
        outputs = self.model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_new_tokens=max_new_tokens,
            repetition_penalty=1.1,
            do_sample=False,
            use_cache=True,
            pad_token_id=self.tokenizer.pad_token_id,
            eos_token_id=2
        )

        return self.tokenizer.decode(outputs[0], skip_special_tokens=False)

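    def generate_batch(
        self,
        texts: List[str],
        batch_size: int = 4,
        max_new_tokens: int = 1500,
    ) -> List[str]:
        """Generate text for several inputs.

        Simple fallback: prompts are processed one at a time, so batch_size
        is accepted for interface compatibility but not used.
        """
        return [self.generate_single(text, max_new_tokens) for text in texts]

    def __call__(
        self,
        texts: Union[str, List[str]],
        batch_size: int = 4,
        max_new_tokens: int = 1500,
    ) -> Union[str, List[str]]:
        """Convenience method to handle both single and batch inputs"""
        if isinstance(texts, str):
            return self.generate_single(texts, max_new_tokens)
        return self.generate_batch(texts, batch_size, max_new_tokens)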

def rag_generation(query, language='eng', max_new_tokens=1500):
    # Retrieve fewer passages for languages that are neither English nor French
    limit = 5 if language in ["eng", "fre"] else 3
    search_results = (
        lancedb_table.search(query)
        .where(f"language = '{language}'", prefilter=True)
        .limit(limit)
        .to_pandas()
    )

    # The prompt starts with the query wrapped in the model's special tokens
    rag_submission = [f"<|query_start|>{query}<|query_end|>"]

    # Map each source hash to a readable citation and append the source to the prompt
    citation_list = {}
    for index, result in search_results.iterrows():
        # Some records have no author, so fall back to a placeholder
        author = result["author"] if isinstance(result["author"], str) and result["author"] else "Anonymous"
        citation_list[result["hash_id"]] = f"{result['hash_id']} [{author}, {result['title']}]"
        entry = f"<|source_start|><|source_id_start|>{result['hash_id']}<|source_id_end|>{result['text']}<|source_end|>"
        rag_submission.append(entry)

    # Ending on the source-analysis token cues the model to analyse the sources first
    prompt = "\n".join(rag_submission) + "\n<|source_analysis_start|>"

    result = generator(prompt, max_new_tokens=max_new_tokens)

    return result, citation_list

def rag_processing(query, language):
    import re
    from IPython.display import HTML

    rag_result, citation_list = rag_generation(query, language)

    # Split the raw output into the prompt, the source analysis and the answer
    source, generation = rag_result.split("<|source_analysis_start|>")
    analysis, answer = generation.split("<|answer_start|>")
    analysis = analysis.replace("<|source_analysis_end|>\n", "")
    answer = answer.replace("<|answer_end|><|end_of_text|>", "")

    # Regular expression to find references of the form <ref name="hash">quote</ref>
    ref_pattern = r'<ref name="(.*?)">(.*?)</ref>'

    # Extract references and replace them with numbered calls
    footnotes = []
    def replace_reference(match):
        hash_value, quoted_text = match.groups()
        # Fall back to the raw hash if the model cites an id that was not retrieved
        citation = citation_list.get(hash_value, hash_value)
        footnotes.append(f"[{len(footnotes) + 1}] {citation}: {quoted_text}")
        return f"[{len(footnotes)}]"

    # Replace references in the text
    formatted_text = re.sub(ref_pattern, replace_reference, answer)

    # Join footnotes into a single string
    footnotes_text = "<br>".join(footnotes)

    final_text = "<h2>Analysis</h2><i>" + analysis + "</i><br><br><h2>Answer</h2>" + formatted_text + "<br><br><h2>Bibliography</h2>" + footnotes_text

    # Display as HTML
    return HTML(f'<div style="font-size: 1.1em">{final_text}</div>')

generator = OptimizedGenerator(model, tokenizer)
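
To make the prompt layout easier to see, here is a standalone sketch of the string that `rag_generation` assembles before calling the model. It needs neither the database nor the model; `build_prompt` is a hypothetical helper and the source ids and passages are placeholders.

```python
def build_prompt(query, sources):
    """Assemble a RAG prompt from a query and a list of (source_id, text) pairs."""
    parts = [f"<|query_start|>{query}<|query_end|>"]
    for source_id, text in sources:
        parts.append(
            f"<|source_start|><|source_id_start|>{source_id}<|source_id_end|>{text}<|source_end|>"
        )
    # The trailing token asks the model to begin with its source analysis
    return "\n".join(parts) + "\n<|source_analysis_start|>"

# Placeholder sources, invented for this example
example_sources = [
    ("source_a", "First placeholder passage."),
    ("source_b", "Second placeholder passage."),
]

print(build_prompt("What is the best season for fruits?", example_sources))
```

Each source carries its id inside the prompt, which is what allows the model to cite it by id in the answer.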

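Before testing, it's always good to assess your query first. Results are ordered by "cosine distance" to the query (so lower is closer). Note that `language` still holds the value `'lat'` set earlier, so this search is restricted to Latin sources: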

In [None]:
query = "How to best experience a tragic play?"

results = lancedb_table.search(query).where(f"language = '{language}'", prefilter=True).limit(5).to_pandas()

for i, result in results.iterrows():
  print(f"\nResult {i} (Cosine distance: {result['_distance']:.2f}):")
  print(f"Title: {result['title']}")
  print(f"Author: {result['author']}")
  print(f"Text snippet: {result['text'][:300]}...")


Result 0 (Cosine distance: 1.03):
Title: Ad Lucilium Epistulae Morales
Author: Seneca
Text snippet: Cui permittit necessitas sua, circumspicit exitum mollem: cui ad manum plura sunt, per quae sese adserat. is dilectum agat et qua potissimum liberetur. consideret: cui difficilis occasio est, is proximam quamque pro optima arripiat, sit licet inaudita, sit nova. Non deerit ad mortem ingenium, cui no...

Result 1 (Cosine distance: 1.06):
Title: Heautontimorumenos
Author: P. Terentius Afer
Text snippet: Colman has shown the absurdity of the idea very well in his remarks on this subiect. Any one who considers that the Roman Drama was performed in the open air, will at once see the improbability of such a mode of representation. The Roman Amphitheatre was at any time a disadvantageous arena for the D...

Result 2 (Cosine distance: 1.06):
Title: Institutio Oratoria
Author: Quintilian
Text snippet: age , si de morte filii sui vel iniuria, quae morte sit gravior, dicendum patri fuerit, aut in 

We can now run the text generation. It takes about 30 seconds on a T4; this is not optimized at all yet, as it simply uses the vanilla `model.generate`.

In [None]:
final_rag_result = rag_processing(query, language)
display(final_rag_result)
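
The reference handling in `rag_processing` can also be tried in isolation, without running the model. The sketch below applies the same regular expression to a hand-written answer; the answer text, the source id and the citation label are all invented for illustration, and `number_references` is a hypothetical helper name.

```python
import re

REF_PATTERN = r'<ref name="(.*?)">(.*?)</ref>'

def number_references(answer, citations):
    """Replace <ref> tags with numbered calls and collect the footnotes."""
    footnotes = []

    def replace(match):
        source_id, quote = match.groups()
        # Unknown ids are shown as-is rather than raising a KeyError
        label = citations.get(source_id, source_id)
        footnotes.append(f"[{len(footnotes) + 1}] {label}: {quote}")
        return f"[{len(footnotes)}]"

    return re.sub(REF_PATTERN, replace, answer), footnotes

# Invented input that mimics the expected answer format
citations = {"source_a": "source_a [Author A, Title A]"}
answer = 'Tragedy moves the audience<ref name="source_a">a placeholder quotation</ref>.'

text, notes = number_references(answer, citations)
print(text)
print(notes[0])
```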



In [None]:
query = "Quid significat intelligentia artificialis?"
language = "lat"

final_rag_result = rag_processing(query, language)
display(final_rag_result)

In [None]:
query = "Che ci sono da visitare in Sicilia ?"
language = "ita"

final_rag_result = rag_processing(query, language)
display(final_rag_result)