<a href="https://colab.research.google.com/github/OdysseusPolymetis/digital_classics_course/blob/main/8_bases_transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Transformers: a few basic examples**
---


Transformers represent each word in its context. The name refers to the architecture itself (introduced in 2017 for sequence-to-sequence tasks such as translation), not to versatility, although a single pretrained model can indeed be adapted to many tasks (named entity recognition, sentiment analysis, text generation, etc.). The diagrams below show, in simplified form, how they work.

<img src='https://drive.google.com/uc?export=view&id=1sGd6mhCoT3eEYgwz776OmckPrvsEcbdB' width="1000">

<img src='https://drive.google.com/uc?export=view&id=1hjziU7wTdSiySpMFuFIf4xi1Dxn5Hgbt' width="1000">
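
To make "words in their context" concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation of the architecture, in plain Python. The three toy vectors are made up for illustration, and the sketch assumes that queries, keys, and values are the input vectors themselves; a real model learns separate projections and stacks many heads and layers.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Return one context-aware vector per input vector.

    Simplifying assumption: queries, keys and values are the input vectors
    themselves (no learned projections, a single head, a single layer).
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # New representation: weighted average of all token vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(dim)])
    return outputs

# Toy 2-dimensional "embeddings" for three tokens (made-up values).
tokens = ["odi", "et", "amo"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(embeddings)
for token, vector in zip(tokens, contextual):
    print(token, [round(x, 3) for x in vector])
```

Each output vector is a mixture of all three inputs, weighted by similarity, which is why the same word can end up with different representations in different sentences.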

Here are a few examples.

The first example uses a multilingual XLM-RoBERTa model (it covers about 100 languages, including Latin).

In [None]:
from transformers import pipeline

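# Zero-shot classification: this checkpoint was fine-tuned on XNLI (natural
# language inference), so it can score arbitrary candidate labels.
clf = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli"
)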

text = "Traité de paix entre le Roi, le roi d'Espagne et le roi de la Grande Bretagne, avec l'accession du roi de Portugal	France. Secrétariat d'Etat aux affaires étrangères (1589-1791). Auteur du texte	imp. royale (Paris)	1763"
labels = ["justice", "computers", "ebook"]
res = clf(text, candidate_labels=labels, hypothesis_template="This text is {}.")
print(res)

Some weights of the model checkpoint at joeddav/xlm-roberta-large-xnli were not used when initializing XLMRobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


{'sequence': "Traité de paix entre le Roi, le roi d'Espagne et le roi de la Grande Bretagne, avec l'accession du roi de Portugal\tFrance. Secrétariat d'Etat aux affaires étrangères (1589-1791). Auteur du texte\timp. royale (Paris)\t1763", 'labels': ['justice', 'computers', 'ebook'], 'scores': [0.48937487602233887, 0.3095439672470093, 0.20108120143413544]}
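
A note on reading this output: the labels come back sorted by decreasing score, and with the default single-label setting the scores sum to 1. A top score of about 0.49 therefore means "the least bad of the three candidates" rather than a confident decision. The sketch below uses only the values printed above; the helper name `top_label` and the 0.6 threshold are illustrative assumptions, not part of the `transformers` API.

```python
# Labels and scores copied from the output printed above.
res = {
    "labels": ["justice", "computers", "ebook"],
    "scores": [0.48937487602233887, 0.3095439672470093, 0.20108120143413544],
}

def top_label(result, threshold=0.6):
    """Return (label, score, is_confident) for the best-scoring label.

    `threshold` is an arbitrary cut-off chosen for this illustration.
    """
    best = max(range(len(result["scores"])), key=lambda i: result["scores"][i])
    score = result["scores"][best]
    return result["labels"][best], score, score >= threshold

label, score, confident = top_label(res)
print(f"{label}: {score:.2f} (confident: {confident})")
print("sum of scores:", round(sum(res["scores"]), 4))
```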


In [None]:
# Reuses the `clf` pipeline created in the previous cell.

text = "Odi et amo; quare id faciam, fortasse requiris."
labels = ["positive", "negative", "neutral"]
res = clf(text, candidate_labels=labels, hypothesis_template="This text is {}.")
print(res)

In [None]:
!pip install llama-cpp-python

In [None]:
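from llama_cpp import Llama

# Download a quantized GGUF build of the llama-3.2-latin model from the Hugging Face Hub.
llm = Llama.from_pretrained(
    repo_id="tensorblock/hathibelagal_llama-3.2-latin-GGUF",
    filename="llama-3.2-latin-Q5_K_M.gguf",
    n_ctx=1024,        # context window, in tokens
    n_gpu_layers=-1,   # offload all layers to the GPU when one is available
    verbose=False
)
# Ask the model to continue the Latin prompt.
output = llm(
    "Odi et amo; quare id faciam, fortasse requiris.",
    max_tokens=120,
    repeat_penalty=1.1
)
print(output)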
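
`print(output)` shows the whole response object. llama-cpp-python returns an OpenAI-style completion dictionary, so the generated continuation sits under `choices[0]["text"]`. The sketch below uses a hand-written stand-in for `output` (its text is a placeholder, not real model output) so that it runs without downloading the model; `completion_text` is a hypothetical helper name.

```python
# Hand-written stand-in with the same shape as a llama-cpp-python completion.
# The "text" value is a placeholder, not something the model generated.
output = {
    "choices": [
        {"text": " <generated continuation>"}
    ],
}

def completion_text(completion):
    """Return the generated text of the first choice, without outer whitespace."""
    return completion["choices"][0]["text"].strip()

print(completion_text(output))
```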

In [None]:
from transformers import pipeline

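# No model is named, so the pipeline falls back to its default fill-mask
# checkpoint, a general-purpose default rather than a Latin-specific model.
unmasker = pipeline("fill-mask")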
unmasker("Magna spes in <mask> est.", top_k=2)
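
The placeholder has to match the mask token of the model in use: RoBERTa-style models expect `<mask>`, BERT-style models expect `[MASK]`. The helper sketched below avoids hard-coding it. `mask_word` is a hypothetical name, and the word `te` is an arbitrary choice for the example; with a real pipeline the token can be read from `unmasker.tokenizer.mask_token`.

```python
import re

def mask_word(sentence, word, mask_token="<mask>"):
    """Replace the first whole-word occurrence of `word` with the mask token."""
    pattern = rf"\b{re.escape(word)}\b"
    # A function is used as replacement so the mask token is inserted literally.
    masked, count = re.subn(pattern, lambda _: mask_token, sentence, count=1)
    if count == 0:
        raise ValueError(f"{word!r} does not occur as a whole word in the sentence")
    return masked

# RoBERTa-style mask, as in the cell above.
print(mask_word("Magna spes in te est.", "te"))
# BERT-style mask.
print(mask_word("Magna spes in te est.", "te", mask_token="[MASK]"))
```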

In [None]:
from transformers import pipeline

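# aggregation_strategy="simple" merges word pieces into whole entities
# (it replaces the deprecated grouped_entities=True).
ner = pipeline("ner", aggregation_strategy="simple")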
ner("Caesar in Galliam profectus est.")
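
With entity grouping enabled, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The sketch below regroups such a list by entity type. The `sample` list is hand-written to show the shape of the data (its labels and scores are invented, not produced by a model), and `group_entities` is a hypothetical helper.

```python
from collections import defaultdict

sentence = "Caesar in Galliam profectus est."

# Hand-written example of the output shape; labels and scores are invented.
sample = [
    {"entity_group": "PER", "score": 0.99, "word": "Caesar", "start": 0, "end": 6},
    {"entity_group": "LOC", "score": 0.95, "word": "Galliam", "start": 10, "end": 17},
]

def group_entities(entities, min_score=0.0):
    """Map each entity type to the list of words tagged with it."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)

print(group_entities(sample))
# Keep only entities scored at or above 0.97.
print(group_entities(sample, min_score=0.97))
```

The `start` and `end` offsets index into the original string, which makes it easy to highlight entities in the source text.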

In [None]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Marianne and I usually do my stuff at the ENS de Lyon.",
)
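
This kind of question answering is extractive: the model does not write an answer, it selects a span of the context and returns it together with `score`, `start`, and `end`. The sketch below checks that property on a hand-built result; the span is chosen by hand and the score is invented for illustration, and `is_extractive` is a hypothetical helper.

```python
context = "My name is Marianne and I usually do my stuff at the ENS de Lyon."

# Hand-built result with the same keys as the pipeline output.
answer = "the ENS de Lyon"
start = context.find(answer)
result = {"score": 0.9, "start": start, "end": start + len(answer), "answer": answer}

def is_extractive(result, context):
    """True if the answer is exactly the context slice given by start/end."""
    return context[result["start"]:result["end"]] == result["answer"]

print(is_extractive(result, context))
```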

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of
    graduates in traditional engineering disciplines such as mechanical, civil,
    electrical, chemical, and aeronautical engineering declined, but in most of
    the premier American universities engineering curricula now concentrate on
    and encourage largely the study of engineering science. As a result, there
    are declining offerings in engineering subjects dealing with infrastructure,
    the environment, and related issues, and greater concentration on high
    technology subjects, largely supporting increasingly complex scientific
    developments. While the latter is important, it should not be at the expense
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other
    industrial countries in Europe and Asia, continue to encourage and advance
    the teaching of engineering. Both China and India, respectively, graduate
    six and eight times as many traditional engineers as does the United States.
    Other industrial countries at minimum maintain their output, while America
    suffers an increasingly serious decline in the number of engineering graduates
    and a lack of well-educated engineers.
"""
)

In [None]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est fait pour EnExDi et les Humanités Numériques.")