Exercise A: Machine Translation (English → French)

In [3]:
from transformers import pipeline

# Initialize translation pipeline
translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")

# Sentences to translate
sentences = [
    "The weather is beautiful today.",
    "I would like to order a coffee, please.",
    "What time does the train to Paris depart?"
]

# Translate and print
for eng in sentences:
    fr = translator(eng)[0]['translation_text']
    print(f"EN: {eng}")
    print(f"FR: {fr}\n")

Device set to use cpu


EN: The weather is beautiful today.
FR: Le temps est beau aujourd'hui.

EN: I would like to order a coffee, please.
FR: Je voudrais commander un café, s'il vous plaît.

EN: What time does the train to Paris depart?
FR: A quelle heure le train part-il pour Paris ?
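
The loop above calls the pipeline once per sentence. Hugging Face pipelines also accept a list of inputs and return one result dict per input, so the same pairing can be done in a single call. The following is a minimal sketch, assuming only that `translator` is a callable with that list-in, list-out shape. `translate_batch` and the stand-in `fake_translator` are hypothetical names for illustration; they are not part of the notebook or of `transformers`.

```python
from typing import Callable, Dict, List, Tuple


def translate_batch(
    translator: Callable[[List[str]], List[Dict[str, str]]],
    sentences: List[str],
) -> List[Tuple[str, str]]:
    """Translate all sentences in one call and pair each source with its output."""
    outputs = translator(sentences)
    if len(outputs) != len(sentences):
        raise ValueError("translator returned a different number of results")
    return [(src, out["translation_text"]) for src, out in zip(sentences, outputs)]


# Hypothetical stand-in for the real pipeline, so the sketch runs
# without downloading a model. It only mimics the output shape.
def fake_translator(batch: List[str]) -> List[Dict[str, str]]:
    return [{"translation_text": f"<fr:{text}>"} for text in batch]


pairs = translate_batch(fake_translator, ["Hello.", "Thank you."])
for eng, fr in pairs:
    print(f"EN: {eng}")
    print(f"FR: {fr}\n")
```

Passing the real `translator` and `sentences` from the cell above in place of the stand-in should yield the same three EN/FR pairs.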



Exercise B: Paraphrasing with T5

In [5]:
from transformers import pipeline

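# Initialize a text2text-generation pipeline.
# Note: t5-small is a general-purpose T5 checkpoint, not one fine-tuned specifically for paraphrasing.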
paraphraser = pipeline(
    "text2text-generation",
    model="t5-small"
)

# Input sentence
sentence = "Deep learning models require large amounts of data to train effectively."

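# Generate two candidate paraphrases by sampling.
# Note: max_length is overridden by the pipeline's default max_new_tokens (see the warning in the output).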
outputs = paraphraser(
    "paraphrase: " + sentence,
    num_return_sequences=2,
    do_sample=True,
    temperature=0.8,
    max_length=64
)

# Print results
for i, out in enumerate(outputs, 1):
    print(f"Paraphrase {i}: {out['generated_text']}")

Device set to use cpu
Both `max_new_tokens` (=256) and `max_length`(=64) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Paraphrase 1: Paraphrase
Paraphrase 2: False
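
Neither output above ("Paraphrase", "False") restates the input sentence, which suggests that `t5-small` does not treat `paraphrase:` as a task prefix it was trained on. One way to catch this kind of failure automatically is a cheap word-overlap check on each candidate. The sketch below is an illustration only: `jaccard` and `plausible_paraphrases` are hypothetical helpers, and the thresholds 0.2 and 0.9 are arbitrary assumptions, not tuned values.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    """Lowercased word set of a sentence."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two sentences."""
    wa, wb = _words(a), _words(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def plausible_paraphrases(
    source: str,
    candidates: List[str],
    min_overlap: float = 0.2,  # assumed: below this, the candidate is off-topic
    max_overlap: float = 0.9,  # assumed: above this, it is nearly a copy
) -> List[str]:
    """Keep candidates that share some, but not all, vocabulary with the source."""
    return [c for c in candidates if min_overlap <= jaccard(source, c) <= max_overlap]


source = "Deep learning models require large amounts of data to train effectively."
candidates = ["Paraphrase", "False"]  # the two outputs recorded above
print(plausible_paraphrases(source, candidates))  # prints []
```

Word overlap is a crude proxy: it rejects the degenerate outputs here, but it cannot tell whether a surviving candidate preserves the meaning of the source.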


Exercise C: Question Answering over a Context

In [6]:
from transformers import pipeline

# Initialize QA pipeline
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

# Context and question
context = (
    "Marie Curie was the first woman to win a Nobel Prize, "
    "and she remains the only person awarded Nobel Prizes "
    "in two different scientific fields."
)
question = "In how many fields did Marie Curie win Nobel Prizes?"

# Get answer
result = qa(question=question, context=context)
print(f"Question: {question}")
print(f"Answer: {result['answer']}")

Device set to use cpu


Question: In how many fields did Marie Curie win Nobel Prizes?
Answer: two
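
Besides `answer`, the question-answering pipeline's result dict also carries a `score` and the character offsets `start` and `end` of the span in the context. A small guard can use these to reject low-confidence or empty answers before printing. This is a sketch under assumptions: `accept_answer` is a hypothetical helper, the 0.5 threshold is arbitrary, and the `score` in the example dict is a placeholder rather than the value from the run above.

```python
from typing import Any, Dict, Optional


def accept_answer(
    result: Dict[str, Any], context: str, min_score: float = 0.5
) -> Optional[str]:
    """Return the answer only if it is non-empty, confident, and matches its span."""
    answer = result["answer"].strip()
    if not answer or result["score"] < min_score:
        return None
    # The offsets index into the context string; check they agree with the answer.
    if context[result["start"]:result["end"]].strip() != answer:
        return None
    return answer


context = (
    "Marie Curie was the first woman to win a Nobel Prize, "
    "and she remains the only person awarded Nobel Prizes "
    "in two different scientific fields."
)

# Illustrative result dict in the pipeline's output shape.
# The score is a made-up placeholder, not a recorded value.
start = context.find("two")
example = {"answer": "two", "score": 0.9, "start": start, "end": start + len("two")}

print(accept_answer(example, context))  # prints: two
```

With a SQuAD 2.0 model such as the one used here, the pipeline's `handle_impossible_answer=True` option lets the model return an empty answer when the context does not contain one, which the guard above would map to `None`.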


Exercise D: Code Generation from Description

In [7]:
from transformers import pipeline

# Initialize code generation pipeline
codegen = pipeline("text2text-generation", model="Salesforce/codet5-small")

# Prompt
prompt = (
    "Write a Python function that takes a list of numbers and returns "
    "the list sorted in ascending order."
)

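# Generate code with greedy decoding (do_sample=False).
# Note: max_length is overridden by the pipeline's default max_new_tokens (see the warning in the output).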
output = codegen(prompt, max_length=128, do_sample=False)
print(output[0]['generated_text'])

Device set to use cpu
Both `max_new_tokens` (=256) and `max_length`(=128) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


 def
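
The output above is only the token `def`, not a function. A plausible explanation is that `Salesforce/codet5-small` is a pretrained checkpoint rather than one tuned to follow natural-language instructions, so a plain English prompt gives it little to work with. Whatever model is used, generated code can be checked before it is trusted. The sketch below uses the standard-library `ast` module; `defines_function` is a hypothetical helper name, and the second test string is hand-written for illustration, not model output.

```python
import ast


def defines_function(generated: str) -> bool:
    """Return True if the text parses as Python and defines at least one function."""
    try:
        tree = ast.parse(generated.strip())
    except SyntaxError:
        return False
    return any(
        isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        for node in ast.walk(tree)
    )


# The output recorded above fails the check.
print(defines_function(" def"))  # prints: False

# A hand-written example of what a passing output could look like.
print(defines_function("def sort_numbers(nums):\n    return sorted(nums)"))  # prints: True
```

This only confirms that the text is syntactically valid and contains a function definition; it says nothing about whether the function does what the prompt asked.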


Exercise E: Text Infilling with BART

In [9]:
from transformers import pipeline

# Initialize fill-mask pipeline
fill_mask = pipeline("fill-mask", model="facebook/bart-large")

# Incomplete sentence with masks
sentence = (
    "To make the recipe, first preheat the oven to <mask> degrees, "
    "then <mask> the butter and sugar together."
)

# Get top 3 options for each mask
results = fill_mask(sentence, top_k=3)

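# With more than one <mask>, results holds one candidate list per mask.
# Each mask is predicted separately, so the other mask stays unfilled in each sequence.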
for mask_results in results:
    for res in mask_results:
        print(f"Token: {res['token_str']}, Score: {res['score']:.4f}")
        print(f"Sequence: {res['sequence']}\n")

Device set to use cpu


Token:  350, Score: 0.5342
Sequence: <s>To make the recipe, first preheat the oven to 350 degrees, then<mask> the butter and sugar together.</s>

Token:  375, Score: 0.1138
Sequence: <s>To make the recipe, first preheat the oven to 375 degrees, then<mask> the butter and sugar together.</s>

Token:  400, Score: 0.1020
Sequence: <s>To make the recipe, first preheat the oven to 400 degrees, then<mask> the butter and sugar together.</s>

Token:  melt, Score: 0.0724
Sequence: <s>To make the recipe, first preheat the oven to<mask> degrees, then melt the butter and sugar together.</s>

Token:  mix, Score: 0.0680
Sequence: <s>To make the recipe, first preheat the oven to<mask> degrees, then mix the butter and sugar together.</s>

Token:  in, Score: 0.0573
Sequence: <s>To make the recipe, first preheat the oven to<mask> degrees, then in the butter and sugar together.</s>
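
Each sequence above fills one mask and leaves the other in place, so no single output is a complete sentence. A simple way to get one is to substitute the highest-scoring candidate for each mask in order. The sketch below does this with the tokens and scores recorded above; `fill_top1` is a hypothetical helper. Because each mask was scored independently, this is a greedy combination, not a joint prediction over both blanks.

```python
from typing import Any, Dict, List

MASK = "<mask>"


def fill_top1(sentence: str, results: List[List[Dict[str, Any]]]) -> str:
    """Replace each mask, left to right, with its highest-scoring candidate."""
    if sentence.count(MASK) != len(results):
        raise ValueError("expected one candidate list per mask")
    filled = sentence
    for candidates in results:
        best = max(candidates, key=lambda c: c["score"])
        # token_str carries a leading space (e.g. " 350"); strip it before inserting.
        filled = filled.replace(MASK, best["token_str"].strip(), 1)
    return filled


sentence = (
    "To make the recipe, first preheat the oven to <mask> degrees, "
    "then <mask> the butter and sugar together."
)

# Tokens and scores copied from the output above (other keys omitted).
results = [
    [
        {"token_str": " 350", "score": 0.5342},
        {"token_str": " 375", "score": 0.1138},
        {"token_str": " 400", "score": 0.1020},
    ],
    [
        {"token_str": " melt", "score": 0.0724},
        {"token_str": " mix", "score": 0.0680},
        {"token_str": " in", "score": 0.0573},
    ],
]

print(fill_top1(sentence, results))
```

For the recorded scores this selects "350" and "melt". The low scores on the second mask (all under 0.08) are a reminder that the top candidate there is far less certain than the first.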

