## Hello World
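Popular LLMs: GPT, BERT, LLaMA

Pre-trained LLMs are available on the Hugging Face Hub: huggingface.co. The first example loads a pre-trained sentiment classifier that rates a review from 1 to 5 stars.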

In [2]:
from transformers import pipeline

model_name = 'nlptown/bert-base-multilingual-uncased-sentiment'  # 1-to-5 stars
sentiment_classifier = pipeline("text-classification", model=model_name)
outputs = sentiment_classifier("""Dear seller, 
I got very impresed with the fast delivery and careful packaging of my order. 
Great experience overall, thank you!
""")
print(outputs)



[{'label': '5 stars', 'score': 0.8516697287559509}]
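
The classifier returns a list with one dictionary per input, each holding a `label` string and a confidence `score`. Below is a minimal sketch of turning that into a numeric rating, assuming the `'N stars'` label format seen in the output above. The helper name `stars_from_label` is hypothetical.

```python
def stars_from_label(label: str) -> int:
    """Extract the leading integer from a label such as '5 stars' or '1 star'."""
    return int(label.split()[0])


# Output copied from the run above
outputs = [{'label': '5 stars', 'score': 0.8516697287559509}]

best = outputs[0]
rating = stars_from_label(best['label'])
print(f"{rating} stars (confidence {best['score']:.2f})")
```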


## LLM tasks

Language generation
* Text: create text from scratch with a deep understanding of language and context
* Code: generate code automatically based on requirements

Language understanding
* Text classification, sentiment analysis: supervised learning tasks that assign text to one of a predefined set of classes
* Translation
* Summarization
* Intent recognition: determine the purpose behind a text, e.g. in chatbots
* Question answering (QA): answer a question, typically from a given context
* Named entity recognition (NER): identify and classify entities, e.g. people, places
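
Several of these tasks correspond to task identifiers accepted by the `transformers` `pipeline()` function. The lookup below is only a sketch: the names `PIPELINE_TASKS` and `pipeline_task_for` are hypothetical, and mapping intent recognition to `text-classification` is an assumption (it is usually framed as a classification problem).

```python
# Task names from the list above mapped to pipeline task identifiers
PIPELINE_TASKS = {
    "text generation": "text-generation",
    "text classification": "text-classification",
    "sentiment analysis": "text-classification",
    "translation": "translation",
    "summarization": "summarization",
    "intent recognition": "text-classification",  # assumption, see lead-in
    "question answering": "question-answering",
    "named entity recognition": "token-classification",
}


def pipeline_task_for(task: str) -> str:
    """Return the pipeline task identifier for a human-readable task name."""
    return PIPELINE_TASKS[task.strip().lower()]


print(pipeline_task_for("Question answering"))
```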

### Text generation

In [3]:
# text generation example
llm = pipeline("text-generation")
prompt = "The Gion neighborhood in Kyoto is famous for"
outputs = llm(prompt, max_length=100)
print(outputs[0]['generated_text'])

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The Gion neighborhood in Kyoto is famous for its many historical sights and monuments. There are also temples and churches on the other side of the city. There are also museums and churches all over Kyoto. The city is famous for being a part of the historical period, since the end of World War II. However, there are still many things about Kyoto that are not really known about. Here are some of the main attractions to visit:

- Osaka Station : You get the impression that the station


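The following example goes off the rails: after one sentence about the prompt, the model drifts into unrelated exercise/answer pairs about Cáceres and starts repeating itself.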

In [1]:
from transformers import pipeline

#model_name = 'openai-community/gpt2'
# model_name = 'openai-community/gpt2-medium'
model_name = 'microsoft/phi-2'
prompt = "Conil de la Frontera in Andalucia is famous for"
llm = pipeline("text-generation", model=model_name)
outputs = llm(prompt, max_length=500, truncation=True)
print(outputs)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Conil de la Frontera in Andalucia is famous for its production of olive oil, which is a staple in the region's cuisine.\n\nExercise: What is the main industry in the province of Cáceres?\nAnswer: The main industry in the province of Cáceres is agriculture, particularly the production of olives and olive oil.\n\nExercise: What is the significance of the province of Cáceres in the history of Spain?\nAnswer: The province of Cáceres played a significant role in the Reconquista, the period of Christian reconquest of Spain from the Moors. It was also the site of the Battle of Las Navas de Tolosa, a decisive victory for the Christians in the Reconquista.\n\nExercise: What is the main attraction in the province of Cáceres?\nAnswer: The main attraction in the province of Cáceres is the city of Cáceres, known for its rich history and cultural heritage.\n\nExercise: What is the main industry in the province of Cáceres?\nAnswer: The main industry in the province of Cáceres is 
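
The model addresses the prompt in its first sentence and then continues with unrelated exercise/answer pairs. One possible post-processing step, sketched here on a shortened copy of the output above, keeps only the text before the first blank line. The helper name `first_paragraph` is hypothetical, and this is a workaround rather than a fix for the model's behaviour.

```python
def first_paragraph(generated_text: str) -> str:
    """Keep only the text before the first blank line."""
    return generated_text.split("\n\n", 1)[0].strip()


# Shortened copy of the generated text shown above
generated = (
    "Conil de la Frontera in Andalucia is famous for its production of olive oil, "
    "which is a staple in the region's cuisine.\n\n"
    "Exercise: What is the main industry in the province of Cáceres?\n"
    "Answer: The main industry in the province of Cáceres is agriculture, "
    "particularly the production of olives and olive oil."
)

print(first_paragraph(generated))
```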

In [7]:
# Create a pipeline for text generation using the gpt2 model
generator = pipeline("text-generation", model="gpt2")

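# The customer review the hotel is responding to
text = "Walking amid Gion's Machiya wooden houses is a mesmerizing experience."

# Opening sentence of the hotel's reply, which the model will continue
response = "Dear valued customer, I am glad to hear you had a good stay with us."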

# Build the prompt for the text generation LLM
prompt = f"Customer review:\n{text}\n\nHotel reponse to the customer:\n{response}"

# Pass the prompt to the model pipeline
outputs = generator(prompt, max_length=150, pad_token_id=generator.tokenizer.eos_token_id)

# Print the augmented sequence generated by the model
print(outputs[0]['generated_text'])

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Customer review:
Walking amid Gion's Machiya wooden houses is a mesmerizing experience.

Hotel reponse to the customer:
Dear valued customer, I am glad to hear you had a good stay with us. We have enjoyed our stay and I am sure you will enjoy the hospitality at the hotel. Thanks again for the good times ahead.

We are delighted to hear from you and you are looking forward to having your stay with us in Bangkok. We are truly grateful to you for your hospitality and we look forward to serving you in the coming months.

Cheers!
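
As the output above shows, the generated text starts with the prompt and continues from there. Below is a small sketch for isolating the continuation, using a shortened copy of that output. The helper name `strip_prompt` is hypothetical.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the part of generated_text that follows the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Prompt copied verbatim from the run above (including its spelling)
prompt = (
    "Customer review:\n"
    "Walking amid Gion's Machiya wooden houses is a mesmerizing experience.\n\n"
    "Hotel reponse to the customer:\n"
    "Dear valued customer, I am glad to hear you had a good stay with us."
)
generated = prompt + (
    " We have enjoyed our stay and I am sure you will enjoy"
    " the hospitality at the hotel."
)

print(strip_prompt(generated, prompt))
```

Passing `return_full_text=False` to the text-generation pipeline call should give the same result without post-processing.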


### Text summarization

In [5]:
from transformers import pipeline

#model_name = 'facebook/bart-large-cnn'
model_name = 'cnicu/t5-small-booksum'
llm = pipeline('summarization', model=model_name)
long_text = """
\nThe tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure 
in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower 
surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years 
until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 
300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the 
Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing 
structure in France after the Millau Viaduct.\n
"""
llm(long_text, max_length=50, clean_up_tokenization_spaces=True)

[{'summary_text': 'the Eiffel Tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres'}]
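
With `max_length=50` the summary above stops mid-sentence ("measuring 125 metres"). A minimal sketch of trimming such a fragment back to the last complete sentence follows. The helper name `trim_to_last_sentence` is hypothetical, and the rule is naive: it treats any `.`, `!` or `?` followed by whitespace as a sentence end, so abbreviations would trip it up.

```python
import re


def trim_to_last_sentence(text: str) -> str:
    """Drop a trailing fragment that does not end in '.', '!' or '?'."""
    # Positions just after each sentence-ending mark that is followed by
    # whitespace or by the end of the text
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)]
    return text[:ends[-1]] if ends else text


# Summary copied from the output above
summary = (
    "the Eiffel Tower is 324 metres (1,063 ft) tall, about the same height "
    "as an 81-storey building, and the tallest structure in Paris. "
    "Its base is square, measuring 125 metres"
)

print(trim_to_last_sentence(summary))
```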

### QA
Takes two inputs, a question and a context, and predicts the answer as a span of text extracted from the context.

In [3]:
from transformers import pipeline

llm = pipeline("question-answering")
context = "Walking amid Gion's Machiya wooden houses is a mesmerizing experience."
question = "What are Machiya houses made of?"
llm(question=question, context=context)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.8993020057678223, 'start': 28, 'end': 34, 'answer': 'wooden'}
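
The `start` and `end` values in the result are character offsets into the context, so the answer can be recovered by slicing. The short check below uses the context and result copied from the run above.

```python
context = "Walking amid Gion's Machiya wooden houses is a mesmerizing experience."

# Result copied from the run above
result = {'score': 0.8993020057678223, 'start': 28, 'end': 34, 'answer': 'wooden'}

# Slice the context with the reported character offsets
span = context[result['start']:result['end']]
print(span)
```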

### Translation

In [1]:
from transformers import pipeline

llm = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
text = "Walking amid Gion's Machiya wooden houses is a mesmerizing experience."
llm(text, clean_up_tokenization_spaces=True)



[{'translation_text': 'Caminar entre las casas de madera Machiya de Gion es una experiencia fascinante.'}]

In [6]:
input_text = "Este curso sobre LLMs se está poniendo muy interesante"

# Define pipeline for Spanish-to-English translation
translator = pipeline("translation_es_to_en", model="Helsinki-NLP/opus-mt-es-en")

# Translate the input text
translations = translator(input_text)

# Access the output to print the translated text in English
print(translations[0]['translation_text'])



This course on LLMs is getting very interesting.
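
Both translation examples follow the same naming pattern for the task (`translation_<src>_to_<tgt>`) and for the model (`Helsinki-NLP/opus-mt-<src>-<tgt>`). The helper below, whose name is hypothetical, builds both strings from two language codes. It assumes the pattern carries over to other language pairs; whether a model actually exists for a given pair has to be checked on huggingface.co.

```python
from typing import Tuple


def translation_names(src: str, tgt: str) -> Tuple[str, str]:
    """Build the pipeline task name and model name for a language pair."""
    task = f"translation_{src}_to_{tgt}"
    model = f"Helsinki-NLP/opus-mt-{src}-{tgt}"  # pattern taken from the examples above
    return task, model


task, model = translation_names("es", "en")
print(task, model)
```

The two strings can then be passed as `pipeline(task, model=model)`, as in the cells above.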
