# Introduction

**Using Hugging Face models**

In [None]:
from transformers import pipeline
summarizer = pipeline(task="summarization", model="facebook/bart-large-cnn")

text = "Walking amid Gion's Machiya wooden houses is a mesmerizing experience. The beautifully preserved structures exude an old-world charm that transports visitors back in time, making them feel like they have stepped into a living museum. The glow of lanterns lining the narrow streets adds to the enchanting ambiance, making each stroll a memorable journey through Japan's rich cultural history."


summary = summarizer(text, max_length=50)
print(summary[0]["summary_text"])




"""

Depending on the model, tokenizer, or input text, the output may contain unwanted whitespace.
We can remove it by passing clean_up_tokenization_spaces=True when calling the pipeline. Most of today's summarization models do this automatically.


summarizer(text, clean_up_tokenization_spaces=True)

"""

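As a rough illustration of what this cleanup involves, the sketch below uses a hypothetical helper, `tidy_spaces`, that removes the stray space a tokenizer can leave before punctuation and contractions. It is an approximation of the idea rather than the library's implementation, and the sample string is made up.

```python
# Hypothetical helper (not part of transformers): approximates the kind of
# cleanup that clean_up_tokenization_spaces=True asks the pipeline to apply.
def tidy_spaces(decoded: str) -> str:
    # Remove the space that detokenization can leave before punctuation
    # and before common English contractions.
    for artifact, fixed in [(" .", "."), (" ,", ","), (" !", "!"), (" ?", "?"),
                            (" n't", "n't"), (" 's", "'s")]:
        decoded = decoded.replace(artifact, fixed)
    return decoded.strip()

# Made-up example of decoded text with tokenization spaces
raw = " Gion 's wooden houses are charming , and the lanterns glow ."
cleaned = tidy_spaces(raw)
print(cleaned)
```

In practice the pipeline argument is enough; the helper only shows which kinds of spaces are affected.
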
In [None]:
"""

Load the model pipeline for a summarization task using the model "cnicu/t5-small-booksum".
Generate the output by passing the long_text to the pipeline; limit the output to 50 tokens.
Access and print only the summarized text from the output.

"""

# Load the model pipeline
summarizer = pipeline(task="summarization", model="cnicu/t5-small-booksum")

# Pass the long text to the model
output = summarizer(long_text, max_length=50)

# Access and print the summarized text
print(output[0]["summary_text"])

# Using pre-trained LLMs

**Text Generation**

In [None]:
"""

The pad_token_id parameter sets the token used for padding, which adds extra tokens so that all sequences reach the same length, up to the specified max_length,
keeping batched computation efficient. Setting it to the tokenizer's end-of-sequence token ID, learned through training, marks the end of meaningful text.
This helps the model recognize where to stop, so it generates text only up to the specified length or the end-of-sequence token.
Another parameter, truncation=True, can be added when the input is longer than the maximum length we have set. We won't need it for this example.

"""

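To make the padding idea concrete, here is a minimal pure-Python sketch. The helper `pad_to_length` and the token IDs in `batch` are illustrative assumptions, not library code; 50256 is the end-of-sequence ID of the GPT-2 tokenizer and stands in for `generator.tokenizer.eos_token_id`.

```python
from typing import List

EOS_TOKEN_ID = 50256  # end-of-sequence ID of the GPT-2 tokenizer, reused as the pad ID


def pad_to_length(token_ids: List[int], max_length: int, pad_token_id: int) -> List[int]:
    """Truncate or right-pad a sequence so it is exactly max_length tokens long."""
    clipped = token_ids[:max_length]  # mirrors the effect of truncation=True
    return clipped + [pad_token_id] * (max_length - len(clipped))


# Made-up token IDs: one short sequence and one that is too long
batch = [[10, 11, 12], [20, 21, 22, 23, 24, 25, 26]]
padded = [pad_to_length(seq, max_length=5, pad_token_id=EOS_TOKEN_ID) for seq in batch]
print(padded)
```

The short sequence is filled with the end-of-sequence ID and the long one is cut, so both end up with the same length.
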
In [None]:
"""

The output for this model is retrieved using the generated_text key. Sometimes, the output may be suboptimal if the prompt lacks context.
For instance, if the prompt is too vague or ambiguous, the generated text might not be relevant or coherent.

"""

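The shape of that output can be sketched without loading a model. The `output` below is a hand-written stand-in with the same structure a text-generation pipeline returns (a list with one dictionary per generated sequence); the prompt and the generated text are invented for illustration.

```python
prompt = "The Gion neighborhood in Kyoto is famous for"

# Stand-in for generator(prompt, ...): a list of dicts keyed by "generated_text".
output = [{"generated_text": prompt + " its traditional wooden houses."}]

full_text = output[0]["generated_text"]  # by default, the prompt plus the continuation
continuation = full_text[len(prompt):]   # keep only the newly generated part
print(continuation.strip())
```

Slicing off the prompt is optional; it is useful when only the model's continuation should be shown to a reader.
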
**Guiding the Output**

In [None]:
"""

We can control the output by being more specific in the prompt or including additional elements to guide the output. Take this book review example.
We've included a response element and combined the review and response into a single prompt using an f-string to guide the model.

"""




generator = pipeline(task="text-generation", model="distilgpt2")
review = "This book was great. I enjoyed the plot twist in Chapter 10."
response = "Dear reader, thank you for your review."
prompt = f"Book review:\n{review}\n\nBook shop response to the review:\n{response}"
output = generator(prompt, max_length=100, pad_token_id=generator.tokenizer.eos_token_id)
print(output[0]["generated_text"])

**Language Translation**

In [None]:
translator = pipeline(task="translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")
text = "Walking amid Gion's Machiya wooden houses was a mesmerizing experience."
output = translator(text, clean_up_tokenization_spaces=True)
print(output[0]["translation_text"])

In [None]:
"""

You need to generate a response to a customer review found in text; it contains the same customer review for the Riverview Hotel you've seen before.
The pipeline module has been loaded for you.

"""


"""

Instantiate the generator pipeline specifying an appropriate task for generating text.
Complete the prompt by including the text and response in the f-string.
Complete the model pipeline by specifying a maximum length of 150 tokens and setting the pad_token_id to the end-of-sequence token.

"""




# Instantiate the pipeline
generator = pipeline(task="text-generation", model="gpt2")

response = "Dear valued customer, I am glad to hear you had a good stay with us."

# Complete the prompt
prompt = f"Customer review:\n{text}\n\nHotel response to the customer:\n{response}"

# Complete the model pipeline
outputs = generator(prompt, max_length=150, pad_token_id=generator.tokenizer.eos_token_id, truncation=True)

print(outputs[0]["generated_text"])



In [None]:
"""

Define the pipeline task for Spanish-to-English translation (es_to_en).
Translate the spanish_text using the model pipeline.

"""

spanish_text = "Este curso sobre LLMs se está poniendo muy interesante"

# Define the pipeline
translator = pipeline(task="translation_es_to_en", model="Helsinki-NLP/opus-mt-es-en")

# Translate the Spanish text
translations = translator(spanish_text, clean_up_tokenization_spaces=True)

print(translations[0]["translation_text"])

# Understanding the Transformer

In [None]:
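# Inspect the Transformer's architecture.
# Assumes llm is a pipeline object created earlier with pipeline(...).

print(llm.model)                            # layer-by-layer structure of the model
print(llm.model.config)                     # full model configuration
print(llm.model.config.is_decoder)          # True if the model is configured as a decoder
print(llm.model.config.is_encoder_decoder)  # True for encoder-decoder models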

In [None]:
"""

Question-answering can be either extractive or generative, each requiring a different transformer structure to process input and output correctly.

They use either:

Encoder-only models such as "distilbert-base-uncased-distilled-squad"
Decoder-only models such as "gpt2"

Use your knowledge of common models for specific tasks to select the appropriate one. pipeline has been loaded, along with text about the Mona Lisa.

"""


"""
Use an appropriate model for extractive question-answering.
"""

from transformers import pipeline
question = "Who painted the Mona Lisa?"

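# Extractive question-answering needs an encoder-only model fine-tuned for the task
qa = pipeline(task="question-answering", model="distilbert-base-uncased-distilled-squad")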

output = qa(question=question, context=text)
print(output['answer'])
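
What makes the answer extractive is that it is copied verbatim from the context. The sketch below shows this with a hand-written stand-in for the dictionary a question-answering pipeline returns (keys `score`, `start`, `end`, `answer`); the `context` sentence and the score are assumptions made up for illustration.

```python
context = "The Mona Lisa is a portrait painted by Leonardo da Vinci."

# Stand-in for qa(question=..., context=...): the score is invented, and
# start/end are character offsets into the context.
answer_text = "Leonardo da Vinci"
start = context.index(answer_text)
output = {
    "score": 0.99,
    "start": start,
    "end": start + len(answer_text),
    "answer": answer_text,
}

# An extractive answer is a literal span of the context.
span = context[output["start"]:output["end"]]
print(span)
```

A generative model, by contrast, writes its answer token by token, so the answer need not appear anywhere in the context.
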

In [None]:
"""

Use an appropriate model for generative question-answering.

"""

question = "Who painted the Mona Lisa?"

# Define the appropriate model: generative question-answering uses a decoder-only model
qa = pipeline(task="text-generation", model="gpt2")

input_text = f"Context: {text}\n\nQuestion: {question}\n\nAnswer:"

output = qa(input_text, max_length=150)
print(output[0]["generated_text"])