## Setup and Installation

Before we dive in, we need to set up our environment. This means installing the Hugging Face Transformers library along with a few supporting dependencies so that everything works on Paperspace. Run the following commands:

In [None]:
!pip install -q --upgrade transformers torch torchvision torchaudio
!pip install -q tokenizers==0.14 evaluate
!pip install -q bitsandbytes transformers accelerate gradio thread6 sacremoses
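
After the install finishes, it can help to confirm which versions actually ended up in the environment. The sketch below uses only the Python standard library; the helper name `installed_version` is our own, and the package list is simply copied from the install commands above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the install commands above.
for name in ["transformers", "torch", "tokenizers", "evaluate", "accelerate", "sacremoses"]:
    version = installed_version(name)
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If a package shows up as `NOT INSTALLED`, re-run the corresponding `pip install` line before continuing.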

---
### Recap of the Previous Lesson

In our previous lesson, we introduced the Hugging Face platform and its high-level component called Pipelines. We explored the zero-shot image classification task and how Hugging Face abstracts the complexities, allowing us to perform such tasks with ease.

Today, we'll delve deeper into other practical applications using pipelines, focusing on tasks that are particularly relevant in the business context.


---
### Deep Dive: Text Summarization

Text summarization models, including those available on Hugging Face, are usually based on the Transformer architecture. This architecture has been very successful across NLP tasks thanks to its self-attention mechanism, which lets the model weigh how important every other word in the text is when it processes a given word.

For summarization, models often use a sequence-to-sequence approach. Here's a simplified overview:

1. **Encoder**: The input text (long text) is passed through an encoder, which converts the text into a series of vectors that capture its semantic information.
2. **Decoder**: These vectors are then passed to a decoder, which generates the summary one token (roughly a word or word piece) at a time.

The attention mechanism in the Transformer (specifically, the decoder's attention over the encoder's output, often called cross-attention) allows the decoder to focus on different parts of the input text while generating each word of the summary, which helps it keep the most relevant information.
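
To make the idea of "focusing on different parts of the input" more concrete, here is a minimal sketch of scaled dot-product attention for a single decoder step, written in plain Python. The vectors are made-up toy values, and the function names `softmax` and `attend` are our own; a real model learns these vectors, works with far larger dimensions, and runs many attention heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Similarity between the query and each key, scaled by sqrt(dimension)
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The context vector is the weighted average of the value vectors
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy 2-dimensional vectors standing in for three encoded input words (made up).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical decoder state while producing the next summary word.
decoder_query = [2.0, 0.0]

weights, context = attend(decoder_query, encoder_states, encoder_states)
print("attention weights:", [round(w, 3) for w in weights])
print("context vector:", [round(c, 3) for c in context])
```

The weights show how much each input position contributes at this step: positions whose vectors point in a similar direction to the query receive more weight.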

<img src="https://i.ytimg.com/vi/9PoKellNrBc/maxresdefault.jpg" width="600" height="400">



### Deep Dive: Text Translation

Modern translation models also leverage the Transformer architecture. The principle is similar to summarization but adapted for translation between languages:

1. **Encoder**: The input text (in the source language) is processed by an encoder, producing a series of vectors that encapsulate its meaning.
2. **Decoder**: A decoder then takes these vectors and generates the translation in the target language, word by word.

A crucial component here is the attention mechanism. As the decoder generates each word in the target language, attention allows it to focus on different parts of the source text. This helps the translation stay contextually accurate and preserve nuances in the source text, although it does not guarantee a correct translation.
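
The "word by word" generation described above can be sketched with a toy decoding loop. Everything here is hypothetical: the score table `NEXT_WORD_SCORES` stands in for a trained decoder, and it only looks at the previous word, whereas a real model conditions on the whole source sentence and everything generated so far. The loop uses greedy decoding (always take the top candidate), which is the simplest strategy; real pipelines may use beam search or sampling instead.

```python
# Hypothetical next-word scores, standing in for a trained decoder.
# Keys are the previous word; values map candidate next words to scores.
NEXT_WORD_SCORES = {
    "<start>": {"soy": 0.7, "estoy": 0.3},
    "soy": {"estudiante": 0.8, "<end>": 0.2},
    "estoy": {"<end>": 1.0},
    "estudiante": {"<end>": 0.9, "soy": 0.1},
}


def greedy_decode(scores, max_words=10):
    """Generate words one at a time, always taking the top-scoring candidate."""
    output = []
    previous = "<start>"
    for _ in range(max_words):
        candidates = scores[previous]
        best = max(candidates, key=candidates.get)
        if best == "<end>":  # the model signals that the sentence is complete
            break
        output.append(best)
        previous = best
    return output


print(" ".join(greedy_decode(NEXT_WORD_SCORES)))  # prints "soy estudiante"
```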


<img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2019/01/enc_dec_simple.png" width="600" height="400">

---
### Text Summarization with Pipelines

Text summarization is the process of shortening long pieces of text into a concise summary that retains the most important information. In business, this can be incredibly useful for quickly understanding long reports, articles, or documents.

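Let's see how we can use the Hugging Face pipeline for this task. You can ignore the warning that appears when the cell runs (typically a notice that no model was specified, so the pipeline falls back to a default one).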


In [None]:
from transformers import pipeline

# Initialize the summarization pipeline
summarizer = pipeline("summarization")

# Summarize a longer passage (a sample excerpt about 2024 compensation budgets)
long_text = """The Mercer survey was conducted between July 31 and August 11, so the results are just an early look at what employers are thinking as they plan for 2024. 
Compensation budgets for next year won’t be finalized until December or even January in some instances. And a lot can change between now and then. The expected pay increases 
for next year reflect “the ongoing tightness of the labor market and low levels of unemployment. However, if the labor market continues to stabilize and inflation cools further as 
we move towards the end of the year, compensation pressures are likely to continue to decline,” said Lauren Mason, a senior principal in Mercer’s career practice."""

summary = summarizer(long_text, max_length=50, min_length=25, do_sample=False)
print("Model output:")
print(summary[0]['summary_text'])

---
### Text Translation with Pipelines

In the context of global business, the ability to translate content into different languages is invaluable. Whether it's for communicating with international partners, translating product information, or understanding foreign market reports, translation plays a key role.

Let's explore how the translation pipeline works.


In [None]:
# Initialize the translation pipeline (translating from English to Spanish in this example)
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

# Translate a sample sentence
text_to_translate = "I am a student at Fresno State University"
translated_text = translator(text_to_translate)
print(translated_text[0]['translation_text'])
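
In practice you will often have several sentences to translate at once. Pipelines accept a list of strings and return one result dictionary per input, so a small helper can unpack the results. The sketch below assumes the `translator` argument behaves like the pipeline object created above; `translate_all` and `fake_translator` are our own names, and the fake stand-in exists only so the example runs without downloading a model.

```python
def translate_all(texts, translator):
    """Translate a list of strings and return just the translated strings."""
    results = translator(texts)  # a list input yields one result dict per text
    return [result["translation_text"] for result in results]


def fake_translator(texts):
    """Stand-in with the same output shape as the translation pipeline (not a real model)."""
    return [{"translation_text": f"[es] {text}"} for text in texts]


sentences = ["Good morning.", "Thank you for your order."]
print(translate_all(sentences, fake_translator))
```

To use the real model, pass the pipeline object instead: `translate_all(sentences, translator)`.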

---
### Customizing Pipeline Outputs

Pipelines in Hugging Face are highly customizable. By tweaking various parameters, we can influence the output. For instance, in text summarization, you might want a shorter or longer summary. The `max_length` and `min_length` parameters control the length of the generated summary; both are measured in tokens, not words or characters, so the word count of the output will usually be somewhat lower than the limits you set.

Let's see how this works in practice.


In [None]:
# Summarize the same text at two different lengths
long_text = """In that case, employers could further trim their planned pay increases. 
Or they could decide to further boost raises and increase promotions if conditions warrant — 
as they did this year, when merit increase budgets were set for a 3.8% boost but employers ended 
up raising base salary levels for employees who remained in their roles by an average of 5.6% instead. 
“This is a result of off-cycle pay increases, which 59% of employers reported providing in 2023. The top 
reasons cited for off-cycle increases were to address retention concerns, counteroffers, market adjustments 
and internal equity,” Mason said. """

short_summary = summarizer(long_text, max_length=30, min_length=10, do_sample=False)
long_summary = summarizer(long_text, max_length=100, min_length=50, do_sample=False)

print("Short Summary:", short_summary[0]['summary_text'])
print("Long Summary:", long_summary[0]['summary_text'])
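
Rather than hard-coding the limits, you may want to derive them from the length of the input. The helper below is one possible heuristic, not a library feature: `length_bounds` is our own name, the ratios are arbitrary defaults, and it counts whitespace-separated words as a rough proxy for tokens (the real limits are in tokens, so treat the result as an approximation).

```python
def length_bounds(text, max_ratio=0.5, min_ratio=0.2):
    """Suggest (max_length, min_length) from a rough whitespace word count."""
    n_words = len(text.split())
    max_length = max(2, int(n_words * max_ratio))
    # Keep min_length at least 1 and strictly below max_length
    min_length = max(1, min(int(n_words * min_ratio), max_length - 1))
    return max_length, min_length


sample = "word " * 100  # a dummy 100-word input
print(length_bounds(sample))  # prints (50, 20)
```

With the summarizer from above, this could be used as `max_len, min_len = length_bounds(long_text)` followed by `summarizer(long_text, max_length=max_len, min_length=min_len, do_sample=False)`.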


---
### Understanding Model Limitations

While models like those in the Hugging Face library are incredibly powerful, they're not infallible. It's crucial to understand their limitations, especially in a business context where decisions based on model outputs can have real-world consequences.

1. **Data Bias**: If a model is trained on biased data, its outputs will reflect that bias. This can lead to incorrect or unfair decisions.
2. **Overfitting**: If a model is trained too closely on its training data, it might perform poorly on new, unseen data.
3. **Complexity**: Deep learning models, especially transformers, have millions or even billions of parameters. While this makes them powerful, it also makes them hard to interpret and can make them prone to "memorizing" rather than "learning" from data.
4. **Resource Intensive**: Transformers need significant memory and compute, and often a GPU for acceptable speed, which can raise cost and latency when deploying them in real-world applications.

In summary, always evaluate model outputs critically and in the context of the specific business problem you're addressing.
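
One simple way to put "evaluate critically" into practice for summaries of business text is to check that every figure in the summary also appears in the source. The sketch below is a deliberately naive check, not a full fact-checker: `unsupported_numbers` is our own helper, the source sentence is a shortened paraphrase of the earlier example text, and `bad_summary` is a made-up faulty output. It only compares numbers as literal strings, so "3.8%" and "3.8 percent" would not match.

```python
import re

# Matches integers and decimals, with an optional trailing percent sign
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?%?")


def unsupported_numbers(source, summary):
    """Return numbers that appear in the summary but not in the source text."""
    source_numbers = set(NUMBER_PATTERN.findall(source))
    return [n for n in NUMBER_PATTERN.findall(summary) if n not in source_numbers]


source = "Merit increase budgets were set for a 3.8% boost, but base salaries rose 5.6%."
good_summary = "Budgets called for 3.8%, but salaries rose 5.6%."
bad_summary = "Budgets called for 3.8%, but salaries rose 6.5%."  # made-up faulty output

print(unsupported_numbers(source, good_summary))  # prints []
print(unsupported_numbers(source, bad_summary))   # prints ['6.5%']
```

An empty list does not prove the summary is correct; it only means this particular check found nothing suspicious.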


---
### Practical Exercise

Now, it's your turn! Here's a business-related text:

> "Global sales have increased by 15% in the last quarter, with particularly strong performance in the Asia-Pacific region. New product launches have been well-received, and marketing campaigns in Europe are showing promise. However, there are concerns about supply chain disruptions in North America."

1. Generate a concise summary of the text.
2. Translate the summary into French.
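
If you would like a starting point, the two steps can be chained in a small function. This is only a scaffold under stated assumptions: `summarize_then_translate` is our own name, it expects callables that behave like the pipeline objects used earlier, and the two `fake_*` stand-ins exist only so the sketch runs without downloading models. The length limits are arbitrary defaults for you to tune.

```python
def summarize_then_translate(text, summarizer, translator, max_length=40, min_length=10):
    """Chain two pipeline-like callables: summarize first, then translate the summary."""
    summary = summarizer(
        text, max_length=max_length, min_length=min_length, do_sample=False
    )[0]["summary_text"]
    translation = translator(summary)[0]["translation_text"]
    return summary, translation


def fake_summarizer(text, **kwargs):
    """Stand-in summarizer: keeps only the first sentence (not a real model)."""
    return [{"summary_text": text.split(".")[0] + "."}]


def fake_translator(text):
    """Stand-in translator: tags the text instead of translating it (not a real model)."""
    return [{"translation_text": "[fr] " + text}]


report = "Global sales have increased by 15% in the last quarter. Supply chains remain a concern."
summary, translation = summarize_then_translate(report, fake_summarizer, fake_translator)
print(summary)
print(translation)
```

For the actual exercise, replace the stand-ins with real pipelines, for example `pipeline("summarization")` and an English-to-French translation pipeline. A model name such as `Helsinki-NLP/opus-mt-en-fr` follows the same naming pattern as the English-to-Spanish model used earlier, but check the Hugging Face Hub to confirm the model you want.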
