In [1]:
# DS776 Auto-Update (runs in ~2 seconds, only updates when needed)
# If this cell fails, see Lessons/Course_Tools/AUTO_UPDATE_SYSTEM.md for help
%run ../Course_Tools/auto_update_introdl.py

✅ introdl v1.6.21 already up to date


In [2]:
from transformers import pipeline
from introdl import (
    get_device, 
    wrap_print_text, 
    config_paths_keys,
    llm_generate, 
    clear_pipeline, 
    print_pipeline_info, 
    display_markdown,
    show_session_spending
)

# Wrap print to format text nicely at 120 characters
print = wrap_print_text(print, width=120)

device = get_device()

paths = config_paths_keys()

✅ Environment: Unknown Environment | Course root: /mnt/e/GDrive_baggett.jeff/Teaching/Classes_current/2025-2026_Fall_DS776/DS776
   Using workspace: <DS776_ROOT_DIR>/home_workspace

📂 Storage Configuration:
   DATA_PATH: <DS776_ROOT_DIR>/home_workspace/data
   MODELS_PATH: <DS776_ROOT_DIR>/Lessons/Lesson_07_Transformers_Intro/Lesson_07_Models (local to this notebook)
   CACHE_PATH: <DS776_ROOT_DIR>/home_workspace/downloads
🔑 API keys: 9 loaded from home_workspace/api_keys.env
🔐 Available: ANTHROPIC_API_KEY, GEMINI_API_KEY, GOOGLE_API_KEY... (9 total)
✅ HuggingFace Hub: Logged in
✅ Loaded pricing for 330 OpenRouter models
✅ Cost tracking initialized ($9.92 credit remaining)
📦 introdl v1.6.21 ready



#### L07_2_NLP_Tasks Video

<iframe 
    src="https://media.uwex.edu/content/ds/ds776/ds776_l07_2_nlp_tasks" 
    width="800" 
    height="450" 
    style="border: 5px solid cyan;"  
    allowfullscreen>
</iframe>
<br>
<a href="https://media.uwex.edu/content/ds/ds776/ds776_l07_2_nlp_tasks" target="_blank">Open UWEX version of video in new tab</a>
<br>
<a href="https://share.descript.com/view/omj2ldze713" target="_blank">Open Descript version of video in new tab</a>


# Introduction to NLP Tasks with Transformer Models

In this notebook we'll demonstrate solutions to some common Natural Language Processing (NLP) tasks that use transformer models.  We expand on the material in Chapter 1 (Hello Transformers) of our NLP textbook and add a little background about the underlying models.  We'll also demonstrate how these same tasks can be done using a large language model with either "zero-shot prompting" or "few-shot prompting".

Over the next five lessons we'll go into some of these NLP tasks in detail and learn a bit about the transformer neural network architecture.  For each of the NLP tasks that follows we'll demonstrate how to do the task two ways.  The first is by using a pre-trained transformer-based model downloaded from HuggingFace.  In the second approach we'll use a large language model and prompting.

Using LLMs for various NLP tasks is common when there isn't much labeled data available.  Zero-shot prompting means that no examples are provided to the LLM.  Few-shot prompting means that a small number of examples are provided to the LLM.  In this notebook we'll demonstrate zero-shot prompting, but in the lessons to come we'll include few-shot prompting examples.
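
To make the difference between zero-shot and few-shot prompting concrete, here is a minimal sketch of how the two kinds of prompt could be assembled as plain strings. The helper `build_prompt` and the example comments are made up for illustration; they are not part of `introdl` or of any library.

```python
def build_prompt(input_text, examples=None):
    """Build a sentiment prompt; no examples gives zero-shot, some examples gives few-shot."""
    lines = []
    # Each labeled example is shown to the model before the text we actually care about
    for example_text, label in (examples or []):
        lines.append(f"Text: {example_text}\nSentiment: {label}\n")
    lines.append(f"Text: {input_text}\nSentiment:")
    return "\n".join(lines)

zero_shot = build_prompt("The package arrived late.")
few_shot = build_prompt(
    "The package arrived late.",
    examples=[("Great service!", "positive"), ("Never again.", "negative")],
)
print(few_shot)
```

The only difference between the two prompts is the labeled examples placed ahead of the final, unlabeled text.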

Throughout this notebook we'll use the following customer feedback message as an example:

In [3]:
# Sample Text
text = """I ordered the Samsung Galaxy S24 Ultra from Tech Haven, expecting next-day delivery, but after three days, I hadn’t even received a shipping update. After waiting 45 minutes on hold, customer service told me there was a stock issue—yet no one had informed me! 

When the package finally arrived a week late, it contained a Google Pixel 8 Pro instead. The support rep was apologetic but said an exchange would take another two weeks.  

I paid $1,200 for the wrong phone, dealt with delays and poor communication, and now have to wait even longer. To add insult to injury, the customer service representative I spoke with seemed indifferent to my frustration. I had to explain my situation multiple times before they even acknowledged the mistake. The entire experience has been incredibly disappointing and has left me questioning whether I should ever shop with Tech Haven again. 

It's baffling how a company can operate with such a lack of transparency and efficiency. I hope this feedback reaches someone who can make a difference, as no customer should have to go through what I did. Tech Haven, you need to do better! Sincerely, Jamie."""

print(text)

I ordered the Samsung Galaxy S24 Ultra from Tech Haven, expecting next-day delivery, but after three days, I hadn’t even
received a shipping update. After waiting 45 minutes on hold, customer service told me there was a stock issue—yet no
one had informed me!

When the package finally arrived a week late, it contained a Google Pixel 8 Pro instead. The support rep was apologetic
but said an exchange would take another two weeks.

I paid $1,200 for the wrong phone, dealt with delays and poor communication, and now have to wait even longer. To add
insult to injury, the customer service representative I spoke with seemed indifferent to my frustration. I had to
explain my situation multiple times before they even acknowledged the mistake. The entire experience has been incredibly
disappointing and has left me questioning whether I should ever shop with Tech Haven again.

It's baffling how a company can operate with such a lack of transparency and efficiency. I hope this feedback reaches
som

## NLP Task - Text Classification

Text classification is the process of assigning predefined categories to text. It involves analyzing the content of the text and categorizing it based on its subject, sentiment, or other criteria. One common application of text classification is sentiment analysis, which determines the sentiment expressed in a piece of text, such as positive, negative, or neutral. Sentiment analysis is widely used in customer feedback analysis, social media monitoring, and market research to gauge public opinion and customer satisfaction.

### Sentiment Analysis with a Specialized Model

Here we will let the HuggingFace transformers library provide its default model for sentiment analysis and apply it to our customer feedback.  

In [4]:
# Sentiment Analysis
print("\n**Sentiment Analysis**")
sentiment_pipeline = pipeline("sentiment-analysis", device=device)
print_pipeline_info(sentiment_pipeline)
sentiment_result = sentiment_pipeline(text)
print(sentiment_result)


**Sentiment Analysis**
Model: distilbert/distilbert-base-uncased-finetuned-sst-2-english, Size: 66,955,010 parameters
[{'label': 'NEGATIVE', 'score': 0.9989209175109863}]


In this case, a "BERT" model correctly classified the customer feedback as negative. BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based model developed by Google. It is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. This allows BERT to understand the context of a word based on its surroundings, making it highly effective for various NLP tasks. The particular model used here is a distilled BERT model that has been fine-tuned on a sentiment dataset. A distilled model is a smaller, faster, and more efficient version of a larger model, trained using knowledge distillation, where the smaller model learns to mimic the outputs of the larger one while retaining most of its performance. In Lesson 9, we'll learn more about the family of transformer models called encoders, which include BERT models.
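
For a classifier like this one, the `score` in the output is typically the softmax probability of the predicted class, computed from the model's raw outputs (logits). The sketch below shows that calculation in plain Python. The logit values are made up for illustration and are not taken from the actual model.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
logits = [3.5, -3.3]  # made-up values for illustration only
probs = softmax(logits)

# The predicted label is the one with the highest probability
best = max(range(len(labels)), key=lambda i: probs[i])
prediction = {"label": labels[best], "score": probs[best]}
print(prediction)
```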

It's never a bad idea to remove models from memory when they aren't being used:

In [5]:
clear_pipeline(sentiment_pipeline)

✅ Pipeline cleared.



### Sentiment Analysis with an LLM and a Zero-Shot Prompt

A system prompt gives general instructions to an LLM, while a user prompt is the specific input you want the LLM to respond to.  Here we define a system prompt for sentiment analysis and a user prompt containing the customer feedback:

In [6]:
system_prompt = """You are an expert sentiment analysis model. Analyze the sentiment of the following text. 
Give only a one word response: positive, negative, or neutral."""
user_prompt = f"Text: {text}\nSentiment:"

response_zero_shot = llm_generate('gemini-flash-lite', user_prompt, system_prompt=system_prompt)
print(response_zero_shot)

negative


We can also handle batches of inputs:

In [7]:

customer_comments = [
    "Fast shipping and great customer support. Highly recommend!",
    "The item arrived damaged and the return process was a nightmare.",
    "I'm very satisfied with my purchase. Will buy again.",
    "The website is user-friendly and the prices are unbeatable.",
    "Received the wrong item and customer service was unhelpful.",
    "Fantastic experience from start to finish.",
    "The product is okay, but not worth the price.",
    "Excellent quality and quick delivery. Very happy!",
    "The product works as expected, nothing more, nothing less.",
    "I have mixed feelings about the service; it was both good and bad."
]

user_prompts = [f"Text: {comment}\nSentiment:" for comment in customer_comments]

responses_zero_shot = llm_generate('gemini-flash-lite', user_prompts, system_prompt=system_prompt)

for comment, response_zero_shot in zip(customer_comments, responses_zero_shot):
    print(f"Text: {comment}\nSentiment: {response_zero_shot}\n")

Text: Fast shipping and great customer support. Highly recommend!
Sentiment: positive

Text: The item arrived damaged and the return process was a nightmare.
Sentiment: negative

Text: I'm very satisfied with my purchase. Will buy again.
Sentiment: positive

Text: The website is user-friendly and the prices are unbeatable.
Sentiment: positive

Text: Received the wrong item and customer service was unhelpful.
Sentiment: negative

Text: Fantastic experience from start to finish.
Sentiment: positive

Text: The product is okay, but not worth the price.
Sentiment: negative

Text: Excellent quality and quick delivery. Very happy!
Sentiment: positive

Text: The product works as expected, nothing more, nothing less.
Sentiment: neutral

Text: I have mixed feelings about the service; it was both good and bad.
Sentiment: neutral
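
LLM responses are free text, so a model may occasionally answer "Negative." or add extra words even when asked for one word. Here is a minimal sketch of how responses could be cleaned up and tallied. The helper `normalize_label` and the sample responses are hypothetical; they are not real model output.

```python
from collections import Counter

VALID_LABELS = {"positive", "negative", "neutral"}

def normalize_label(raw):
    """Trim whitespace and trailing punctuation, lowercase, and flag unexpected answers."""
    cleaned = raw.strip().strip(".!").lower()
    return cleaned if cleaned in VALID_LABELS else "unknown"

# Hypothetical responses illustrating the kinds of variation an LLM might produce
sample_responses = ["positive", "Negative.", " neutral\n", "It seems positive overall"]
label_counts = Counter(normalize_label(r) for r in sample_responses)
print(label_counts)
```

Anything that lands in the `unknown` bucket is a signal to tighten the system prompt or inspect those responses by hand.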



We'll study text classification more in Lesson 8.

### Learning to Write Better Prompts

There are many prompt engineering resources available on the internet.  I encourage you to look at those as needed.  I've also had good luck asking ChatGPT how to craft prompts.  One particularly useful resource is [ChatGPT Prompt Engineering for Developers](https://app.datacamp.com/learn/courses/chatgpt-prompt-engineering-for-developers) on DataCamp.  It's not free, but it is cheap.  I've viewed parts of this course and found it to be a very good introduction to programmatic prompt writing.  It's tailored to ChatGPT, but you can use the OpenAI API with Gemini as well.  Working through this short course (they say it takes 4 hours) is likely worthwhile.

## NLP Task - Named Entity Recognition

Named Entity Recognition (NER) is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.

Practical examples of NER include:
- **Business Application**: Extracting company names, dates, and monetary amounts from financial reports to automate data entry and analysis.
- **Healthcare**: Identifying patient names, medical conditions, and treatment dates from clinical notes to improve patient record management.
- **News Aggregation**: Categorizing and tagging entities like people, places, and events in news articles to enhance search and recommendation systems.

Here we will let the HuggingFace transformers library provide its default model for NER and apply it to our customer feedback.  

In [8]:
# Named Entity Recognition (NER)
print("\n**Named Entity Recognition**\n")
ner_pipeline = pipeline("ner", aggregation_strategy="simple", device=device)
print_pipeline_info(ner_pipeline)
print("")
ner_result = ner_pipeline(text)
print(ner_result)



**Named Entity Recognition**

Model: dbmdz/bert-large-cased-finetuned-conll03-english, Size: 332,538,889 parameters

[{'entity_group': 'MISC', 'score': 0.990973, 'word': 'Samsung Galaxy S24 Ultra', 'start': 14, 'end': 38},
{'entity_group': 'ORG', 'score': 0.994846, 'word': 'Tech Haven', 'start': 44, 'end': 54}, {'entity_group': 'MISC',
'score': 0.9928634, 'word': 'Google Pixel 8 Pro', 'start': 323, 'end': 341}, {'entity_group': 'ORG', 'score': 0.9964845,
'word': 'Tech Haven', 'start': 863, 'end': 873}, {'entity_group': 'ORG', 'score': 0.9887396, 'word': 'Tech Haven',
'start': 1089, 'end': 1099}, {'entity_group': 'PER', 'score': 0.9780477, 'word': 'Jamie', 'start': 1135, 'end': 1140}]


That output is hard to read, but we can easily convert it to a Pandas data frame for display:

In [9]:
import pandas as pd
from IPython.display import display

df = pd.DataFrame(ner_result)
display(df)


Unnamed: 0,entity_group,score,word,start,end
0,MISC,0.990973,Samsung Galaxy S24 Ultra,14,38
1,ORG,0.994846,Tech Haven,44,54
2,MISC,0.992863,Google Pixel 8 Pro,323,341
3,ORG,0.996485,Tech Haven,863,873
4,ORG,0.98874,Tech Haven,1089,1099
5,PER,0.978048,Jamie,1135,1140


The "BERT" model used here is the `dbmdz/bert-large-cased-finetuned-conll03-english` model, which has been fine-tuned on the CoNLL-2003 dataset for Named Entity Recognition (NER). This fine-tuning process allows the model to accurately identify and classify entities such as names of persons, organizations, locations, and more. 
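
The `start` and `end` fields in the output are character offsets into the original text, so each entity can be recovered by slicing. The sketch below checks this on the first sentence of our feedback message, using the first two entities from the table above with scores omitted, and then groups the entities by type.

```python
from collections import defaultdict

sample = "I ordered the Samsung Galaxy S24 Ultra from Tech Haven"
# Same shape as the pipeline output above, scores omitted
sample_entities = [
    {"entity_group": "MISC", "word": "Samsung Galaxy S24 Ultra", "start": 14, "end": 38},
    {"entity_group": "ORG", "word": "Tech Haven", "start": 44, "end": 54},
]

grouped = defaultdict(list)
for ent in sample_entities:
    span = sample[ent["start"]:ent["end"]]  # slice the text by character offsets
    assert span == ent["word"]
    grouped[ent["entity_group"]].append(span)

print(dict(grouped))
```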

In [10]:
clear_pipeline(ner_pipeline)

✅ Pipeline cleared.


### NER with an LLM and a Zero-Shot Prompt

If we don't have much training data or just want something quick and easy we can also use an LLM for NER.  Here's an example:

In [11]:
system_prompt = """You are an expert named entity recognition model. Identify and classify the entities in the following text. 
Provide the entities and their types in a JSON format."""
user_prompt = f"Text: {text}\nEntities:"

response_ner = llm_generate('gemini-flash-lite', user_prompt, system_prompt=system_prompt)
print(response_ner)

```json
[
  {"entity": "Samsung Galaxy S24 Ultra", "type": "PRODUCT"},
  {"entity": "Tech Haven", "type": "ORGANIZATION"},
  {"entity": "Google Pixel 8 Pro", "type": "PRODUCT"},
  {"entity": "$1,200", "type": "MONEY"},
  {"entity": "Tech Haven", "type": "ORGANIZATION"},
  {"entity": "Tech Haven", "type": "ORGANIZATION"},
  {"entity": "Jamie", "type": "PERSON"}
]
```


### NER with String Output (Traditional Approach)

Our LLM returned a string containing JSON. Below we parse this string to extract the JSON and display it as a DataFrame. Different LLMs may return different formats which require different parsing strategies.

In [12]:
import json

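# Remove the Markdown code fence around the JSON (requires Python 3.9+).
# Note: str.strip('```json\n') strips a *set of characters*, not a prefix, so it is fragile.
response_ner = response_ner.strip().removeprefix('```json').removesuffix('```').strip()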

# Convert the cleaned response to a DataFrame for display
ner_result = json.loads(response_ner)

# ner_result is already a list - use it directly
df = pd.DataFrame(ner_result)
display(df)

Unnamed: 0,entity,type
0,Samsung Galaxy S24 Ultra,PRODUCT
1,Tech Haven,ORGANIZATION
2,Google Pixel 8 Pro,PRODUCT
3,"$1,200",MONEY
4,Tech Haven,ORGANIZATION
5,Tech Haven,ORGANIZATION
6,Jamie,PERSON
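
Since different LLMs wrap their JSON differently, a slightly more defensive parser can help. The sketch below uses a regular expression to pull the contents out of a Markdown code fence if one is present and otherwise parses the string as is. The helper name `extract_json` is hypothetical, and this is only one possible strategy.

```python
import json
import re

def extract_json(raw):
    """Parse JSON from a string that may or may not be wrapped in a Markdown code fence."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, flags=re.DOTALL)
    payload = match.group(1) if match else raw
    return json.loads(payload.strip())

fenced = '```json\n[{"entity": "Jamie", "type": "PERSON"}]\n```'
plain = '[{"entity": "Jamie", "type": "PERSON"}]'

print(extract_json(fenced))
print(extract_json(plain))
```

If the model returns something that is not valid JSON at all, `json.loads` raises `json.JSONDecodeError`, which you may want to catch and handle.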


### NER with JSON Mode (Modern Approach)

Alternatively, we can use `mode='json'` in `llm_generate()` to get structured JSON output directly without needing to parse strings. This is more reliable and works with models that support JSON output.  All the suggested models from OpenRouter support JSON output.  Most of them even support JSON output that must conform to a user-defined template.  We'll see more about that in Lesson 8.

In [13]:
system_prompt = """You are an expert named entity recognition model. Identify and classify the entities in the following text. 
Provide the entities and their types in a JSON format."""
user_prompt = f"Text: {text}\nEntities:"

response_ner = llm_generate('gemini-flash-lite', user_prompt, system_prompt=system_prompt, mode='json')

print("Here's the structured JSON output from the model printed in raw format:\n")
print(response_ner)
print("\n That's not too helpful to look at. Let's put it in a DataFrame.")

df = pd.DataFrame(response_ner)
display(df)

Here's the structured JSON output from the model printed in raw format:

[{'entity_name': 'Samsung Galaxy S24 Ultra', 'entity_type': 'PRODUCT'}, {'entity_name': 'Tech Haven', 'entity_type':
'ORGANIZATION'}, {'entity_name': 'Google Pixel 8 Pro', 'entity_type': 'PRODUCT'}, {'entity_name': '$1,200',
'entity_type': 'MONEY'}, {'entity_name': 'Tech Haven', 'entity_type': 'ORGANIZATION'}, {'entity_name': 'Tech Haven',
'entity_type': 'ORGANIZATION'}, {'entity_name': 'Jamie', 'entity_type': 'PERSON'}]

 That's not too helpful to look at. Let's put it in a DataFrame.


Unnamed: 0,entity_name,entity_type
0,Samsung Galaxy S24 Ultra,PRODUCT
1,Tech Haven,ORGANIZATION
2,Google Pixel 8 Pro,PRODUCT
3,"$1,200",MONEY
4,Tech Haven,ORGANIZATION
5,Tech Haven,ORGANIZATION
6,Jamie,PERSON


The output of the LLM is similar to that of the specialized model from HuggingFace.  If we want different output from the LLM we could include instructions for that in our system prompt.
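
Notice that the two LLM runs above used different key names (`entity`/`type` versus `entity_name`/`entity_type`) because our prompt didn't specify them. One fix is to name the keys in the system prompt. Another is to normalize the records afterward, as in this minimal sketch. The helper `normalize_entities` is hypothetical, and the extra aliases `name` and `label` are guesses at what a model might return.

```python
def normalize_entities(records):
    """Map differently named keys onto a common {'entity', 'type'} schema."""
    name_keys = ("entity", "entity_name", "name")   # "name" is an assumed alias
    type_keys = ("type", "entity_type", "label")    # "label" is an assumed alias
    normalized = []
    for rec in records:
        name = next((rec[k] for k in name_keys if k in rec), None)
        kind = next((rec[k] for k in type_keys if k in rec), None)
        if name is None or kind is None:
            raise ValueError(f"Unrecognized entity record: {rec}")
        normalized.append({"entity": name, "type": kind})
    return normalized

string_mode = [{"entity": "Jamie", "type": "PERSON"}]
json_mode = [{"entity_name": "Jamie", "entity_type": "PERSON"}]
print(normalize_entities(json_mode))
```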

In Lesson 10 we'll learn more about Named Entity Recognition.

## NLP Task - Question Answering

Question Answering (QA) is a subtask of information retrieval and natural language understanding that involves automatically answering questions posed by humans in a natural language. QA systems can be designed to answer questions based on a given context or a large corpus of documents. The goal is to provide accurate and relevant answers to user queries.

Practical examples of QA include:
- **Customer Support**: Providing instant answers to customer queries based on a knowledge base or FAQ, improving response times and customer satisfaction.
- **Education**: Assisting students by answering questions related to their coursework or providing explanations for complex topics.
- **Healthcare**: Offering medical professionals quick access to information from medical literature or patient records to support clinical decision-making.
- **Search Engines**: Enhancing search results by directly providing answers to user queries, rather than just a list of relevant documents.

Here we will let the HuggingFace transformers library provide its default model for QA and apply it to our customer feedback.  

In [14]:

# Question Answering
print("\n**Question Answering**\n")
qa_pipeline = pipeline("question-answering", device=device)
print_pipeline_info(qa_pipeline)
print("")
question = "What is the main issue?"
qa_result = qa_pipeline(question=question, context=text)
print(qa_result)



**Question Answering**



Model: distilbert/distilbert-base-cased-distilled-squad, Size: 65,192,450 parameters

{'score': 0.4041374623775482, 'start': 218, 'end': 231, 'answer': 'a stock issue'}


In [15]:
clear_pipeline(qa_pipeline)

✅ Pipeline cleared.
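
The QA model above is extractive: its answer is a span copied from the context, located by the `start` and `end` character offsets, and the `score` reflects how confident the model is in that span. The sketch below shows how the span can be recovered by slicing and how a low-confidence answer might be rejected. The helper `accept_answer`, the short context, and the `min_score` thresholds are all assumptions for illustration.

```python
def accept_answer(result, context, min_score=0.5):
    """Return the answer span if the score clears min_score, otherwise None."""
    span = context[result["start"]:result["end"]]
    if span != result["answer"]:
        raise ValueError("Offsets do not match the answer text")
    return span if result["score"] >= min_score else None

short_context = "Customer service told me there was a stock issue."
answer = "a stock issue"
start = short_context.find(answer)
# A hand-built result in the same shape as the pipeline output above
fake_result = {"score": 0.40, "start": start, "end": start + len(answer), "answer": answer}

print(accept_answer(fake_result, short_context, min_score=0.3))
print(accept_answer(fake_result, short_context, min_score=0.5))
```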


### QA with an LLM and a Zero-Shot Prompt

If we don't have much training data or just want something quick and easy we can also use an LLM for QA.  Here's an example:

In [16]:
system_prompt_qa = """You are an expert question answering model. Answer the question based on the context provided. Be succinct."""
user_prompt_qa = f"Context: {text}\nQuestion: What is the main issue?\nAnswer:"

response_qa = llm_generate('gemini-flash-lite', user_prompt_qa, system_prompt=system_prompt_qa)
print(response_qa)

The main issue is Tech Haven's mishandling of Jamie's order, including significant delays, poor communication,
delivering the wrong product, and unhelpful customer service.


We don't have a lesson dedicated to question answering, but it's discussed in our NLP textbook in Chapter 7.  You could investigate this topic further in a project if you're interested.

## NLP Task - Translation

The first transformer model in the paper "Attention is All You Need" was designed for the task of language translation. This model, known as the Transformer, introduced a novel architecture that relies entirely on self-attention mechanisms to process input sequences, making it highly effective for translation tasks. The Transformer model has since become the foundation for many state-of-the-art NLP models, including BERT, GPT, and others.

Here we demonstrate how to use a pre-trained model from HuggingFace for translating English to Spanish.  

In [17]:

# Translation (English to Spanish)
print("\n**Translation**\n")
translation_pipeline = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es", device=device)
print_pipeline_info(translation_pipeline)
print("")
translation_result = translation_pipeline(text, max_length=300)
print(translation_result[0]['translation_text'])



**Translation**

Model: Helsinki-NLP/opus-mt-en-es, Size: 77,943,296 parameters

Pedí el Samsung Galaxy S24 Ultra de Tech Haven, esperando la entrega del día siguiente, pero después de tres días, yo ni
siquiera había recibido una actualización de envío. Después de esperar 45 minutos en espera, el servicio al cliente me
dijo que había un problema de existencias — sin embargo nadie me había informado! Cuando el paquete finalmente llegó una
semana tarde, que contenía un Google Pixel 8 Pro en su lugar. El representante de apoyo era apologético, pero dijo que
un intercambio tomaría otras dos semanas. Pagué $1.200 por el teléfono equivocado, trató con retrasos y mala
comunicación, y ahora tienen que esperar incluso más tiempo. Para añadir insulto a la lesión, el representante de
servicio al cliente con el que hablé parecía indiferente a mi frustración. Tuve que explicar mi situación varias veces
antes de que incluso reconocieron el error. Toda la experiencia ha sido increíblemente decepcion

Perhaps you're better than I am at Spanish, but I can't read that well enough to know if it's a good translation.  However, let's now use a similar model to translate it from Spanish back into English and compare it to the original text.

In [18]:
# Extract the Spanish translation from the previous result
spanish_translation = translation_result[0]['translation_text']

# Translate the Spanish text back to English
back_translation_pipeline = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en", device=device)
print_pipeline_info(back_translation_pipeline)
back_translation_result = back_translation_pipeline(spanish_translation, max_length=300)
print(f"Back Translation to English: {back_translation_result[0]['translation_text']}\n")
print(f"Original Text: {text}")

Model: Helsinki-NLP/opus-mt-es-en, Size: 77,943,296 parameters
Back Translation to English: I ordered the Samsung Galaxy S24 Ultra from Tech Haven, waiting for delivery the next day,
but after three days, I had not even received a shipping update. After waiting 45 minutes on hold, the customer service
told me that there was a stock problem — yet no one had informed me! When the package finally arrived a week late,
containing a Google Pixel 8 Pro instead. The support representative was apologetic, but he said an exchange would take
another two weeks. I paid $1,200 for the wrong phone, tried with delays and bad communication, and now they have to wait
even longer. To add insult to the injury, the customer service representative with whom I spoke seemed indifferent to my
frustration. I had to explain my situation several times before they even recognized the error. All the experience has
been incredibly disappointing and has left me wondering if I should ever buy with Tech Haven again. It


The Helsinki-NLP models are part of the OPUS-MT project, which provides pre-trained neural machine translation models for many language pairs. These models are based on the MarianMT architecture, a transformer-based model optimized for translation tasks. The MarianMT architecture leverages self-attention mechanisms to effectively process and translate text, making these models highly accurate and efficient for translation tasks.
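
Comparing a back translation to the original by eye works for one message, but a rough numeric score can help when there are many. Here is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library. It measures character-level overlap, which is only a crude proxy for translation quality, and the two sentences are shortened from the outputs above.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-level similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()

original_sentence = "I paid $1,200 for the wrong phone, dealt with delays and poor communication."
round_trip_sentence = "I paid $1,200 for the wrong phone, tried with delays and bad communication."

print(f"Similarity: {similarity(original_sentence, round_trip_sentence):.2f}")
```

A high score only says the wording survived the round trip. Errors that cancel out, or a fluent but wrong translation, would not be caught this way.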

In [19]:
clear_pipeline(translation_pipeline)
clear_pipeline(back_translation_pipeline)

✅ Pipeline cleared.
✅ Pipeline cleared.


### Translation with an LLM and a Zero-shot Prompt

In [20]:
system_prompt_translation = """You are an expert translation model. Translate the following text from English to Spanish."""
user_prompt_translation = f"Text: {text}\nTranslation:"

response_translation = llm_generate('gemini-flash-lite',
                                    user_prompt_translation, 
                                    system_prompt=system_prompt_translation,
                                    max_tokens=500)
print(response_translation)

Pedí el Samsung Galaxy S24 Ultra en Tech Haven, esperando la entrega al día siguiente, pero después de tres días, ni
siquiera había recibido una actualización de envío. Después de esperar 45 minutos en espera, el servicio al cliente me
dijo que había un problema de stock, ¡pero nadie me había informado!

Cuando el paquete finalmente llegó una semana tarde, contenía un Google Pixel 8 Pro en su lugar. El representante de
soporte se disculpó, pero dijo que un cambio tardaría otras dos semanas.

Pagué $1,200 por el teléfono equivocado, tuve que lidiar con retrasos y mala comunicación, y ahora tengo que esperar aún
más. Para colmo, el representante de servicio al cliente con el que hablé parecía indiferente a mi frustración. Tuve que
explicar mi situación varias veces antes de que reconocieran el error. Toda la experiencia ha sido increíblemente
decepcionante y me ha hecho cuestionar si alguna vez volveré a comprar en Tech Haven.

Es desconcertante cómo una empresa puede operar con tanta fa

In [21]:
system_prompt_back_translation = """You are an expert translation model. Translate the following text from Spanish to English."""
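# Note: spanish_translation is the Helsinki-NLP output from above.
# Use response_translation instead to back-translate the LLM's own Spanish.
user_prompt_back_translation = f"Text: {spanish_translation}\nTranslation:"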

response_back_translation = llm_generate('gemini-flash-lite',
                                         user_prompt_back_translation, 
                                         system_prompt=system_prompt_back_translation,
                                         max_tokens=500)
print(response_back_translation)

I ordered the Samsung Galaxy S24 Ultra from Tech Haven, expecting next-day delivery, but after three days, I hadn't even
received a shipping update. After waiting 45 minutes on hold, customer service told me there was a stock issue — yet no
one had informed me! When the package finally arrived a week late, it contained a Google Pixel 8 Pro instead. The
support representative was apologetic but said an exchange would take another two weeks. I paid $1,200 for the wrong
phone, dealt with delays and poor communication, and now have to wait even longer. To add insult to injury, the customer
service representative I spoke with seemed indifferent to my frustration. I had to explain my situation multiple times
before they even acknowledged the mistake. The entire experience has been incredibly disappointing and has left me
questioning if I should ever purchase from Tech Haven again. It's baffling how a company can operate with such a lack of
transparency and efficiency. I hope this feedback re

We won't specifically study translation models in one of our lessons, but the transformer models used for text summarization are similar in that they take an input sequence of text and produce an output sequence of text.  These models are called sequence-to-sequence transformers.

## NLP Task - Text Generation

Of all the models we'll study, text-generation models are perhaps the most familiar since they are the machines that drive today's chatbots like ChatGPT, Gemini, Claude, and others. Given an input sequence that provides context, a text generation model predicts a likely next word, then does it again and again to generate a hopefully sensible response. These models are particularly useful for tasks such as drafting emails, writing code, creating conversational agents, and generating creative content like stories and poems.
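
The "predict a next word, then do it again" loop can be sketched with a toy model. In the example below a hand-written lookup table stands in for the neural network, and we always pick the most probable next word (greedy decoding). The table and its probabilities are made up. Real models work with tokens rather than whole words and often sample instead of always taking the top choice.

```python
# Hypothetical next-word table with made-up probabilities
next_word_probs = {
    "we": {"are": 0.6, "will": 0.4},
    "are": {"sorry": 0.7, "here": 0.3},
    "sorry": {"for": 0.8, "<end>": 0.2},
    "for": {"the": 0.9, "<end>": 0.1},
    "the": {"delay": 0.6, "mistake": 0.4},
    "delay": {"<end>": 1.0},
}

def generate(start_word, max_new_words=10):
    """Greedy decoding: repeatedly append the most probable next word."""
    words = [start_word]
    for _ in range(max_new_words):
        candidates = next_word_probs.get(words[-1])
        if not candidates:
            break  # no known continuation for this word
        best = max(candidates, key=candidates.get)
        if best == "<end>":
            break  # the model chose to stop
        words.append(best)
    return " ".join(words)

print(generate("we"))
```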

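HuggingFace makes it simple to create a text generation pipeline.  Here we provide the original customer comment plus the beginning of a customer service response and ask the model to continue it.  Note that `max_length=500` caps the total length in tokens, prompt included, rather than the number of new tokens.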

In [22]:

print("\n**Text Generation**\n")
generator_pipeline = pipeline("text-generation", device=device)
print_pipeline_info(generator_pipeline)
response = "Dear Jamie, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator_pipeline(prompt, max_length=500)
generated_text = outputs[0]['generated_text']
print(generated_text)



**Text Generation**

Model: openai-community/gpt2, Size: 124,439,808 parameters
I ordered the Samsung Galaxy S24 Ultra from Tech Haven, expecting next-day delivery, but after three days, I hadn’t even
received a shipping update. After waiting 45 minutes on hold, customer service told me there was a stock issue—yet no
one had informed me!

When the package finally arrived a week late, it contained a Google Pixel 8 Pro instead. The support rep was apologetic
but said an exchange would take another two weeks.

I paid $1,200 for the wrong phone, dealt with delays and poor communication, and now have to wait even longer. To add
insult to injury, the customer service representative I spoke with seemed indifferent to my frustration. I had to
explain my situation multiple times before they even acknowledged the mistake. The entire experience has been incredibly
disappointing and has left me questioning whether I should ever shop with Tech Haven again.

It's baffling how a company can operate 

Notice that the response includes the input prompt.  This is typical of text-generation models in HuggingFace, but it's easy to remove the input prompt from the output.  If you read the customer service response you can see that it's not very good.  GPT-2, released by OpenAI in 2019, is a family of transformer-based language models.  The largest version has 1.5 billion parameters, but the default checkpoint loaded here is the smallest, with about 124 million.  GPT-2 was designed to generate coherent and contextually relevant text, but it can produce outputs that are inaccurate or inappropriate.  Much better text-generation models are now available on HuggingFace.
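
Here is a minimal sketch of removing the echoed prompt from the generated text by slicing. The helper `strip_prompt` and the demo strings are hypothetical. As far as I know, the text-generation pipeline also accepts `return_full_text=False` to do this for you, but check the documentation for your installed version of transformers.

```python
def strip_prompt(generated_text, prompt):
    """Return only the newly generated continuation, without the echoed prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text  # prompt was not echoed, so return the text unchanged

demo_prompt = "Customer service response:\nDear Jamie,"
demo_output = demo_prompt + " we apologize for the mix-up."
print(strip_prompt(demo_output, demo_prompt))
```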

In [23]:
clear_pipeline(generator_pipeline)

✅ Pipeline cleared.


### Text Generation with Other LLMs

`llm_generate` makes it simple to experiment with different models for text generation.  Here's a generated customer service response:

In [24]:
system_prompt_generation = """You are an expert customer service representative. Generate a professional and empathetic response to the following customer feedback. Address the issues mentioned and provide a resolution."""
user_prompt_generation = f"Customer Feedback: {text}\n\nCustomer service response:"

response_generation = llm_generate('gemini-flash-lite',
                                   user_prompt_generation, 
                                   system_prompt=system_prompt_generation,
                                   max_tokens=500)

display_markdown(response_generation)

Dear Jamie,

Please accept our sincerest apologies for the deeply disappointing experience you've had with your recent order of the Samsung Galaxy S24 Ultra. We understand your frustration and disappointment, and we are truly sorry that we failed to meet your expectations, not just once, but on multiple occasions.

Your feedback is incredibly important to us, and we want to assure you that it has been received and is being taken very seriously. We are actively investigating the issues you've highlighted regarding the stock notification, the incorrect item shipped, the extended delivery time, and the communication breakdown throughout your experience.

We are particularly concerned to hear about the lack of proactive communication regarding the stock issue and the subsequent shipment of the wrong device. This is not the standard of service we strive to provide, and we are reviewing our internal processes to prevent such errors from happening in the future. We also regret that you felt the customer service representative you spoke with was indifferent. Our team is trained to be empathetic and supportive, and we will be addressing this feedback with the relevant individuals to ensure all customers feel heard and valued.

We understand that waiting another two weeks for an exchange is unacceptable, especially after the significant inconvenience you've already endured. To rectify this situation promptly, we would like to offer you the following:

1.  **Immediate Exchange & Expedited Shipping:** We will arrange for the correct Samsung Galaxy S24 Ultra to be shipped to you immediately via our fastest available shipping method, at no additional cost. We will also provide a prepaid return label for the Google Pixel 8 Pro, and we will coordinate a pickup at your convenience to minimize further disruption.
2.  **Full Refund of Shipping Costs:** We will refund you the full amount paid for shipping on your original order, as compensation for the delays you experienced.
3.  **A Gesture of Goodwill:** As a further apology for the significant inconvenience and frustration this has caused, we would like to offer you a [e.g., $100 store credit, a complimentary accessory for your new phone, etc. - *choose one appropriate gesture*].

We will also ensure that your case is personally overseen to guarantee the swift and accurate resolution you deserve. Please reply to this email or call us directly at [Your Direct Phone Number] and ask for [Manager's Name or "Customer Resolution Team"] so we can arrange these steps for you immediately.

Jamie, we truly value your business and are

We'll study text generation models in Lesson 11.  
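Notice that the response above stops mid-sentence ("we truly value your business and are"). That is what you would expect when generation hits the `max_tokens=500` limit before the model finishes. Here is a minimal sketch of a heuristic check for cut-off output. The helper name `looks_truncated` is hypothetical (it is not part of `introdl`), and it only inspects the final punctuation, so treat it as a rough signal rather than a reliable test.

```python
def looks_truncated(generated_text):
    """Heuristic: True if the text does not end with sentence-final punctuation."""
    stripped = generated_text.rstrip()
    if not stripped:
        return False
    # Ignore closing quotes, brackets, or markdown emphasis after the final punctuation
    stripped = stripped.rstrip("\"')]*_")
    return not stripped.endswith((".", "!", "?"))


cut_off = "Jamie, we truly value your business and are"
complete = "Thank you for your patience."

print(looks_truncated(cut_off))
print(looks_truncated(complete))
```

If a check like this flags a response, the simplest remedies are to raise `max_tokens` or to ask for a shorter reply in the system prompt.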

## NLP Task - Summarization

**Natural Language Processing (NLP) summarization** is the process of condensing a longer text into a shorter, more concise version while retaining its key information. There are two main types of summarization: **extractive** and **abstractive**. **Extractive summarization** selects and highlights the most important sentences or phrases directly from the original text without altering their wording. In contrast, **abstractive summarization** generates a new, rephrased summary that conveys the core meaning of the original content in a more natural and human-like manner. While extractive methods rely on ranking techniques, abstractive approaches often leverage deep learning models for text generation.
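To make the extractive idea concrete, here is a minimal sketch that ranks sentences by the average frequency of their content words and keeps the top few in their original order. This is one simple ranking heuristic chosen for illustration, not the method used by any particular library. The function name `extractive_summary`, the tiny stopword list, and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny stopword list, chosen for illustration only
STOPWORDS = {"the", "a", "an", "and", "or", "but", "to", "of", "in", "on",
             "for", "is", "was", "it", "i", "my", "that", "this", "with"}


def content_words(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, num_sentences=2):
    """Return the top-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., !, or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not automatically favored
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


sample = ("I ordered a phone from the store. The phone arrived a week late. "
          "The weather was nice that day. The store sent the wrong phone.")
summary = extractive_summary(sample, num_sentences=2)
print(summary)
```

Every sentence in the output is copied verbatim from the input, which is the defining property of extractive summarization. An abstractive model, by contrast, is free to write new sentences.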

We'll focus on abstractive summarization using HuggingFace pipelines.  Here we use a summarization model to create a summary of our customer complaint:

In [25]:
# Summarization
print("\n**Summarization**")
summarization_pipeline = pipeline("summarization", device=-1)
print_pipeline_info(summarization_pipeline)
summarization_result = summarization_pipeline(text, max_length=100, min_length=25, do_sample=False)
print(summarization_result)



**Summarization**
Model: sshleifer/distilbart-cnn-12-6, Size: 305,510,400 parameters
[{'summary_text': ' Tech Haven sent a Samsung Galaxy S24 Ultra to Tech Haven, expecting next-day delivery . The package
arrived a week late and contained a Google Pixel 8 Pro instead . The customer service rep was apologetic but said an
exchange would take two weeks .'}]


In [26]:
clear_pipeline(summarization_pipeline)

✅ Pipeline cleared.


BART (Bidirectional and Auto-Regressive Transformers) is a denoising autoencoder for pretraining sequence-to-sequence models. It combines the bidirectional context of BERT with the autoregressive nature of GPT, making it highly effective for various NLP tasks, including text generation and summarization (don't worry, we'll make sense of many of these terms in future lessons). `sshleifer/distilbart-cnn-12-6` is a distilled version of the BART model, specifically fine-tuned on the CNN/DailyMail dataset for abstractive summarization tasks. This model is designed to be smaller and faster than the original BART model while retaining most of its performance, making it efficient for generating concise summaries of longer texts.

We can also ask an LLM to summarize the same customer feedback:

In [27]:
system_prompt_summarization = """You are an expert summarization model. Summarize the following customer feedback in a concise manner."""
user_prompt_summarization = f"Customer Feedback: {text}\n\nSummary:"

response_summarization = llm_generate('gemini-flash-lite',
                                      user_prompt_summarization, 
                                      system_prompt=system_prompt_summarization,
                                      max_tokens=150)
print(response_summarization)

Jamie received the wrong phone (Google Pixel 8 Pro instead of Samsung Galaxy S24 Ultra) a week late due to stock issues
and poor communication from Tech Haven. The subsequent exchange process will take another two weeks, and Jamie found the
customer service to be unhelpful and indifferent, leading to extreme disappointment and a loss of confidence in the
company.


## Some Notes on Using LLMs Programmatically

While LLMs can make great all-purpose NLP tools, their use has some drawbacks as well:

1.  They're usually configured to give **human-sounding responses**, which may not be what you want depending on the task. You'll often have to experiment with the system prompt to get closer to what you want.

2.  **LLMs don't always generate the same output.** We'll learn more about text generation in Lesson 11, but by default LLMs include some randomness in the generated text. You can usually lower the temperature to get more deterministic results. In `llm_generate` you can pass `temperature=0` for more consistent outputs.

3.  It can be **difficult to get an LLM to format the output** in the way that you want. Carefully crafting the system prompt can help, but often some post-processing of the generated text is also necessary. Recent LLMs such as GPT-4o, Claude, and Gemini can produce output following JSON schemas through their APIs, which we'll explore in later lessons.

4.  **LLMs are usually slower than a specialized model**, especially if you're running the LLM locally. While LLMs continue to improve, fine-tuning a specialized model is often still preferable if you have enough data and resources to do so. If you don't have much training data or just need something quick, using an LLM programmatically can be beneficial.

5.  **Few-shot prompting can improve the results** from an LLM. Providing one or more examples in the prompt can improve the LLM response. You'll explore this a bit in the homework.
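Two of the notes above, few-shot prompting and post-processing the output, can be sketched with plain string handling. The helper names `build_few_shot_prompt` and `parse_label`, the example reviews, and the sample raw response below are all hypothetical and written for illustration. The LLM call itself appears only in a comment so that the block runs without an API key.

```python
import re


def build_few_shot_prompt(examples, new_text):
    """Format (text, label) pairs, followed by the new text to classify."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {new_text}\nSentiment:")
    return "\n\n".join(blocks)


def parse_label(response, allowed=("positive", "negative", "neutral")):
    """Return the first allowed label found in the response, or None."""
    match = re.search(r"\b(" + "|".join(allowed) + r")\b", response.lower())
    return match.group(1) if match else None


examples = [
    ("The phone arrived early and works perfectly.", "positive"),
    ("The package was a week late and the box was crushed.", "negative"),
]
prompt = build_few_shot_prompt(examples, "They shipped the wrong phone.")
print(prompt)

# The prompt could then be sent with a call such as:
#   llm_generate('gemini-flash-lite', prompt, temperature=0)

# Hypothetical raw response, showing the kind of extra text an LLM may add
raw = "Sentiment: **Negative**. The customer is clearly upset."
label = parse_label(raw)
print(label)
```

Returning `None` when no allowed label is found makes failures explicit, so you can retry or flag the example instead of silently accepting free-form text.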


**Suggestion:** Try changing the model in the LLM cells above to see how different models perform. For example, change `'gemini-flash-lite'` to `'gpt-4o-mini'`, `'claude-haiku'`, or `'llama-3.3-70b'` in any of the `llm_generate()` calls and rerun the cell to compare results. You can see all available models with `llm_list_models()`.

In [28]:
# Show the cost of the LLM calls made in this session
show_session_spending()


💰 Current Session Spending Summary
Total Cost:          $0.000814
Total API Calls:     9
Total Tokens:        2,669 in / 1,369 out

----------------------------------------------------------------------
By Model:
  google/gemini-2.5-flash-lite
    Cost: $0.000814 | Calls: 9 | Tokens: 2,669 in / 1,369 out

----------------------------------------------------------------------
Total Spent this session: $0.000814
Approximate Credit remaining: $9.92
(Note: This balance may not reflect the most recent spending)

