![Banner](img/AI_Special_Program_Banner.jpg)

## Introduction to LLMs - Material 5: Using an LLM for Sentiment Analysis
---

As we already know, sentiment analysis is a specific form of *text classification*. In this notebook we will see that LLMs are quite well suited for this task.

## Overview
- [Using OpenAI for Text Classification](#Using-OpenAI-for-Text-Classification)
  - [Trying it out on the movie review dataset](#Trying-it-out-on-the-movie-review-dataset)
- [Using an Open Source LLM for sentiment analysis](#Using-an-Open-Source-LLM-for-sentiment-analysis)
  - [Prompt engineering](#Prompt-engineering)
  - [Fine tuning](#Fine-tuning)
 
[last notebook](3.5.a_6_LC_local_Sentiment.ipynb)

---

## Using OpenAI for Text Classification
Let us once again set up OpenAI for our sentiment analysis task.

In [1]:
import torch
from transformers import pipeline
from langchain.llms import HuggingFacePipeline
from langchain.llms import OpenAI

import warnings
warnings.filterwarnings('ignore')

In [2]:
# Read the API token from a local file; the context manager closes the file for us
with open("oai-coss-token", "r") as tokenfile:
    oai_token = tokenfile.read().strip()

In [3]:
davinci = OpenAI(openai_api_key=oai_token, temperature=1.0, max_tokens=512)
gpt35 = OpenAI(openai_api_key=oai_token, model_name="gpt-3.5-turbo-instruct", temperature=1.0, max_tokens=512)

Let us now use a prompting approach described, for example, [here](https://medium.com/discovery-at-nesta/how-to-use-gpt-4-and-openais-functions-for-text-classification-ad0957be9b25).

In [4]:
from langchain import PromptTemplate
cls_template = """Classes: [`positive`, `negative`]
Text: {input}

Classify the text into one of the above classes."""

cls_prompt = PromptTemplate(
    template=cls_template,
    input_variables=['input']
)

**Remark**: If this works, we could just apply *prompt engineering* again and get a text classifier instead of a text generator!

In [5]:
from langchain import LLMChain
cls_chain = LLMChain(
    prompt=cls_prompt,
    llm=gpt35
)

In [6]:
opinion = "I think the Niners QB stinks!"
cls_chain.invoke(opinion)

{'input': 'I think the Niners QB stinks!', 'text': '\n\nNegative'}

In [7]:
opinion = "Brock Purdy is an awesome QB!"
cls_chain.invoke(opinion)

{'input': 'Brock Purdy is an awesome QB!', 'text': '\n\nClass: Positive'}
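
The two answers above come back in slightly different shapes (`'\n\nNegative'` and `'\n\nClass: Positive'`), so comparing them with dataset labels needs a small normalization step. The following is a minimal sketch, assuming the class name always appears somewhere in the answer; the helper name `extract_label` is hypothetical and not part of LangChain.

```python
import re

def extract_label(answer):
    """Return 'positive' or 'negative' if exactly one of them occurs in the answer, else None."""
    found = set(re.findall(r"\b(positive|negative)\b", answer.lower()))
    # An answer that names both classes (or neither) is treated as unparseable.
    return found.pop() if len(found) == 1 else None

print(extract_label("\n\nNegative"))         # negative
print(extract_label("\n\nClass: Positive"))  # positive
```

Returning `None` for ambiguous answers keeps unparseable outputs visible instead of silently counting them as one of the two classes.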

### Trying it out on the movie review dataset

In [8]:
import numpy as np
import pandas as pd

In [9]:
df = pd.read_csv('data/movie_data.csv', encoding='utf-8')
df.tail()

| | review | sentiment |
|---|---|---|
| 49995 | OK, lets start with the best. the building. al... | 0 |
| 49996 | The British 'heritage film' industry is out of... | 0 |
| 49997 | I don't even know where to begin on this one. ... | 0 |
| 49998 | Richard Tyler is a little boy who is scared of... | 0 |
| 49999 | I waited long to watch this movie. Also becaus... | 1 |


In [10]:
target = df.pop('sentiment')
values = df.pop('review')

target[2], values[2]

(0,
 '***SPOILER*** Do not read this, if you think about watching that movie, although it would be a waste of time. (By the way: The plot is so predictable that it does not make any difference if you read this or not anyway)<br /><br />If you are wondering whether to see "Coyote Ugly" or not: don\'t! It\'s not worth either the money for the ticket or the VHS / DVD. A typical "Chick-Feel-Good-Flick", one could say. The plot itself is as shallow as it can be, a ridiculous and uncritical version of the American Dream. The young good-looking girl from a small town becoming a big success in New York. The few desperate attempts of giving the movie any depth fail, such as the "tragic" accident of the father, the "difficulties" of Violet\'s relationship with her boyfriend, and so on. McNally (Director) tries to arouse the audience\'s pity and sadness put does not have any chance to succeed in this attempt due to the bad script and the shallow acting. Especially Piper Perabo completely fails in

In [11]:
opinion = values[2]
cls_chain.invoke(opinion)

{'input': '***SPOILER*** Do not read this, if you think about watching that movie, although it would be a waste of time. (By the way: The plot is so predictable that it does not make any difference if you read this or not anyway)<br /><br />If you are wondering whether to see "Coyote Ugly" or not: don\'t! It\'s not worth either the money for the ticket or the VHS / DVD. A typical "Chick-Feel-Good-Flick", one could say. The plot itself is as shallow as it can be, a ridiculous and uncritical version of the American Dream. The young good-looking girl from a small town becoming a big success in New York. The few desperate attempts of giving the movie any depth fail, such as the "tragic" accident of the father, the "difficulties" of Violet\'s relationship with her boyfriend, and so on. McNally (Director) tries to arouse the audience\'s pity and sadness put does not have any chance to succeed in this attempt due to the bad script and the shallow acting. Especially Piper Perabo completely fai

Now we have all the ingredients to use OpenAI's models for sentiment analysis (and check how well they perform) ...
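
One possible way to check performance is to run the classifier over a sample of reviews and compare its answers with the `sentiment` column. The sketch below assumes that `0` encodes a negative and `1` a positive review (consistent with review 2 above). To keep it runnable without an API key, it uses a hypothetical stand-in `keyword_classifier`; the function names `accuracy` and `keyword_classifier` are illustrative and not part of the original notebook.

```python
def accuracy(classify, texts, labels):
    """Fraction of texts for which classify(text) matches the gold label."""
    label_names = {0: "negative", 1: "positive"}  # assumed encoding of the sentiment column
    hits = sum(classify(t) == label_names[y] for t, y in zip(texts, labels))
    return hits / len(labels)

# Stand-in classifier so the sketch runs offline. In the notebook, this
# function would call cls_chain.invoke(...) and parse the returned 'text' field.
def keyword_classifier(text):
    return "negative" if "waste" in text.lower() else "positive"

sample_texts = ["A waste of time.", "I loved every minute.", "What a waste."]
sample_labels = [0, 1, 0]
print(accuracy(keyword_classifier, sample_texts, sample_labels))  # 1.0
```

Since every call to a hosted model costs time and API credits, it is probably sensible to evaluate on a small random sample first, e.g. `values.sample(100, random_state=0)` together with the matching entries of `target`, rather than on all 50,000 reviews.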


## Using an Open Source LLM for sentiment analysis

If we look at the section "Supported Tasks" of [Hugging Face hub's inference guide](https://huggingface.co/docs/huggingface_hub/guides/inference), we find that [`text-classification`](https://huggingface.co/tasks/text-classification) is also an option. So, let's see what we can achieve with Zephyr-7B-beta again.


In [12]:
from langchain_community.llms import HuggingFaceHub

In [13]:
# Read the Hugging Face token from a local file
with open("hftoken", "r") as tokenfile:
    hf_token = tokenfile.read().strip()
os_model = "HuggingFaceH4/zephyr-7b-beta"

In [14]:
cls_llm = HuggingFaceHub(
    huggingfacehub_api_token=hf_token,
    repo_id=os_model, 
    model_kwargs={"task": 'text-classification'}
)

In [15]:
cls_sys_prompt = "You are an expert in sentiment analysis. "
cls_sys_prompt += "You always answer with 'positive' or 'negative' depending on the sentiment of the opinion."
opinion = "The Niners QB stinks!"

In [16]:
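def get_prompt(system_input, user_input):
    """Build a prompt in the chat format Zephyr expects (<|system|>, <|user|>, <|assistant|>)."""
    prompt = '<|system|>\n'
    prompt += f'{system_input}</s>\n'
    prompt += '<|user|>\n'
    prompt += f'{user_input}</s>\n'
    # End with the assistant tag so the model continues with its answer
    prompt += '<|assistant|>\n'
    return prompt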

In [17]:
print(cls_llm.invoke(get_prompt(cls_sys_prompt, opinion)))

Negative sentiment.


In [18]:
opinion = "Brock Purdy is an awesome QB!"
print(cls_llm.invoke(get_prompt(cls_sys_prompt, opinion)))

My response based on sentiment analysis: 'positive'

Brock Purdy is an exceptional quarterback, and his performance has been highly praised. The use of the word 'awesome' further emphasizes the positive sentiment. Overall, the statement conveys a very positive opinion about Brock Purdy's abilities as a quarterback.


In [19]:
opinion = values[2]
print(cls_llm.invoke(get_prompt(cls_sys_prompt, opinion)))

Based on the review, the sentiment of the opinion is negative. The reviewer strongly advises against watching the movie, calling it a "Chick-Feel-Good-Flick" with a shallow plot and poor acting. The reviewer also criticizes the director's attempts to add depth to the movie and mentions that the only good thing about it is John Goodman's performance. The reviewer compares the movie unfavorably to other productions by Jerry Bruck


### Prompt engineering

So, the open-source LLM does seem to get the sentiments right as well, but its answers are too verbose. Apparently, it did not understand that we only want "positive" or "negative" as an answer. We should try to apply some *prompt engineering* again. Let's therefore be more explicit, throw out the bit about the LLM being an expert, and see what happens.

In [20]:
cls_sys_prompt = "Answer with only one word, either 'positive' or 'negative', "
cls_sys_prompt += "depending on the sentiment of the opinion."
print(cls_llm.invoke(get_prompt(cls_sys_prompt, opinion)))

Negative.


In [21]:
opinion = "Brock Purdy is an awesome QB!"
print(cls_llm.invoke(get_prompt(cls_sys_prompt, opinion)))

Positive.


That looks pretty neat! So, the open source LLM also seems up to the task and we could employ that one instead of OpenAI!

### Fine tuning

We shall now try to use the model running locally. This will lead us to fine tuning ...

---
<div style="color:red"><b>Attention:</b></div> 
Do <b>not</b> try to run this part of the notebook on the university server! Or, if you try to do it, make sure no one else is using the same GPU as you are!

---

Since we should not use the Hugging Face Hub for production, we should just use the local version of Zephyr-7B-beta. That should be quite straightforward ...

In [22]:
!nvidia-smi | grep MiB

|  0%   48C    P8              17W / 450W |     28MiB / 24564MiB |      0%      Default |
|    0   N/A  N/A      1228      G   /usr/lib/xorg/Xorg                           18MiB |


In [23]:
cls_pipe = pipeline(
    task="text-classification", 
    model="HuggingFaceH4/zephyr-7b-beta", 
    torch_dtype=torch.bfloat16, 
    device=0)

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

Some weights of MistralForSequenceClassification were not initialized from the model checkpoint at HuggingFaceH4/zephyr-7b-beta and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Well, that is a bit of a bummer, but let us just see what happens if we try to use it anyway.

In [24]:
!nvidia-smi | grep MiB

|  0%   49C    P2              74W / 450W |  14748MiB / 24564MiB |     76%      Default |
|    0   N/A  N/A      1228      G   /usr/lib/xorg/Xorg                           18MiB |
|    0   N/A  N/A   1425075      C   ...vs/miniconda3/envs/pt_21/bin/python    14716MiB |


In [25]:
from langchain.llms import HuggingFacePipeline
cls_llm_local = HuggingFacePipeline(pipeline=cls_pipe)

In [26]:
print(cls_llm_local.invoke(get_prompt(cls_sys_prompt, opinion)))

ValueError: Got invalid task text-classification, currently only ('text2text-generation', 'text-generation', 'summarization') are supported

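The error itself comes from LangChain's `HuggingFacePipeline` wrapper, which only accepts generation-style tasks. Bypassing the wrapper would not help either: as the warning during loading told us, the classification head (`score.weight`) is newly initialized, so its predictions would be meaningless. So, we would actually have to perform the training. However, be aware that this would not require us to train the model from scratch, but rather to perform some kind of *transfer learning* (like in the case of the Nvidia examples or for the Covid-19 detection). In the context of LLMs, this is called *fine tuning*, which GPT-4 (correctly) explains in the following way: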

**Fine-tuning** in the context of LLMs refers to a process of training the model further on a specific dataset to tailor its responses to a particular domain or set of tasks. This process adjusts the model's parameters to better align with the nuances, terminology, and style of the targeted application. 

Here are the key aspects of fine-tuning:

1. **Targeted Dataset**: Fine-tuning involves using a specialized dataset that is representative of the specific tasks or domain for which the model is being optimized. For example, a legal dataset for a law-focused application, or customer service dialogues for a customer support chatbot.

2. **Adapting to Specific Requirements**: The goal is to adapt the model to better understand and respond to the types of queries or language used in a specific context. This can improve accuracy, relevance, and effectiveness in specialized applications.

3. **Continued Learning**: While LLMs are initially trained on vast, diverse datasets, fine-tuning is a form of continued learning where the model adjusts to more specific data, potentially improving its performance on related tasks.

4. **Balance of General and Specific Knowledge**: Fine-tuning needs to be carefully managed to retain the model's general language abilities while enhancing its performance in specific areas. Overfitting to the fine-tuning dataset can reduce the model's effectiveness in handling more general queries.

5. **Customization for Business Needs**: Businesses and developers often use fine-tuning to customize models for particular products, services, or user experiences, tailoring the model's responses to fit brand voice, technical requirements, or user demographics.

6. **Ethical Considerations**: Fine-tuning must also be managed with regard to ethical considerations, ensuring that the model does not learn biases or undesirable behaviors from the fine-tuning dataset.

In essence, while prompt engineering is about crafting the right input to get the best possible output from a general-purpose model, fine-tuning is about modifying the model itself to be more specialized and effective for specific tasks or domains.

In our concrete situation, we might try to modify the respective [tutorial on HuggingFace](https://huggingface.co/docs/transformers/tasks/sequence_classification) to do so, if we wanted to go through with it.
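
To make the idea of training only a newly initialized classification head more tangible, here is a deliberately tiny toy sketch in plain Python. It is not the Hugging Face procedure from the tutorial: the 2-d `features` are made-up stand-ins for the frozen outputs of a pretrained model body, and the learning rate and number of epochs are arbitrary assumptions. Only the head parameters `w` and `b` are updated, which mirrors the transfer-learning setting in spirit.

```python
import math
import random

random.seed(0)

# Made-up stand-ins for frozen model outputs (in real fine tuning these
# would be hidden states produced by the pretrained LLM body).
features = [(1.0, 0.2), (0.9, 0.1), (0.8, 0.3), (0.1, 0.9), (0.2, 1.0), (0.3, 0.8)]
labels = [1, 1, 1, 0, 0, 0]

# Randomly initialized head, loosely analogous to the new 'score.weight'.
w = [random.uniform(-0.1, 0.1) for _ in range(2)]
b = 0.0

def predict_proba(x):
    """Probability of the positive class under the current head."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))

# Stochastic gradient descent on the logistic loss; the features stay fixed.
learning_rate = 0.5
for _ in range(500):
    for x, y in zip(features, labels):
        err = predict_proba(x) - y
        w[0] -= learning_rate * err * x[0]
        w[1] -= learning_rate * err * x[1]
        b -= learning_rate * err

predictions = [int(predict_proba(x) > 0.5) for x in features]
print(predictions)
```

In a real setup, the features would come from the LLM, and the training loop would typically be handled by a library rather than written by hand.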

However, there is one more idea to try in the [next notebook](3.5.a_6_LC_local_Sentiment.ipynb).