# SWE3011_41 Task2


**Prompt Engineering with LangChain for LLMs**

1. Starting from the given code, extend (modify) it to perform a classification task using LangChain.
2. The task requires a large language model as a baseline; you may choose any model from the OpenAI API or the Hugging Face Hub.

 **OpenAI**: Only available to paid users.

 **Hugging Face Hub**: Free to use, with usage limits (reset hourly).

3. Conduct experiments and document the results in the report. Consider which prompt design you use; the tutorials and resources in the homework description provide more information.
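Because the free tier is rate limited, a long evaluation loop can fail partway through. The following is a minimal sketch of a retry wrapper with exponential backoff, assuming that failures surface as ordinary Python exceptions. The name `call_with_retry` and its parameters (`retries`, `base_delay`, `sleep`) are hypothetical and not part of LangChain; short waits only help with transient errors, not with an exhausted hourly quota.

```python
import time


def call_with_retry(fn, *args, retries=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on any exception.

    All parameter names here are illustrative assumptions.
    """
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            # Re-raise once the last allowed attempt has failed
            if attempt == retries - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

Any callable that sends a request to the model could be passed as `fn`, with its usual arguments following it.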


**Installation**

In [None]:
pip install datasets

In [None]:
pip install --force-reinstall openai


In [None]:
from getpass import getpass
import os

# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token
HUGGINGFACEHUB_API_TOKEN = getpass("Hugging Face API token: ")  # prompt for the token instead of hard-coding a secret

os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN


In [None]:
pip install huggingface_hub



In [None]:
pip install langchain

In [None]:
from sklearn.metrics import accuracy_score, classification_report

def evaluate_model_nlp(y_true, y_pred):
    """Print accuracy and a per-class report; pass gold labels first, then predictions."""
    accuracy = accuracy_score(y_true, y_pred)

    print(f"Accuracy: {accuracy:.2f}")
    print("Classification Report:")
    print(classification_report(y_true, y_pred))

# evaluate_model_nlp(y_true, y_pred)
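scikit-learn's metric functions take the gold labels first and the predictions second. Accuracy is the same either way, but precision, recall, and support are not. The sketch below uses only the standard library to show the effect; `confusion_counts` and `per_class_metrics` are hypothetical helper names introduced for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (gold, predicted) label pair occurs."""
    return Counter(zip(y_true, y_pred))


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, support) for one label."""
    counts = confusion_counts(y_true, y_pred)
    tp = counts[(label, label)]
    predicted = sum(n for (_, p), n in counts.items() if p == label)
    support = sum(n for (t, _), n in counts.items() if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / support if support else 0.0
    return precision, recall, support


gold = [0, 0, 0, 1]
pred = [0, 0, 1, 1]
print(per_class_metrics(gold, pred, 0))  # gold labels first
print(per_class_metrics(pred, gold, 0))  # swapped: precision and recall trade places
```

With swapped arguments the support column counts predictions rather than gold labels, so it can differ between runs on the same test set.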

**1. Load Dataset**


Evaluation should be done using the **provided test dataset**.

In [None]:
from datasets import load_dataset

# You can use train_ds for few-shot examples
train_ds = load_dataset("glue", "sst2", split="train")

# Evaluation should be done using test_ds
# test_ds = load_dataset("csv", data_files="./test_dataset.csv")['train']

In [None]:
test_ds = load_dataset("csv", data_files="./test_dataset.csv")['train']

**2. Preparing Prompt**

In [None]:
train_ds[123]['sentence']

'proves a lovely trifle that , unfortunately , is a little too in love with its own cuteness . '

In [None]:
test_ds1 = load_dataset("csv", data_files="./responses.csv")['train']

In [None]:
from langchain import PromptTemplate

# Edit Prompt Freely
template = """Question: {question}\nAnswer : """

prompt = PromptTemplate(
    template= template,
    input_variables=["question"]
)

question = "When was the FIFA World Cup first held? "
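The template above is a generic question-answer prompt. For the classification task, a dedicated template with a `{sentence}` slot is one option. The sketch below uses plain `str.format` so that it runs without LangChain; the template wording, `CLASSIFY_TEMPLATE`, and `build_classification_prompt` are assumptions for illustration, not a prompt evaluated in this notebook.

```python
# Hypothetical template text; the wording is an assumption.
CLASSIFY_TEMPLATE = (
    "Classify the sentiment of the sentence as positive or negative.\n"
    "Answer with 1 for positive and 0 for negative.\n"
    "Sentence: {sentence}\n"
    "Answer: "
)


def build_classification_prompt(sentence):
    """Fill the sentence slot, trimming the stray whitespace found in SST-2 rows."""
    return CLASSIFY_TEMPLATE.format(sentence=sentence.strip())


print(build_classification_prompt("a gripping , tidy little movie . "))
```

The same template string could presumably be handed to `PromptTemplate` with `input_variables=["sentence"]`, mirroring the cell above.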

**3. Inference**

In [None]:
from langchain.llms import HuggingFaceHub, OpenAI
from langchain.chains import LLMChain

In [None]:
def initialize_llm(model_name, api_key=None, openai=False):
    """
    Initialize the model using the langchain library.
    """
    if openai:
        llm = OpenAI(model_name=model_name, openai_api_key=api_key)
    else:
        # max_length is a token count and must be a positive integer; 64 is an assumed value
        llm = HuggingFaceHub(repo_id=model_name, model_kwargs={"temperature": 0.22, "max_length": 64})

    return llm

In [None]:
def interaction(llm, prompt, question, openai=False):
    """
    Use a templated prompt to get a response from the LLM.
    """
    if openai:
        final_prompt = prompt.format(question=question)
        response = llm(final_prompt)
    else:
        llm_chain = LLMChain(prompt=prompt, llm=llm)
        response = llm_chain.run(question)

    return response
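The experiment loops in this notebook convert the model response with `int(response)`, which raises `ValueError` as soon as the model adds any text around the digit. A more forgiving parser is sketched below. `parse_label` and its `default` parameter are hypothetical names, and taking the last standalone `0` or `1` is an assumption about where the model puts its final answer.

```python
import re


def parse_label(response, default=None):
    """Return the last standalone 0 or 1 in the response, or `default` if there is none."""
    matches = re.findall(r"\b[01]\b", str(response))
    return int(matches[-1]) if matches else default


print(parse_label(" 1\n"))
print(parse_label("This sentence suggests boredom: 0"))
print(parse_label("positive"))
```

Responses that yield `default` can then be counted separately instead of aborting the whole run.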

In [None]:
# To Use OpenAI Model
# model_name = "text-davinci-003"
# openai_api_key = "OPENAI_API_KEY" # Make sure to replace with your actual key

# llm = initialize_llm(model_name, api_key=openai_api_key, openai=True)
# response = interaction(llm, prompt, question, openai=True)
# print(f"LLM Output: {response}")

In [None]:
# To Use HuggingfaceHUB Model
model_name = "google/flan-t5-xxl"

llm = initialize_llm(model_name, openai=False)
question = f"""
  I'll give you a sentence.
  Change this sentence to be very specific, long and detailed so that the emotion or content is revealed much better.
  {test_ds[40]['sentence']}
"""
response = interaction(llm, prompt, question, openai=False)
print(f"LLM Output: {response}")



LLM Output: while undisputed isn't exactly a high, it is a gripping, tidy little movie that takes mr. hill higher than he's been in a while.


In [None]:
test_ds[40]['sentence']

"while undisputed is n't exactly a high , it is a gripping , tidy little movie that takes mr. hill higher than he 's been in a while . "

In [None]:
# evaluate_model_nlp(y_pred, y_test)

## Zero-shot case: 94, 93

In [None]:
model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)
match_count = 0
y_list = []
p_list = []
for i in range(100):
  question =  'Classify whether the following sentence is positive or negative. \n If positive, print 1. If negative, print 0.\n text : ' + test_ds[i]['sentence']
  response = interaction(llm, prompt, question, openai=False)
  #print(response + ' : ' + str(test_ds[i]['label']))
  y_list.append(int(test_ds[i]['label']))
  p_list.append(int(response))
  if (str(response) == str(test_ds[i]['label'])):
      match_count+= 1

print(match_count)

evaluate_model_nlp(y_list, p_list)




94
Accuracy: 0.94
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.91      0.95        58
           1       0.89      0.98      0.93        42

    accuracy                           0.94       100
   macro avg       0.94      0.94      0.94       100
weighted avg       0.94      0.94      0.94       100



In [None]:
model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)
match_count = 0
y_list = []
p_list = []
for i in range(100):
  question = "considering this conext, determine if the following sentence is more likely to be positive, print 1, if negative, print 0\n " + test_ds[i]['sentence']
  response = interaction(llm, prompt, question, openai=False)
  y_list.append(int(test_ds[i]['label']))
  p_list.append(int(response))
  #print(response + ' : ' + str(test_ds[i]['label']))
  if (str(response) == str(test_ds[i]['label'])):
      match_count+= 1

evaluate_model_nlp(y_list, p_list)
print(match_count)



Accuracy: 0.93
Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.93      0.94        55
           1       0.91      0.93      0.92        45

    accuracy                           0.93       100
   macro avg       0.93      0.93      0.93       100
weighted avg       0.93      0.93      0.93       100

93


## One-shot case: 95

In [None]:
model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)
match_count = 0
y_list = []
p_list = []
for i in range(100):
  question =  """Classify whether the following sentence is positive or negative.
   If positive, print 1. If negative, print 0.
    i give one example.
    text : """ + train_ds[1]['sentence'] + "\n answer : " + str(train_ds[1]['label']) + '\n text : ' + test_ds[i]['sentence'] +'answer : '
  response = interaction(llm, prompt, question, openai=False)
  y_list.append(int(test_ds[i]['label']))
  p_list.append(int(response))
  #print(response + ' : ' + str(test_ds[i]['label']))
  if (str(response) == str(test_ds[i]['label'])):
      match_count+= 1

print(match_count)
evaluate_model_nlp(y_list, p_list)





95
Accuracy: 0.95
Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.95      0.95        55
           1       0.93      0.96      0.95        45

    accuracy                           0.95       100
   macro avg       0.95      0.95      0.95       100
weighted avg       0.95      0.95      0.95       100



## Few-shot with 3 examples: 92


In [None]:
model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)
y_list = []
p_list = []
match_count = 0
for i in range(100):
  # Build the three demonstrations (train_ds[1] to train_ds[3]) in "sentence : label" form
  shots = ''.join(f"{train_ds[k]['sentence']} : {train_ds[k]['label']}\n" for k in range(1, 4))
  question = 'if sentence is positive, print 1, if negative, print 0. i give 3 test example.\n' + shots + test_ds[i]['sentence'] + '\n'

  response = interaction(llm, prompt, question, openai=False)
  #print(response + ' : ' + str(test_ds[i]['label']))
  y_list.append(int(test_ds[i]['label']))
  p_list.append(int(response))
  if (str(response) == str(test_ds[i]['label'])):
      match_count+= 1

print(match_count)
evaluate_model_nlp(y_list, p_list)





92
Accuracy: 0.92
Classification Report:
              precision    recall  f1-score   support

           0       0.94      0.91      0.93        56
           1       0.89      0.93      0.91        44

    accuracy                           0.92       100
   macro avg       0.92      0.92      0.92       100
weighted avg       0.92      0.92      0.92       100



## Few-shot with 9 examples: 91




In [None]:
model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)
y_list = []
p_list = []
match_count = 0
for i in range(100):
  # Build the nine demonstrations (train_ds[1] to train_ds[9]) in "sentence : label" form
  shots = ''.join(f"{train_ds[k]['sentence']} : {train_ds[k]['label']}\n" for k in range(1, 10))
  question = 'if sentence is positive, print 1, if negative, print 0. i give 9 test example.\n' + shots + test_ds[i]['sentence'] + '\n'
  response = interaction(llm, prompt, question, openai=False)
  y_list.append(int(test_ds[i]['label']))
  p_list.append(int(response))
  #print(response + ' : ' + str(test_ds[i]['label']))
  if (str(response) == str(test_ds[i]['label'])):
      match_count+= 1

print(match_count)
evaluate_model_nlp(y_list, p_list)




91
Accuracy: 0.91
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.87      0.92        61
           1       0.83      0.97      0.89        39

    accuracy                           0.91       100
   macro avg       0.90      0.92      0.91       100
weighted avg       0.92      0.91      0.91       100
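The few-shot prompts above always take the first rows of the training split, so the mix of positive and negative demonstrations is left to chance. The sketch below picks the same number of demonstrations per class and assembles the prompt in the same `sentence : label` layout. `pick_balanced` and `build_few_shot_prompt` are hypothetical helpers, and whether balanced demonstrations improve accuracy here is untested.

```python
def pick_balanced(rows, per_class):
    """Take the first `per_class` rows of each label (0 and 1) as (sentence, label) pairs."""
    picked, seen = [], {0: 0, 1: 0}
    for row in rows:
        label = row["label"]
        # Labels other than 0/1 fall back to per_class and are therefore skipped
        if seen.get(label, per_class) < per_class:
            picked.append((row["sentence"], label))
            seen[label] += 1
        if all(n == per_class for n in seen.values()):
            break
    return picked


def build_few_shot_prompt(examples, sentence, instruction):
    """Join the instruction, the demonstrations, and the test sentence, one per line."""
    lines = [instruction]
    lines += [f"{text} : {label}" for text, label in examples]
    lines.append(sentence)
    return "\n".join(lines) + "\n"


rows = [
    {"sentence": "a charming film", "label": 1},
    {"sentence": "a lovely surprise", "label": 1},
    {"sentence": "told in haphazard fashion", "label": 0},
]
demos = pick_balanced(rows, per_class=1)
print(build_few_shot_prompt(demos, "a tidy little movie", "if sentence is positive, print 1, if negative, print 0."))
```

Since iterating over a `datasets` split yields dictionaries with the same keys, `train_ds` could be passed as `rows` directly.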



## Chain-of-Thought : 95

In [None]:
model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)
y_list = []
p_list = []
match_count = 0
for i in range(100):
  question = """
  Let's classify positive and negative sentences.
  If the sentence is positive, 1 is output, if it is negative, 0 is output.
  Q: the part where nothing’s happening
  A: This sentence suggests boredom. Also there are no indicators indicating positivity: 0
  Q: {}
  A:
  """.format(test_ds[i]['sentence'])



  response = interaction(llm, prompt, question, openai=False)
  y_list.append(int(test_ds[i]['label']))
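  # Use the final character, as the comparison below does, so a rationale before the label does not break int()
  p_list.append(int(str(response).strip()[-1]))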
  #print(response + ' : ' + str(test_ds[i]['label']))
  if (str(response)[-1] == str(test_ds[i]['label'])):
      match_count+= 1

print(match_count)
evaluate_model_nlp(y_list, p_list)



95
Accuracy: 0.95
Classification Report:
              precision    recall  f1-score   support

           0       0.96      0.95      0.95        55
           1       0.93      0.96      0.95        45

    accuracy                           0.95       100
   macro avg       0.95      0.95      0.95       100
weighted avg       0.95      0.95      0.95       100



## CoT with the trigger sentence "let's think step by step": 90

In [None]:
model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)
match_count = 0
y_list = []
p_list = []
for i in range(100):
  question = """
  Let's classify positive(1) and negative(0) sentences.
  let's think step by step
  Q: the part where nothing’s happening
  A: This sentence suggests boredom. Also there are no indicators indicating positivity: 0
  Q: {}
  A:
  """.format(test_ds[i]['sentence'])

  response = interaction(llm, prompt, question, openai=False)
  y_list.append(int(test_ds[i]['label']))
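  # Use the final character, as the comparison below does, so a rationale before the label does not break int()
  p_list.append(int(str(response).strip()[-1]))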

  if (str(response)[-1] == str(test_ds[i]['label'])):
      match_count+= 1

print(match_count)
evaluate_model_nlp(y_list, p_list)



90
Accuracy: 0.90
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.85      0.91        62
           1       0.80      0.97      0.88        38

    accuracy                           0.90       100
   macro avg       0.89      0.91      0.90       100
weighted avg       0.91      0.90      0.90       100



### Sentiment Clarification



### Ambiguous sentences are rewritten into easier-to-understand sentences, which are then passed to a BERT model.

#### V1

In [None]:
import csv

model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)


with open('responses12.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Sentence', 'Response', 'Target'])


    for i in range(100):
        question = f"make this sentence \"{test_ds[i]['sentence']}\" more understandable"
        response = interaction(llm, prompt, question, openai=False)
        target = str(test_ds[i]['label'])
        writer.writerow([test_ds[i]['sentence'], response, target])


#### V2

In [None]:
import csv

model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)


with open('responses2.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Response', 'Target'])


    for i in range(100):
        question = f"If it is difficult to judge whether the following sentence is positive or negative, convert it into a sentence that is easier to judge. If you can clearly judge it, print the existing sentence.\n{test_ds[i]['sentence']}"
        response = interaction(llm, prompt, question, openai=False)
        target = str(test_ds[i]['label'])
        writer.writerow([response, target])


#### V3

In [None]:
import csv

model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)


with open('responses17.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['sentence', 'label'])


    for i in range(100):
        question = f"""
        think step by step.
        You are working on converting the sentences into simpler sentences so that the positives and negatives are more evident.
        If it contains a proverb or idiom, convert it to make it easier to understand.
        If there is an element of reversal, such as but or although, emphasize that part.
        And you must change the sentence in easy word.
        I'll give you three examples.
        example 1 : original : {train_ds[10101]['sentence']}   converted : It is clear that the filmmakers have good intentions, but despite this, the film does not achieve the desired results and in fact has the opposite effect.
        example 2 : original : {train_ds[200]['sentence']}   converted : told in haphazard fashion
        example 3 : original : {train_ds[25]['sentence']}   converted : Enhanced by a brilliantly diverse cast of exuberant and whimsical characters, each bringing their own unique flair and creativity to the ensemble, creating an exceptionally vibrant and captivating experience.

        original : {test_ds[i]['sentence']}.  converted :
        """
        response = interaction(llm, prompt, question, openai=False)
        target = str(test_ds[i]['label'])
        writer.writerow([response, target])




#### V4

In [None]:
import csv

model_name = "google/flan-t5-xxl"
llm = initialize_llm(model_name, openai=False)


with open('responses21.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['sentence', 'label'])

    for i in range(100):
        question = f"""
        think step by step.
        I'll give you a sentence.
        Change below sentence to be very specific, long and detailed so that the semantic or content is revealed much better. and explain it
        {test_ds[i]['sentence']}
        """
        response = interaction(llm, prompt, question, openai=False)
        target = str(test_ds[i]['label'])
        writer.writerow([response, target])
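The cells above only write the rewritten sentences to CSV files; the downstream BERT classifier is not shown in this notebook. The sketch below illustrates how such a file might be scored, assuming a classifier is available as a callable that maps a sentence to `0` or `1`. `score_rewrites` and `classify` are hypothetical names, and the default column names match the V3 and V4 files (V1 and V2 use `Response` and `Target`).

```python
import csv


def score_rewrites(csv_path, classify, text_col="sentence", label_col="label"):
    """Return the accuracy of `classify` on the rewritten sentences in a CSV file."""
    correct = total = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            total += 1
            # Labels were written as strings, so convert before comparing
            correct += int(classify(row[text_col]) == int(row[label_col]))
    return correct / total if total else 0.0
```

Scoring the original sentences with the same classifier gives the baseline against which each rewriting prompt can be compared.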
