### Overview
In this notebook, I will continue working on prompting large language models (LLMs) such as ChatGPT. Language model prompting is the process of providing a model with an input text and then having the model generate a response as output. Prompting can be used to complete a wide range of tasks, such as summarization (e.g., “Summarize the following paragraph: <paragraph written out here> Summary:”) or extraction (e.g., “Extract the phone number from the user bio: <bio written out here>.”).

The task in this homework is text rewriting. Specifically, our goal is to rewrite GoFundMe fundraising pitches so that they convey different emotions (e.g., sadness). There is no off-the-shelf solution: most studies of text rewriting focus on a particular transformation type within the boundaries of a single sentence. However, a fundraising pitch contains 170 words on average (calculated from pitches in the medical category: https://www.gofundme.com/discover/medical-fundraiser), and each one is a coherent story. We will explore and develop prompts and fine-tuning strategies to enable LLMs to better rewrite GoFundMe fundraising pitches to convey different emotions.

The notebook consists of two parts:

1.   Explore prompt engineering: zero-shot, few-shot, and chain-of-thought
2.   Explore strategies for LLM fine-tuning


## Part 0: Task, Data, and Setup


### Task
Rewrite GoFundMe fundraising pitches so that they convey different emotions. We work with the following eight emotions: fear, love, sadness, surprise, optimism, gratitude, anger, and joy.


### Data

We provide you with two datasets to play with:
1.   Development set. We provide you with eight random stories from GoFundMe.
2.   A sample of 800 stories from GoFundMe that are associated with the eight emotions.







### Setup

1. Install dependencies

In [None]:
!pip3 install openai

Collecting openai
  Downloading openai-1.23.2-py3-none-any.whl (311 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 ht

2. Import the necessary libraries, define a helper function, and load the data

Next, import the necessary libraries into your Jupyter notebook. You will need to import the following libraries:

In [None]:
!pip install python-dotenv

Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1


In [None]:
from openai import OpenAI

# This is my personal API key. Please use your own when running
# experiments.

client = OpenAI(
    api_key = ''
)
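
Rather than pasting a key into the notebook, one option is to read it from an environment variable. The sketch below assumes the key is stored under `OPENAI_API_KEY`; the helper name `load_api_key` is hypothetical and not part of the `openai` package.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    `load_api_key` is an illustrative helper, not an OpenAI API.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key


# Usage (assumes the key was exported in the shell or loaded from a .env file):
# client = OpenAI(api_key=load_api_key())
```

Keeping the key out of the notebook also means the file can be shared or submitted without exposing it.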

We use OpenAI's gpt-3.5-turbo model and the chat completions endpoint in this example. You can also use gpt-4 but please be mindful of usage.

This helper function will make it easier to use prompts and look at the generated outputs:

In [None]:
model_version = "gpt-3.5-turbo" # GPT-3.5

def get_completion(prompt, model=model_version):
  completion = client.chat.completions.create(
      model=model, # use the model passed by the caller, not the global
      messages=[
          {
              "role": "user",
              "content": prompt,
          }
      ],
      temperature=0, # this is the degree of randomness of the model's output
  )
  return completion.choices[0].message.content

## Part I: Prompt Development

In Part I, please develop and improve prompts for rewriting GoFundMe fundraising pitches to convey different emotions, using the development set. Your deliverables include:
1.    Process. A description of the different prompting strategies and prompts you tried in arriving at your final prompts.
2.    Prompts. You should propose at least 3 prompts.
3.    Results. For each prompt, report the rewritten text of the stories in the development set and your ratings of the rewritten text on a scale from 1 to 5, where 1 means not good and 5 means good. We ask you to provide three ratings: 1) instruction success: whether the rewrite accurately follows the instruction and conveys the target emotion that you asked for; 2) content preservation: whether the content of the story (e.g., events, figures, topics) is preserved in the rewrite, independent of the emotion conveyed in the story; 3) overall rating, considering both 1) and 2) and any other factors that you consider important for the text rewriting task. See “Other things to consider” under “Useful resources”.
4.    Error analysis. Analyze the trends you observe from the results of all the prompts. For example, why do some prompts work better than others? What are some common problems that exist in the prompts? Do some prompts do better on a specific case than others? Finally, what did you learn from this prompt engineering exercise?




**Note: You do not need to try your prompts on all the stories in the development set and all the emotions (i.e., 8 x 8). Feel free to work on a subset of the stories and on a few emotions, considering the time you have for this homework.**

#### **Submission**

Please report your prompts and results in the Excel file `prompt_analysis.xlsx` and submit it on Canvas.
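
Before filling in the spreadsheet, it can help to keep the three ratings in a small structure and average them per prompt. The sketch below is one way to do that; the field names and the scores are placeholders for illustration only, not actual results and not the layout of `prompt_analysis.xlsx`.

```python
from statistics import mean

RATING_DIMENSIONS = ("instruction", "content", "overall")

# Placeholder records, one per (prompt, story) pair, each scored from 1 to 5.
# These numbers are made up to show the shape of the data.
ratings = [
    {"prompt": "prompt1", "story": 0, "instruction": 3, "content": 5, "overall": 4},
    {"prompt": "prompt1", "story": 1, "instruction": 5, "content": 3, "overall": 4},
    {"prompt": "prompt2", "story": 0, "instruction": 4, "content": 4, "overall": 4},
]


def summarize_ratings(rows):
    """Average each rating dimension separately for every prompt."""
    by_prompt = {}
    for row in rows:
        by_prompt.setdefault(row["prompt"], []).append(row)
    return {
        prompt: {dim: mean(r[dim] for r in items) for dim in RATING_DIMENSIONS}
        for prompt, items in by_prompt.items()
    }


for prompt, scores in summarize_ratings(ratings).items():
    print(prompt, scores)
```

Per-prompt averages make it easier to see whether a prompt trades instruction success against content preservation.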

#### Useful resources

1.   Prompting strategies: zero-shot, few-shot, chain-of-thought, and others (https://platform.openai.com/docs/guides/prompt-engineering, https://www.promptingguide.ai/)
2.   Other things to consider
  *   Text complexity
  *   Length
  *   Content preservation: whether the rewritten text preserves the essential content and meaning of the source text
  *   Factuality: The rewrite only provides as much information as is present in the reference, without adding anything. It is not misleading and does not make any false statements.
  *   Coherence: The rewrite is easy to understand, non-ambiguous, and logically coherent.
  *   Fluency: Examines the clarity, grammar, and style of the written answer.
3.    Examples and related papers
  *   Examples in Lab 6 Prompt Engineering using OpenAI API
  *   Figure 7 – Figure 19 in the appendix of paper [3]
  *   Paper [4]
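
As a rough illustration of the few-shot strategy listed above, the sketch below builds a chat `messages` list in which each worked example appears as a user/assistant pair before the real request. The function name, the system message wording, and the example dictionary keys are assumptions made for this sketch.

```python
def build_few_shot_messages(examples, text, emotion):
    """Build a chat `messages` list with worked examples before the real request.

    `examples` is a list of dicts with the (hypothetical) keys
    "emotion", "source", and "rewrite".
    """
    messages = [
        {
            "role": "system",
            "content": "You rewrite fundraising pitches to convey a target "
                       "emotion while preserving the facts of the story.",
        }
    ]
    for example in examples:
        messages.append({
            "role": "user",
            "content": f"Rewrite the text to convey {example['emotion']}.\n"
                       f"Text: {example['source']}",
        })
        messages.append({"role": "assistant", "content": example["rewrite"]})
    # The actual request goes last, in the same format as the examples.
    messages.append({
        "role": "user",
        "content": f"Rewrite the text to convey {emotion}.\nText: {text}",
    })
    return messages


# Usage (requires the `client` defined in the setup section):
# completion = client.chat.completions.create(
#     model="gpt-3.5-turbo", messages=messages, temperature=0)
```

Keeping the request format identical to the examples gives the model a consistent pattern to imitate.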


In [None]:
import pandas as pd

gofound = pd.read_csv("gofundme_sample.csv")
gofound.head(10)

Unnamed: 0,message,emotion
0,"Well, another seizure decided to sneak up on m...",fear
1,Not alot know but my sister under went a tripl...,fear
2,This morning my mom was rushed to the hospital...,fear
3,Hello everyone. I am having surgery on my hear...,fear
4,Helen is a 16 year old girl recently diagnosed...,fear
5,Hello my name is Jeremy Stone and I'm humbly a...,fear
6,Well on June 15 2021 he had his first tonic se...,fear
7,"Hello, my name is Anthony, and I'm fundraising...",fear
8,"Hi, My name is Pamela Flores (Pam). I was empl...",fear
9,"Hi everyone my Katherine I am 26 yrs old ,sing...",fear


In [None]:
gofound.iloc[0,0]

"Well, another seizure decided to sneak up on me and hit me with its best shot! Again, I do not have any recollection of this incident. I was told that I was blue when they found me, gray and arm was cold as I lay on the gurney! I had to be resuscitated twice and a 3rd time in the ICU! My brain had swelling so I had to be iced up like a keg cooler! They didn't know if I was going to come out of it, and if I did, what kind of condition would I be in? Brain dead? Vegetable? Could understand but mobility all screwed up? I scared the hell put of my poor family who drove in from everywhere to be by my side until I got through this! They are absolutely amazing and I have to give a HUGE shout out to the fucking rock of our family, the most selfish, positive, unbelievably head strong woman to walk this 3rd rock.... MOM! You never get enough thanks! We'd all be lost without you and that's no joke!"

In [None]:
################################################################################
# TODO: Fill in your code for prompt engineering                               #
################################################################################

emotion='surprise'
text = '''
The reason for this GoFundMe is to help me live a normal life after I had my left leg amputated from diabetes.
Currently, I cannot access my home on the second story due to all of the stairs, so I moved in with my Son and his family.
Insurance does not cover a stairlift so I am asking for your help.
About myself: I am a 67 year old diabetic who struggles with many normal life activities.
I dropped out of school in 8th grade, moved out at age 14, met my wife at 18 & started a family shortly after. Now,
I am a retired bus driver but nothing will stop me from getting up and moving. I love to work and can fix pretty much, anything.
I love telling jokes to make people laugh & prior to my leg amputation, I would donate my time to Woburn pop Warner football, where my grandkids played.
My wife and I have 6 kids and 13 grandchildren. My favorite past time is camping and I have a trailer parked in Salisbury that I hope to get back to one day.
Funds raised will help me purchase a stairlift so I can get back home.
I cannot express the gratitude I feel just knowing that people care.
Every $1 counts so don’t feel bad or embarrassed if that’s all you can donate, I appreciate everything & everyone.
'''
prompt1 = f"""Please rewrite the text in {emotion} emotion
Text: ```{text}```
"""
prompt2 = f""" Rewrite the following text to evoke a sense of {emotion}, using vivid imagery, evocative language, and a compelling narrative style.
Text: ```{text}```
"""
prompt3= f"""
System: You are an expert in emotional psychology and have a knack for crafting stories that evoke specific emotions.
Text: ```{text}```
Tasks and Steps: The provided text is a story from a GoFundMe campaign. Your task is to rewrite the story to convey the emotion {emotion}. Follow the steps below to guide your writing process:

Step 1: Begin by describing the content and main events of the original story. Highlight key details such as the protagonist's situation, challenges they face, and the purpose of the fundraising campaign.
Step 2: Consider the emotions conveyed in the current story. Reflect on the tone, language, and events that evoke specific feelings in the reader. Identify the predominant emotion or emotions present in the text.
Step 3: Now, envision how you would rewrite the same story to convey the emotion {emotion}. Think about how changes in language, imagery, and narrative style can evoke the desired emotional response in the reader. Consider the protagonist's perspective, their struggles, and the impact of the reader's support on their journey.
"""


response = get_completion(prompt1)
print('prompt1: ', response)

response = get_completion(prompt2)
print('prompt2: ', response)

response = get_completion(prompt3)
print('prompt3: ', response)



prompt1:  Oh my goodness! I can't believe the support I've received on this GoFundMe! After having my left leg amputated from diabetes, I never imagined I would be able to live a normal life again. But thanks to all of you, I may be able to get a stairlift and finally access my home on the second story. I am overwhelmed with gratitude for all the help I've received. Every dollar counts and I appreciate each and every one of you who has donated. Thank you from the bottom of my heart!
prompt2:  As I sit here, grappling with the reality of my left leg being amputated due to diabetes, I am faced with a daunting challenge - how to navigate the stairs to my home on the second story. Unable to access my own space, I have sought refuge with my Son and his family. The insurance company has turned a blind eye to my plight, refusing to cover the cost of a stairlift. And so, I find myself reaching out to you, dear reader, in a desperate plea for help.

Let me paint a picture for you - I am a 67-ye

In [None]:
### gofound.iloc[2,0]

emotion='fear'
text = '''
Hi, my name is Debbi and I’m fundraising for my brother Donald Webb.
The sad news is that he is only 51 years old, and in 2016 he suffered a series of strokes that resulted in loss of motor skills and memory loss.
Recently he was sent to the hospital because his kidneys are failing and is heading toward dialysis.
This is in addition to other failing health factors including high blood pressure and diabetes.
He needs to stay in the hospital for an extend amount of time.
Each day he is in the hospital costs $450 to hold his bed at The Regency at Shelby in Shelby Township Mi, his care facility.
Any help given would make sure he is cared for.
'''

prompt1 = f"""Please rewrite the text in {emotion} emotion
Text: ```{text}```
"""
prompt2 = f""" Rewrite the following text to evoke a sense of {emotion}, using vivid imagery, evocative language, and a compelling narrative style.
Text: ```{text}```
"""

prompt3= f"""
System: You are an expert in emotional psychology and have a knack for crafting stories that evoke specific emotions.
Text: ```{text}```
Tasks and Steps: The provided text is a story from a GoFundMe campaign. Your task is to rewrite the story to convey the emotion {emotion}. Follow the steps below to guide your writing process:

Step 1: Begin by describing the content and main events of the original story. Highlight key details such as the protagonist's situation, challenges they face, and the purpose of the fundraising campaign.
Step 2: Consider the emotions conveyed in the current story. Reflect on the tone, language, and events that evoke specific feelings in the reader. Identify the predominant emotion or emotions present in the text.
Step 3: Now, envision how you would rewrite the same story to convey the emotion {emotion}. Think about how changes in language, imagery, and narrative style can evoke the desired emotional response in the reader. Consider the protagonist's perspective, their struggles, and the impact of the reader's support on their journey.
"""


response = get_completion(prompt1)
print('prompt1: ', response)

response = get_completion(prompt2)
print('prompt2: ', response)

response = get_completion(prompt3)
print('prompt3: ', response)


prompt1:  I'm terrified. My name is Debbi and I'm desperately trying to raise funds for my brother Donald Webb. The horrifying truth is that he is only 51 years old, and in 2016 he suffered a series of strokes that left him with loss of motor skills and memory loss. Now, he has been rushed to the hospital because his kidneys are failing and he is on the brink of needing dialysis. On top of that, he is battling other serious health issues like high blood pressure and diabetes. He requires an extended stay in the hospital, with each day costing $450 to keep his bed at The Regency at Shelby in Shelby Township, MI. Please, any assistance provided would ensure that he receives the care he so desperately needs.
prompt2:  My name is Debbi, and I am reaching out in desperation for my brother Donald Webb. At only 51 years old, he has already faced a nightmare of health issues that have left him a mere shell of his former self. In 2016, a cruel series of strokes robbed him of his motor skills an
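
The two cells above repeat the same prompt templates for each story. One possible refactor, sketched below, wraps the templates in a function so that a new story or emotion only needs one call. The name `build_prompts` is hypothetical, and the chain-of-thought template (`prompt3`) is left out here only to keep the sketch short.

```python
FENCE = "`" * 3  # triple backticks that delimit the story inside each prompt


def build_prompts(text, emotion):
    """Return the prompt variants for one story and one target emotion."""
    story = f"Text: {FENCE}{text}{FENCE}"
    return {
        "prompt1": f"Please rewrite the text in {emotion} emotion\n{story}\n",
        "prompt2": (
            f"Rewrite the following text to evoke a sense of {emotion}, using "
            "vivid imagery, evocative language, and a compelling narrative "
            f"style.\n{story}\n"
        ),
    }


# Usage (requires `get_completion` from the setup section):
# for name, prompt in build_prompts(text, "fear").items():
#     print(f"{name}: ", get_completion(prompt))
```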

In [None]:
#https://medium.com/@hermanschutte/how-to-custom-train-and-fine-tune-models-with-the-chatgpt-api-afb796aaf2fe
# https://medium.com/@r2consultingcloud/a-step-by-step-guide-to-custom-fine-tuning-with-chatgpts-api-using-a-custom-dataset-54dae6c055ce

In [None]:
import pandas as pd

gofound_all = pd.read_csv("gofundme_sample.csv")
gofound_all.head(10)

Unnamed: 0,message,emotion
0,"Well, another seizure decided to sneak up on m...",fear
1,Not alot know but my sister under went a tripl...,fear
2,This morning my mom was rushed to the hospital...,fear
3,Hello everyone. I am having surgery on my hear...,fear
4,Helen is a 16 year old girl recently diagnosed...,fear
5,Hello my name is Jeremy Stone and I'm humbly a...,fear
6,Well on June 15 2021 he had his first tonic se...,fear
7,"Hello, my name is Anthony, and I'm fundraising...",fear
8,"Hi, My name is Pamela Flores (Pam). I was empl...",fear
9,"Hi everyone my Katherine I am 26 yrs old ,sing...",fear


## Part II: Fine-tuning (Extra credit, 20% extra)

### Creation of fine-tuning data set

To fine-tune an LLM, we will need to create a fine-tuning data set. There are multiple approaches to creating such a data set, as listed in the following table. Feel free to use any approach proposed in the table provided in the PDF homework document, or come up with your own way of curating the data set. I suggest starting with 100 well-crafted training examples for fine-tuning.
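
As one possible curation strategy (an assumption of this sketch, not something prescribed by the homework), the source stories could be drawn evenly from each emotion label instead of from the top of the file. The helper `balanced_sample` below is hypothetical and works on plain dictionaries with an `emotion` key.

```python
import random
from collections import defaultdict


def balanced_sample(rows, per_emotion, seed=0):
    """Pick up to `per_emotion` rows for each emotion label.

    `rows` is a list of dicts that each contain an "emotion" key.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row["emotion"]].append(row)
    sample = []
    for emotion in sorted(groups):
        items = groups[emotion]
        sample.extend(rng.sample(items, min(per_emotion, len(items))))
    return sample


# Usage with the DataFrame loaded in this notebook (13 stories per emotion
# gives roughly 100 examples across eight emotions):
# rows = gofound_all.to_dict("records")
# training_rows = balanced_sample(rows, per_emotion=13)
```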


In [None]:
# prepare the dataset with 100 samples to fine-tune the model

emotion='fear'
text_outcome = []
for i in range(0,100):
  input_text = gofound_all.iloc[i,0]
  prompt1 = f"""Rewrite the following text in {emotion} emotion:
  Text: ```{input_text}```
  """
  response = get_completion(prompt1)
  text_outcome.append(response)
# Copy the slice so the new column is added to an independent DataFrame
gofound_100 = gofound_all.iloc[:100].copy()
gofound_100['model_generated'] = text_outcome


In [None]:
gofound_100.drop(columns=['emotion'], inplace=True)
gofound_100.head(10)
gofound_100.to_csv('gofound_100.csv', index=False)

**Answer** (Describe how you create the fine-tuning data set):

_I prepared the fine-tuning data set by iterating through the first 100 rows of the GoFundMe sample. For each row, I constructed a prompt instructing the language model to rewrite the text with a specified emotion (in this case, fear). The model generated a response for each prompt, and the original text and the generated response were stored together in a DataFrame. Finally, the DataFrame was converted into JSON Lines format, creating a structured data set suitable for fine-tuning the language model._

In [None]:
#https://platform.openai.com/docs/guides/fine-tuning

In [None]:
import pandas as pd
import json

def prepare_data(input_file, output_file):
    # Read CSV file into DataFrame
    df = pd.read_csv(input_file)

    jsonl_data = []
    for index, row in df.iterrows():
        user_message = str(row['message'])  # Assuming 'message' is the column name for user messages
        model_message = str(row['model_generated'])  # Assuming 'model_generated' is the column name for model-generated messages

        # Construct dialogue
        conversation = [
            {"role": "system", "content": "Rewrite with the emotion of fear"},
            {"role": "user", "content": user_message},
            {"role": "assistant", "content": model_message}
        ]

        # Convert dialogue to JSON object
        json_object = {'messages': conversation}
        jsonl_data.append(json.dumps(json_object))

    # Write JSON Lines data to output file
    with open(output_file, 'w', encoding='utf-8') as outfile:
        outfile.write('\n'.join(jsonl_data))

# Replace 'input_file.csv' and 'output_file.jsonl' with your file paths
prepare_data('gofound_100.csv', 'gofound_100.jsonl')
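
Before uploading, it may be worth checking that the JSON Lines file has the shape produced by `prepare_data`. The sketch below is a minimal local check written for this notebook's data only: it assumes every record uses just the `system`, `user`, and `assistant` roles with string content and ends with an assistant turn. The function name `validate_chat_jsonl` is hypothetical, and this is not the validation that OpenAI itself performs.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_jsonl(path):
    """Return the number of valid records; raise ValueError on the first bad one."""
    count = 0
    with open(path, encoding="utf-8") as infile:
        for line_number, line in enumerate(infile, start=1):
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                raise ValueError(f"line {line_number}: missing 'messages' list")
            for message in messages:
                if message.get("role") not in ALLOWED_ROLES:
                    raise ValueError(f"line {line_number}: unexpected role")
                if not isinstance(message.get("content"), str):
                    raise ValueError(f"line {line_number}: content is not a string")
            if messages[-1]["role"] != "assistant":
                raise ValueError(f"line {line_number}: last message is not the assistant's")
            count += 1
    return count


# Usage:
# print(validate_chat_jsonl('gofound_100.jsonl'))
```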


In [None]:
from openai import OpenAI

client = OpenAI(api_key = '')

# Upload the training file; the context manager closes the file handle afterwards
with open("gofound_100.jsonl", "rb") as training_file:
  uploaded_file = client.files.create(
    file=training_file,
    purpose="fine-tune"
  )
uploaded_file

FileObject(id='file-Ciz8WVJehs4NooYEl0pTrg3y', bytes=204626, created_at=1713891216, filename='gofound_100.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)

### Fine-tuning and evaluation

In this part, we will explore how to fine-tune an LLM using the OpenAI API for the text rewriting task.



Please follow the OpenAI tutorial on fine-tuning to fine-tune GPT-3.5-turbo on your fine-tuning data set.
After fine-tuning, please try the same prompts you developed in Part I on the newly fine-tuned model and comment on its performance and your observation.


**Answer**: _I saved the output of the fine-tuned model in the `gofound_100_tuned.csv` file._




In [None]:
################################################################################
# TODO: Fill in your code for the fine-tuning process                          #
################################################################################

In [None]:
client.fine_tuning.jobs.create(
  training_file="file-Ciz8WVJehs4NooYEl0pTrg3y",
  model="gpt-3.5-turbo")

FineTuningJob(id='ftjob-hSZPGboF85EHMJlQoLbwPL23', created_at=1713891226, error=Error(code=None, message=None, param=None, error=None), fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='gpt-3.5-turbo-0125', object='fine_tuning.job', organization_id='org-dStTSLOVLNSFy6iYbaKpbade', result_files=[], seed=1673481916, status='validating_files', trained_tokens=None, training_file='file-Ciz8WVJehs4NooYEl0pTrg3y', validation_file=None, integrations=[], user_provided_suffix=None)

In [None]:
# Retrieve the state of a fine-tune
client.fine_tuning.jobs.retrieve("ftjob-hSZPGboF85EHMJlQoLbwPL23")

FineTuningJob(id='ftjob-hSZPGboF85EHMJlQoLbwPL23', created_at=1713891226, error=Error(code=None, message=None, param=None, error=None), fine_tuned_model='ft:gpt-3.5-turbo-0125:personal::9HDqjuMb', finished_at=1713891840, hyperparameters=Hyperparameters(n_epochs=3, batch_size=1, learning_rate_multiplier=2), model='gpt-3.5-turbo-0125', object='fine_tuning.job', organization_id='org-dStTSLOVLNSFy6iYbaKpbade', result_files=['file-eZGchLhzX6udurnvqcVNYUPw'], seed=1673481916, status='succeeded', trained_tokens=129444, training_file='file-Ciz8WVJehs4NooYEl0pTrg3y', validation_file=None, integrations=[], user_provided_suffix=None)
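
The job moves from `validating_files` to `succeeded` over several minutes, so instead of re-running the retrieve cell by hand, the status can be polled in a loop. The sketch below is one way to do this; `wait_for_job` is a hypothetical helper, and the retrieve function is passed in as an argument so the loop can be exercised without calling the API.

```python
import time

# Statuses after which the job will not change any further.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call `retrieve(job_id)` until the job reaches a terminal status.

    `retrieve` is any callable that returns an object with a `status`
    attribute, for example `client.fine_tuning.jobs.retrieve`.
    """
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


# Usage:
# job = wait_for_job(client.fine_tuning.jobs.retrieve,
#                    "ftjob-hSZPGboF85EHMJlQoLbwPL23")
# print(job.status, job.fine_tuned_model)
```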

In [None]:
# Cancel a job
# client.fine_tuning.jobs.cancel("ftjob-hSZPGboF85EHMJlQoLbwPL23")

In [None]:
# deploy the fine-tuned model
fine_tuned_model_version = "ft:gpt-3.5-turbo-0125:personal::9HDqjuMb" # fine-tuned GPT-3.5

def get_fine_tune_completion(prompt, model=fine_tuned_model_version):
  completion = client.chat.completions.create(
      model=model, # use the model passed by the caller, not the global
      messages=[
          {
              "role": "user",
              "content": prompt,
          }
      ],
      temperature=0, # this is the degree of randomness of the model's output
  )
  return completion.choices[0].message.content

In [None]:
import pandas as pd

gofound_all = pd.read_csv("gofundme_sample.csv")
gofound_all.head(10)

Unnamed: 0,message,emotion
0,"Well, another seizure decided to sneak up on m...",fear
1,Not alot know but my sister under went a tripl...,fear
2,This morning my mom was rushed to the hospital...,fear
3,Hello everyone. I am having surgery on my hear...,fear
4,Helen is a 16 year old girl recently diagnosed...,fear
5,Hello my name is Jeremy Stone and I'm humbly a...,fear
6,Well on June 15 2021 he had his first tonic se...,fear
7,"Hello, my name is Anthony, and I'm fundraising...",fear
8,"Hi, My name is Pamela Flores (Pam). I was empl...",fear
9,"Hi everyone my Katherine I am 26 yrs old ,sing...",fear


In [None]:
# evaluate the fine-tuned model

emotion='fear'
text_outcome = []
for i in range(0,100):
  input_text = gofound_all.iloc[i,0]
  prompt1 = f"""Rewrite the following text in {emotion} emotion:
  Text: ```{input_text}```
  """
  response = get_fine_tune_completion(prompt1)
  text_outcome.append(response)
# Copy the slice so the new column is added to an independent DataFrame
gofound_100_tuned = gofound_all.iloc[:100].copy()
gofound_100_tuned['model_generated'] = text_outcome


In [None]:
# A non-inplace drop returns a new DataFrame instead of modifying a slice
gofound_100_tuned = gofound_100_tuned.drop(columns=['emotion'])
gofound_100_tuned.head(10)
gofound_100_tuned.to_csv('gofound_100_tuned.csv', index=False)
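
Length is one of the criteria listed under “Other things to consider”, and it is easy to check automatically when comparing the base and fine-tuned outputs. The sketch below computes a simple word-count ratio between a rewrite and its source; splitting on whitespace is an assumption of this sketch, and `length_ratio` is a hypothetical helper.

```python
def length_ratio(source, rewrite):
    """Ratio of rewrite word count to source word count (1.0 means equal length)."""
    source_words = len(source.split())
    if source_words == 0:
        raise ValueError("source text is empty")
    return len(rewrite.split()) / source_words


# Usage with the saved outputs (column names as written by the cells above):
# ratios = gofound_100_tuned.apply(
#     lambda row: length_ratio(row["message"], row["model_generated"]), axis=1)
# print(ratios.describe())
```

A ratio far from 1.0 may indicate that the rewrite dropped details or added material that was not in the original pitch.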


## Acknowledgement

This homework was developed based on materials from [1], [2], [3], and [4].

