# Prompt Engineering and Probing with GPT3

With GPT3, we can perform a variety of tasks without training a model. All we need to do is convert the task into a text generation task, described by a set of instructions called a *prompt*. For example, sentiment classification can be framed as:

```
Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I loved the new Batman movie!
Sentiment:
```

The GPT3 model then completes the text above with the response **Positive**. The prompt above is an example of zero-shot learning: we provide no labeled examples to guide the decision and rely only on what GPT learned during pretraining. Compare it with the following prompt, which adds one labeled example:

```
Decide whether a Tweet's sentiment is positive, neutral, or negative.

Tweet: I really liked the Spiderman movie!
Sentiment: Positive

Tweet: I loved the new Batman movie!
Sentiment:
```

This is an example of 1-shot learning, i.e., you provide one labeled example of how the output should look and then ask GPT to complete the next example. When you use more than one labeled example, it is known as few-shot learning. Generally, providing more examples in the prompt tends to improve predictions.
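The zero-shot and 1-shot prompts above share the same layout, so they can be generated from one helper. The following is a minimal sketch, not part of the assignment code: the function name `build_prompt` and its arguments are hypothetical, and it only assembles the prompt string without calling the API.

```python
def build_prompt(instruction, examples, query):
    """Build a k-shot prompt: the instruction, k labeled examples, then the unlabeled query."""
    blocks = [instruction]
    for tweet, label in examples:
        blocks.append(f"Tweet: {tweet}\nSentiment: {label}")
    # The final block leaves the label empty so the model completes it.
    blocks.append(f"Tweet: {query}\nSentiment:")
    return "\n\n".join(blocks)


instruction = "Decide whether a Tweet's sentiment is positive, neutral, or negative."

# Zero-shot: no labeled examples.
zero_shot = build_prompt(instruction, [], "I loved the new Batman movie!")

# 1-shot: one labeled example before the query.
one_shot = build_prompt(
    instruction,
    [("I really liked the Spiderman movie!", "Positive")],
    "I loved the new Batman movie!",
)
print(one_shot)
```

Passing more `(tweet, label)` pairs in `examples` turns the same call into a few-shot prompt.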

## Getting Started


For this assignment, you will first need to register for an account at https://platform.openai.com/. As a free trial, you will get $18 in credits to make API calls to the GPT server. Once registered, go through the docs at https://platform.openai.com/docs/guides/completion/prompt-design to learn more about the capabilities of the model.

You can do this homework using OpenAI's free playground/chat interface at the following links:

- [https://platform.openai.com/playground](https://platform.openai.com/playground)
- [https://chat.openai.com](https://chat.openai.com)

If you want to use the API to make automated calls to OpenAI instead, follow the steps below:

In [1]:
pip install openai

In [5]:
import os

## Find your API key by clicking on your profile on the OpenAI page, then add it to the environment as follows.
## Make sure to delete this cell afterwards.

os.environ['OPENAI_API_KEY'] = '<your-api-key>'


In [6]:
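import os
from openai import OpenAI

# OpenAI() reads OPENAI_API_KEY from the environment by default;
# passing it explicitly makes the dependency visible.
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))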

## Using text completion

In [7]:
response = client.completions.create(
  model="text-davinci-002",
  prompt="Decide whether a Tweet's sentiment is positive, neutral, or negative.\n\nTweet: \"I loved the new Batman movie!\"\nSentiment:",
  temperature=0,
  max_tokens=60,
  top_p=1,
  frequency_penalty=0.5,
  presence_penalty=0
)

In [8]:
response



In [9]:
response.choices[0].text

' Positive'

## Using chat completion

In [10]:
response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a Sentiment Classifier."},
    {"role": "user", "content": "Decide whether a Tweet's sentiment is positive, neutral, or negative.\n\nTweet: \"I loved the new Batman movie!\"\nSentiment:"}
  ]
)

In [11]:
response

ChatCompletion(id='chatcmpl-8Pc4fRJ7VNqfnzKpiLeNuWyQLArwW', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='positive', role='assistant', function_call=None, tool_calls=None))], created=1701115249, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=1, prompt_tokens=46, total_tokens=47))

In [12]:
response.choices[0].message.content

'positive'

If you see ' Positive' (text completion) or 'positive' (chat completion) as the response in the cells above, you have successfully set up GPT3 on your system.
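The two endpoints above return the same label with different casing and whitespace (' Positive' vs. 'positive'), which matters once you compare responses against gold labels. Below is a small sketch of one way to normalize them. The helper name `normalize_label` and the label set are assumptions made for illustration.

```python
VALID_LABELS = {"positive", "neutral", "negative"}


def normalize_label(raw):
    """Map a raw model response to a canonical label, or None if it is not recognized."""
    # Drop surrounding whitespace, lowercase, and remove a trailing period.
    cleaned = raw.strip().lower().rstrip(".")
    return cleaned if cleaned in VALID_LABELS else None


print(normalize_label(" Positive"))
print(normalize_label("positive"))
```

Returning `None` for unrecognized responses keeps malformed outputs visible instead of silently counting them as one of the classes.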

Now, the task for the assignment is simply to do something cool. For example, you could probe how well GPT3 performs on the tasks from the previous HWs. Or you could try something like question answering or summarization, which were not covered in the assignments. The choice is yours.
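If you probe a classification task, one simple way to quantify "how well GPT3 performs" is accuracy over a small labeled set. The sketch below is an illustration under stated assumptions: `evaluate` is a hypothetical helper, `keyword_classifier` is an offline stand-in for a GPT-backed classifier, and the four examples are made up for the demo rather than taken from any dataset.

```python
def evaluate(classify, dataset):
    """Return the accuracy of classify(text) -> label over (text, gold_label) pairs."""
    if not dataset:
        raise ValueError("dataset is empty")
    correct = sum(1 for text, gold in dataset if classify(text) == gold)
    return correct / len(dataset)


# Hypothetical stand-in for a GPT-backed classifier, so the sketch runs offline.
def keyword_classifier(text):
    lowered = text.lower()
    if "loved" in lowered or "liked" in lowered:
        return "positive"
    if "hated" in lowered:
        return "negative"
    return "neutral"


# Made-up toy examples, for illustration only.
toy_data = [
    ("I loved the new Batman movie!", "positive"),
    ("I hated the ending.", "negative"),
    ("The movie is two hours long.", "neutral"),
    ("What a waste of money.", "negative"),
]

accuracy = evaluate(keyword_classifier, toy_data)
print(f"accuracy = {accuracy:.2f}")
```

To probe GPT3 itself, you would replace `keyword_classifier` with a function that sends the text to the API and returns the normalized label.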

### Summarization

In [13]:
def summarize(text): 
    response = client.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[
        {"role": "system", "content": "You are a text summarizer"},
        {"role": "user", "content": text}
      ]
    )
    return response

In [14]:
abstract ="In the past three years, the so-called second wave of Virtual Reality (VR) has brought us a vast amount of new displays and input devices. Not only new hardware has entered the consumer market providing affordable pricing models but also completely new technologies are being designed and developed. Additionally new concepts for handling existing problems on the hardware and software side of the VR technology are constantly being introduced. This software and hardware development is mainly lead by enthusiasts interested in the domain of VR opposed to the established scientific community, which already partially makes use of the newly available technology. Besides Head-Mounted Displays (HMDs), either cable-based or mobile, other devices like haptics devices, controllers, vests, omnidirectional treadmills, tracking technologies, as well as optical scanners for gesture-based interaction are gaining importance in the field of commodity VR. Most of these technologies are already precise and robust enough to be used for professional operation and scientific experiments. The topics discussed are the common issues with the new technologies including the approaches to solve them as for example motion-to-photon latency, barrel distortion, and low-persistence displays. Additionally an in-depth analysis of the available solutions expected to hit the market is provided. A taxonomy categorising the current developments with the chosen implementation approaches will be given. The paper analyses the state of technological advancements in the field and provides an extensive overview on the current development considering the upcoming devices and the advancements from the software side."
summary = summarize(abstract)

In [17]:
# print(summary)
print(summary.choices[0].message.content)

In the past few years, there has been a surge in the development of Virtual Reality (VR) technology. This second wave of VR has brought about new displays and input devices, making the technology more accessible and affordable for consumers. Enthusiasts are leading the way in driving this development, with new hardware and software concepts constantly emerging.

Aside from the commonly known Head-Mounted Displays (HMDs), other devices like haptics devices, controllers, vests, omnidirectional treadmills, tracking technologies, and optical scanners are gaining importance in the VR field. Many of these technologies are already precise and robust enough to be used in professional and scientific settings.

There are, however, common issues with these new technologies that need to be addressed. These include motion-to-photon latency, barrel distortion, and low-persistence displays. The paper discusses the approaches to solving these problems and provides an in-depth analysis of the upcoming 

### POS Tagger

In [18]:
def pos(text): 
    response = client.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[
        {"role": "system", "content": "You are a pos tagger"},
        {"role": "user", "content": text}
      ]
    )
    return response

In [19]:
text = "Shoot all the blue jays you want, if you can hit em, but remember that it’s a sin to kill a mockingbird."
tagged = pos(text)

In [20]:
# print(tagged)
print(tagged.choices[0].message.content)

Shoot/VB all/DT the/DT blue/JJ jays/NNS you/PRP want/VBP ,/, if/IN you/PRP can/MD hit/VB em/JJ ,/, but/CC remember/VB that/IN it/PRP ’s/VBZ a/DT sin/NN to/TO kill/VB a/DT mockingbird/NN ./.
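To score the tagger against gold tags, the `word/TAG` output printed above first has to be turned into pairs. The following is a minimal parsing sketch. The helper name `parse_tagged` is hypothetical, and it assumes the model keeps the whitespace-separated `word/TAG` format, which it is not guaranteed to do.

```python
def parse_tagged(tagged_text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in tagged_text.split():
        # rpartition splits on the last slash, so tokens like ',/,' parse correctly.
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


# The first few tokens of the response printed above.
output = "Shoot/VB all/DT the/DT blue/JJ jays/NNS you/PRP want/VBP ,/, if/IN"
pairs = parse_tagged(output)
print(pairs[:3])
```

Raising on a token without a slash makes format drift in the model's response easy to spot during probing.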


### Irony Detection

In [33]:
def irony(text): 
    response = client.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[
        {"role": "system", "content": "Is there irony present in the text"},
        {"role": "user", "content": text}
      ]
    )
    return response

In [29]:
text = "Shoot all the blue jays you want, if you can hit em, but remember that it’s a sin to kill a mockingbird."
irony_response = irony(text)

In [30]:
# print(irony_response)
print(irony_response.choices[0].message.content)

Yes, there is irony present in the text. The irony lies in the fact that while it is portrayed as acceptable to shoot blue jays, it is deemed a sin to kill a mockingbird. This statement suggests that there is a moral distinction between the two, even though both species are simply birds. The irony emphasizes the unjust treatment of mockingbirds, which symbolize innocence and harmlessness in Harper Lee's novel "To Kill a Mockingbird."


### Natural Language Inference

In [34]:
def nli(s1, s2): 
    response = client.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[
        {"role": "system", "content": "The task is, give two sentences: a premise and a hypothesis, to classify the relation between them. We have three classes to describe this relationship.\nEntailment: the hypothesis follows from the fact that the premise is true\nContradiction: the hypothesis contradicts the fact that the premise is true\nNeutral: There is not relationship between premise and hypothesis"},
        {"role": "user", "content": f"Sentence 1: {s1}\nSentence 2: {s2}"}
      ]
    )
    return response

In [36]:
s1 = "A man inspects the uniform of a figure in some East Asian country."
s2 = "The man is sleeping"
inference = nli(s1, s2)

In [37]:
# print(inference)
print(inference.choices[0].message.content)

Premise: A man inspects the uniform of a figure in some East Asian country.
Hypothesis: The man is sleeping.

Relation: Neutral
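The response above restates the premise and hypothesis before giving the label, so the class has to be extracted before it can be compared with a gold label. Here is a sketch of one way to do that. The helper name `extract_relation` is hypothetical, and the `Relation:` prefix is an assumption based on this single response, so the function falls back to the first label mentioned anywhere in the text.

```python
import re

LABELS = ("entailment", "contradiction", "neutral")


def extract_relation(response_text):
    """Return the NLI label found in a free-form response, or None if there is none."""
    # Preferred: the word that follows 'Relation:'.
    match = re.search(r"relation\s*:\s*(\w+)", response_text, flags=re.IGNORECASE)
    if match and match.group(1).lower() in LABELS:
        return match.group(1).lower()
    # Fallback: the label that appears earliest anywhere in the text.
    lowered = response_text.lower()
    found = [(lowered.find(label), label) for label in LABELS if label in lowered]
    return min(found)[1] if found else None


sample = (
    "Premise: A man inspects the uniform of a figure in some East Asian country.\n"
    "Hypothesis: The man is sleeping.\n\n"
    "Relation: Neutral"
)
print(extract_relation(sample))
```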


## Submission

Please submit a written report describing which task you probed, how well GPT3 performed on it, and your key takeaways from the experiment.