# 1. Prompt Engineering Basics

In [12]:
%%capture
# update or install the necessary libraries
!pip install openai==0.28.1
!pip install --upgrade python-dotenv
!pip install --upgrade langchain

In [1]:
import openai
import os
import IPython
from dotenv import load_dotenv

In [2]:
load_dotenv()

openai.api_key = os.getenv("OPENAI_API_KEY")

In [3]:
def set_open_params(
    model="text-davinci-003",
    temperature=0.7,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
):
    """Collect the OpenAI completion parameters into a dict."""
    return {
        'model': model,
        'temperature': temperature,
        'max_tokens': max_tokens,
        'top_p': top_p,
        'frequency_penalty': frequency_penalty,
        'presence_penalty': presence_penalty,
    }

def get_completion(params, prompt):
    """Request a completion from the OpenAI Completions API (openai<1.0)."""
    response = openai.Completion.create(
        model=params['model'],  # the legacy `engine` argument also works in openai 0.28
        prompt=prompt,
        temperature=params['temperature'],
        max_tokens=params['max_tokens'],
        top_p=params['top_p'],
        frequency_penalty=params['frequency_penalty'],
        presence_penalty=params['presence_penalty'],
    )
    return response

A simple example:

In [4]:
params = set_open_params()

prompt = "Roses are"

response = get_completion(params, prompt)

In [5]:
response.choices[0].text

' red\n\nViolets are blue\nSugar is sweet\nAnd so are you!'

In [6]:
IPython.display.Markdown(response.choices[0].text)

 red

Violets are blue
Sugar is sweet
And so are you!

Try a different temperature to compare results:

In [7]:
params = set_open_params(temperature=0)
prompt = "Roses are"
response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)

 red

Violets are blue

Sugar is sweet

And so are you!
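
As a rough illustration of what the `temperature` parameter does, the sketch below applies temperature scaling to a softmax over a handful of scores. The tokens and logits are made up for the example; the API does this internally over its full vocabulary. A temperature of 0 is handled here as the limit case that always picks the top-scoring token, which is an assumption about how that setting is best modeled, not a description of the API's internals.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Limit case: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token scores after the prompt "Roses are".
tokens = [" red", " pink", " thorny"]
logits = [3.0, 1.5, 0.5]

for t in (0, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, dict(zip(tokens, (round(p, 3) for p in probs))))
```

Lower temperatures concentrate probability on the most likely token, so repeated runs agree more often; higher temperatures spread it out and make the output more varied.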

### Text Summarization

In [8]:
params = set_open_params()
prompt = """Keras is a deep learning API written in Python that runs on top of TensorFlow. 
    It is quite popular among deep learning users because of its ease of use. 
    TensorFlow is an end-to-end open-source deep learning framework developed and maintained by Google. 
    Similar to Numpy, TensorFlow allows for mathematical computations and manipulation between numerical tensors, runs on CPUs, GPUs, and TPUs. 
    Keras was incorporated in TensorFlow 2.0 (the recent version) as tf.keras (high-level API) and can run on the aforementioned hardwares. 
    TensorFlow also allows for low-level operations with the TensorFlow Core API. 

    Explain the above in one sentence:"""
response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)

 

Keras is a high-level API written in Python that runs on top of TensorFlow, a deep learning framework developed and maintained by Google, which can run on CPUs, GPUs, and TPUs.

### Question Answering

In [9]:
prompt = """Answer the question based on the context below. Keep the answer short and concise. Respond "Unsure about answer" if not sure about the answer.

Context: The Avengers were a team of extraordinary individuals, with either superpowers or other special characteristics. Though primarily affiliated with the interests of the United States of America, the group's purpose was to protect global stability from inner or extraterrestrial threats. The Avengers were first assembled by S.H.I.E.L.D. as a result of the Avengers Initiative, when Loki invaded Earth with his Chitauri army. The team, consisting of Iron Man, Captain America, Hulk, Thor, Black Widow and Hawkeye defeated Loki and went their separate ways for a while.

Question: Why were the Avengers formed?

Answer:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)

 To protect global stability from inner or extraterrestrial threats.

### Text Classification

In [11]:
prompt = """Classify the text into neutral, negative or positive.

Text: I think the Avengers Endgame was an interesting movie..

Sentiment:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)

 Neutral

### Role Playing

In [12]:
prompt = """The following is a conversation with an AI research assistant. The assistant's tone is technical and scientific.

H: Hello, who are you?
AI: Greetings! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the big bang theory?
AI:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)

 Sure! The Big Bang Theory is the prevailing cosmological model for the universe from the earliest known periods through its subsequent large-scale evolution. The model describes how the universe expanded from a very high-density and high-temperature state, and offers a comprehensive explanation for a broad range of phenomena, including the abundance of light elements, the cosmic microwave background, large scale structure, and Hubble's law.

### Code Generation

In [13]:
prompt = "\"\"\"\nTable departments, columns = [DepartmentId, DepartmentName]\nTable students, columns = [DepartmentId, StudentId, StudentName]\nCreate a PostgreSQL query for all students in the Computer Science Department\n\"\"\""

response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)



SELECT StudentId, StudentName 
FROM students 
WHERE DepartmentId IN (SELECT DepartmentId 
                       FROM departments 
                       WHERE DepartmentName = 'Computer Science');
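
Generated SQL is worth running against a small table before trusting it. The sketch below is one way to do that with Python's built-in `sqlite3` module and an in-memory database. The rows are hypothetical, and SQLite stands in for PostgreSQL on the assumption that this plain subquery behaves the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (DepartmentId INTEGER, DepartmentName TEXT);
CREATE TABLE students (DepartmentId INTEGER, StudentId INTEGER, StudentName TEXT);
INSERT INTO departments VALUES (1, 'Computer Science'), (2, 'Physics');
INSERT INTO students VALUES (1, 100, 'Ada'), (2, 101, 'Marie'), (1, 102, 'Alan');
""")

# The query returned by the model, copied from the output above.
generated_query = """
SELECT StudentId, StudentName
FROM students
WHERE DepartmentId IN (SELECT DepartmentId
                       FROM departments
                       WHERE DepartmentName = 'Computer Science');
""".strip()

rows = conn.execute(generated_query).fetchall()
print(rows)
conn.close()
```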

### Reasoning

In [14]:
prompt = """The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 

Solve by breaking the problem into steps. First, identify the odd numbers, add them, and indicate whether the result is odd or even."""

response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)



Odd numbers: 15, 5, 13, 7, 1 
Sum of odd numbers: 41 
41 is an odd number.
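
The same three steps can be checked directly in Python. This is only a verification of the arithmetic, not part of the prompt:

```python
numbers = [15, 32, 5, 13, 82, 7, 1]

# Step 1: identify the odd numbers.
odd_numbers = [n for n in numbers if n % 2 == 1]
# Step 2: add them.
total = sum(odd_numbers)
# Step 3: report whether the sum is odd or even.
parity = "even" if total % 2 == 0 else "odd"

print(odd_numbers, total, parity)  # [15, 5, 13, 7, 1] 41 odd
```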

## Advanced Prompting Techniques

### Few-shot prompts

In [16]:
prompt = """The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: The answer is False.

The odd numbers in this group add up to an even number: 17,  10, 19, 4, 8, 12, 24.
A: The answer is True.

The odd numbers in this group add up to an even number: 16,  11, 14, 4, 8, 13, 24.
A: The answer is True.

The odd numbers in this group add up to an even number: 17,  9, 10, 12, 13, 4, 2.
A: The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)

 The answer is True.
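
Note that this completion is wrong: the odd numbers (15, 5, 13, 7, 1) sum to 41, so the correct answer is False. Few-shot exemplars alone were not enough here. Prompts of this shape are easy to assemble programmatically; the sketch below builds one from lists of numbers and computes each exemplar's label so the labels cannot drift from the data. `odd_sum_is_even` and `build_few_shot_prompt` are hypothetical helpers, not part of any library.

```python
def odd_sum_is_even(numbers):
    """Ground-truth label: do the odd numbers in the list sum to an even number?"""
    return sum(n for n in numbers if n % 2 == 1) % 2 == 0

def build_few_shot_prompt(exemplars, query):
    """Format labeled exemplars followed by one unlabeled query."""
    statement = "The odd numbers in this group add up to an even number: {}."
    blocks = []
    for numbers in exemplars:
        listed = ", ".join(str(n) for n in numbers)
        label = odd_sum_is_even(numbers)
        blocks.append(f"{statement.format(listed)}\nA: The answer is {label}.")
    listed = ", ".join(str(n) for n in query)
    blocks.append(f"{statement.format(listed)}\nA:")
    return "\n\n".join(blocks)

exemplars = [
    [4, 8, 9, 15, 12, 2, 1],
    [17, 10, 19, 4, 8, 12, 24],
    [16, 11, 14, 4, 8, 13, 24],
    [17, 9, 10, 12, 13, 4, 2],
]
query = [15, 32, 5, 13, 82, 7, 1]

prompt = build_few_shot_prompt(exemplars, query)
print(prompt)
print("Expected answer:", odd_sum_is_even(query))
```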

### Chain-of-Thought (CoT) Prompting

In [17]:
prompt = """The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.
A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.

The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. 
A:"""

response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)

 Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.

### Zero-shot CoT

In [18]:
prompt = """I went to the market and bought 10 oranges and 5 apples. I gave 5 oranges to the neighbor and 2 apples to the repairman. I then went and bought 5 more oranges and ate 1. How many oranges and apples did I remain with?

Let's think step by step."""

response = get_completion(params, prompt)
IPython.display.Markdown(response.choices[0].text)



Initially: 10 oranges and 5 apples

After giving away 5 oranges and 2 apples: 5 oranges and 3 apples

After buying 5 more oranges: 10 oranges and 3 apples

After eating 1 orange: 9 oranges and 3 apples

So, I remain with 9 oranges and 3 apples.
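
The step-by-step answer can be confirmed with a few lines of arithmetic. Like the completion, this check assumes the item eaten was an orange:

```python
oranges, apples = 10, 5  # bought 10 oranges and 5 apples
oranges -= 5             # gave 5 oranges to the neighbor
apples -= 2              # gave 2 apples to the repairman
oranges += 5             # bought 5 more oranges
oranges -= 1             # ate 1 (assumed to be an orange)

print(oranges, apples)  # 9 3
```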

## Using LangChain 

### PAL - Code as Reasoning

In [11]:
from langchain.llms import OpenAI

In [None]:
llm = OpenAI(model_name='text-davinci-003', temperature=0)

In [22]:
question = "Which is the heaviest penguin?"

In [23]:
PENGUIN_PROMPT = '''
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg) 
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. 
We now add a penguin to the table:
James, 12, 90, 12
How many penguins are less than 8 years old?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Add penguin James.
penguins.append(('James', 12, 90, 12))
# Find penguins under 8 years old.
penguins_under_8_years_old = [penguin for penguin in penguins if penguin[1] < 8]
# Count number of penguins under 8.
num_penguin_under_8 = len(penguins_under_8_years_old)
answer = num_penguin_under_8
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg) 
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.
Which is the youngest penguin?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort the penguins by age.
penguins = sorted(penguins, key=lambda x: x[1])
# Get the youngest penguin's name.
youngest_penguin_name = penguins[0][0]
answer = youngest_penguin_name
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg) 
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.
What is the name of the second penguin sorted by alphabetic order?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort penguins by alphabetic order.
penguins_alphabetic = sorted(penguins, key=lambda x: x[0])
# Get the second penguin sorted by alphabetic order.
second_penguin_name = penguins_alphabetic[1][0]
answer = second_penguin_name
"""
{question}
"""
'''.strip() + '\n'

In [24]:
llm_output = llm(PENGUIN_PROMPT.format(question=question))
print(llm_output)

# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort penguins by weight.
penguins_by_weight = sorted(penguins, key=lambda x: x[3], reverse=True)
# Get the heaviest penguin's name.
heaviest_penguin_name = penguins_by_weight[0][0]
answer = heaviest_penguin_name


In [27]:
# Caution: exec runs whatever code the model returned; inspect the output first.
exec(llm_output)
print(answer)

Gwen
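
Calling `exec` on model output in the notebook's global namespace can overwrite existing variables. A slightly more contained option, sketched below, runs the code in its own dictionary and reads `answer` back from it. This is not a sandbox: the generated code can still do anything Python can. `run_generated_code` is a hypothetical helper, and `sample_output` is a condensed stand-in for `llm_output`.

```python
def run_generated_code(code, result_name="answer"):
    """Execute generated code in a fresh namespace and return one variable from it."""
    namespace = {}
    exec(code, namespace)  # still runs arbitrary code; only the namespace is isolated
    if result_name not in namespace:
        raise KeyError(f"generated code did not define {result_name!r}")
    return namespace[result_name]

# Stand-in for llm_output, condensed from the completion printed above.
sample_output = """
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
penguins_by_weight = sorted(penguins, key=lambda x: x[3], reverse=True)
answer = penguins_by_weight[0][0]
"""

print(run_generated_code(sample_output))  # Gwen
```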


### Retrieval-Augmented Generation

In [3]:
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate

In [4]:
# assumes the file is UTF-8 encoded (it contains curly quotes and dashes)
with open('../data/state_of_the_union.txt', encoding='utf-8') as f:
    state_of_the_union = f.read()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=20,
    length_function=len,
)
texts = text_splitter.split_text(state_of_the_union)
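
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately naive fixed-width splitter. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its chunks will differ from these. `split_with_overlap` is a hypothetical helper written for this illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Fixed-width chunks; each one repeats the last chunk_overlap characters of the previous."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous chunk.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The overlap keeps a little shared context between neighboring chunks, so a sentence cut at a boundary is still partly visible in both.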

In [5]:
embeddings = OpenAIEmbeddings()

In [6]:
docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{'source': str(i)} for i in range(len(texts))])

In [7]:
query = "What is the name of the vice president?"
docs = docsearch.similarity_search(query)

In [8]:
docs

[Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': '31'}),
 Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the S
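
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose embeddings are closest to it. The toy sketch below ranks three chunks by cosine similarity using hand-made 3-dimensional vectors in place of real embeddings. The vectors and chunk descriptions are invented, and the distance metric Chroma actually uses depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed, k=2):
    """Return the k texts whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical (text, embedding) pairs standing in for the vector store.
indexed = [
    ("chunk about the Supreme Court", [0.9, 0.1, 0.0]),
    ("chunk greeting the Vice President", [0.1, 0.9, 0.1]),
    ("chunk about the economy", [0.0, 0.2, 0.9]),
]
query_vector = [0.2, 0.8, 0.1]  # stand-in for the embedded question

print(top_k(query_vector, indexed, k=2))
```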

In [9]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

In [12]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ' The vice president is Kamala Harris.\nSOURCES: 0-pl'}

Try a question with a custom prompt:

In [45]:
template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES"). 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""

# create a prompt template
PROMPT = PromptTemplate(template=template, input_variables=["summaries", "question"])

# query 
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff", prompt=PROMPT)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': '\nI do not know the name of the vice president. \n\nSOURCES:\n31. https://www.whitehouse.gov/briefing-room/speeches-remarks/2021/04/28/remarks-by-president-biden-in-address-to-a-joint-session-of-congress/'}