# 🧪 Applied Prompting

### Config

In [1]:
from IPython.display import display, Markdown
from config import OPENAI_API_KEY
import openai
import os

In [2]:
# Set up your OpenAI API key
openai.api_key = OPENAI_API_KEY

# Define a helper that renders long strings as Markdown
def md_print(text):
    display(Markdown(text))

## Introduction

In this section we go over end-to-end prompt engineering processes written by community members!

Let's recreate our function for calling GPT with a single prompt, along with our chatbot class:

In [3]:
# Call the OpenAI API with a single prompt (uses the legacy openai<1.0 interface)
def call_GPT(prompt, model):
    if model == "gpt-3.5-turbo":
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        response = completion.choices[0].message.content
    elif model == "text-davinci-003":
        completion = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=2000,
        )
        response = completion["choices"][0]["text"]
    else:
        raise ValueError("Model must be gpt-3.5-turbo or text-davinci-003")
    # Print the prompt and the model's response
    md_print(f"Jacob: {prompt}")
    md_print(f"GPT: {response}")

# Create a chatbot class

class ChatBot:
    def __init__(self):
        # List to keep track of conversation history
        self.context = []
        
    def new_message(self, prompt, verbose_last_message_only=True):
        # Append user prompt to chatbot context
        self.context.append({"role": "user", "content": prompt})

        # Create assistant response
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=self.context
        )

        # Parse assistant response
        chat_response = completion.choices[0].message.content

        # Add assistant response to context
        self.context.append({"role": "assistant", "content": chat_response})

        # Select which messages to print
        if verbose_last_message_only:
            print_messages = self.context[-2:]
        else:
            print_messages = self.context

        # Print out conversation
        for message in print_messages:
            if message["role"] == "user":
                md_print(f'Jacob: {message["content"]}')
            else:
                md_print(f'GPT: {message["content"]}')  
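`ChatBot` resends its entire `self.context` list on every call, which is how the model "remembers" earlier turns. The sketch below separates that bookkeeping from the network call so it can be checked offline. It assumes a hypothetical `complete` callable that takes the message list and returns the reply text; `StubbableChatBot` is an illustrative name, not part of the OpenAI library. Unlike `ChatBot`, it returns the reply instead of printing it.

```python
from typing import Callable


class StubbableChatBot:
    """Same context handling as ChatBot, but the model call is injected."""

    def __init__(self, complete: Callable[[list[dict[str, str]]], str]):
        # complete: hypothetical callable mapping the message list to reply text
        self.complete = complete
        self.context: list[dict[str, str]] = []  # full conversation history

    def new_message(self, prompt: str) -> str:
        self.context.append({"role": "user", "content": prompt})
        # The whole history is sent on every turn; this is what gives the bot "memory"
        reply = self.complete(self.context)
        self.context.append({"role": "assistant", "content": reply})
        return reply


# Stub model that reports how many messages it received
bot = StubbableChatBot(lambda messages: f"reply to {len(messages)} message(s)")
print(bot.new_message("Hello"))       # reply to 1 message(s)
print(bot.new_message("And again?"))  # reply to 3 message(s)
```

With the legacy API used in this notebook, `complete` could wrap `openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages).choices[0].message.content`.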

## Multiple Choice Questions

In [4]:
# Standard prompt

standard_prompt = """
John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. 
Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. 
Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. 
Thus, the Korean sighting helps to confirm John of Worcester's sighting. Which one of the following, if true, most strengthens the argument?


a) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 
b) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 
c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 
d) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.
"""

call_GPT(standard_prompt, 'text-davinci-003')

Jacob: 
John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. 
Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. 
Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. 
Thus, the Korean sighting helps to confirm John of Worcester's sighting. Which one of the following, if true, most strengthens the argument?


a) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 
b) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 
c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 
d) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.


GPT: 
C) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea.

In [5]:
# Zero-shot chain-of-thought prompting
zero_shot_cot_prompt = """
John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. 
Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. 
Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. 
Thus, the Korean sighting helps to confirm John of Worcester's sighting. Which one of the following, if true, most strengthens the argument?


a) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 
b) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 
c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 
d) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.

Let's explain step by step
"""

call_GPT(zero_shot_cot_prompt, 'text-davinci-003')

Jacob: 
John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. 
Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. 
Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. 
Thus, the Korean sighting helps to confirm John of Worcester's sighting. Which one of the following, if true, most strengthens the argument?


a) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 
b) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 
c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 
d) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.

Let's explain step by step


GPT: 
(a) This answer choice does not strengthen the argument, since it suggests that the aurora borealis had nothing to do with the sunspots John of Worcester recorded, which weakens the argument instead.

(b) This answer choice does not strengthen the argument because it provides knowledge that is irrelevant to the argument.

(c) This answer choice is the correct answer since it strengthens the argument by providing evidence of a correlation between the two phenomena, suggesting that the sunspots John of Worcester recorded did indeed contribute to the Korean sighting of the aurora borealis.

(d) This answer choice does not strengthen the argument because it provides information that is not directly related to the argument.

(e) This answer choice does not strengthen the argument because it does not provide evidence of a correlation between the two phenomena.

In [6]:
# Reorder question items
reordered_items_prompt = """
John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. 
Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. 
Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. 
Thus, the Korean sighting helps to confirm John of Worcester's sighting. 
Which one of the following, if true, most strengthens the argument?

a) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity. 
b) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 
c) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 
d) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 
e) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 

Let's explain step by step
"""

call_GPT(reordered_items_prompt, 'text-davinci-003')

Jacob: 
John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. 
Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. 
Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. 
Thus, the Korean sighting helps to confirm John of Worcester's sighting. 
Which one of the following, if true, most strengthens the argument?

a) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity. 
b) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 
c) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 
d) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 
e) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 

Let's explain step by step


GPT: 
a) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity. 

This answer choice does not help to strengthen the argument, because the drawing does not provide evidence of a connection between the sighting of sunspots and the aurora borealis. 

b) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 

This answer choice does not help to strengthen the argument, because the weather conditions do not provide evidence of a connection between the sighting of sunspots and the aurora borealis. 

c) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 

This answer choice does not help to strengthen the argument, because this suggests that sunspot activity and an aurora borealis can occur independently of one another. 

d) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 

This answer choice does not help to strengthen the argument, because this does not provide evidence of a connection between the sighting of sunspots and the aurora borealis. 

e) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 

This is the correct answer choice because it strengthens the argument by providing evidence of a connection between the sighting of sunspots and the aurora borealis. This suggests that the sunspot activity reported by John of Worcester indeed caused the aurora borealis seen five days later in Korea.
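Reordering by hand makes it easy to lose track of where the intended answer ended up. A sketch, assuming a hypothetical `reorder_choices` helper, that applies the same permutation used in the cell above (original e, d, a, b, c shown as a to e) and reports the new letter of the answer the earlier runs picked (originally c).

```python
from string import ascii_lowercase


def reorder_choices(choices, order, correct_index):
    """Permute answer choices and report the correct answer's new letter.

    order[i] is the index, in the original list, of the choice shown at position i.
    """
    if sorted(order) != list(range(len(choices))):
        raise ValueError("order must be a permutation of the choice indices")
    reordered = [choices[i] for i in order]
    new_letter = ascii_lowercase[order.index(correct_index)]
    return reordered, new_letter


original = ["A-choice", "B-choice", "C-choice (correct)", "D-choice", "E-choice"]
# Same permutation as the reordered prompt: new a..e = original e, d, a, b, c
reordered, letter = reorder_choices(original, [4, 3, 0, 1, 2], correct_index=2)
print(letter)  # e
```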

In [7]:
# Reword the question

# Kept "Which one of the following, if true, most strengthens the argument?" and appended
# "Identify each choice as strengthens, weakens or doesn't impact the argument." after the choices
reworded_question_prompt = """
John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. 
Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. 
Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. 
Thus, the Korean sighting helps to confirm John of Worcester's sighting. 
Which one of the following, if true, most strengthens the argument? 

a) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 
b) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 
c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 
d) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.

Identify each choice as strengthens, weakens or doesn't impact the argument and explain step by step.
"""

call_GPT(reworded_question_prompt, 'text-davinci-003')

Jacob: 
John of Worcester, an English monk, recorded the sighting, on December 8, 1128, of two unusually large sunspots. 
Five days later a brilliant aurora borealis (northern lights) was observed in southern Korea. 
Sunspot activity is typically followed by the appearance of an aurora borealis, after a span of time that averages five days. 
Thus, the Korean sighting helps to confirm John of Worcester's sighting. 
Which one of the following, if true, most strengthens the argument? 

a) An aurora borealis can sometimes occur even when there has been no significant sunspot activity in the previous week. 
b) Chinese sources recorded the sighting of sunspots more than 1000 years before John of Worcester did. 
c) Only heavy sunspot activity could have resulted in an aurora borealis viewable at a latitude as low as that of Korea. 
d) Because it is impossible to view sunspots with the naked eye under typical daylight conditions, the sighting recorded by John of Worcester would have taken place under unusual weather conditions such as fog or thin clouds. 
e) John of Worcester's account included a drawing of the sunspots, which could be the earliest illustration of sunspot activity.

Identify each choice as strengthens, weakens or doesn't impact the argument and explain step by step.


GPT: 
A) Strengthens - This strengthens the argument because it provides a reason why the aurora borealis might have been visible five days after the sighting of sunspots. Even though there had been no significant sunspot activity in the week leading up to the aurora, it is possible that small amounts of sunspot activity prior to the week before can still lead to an aurora. 

B) Doesn't Impact - This choice does not impact the argument. It provides no information to indicate whether or not the two sightings (the sunspots and the aurora) are connected to one another. 

C) Strengthens - This strengthens the argument because it provides evidence that the aurora borealis observed in Korea could have been a result of the sunspot activity observed in England. By noting that heavy sunspot activity would have been necessary to produce an aurora at such a low latitude, it becomes more likely that the two events are connected.

D) Weakens - This weakens the argument because it provides evidence that the sunspots observed may not have been caused by sunspot activity. If the conditions were not ideal for viewing, then it is possible that John of Worcester observed something else in the sky which he mistaken for sunspots. 

E) Doesn't Impact - This choice does not impact the argument. It does not provide any information that supports a connection between the two sightings.
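Asking for a label per choice makes the output easy to check programmatically. The sketch below is a hypothetical regex-based parser for the "A) Strengthens - ..." layout seen in this response; it assumes the model keeps that layout, which is not guaranteed. Applied to the labels above (abbreviated here), it shows that two choices were labelled as strengthening, even though the question asks for the single strongest one.

```python
import re

# Matches lines such as "A) Strengthens - ..." or "B) Doesn't Impact - ..."
LABEL_RE = re.compile(
    r"^\s*([a-e])\)\s*(strengthens|weakens|doesn['’]t impact)\b",
    re.IGNORECASE | re.MULTILINE,
)


def parse_labels(response: str) -> dict[str, str]:
    """Map each choice letter (lowercase) to its normalised label."""
    return {
        letter.lower(): label.lower().replace("’", "'")
        for letter, label in LABEL_RE.findall(response)
    }


response = """A) Strengthens - ...
B) Doesn't Impact - ...
C) Strengthens - ...
D) Weakens - ...
E) Doesn't Impact - ..."""

labels = parse_labels(response)
strengtheners = sorted(k for k, v in labels.items() if v == "strengthens")
print(strengtheners)  # ['a', 'c']
```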

Let's try a different problem now.

In [8]:
# Try a math problem
bayes_prompt = """
Consider two medical tests, A and B, for a virus. 
Test A is 90% effective at recognizing the virus when it is present, but has a 5% false positive rate (indicating that the virus is present, when it is not). 
Test B is 95% effective at recognizing the virus, but has a 10% false positive rate.
The two tests use independent methods of identifying the virus. 
The virus is carried by 2% of all people.

(a) Say that a person is tested for the virus using only Test A. 
What is the probability that the person is really carrying the virus given that Test A came back positive? (2 points)

(b) Say that a person is tested for the virus using only Test B. 
What is the probability that the person is really carrying the virus given that Test B came back positive? (2 points)

(c) Say that a person is tested for the virus using both tests. 
What is the probability that the person is really carrying the virus given that both tests came back positive? (2 points)
"""

call_GPT(bayes_prompt, 'text-davinci-003')

Jacob: 
Consider two medical tests, A and B, for a virus. 
Test A is 90% effective at recognizing the virus when it is present, but has a 5% false positive rate (indicating that the virus is present, when it is not). 
Test B is 95% effective at recognizing the virus, but has a 10% false positive rate.
The two tests use independent methods of identifying the virus. 
The virus is carried by 2% of all people.

(a) Say that a person is tested for the virus using only Test A. 
What is the probability that the person is really carrying the virus given that Test A came back positive? (2 points)

(b) Say that a person is tested for the virus using only Test B. 
What is the probability that the person is really carrying the virus given that Test B came back positive? (2 points)

(c) Say that a person is tested for the virus using both tests. 
What is the probability that the person is really carrying the virus given that both tests came back positive? (2 points)


GPT: 
a) The probability that the person is really carrying the virus given that Test A came back positive is 90%.

b) The probability that the person is really carrying the virus given that Test B came back positive is 95%.

c) The probability that the person is really carrying the virus given that both tests came back positive is (0.95 * 0.90) * 0.02 = 0.1688 = 16.88%.
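These answers are not correct: (a) and (b) restate each test's detection rate rather than the probability of carrying the virus, and in (c) the product 0.95 * 0.90 * 0.02 equals 0.0171, not 0.1688. Applying Bayes' rule directly gives reference values to compare against. The sketch assumes, as the problem states, that the two tests are independent given whether the person carries the virus; `posterior` is an illustrative helper name.

```python
def posterior(prior, sensitivities, false_positive_rates):
    """P(virus | every listed test positive), assuming the tests are
    conditionally independent given infection status."""
    p_pos_given_virus = 1.0
    p_pos_given_no_virus = 1.0
    for sens, fpr in zip(sensitivities, false_positive_rates):
        p_pos_given_virus *= sens      # true positive rate for this test
        p_pos_given_no_virus *= fpr    # false positive rate for this test
    numerator = p_pos_given_virus * prior
    evidence = numerator + p_pos_given_no_virus * (1 - prior)
    return numerator / evidence


prior = 0.02
print(round(posterior(prior, [0.90], [0.05]), 4))                # (a) 0.2687
print(round(posterior(prior, [0.95], [0.10]), 4))                # (b) 0.1624
print(round(posterior(prior, [0.90, 0.95], [0.05, 0.10]), 4))    # (c) 0.7773
```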

In [9]:
# Try a math problem with some additional context
additional_context_bayes_prompt = """
Consider two medical tests, A and B, for a virus. 
Test A is 90% effective at recognizing the virus when it is present, but has a 5% false positive rate (indicating that the virus is present, when it is not). 
Test B is 95% effective at recognizing the virus, but has a 10% false positive rate.
The two tests use independent methods of identifying the virus. 
The virus is carried by 2% of all people.

(a) Say that a person is tested for the virus using only Test A. 
What is the probability that the person is really carrying the virus given that Test A came back positive? (2 points)

(b) Say that a person is tested for the virus using only Test B. 
What is the probability that the person is really carrying the virus given that Test B came back positive? (2 points)

(c) Say that a person is tested for the virus using both tests. 
What is the probability that the person is really carrying the virus given that both tests came back positive? (2 points)

Let's explain step by step. Give the numerical expression as answer, do not return a number. The formula for bayes is
"""

call_GPT(additional_context_bayes_prompt, 'text-davinci-003')

Jacob: 
Consider two medical tests, A and B, for a virus. 
Test A is 90% effective at recognizing the virus when it is present, but has a 5% false positive rate (indicating that the virus is present, when it is not). 
Test B is 95% effective at recognizing the virus, but has a 10% false positive rate.
The two tests use independent methods of identifying the virus. 
The virus is carried by 2% of all people.

(a) Say that a person is tested for the virus using only Test A. 
What is the probability that the person is really carrying the virus given that Test A came back positive? (2 points)

(b) Say that a person is tested for the virus using only Test B. 
What is the probability that the person is really carrying the virus given that Test B came back positive? (2 points)

(c) Say that a person is tested for the virus using both tests. 
What is the probability that the person is really carrying the virus given that both tests came back positive? (2 points)

Let's explain step by step. Give the numerical expression as answer, do not return a number. The formula for bayes is


GPT: P(A|B) = P(B|A) * P(A) / P(B)


(a) P(Virus | Test A positive) = P(Test A positive | Virus) * P(Virus) / P(Test A positive) = (0.90 * 0.02) / (0.90 * 0.02 + 0.05 * 0.98)

(b) P(Virus | Test B positive) = P(Test B positive | Virus) * P(Virus) / P(Test B positive) = (0.95 * 0.02) / (0.95 * 0.02 + 0.10 * 0.98)

(c) P(Virus | Test A positive & Test B positive) = P(Test A positive & Test B positive | Virus) * P(Virus) / P(Test A positive & Test B positive) = (0.90 * 0.95 * 0.02) / (0.90 * 0.95 * 0.02 + 0.05 * 0.10 * 0.98)

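When asked only for the expressions, the model set up all three calculations correctly, whereas its earlier numeric answers were wrong. Frameworks like MRKL, which give the LLM access to external tools such as a calculator, can improve the model's responses substantially, and with less work than complex prompt engineering.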
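Evaluating the model's expressions is exactly the kind of step a tool can take over. A minimal MRKL-style sketch, assuming a hypothetical `calculator` tool that the model's output is routed to: it evaluates only numeric literals with `+`, `-`, `*`, `/` and parentheses by walking Python's `ast`, rather than calling `eval` on model output. This illustrates the idea; it is not the MRKL paper's implementation.

```python
import ast
import operator

# Arithmetic operators the hypothetical calculator tool will evaluate
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression: str) -> float:
    """Safely evaluate +, -, *, / and parentheses over numeric literals."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Anything else (names, calls, attributes, ...) is rejected
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))


# Expression copied from GPT's answer to part (a)
answer_a = "(0.90 * 0.02) / (0.90 * 0.02 + 0.05 * 0.98)"
print(round(calculator(answer_a), 4))  # 0.2687
```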

## Solve Discussion Questions

Discussion questions are common in college courses, and responses to them usually run from 100 to 700 words. Below we look at some prompting methods that improve a language model's responses to these questions.
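Since the expected length is given as a range, a quick word count is one way to compare the responses produced by different prompts. A rough sketch; `within_discussion_length` is a hypothetical helper that counts whitespace-separated tokens, so headings such as "Introduction" count as words.

```python
def within_discussion_length(text: str, low: int = 100, high: int = 700) -> bool:
    """Rough check that a response falls in the expected word range.

    Words are counted by splitting on whitespace.
    """
    n_words = len(text.split())
    return low <= n_words <= high


short_reply = "Overfitting happens when a model memorises its training data."
print(len(short_reply.split()), within_discussion_length(short_reply))  # 9 False
```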

In [10]:
# Example 1: Simple Prompt
simple_discussion_question_1_prompt = """
Respond to the following:

Overfitting is a common problem in machine learning. Can you recall a real-world scenario where you noticed overfitting? 
"""

call_GPT(simple_discussion_question_1_prompt, 'text-davinci-003')

Jacob: 
Respond to the following:

Overfitting is a common problem in machine learning. Can you recall a real-world scenario where you noticed overfitting? 


GPT: 
A real-world scenario where I have noticed overfitting is in using machine learning algorithms to design a marketing strategy. If the algorithm is trained on certain patterns in customer data, it may fail to generalize or recognize significant changes in customer behavior or preferences that could potentially impact the effectiveness of the strategy. As a result, the algorithm could end up providing recommendations that may be overly specific to the data set used to train it, resulting in decisions that may be too tailored to the particular circumstances of the training data and not effective when put into practice.

In [11]:
# Example 1: Improved Prompt
improved_discussion_question_1_prompt = """
Write a highly detailed essay with introduction, body, and conclusion paragraphs responding to the following:

Overfitting is a common problem in machine learning. Can you recall a real-world scenario where you noticed overfitting? 
"""

call_GPT(improved_discussion_question_1_prompt, 'text-davinci-003')

Jacob: 
Write a highly detailed essay with introduction, body, and conclusion paragraphs responding to the following:

Overfitting is a common problem in machine learning. Can you recall a real-world scenario where you noticed overfitting? 


GPT: 
Introduction
In today’s day and age, machine learning is playing an increasingly important role in our lives. While machine learning offers powerful tools to make predictions and uncover insights from data, it is also prone to a common problem known as overfitting. Overfitting occurs when a machine learning model fits too closely to the given training data at the expense of accurately predicting unseen data. In this essay, I will discuss a real-world scenario where I noticed overfitting.

Body
I noticed overfitting recently when I was developing a machine learning algorithm to identify high-value customers of a business and allocate promotional budget accordingly. I took into account various features like age, marital status, geographic location, profession, and frequency of visits. While the model was able to identify high-value customers with great accuracy using the training data, it failed to accurately predict the value of new customers. It turns out that the model had overfitted on the training data and could not generalize beyond it.

To address the issue, I re-examined the data and put more emphasis on customer purchase history, product preferences, and promotion responsiveness. I also used regularization techniques like L1 and L2 to reduce overfitting. After making these changes, the model was able to generalize better and provided consistent results on unseen data.

Conclusion
Overfitting is a common problem in machine learning and can lead to inaccurate predictions. In this essay, I highlighted a real-world scenario where I dealt with overfitting while developing a machine learning model. I provided key steps to address the issue, such as emphasizing customer purchase history, product preferences, and promotion responsiveness, as well as applying regularization techniques like L1 and L2. A careful approach to tackling overfitting is necessary to ensure accurate predictions on unseen data.

In [12]:
# Example 2: Simple Prompt
simple_discussion_question_2_prompt = """
The terms AI and Data Science can be used interchangeably. Agree or disagree, and why?
"""

call_GPT(simple_discussion_question_2_prompt, 'gpt-3.5-turbo')

Jacob: 
The terms AI and Data Science can be used interchangeably. Agree or disagree, and why?


GPT: As an AI language model, I would disagree with this statement. While AI and Data Science are closely related, they are not interchangeable terms. AI refers to the simulation of human intelligence in machines that can perform tasks that typically require human intelligence, such as learning, problem-solving, decision-making, and recognizing speech and images. On the other hand, Data Science is a multidisciplinary field that involves the collection, storage, analysis, and interpretation of large amounts of data to extract insights and knowledge from them.

While AI relies on data and data science techniques to train machine learning models and other algorithms, AI is broader in scope and includes a range of techniques and approaches beyond just data science. So, while there is undoubtedly some overlap between the two fields, they are not interchangeable terms.

In [13]:
# Example 2: Improved Prompt
improved_discussion_question_2_prompt = """
Write a highly detailed discussion response, in the structure of an essay, responding to the following prompt:

The terms AI and Data Science can be used interchangeably.
"""

call_GPT(improved_discussion_question_2_prompt, 'gpt-3.5-turbo')

Jacob: 
Write a highly detailed discussion response, in the structure of an essay, responding to the following prompt:

The terms AI and Data Science can be used interchangeably.


GPT: The terms Artificial Intelligence (AI) and Data Science are commonly used in the contemporary technology industry. While these two terminologies share some similarities, they are distinct and cannot be used interchangeably. Artificial Intelligence is a branch of computer science that enables machines to perform intelligent decision-making processes that would typically require human intervention. On the other hand, Data Science is an interdisciplinary field that involves the use of statistical and computational methods to extract insights from data. This essay will argue that AI and Data Science are distinct disciplines and cannot be used interchangeably.

Artificial Intelligence is a discipline where computers are programmed to mimic human intelligence and behaviors such as speech recognition, object recognition, decision making, and prediction. AI uses algorithms, machine learning, and deep learning models to learn from data and perform specific tasks. The foundation of AI is based on developing machines that can learn, reason, perceive, and communicate like humans. AI systems operate in complex environments, making it possible for them to perform tasks that would require human intelligence, such as language translation, image recognition, and speech recognition. Therefore, AI cannot be used interchangeably with Data Science, which primarily focuses on extracting insights from data.

Data Science, on the other hand, is an interdisciplinary field that uses various techniques to extract insights from structured, semi-structured, and unstructured data. Data Science involves the use of statistical and computational methods to analyze data and extract insights. Statisticians, computer scientists, and domain experts team up to design models that can be used to solve complex business problems. Moreover, Data Science involves data acquisition, data extraction, data cleaning, data transformation, data visualization, and data modeling. Data Scientists use different tools and techniques such as machine learning, natural language processing (NLP), and pattern recognition to explore large datasets and derive insights.

One of the main differences between AI and Data Science is their focus. AI is concerned with developing machines that can exhibit human-like behavior, while Data Science is focused on deriving insights from data. This focus results in differences in the techniques and tools used by the two disciplines. AI primarily uses machine learning, deep learning, and natural language processing techniques to learn from data, while Data Science primarily uses statistical and computational techniques such as regression analysis, clustering, and classification.

Additionally, AI requires a significant amount of labeled data to train and refine its models, while Data Science requires a smaller amount of data since its focus is on extracting insights rather than creating predictive models. AI algorithms are also complex and require a high level of expertise to develop and implement. In contrast, the development and implementation of Data Science algorithms are more streamlined, and Data Scientists do not require expert knowledge in computer science or deep learning.

In conclusion, it is clear that AI and Data Science are distinct disciplines that cannot be used interchangeably. While they share some similarities, their focus, capabilities, and techniques differ significantly. The distinction between AI and Data Science is crucial in the technology industry as it helps organizations to implement the appropriate technology for solving specific business problems. The use of the two disciplines interchangeably can cause confusion and undermine the value each brings to the table. Therefore, it is essential to understand the differences between the two disciplines to make informed decisions when implementing technology solutions.

In [15]:
# Example 3: Starting Prompt
chatbot1 = ChatBot()
starting_discussion_question_3_prompt = """
Write a beginning paragraph responding to the following:

There's a growing interest in using data science for social good. 
Can you discuss an area or issue where you think data science could make a significant positive impact?
"""

# ChatBot always uses gpt-3.5-turbo; its optional second argument is verbose_last_message_only, not a model name
chatbot1.new_message(starting_discussion_question_3_prompt)

Jacob: 
Write a beginning paragraph responding to the following:

There's a growing interest in using data science for social good. 
Can you discuss an area or issue where you think data science could make a significant positive impact?


GPT: As the world becomes increasingly data-driven, more individuals and organizations are looking to use data science for social good. The potential applications of data science are endless, and one area where it could make a significant impact is in public health. With large amounts of data on disease outbreaks, hospital admissions, and drug efficacy, data science can help us identify patterns and develop targeted interventions. By harnessing the power of data, we could improve health outcomes, reduce healthcare costs, and save lives.

In [16]:
# Example 3: Iteration Prompt
iteration_discussion_question_3_prompt = """
I am writing a detailed short essay responding to the following prompt:

There's a growing interest in using data science for social good. 
Can you discuss an area or issue where you think data science could make a significant positive impact?

Here is what I have so far:

{previous_response}

Write the next paragraph of my essay.
""".format(previous_response=chatbot1.context[-1]['content'])

chatbot1.new_message(iteration_discussion_question_3_prompt)

Jacob: 
I am writing a detailed short essay responding to the following prompt:

There's a growing interest in using data science for social good. 
Can you discuss an area or issue where you think data science could make a significant positive impact?

Here is what I have so far:

As the world becomes increasingly data-driven, more individuals and organizations are looking to use data science for social good. The potential applications of data science are endless, and one area where it could make a significant impact is in public health. With large amounts of data on disease outbreaks, hospital admissions, and drug efficacy, data science can help us identify patterns and develop targeted interventions. By harnessing the power of data, we could improve health outcomes, reduce healthcare costs, and save lives.

Write the next paragraph of my essay.


GPT: One specific example where data science has already made a significant impact in public health is in the fight against HIV/AIDS. For decades, the HIV epidemic has plagued the world, with millions of people affected and limited access to treatment in certain regions. However, with the help of data science, we have seen progress in combating this disease. For instance, researchers have used machine learning to predict which patients are at the greatest risk of dropping out of treatment, allowing healthcare providers to intervene early and prevent treatment interruptions. Additionally, data analytics has helped identify regions with higher rates of HIV transmission, leading to more targeted prevention efforts. Overall, data science has played a crucial role in improving the prevention, treatment, and management of HIV/AIDS.
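The starting/iteration pattern above extends naturally to a loop. The sketch below is a minimal illustration, not the notebook's code. It assumes a hypothetical `generate(prompt) -> str` callable that stands in for a model call. Unlike Example 3, which feeds back only the previous response, it passes the whole essay written so far, so each new paragraph can see everything before it.

```python
from typing import Callable, List

# The essay question used in Example 3
ESSAY_QUESTION = (
    "There's a growing interest in using data science for social good.\n"
    "Can you discuss an area or issue where you think data science could make a significant positive impact?"
)

def starting_prompt(question: str) -> str:
    """Prompt for the opening paragraph."""
    return f"Write a beginning paragraph responding to the following:\n\n{question}\n"

def iteration_prompt(question: str, essay_so_far: str) -> str:
    """Prompt that shows the model the essay so far and asks for the next paragraph."""
    return (
        "I am writing a detailed short essay responding to the following prompt:\n\n"
        f"{question}\n\n"
        "Here is what I have so far:\n\n"
        f"{essay_so_far}\n\n"
        "Write the next paragraph of my essay.\n"
    )

def write_essay(question: str, generate: Callable[[str], str], num_paragraphs: int = 3) -> List[str]:
    """Build an essay paragraph by paragraph, feeding all text so far back into each prompt."""
    paragraphs = [generate(starting_prompt(question)).strip()]
    while len(paragraphs) < num_paragraphs:
        essay_so_far = "\n\n".join(paragraphs)
        paragraphs.append(generate(iteration_prompt(question, essay_so_far)).strip())
    return paragraphs

# Offline check with a fake model that only reports which kind of prompt it received
def fake_generate(prompt: str) -> str:
    return "NEXT" if "Here is what I have so far" in prompt else "FIRST"

print(write_essay(ESSAY_QUESTION, fake_generate, num_paragraphs=3))
# ['FIRST', 'NEXT', 'NEXT']
```

In practice, `generate` would wrap a model call that returns the reply text instead of printing it. Because the prompt grows with every paragraph, long essays will eventually run into the model's token limit.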

## Build ChatGPT out of GPT-3

Here we build a simple imitation of ChatGPT on top of GPT-3 (`text-davinci-003`). Essentially, all we need to do is build the context window for the conversation: every prompt contains the instructions, the conversation history so far, and the new user message. We use a tokenizer (`tiktoken`) to make sure the prompt never exceeds GPT-3's token limit, no matter how long the conversation goes.

In [28]:
!pip install tiktoken

In [21]:
import openai
import tiktoken

class GPT3ChatBot:
    model_engine = "text-davinci-003"
    encoding = tiktoken.encoding_for_model("text-davinci-003")

    # The prompt and the completion share one context window (about 4,096 tokens for
    # text-davinci-003), so the prompt must leave room for max_response_tokens.
    max_context_tokens = 4096
    max_response_tokens = 2048
    max_prompt_tokens = max_context_tokens - max_response_tokens

    chatbot_prompt = """
    As an advanced chatbot, your primary goal is to assist users to the best of your ability. This may involve answering questions, providing helpful information, or completing tasks based on user input. In order to effectively assist users, it is important to be detailed and thorough in your responses. Use examples and evidence to support your points and justify your recommendations or solutions.

    <conversation history>

    User: <user input>
    Chatbot: """

    def __init__(self):
        # One entry per dialogue turn: "User: ...\nChatbot: ...\n"
        self.turns = []

    @classmethod
    def num_tokens_from_string(cls, string):
        """Returns the number of tokens in a text string."""
        return len(cls.encoding.encode(string))

    def build_prompt(self, user_input):
        """Fill the prompt template with the conversation history and the new user input."""
        history = "".join(self.turns)
        return self.chatbot_prompt.replace("<conversation history>", history).replace("<user input>", user_input)

    def trim_history(self, user_input):
        """
        Trim the conversation history until the prompt fits within the token budget.

        Whole dialogue turns are removed (oldest first), so a user message is never
        separated from its reply, even when the reply spans several lines.
        """
        while self.turns and self.num_tokens_from_string(self.build_prompt(user_input)) > self.max_prompt_tokens:
            self.turns.pop(0)

    def get_response(self, user_input):
        """Generate a response from GPT-3 given the conversation history and user input."""
        # Drop the oldest turns if the prompt would not fit alongside the response
        self.trim_history(user_input)
        prompt = self.build_prompt(user_input)

        # Get the response from GPT-3; stop before the model writes the next "User:" turn itself
        response = openai.Completion.create(
            engine=self.model_engine, prompt=prompt, max_tokens=self.max_response_tokens,
            n=1, stop=["\nUser:"], temperature=0.5)

        # Extract and return the response text
        return response["choices"][0]["text"].strip()

    def new_message(self, user_input):
        chatbot_response = self.get_response(user_input)
        self.turns.append(f"User: {user_input}\nChatbot: {chatbot_response}\n")
        print(f"User: {user_input}\nChatbot: {chatbot_response}")

gpt3_chatbot = GPT3ChatBot()

In [22]:
gpt3_chatbot.new_message("Hello")

User: Hello
Chatbot: Hi there! How can I help you today?


In [24]:
gpt3_chatbot.new_message("Write me a short essay on the evolution of AI from the turing machine to GPT-3")

User: Write me a short essay on the evolution of AI from the turing machine to GPT-3
Chatbot: The Turing Machine, developed by Alan Turing in 1936, is considered the first example of artificial intelligence (AI). It was designed to simulate the behavior of a human being, and was capable of solving problems that were too complex for humans to solve. The Turing Machine paved the way for the development of more sophisticated AI systems, such as neural networks and deep learning. 

In the 1950s, the first neural networks were developed, which were able to learn from data and make decisions based on the data they had been given. This allowed for the development of more complex AI systems that were able to solve complex problems.

In the 1980s, AI systems began to use machine learning algorithms to improve their performance. This allowed AI systems to become more accurate and efficient in their decision-making.

In the 1990s, AI systems began to use natural language processing (NLP) to under

In [25]:
gpt3_chatbot.new_message("What are the most important points to note from that short essay? Keep your response concise.")

User: What are the most important points to note from that short essay? Keep your response concise.
Chatbot: The most important points to note from the essay are: 
1. The Turing Machine was the first example of artificial intelligence (AI). 
2. Neural networks and deep learning were developed in the 1950s. 
3. Machine learning algorithms were used in the 1980s to improve AI performance. 
4. Natural language processing (NLP) was introduced in the 1990s for better human-machine interaction. 
5. GPT-3 is the most recent example of AI, which can generate human-like text. 
6. AI is becoming increasingly sophisticated and integrated into our everyday lives.


In [27]:
gpt3_chatbot.new_message("Why do you believe those are the most important points?")

User: Why do you believe those are the most important points?
Chatbot: These points represent the key milestones that have been achieved in the development of AI. The Turing Machine was the first example of AI, and it paved the way for more sophisticated AI systems such as neural networks and deep learning. Machine learning algorithms allowed for improved AI performance, and NLP allowed for better human-machine interaction. GPT-3 is the most recent example of AI, and it is able to generate human-like text. These points demonstrate the progress AI has made over the years, and how it is becoming increasingly integrated into our everyday lives.


## Chatbot + Knowledge Base

#### Intent-Based Chatbots
Traditional chatbots, known as Intent-Based Chatbots, answer a question by matching it to a user intent and returning the response associated with that intent. Each intent is defined by a set of example questions and a response. The user input is compared against the example questions in this repository, and the response of the most similar one is returned.

Problems with this method include the large number of pre-defined intents you need to create and the possibility of matching to the wrong intent.
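A minimal sketch of intent matching follows. It assumes a small hypothetical intent repository and uses the standard library's `difflib.SequenceMatcher` string similarity as a stand-in for whatever similarity measure a real system would use. The `threshold` parameter shows one way to fall back when no intent matches well, which is where the wrong-intent problem shows up.

```python
from difflib import SequenceMatcher

# Hypothetical intent repository: each intent has example questions and one canned response
INTENTS = {
    "login_help": {
        "examples": ["I can't log in", "how do I sign in", "login not working"],
        "response": "To log in, click the Login button and enter your username and password.",
    },
    "reset_password": {
        "examples": ["I forgot my password", "how do I reset my password"],
        "response": "Use the 'Forgot password' link on the login page to reset it.",
    },
}

FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"

def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_intent(user_input: str, threshold: float = 0.5):
    """Return (intent_name, score) for the most similar example question, or (None, score)."""
    best_intent, best_score = None, 0.0
    for name, intent in INTENTS.items():
        for example in intent["examples"]:
            score = similarity(user_input, example)
            if score > best_score:
                best_intent, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score

def respond(user_input: str) -> str:
    """Return the canned response of the matched intent, or a fallback message."""
    intent, _ = match_intent(user_input)
    return INTENTS[intent]["response"] if intent else FALLBACK

print(respond("I forgot my password"))
```

Every new kind of question needs another hand-written intent, and a question phrased close to the wrong example question still gets the wrong canned answer.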

#### How GPT-3 can help
GPT-3 can improve upon Intent-Based Chatbots. Instead of writing a canned response for every intent, you write documents containing the relevant information and have GPT-3 distill that information into a response for the user. This removes the need for a massive list of intents and associated responses. Intents are instead mapped to specific documents, and a similarity search finds the document to use as context for the response.

This addresses some of the pitfalls of the Intent-Based Chatbot, but not all of them. We still need similarity matching, because we cannot fit all of the documents into GPT-3's context window, so the wrong document can still be selected.

#### Generating Answers from a Knowledge Base with GPT-3
The steps for this improved chatbot would include:

1. Select the appropriate intent, and therefore document, for the user question.
2. Generate the response given the document associated with the intent.

Semantic search can be used for the first step. A carefully crafted prompt can then handle the second.
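A minimal sketch of the retrieval step follows, assuming a small hypothetical knowledge base. Semantic search would normally compare embedding vectors from an embedding model. To stay self-contained, this sketch substitutes bag-of-words cosine similarity, which only matches shared words, not meaning.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: each document title maps to its text
DOCUMENTS = {
    "Login from website": "Open your web browser, go to the website and click the Login button in the top right corner.",
    "Login from mobile app": "Open the mobile app and tap the Login button in the bottom right corner.",
    "Delete account": "Go to Settings, choose Account and tap Delete account to permanently remove your data.",
}

def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 1):
    """Return the titles of the k documents most similar to the question."""
    q = bag_of_words(question)
    return sorted(
        DOCUMENTS,
        key=lambda title: cosine(q, bag_of_words(title + " " + DOCUMENTS[title])),
        reverse=True,
    )[:k]

print(retrieve("How do I delete my account?"))
```

The retrieved document's text is what gets placed into the prompt as context for the second step.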

Your prompt for the response may include:

- Role-prompting
- Knowledge Base data
- Conversation history
- User question
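A minimal sketch of assembling those four parts into one completion prompt follows. `build_kb_prompt` is a hypothetical helper. The `START CONTEXT` / `END CONTEXT` delimiters and the `SPEAKER: text` turn format mirror the hand-written Skippy prompt in this section. The ordering is one reasonable choice, not a requirement: instructions, then context, then dialogue, ending with an open bot turn for the model to complete.

```python
from typing import List, Tuple

def build_kb_prompt(role: str, document: str, history: List[Tuple[str, str]],
                    question: str, bot_name: str = "SKIPPY") -> str:
    """Assemble role prompt, knowledge-base context, conversation history and the new question."""
    lines = [role, "", "START CONTEXT", document.strip(), "END CONTEXT", ""]
    # Earlier dialogue turns as (speaker, text) pairs
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"USER: {question}")
    # Leave the bot's turn open so the model continues as the bot
    lines.append(f"{bot_name}:")
    return "\n".join(lines)

prompt = build_kb_prompt(
    role="As an advanced chatbot named Skippy, your primary goal is to assist users to the best of your ability.",
    document="Login to VideoGram from Mobile App\n1. Open the VideoGram app on your mobile device.",
    history=[("SKIPPY", "Hello, I'm Skippy! How can I help you?")],
    question="Can't login",
)
print(prompt)
```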

Let's try it out:

In [29]:
kb_chatbot = GPT3ChatBot()

In [30]:
prompt = """
As an advanced chatbot named Skippy, your primary goal is to assist users to the best of your ability.

START CONTEXT
Login to VideoGram from Website
1. Open your web browser and go to the VideoGram website.
2. Click on the “Login” button located in the top right corner of the page.
3. On the login page, enter your VideoGram username and password.
4. Once you have entered your credentials, click on the “Login” button.
5. You should now be logged in to your VideoGram account.

Login to VideoGram from Mobile App
1. Open the VideoGram app on your mobile device.
2. On the main page, tap the “Login” button located in the bottom right corner.
3. On the login page, enter your VideoGram username and password.
4. Once you have entered your credentials, tap the “Login” button.
5. You should now be logged in to your VideoGram account.
END CONTEXT

SKIPPY: Hello, I’m Skippy! How can I help you?
USER: Can’t login
SKIPPY:
"""

kb_chatbot.new_message(prompt)

User: 
As an advanced chatbot named Skippy, your primary goal is to assist users to the best of your ability.

START CONTEXT
Login to VideoGram from Website
1. Open your web browser and go to the VideoGram website.
2. Click on the “Login” button located in the top right corner of the page.
3. On the login page, enter your VideoGram username and password.
4. Once you have entered your credentials, click on the “Login” button.
5. You should now be logged in to your VideoGram account.

Login to VideoGram from Mobile App
1. Open the VideoGram app on your mobile device.
2. On the main page, tap the “Login” button located in the bottom right corner.
3. On the login page, enter your VideoGram username and password.
4. Once you have entered your credentials, tap the “Login” button.
5. You should now be logged in to your VideoGram account.
END CONTEXT

SKIPPY: Hello, I’m Skippy! How can I help you?
USER: Can’t login
SKIPPY:

Chatbot: I'm sorry to hear that you are having trouble logging in. Are

In [31]:
kb_chatbot.new_message("mobile")

User: mobile
Chatbot: No problem! To log in from the mobile app, you will need to open the VideoGram app on your mobile device. Then, tap the “Login” button located in the bottom right corner. On the login page, enter your VideoGram username and password. Once you have entered your credentials, tap the “Login” button. You should now be logged in to your VideoGram account. If you are still having trouble, please let me know and I will do my best to help.


#### Conclusion

GPT-3 can be helpful in creating conversational chatbots, although it can be difficult to use it for answering complex questions, since it can only reliably draw on the information in its context. The model tends to hallucinate when asked questions outside of that context. If the wrong document is retrieved, the model has to answer the domain-specific question without the relevant knowledge.

The good news is that when the relevant information is in the context, the model performs well at generating correct responses!