In [1]:
# Custom Chatbot Project
# I have selected the Wikipedia article on the year 2023 as the dataset for my project.
# The reason for choosing this dataset is that it contains information about various events that occurred in 2023, which are not present in the ChatGPT's current knowledge base.
# This makes it an ideal choice to showcase the effectiveness of my custom prompts.

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [2]:
import requests

In [3]:
params = {
    "action": "query", 
    "prop": "extracts",
    "exlimit": 1,
    "titles": "2023",
    "explaintext": 1,
    "formatversion": 2,
    "format": "json"
}

response = requests.get("https://en.wikipedia.org/w/api.php", params=params)
response_dict = response.json()["query"]["pages"][0]["extract"].split("\n")
response_dict

['2023 (MMXXIII) was a common year starting on Sunday of the Gregorian calendar, the 2023rd year of the Common Era (CE) and Anno Domini (AD) designations, the 23rd  year of the 3rd millennium and the 21st century, and the  4th   year of the 2020s decade.  ',
 'The year 2023 saw the decline in severity of the COVID-19 pandemic, with the WHO (World Health Organization) ending its global health emergency status in May. Catastrophic natural disasters included the fifth-deadliest earthquake of the 21st century striking Turkey and Syria, leaving up to 62,000 people dead, Cyclone Freddy – the longest-lasting recorded tropical cyclone in history – leading to over 1,400 deaths in Malawi and Mozambique, Storm Daniel, which became the deadliest cyclone worldwide since Cyclone Nargis after killing at least 11,000 people in Libya, a major 6.8 magnitude earthquake striking western Morocco, killing 2,960 people, and a 6.3 magnitude quadruple earthquake striking western Afghanistan, killing over 1,400

In [4]:
# load the data with pandas

In [5]:
import pandas as pd

In [15]:
df = pd.DataFrame()
df["text"] = response_dict
df

Unnamed: 0,text
0,2023 (MMXXIII) was a common year starting on S...
1,The year 2023 saw the decline in severity of t...
2,The Russian invasion of Ukraine and Myanmar ci...
3,A banking crisis resulted in the collapse of n...
4,"In the realm of technology, 2023 saw the conti..."
...,...
296,"Physics – Pierre Agostini, Ferenc Krausz & Ann..."
297,Physiology or Medicine – Katalin Karikó & Drew...
298,
299,


In [16]:
df=df[df["text"].str.len() > 0]
df

Unnamed: 0,text
0,2023 (MMXXIII) was a common year starting on S...
1,The year 2023 saw the decline in severity of t...
2,The Russian invasion of Ukraine and Myanmar ci...
3,A banking crisis resulted in the collapse of n...
4,"In the realm of technology, 2023 saw the conti..."
...,...
294,"Literature – Jon Fosse, for his innovative pla..."
295,"Peace – Narges Mohammadi, for her works on the..."
296,"Physics – Pierre Agostini, Ferenc Krausz & Ann..."
297,Physiology or Medicine – Katalin Karikó & Drew...


In [17]:
df=df[~df["text"].str.startswith("==")]
df

Unnamed: 0,text
0,2023 (MMXXIII) was a common year starting on S...
1,The year 2023 saw the decline in severity of t...
2,The Russian invasion of Ukraine and Myanmar ci...
3,A banking crisis resulted in the collapse of n...
4,"In the realm of technology, 2023 saw the conti..."
...,...
293,"Economics – Claudia Goldin, for her empirical ..."
294,"Literature – Jon Fosse, for his innovative pla..."
295,"Peace – Narges Mohammadi, for her works on the..."
296,"Physics – Pierre Agostini, Ferenc Krausz & Ann..."


In [18]:
df.tail(15)

Unnamed: 0,text
270,The European Court of Justice rules that threa...
271,The deadliest mass shooting in the Czech Repub...
272,December 22 – 2023 Israel–Hamas war: The death...
273,December 29 – Russian invasion of Ukraine: Rus...
274,December 31 – Queen Margrethe II of Denmark an...
278,"The world population on January 1, 2023 was es..."
282,The best-selling video game in 2023 was Hogwar...
283,The highest-grossing movie in 2023 was Barbie.
284,The best-selling book in 2023 was It Ends with...
292,"Chemistry – Moungi Bawendi, Louis E. Brus & Al..."


In [19]:
from dateutil.parser import parse
prefix = ""
for (i, row) in df.iterrows():
    # If the row already has " - ", it already has the needed date prefix
    if " – " not in row["text"]:
        try:
            # If the row's text is a date, set it as the new prefix
            parse(row["text"])
            prefix = row["text"]
        except:
            # If the row's text isn't a date, add the prefix
            row["text"] = prefix + " – " + row["text"]
            
df = df[df["text"].str.contains(" – ")].reset_index(drop=True)
df

Unnamed: 0,text
0,– 2023 (MMXXIII) was a common year starting o...
1,The year 2023 saw the decline in severity of t...
2,– The Russian invasion of Ukraine and Myanmar...
3,– A banking crisis resulted in the collapse o...
4,"– In the realm of technology, 2023 saw the co..."
...,...
210,"Economics – Claudia Goldin, for her empirical ..."
211,"Literature – Jon Fosse, for his innovative pla..."
212,"Peace – Narges Mohammadi, for her works on the..."
213,"Physics – Pierre Agostini, Ferenc Krausz & Ann..."


In [20]:
df.tail(15)

Unnamed: 0,text
200,December 21 – The European Court of Justice ru...
201,December 21 – The deadliest mass shooting in t...
202,December 22 – 2023 Israel–Hamas war: The death...
203,December 29 – Russian invasion of Ukraine: Rus...
204,December 31 – Queen Margrethe II of Denmark an...
205,December 21 – The world population on January ...
206,December 21 – The best-selling video game in 2...
207,December 21 – The highest-grossing movie in 20...
208,December 21 – The best-selling book in 2023 wa...
209,"Chemistry – Moungi Bawendi, Louis E. Brus & Al..."


In [21]:
# Create an Embeddings Index with openai.Embedding

In [22]:
import openai
openai.api_key="MY API KEY"

In [23]:
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
response = openai.Embedding.create(
    input=df["text"].tolist(),
    model=EMBEDDING_MODEL_NAME
)

In [24]:
type(response)

openai.openai_object.OpenAIObject

In [25]:
response.keys()

dict_keys(['object', 'data', 'model', 'usage'])

In [26]:
type(response["data"])

list

In [27]:
response["data"][0]["embedding"]

[0.0036978961434215307,
 -0.013878832571208477,
 -0.007445806171745062,
 -0.016579579561948776,
 -0.0042761811055243015,
 -0.0025116312317550182,
 -0.014666550792753696,
 8.30307399155572e-05,
 -0.027007458731532097,
 0.0016317006666213274,
 0.03358427435159683,
 0.015954406931996346,
 -0.018067490309476852,
 -0.008039720356464386,
 -0.010009014047682285,
 0.009840217418968678,
 0.013953853398561478,
 -0.01572934351861477,
 0.014291446655988693,
 0.008627383038401604,
 0.002014618832617998,
 0.0075208269990980625,
 -0.009346331469714642,
 -0.019680434837937355,
 0.012747270055115223,
 0.021718498319387436,
 -0.00557028828188777,
 -0.005263953935354948,
 0.038860730826854706,
 -0.014866605401039124,
 -0.0031539960764348507,
 -0.011671973392367363,
 -0.0071332198567688465,
 -0.008496096357703209,
 0.008852444589138031,
 -0.03691019490361214,
 -0.015141681768000126,
 -0.007933440618216991,
 -0.00610481109470129,
 -0.011521931737661362,
 0.005798476282507181,
 0.017954958602786064,
 0.0178

In [28]:
len(response["data"][0]["embedding"])

1536

In [29]:
embeddings = [data["embedding"] for data in response["data"]]
embeddings

[[0.0036978961434215307,
  -0.013878832571208477,
  -0.007445806171745062,
  -0.016579579561948776,
  -0.0042761811055243015,
  -0.0025116312317550182,
  -0.014666550792753696,
  8.30307399155572e-05,
  -0.027007458731532097,
  0.0016317006666213274,
  0.03358427435159683,
  0.015954406931996346,
  -0.018067490309476852,
  -0.008039720356464386,
  -0.010009014047682285,
  0.009840217418968678,
  0.013953853398561478,
  -0.01572934351861477,
  0.014291446655988693,
  0.008627383038401604,
  0.002014618832617998,
  0.0075208269990980625,
  -0.009346331469714642,
  -0.019680434837937355,
  0.012747270055115223,
  0.021718498319387436,
  -0.00557028828188777,
  -0.005263953935354948,
  0.038860730826854706,
  -0.014866605401039124,
  -0.0031539960764348507,
  -0.011671973392367363,
  -0.0071332198567688465,
  -0.008496096357703209,
  0.008852444589138031,
  -0.03691019490361214,
  -0.015141681768000126,
  -0.007933440618216991,
  -0.00610481109470129,
  -0.011521931737661362,
  0.005798476

In [30]:
df["embeddings"] = embeddings
df

Unnamed: 0,text,embeddings
0,– 2023 (MMXXIII) was a common year starting o...,"[0.0036978961434215307, -0.013878832571208477,..."
1,The year 2023 saw the decline in severity of t...,"[-0.021441509947180748, -0.004789189901202917,..."
2,– The Russian invasion of Ukraine and Myanmar...,"[-0.019335422664880753, -0.017729807645082474,..."
3,– A banking crisis resulted in the collapse o...,"[-0.03288523107767105, -0.01240596640855074, 0..."
4,"– In the realm of technology, 2023 saw the co...","[-0.023159807547926903, -0.014188113622367382,..."
...,...,...
210,"Economics – Claudia Goldin, for her empirical ...","[-0.017360655590891838, -0.00789299700409174, ..."
211,"Literature – Jon Fosse, for his innovative pla...","[-0.009490792639553547, 0.017633236944675446, ..."
212,"Peace – Narges Mohammadi, for her works on the...","[-0.013204479590058327, -0.012974321842193604,..."
213,"Physics – Pierre Agostini, Ferenc Krausz & Ann...","[-0.020525293424725533, 0.01492499653249979, 0..."


In [31]:
df.to_csv("embeddings.csv")

In [32]:
# Find relevant data with Cosine similarity

In [33]:
import numpy as np
import pandas as pd
df = pd.read_csv("embeddings.csv", index_col=0)
df["embeddings"] = df["embeddings"].apply(eval).apply(np.array)
df

Unnamed: 0,text,embeddings
0,– 2023 (MMXXIII) was a common year starting o...,"[0.0036978961434215307, -0.013878832571208477,..."
1,The year 2023 saw the decline in severity of t...,"[-0.021441509947180748, -0.004789189901202917,..."
2,– The Russian invasion of Ukraine and Myanmar...,"[-0.019335422664880753, -0.017729807645082474,..."
3,– A banking crisis resulted in the collapse o...,"[-0.03288523107767105, -0.01240596640855074, 0..."
4,"– In the realm of technology, 2023 saw the co...","[-0.023159807547926903, -0.014188113622367382,..."
...,...,...
210,"Economics – Claudia Goldin, for her empirical ...","[-0.017360655590891838, -0.00789299700409174, ..."
211,"Literature – Jon Fosse, for his innovative pla...","[-0.009490792639553547, 0.017633236944675446, ..."
212,"Peace – Narges Mohammadi, for her works on the...","[-0.013204479590058327, -0.012974321842193604,..."
213,"Physics – Pierre Agostini, Ferenc Krausz & Ann...","[-0.020525293424725533, 0.01492499653249979, 0..."


In [34]:
# question = "When did Israel's Security Cabinet formally declare war for the first time since the Yom Kippur War in 1973?"
question = "What was the world's estimated population on January 1, 2023?" 

In [35]:
import openai
openai.api_key = "MY API KEY"

In [36]:
from openai.embeddings_utils import get_embedding

In [37]:
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
question_embeddings = get_embedding(question, engine = EMBEDDING_MODEL_NAME)
question_embeddings


[0.0046757301315665245,
 -0.010499547235667706,
 -0.005968129727989435,
 -0.003354467451572418,
 -0.011185834184288979,
 -0.008620276115834713,
 -0.031582023948431015,
 0.009498979896306992,
 -0.03281348943710327,
 -0.006410688627511263,
 0.024655014276504517,
 0.01087796688079834,
 -0.02116585522890091,
 0.007247702218592167,
 -0.0035148148890584707,
 0.0016676129307597876,
 0.021909868344664574,
 -0.027554096654057503,
 0.017099445685744286,
 0.002387572778388858,
 -0.002070084912702441,
 0.010396924801170826,
 0.000179388647666201,
 -0.0025350921787321568,
 -0.002668180502951145,
 0.015290726907551289,
 0.016919856891036034,
 -0.00512790959328413,
 0.03442979231476784,
 -0.02089647203683853,
 -0.0017894769553095102,
 -0.022499946877360344,
 -0.015816666185855865,
 -0.018625952303409576,
 -0.028041552752256393,
 -0.03540470451116562,
 -0.0071835629642009735,
 -0.028349418193101883,
 0.029683509841561317,
 -0.004624418914318085,
 0.012231298722326756,
 0.03245431184768677,
 0.00103664

In [38]:
from openai.embeddings_utils import distances_from_embeddings

In [39]:
distances = distances_from_embeddings(question_embeddings, df["embeddings"].tolist(), distance_metric = "cosine")
distances

[0.18151886566636777,
 0.2185385962018236,
 0.2568763601179893,
 0.32114185751220914,
 0.2498311427392278,
 0.25378573459918063,
 0.2329408461205955,
 0.2145953470132541,
 0.23685037350622273,
 0.25786607448359866,
 0.2359047828064812,
 0.256339058097951,
 0.26154140164409334,
 0.26158209949855515,
 0.26600339073376145,
 0.24525010012071413,
 0.26129183998122785,
 0.25851144410642024,
 0.265072677888046,
 0.27680875929685245,
 0.21855955847477793,
 0.241246503651395,
 0.27088669626776807,
 0.26755097187120813,
 0.27466489980711184,
 0.26436780287615147,
 0.2782126801534499,
 0.29751553109379525,
 0.23258604430515717,
 0.2597732183050734,
 0.26349027356508536,
 0.22688363353672236,
 0.2331123967149521,
 0.2705028588531716,
 0.2638192912593902,
 0.2637980155640174,
 0.2548462635874664,
 0.27312609636972773,
 0.22523191230944473,
 0.2563285459779636,
 0.2696438779195489,
 0.2569423925758796,
 0.22867878884649617,
 0.2525789837896619,
 0.2270026938622982,
 0.22943076808586482,
 0.260233597

In [40]:
df["distances"] = distances
df

Unnamed: 0,text,embeddings,distances
0,– 2023 (MMXXIII) was a common year starting o...,"[0.0036978961434215307, -0.013878832571208477,...",0.181519
1,The year 2023 saw the decline in severity of t...,"[-0.021441509947180748, -0.004789189901202917,...",0.218539
2,– The Russian invasion of Ukraine and Myanmar...,"[-0.019335422664880753, -0.017729807645082474,...",0.256876
3,– A banking crisis resulted in the collapse o...,"[-0.03288523107767105, -0.01240596640855074, 0...",0.321142
4,"– In the realm of technology, 2023 saw the co...","[-0.023159807547926903, -0.014188113622367382,...",0.249831
...,...,...,...
210,"Economics – Claudia Goldin, for her empirical ...","[-0.017360655590891838, -0.00789299700409174, ...",0.275255
211,"Literature – Jon Fosse, for his innovative pla...","[-0.009490792639553547, 0.017633236944675446, ...",0.315339
212,"Peace – Narges Mohammadi, for her works on the...","[-0.013204479590058327, -0.012974321842193604,...",0.297368
213,"Physics – Pierre Agostini, Ferenc Krausz & Ann...","[-0.020525293424725533, 0.01492499653249979, 0...",0.282101


In [41]:
df.to_csv("distances.csv")

In [42]:
import pandas as pd
df = pd.read_csv("distances.csv", index_col=0)
df

Unnamed: 0,text,embeddings,distances
0,– 2023 (MMXXIII) was a common year starting o...,[ 0.0036979 -0.01387883 -0.00744581 ... -0.00...,0.181519
1,The year 2023 saw the decline in severity of t...,[-0.02144151 -0.00478919 0.003297 ... -0.00...,0.218539
2,– The Russian invasion of Ukraine and Myanmar...,[-0.01933542 -0.01772981 0.00626939 ... -0.00...,0.256876
3,– A banking crisis resulted in the collapse o...,[-0.03288523 -0.01240597 0.03708335 ... -0.00...,0.321142
4,"– In the realm of technology, 2023 saw the co...",[-2.31598075e-02 -1.41881136e-02 8.48137552e-...,0.249831
...,...,...,...
210,"Economics – Claudia Goldin, for her empirical ...",[-0.01736066 -0.007893 -0.00115967 ... -0.01...,0.275255
211,"Literature – Jon Fosse, for his innovative pla...",[-0.00949079 0.01763324 0.00505303 ... -0.01...,0.315339
212,"Peace – Narges Mohammadi, for her works on the...",[-0.01320448 -0.01297432 -0.00495826 ... -0.01...,0.297368
213,"Physics – Pierre Agostini, Ferenc Krausz & Ann...",[-0.02052529 0.014925 0.00526825 ... -0.02...,0.282101


In [43]:
current_shortest = df.iloc[0]["distances"]
current_shortest_index =0
current_shortest

0.1815188656663677

In [44]:
for index, distance in enumerate (df["distances"].values):
    if distance < current_shortest:
        current_shortest = distance
        current_shortest_index =  index
current_shortest, current_shortest_index

(0.0932855391772102, 205)

In [46]:
df.iloc[205]["text"]

'December 21 – The world population on January 1, 2023 was estimated at 7.943 billion people, and was expected to increase to 8.119 billion on January 1, 2024. An estimated 134.3 million births and 60.8 million deaths were expected to take place in 2023. The average global life expectancy was 73.16 years, an increase of 0.18 years from 2022. The rate of child mortality was by the end of the year, expected to have decreased from 2022. Less than 23% of people were living in extreme poverty (on or below the international poverty line), a decrease from 2022. In April, India surpassed China as the most populated country in the world.'

In [47]:
df.sort_values(by="distances")

Unnamed: 0,text,embeddings,distances
205,December 21 – The world population on January ...,[ 0.00657458 0.00155844 -0.00251325 ... -0.00...,0.093286
0,– 2023 (MMXXIII) was a common year starting o...,[ 0.0036979 -0.01387883 -0.00744581 ... -0.00...,0.181519
207,December 21 – The highest-grossing movie in 20...,[-0.01989677 -0.0240621 -0.00685305 ... 0.01...,0.209836
7,January 8 – The 2023 Beninese parliamentary el...,[-0.02317706 -0.00256988 -0.0079246 ... -0.01...,0.214595
208,December 21 – The best-selling book in 2023 wa...,[-0.01484779 -0.00668879 -0.00648349 ... -0.00...,0.215859
...,...,...,...
212,"Peace – Narges Mohammadi, for her works on the...",[-0.01320448 -0.01297432 -0.00495826 ... -0.01...,0.297368
27,February 3 – A Norfolk Southern train carrying...,[-0.00799812 0.00468158 -0.00362179 ... -0.00...,0.297516
139,"August 10 – Tapestry, the holding company of C...",[-0.03187628 -0.01692278 -0.01004995 ... -0.00...,0.305285
211,"Literature – Jon Fosse, for his innovative pla...",[-0.00949079 0.01763324 0.00505303 ... -0.01...,0.315339


In [48]:
df.sort_values(by="distances").to_csv("disances_sorted.csv")

In [49]:
# Tokenizing with tiktoken

In [50]:
import tiktoken

In [51]:
tokenizer = tiktoken.get_encoding("cl100k_base")

In [52]:
tokenizer

<Encoding 'cl100k_base'>

In [53]:
tokenizer.encode("This is a question")

[2028, 374, 264, 3488]

In [54]:
question = "What was the world's estimated population on January 1, 2023?"

In [55]:
tokenizer.encode(question)

[3923,
 574,
 279,
 1917,
 596,
 13240,
 7187,
 389,
 6186,
 220,
 16,
 11,
 220,
 2366,
 18,
 30]

In [56]:
len(tokenizer.encode(question))

16

## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [57]:
prompt_template = """
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context:
{}
---

Question:{}
Answer:"""

In [58]:
question = "What was the world's estimated population on January 1, 2023?"

In [59]:
print(prompt_template.format("context", question))


Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context:
context
---

Question:What was the world's estimated population on January 1, 2023?
Answer:


In [60]:
max_token_count = 1000

In [61]:
import tiktoken
tokenizer = tiktoken.get_encoding("cl100k_base")

In [62]:
tokenizer.encode (question)

[3923,
 574,
 279,
 1917,
 596,
 13240,
 7187,
 389,
 6186,
 220,
 16,
 11,
 220,
 2366,
 18,
 30]

In [63]:
len(tokenizer.encode (question))

16

In [64]:
current_token_count = len(tokenizer.encode (question)) + len(tokenizer.encode(prompt_template))
current_token_count

56

In [65]:
context=[]

In [66]:
import pandas as pd
df = pd.read_csv("disances_sorted.csv", index_col=0)
df

Unnamed: 0,text,embeddings,distances
205,December 21 – The world population on January ...,[ 0.00657458 0.00155844 -0.00251325 ... -0.00...,0.093286
0,– 2023 (MMXXIII) was a common year starting o...,[ 0.0036979 -0.01387883 -0.00744581 ... -0.00...,0.181519
207,December 21 – The highest-grossing movie in 20...,[-0.01989677 -0.0240621 -0.00685305 ... 0.01...,0.209836
7,January 8 – The 2023 Beninese parliamentary el...,[-0.02317706 -0.00256988 -0.0079246 ... -0.01...,0.214595
208,December 21 – The best-selling book in 2023 wa...,[-0.01484779 -0.00668879 -0.00648349 ... -0.00...,0.215859
...,...,...,...
212,"Peace – Narges Mohammadi, for her works on the...",[-0.01320448 -0.01297432 -0.00495826 ... -0.01...,0.297368
27,February 3 – A Norfolk Southern train carrying...,[-0.00799812 0.00468158 -0.00362179 ... -0.00...,0.297516
139,"August 10 – Tapestry, the holding company of C...",[-0.03187628 -0.01692278 -0.01004995 ... -0.00...,0.305285
211,"Literature – Jon Fosse, for his innovative pla...",[-0.00949079 0.01763324 0.00505303 ... -0.01...,0.315339


In [67]:
for text in df["text"].values:
    text_token_count = len(tokenizer.encode(text))
    current_token_count += text_token_count
    
    if current_token_count <=max_token_count:
        context.append(text)
    else:
        break

In [68]:
context

['December 21 – The world population on January 1, 2023 was estimated at 7.943 billion people, and was expected to increase to 8.119 billion on January 1, 2024. An estimated 134.3 million births and 60.8 million deaths were expected to take place in 2023. The average global life expectancy was 73.16 years, an increase of 0.18 years from 2022. The rate of child mortality was by the end of the year, expected to have decreased from 2022. Less than 23% of people were living in extreme poverty (on or below the international poverty line), a decrease from 2022. In April, India surpassed China as the most populated country in the world.',
 ' – 2023 (MMXXIII) was a common year starting on Sunday of the Gregorian calendar, the 2023rd year of the Common Era (CE) and Anno Domini (AD) designations, the 23rd  year of the 3rd millennium and the 21st century, and the  4th   year of the 2020s decade.  ',
 'December 21 – The highest-grossing movie in 2023 was Barbie.',
 'January 8 – The 2023 Beninese p

In [69]:
print(prompt_template.format(context, question))


Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context:
['December 21 – The world population on January 1, 2023 was estimated at 7.943 billion people, and was expected to increase to 8.119 billion on January 1, 2024. An estimated 134.3 million births and 60.8 million deaths were expected to take place in 2023. The average global life expectancy was 73.16 years, an increase of 0.18 years from 2022. The rate of child mortality was by the end of the year, expected to have decreased from 2022. Less than 23% of people were living in extreme poverty (on or below the international poverty line), a decrease from 2022. In April, India surpassed China as the most populated country in the world.', ' – 2023 (MMXXIII) was a common year starting on Sunday of the Gregorian calendar, the 2023rd year of the Common Era (CE) and Anno Domini (AD) designations, the 23rd  year of the 3rd millennium and the 21st century, and th

In [70]:
print(prompt_template.format("\n\n###\n\n".join(context), question))


Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know"

Context:
December 21 – The world population on January 1, 2023 was estimated at 7.943 billion people, and was expected to increase to 8.119 billion on January 1, 2024. An estimated 134.3 million births and 60.8 million deaths were expected to take place in 2023. The average global life expectancy was 73.16 years, an increase of 0.18 years from 2022. The rate of child mortality was by the end of the year, expected to have decreased from 2022. Less than 23% of people were living in extreme poverty (on or below the international poverty line), a decrease from 2022. In April, India surpassed China as the most populated country in the world.

###

 – 2023 (MMXXIII) was a common year starting on Sunday of the Gregorian calendar, the 2023rd year of the Common Era (CE) and Anno Domini (AD) designations, the 23rd  year of the 3rd millennium and the 21st century, and t

In [72]:
import openai
openai.api_key = "MY API KEY"

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [73]:
# Question 1: When did Israel's Security Cabinet formally declare war for the first time since the Yom Kippur War in 1973?

In [74]:
openai.Completion.create(
    model = "gpt-3.5-turbo-instruct",
    prompt= prompt_template.format("\n\n###\n\n".join(context), question))["choices"][0]["text"]

' 7.943 billion people'

In [75]:
prompt = "When did Israel's Security Cabinet formally declare war for the first time since the Yom Kippur War in 1973?"
response = openai.Completion.create(
  engine="gpt-3.5-turbo-instruct",
  prompt=prompt,
  max_tokens=100,
  n=1,
  stop=None,
  temperature=0.7
)

# Get the answer from the response
answer = response.choices[0].text.strip()

# Print the answer
print(answer)

Israel's Security Cabinet formally declared war on August 12, 2006, during the Second Lebanon War with Hezbollah. This was the first time Israel had formally declared war since the Yom Kippur War in 1973.


In [76]:
# Conclusion:
# The original GPT provides a wrong answer statinng that :"Israel's Security Cabinet formally declared war on Hamas on July 8, 2014, in what became known as Operation Protective Edge. This was the first time since the Yom Kippur War in 1973 that Israel had declared war."
# My custom chatbot provides a correct answer: October 8 

### Question 2

In [77]:
# Question 2: What was the world's estimated population on January 1, 2023? 

In [78]:
prompt = "What was the world's estimated population on January 1, 2023?"
response = openai.Completion.create(
  engine="gpt-3.5-turbo-instruct",
  prompt=prompt,
  max_tokens=100,
  n=1,
  stop=None,
  temperature=0.7
)

# Get the answer from the response
answer = response.choices[0].text.strip()

# Print the answer
print(answer)

It is not possible to accurately predict the world's population on a specific date in the future. Population growth is influenced by a variety of factors such as birth rates, death rates, and immigration. The United Nations estimates that the world's population will reach 8.5 billion by 2030, but this is just an estimate and the actual number could vary.


In [None]:
# Conclusion:
# The original GPT states that:
#"It is impossible to accurately predict the world's population on a specific date in the future. 
# Population growth is affected by various factors such as birth rates, death rates, and migration. 
# It is also impacted by unforeseen events and changes in global trends. 
# Additionally, population estimates can vary depending on the source and methodology used.

# My custom chatbot provides a correct answer: 7.943 billion