# Custom Chatbot Project

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

My data source is the Wikipedia article about finite sphere packing, simply because it is a cool topic.

Everybody should know about the 'sausage catastrophe' when it comes to packing more than 55 spheres.

https://en.wikipedia.org/wiki/Finite_sphere_packing


## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [43]:
import os
import openai

# Legacy (pre-1.0) openai SDK interface: module-level base URL and API key.
openai.api_base = "https://openai.vocareum.com/v1"
openai.api_key = os.getenv("OPENAI_API_KEY")


In [44]:
# Proxy settings when running on my Telekom computer:
import os
# USERNAME is typically set only on Windows; .get() avoids a KeyError on other systems.
if 'A306709' in os.environ.get('USERNAME', ''):
    print("Running on Christophs computer: update proxy settings.")
    os.environ["http_proxy"] = "http://sia-lb.telekom.de:8080"
    os.environ["https_proxy"] = "http://sia-lb.telekom.de:8080"
else:
    print("Running on any computer but not Christophs: don't update any proxy settings.")


Running on Christophs computer: update proxy settings.


In [72]:
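# Wikipedia data source definition. All source-specific settings live here, so the source can be changed in one place:
file_path = "./data/wikipedia/"
file_name = "Finite_sphere_packing"
assert_content = "sausage"     # a word the downloaded article must contain (sanity check)
max_context_paragraphs = 25    # number of paragraphs passed to the model as context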


In [19]:
# Read the Wikipedia source and write its content to disk.
# Skip this cell if the file has already been created.
from pathlib import Path
import requests

# Wikipedia source: https://en.wikipedia.org/wiki/{file_name}
# MediaWiki API query for the plain-text extract of exactly one page:
params = {
    "action": "query",
    "prop": "extracts",
    "exlimit": 1,
    "titles": file_name,
    "explaintext": 1,
    "formatversion": 2,
    "format": "json"
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params, timeout=30)
resp.raise_for_status()  # fail early on HTTP errors
response_dict = resp.json()
wiki_text = response_dict["query"]["pages"][0]["extract"]
assert len(wiki_text) > 1000
assert assert_content in wiki_text

# Write the source to disk
Path(file_path).mkdir(parents=True, exist_ok=True)
with open(f"{file_path}{file_name}.txt", "w", encoding="utf-8") as f:
    f.write(wiki_text)


In [46]:
# Read the content from disk (to treat the Wikipedia API with care):
with open(f"{file_path}{file_name}.txt", "r", encoding="utf-8") as f:
    wiki_text = f.read()

assert len(wiki_text) > 1000
assert assert_content in wiki_text
print("wikipedia page loaded from disc")


wikipedia page loaded from disc
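The two cells above implement "download once, then read from disk" by hand (skip the first cell if the file exists). A minimal sketch of the same caching idea as one function, assuming the download is wrapped in a zero-argument callable; `load_or_fetch` and `fake_fetch` are hypothetical names, and `fake_fetch` only stands in for the Wikipedia request so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_or_fetch(cache_file: Path, fetch: Callable[[], str]) -> str:
    """Return the cached text if cache_file exists; otherwise call fetch() and cache its result."""
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = fetch()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text


# Offline demo: fake_fetch is a hypothetical stand-in for the Wikipedia API request.
calls = []


def fake_fetch() -> str:
    calls.append(1)
    return "text about the sausage catastrophe"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "wikipedia" / "Finite_sphere_packing.txt"
    first = load_or_fetch(cache, fake_fetch)   # no file yet: fetches and writes it
    second = load_or_fetch(cache, fake_fetch)  # file exists: reads it, no second fetch
    print(first == second, len(calls))  # True 1
```

In the notebook, `fetch` could be a small (hypothetical) wrapper around the `requests.get(...)` call above, and `cache_file` would be `Path(f"{file_path}{file_name}.txt")`.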


In [41]:
# Split the text into paragraphs (one per line) and create a pandas DataFrame.
# For this project the parsing is deliberately simple, accepting that we lose the mathematics described on the page:
import pandas as pd

# Lines containing "{" hold LaTeX formulas: blank them out, and strip whitespace from all other lines.
wiki_text_array = ["" if "{" in line else line.strip() for line in wiki_text.split("\n")]

df = pd.DataFrame({"text": wiki_text_array})

# Drop empty or very short lines (10 characters or fewer) and section headings ("== ... ==").
df = df[df["text"].str.len() > 10]
df = df[~df["text"].str.startswith("==")]
df.reset_index(inplace=True, drop=True)

print(df)


                                                 text
0   In mathematics, the theory of finite sphere pa...
1   The similar problem for infinitely many sphere...
2   Sphere packing problems are distinguished betw...
3   In general, a packing refers to any arrangemen...
4   There are many possible ways to arrange sphere...
..                                                ...
73                         There holds an inequality:
74                  where the volume of the unit ball
75                                      dimensions is
76  and it is predicted that this holds for all di...
77                          can be found from that of

[78 rows x 1 columns]
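The filtering rules above are easiest to check on a small input. This sketch reimplements them as a plain function (`clean_paragraphs` is a hypothetical name) and runs it on a hand-made sample: two sentences taken from the article plus a heading, a made-up formula line, and a short fragment, which should all be dropped.

```python
def clean_paragraphs(text: str, min_length: int = 10) -> list[str]:
    """Apply the notebook's line filters to a plain-text extract and return the kept lines."""
    kept = []
    for line in text.split("\n"):
        line = line.strip()
        if "{" in line:              # LaTeX formula line, e.g. {\displaystyle ...}
            continue
        if len(line) <= min_length:  # empty line or very short fragment
            continue
        if line.startswith("=="):    # section heading such as "== Heading =="
            continue
        kept.append(line)
    return kept


sample = "\n".join([
    "Sphere packing problems are distinguished between packings in given containers and free packings.",
    "",
    "== Example heading ==",
    "{\\displaystyle x}",
    "Short line",
    "This article primarily discusses free packings.",
])
for paragraph in clean_paragraphs(sample):
    print(paragraph)
```

Wrapping the result in `pd.DataFrame({"text": ...})` gives the same kind of DataFrame as the cell above. Only the formula line itself is removed, which is why fragments such as "dimensions is" (row 75 above) remain in the data.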

## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [73]:
# Use the first `max_context_paragraphs` paragraphs of the article as context,
# separated by "###" so the model can tell the paragraphs apart.
context_information_array = df["text"].head(max_context_paragraphs).tolist()
for paragraph in context_information_array:
    print(paragraph)
context_information = "\n###\n".join(context_information_array)
print(f'context_information: {context_information}.')

In mathematics, the theory of finite sphere packing concerns the question of how a finite number of equally-sized spheres can be most efficiently packed. The question of packing finitely many spheres has only been investigated in detail in recent decades, with much of the groundwork being laid by László Fejes Tóth.
The similar problem for infinitely many spheres has a longer history of investigation, from which the Kepler conjecture is most well-known. Atoms in crystal structures can be simplistically viewed as closely-packed spheres and treated as infinite sphere packings thanks to their large number.
Sphere packing problems are distinguished between packings in given containers and free packings. This article primarily discusses free packings.
In general, a packing refers to any arrangement of a set of spatially-connected, possibly differently-sized or differently-shaped objects in space such that none of them overlap. In the case of the finite sphere packing problem, these objects a
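The cell above always passes the first `max_context_paragraphs` paragraphs, whatever the question is, so an answer located further down the article never reaches the model. A possible alternative, sketched here with a plain word-overlap score rather than embeddings, ranks paragraphs by how many (non-stop) words they share with the question. `select_context`, `words`, and the tiny `STOP_WORDS` set are hypothetical; the sample paragraphs are taken from the article output above.

```python
import re

# Deliberately tiny, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "the", "to", "what", "who"}


def words(text: str) -> set[str]:
    """Lower-cased word tokens of text, without stop words."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS}


def select_context(paragraphs: list[str], question: str, k: int) -> list[str]:
    """Return the k paragraphs sharing the most words with the question.

    sorted() is stable, so paragraphs with equal scores keep their article order.
    """
    question_words = words(question)
    ranked = sorted(paragraphs, key=lambda p: len(question_words & words(p)), reverse=True)
    return ranked[:k]


paragraphs = [
    "Sphere packing problems are distinguished between packings in given containers and free packings.",
    "The similar problem for infinitely many spheres has a longer history of investigation, "
    "from which the Kepler conjecture is most well-known.",
    "In mathematics, the theory of finite sphere packing concerns the question of how a finite "
    "number of equally-sized spheres can be most efficiently packed.",
]
question = "Who did the groundwork for the knowledge of finite sphere packing?"
selected = select_context(paragraphs, question, k=2)
print("\n###\n".join(selected))
```

Joining the selected paragraphs with `"\n###\n"` produces a context string in the same format as `context_information`. Exact word matching is crude ("sphere" and "spheres" count as different words), so this is only a starting point.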

In [74]:
def get_custom_prompt_for_question(q: str, use_context: bool = True):
    if use_context:
        return f'{context_information}\n---\nQuestion: {q}\nAnswer:'
    else:
        return f'Question: {q}\nAnswer:'
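To see exactly what the model receives, here is a variant of the function above that takes the context as a parameter instead of reading the global `context_information` (`build_prompt` is a hypothetical name). The "###" lines separate paragraphs, "---" separates the context from the question, and the prompt ends with "Answer:" so that the completion model's natural continuation is the answer.

```python
from typing import Optional


def build_prompt(question: str, context: Optional[str] = None) -> str:
    """Same layout as get_custom_prompt_for_question, but with the context passed in explicitly."""
    if context:
        return f"{context}\n---\nQuestion: {question}\nAnswer:"
    return f"Question: {question}\nAnswer:"


context = "\n###\n".join([
    "Sphere packing problems are distinguished between packings in given containers and free packings.",
    "This article primarily discusses free packings.",
])
print(build_prompt("Which packings does the article primarily discuss?", context))
```

Unlike the original, an empty context string also yields the plain prompt, since `if context:` treats `""` like `None`.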


In [75]:
def answer_question(q: str, use_context: bool = True, talkative: bool = True):
    answer_object = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=get_custom_prompt_for_question(q, use_context),
        max_tokens=50 # original value: 150
    )
    answer = answer_object["choices"][0]["text"].strip()
    if talkative:
        print("*" * 50)
        print(f"Question: {q}")
        print(f"Use context: {use_context}")
        print(answer)
    #print(answer_object)
    return answer
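Several answers further down end mid-sentence because `max_tokens=50` limits the completion length. The Completions API reports why generation stopped in each choice's `finish_reason` field: `"length"` when the token limit was reached, `"stop"` for a natural end. A minimal check, shown on hypothetical response dicts trimmed to the fields it uses (`is_truncated` is a hypothetical name):

```python
def is_truncated(answer_object: dict) -> bool:
    """True if the first choice stopped because it reached the max_tokens limit."""
    return answer_object["choices"][0].get("finish_reason") == "length"


# Hypothetical response dicts, reduced to the fields used here.
cut_off = {"choices": [{"text": "Some of the known packing methods ...", "finish_reason": "length"}]}
complete = {"choices": [{"text": "László Fejes Tóth.", "finish_reason": "stop"}]}
print(is_truncated(cut_off), is_truncated(complete))  # True False
```

Inside `answer_question`, printing `is_truncated(answer_object)` next to the answer would show directly whether raising `max_tokens` back to 150 is worthwhile.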


In [81]:
#play around:
answer_question("What kind of packing methods are known in the theory of finite sphere packing", False)
answer_question("What kind of packing methods are known in the theory of finite sphere packing", True)

**************************************************
Question: What kind of packing methods are known in the theory of finite sphere packing
Use context:: False
Some of the known packing methods in the theory of finite sphere packing include:

1. Bravais packing: This method involves arranging spheres in a regular, repeating pattern, such as cubic, hexagonal, or tetrahedral lattices.

2
**************************************************
Question: What kind of packing methods are known in the theory of finite sphere packing
Use context:: True
Sausage, pizza, and cluster packings are all methods commonly used in the theory of finite sphere packing. Additionally, there may be other methods that have not yet been discovered or explored.



In [82]:
#play around:
answer_question("Who did the groundwork for the knowledge of finite sphere packing?", False)
answer_question("Who did the groundwork for the knowledge of finite sphere packing?", True)

**************************************************
Question: Who did the groundwork for the knowledge of finite sphere packing?
Use context:: False
The groundwork for the knowledge of finite sphere packing was laid by mathematicians such as Johannes Kepler, Pierre de Fermat, Isaac Newton, and George mm. Jamima Marshall. They developed theories and proofs related to packing of spheres in 2D and
**************************************************
Question: Who did the groundwork for the knowledge of finite sphere packing?
Use context:: True
László Fejes Tóth.



## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

In [78]:
# Ask the model the same question twice: first without context, then with the context from Wikipedia:
def compare_answers(q: str):
    print(f"Question: {q}?")
    print("*** Plain answer (no context):")
    print(answer_question(q, use_context=False, talkative=False))
    print("*** Answer with context from wikipedia:")
    print(answer_question(q, use_context=True, talkative=False))
    print("=" * 50)


### Question 1

In [79]:
compare_answers("What kind of packing methods are known in the theory of finite sphere packing?")


Question: What kind of packing methods are known in the theory of finite sphere packing??
*** Plain answer (no context):
There are several packing methods that are known in the theory of finite sphere packing, including Greedy Algorithm, Lloyd's Algorithm, Random Sequential Adsorption, and the Closest Packing method. Other notable packing methods include the Lubachevsky-Stillinger
*** Answer with context from wikipedia:
The known packing methods in the theory of finite sphere packing include sausage packing, pizza packing, and cluster packing.


### Question 2

In [80]:
compare_answers("Who did the groundwork for the knowledge of finite sphere packing?")


Question: Who did the groundwork for the knowledge of finite sphere packing??
*** Plain answer (no context):
The groundwork for the knowledge of finite sphere packing was laid by various mathematicians throughout history, such as Euclid, Johannes Kepler, and Thomas Harriot. However, the most notable contribution to the field was made by mathematician Thue in 190
*** Answer with context from wikipedia:
László Fejes Tóth.


### Question 3

In [77]:
compare_answers("What is meant by 'sausage catastrophe'?")


Question: What is meant by 'sausage catastrophy'??
*** Plain answer (no context):
There are a few possible interpretations of the term "sausage catastrophe," as it is not a specific phrase with a defined meaning. Here are a few potential ways someone could interpret this phrase:

1. Literal interpretation: It is possible that someone is
*** Answer with context from wikipedia:
The sudden transition in optimal packing shape from the orderly sausage packing to the relatively unordered cluster packing, and vice versa, as the number of spheres increases, without a satisfying explanation for this phenomenon.


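We see that the answers with context follow the given Wikipedia article closely, while the answers without context rely on the model's general training data: they do not mention the sausage, pizza, and cluster packings or László Fejes Tóth named in the article. Several plain answers are also cut off mid-sentence because `max_tokens` is limited to 50.

Question 3 shows the difference most clearly: without context, the model does not recognize the term 'sausage catastrophe' at all.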
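The comparison above is made by eye. A minimal sketch of a repeatable check, assuming the key terms for each question are chosen by hand from the article (`mentions_all` and the keyword lists are hypothetical), tests whether an answer contains all of them. The answer strings are excerpts of the outputs for questions 1 and 2.

```python
def mentions_all(answer: str, keywords: list[str]) -> bool:
    """Case-insensitive check that every keyword occurs in the answer."""
    text = answer.lower()
    return all(keyword.lower() in text for keyword in keywords)


# (keywords, plain answer excerpt, answer with context) for questions 1 and 2.
checks = [
    (["sausage", "pizza", "cluster"],
     "There are several packing methods that are known in the theory of finite sphere packing, "
     "including Greedy Algorithm, Lloyd's Algorithm, Random Sequential Adsorption, and the Closest Packing method.",
     "The known packing methods in the theory of finite sphere packing include sausage packing, "
     "pizza packing, and cluster packing."),
    (["Fejes Tóth"],
     "The groundwork for the knowledge of finite sphere packing was laid by various mathematicians "
     "throughout history, such as Euclid, Johannes Kepler, and Thomas Harriot.",
     "László Fejes Tóth."),
]
for keywords, plain, with_context in checks:
    print(keywords, mentions_all(plain, keywords), mentions_all(with_context, keywords))
```

A substring check is crude (it cannot tell a correct statement from a negated one), but it is enough to flag answers that ignore the article entirely.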
