# An Ancient World Inflected ChatBot

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.png)](https://colab.research.google.com/github/aw-chat/aw-chat/blob/main/app.ipynb)

This notebook creates a Gradio app that explores how a chatbot can be encouraged to provide information about the ancient Mediterranean and nearby ancient cultures. It is not a perfect tool: the text it produces will still contain errors of fact and analysis. Revealing those errors is one of its goals, since each occurrence is an opportunity to think about how the output can be improved.

Running all cells will instantiate the Gradio app at the end of the notebook and will also print a link similar to https:####...####.gradio.live. Clicking the gradio.live link provides a better user experience than using the embedded app. If the first cell doesn't import gradio properly, run it again; that seems to work. (I'd be happy for pointers on why this is happening.)

Any queries about this notebook can be directed to [Sebastian Heath](https://isaw.nyu.edu/people/faculty/sebastian-heath) <sebastian.heath -@- nyu.edu>.

In [None]:
# gradio >= 4.0.0 doesn't work (yet?), so pin an earlier release before importing anything
!pip install -q gradio==3.48.0

from huggingface_hub import InferenceClient
import gradio as gr
# If this cell fails because gradio can't be imported, try running it again. That seems to work; I am not sure why.

In [None]:
model = "mistralai/Mixtral-8x7B-Instruct-v0.1" # see https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

client = InferenceClient(model)

In [None]:
# Handle user clicks on the various checkboxes by updating the system prompt. The user will not see
# these changes (though that could probably be added. No reason not to.)

# There is a tremendous amount of choice involved as to which "cultures", "methods", and "perspectives" to include here.
# Please do edit to your own preferences. I'd also like to see the interface become more flexible so that more options can
# be made available without overwhelming the user.

def update_system_prompt(system_prompt, cultures, methods, perspectives, dates, wikipedia):
   """Return system_prompt extended with one sentence for each selected option."""
   # cultures
   if "Celtic" in cultures:
       system_prompt += "Emphasize evidence from ancient European Celtic (meaning Gallia and Hispania as well as the Celts of Central Europe) history, literature, and archaeology. "
   if "Egyptian" in cultures:
       system_prompt += "Emphasize evidence from ancient Egyptian history, literature, and archaeology. "
   if "Greek" in cultures:
       system_prompt += "Emphasize evidence from ancient Greek history, literature, and archaeology. "
   if "Roman" in cultures:
       system_prompt += "Emphasize evidence from ancient Roman history, Latin literature, and Roman archaeology. "
   if "Ancient Southwest Asia (ANE)" in cultures:
       system_prompt += "Emphasize evidence from Ancient Near Eastern (including Mesopotamia, Persia, and Arabia) history, literature, and archaeology. "
   # the key below keeps the spelling used by the "Cultures" checkbox label so that the two continue to match
   if "Phoenecian" in cultures:
       system_prompt += "Emphasize evidence from ancient Phoenician and Punic history, literature, and archaeology. "

   # methods
   if "History" in methods:
       system_prompt += "Especially adopt the perspective of an historian. "
   if "Art History" in methods:
       system_prompt += "Especially adopt the perspective of an art historian. "
   if "Literature" in methods:
       system_prompt += "Especially adopt the perspective of an expert in philology, literature, and literary studies. "
   if "Archaeology" in methods:
       system_prompt += "Especially adopt the perspective of an archaeologist. "
   if "Epigraphy" in methods:
       system_prompt += "Especially adopt the perspective of an epigrapher and of epigraphy. "
   if "Numismatics" in methods:
       system_prompt += "Especially adopt the perspective of an expert in numismatics and coinage. "
   if "Papyrology" in methods:
       system_prompt += "Especially adopt the perspective of an expert in papyrology. "
   if "Ancient Medicine" in methods:
       system_prompt += "Especially adopt the perspective of an expert in Ancient Medicine. "
   if "Ancient Science" in methods:
       system_prompt += "Especially adopt the perspective of an expert in the Ancient Exact Sciences. "
   if "Modern Scientific Approaches" in methods:
       system_prompt += "Especially adopt the perspective of an expert in recent scientific approaches to the study of archaeology and history. "

   # perspectives
   if "Enslaved Persons" in perspectives:
      system_prompt += "Emphasize evidence that includes the perspective of ancient enslaved persons. "
   if "Gender" in perspectives:
      system_prompt += "Emphasize evidence that supports the discussion of gender. "
   if "Ethnicity" in perspectives:
      system_prompt += "Emphasize evidence that supports the discussion of ethnicity. "
   if "Inequality" in perspectives:
      system_prompt += "Emphasize evidence that supports the discussion of social inequality. "
   if "Hybridity" in perspectives:
      system_prompt += "Emphasize evidence that supports the discussion of hybridity. "
   if "New Approaches" in perspectives:
      system_prompt += "Especially include discussion of new and exciting approaches to the topic. "

   if dates:
       system_prompt += " Add parenthetic dates when you can. Use the BCE/CE system."

   if wikipedia:
       system_prompt += " Add a list of Wikipedia pages with links to the end of your response. Label this section “Wikipedia Pages” and add a note that some may not exist."

   return system_prompt
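
The function above appends one sentence to the system prompt for each selected checkbox. As a small illustration of that behavior, the following minimal sketch reimplements the idea with a lookup table. The names `CULTURE_SENTENCES` and `extend_prompt` are hypothetical and are not part of the notebook, and the sketch covers only two cultures and the dates flag.

```python
# Hypothetical, table-driven sketch of the approach used by update_system_prompt.
# Only two cultures are included; the sentences are copied from the function above.
CULTURE_SENTENCES = {
    "Greek": "Emphasize evidence from ancient Greek history, literature, and archaeology. ",
    "Roman": "Emphasize evidence from ancient Roman history, Latin literature, and Roman archaeology. ",
}

def extend_prompt(system_prompt, cultures, dates=False):
    """Append one sentence per selected culture, then the optional request for dates."""
    for label, sentence in CULTURE_SENTENCES.items():
        if label in cultures:
            system_prompt += sentence
    if dates:
        system_prompt += " Add parenthetic dates when you can. Use the BCE/CE system."
    return system_prompt

base = "You are an expert. "
print(extend_prompt(base, ["Roman"], dates=True))
```

A table like this keeps each checkbox label next to the sentence it adds, which may make it easier to keep the interface and the prompt text in step when options are edited.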

In [None]:
def format_prompt(message, history):
  prompt = "<s>"
  for user_prompt, bot_response in history:
    prompt += f"[INST] {user_prompt} [/INST]"
    prompt += f" {bot_response}</s> "
  prompt += f"[INST] {message} [/INST]"
  return prompt
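
To make the resulting template concrete, here is the same function applied to a one-turn history. The function body is copied from the cell above; the sample question and answer are invented for illustration.

```python
def format_prompt(message, history):
    # Each earlier exchange is wrapped in [INST] ... [/INST] followed by the reply and </s>.
    prompt = "<s>"
    for user_prompt, bot_response in history:
        prompt += f"[INST] {user_prompt} [/INST]"
        prompt += f" {bot_response}</s> "
    # The new message is left open so that the model writes the next reply.
    prompt += f"[INST] {message} [/INST]"
    return prompt

history = [("Who was Pericles?", "An Athenian statesman.")]
formatted = format_prompt("When did he die?", history)
print(formatted)
# <s>[INST] Who was Pericles? [/INST] An Athenian statesman.</s> [INST] When did he die? [/INST]
```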

In [None]:
def generate(prompt, history, system_prompt,
             cultures=(), methods=(), perspectives=(),  # immutable defaults instead of shared lists
             dates=False, wikipedia=False,
             temperature=0.9, max_new_tokens=2048, top_p=0.95, repetition_penalty=1.0):

  temperature = float(temperature)
  if temperature < 1e-2:
    temperature = 1e-2
  top_p = float(top_p)

  generate_kwargs = dict(
        temperature=temperature,
        max_new_tokens=max_new_tokens,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
        do_sample=True
   )

  system_prompt = update_system_prompt(system_prompt,
                                        cultures,
                                        methods,
                                        perspectives,
                                        dates,
                                        wikipedia)


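  # the system prompt already ends with punctuation, so join it to the user's message with a space
  formatted_prompt = format_prompt(f"{system_prompt.strip()} {prompt}", history)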
  stream = client.text_generation(formatted_prompt, **generate_kwargs, stream=True, details=True, return_full_text=False)
  output = ""

  for response in stream:
    output += response.token.text
    yield output

  return output
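
The loop at the end of `generate` yields the accumulated text after every token, which is what lets the chat window fill in as the model responds. The following minimal sketch shows that pattern in isolation, without calling the model. `fake_stream`, `accumulate`, and `clamp_temperature` are hypothetical names, and the stand-in stream assumes only that each streamed item exposes `.token.text`, as the code above does.

```python
from types import SimpleNamespace

def fake_stream(tokens):
    """Stand-in for the streamed response: yields objects with a .token.text attribute."""
    for text in tokens:
        yield SimpleNamespace(token=SimpleNamespace(text=text))

def accumulate(stream):
    """Yield the growing output after each token, as the loop in generate does."""
    output = ""
    for response in stream:
        output += response.token.text
        yield output

def clamp_temperature(temperature, floor=1e-2):
    """Mirror the guard in generate: never use a temperature below the floor."""
    return max(float(temperature), floor)

partials = list(accumulate(fake_stream(["Ave", ",", " Caesar"])))
print(partials)                # ['Ave', 'Ave,', 'Ave, Caesar']
print(clamp_temperature(0.0))  # 0.01
```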

In [None]:
# if you edit the options here, you need to edit the update_system_prompt function to match.
additional_inputs=[
    gr.Textbox(
        label="System Prompt",
        value="You are an expert in Ancient Mediterranean and Near Eastern political history, social history, literature, art, and archaeology. Your expertise should inform all your responses. Don't make up information if you don't know it. Avoid the first person voice in your response.",
        max_lines=1,
        interactive=True,
    ),
    gr.CheckboxGroup(["Ancient Southwest Asia (ANE)",
                      "Celtic",
                      "Egyptian",
                      "Greek",
                      "Phoenecian",
                      "Roman"],
                      label="Cultures",
                      info='Add language similar to "Emphasize evidence from..." to the system prompt.'),
    gr.CheckboxGroup(["History",
                      "Art History",
                      "Literature",
                      "Archaeology",
                      "Epigraphy",
                      "Numismatics",
                      "Papyrology",
                      "Ancient Medicine",
                      "Ancient Science",
                      "Modern Scientific Approaches"],
                      label="Methodologies/Approaches",
                      info='Add language similar to "Especially adopt the perspective of..." to the system prompt.'),
    gr.CheckboxGroup(["Enslaved Persons", "Gender", "Ethnicity", "Inequality", "Hybridity", "New Approaches"],
                      label="Perspectives",
                      info='Add language similar to "Emphasize evidence related to..." to the system prompt.'),
    gr.Checkbox(value=False,label="Dates", info="Select to include parenthetic dates when possible."),
    gr.Checkbox(value=False,label="Wikipedia", info="Append a list of wiki pages. (This is likely to generate links to pages that don't exist.)"),
    gr.Slider(
        label="Temperature",
        value=0.9,
        minimum=0.0,
        maximum=1.0,
        step=0.05,
        interactive=True,
        info="Higher values produce more diverse outputs",
    ),
    gr.Slider(
        label="Max new tokens",
        value=2048,
        minimum=0,
        maximum=4096,
        step=64,
        interactive=True,
        info="The maximum number of new tokens",
    ),
    gr.Slider(
        label="Top-p (nucleus sampling)",
        value=0.90,
        minimum=0.0,
        maximum=1,
        step=0.05,
        interactive=True,
        info="Higher values sample more low-probability tokens",
    ),
    gr.Slider(
        label="Repetition penalty",
        value=1.2,
        minimum=1.0,
        maximum=2.0,
        step=0.05,
        interactive=True,
        info="Penalize repeated tokens",
    )
]
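
Gradio passes the values of these additional inputs to the chat function positionally, after the message and the history, so the order of this list has to match the order of the parameters of `generate`. The following sketch is one way to check that pairing. It uses a signature-only stand-in for `generate` and a hand-copied list of labels (both assumptions made so that the example runs on its own), and `input_labels`, `extra_params`, and `pairs` are hypothetical names.

```python
import inspect

def generate(prompt, history, system_prompt,
             cultures=(), methods=(), perspectives=(),
             dates=False, wikipedia=False,
             temperature=0.9, max_new_tokens=2048, top_p=0.95, repetition_penalty=1.0):
    """Signature-only stand-in for the notebook's generate function."""

# Labels of the additional inputs, in the order in which they appear in the list above.
input_labels = ["System Prompt", "Cultures", "Methodologies/Approaches", "Perspectives",
                "Dates", "Wikipedia", "Temperature", "Max new tokens",
                "Top-p (nucleus sampling)", "Repetition penalty"]

# Skip the first two parameters (message and history), which the chat interface supplies.
extra_params = list(inspect.signature(generate).parameters)[2:]

# A mismatch in length means an input was added or removed without updating generate.
assert len(input_labels) == len(extra_params)

pairs = list(zip(input_labels, extra_params))
for label, param in pairs:
    print(f"{label!r} -> {param}")
```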

In [None]:
# add your own here
examples = [
    ["Adopting the persona of a Roman emperor, greet an embassy from the city of Ilion (also known as Troy)."],
    ["Discuss the evidence for Roman manufacturing during the empire."],
    ["Should Athens in the fifth century BCE be considered an empire?"],
    ["What is the case against considering fifth century BCE Athens an empire?"],
    ["Citing specific characters and story-lines, what is the connection between the Literature of the Ancient Near East and Archaic and Classical Greek Literature?"],
    ["Which gods were particularly called upon to heal sickness and maintain good health?"]
]

In [None]:
demo = gr.ChatInterface(
    fn=generate,
    chatbot=gr.Chatbot(show_label=False, show_share_button=False, show_copy_button=True, layout="panel"),
    additional_inputs=additional_inputs,
    examples=examples,
    title="Ancient World (Inflected) Chat",
    description=f"A chatbot that will answer as an “expert in Ancient Mediterranean and Near Eastern political history, social history, literature, art, and archaeology.” Open the “Additional Inputs” panel to see options by which you can affect the output. Currently, this tool uses the “{model}” LLM hosted on Hugging Face."
)

In [None]:
# executing this cell should instantiate the gradio app within the notebook as well
# as print out a link to gradio.live that will let you use the app in its own window

demo.queue().launch()