# Ancient World Research Assistant

[Open In Google Colab](https://colab.research.google.com/github/ai-for-aw/research-assistant/blob/main/ancient_world_research_assistant.ipynb).

This notebook creates a Gradio app that explores how a chatbot can be encouraged to provide information about topics related to the Ancient Mediterranean and nearby ancient cultures. It is not a perfect tool: the text it produces will still contain errors of fact and analysis. Revealing those errors is one of its goals, because each occurrence is an opportunity to think about how the output can be improved.

If you've opened this notebook in Google Colab, choosing "Run all" from the "Runtime" menu will launch the Gradio app at the end of the notebook. Scroll to the end and you'll see a link similar to https:####...####.gradio.live . Clicking that link will provide a better user experience than using the embedded app that appears below the last cell. Much of your ability to shape the responses comes from expanding and using the "Additional Inputs" section that appears below the example prompts.

Any queries about this notebook can be directed to [Sebastian Heath](https://isaw.nyu.edu/people/faculty/sebastian-heath) <sebastian.heath -@- nyu.edu>.

The code here is released under an MIT license. Feel free to copy and adapt it according to the terms of that license (see the [github repo](https://github.com/ai-for-aw/research-assistant)).

In [None]:
!pip install -q 'gradio>=4.0.0' # you can probably remove this if you're running locally
import gradio as gr
from huggingface_hub import InferenceClient

# Chances are you'll see errors about "... pip's dependency resolver ...". Those do not
# seem to result in subsequent cells actually failing to run.
# But let me know if you think that's not right.

In [None]:
model = "mistralai/Mixtral-8x7B-Instruct-v0.1" # see https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1

client = InferenceClient(model) # see https://huggingface.co/docs/huggingface_hub/package_reference/inference_client

# If you haven't set HF_TOKEN as a secret for your notebooks, you'll get a warning. It can be ignored.

In [None]:
# Handle user clicks on the various interface elements by updating the system prompt. The user will not see
# these changes (though displaying them could probably be added; there's no reason not to).

# There is a tremendous amount of choice involved as to which "cultures", "methods", and "perspectives" to include here.
# Please do edit to your own preferences. I'd also like to see the interface become more flexible so that more options can
# be made available without overwhelming the user.

def update_system_prompt(system_prompt, cultures, methods, perspectives, languages, dates, wikipedia):
    # Collect one sentence per selected option, then join everything with single spaces
    # so that sentences never run together (e.g. "...in your response.Emphasize ...").
    cultures = cultures or []
    methods = methods or []
    perspectives = perspectives or []
    additions = []

    # cultures (the string keys must match the checkbox labels in additional_inputs)
    if "Celtic" in cultures:
        additions.append("Emphasize evidence from ancient European Celtic (meaning Gallia and Hispania as well as the Celts of Central Europe) history, literature, and archaeology.")
    if "Egyptian" in cultures:
        additions.append("Emphasize evidence from ancient Egyptian history, literature, and archaeology.")
    if "Greek" in cultures:
        additions.append("Emphasize evidence from ancient Greek history, literature, and archaeology.")
    if "Roman" in cultures:
        additions.append("Emphasize evidence from ancient Roman history, Latin literature, and Roman archaeology.")
    if "Ancient Southwest Asia (ANE)" in cultures:
        additions.append("Emphasize evidence from Ancient Near Eastern (including Mesopotamia, Persia, and Arabia) history, literature, and archaeology.")
    if "Phoenecian" in cultures:
        additions.append("Emphasize evidence from ancient Phoenician and Punic history, literature, and archaeology.")

    # methods
    if "History" in methods:
        additions.append("Especially adopt the perspective of a historian.")
    if "Art History" in methods:
        additions.append("Especially adopt the perspective of an art historian.")
    if "Literature" in methods:
        additions.append("Especially adopt the perspective of an expert in philology, literature, and literary studies.")
    if "Archaeology" in methods:
        additions.append("Especially adopt the perspective of an archaeologist.")
    if "Epigraphy" in methods:
        additions.append("Especially adopt the perspective of an epigrapher and of the field of epigraphy.")
    if "Numismatics" in methods:
        additions.append("Especially adopt the perspective of an expert in numismatics and coinage.")
    if "Papyrology" in methods:
        additions.append("Especially adopt the perspective of an expert in papyrology.")
    if "Ancient Medicine" in methods:
        additions.append("Especially adopt the perspective of an expert in Ancient Medicine.")
    if "Ancient Religion" in methods:
        additions.append("Especially adopt the perspective of an expert in Ancient Religion and Ancient Belief Systems.")
    if "Ancient Science" in methods:
        additions.append("Especially adopt the perspective of an expert in the Ancient Exact Sciences.")
    if "Modern Scientific Approaches" in methods:
        additions.append("Especially adopt the perspective of an expert in recent scientific approaches to the study of archaeology and history.")

    # perspectives
    if "Enslaved Persons" in perspectives:
        additions.append("Emphasize evidence that includes the perspective of ancient enslaved persons.")
    if "Gender" in perspectives:
        additions.append("Emphasize evidence that supports the discussion of gender.")
    if "Ethnicity" in perspectives:
        additions.append("Emphasize evidence that supports the discussion of ethnicity.")
    if "Inequality" in perspectives:
        additions.append("Emphasize evidence that supports the discussion of social inequality.")
    if "Hybridity" in perspectives:
        additions.append("Emphasize evidence that supports the discussion of hybridity.")
    if "New Approaches" in perspectives:
        additions.append("Especially include discussion of new and exciting approaches to the topic.")

    # languages (a single value from the radio buttons, or None if nothing is selected)
    if languages in ("Arabic", "English", "French", "German", "Italian", "Mandarin", "Russian", "Spanish"):
        additions.append(f"Reply in {languages}.")

    if dates:
        additions.append("Add parenthetic dates when you can. Use the BCE/CE system.")

    if wikipedia:
        additions.append("Add a list of Wikipedia pages with links to the end of your response. Label this section “Wikipedia Pages” and add a note that some may not exist.")

    # Skip empty pieces (e.g. a blank system prompt) so no stray spaces are added.
    return " ".join(part for part in [system_prompt.strip(), *additions] if part)
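The comments at the top of the cell above hope for an interface that can offer more options without overwhelming the user. One possible direction, sketched here under the assumption that you are willing to restructure `update_system_prompt`, is to keep each option's prompt sentence in a dictionary, so that adding a culture or method means adding one entry rather than a new `if` branch. The names `CULTURE_FRAGMENTS`, `METHOD_FRAGMENTS`, `PERSPECTIVE_FRAGMENTS`, and `build_system_prompt` are hypothetical and not part of the notebook, and only a few entries per category are shown.

```python
# Hypothetical table-driven variant of update_system_prompt (a sketch, not the
# notebook's implementation). Only a few entries per category are shown.
CULTURE_FRAGMENTS = {
    "Egyptian": "Emphasize evidence from ancient Egyptian history, literature, and archaeology.",
    "Greek": "Emphasize evidence from ancient Greek history, literature, and archaeology.",
    "Roman": "Emphasize evidence from ancient Roman history, Latin literature, and Roman archaeology.",
}
METHOD_FRAGMENTS = {
    "Archaeology": "Especially adopt the perspective of an archaeologist.",
    "Numismatics": "Especially adopt the perspective of an expert in numismatics and coinage.",
}
PERSPECTIVE_FRAGMENTS = {
    "Gender": "Emphasize evidence that supports the discussion of gender.",
}


def build_system_prompt(system_prompt, cultures=(), methods=(), perspectives=(),
                        language=None, dates=False, wikipedia=False):
    """Return system_prompt followed by one sentence per selected option."""
    parts = [system_prompt.strip()]
    # Iterate over the tables (not the selections) so the output order is stable
    # regardless of the order in which the user ticked the boxes.
    for table, selected in ((CULTURE_FRAGMENTS, cultures),
                            (METHOD_FRAGMENTS, methods),
                            (PERSPECTIVE_FRAGMENTS, perspectives)):
        parts.extend(text for label, text in table.items() if label in (selected or ()))
    if language:
        # Any radio label works here, so a new language needs no code change.
        parts.append(f"Reply in {language}.")
    if dates:
        parts.append("Add parenthetic dates when you can. Use the BCE/CE system.")
    if wikipedia:
        parts.append("Add a list of Wikipedia pages with links to the end of your response. "
                     "Label this section \"Wikipedia Pages\" and add a note that some may not exist.")
    return " ".join(parts)


print(build_system_prompt("You are an expert.", cultures=["Roman"],
                          methods=["Numismatics"], language="French"))
```

One design difference from the notebook's function: building "Reply in ..." from the selected label means any language added to the radio buttons is handled automatically, at the cost of no longer filtering out unexpected values.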

In [None]:
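def format_prompt(message, history):
    # Build a prompt in the Mixtral-Instruct chat format:
    #   <s>[INST] question [/INST] answer</s> [INST] next question [/INST]
    # Every earlier exchange in history is re-sent, then the new message is added.
    prompt = "<s>"
    for user_prompt, bot_response in history:
        prompt += f"[INST] {user_prompt} [/INST]"
        prompt += f" {bot_response}</s> "
    prompt += f"[INST] {message} [/INST]"
    return prompt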
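To see what the model actually receives, it can help to call `format_prompt` on a small, made-up history. The sketch below repeats the function unchanged (so the cell runs on its own) and prints the result for one earlier exchange. The loop expects `history` as a sequence of (user message, bot response) pairs; the history contents here are invented for illustration.

```python
def format_prompt(message, history):
    # Same logic as the notebook's format_prompt, repeated so this cell is self-contained.
    prompt = "<s>"
    for user_prompt, bot_response in history:
        prompt += f"[INST] {user_prompt} [/INST]"
        prompt += f" {bot_response}</s> "
    prompt += f"[INST] {message} [/INST]"
    return prompt


# An invented one-turn history of (user message, bot response) pairs.
history = [("Who founded Rome?", "According to legend, Romulus.")]
print(format_prompt("And who was his brother?", history))
# prints: <s>[INST] Who founded Rome? [/INST] According to legend, Romulus.</s> [INST] And who was his brother? [/INST]
```

Because the whole history is re-sent on every turn, long conversations use up more of the model's context window with each exchange.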

In [None]:
def generate(prompt, history, system_prompt,
             cultures=(), methods=(), perspectives=(),
             languages="English",
             dates=False, wikipedia=False,
             temperature=0.9, max_new_tokens=2048, top_p=0.90, repetition_penalty=1.2):
    # The defaults above mirror the initial slider values in additional_inputs;
    # when called from the app, Gradio passes every value explicitly.

    # Sampling requires a strictly positive temperature, so clamp very small values.
    temperature = float(temperature)
    if temperature < 1e-2:
        temperature = 1e-2
    top_p = float(top_p)

    generate_kwargs = dict(
        temperature=temperature,
        max_new_tokens=int(max_new_tokens),
        top_p=top_p,
        repetition_penalty=float(repetition_penalty),
        do_sample=True,
    )

    # It's important to keep this in sync with the interface defined by additional_inputs.
    system_prompt = update_system_prompt(system_prompt,
                                         cultures,
                                         methods,
                                         perspectives,
                                         languages,
                                         dates,
                                         wikipedia)

    # Put the instructions and the user's question in separate paragraphs of the final [INST] block.
    formatted_prompt = format_prompt(f"{system_prompt}\n\n{prompt}", history)
    stream = client.text_generation(formatted_prompt, **generate_kwargs,
                                    stream=True, details=True, return_full_text=False)

    # Yield the text accumulated so far after each token so the chat window updates as it streams.
    output = ""
    for response in stream:
        output += response.token.text
        yield output
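`generate` is a generator: each `yield` hands Gradio the full text received so far, not just the newest token, and the chat window shows that string as the current state of the reply. The sketch below imitates this without contacting Hugging Face. `fake_stream` and `accumulate` are hypothetical helpers; `fake_stream` produces objects with only the `token.text` attribute that the loop in `generate` reads.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    # Hypothetical stand-in for client.text_generation(..., stream=True, details=True):
    # each item only needs a .token.text attribute for the loop below.
    for piece in pieces:
        yield SimpleNamespace(token=SimpleNamespace(text=piece))


def accumulate(stream):
    # Same pattern as in generate(): yield the cumulative output after each token.
    output = ""
    for response in stream:
        output += response.token.text
        yield output


for partial in accumulate(fake_stream(["The ", "Parthenon ", "was ", "built..."])):
    print(repr(partial))
```

Each printed line is longer than the one before it, which is why the reply appears to grow in place rather than arriving as a list of separate fragments.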

In [None]:
# if you edit the inputs or options here, you may need to edit the update_system_prompt function to match.
additional_inputs=[
    gr.Textbox(
        label="System Prompt",
        value="You are an expert in Ancient Mediterranean and Ancient Near Eastern political history, social history, literature, art, and archaeology. Your expertise should inform all your responses. Don't make up information if you don't know it. Avoid the first person voice in your response.",
        max_lines=1,
        interactive=True,
    ),
    gr.CheckboxGroup(["Ancient Southwest Asia (ANE)",
                      "Celtic",
                      "Egyptian",
                      "Greek",
                      "Phoenecian",
                      "Roman"],
                      label="Cultures",
                      info='Add language similar to "Emphasize evidence from..." to the system prompt.'),
    gr.CheckboxGroup(["History",
                      "Art History",
                      "Literature",
                      "Archaeology",
                      "Epigraphy",
                      "Numismatics",
                      "Papyrology",
                      "Ancient Medicine",
                      "Ancient Religion",
                      "Ancient Science",
                      "Modern Scientific Approaches"],
                      label="Methodologies/Approaches",
                      info='Add language similar to "Especially adopt the perspective of..." to the system prompt.'),
    gr.CheckboxGroup(["Enslaved Persons", "Gender", "Ethnicity", "Inequality", "Hybridity", "New Approaches"],
                      label="Perspectives",
                      info='Add language similar to "Emphasize evidence related to..." to the system prompt.'),
    gr.Radio(["Arabic","English","French","German","Italian","Mandarin", "Russian","Spanish"],
                     label="Language of Response",
                     info='For languages not listed, add "Reply in ..." to your prompt/query.'),
    gr.Checkbox(value=False,label="Dates", info="Select to include parenthetic dates when possible."),
    gr.Checkbox(value=False,label="Wikipedia", info="Append a list of Wikipedia pages. (This is likely to generate links to pages that don't exist.)"),
    gr.Slider(
        label="Temperature",
        value=0.9,
        minimum=0.0,
        maximum=1.0,
        step=0.05,
        interactive=True,
        info="Higher values produce more diverse outputs",
    ),
    gr.Slider(
        label="Max new tokens",
        value=2048,
        minimum=0,
        maximum=4096,
        step=64,
        interactive=True,
        info="The maximum number of new tokens to generate",
    ),
    gr.Slider(
        label="Top-p (nucleus sampling)",
        value=0.90,
        minimum=0.0,
        maximum=1,
        step=0.05,
        interactive=True,
        info="Higher values sample more low-probability tokens",
    ),
    gr.Slider(
        label="Repetition penalty",
        value=1.2,
        minimum=1.0,
        maximum=2.0,
        step=0.05,
        interactive=True,
        info="Penalize repeated tokens",
    )
]
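`ChatInterface` passes the values of `additional_inputs` to `generate` positionally, after the message and the history, so the order of the list above has to match the order of `generate`'s parameters. A small check using only the standard-library `inspect` module can catch a mismatch after edits. In this sketch, `generate_stub`, `INPUT_ORDER`, and `check_order` are hypothetical; the stub copies `generate`'s parameter names (and is deliberately not called `generate`, so running the cell cannot replace the real function), and the expected order is written out by hand.

```python
import inspect


def generate_stub(prompt, history, system_prompt, cultures, methods, perspectives,
                  languages, dates, wikipedia, temperature, max_new_tokens, top_p,
                  repetition_penalty):
    """Stub with the same parameter names, in the same order, as generate()."""


# Hypothetical: the parameter each entry of additional_inputs is meant to feed, in list order.
INPUT_ORDER = ["system_prompt", "cultures", "methods", "perspectives", "languages",
               "dates", "wikipedia", "temperature", "max_new_tokens", "top_p",
               "repetition_penalty"]


def check_order(fn, input_order):
    """Return a list of problems; an empty list means the order matches."""
    # ChatInterface always passes the message and the history first, so skip those two.
    params = list(inspect.signature(fn).parameters)[2:]
    problems = [f"position {i}: expected {want!r}, found {got!r}"
                for i, (want, got) in enumerate(zip(input_order, params)) if want != got]
    if len(input_order) != len(params):
        problems.append(f"{len(input_order)} inputs but {len(params)} parameters")
    return problems


print(check_order(generate_stub, INPUT_ORDER))  # an empty list means the order matches
```

In the notebook itself you could pass the real function instead, as `check_order(generate, INPUT_ORDER)`.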

In [None]:
# Add your own prompts here. This is an easy way to affect the interface. If you don't know Python,
# just note the brackets and quotes around the prompts and the commas between the lines. Edit
# this cell and choose "Run all" from the "Runtime" menu again. That's a bit of overkill but should work.
examples = [
    ["Citing specific characters and story-lines, what is the connection between the Literature of the Ancient Near East and Archaic and Classical Greek Literature?"],
    ["Should Athens in the fifth century BCE be considered an empire?"],
    ["What is the case against considering fifth century BCE Athens an empire?"],
    ["Create a list of the different myths, legends, and stories of the founding of the city of Rome."],
    ["Discuss the evidence for Roman manufacturing during the empire."],
    ["Which gods were particularly called upon to heal sickness and maintain good health?"],
    ["Adopting the persona of a Roman emperor, greet an embassy from the city of Ilion (also known as Troy)."],
    ["I am an inhabitant of the Ancient Mesopotamian city of Ur. What might I see and who might I talk to as I walk down the street from my house towards the Ziggurat?"]
]
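Each example above sits inside its own pair of brackets because, in the Gradio 4 versions this notebook targets, `ChatInterface` expects each example to be a list whose first item is the message when `additional_inputs` are in use (any further items would fill those inputs). If the nested brackets are error-prone to edit by hand, one option, sketched here, is to keep a plain list of questions and let Python add the inner brackets. `my_questions` and `my_examples` are hypothetical names, chosen so the cell does not overwrite the `examples` list above.

```python
# Hypothetical: write questions as a plain list of strings, one per line.
my_questions = [
    "Should Athens in the fifth century BCE be considered an empire?",
    "Which gods were particularly called upon to heal sickness and maintain good health?",
]

# Wrap each question in its own one-item list, the same shape as the examples above.
my_examples = [[question] for question in my_questions]
print(my_examples)
```

Passing `examples=my_examples` to `gr.ChatInterface` would then behave the same way as the hand-written list.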

In [None]:
demo = gr.ChatInterface(
    fn=generate,
    chatbot=gr.Chatbot(show_label=True, show_share_button=True, show_copy_button=True, layout="panel"),
    additional_inputs=additional_inputs,
    examples=examples,
    title="Ancient World Research Assistant",
    description=f"A chatbot that will answer as an “expert in Ancient Mediterranean and Near Eastern political history, social history, literature, art, and archaeology.” Open the “Additional Inputs” panel to see options by which you can affect the output. Currently, this tool uses the “{model}” LLM through the Hugging Face Inference API."
)

In [None]:
# executing this cell should instantiate the Gradio app within the notebook as well
# as print out a link to gradio.live that will let you use the app in its own window

demo.queue().launch() # pass auth=('xxxx','xxxx') if you want a password on the gradio.live link. But it won't work on the in-notebook version (I don't think)