# LLMs (72 points possible)

In this assignment, we'll explore how to augment existing LLMs with capabilities such as

* Referring to custom documents (RAG)
* Calling functions that augment the agent's capabilities (ReAct)
* Adding memory

The main document we'll be using to augment the agent is a *Dungeons and Dragons* adventure: a sketch of a story and some game statistics for the major characters.  It's a little unusual, since it's set in the decidedly weird *Spelljammer* setting of "D&D in space," but the main reason we're using it is that it was definitely not in any LLM's training data, so it's a good example of a custom document an LLM might need to refer to.

In [None]:
# Pin versions because things get deprecated very fast around here
!pip install \
  "pydantic>=2,<3" \
  "langchain>=0.3,<0.4" \
  "langchain-core>=0.3,<0.4" \
  "langchain-community>=0.3,<0.4" \
  "langchain-openai>=0.2,<0.3" \
  "llama-cpp-python"

1) We're first going to get a barebones LLM running, and also demonstrate that the context window of a modern LLM tends to be quite large.

a, 3 pts) Look up the size in tokens of the GPT-4o context window.  Compare this to a word count for `Spelljammer23.txt` (using the `wc` unix utility, for example), keeping in mind that 1 token is typically about 0.75 words, so the token count is roughly the word count divided by 0.75.  Does the document fit?
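
The words-to-tokens rule of thumb above can be sketched as a small calculation.  This is only a rough estimate, not a real tokenizer; the function names and the numbers in the example are hypothetical placeholders, so substitute the word count you measure and the context window size you look up.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (rule of thumb, not a tokenizer)."""
    return round(word_count / words_per_token)


def fits_in_context(word_count: int, context_window_tokens: int) -> bool:
    """True if the estimated token count is within the given context window."""
    return estimate_tokens(word_count) <= context_window_tokens


# Hypothetical numbers for illustration only; they are not the answer to (a).
print(estimate_tokens(3000))                               # 4000
print(fits_in_context(3000, context_window_tokens=8000))   # True
```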

**TODO**

b, 4 pts)  Use the example from lecture to create a `ChatOpenAI` LLM that uses GPT-4o as a backend.  Then write a function `ask_about_adventure(query, llm)` that asks GPT-4o the question *query* with the full text of `Spelljammer23.txt` as additional context, and returns GPT-4o's answer as a string.  Do not use a vector store for this; just dump the whole document into the query.  (Note that if we use remote calls to GPT-4o as our LLM, this assignment should be doable even in free Colab.)

In [None]:
# TODO

In [None]:
ask_about_adventure("What is Kip's class?", llm) # Expect Cleric

2, 10 pts) We're now going to pretend this file is actually too big to fit in the context window, to illustrate how to implement RAG.

Code up a RAG-enabled agent with the help of the examples from lecture, so that our `my_rag_app` test snippets refer to an object like the `RAGApplication` defined there.  The text is still `Spelljammer23.txt`.  Note that the blocks of statistics in the adventure tend not to have periods, so instead of following the lecture example exactly, chunk the text into 1000-character strings that overlap by 100 characters.  Use the same GPT-4o model as the LLM.  Retrieve the 4 best-matching chunks when querying the vector store.
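
To make the chunking arithmetic concrete, here is a minimal plain-Python sketch of fixed-size chunks with overlap.  It is not the lecture's code and does not use LangChain's splitters; `chunk_text` is a hypothetical helper, and in the notebook each chunk would presumably still need to be wrapped in a `Document`.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split text into chunk_size-character pieces that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts this many characters after the last
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


# Stand-in text; the real input would be the contents of Spelljammer23.txt.
sample = "".join(str(i % 10) for i in range(2500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])             # [1000, 1000, 700]
print(pieces[0][-100:] == pieces[1][:100])  # True: neighbors share 100 characters
```

With a 1000-character chunk and a 100-character overlap, consecutive chunks start 900 characters apart.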

In [None]:
# TODO create list of Documents

In [None]:
documents[10] # Expect powers of the magic artifact and some nearby treasure

In [None]:
# TODO create retriever from documents

In [None]:
# TODO create my_rag_app with method run() that can execute queries with retrieved docs as context

In [None]:
my_rag_app.run("What is Kip's character class?").content # Expect Cleric

In [None]:
my_rag_app.run("What is Gardia's brother's name?").content # Expect Cornelius

3, 15 pts) Let's add to the agent the ability to roll dice.  Follow the example at https://python.langchain.com/docs/how_to/custom_tools/ to create a function roll(to_roll) that takes a string "*n*d*s*" (like "2d10"), rolls *n* dice with faces numbered 1 through *s* (like 2 10-sided dice), and returns the total.  Be sure to use the tool decorator and give it a text docstring so that the LLM knows what the function is for.

When you have a basic "nds" roller, make it more robust in the following ways, in case the AI tries to pass weird arguments:

a) Have the docstring contain explicit instructions to not pass any additional modifiers or comments besides NdS, where N is the number of dice and S is the number of sides.

b) Use a regular expression to grab the "nds" part of the argument so that if it says something like "1d20+7" or "1d20 (for the attack)" you can ignore the additional text.

c) If the argument is still unparseable, return the message "Badly formatted input - use just NdS with no modifiers or comments."
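
The parsing steps in (b) and (c) can be sketched as follows.  This is a minimal illustration under stated assumptions: `roll_sketch` and `DICE_PATTERN` are hypothetical names, the `tool` decorator and the LLM-facing docstring from (a) are deliberately left out, and treating zero dice or zero sides as badly formatted input is a choice of this sketch rather than a requirement of the assignment.

```python
import random
import re

ERROR_MESSAGE = "Badly formatted input - use just NdS with no modifiers or comments."

# First "<digits>d<digits>" occurrence anywhere in the string, e.g. "1d20" in "1d20+7".
DICE_PATTERN = re.compile(r"(\d+)d(\d+)", re.IGNORECASE)


def roll_sketch(to_roll: str):
    """Roll N dice with S sides each and return the total, or an error message."""
    match = DICE_PATTERN.search(to_roll)
    if match is None:
        return ERROR_MESSAGE
    n, sides = int(match.group(1)), int(match.group(2))
    if n < 1 or sides < 1:
        return ERROR_MESSAGE  # assumption: "0d6" or "2d0" counts as bad input
    return sum(random.randint(1, sides) for _ in range(n))


print(roll_sketch("3d4"))                         # some total from 3 to 12
print(roll_sketch("1d20+7 (for stealth check)"))  # some total from 1 to 20; "+7" is ignored
print(roll_sketch("my lucky die"))                # the error message
```

Using `search` rather than `fullmatch` is what lets the function skip over trailing modifiers and comments.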

In [None]:
my_rag_app.run("Roll 2d10.").content # Expect "I don't know."

In [None]:
# TODO define roll function

In [None]:
print(roll.invoke("3d4")) # Produce random number between 3 and 12
print(roll.invoke("1d20+7 (for stealth check)")) # random number between 1 and 20, ignoring + to keep things simple
print(roll.invoke("my lucky die")) # produce error message

Now create an agent that can use this die-rolling function when necessary.  It doesn't need to have the RAG capabilities for this iteration.

In [None]:
# TODO react template - see lecture

In [None]:
# TODO define Tool, react agent, AgentExecutor - see lecture

In [None]:
agent_executor.invoke({'input': 'Roll 2d10.'}) # expect successful roll

In [None]:
agent_executor.invoke({'input': 'Make a Perception check with +4'}) # it knows some D&D rules already!

4, 15 pts) Let's try putting the previous two steps together.  Give your die-rolling agent RAG capabilities, and add new instructions indicating that the agent is now a "Dungeon Master" narrating the game, and the user is playing the hero Gardia.  (The Dungeon Master role instruction will go a long way toward getting the agent to behave the way we want.)  Create a loop that asks the player what to do next.

Before the player has said anything, use an input of 'Begin the adventure!' and the RAG context should just be the first four chunks of the document.

Note that `AgentExecutor` always expects just one key in the dictionary passed to `invoke()`, so if you want to pass relevant documents for RAG, you need to concatenate them to the user input and pass one big string, like `{'input': input + '\nrelevant_documents: ' + doc_texts}`.  (This is admittedly a little hacky.)
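
A minimal sketch of that concatenation, assuming the retrieved chunks are already available as plain strings.  `build_agent_input` is a hypothetical helper name, and joining the chunks with blank lines is just one reasonable choice.

```python
def build_agent_input(user_input: str, docs: list) -> dict:
    """Pack the user input and retrieved chunk texts into the single 'input' key."""
    doc_texts = "\n\n".join(docs)  # one big string holding all retrieved chunks
    return {"input": user_input + "\nrelevant_documents: " + doc_texts}


payload = build_agent_input("I sneak past the guard.", ["chunk one", "chunk two"])
print(payload["input"])
```

The resulting dictionary has the shape that would be handed to `agent_executor.invoke(...)`.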

If you have trouble with the AI not giving control back to the user, try asking an AI how you might rewrite the prompt to encourage the AI to stop after one narration.

In [None]:
# TODO new ReAct prompt template

In [None]:
# TODO tool code, largely similar to before

In [None]:
# TODO RAG code, modified; make my_dm a new RAGApplication

Now, this is not going to work fully, as the AI has only a hazy understanding of the rules of Dungeons and Dragons and will not successfully track whose turn it is or how much health enemies have.  We also didn't implement any memory.  Nevertheless, show a reasonable interaction that lasts for at least 5 player inputs where a die is rolled at least once (sneaking and fighting are good for die rolling).

In [None]:
my_input = 'Begin the adventure!'

while True:
  print(my_dm.run(my_input))
  my_input = input()

5, 10 pts) This would be a little better if the agent had memory.  Modify the class you used to implement `my_dm` so that it remembers the last k utterances (let k be a constructor argument), counting utterances from both the user and the AI.  When the RAG-augmented string is being constructed for input, also add a string "chat history: " followed by the transcript stored in your object's memory.  Create a new agent where k = 100.
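
One way to keep only the last k utterances is a bounded `collections.deque`, sketched below.  The `ChatMemory` class and its method names are hypothetical and stand apart from the lecture's `RAGApplication`; the sketch shows only the bookkeeping, not the retrieval or the LLM call.

```python
from collections import deque


class ChatMemory:
    """Remembers the last k utterances from either the user or the AI."""

    def __init__(self, k: int):
        # A deque with maxlen=k silently drops the oldest entry once it is full.
        self.utterances = deque(maxlen=k)

    def add(self, speaker: str, text: str) -> None:
        self.utterances.append(f"{speaker}: {text}")

    def transcript(self) -> str:
        return "\n".join(self.utterances)

    def augment(self, rag_input: str) -> str:
        """Append the stored transcript to an already RAG-augmented input string."""
        return rag_input + "\nchat history: " + self.transcript()


memory = ChatMemory(k=2)
memory.add("user", "Begin the adventure!")
memory.add("ai", "You wake aboard a strange ship.")
memory.add("user", "I look around.")
print(memory.transcript())  # only the last two utterances remain
```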

In [None]:
# TODO new RAGApplication class that remembers chat history, appends it to query

Try running this new `my_dm` agent for 5 turns.  Your fifth input should be "recap the story so far", which should check whether the agent's memory is working properly.

In [None]:
my_input = 'Begin the adventure!'

while True:
  print(my_dm.run(my_input))
  my_input = input()

6, 9 pts) For our last exercise using this file, we'll experiment with chaining LLM outputs.  Create a chain that can take as input a document (like `Spelljammer23.txt`) and output two good titles for that document.  But do it by chaining LLMs in LangChain:  the first LLM generates a first candidate title, the second generates a second candidate title, and the third rates both titles according to their originality, how exciting they are, and their faithfulness to the original document, replying with the three scores for each title and the overall winner.  (The ratings don't need to be on any particular scale.)
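
The data flow of such a chain can be sketched without LangChain at all, using plain Python callables as stand-ins for the three LLM calls.  Everything here is hypothetical scaffolding (`run_title_chain`, the prompt wording, the fake LLMs); in LangChain one option might be to build up the same dictionary step by step with something like `RunnablePassthrough.assign`, but that is an assumption about how you choose to implement it, not the lecture's method.

```python
def run_title_chain(document, title_llm_1, title_llm_2, judge_llm):
    """Thread a growing dict of results through three LLM-like callables."""
    state = {"document": document}
    state["title_1"] = title_llm_1(f"Suggest one title for this document:\n{document}")
    state["title_2"] = title_llm_2(
        f"Suggest one title, different from '{state['title_1']}', "
        f"for this document:\n{document}"
    )
    state["result"] = judge_llm(
        "Rate each title for originality, excitement, and faithfulness to the "
        "document, then name the overall winner.\n"
        f"Title 1: {state['title_1']}\nTitle 2: {state['title_2']}\n"
        f"Document:\n{document}"
    )
    return state


# Fake "LLMs" so the sketch runs offline; real ones would be model calls.
def fake_1(prompt):
    return "Stars Over the Rock"


def fake_2(prompt):
    return "Voyage of the Lost Helm"


def fake_judge(prompt):
    return "Winner: Title 2" if "Title 2:" in prompt else "No titles found"


print(run_title_chain("Some adventure text.", fake_1, fake_2, fake_judge)["result"])
```

Each later step sees the earlier outputs, and the final answer lives under the `'result'` key.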


In [None]:
# TODO

In [None]:
chain.invoke(document)['result'] # Should be the last LLM's output

7, 4 pts) Last question:  Suppose in my prompt I make use of a word that an LLM has never seen in training, like "probiognosis."  Explain how an LLM with byte-pair encoding (BPE) would react to this word, both for the tokenization and the meanings of these tokens.  Compare to what would happen if the LLM instead just tokenized using whitespace and punctuation.
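
As a point of reference for the mechanics only, here is a toy comparison of the two tokenization schemes.  The merge list and vocabulary are made up for illustration; a real model's learned merges would almost certainly split the word differently, and the sketch says nothing about what the resulting tokens mean to the model.

```python
import re

# Hypothetical learned merges, in priority order (earlier = learned earlier).
TOY_MERGES = [
    ("p", "r"), ("pr", "o"), ("b", "i"), ("bi", "o"),
    ("g", "n"), ("o", "s"), ("i", "s"), ("gn", "os"), ("gnos", "is"),
]


def bpe_encode(word, merges):
    """Start from single characters and replay the merges in order."""
    symbols = list(word)
    for left, right in merges:
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)  # fuse the adjacent pair into one symbol
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


def whitespace_tokenize(text, vocab):
    """Split on whitespace and punctuation; unseen words become '<unk>'."""
    words = re.findall(r"\w+|[^\w\s]", text)
    return [w if w in vocab else "<unk>" for w in words]


print(bpe_encode("probiognosis", TOY_MERGES))                  # ['pro', 'bio', 'gnosis']
print(whitespace_tokenize("a probiognosis.", {"a", "."}))      # ['a', '<unk>', '.']
```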

**TODO**

# AI Statement (2 pts)

Please briefly describe whether and how you used generative AI for this assignment.  You will not be penalized for your answer - this is mostly so the course can adapt to AI use.

**TODO**