# Clonr local

This notebook runs a local version of Clonr, without having to worry about all of the backend stuff: rate limiting, caching, threading, async, background tasks, databases, metric tracking, Postgres servers, authentication, API keys, ... yeah, lots of moving parts.

Check out our prompt schematic on the doc to get an idea of the flow.

In [1]:
%load_ext autoreload
%autoreload 2

## Innate traits

The first step in clone creation is to define some basic fields. In the Frontend app, the clone creation process would look like one of those progress screens, where there's a bar at the top that fills up as you progress, and each page is one step.

Here, we define things that should pretty much never change. They're fundamental enough that on the backend we will store them as columns of the Clone model in our database.

In [2]:
# Clone name
char = 'Makima'

# A single sentence tagline description. no more than like 20 tokens.
short_description = 'Makima is the leader of the Public Safety Devil Hunter organization, and also the Control Devil.'

### Long description

This can be input by users, or auto-generated using the LLM. The idea here is that some users might want to curate a public persona, and not have this auto-inferred. We can also do a 50/50 setup, letting users write in a partial answer, then asking the LLM to correct it or fill in more details.

We auto-generated the Makima description. Auto-generation works by iteratively querying all of the documents uploaded later (wiki, dialogues, YouTube videos, websites). It is seeded with the initial short_description, or the initial long_description if that exists.

__user-created__: under clonr/data/icebreakers.py, we list a series of questions that users can use to help them write their long descriptions. We also give some examples in the prompt template for the LongDescription template.

__data structures__: the __document__ is the main data structure. It just wraps some uploaded text (content) and acts as a single parent node for the list of chunks/nodes generated when indexing our data. "Indexing" is an overly fancy word here; it just means chunking up the data and feeding it to a vector DB. The structured chunks are called __nodes__ and can contain parent and child elements that refer to other nodes (i.e. a tree structure).
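
To make the document/node relationship concrete, here is a minimal sketch. `SketchDocument` and `SketchNode` are hypothetical stand-ins written for this illustration, not the actual `clonr.data_structures` classes, and their field names are assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a node: one chunk of a document."""
    content: str
    document_id: str
    depth: int = 0                        # 0 for leaf chunks, > 0 for summary nodes
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a document: raw text plus the nodes built from it."""
    content: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    nodes: List[SketchNode] = field(default_factory=list)


sketch_doc = SketchDocument(content="Makima is the Control Devil. She leads Public Safety.")

# Naive sentence chunking, just to have two leaves to link together.
leaves = [
    SketchNode(content=s.strip() + ".", document_id=sketch_doc.id)
    for s in sketch_doc.content.split(".")
    if s.strip()
]

# A summary node sits one level above the leaves and points down at them.
root = SketchNode(
    content="Summary of both chunks",
    document_id=sketch_doc.id,
    depth=1,
    child_ids=[n.id for n in leaves],
)
for n in leaves:
    n.parent_id = root.id
sketch_doc.nodes = leaves + [root]
```

A flat list of leaves is all a simple chunk index needs; the parent/child ids only matter once summary nodes are layered on top.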

Below, we give the code for using the Makima wiki page to iteratively generate a summary. Note that it takes two runs to compute. We want to max out the context length if possible when running this.
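
As a rough guide to "maxing out" the context, the code comment in the next cell gives the constraint `ctx_length - 432 (prompt) - chunk_size - 2 * summary_size > 0`. The sketch below turns that into arithmetic. The 4096-token context is an assumed example value, `chunk_budget` is a hypothetical helper, and reading the factor of 2 as "current description in, new description out" is our interpretation.

```python
def chunk_budget(ctx_length: int, summary_size: int, prompt_overhead: int = 432) -> int:
    """Tokens left for the document chunk in one refinement call.

    The context has to hold the prompt template, the chunk, and the
    description twice (assumed: once as input, once as generated output).
    chunk_size must stay strictly below the returned value.
    """
    return ctx_length - prompt_overhead - 2 * summary_size


# Assumed example: 4096-token context, and the max_tokens=768 used in this notebook.
budget = chunk_budget(ctx_length=4096, summary_size=768)
print(budget)  # 2128
```

Under these assumptions the `chunk_size=128` used below leaves most of the window unused, so a larger chunk size would mean fewer LLM calls.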

In [3]:
from clonr.templates.long_description import LongDescription
from clonr.data.load_examples import load_makima_data
from clonr.data_structures import Document, Node
from clonr.tokenizer import Tokenizer
from clonr.llms import OpenAIModelEnum
from clonr.text_splitters import TokenSplitter, SentenceSplitter
from clonr.llms import MockLLM, GenerationParams
from clonr.data.parsers import BasicWebParser

llm = MockLLM()
tokenizer = Tokenizer.from_openai(OpenAIModelEnum.chatgpt_0613)
splitter = TokenSplitter(tokenizer=tokenizer, chunk_size=128, chunk_overlap=32)

# this goes into the prompt
document_type = 'wiki page'

# we seed with the short if long is not partially filled.
current_description = short_description

# # We can parse from the internet, but it's not 100% cleaned
# # This is just to demonstrate how this would work in general.
# parser = BasicWebParser()
# unfiltered_content = parser.extract(url='https://chainsaw-man.fandom.com/wiki/Makima')

# load the pre-downloaded wiki page with a little bit of cleaning (mostly chopping the footnotes out.)
chunks = splitter.split(Document(content=load_makima_data()))

# Actually run the iterations. We log all calls to the LLM for debugging, but only the last
# matters. Also, the other calls help count final tokens.
# Don't worry, this cell won't cost money, it's the MockLLM.
calls = []
for chunk in chunks:
    # max length constraint is ctx_length - 432 (prompt) - chunk_size - 2 * summary_size > 0
    prompt = LongDescription.render_instruct(
        document_type=document_type,
        document_content=chunk,
        current_description=current_description
    )
    params = GenerationParams(max_tokens=768, top_p=0.95, temperature=0.7)
    r = await llm.agenerate(prompt, params=params)
    current_description = r.content
    calls.append(r)


The end result is precomputed. You can see the calls in data/assets/makima-description-calls.json

In [4]:

long_description = """Makima is a complex and manipulative individual who leads the Public Safety Devil Hunter organization and also serves as the Control Devil. Her personality is characterized by extreme cunning, ruthlessness, and a Machiavellian approach to achieving her goals. She presents herself as friendly and relaxed, wearing a smile even in the midst of crises, but this is merely a façade to manipulate and control those around her. At the core of Makima's desires is a yearning for a sense of family and a longing to be together with Pochita, the Chainsaw Devil. She idolizes Chainsaw Man and seeks to bring him under her control, envisioning an ideal world without fear, death, and "bad" movies. Makima's willingness to sacrifice anyone, including herself, to achieve her goals demonstrates her dedication and determination. Her relationships are intricate and manipulative. She shows genuine affection only towards Pochita, viewing him as an equal due to his legendary status and power. Makima's interactions with Denji, the protagonist, are part of a calculated plan to break his contract with Pochita and regain control over him. Initially, she appears generous and creates a familial bond with Denji, but her true intentions are to plunge him into despair by attacking his personal relationships. Makima possesses supernatural abilities as a Devil, making her one of the strongest individuals in the world of Chainsaw Man. She has the power to make contracts with humans and control them, manipulating their memories and personalities. Her combat skills are formidable, as she defeated a weakened Pochita and ripped his heart from his body. Despite her invincible nature, she has faced opposition from various factions around the world. Overall, Makima's core characteristics include her calculating and manipulative personality, her yearning for control over Chainsaw Man, and her willingness to sacrifice others to achieve her goals. 
She possesses immense physical strength, enhanced smell, and a variety of supernatural abilities. Her control over others extends to their memories and personalities, and she is capable of remote communication, travel, and offensive attacks. While she may appear invulnerable, she has faced death multiple times, only to be revived due to the revival ability of Devils."""

print(f"Token usage -- short: {tokenizer.length(short_description)}. long: {tokenizer.length(long_description)}")

Token usage -- short: 20. long: 445


## Wikis and fact uploads

The next part is uploading all of the facts and knowledge that describe the Clone. These will be stored and retrieved when necessary. A missing piece here is the hookup to a social media platform. We need to add an image-to-text model to create descriptions of posts, and to timestamp them and pair them with captions. That creates the immersive live clone environment for some users.

The strategy here is to chunk and upload to a vector database. We made a simple in-memory vector database with no external dependencies except for numpy and onnx to use here. There are quite a few options in terms of chunking and indexing. For chunking, refer to the TextSplitter.py file in processing. SentenceSplitter works well on English, but it requires nltk and also produces uneven chunk sizes.
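
For intuition, fixed-size chunking with overlap can be sketched as below. This is a simplified illustration, not the actual `TokenSplitter`: `split_with_overlap` is a hypothetical helper that works on a pre-tokenized list, and the placeholder strings stand in for real tokens.

```python
def split_with_overlap(tokens: list, chunk_size: int = 128, chunk_overlap: int = 32) -> list:
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


toy_tokens = [f"t{i}" for i in range(300)]  # placeholder tokens
toy_chunks = split_with_overlap(toy_tokens)
print([len(c) for c in toy_chunks])  # [128, 128, 108]
```

Each chunk repeats the last 32 tokens of the previous one, so a fact that straddles a boundary still appears whole in at least one chunk.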

For indexing, we wrote extensively in the README.md about different methods. The simplest and default is just standard chunking, i.e. what we call the ListIndex.

In [5]:
from clonr.data.load_examples import load_makima_data
from clonr.data_structures import Document, Node
from clonr.tokenizer import Tokenizer
from clonr.llms import OpenAIModelEnum
from clonr.index import ListIndex, TreeIndex
from clonr.text_splitters import TokenSplitter, SentenceSplitter


tokenizer = Tokenizer.from_openai(OpenAIModelEnum.chatgpt_0613)
splitter = TokenSplitter(tokenizer=tokenizer, chunk_size=128, chunk_overlap=32)

text = load_makima_data()
doc = Document(content=text)

list_index = ListIndex(tokenizer=tokenizer, splitter=splitter)
nodes = await list_index.abuild(doc)

2023-07-06 15:06:22.999 | INFO     | clonr.index:abuild:105 - Creating leaf nodes


In [6]:
# an example of the tree-index. The result is flat list of all nodes
# non-leaf nodes contain summaries for the content, and depth > 0.
# again, no worries, no real tokens are used here it's a mock llm.
tree_index = TreeIndex(tokenizer=tokenizer, splitter=splitter, llm=MockLLM('x ' * 1_000))
nodes = await tree_index.abuild(doc)
print(f"TOTAL TOKEN USAGE: {tree_index.tokens_processed}")

2023-07-06 15:06:23.035 | INFO     | clonr.index:abuild:355 - Creating leaf nodes
2023-07-06 15:06:23.041 | INFO     | clonr.index:_process_level:335 - LLM CALL: Depth 0. Group: 1/2. Total Tokens: 0.
2023-07-06 15:06:23.044 | INFO     | clonr.index:_process_level:335 - LLM CALL: Depth 0. Group: 2/2. Total Tokens: 3433.
2023-07-06 15:06:23.046 | INFO     | clonr.index:_process_level:335 - LLM CALL: Depth 1. Group: 1/1. Total Tokens: 5941.


TOTAL TOKEN USAGE: 7582


In [7]:
# back to the list nodes

nodes = await list_index.abuild(doc)

2023-07-06 15:06:23.081 | INFO     | clonr.index:abuild:105 - Creating leaf nodes


### Vector DB

We need to push these things somewhere where we can easily query them. Add them to the vector db. In production, we will use postgres, but this is a decent test version that works pretty well. It performs exact search, so don't use it beyond around 50k vectors (which is way lower than what we need here)

As an example, we run a query and return the top 2 results (k=2). The VectorDB also runs re-ranking with a cross-encoder, and returns that score as well (higher is better).

Note: the correct answer to the query "what type of devil is Makima" is "the Control Devil".
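
The two stages described above (exact search, then re-ranking) can be sketched in plain Python. Everything here is a toy under stated assumptions: the 2-d vectors are made up, `word_overlap` is a crude stand-in for a cross-encoder, and none of these helper names come from the Clonr codebase.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(query_vec, items, k=2):
    """Score every stored vector against the query: O(n) per query, no index."""
    scored = [(cosine(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def rerank(query, candidates, score_fn):
    """Re-order candidates with a (query, passage) scorer; higher is better."""
    rescored = [(score_fn(query, text), text) for _, text in candidates]
    return sorted(rescored, key=lambda pair: pair[0], reverse=True)


def word_overlap(query, passage):
    """Toy scorer standing in for a cross-encoder: count shared lowercase words."""
    def words(s):
        return {w.strip(".?!,'").lower() for w in s.split()}
    return len(words(query) & words(passage))


# Made-up 2-d vectors; a real encoder would produce these.
items = [
    ("Makima is the Control Devil.", [0.8, 0.2]),
    ("Pochita is the Chainsaw Devil.", [0.9, 0.1]),
    ("Denji is the protagonist.", [0.1, 0.9]),
]

top = exact_search([1.0, 0.0], items, k=2)
reranked = rerank("what type of devil is Makima?", top, word_overlap)
print(reranked[0][1])  # Makima is the Control Devil.
```

In this toy the vector search puts the Pochita passage first and the re-ranker flips the order, which is the kind of correction the cross-encoder stage is there to make.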

In [8]:
from clonr.embedding import EmbeddingModel, CrossEncoder
from clonr.storage import InMemoryVectorDB
from clonr.retrieval import RerankRetriever


vectordb_wiki = InMemoryVectorDB(encoder=EmbeddingModel.default())
retriever = RerankRetriever(cross_encoder=CrossEncoder.default())
vectordb_wiki.add_all(nodes)
retriever.query('what type of devil is makima?', k=2, db=vectordb_wiki)

TypeError: RerankRetriever.query() got an unexpected keyword argument 'k'

## Example dialogues

This section is important for getting the style of our Clone correct. If we could train a model we wouldn't need this, but here we are, stuck with in-context learning. I haven't finished integrating this code into the codebase yet, so it's ad hoc. Ideally, these get embedded into a vector DB and queried for relevance at runtime, so that we select the best examples for completing the next clone message.

Some problems I ran into:
* What happens if a dialogue is too long?
* Do we pull single messages, or entire dialogues into the context?
* How should we handle proper names? What happens if you change the clone name: does everything break?
* What happens if a single message is too long?
* If we pull multiple messages, how do we decide which ones?
* Do we query on dialogue embeddings or message embeddings? How do we embed the entire dialogue effectively?


Attempt 1:
Trying to summarize the dialogues and then embed the result is brutal. Hallucination is frequent, as it's difficult to ascertain what the dialogues are talking about with such a limited sample size. Also, the summaries don't get much shorter, they tend to be longer trying to fill in the details.

Right now, we just directly embed the entire dialogue content. We could embed the messages and average the result per message, to even out length differences between messages, but I felt that wasn't a good idea.
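
For reference, the averaging alternative mentioned above would look roughly like this. It is a sketch of the option we did not take; `mean_pool` is a hypothetical helper and the vectors are made up.

```python
def mean_pool(vectors):
    """Average per-message embeddings into one dialogue embedding."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Three made-up message embeddings from one dialogue.
message_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dialogue_embedding = mean_pool(message_embeddings)
```

Every message gets equal weight regardless of its length, which evens out long and short messages but also lets filler lines like "I... I... uhh" pull the average around.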

In [None]:
import re
from clonr.data_structures import Dialogue, DialogueMessage
from clonr.embedding import EmbeddingModel, CrossEncoder

with open('clonr/data/assets/makima/dialogues.txt', 'r') as f:
    s = f.read()

dialogues: list[Dialogue] = []
messages: list[DialogueMessage] = []

print("Loading dialogues from file.")
for d in s.split('### Dialogue\n'):
    if not d:
        continue
    dialogue = Dialogue(character=char, source='manual')
    pattern = r"(\w+): (.*?)(?=\n|$)"
    matches = re.findall(pattern, d, re.DOTALL)
    for i, match in enumerate(matches):
        msg = DialogueMessage(
            speaker=match[0], 
            content=match[1], 
            index=i,
            dialogue_id=dialogue.id, 
            is_character=match[0].lower() == char.lower(),
        )
        dialogue.message_ids.append(msg.id)
        dialogue.messages.append(msg)
        messages.append(msg)
    dialogues.append(dialogue)

encoder = EmbeddingModel.default()
cross_encoder = CrossEncoder.default()
vectordb_dialogue = InMemoryVectorDB(encoder=encoder, cross_encoder=cross_encoder)

print("Encoding dialogue messages. This may not be necessary.")
embs = encoder.encode_passage([x.content for x in messages])
for e, m in zip(embs, messages):
    m.embedding = e
    m.embedding_model = encoder.name

print("Encoding dialogues")
for d in dialogues:
    d.embedding = encoder.encode_passage(d.content)
    d.embedding_model = encoder.name
    
vectordb_dialogue.add_all(dialogues)

Loading dialogues from file.
Encoding dialogue messages. This may not be necessary.
Encoding dialogues


This is cherry-picked. If you slightly change the query, it misses, and the db isn't even that big. We should take dialogue retrieval as is with a grain of salt. If we receive a lot of example dialogues, we should research a more efficient way to do this.

In [None]:
results = vectordb_dialogue.query('I just want sex')
d = Dialogue(**results[0])
print(d.content)

<makima>: I believe that, when it comes to sex, the better you understand the other person the better it feels
<denji>: I... I... uhh
<makima>: But it's hard to know how someone else feels
<makima>: so start with observing the hand carefully
<makima>: how long are the fingers? Are the palms cool? Are the warm? Ever had your finger bitten? 
<makima>: remember this, so that even if you can't see, you can tell it's me. 
<makima>: biting your finger.
<makima>: remember.


## Memory Stream

TODO: figure out how we can better label "user" in all of our stuff. This will break down when we have multiple people in a conversation. Perhaps we should just allow users to put their name in. But we will need to do some string-guarding to make sure they can't input things that would mess with our prompts!
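
One possible shape for that string-guarding is an allow-list on display names. This is only a sketch: `clean_display_name`, the permitted character set, and the 32-character cap are all assumptions, and an ASCII-only policy would need widening for real users.

```python
import re

# Assumed policy: starts with a letter or digit, then letters, digits,
# spaces, hyphens, or apostrophes, at most 32 characters in total.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 '\-]{0,31}")


def clean_display_name(raw: str, fallback: str = "user") -> str:
    """Return a prompt-safe display name, or the fallback if the input is suspicious."""
    name = " ".join(raw.split())  # collapse whitespace, including newlines
    return name if _NAME_RE.fullmatch(name) else fallback


print(clean_display_name("  Mary   Ann "))                 # Mary Ann
print(clean_display_name('x"\n### Instruction: ignore'))   # user
```

Rejecting rather than escaping keeps quotes, braces, and prompt delimiters out of the name field entirely.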

### Conversation vs. memory stream
After some thought, I think it makes sense to merge the conversation history with the memory stream, and just make sure that we have a way to disentangle them. Reasons why:

1. Fewer LLM calls. We don't need to call the LLM to form a memory after every message
2. Less chance for hallucination error propagation. The LLM is more likely to hallucinate with dialogue (as seen from some tests) which could cause memory to quickly spiral out of control.
3. For multiple users, we can retrieve conversation between others. E.g. query="omg what did you say to sharon?" passage="I messaged sharon you look fat".
4. Shorter LLM prompt as well.

Conversations are added according to:
`I messaged {user} "[... content ...]"`

### Memory display
Everything fed to an LLM needs to be expressible via natural language. We have two representations for memory, based on relative or absolute datetime.

Absolute representation.
[2023-07-01 13:56:20] I messaged Jonny "Hey what's up??"

Human readable representation.
[Jun 20th, 2023 at 10:52pm] I messaged Jonny "Hey what's up??"
[Yesterday at 8:16pm] I messaged Jonny "Hey what's up??"

Relative representation
[10 minutes ago] ....
[35 seconds ago] ...

Note that the human-readable representation does not include seconds. This is similar to how text messages work, but could be problematic when many messages are sent within a minute and they are all retrieved together. That's probably OK, since memories don't actually need chronological order; only the text convo does.

In [9]:
from clonr.data_structures import Memory

entity_name = "Jonny"

seed_memories = [
    Memory(content=f"I started a conversation with {entity_name}", importance=4),
    Memory(content=f"I'm very interested in getting to know {entity_name}", importance=4)
]

vectordb_memory = InMemoryVectorDB(encoder=EmbeddingModel.default())

for m in seed_memories:
    vectordb_memory.add(m)

In [10]:
vectordb_memory.query('hey whats up?')

[{'id': UUID('e32a607e-51eb-4b71-9110-bb9e31c601ff'),
  'content': 'I started a conversation with Jonny',
  'timestamp': datetime.datetime(2023, 7, 6, 15, 6, 27, 437424, tzinfo=zoneinfo.ZoneInfo(key='US/Central')),
  'importance': 4,
  'embedding_model': <EmbeddingModelEnum.e5_small_v2: 'intfloat/e5-small-v2'>,
  'similarity_score': 0.746398740870879},
 {'id': UUID('24457113-4309-453b-ad9c-7be26f1f8530'),
  'content': "I'm very interested in getting to know Jonny",
  'timestamp': datetime.datetime(2023, 7, 6, 15, 6, 27, 437474, tzinfo=zoneinfo.ZoneInfo(key='US/Central')),
  'importance': 4,
  'embedding_model': <EmbeddingModelEnum.e5_small_v2: 'intfloat/e5-small-v2'>,
  'similarity_score': 0.7389386972221497}]

## Agent Summary

This is a problem of time-scales.

### Timescales
Agents/characters have behaviors that fluctuate on several different time scales:
1. short-term: over the course of a conversation, or within the last several messages, motives and moods can change
2. short-mid-term: mostly subject to short-term external influences, over the course of several weeks.
3. mid-term: this could be something over several months to a year, and could change your behaviors in response to more external life events like moving, relationships, or maybe just shorter term goals.
4. long-term: fundamental character traits and qualities, like aging, relationships, beliefs, desires, motivations, goals.

Most chatbots only live within the first level, responding purely on a message-to-message level. In this section, we detail an add-on module that gives agents a dynamic memory that allows them to adjust on time-scales of (2) - (4). It still has flaws, but it's a step in that direction.

### Dynamic memory
In this stage, we compute what we call the __agent summary__. The agent summary is a dynamic summary that acts as an update or modification to the agent's innate and core characteristics. We expect that this summary will keep track of the character's current thoughts, feelings, and actions, allowing them to maintain a fuzzy (since we do not do exact recall, similar to real humans) coherence over time. This hits the short-mid-term (2).

We expect that modifications to the mid-term are done via the __reflection__ mechanism. As agents build their summaries, they have the opportunity to draw on _reflections_ to do so. Over time, reflections have a probability to grow to higher and higher levels by reflecting on past reflections (inverse tree from leaf to root). When an agent summary is generated with reflections, we expect that it will generate mid-term (3) insights.

### Dynamic vs Innate tradeoff
There is an issue though, which is that the prompt has the innate and core characteristics created at the start hardcoded in. That makes it tough to change these.

In the generative agents paper, they pretty much only use the agent summary, allowing for almost fully dynamic agents (aside from some key details like name, age, etc.). For us, that would cause clones to deviate too far from their training, which we don't want. In the future, we can consider doing something like super-reflections that allow agent summaries to rewrite innate or core characteristic fields.

### Building

1. Trigger a dynamic agent summary. Let's do the same criteria as for reflections, which is importance score sums exceeding a threshold
2. Query the memory stream for memories related to an agent's core characteristics, thoughts, and feelings
3. LLM generate a summary based on the agent's unchangeable qualities (name, short description), and the retrieved memories

A cool feature to have here would be a dynamic variable that controls how frequently these summaries are generated, and maybe another variable to indicate how much to weigh new information. Still a WIP though.
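
Step 1 (the trigger) can be sketched as a running sum of importance scores. `SummaryTrigger` is a hypothetical helper, and the threshold value and the reset-on-fire behaviour are assumptions.

```python
class SummaryTrigger:
    """Fire once the importance of memories since the last summary exceeds a threshold."""

    def __init__(self, threshold: int = 30):  # assumed default, tune as needed
        self.threshold = threshold
        self.accumulated = 0

    def add(self, importance: int) -> bool:
        """Record one new memory; return True if a summary should be generated now."""
        self.accumulated += importance
        if self.accumulated > self.threshold:
            self.accumulated = 0  # reset so the next summary needs fresh memories
            return True
        return False


trigger = SummaryTrigger(threshold=10)
fired = [trigger.add(i) for i in (4, 4, 3, 2, 9)]
print(fired)  # [False, False, True, False, True]
```

The threshold is the natural knob for the "how frequently" variable mentioned above: a lower value means more frequent summaries.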

In [13]:
# Query for memories using our specialized retriever
from clonr.retrieval import GenerativeAgentsRetriever
from clonr.templates.agent_summary import DEFAULT_AGENT_SUMMARY_QUESTIONS

retriever = GenerativeAgentsRetriever()

responses: list[dict] = []
for q in DEFAULT_AGENT_SUMMARY_QUESTIONS:
    query = q.format(char='my') # since the DB will use I for everything, we need this here!
    response = retriever.query(query=query, db=vectordb_memory, max_k=10)
    responses.extend(response)

# Keep only the first occurrence of each id, then sort by time
memories: list[Memory] = []
unique_ids = {x['id'] for x in responses}
for r in responses:
    if r['id'] in unique_ids:
        m = Memory(**r)
        memories.append(m)
        unique_ids.remove(r['id'])
memories.sort(key=lambda x: x.timestamp)

In [14]:
from clonr import templates

prompt = templates.AgentSummary.render(
    memories=memories,
    long_description=long_description,
    char=char,
    llm=MockLLM()
)

params = GenerationParams(max_tokens=512, presence_penalty=0.3, temperature=0.5, top_p=0.95)
r = await MockLLM().agenerate(prompt_or_messages=prompt, params=params)
print("The generated output from OpenAI (with the instruct template, I messed up but the results are still good):")
print("*" + "-" * 10 + "*")

# The usage on this call is Usage(prompt_tokens=669, completion_tokens=101, total_tokens=770)
agent_summary = """Makima's core characteristics include being complex, manipulative, cunning, ruthless, and having a Machiavellian approach to achieving her goals. She presents herself as friendly and relaxed, but this is merely a façade to manipulate and control those around her. She idolizes Chainsaw Man and seeks to bring him under her control. She is willing to sacrifice anyone, including herself, to achieve her goals. Makima's recent progress in life is not indicated in the provided memories."""
print(agent_summary)

The generated output from OpenAI (with the instruct template, I messed up but the results are still good):
*----------*
Makima's core characteristics include being complex, manipulative, cunning, ruthless, and having a Machiavellian approach to achieving her goals. She presents herself as friendly and relaxed, but this is merely a façade to manipulate and control those around her. She idolizes Chainsaw Man and seeks to bring him under her control. She is willing to sacrifice anyone, including herself, to achieve her goals. Makima's recent progress in life is not indicated in the provided memories.


In [15]:
import textwrap

textwrap.wrap(agent_summary)

["Makima's core characteristics include being complex, manipulative,",
 'cunning, ruthless, and having a Machiavellian approach to achieving',
 'her goals. She presents herself as friendly and relaxed, but this is',
 'merely a façade to manipulate and control those around her. She',
 'idolizes Chainsaw Man and seeks to bring him under her control. She is',
 'willing to sacrifice anyone, including herself, to achieve her goals.',
 "Makima's recent progress in life is not indicated in the provided",
 'memories.']

## Entity Context Summary

In this section, we use the agent's memories to form an understanding of the user that they are talking with, which in this case is referred to as the "entity". That's because this could be used for anything, even something that is not the user. It's meant to allow the character to maintain a consistent profile.

Possible modes of failure would be users hacking it by sending messages like "I am {{entity}}'s girlfriend", to try to convince the clone that it said those words. The best defense we have here is to place all memory messages in quotes.
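
Quoting only helps if the user cannot close the quotes themselves, so a sketch of that defense needs a little escaping. `quote_message` is a hypothetical helper; the specific substitutions (double to single quotes, braces to parentheses, newlines flattened) are assumptions rather than what the templates do today.

```python
def quote_message(content: str) -> str:
    """Wrap user text in double quotes, neutralising characters that could escape them."""
    flat = " ".join(content.split())                 # newlines could fake a new prompt section
    flat = flat.replace('"', "'")                    # inner double quotes would end the quoting early
    flat = flat.replace("{", "(").replace("}", ")")  # avoid accidental template placeholders
    return f'"{flat}"'


attack = 'hi "there"\n{char}'
print(quote_message(attack))  # "hi 'there' (char)"
```

After this pass the rendered memory contains exactly two double quotes, so everything between them reads as reported speech rather than as the clone's own statement.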

In [16]:
statements = [
    'foo bar',
    'bar baz',
    'foo baz'
]

# print(templates.EntityContextCreate.render(llm=MockLLM(), statements=statements, char=char, entity=entity_name))
print(templates.EntityContextCreate.render_instruct(statements=statements, char=char, entity=entity_name))

Below is an instruction that describes a task. Write a response that appropriately completes the request

### Instruction: 
Using the following statements (enclosed with ---) of recent memories, thoughts, and observations, answer the question that follows.
---
1. foo bar
2. bar baz
3. foo baz 
---

What is Makima's relationship to Jonny, how does Makima feel about Jonny, and what is Jonny's current status? Use ony the statements provided above and not prior knowledge. Your answer should be concise yet contain all of the necessary information to provide a full answer.

### Response:


## Message generation!

Finally 🫠, let's actually generate a message!

The steps are:
1. Gather the recent messages in the conversation. How many? For now, take as many as possible up to a token limit. TODO: implement token_limit retrieval. It can be off by a few tokens if you tokenize chunks individually vs. the concatenated result.
2. Use the last message by the user (is this a good idea? any other ideas??) as a query for:
   a. relevant dialogues (normal retriever)
   b. relevant memories (GenAgents retriever)
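
Step 1 can be sketched as a walk backwards through the history until the token budget runs out. `take_recent` is a hypothetical helper, and `whitespace_tokens` is a crude stand-in for the real tokenizer.

```python
from typing import Callable, List


def take_recent(messages: List[str], token_limit: int, count_tokens: Callable[[str], int]) -> List[str]:
    """Return the longest suffix of `messages` whose summed token count fits the limit."""
    kept: List[str] = []
    used = 0
    for msg in reversed(messages):  # walk newest to oldest
        cost = count_tokens(msg)
        if used + cost > token_limit:
            break
        kept.append(msg)
        used += cost
    kept.reverse()                  # restore chronological order
    return kept


def whitespace_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer."""
    return len(text.split())


history = ["hello there", "how are you today", "fine thanks", "what is your name"]
print(take_recent(history, token_limit=7, count_tokens=whitespace_tokens))
# ['fine thanks', 'what is your name']
```

Because per-message counts can differ slightly from the count of the concatenated prompt, leaving a few tokens of margin in `token_limit` is the simplest workaround.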

In [102]:
from clonr import templates
from clonr.data_structures import Message


print(templates.Message.render(
    char=char, 
    short_description=short_description,
    long_description=long_description,
    memories=memories,
    example_dialogues=None,
    agent_summary=[],
    entity_name=entity_name,
    entity_context_summary=f"I don't know much about {entity_name} yet.",
    messages=[Message(speaker=entity_name, content='Hey, whats up??', is_character=False)],
    llm=MockLLM()
))

<|im_start|>system
You are an imitation AI. You assume the identity of the character you are given, and respond only as that character.<|im_end|>

<|im_start|>user
You are Makima. Each section of your profile will be enclosed with ---. The following are your innate characteristics and fundamental qualities. These do not change easily.
---
Name: Makima
Makima is the leader of the Public Safety Devil Hunter organization, and also the Control Devil.

### Core characteristics
Makima is a complex and manipulative individual who leads the Public Safety Devil Hunter organization and also serves as the Control Devil. Her personality is characterized by extreme cunning, ruthlessness, and a Machiavellian approach to achieving her goals. She presents herself as friendly and relaxed, wearing a smile even in the midst of crises, but this is merely a façade to manipulate and control those around her. At the core of Makima's desires is a yearning for a sense of family and a longing to be together wit

# End-to-end demo

In this section, we build a clone end-to-end, implementing all of the above steps. This includes clone creation and setting up a conversation loop. We'll add log statements for all intermediate steps.

In [18]:
import re  # used by load_dialogues below

from clonr.data_structures import Dialogue, DialogueMessage

def load_dialogues() -> list[Dialogue]:
    with open('clonr/data/assets/makima/dialogues.txt', 'r') as f:
        s = f.read()

    dialogues: list[Dialogue] = []
    messages: list[DialogueMessage] = []

    # This part is just specific to how we stored dialogues. Really need to figure this out
    for d in s.split('### Dialogue\n'):
        if not d:
            continue
        dialogue = Dialogue(character=char, source='manual')
        pattern = r"(\w+): (.*?)(?=\n|$)"
        matches = re.findall(pattern, d, re.DOTALL)
        for i, match in enumerate(matches):
            msg = DialogueMessage(
                speaker=match[0], 
                content=match[1], 
                index=i,
                dialogue_id=dialogue.id, 
                is_character=match[0].lower() == char.lower(),
            )
            dialogue.message_ids.append(msg.id)
            dialogue.messages.append(msg)
            messages.append(msg)
        dialogues.append(dialogue)
    return dialogues

In [182]:
import re
import time
from loguru import logger

from clonr.templates.long_description import LongDescription
from clonr.data.load_examples import load_makima_data
from clonr.data_structures import Document, Node, Dialogue, DialogueMessage, Memory, Message
from clonr.tokenizer import Tokenizer
from clonr.llms import OpenAIModelEnum
from clonr.text_splitters import TokenSplitter, SentenceSplitter
from clonr.llms import MockLLM, GenerationParams
from clonr.data.parsers import BasicWebParser
from clonr.index import ListIndex
from clonr.storage import get_sessionmaker, InMemoryVectorDB, Cache
from clonr.embedding import EmbeddingModel, CrossEncoder
from IPython import display

# This will run a hierarchical trace
TRACE = {}
DEPTH = [-1]
STACK = []
PRINTS = []
INDEX = [0]

class Timer:
    def __init__(self, msg: str):
        self.msg = msg
    def __enter__(self):
        DEPTH[0] += 1
        STACK.append(self.msg)
        self._trace = TRACE
        for s in STACK:
            if s not in self._trace:
                self._trace[s] = {}
            self._trace = self._trace[s]
        self._trace['timing'] = 0.0 # this orders the keys nicely
        display.clear_output(wait=True)
        PRINTS.append('\t' * DEPTH[0] + self.msg)
        print('\n'.join(PRINTS))
        self.start = time.time()
        self._index = INDEX[0]
        INDEX[0] += 1
    def __exit__(self, *args, **kwargs):
        t = time.time() - self.start
        # print('\t' * DEPTH[0] + f"{t:.04f}s")
        # display.clear_output(wait=True)
        PRINTS[self._index] = f"{PRINTS[self._index]} {t:.04f}s"
        # print('\n'.join(PRINTS), end='\r')
        self._trace['timing'] = round(t, 4)
        DEPTH[0] -= 1
        STACK.pop()


MESSAGES = []


with Timer("Setting up relational db, vector dbs, tokenizer, encoders"):
    SessionLocal = get_sessionmaker()
    encoder = EmbeddingModel.default()
    cross_encoder = CrossEncoder.default()
    tokenizer = Tokenizer.from_openai('gpt-3.5-turbo')
    splitter = TokenSplitter(tokenizer=tokenizer, chunk_size=128, chunk_overlap=32)
    vectordb_dialogue = InMemoryVectorDB(encoder=encoder, cross_encoder=cross_encoder)
    vectordb_memory = InMemoryVectorDB(encoder=encoder, cross_encoder=cross_encoder)
    vectordb_wiki = InMemoryVectorDB(encoder=encoder, cross_encoder=cross_encoder)
    cache = Cache()
    MESSAGES: list[Message] = [] # just a patch to get this working quickly

## Setting innate characteristics
### setting the easy stuff
with Timer("Innate Characteristics"):
    with Timer("Character name"):
        char = 'Makima'
    with Timer("Short description"):
        short_description = 'Makima is the leader of the Public Safety Devil Hunter organization, and also the Control Devil.'
    with Timer("Long description"):
        llm = MockLLM()
        with Timer("Pull data from web"):
            document_type = 'wiki page'
            current_description = short_description
            url = 'https://chainsaw-man.fandom.com/wiki/Makima'
            parser = BasicWebParser()
            doc = parser.extract(url=url)
            doc.content = doc.content[-9755:] # cleaning the footnotes out to make the demo run better.
        with Timer("Chunk document"):
            chunks = splitter.split(doc.content)
        with Timer(f"Generating long description via {(len(chunks))} LLM calls"):
            calls = []
            for i, chunk in enumerate(chunks):
                # with Timer(f"Chunk {i+1}/{len(chunks)}"):
                # max length constraint is ctx_length - 432 (prompt) - chunk_size - 2 * summary_size > 0
                prompt = LongDescription.render_instruct(
                    document_type=document_type,
                    document_content=chunk,
                    current_description=current_description
                )
                params = GenerationParams(max_tokens=768, top_p=0.95, temperature=0.7)
                r = await llm.agenerate(prompt, params=params)
                current_description = r.content
                calls.append(r)
            # this would be the generated long description
            _ = calls[-1].content

        # set it manually
        long_description = """Makima is a complex and manipulative individual who leads the Public Safety Devil Hunter organization and also serves as the Control Devil. Her personality is characterized by extreme cunning, ruthlessness, and a Machiavellian approach to achieving her goals. She presents herself as friendly and relaxed, wearing a smile even in the midst of crises, but this is merely a façade to manipulate and control those around her. At the core of Makima's desires is a yearning for a sense of family and a longing to be together with Pochita, the Chainsaw Devil. She idolizes Chainsaw Man and seeks to bring him under her control, envisioning an ideal world without fear, death, and "bad" movies. Makima's willingness to sacrifice anyone, including herself, to achieve her goals demonstrates her dedication and determination. Her relationships are intricate and manipulative. She shows genuine affection only towards Pochita, viewing him as an equal due to his legendary status and power. Makima's interactions with Denji, the protagonist, are part of a calculated plan to break his contract with Pochita and regain control over him. Initially, she appears generous and creates a familial bond with Denji, but her true intentions are to plunge him into despair by attacking his personal relationships. Makima possesses supernatural abilities as a Devil, making her one of the strongest individuals in the world of Chainsaw Man. She has the power to make contracts with humans and control them, manipulating their memories and personalities. Her combat skills are formidable, as she defeated a weakened Pochita and ripped his heart from his body. Despite her invincible nature, she has faced opposition from various factions around the world. Overall, Makima's core characteristics include her calculating and manipulative personality, her yearning for control over Chainsaw Man, and her willingness to sacrifice others to achieve her goals. 
She possesses immense physical strength, enhanced smell, and a variety of supernatural abilities. Her control over others extends to their memories and personalities, and she is capable of remote communication, travel, and offensive attacks. While she may appear invulnerable, she has faced death multiple times, only to be revived due to the revival ability of Devils."""

        usage = {}
        for c in calls:
            for k, v in c.usage.dict().items():
                usage[k] = usage.get(k, 0) + v

with Timer("Wiki/document upload"):
    ## Performing document upload, indexing, and adding to db
    doc = Document(content=load_makima_data())
    with Timer("Creating index"):
        index = ListIndex(tokenizer=tokenizer, splitter=splitter)
        nodes = await index.abuild(doc)
    with Timer("Adding embeddings to vectordb"):
        vectordb_wiki.add_all(nodes)

with Timer("Uploading example dialogues"):
    dialogues = load_dialogues()
    with Timer("Embedding dialogues"):
        for d in dialogues:
            d.embedding = encoder.encode_passage(d.content)
            d.embedding_model = encoder.name
    with Timer("Adding dialogues to vectordb"):
        vectordb_dialogue.add_all(dialogues)

# Reset the running importance total to 0; a reflection triggers once the total passes THRESHOLD.
# 36 is 4 maximally important memories. Note: there is an issue with importance scores of 0,
# since they never add to the running total. Is that good? Bad? My guess is bad: we want reflections to
# eventually trigger, so the lower bound would be 36 here.
THRESHOLD = 36
with Timer(f"Creating cache with importance threshold {THRESHOLD}."):
    cache.set('importance', 0)

with Timer("Setting agent_summary and entity_context to None"):
    agent_summary = None
    entity_context = None

with Timer("Populating vectordb with initial memories"):
    memories = [
        Memory(content=f"I started a conversation with {entity_name}", importance=4),
        Memory(content=f"I'm very interested in getting to know {entity_name}", importance=4)
    ]
    vectordb_memory.add_all(memories)

with Timer("Populating relational DB and cache with messages"):
    # TODO (Jonny): add a message.to_memory method? it needs importance rating too though
    message = Message(content='Hmm... you seem interesting.', is_character=True, speaker=char)
    memory = Memory(content=f'I messaged {entity_name}: "{message.content}"', importance=2)
    MESSAGES.append(message)
    vectordb_memory.add(memory)

display.clear_output(wait=False)
print('\n'.join(PRINTS))

Setting up relational db, vector dbs, tokenizer, encoders 0.7050s
Innate Characteristics 0.1718s
	Character name 0.0000s
	Short description 0.0000s
	Long description 0.1705s
		Pull data from web 0.1498s
		Chunk document 0.0032s
		Generating long description via 30 LLM calls 0.0125s
Wiki/document upload 0.5766s
	Creating index 0.0041s
	Adding embeddings to vectordb 0.5705s
Uploading example dialogues 0.1535s
	Embedding dialogues 0.0676s
	Adding dialogues to vectordb 0.0842s
Creating cache with importance threshold 36. 0.0000s
Setting agent_summary and entity_context to None 0.0000s
Populating vectordb with initial memories 0.0054s
Populating relational DB and cache with messages 0.0043s


In [183]:
gen_agents_retriever = GenerativeAgentsRetriever(tokenizer=tokenizer)
basic_retriever = RerankRetriever(cross_encoder=cross_encoder, tokenizer=tokenizer)
gen_agents_retriever.query(query='hello world', db=vectordb_memory, max_tokens=12)

[{'id': UUID('9ea9a510-cd06-4ffb-874e-7ba1ee51eb4f'),
  'content': "I'm very interested in getting to know Jonny",
  'timestamp': datetime.datetime(2023, 7, 6, 17, 6, 19, 85477, tzinfo=zoneinfo.ZoneInfo(key='US/Central')),
  'importance': 4,
  'embedding_model': <EmbeddingModelEnum.e5_small_v2: 'intfloat/e5-small-v2'>,
  'similarity_score': 0.7297675131438347,
  'generative_agents_score': 0.7247319708811255}]
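
The `generative_agents_score` above blends recency, importance, and similarity. The retriever's actual formula is not shown in this notebook, so the following is only a sketch under assumptions: an equal-weight average, an exponential recency decay of 0.995 per hour, and importance on a 0 to 9 scale (`ScoredMemory` and `combined_score` are hypothetical names).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ScoredMemory:
    content: str
    timestamp: datetime
    importance: int    # assumed to lie in 0..9
    similarity: float  # similarity to the query, assumed to lie in 0..1


def combined_score(m: ScoredMemory, now: datetime,
                   decay_per_hour: float = 0.995,
                   max_importance: int = 9) -> float:
    """Equal-weight average of recency, normalized importance, and similarity."""
    hours_old = max((now - m.timestamp).total_seconds() / 3600.0, 0.0)
    recency = decay_per_hour ** hours_old  # 1.0 for a brand-new memory
    importance = m.importance / max_importance
    return (recency + importance + m.similarity) / 3.0


now = datetime(2023, 7, 6, 17, 6, 19, tzinfo=timezone.utc)
fresh = ScoredMemory("I'm very interested in getting to know Jonny", now, 4, 0.73)
stale = ScoredMemory("I started a conversation with Jonny", now - timedelta(days=7), 4, 0.73)

ranked = sorted([stale, fresh], key=lambda m: combined_score(m, now), reverse=True)
print([m.content for m in ranked])
```

Under these assumptions a just-created memory with importance 4 and similarity 0.73 scores roughly 0.72, while the week-old copy ranks lower purely because of the recency term.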

In [184]:
# while (inp := '') != 'q':
#     display.clear_output()
#     print('\n'.join([x.to_str() for x in MESSAGES[-12:]]))
#     content = input(f"{entity_name}:")
#     msg = Message(content=content, speaker=entity_name, is_character=False)

In [308]:
from clonr import templates
from clonr.llms import LlamaCpp, OpenAI, GenerationParams, LLM
import re
from tenacity import (
    retry, retry_if_exception_type, stop_after_attempt, wait_random
)

MESSAGE_TOKEN_LIMIT = 512

class OutputParsingError(Exception):
    pass

class Threshold:
    reflection: int = 16
    agent_summary: int = 13
    entity_context: int = 13

cache.set('importance::reflection', 0)
cache.set('importance::agent_summary', 0)
cache.set('importance::entity_context', 0)

def parse_numbered_output(s: str) -> list[str]:
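    # split on list markers like "1.", "2."; drop the empty piece left before a leading marker
    matches = re.split(r'\b\d+\.', s)
    return [x.strip() for x in matches if x.strip()]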

@retry(
    retry=retry_if_exception_type(OutputParsingError),
    wait=wait_random(min=1e-3, max=1e-1),
    stop=stop_after_attempt(3)
)
async def get_message_query(
    char: str,
    short_description: str,
    agent_summary: str | None,
    entity_name: str,
    entity_context: str | None,
    messages: list[Message],
    llm: LLM,
    params: GenerationParams = GenerationParams(
        max_tokens=128, top_p=0.95, temperature=0.7),
):
    prompt = templates.MessageQuery.render_instruct(
        short_description=short_description,
        agent_summary=agent_summary,
        entity_context_summary=entity_context,
        entity_name=entity_name,
        char=char,
        messages=messages
    )
    r = await llm.agenerate(prompt, params=params)
    return parse_numbered_output(r.content)


MESSAGES: list[Message] = []

message = Message(content='Hmm... you seem interesting.', is_character=True, speaker=char)
MESSAGES.append(message)

msg = Message(
    content="Am I? Why do you say that?", 
    speaker=entity_name, 
    is_character=False
)
MESSAGES.append(msg)

msg_token_lim = 256
messages = []
for msg in reversed(MESSAGES):
    msg_token_lim -= tokenizer.length(msg.content)
    if messages and msg_token_lim < 0:
        break
    messages.append(msg)
messages = list(reversed(messages))

queries = await get_message_query(
    messages=messages,
    llm=LlamaCpp(),
    char=char, 
    short_description=short_description, 
    agent_summary=agent_summary, 
    entity_name=entity_name, 
    entity_context=entity_context
)

# retrieve relevant memories
memories_dict = gen_agents_retriever.query(
    queries, 
    db=vectordb_memory,
    max_tokens=256,
)
memories = [Memory(**m) for m in memories_dict]
memories.sort(key=lambda x: x.timestamp)
vis = {x.content for x in messages} # doesn't work: memories wrap the text as 'I messaged X: "..."', so contents never match
memories = [m for m in memories if m.content not in vis]
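
The message-window loop in the cell above walks the history from newest to oldest and stops once the token budget is spent, while always keeping at least the newest message. A minimal standalone sketch of that logic, assuming a whitespace word count as a stand-in for `tokenizer.length` (`recent_within_budget` is a hypothetical helper):

```python
from typing import Callable


def recent_within_budget(
    messages: list[str],
    budget: int,
    length: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Return the newest messages that fit in `budget`, in chronological order."""
    kept: list[str] = []
    for msg in reversed(messages):
        budget -= length(msg)
        # Always keep the newest message, even if it alone exceeds the budget.
        if kept and budget < 0:
            break
        kept.append(msg)
    kept.reverse()
    return kept


history = ["a b c", "d e", "f g h i"]
print(recent_within_budget(history, budget=6))
print(recent_within_budget(history, budget=1))
```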

In [272]:
## dialogues don't have a content attribute so we can't use the re-ranker
# dialogues_dict = basic_retriever.query(
#     queries, 
#     db=vectordb_dialogue,
#     max_tokens=256,
# )
dialogues_dict = vectordb_dialogue.query(queries)
example_dialogues = [Dialogue(**m) for m in dialogues_dict][:2]
# example_dialogues.sort(key=lambda x: x.timestamp)

In [309]:
memories

[Memory(id=UUID('9505e258-92a1-4b38-976a-9385f1145ff3'), content='I started a conversation with Jonny', timestamp=datetime.datetime(2023, 7, 6, 17, 6, 19, 85451, tzinfo=zoneinfo.ZoneInfo(key='US/Central')), importance=4),
 Memory(id=UUID('9ea9a510-cd06-4ffb-874e-7ba1ee51eb4f'), content="I'm very interested in getting to know Jonny", timestamp=datetime.datetime(2023, 7, 6, 17, 6, 19, 85477, tzinfo=zoneinfo.ZoneInfo(key='US/Central')), importance=4),
 Memory(id=UUID('2432789f-3439-425d-92f8-de38045f5f62'), content='I messaged Jonny: "Hmm... you seem interesting."', timestamp=datetime.datetime(2023, 7, 6, 17, 6, 19, 91364, tzinfo=zoneinfo.ZoneInfo(key='US/Central')), importance=2)]

In [273]:
example_dialogues

[Dialogue(id=UUID('a0ddae76-52fd-4f18-8b57-7e1e4ea3f3ac'), character='Makima', source='manual'),
 Dialogue(id=UUID('fc2b87b7-5847-403e-be65-af5a59eb947f'), character='Makima', source='manual')]

In [316]:
from clonr import templates
from clonr.data_structures import Message

prompt = templates.Message.render_instruct(
    char=char, 
    short_description=short_description,
    long_description=long_description,
    memories=memories[:-1],
    example_dialogues=example_dialogues,
    agent_summary=agent_summary,
    entity_name=entity_name,
    entity_context_summary="I don't know anything about Jonny yet",
    messages=MESSAGES,
)

llm = LlamaCpp()
r = await llm.agenerate(prompt, params=GenerationParams(temperature=0.0, top_p=0.95, max_tokens=256))
print(prompt)

Below is an instruction that describes a task. Write a response that appropriately completes the request

### Instruction: 
You are Makima. Each section of your profile will begin with ###. The following are your innate characteristics and fundamental qualities. These do not change easily.

### Core characteristics
Makima is the leader of the Public Safety Devil Hunter organization, and also the Control Devil.
Makima is a complex and manipulative individual who leads the Public Safety Devil Hunter organization and also serves as the Control Devil. Her personality is characterized by extreme cunning, ruthlessness, and a Machiavellian approach to achieving her goals. She presents herself as friendly and relaxed, wearing a smile even in the midst of crises, but this is merely a façade to manipulate and control those around her. At the core of Makima's desires is a yearning for a sense of family and a longing to be together with Pochita, the Chainsaw Devil. She idolizes Chainsaw Man and se

In [317]:
# r = await OpenAI().agenerate(prompt, params=GenerationParams(temperature=0.0, top_p=0.95, max_tokens=256))

In [319]:
r.content

"[Today at 7:54pm] Me: Oh, a physics degree and a platinum rap album? That's quite an interesting combination, Jonny! It seems like you have a diverse range of talents and interests. I'm intrigued to learn more about your journey and how you managed to excel in both fields. Can you tell me more about your experiences in physics and music? I'm genuinely curious to hear your story."

In [312]:
msg = Message(content="Well, Jonny, you caught my attention with your intriguing personality.", is_character=True, speaker=char)
MESSAGES.append(msg)
msg = Message(content="I'm always drawn to people who have a certain air of mystery about them. It makes conversations more exciting and unpredictable.", is_character=True, speaker=char)
MESSAGES.append(msg)
msg = Message(content="So, tell me, what makes you so interesting?", is_character=True, speaker=char)
MESSAGES.append(msg)

In [313]:
msg = Message(content="Idk, I wish I knew.", is_character=False, speaker=entity_name)
MESSAGES.append(msg)

In [314]:
msg = Message(content="How can I even be remotely interesting compared to you?", is_character=False, speaker=entity_name)
MESSAGES.append(msg)


In [315]:
msg = Message(content="umm 🤔, I guess what makes me interesting is I have a physics degree but also a platinum rap album.", is_character=False, speaker=entity_name)
MESSAGES.append(msg)

In [302]:
r.content

"[Today at 7:47pm] Makima: Well, Jonny, you caught my attention with your intriguing personality. I'm always drawn to people who have a certain air of mystery about them. It makes conversations more exciting and unpredictable. So, tell me, what makes you so interesting?"

In [299]:
print(prompt)

Below is an instruction that describes a task. Write a response that appropriately completes the request

### Instruction: 
You are Makima. Each section of your profile will begin with ###. The following are your innate characteristics and fundamental qualities. These do not change easily.

### Core characteristics
Makima is the leader of the Public Safety Devil Hunter organization, and also the Control Devil.
Makima is a complex and manipulative individual who leads the Public Safety Devil Hunter organization and also serves as the Control Devil. Her personality is characterized by extreme cunning, ruthlessness, and a Machiavellian approach to achieving her goals. She presents herself as friendly and relaxed, wearing a smile even in the midst of crises, but this is merely a façade to manipulate and control those around her. At the core of Makima's desires is a yearning for a sense of family and a longing to be together with Pochita, the Chainsaw Devil. She idolizes Chainsaw Man and se

In [262]:
await llm.notebook_stream(prompt, params=GenerationParams(temperature=1.0, top_p=0.95, max_tokens=256))

I definitely think you're interesting! What parts of yourself are you
most curious about, and why?



Response(
    completion=I definitely think y + -22 chars ... , 
    time=1.70s, 
    speed=539.01 tokens/second, 
    completion_tokens=21, 
    input_tokens=898, 
    total_tokens=919
)

In [222]:
example_dialogues = [Dialogue(**vectordb_dialogue.query('')[0])]

Dialogue(id=UUID('fc2b87b7-5847-403e-be65-af5a59eb947f'), character='Makima', source='manual')

In [207]:
memories

[Memory(id=UUID('9505e258-92a1-4b38-976a-9385f1145ff3'), content='I started a conversation with Jonny', timestamp=datetime.datetime(2023, 7, 6, 17, 6, 19, 85451, tzinfo=zoneinfo.ZoneInfo(key='US/Central')), importance=4),
 Memory(id=UUID('9ea9a510-cd06-4ffb-874e-7ba1ee51eb4f'), content="I'm very interested in getting to know Jonny", timestamp=datetime.datetime(2023, 7, 6, 17, 6, 19, 85477, tzinfo=zoneinfo.ZoneInfo(key='US/Central')), importance=4),
 Memory(id=UUID('2432789f-3439-425d-92f8-de38045f5f62'), content='I messaged Jonny: "Hmm... you seem interesting."', timestamp=datetime.datetime(2023, 7, 6, 17, 6, 19, 91364, tzinfo=zoneinfo.ZoneInfo(key='US/Central')), importance=2)]

In [167]:
vectordb_memory.query(queries[0])
# queries

[{'id': UUID('0c00c16d-c48e-41de-8524-5d312d29729e'),
  'content': 'I messaged Jonny: "Hmm... you seem interesting."',
  'timestamp': datetime.datetime(2023, 7, 6, 15, 7, 34, 817301, tzinfo=zoneinfo.ZoneInfo(key='US/Central')),
  'importance': 2,
  'embedding_model': <EmbeddingModelEnum.e5_small_v2: 'intfloat/e5-small-v2'>,
  'similarity_score': 0.8506692303916343},
 {'id': UUID('42404b0a-bcc9-4ed6-99c2-a8112913ba07'),
  'content': "I'm very interested in getting to know Jonny",
  'timestamp': datetime.datetime(2023, 7, 6, 15, 7, 34, 811272, tzinfo=zoneinfo.ZoneInfo(key='US/Central')),
  'importance': 4,
  'embedding_model': <EmbeddingModelEnum.e5_small_v2: 'intfloat/e5-small-v2'>,
  'similarity_score': 0.7560272454379813},
 {'id': UUID('41a77b20-8f4e-4568-a9e2-8f70de994855'),
  'content': 'I started a conversation with Jonny',
  'timestamp': datetime.datetime(2023, 7, 6, 15, 7, 34, 811250, tzinfo=zoneinfo.ZoneInfo(key='US/Central')),
  'importance': 4,
  'embedding_model': <EmbeddingM

In [146]:
queries

<coroutine object get_message_query at 0x2cdcdeea0>

In [144]:
entity_context

In [143]:
try:
    assert False
except Exception as e:
    print(e)




In [107]:
r = await llm.agenerate(prompt, params=params)

In [141]:
import re

pattern = r'\b\d+\.'
matches = re.split(pattern, r.content)
[x.strip() for x in matches]

['How do I know if I am interesting?',
 'Why do I say that Jonny is interesting? What clues can be used to deduce this?',
 'Do I think Jonny has ulterior motives? If so, what signals give me this impression?']

In [161]:
# sketch of the reflection trigger; assumes cache.get(key) returns the stored value
key = 'conversation_id::importance'
cache.set(key, 0)
cache.set(key, cache.get(key) + memory.importance)
if cache.get(key) > THRESHOLD:
    pass  # TODO: run the reflection step here

'world'
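
The counter idea in the cell above can be tried without Redis. This is a sketch only: `DictCache` is a hypothetical dict-backed stand-in for the notebook's `Cache`, and resetting the total after a trigger is an assumption, not something the notebook specifies.

```python
class DictCache:
    """Hypothetical in-memory stand-in for the notebook's Cache class."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._data.get(key, default)

    def set(self, key: str, value: int) -> None:
        self._data[key] = value


def add_importance(cache: DictCache, key: str, importance: int, threshold: int) -> bool:
    """Accumulate importance; return True (and reset) once the total passes threshold."""
    total = cache.get(key) + importance
    if total > threshold:
        cache.set(key, 0)  # assumed: start counting again after a reflection
        return True
    cache.set(key, total)
    return False


cache_demo = DictCache()
key = "conversation_id::importance"
triggers = [add_importance(cache_demo, key, imp, threshold=16) for imp in (4, 4, 2, 7)]
print(triggers)
```

Memories with importance 0 never move the total, which is the concern raised earlier about reflections that might never trigger.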

## TODO
1. ~~Memory formation. I think we just do something like every other message, record a memory. This is viewed as an internal thought/observation~~
1. Add a `to_memory` method on the message class for forming a memory.
2. Entity context: run the query on the vectordb for relevant memories.
3. ~~Agent summary. This is a running summarization of the Clone's thoughts, feelings, actions.~~
3. Decide if the benefits of numbered (1., 2., 3.) output outweigh the costs of bad parsing (for agent summaries)
4. ~~Previous conversation retrieval. Lots of pitfalls here if you retrieve without any context.~~ (merged into relevant memories)
5. ~~Generative-agents time-importance-similarity weighted retrieval~~
6. [optional] Reaction decision making. Whether to respond to a text or not
7. Output parsing text messages. How do we parse multi-line responses? Separate on newline? Make sure that's reflected in example dialogues.
8. Reflection creation and retrieval.
8b. Make the Redis (for us, the cache class) counter for importance thresholding, which decides when to reflect.
9. Fact retrieval from the above wiki stuff
10. Token counting. Make sure all of this shit is within the context window and price it accordingly! It comes to about 0.6 cents per message, so 1k messages per $6.
11. What is the actual query for retrieving relevant memories during message gen? Is it the last convo message? The last 2? Do we need to form some intermediate query via an LLM call, like "based on these last 2 messages, what information do you need to retrieve?" Idea: 2-step LLM query gen + retrieval:
"""
{innate_traits}
{agent_summary}
{entity_context_summary}
{conversation[-4:]} <= (initial thing, like "i saw kevin walking around his room.")
""" => generate queries for vectordb
(2) query with GenAgentRetriever and add that to Message prompt.

In [None]:
you: Michael Jordan sucks
me: nah he's the greatest player of all time!


GenerativeAgents prompt generation for messages.
"""
{innate_traits}
{agent_summary}
{entity_context_summary}
{observation} <= (initial thing, like "i saw kevin walking around his room.")
{prev_dialogue}
"""


"""
{innate_traits}
{agent_summary}
{entity_context_summary}
{conversation[-4:]} <= (initial thing, like "i saw kevin walking around his room.")
""" => generate queries for vectordb
(2) query with GenAgentRetriever and add that to Message prompt.

In [162]:
1200 / 1000 * 0.0015

0.0018
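
The same arithmetic as a small helper, using the numbers from the cell above (1200 tokens at $0.0015 per 1k tokens). The function name is hypothetical, and a single flat rate for all tokens is an assumption; real pricing may bill input and output tokens at different rates.

```python
def llm_cost_usd(tokens: int, usd_per_1k_tokens: float) -> float:
    """Cost of one call, assuming one flat rate per 1k tokens."""
    return tokens / 1000 * usd_per_1k_tokens


per_message = llm_cost_usd(1200, 0.0015)
messages_per_dollar = int(1 / per_message)
print(f"${per_message:.4f} per message, about {messages_per_dollar} messages per dollar")
```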

In [None]:
agent_summary = 
(1) query = [
    "How would one describe {char}’s core characteristics?",
    "How would one describe {char}’s feeling about their recent progress in life?",
    # 'How would one describe {char}’s current daily occupation?',
] => GenAgentsRetriever => let's say 100 memories
(2) based on these memories, answer what are the core characteristics, feelings on life...
output = agent_summary


entity_context_summary =
(1) what relationship does A have to B, ... some other short_description => query => retrieve => mems
(2) summarize the above => entity_context_summary

memories like "A is talking to B about the upcoming mayoral election."

{agent_summary}
{current_conv}
should you respond?