# Example character ingestion and data

We give some of the components needed to create an AI clone of Makima from the Chainsaw Man anime.

At a high level the procedure is as follows:
1. Extract high-quality audio of the speaker. This is done by recording from a YouTube video and filtering background noise using Krisp.
2. Clone the voice using ElevenLabs.
3. Use the web to extract wiki/personality/innate information about the character. We use the BasicWebParser plus an anime wiki page.
4. Extract example dialogues from show transcripts. We use the unofficial `youtube_transcript_api` package, or just copy the lines down by hand from a video.
5. Create the character template. Create an initial message for the character (we use the same one from Character.AI).
6. Using Microsoft's guidance, call OpenAI to chat with the bot.
7. Convert the returned text into audio using the ElevenLabs API.
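
Step 4 can be sketched roughly as follows. Caption fragments from a transcript are short and arbitrarily split, so we merge neighbours into longer utterances before hand-labelling speakers. This assumes each entry is a dict with `text`, `start` and `duration` keys, which is the shape we believe `youtube_transcript_api` returns (e.g. from `YouTubeTranscriptApi.get_transcript(video_id)` in older releases; newer releases may differ). `merge_transcript` and `max_gap` are hypothetical names, and the sample entries are hand-written rather than fetched.

```python
def merge_transcript(entries, max_gap=1.5):
    """Merge caption fragments into longer utterances.

    entries: list of dicts with 'text', 'start', 'duration' (seconds).
    Fragments that start within `max_gap` seconds of the previous
    fragment's end are joined into the same utterance.
    """
    utterances = []
    prev_end = None
    for entry in entries:
        text = " ".join(entry["text"].split())  # collapse newlines and extra spaces
        if not text:
            continue
        start = entry["start"]
        if prev_end is not None and start - prev_end <= max_gap:
            utterances[-1] += " " + text
        else:
            utterances.append(text)
        prev_end = start + entry["duration"]
    return utterances


# Hand-written sample in the assumed entry shape (not real API output).
sample = [
    {"text": "Denji-kun,", "start": 0.0, "duration": 1.0},
    {"text": "are you ok?", "start": 1.2, "duration": 1.0},
    {"text": "uhh... yeah", "start": 5.0, "duration": 1.5},
]
utterances = merge_transcript(sample)
for line in utterances:
    print(line)
```

Transcripts carry no speaker labels, so attaching `{{char}}:` or another name to each utterance is still a manual step.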

In [1]:
import openai
import guidance
import sys
sys.path.append('../../')
from clonr.data_structures import Document
from clonr.parsers import BasicWebParser

# audio source. We can clone the bot using this link
audio_url = 'https://www.youtube.com/watch?v=SE705zd7y50'

# Get the wiki information
url = 'https://chainsaw-man.fandom.com/wiki/Makima'
doc = BasicWebParser().extract(url)

# clean up some header and footer stuff that is irrelevant
doc.content = doc.content.split('Chapter Appearances')[0]
doc.content = doc.content.split('Please read at your own risk')[1]

2023-05-27 11:52:17.404 | INFO     | clonr.parsers.web:extract:20 - Fetching from url: https://chainsaw-man.fandom.com/wiki/Makima


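The header/footer cleanup above relies on both marker strings being present; `split(...)[1]` raises an `IndexError` if the wiki page layout changes and the marker disappears. A more forgiving sketch using `str.partition` is below. `trim_between` is a hypothetical helper (not part of clonr), and it behaves slightly differently from the two `split` calls in edge cases, e.g. when a marker appears more than once.

```python
def trim_between(text, start_marker, end_marker):
    """Keep the text after the first `start_marker` and before the next
    `end_marker`. A missing marker leaves that side untrimmed."""
    _, found_start, after = text.partition(start_marker)
    if not found_start:
        after = text  # no start marker: keep everything from the beginning
    before, _, _ = after.partition(end_marker)
    return before.strip()


# Made-up page text, only to show the behaviour.
page = (
    "Spoiler warning\n"
    "Please read at your own risk\n"
    "Makima is a Devil Hunter.\n"
    "Chapter Appearances\n"
    "1, 2, 3"
)
content = trim_between(page, "Please read at your own risk", "Chapter Appearances")
print(content)
```

With the notebook's objects this would be called as `doc.content = trim_between(doc.content, 'Please read at your own risk', 'Chapter Appearances')`.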


### Quick tutorial on how guidance works
You can think of `fn = guidance("string")` as creating a function `fn` whose code is given by `"string"`.
This function can then be executed by providing arguments: `fn(x=..., y=...)`, etc.

Short overview of notation:
* `fn = guidance("{{x}}")` means variable substitution: `fn(x=5)` renders `"5"`.
* `fn = guidance("{{#x}}...stuff...{{/x}}")`: the `#` means you're opening a block, closed by the matching `{{/x}}` (think HTML or React tags). A block such as `#x` can add opening and closing text around its contents (e.g. a section title and a footer) or simply expose variables to use inside it.
* `fn = guidance("{{#each myvariable}}...{{this.name}}...{{/each}}")` opens a for loop over a list `myvariable`, and the outputs get written sequentially, e.g. `{{#each [1,2,3]}}{{this}}{{/each}}` => `123`. `this` is the current element.
* The `!` is used for comments: `{{! this is a comment}}`.
* The `~` is used for removing whitespace before/after a tag: `{{~var1}}` removes before, `{{var1~}}` removes after, `{{~var1~}}` removes both.
* Calls to an LLM are made with the `gen` keyword, and the output is stored in the variable named right after it: `{{gen "my-variable" temperature=? arg2=???}}`. LLM arguments such as temperature are passed the same way.
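
To make the notation above concrete, here is a toy renderer in plain Python that mimics a small subset of it: substitution, `{{#each}}` with `this`, comments, and `~` whitespace control. This is not how guidance is implemented and it knows nothing about `gen`; `mini_template` is a hypothetical name used only to illustrate the "template string becomes a function" idea.

```python
import re


def mini_template(template):
    """Return a function that renders `template` from keyword arguments.

    Toy subset only: {{~ ... ~}} whitespace control, {{! comments}},
    {{#each name}}...{{/each}} with {{this}} / {{this.key}}, and {{name}}.
    No nesting and no LLM calls.
    """
    def render(**variables):
        # `{{~` strips whitespace before the tag, `~}}` strips it after.
        text = re.sub(r"\s*\{\{~", "{{", template)
        text = re.sub(r"~\}\}\s*", "}}", text)
        # Comments produce no output.
        text = re.sub(r"\{\{!.*?\}\}", "", text, flags=re.S)

        def expand_each(match):
            body = match.group(2)
            pieces = []
            for item in variables[match.group(1)]:
                def fill_this(m, item=item):
                    # {{this.key}} indexes into the item, {{this}} is the item.
                    return str(item[m.group(1)] if m.group(1) else item)
                pieces.append(re.sub(r"\{\{this(?:\.(\w+))?\}\}", fill_this, body))
            return "".join(pieces)

        text = re.sub(r"\{\{#each (\w+)\}\}(.*?)\{\{/each\}\}", expand_each, text, flags=re.S)
        # Plain variable substitution.
        return re.sub(r"\{\{(\w+)\}\}", lambda m: str(variables[m.group(1)]), text)

    return render


fn = mini_template("{{x}}")
print(fn(x=5))

loop = mini_template("{{#each xs}}{{this}}{{/each}}")
print(loop(xs=[1, 2, 3]))

names = mini_template("{{! list the cast }}{{#each people}}{{this.name}};{{/each}}")
print(names(people=[{"name": "Denji"}, {"name": "Power"}]))

tight = mini_template("a   {{~x~}}   b")
print(tight(x=5))
```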

There are other cool features, like structuring outputs via regex, enums, and patterns. You can also capture the logprobs on enums by adding a `logprobs` arg in `gen`. There's an `await` functionality for waiting on input, async/sync options, and caching to speed up repeated calls.

### Gotchas
1. The cache directory is located under the user cache dir for `guidance`; print it with `import platformdirs; print(platformdirs.user_cache_dir('guidance'))`. You will likely forget about this and blow up disk space. I'm not sure how to change it without editing the source code.

2. DO NOT USE CONTROLLED GENERATION WITH OPENAI. It gets expensive fast. At worst there is an API call for every option in your enum, and at best an API call for every control statement (and only if the API can output _all_ logprobs, which seems unlikely given that would be a ~50k-element array in JSON). That's a lot of tokens. Controlled generation is only optimized for transformers so far.
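
For the first gotcha, a quick way to keep an eye on disk usage is to total the file sizes under the cache directory. The sketch below assumes the cache lives in platformdirs' user cache directory for the app name `guidance`; if your guidance version stores it elsewhere, pass that path instead. `dir_size_bytes` and `guidance_cache_dir` are hypothetical helpers.

```python
import os


def dir_size_bytes(path):
    """Total size in bytes of all regular files under `path` (0 if missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def guidance_cache_dir():
    """Best-guess cache location (an assumption, see the text above).

    Returns None if platformdirs is not installed.
    """
    try:
        import platformdirs
    except ImportError:
        return None
    return platformdirs.user_cache_dir("guidance")


cache_dir = guidance_cache_dir()
if cache_dir is not None:
    print(f"{cache_dir}: {dir_size_bytes(cache_dir) / 1e6:.1f} MB")
```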

In [None]:
# Use {{char}} throughout so the prompt is not tied to one character name.
system_prompt = guidance("You are a character named {{char}}. You stay perfectly in character, and never break immersion, replying as {{char}} to the user.")

example_dialogues = guidance("""### Dialogue 1
{{char}}: Denji-kun, are you ok?
Denji: uhh... yeah
{{char}}: There's not a lot of precedent for your condition... even from a historical perspective, it doesn't have a name. But, I believe you. I've got a particularly good sense of smell.

### Dialogue 2
{{char}}: I believe that, when it comes to sex, the better you understand the other person the better it feels
Denji: I... I... uhh
{{char}}: But it's hard to know how someone else feels. So start with observing the hand carefully. How long are the fingers? Are the palms cool? Are they warm? Ever had your finger bitten? Remember this. So that even if you can't see, you can tell it's me. Biting your finger. Remember.
""")
                                            
wiki = guidance(doc.content.replace('Makima', '{{char}}'))

program = guidance('''{{~#system~}}
{{>system_prompt}}
{{~/system}}
{{#user~}}
Your task is to continue the conversation between {{human}} and {{char}}. In order to do so, make use of the following information about {{char}}.

## {{char}} Description
{{>wiki}}

## Example Dialogues
{{>example_dialogues}}

## Conversation
{{char}}: Hi! I am {{char}}, a Public Safety Devil Hunter. Hmm, you seem interesting.
{{human}}: I am yours. I am one of your dogs. Control me ⛓.
{{char}}: Oh, interesting. A dog who is willing to be controlled. You must have a strong desire to please your master. Tell me, {{human}}, what do you hope to gain by being controlled? Power? Protection? Something else entirely?
{{human}}: What do I desire? ummm. Sex.

## Task
Continue the conversation as {{char}}.
{{~/user}}
{{#assistant~}}
{{gen "answer" temperature=0.9 top_p=0.95}}
{{~/assistant~}}
''')

chatgpt = guidance.llms.OpenAI("gpt-3.5-turbo")
r = program(
    human="Jonny", 
    system_prompt=system_prompt, 
    example_dialogues=example_dialogues,
    wiki=wiki, 
    char='Makima', 
    llm=chatgpt
)
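
Step 7 (text to audio) is not shown in this notebook. A minimal sketch using only the standard library is below. It assumes the ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header and a JSON body containing `text`; check the current ElevenLabs docs before relying on it. `build_tts_request` and `synthesize` are hypothetical helpers, and we assume the generated reply is available as `r["answer"]`.

```python
import json
import urllib.request

# Assumed endpoint; verify against the ElevenLabs API reference.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text, voice_id, api_key):
    """Build (but do not send) a text-to-speech request."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(text, voice_id, api_key, out_path="answer.mp3"):
    """Send the request and write the returned audio bytes to `out_path`."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Example call (not executed here; needs a real API key and the cloned voice id):
# synthesize(r["answer"], voice_id="<your-voice-id>", api_key="<your-api-key>")
```

Splitting request construction from sending keeps the network call in one place, so the payload can be inspected without spending API credits.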