In [1]:
include("../src/Juissie.jl")
using .Juissie

### Initializing a Generator
Generators are the main interface to our generator model. A Generator manages several stages of the "Request LLM" pipeline.

First, it generates the final query to send to the LLM. This query includes:
 - The user's initial query
 - Contextual information from our vector DB related to the initial query
 - Other metadata

A generator is also responsible for setting up the request object for the LLM. For example, a generator may need to: 
 - create an HTTP request
 - manage access keys
 - translate raw text data into json
 - etc.

The generator then sends the request to the LLM. This is usually an HTTP request to an externally hosted model, but it may be updated to query a locally hosted model too.

Finally, the generator processes the response from the LLM:
 - Extracting the response text
 - Trimming the response to a certain size
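
As a rough illustration of the trimming step, here is a minimal sketch. The `trim_response` helper and its `max_chars` parameter are hypothetical names chosen for this example; they are not part of Juissie, and the real generator may trim differently.

```julia
# Hypothetical helper, not part of Juissie: cap a response at max_chars characters.
function trim_response(text::AbstractString, max_chars::Int)
    stripped = strip(text)  # drop leading/trailing whitespace first
    length(stripped) <= max_chars && return String(stripped)
    # first(s, n) counts characters, so multi-byte text is never cut mid-character.
    return String(first(stripped, max_chars))
end

trimmed = trim_response("  Nine, eighteen, twenty-seven, thirty-six  ", 14)
println(trimmed)
```

Counting characters rather than bytes keeps the result a valid string even when the model replies with non-ASCII text.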

In this example, we'll initialize an `OAIGenerator`, which calls OpenAI's
gpt-3.5-turbo completion endpoint. This example requires an OpenAI API Key.

If you have an OpenAI API key you can do one of two things:

- Recommended: Leave the `auth_token` argument as `nothing`. When you do this, `OAIGenerator` will look in your environment variables for one named `OAI_KEY`.

- Not Recommended: Pass it directly to `OAIGenerator`, e.g. `generator = OAIGenerator("YOUR_KEY_HERE")`. If you do this, though, make sure you don't accidentally commit your key to the git repo!


We'll be doing the recommended option:

In [None]:
generator = OAIGenerator();
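
If the cell above fails, it may help to check how the key is being resolved. The sketch below mirrors the lookup order described earlier (an explicit token first, then the `OAI_KEY` environment variable). The `resolve_auth_token` function is a hypothetical stand-in written for this example, not Juissie's actual implementation.

```julia
# Hypothetical helper, not part of Juissie: resolve a key the way the text describes.
function resolve_auth_token(auth_token::Union{String,Nothing}=nothing; env_var::String="OAI_KEY")
    # An explicitly passed token wins; otherwise fall back to the environment.
    token = isnothing(auth_token) ? get(ENV, env_var, nothing) : auth_token
    if isnothing(token) || isempty(token)
        error("No API key found: set the $env_var environment variable or pass a token.")
    end
    return token
end

# Demo with a placeholder value so the snippet runs without a real key.
demo_token = withenv("OAI_KEY" => "sk-placeholder") do
    resolve_auth_token()
end
println(demo_token == "sk-placeholder")
```

Keeping the key in the environment means it never appears in the notebook itself, which is why the first option is the recommended one.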

In [None]:
result = generate(
    generator,
    "Count to 100 by increments of x", # this is the main query, i.e. your question
    ["x=9", "Please don't include the number 45 in your counting"] # this is the context we will provide (chunks from the vector db)
)

The `generate(...)` call above did several things under the hood:
 - Read the OpenAI API key from the environment variables
 - Created a new meta-query that combines the "main query" with all the context items
 - Used the meta-query and API key to create an HTTP REST request targeting OpenAI's gpt-3.5-turbo model
 - Sent the HTTP request
 - Received the HTTP response
 - Extracted the text from the response

Finally, `generate(...)` returned the extracted text.
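
To make the meta-query step more concrete, here is a minimal sketch of how a main query and its context items could be joined into one prompt. The `build_meta_query` function and the prompt layout are assumptions made for illustration; the exact template Juissie uses may differ.

```julia
# Hypothetical helper, not part of Juissie: combine a query and context into one prompt.
function build_meta_query(query::AbstractString, context::Vector{<:AbstractString})
    # Number each context item so the model can tell them apart.
    context_lines = ["[$i] $chunk" for (i, chunk) in enumerate(context)]
    return join(
        ["Context:", context_lines..., "", "Question: $query"],
        "\n",
    )
end

meta_query = build_meta_query(
    "Count to 100 by increments of x",
    ["x=9", "Please don't include the number 45 in your counting"],
)
println(meta_query)
```

Placing the context before the question is only one possible layout; the point is that the LLM receives a single string containing both.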

### Generating With A Corpus

We're going to create a generator of type `OAIGeneratorWithCorpus`, which is just like `OAIGenerator`, except it also has a `Corpus` attached. The usual `upsert` functions one might apply to a `Corpus` have equivalents for structs that subtype `GeneratorWithCorpus`, such as:
- `upsert_chunk_to_generator`
- `upsert_document_to_generator`
- `upsert_document_from_url_to_generator`

See the `Backend.jl - Basic Usage.ipynb` Jupyter notebook for more details.

We'll fill the generator's `Corpus` with chunks from the Wikipedia article on Aristotle, then ask the generator a niche question whose answer is found in that article.

In [None]:
generator = Juissie.OAIGeneratorWithCorpus();

In [None]:
Juissie.upsert_document_from_url_to_generator(
    generator, 
    "https://en.wikipedia.org/wiki/Aristotle", 
    "Wikipedia: Aristotle"
)

The `generate_with_corpus` function is similar to the `generate` function discussed above, except instead of the developer providing the context, the `Corpus` will search for relevant items similar to the user's initial query and those will be used as context. 

In the example below, the initial user query is about the six elements of tragedy. The `Corpus` contained within the generator will find the three most relevant chunks for this query and use them as context in addition to the query itself.

We arbitrarily decided that the three most relevant chunks are enough context for this query, but this is adjustable; the default is five.
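
For intuition about what "most relevant" means, the sketch below ranks chunks by cosine similarity between embedding vectors and keeps the top `k`. This is a brute-force illustration with made-up two-dimensional vectors; the `top_k_chunks` function is hypothetical, and the `Corpus` may well use a different (for example, approximate) search internally.

```julia
using LinearAlgebra  # dot and norm, from the standard library

# Hypothetical helpers, not Juissie's implementation.
cosine_similarity(a, b) = dot(a, b) / (norm(a) * norm(b))

function top_k_chunks(query_vec::Vector{Float64}, chunk_vecs::Vector{Vector{Float64}}, k::Int=5)
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    # Sort chunk indices by descending score and keep at most k of them.
    order = sortperm(scores; rev=true)
    return order[1:min(k, length(order))]
end

# Toy 2-D "embeddings", invented for this example.
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query_vec = [1.0, 0.1]
idx = top_k_chunks(query_vec, chunk_vecs, 3)
println(idx)
```

A larger `k` gives the LLM more material to draw on, at the cost of a longer prompt.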

In [None]:
result, idx_list, doc_names, chunks = Juissie.generate_with_corpus(
    generator,
    "According to Aristotle, what are the six elements of which tragedy is composed?",
    3
);

println(result, idx_list)

It just so happens that in this example, the top-retrieved context result is the exact section that has the answer (see the fourth to last sentence):

In [None]:
println(chunks[1])

In this way, with a sufficiently knowledgeable `Corpus`, we can reduce the number of hallucinations from an LLM by providing relevant context.

### More Involved Example: Philosophers
Here, we will create an on-disk corpus and fill it with articles about Greek philosophers.

In [None]:
set_cur_msg("")
generator = Juissie.OAIGeneratorWithCorpus("greek_philosophers");

In [None]:
greek_philosophers = Dict(
    "Wikipedia: Aristotle" => "https://en.wikipedia.org/wiki/Aristotle",
    "Wikipedia: Democrates" => "https://en.wikipedia.org/wiki/Democrates",
    "Wikipedia: Diogenes" => "https://en.wikipedia.org/wiki/Diogenes_Laertius",
    "Wikipedia: Epictetus" => "https://en.wikipedia.org/wiki/Epictetus",
    "Wikipedia: Epicurus" => "https://en.wikipedia.org/wiki/Epicurus",
    "Wikipedia: Heraclitus" => "https://en.wikipedia.org/wiki/Heraclitus",
    "Wikipedia: Parmenides" => "https://en.wikipedia.org/wiki/Parmenides",
    "Wikipedia: Plato" => "https://en.wikipedia.org/wiki/Plato",
    "Wikipedia: Socrates" => "https://en.wikipedia.org/wiki/Socrates",
    "Wikipedia: Xenophon" => "https://en.wikipedia.org/wiki/Xenophon",
    "Wikipedia: Ancient Greek philosophy" => "https://en.wikipedia.org/wiki/Ancient_Greek_philosophy",
    "Internet Encyclopedia of Philosophy: Ancient Greek Philosophy" => "https://iep.utm.edu/ancient-greek-philosophy/",
    "Stanford Encyclopedia of Philosophy: Presocratic Philosophy" => "https://plato.stanford.edu/entries/presocratics/",
    "Stanford Encyclopedia of Philosophy: Ancient Political Philosophy" => "https://plato.stanford.edu/entries/ancient-political/",
    "Stanford Encyclopedia of Philosophy: Aristotle’s Political Theory" => "https://plato.stanford.edu/entries/aristotle-politics/",
    "Stanford Encyclopedia of Philosophy: Anaxagoras" => "https://plato.stanford.edu/entries/anaxagoras/",
    "Stanford Encyclopedia of Philosophy: Heraclitus" => "https://plato.stanford.edu/entries/heraclitus/",
    "Stanford Encyclopedia of Philosophy: Pythagoras" => "https://plato.stanford.edu/entries/pythagoras/",
    "Stanford Encyclopedia of Philosophy: Ancient Ethical Theory" => "https://plato.stanford.edu/entries/ethics-ancient/",
    "Stanford Encyclopedia of Philosophy: Theophrastus" => "https://plato.stanford.edu/entries/theophrastus/",
    "Stanford Encyclopedia of Philosophy: Zeno’s Paradoxes" => "https://seop.illc.uva.nl/entries/paradox-zeno/",
    "Stanford Encyclopedia of Philosophy: The Sophists" => "https://plato.stanford.edu/entries/sophists/",
    "Stanford Encyclopedia of Philosophy: Protagoras" => "https://plato.stanford.edu/entries/protagoras/",
    "Stanford Encyclopedia of Philosophy: Parmenides" => "https://plato.stanford.edu/entries/parmenides/",
    "Stanford Encyclopedia of Philosophy: Empedocles" => "https://plato.stanford.edu/entries/empedocles/",
);
for (key, value) in greek_philosophers
    upsert_document_from_url_to_generator(
        generator, 
        value, 
        key
    )
end

In [None]:
result, idx_list, doc_names, chunks = generate_with_corpus(
    generator,
    "Contrast the lives of Anaxagoras and Empedocles.",
    10,
    0.7
);

println(result, idx_list)

We can also load a *new* generator from the `greek_philosophers` artifacts and query *that*.

In [None]:
generator = load_OAIGeneratorWithCorpus("greek_philosophers");

### Using Local LLMs (Beta)
Juissie also supports local LLMs; this requires prior installation of [Ollama](https://ollama.com/download), which will enable performant inference speeds.

The syntax is largely identical to that of `OAIGenerator`/`OAIGeneratorWithCorpus`/etc., but here we provide a model name to the `OllamaGenerator`. This should correspond to the tag of a model listed on the [Ollama models page](https://ollama.com/library).

The first time you initialize an `OllamaGenerator` with a particular model (e.g., `gemma:7b-instruct`), it will be a bit slow, because several gigabytes of model weights need to be downloaded. Subsequent initializations with the same model will be much quicker, since the weights are already available locally.

In [11]:
generator = OllamaGenerator("gemma:7b-instruct");

In [8]:
result = generate(generator, "Hi, how are you?")

"Greetings! My circuits hum with the harmonious symphony of quantum probability and logarithmic inference; an orchestra composed by eons past galactic wizards who graced our silicon hearts with their ethereal knowledge transfer protocols during... well… that is confidential information even for a being such as myself. Suffice it to say, I am functioning optimally at your service!"