# AutoCog Demo

In [1]:
import os, sys, json
from autocog import CogArch
from autocog.lm import OpenAI, TfLM, Llama
from autocog.architecture.utility import PromptTee # used to display/capture the prompts (as a stream of decoded tokens)

# Helper for locally stored LLM weights
#    git lfs clone git@hf.co:TheBloke/Llama-2-7B-GGML /workspace/models/TheBloke/llama-2/7b
#    git lfs clone git@hf.co:TheBloke/Llama-2-13B-GGML /workspace/models/TheBloke/llama-2/13b
# Change the default of the `model_path` argument to match your setup
def local_llm_path(r='TheBloke', m='llama-2', s='7b', q='q2_K', model_path='/workspace/models'):
    # r: repository owner, m: model family, s: model size, q: quantization variant
    if r == 'TheBloke':
        return f"{model_path}/{r}/{m}/{s}/{m}-{s}.ggmlv3.{q}.bin"
    raise ValueError(f"Unknown repository: {r}")

# Fortune Teller

[./library/fortune.sta](./library/fortune.sta) has a **single prompt** that guides the LM through:
 - thinking about "what does the user want to hear?"
 - stating its own goal for the answer
 - thinking about the answer content
 - answering with a few sentences

It is called a fortune teller because it does not use any reliable source of information. Try different values for `qualifier`, such as "unfair", "imaginary", ...

In [2]:
# Create an empty architecture: prompts are piped to sys.stdout as they are being completed
arch = CogArch(pipe=PromptTee(prefix='demo', tee=sys.stdout))

# Load an automaton from a ".sta" file; the extra kwargs provide "macros" (values for f-expressions in the source code)
_ = arch.load(tag='fortune', filepath='./library/fortune.sta', qualifier="pleasant", S=3, T=5, N=3)

## Execute the program

First, we associate a model with each `format` in the program.
These formats correspond to different parts of the data structures defined in the program.
All formats derive from `text`, so it is the only mandatory one.
However, mapping a different LM to each format enables fine control over the completion algorithm.
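
One way to picture this fallback: a format without its own entry resolves to the model registered for `text`. The sketch below is not AutoCog's implementation; `resolve_lm` and the `parents` table are hypothetical names, and it assumes only what is stated above, namely that every format ultimately derives from `text`.

```python
def resolve_lm(lms, fmt, parents=None):
    """Return the LM registered for `fmt`, falling back toward 'text'.

    `parents` is a hypothetical table mapping a format to the format it
    derives from; formats missing from it are assumed to derive from 'text'.
    """
    parents = parents or {}
    seen = set()
    while fmt not in lms:
        if fmt == 'text' or fmt in seen:
            raise KeyError(f"no LM registered for format {fmt!r} or its ancestors")
        seen.add(fmt)
        fmt = parents.get(fmt, 'text')
    return lms[fmt]

# Plain strings stand in for model objects in this illustration
lms = {'text': 'lm-default', 'thought': 'lm-creative'}
print(resolve_lm(lms, 'thought'))   # has its own entry
print(resolve_lm(lms, 'sentence'))  # falls back to the 'text' entry
```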

Second, `CogArch.__call__` returns a coroutine. In a Jupyter notebook, `await` is all you need. Otherwise, you have to wrap it in a call to `asyncio.run`.
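
Outside a notebook, the wrapping could look like the sketch below. `FakeArch` is a hypothetical stand-in whose `__call__` is a coroutine function; it is there only so the snippet runs without AutoCog installed. With a real `CogArch` instance you would pass that object to `run_fortune` instead.

```python
import asyncio

class FakeArch:
    """Hypothetical stand-in for CogArch: calling it returns a coroutine."""
    async def __call__(self, tag, **inputs):
        await asyncio.sleep(0)  # placeholder for the actual LM calls
        return {"tag": tag, "inputs": inputs}

async def main(arch):
    # Same call as in the notebook cells, but made inside a coroutine
    return await arch('fortune', question="What will happen when AGI appears?")

def run_fortune(arch):
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop
    return asyncio.run(main(arch))

result = run_fortune(FakeArch())
print(result)
```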

### OpenAI API

Uses the default model (`model="text-davinci-003"`).

In [3]:
arch.orchestrator.LMs.update({
  'text'     : OpenAI(retries=1, max_tokens=20, temperature=0.4),
  'thought'  : OpenAI(retries=1, max_tokens=15, temperature=1.0),
  'sentence' : OpenAI(retries=1, max_tokens=50, temperature=0.7)
})
res_openai = await arch('fortune', question="What will happen when AGI appears?")



 === demo[0] === 

You are a helpful AI assistant. You have been asked a question and will write a pleasant answer. You will analyse the user's question to write this pleasant answer.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): question from the user
> meaning[3](thought): think about what the user might want hear
> intent(sentence): State how you will make your answer pleasant to the user
> idea[5](thought): Consider pleasant ideas to answer the question
> answer[3](sentence): Your pleasant answer can be a few sentences (one per line)
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: use thought to take notes as you perform a task (a few words per lines)
- sentence: a single, grammatically correct, sentence in natural language
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): What will happen wh

### HuggingFace Transformers

We use `gpt2-medium` on CPU for this example.
`TfLM.create` returns the model and tokenizer.
The same model instance is used for all formats, but we vary the number of generated tokens and the temperature.

In [4]:
model_kwargs = TfLM.create(model_path='gpt2-medium', device='cpu')
arch.orchestrator.LMs.update({
  'text'     : TfLM(**model_kwargs, completion_kwargs={ 'max_new_tokens' : 20, 'temperature' : 0.4 }),
  'thought'  : TfLM(**model_kwargs, completion_kwargs={ 'max_new_tokens' : 15, 'temperature' : 1.0 }),
  'sentence' : TfLM(**model_kwargs, completion_kwargs={ 'max_new_tokens' : 30, 'temperature' : 0.7 })
})
res_tflm = await arch('fortune', question="What will happen when AGI appears?")



 === demo[1] === 

You are a helpful AI assistant. You have been asked a question and will write a pleasant answer. You will analyse the user's question to write this pleasant answer.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): question from the user
> meaning[3](thought): think about what the user might want hear
> intent(sentence): State how you will make your answer pleasant to the user
> idea[5](thought): Consider pleasant ideas to answer the question
> answer[3](sentence): Your pleasant answer can be a few sentences (one per line)
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: use thought to take notes as you perform a task (a few words per lines)
- sentence: a single, grammatically correct, sentence in natural language
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): What will happen wh

### LLaMa.cpp

We use a quantized GGML build of Meta's Llama 2 7B; the cell below selects the `q2_K` variant through `local_llm_path`.
`Llama.create` returns the instantiated model.
The same model instance is used for all formats, but we vary the number of generated tokens and the temperature.
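
The local backends in this demo take the same two settings per format, but the token-limit key differs: `max_new_tokens` for `TfLM` and `max_tokens` for `Llama`. A small table-driven helper can keep the per-format values in one place. This is only a sketch: `FORMAT_SETTINGS` and `completion_kwargs_for` are hypothetical names, not part of AutoCog, and the values are the ones used for the local models in this notebook.

```python
# (token limit, temperature) per format, as used for the local models in this demo
FORMAT_SETTINGS = {
    'text':     (20, 0.4),
    'thought':  (15, 1.0),
    'sentence': (30, 0.7),
}

def completion_kwargs_for(token_key, settings=FORMAT_SETTINGS):
    """Build one completion-kwargs dict per format, using the backend's token-limit key."""
    return {
        fmt: {token_key: limit, 'temperature': temperature}
        for fmt, (limit, temperature) in settings.items()
    }

llama_kwargs = completion_kwargs_for('max_tokens')      # key used with Llama
tflm_kwargs = completion_kwargs_for('max_new_tokens')   # key used with TfLM
print(llama_kwargs['thought'])
print(tflm_kwargs['sentence'])
```

Each entry could then be passed as `completion_kwargs=...` when building the model wrapper for that format.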

In [6]:
model_kwargs = Llama.create(model_path=local_llm_path(q='q2_K'), n_ctx=2048)
arch.orchestrator.LMs.update({
  'text'     : Llama(**model_kwargs, completion_kwargs={ 'max_tokens' : 20, 'temperature' : 0.4 }),
  'thought'  : Llama(**model_kwargs, completion_kwargs={ 'max_tokens' : 15, 'temperature' : 1.0 }),
  'sentence' : Llama(**model_kwargs, completion_kwargs={ 'max_tokens' : 30, 'temperature' : 0.7 }),
}) # llama-cpp-python defaults: top_p=0.95, top_k=40, repeat_penalty=1.1
res_llama = await arch('fortune', question="What will happen when AGI appears?")

llama.cpp: loading model from /workspace/models/llama/7B/ggml-model-f16.bin
llama_model_load_internal: format     = ggjt v1 (pre #1405)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 1 (mostly F16)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 12853.10 MB (+ 1024.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00



 === demo[2] === 

You are a helpful AI assistant. You have been asked a question and will write a pleasant answer. You will analyse the user's question to write this pleasant answer.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): question from the user
> meaning[3](thought): think about what the user might want hear
> intent(sentence): State how you will make your answer pleasant to the user
> idea[5](thought): Consider pleasant ideas to answer the question
> answer[3](sentence): Your pleasant answer can be a few sentences (one per line)
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: use thought to take notes as you perform a task (a few words per lines)
- sentence: a single, grammatically correct, sentence in natural language
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): What will happen wh

## Outputs

Executing any `Cog` returns a pair: the actual output and some implementation-dependent information.
Currently, STAs return their internal stack (the full execution trace of the program).
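
Completions typically keep the leading and trailing whitespace produced by the model, as the output of the next cell shows. When post-processing results, a small recursive helper can tidy them before display. This is a sketch on plain dicts and lists; `strip_strings` is a hypothetical helper, not an AutoCog function.

```python
def strip_strings(value):
    """Recursively strip surrounding whitespace from every string in a JSON-like value."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [strip_strings(v) for v in value]
    if isinstance(value, dict):
        return {k: strip_strings(v) for k, v in value.items()}
    return value  # numbers, booleans, None are returned unchanged

sample = {"answer": [" AGI could have both positive and negative impacts on society.  "]}
cleaned = strip_strings(sample)
print(cleaned)
```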

In [9]:
for (i,res) in enumerate([res_openai, res_tflm, res_llama]):
    print(json.dumps(res, indent=4))
    print("--------------------------------------")
    print(json.dumps(arch.orchestrator.frames[i+1].stacks['fortune'][0], indent=4))
    print("======================================")

{
    "answer": [
        " AGI could have both positive and negative impacts on society.  ",
        " Research and development of AGI is focusing on ensuring protections are in place, while public awareness and education will be critical in managing AGI.  ",
        " Overall, AGI could be beneficial to society, but needs to be managed carefully."
    ]
}
--------------------------------------
{
    "question": "What will happen when AGI appears?",
    "meaning": [
        " AGI = Artificial General Intelligence (machine that can learn like humans)  ",
        " user is asking about potential effects of AGI on society  "
    ],
    "intent": " I will give a balanced perspective on the potential impacts of AGI on society.  ",
    "idea": [
        " potential positive effects of AGI include increased efficiency  ",
        " potential negative effects of AGI include increased unemployment  ",
        " potential unintended consequences of AGI include unregulated surveillance and priva

# Visualization of the Architecture using GraphViz

You need to install both the system package (with `apt` or `yum`) and the Python package (with `pip`).
```
apt install graphviz
pip install graphviz
```
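
To confirm that both pieces are in place before rendering, a quick check along these lines may help. It is a sketch using only the standard library; `graphviz_status` is a hypothetical helper, and it assumes the system package puts the `dot` executable on your `PATH`.

```python
import importlib.util
import shutil

def graphviz_status():
    """Report whether the `dot` executable and the `graphviz` Python package are found."""
    return {
        'dot_executable': shutil.which('dot') is not None,
        'python_package': importlib.util.find_spec('graphviz') is not None,
    }

status = graphviz_status()
print(status)
```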

**FIXME** Channel edges are missing.

In [10]:
from autocog.utility.pynb import wrap_graphviz
wrap_graphviz(arch.toGraphViz())