In [None]:
!pip install git+https://github.com/LLNL/AutoCog

# AutoCog Demo

In [1]:
import os, sys, json
from autocog import CogArch
from autocog.lm import OpenAI, TfLM, Llama
from autocog.architecture.utility import PromptTee # used to display/capture the prompts (as a stream of decoded tokens)

# Fortune Teller

[./library/fortune.sta](./library/fortune.sta) has a **single prompt** that guides the LM through:
 - thinking about "what does the user want to hear?"
 - stating its own goal for the answer
 - thinking about the answer content
 - answering with a few sentences

The name reflects the fact that it does not use any reliable source of information. Try using different values for `qualifier`, such as "unfair", "imaginary", ...

In [2]:
# Create an empty architecture: prompts are piped to sys.stdout as they are being completed
arch = CogArch(pipe=PromptTee(prefix='demo', tee=sys.stdout))

# Load an Automaton from a ".sta" file; the extra kwargs provide "macros" (kwargs for f-exp in the source code)
_ = arch.load(tag='fortune', filepath='./library/fortune.sta', qualifier="pleasant", S=3, T=5, N=3)

## Execute the program

First, we associate models to each `format` in the program.
These formats correspond to different parts of the data-structures defined in the program.
All formats derive from `text`, so it is the only mandatory one.
However, mapping a different LM to each format enables fine control over the completion algorithm.
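As a rough illustration of why only `text` is mandatory, the sketch below resolves a format to a model and falls back to the `text` entry when no dedicated mapping exists. This is an assumption about the behaviour, not AutoCog's actual resolution code; `resolve_lm` and the placeholder strings are hypothetical.

```python
def resolve_lm(lms, fmt):
    """Return the LM registered for `fmt`, falling back to the mandatory 'text' entry.

    Hypothetical helper: it only illustrates the idea that every format derives
    from 'text'. It is not taken from the AutoCog source.
    """
    if 'text' not in lms:
        raise KeyError("the 'text' format must always be mapped to an LM")
    return lms.get(fmt, lms['text'])


# Placeholder strings stand in for real LM objects.
lms = {'text': 'lm-for-text', 'thought': 'lm-for-thought'}

print(resolve_lm(lms, 'thought'))   # dedicated mapping
print(resolve_lm(lms, 'sentence'))  # no mapping, so the 'text' LM is used
```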

Second, `CogArch.__call__` returns a coroutine. In a Jupyter notebook, `await` is all you need. Otherwise, you will have to wrap it in a call to `asyncio.run`.
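Outside a notebook, the wrapping might look like the following minimal sketch. `fake_arch` is a hypothetical stand-in for `arch(...)` so that the snippet runs without AutoCog or any model; only `asyncio.run` is the real mechanism being shown.

```python
import asyncio


async def fake_arch(tag, **inputs):
    """Hypothetical stand-in for `arch(tag, **inputs)`; any coroutine is driven the same way."""
    await asyncio.sleep(0)  # yield to the event loop once
    return [{"answer": [f"(stub answer to: {inputs['question']})"]}]


def run_sync(coro):
    """Run a coroutine to completion from plain, non-async code."""
    return asyncio.run(coro)


res = run_sync(fake_arch("fortune", question="What will happen when AGI appears?"))
print(res[0]["answer"][0])
```

Note that `asyncio.run` cannot be called from code that is already running inside an event loop, which is why the notebook cells use `await` directly.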

### OpenAI API

Uses the default model (`model="text-davinci-003"`).

In [3]:
arch.orchestrator.LMs.update({
  'text'     : OpenAI(max_tokens=20, temperature=0.4),
  'thought'  : OpenAI(max_tokens=15, temperature=1.0),
  'sentence' : OpenAI(max_tokens=50, temperature=0.7)
})
res_openai = await arch('fortune', question="What will happen when AGI appears?")



 === demo[0] === 

You are a helpful AI assistant.
You have been asked a question and will write a pleasant answer.
You will analyse the user's question to write this pleasant answer.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): question from the user
> meaning[3](thought): think about what the user might want hear
> intent(sentence): State how you will make your answer pleasant to the user
> idea[5](thought): Consider pleasant ideas to answer the question
> answer[3](sentence): Your pleasant answer can be a few sentences (one per line)
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: your thoughts (a few words per lines)
- sentence: a single, grammatically correct, sentence in natural language
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): What will happen when AGI appears?
> meaning[1](thoug

### HuggingFace Transformers

We use `gpt2-medium` on CPU for this example.
`TfLM.create` returns the model and tokenizer.
The same model instance is used for all formats, but we vary the number of generated tokens and the temperature.

In [4]:
model_kwargs = TfLM.create(model_path='gpt2-medium', device='cpu')
arch.orchestrator.LMs.update({
  'text'     : TfLM(**model_kwargs, completion_kwargs={ 'max_new_tokens' : 20, 'temperature' : 0.4 }),
  'thought'  : TfLM(**model_kwargs, completion_kwargs={ 'max_new_tokens' : 15, 'temperature' : 1.0 }),
  'sentence' : TfLM(**model_kwargs, completion_kwargs={ 'max_new_tokens' : 30, 'temperature' : 0.7 })
})
res_tflm = await arch('fortune', question="What will happen when AGI appears?")



 === demo[1] === 

You are a helpful AI assistant.
You have been asked a question and will write a pleasant answer.
You will analyse the user's question to write this pleasant answer.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): question from the user
> meaning[3](thought): think about what the user might want hear
> intent(sentence): State how you will make your answer pleasant to the user
> idea[5](thought): Consider pleasant ideas to answer the question
> answer[3](sentence): Your pleasant answer can be a few sentences (one per line)
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: your thoughts (a few words per lines)
- sentence: a single, grammatically correct, sentence in natural language
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): What will happen when AGI appears?
> meaning[1](thoug

### LLaMa.cpp

We use Meta's LLaMa 7B with 4-bit quantization.
`Llama.create` returns the instantiated model.
The same model instance is used for all formats, but we vary the number of generated tokens and the temperature.
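The two local backends in this notebook name the token limit differently (`max_new_tokens` for `TfLM`, `max_tokens` for `Llama`). A small helper can keep the per-format settings in one place. This is only a sketch: `SETTINGS` and `completion_kwargs` are hypothetical names, and the values are copied from the cells in this notebook.

```python
# (token limit, temperature) per format, as used in the TfLM and Llama cells.
SETTINGS = {
    'text':     (20, 0.4),
    'thought':  (15, 1.0),
    'sentence': (30, 0.7),
}


def completion_kwargs(token_key):
    """Build one completion_kwargs dict per format, using the backend's token-limit key."""
    return {
        fmt: {token_key: limit, 'temperature': temperature}
        for fmt, (limit, temperature) in SETTINGS.items()
    }


tflm_kwargs = completion_kwargs('max_new_tokens')
llama_kwargs = completion_kwargs('max_tokens')
print(llama_kwargs['sentence'])
```

Each resulting dict could then be passed as `completion_kwargs=...` when constructing the per-format model wrappers.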

In [13]:
llama_path = lambda x: "/workspace/models/{}/ggml-model-{}.bin".format(*x)
model_kwargs = Llama.create(model_path=llama_path(('7B','q4_0')), n_ctx=2048)
arch.orchestrator.LMs.update({
  'text'     : Llama(**model_kwargs, completion_kwargs={ 'max_tokens' : 20, 'temperature' : 0.4 }),
  'thought'  : Llama(**model_kwargs, completion_kwargs={ 'max_tokens' : 15, 'temperature' : 1.0 }),
  'sentence' : Llama(**model_kwargs, completion_kwargs={ 'max_tokens' : 30, 'temperature' : 0.7 }),
}) # llama-cpp-python defaults: top_p=0.95, top_k=40, repeat_penalty=1.1
res_llama = await arch('fortune', question="What will happen when AGI appears?")

llama.cpp: loading model from /workspace/models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  68.20 KB
llama_model_load_internal: mem required  = 5809.33 MB (+ 1026.00 MB per state)




 === demo[0] === 

You are a helpful AI assistant.
You have been asked a question and will write a pleasant answer.
You will analyse the user's question to write this pleasant answer.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): question from the user
> meaning[3](thought): think about what the user might want hear
> intent(sentence): State how you will make your answer pleasant to the user
> idea[5](thought): Consider pleasant ideas to answer the question
> answer[3](sentence): Your pleasant answer can be a few sentences (one per line)
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: your thoughts (a few words per lines)
- sentence: a single, grammatically correct, sentence in natural language
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): What will happen when AGI appears?
> meaning[1](thoug

llama_init_from_file: kv self size  = 1024.00 MB


 "AGI" is an acronym for Artificial General Int
> intent(sentence):  gin, and it refers to a system that can perform any tas
> idea[1](thought):  k it’s able to learn how to do so.
> answer[1](sentence):  I'm not sure about what will happen when AGI appears.
> answer[2](sentence):  But a lot of people seem very worried about this, and
> answer[3](sentence):  they have good reasons to worry:


## Outputs

Execution of any `Cog` returns a pair: the actual output and some implementation-dependent information.
Currently, STAs return their internal stack (the full execution trace of the program).
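The `answer` strings shown in this section keep the leading and trailing whitespace produced by the model. A minimal post-processing sketch is given here; `clean_answer` is a hypothetical helper, and `output` is a hand-copied excerpt of the printed result rather than a live value.

```python
# Excerpt copied from the `answer` field printed in this section.
output = {
    "answer": [
        " It could also help us to solve complex problems that humans alone can\u2019t solve, and to create a more prosperous and creative future.  ",
        " Ultimately, AGI could be a powerful tool to help us reach our potential as a species.",
    ]
}


def clean_answer(output):
    """Strip each sentence and join them into one paragraph."""
    return " ".join(sentence.strip() for sentence in output["answer"])


print(clean_answer(output))
```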

In [12]:
#for res in [res_openai, res_tflm, res_llama]:
for (i,res) in enumerate([res_openai]):
    print(json.dumps(res[0], indent=4))
    print("--------------------------------------")
    print(json.dumps(arch.orchestrator.frames[i+1].stacks['fortune'][0][0].content, indent=4))
    print("======================================")

{
    "answer": [
        " AGI could bring about a positive revolution in our lives, with automation of some parts, increased efficiency and accuracy of decisions, and a growth of human-machine synergies.  ",
        " It could also help us to solve complex problems that humans alone can\u2019t solve, and to create a more prosperous and creative future.  ",
        " Ultimately, AGI could be a powerful tool to help us reach our potential as a species."
    ]
}
--------------------------------------
{
    "question": [
        "What will happen when AGI appears?"
    ],
    "meaning": [
        " AGI could bring huge changes to our world.  "
    ],
    "intent": " I will provide an overview of what AGI could bring to our lives.  ",
    "idea": [
        " Positive revolution in our lives  ",
        " Automation of parts of our lives  ",
        " Increase in efficiency of operations  ",
        " Boost in integrity and accuracy of decisions  ",
        " Growth in human-machine synerg

In [13]:
res_openai

[{'answer': [' AGI could bring about a positive revolution in our lives, with automation of some parts, increased efficiency and accuracy of decisions, and a growth of human-machine synergies.  ',
   ' It could also help us to solve complex problems that humans alone can’t solve, and to create a more prosperous and creative future.  ',
   ' Ultimately, AGI could be a powerful tool to help us reach our potential as a species.']}]

# Visualization of the Architecture using GraphViz

You need to install both the system package (with `apt` or `yum`) and the `pip` package.
```
apt install graphviz
pip install graphviz
```
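Before rendering, it can help to check that both pieces are present. The sketch below uses only the standard library; `graphviz_status` is a hypothetical helper name, and it assumes the system package provides the usual `dot` executable.

```python
import importlib.util
import shutil


def graphviz_status():
    """Report whether the `dot` executable and the `graphviz` Python package are available."""
    return {
        "dot_executable": shutil.which("dot") is not None,
        "python_package": importlib.util.find_spec("graphviz") is not None,
    }


print(graphviz_status())
```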

**FIXME** Channel edges are missing.

In [15]:
from autocog.utility.pynb import wrap_graphviz
wrap_graphviz(arch.toGraphViz())

# Search with SerpAPI

This [program](./library/simple-search.sta) demonstrates a call to another `Cog` from within an `STA`.
We use a [wrapper](./autocog/tools/serpapi.py) for [SerpApi](https://serpapi.com/).
You must set `SERPAPI_API_KEY` in your environment (or copy-paste the key below).
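Reading the key with `os.environ[...]` raises a bare `KeyError` when the variable is missing. A slightly friendlier check could look like this sketch, where `require_env` is a hypothetical helper and not part of AutoCog:

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a readable message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set {name} in your environment before running this cell")
    return value


try:
    apikey = require_env("SERPAPI_API_KEY")
except RuntimeError as err:
    print(err)
```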

In [4]:
from autocog.tools.serpapi import SerpAPI

arch = CogArch(pipe=PromptTee(prefix='searcher', tee=sys.stdout))
arch.load(tag='searcher', filepath='./library/simple-search.sta', num_though_ask=10, num_though_choose=10, num_item=10, num_content=10, engine='"google_scholar"')
arch.register(SerpAPI(tag='search', apikey=os.environ["SERPAPI_API_KEY"]))

arch.orchestrator.LMs.update({
  'text'     : OpenAI(max_tokens=30, temperature=0.4),
  'thought'  : OpenAI(max_tokens=15, temperature=1.0),
  'sentence' : OpenAI(max_tokens=50, temperature=0.7)
})

In [5]:
from autocog.utility.pynb import wrap_graphviz
wrap_graphviz(arch.toGraphViz())

In [6]:
res = await arch('searcher', question="automaton and cognition")
print(json.dumps(res[0], indent=4))



 === searcher[0] === 

You are a helpful AI assistant.
You are conducting a search based on a user's question.
You are devising a query for the search engine.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): A question from the user
> thought[3](thought): Think about a good search query to answer the question
> query(text): A short query for the seach engine
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: your thoughts (a few words per lines)
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): automaton and cognition
> thought[1](thought):  What is the relationship between automata and cognition?
> thought[2](thought):  How can automaton learn?
> thought[3](thought):  What is the impact of AI on cognition?
> query(text):  "automaton cognition" OR "AI cognition"
You are a helpful