In [None]:
!pip install git+https://github.com/LLNL/AutoCog

# AutoCog Demo notebook

You can currently use three LM wrappers.

## OpenAI

```
!pip install openai tiktoken
import openai, os
openai.api_key = os.environ["OPENAI_API_KEY"] # or simply write your key here, don't commit it!!!
from autocog.lm import OpenAI
```
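You can avoid writing the key into a cell by reading it from the environment and falling back to a hidden prompt. The helper below is a minimal sketch of that idea. `get_api_key` is a hypothetical name and is not part of AutoCog or the `openai` package:

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def get_api_key(
    name: str,
    env: Optional[Mapping[str, str]] = None,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the value of environment variable `name`.

    If the variable is unset or empty, fall back to an interactive prompt
    (getpass does not echo the typed characters), so the key never has to
    be written into the notebook itself.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        key = prompt(f"{name}: ").strip()
    if not key:
        raise RuntimeError(f"No value provided for {name}")
    return key
```

With this helper you would write `openai.api_key = get_api_key("OPENAI_API_KEY")` in the cell above. The same helper also works for the `SERPAPI_API_KEY` used later.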

## HuggingFace's transformers

```
!pip install transformers
# Some models (llama) might require the latest version of transformers
!pip install git+https://github.com/huggingface/transformers
```

## Llama.cpp

### Clone and build llama.cpp

```
!git clone https://github.com/ggerganov/llama.cpp.git
!python3 -m pip install -r llama.cpp/requirements.txt
!make -C llama.cpp -j4 # runs make in subdir with 4 processes
```

### Download and convert original weights

Please fill out the [paperwork with Meta AI](https://docs.google.com/forms/d/e/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA/viewform) first!
There is an easy "community" downloader to get the weights from anywhere.
```
!wget https://raw.githubusercontent.com/juncongmoo/pyllama/main/llama/download_community.sh
!chmod +x download_community.sh
!./download_community.sh 7B llama.cpp/models
!python3 llama.cpp/convert.py llama.cpp/models/7B/
!./llama.cpp/quantize ./llama.cpp/models/7B/ggml-model-f16.bin ./llama.cpp/models/7B/ggml-model-q4_0.bin q4_0
```
This download script supports all sizes of LLaMA: 7B, 13B, 30B, and 65B (download them all at once by passing `7B,13B,30B,65B` as the first argument).
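The `quantize ... q4_0` step is what makes the larger models fit in memory. Here is a back-of-the-envelope size estimate. It is a sketch, not output from llama.cpp, and it assumes q4_0 stores each block of 32 weights as 16 bytes of 4-bit values plus a 2-byte fp16 scale. That is the layout used by recent ggml versions; older ones differed. File headers, the vocabulary, and any tensors kept at higher precision are ignored, and the parameter counts are the nominal ones from the model names:

```python
# Rough size of LLaMA weights before and after `quantize ... q4_0`.
# Assumption: q4_0 packs each block of 32 weights into 16 bytes of 4-bit
# values plus one 2-byte fp16 scale (18 bytes per block, 4.5 bits per weight).
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 16 + 2


def f16_bytes(n_params: int) -> int:
    """Two bytes per weight in the ggml-model-f16.bin file."""
    return 2 * n_params


def q4_0_bytes(n_params: int) -> int:
    """Bytes needed for the q4_0 blocks (a partial last block is rounded up)."""
    n_blocks = -(-n_params // Q4_0_BLOCK_WEIGHTS)  # ceiling division
    return n_blocks * Q4_0_BLOCK_BYTES


# Nominal parameter counts taken from the model names; the real counts differ slightly.
for name, n in [("7B", 7e9), ("13B", 13e9), ("30B", 30e9), ("65B", 65e9)]:
    n = int(n)
    print(f"{name}: f16 ~ {f16_bytes(n) / 1e9:.1f} GB, q4_0 ~ {q4_0_bytes(n) / 1e9:.1f} GB")
```

Under these assumptions, quantizing to q4_0 shrinks the f16 file by a factor of about 3.6 (16 bits versus 4.5 bits per weight).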

### Install python binding

The Python binding builds its own copy of `llama.cpp`, so you only need to keep the converted models.
```
!pip install git+https://github.com/tristanvdb/llama-cpp-python@choice-dev
```

In [1]:
import os, sys, json
from autocog import CogArch
from autocog.lm import OpenAI, TfLM, Llama
from autocog.architecture.utility import PromptTee # used to display/capture the prompts (as a stream of decoded tokens)



# Fortune Teller

[./library/fortune.sta](./library/fortune.sta) has a single prompt that guides the LM through:
 - thinking about "what does the user want to hear?"
 - stating its own goal for the answer
 - thinking about the answer content
 - answering with a few sentences

The program gets its name because it creates its answer purely from the model's weights. It does not consult any reliable source of information and will lie with the full eloquence of an LM. Try using different values for `qualifier`, such as "unfair" or "imaginary".

In [2]:
# Create an empty architecture: prompts are piped to sys.stdout as they are being completed
arch = CogArch(pipe=PromptTee(prefix='demo', tee=sys.stdout))

# Load an automaton from a `.sta` file; the extra kwargs are "macros" (values for the f-expressions in the source code)
sta_fortune = arch.load(tag='fortune', filepath='./library/fortune.sta', qualifier='balanced', S=10, T=20, N=5)

# Attach one OpenAI LM (GPT-3.5) per prompt format (text, thought, sentence), each with its own sampling settings
arch.orchestrator.LMs.update({
  'text'     : OpenAI(max_tokens=20, temperature=0.4),
  'thought'  : OpenAI(max_tokens=15, temperature=1.0),
  'sentence' : OpenAI(max_tokens=50, temperature=0.7)
})
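The kwargs passed to `arch.load` reappear in the prompt printed in the next section. `qualifier='balanced'` becomes "a balanced answer", and the values of `S`, `T` and `N` show up as the repetition counts `meaning[10]`, `idea[20]` and `answer[5]`. The sketch below is only a rough analogy. It uses plain Python `str.format`, not AutoCog's own macro expansion, and the template is made up for illustration rather than copied from `fortune.sta`:

```python
# Illustration only: str.format stands in for AutoCog's macro expansion.
# The template mirrors the structure of the printed prompt, not fortune.sta itself.
TEMPLATE = [
    "> meaning[{S}](thought): think about what the user might want to hear",
    "> intent(sentence): State how you will make your answer {qualifier} to the user",
    "> idea[{T}](thought): Consider {qualifier} ideas to answer the question",
    "> answer[{N}](sentence): Your {qualifier} answer can be a few sentences (one per line)",
]


def expand(template, **macros):
    """Substitute macro values (like the kwargs given to arch.load) into each line."""
    return [line.format(**macros) for line in template]


for line in expand(TEMPLATE, qualifier="balanced", S=10, T=20, N=5):
    print(line)
```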

## Execute the automaton

Jupyter notebooks already have a running `asyncio` event loop, so `CogArch.__call__` returns a coroutine, which you `await` directly in a cell.
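A plain Python script has no running loop, so you have to drive the coroutine yourself, for example with `asyncio.run`. So that it runs without API keys, the sketch below replaces `CogArch` with a hypothetical stand-in, `fake_arch`. Only the calling pattern matters here:

```python
import asyncio


# Hypothetical stand-in for CogArch.__call__: any coroutine function behaves the same way.
async def fake_arch(tag, **inputs):
    await asyncio.sleep(0)             # yield to the event loop, as real LM calls would
    return {"tag": tag, **inputs}, {}  # (output, trace) pair, mirroring the real return shape


# In a Jupyter cell the loop is already running, so you would write:
#     res = await fake_arch('fortune', question="...")
# In a script there is no running loop, so start one explicitly
# (asyncio.run raises an error if called from inside a running loop):
if __name__ == "__main__":
    res = asyncio.run(fake_arch('fortune', question="Is Eureka, CA a good place?"))
    print(res[0])
```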

In [3]:
res = await arch('fortune', question="Is Eureka, CA a good place for a computer scientist who love nature?")



 === demo[0] === 

You are a helpful AI assistant.
You have been asked a question and will write a balanced answer.
You will analyse the user's question to write this balanced answer.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): question from the user
> meaning[10](thought): think about what the user might want hear
> intent(sentence): State how you will make your answer balanced to the user
> idea[20](thought): Consider balanced ideas to answer the question
> answer[5](sentence): Your balanced answer can be a few sentences (one per line)
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: your thoughts (a few words per lines)
- sentence: a single, grammatically correct, sentence in natural language
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): Is Eureka, CA a good place for a computer scientist

## Outputs

Executing any `Cog` returns a pair: the actual output and some implementation-dependent information.
Currently, STAs return their internal stack (the full execution trace of the program).
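The first element maps each output field to a list of strings, one per generated line, and the LM often leaves stray whitespace in them. The helper below is a small sketch for turning one field into a paragraph. `as_paragraph` is hypothetical and not part of AutoCog, and the sample dictionary is abridged from the output printed in the next cell:

```python
# Sample shaped like res[0], abridged from the printed output below.
output = {
    "answer": [
        " Eureka, CA could be a great fit for a computer scientist who loves nature.  ",
        " Overall, Eureka, CA could offer a great balance for a computer scientist who loves nature.  ",
    ]
}


def as_paragraph(output, field="answer"):
    """Join the per-line strings of one field into a single paragraph, dropping blank lines."""
    return " ".join(line.strip() for line in output.get(field, []) if line.strip())


print(as_paragraph(output))
```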

In [4]:
print(json.dumps(res[0], indent=4))
print("======================================")
print(json.dumps(res[1]['fortune'][0][0].content, indent=4))

{
    "answer": [
        " Eureka, CA could be a great fit for a computer scientist who loves nature.  ",
        " It is a moderate sized city with great access to nature, and is close enough to Silicon Valley to have job opportunities nearby.  ",
        " Additionally, the area offers the chance to network with tech entrepreneurs or like minded professionals in the Humboldt State.  ",
        " Overall, Eureka, CA could offer a great balance for a computer scientist who loves nature.  ",
        " It could offer the perfect mix of job opportunities, networking possibilities, and access to nature."
    ]
}
{
    "question": [
        "Is Eureka, CA a good place for a computer scientist who love nature?"
    ],
    "meaning": [
        " The user is asking about the location's suitability for a computer scientist who",
        " loves nature. They are likely seeking a balance between having accessibility to their field",
        " and getting to experience nature.   "
    ],
    "int

# Visualization of the Architecture using GraphViz

You need to install both the system package (with `apt` or `yum`) and the Python package (with `pip`):
```
apt install graphviz
pip install graphviz
```

**FIXME** Channel edges are missing.

In [5]:
from autocog.utility.pynb import wrap_graphviz
wrap_graphviz(arch.toGraphViz())

# Search with SerpAPI

This [program](./library/simple-search.sta) demonstrates calling another `Cog` from within an `STA`.
We use a [wrapper](./autocog/tools/serpapi.py) for [SerpApi](https://serpapi.com/).
It requires `SERPAPI_API_KEY` to be set in your environment (or you can paste the key directly below).
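For reference, a query like the one the program produces in the run below could be sent to SerpApi's search endpoint as a plain HTTP GET. This sketch only builds the URL with the standard library and does not send anything. It assumes the `search.json` endpoint with the `engine`, `q` and `api_key` parameters described in SerpApi's documentation, and it does not claim to reproduce what AutoCog's wrapper actually sends:

```python
from urllib.parse import parse_qs, urlencode, urlparse

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"  # assumed endpoint


def serpapi_url(query, api_key, engine="google"):
    """Build a SerpApi search URL; the request itself is not sent here."""
    return SERPAPI_ENDPOINT + "?" + urlencode({"engine": engine, "q": query, "api_key": api_key})


# Avoid printing URLs that contain a real key.
url = serpapi_url('"Eureka, CA" nature computer scientist', api_key="YOUR_KEY")
print(url)
```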

In [6]:
from autocog.tools.serpapi import SerpAPI

arch = CogArch(pipe=PromptTee(prefix='searcher', tee=sys.stdout))
arch.load(tag='searcher', filepath='./library/simple-search.sta', num_though_ask=3, num_though_choose=3, num_item=10, num_content=10)
arch.register(SerpAPI(tag='search', apikey=os.environ["SERPAPI_API_KEY"]))

arch.orchestrator.LMs.update({
  'text'     : OpenAI(max_tokens=30, temperature=0.4),
  'thought'  : OpenAI(max_tokens=15, temperature=1.0),
  'sentence' : OpenAI(max_tokens=50, temperature=0.7)
})

In [7]:
from autocog.utility.pynb import wrap_graphviz
wrap_graphviz(arch.toGraphViz())

In [8]:
res = await arch('searcher', question="Is Eureka, CA a good place for a computer scientist who love nature?")
print(json.dumps(res[0], indent=4))



 === searcher[0] === 

You are a helpful AI assistant.
You are conducting a search based on a user's question.
You are devising a query for the search engine.
You are using an interactive questionnaire.
Follow this structure after the start prompt:
```
> question(text): A question from the user
> thought[3](thought): Think about a good search query to answer the question
> query(text): A short query for the seach engine
```
Each prompt expects one of the following formats:
- text: ASCII text in any form
- thought: your thoughts (a few words per lines)
Terminate each prompt with a newline. Use as many statement with `thought` format as needed.

start(record):
> question(text): Is Eureka, CA a good place for a computer scientist who love nature?
> thought[1](thought):  Consider nature related words.  
> thought[2](thought):  Consider words related to computer scientists.  
> thought[3](thought):  Consider words related to Eureka.  
> query(text):  "Eureka, CA" nature computer scientist

# LLaMA

In [None]:
# Adjust the root directory to wherever you stored the converted models
model_path = lambda x: "/workspace/models/{}/ggml-model-{}.bin".format(*x)
llama_7b   = model_path(('7B','q4_0'))

# Same setup as the OpenAI example above: prompts are piped to sys.stdout
arch = CogArch(pipe=PromptTee(prefix='fortune', tee=sys.stdout))
sta_fortune = arch.load(tag='fortune', filepath='./library/fortune.sta', qualifier='pleasant', S=10, T=20, N=5)
arch.orchestrator.LMs.update({
  'text'     : Llama(model_path=llama_7b, n_ctx=2048, defaults={'max_tokens':20}),
  'thought'  : Llama(model_path=llama_7b, n_ctx=2048, defaults={'max_tokens':15}),
  'sentence' : Llama(model_path=llama_7b, n_ctx=2048, defaults={'max_tokens':50}),
})
res = await arch('fortune', question="Is Eureka, CA a good place for a computer scientist who love nature?")
print(json.dumps(res[0], indent=4))

# HuggingFace

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'gpt2-medium'
device     = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer  = AutoTokenizer.from_pretrained(model_name)
model      = AutoModelForCausalLM.from_pretrained(model_name).to(device)

from autocog.lm import TfLM

# Only the 'text' format is defined here; add entries for the other formats
# before attaching them with arch.orchestrator.LMs.update(LMs)
LMs = { 'text' : TfLM(tokenizer=tokenizer, model=model, device=device, sampling={ 'top_k' : 100, 'top_p' : 0.50, 'max_new_tokens' : 20 }) }
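The `sampling` dictionary holds generation settings. `top_k`, `top_p` and `max_new_tokens` are standard arguments of `generate()` in transformers. To make `top_k=100, top_p=0.50` concrete, here is a pure-Python sketch over a toy next-token distribution. transformers applies the same two filters to logit tensors, top-k first and then top-p, but this is an illustration and not its implementation:

```python
def filter_top_k_top_p(probs, top_k, top_p):
    """Toy version of top-k followed by top-p (nucleus) filtering.

    probs: dict mapping token -> probability.
    Keeps the top_k most likely tokens (renormalized), then the smallest
    prefix of them whose cumulative probability reaches top_p, and
    renormalizes what is left. At least one token is always kept.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    z = sum(p for _, p in ranked)
    ranked = [(token, p / z) for token, p in ranked]

    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# With top_p=0.5, only the most likely token survives in this toy example.
print(filter_top_k_top_p({"a": 0.6, "b": 0.3, "c": 0.1}, top_k=100, top_p=0.5))
```

A lower `top_p` therefore narrows sampling to the few most likely continuations, while `top_k` is a hard cap applied first.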

# Writer

This program pushes the current data-flow implementation past its limits.
We are using it to collect patterns to add to the [unit tests](./tests/unittests).

**Current issue**: the notepad is lost when the loop ends (this only seems to happen if the loop iterates).

**TODO**: unit tests for the different features used:
 - loops
 - append
 - ...

In [None]:
import os, sys, json
from autocog import CogArch
from autocog.lm import OpenAI, TfLM, Llama
from autocog.architecture.utility import PromptTee # used to display/capture the prompts (as a stream of decoded tokens)

from autocog.tools.serpapi import SerpAPI

arch = CogArch(pipe=PromptTee(prefix='writer', tee=sys.stdout))
#arch = CogArch()
arch.load(tag='writer', filepath='./library/writer.sta')

# model_path__ = lambda x: "/workspace/models/{}/ggml-model-{}.bin".format(*x)
# model_path = model_path__(('30B','f16'))
# arch.orchestrator.LMs.update({
#   'text'     : Llama(model_path=model_path, n_ctx=2048, defaults={'max_tokens':20}),
#   'thought'  : Llama(model_path=model_path, n_ctx=2048, defaults={'max_tokens':15}),
#   'sentence' : Llama(model_path=model_path, n_ctx=2048, defaults={'max_tokens':30}),
#   'paragraph' : Llama(model_path=model_path, n_ctx=2048, defaults={'max_tokens':100})
# })
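# Note: text-curie-001 has since been retired by OpenAI; substitute a model your account can access
arch.orchestrator.LMs.update({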
  'text'     : OpenAI(model='text-curie-001', max_tokens=30, temperature=0.4),
  'thought'  : OpenAI(model='text-curie-001', max_tokens=15, temperature=1.0),
  'sentence' : OpenAI(model='text-curie-001', max_tokens=50, temperature=0.5),
  'paragraph': OpenAI(model='text-curie-001', max_tokens=100, temperature=0.5)
})
# arch.orchestrator.LMs.update({
#   'text'     : OpenAI(max_tokens=30, temperature=0.4),
#   'thought'  : OpenAI(max_tokens=15, temperature=1.0),
#   'sentence' : OpenAI(max_tokens=30, temperature=0.5),
#   'paragraph': OpenAI(max_tokens=100, temperature=0.5)
# })

# from autocog.utility.pynb import wrap_graphviz
# wrap_graphviz(arch.toGraphViz())

res = await arch('writer', idea="A fun story about Cookie the dog running in the Redwood forest")
print(json.dumps(res[0], indent=4))

In [None]:
###### 

In [None]:
res[1].keys()

In [None]:
print(res[1]['write'][0][0].prompt)

In [None]:
# TODO exec-trace: graph of STs unrolled from the stack
import json
print(json.dumps(res[1]['write'][0][0].content, indent=4))

In [None]:
from autocog.utility.pynb import wrap_graphviz
wrap_graphviz(arch.toGraphViz())