In [13]:
# Load from parent directory if not installed
import importlib.util  # importing the submodule explicitly makes importlib.util available
import os

if not importlib.util.find_spec("sammo"):
    import sys

    sys.path.append("../")
os.environ["CACHE_FILE"] = "cache/quickstart.tsv"

# 🚀 Quick Start

To illustrate some of the core concepts, let's use SAMMO to generate content for a travel website.

To run this example, you need API credentials for an OpenAI-API-compatible model.

Below, we use GPT-3.5 Turbo (`gpt-3.5-turbo-16k`) and cache all requests, so re-running the notebook does not repeat identical API calls.

In [14]:
# %load -r :27 _init.py
import pathlib
import sammo
from sammo.runners import OpenAIChat
from sammo.base import Template, EvaluationScore
from sammo.components import Output, GenerateText, ForEach, Union
from sammo.extractors import ExtractRegex
from sammo.data import DataTable
import json
import requests
import os

# Look for credentials in ../config/personal.openai
API_CONFIG_FILE = pathlib.Path.cwd().parent / "config" / "personal.openai"
API_CONFIG = ""
if API_CONFIG_FILE.exists():
    API_CONFIG = API_CONFIG_FILE
if not API_CONFIG:
    raise ValueError('Please set API_CONFIG to {"api_key": "YOUR_KEY"}')

_ = sammo.setup_logger("WARNING")  # we're only interested in warnings for now

runner = OpenAIChat(
    model_id="gpt-3.5-turbo-16k",
    api_config=API_CONFIG,
    cache=os.getenv("CACHE_FILE", "cache.tsv"),
    timeout=30,
)
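Passing a file path as `cache=` tells the runner to store responses in that file, so repeated identical requests can be answered without calling the API again. As a rough illustration of the idea, here is a minimal persistent request cache in plain Python. It is a sketch under stated assumptions, not SAMMO's actual cache implementation or file format; `TsvRequestCache`, `make_key`, and `get_or_call` are hypothetical names.

```python
import csv
import json
import pathlib


class TsvRequestCache:
    """Toy persistent cache, one TSV row per request (hypothetical, not SAMMO's format)."""

    def __init__(self, path):
        self.path = pathlib.Path(path)
        self.entries = {}
        # Reload previously cached responses so they survive a kernel restart
        if self.path.exists():
            with self.path.open(newline="", encoding="utf-8") as f:
                for key, value in csv.reader(f, delimiter="\t"):
                    self.entries[key] = json.loads(value)

    @staticmethod
    def make_key(request):
        # Canonical JSON: requests with the same content map to the same key
        return json.dumps(request, sort_keys=True)

    def get_or_call(self, request, call_fn):
        key = self.make_key(request)
        if key not in self.entries:
            # Cache miss: call the (slow, billable) model once and append the result
            self.entries[key] = call_fn(request)
            with self.path.open("a", newline="", encoding="utf-8") as f:
                writer = csv.writer(f, delimiter="\t")
                writer.writerow([key, json.dumps(self.entries[key])])
        return self.entries[key]
```

Keying on canonical JSON (`sort_keys=True`) means two requests that differ only in dictionary key order hit the same cache entry.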

How about a quick "Hello World"?

In [15]:
Output(GenerateText("Hello World!")).run(runner)

+---------+------------------------------------+
| input   | output                             |
| None    | Hello! How can I assist you today? |
+---------+------------------------------------+
Constants: None

Calls via `.run()` always return a DataTable, which keeps track of both inputs and outputs. The `None` in the input column may look confusing at first; it appears because we did not pass any input data. More on this in "Working with Data".

## Specifying a metaprompt
Let's say we have a list of countries. For each country, we want the top reason to visit as well as when to visit.

In [16]:
COUNTRIES = ["Switzerland", "Morocco", "Tanzania", "Indonesia", "Peru"]

reason_to_visit = GenerateText(
    Template("What is the top reason to visit {{input}} in one sentence?")
)
when_to_visit = GenerateText(
    Template(
        "Which season is the best time to visit {{input}}? Answer in one sentence."
    )
)
country_pages = Template(
    "# {{input}}\n{{reason}}\n\n## When to Visit\n{{when}}",
    reason=reason_to_visit,
    when=when_to_visit,
)

results = Output(country_pages).run(runner, COUNTRIES)
print(results.to_string(max_col_width=100, max_cell_length=300))

minibatches[###################################################################################]5/5[00:00<??:??, 0.00it/s]
+-------------+------------------------------------------------------------------------------------------------------+
| input       | output                                                                                               |
| Switzerland | # Switzerland The top reason to visit Switzerland is to experience its breathtaking landscapes, from |
|             | majestic mountains to pristine lakes.  ## When to Visit The best time to visit Switzerland is during |
|             | the summer season (June to August) when the weather is pleasant and outdoor activities are abun...   |
+-------------+------------------------------------------------------------------------------------------------------+
| Morocco     | # Morocco The top reason to visit Morocco is to immerse yourself in its rich and diverse culture,    |
|             | blending Arab, Berber, and E

Great, we just finished our travel blog in less than five minutes! 

Under the hood, `country_pages` is a graph of nested `Components` that is evaluated from the inside out: the inner `GenerateText` components run first, and their results are filled into the outer `Template`. We call these call graphs *metaprompts* because they abstract away the input data (as opposed to *prompts*, which are concrete text strings sent to an LLM).
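To make "evaluated from the inside out" concrete, here is a toy re-implementation in plain Python. `ToyTemplate`, `ToyGenerateText`, and `fake_llm` are hypothetical stand-ins written for illustration (named so they do not overwrite the SAMMO imports); they only mimic the evaluation order and are not how SAMMO implements its components.

```python
import re


def fake_llm(prompt):
    # Stand-in for a real model call: returns a canned answer for any prompt
    return f"<answer to: {prompt}>"


class ToyTemplate:
    """Fills {{name}} placeholders from the input value and from child components."""

    def __init__(self, text, **children):
        self.text = text
        self.children = children

    def __call__(self, value):
        # Evaluate the children first (inside out), then substitute their results
        values = {"input": value}
        values.update({name: child(value) for name, child in self.children.items()})
        return re.sub(r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), self.text)


class ToyGenerateText:
    """Renders its child into a prompt string, then sends that prompt to the 'LLM'."""

    def __init__(self, child):
        self.child = child

    def __call__(self, value):
        return fake_llm(self.child(value))


reason = ToyGenerateText(ToyTemplate("Top reason to visit {{input}}?"))
page = ToyTemplate("# {{input}}\n{{reason}}", reason=reason)
print(page("Peru"))
# prints:
# # Peru
# <answer to: Top reason to visit Peru?>
```

The metaprompt (`page`) stays the same for every country; only the value passed in at call time turns it into a concrete prompt.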

We can see the metaprompt structure by simply printing it:

In [11]:
print(country_pages)

Template(
  template_text = '# {{input}}
{{reason}}

## When to Visit
{{when}}',
  name = None,
  reason = GenerateText(
    child = Template(
      template_text = 'What is the top reason to visit {{input}} in one sentence?',
      name = None
    ),
    name = None,
    system_prompt = None,
    history = None,
    seed = 0,
    randomness = 0,
    max_tokens = None,
    on_error = 'raise'
  ),
  when = GenerateText(
    child = Template(
      template_text = 'Which season is the best time to visit {{input}}? Answer in one sentence.',
      name = None
    ),
    name = None,
    system_prompt = None,
    history = None,
    seed = 0,
    randomness = 0,
    max_tokens = None,
    on_error = 'raise'
  )
)


SAMMO also knows which operations can run in parallel and schedules them accordingly. You can set call limits on the `Runner` instance (more on this in the section on minibatching).
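The idea behind parallel scheduling can be sketched with the standard library: every leaf prompt in the country pages is independent of the others, so all of them can be submitted at once while a pool size caps how many calls are in flight. This is an illustration only, not how SAMMO schedules work, and it does not use SAMMO's call-limit settings; `fake_llm`, `build_pages`, and `MAX_PARALLEL_CALLS` are hypothetical. Threads are used rather than `asyncio.run`, which fails inside an already-running notebook event loop.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_CALLS = 2  # hypothetical cap on concurrent "API calls"

_lock = threading.Lock()
stats = {"in_flight": 0, "peak": 0}


def fake_llm(prompt):
    # Track how many calls overlap so we can check that the cap is respected
    with _lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.01)  # simulate network latency
    with _lock:
        stats["in_flight"] -= 1
    return f"<answer to: {prompt}>"


def build_pages(countries):
    # Leaf prompts do not depend on each other, so all are submitted up front;
    # the pool size limits how many of them run at the same time.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL_CALLS) as pool:
        reasons = pool.map(fake_llm, [f"Top reason to visit {c}?" for c in countries])
        whens = pool.map(fake_llm, [f"Best season to visit {c}?" for c in countries])
        # map() yields results in input order, so each page lines up with its country
        return [
            f"# {c}\n{r}\n\n## When to Visit\n{w}"
            for c, r, w in zip(countries, reasons, whens)
        ]
```

For example, `build_pages(["Switzerland", "Morocco", "Peru"])` issues six independent calls but never runs more than two at once.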

## Recap
Let's talk about some of the key concepts from SAMMO we have used:

1. We constructed a **metaprompt** — a dynamic prompt that is re-used for different inputs.
2. This metaprompt has a structure built by nesting **components** from SAMMO. A helpful analogy is how neural network architectures are assembled from layers.
3. To get the **output** for a metaprompt, we wrap it in an `Output` component; calling `.run()` on it returns a DataTable of inputs and outputs.
4. SAMMO **parallelized** execution over the input data for us; no extra work was needed!