# completions

> Otherwise known as *chat completions*. See the [`litellm` documentation](https://docs.litellm.ai/docs/completion).

In [None]:
#|default_exp llm.completions

In [None]:
#|hide
import nblite; from nblite import show_doc; nblite.nbl_export()

In [None]:
#|export
try:
    import litellm
    import inspect
    from inspect import Parameter
    import functools
    from typing import List, Dict
    from adulib.llm._utils import _llm_func_factory, _llm_async_func_factory
    from adulib.llm.tokens import token_counter
except ImportError as e:
    raise ImportError("Install adulib[llm] to use this API.") from e

In [None]:
#|hide
from pydantic import BaseModel
from adulib.caching import set_default_cache_path
from adulib.asynchronous import batch_executor
from adulib.utils import as_dict
import adulib.llm.completions as this_module

In [None]:
#|hide
from adulib.llm import set_call_log_save_path
repo_path = nblite.config.get_project_root_and_config()[0]
set_default_cache_path(repo_path / '.tmp_cache')
set_call_log_save_path(repo_path / '.call_logs.jsonl') 

The two required arguments for completions are `model` and `messages`. The optional arguments are listed in the [`litellm` input documentation](https://docs.litellm.ai/docs/completion/input).

#### Properties of `messages`

Each message in the `messages` array can include the following fields:

- `role: str` (**required**) - The role of the message's author. Roles can be: system, user, assistant, function or tool.
- `content: Union[str,List[dict],None]` (**required**) - The contents of the message. It is required for all messages, but may be null for assistant messages with function calls.
- `name: str` - The name of the author of the message. It is required if the role is "function", in which case it should match the name of the function represented in the content. It may contain letters (a-z, A-Z), digits (0-9), and underscores, with a maximum length of 64 characters.
- `function_call: object` - The name and arguments of a function that should be called, as generated by the model.
- `tool_call_id: str` - The ID of the tool call that this message is responding to.


#### Explanation of roles

- **system**: Sets assistant context. Example: `{ "role": "system", "content": "You are a helpful assistant." }`
- **user**: End user input. Example:  `{ "role": "user", "content": "What's the weather like today?" }`
- **assistant**: AI response. Example: `{ "role": "assistant", "content": "The weather is sunny and warm." }`
- **function**: Function call/result (`name` required). Example:  `{ "role": "function", "name": "get_weather", "content": "{\"location\": \"San Francisco\"}" }`
- **tool**: Tool/plugin interaction (`tool_call_id` required). Example: `{ "role": "tool", "tool_call_id": "abc123", "content": "Tool response here" }`
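
The field and role rules above can be checked before a request is sent. The following is a minimal sketch that assumes only the rules listed in this section. `check_messages` is a hypothetical helper, not part of `adulib` or `litellm`, and provider APIs may apply additional rules.

```python
import re
from typing import Any, Dict, List

# Hypothetical helper for illustration; not part of adulib or litellm.
VALID_ROLES = {"system", "user", "assistant", "function", "tool"}
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")

def check_messages(messages: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unknown or missing role {role!r}")
            continue
        # `content` must be present, but may be None for assistant messages.
        if "content" not in msg:
            problems.append(f"message {i}: missing `content`")
        elif msg["content"] is None and role != "assistant":
            problems.append(f"message {i}: `content` may only be None for assistant messages")
        if role == "function" and "name" not in msg:
            problems.append(f"message {i}: `name` is required for the function role")
        if "name" in msg and not NAME_PATTERN.fullmatch(str(msg["name"])):
            problems.append(f"message {i}: `name` must be 1-64 characters from a-z, A-Z, 0-9 and _")
        if role == "tool" and "tool_call_id" not in msg:
            problems.append(f"message {i}: `tool_call_id` is required for the tool role")
    return problems

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like today?"},
    {"role": "tool", "content": "Tool response here"},  # missing `tool_call_id`
]
problems = check_messages(messages)
print(problems)
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a long conversation at once.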

#### Simplified completions: `single`

Use `single` (async: `async_single`) to perform a simplified single-turn completion.

In [None]:
#|echo: false
show_doc(this_module.completion)

## completion

```python
completion(
   model: str,
   messages: typing.List[typing.Dict[str, str]],
   *args,
   cache_enabled: bool,
   cache_path: typing.Union[str, pathlib.Path, NoneType],
   cache_key_prefix: typing.Optional[str],
   include_model_in_cache_key: bool,
   return_cache_key: bool,
   return_info: bool,
   enable_retries: bool,
   retry_on_exceptions: typing.Optional[list[Exception]],
   retry_on_all_exceptions: bool,
   max_retries: typing.Optional[int],
   retry_delay: typing.Optional[int],
   **kwargs
)
```

This function is a wrapper around a corresponding function in the `litellm` library, see [this](https://docs.litellm.ai/docs/completion/input) for a full list of the available arguments.

---


In [None]:
#|export
completion = _llm_func_factory(
    func=litellm.completion,
    func_name="completion",
    func_cache_name="completion",
    module_name=__name__,
    cache_key_content_args=['messages', 'response_format'],
    retrieve_log_data=lambda model, func_kwargs, response, cache_args: {
        "method": "completion",
        "input_tokens": token_counter(model=model, messages=func_kwargs['messages'], **cache_args),
        "output_tokens": sum([token_counter(model=model, messages=[{'role': c.message.role, 'content': c.message.content}], **cache_args) for c in response.choices]),
        "cost": response._hidden_params['response_cost'],
    }
)

completion.__doc__ = """
This function is a wrapper around a corresponding function in the `litellm` library, see [this](https://docs.litellm.ai/docs/completion/input) for a full list of the available arguments.
""".strip()

sig = inspect.signature(completion)
sig = sig.replace(parameters=[
    Parameter("model", Parameter.POSITIONAL_OR_KEYWORD, annotation=str),
    Parameter("messages", Parameter.POSITIONAL_OR_KEYWORD, annotation=List[Dict[str, str]]),
    *sig.parameters.values()
])
completion.__signature__ = sig
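
The signature rewrite in the cell above only changes what introspection tools such as `show_doc` report. The sketch below illustrates the same technique on a hypothetical stand-in function, `wrapped`, which is an assumption made for this example and not the function produced by `_llm_func_factory`.

```python
import inspect
from inspect import Parameter

# Hypothetical stand-in for a factory-built wrapper that only exposes *args/**kwargs.
def wrapped(*args, cache_enabled: bool = True, **kwargs):
    return args, kwargs

base_sig = inspect.signature(wrapped)
wrapped.__signature__ = base_sig.replace(parameters=[
    Parameter("model", Parameter.POSITIONAL_OR_KEYWORD, annotation=str),
    Parameter("messages", Parameter.POSITIONAL_OR_KEYWORD, annotation=list),
    *base_sig.parameters.values(),
])

# Introspection now reports the documented parameters first ...
param_names = list(inspect.signature(wrapped).parameters)
print(param_names)

# ... but call behaviour is unchanged: positional values still land in *args.
call_args, call_kwargs = wrapped("gpt-4o-mini", [])
print(call_args, call_kwargs)
```

Because `__signature__` is metadata only, the wrapped function itself still has to accept and forward `model` and `messages` correctly.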

In [None]:
response, cache_hit, call_log = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
response.choices[0].message.content

'The capital of France is Paris.'

In [None]:
print(f"Cache hit: {cache_hit}")
print(f"Input tokens: {call_log['input_tokens']}")
print(f"Output tokens: {call_log['output_tokens']}")
print(f"Cost: {call_log['cost']}")

Cache hit: True
Input tokens: 24
Output tokens: 14
Cost: 7.8e-06


In [None]:
response = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    return_info=False
)
response.choices[0].message.content

'The capital of France is Paris.'

In [None]:
class Recipe(BaseModel):
    name: str
    ingredients: List[str]
    steps: List[str]

response, cache_hit, call_log = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful cooking assistant."},
        {"role": "user", "content": "Give me a simple recipe for pancakes."}
    ],
    response_format=Recipe
)

Recipe.model_validate_json(response.choices[0].message.content).model_dump()

{'name': 'Simple Pancakes',
 'ingredients': ['1 cup all-purpose flour',
  '2 tablespoons sugar',
  '2 teaspoons baking powder',
  '1/2 teaspoon salt',
  '1 cup milk',
  '1 egg',
  '2 tablespoons melted butter',
  '1 teaspoon vanilla extract'],
 'steps': ['In a large bowl, whisk together the flour, sugar, baking powder, and salt.',
  'In a separate bowl, mix the milk, egg, melted butter, and vanilla extract until well combined.',
  "Pour the wet ingredients into the dry ingredients and stir until just combined. Do not overmix; it's okay if there are a few lumps.",
  'Heat a non-stick skillet or griddle over medium heat and grease lightly with butter or oil.',
  'Pour 1/4 cup of batter onto the skillet for each pancake. Cook until bubbles form on the surface, about 2-3 minutes.',
  'Flip the pancakes and cook for another 2-3 minutes, until golden brown.',
  'Remove from skillet and keep warm while cooking the remaining pancakes.']}

You can save costs during testing using [mock responses](https://docs.litellm.ai/docs/completion/mock_requests):

In [None]:
response, cache_hit, call_log = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Sweden?"}
    ],
    mock_response = "Stockholm"
)
response.choices[0].message.content

'Stockholm'

In [None]:
#|echo: false
show_doc(this_module.async_completion)

## async_completion *(async)*

```python
async_completion(
   model: str,
   messages: typing.List[typing.Dict[str, str]],
   *args,
   cache_enabled: bool,
   cache_path: typing.Union[str, pathlib.Path, NoneType],
   cache_key_prefix: typing.Optional[str],
   include_model_in_cache_key: bool,
   return_cache_key: bool,
   return_info: bool,
   enable_retries: bool,
   retry_on_exceptions: typing.Optional[list[Exception]],
   retry_on_all_exceptions: bool,
   max_retries: typing.Optional[int],
   retry_delay: typing.Optional[int],
   timeout: typing.Optional[int],
   **kwargs
)
```

---


In [None]:
#|export
async_completion = _llm_async_func_factory(
    func=litellm.acompletion,
    func_name="async_completion",
    func_cache_name="completion",
    module_name=__name__,
    cache_key_content_args=['messages', 'response_format'],
    retrieve_log_data=lambda model, func_kwargs, response, cache_args: {
        "method": "completion",
        "input_tokens": token_counter(model=model, messages=func_kwargs['messages'], **cache_args),
        "output_tokens": sum([token_counter(model=model, messages=[{'role': c.message.role, 'content': c.message.content}], **cache_args) for c in response.choices]),
        "cost": response._hidden_params['response_cost'],
    }
)

async_completion.__doc__ = """
This function is a wrapper around a corresponding function in the `litellm` library, see [this](https://docs.litellm.ai/docs/completion/input) for a full list of the available arguments.
""".strip()

sig = inspect.signature(async_completion)
sig = sig.replace(parameters=[
    Parameter("model", Parameter.POSITIONAL_OR_KEYWORD, annotation=str),
    Parameter("messages", Parameter.POSITIONAL_OR_KEYWORD, annotation=List[Dict[str, str]]),
    *sig.parameters.values()
])
async_completion.__signature__ = sig

In [None]:
response, cache_hit, call_log = await async_completion(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
)
response.choices[0].message.content

'The capital of France is Paris.'

In [None]:
#|hide
show_doc(this_module.single)

## single

```python
single(
   prompt: str,
   model: str | None,
   system: str | None,
   *args,
   multi: typing.Union[bool, typing.Dict, NoneType],
   return_full_response: bool,
   return_info: bool,
   **kwargs
)
```

Simplified chat completions designed for single-turn tasks like classification, summarization, or extraction. For a full list of the available arguments see the [documentation](https://docs.litellm.ai/docs/completion/input) for the `completion` function in `litellm`.

If 'return_info' is set to True, the function returns a tuple of the response, cache hit status, and call log. If set to False, it returns only the response.
If 'multi' is provided, it should be a dictionary containing the model and messages from previous turns, allowing for multi-turn interactions. The function will append the new prompt to the existing messages.

---


In [None]:
#|exporti
def _get_msgs(orig_msgs, response):
    if len(response.choices) == 0: return orig_msgs
    msgs = orig_msgs.copy()
    # msgs.append(response.choices[0].message.model_dump())
    msgs.append({ 'role': response.choices[0].message.role, 'content': response.choices[0].message.content })
    return msgs

In [None]:
#|export
def single(
    prompt: str,
    model: str|None = None,
    system: str|None = None,
    *args,
    multi: bool|Dict|None = None,
    return_full_response: bool=False,
    return_info: bool=True,
    **kwargs,
):
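    # The default system prompt applies whenever a new conversation is started.
    if system is None and not isinstance(multi, dict): system = "You are a helpful assistant."
    if system is not None and isinstance(multi, dict): raise ValueError("Cannot provide `system` if already in multi-turn completion mode.")
    # `multi` may be `True` (start of a conversation), so only index it when it is a context dict.
    if isinstance(multi, dict): model = model or multi['model']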
    
    if type(multi) == dict:
        messages = multi['messages'].copy() + [ {"role": "user", "content": prompt} ]
    else:
        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt}
        ]
    
    if model is None: raise ValueError("`model` must be provided.")
    
    response, cache_hit, call_log = completion(model, messages, *args, **kwargs)
    
    res = response.choices[0].message.content if not return_full_response else response
    if multi is not None:
        res = res, {'model' : model, 'messages' : _get_msgs(messages, response)}
    
    return (res, cache_hit, call_log) if return_info else res
    
single.__name__ = "single"
single.__doc__ = """
Simplified chat completions designed for single-turn tasks like classification, summarization, or extraction. For a full list of the available arguments see the [documentation](https://docs.litellm.ai/docs/completion/input) for the `completion` function in `litellm`.

If 'return_info' is set to True, the function returns a tuple of the response, cache hit status, and call log. If set to False, it returns only the response.
If 'multi' is provided, it should be a dictionary containing the model and messages from previous turns, allowing for multi-turn interactions. The function will append the new prompt to the existing messages.
""".strip()

In [None]:
response, cache_hit, call_log = single(
    model='gpt-4o-mini',
    system='You are a helpful assistant.',
    prompt='What is the capital of France?',
)
response

'The capital of France is Paris.'

In [None]:
class Recipe(BaseModel):
    name: str
    ingredients: List[str]
    steps: List[str]

response, cache_hit, call_log = single(
    model="gpt-4o-mini",
    system="You are a helpful cooking assistant.",
    prompt="Give me a simple recipe for pancakes.",
    response_format=Recipe
)

Recipe.model_validate_json(response)

Recipe(name='Simple Pancakes', ingredients=['1 cup all-purpose flour', '2 tablespoons sugar', '2 teaspoons baking powder', '1/2 teaspoon salt', '1 cup milk', '1 egg', '2 tablespoons melted butter', '1 teaspoon vanilla extract'], steps=['In a large bowl, whisk together the flour, sugar, baking powder, and salt.', 'In a separate bowl, mix the milk, egg, melted butter, and vanilla extract until well combined.', "Pour the wet ingredients into the dry ingredients and stir until just combined. Do not overmix; it's okay if there are a few lumps.", 'Heat a non-stick skillet or griddle over medium heat and grease lightly with butter or oil.', 'Pour 1/4 cup of batter onto the skillet for each pancake. Cook until bubbles form on the surface, about 2-3 minutes.', 'Flip the pancakes and cook for another 2-3 minutes, until golden brown.', 'Remove from skillet and keep warm while cooking the remaining pancakes.'])

You can run multi-turn completions by passing `multi=True` on the first call and then passing the returned context dictionary as the `multi` argument on subsequent calls:

In [None]:
(res, _ctx), cache_hit, call_log = single(
    model='gpt-4o-mini',
    system='You are a helpful assistant.',
    prompt='Add 1 and 1',
    multi=True
)
print(res)

(res, _ctx), cache_hit, call_log = single(
    prompt='Multiply that by 10',
    multi=_ctx,
)
print(res)

1 plus 1 equals 2.
2 multiplied by 10 equals 20.
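
In multi-turn mode, the context returned alongside the response is a plain dictionary with `model` and `messages` keys. The sketch below shows how that message history grows from turn to turn. It uses a hypothetical `fake_reply` function in place of a real model call, and `next_turn` is an illustrative helper rather than part of this module.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a model call, so the sketch runs without an API key.
def fake_reply(messages: List[Dict[str, str]]) -> str:
    return f"(reply to: {messages[-1]['content']})"

def next_turn(prompt: str, ctx: Dict) -> Tuple[str, Dict]:
    """Append the user prompt, get a reply, and return the reply plus the updated context."""
    messages = ctx["messages"].copy() + [{"role": "user", "content": prompt}]
    reply = fake_reply(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply, {"model": ctx["model"], "messages": messages}

ctx = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "system", "content": "You are a helpful assistant."}],
}
_, ctx = next_turn("Add 1 and 1", ctx)
_, ctx = next_turn("Multiply that by 10", ctx)

roles = [m["role"] for m in ctx["messages"]]
print(roles)  # ['system', 'user', 'assistant', 'user', 'assistant']
```

Since each turn copies the message list instead of mutating it, an earlier context can be reused to branch a conversation.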


In [None]:
#|hide
show_doc(this_module.async_single)

## async_single *(async)*

```python
async_single(
   prompt: str,
   model: str | None,
   system: str | None,
   *args,
   multi: typing.Union[bool, typing.Dict, NoneType],
   return_full_response: bool,
   return_info: bool,
   **kwargs
)
```

Simplified chat completions designed for single-turn tasks like classification, summarization, or extraction. For a full list of the available arguments see the [documentation](https://docs.litellm.ai/docs/completion/input) for the `completion` function in `litellm`.

If 'return_info' is set to True, the function returns a tuple of the response, cache hit status, and call log. If set to False, it returns only the response.
If 'multi' is provided, it should be a dictionary containing the model and messages from previous turns, allowing for multi-turn interactions. The function will append the new prompt to the existing messages.

---


In [None]:
#|export
async def async_single(
    prompt: str,
    model: str|None = None,
    system: str|None = None,
    *args,
    multi: bool|Dict|None = None,
    return_full_response: bool=False,
    return_info: bool=True,
    **kwargs,
):
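    # The default system prompt applies whenever a new conversation is started.
    if system is None and not isinstance(multi, dict): system = "You are a helpful assistant."
    if system is not None and isinstance(multi, dict): raise ValueError("Cannot provide `system` if already in multi-turn completion mode.")
    # `multi` may be `True` (start of a conversation), so only index it when it is a context dict.
    if isinstance(multi, dict): model = model or multi['model']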
    
    if type(multi) == dict:
        messages = multi['messages'].copy() + [ {"role": "user", "content": prompt} ]
    else:
        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt}
        ]
    
    if model is None: raise ValueError("`model` must be provided.")
    
    response, cache_hit, call_log = await async_completion(model, messages, *args, **kwargs)
    
    res = response.choices[0].message.content if not return_full_response else response
    if multi is not None:
        res = res, {'model' : model, 'messages' : _get_msgs(messages, response)}
    
    return (res, cache_hit, call_log) if return_info else res
    
async_single.__name__ = "async_single"
async_single.__doc__ = """
Simplified chat completions designed for single-turn tasks like classification, summarization, or extraction. For a full list of the available arguments see the [documentation](https://docs.litellm.ai/docs/completion/input) for the `completion` function in `litellm`.

If 'return_info' is set to True, the function returns a tuple of the response, cache hit status, and call log. If set to False, it returns only the response.
If 'multi' is provided, it should be a dictionary containing the model and messages from previous turns, allowing for multi-turn interactions. The function will append the new prompt to the existing messages.
""".strip()

In [None]:
response, cache_hit, call_log = await async_single(
    model='gpt-4o-mini',
    system='You are a helpful assistant.',
    prompt='What is the capital of France?',
)
response

'The capital of France is Paris.'

You can execute a batch of `async_single` calls using `adulib.asynchronous.batch_executor`:

In [None]:
results = await batch_executor(
    func=async_single,
    constant_kwargs=as_dict(model='gpt-4o-mini', system='You are a helpful assistant.'),
    batch_kwargs=[
        { 'prompt': 'What is the capital of France?' },
        { 'prompt': 'What is the capital of Germany?' },
        { 'prompt': 'What is the capital of Italy?' },
        { 'prompt': 'What is the capital of Spain?' },
        { 'prompt': 'What is the capital of Portugal?' },
    ],
    concurrency_limit=2,
    verbose=False,
)

print("\n".join([response for response, _, _ in results]))

The capital of France is Paris.
The capital of Germany is Berlin.
The capital of Italy is Rome.
The capital of Spain is Madrid.
The capital of Portugal is Lisbon.
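
To illustrate what a concurrency limit means for a batch like the one above, here is a minimal sketch built on `asyncio.Semaphore`. It is not the implementation of `batch_executor`. `run_batch` and `fake_single` are hypothetical names introduced for this example, and `fake_single` stands in for `async_single` so the code runs offline.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Hypothetical helper for illustration; this is not how `batch_executor` is implemented.
async def run_batch(
    func: Callable[..., Awaitable[Any]],
    constant_kwargs: Dict[str, Any],
    batch_kwargs: List[Dict[str, Any]],
    concurrency_limit: int,
) -> List[Any]:
    semaphore = asyncio.Semaphore(concurrency_limit)

    async def run_one(kwargs: Dict[str, Any]) -> Any:
        async with semaphore:  # at most `concurrency_limit` calls run at once
            return await func(**constant_kwargs, **kwargs)

    # `gather` returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(kw) for kw in batch_kwargs))

# Hypothetical stand-in for `async_single`.
async def fake_single(prompt: str, model: str) -> str:
    await asyncio.sleep(0.01)
    return f"[{model}] {prompt}"

batch_results = asyncio.run(run_batch(
    func=fake_single,
    constant_kwargs={"model": "gpt-4o-mini"},
    batch_kwargs=[{"prompt": "France?"}, {"prompt": "Germany?"}, {"prompt": "Italy?"}],
    concurrency_limit=2,
))
print(batch_results)
```

In a notebook, where an event loop is already running, use `await run_batch(...)` instead of `asyncio.run(...)`.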
