# Costly

Costly adds a `simulate` argument and a cost logger to your function. The cost logger records the actual cost of each call when `simulate=False`, and a reasonable estimate of it when `simulate=True`.

The default behaviour targets LLM API calls: it uses `LLM_Simulator_Faker.simulate_llm_call()` as the simulator, and `LLM_API_Estimation.get_cost_real()` and `LLM_API_Estimation.get_cost_simulating()` as the estimators for real and simulated calls respectively.

**Table of contents**

- [Quick start](#quick-start)
- [Customizations](#customizations)
  - [Different simulators and estimators](#different-simulators-and-estimators)
  - [Costlog customizations](#costlog-customizations)
  - [Simulator](#simulator)
  - [Estimator](#estimator)
- [Some assumptions made](#some-assumptions-made)
- [Calculating costs within a function](#calculating-costs-within-a-function)

## Quick start

Just mark the function responsible for your costs with the `@costly()` decorator, as follows.

This only works sensibly out of the box if your function makes an LLM API call, takes the arguments `input_string` and `model` (and optionally `response_model`, if the response is expected to be a complex object such as a Pydantic model), and returns a string (or a `response_model` object if you specified one).


In [1]:
from openai import OpenAI
from costly import Costlog, costly

@costly()
def chatgpt(input_string: str, model: str) -> str:
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": input_string}]
    )
    output_string = response.choices[0].message.content
    return output_string


    
cl = Costlog()
x = chatgpt(input_string="Write the Lorem ipsum text", model = "gpt-4o-mini", cost_log=cl, simulate=False, description=["chatgpt call"])
y = chatgpt(input_string="Write the Lorem ipsum text", model = "gpt-4o-mini", cost_log=cl, simulate=True, description=["chatgpt call"])
print(x)
print(y)
print(cl.totals)
print(cl.items[0])
cl.items[1]




Certainly! Here is the classic "Lorem Ipsum" text:

```
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
```

Let me know if you need more text or any variations!
Quite rather later challenge parent research important. Recent worry thought of nation past.
Knowledge become I together. Analysis series share past here quality. Cell south TV water upon fall.
Institution daughter available into collection family. Government lose notice finish less life.
Professional executive window whatever. Degree garden collection important each. Environment probably reason very rise pass. President make write beli

{'cost_min': 7.5e-07,
 'cost_max': 0.00122955,
 'time_min': 0.0,
 'time_max': 18.432,
 'input_tokens': 5,
 'output_tokens_min': 0,
 'output_tokens_max': 2048,
 'calls': 1,
 'model': 'gpt-4o-mini',
 'simulated': True,
 'input_string': 'Write the Lorem ipsum text',
 'output_string': None,
 'description': ['chatgpt call']}

Here's basically what's happening under the hood, courtesy of `costly.decorator.costly`:


In [2]:
from costly.simulators.llm_simulator_faker import LLM_Simulator_Faker
from costly.estimators.llm_api_estimation import LLM_API_Estimation

def _chatgpt(input_string: str, model: str) -> str:
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": input_string}]
    )
    output_string = response.choices[0].message.content
    return output_string

def chatgpt(input_string: str, model: str, cost_log: Costlog=None, simulate: bool=False, description: list[str]=None) -> str:
    if simulate:
        return LLM_Simulator_Faker.simulate_llm_call(
            input_string=input_string,
            model=model,
            response_model=str,
            cost_log=cost_log,
            description=description,
        )
    if cost_log is not None:
        with cost_log.new_item() as (item, timer):
            output_string = _chatgpt(input_string, model)
            cost_item = LLM_API_Estimation.get_cost_real(
                model=model,
                input_string=input_string,
                output_string=output_string,
                description=description,
                timer=timer(),
            )
            item.update(cost_item)
    else:
        output_string = _chatgpt(input_string, model)
    return output_string
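
To make the control flow above easier to experiment with, here is a minimal, self-contained sketch of the same idea without any API calls. Everything in it (`toy_costly`, `fake_reply`, `char_cost`, the plain-list log) is a hypothetical stand-in rather than the library's implementation; in particular, unlike the real simulator, the toy simulator does not write to the log itself.

```python
import functools
import time


def toy_costly(simulator, estimator):
    """Hypothetical stand-in for the decorator's wiring; not the library's code."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*, cost_log=None, simulate=False, description=None, **kwargs):
            if simulate:
                # When simulating, the wrapped function is never called.
                output = simulator(**kwargs)
                item = estimator(output=None, simulated=True, **kwargs)
                elapsed = None
            else:
                start = time.perf_counter()
                output = func(**kwargs)
                elapsed = time.perf_counter() - start
                item = estimator(output=output, simulated=False, **kwargs)
            if cost_log is not None:
                cost_log.append({**item, "description": description, "seconds": elapsed})
            return output

        return wrapper

    return decorator


def fake_reply(text):
    # Toy simulator: returns a canned value of the right type.
    return "simulated reply"


def char_cost(text, output, simulated):
    # Toy pricing: one unit per character; the output is unknown when simulating.
    output_chars = 0 if output is None else len(output)
    return {"units": len(text) + output_chars, "simulated": simulated}


@toy_costly(simulator=fake_reply, estimator=char_cost)
def shout(text):
    return text.upper()


log = []
real = shout(text="hello", cost_log=log)
fake = shout(text="hello", cost_log=log, simulate=True)
print(real, fake, log)
```

The design point it illustrates is that the caller's code path is identical in both modes: only the `simulate` flag decides whether the expensive function runs.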


## Customizations

Although the library is designed for LLM API calls, it can be extended to estimate other types of costs with some customization and subclassing.

Customizations you can do:

### Different simulators and estimators

The default decorator behaviour is

```python
@costly(
    simulator=LLM_Simulator_Faker.simulate_llm_call,
    estimator=LLM_API_Estimation.get_cost_real,
)
```

These functions can be replaced by your own custom functions. For reference, you can see how the default ones are implemented in [`costly.simulators.llm_simulator_faker`](costly/simulators/llm_simulator_faker.py) and [`costly.estimators.llm_api_estimation`](costly/estimators/llm_api_estimation.py).

Also, the simulator and the estimator both have very specific type signatures:

```python
class LLM_Simulator_Faker:

    @staticmethod
    def simulate_llm_call(
        input_string: str,
        model: str = None,
        response_model: type = str,
        cost_log: Costlog = None,
        description: list[str] = None,
    ) -> str | Any:
        ...

class LLM_API_Estimation:

    @staticmethod
    def get_cost_real(
        model: str,
        input_tokens: int = None,
        output_tokens_min: int = None,
        output_tokens_max: int = None,
        input_string: str = None,
        output_string: str = None,
        timer: float = None,
        **kwargs,
    ) -> dict[str, float]:
        ...
```

So, for example, if your function takes different argument names (say `prompt`, `model_name` and `response_type`), you can change the decorator to:

In [3]:
from costly.simulators.llm_simulator_faker import LLM_Simulator_Faker
from costly.estimators.llm_api_estimation import LLM_API_Estimation

@costly(
    simulator=lambda prompt, model_name, response_type=str, cost_log=None, description=None: LLM_Simulator_Faker.simulate_llm_call(
        input_string=prompt,
        model=model_name,
        response_model=response_type,
        cost_log=cost_log,
        description=description,
    ),
    estimator=lambda model_name, prompt, output_string, description, timer: LLM_API_Estimation.get_cost_real(
        model=model_name,
        input_string=prompt,
        output_string=output_string,
        description=description,
        timer=timer,
    ),
)
def chatgpt2(prompt: str, model_name: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model_name, messages=[{"role": "user", "content": prompt}]
    )
    output_string = response.choices[0].message.content
    return output_string


cl = Costlog(totals_keys={"cost_min", "cost_max", "time_min", "time_max", "calls", "input_tokens", "output_tokens_min", "output_tokens_max"})
chatgpt2(prompt="Hello", model_name="gpt-3.5-turbo", simulate=False, cost_log=cl)
cl.items[0]

{'cost_min': 1.4e-05,
 'cost_max': 1.4e-05,
 'time_min': 0.5816699999850243,
 'time_max': 0.5816699999850243,
 'input_tokens': 1,
 'output_tokens': 9,
 'output_tokens_min': 9,
 'output_tokens_max': 9,
 'calls': 1,
 'model': 'gpt-3.5-turbo',
 'simulated': False,
 'input_string': 'Hello',
 'output_string': 'Hello! How can I assist you today?',
 'description': None}

Fortunately, life isn't so hard, and there is a less cumbersome way of doing this in the case of remapping/computing variables:

In [2]:
from costly import Costlog, costly
from costly.simulators.llm_simulator_faker import LLM_Simulator_Faker
from costly.estimators.llm_api_estimation import LLM_API_Estimation


@costly(
    input_string=(lambda kwargs: kwargs["prompt"]),
    model=(lambda kwargs: kwargs["model_name"]),
)
def chatgpt2(prompt: str, model_name: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model_name, messages=[{"role": "user", "content": prompt}]
    )
    output_string = response.choices[0].message.content
    return output_string


cl = Costlog(
    totals_keys={
        "cost_min",
        "cost_max",
        "time_min",
        "time_max",
        "calls",
        "input_tokens",
        "output_tokens_min",
        "output_tokens_max",
    }
)
chatgpt2(prompt="Hello", model_name="gpt-3.5-turbo", simulate=False, cost_log=cl)
cl.items[0]

{'cost_min': 1.4e-05,
 'cost_max': 1.4e-05,
 'time_min': 2.4095481999684125,
 'time_max': 2.4095481999684125,
 'input_tokens': 1,
 'output_tokens': 9,
 'output_tokens_min': 9,
 'output_tokens_max': 9,
 'calls': 1,
 'model': 'gpt-3.5-turbo',
 'simulated': False,
 'input_string': 'Hello',
 'output_string': 'Hello! How can I assist you today?',
 'description': None}

Actually, for the specific case of just renaming variables, you can simply pass the argument names as strings:

In [11]:
from costly import Costlog, costly
from costly.simulators.llm_simulator_faker import LLM_Simulator_Faker
from costly.estimators.llm_api_estimation import LLM_API_Estimation


@costly(input_string="prompt", model="model_name")
def chatgpt2(prompt: str, model_name: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model_name, messages=[{"role": "user", "content": prompt}]
    )
    output_string = response.choices[0].message.content
    return output_string


cl = Costlog(
    totals_keys={
        "cost_min",
        "cost_max",
        "time_min",
        "time_max",
        "calls",
        "input_tokens",
        "output_tokens_min",
        "output_tokens_max",
    }
)
chatgpt2(prompt="Hello", model_name="gpt-3.5-turbo", simulate=True, cost_log=cl)
cl.items[0]

{'cost_min': 5e-07,
 'cost_max': 0.0030725,
 'time_min': 0.0,
 'time_max': 73.728,
 'input_tokens': 1,
 'output_tokens_min': 0,
 'output_tokens_max': 2048,
 'calls': 1,
 'model': 'gpt-3.5-turbo',
 'simulated': True,
 'input_string': 'Hello',
 'output_string': None,
 'description': None}

These shortcuts still work if you swap `simulator` and `estimator` for something else: just provide a way to calculate whatever parameters your new functions take!
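
As a rough mental model of the two shortcuts above (a string naming one of your function's arguments, or a callable that receives the call's keyword arguments), the resolution step might look like the sketch below. `resolve_params` is a hypothetical helper written for illustration, not a function exported by the library.

```python
def resolve_params(mapping, call_kwargs):
    """Hypothetical helper: compute estimator/simulator arguments for one call."""
    resolved = {}
    for target, source in mapping.items():
        if callable(source):
            # A callable computes the value from all keyword arguments.
            resolved[target] = source(call_kwargs)
        elif isinstance(source, str):
            # A string is treated as the name of an existing argument.
            resolved[target] = call_kwargs[source]
        else:
            raise TypeError(f"unsupported mapping for {target!r}: {source!r}")
    return resolved


call_kwargs = {"prompt": "  Hello  ", "model_name": "gpt-3.5-turbo"}

by_name = resolve_params(
    {"input_string": "prompt", "model": "model_name"}, call_kwargs
)
by_function = resolve_params(
    {"input_string": lambda kw: kw["prompt"].strip(), "model": "model_name"},
    call_kwargs,
)
print(by_name)
print(by_function)
```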

Some more examples:

In [6]:
from costly import Costlog, costly
from costly.simulators.llm_simulator_faker import LLM_Simulator_Faker
from costly.estimators.llm_api_estimation import LLM_API_Estimation


@costly(
    input_string=lambda kwargs: LLM_API_Estimation.messages_to_input_string(
        kwargs["messages"]
    ),
)
def chatgpt_messages(messages: list[dict[str, str]], model: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(model=model, messages=messages)
    output_string = response.choices[0].message.content
    return output_string


cl = Costlog(
    totals_keys={
        "cost_min",
        "cost_max",
        "time_min",
        "time_max",
        "calls",
        "input_tokens",
        "output_tokens_min",
        "output_tokens_max",
    }
)
chatgpt_messages(
    messages=LLM_API_Estimation._input_string_to_messages("Hey"),
    model="gpt-3.5-turbo",
    simulate=False,
    cost_log=cl,
)
cl.items[0]

{'cost_min': 1.4e-05,
 'cost_max': 1.4e-05,
 'time_min': 0.5815050000092015,
 'time_max': 0.5815050000092015,
 'input_tokens': 1,
 'output_tokens': 9,
 'output_tokens_min': 9,
 'output_tokens_max': 9,
 'calls': 1,
 'model': 'gpt-3.5-turbo',
 'simulated': False,
 'input_string': 'Hey',
 'output_string': 'Hello! How can I assist you today?',
 'description': None}

In [10]:
import instructor
from pydantic import BaseModel
from openai import OpenAI
from instructor import Instructor
from costly import costly, Costlog
from costly.estimators.llm_api_estimation import LLM_API_Estimation


@costly(
    input_string=lambda kwargs: LLM_API_Estimation.get_raw_prompt_instructor(**kwargs),
)
def chatgpt_instructor(
    messages: str | list[dict[str, str]],
    model: str,
    client: Instructor,
    response_model: type[BaseModel],
) -> BaseModel:
    if isinstance(messages, str):
        messages = [{"role": "user", "content": messages}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        response_model=response_model,
    )
    return response


class PersonInfo(BaseModel):
    name: str
    age: int


cl = Costlog()
chatgpt_instructor(
    messages="Hey",
    model="gpt-3.5-turbo",
    response_model=PersonInfo,
    client=instructor.from_openai(OpenAI()),
    simulate=True,
    cost_log=cl,
)
cl.items[0]

2024-08-30 03:56:35,564 DEBUG instructor: Patching `client.chat.completions.create` with mode=<Mode.TOOLS: 'tool_call'>
2024-08-30 03:56:35,572 DEBUG instructor: Instructor Request: mode.value='tool_call', response_model=<class '__main__.PersonInfo'>, new_kwargs={'messages': [{'content': 'Hey', 'role': 'user'}], 'model': 'gpt-3.5-turbo', 'tools': [{'type': 'function', 'function': {'name': 'PersonInfo', 'description': 'Correctly extracted `PersonInfo` with all the required parameters with correct types', 'parameters': {'properties': {'name': {'title': 'Name', 'type': 'string'}, 'age': {'title': 'Age', 'type': 'integer'}}, 'required': ['age', 'name'], 'type': 'object'}}}], 'tool_choice': {'type': 'function', 'function': {'name': 'PersonInfo'}}}
2024-08-30 03:56:35,574 DEBUG instructor: max_retries: 3


{'cost_min': 1.7e-05,
 'cost_max': 0.003089,
 'time_min': 0.0,
 'time_max': 73.728,
 'input_tokens': 34,
 'output_tokens_min': 0,
 'output_tokens_max': 2048,
 'calls': 1,
 'model': 'gpt-3.5-turbo',
 'simulated': True,
 'input_string': "HeyPersonInfoCorrectly extracted `PersonInfo` with all the required parameters with correct typesdict_values(['Name', 'string'])dict_values(['Age', 'integer'])",
 'output_string': None,
 'description': None}

### Costlog customizations

The default [`costly.Costlog`](costly/costlog.py) class has two modes: `memory` and `jsonl`. The default is `memory`, but for large projects you may want to use `jsonl`: this dumps the cost log into a `.costly` folder in your working directory.

The other thing you can customize is the `totals_keys` parameter, a set of keys to aggregate costs by. By default it is `{"cost_min", "cost_max", "time_min", "time_max", "calls"}`, i.e. it tracks the range of possible costs and running times (`min` and `max` usually differ only when simulating, because then the output length has to be estimated). Out of the box you can also track `input_tokens`, `output_tokens_min` and `output_tokens_max`; any other keys only make sense if you are using your own estimator.
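
To illustrate what aggregating by `totals_keys` means, here is a small sketch that sums the selected keys over a list of log items shaped like the ones shown earlier. `aggregate_totals` is a hypothetical function, and summing is an assumption about how the totals are built; the real `Costlog` may differ in detail.

```python
def aggregate_totals(items, totals_keys):
    """Hypothetical sketch: sum each requested key over all logged items."""
    totals = {key: 0 for key in totals_keys}
    for item in items:
        for key in totals_keys:
            # Items that lack a key simply contribute nothing to it.
            totals[key] += item.get(key, 0)
    return totals


# Two items with values copied from the real and simulated examples above.
items = [
    {"cost_min": 1.4e-05, "cost_max": 1.4e-05, "calls": 1,
     "input_tokens": 1, "model": "gpt-3.5-turbo"},
    {"cost_min": 5e-07, "cost_max": 0.0030725, "calls": 1,
     "input_tokens": 1, "model": "gpt-3.5-turbo"},
]

totals = aggregate_totals(items, {"cost_min", "cost_max", "calls", "input_tokens"})
print(totals)
```

Non-numeric fields such as `model` are left out simply because they are not listed in the key set.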

### Simulator

[`costly.simulators.llm_simulator_faker`](costly/simulators/llm_simulator_faker.py) has some examples of how to subclass it.

One obvious reason to subclass it is to have custom simulating functions for the types you are interested in. Although the default class "works" for any Pydantic `BaseModel` etc., you might want a custom function, e.g. if a value needs to be within a certain range, if its distribution is not uniform, or if you want to use examples from your data.

Also, the simulator has a very specific type signature:

```python
class LLM_Simulator_Faker:

    @staticmethod
    def simulate_llm_call(
        input_string: str,
        model: str = None,
        response_model: type = str,
        cost_log: Costlog = None,
        description: list[str] = None,
    ) -> str | Any:
        ...
```

So it would make sense to subclass it to just change this function so you don't have to do that ridiculous lambda thing above and can just use `@costly()`.
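
As an illustration of the kind of custom simulating function mentioned in this section, the sketch below generates values within a fixed range and draws names from sample data. It is deliberately independent of the library: `PersonInfo` is a dataclass standing in for the Pydantic model used earlier, `simulate_person` and `SAMPLE_NAMES` are hypothetical, and how such a function is hooked into a subclass of `LLM_Simulator_Faker` is best checked against the source file linked above.

```python
import random
from dataclasses import dataclass


@dataclass
class PersonInfo:
    # Dataclass stand-in for the Pydantic model from the earlier examples.
    name: str
    age: int


# Hypothetical examples "from your data".
SAMPLE_NAMES = ["Alice", "Bob", "Chandra"]


def simulate_person(rng: random.Random) -> PersonInfo:
    """Draw a plausible PersonInfo: a known name and an adult age."""
    return PersonInfo(
        name=rng.choice(SAMPLE_NAMES),
        age=rng.randint(18, 90),  # inclusive on both ends
    )


rng = random.Random(0)  # seeded so that runs are repeatable
people = [simulate_person(rng) for _ in range(100)]
print(people[:3])
```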

### Estimator

Again [`costly.estimators.llm_api_estimation`](costly/estimators/llm_api_estimation.py) has some examples of how to subclass it. The most obvious reason would be to add prices for other models we don't have listed (right now it's just OpenAI and Anthropic). The `PRICES` dict is like this:

```python
class LLM_API_Estimation:

    PRICES = {
        "gpt-4o": {
            "input_tokens": 5.0e-6,
            "output_tokens": 15.0e-6,
            "time": 18e-3,
        },
        "gpt-4o-mini": {
            "input_tokens": 0.15e-6,
            "output_tokens": 0.6e-6,
            "time": 9e-3,
        },
        ...
    }
```

Something like this would be quite natural:

```python
class My_Estimation(LLM_API_Estimation):
    PRICES = LLM_API_Estimation.PRICES | {"my_model": LLM_API_Estimation.PRICES["gpt-4o"]}
```

Note that `LLM_API_Estimation` _can_ handle names like `gpt-4o-2024-05-13`, because `LLM_API_Estimation.get_model()` picks the longest model name in `PRICES` that is a prefix of the given name.
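
The longest-prefix rule can be sketched in a few lines. `longest_prefix_model` below is a hypothetical re-implementation for illustration only (the actual `get_model()` may handle edge cases differently), and the `PRICES` dict contains just the two entries quoted above.

```python
PRICES = {
    "gpt-4o": {"input_tokens": 5.0e-6, "output_tokens": 15.0e-6, "time": 18e-3},
    "gpt-4o-mini": {"input_tokens": 0.15e-6, "output_tokens": 0.6e-6, "time": 9e-3},
}


def longest_prefix_model(model, prices):
    """Return the longest key in `prices` that is a prefix of `model`."""
    candidates = [name for name in prices if model.startswith(name)]
    if not candidates:
        raise KeyError(f"no price entry matches {model!r}")
    return max(candidates, key=len)


print(longest_prefix_model("gpt-4o-2024-05-13", PRICES))
print(longest_prefix_model("gpt-4o-mini-2024-07-18", PRICES))
```

Taking the longest match matters because `gpt-4o` is itself a prefix of `gpt-4o-mini`; a first-match rule could price the mini model at the full model's rate.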

## Some assumptions made

`LLM_Simulator_Faker`, when producing text, produces text of about `600 * 4.5` characters.

Generally we assume that 1 token is about 4.5 characters, though actual token estimation uses `tiktoken` (unless you subclass `LLM_API_Estimation` and set `tokenize=_tokenize_rough`).
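
For a feel of what the 4.5 characters-per-token assumption implies, here is a rough sketch. `rough_token_count` and `rough_char_count` are hypothetical helpers, and rounding up is an assumption of this sketch; the library's own `_tokenize_rough` may round differently.

```python
import math

CHARS_PER_TOKEN = 4.5  # the rule of thumb stated above


def rough_token_count(text, chars_per_token=CHARS_PER_TOKEN):
    """Estimate tokens from character count, rounding up (an assumption)."""
    return math.ceil(len(text) / chars_per_token)


def rough_char_count(tokens, chars_per_token=CHARS_PER_TOKEN):
    """Inverse estimate: roughly how many characters `tokens` tokens span."""
    return round(tokens * chars_per_token)


# 600 tokens correspond to the `600 * 4.5` characters of simulated text.
print(rough_char_count(600))
print(rough_token_count("x" * 2700))
```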

For cost estimation we generally assume that the number of output tokens lies in the range `[0, 2048]`, and compute the min and max accordingly. As a rule of thumb, for complex projects the true value tends to be about a third of the way through that range, and for projects that receive quite short responses it is much lower.
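
The arithmetic behind the simulated min/max values can be checked by hand. The sketch below uses the `gpt-4o-mini` entry from the `PRICES` excerpt and reproduces the numbers of the simulated item in the Quick start (`cost_min` 7.5e-07, `cost_max` 0.00122955, `time_max` 18.432). `estimate_range` is a hypothetical helper, and reading `time` as seconds per output token is an inference from those outputs rather than something the library documents here.

```python
MAX_OUTPUT_TOKENS = 2048  # upper end of the assumed output range

# Prices copied from the PRICES excerpt above.
GPT_4O_MINI = {"input_tokens": 0.15e-6, "output_tokens": 0.6e-6, "time": 9e-3}


def estimate_range(input_tokens, prices, max_output_tokens=MAX_OUTPUT_TOKENS):
    """Cost/time bounds when the output length is anywhere in [0, max]."""
    input_cost = input_tokens * prices["input_tokens"]
    return {
        "cost_min": input_cost,  # zero output tokens
        "cost_max": input_cost + max_output_tokens * prices["output_tokens"],
        "time_min": 0.0,
        "time_max": max_output_tokens * prices["time"],
    }


estimate = estimate_range(5, GPT_4O_MINI)

# Rule-of-thumb point estimate: a third of the way from min to max.
one_third = estimate["cost_min"] + (estimate["cost_max"] - estimate["cost_min"]) / 3
print(estimate, one_third)
```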


All of this can be overridden by subclassing.


## Calculating costs within a function

For more accurate estimation of _real_ (not simulated) costs, we might want to calculate costs within the function, as follows.

In [4]:
from costly import Costlog, costly, CostlyResponse


@costly()
def chatgpt(input_string: str, model: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": input_string},
        ],
    )

    return CostlyResponse(
        output=response.choices[0].message.content,
        cost_info={
            "input_tokens": response.usage.prompt_tokens,
            "output_tokens": response.usage.completion_tokens,
        },
    )


cl = Costlog()
x = chatgpt(
    input_string="Write the Lorem ipsum text",
    model="gpt-4",
    cost_log=cl,
    simulate=False,
    description=["chatgpt call"],
)
y = chatgpt(
    input_string="Write the Lorem ipsum text",
    model="gpt-4",
    cost_log=cl,
    simulate=True,
    description=["chatgpt call"],
)
print(x)
print(y)
print(cl.totals)
print(cl.items[0])
cl.items[1]

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Important listen apply scene school us morning.
Image hold go worry large. Want study enough indeed then against sense. Congress politics become lot close.
Worry lose western southern. Type purpose push though nearly.
Others north goal article. Amount entire game national inside share game.
Certain minute spring support car.
Everything while those assume yeah we much. Really wall write population bar nation it.
Our customer cold smile born price another. Sport next such. Spend administration wind apply.
Husband issue camera serve fight. Three alrea

{'cost_min': 0.00015000000000000001,
 'cost_max': 0.12303,
 'time_min': 0.0,
 'time_max': 73.728,
 'input_tokens': 5,
 'output_tokens_min': 0,
 'output_tokens_max': 2048,
 'calls': 1,
 'model': 'gpt-4',
 'simulated': True,
 'input_string': 'Write the Lorem ipsum text',
 'output_string': None,
 'description': ['chatgpt call']}

In [8]:
import instructor
from pydantic import BaseModel
from instructor import Instructor
from openai import OpenAI
from costly import Costlog, costly, CostlyResponse
from costly.estimators.llm_api_estimation import LLM_API_Estimation


@costly(
    input_string=lambda kwargs: LLM_API_Estimation.get_raw_prompt_instructor(**kwargs),
)
def chatgpt_instructor(
    messages: str | list[dict[str, str]],
    model: str,
    client: Instructor,
    response_model: type[BaseModel],
) -> str:
    if isinstance(messages, str):
        messages = [{"role": "user", "content": messages}]
    response = client.chat.completions.create_with_completion(
        model=model,
        messages=messages,
        response_model=response_model,
    )
    output_string, cost_info = response
    return CostlyResponse(
        output=output_string,
        cost_info={
            "input_tokens": cost_info.usage.prompt_tokens,
            "output_tokens": cost_info.usage.completion_tokens
        }
    )
    
class PersonInfo(BaseModel):
    name: str
    age: int


cl = Costlog()
chatgpt_instructor(
    messages="Hey",
    model="gpt-3.5-turbo",
    response_model=PersonInfo,
    client=instructor.from_openai(OpenAI()),
    simulate=True,
    cost_log=cl,
)
chatgpt_instructor(
    messages="Hey",
    model="gpt-3.5-turbo",
    response_model=PersonInfo,
    client=instructor.from_openai(OpenAI()),
    simulate=False,
    cost_log=cl,
)
print(cl.items[0])
print(cl.items[1])

{'cost_min': 1.7e-05, 'cost_max': 0.003089, 'time_min': 0.0, 'time_max': 73.728, 'input_tokens': 34, 'output_tokens_min': 0, 'output_tokens_max': 2048, 'calls': 1, 'model': 'gpt-3.5-turbo', 'simulated': True, 'input_string': "HeyPersonInfoCorrectly extracted `PersonInfo` with all the required parameters with correct typesdict_values(['Name', 'string'])dict_values(['Age', 'integer'])", 'output_string': None, 'description': None}
{'cost_min': 5.099999999999999e-05, 'cost_max': 5.099999999999999e-05, 'time_min': 0.5973313000285998, 'time_max': 0.5973313000285998, 'input_tokens': 75, 'output_tokens': 9, 'output_tokens_min': 9, 'output_tokens_max': 9, 'calls': 1, 'model': 'gpt-3.5-turbo', 'simulated': False, 'input_string': "HeyPersonInfoCorrectly extracted `PersonInfo` with all the required parameters with correct typesdict_values(['Name', 'string'])dict_values(['Age', 'integer'])", 'output_string': PersonInfo(name='Alice', age=30), 'description': None}
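
The two items above show why reporting usage from inside the function helps: the tokenizer-based estimate sees 34 input tokens, while the API reports 75. The sketch below redoes the arithmetic. The per-token prices are not taken from `PRICES`; they are inferred from the `gpt-3.5-turbo` outputs in this README (one input token costs 5e-07, and one input plus nine output tokens cost 1.4e-05), and `exact_cost` is a hypothetical helper.

```python
# Per-token prices inferred from the outputs shown in this README (assumption).
INPUT_PRICE = 0.5e-6
OUTPUT_PRICE = 1.5e-6


def exact_cost(input_tokens, output_tokens):
    """Cost of a call when both token counts are known."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE


# Lower bound of the simulated item: estimated input tokens, no output yet.
estimated_input_only = exact_cost(34, 0)

# Real item: counts reported by the API via CostlyResponse's cost_info.
reported = exact_cost(75, 9)

# What the real call would have been logged as with the estimated input count.
underestimate = exact_cost(34, 9)

print(estimated_input_only, reported, underestimate)
```

With known token counts there is nothing left to bound, which is why `cost_min` and `cost_max` coincide for the real call.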
