In [1]:
import os
os.environ['OPENAI_API_KEY'] = "EXAMPLES"

# Chat Models
##### A chat model is a language model that takes chat messages as input and returns chat messages as output.

## QuickStart
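##### Chat models use language models under the hood, but the interface they expose is slightly different.
##### Instead of a text-in, text-out API, they use an interface in which chat messages are the inputs and outputs.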

In [None]:
!pip install -qU langchain-openai

In [3]:
from langchain_openai import ChatOpenAI

chat = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [4]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]

In [5]:
# invoke output
chat.invoke(messages)

AIMessage(content="The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model learns the training data too well, to the point where it performs poorly on new, unseen data. Regularization techniques introduce a penalty term to the model's loss function, discouraging overly complex models that may fit the training data too closely. By adding this penalty, regularization helps to generalize the model and improve its performance on unseen data, leading to better overall predictive accuracy.", response_metadata={'token_usage': {'completion_tokens': 97, 'prompt_tokens': 24, 'total_tokens': 121}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'stop', 'logprobs': None})

In [6]:
# Streaming output
for chunk in chat.stream(messages):
    print(chunk.content, end="", flush=True)

Model regularization is a technique used in machine learning to prevent overfitting, which occurs when a model learns the training data too well and performs poorly on new, unseen data. Regularization adds a penalty term to the model's loss function, discouraging the model from fitting the noise in the training data and instead focusing on the more prominent patterns.

The purpose of model regularization is to improve the generalization ability of the model, ensuring that it performs well on unseen data by reducing the complexity of the model and preventing it from memorizing the training data. This helps to strike a balance between bias and variance, leading to better performance on new data and improving the model's ability to make accurate predictions.

In [7]:
# Batch output
chat.batch([messages])

[AIMessage(content="Model regularization is a technique used in machine learning to prevent overfitting. Overfitting occurs when a model learns the training data too well, capturing noise and random fluctuations that are not representative of the underlying patterns in the data. Regularization helps to prevent overfitting by adding a penalty term to the model's loss function that discourages overly complex models.\n\nThe purpose of model regularization is to find a balance between fitting the training data well and generalizing to new, unseen data. By penalizing complex models, regularization encourages the model to prioritize simpler explanations that are more likely to generalize well to new data. This can lead to better performance on unseen data and improve the model's ability to make accurate predictions.", response_metadata={'token_usage': {'completion_tokens': 142, 'prompt_tokens': 24, 'total_tokens': 166}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_3bc1b5746c

In [8]:
# Async invoke output
await chat.ainvoke(messages)

AIMessage(content="Model regularization is a technique used in machine learning to prevent overfitting. Overfitting occurs when a model learns the details and noise in the training data to the extent that it negatively impacts the model's performance on new, unseen data. Regularization helps to address this issue by adding a penalty term to the model's loss function, which discourages the model from becoming too complex and helps it generalize better to new data.\n\nThere are different types of regularization techniques, such as L1 regularization (Lasso), L2 regularization (Ridge), and elastic net regularization, each with its own way of penalizing the model's complexity. By incorporating regularization into the model training process, machine learning practitioners can improve the model's performance on unseen data and make it more robust and reliable.", response_metadata={'token_usage': {'completion_tokens': 156, 'prompt_tokens': 24, 'total_tokens': 180}, 'model_name': 'gpt-3.5-turbo

In [9]:
# Async streaming output
async for chunk in chat.astream(messages):
    print(chunk.content, end="", flush=True)

The purpose of model regularization is to prevent overfitting in machine learning models. Overfitting occurs when a model learns the training data too well, including its noise and random fluctuations, which can lead to poor performance on unseen data. Regularization techniques add a penalty term to the model's loss function, discouraging overly complex models that may be prone to overfitting. By adding this penalty, regularization helps to find a balance between fitting the training data well and generalizing to new, unseen data. Some common regularization techniques include L1 regularization (Lasso), L2 regularization (Ridge), and dropout in neural networks.

In [None]:
# Async streaming log output
async for chunk in chat.astream_log(messages):
    print(chunk)

### LangSmith

```bash
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="<your-api-key>"
```

## Function Calling

In [11]:
!pip install -qU langchain-core langchain-openai

### Binding
##### A helper method is provided that formats various function-like objects and binds them to the model.
##### Let's see how to take a Pydantic function schema and get a model to call it.

In [12]:
from langchain_core.pydantic_v1 import BaseModel, Field

class Multiply(BaseModel):
    """Multiply two integers together."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")

In [13]:
!pip install -qU langchain-openai

In [14]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [15]:
# Bind the Multiply class to the model
llm_with_tools = llm.bind_tools([Multiply])
llm_with_tools.invoke("what's 3 * 12")

AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_u81W8Ei0yTp4uMA8OP3xE3a5', 'function': {'arguments': '{"a":3,"b":12}', 'name': 'Multiply'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 62, 'total_tokens': 80}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_b28b39ffa8', 'finish_reason': 'tool_calls', 'logprobs': None})

In [16]:
from langchain_core.output_parsers.openai_tools import JsonOutputToolsParser

tool_chain = llm_with_tools | JsonOutputToolsParser()
tool_chain.invoke("what's 3 * 12")

[{'type': 'Multiply', 'args': {'a': 3, 'b': 12}}]
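
The parsed output above is a plain list of dictionaries, each with a `type` (the tool name) and `args`. A minimal sketch of how such a list could be dispatched to local Python functions follows. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names introduced for illustration, not part of LangChain.

```python
from typing import Any, Callable, Dict, List


def multiply(a: int, b: int) -> int:
    """Multiply two integers together."""
    return a * b


# Hypothetical registry mapping tool names to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"Multiply": multiply}


def run_tool_calls(tool_calls: List[Dict[str, Any]]) -> List[Any]:
    """Execute each parsed tool call and return the results in order."""
    results = []
    for call in tool_calls:
        func = TOOL_REGISTRY[call["type"]]  # look up the tool by name
        results.append(func(**call["args"]))  # pass the parsed arguments
    return results


# Same shape as the JsonOutputToolsParser output shown above.
parsed = [{"type": "Multiply", "args": {"a": 3, "b": 12}}]
results = run_tool_calls(parsed)
print(results)
```

The model only proposes the call; running it, as this sketch does, is left to the application.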

In [17]:
from langchain_core.output_parsers.openai_tools import PydanticToolsParser

tool_chain = llm_with_tools | PydanticToolsParser(tools=[Multiply])
tool_chain.invoke("what's 3 * 12")

[Multiply(a=3, b=12)]

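##### If the model does not call a tool when you expect it to, set tool_choice="any" to force it to use a tool.
##### To force one specific tool to be used, and used only once, set tool_choice to that tool's name.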
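
To make the two settings concrete, here is a rough sketch of how a `tool_choice` value might be translated into the shape the OpenAI chat API expects. The helper `to_openai_tool_choice` is hypothetical, and the mapping of `"any"` to OpenAI's `"required"` is an assumption about how the integration behaves, so check the installed version before relying on it.

```python
from typing import Any, Dict, Union


def to_openai_tool_choice(tool_choice: str) -> Union[str, Dict[str, Any]]:
    """Hypothetical helper: map a tool_choice value to an OpenAI-style payload."""
    if tool_choice in ("auto", "none", "required"):
        # These string values are passed through unchanged.
        return tool_choice
    if tool_choice == "any":
        # Assumption: "any" corresponds to OpenAI's "required".
        return "required"
    # Otherwise treat the value as the name of one specific tool.
    return {"type": "function", "function": {"name": tool_choice}}


print(to_openai_tool_choice("any"))
print(to_openai_tool_choice("Multiply"))
```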

### Function Schemas
#### Python

In [18]:
import json

from langchain_core.utils.function_calling import convert_to_openai_tool


def multiply(a: int, b: int) -> int:
    """Multiply two integers together.

    Args:
        a: First integer
        b: Second integer
    """
    return a * b


print(json.dumps(convert_to_openai_tool(multiply), indent=2))

{
  "type": "function",
  "function": {
    "name": "multiply",
    "description": "Multiply two integers together.",
    "parameters": {
      "type": "object",
      "properties": {
        "a": {
          "type": "integer",
          "description": "First integer"
        },
        "b": {
          "type": "integer",
          "description": "Second integer"
        }
      },
      "required": [
        "a",
        "b"
      ]
    }
  }
}
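
To show roughly where such a schema comes from, the sketch below derives a similar structure from a function signature using only the standard library. `sketch_tool_schema` is a hypothetical, simplified stand-in and not how `convert_to_openai_tool` is implemented. It handles only a few basic annotations and, unlike the real output above, does not extract per-argument descriptions from the docstring.

```python
import inspect
import json
from typing import Any, Callable, Dict

# Assumption: only these basic annotations are supported in this sketch.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def sketch_tool_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style tool schema from a function signature."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES[param.annotation]}
        # Parameters without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    # Use the first paragraph of the docstring as the description.
    description = (inspect.getdoc(func) or "").split("\n\n")[0]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def multiply(a: int, b: int) -> int:
    """Multiply two integers together.

    Args:
        a: First integer
        b: Second integer
    """
    return a * b


schema = sketch_tool_schema(multiply)
print(json.dumps(schema, indent=2))
```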


#### Pydantic

In [19]:
from langchain_core.pydantic_v1 import BaseModel, Field


class multiply(BaseModel):
    """Multiply two integers together."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")


print(json.dumps(convert_to_openai_tool(multiply), indent=2))

{
  "type": "function",
  "function": {
    "name": "multiply",
    "description": "Multiply two integers together.",
    "parameters": {
      "type": "object",
      "properties": {
        "a": {
          "description": "First integer",
          "type": "integer"
        },
        "b": {
          "description": "Second integer",
          "type": "integer"
        }
      },
      "required": [
        "a",
        "b"
      ]
    }
  }
}


#### LangChain

In [20]:
from typing import Any, Type

from langchain_core.tools import BaseTool


class MultiplySchema(BaseModel):
    """Multiply tool schema."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")


class Multiply(BaseTool):
    args_schema: Type[BaseModel] = MultiplySchema
    name: str = "multiply"
    description: str = "Multiply two integers together."

    def _run(self, a: int, b: int, **kwargs: Any) -> Any:
        return a * b


# Note: we're passing in a Multiply object not the class itself.
print(json.dumps(convert_to_openai_tool(Multiply()), indent=2))

{
  "type": "function",
  "function": {
    "name": "multiply",
    "description": "Multiply two integers together.",
    "parameters": {
      "type": "object",
      "properties": {
        "a": {
          "description": "First integer",
          "type": "integer"
        },
        "b": {
          "description": "Second integer",
          "type": "integer"
        }
      },
      "required": [
        "a",
        "b"
      ]
    }
  }
}


## Caching
##### LangChain provides an optional caching layer for chat models.



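1.   It saves money by reducing the number of API calls to the LLM provider when you often request the same completion multiple times
2.   It speeds up your application by reducing the number of API calls to the LLM provider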
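
Both benefits come from the same idea: a repeated request is answered from a local store instead of the provider. The sketch below illustrates that idea with a plain dictionary. `SketchCache` and `fake_llm` are hypothetical stand-ins, not the LangChain cache classes.

```python
from typing import Callable, Dict, List, Tuple


class SketchCache:
    """Hypothetical in-memory cache keyed by (prompt, model settings)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    def lookup_or_call(
        self, prompt: str, llm_string: str, call: Callable[[str], str]
    ) -> str:
        key = (prompt, llm_string)
        if key not in self._store:
            # Cache miss: pay for one real call and remember the answer.
            self._store[key] = call(prompt)
        return self._store[key]


api_calls: List[str] = []  # records every simulated provider call


def fake_llm(prompt: str) -> str:
    """Stand-in for a paid, slow API call."""
    api_calls.append(prompt)
    return f"response to: {prompt}"


cache = SketchCache()
first = cache.lookup_or_call("Tell me a joke", "model=demo", fake_llm)
second = cache.lookup_or_call("Tell me a joke", "model=demo", fake_llm)
print(len(api_calls))
```

Including the model settings in the key keeps responses from differently configured models apart.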



In [None]:
!pip install langchain

In [21]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [24]:
from langchain.globals import set_llm_cache

### In-Memory Caching

In [25]:
%%time
from langchain.cache import InMemoryCache

set_llm_cache(InMemoryCache())

# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")

CPU times: user 288 ms, sys: 35.7 ms, total: 323 ms
Wall time: 1.02 s


"Why don't scientists trust atoms?\n\nBecause they make up everything!"

In [26]:
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")

CPU times: user 3.35 ms, sys: 0 ns, total: 3.35 ms
Wall time: 3.21 ms


"Why don't scientists trust atoms?\n\nBecause they make up everything!"

### SQLite Caching

In [27]:
from langchain.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))
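
To give a feel for what a database-backed cache does, here is a minimal sketch built on the standard-library `sqlite3` module. The class `SketchSQLiteCache` and its `llm_cache` table layout are hypothetical and not the schema that `SQLiteCache` actually uses. The sketch defaults to an in-memory database so that it leaves no file behind.

```python
import sqlite3
from typing import Optional


class SketchSQLiteCache:
    """Hypothetical SQLite-backed cache keyed by (prompt, model settings)."""

    def __init__(self, database_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(database_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache ("
            "prompt TEXT, llm TEXT, response TEXT, "
            "PRIMARY KEY (prompt, llm))"
        )

    def lookup(self, prompt: str, llm: str) -> Optional[str]:
        """Return the cached response, or None on a cache miss."""
        row = self._conn.execute(
            "SELECT response FROM llm_cache WHERE prompt = ? AND llm = ?",
            (prompt, llm),
        ).fetchone()
        return row[0] if row else None

    def update(self, prompt: str, llm: str, response: str) -> None:
        """Store or overwrite the response for this key."""
        self._conn.execute(
            "INSERT OR REPLACE INTO llm_cache VALUES (?, ?, ?)",
            (prompt, llm, response),
        )
        self._conn.commit()


sqlite_cache = SketchSQLiteCache()
miss = sqlite_cache.lookup("Tell me a joke", "model=demo")
sqlite_cache.update("Tell me a joke", "model=demo", "A cached joke")
hit = sqlite_cache.lookup("Tell me a joke", "model=demo")
print(miss, hit)
```

With a file path instead of `":memory:"`, entries would survive a restart, which is the main difference from an in-memory cache.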

In [28]:
%%time
# The first time, it is not yet in cache, so it should take longer
llm.predict("Tell me a joke")

CPU times: user 40.2 ms, sys: 1.78 ms, total: 42 ms
Wall time: 656 ms


"Why don't scientists trust atoms?\n\nBecause they make up everything!"

In [29]:
%%time
# The second time it is, so it goes faster
llm.predict("Tell me a joke")

CPU times: user 6.52 ms, sys: 0 ns, total: 6.52 ms
Wall time: 7.09 ms


"Why don't scientists trust atoms?\n\nBecause they make up everything!"

## Custom Chat Model
##### Let's create a custom chat model using the LangChain abstractions.

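### Inputs and Outputs
#### Messages


*   SystemMessage : Used to prime AI behavior; usually passed in as the first of a sequence of input messages
*   HumanMessage : A message from the person interacting with the chat model
*   AIMessage : A message from the chat model. This can be either text or a request to invoke a tool.
*   FunctionMessage/ToolMessage : A message for passing the result of a tool invocation back to the model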
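
As a rough illustration of how these message types line up with the roles used by chat APIs, the sketch below converts typed messages into role/content dictionaries. `SketchMessage`, `ROLE_BY_TYPE`, and `to_role_dicts` are hypothetical names, and the role strings assume the OpenAI chat format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchMessage:
    """Hypothetical stand-in for a chat message: a type plus text content."""

    type: str  # "system", "human", "ai", or "tool"
    content: str


# Assumption: role names follow the OpenAI chat format.
ROLE_BY_TYPE = {
    "system": "system",
    "human": "user",
    "ai": "assistant",
    "tool": "tool",
}


def to_role_dicts(messages: List[SketchMessage]) -> List[Dict[str, str]]:
    """Convert typed messages into role/content dictionaries."""
    return [{"role": ROLE_BY_TYPE[m.type], "content": m.content} for m in messages]


conversation = [
    SketchMessage("system", "You're a helpful assistant"),
    SketchMessage("human", "hello!"),
    SketchMessage("ai", "Hi there human!"),
]
role_dicts = to_role_dicts(conversation)
print(role_dicts)
```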



In [30]:
from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    FunctionMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)

### Streaming
##### Every chat message type has a corresponding Chunk variant, used when streaming.

In [31]:
from langchain_core.messages import (
    AIMessageChunk,
    FunctionMessageChunk,
    HumanMessageChunk,
    SystemMessageChunk,
    ToolMessageChunk,
)

In [32]:
AIMessageChunk(content="Hello") + AIMessageChunk(content=" World!")

AIMessageChunk(content='Hello World!')
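
The `+` operator is what lets a consumer rebuild a full message from streamed pieces. A minimal sketch of that behavior with a hypothetical `SketchChunk` dataclass follows. It concatenates text only and is not how `AIMessageChunk` is implemented.

```python
from dataclasses import dataclass


@dataclass
class SketchChunk:
    """Hypothetical message chunk that supports concatenation with +."""

    content: str

    def __add__(self, other: "SketchChunk") -> "SketchChunk":
        # Adding two chunks yields a new chunk with the joined text.
        return SketchChunk(content=self.content + other.content)


streamed = [SketchChunk("Hello"), SketchChunk(" World"), SketchChunk("!")]

# Accumulate the chunks in arrival order, as a streaming consumer would.
full = streamed[0]
for piece in streamed[1:]:
    full = full + piece

print(full)
```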

### Custom Chat Model - Creating a Basic Chat Model

In [33]:
from typing import Any, AsyncIterator, Dict, Iterator, List, Optional

from langchain_core.callbacks import (
    AsyncCallbackManagerForLLMRun,
    CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel, SimpleChatModel
from langchain_core.messages import AIMessage, AIMessageChunk, BaseMessage, HumanMessage
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
from langchain_core.runnables import run_in_executor


class CustomChatModelAdvanced(BaseChatModel):
    """A custom chat model that echoes the first `n` characters of the input.

    When contributing an implementation to LangChain, carefully document
    the model including the initialization parameters, include
    an example of how to initialize the model and include any relevant
    links to the underlying models documentation or API.

    Example:

        .. code-block:: python

            model = CustomChatModelAdvanced(n=2)
            result = model.invoke([HumanMessage(content="hello")])
            result = model.batch([[HumanMessage(content="hello")],
                                 [HumanMessage(content="world")]])
    """

    n: int
    """The number of characters from the last message of the prompt to be echoed."""

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Override the _generate method to implement the chat model logic.

        This can be a call to an API, a call to a local model, or any other
        implementation that generates a response to the input prompt.

        Args:
            messages: the prompt composed of a list of messages.
            stop: a list of strings on which the model should stop generating.
                  If generation stops due to a stop token, the stop token itself
                  SHOULD BE INCLUDED as part of the output. This is not enforced
                  across models right now, but it's a good practice to follow since
                  it makes it much easier to parse the output of the model
                  downstream and understand why generation stopped.
            run_manager: A run manager with callbacks for the LLM.
        """
        last_message = messages[-1]
        tokens = last_message.content[: self.n]
        message = AIMessage(content=tokens)
        generation = ChatGeneration(message=message)
        return ChatResult(generations=[generation])

    def _stream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[ChatGenerationChunk]:
        """Stream the output of the model.

        This method should be implemented if the model can generate output
        in a streaming fashion. If the model does not support streaming,
        do not implement it. In that case streaming requests will be automatically
        handled by the _generate method.

        Args:
            messages: the prompt composed of a list of messages.
            stop: a list of strings on which the model should stop generating.
                  If generation stops due to a stop token, the stop token itself
                  SHOULD BE INCLUDED as part of the output. This is not enforced
                  across models right now, but it's a good practice to follow since
                  it makes it much easier to parse the output of the model
                  downstream and understand why generation stopped.
            run_manager: A run manager with callbacks for the LLM.
        """
        last_message = messages[-1]
        tokens = last_message.content[: self.n]

        for token in tokens:
            chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))

            if run_manager:
                run_manager.on_llm_new_token(token, chunk=chunk)

            yield chunk

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        """An async variant of `_stream`.

        If not provided, the default behavior is to delegate to the _generate method.

        The implementation below instead will delegate to `_stream` and will
        kick it off in a separate thread.

        If you're able to natively support async, then by all means do so!
        """
        result = await run_in_executor(
            None,
            self._stream,
            messages,
            stop=stop,
            run_manager=run_manager.get_sync() if run_manager else None,
            **kwargs,
        )
        for chunk in result:
            yield chunk

    @property
    def _llm_type(self) -> str:
        """Get the type of language model used by this chat model."""
        return "echoing-chat-model-advanced"

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        """Return a dictionary of identifying parameters."""
        return {"n": self.n}

In [34]:
model = CustomChatModelAdvanced(n=3)

In [35]:
model.invoke(
    [
        HumanMessage(content="hello!"),
        AIMessage(content="Hi there human!"),
        HumanMessage(content="Meow!"),
    ]
)

AIMessage(content='Meo')

In [36]:
model.invoke("hello")

AIMessage(content='hel')

In [37]:
model.batch(["hello", "goodbye"])

[AIMessage(content='hel'), AIMessage(content='goo')]

In [38]:
for chunk in model.stream("cat"):
    print(chunk.content, end="|")

c|a|t|

In [39]:
async for chunk in model.astream("cat"):
    print(chunk.content, end="|")

c|a|t|
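
The async output matches the sync one because `_astream` in the class above delegates to `_stream`. The same bridging pattern can be sketched with the standard library alone. `sync_stream`, `async_stream`, and `collect` are hypothetical names, and `loop.run_in_executor` stands in here for the `run_in_executor` helper imported from `langchain_core`.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


def sync_stream(text: str, n: int) -> Iterator[str]:
    """Yield the first n characters of text one at a time."""
    for char in text[:n]:
        yield char


async def async_stream(text: str, n: int) -> AsyncIterator[str]:
    """Async wrapper that delegates to the synchronous generator."""
    loop = asyncio.get_running_loop()
    # Create the generator in the default thread pool executor.
    generator = await loop.run_in_executor(None, sync_stream, text, n)
    # The items themselves are then consumed on the event loop thread.
    for char in generator:
        yield char


async def collect(text: str, n: int) -> List[str]:
    """Gather every streamed item into a list."""
    return [char async for char in async_stream(text, n)]


collected = asyncio.run(collect("cat", 3))
print("|".join(collected))
```

Only the creation of the generator runs in the executor in this sketch, so a model that can stream natively in async would benefit from its own implementation.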

In [40]:
async for event in model.astream_events("cat", version="v1"):
    print(event)

{'event': 'on_chat_model_start', 'run_id': '59fb09a6-28be-41dc-b927-8f806d82a16b', 'name': 'CustomChatModelAdvanced', 'tags': [], 'metadata': {}, 'data': {'input': 'cat'}}
{'event': 'on_chat_model_stream', 'run_id': '59fb09a6-28be-41dc-b927-8f806d82a16b', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='c')}}
{'event': 'on_chat_model_stream', 'run_id': '59fb09a6-28be-41dc-b927-8f806d82a16b', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='a')}}
{'event': 'on_chat_model_stream', 'run_id': '59fb09a6-28be-41dc-b927-8f806d82a16b', 'tags': [], 'metadata': {}, 'name': 'CustomChatModelAdvanced', 'data': {'chunk': AIMessageChunk(content='t')}}
{'event': 'on_chat_model_end', 'name': 'CustomChatModelAdvanced', 'run_id': '59fb09a6-28be-41dc-b927-8f806d82a16b', 'tags': [], 'metadata': {}, 'data': {'output': AIMessageChunk(content='cat')}}



## Returning Log Probabilities

In [41]:
from langchain_openai import ChatOpenAI

# Configure the logprobs=True parameter
llm = ChatOpenAI(model="gpt-3.5-turbo-0125").bind(logprobs=True)

msg = llm.invoke(("human", "how are you today"))

In [44]:
msg

AIMessage(content="I'm just a computer program, so I don't have feelings or emotions like humans do. But I'm here and ready to help you with any questions or tasks you have! How can I assist you today?", response_metadata={'token_usage': {'completion_tokens': 43, 'prompt_tokens': 16, 'total_tokens': 59}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'stop', 'logprobs': {'content': [{'token': 'I', 'bytes': [73], 'logprob': -0.2322815, 'top_logprobs': []}, {'token': "'m", 'bytes': [39, 109], 'logprob': -0.38429767, 'top_logprobs': []}, {'token': ' just', 'bytes': [32, 106, 117, 115, 116], 'logprob': -0.20297308, 'top_logprobs': []}, {'token': ' a', 'bytes': [32, 97], 'logprob': -0.002020236, 'top_logprobs': []}, {'token': ' computer', 'bytes': [32, 99, 111, 109, 112, 117, 116, 101, 114], 'logprob': -0.053281885, 'top_logprobs': []}, {'token': ' program', 'bytes': [32, 112, 114, 111, 103, 114, 97, 109], 'logprob': -5.371606e-05, 'top_logprobs'

In [42]:
msg.response_metadata["logprobs"]["content"][:5]

[{'token': 'I', 'bytes': [73], 'logprob': -0.2322815, 'top_logprobs': []},
 {'token': "'m",
  'bytes': [39, 109],
  'logprob': -0.38429767,
  'top_logprobs': []},
 {'token': ' just',
  'bytes': [32, 106, 117, 115, 116],
  'logprob': -0.20297308,
  'top_logprobs': []},
 {'token': ' a',
  'bytes': [32, 97],
  'logprob': -0.002020236,
  'top_logprobs': []},
 {'token': ' computer',
  'bytes': [32, 99, 111, 109, 112, 117, 116, 101, 114],
  'logprob': -0.053281885,
  'top_logprobs': []}]
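
Each `logprob` is the natural logarithm of the probability the model assigned to that token, so `math.exp` recovers the probability and summing log probabilities gives the log probability of the whole prefix. The sketch below applies this to the five entries shown above. The variable names are illustrative.

```python
import math

# The first five token entries from the output above (bytes omitted).
logprob_content = [
    {"token": "I", "logprob": -0.2322815},
    {"token": "'m", "logprob": -0.38429767},
    {"token": " just", "logprob": -0.20297308},
    {"token": " a", "logprob": -0.002020236},
    {"token": " computer", "logprob": -0.053281885},
]

# Convert each log probability back to a probability in [0, 1].
token_probabilities = [
    (entry["token"], math.exp(entry["logprob"])) for entry in logprob_content
]

# Log probabilities add, so the sum is the log probability of the prefix.
prefix_logprob = sum(entry["logprob"] for entry in logprob_content)
prefix_probability = math.exp(prefix_logprob)

for token, probability in token_probabilities:
    print(f"{token!r}: {probability:.4f}")
print(f"prefix probability: {prefix_probability:.4f}")
```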

In [43]:
ct = 0
full = None
for chunk in llm.stream(("human", "how are you today")):
    if ct < 5:
        full = chunk if full is None else full + chunk
        if "logprobs" in full.response_metadata:
            print(full.response_metadata["logprobs"]["content"])
    else:
        break
    ct += 1

[]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.76512, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.76512, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.032747388, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.76512, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.032747388, 'top_logprobs': []}, {'token': ' AI', 'bytes': [32, 65, 73], 'logprob': -0.009283651, 'top_logprobs': []}]
[{'token': 'As', 'bytes': [65, 115], 'logprob': -1.76512, 'top_logprobs': []}, {'token': ' an', 'bytes': [32, 97, 110], 'logprob': -0.032747388, 'top_logprobs': []}, {'token': ' AI', 'bytes': [32, 65, 73], 'logprob': -0.009283651, 'top_logprobs': []}, {'token': ',', 'bytes': [44], 'logprob': -0.054568645, 'top_logprobs': []}]


## Streaming
##### Every ChatModel implements the Runnable interface, which comes with default implementations of all of its methods.

In [45]:
for chunk in llm.stream("Write me a song about goldfish on the moon"):
    print(chunk.content, end="", flush=True)

Verse 1:
In a world where fish can fly
And the moon shines in the sky
There's a place where goldfish roam
In a universe of their own

Chorus:
Goldfish on the moon
Swimming in the silver lagoon
A magical sight to see
In a place where dreams run free

Verse 2:
They glide through the starry night
With their scales shimmering bright
Dancing in the cosmic sea
In a world of possibility

Chorus:
Goldfish on the moon
Swimming in the silver lagoon
A magical sight to see
In a place where dreams run free

Bridge:
They leap and twirl in zero gravity
In a celestial symphony
Their fins like wings, they soar and play
In a moonlit ballet

Chorus:
Goldfish on the moon
Swimming in the silver lagoon
A magical sight to see
In a place where dreams run free

Outro:
So if you ever feel alone
Just look up at the moon
And know that somewhere out there
There's goldfish swimming in the silver lagoon.

## Tracking Token Usage

In [46]:
from langchain.callbacks import get_openai_callback
from langchain_openai import ChatOpenAI

In [47]:
# Track token usage with get_openai_callback
with get_openai_callback() as cb:
    result = llm.invoke("Tell me a joke")
    print(cb)

Tokens Used: 28
	Prompt Tokens: 11
	Completion Tokens: 17
Successful Requests: 1
Total Cost (USD): $3.1e-05
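
The reported cost can be reproduced by hand from the token counts. The sketch below assumes per-1K-token rates of $0.0005 for prompt tokens and $0.0015 for completion tokens. These rates are an assumption chosen because they match the total above, and actual pricing changes over time, so treat them as placeholders.

```python
# Assumed rates in USD per 1,000 tokens (placeholders; pricing changes).
PROMPT_RATE_PER_1K = 0.0005
COMPLETION_RATE_PER_1K = 0.0015


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the request cost in USD from token counts."""
    prompt_cost = prompt_tokens * PROMPT_RATE_PER_1K / 1000
    completion_cost = completion_tokens * COMPLETION_RATE_PER_1K / 1000
    return prompt_cost + completion_cost


# Token counts taken from the callback output above.
estimated = estimate_cost(prompt_tokens=11, completion_tokens=17)
print(f"Total Cost (USD): ${estimated:.7f}")
```

Completion tokens dominate the total here because the assumed completion rate is three times the prompt rate.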


##### When you use a chain or agent with multiple steps, all of those steps are tracked.

In [None]:
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_openai import OpenAI

tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)

In [None]:
with get_openai_callback() as cb:
    response = agent.run(
        "Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"
    )
    print(f"Total Tokens: {cb.total_tokens}")
    print(f"Prompt Tokens: {cb.prompt_tokens}")
    print(f"Completion Tokens: {cb.completion_tokens}")
    print(f"Total Cost (USD): ${cb.total_cost}")