# LLM Cost Breakdown

- LLM configuration

| Parameter    | Value |
|--------------|-------|
| top_k        | 40    |
| top_p        | 0.8   |
| temperature  | 0.2   |


- Usage

| Metric | Value |
|--------|--------|
| Total input tokens | 628,047 |
| Total output tokens | 26,152 |
| Average predict time | 1.53 seconds |


- Error/success rate

| Status | Rate (%) |
|--------|----------|
| Success | 100.00 |
| Error | 0.00 |


- Total usage cost: **$0.025379**

---

In [2]:
import ast
import os

import pandas as pd
import requests
from dotenv import load_dotenv

# Load REPLICATE_API_TOKEN from a local .env file
load_dotenv()

True

In [None]:
url = "https://api.replicate.com/v1/predictions"
headers = {
    "Authorization": f"Bearer {os.getenv('REPLICATE_API_TOKEN')}"
}

all_predictions = []

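# Follow the "next" cursor until the API reports no further pages
while url:
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # fail on HTTP errors instead of parsing an error body
    data = response.json()
    all_predictions.extend(data.get("results", []))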
    url = data.get("next")
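
The pagination loop above can also be wrapped in a small helper so that it can be exercised without network access. This is a minimal sketch, not part of any Replicate client: `collect_paginated`, `fetch_page`, and `max_pages` are hypothetical names, and the page shape (a dict with `results` and `next` keys) is assumed from the loop above.

```python
from typing import Callable, Optional


def collect_paginated(
    first_url: str,
    fetch_page: Callable[[str], dict],
    max_pages: int = 1000,
) -> list:
    """Follow "next" links from first_url and gather every "results" item.

    fetch_page is a hypothetical callable returning the decoded JSON page for a URL.
    max_pages is an assumed safety cap against a cursor that never terminates.
    """
    items = []
    url: Optional[str] = first_url
    pages_seen = 0
    while url and pages_seen < max_pages:
        page = fetch_page(url)
        items.extend(page.get("results", []))
        url = page.get("next")  # None (or a missing key) ends the loop
        pages_seen += 1
    return items
```

With `requests`, `fetch_page` could be something like `lambda u: requests.get(u, headers=headers, timeout=30).json()`. Injecting the fetcher keeps the traversal logic separate from the HTTP call.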

In [32]:
df_predictions = pd.DataFrame(all_predictions)
df_predictions.head()

Unnamed: 0,completed_at,created_at,data_removed,error,id,input,metrics,output,started_at,source,status,urls,model,version
0,2025-08-01T06:03:14.610252Z,2025-08-01T06:03:13.589000Z,False,,ypvsexsq6nrma0crcgx8b6z484,"{'top_k': 40, 'top_p': 0.8, 'prompt': '  ...","{'input_token_count': 597, 'tokens_per_second'...","[{, \n , "", Lo, yal, Custom, e...",2025-08-01T06:03:13.594775Z,api,succeeded,{'stream': 'https://stream-b.svc.ric1.c.replic...,ibm-granite/granite-3.3-8b-instruct,hidden
1,2025-08-01T05:52:37.510177Z,2025-08-01T05:52:34.817000Z,False,,6xqnzpvr05rm80crcgra8481d4,"{'top_k': 40, 'top_p': 0.8, 'prompt': '  ...","{'input_token_count': 842, 'tokens_per_second'...","[{, \n , "", Lo, yal, Custom, e...",2025-08-01T05:52:34.824013Z,api,succeeded,{'stream': 'https://stream-b.svc.ric1.c.replic...,ibm-granite/granite-3.3-8b-instruct,hidden
2,2025-08-01T05:48:36.884733Z,2025-08-01T05:48:33.167000Z,False,,np133qp81xrmc0crcgpazxt2g0,"{'top_k': 40, 'top_p': 0.8, 'prompt': '  ...","{'input_token_count': 862, 'tokens_per_second'...","[{, \n , "", Lo, yal, Custom, e...",2025-08-01T05:48:33.172999Z,api,succeeded,{'stream': 'https://stream-b.svc.ric1.c.replic...,ibm-granite/granite-3.3-8b-instruct,hidden
3,2025-08-01T05:47:36.379092Z,2025-08-01T05:47:32.927000Z,False,,p8vm22ywqxrm80crcgnvk25fp8,"{'top_k': 40, 'top_p': 0.8, 'prompt': '  ...","{'input_token_count': 861, 'tokens_per_second'...","[{, \n , "", Lo, yal, Custom, e...",2025-08-01T05:47:32.933545Z,api,succeeded,{'stream': 'https://stream-b.svc.ric1.c.replic...,ibm-granite/granite-3.3-8b-instruct,hidden
4,2025-08-01T05:44:39.137712Z,2025-08-01T05:44:35.653000Z,False,,yy91hs988nrma0crcgmsgzc04r,"{'top_k': 40, 'top_p': 0.8, 'prompt': '  ...","{'input_token_count': 874, 'tokens_per_second'...","[{, \n , "", Lo, yal, Custom, e...",2025-08-01T05:44:35.659215Z,api,succeeded,{'stream': 'https://stream-b.svc.ric1.c.replic...,ibm-granite/granite-3.3-8b-instruct,hidden


In [25]:
# Count rows with a non-null error; summing the column would break on error strings
error_rate = df_predictions['error'].notna().mean() * 100
print(f"Error rate: {error_rate}%")

Error rate: 0.0%


In [24]:
success_rate = (df_predictions['status'] == 'succeeded').mean() * 100
print(f"Success rate: {success_rate}%")

Success rate: 100.0%
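
The two rates above come from different columns (`error` and `status`), so they are not guaranteed to sum to 100% if a prediction ends in some other state. As a rough sketch, the share of every status can be computed in one pass; `status_rates` is a hypothetical helper name, and the `"failed"` value in the usage note is only illustrative.

```python
from collections import Counter


def status_rates(statuses):
    """Return {status: percentage of rows} for an iterable of status strings."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        return {}  # no predictions, so no rates to report
    return {status: 100.0 * n / total for status, n in counts.items()}
```

For example, `status_rates(["succeeded", "succeeded", "succeeded", "failed"])` gives 75% for `succeeded` and 25% for `failed`. On the data frame it could be called as `status_rates(df_predictions["status"])`.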


In [29]:
# If the frame was reloaded from CSV, metrics arrive as strings; parse them back into dicts
df_predictions["metrics"] = df_predictions["metrics"].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)

total_input_tokens = df_predictions["metrics"].apply(lambda m: m.get("input_token_count", 0)).sum()
total_output_tokens = df_predictions["metrics"].apply(lambda m: m.get("output_token_count", 0)).sum()
avg_predict_time = df_predictions["metrics"].apply(lambda m: m.get("predict_time", 0)).mean()

print("Total input tokens:", total_input_tokens)
print("Total output tokens:", total_output_tokens)
print("Average predict time:", avg_predict_time, "seconds")

Total input tokens: 628047
Total output tokens: 26152
Average predict time: 1.5275120007172416 seconds
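
The aggregation above assumes every row carries a `metrics` dict, which holds here because all predictions succeeded. A hedged sketch of a more defensive version follows; `summarise_metrics` is a hypothetical name, and it deliberately differs from the cell above in one respect: rows without `predict_time` are left out of the average rather than counted as 0.

```python
def summarise_metrics(metrics_list):
    """Aggregate token counts and mean predict time, skipping rows without metrics."""
    # Keep only real dicts; None or NaN entries (assumed possible for failed runs) are ignored
    valid = [m for m in metrics_list if isinstance(m, dict)]
    total_input = sum(m.get("input_token_count", 0) for m in valid)
    total_output = sum(m.get("output_token_count", 0) for m in valid)
    times = [m["predict_time"] for m in valid if "predict_time" in m]
    avg_time = sum(times) / len(times) if times else None
    return {
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "avg_predict_time": avg_time,
    }
```

It could be applied as `summarise_metrics(df_predictions["metrics"])` once the column holds dicts.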


In [31]:
one_million = 1_000_000
# Per-token prices in USD, derived from per-million-token rates
output_token_price = 0.25 / one_million
input_token_price = 0.03 / one_million

input_cost = total_input_tokens * input_token_price
output_cost = total_output_tokens * output_token_price
total_cost = input_cost + output_cost

print(f"Usage cost: ${total_cost:.6f}")

Usage cost: $0.025379
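
The cost arithmetic above can be packaged as a small function that also reports the input/output split. This is a sketch under the rates used in this notebook ($0.03 and $0.25 per million tokens); `usage_cost` and its parameter names are hypothetical.

```python
def usage_cost(
    input_tokens,
    output_tokens,
    input_price_per_million=0.03,   # USD, rate used in this notebook
    output_price_per_million=0.25,  # USD, rate used in this notebook
):
    """Return input, output, and total cost in USD for the given token counts."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return {"input": input_cost, "output": output_cost, "total": input_cost + output_cost}


costs = usage_cost(628_047, 26_152)
print(f"Usage cost: ${costs['total']:.6f}")  # Usage cost: $0.025379
```

With the totals reported above, input tokens account for about $0.018841 and output tokens for about $0.006538, so input dominates the bill despite its lower per-token price.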
