# 🎓 FrugalGPT: Performance and Cost Tradeoffs

This notebook illustrates the FrugalGPT framework for _building LLM Applications with budget constraints._

In particular, we will focus on evaluating the performance and cost tradeoffs enabled by FrugalGPT.

NB: We strongly recommend running this notebook on accelerated hardware (GPU/TPU).

In [1]:
%load_ext autoreload
%autoreload 2
import sys, json, copy
import pandas as pd
import logging
logging.disable(logging.CRITICAL)
sys.path.append("src/")

In [2]:
import os
from IPython.display import display
import FrugalGPT
from tqdm import tqdm

supported_LLM = FrugalGPT.getservicename()
print("supported LLMs:",supported_LLM)
supported_LLM_names = [llm.split("/")[1] for llm in supported_LLM]
print("supported_LLM_names:", supported_LLM_names)

supported LLMs: ['google/gemini-1.5-flash-002', 'google/gemini-1.5-pro-002', 'google/gemini-1.0-pro', 'openaichat/gpt-4o-mini', 'openaichat/gpt-4o', 'azure/Phi-3-mini-4k-instruct', 'azure/Phi-3.5-mini-instruct', 'azure/Phi-3-small-8k-instruct', 'azure/Phi-3-medium-4k-instruct', 'deepinfra/llama-3-8B', 'deepinfra/llama-3-70B', 'deepinfra/mixtral-8x7B']
supported_LLM_names: ['gemini-1.5-flash-002', 'gemini-1.5-pro-002', 'gemini-1.0-pro', 'gpt-4o-mini', 'gpt-4o', 'Phi-3-mini-4k-instruct', 'Phi-3.5-mini-instruct', 'Phi-3-small-8k-instruct', 'Phi-3-medium-4k-instruct', 'llama-3-8B', 'llama-3-70B', 'mixtral-8x7B']
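Every identifier listed above contains exactly one `/`, so `llm.split("/")[1]` recovers the full model name. An identifier with a nested path, such as `togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo` (which appears among the default `service_names` of `compute_tradeoffs` in this notebook), would be cut down to `meta-llama`. The following is a minimal sketch of a more defensive split, assuming the desired model name is everything after the first slash; `split_service` is a hypothetical helper and not part of FrugalGPT.

```python
def split_service(service: str) -> tuple[str, str]:
    """Split a 'provider/model' identifier at the first slash only."""
    provider, _, model = service.partition("/")
    return provider, model


# A single-slash identifier behaves exactly like split("/")[1].
print(split_service("openaichat/gpt-4o-mini"))
# A nested identifier keeps the full model path after the provider.
print(split_service("togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo"))
```

Whether the short or the full model path is the right key depends on how the columns of the queried CSV files are named, so check that before swapping the split.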


Generating the tradeoffs involves three major steps: (i) prepare the dataset, (ii) train the FrugalGPT strategy, and (iii) evaluate and save the performance.

## Step 1: Prepare the dataset

In [3]:
# dataname = "HEADLINES"
# dataname = "OVERRULING"
dataname = "AGNEWS"

In [4]:
# Read the training split from data/{dataname}/Queried_{dataname}_all_models_clean_train.csv
# (the test split, ..._clean_test.csv, is loaded in Step 3).
dataset_df = pd.read_csv(f'data/{dataname}/Queried_{dataname}_all_models_clean_train.csv', header=0)
dataset_df.head()

Unnamed: 0,query_raw,query,ref_answer,gpt-4o-mini,gpt-4o,llama-3-8B,llama-3-70B,mixtral-8x7B,gemini-1.5-flash-002,gemini-1.0-pro,gemini-1.5-pro-002,Phi-3.5-mini-instruct,Phi-3-small-8k-instruct,Phi-3-mini-4k-instruct,Phi-3-medium-4k-instruct
0,Q: #39;Breakthrough #39; on hydrogen fuel US ...,"Please answer which category (World, Sports, B...",sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,sci/tech
1,Q: Firefox - Ready To Take On Internet Explore...,"Please answer which category (World, Sports, B...",sci/tech,sci/tech,sci/tech,sci/tech,sci/tech,business,sci/tech,sci/tech,sci/tech,business,sports,sci/tech,sci/tech
2,"Q: Facing a fund gap Lucent Technologies"" popu...","Please answer which category (World, Sports, B...",business,business,business,business,business,business,business,business,business,business,business,business,business
3,Q: PeopleSofts big bash See you next year in L...,"Please answer which category (World, Sports, B...",business,business,business,business,business,business,business,business,business,business,business,business,business
4,"Q: Attackers shoot, burn villagers in east Con...","Please answer which category (World, Sports, B...",world,world,world,world,world,world,world,world,world,world,world,world,world


In [5]:
train_data = []
for index, row in dataset_df.iterrows():
    query = row['query']
    ref_answer = row['ref_answer']
    _id = index
    model_answer = {}
    for model_name in supported_LLM_names:
        model_answer[model_name] = row[model_name]
    train_data.append([query, ref_answer, _id, model_answer])
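Because each record already stores every model's cached answer, the accuracy of each individual model can be computed directly from this structure, which gives a baseline to compare the cascade against. This is a minimal sketch, assuming plain exact string matching against `ref_answer`; `per_model_accuracy` and the toy records are hypothetical, and FrugalGPT's own scoring may normalize answers differently.

```python
from collections import defaultdict


def per_model_accuracy(data):
    """data: list of [query, ref_answer, _id, model_answer] records."""
    correct = defaultdict(int)
    for _query, ref_answer, _id, model_answer in data:
        for model_name, answer in model_answer.items():
            # Count an exact string match against the reference label.
            correct[model_name] += int(answer == ref_answer)
    return {m: c / len(data) for m, c in correct.items()}


# Toy records in the same layout as train_data (illustrative only).
toy = [
    ["q0", "sci/tech", 0, {"model-a": "sci/tech", "model-b": "sci/tech"}],
    ["q1", "sci/tech", 1, {"model-a": "sci/tech", "model-b": "business"}],
]
acc = per_model_accuracy(toy)
print(acc)
```

Calling `per_model_accuracy(train_data)` on the real records would give one accuracy per entry in `supported_LLM_names`.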

In [6]:
train_data[3]

['Please answer which category (World, Sports, Business or Sci/Tech) a provided news follows into.\n\nQ: Five-year ban for Blackburn fan One of the two Blackburn Rovers Football Club fans charged with public disorder for racially abusing Dwight Yorke has been handed a five-year ban.\nA: Sports\n\nQ: Major software pirates caught A multimillion-euro software piracy ring has been broken following synchronized raids in Athens and London yesterday, Attica police said.\nA: Sci/Tech\n\nQ: PeopleSofts big bash See you next year in Las Vegas , proclaimed a marquee at the PeopleSoft user conference in San Francisco in late September. It was one of many not-so-subtle attempts by the company to reassure its customers \nA:',
 'business',
 3,
 {'gemini-1.5-flash-002': 'business',
  'gemini-1.5-pro-002': 'business',
  'gemini-1.0-pro': 'business',
  'gpt-4o-mini': 'business',
  'gpt-4o': 'business',
  'Phi-3-mini-4k-instruct': 'business',
  'Phi-3.5-mini-instruct': 'business',
  'Phi-3-small-8k-inst

In [7]:
# get the answer of the model llama-3-8B
train_data[3][3]['llama-3-8B']

'business'

In [8]:
print(len(train_data))

6080


## Step 2: Train the FrugalGPT strategy for different budgets

In [9]:
service_names = ['openaichat/gpt-4o-mini',
                'openaichat/gpt-4o',
                'google/gemini-1.5-flash-002',
                'google/gemini-1.5-pro-002',
                'google/gemini-1.0-pro',
                'azure/Phi-3-mini-4k-instruct',
                'azure/Phi-3.5-mini-instruct',
                'azure/Phi-3-small-8k-instruct',
                'azure/Phi-3-medium-4k-instruct',
                'deepinfra/llama-3-8B',
                'deepinfra/llama-3-70B',
                'deepinfra/mixtral-8x7B',
                ]

### 2-1. Now let us train FrugalGPT on this dataset.

In [10]:
genparams=FrugalGPT.GenerationParameter(max_tokens=50, temperature=0.1, stop=['\n'])

In [11]:
def compute_tradeoffs(
    train_data,
    budget_list,
    name="HEADLINES",
    service_names=None,
    prefix="",
    skip=0,
    MyCascade=None,
    cascade_depth=3,
):
    """Train and save one FrugalGPT cascade strategy per budget in budget_list."""
    # Build the defaults inside the function so they are not created
    # (and shared) at function definition time.
    if service_names is None:
        service_names = ['openaichat/gpt-4o-mini',
                         'openaichat/gpt-4o',
                         'openaichat/gpt-4-turbo',
                         'togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo',
                         'togetherai/google/gemma-2-9b-it',
                         ]
    if MyCascade is None:
        MyCascade = FrugalGPT.LLMCascade(
            score_noise_injection=False,
            db_path="db/SCIQ.sqlite",
        )

    for idx, budget in tqdm(enumerate(budget_list)):
        user_budget = budget
        # Reuse a previously saved strategy for this budget if one exists.
        try:
            MyCascade.load(loadpath=f"strategy/{name}/", budget=user_budget)
            print("Already trained. Skipped.")
            continue
        except Exception:
            print("cannot find, start new training")
        if idx < skip:
            continue
        # The scorer is trained only for the first budget (idx == 0);
        # later budgets pass no_scorer_train=True.
        result = MyCascade.train(train_data, budget=user_budget,
                                 service_names=service_names,
                                 no_scorer_train=(idx != 0),
                                 prefix=prefix,
                                 cascade_depth=cascade_depth,
                                 )
        MyCascade.save(savepath=f"strategy/{name}/")
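`compute_tradeoffs` trains one cascade strategy per budget, with at most `cascade_depth` models per cascade. Conceptually, a cascade sends a query to one model at a time and stops as soon as a scorer judges the current answer reliable enough. The sketch below illustrates that idea only. It assumes a fixed model order, a scorer returning a value in [0, 1], and one acceptance threshold per non-final model; `cascade_answer`, the toy models, and the toy scorer are hypothetical and do not reproduce the internals of `FrugalGPT.LLMCascade`.

```python
from typing import Callable, List, Tuple

# (name, cost per call, model function): all hypothetical stand-ins.
Model = Tuple[str, float, Callable[[str], str]]


def cascade_answer(
    query: str,
    models: List[Model],
    scorer: Callable[[str, str], float],
    thresholds: List[float],
) -> Tuple[str, float, str]:
    """Query models in order and stop once the scorer accepts an answer.

    Returns (answer, total cost, name of the model that answered).
    The last model's answer is always accepted.
    """
    total_cost = 0.0
    answer, name = "", ""
    for i, (name, cost, model) in enumerate(models):
        answer = model(query)
        total_cost += cost
        is_last = i == len(models) - 1
        if is_last or scorer(query, answer) >= thresholds[i]:
            break
    return answer, total_cost, name


cheap = ("cheap", 0.0001, lambda q: "business")
strong = ("strong", 0.0010, lambda q: "sci/tech")


def toy_scorer(query: str, answer: str) -> float:
    # Toy rule: only the answer "sci/tech" is trusted.
    return 1.0 if answer == "sci/tech" else 0.0


print(cascade_answer("Q: ...", [cheap, strong], toy_scorer, thresholds=[0.5]))
```

In this toy run the cheap model's answer is rejected, so the query is escalated and the cost of both calls is paid. A tighter budget would presumably favor strategies that accept cheap answers more often.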

In [12]:
name = f'{dataname}_1015'
budget_list = [0.00005, 0.0001, 0.0005, 0.001, 0.0015]

MyCascade = FrugalGPT.LLMCascade(
    score_noise_injection=False,
    db_path=f"db/{dataname}.sqlite",
    batch_build=True,
)

In [13]:
train_data_sample = train_data[0:]  # use train_data[0:100] for a quick trial run
print(len(train_data_sample))

6080


In [None]:
!jupyter notebook --ZMQChannelsWebsocketConnection.iopub_msg_rate_limit=200000000 --ZMQChannelsWebsocketConnection.rate_limit_window=1000


In [14]:
compute_tradeoffs(train_data=train_data_sample,
                  budget_list=budget_list,
                  name=name,
                  service_names=service_names,
                #   prefix=prefix,
                  skip=0, # you can manually skip the first few budgets if they have already been trained.
                  MyCascade=MyCascade,
                  cascade_depth=3,
                  )

0it [00:00, ?it/s]

cannot find, start new training
train and test size 4864 1216


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

Epoch,Training Loss,Validation Loss,Accuracy
1,0.3823,0.330703,0.852518
2,0.3274,0.266103,0.909044
3,0.1839,0.345105,0.897225
4,0.1669,0.369633,0.899281
5,0.0591,0.469274,0.898767
6,0.0658,0.502181,0.903392



Epoch,Training Loss,Validation Loss,Accuracy
1,0.3579,0.325005,0.877698
2,0.1957,0.319202,0.893114
3,0.3182,0.274117,0.901336
4,0.1481,0.325849,0.903392
5,0.1452,0.374215,0.89517
6,0.0111,0.446806,0.896711



Epoch,Training Loss,Validation Loss,Accuracy
1,0.3695,0.352346,0.84224
2,0.2151,0.318306,0.889517
3,0.3835,0.287025,0.8926
4,0.1194,0.380847,0.890545
5,0.1182,0.424413,0.89517
6,0.0303,0.486425,0.89517



Epoch,Training Loss,Validation Loss,Accuracy
1,0.3321,0.317176,0.892086
2,0.2194,0.295054,0.886434
3,0.2887,0.285478,0.904419
4,0.1182,0.367274,0.897225
5,0.079,0.474284,0.89517
6,0.0023,0.525296,0.89517



Epoch,Training Loss,Validation Loss,Accuracy
1,0.3529,0.328284,0.873587
2,0.2457,0.30479,0.884892
3,0.3428,0.334274,0.896197
4,0.1292,0.36949,0.901336
5,0.0495,0.431613,0.901336
6,0.082,0.463566,0.905447



Epoch,Training Loss,Validation Loss,Accuracy
1,0.3555,0.3587,0.857657
2,0.2806,0.344022,0.859712
3,0.2828,0.452922,0.856115
4,0.1545,0.353648,0.903905
5,0.06,0.47201,0.897739
6,0.0225,0.483955,0.904933



Epoch,Training Loss,Validation Loss,Accuracy
1,0.3284,0.371529,0.861254
2,0.2465,0.343479,0.867934


IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



## Step 3: Evaluate and save the performance

In [None]:
# Read the test split from data/{dataname}/Queried_{dataname}_all_models_clean_test.csv
dataset_df_test = pd.read_csv(f'data/{dataname}/Queried_{dataname}_all_models_clean_test.csv', header=0)
dataset_df_test.head()

In [None]:
test_data = []
for index, row in dataset_df_test.iterrows():
    query = row['query']
    ref_answer = row['ref_answer']
    _id = index
    model_answer = {}
    for model_name in supported_LLM_names:
        model_answer[model_name] = row[model_name]
    test_data.append([query, ref_answer, _id, model_answer])

In [None]:
test_data[3]

In [None]:
# get the answer of the model llama-3-8B
test_data[3][3]['llama-3-8B']

In [None]:
print(len(test_data))

In [None]:
def generate_dataframe_from_cascade(MyCascade, budget_list, train_data, test_data, genparams, name):
    # Initialize an empty list to store the rows for the DataFrame
    data = []

    # Iterate through the budget list
    for budget in tqdm(budget_list):
        # Load the strategy for the given budget
        MyCascade.load(loadpath=f"strategy/{name}/", budget=budget)
        print("loaded from path:",f"strategy/{name}/")
        print("now the budget is:",budget)

        # Get the completion batch for train data
        print("start train data")
        train_result = MyCascade.get_completion_batch(queries=train_data, genparams=genparams)
        print("train_result:",train_result)
        # Compute the ACC and cost for train data
        train_acc_cost = FrugalGPT.compute_score(train_result)

        # Get the completion batch for test data
        test_result = MyCascade.get_completion_batch(queries=test_data, genparams=genparams)

        # Compute the ACC and cost for test data
        test_acc_cost = FrugalGPT.compute_score(test_result)

        # Create a row with the schema
        row = {
            "Test_acc": test_acc_cost['em'],
            "Test_cost": test_acc_cost['cost'],
            "Test_size": len(test_data),
            "Train_acc": train_acc_cost['em'],
            "Train_cost": train_acc_cost['cost'],
            "Train_size": len(train_data),
            "Budget": budget,
            "Method": "FrugalGPT",
            "Provider": "FrugalGPT",
            "Marker": 1,  # Marker is always 1 for this function
        }

        # Append the row to the data list
        data.append(row)
        display(row)

    # Create the DataFrame from the data list
    df = pd.DataFrame(data)

    return df
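The function above reads two fields from `FrugalGPT.compute_score`: `'em'` and `'cost'`. The sketch below shows one plausible reading of those fields, assuming `'em'` is exact-match accuracy after trimming and lowercasing, and `'cost'` is the average cost per query. Both the result layout (`answer`, `ref_answer`, `cost` keys) and the helper `exact_match_and_cost` are hypothetical; the library's actual definitions may differ.

```python
def exact_match_and_cost(results):
    """results: list of dicts with 'answer', 'ref_answer', and 'cost' keys."""
    n = len(results)
    # Compare answers case-insensitively after trimming whitespace.
    em = sum(
        r["answer"].strip().lower() == r["ref_answer"].strip().lower()
        for r in results
    ) / n
    # Average cost per query.
    cost = sum(r["cost"] for r in results) / n
    return {"em": em, "cost": cost}


# Toy results (illustrative only, not output from this notebook).
toy_results = [
    {"answer": "Business", "ref_answer": "business", "cost": 0.0002},
    {"answer": "sports", "ref_answer": "world", "cost": 0.0004},
]
score = exact_match_and_cost(toy_results)
print(score)
```

Normalizing case matters for this dataset because the few-shot prompt shows labels such as `Sports` and `Sci/Tech`, while the reference answers are lowercase.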

In [None]:
MyCascade_eval = FrugalGPT.LLMCascade()
frugalgpt_df = generate_dataframe_from_cascade(MyCascade_eval,
                                               budget_list, train_data, test_data, genparams,
                                               name)
display(frugalgpt_df)
frugalgpt_df.to_csv(f"summary/summary_{dataname}_e8_frugalgpt_2024.csv")
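`DataFrame.to_csv` does not create missing parent directories, so the call above fails if `summary/` does not exist yet. A minimal sketch of a guard follows, using only the standard library; `ensure_parent_dir` is a hypothetical helper, not part of FrugalGPT or pandas.

```python
from pathlib import Path


def ensure_parent_dir(path: str) -> str:
    """Create the parent directory of `path` if it is missing, then return `path`."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    return path


# Usage in this notebook would look like:
# frugalgpt_df.to_csv(ensure_parent_dir(f"summary/summary_{dataname}_e8_frugalgpt_2024.csv"))
```

Returning the path unchanged lets the helper wrap the existing argument without restructuring the cell.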

The `to_csv` call above saves the results to local disk under `summary/`. The table can be displayed again for inspection:

In [None]:
display(frugalgpt_df)
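One way to read the resulting table is to pick the cheapest trained strategy that still reaches a target accuracy. The sketch below assumes rows with the `Budget`, `Test_acc`, and `Test_cost` columns produced above; `cheapest_meeting_target` is a hypothetical helper, and the numbers are made up for illustration and are not results from this notebook.

```python
def cheapest_meeting_target(rows, target_acc):
    """Return the lowest-cost row whose Test_acc is at least target_acc, or None."""
    eligible = [r for r in rows if r["Test_acc"] >= target_acc]
    return min(eligible, key=lambda r: r["Test_cost"]) if eligible else None


# Hypothetical numbers for illustration only.
rows = [
    {"Budget": 0.00005, "Test_acc": 0.80, "Test_cost": 0.00004},
    {"Budget": 0.0005, "Test_acc": 0.88, "Test_cost": 0.0004},
    {"Budget": 0.0015, "Test_acc": 0.90, "Test_cost": 0.0012},
]
best = cheapest_meeting_target(rows, target_acc=0.85)
print(best["Budget"])
```

With the real results, `frugalgpt_df.to_dict("records")` yields rows in this form.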