# 🎓 FrugalGPT: Performance and Cost Tradeoffs

This notebook illustrates the FrugalGPT framework for _building LLM Applications with budget constraints._

In particular, we will focus on evaluating the performance and cost tradeoffs enabled by FrugalGPT.

NB: We strongly recommend running this notebook on accelerated hardware (GPU/TPU).

## Installation
Let us start by installing FrugalGPT (if you haven't yet!).

In [1]:
%%capture
# set up the environment (the %%capture cell magic must be the first line of the cell)
! git clone https://github.com/stanford-futuredata/FrugalGPT
! pip install git+https://github.com/stanford-futuredata/FrugalGPT
!mkdir -p FrugalGPT/strategy
! wget  https://github.com/lchen001/DataHolder/releases/download/v0.0.3/SCIQ_Model20241128.zip
! unzip SCIQ_Model20241128.zip -d FrugalGPT/strategy/SCIQ_Model20241128
! rm SCIQ_Model20241128.zip
! wget https://github.com/lchen001/DataHolder/releases/download/v0.0.3/SCIQ.sqlite.zip
! unzip SCIQ.sqlite.zip -d FrugalGPT/db
!mkdir -p FrugalGPT/data/SCIQ
!wget -P FrugalGPT/data/SCIQ https://github.com/stanford-futuredata/FrugalGPT/releases/download/0.0.1/SCIQ_train.csv
!wget -P FrugalGPT/data/SCIQ https://github.com/stanford-futuredata/FrugalGPT/releases/download/0.0.1/SCIQ_test.csv

In [2]:
%cd FrugalGPT

/content/FrugalGPT


In [3]:
%load_ext autoreload
%autoreload 2
import sys, json, copy
import logging
logging.disable(logging.CRITICAL)
sys.path.append("src/")

## Setup
Next, let us set up the environment and API keys. You do _not_ need API keys to run this notebook! They are only needed if you want to use FrugalGPT for your own queries.
#### NB: _Even for your own queries, you do not need every API key. If you only want to use LLMs from, e.g., OpenAI and AI21, setting up the API keys for those two providers is sufficient._
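
To see at a glance which providers are usable, one can check which keys have been replaced with real values. The sketch below is illustrative only: `configured_providers` is a hypothetical helper, not part of FrugalGPT, and it assumes that unset keys still hold the `YOUR_..._KEY` placeholders used in this notebook.

```python
import os

# Environment variable names taken from the setup cell of this notebook.
KEY_NAMES = [
    "OPENAI_API_KEY",
    "AI21_STUDIO_API_KEY",
    "COHERE_STUDIO_API_KEY",
    "TEXTSYNTH_API_SECRET_KEY",
    "ANTHROPIC_API_KEY",
    "TOGETHER_API_KEY",
    "GEMINI_API_KEY",
]

def configured_providers(env=None):
    """Hypothetical helper: list key names set to something other than a placeholder."""
    env = os.environ if env is None else env
    ready = []
    for key in KEY_NAMES:
        value = env.get(key, "")
        # Assumption: placeholders start with "YOUR_", as in this notebook.
        if value and not value.startswith("YOUR_"):
            ready.append(key)
    return ready

# Example with an explicit dictionary instead of the real environment.
example_env = {"OPENAI_API_KEY": "sk-example", "AI21_STUDIO_API_KEY": "YOUR_AI21_KEY"}
print(configured_providers(example_env))
```

Passing a plain dictionary keeps the check testable without touching the real environment; calling `configured_providers()` with no argument inspects `os.environ`.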

In [4]:
import os
os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_KEY'
os.environ['AI21_STUDIO_API_KEY'] = 'YOUR_AI21_KEY'
os.environ['COHERE_STUDIO_API_KEY'] = 'YOUR_COHERE_KEY'
os.environ['TEXTSYNTH_API_SECRET_KEY'] = 'YOUR_TEXTSYNTH_KEY'
os.environ['ANTHROPIC_API_KEY'] = 'YOUR_ANTHROPIC_KEY'
os.environ['TOGETHER_API_KEY'] = 'YOUR_TOGETHER_KEY'
os.environ['GEMINI_API_KEY'] = 'YOUR_GEMINI_KEY'

from IPython.display import display
import FrugalGPT
supported_LLM = FrugalGPT.getservicename()
print("supported LLMs:",supported_LLM)

supported LLMs: ['textsynth/gptneox_20B', 'textsynth/fairseq_gpt_13B', 'textsynth/gptj_6B', 'openai/text-davinci-002', 'openai/text-davinci-003', 'openai/text-curie-001', 'openai/text-babbage-001', 'openai/text-ada-001', 'openaichat/gpt-4o-mini', 'openaichat/gpt-4o-mini-2024-07-18', 'openaichat/gpt-4o', 'openaichat/gpt-4o-2024-05-13', 'openaichat/gpt-4-turbo', 'openaichat/gpt-4o-2024-08-06', 'openaichat/gpt-3.5-turbo', 'openaichat/gpt-4', 'ai21/jamba-1.5-mini', 'ai21/jamba-1.5-large', 'ai21/j1-jumbo', 'ai21/j1-grande', 'ai21/j1-large', 'ai21/j2-ultra', 'ai21/j2-mid', 'ai21/j2-light', 'cohere/command', 'cohere/base', 'cohere/xlarge', 'cohere/medium', 'togetherai/Qwen/Qwen2-72B-Instruct', 'togetherai/mistralai/Mistral-7B-Instruct-v0.1', 'togetherai/google/gemma-2b-it', 'togetherai/google/gemma-2-9b-it', 'togetherai/google/gemma-2-27b-it', 'togetherai/meta-llama/Meta-Llama-3-8B-Instruct-Lite', 'togetherai/Qwen/Qwen1.5-110B-Chat', 'togetherai/mistralai/Mistral-7B-Instruct-v0.3', 'togethera

Generating the tradeoffs involves three major steps: (i) preparing the dataset, (ii) training the FrugalGPT strategy, and (iii) evaluating and saving the performance.

## Step 1: Prepare the dataset

In [5]:
import pandas as pd
def list_to_dataframe(data_list):
    # The first sublist is the header
    headers = data_list[0]
    # The rest are the data rows
    data = data_list[1:]
    # Create the dataframe
    df = pd.DataFrame(data, columns=headers)
    return df

def convert_and_merge_dataframes(train_df, test_df):
    def extract_last_query_part(query):
        # Split the query by '\n\n' and take the last part
        return query.split('\n\n')[-1]

    def create_converted_df(df, start_query_id=1):
        # Extract the new 'query' and keep 'ref_answer' the same
        df['new_query'] = df['query'].apply(extract_last_query_part)

        # Group by 'new_query' and 'ref_answer' to merge identical queries
        merged_df = df.groupby(['new_query', 'ref_answer'], as_index=False).first()

        # Create a new dataframe with the three columns
        converted_df = pd.DataFrame({
            'query': merged_df['new_query'],
            'ref_answer': merged_df['ref_answer'],
            'query_id': range(start_query_id, start_query_id + len(merged_df))
        })

        return converted_df

    # Convert and merge the train dataframe
    converted_train_df = create_converted_df(train_df)

    # Find the last query_id from the training data
    last_train_query_id = converted_train_df['query_id'].max()

    # Convert and merge the test dataframe, starting query_id from the last training id + 1
    converted_test_df = create_converted_df(test_df, start_query_id=last_train_query_id + 1)

    return converted_train_df, converted_test_df


In [6]:
train_raw = FrugalGPT.loadcsvdata("data/SCIQ/SCIQ_train.csv")
test_raw = FrugalGPT.loadcsvdata("data/SCIQ/SCIQ_test.csv")
train_df = list_to_dataframe(train_raw)
test_df = list_to_dataframe(test_raw)
converted_train, converted_test = convert_and_merge_dataframes(train_df, test_df)
columns_to_save = ['query', 'ref_answer', 'query_id']
converted_train[columns_to_save].to_csv("data/SCIQ/train.csv", index=False, header=False)
converted_test[columns_to_save].to_csv("data/SCIQ/test.csv",index=False, header=False)
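
To make the conversion concrete, here is a small pure-Python sketch of the same idea on toy data: keep only the text after the last blank line of each prompt, merge identical (query, answer) pairs, and number the test queries after the training queries. The example prompts and the helper `dedupe_and_number` are hypothetical; the notebook itself does this with pandas.

```python
def extract_last_query_part(query):
    # Same rule as in the notebook: keep the text after the last blank line.
    return query.split("\n\n")[-1]

def dedupe_and_number(rows, start_query_id=1):
    """rows: list of (query, ref_answer) pairs.

    Returns a list of (query, ref_answer, query_id) with duplicates merged.
    """
    unique = set()
    for query, ref_answer in rows:
        unique.add((extract_last_query_part(query), ref_answer))
    # pandas groupby sorts the group keys by default, so sort here as well.
    ordered = sorted(unique)
    return [(q, a, start_query_id + i) for i, (q, a) in enumerate(ordered)]

# Hypothetical raw rows: the same question appears with two different prefixes.
train_rows = [
    ("Q: demo one\nA: x\n\nQ: What gas do plants absorb?", "carbon dioxide"),
    ("Q: demo two\nA: y\n\nQ: What gas do plants absorb?", "carbon dioxide"),
    ("Q: demo one\nA: x\n\nQ: What is H2O?", "water"),
]
test_rows = [("Q: demo one\nA: x\n\nQ: What orbits the Earth?", "moon")]

train_converted = dedupe_and_number(train_rows)
last_train_id = max(qid for _, _, qid in train_converted)
test_converted = dedupe_and_number(test_rows, start_query_id=last_train_id + 1)
print(train_converted)
print(test_converted)
```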


## Step 2: Train the FrugalGPT strategy for different budgets

Let us first evaluate individual models.

In [7]:
import pandas as pd

def generate_dataframe(service_names, train_data, test_data, genparams,
                       db_path="db/SCIQ.sqlite", max_workers=2):
    # Initialize an empty list to store the rows for the DataFrame
    data = []
    MyLLMforAll = FrugalGPT.LLMforAll(
        db_path=db_path,
        max_workers=max_workers,
    )
    # Dictionary to keep track of markers for each provider
    provider_marker = {}

    # Iterate through the service names
    for name in service_names:
        # Extract provider and method
        provider = name.split('/')[0]
        method = name.split('/')[-1]

        # If the provider is seen for the first time, initialize its marker
        if provider not in provider_marker:
            provider_marker[provider] = 1
        else:
            provider_marker[provider] += 1
        # Get the completion batch for train and test data
        r1_train = MyLLMforAll.get_completion_batch(queries=train_data, genparams=genparams, service_name=name)
        r2_train = FrugalGPT.compute_score(r1_train)
        r1_test = MyLLMforAll.get_completion_batch(queries=test_data, genparams=genparams, service_name=name)
        r2_test = FrugalGPT.compute_score(r1_test)

        # Extract accuracy and cost
        train_acc = r2_train['em']
        train_cost = r2_train['cost']
        test_acc = r2_test['em']
        test_cost = r2_test['cost']

        # Create a row with the schema
        row = {
            "Test_acc": test_acc,
            "Test_cost": test_cost,
            "Test_size": len(test_data),
            "Train_acc": train_acc,
            "Train_cost": train_cost,
            "Train_size": len(train_data),
            "Budget": 10,
            "Method": method,
            "Provider": provider,
            "Marker": provider_marker[provider],
        }

        # Append the row to the data list
        data.append(row)

    # Create the DataFrame from the data list
    df = pd.DataFrame(data)

    return df
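
The rows above store two numbers per model and split: `em` and `cost`. As a rough mental model, and only as an assumption about what these fields mean, the sketch below computes an exact-match accuracy and an average per-query cost from a list of records. It is not `FrugalGPT.compute_score`; the record fields and the normalization rule are made up for illustration.

```python
def normalize(text):
    # Assumed normalization: trim, lowercase, collapse repeated whitespace.
    return " ".join(text.strip().lower().split())

def score(records):
    """records: list of dicts with 'answer', 'ref_answer' and 'cost' (hypothetical schema)."""
    n = len(records)
    if n == 0:
        raise ValueError("need at least one record")
    hits = sum(normalize(r["answer"]) == normalize(r["ref_answer"]) for r in records)
    total_cost = sum(r["cost"] for r in records)
    # Both metrics are averaged over the number of queries.
    return {"em": hits / n, "cost": total_cost / n}

toy_records = [
    {"answer": "Water", "ref_answer": "water", "cost": 0.001},
    {"answer": "oxygen", "ref_answer": "carbon dioxide", "cost": 0.003},
    {"answer": " moon ", "ref_answer": "moon", "cost": 0.002},
    {"answer": "mitochondria", "ref_answer": "mitochondria", "cost": 0.002},
]
toy_score = score(toy_records)
print(toy_score)
```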

In [8]:
dataname = "SCIQ"
service_names = [
    'ai21/jamba-1.5-large',

    'togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo',
    'togetherai/google/gemma-2-9b-it',

    'google/gemini-1.5-flash',
    'google/gemini-1.5-pro',
    'google/gemini-1.5-flash-8b',

    'openaichat/gpt-4o-2024-05-13',
    'openaichat/gpt-4o-mini',
    'openaichat/gpt-4-turbo',

    'anthropic/claude-3-5-sonnet-20240620',
                 ]
genparams = FrugalGPT.GenerationParameter(max_tokens=50, temperature=0.1, stop=['\n'])

# Load the prompt prefix once; it is shared by the train and test sets
with open(f'config/prompt/{dataname}/prefix_e8.txt') as f:
    prefix = f.read()

test_data = FrugalGPT.loadcsvdata(f"data/{dataname}/test.csv")
test_data = FrugalGPT.formatdata(test_data, prefix)

train_data = FrugalGPT.loadcsvdata(f"data/{dataname}/train.csv")
train_data = FrugalGPT.formatdata(train_data, prefix)

# Upper bound on the number of examples used per split
sample_size = 10000
individualmodel_df = generate_dataframe(service_names,
                                        train_data[0:sample_size], test_data[0:sample_size],
                                        genparams,
                                        db_path=f"db/{dataname}.sqlite",
                                        max_workers=2)
display(individualmodel_df)
individualmodel_df.to_csv(f"summary_{dataname}_e8_2024.csv")



| | Test_acc | Test_cost | Test_size | Train_acc | Train_cost | Train_size | Budget | Method | Provider | Marker |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.730262 | 0.002281 | 5839 | 0.715142 | 0.002285 | 5838 | 10 | jamba-1.5-large | ai21 | 1 |
| 1 | 0.241137 | 0.000959 | 5839 | 0.239466 | 0.00096 | 5838 | 10 | Meta-Llama-3-70B-Instruct-Turbo | togetherai | 1 |
| 2 | 0.753554 | 0.000318 | 5839 | 0.750257 | 0.000319 | 5838 | 10 | gemma-2-9b-it | togetherai | 2 |
| 3 | 0.730262 | 8e-05 | 5839 | 0.713772 | 8e-05 | 5838 | 10 | gemini-1.5-flash | google | 1 |
| 4 | 0.539818 | 0.001337 | 5839 | 0.534601 | 0.00134 | 5838 | 10 | gemini-1.5-pro | google | 2 |
| 5 | 0.436205 | 4e-05 | 5839 | 0.430284 | 4e-05 | 5838 | 10 | gemini-1.5-flash-8b | google | 3 |
| 6 | 0.77479 | 0.005405 | 5839 | 0.77184 | 0.00541 | 5838 | 10 | gpt-4o-2024-05-13 | openaichat | 1 |
| 7 | 0.75578 | 0.000163 | 5839 | 0.739979 | 0.000163 | 5838 | 10 | gpt-4o-mini | openaichat | 2 |
| 8 | 0.778558 | 0.016268 | 5839 | 0.764303 | 0.016284 | 5838 | 10 | gpt-4-turbo | openaichat | 3 |
| 9 | 0.721356 | 0.00365 | 5839 | 0.720281 | 0.003654 | 5838 | 10 | claude-3-5-sonnet-20240620 | anthropic | 1 |
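
One way to read the table above is to ask which individual models are not dominated, that is, for which no other model is both at least as accurate and at most as expensive. The sketch below computes that Pareto frontier from the `Test_acc` and `Test_cost` columns reported above; the helper `pareto_frontier` is a hypothetical name introduced here, not part of FrugalGPT.

```python
# (Method, Test_acc, Test_cost) copied from the results table in this notebook.
RESULTS = [
    ("jamba-1.5-large", 0.730262, 0.002281),
    ("Meta-Llama-3-70B-Instruct-Turbo", 0.241137, 0.000959),
    ("gemma-2-9b-it", 0.753554, 0.000318),
    ("gemini-1.5-flash", 0.730262, 8e-05),
    ("gemini-1.5-pro", 0.539818, 0.001337),
    ("gemini-1.5-flash-8b", 0.436205, 4e-05),
    ("gpt-4o-2024-05-13", 0.77479, 0.005405),
    ("gpt-4o-mini", 0.75578, 0.000163),
    ("gpt-4-turbo", 0.778558, 0.016268),
    ("claude-3-5-sonnet-20240620", 0.721356, 0.00365),
]

def pareto_frontier(results):
    """Return the models that no cheaper-or-equal model matches or beats in accuracy."""
    # Sort by increasing cost; break cost ties by decreasing accuracy.
    ordered = sorted(results, key=lambda r: (r[2], -r[1]))
    frontier = []
    best_acc = float("-inf")
    for name, acc, cost in ordered:
        # Keep a model only if it is strictly more accurate than everything cheaper.
        if acc > best_acc:
            frontier.append(name)
            best_acc = acc
    return frontier

frontier = pareto_frontier(RESULTS)
print(frontier)
```

On these numbers, several mid-priced models are dominated by cheaper ones, which is the gap a budget-aware strategy tries to exploit.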


In [9]:
from google.colab import files
#files.download(f'db/SCIQ.sqlite')

Now let us train FrugalGPT on this dataset.
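
Before training, it may help to recall the idea behind an LLM cascade: query a cheap model first, and escalate to a more expensive one only when a scorer judges the answer unreliable. The following is a toy sketch of that control flow under stated assumptions. The stub models, the scorer, the costs, and the thresholds are all made up, and none of this reflects the internals of `FrugalGPT.LLMCascade`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Stage:
    name: str
    cost: float                    # hypothetical cost per call
    answer: Callable[[str], str]   # stand-in for an LLM API call
    threshold: float               # accept the answer if the score reaches this value

def run_cascade(query: str, stages: List[Stage],
                scorer: Callable[[str, str], float]) -> Tuple[str, str, float]:
    """Try the stages in order; return (answer, stage name, total cost)."""
    if not stages:
        raise ValueError("cascade needs at least one stage")
    total_cost = 0.0
    for i, stage in enumerate(stages):
        total_cost += stage.cost
        candidate = stage.answer(query)
        is_last = i == len(stages) - 1
        # The last stage is always accepted because there is nothing left to try.
        if is_last or scorer(query, candidate) >= stage.threshold:
            return candidate, stage.name, total_cost

# Stub models and scorer, for illustration only.
def cheap_model(q: str) -> str:
    return "water" if "H2O" in q else "unknown"

def strong_model(q: str) -> str:
    return "water" if "H2O" in q else "mitochondria"

def toy_scorer(q: str, a: str) -> float:
    return 0.1 if a == "unknown" else 0.9

stages = [
    Stage("cheap", 1.0, cheap_model, 0.5),
    Stage("strong", 10.0, strong_model, 0.5),
]
easy = run_cascade("What is H2O?", stages, toy_scorer)
hard = run_cascade("Which organelle produces ATP?", stages, toy_scorer)
print(easy)
print(hard)
```

In this toy setup, the easy query stops at the cheap stage, while the hard query pays for both stages. Training a strategy for a budget can be thought of as choosing which models to chain and which thresholds to use.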

In [10]:
import numpy
from tqdm import tqdm

def compute_tradeoffs(
    train_data,
    budget_list,
    name="test",
    service_names=None,
    prefix="",
    skip=0,
    MyCascade=None,
    cascade_depth=3,
    score_test_size=0.55,
):
    # Build the defaults at call time instead of using mutable default arguments
    if service_names is None:
        service_names = [
            'openaichat/gpt-4o-mini',
            'openaichat/gpt-4o',
            'openaichat/gpt-4-turbo',
            'togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo',
            'togetherai/google/gemma-2-9b-it',
        ]
    if MyCascade is None:
        MyCascade = FrugalGPT.LLMCascade(
            score_noise_injection=False,
            db_path="db/SCIQ.sqlite",
        )

    for idx, budget in tqdm(enumerate(budget_list)):
        # Reuse a saved strategy for this budget if one exists
        try:
            MyCascade.load(loadpath=f"strategy/{name}/", budget=budget)
            print("Already trained. Skipped.")
            continue
        except Exception:
            print("cannot find, start new training")
        if idx < skip:
            continue
        # Train the scorer only for the first budget; later budgets skip scorer training
        MyCascade.train(
            train_data,
            budget=budget,
            service_names=service_names,
            no_scorer_train=(idx != 0),
            prefix=prefix,
            cascade_depth=cascade_depth,
            score_test_size=score_test_size,
        )
        MyCascade.save(savepath=f"strategy/{name}/")

In [11]:
start_budget = 0.000085
end_budget = 0.018
num_eval = 20

name = f'{dataname}_Model20241128'
budget_list = numpy.linspace(start_budget, end_budget, num_eval)
budget_list[0] = 0.00025  # replace the first grid point (start_budget) with a larger budget
# load data
dev = FrugalGPT.loadcsvdata(f"data/{dataname}/train.csv")
train_data = FrugalGPT.formatdata(dev,prefix)
MyCascade = FrugalGPT.LLMCascade(
    score_noise_injection=False,
    db_path=f"db/{dataname}.sqlite",
    batch_build=True,
)
budget_list
#MyCascade.load(loadpath=f"strategy/{name}/",budget=0.00017)

array([0.00025   , 0.00102789, 0.00197079, 0.00291368, 0.00385658,
       0.00479947, 0.00574237, 0.00668526, 0.00762816, 0.00857105,
       0.00951395, 0.01045684, 0.01139974, 0.01234263, 0.01328553,
       0.01422842, 0.01517132, 0.01611421, 0.01705711, 0.018     ])
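
The budget grid above is an evenly spaced grid whose first point is then overwritten. The arithmetic can be reproduced without NumPy, as in this small sketch; the local `linspace` function is a simplified stand-in written for illustration and assumes `num >= 2`.

```python
def linspace(start, stop, num):
    # Evenly spaced points from start to stop inclusive (assumes num >= 2).
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

start_budget, end_budget, num_eval = 0.000085, 0.018, 20
grid = linspace(start_budget, end_budget, num_eval)

# Spacing between consecutive budgets: (0.018 - 0.000085) / 19
step = (end_budget - start_budget) / (num_eval - 1)

# As in the notebook, the first grid point is replaced by 0.00025.
grid[0] = 0.00025

print(f"step = {step:.8f}")
print([round(b, 8) for b in grid[:3]])
```

Note that after the override the first gap (0.00025 to about 0.00102789) is smaller than the regular spacing used for the rest of the grid.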

In [12]:
service_names_train = [
    'togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo',

    'google/gemini-1.5-flash',
    'google/gemini-1.5-flash-8b',

    #'openaichat/gpt-4o-mini',
    'openaichat/gpt-4-turbo',

    'anthropic/claude-3-5-sonnet-20240620',
                 ]

compute_tradeoffs(train_data=train_data,
                  budget_list=budget_list,
                  name=name,
                  service_names=service_names_train,
                  prefix=prefix,
                  skip=0, # you can manually skip the first few budgets if they have already been trained.
                  MyCascade=MyCascade,
                  cascade_depth=3,
                  score_test_size=0.55,
                  )

20it [00:25,  1.25s/it]

Already trained. Skipped.   (printed once for each of the 20 budgets)

In [13]:
budget_list = budget_list[::-1]
budget_list

array([0.018     , 0.01705711, 0.01611421, 0.01517132, 0.01422842,
       0.01328553, 0.01234263, 0.01139974, 0.01045684, 0.00951395,
       0.00857105, 0.00762816, 0.00668526, 0.00574237, 0.00479947,
       0.00385658, 0.00291368, 0.00197079, 0.00102789, 0.00025   ])

In [14]:
import shutil
from google.colab import files
folder_to_zip = f'strategy/{name}'
output_zip_file = f'{name}.zip'
shutil.make_archive(output_zip_file.replace('.zip', ''), 'zip', folder_to_zip)
print(f"Folder '{folder_to_zip}' zipped as '{output_zip_file}'.")
files.download(output_zip_file)

Folder 'strategy/SCIQ_Model20241128' zipped as 'SCIQ_Model20241128.zip'.



## Step 3: Evaluate and save the performance

In [15]:
def generate_dataframe_from_cascade(MyCascade,budget_list, train_data, test_data, genparams,name):
    # Initialize an empty list to store the rows for the DataFrame
    data = []

    # Iterate through the budget list
    for budget in tqdm(budget_list):
        # Load the strategy for the given budget
        MyCascade.load(loadpath=f"strategy/{name}/", budget=budget)

        # Get the completion batch for train data
        train_result = MyCascade.get_completion_batch(queries=train_data, genparams=genparams)

        # Compute the ACC and cost for train data
        train_acc_cost = FrugalGPT.compute_score(train_result)


        # Get the completion batch for test data
        test_result = MyCascade.get_completion_batch(queries=test_data, genparams=genparams)

        # Compute the ACC and cost for test data
        test_acc_cost = FrugalGPT.compute_score(test_result)

        # Create a row with the schema
        row = {
            "Test_acc": test_acc_cost['em'],
            "Test_cost": test_acc_cost['cost'],
            "Test_size": len(test_data),
            "Train_acc": train_acc_cost['em'],
            "Train_cost": train_acc_cost['cost'],
            "Train_size": len(train_data),
            "Budget": budget,
            "Method": "FrugalGPT",
            "Provider": "FrugalGPT",
            "Marker": 1,  # Marker is always 1 for this function
        }

        # Append the row to the data list
        data.append(row)
        display(row)

    # Create the DataFrame from the data list
    df = pd.DataFrame(data)

    return df

In [None]:
MyCascade_eval = FrugalGPT.LLMCascade()
MyCascade_eval.prefix = prefix
frugalgpt_df = generate_dataframe_from_cascade(MyCascade_eval,
                                               budget_list, train_data, test_data, genparams,
                                               name)
display(frugalgpt_df)
frugalgpt_df.to_csv(f"summary_{dataname}_e8_frugalgpt_2024.csv")

  0%|          | 0/20 [00:00<?, ?it/s]

Now let us save the results to local disk.

In [None]:
from google.colab import files
import copy
individualmodel_df2 = copy.copy(individualmodel_df)
#individualmodel_df2['Test_cost'] = individualmodel_df2['Test_cost'] * individualmodel_df2['Test_size']
full_pd = pd.concat([frugalgpt_df,individualmodel_df2,])
full_pd.to_csv(f"summary_{dataname}_e8_full_2024.csv")
files.download(f'summary_{dataname}_e8_full_2024.csv')
display(full_pd)
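
Once the combined summary is saved, a natural question is how much cheaper a cascade row is than the cheapest single model that reaches the same accuracy. The sketch below shows one way to compute this from rows in the summary schema. The individual-model rows use the test numbers reported earlier in this notebook, while the cascade row is hypothetical, since the evaluation cell has no recorded output here. The helpers `cheapest_at_least` and `cost_saving` are illustrative names, not FrugalGPT functions.

```python
def cheapest_at_least(rows, target_acc):
    """Cheapest row whose Test_acc is at least target_acc, or None if there is none."""
    eligible = [r for r in rows if r["Test_acc"] >= target_acc]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["Test_cost"])

def cost_saving(cascade_row, individual_rows):
    """Fraction of cost saved relative to the cheapest single model of equal or better accuracy."""
    baseline = cheapest_at_least(individual_rows, cascade_row["Test_acc"])
    if baseline is None:
        return None
    return 1.0 - cascade_row["Test_cost"] / baseline["Test_cost"]

# Test numbers for three individual models, taken from the table in this notebook.
individual_rows = [
    {"Method": "gpt-4o-mini", "Test_acc": 0.75578, "Test_cost": 0.000163},
    {"Method": "gpt-4o-2024-05-13", "Test_acc": 0.77479, "Test_cost": 0.005405},
    {"Method": "gpt-4-turbo", "Test_acc": 0.778558, "Test_cost": 0.016268},
]

# HYPOTHETICAL cascade row, for illustration only (not a measured result).
cascade_row = {"Method": "FrugalGPT", "Test_acc": 0.77, "Test_cost": 0.002}

baseline = cheapest_at_least(individual_rows, cascade_row["Test_acc"])
saving = cost_saving(cascade_row, individual_rows)
print(baseline["Method"], f"{saving:.1%}")
```

If no single model reaches the cascade's accuracy, `cost_saving` returns `None`, because there is no baseline to compare against.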

In [None]:
'''
import time
i=0
while(1):
  print(i)
  i+=1
  time.sleep(60)
'''