# 🎓 FrugalGPT Experiment on OVERRULING: Performance and Cost Tradeoffs

This notebook illustrates the FrugalGPT framework for _building LLM Applications with budget constraints._

In particular, we will focus on evaluating the performance and cost tradeoffs enabled by FrugalGPT.

NB: We strongly recommend running this notebook on accelerated hardware (GPU/TPU).

## Installation

In [1]:
%load_ext autoreload
%autoreload 2
import sys, json, copy
import pandas as pd
import logging
logging.disable(logging.CRITICAL)
sys.path.append("src/")

In [11]:
import importlib
import FrugalGPT  # the module must be imported before it can be reloaded
importlib.reload(FrugalGPT)

<module 'FrugalGPT' from '/home/feiy/My_FrugalGPT/src/FrugalGPT/__init__.py'>

## Setup
Next, let us set up the environment and API keys. You do _not_ need API keys to run the notebook! They are only needed if you want to use FrugalGPT for your own queries.

NB: For your own queries, you do not need every API key either. If you only want to use LLMs from, e.g., OpenAI and AI21, setting up the API keys for those two providers is sufficient.
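As a minimal sketch of the point above, the helper below reports which of the providers you plan to use still lack an API key. The environment variable names and the `PROVIDER_KEY_VARS` mapping are assumptions made for illustration; check your FrugalGPT installation for the names it actually reads.

```python
import os

# Hypothetical mapping from provider prefix to the environment variable
# holding its key. These names are assumptions, not confirmed by the notebook.
PROVIDER_KEY_VARS = {
    "openaichat": "OPENAI_API_KEY",
    "ai21": "AI21_STUDIO_API_KEY",
}

def missing_keys(providers, env=None):
    """Return the providers whose API key variable is unset or empty."""
    env = os.environ if env is None else env
    return [p for p in providers if not env.get(PROVIDER_KEY_VARS[p])]

# Example with an explicit, fake environment so no real key is read.
fake_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(missing_keys(["openaichat", "ai21"], env=fake_env))  # ['ai21']
```

Only the providers passed in are checked, which mirrors the idea that unused providers need no key.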

In [3]:
import os
from IPython.display import display
import FrugalGPT
import numpy
from tqdm import tqdm

supported_LLM = FrugalGPT.getservicename()
print("supported LLMs:",supported_LLM)
supported_LLM_names = [llm.split("/")[1] for llm in supported_LLM]
print("supported_LLM_names:", supported_LLM_names)

supported LLMs: ['google/gemini-1.5-flash-002', 'google/gemini-1.5-pro-002', 'google/gemini-1.0-pro', 'openaichat/gpt-4o-mini', 'openaichat/gpt-4o', 'azure/Phi-3-mini-4k-instruct', 'azure/Phi-3.5-mini-instruct', 'azure/Phi-3-small-8k-instruct', 'azure/Phi-3-medium-4k-instruct', 'deepinfra/llama-3-8B', 'deepinfra/llama-3-70B', 'deepinfra/mixtral-8x7B']
supported_LLM_names: ['gemini-1.5-flash-002', 'gemini-1.5-pro-002', 'gemini-1.0-pro', 'gpt-4o-mini', 'gpt-4o', 'Phi-3-mini-4k-instruct', 'Phi-3.5-mini-instruct', 'Phi-3-small-8k-instruct', 'Phi-3-medium-4k-instruct', 'llama-3-8B', 'llama-3-70B', 'mixtral-8x7B']
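Note that `llm.split("/")[1]` keeps only the second path segment. That is fine for the names listed here, but a service name with more than one slash (such as `togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo`, which appears later as a default argument) would be cut short. A possible alternative, offered as a sketch with the hypothetical helper `model_name`, is to split on the first slash only:

```python
def model_name(service_name):
    """Drop the provider prefix, keeping everything after the first '/'."""
    _provider, _sep, model = service_name.partition("/")
    return model

print(model_name("openaichat/gpt-4o-mini"))
# gpt-4o-mini
print(model_name("togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo"))
# meta-llama/Meta-Llama-3-70B-Instruct-Turbo
```

Whether the dataset columns are named after the full remainder or only part of it is not shown in this notebook, so verify against your CSV header before relying on either form.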


Generating the tradeoffs involves three major steps: (i) prepare the dataset, (ii) train the FrugalGPT strategy, and (iii) evaluate and save the performance.

## Step 1: Prepare the dataset

In [4]:
# dataname = "HEADLINES"
dataname = "OVERRULING"


In [5]:
# read the training split from data/{dataname}/Queried_{dataname}_all_models_clean_train.csv
dataset_df = pd.read_csv(f'data/{dataname}/Queried_{dataname}_all_models_clean_train.csv', header=0)
dataset_df.head()

Unnamed: 0,query_raw,query,ref_answer,gpt-4o-mini,gpt-4o,llama-3-8B,llama-3-70B,mixtral-8x7B,gemini-1.5-flash-002,gemini-1.0-pro,gemini-1.5-pro-002,Phi-3.5-mini-instruct,Phi-3-small-8k-instruct,Phi-3-mini-4k-instruct,Phi-3-medium-4k-instruct
0,Context: to the extent that these cases are in...,Please determine whether a sentence is overrul...,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes
1,Context: we therefore reverse the order denyin...,Please determine whether a sentence is overrul...,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes
2,"Context: see brown v. state,\nQuestion: Is it ...",Please determine whether a sentence is overrul...,no,no,no,no,no,no,no,no,no,no,no,no,no
3,"Context: at the very least, this court ought t...",Please determine whether a sentence is overrul...,no,no,no,no,no,no,no,no,no,yes,no,no,no
4,Context: the federal immigration judge and the...,Please determine whether a sentence is overrul...,yes,no,no,yes,no,no,yes,no,no,yes,no,yes,yes


In [6]:
train_data = []
for index, row in dataset_df.iterrows():
    query = row['query']
    ref_answer = row['ref_answer']
    _id = index
    model_answer = {}
    for model_name in supported_LLM_names:
        model_answer[model_name] = row[model_name]
    train_data.append([query, ref_answer, _id, model_answer])

In [7]:
train_data[3]

['Please determine whether a sentence is overruling a prior decision (Yes or No) in the following statements.\n\nContext: because jones/walker relates only to sufficiency of the evidence, we hereby disavow the language holding otherwise in sandoval.\nQuestion: Is it overruling?\nAnswer: Yes\n\nContext: according to napa auto parts, the straws drove the vehicle """"for approximately six [] weeks and [] for between 500 to 600 miles prior to the accident with no incidents.""""\nQuestion: Is it overruling?\nAnswer: No\n\nContext: at the very least, this court ought to address the problem created by kar because, as this case illustrates, kar is distorting the burden of proof in this important area of the law.\nQuestion: Is it overruling?\nAnswer:',
 'no',
 3,
 {'gemini-1.5-flash-002': 'no',
  'gemini-1.5-pro-002': 'no',
  'gemini-1.0-pro': 'no',
  'gpt-4o-mini': 'no',
  'gpt-4o': 'no',
  'Phi-3-mini-4k-instruct': 'no',
  'Phi-3.5-mini-instruct': 'yes',
  'Phi-3-small-8k-instruct': 'no',
  '

In [8]:
# get the answer of the model llama-3-8B
train_data[3][3]['llama-3-8B']

'no'
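Because each record has the layout `[query, ref_answer, id, model_answer]`, a per-model baseline accuracy can be computed directly from the cached answers. The sketch below uses a two-row toy list rather than the real `train_data`; the helper name `per_model_accuracy` and the case-insensitive comparison are choices made for this example.

```python
from collections import Counter

def per_model_accuracy(records):
    """records: list of [query, ref_answer, id, {model_name: answer}]."""
    correct, total = Counter(), Counter()
    for _query, ref_answer, _id, model_answer in records:
        ref = str(ref_answer).strip().lower()
        for model, answer in model_answer.items():
            total[model] += 1
            correct[model] += int(str(answer).strip().lower() == ref)
    return {model: correct[model] / total[model] for model in total}

# Toy records in the same layout as train_data (model names are placeholders).
toy = [
    ["q0", "yes", 0, {"model-a": "yes", "model-b": "yes"}],
    ["q1", "no", 1, {"model-a": "no", "model-b": "yes"}],
]
print(per_model_accuracy(toy))  # {'model-a': 1.0, 'model-b': 0.5}
```

Calling `per_model_accuracy(train_data)` would give single-model baselines to compare against the cascade.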

## Step 2: Train the FrugalGPT strategy for different budgets

In [9]:
service_names = ['openaichat/gpt-4o-mini',
                'openaichat/gpt-4o',
                'google/gemini-1.5-flash-002',
                'google/gemini-1.5-pro-002',
                'google/gemini-1.0-pro',
                'azure/Phi-3-mini-4k-instruct',
                'azure/Phi-3.5-mini-instruct',
                'azure/Phi-3-small-8k-instruct',
                'azure/Phi-3-medium-4k-instruct',
                'deepinfra/llama-3-8B',
                'deepinfra/llama-3-70B',
                'deepinfra/mixtral-8x7B',
                ]

### 2-1. Now let us train FrugalGPT on this dataset.

In [10]:
genparams=FrugalGPT.GenerationParameter(max_tokens=50, temperature=0.1, stop=['\n'])
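For intuition about what the trained strategy does at inference time, here is a toy sketch of an LLM cascade: models are queried in order, and the first answer whose reliability score clears its threshold is returned. This is not FrugalGPT's implementation; `run_cascade`, the stub models, the scores, and the integer costs are all hypothetical.

```python
def run_cascade(query, stages):
    """stages: list of (name, answer_fn, score_fn, threshold, cost)."""
    if not stages:
        raise ValueError("the cascade needs at least one stage")
    spent = 0
    for name, answer_fn, score_fn, threshold, cost in stages:
        answer = answer_fn(query)
        spent += cost
        # Stop as soon as the scorer judges the answer reliable enough.
        if score_fn(query, answer) >= threshold:
            break
    # If no stage clears its threshold, the last stage's answer is used.
    return answer, name, spent

# Stub "models" and scorers standing in for real LLM calls.
def cheap(query):
    return "no"

def strong(query):
    return "yes" if "overrule" in query else "no"

def cheap_score(query, answer):
    return 0.4 if "overrule" in query else 0.99

def always_ok(query, answer):
    return 1.0

stages = [("cheap", cheap, cheap_score, 0.9, 1),
          ("strong", strong, always_ok, 0.0, 20)]
print(run_cascade("we overrule smith", stages))   # ('yes', 'strong', 21)
print(run_cascade("see brown v. state", stages))  # ('no', 'cheap', 1)
```

Easy queries stop at the cheap stage, so the average cost stays well below that of always calling the strong model.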

In [12]:
def compute_tradeoffs(
    train_data,
    budget_list,
    name = "HEADLINES", # test
    service_names = ['openaichat/gpt-4o-mini',
                      'openaichat/gpt-4o',
                      'openaichat/gpt-4-turbo',
                      'togetherai/meta-llama/Meta-Llama-3-70B-Instruct-Turbo',
                      'togetherai/google/gemma-2-9b-it',
                    ],
    prefix="",
    skip=0,
    MyCascade = FrugalGPT.LLMCascade(
          score_noise_injection=False,
          db_path="db/SCIQ.sqlite",
          ),
    cascade_depth=3,
    ):

  for idx, budget in tqdm(enumerate(budget_list)):
    # Reuse a previously saved strategy for this budget if one exists.
    try:
      MyCascade.load(loadpath=f"strategy/{name}/", budget=budget)
      print("Already trained. Skipped.")
      continue
    except Exception:
      # Catch Exception rather than using a bare except, so that
      # KeyboardInterrupt still stops the loop.
      print("cannot find, start new training")
    if idx < skip:
      continue
    # Train the scorer only for the first budget (idx == 0);
    # later budgets pass no_scorer_train=True.
    result = MyCascade.train(train_data, budget=budget,
                             service_names=service_names,
                             no_scorer_train=(idx != 0),
                             prefix=prefix,
                             cascade_depth=cascade_depth,
                             )
    MyCascade.save(savepath=f"strategy/{name}/")
  return

In [13]:
# start_budget = 5e-05 # 0.0035 
# end_budget = 0.0001
# num_eval = 2
# budget_list = numpy.linspace(start_budget, end_budget, num_eval)

name = f'{dataname}_1125'
budget_list = [0.00001, 0.00005, 0.0001, 0.0005, 0.001] # , 0.0015

MyCascade= FrugalGPT.LLMCascade(
          score_noise_injection=False,
  db_path=f"db/{dataname}.sqlite",
  batch_build=True,
  )
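The hand-written `budget_list` is roughly log-spaced, whereas the commented-out `numpy.linspace` call would space budgets linearly. If you want an evenly log-spaced grid instead, a standard-library sketch looks like the following; `log_spaced` is a hypothetical helper, and treating the budget as a per-query cost is an assumption.

```python
def log_spaced(start, end, num):
    """Return `num` values from `start` to `end`, evenly spaced on a log scale."""
    if num < 2:
        raise ValueError("num must be at least 2")
    if start <= 0 or end <= 0:
        raise ValueError("start and end must be positive")
    ratio = end / start
    return [start * ratio ** (i / (num - 1)) for i in range(num)]

budgets = log_spaced(1e-5, 1e-3, 5)
print([f"{b:.2e}" for b in budgets])
```

A log-spaced grid puts more evaluation points at the cheap end, where the accuracy/cost curve typically changes fastest.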

In [14]:
train_data_sample = train_data[0:] # [0:100]
print(len(train_data_sample))

1728
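The training log further down reports `train and test size 1382 346`. Those two numbers add up to the 1728 rows here, which is consistent with an internal 80/20 split of the training data. The ratio is inferred from the logged sizes, not read from the library, so treat it as an assumption:

```python
total = 1728
train_fraction = 0.8  # assumption inferred from the logged sizes

n_train = int(total * train_fraction)
n_heldout = total - n_train
print(n_train, n_heldout)  # 1382 346
```

If this reading is right, the "test" portion in that log line is a held-out slice of the training CSV, distinct from the separate test CSV loaded in Step 3.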


In [None]:
compute_tradeoffs(train_data=train_data_sample,
                  budget_list=budget_list,
                  name=name,
                  service_names=service_names,
                  skip=0, # you can manually skip the first few budgets if they have already been trained.
                  MyCascade=MyCascade,
                  cascade_depth=3,
                  )

0it [00:00, ?it/s]

cannot find, start new training
train and test size 1382 346


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
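As the warning itself suggests, it can be silenced by setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, to be run near the top of the notebook before the tokenizer is first used:

```python
import os

# Disable tokenizer-level parallelism so forked worker processes
# do not trigger the deadlock warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting it to `"false"` trades some tokenization speed for quiet, fork-safe behavior; `"true"` keeps parallelism on.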

Epoch,Training Loss,Validation Loss,Accuracy
1,0.3267,0.231811,0.962025
2,0.1626,0.161168,0.962025
3,0.1865,0.156454,0.962025
4,0.1556,0.148686,0.963834
5,0.0812,0.153994,0.962025
6,0.0732,0.146168,0.9566
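The epoch with the lowest validation loss is not always the one with the highest accuracy. The sketch below copies the six rows of the table directly above and picks the best epoch under each criterion. Which criterion, if any, the library uses for checkpoint selection is not shown in this notebook.

```python
# (epoch, training_loss, validation_loss, accuracy), copied from the table above
epochs = [
    (1, 0.3267, 0.231811, 0.962025),
    (2, 0.1626, 0.161168, 0.962025),
    (3, 0.1865, 0.156454, 0.962025),
    (4, 0.1556, 0.148686, 0.963834),
    (5, 0.0812, 0.153994, 0.962025),
    (6, 0.0732, 0.146168, 0.9566),
]

best_by_loss = min(epochs, key=lambda row: row[2])
best_by_acc = max(epochs, key=lambda row: row[3])
print(best_by_loss[0], best_by_acc[0])  # 6 4
```

Here epoch 6 has the lowest validation loss while epoch 4 has the highest accuracy, so the two criteria disagree.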


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3889,0.277256,0.9566
2,0.1583,0.178027,0.9566
3,0.251,0.178351,0.9566
4,0.118,0.185697,0.9566
5,0.0915,0.202329,0.9566
6,0.1149,0.162677,0.9566


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3732,0.280314,0.954792
2,0.207,0.184101,0.954792
3,0.27,0.188056,0.954792
4,0.0706,0.19656,0.954792
5,0.2047,0.196053,0.954792
6,0.1481,0.179789,0.954792


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3823,0.290204,0.947559
2,0.1691,0.223408,0.947559
3,0.2935,0.208478,0.947559
4,0.0691,0.223197,0.947559
5,0.1979,0.207151,0.947559
6,0.1932,0.190727,0.952984


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3742,0.28383,0.952984
2,0.1784,0.190928,0.952984
3,0.297,0.210642,0.952984
4,0.0762,0.19829,0.952984
5,0.2853,0.189767,0.952984
6,0.1657,0.197102,0.952984


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3982,0.321093,0.929476
2,0.2999,0.262154,0.929476
3,0.341,0.277192,0.929476
4,0.219,0.255981,0.929476
5,0.2949,0.243736,0.929476
6,0.2402,0.252384,0.929476


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3949,0.304569,0.942134
2,0.234,0.231565,0.942134
3,0.3116,0.226528,0.942134
4,0.2682,0.223304,0.942134
5,0.2723,0.219612,0.942134
6,0.245,0.220953,0.942134


Epoch,Training Loss,Validation Loss,Accuracy
1,0.4138,0.319208,0.929476
2,0.1985,0.259654,0.929476
3,0.2801,0.257296,0.929476
4,0.2644,0.251981,0.929476
5,0.1839,0.25841,0.929476
6,0.2667,0.234862,0.933092


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3864,0.28607,0.952984
2,0.2033,0.190354,0.952984
3,0.2401,0.189822,0.952984
4,0.1393,0.189512,0.952984
5,0.2145,0.184257,0.952984
6,0.1878,0.182618,0.952984


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3744,0.283992,0.952984
2,0.1839,0.191729,0.952984
3,0.2598,0.189552,0.952984
4,0.0707,0.198991,0.952984
5,0.2429,0.189486,0.952984
6,0.219,0.200949,0.952984


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3785,0.287276,0.951175
2,0.1582,0.195256,0.951175
3,0.2801,0.194722,0.951175
4,0.1184,0.197726,0.951175
5,0.2043,0.184361,0.951175
6,0.1665,0.16484,0.951175


Epoch,Training Loss,Validation Loss,Accuracy
1,0.3701,0.292052,0.947559
2,0.2171,0.205687,0.947559
3,0.2578,0.211337,0.947559
4,0.1118,0.219037,0.947559
5,0.2305,0.202005,0.947559
6,0.1649,0.198796,0.947559




first train




finish training!


1it [19:21, 1161.47s/it]

cannot find, start new training
train and test size 1382 346
scores {'openaichat/gpt-4o-mini': {182: 0.9966456, 825: 0.9964353, 1240: 0.9913633, 505: 0.9891929, 1482: 0.99687624, 1549: 0.9892406, 842: 0.9928832, 1605: 0.9941712, 386: 0.9895188, 1622: 0.99058443, 1182: 0.9965084, 1212: 0.99715734, 487: 0.9967289, 148: 0.9917401, 661: 0.9966461, 950: 0.99700123, 393: 0.9962669, 1056: 0.9966329, 259: 0.99258065, 1415: 0.9916762, 523: 0.9968368, 353: 0.99689263, 1391: 0.99710554, 874: 0.99658966, 12: 0.99646336, 1700: 0.9903628, 527: 0.98659736, 496: 0.9911186, 483: 0.92773587, 365: 0.99653745, 674: 0.99253947, 93: 0.99381196, 729: 0.9899939, 370: 0.99668044, 714: 0.9937198, 898: 0.9917321, 1344: 0.9965552, 831: 0.9964833, 1723: 0.99044526, 203: 0.9913645, 345: 0.9896756, 67: 0.99201137, 1383: 0.99082446, 319: 0.99067163, 852: 0.98886955, 1311: 0.9857268, 45: 0.9943699, 582: 0.9907488, 791: 0.991375, 781: 0.9969313, 221: 0.9674942, 316: 0.9731549, 857: 0.88231444, 578: 0.99684614, 137: 0.9
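The logged `scores` dictionary maps each service to per-query reliability scores. As a rough illustration of how such scores could drive a cascade, the sketch below takes four of the `gpt-4o-mini` values from the line above and checks which queries would be accepted at a given cutoff. The threshold of 0.95 is hypothetical; the thresholds FrugalGPT actually learns are not shown in this output.

```python
# A few (query id -> score) pairs copied from the log line above.
logged_scores = {182: 0.9966456, 483: 0.92773587, 221: 0.9674942, 857: 0.88231444}
threshold = 0.95  # hypothetical cutoff, for illustration only

accepted = sorted(qid for qid, score in logged_scores.items() if score >= threshold)
acceptance_rate = len(accepted) / len(logged_scores)
print(accepted, acceptance_rate)  # [182, 221] 0.5
```

Queries below the cutoff (483 and 857 here) are the ones a cascade would forward to the next, more expensive model.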



## Step 3: Evaluate and save the performance

In [14]:
# read the test split from data/{dataname}/Queried_{dataname}_all_models_clean_test.csv
dataset_df_test = pd.read_csv(f'data/{dataname}/Queried_{dataname}_all_models_clean_test.csv', header=0)
dataset_df_test.head()

Unnamed: 0,query_raw,query,ref_answer,gpt-4o-mini,gpt-4o,llama-3-8B,llama-3-70B,mixtral-8x7B,gemini-1.5-flash-002,gemini-1.0-pro,gemini-1.5-pro-002,Phi-3.5-mini-instruct,Phi-3-small-8k-instruct,Phi-3-mini-4k-instruct,Phi-3-medium-4k-instruct
0,Context: we disapprove orange county v. sealy ...,Please determine whether a sentence is overrul...,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes
1,"Context: he also left the scene of the crime, ...",Please determine whether a sentence is overrul...,no,no,no,no,no,no,no,no,no,no,no,no,no
2,Context: contrary statements in our opinions a...,Please determine whether a sentence is overrul...,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,yes,no,yes
3,"Context: """"[a] prima facie case of good faith ...",Please determine whether a sentence is overrul...,no,no,no,no,no,no,no,no,no,no,no,no,no
4,"Context: as an intermediate appellate court, w...",Please determine whether a sentence is overrul...,no,no,no,no,no,no,no,no,no,no,no,no,no


In [15]:
test_data = []
for index, row in dataset_df_test.iterrows():
    query = row['query']
    ref_answer = row['ref_answer']
    _id = index
    model_answer = {}
    for model_name in supported_LLM_names:
        model_answer[model_name] = row[model_name]
    test_data.append([query, ref_answer, _id, model_answer])

In [16]:
test_data[3]

['Please determine whether a sentence is overruling a prior decision (Yes or No) in the following statements.\n\nContext: because jones/walker relates only to sufficiency of the evidence, we hereby disavow the language holding otherwise in sandoval.\nQuestion: Is it overruling?\nAnswer: Yes\n\nContext: according to napa auto parts, the straws drove the vehicle """"for approximately six [] weeks and [] for between 500 to 600 miles prior to the accident with no incidents.""""\nQuestion: Is it overruling?\nAnswer: No\n\nContext: ""[a] prima facie case of good faith purpose is achieved by the mere allegation . . . that the information sought is for a proper purpose.""\nQuestion: Is it overruling?\nAnswer:',
 'no',
 3,
 {'gemini-1.5-flash-002': 'no',
  'gemini-1.5-pro-002': 'no',
  'gemini-1.0-pro': 'no',
  'gpt-4o-mini': 'no',
  'gpt-4o': 'no',
  'Phi-3-mini-4k-instruct': 'no',
  'Phi-3.5-mini-instruct': 'no',
  'Phi-3-small-8k-instruct': 'no',
  'Phi-3-medium-4k-instruct': 'no',
  'llama-

In [17]:
# get the answer of the model llama-3-8B
test_data[3][3]['llama-3-8B']

'no'

In [18]:
print(len(test_data))

432


In [None]:
def generate_dataframe_from_cascade(MyCascade,budget_list, train_data, test_data, genparams,name):
    # Initialize an empty list to store the rows for the DataFrame
    data = []

    # Iterate through the budget list
    for budget in tqdm(budget_list):
        # Load the strategy for the given budget
        MyCascade.load(loadpath=f"strategy/{name}/", budget=budget)
        print("loaded from path:",f"strategy/{name}/")
        print("now the budget is:",budget)

        # # Get the completion batch for train data
        # print("start train data")
        # train_result = MyCascade.get_completion_batch(queries=train_data, genparams=genparams)
        # print("train_result:",train_result)
        # # Compute the ACC and cost for train data
        # train_acc_cost = FrugalGPT.compute_score(train_result)

        # Get the completion batch for test data
        test_result = MyCascade.get_completion_batch(queries=test_data, genparams=genparams)
        print("cost", test_result['cost'])

        # Compute the ACC and cost for test data
        # test_acc_cost = FrugalGPT.compute_score(test_result)

        # Create a row with the schema
        row = {
            # "Test_acc": test_acc_cost['em'],
            # "Test_cost": test_acc_cost['cost'],
            "Test_cost": test_result['cost'],
            "Test_size": len(test_data),
            # "Train_acc": train_acc_cost['em'],
            # "Train_cost": train_acc_cost['cost'],
            "Train_size": len(train_data),
            "Budget": budget,
            "Method": "FrugalGPT",
            "Provider": "FrugalGPT",
            "Marker": 1,  # Marker is always 1 for this function
        }

        # Append the row to the data list
        data.append(row)
        display(row)

    # Create the DataFrame from the data list
    df = pd.DataFrame(data)

    return df
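The function above records only the cost because the `FrugalGPT.compute_score` call is commented out. If you need an accuracy figure as well, an exact-match score and an average cost can be computed by hand. The sketch below is an assumption-laden stand-in: `summarize` is a hypothetical helper, and the per-query layout with `answer`, `ref_answer`, and `cost` keys is not the structure returned by `get_completion_batch`.

```python
def summarize(results):
    """results: list of dicts with 'answer', 'ref_answer' and 'cost' keys
    (a hypothetical layout chosen for this example)."""
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(
        r["answer"].strip().lower() == r["ref_answer"].strip().lower()
        for r in results
    )
    total_cost = sum(r["cost"] for r in results)
    return {"em": hits / len(results), "cost": total_cost / len(results)}

toy_results = [
    {"answer": "Yes", "ref_answer": "yes", "cost": 0.5},
    {"answer": "no", "ref_answer": "no", "cost": 0.25},
    {"answer": "no", "ref_answer": "yes", "cost": 0.125},
    {"answer": "yes ", "ref_answer": "yes", "cost": 0.125},
]
print(summarize(toy_results))  # {'em': 0.75, 'cost': 0.25}
```

The keys `em` and `cost` mirror the ones read from `compute_score` in the commented-out lines, so the row schema would stay the same.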

In [None]:
data = test_data
llm_vanilla = FrugalGPT.llmvanilla.LLMVanilla()  # create an instance of the LLMVanilla class

for i in range(len(data)):
    # Use `service_name` as the loop variable so that the global `name`
    # (the strategy folder used by the evaluation cell) is not overwritten.
    for service_name in service_names:
        query = data[i][0]
        cost = llm_vanilla.compute_cost(input_text=query, output_text="no", service_name=service_name)
        print("data index is: ", data[i][2], "and cost for", service_name, " is: ", cost)
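Per-query costs of this kind are typically a function of the prompt length, the generation length, and an optional fixed fee. The sketch below shows that arithmetic with made-up prices. It does not reproduce `compute_cost`; the function `estimate_cost`, its parameters, and the prices are hypothetical.

```python
def estimate_cost(n_input_tokens, n_output_tokens,
                  price_in_per_1k, price_out_per_1k, fixed_per_query=0.0):
    """Cost = input tokens * input price + output tokens * output price + fixed fee."""
    return (n_input_tokens / 1000 * price_in_per_1k
            + n_output_tokens / 1000 * price_out_per_1k
            + fixed_per_query)

# Made-up prices per 1,000 tokens, chosen so the arithmetic is easy to follow.
print(estimate_cost(2000, 1000, price_in_per_1k=0.5, price_out_per_1k=1.5))  # 2.5
```

For this dataset the output is a single short token ("yes" or "no"), so the prompt length dominates the cost of each call.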

In [20]:
MyCascade_eval = FrugalGPT.LLMCascade()
# MyCascade_eval.prefix = prefix

frugalgpt_df = generate_dataframe_from_cascade(MyCascade_eval,
                                               budget_list, train_data, test_data, genparams,
                                               name)
display(frugalgpt_df)
frugalgpt_df.to_csv(f"summary/summary_{dataname}_e8_frugalgpt_2024.csv")

  0%|                                                            | 0/5 [00:03<?, ?it/s]


OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 23.68 GiB of which 7.25 MiB is free. Including non-PyTorch memory, this process has 13.52 GiB memory in use. Process 1706292 has 10.14 GiB memory in use. Of the allocated memory 11.78 GiB is allocated by PyTorch, and 520.09 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
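The error message points to two things worth checking. First, another process (1706292) already holds 10.14 GiB on the same GPU, so this process's 13.52 GiB plus that one nearly fills the 23.68 GiB card; freeing that process is likely the more effective fix. Second, the message suggests `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to reduce fragmentation. A minimal sketch of both observations, assuming the variable is set before PyTorch initializes CUDA (setting it before `import torch` is the safe choice):

```python
import os

# Suggested by the error message itself; only helps with fragmentation,
# not with memory held by another process.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

# Memory figures (GiB) copied from the error message above.
this_process, other_process, total = 13.52, 10.14, 23.68
in_use = round(this_process + other_process, 2)
print(in_use, "of", total, "GiB in use")  # 23.66 of 23.68 GiB in use
```

Restarting the kernel after training and before evaluation is another way to release memory that the training cells are still holding.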

The cell above also saves the results to local disk (`summary/summary_{dataname}_e8_frugalgpt_2024.csv`). Now let us display them.

In [None]:
display(frugalgpt_df)