Conversational Automation for Enterprises
enterprisebot.ai
These instructions will get you a copy of the BASIC benchmarking tool up and running on your local machine for evaluation purposes.
For more information on how the benchmarking tool works, refer to the documentation page.
Clone the repository to your local machine. Install the required libraries using the following command:
pip install -r requirements.txt
Create a `.env` file to store your API keys. Add the following lines to the `.env` file:
OPENAI_API_KEY=<your_openai_api_key>
ANTHROPIC_API_KEY=<your_anthropic_api_key>
GOOGLE_API_KEY=<your_google_api_key>
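These keys are typically read from the environment at startup. The sketch below shows one way to load them, assuming the `python-dotenv` package is installed; the actual loading code in `basic.py` may differ.

```python
# Minimal sketch of loading the .env keys (assumes python-dotenv is installed:
# pip install python-dotenv). How basic.py actually loads them may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment

openai_key = os.environ["OPENAI_API_KEY"]
anthropic_key = os.environ["ANTHROPIC_API_KEY"]
google_key = os.environ["GOOGLE_API_KEY"]
```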
You can run the project using the following command:
python basic.py <model>
You can replace `<model>` with the name of the model you want to evaluate. The available models are:
- claude-3-opus-20240229
- gpt-4-1106-preview
- gpt-3.5-turbo-0125
- gpt-4
If you want to evaluate all available models, you can run the project using the following command:
python basic.py
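Internally, this amounts to selecting either the single model named on the command line or every entry in `available_models`. The snippet below is a simplified sketch of that selection logic, not the actual code in `basic.py`:

```python
import sys

# Same model list as available_models in basic.py.
available_models = [
    "claude-3-opus-20240229",
    "gpt-4-1106-preview",
    "gpt-3.5-turbo-0125",
    "gpt-4",
]

# Benchmark the model passed on the command line, or all models if none is given.
models = [sys.argv[1]] if len(sys.argv) > 1 else available_models

for model in models:
    print(f"Evaluating {model} ...")  # placeholder for the actual benchmark run
```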
You can run the benchmark on your own datasets by adding the dataset to the `dataset` folder. The dataset should be a `.csv` file with each line containing a question, answer, and context, in that order. You will then be prompted to choose the dataset you want to use when running the benchmark.
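As an illustration, a small dataset in the expected question, answer, context order can be written with Python's built-in `csv` module (the file name is arbitrary, and this sketch assumes no header row is required):

```python
import csv

# One example row in the expected order: question, answer, context.
rows = [
    (
        "What is the capital of France?",
        "Paris",
        "France is a country in Western Europe. Its capital city is Paris.",
    ),
]

# Write the dataset into the dataset folder so the benchmark can find it.
with open("dataset/my_dataset.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```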
To add a new model, add the model name to the `available_models` array in the `basic.py` file:
available_models = ["claude-3-opus-20240229", "gpt-4-1106-preview", "gpt-3.5-turbo-0125", "gpt-4"]
You will also need to add the model to the `calculateModelCost` function, which should return the cost of a run based on the AI provider's pricing:
def calculateModelCost(model, token_usage):
    # Per-token rates in USD, based on each provider's published pricing.
    if model == "gpt-4-0125-preview" or model == "gpt-4-1106-preview":
        cost = token_usage * 0.00003
    elif model == "gpt-4":
        cost = token_usage * 0.00006
    elif model == "gpt-3.5-turbo-0125":
        cost = token_usage * 0.0000015
    elif model == "claude-3-opus-20240229":
        cost = token_usage * 0.000075
    elif model == "<new_model>":
        cost = token_usage * <new_price>
    return cost
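For example, with the rates above a run that consumes 10,000 tokens on gpt-4 would be costed at 10,000 × 0.00006 = $0.60, while the same usage on gpt-3.5-turbo-0125 would come to $0.015. Note that, as written, the function prices a single combined token count and does not distinguish input from output tokens.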
Results are added to the `/results` folder. You can view our results in the `/001-llm-benchmark-results` folder.