Follow these instructions to set up the BASIC benchmarking tool on your local machine to evaluate LLMs on key metrics like accuracy, contextual understanding, compliance, consistency, and performance.
For more information on how the benchmarking tool works, refer to the documentation page.
Clone the repository to your local machine.
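For example (the repository URL and folder name below are placeholders):
git clone <repository_url>
cd <repository_folder>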
Install the required libraries using the following command:
pip install -r requirements.txt
Create a .env file to store your API keys. Add the following lines to the .env file:
OPENAI_API_KEY=<your_openai_api_key>
ANTHROPIC_API_KEY=<your_anthropic_api_key>
GOOGLE_API_KEY=<your_google_api_key>
Run the project using the following command:
python basic.py <model>
Replace <model> with the name of the model you want to evaluate, as shown in the example after the list. The available models are:
- claude-3-opus-20240229
- gpt-4-1106-preview
- gpt-3.5-turbo-0125
- gpt-4
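For example, to evaluate Claude 3 Opus only:
python basic.py claude-3-opus-20240229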
To evaluate all available models, run the project using the following command:
python basic.py
You can run the benchmark using your own datasets by adding the dataset to the dataset folder. The dataset should be a .csv file, with each line containing a question, answer, and context, in that order.
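A minimal sketch of such a file (the questions, answers, and context values here are hypothetical):
What is the refund window?,Refunds are accepted within 30 days of purchase.,Our refund policy allows returns within 30 days of delivery.
How do I reset my password?,Use the Forgot password link on the login page.,Passwords can be reset from the login page via the Forgot password link.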
Run the benchmark with your dataset using the following command:
python basic.py <dataset_name>
To run the benchmark with a specific model and dataset:
python basic.py <model> <dataset_name>
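For example, assuming the dataset name is passed as the file name and a hypothetical support_faq.csv has been added to the dataset folder:
python basic.py gpt-4 support_faq.csv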
Add a new model to the available_models list in the basic.py file; each entry is the model name.
available_models = ["claude-3-opus-20240229", "gpt-4-1106-preview", "gpt-3.5-turbo-0125", "gpt-4"]
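For example, with the new model's name appended as a placeholder:
available_models = ["claude-3-opus-20240229", "gpt-4-1106-preview", "gpt-3.5-turbo-0125", "gpt-4", "<new_model>"]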
You also need to add the model to the calculateModelCost function. The function should return the cost of the model based on the AI provider's pricing.
def calculateModelCost(model, token_usage):
    # Cost = total tokens used * per-token price for the given model
    if model == "gpt-4-0125-preview" or model == "gpt-4-1106-preview":
        cost = token_usage * 0.00003
    elif model == "gpt-4":
        cost = token_usage * 0.00006
    elif model == "gpt-3.5-turbo-0125":
        cost = token_usage * 0.0000015
    elif model == "claude-3-opus-20240229":
        cost = token_usage * 0.000075
    elif model == "<new_model>":
        # Replace <new_model> and <new_price> with your model's name and per-token price
        cost = token_usage * <new_price>
    return cost
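For example, assuming the rates above are per-token prices in USD, a gpt-4 run that consumed 1,000 tokens would cost:
calculateModelCost("gpt-4", 1000)  # 1000 * 0.00006 = 0.06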
Results are added to the /results folder. You can view our results in the /001-llm-benchmark-results folder.