
Enterprise Bot

Conversational Automation for Enterprises
enterprisebot.ai

Benchmarking Enterprise AI


Getting Started

These instructions will get you a copy of the BASIC benchmarking tool up and running on your local machine for evaluation purposes.

How it works

For more information on how the benchmarking tool works, refer to the documentation page.

Installing

Clone the repository to your local machine.
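
The clone command below assumes you are working from the public GitHub repository over HTTPS (the URL is inferred from the repository path); adjust it if you use SSH or a fork:

git clone https://github.com/enterprisebot-community/BASIC-genai-benchmark.git
cd BASIC-genai-benchmark

Then install the required libraries using the following command: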

pip install -r requirements.txt

Create a .env file to store your API keys. Add the following lines to the .env file:

OPENAI_API_KEY=<your_openai_api_key>
ANTHROPIC_API_KEY=<your_anthropic_api_key>
GOOGLE_API_KEY=<your_google_api_key>
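
The benchmark reads these keys from the environment at runtime. As a minimal sketch of how that typically works, assuming the project loads the file with python-dotenv (check basic.py for the exact mechanism used here):

import os
from dotenv import load_dotenv

# Load the variables defined in .env into the process environment.
load_dotenv()

# Fail early if any provider key is missing.
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
	if not os.getenv(key):
		raise RuntimeError(f"{key} is missing from the environment/.env file")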

Running the benchmark

You can run the benchmark using the following command:

python basic.py <model>

You can replace <model> with the name of the model you want to evaluate. The available models are:

  • claude-3-opus-20240229
  • gpt-4-1106-preview
  • gpt-3.5-turbo-0125
  • gpt-4
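
For example, to evaluate only Claude 3 Opus:

python basic.py claude-3-opus-20240229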

If you want to evaluate all available models, run the benchmark without specifying a model:

python basic.py

Running using custom datasets

You can run the benchmark on your own datasets by adding them to the dataset folder. A dataset should be a .csv file in which each line contains a question, an answer, and a context, in that order. When you run the benchmark, you will be prompted to choose which dataset to use.
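
As an illustration, a custom dataset might look like the following (the file name and rows are invented for illustration; quote any field that contains a comma):

dataset/refund_policy.csv:

What is the refund window?,Customers can request a refund within 30 days of purchase.,"Our refund policy allows returns within 30 days, provided the item is unused."
Who do I contact for support?,Email support@example.com.,Support requests are handled by the customer care team via email.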

Adding new models

To add a new model, add it to the available_models array in the basic.py file. Each entry is the model name as expected by the provider's API.

available_models = ["claude-3-opus-20240229", "gpt-4-1106-preview", "gpt-3.5-turbo-0125", "gpt-4"]

You will also need to add the model to the calculateModelCost function. The function should return the cost of the run based on the AI provider's pricing.

def calculateModelCost(model, token_usage):
	# Per-token rates based on each provider's published pricing.
	if model == "gpt-4-0125-preview" or model == "gpt-4-1106-preview":
		cost = token_usage * 0.00003
	elif model == "gpt-4":
		cost = token_usage * 0.00006
	elif model == "gpt-3.5-turbo-0125":
		cost = token_usage * 0.0000015
	elif model == "claude-3-opus-20240229":
		cost = token_usage * 0.000075
	elif model == "<new_model>":
		cost = token_usage * <new_price>
	return cost
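
As a quick sanity check of the units, the rates are expressed per token, so costing 10,000 tokens for gpt-4 would look like:

calculateModelCost("gpt-4", 10000)  # 10000 * 0.00006 = 0.60 USD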

Results

Results are added to the /results folder. You can view our results in the /001-llm-benchmark-results folder.
