Follow these instructions to set up the BASIC benchmarking tool on your local machine to evaluate LLMs on key metrics like accuracy, contextual understanding, compliance, consistency, and performance.
For more information on how the benchmarking tool works, refer to the documentation page.
Clone the repository to your local machine.
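For example (the repository URL and folder name below are placeholders):
git clone <repository_url>
cd <repository_folder>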
Install the required libraries using the following command:
pip install -r requirements.txt
Create a .env file to store your API keys. Add the following lines to the .env file:
OPENAI_API_KEY=<your_openai_api_key>
ANTHROPIC_API_KEY=<your_anthropic_api_key>
GOOGLE_API_KEY=<your_google_api_key>
Run the project using the following command:
python basic.py <model>
Replace <model> with the name of the model you want to evaluate, as shown in the example after the list. The available models are:
- claude-3-opus-20240229
- gpt-4-1106-preview
- gpt-3.5-turbo-0125
- gpt-4
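For example, to evaluate Claude 3 Opus only:
python basic.py claude-3-opus-20240229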
To evaluate all available models, run the project using the following command:
python basic.py
You can run the benchmark using your own datasets by adding the dataset to the dataset folder. The dataset should be a .csv file, with each line containing a question, answer, and context, in that order.
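A minimal sketch of such a file (the questions, answers, and context values here are hypothetical):
What is the refund window?,Refunds are accepted within 30 days of purchase.,Our refund policy allows returns within 30 days of delivery.
How do I reset my password?,Use the Forgot password link on the login page.,Passwords can be reset from the login page via the Forgot password link.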
Run the benchmark with your dataset using the following command:
python basic.py <dataset_name>
To run the benchmark with a specific model and dataset:
python basic.py <model> <dataset_name>
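For example, assuming the dataset name is passed as the file name and a hypothetical support_faq.csv has been added to the dataset folder:
python basic.py gpt-4 support_faq.csv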
Add a new model to the available_models list in the basic.py file; each entry is the model name.
available_models = ["claude-3-opus-20240229", "gpt-4-1106-preview", "gpt-3.5-turbo-0125", "gpt-4"]
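For example, with the new model's name appended as a placeholder:
available_models = ["claude-3-opus-20240229", "gpt-4-1106-preview", "gpt-3.5-turbo-0125", "gpt-4", "<new_model>"]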
You also need to add the model to the calculateModelCost function. The function should return the cost of the model based on the AI provider's pricing.
def calculateModelCost(model, token_usage):
    # Cost = total tokens used * per-token price for the given model
    if model == "gpt-4-0125-preview" or model == "gpt-4-1106-preview":
        cost = token_usage * 0.00003
    elif model == "gpt-4":
        cost = token_usage * 0.00006
    elif model == "gpt-3.5-turbo-0125":
        cost = token_usage * 0.0000015
    elif model == "claude-3-opus-20240229":
        cost = token_usage * 0.000075
    elif model == "<new_model>":
        # Replace <new_model> and <new_price> with your model's name and per-token price
        cost = token_usage * <new_price>
    return cost
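For example, assuming the rates above are per-token prices in USD, a gpt-4 run that consumed 1,000 tokens would cost:
calculateModelCost("gpt-4", 1000)  # 1000 * 0.00006 = 0.06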
Results are added to the /results folder. You can view our results in the /001-llm-benchmark-results folder.