This repository contains a web-based application for evaluating Large Language Models (LLMs). It lets you:
- Generate test questions on a given topic.
- Provide your own custom questions.
- Customize the evaluation prompt.
- View evaluation results and average scores.
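The evaluation flow implied by the features above can be sketched roughly as follows. This is an illustrative outline, not the repository's actual API: the function names, the grading heuristic, and the default prompt are all placeholders, and a real grader would call an LLM with the evaluation prompt.

```python
# Hypothetical sketch of an evaluation loop: grade each answer against an
# evaluation prompt, collect per-question scores, and report the average.
from statistics import mean


def grade_answer(question: str, answer: str, eval_prompt: str) -> float:
    """Stub grader; a real implementation would send eval_prompt plus the
    question/answer pair to an LLM and parse a numeric score from its reply."""
    # Toy stand-in heuristic: non-empty answers score 1.0, empty ones 0.0.
    return 1.0 if answer.strip() else 0.0


def evaluate(questions, answers, eval_prompt="Score this answer from 0 to 1."):
    scores = [grade_answer(q, a, eval_prompt) for q, a in zip(questions, answers)]
    return {"scores": scores, "average": mean(scores)}


result = evaluate(["What is 2+2?", "Name a prime."], ["4", ""])
print(result["average"])  # → 0.5
```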
- Clone the repository:

  ```bash
  git clone https://github.com/BurnyCoder/llm-evals.git
  cd llm-evals
  ```

- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  venv\Scripts\activate       # On Windows
  # source venv/bin/activate  # On macOS/Linux
  pip install -r requirements.txt
  ```
- Create a `.env` file in the root directory and add your OpenAI API key:

  ```
  OPENAI_API_KEY=your_api_key_here
  ```
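As an aside, a `.env` entry like the one above can be read with the standard library alone. Many projects (possibly including this one) use the `python-dotenv` package instead; the parser below is only an illustrative sketch of what that loading step does.

```python
# Minimal sketch of loading KEY=value pairs from a .env file into the
# process environment, using only the standard library.
import os


def load_env(path=".env"):
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                # Skip blank lines, comments, and lines without '='.
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # No .env file present; leave the environment unchanged.
    os.environ.update(values)
    return values

# After load_env() runs, the key is available via
# os.environ.get("OPENAI_API_KEY").
```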
- Run the Flask application:

  ```bash
  python app.py
  ```

- Open your web browser and go to http://127.0.0.1:5000.