
eval

Important! A newer version of the evaluation program is under development. Please refer to evaluation-on-financebench for more information.

How to run the evaluation program

First, install the required packages by running the following command:

% pip install -r dev-requirements.txt

We assume the environment variables or the .env file are set up correctly. If not, please refer to the main README file.
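
For illustration only, a .env file holds one key-value pair per line. The variable below is a common convention for OpenAI-backed setups and is an assumption here, so check the main README for the actual variables required:

  # Hypothetical example -- the real variable names are listed in the main README.
  OPENAI_API_KEY=sk-...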

To get an example configuration file, run the following command:

# to save the file in the current directory
% python eval.py -p > eval.conf.json

Check eval.conf.json and change the input files and sample questions as needed.
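
For reference, a minimal eval.conf.json might look like the sketch below. The input_files and sample_questions property names come from the generated file; the path and question are placeholder values, and the generated file may contain additional properties:

  {
    "input_files": ["path/to/docs/"],
    "sample_questions": ["What was the total revenue in 2023?"]
  }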

To run the evaluation program, run the following command:

% python eval.py -f eval.conf.json

By default, the evaluation program will

  • ingest the files specified in the input_files property (directories can be listed as well)
  • run the queries specified in the sample_questions property against the ingested data as a single knowledge base
  • evaluate the query output with the RAGAS framework and report the metrics, which currently default to context_recall, faithfulness, and factual_correctness (a minimal sketch of this step follows the list)
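
For context, the evaluation step is conceptually similar to the sketch below. This is a minimal illustration, assuming the ragas and datasets packages and the 0.1-style RAGAS API; the question, answer, and contexts are placeholder values, and since the availability of factual_correctness depends on the installed RAGAS version, only context_recall and faithfulness are shown:

  from datasets import Dataset
  from ragas import evaluate
  from ragas.metrics import context_recall, faithfulness

  # One evaluation record: the query, the generated answer, the retrieved
  # contexts, and the reference answer (all placeholder values).
  records = {
      "question": ["What was the total revenue in 2023?"],
      "answer": ["Total revenue in 2023 was $10M."],
      "contexts": [["The company reported total revenue of $10M in 2023."]],
      "ground_truth": ["Total revenue in 2023 was $10M."],
  }

  result = evaluate(Dataset.from_dict(records), metrics=[context_recall, faithfulness])
  print(result)  # e.g. {'context_recall': 1.0, 'faithfulness': 1.0}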

How to run the standalone pipeline

To run the standalone pipeline, run the following commands under the root leettools directory:

  1. create a new eval.conf.json file with the following command:
% python -m eval.eval-v2 -c eval.conf.json
  2. run the standalone conversion pipeline:
% python -m eval.eval-v2 -i convert
  3. run the standalone chunker and embedder pipeline:
% python -m eval.eval-v2 -i embed
  4. run the evaluation after ingestion:
# If you skip the chunker and embedder step, the evaluation still runs but all metrics will be zero; otherwise, the metric values should all be close to 1.
% python -m eval.eval-v2 -e
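
Put together, a complete standalone run from the leettools root is the same four commands in order:

% python -m eval.eval-v2 -c eval.conf.json   # generate the configuration file
% python -m eval.eval-v2 -i convert          # run the conversion pipeline
% python -m eval.eval-v2 -i embed            # run the chunker and embedder
% python -m eval.eval-v2 -e                  # evaluate with the RAGAS metrics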