Explorations in using the latest LLMs for BIQA.
The data is from the Stack Overflow Developer Survey 2023.
data/eval_set_multi_answers_res.json: Question and query pairs as a list of SQLSamples, with possibly more than one valid SQL query per question. The query results are also included.
data/survey_results_normalized_v2.db: The main SQLite file. Download it from here: deepset/stackoverflow-survey-2023-text-sql.
Or download it as:
wget -O data/survey_results_normalized_v2.db "https://drive.google.com/uc?export=download&id=1e_knoK9rYgWe8ADUw3PC8Fp6Jhnjgoms&confirm=t"
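As a quick sanity check, the eval set can be inspected directly. A minimal sketch, assuming the file is a top-level JSON list as described above (the concrete field names come from the SQLSample definition in this repo and are not assumed here):

import json

# Peek at the eval set's structure without assuming specific field names.
with open("data/eval_set_multi_answers_res.json") as f:
    samples = json.load(f)

print(f"{len(samples)} samples")
print("Fields of the first sample:", list(samples[0].keys()))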
pip install -r requirements.txt
Note: To use the OpenAI models, the OPENAI_API_KEY environment variable needs to be set. It can also be put in a .env file, which is loaded by python-dotenv.
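A minimal sketch of loading the key via python-dotenv, assuming a .env file in the working directory containing a line like OPENAI_API_KEY=...:

import os
from dotenv import load_dotenv

load_dotenv()  # reads variables from a .env file in the current directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"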
Create annotations, i.e. fill in the predictions for the eval set:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds.json
Evaluate manually, i.e. go through each prediction (where one is available) and label it as correct or not. If labelled correct, the prediction is added to the labels/answers.
python evaluate_manually.py eval_preds.json eval_preds_manual.json
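For illustration only, a hedged sketch of the labelling loop described above; field names such as question, pred, and answers are hypothetical and need not match what evaluate_manually.py actually uses:

import json

with open("eval_preds.json") as f:
    samples = json.load(f)

for sample in samples:
    pred = sample.get("pred")          # hypothetical field name
    if not pred:
        continue                       # skip entries without a prediction
    print("Question:", sample.get("question"))
    print("Predicted SQL:", pred)
    if input("Correct? [y/N] ").strip().lower() == "y":
        # A prediction judged correct is added to the accepted answers
        sample.setdefault("answers", []).append(pred)

with open("eval_preds_manual.json", "w") as f:
    json.dump(samples, f, indent=2)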
Calculate the final performance metrics:
python calculate_metrics.py eval_preds_manual.json
Schema + Examples:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds.json
Schema + raw description:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds_base_raw_desc.json \
-g base --raw-description
Schema + column descriptions + few shot:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds_base_col_desc_fs.json \
-g base -d per-column --few-shot
Agents:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds_agents.json \
-g agent
(Perfect) Retrieval:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds_retriever.json \
-g retriever -d per-column
For the application to work with OpenAI models, the OPENAI_API_KEY environment variable needs to be set. You can set it directly or put it in the .env file.
uvicorn api.main:app --reload --host="0.0.0.0" --port=8000
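Assuming api.main:app is a FastAPI application (an assumption based on the uvicorn invocation above), the interactive docs should then be available at http://localhost:8000/docs. A quick sketch to check that the server is up and list its endpoints:

import json
import urllib.request

# Assumes the API serves its OpenAPI schema at /openapi.json (FastAPI default).
with urllib.request.urlopen("http://localhost:8000/openapi.json") as resp:
    schema = json.load(resp)

print("Available endpoints:", list(schema["paths"].keys()))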
column_descriptions.py: To generate the "seed" column-level description file (to be completed manually).
count_tokens.py: To count the number of tokens or to view the final prompt (see the token-counting sketch after this list).
retriever_analysis.py: Analysis of the retriever performance + plot generation.
view_eval_set.py: To view the data in the eval (or prediction) set.
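A hedged sketch of token counting with tiktoken, for illustration only; count_tokens.py may use a different approach or model encoding:

import tiktoken

# Count tokens for a prompt string using an OpenAI model encoding.
prompt = "SELECT COUNT(*) FROM respondents;"  # hypothetical prompt text
encoding = tiktoken.encoding_for_model("gpt-4")
print(len(encoding.encode(prompt)), "tokens")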
Created with this notebook; it uses this spreadsheet, which defines the manual adjustments.