
BIQA with LLMs

Explorations in using the latest LLMs for BIQA.

Data

The data is from the Stack Overflow Developer Survey 2023.

  • data/eval_set_multi_answers_res.json: Question and query pairs as a list of SQLSamples, with possibly more than one valid SQL query per question. Query results are also included.
  • data/survey_results_normalized_v2.db: The main SQLite database. Download it from deepset/stackoverflow-survey-2023-text-sql.

Or download it directly:

wget -O data/survey_results_normalized_v2.db "https://drive.google.com/uc?export=download&id=1e_knoK9rYgWe8ADUw3PC8Fp6Jhnjgoms&confirm=t"
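
To sanity-check the downloaded files, here is a minimal Python snippet using only the standard library; it assumes nothing about the eval-set schema beyond it being a JSON list:

import json
import sqlite3

# Peek at the eval set: a JSON list of SQLSample records
with open("data/eval_set_multi_answers_res.json") as f:
    samples = json.load(f)
print(f"{len(samples)} samples; first record keys: {list(samples[0].keys())}")

# List the tables in the SQLite database
con = sqlite3.connect("data/survey_results_normalized_v2.db")
tables = con.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print([t[0] for t in tables])
con.close()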

Environment setup

pip install -r requirements.txt

Running the Evaluation

Note: To use the OpenAI models, the OPENAI_API_KEY environment variable needs to be set. It can also be put in a .env file, which is loaded via python-dotenv.
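
For example, a .env file in the repository root with a single line (placeholder value shown):

OPENAI_API_KEY=<your-api-key>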

Create annotations, i.e. fill in the predictions for the eval set:

python create_annotations.py \
       -i data/eval_set_multi_answers_res.json \
       -o eval_preds.json

Evaluate manually, i.e. go through each evaluation (where a prediction is available) and label it correct or not. If labelled correct, the prediction is added to the labels/answers.

python evaluate_manually.py eval_preds.json eval_preds_manual.json

Calculate the final performance metrics:

python calculate_metrics.py eval_preds_manual.json
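
calculate_metrics.py produces the final numbers. Purely as an illustration of the simplest metric involved, below is a minimal accuracy sketch; the "correct" field name is a hypothetical assumption, not the actual schema of the labelled predictions file:

import json

# Hypothetical sketch: assumes each record in the manually labelled file
# carries a boolean "correct" flag. The real file schema may differ.
with open("eval_preds_manual.json") as f:
    records = json.load(f)
labelled = [r for r in records if "correct" in r]
accuracy = sum(r["correct"] for r in labelled) / len(labelled)
print(f"Accuracy over {len(labelled)} labelled predictions: {accuracy:.1%}")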

Different Approaches

Schema + Examples:

python create_annotations.py \
       -i data/eval_set_multi_answers_res.json \
       -o eval_preds.json

Schema + raw description:

python create_annotations.py \
       -i data/eval_set_multi_answers_res.json \
       -o eval_preds_base_raw_desc.json \
       -g base --raw-description

Schema + column descriptions + few-shot:

python create_annotations.py \
       -i data/eval_set_multi_answers_res.json \
       -o eval_preds_base_col_desc_fs.json \
       -g base -d per-column --few-shot

Agents:

python create_annotations.py \
       -i data/eval_set_multi_answers_res.json \
       -o eval_preds_agents.json \
       -g agent

(Perfect) Retrieval:

python create_annotations.py \
       -i data/eval_set_multi_answers_res.json \
       -o eval_preds_retriever.json \
       -g retriever -d per-column

Running the API

For the application to work with OpenAI models, the OPENAI_API_KEY environment variable needs to be set.

You can set it directly or put it in the .env file.

uvicorn api.main:app --reload --host="0.0.0.0" --port=8000
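
Once the server is running, you can verify it responds. Assuming api.main:app is a FastAPI application (not confirmed here), the auto-generated OpenAPI schema should be served at /openapi.json:

import json
import urllib.request

# Fetch the auto-generated OpenAPI schema (assumes a FastAPI app)
with urllib.request.urlopen("http://localhost:8000/openapi.json") as resp:
    schema = json.load(resp)
print(schema["info"]["title"], "-", sorted(schema["paths"]))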

Helper scripts

  • column_descriptions.py: Generate the "seed" column-level description file (to be completed manually)
  • count_tokens.py: Count the number of tokens or view the final prompt
  • retriever_analysis.py: Analyse retriever performance and generate plots
  • view_eval_set.py: View the data in the eval (or prediction) set

Appendix

Data Creation

Created with this Notebook; uses this spreadsheet defining manual adjustments.
