Explorations in using the latest LLMs for BIQA.
The data is from the Stack Overflow Developer Survey 2023.
data/eval_set_multi_answers_res.json: Question and query pairs as a list of SQLSamples, with possibly more than one valid SQL query per question. The query results are also included.
data/survey_results_normalized_v2.db: The main SQLite file. Download it from here: deepset/stackoverflow-survey-2023-text-sql.
Or download it as:
wget -O data/survey_results_normalized_v2.db "https://drive.google.com/uc?export=download&id=1e_knoK9rYgWe8ADUw3PC8Fp6Jhnjgoms&confirm=t"
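As a quick sanity check, the eval set can be inspected directly. A minimal sketch, assuming the file is a top-level JSON list as described above (the concrete field names come from the SQLSample definition in this repo and are not assumed here):

import json

# Peek at the eval set's structure without assuming specific field names.
with open("data/eval_set_multi_answers_res.json") as f:
    samples = json.load(f)

print(f"{len(samples)} samples")
print("Fields of the first sample:", list(samples[0].keys()))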
pip install -r requirements.txt
Note: To use the OpenAI models, the OPENAI_API_KEY environment variable needs to be set. It can also be put in a .env file, which is loaded by python-dotenv.
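A minimal sketch of loading the key via python-dotenv, assuming a .env file in the working directory containing a line like OPENAI_API_KEY=...:

import os
from dotenv import load_dotenv

load_dotenv()  # reads variables from a .env file in the current directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"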
Create annotations, i.e. fill in the predictions for the eval set:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds.json
Evaluate manually, i.e. go through each prediction (where one is available) and label it as correct or not. If labelled correct, the prediction is added to the labels/answers.
python evaluate_manually.py eval_preds.json eval_preds_manual.json
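For illustration only, a hedged sketch of the labelling loop described above; field names such as question, pred, and answers are hypothetical and need not match what evaluate_manually.py actually uses:

import json

with open("eval_preds.json") as f:
    samples = json.load(f)

for sample in samples:
    pred = sample.get("pred")          # hypothetical field name
    if not pred:
        continue                       # skip entries without a prediction
    print("Question:", sample.get("question"))
    print("Predicted SQL:", pred)
    if input("Correct? [y/N] ").strip().lower() == "y":
        # A prediction judged correct is added to the accepted answers
        sample.setdefault("answers", []).append(pred)

with open("eval_preds_manual.json", "w") as f:
    json.dump(samples, f, indent=2)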
Calculate the final performance metrics:
python calculate_metrics.py eval_preds_manual.json
Schema + Examples:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds.json
Schema + raw description:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds_base_raw_desc.json \
-g base --raw-description
Schema + column descriptions + few shot:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds_base_col_desc_fs.json \
-g base -d per-column --few-shot
Agents:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds_agents.json \
-g agent
(Perfect) Retrieval:
python create_annotations.py \
-i data/eval_set_multi_answers_res.json \
-o eval_preds_retriever.json \
-g retriever -d per-column
For the application to work with OpenAI models, the OPENAI_API_KEY environment variable needs to be set. You can set it directly or put it in the .env file.
uvicorn api.main:app --reload --host="0.0.0.0" --port=8000
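Assuming api.main:app is a FastAPI application (an assumption based on the uvicorn invocation above), the interactive docs should then be available at http://localhost:8000/docs. A quick sketch to check that the server is up and list its endpoints:

import json
import urllib.request

# Assumes the API serves its OpenAPI schema at /openapi.json (FastAPI default).
with urllib.request.urlopen("http://localhost:8000/openapi.json") as resp:
    schema = json.load(resp)

print("Available endpoints:", list(schema["paths"].keys()))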
column_descriptions.py: To generate the "seed" column-level description file (to be completed manually).
count_tokens.py: To count the number of tokens or to view the final prompt (see the token-counting sketch after this list).
retriever_analysis.py: Analysis of the retriever performance + plot generation.
view_eval_set.py: To view the data in the eval (or prediction) set.
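A hedged sketch of token counting with tiktoken, for illustration only; count_tokens.py may use a different approach or model encoding:

import tiktoken

# Count tokens for a prompt string using an OpenAI model encoding.
prompt = "SELECT COUNT(*) FROM respondents;"  # hypothetical prompt text
encoding = tiktoken.encoding_for_model("gpt-4")
print(len(encoding.encode(prompt)), "tokens")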
Created with this notebook; it uses this spreadsheet, which defines the manual adjustments.