
VectorInstitute/masksql


MaskSQL


Installation and Setup Instructions

System Requirements

The development environment (tested on Python 3.11) is set up with uv. Make sure uv is installed, then run:

uv sync --dev
source .venv/bin/activate

Download Dataset

Download this zip file and extract it to the data directory:

wget -O data.zip "https://www.dropbox.com/scl/fi/vtraf79vfi1x105veaflk/data.zip?rlkey=7yq6d46aer6h45pdihrc9rht1&st=zdac3rqx&dl=0"
unzip data.zip

Your data directory should look like this:

data/
├── databases/
├── 1_input.json
.
.
.
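A quick sanity check after extraction can be sketched as follows. It assumes only the two entries shown in the listing above (databases/ and 1_input.json) and does not know about the elided ones; the helper name missing_data_entries is hypothetical and not part of MaskSQL.

```python
from pathlib import Path


def missing_data_entries(data_dir: str = "data") -> list:
    """Return the expected entries that are absent from data_dir.

    Only the entries named in the README listing are checked; this is an
    illustrative helper, not part of the MaskSQL code base.
    """
    root = Path(data_dir)
    missing = []
    if not (root / "databases").is_dir():
        missing.append("databases/")
    if not (root / "1_input.json").is_file():
        missing.append("1_input.json")
    return missing
```

Calling missing_data_entries() from the repository root returns an empty list when both entries are in place, and the names of whatever is absent otherwise.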

Set Environment Variables

cp .env.example .env

The only required variable is OPENAI_API_KEY. The pipeline uses OpenRouter by default, so set this variable to your OpenRouter API key.

You may also change LIMIT to control how many entries are read from the dataset. START specifies the index at which reading begins.

For instance, set LIMIT=10 to run the pipeline on 10 entries.
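The window that START and LIMIT describe can be pictured with a small sketch. It assumes the dataset is a list of entries and that the selected window is entries[START : START + LIMIT]; select_entries is a hypothetical helper, and MaskSQL's own loader may behave differently (for example in how it treats an unset LIMIT).

```python
from typing import Optional


def select_entries(entries: list, start: int = 0, limit: Optional[int] = None) -> list:
    """Return `limit` entries beginning at index `start`.

    Illustrative only: a limit of None is assumed to mean "read to the end".
    """
    if limit is None:
        return entries[start:]
    return entries[start:start + limit]


# Stand-in dataset of 100 numbered entries.
dataset = list(range(100))
print(select_entries(dataset, start=20, limit=10))
# [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
```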

SLM_MODEL and LLM_MODEL specify the IDs of the small and large language models used in the pipeline. Set them according to the LM provider you use. With OpenRouter, for example, use its model identifiers, e.g., openai/gpt-4.1 for GPT-4.1.
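As a rough illustration of how these variables fit together, the sketch below reads them from a mapping such as os.environ and fails early when the required key is missing. The function name load_settings, the default of 0 for START, and treating an unset LIMIT as "no limit" are all assumptions made for this example; the sketch also does not parse the .env file itself, so the variables must already be present in the environment.

```python
import os
from typing import Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the pipeline variables described above from `env`.

    Hypothetical helper: defaults are assumptions, not MaskSQL behavior.
    """
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    limit = env.get("LIMIT")
    return {
        "api_key": api_key,
        "start": int(env.get("START", "0")),       # assumed default: 0
        "limit": int(limit) if limit else None,    # assumed: unset means no limit
        "slm_model": env.get("SLM_MODEL"),
        "llm_model": env.get("LLM_MODEL"),
    }
```

Passing a plain dictionary instead of os.environ makes it easy to try different configurations without touching the real environment.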

Run RESDSQL

Before running MaskSQL, the schema items must be filtered with RESDSQL. Follow these instructions to run RESDSQL and generate the file needed by the MaskSQL pipeline, then run MaskSQL with the --resd option.

Run MaskSQL

You can then run the MaskSQL pipeline as follows:

python3 main.py --resd

MaskSQL saves intermediate results to files for later use. To run the pipeline from scratch, first clean the data directory:

./clean.sh data

About

MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction
