Quesadiya is a data annotation project management platform. You manage projects through a command-line interface (CLI) and annotate data in a web GUI to generate a triplet data set for developing Siamese models.
Let's say you have a section as the anchor and an article (i.e. a group of sections) as the candidate. With quesadiya, you can visually select the section in the candidate that is most similar to the anchor.
You can install quesadiya by running
$ pip install quesadiya
Check the installation by running
$ quesadiya
Alternatively, install from source:
- git clone this repo.
- cd quesadiya.
- run pip install .
- check the installation by running quesadiya on your terminal.
Quesadiya provides a command-line interface (CLI) to manage data annotation projects.
You can create a data annotation project by
$ quesadiya create <project_name> <admin_name> <datapath> [OPTIONS]
For example,
$ quesadiya create queso me data/sample_triplets.jsonl
Loading input data: 5 row [00:00, 1495.40 row/s]
Admin password:
Repeat for confirmation:
Inserting data. This may take a while...
Finish creating a new project 'queso'
Caution: <datapath> must be a jsonline file, where each row must follow the format below:
{
"anchor_sample_id": "string (max 100 char)",
"anchor_sample_text": "list of text", // each element is a paragraph
"anchor_sample_title": "text (nullable)",
"candidate_group_id": "string (max 100 char)",
"candidates": [
"item": {
"candidate_sample_id": "string (max 100 char)",
"candidate_sample_text": "list of text", // each element is a paragraph
"candidate_sample_title": "text (nullable)"
}
]
}
anchor is the sample you want to compare with the positive sample and the negative sample. candidates is a list of samples from which the positive and the negative sample are drawn. The sample a collaborator selects is recorded as the positive sample, and quesadiya chooses a negative sample from the rest.
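The input format above can be produced from Python. The following is a minimal sketch, not part of quesadiya itself: make_row and write_jsonl are hypothetical helper names, and the sketch assumes each entry in candidates is written as a plain JSON object with the three candidate fields.

```python
import json

MAX_ID_LEN = 100  # id length limit stated in the format above


def make_row(anchor_id, anchor_paragraphs, group_id, candidates, anchor_title=None):
    """Build one input row.

    `candidates` is a list of (sample_id, paragraphs, title) tuples;
    `title` may be None because titles are nullable.
    """
    ids = [anchor_id, group_id] + [c[0] for c in candidates]
    for value in ids:
        if len(value) > MAX_ID_LEN:
            raise ValueError(f"id longer than {MAX_ID_LEN} characters: {value!r}")
    return {
        "anchor_sample_id": anchor_id,
        "anchor_sample_text": list(anchor_paragraphs),  # one element per paragraph
        "anchor_sample_title": anchor_title,
        "candidate_group_id": group_id,
        # Assumption: each candidate is a plain JSON object.
        "candidates": [
            {
                "candidate_sample_id": cid,
                "candidate_sample_text": list(paragraphs),
                "candidate_sample_title": title,
            }
            for cid, paragraphs, title in candidates
        ],
    }


def write_jsonl(rows, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

The resulting file can then be passed as <datapath> to quesadiya create.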
Tips: You can add collaborators from a jsonline file when you create a project by
$ quesadiya create queso me data/triplets.jsonl -a data/sample_collaborators1.jsonl
You can view sample data here.
Note that <collaborator_path> must be a jsonline file, where each row must follow the format below:
{
"name": "string (max 150 char)",
"password": "string (max 128 char)",
"contact": "string (max 254 char)"
}
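A collaborator file in this format can be generated with a few lines of Python. This is a sketch under the length limits listed above; collaborator_row and dump_collaborators are hypothetical names, and quesadiya may apply further checks of its own.

```python
import json

# Length limits taken from the format above.
LIMITS = {"name": 150, "password": 128, "contact": 254}


def collaborator_row(name, password, contact):
    """Build one collaborator row, rejecting values that exceed the limits."""
    row = {"name": name, "password": password, "contact": contact}
    for field, limit in LIMITS.items():
        if len(row[field]) > limit:
            raise ValueError(f"{field} exceeds {limit} characters")
    return row


def dump_collaborators(rows):
    """Return jsonline text: one JSON object per line."""
    return "".join(json.dumps(r) + "\n" for r in rows)
```

Writing the returned text to a file gives something that can be passed with the -a option.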
See Command Line Interface Guide for more details.
You can annotate a data set by running quesadiya:
$ quesadiya run [OPTION]
You can specify the port number for the quesadiya server with the -p option. For example,
$ quesadiya run -p 4000
Quesadiya's default port number is 1133.
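Before choosing a port, it can help to confirm that nothing else is already using it. The sketch below relies only on Python's standard socket module; port_is_free is a hypothetical helper and is not something quesadiya provides.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False  # the port is taken (or binding is not permitted)
    return True
```

For example, port_is_free(4000) can be checked before running quesadiya run -p 4000.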
Once you run a project, open your browser and access http://localhost:1133/.
Then, select a project and enter the admin name and password. This leads you to the admin page, where you can do the following:
- view discarded samples
- view progress of each collaborator
- edit collaborators
Tips: The admin user cannot annotate data. If you're the admin and would like to annotate samples, make a collaborator account for yourself and log in with that account.
See Admin Guide for more details.
Data annotation is simple and intuitive in Quesadiya. The anchor text is shown on the left-hand side of the screen and the candidates are on the right. Collaborators can either select a positive sample among the candidates or discard a sample if it is corrupted for some reason. The admin can view discarded samples and push a sample back to the project from the admin page.
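To illustrate how one annotation becomes a triplet, here is a small sketch. It assumes the negative sample is picked uniformly at random from the unselected candidates; the README only says that quesadiya chooses a negative from the rest, so the actual rule may differ. make_triplet is a hypothetical name.

```python
import random


def make_triplet(anchor_id, candidate_ids, selected_id, rng=random):
    """Return (anchor, positive, negative) ids for one annotation."""
    if selected_id not in candidate_ids:
        raise ValueError("selected sample is not among the candidates")
    rest = [cid for cid in candidate_ids if cid != selected_id]
    if not rest:
        raise ValueError("need at least one other candidate for the negative")
    # Assumption: uniform random choice; the real selection rule may differ.
    return anchor_id, selected_id, rng.choice(rest)
```

Passing a seeded random.Random instance as rng makes the choice reproducible.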
You can export a snapshot of the annotated data set by
$ quesadiya export <project_name> <output_path>
The output path must be a jsonline file. Each row follows the format below:
{
"anchor_sample_id": "text",
"positive_sample_id": "text",
"negative_sample_id": "text"
}
Note that this operation requires the admin privilege.
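An exported file in this format can be read back with the standard json module. A minimal sketch, where load_triplets is a hypothetical helper name:

```python
import json


def load_triplets(path):
    """Read an exported file into a list of (anchor, positive, negative) id tuples."""
    triplets = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            triplets.append((
                row["anchor_sample_id"],
                row["positive_sample_id"],
                row["negative_sample_id"],
            ))
    return triplets
```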
The operation above only generates a triplet data set with sample ids. If you'd like to include the text of each sample, add the -i option. For example,
$ quesadiya export queso data.jsonl -i
This will generate a jsonline file, where each row follows:
{
"anchor_sample_id": "text",
"positive_sample_id": "text",
"negative_sample_id": "text",
"anchor_sample_text": "list of text", // each element is a paragraph
"positive_sample_text": "list of text",
"negative_sample_text": "list of text"
}
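For model training it is often convenient to turn each exported row into three plain strings. The sketch below assumes each text field is a list of paragraph strings, as the format above indicates; iter_text_triplets and the sep parameter are hypothetical.

```python
import json


def iter_text_triplets(lines, sep="\n"):
    """Yield (anchor, positive, negative) strings from exported jsonline rows."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        # Join the paragraphs of each sample into a single string.
        yield tuple(
            sep.join(row[f"{role}_sample_text"])
            for role in ("anchor", "positive", "negative")
        )
```

An open file object can be passed as lines, since iterating over a file yields one line at a time.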
A disclaimer: Quesadiya and its contributors take no responsibility for protecting your data.
That said, we hash all passwords using argon2.
If you'd like to prevent other users on your machine from accessing your data, we encourage you to restrict the permissions of the project folder. You can see the path to the quesadiya root by
$ quesadiya path
This command shows the absolute path to the quesadiya project folder. Go to that directory, and you'll find your project folder.
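One way to restrict access on a POSIX system is to remove group and other permissions from the project folder. The following is a sketch only: restrict_to_owner is a hypothetical helper, the folder path is whatever you found under the quesadiya root, and on Windows os.chmod has much more limited effect.

```python
import os
import stat


def restrict_to_owner(folder):
    """Remove group/other access from `folder` and everything inside it."""
    for root, dirs, files in os.walk(folder):
        os.chmod(root, stat.S_IRWXU)  # 0o700: owner may read, write, enter
        for name in files:
            # 0o600: owner may read and write
            os.chmod(os.path.join(root, name), stat.S_IRUSR | stat.S_IWUSR)
```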