Skip to content

SiameseLab/quesadiya

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Quesadiya

PyPi

Build Status

Docs

Quesdadiya is a data annotation project management platform where you can manage a project through Command Line Interface (CLI) and annotate data on Web GUI to generate a triplet data set for developing Siamese models.

Demo

When to use?

Let's say you have a section as anchor and an article (i.e. a group of sections) as candidate. With quesadiya, you can visually select a section in the candidate, which is the most similar to the anchor.

Quickstart

Installation

You can install quesadiya by running

$ pip install quesadiya

Check installation by

$ quesadiya

Installation from Source

  1. git clone this repo.
  2. cd quesadiya.
  3. run pip install ..
  4. check installation by running quesadiya on your terminal.

Project Management

Quesadiya provides the command-line interface (CLI) to manage data annotation projects.

Create Project

You can create a data annotation project by

$ quesadiya create <project_name> <admin_name> <datapath> [OPTIONS]

For example,

$ quesadiya create queso me data/sample_triplets.jsonl
Loading input data: 5 row [00:00, 1495.40 row/s]
Admin password:
Repeat for confirmation:
Inserting data. This may take a while...
Finish creating a new project 'queso'

Caution: <datapath> must be a jsonline file, where each row must follow the format below:

{
  "anchor_sample_id": "string (max 100 char)",
  "anchor_sample_text": "list of text", // each element is a paragraph
  "anchor_sample_title": "text (nullable)",
  "candidate_group_id": "string (max 100 char)",
  "candidates": [
    "item": {
      "candidate_sample_id": "string (max 100 char)",
      "candidate_sample_text": "list of text", // each element is a paragraph
      "candidate_sample_title": "text (nullable)"
    }
  ]
}

anchor is the sample you want to compare to the positive sample and the negative sample. candidates is a list of candidates for a positive and a negative sample. The sample collaborator selects is recorded as a positive sample and quesadiya chooses a negative sample from the rest.

Tips: You can add collaborators from a jsonline file when you create a project by

$ quesadiya create queso me data/triplets.jsonl -a data/sample_collaborators1.jsonl

You can view sample data here.

Note that <collaborator_path> must be a jsonline file, where each row must follow the format below:

{
  'name': "string (max 150 char)",
  'password': "string (max 128 char)",
  'contact': "string (max 254 char)"
}

See Command Line Interface Guide for more details.

Run Project

You can annotate a data set by running quesadiya:

$ quesadiya run [OPTION]

You can specify the port number to run the quesadiya server by option. For example,

$ quesadiya run -p 4000

Quesadiya's default port number is 1133.

Once you run a project, open your browser and access http://localhost:1133/.

Then, select a project and type admin name and password.

This leads you to the admin page. In the admin page, you can do the followings:
  • view discarded samples
  • view progress of each collaborator
  • edit collaborators

Tips: Admin user cannot annotate data. If you're the admin and like to annotate samples, make a collaborator account for yourself and login with the account.

See Admin Guide for more details.

Data Annotation

Data annotation is very simple and intuitive in Quesadiya. Anchor text is shown on the left hand side of the screen and Candidates are on the right. Collaborators can either select positive sample among candidates or discard a sample if the sample is corrupted for some reason. Admin can view discarded samples and push a sample back to the project in the admin page.

Export Data

You can export a snapshot of annotated data set by

$ quesadiya export <project_name> <output_path>

The output path must be a jsonline file. Each row follows the format below:

{
  "anchor_sample_id": "text",
  "positive_sample_id": "text",
  "negative_sample_id": "text"
}

Note that this operation requires the admin privilege.

The operation above only generates a triplet data set with samples ids. If you'd like to include text for each sample, add -i option. For example,

$ quesadiya export queso data.jsonl -i

This will generate a jsonline file, where each row follows:

{
    "anchor_sample_id": "text",
    "positive_sample_id": "text",
    "negative_sample_id": "text",
    "anchor_sample_text": "list of text" // each element is a paragraph,
    "positive_sample_text": "list of text",
    "negative_sample_text": "list of text"
}

Security

A disclaimer: Quesadiya and its contributors take no responsibility for protecting your data.

That said, we encrypt all passwords using argon2.

If you'd like to prohibit any other user on your environment from accessing your data, we encourage you to change the accessibility of project folder. You can see the path to the quesadiya root by

$ quesadiya path

This command shows the absolute path to quesadiya project folder. Go to the directory, and you'll find your project folder.

About

An end-to-end annotation project management platform for Siamese models

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published