Annotation Tutorial
After installation, start both the annotation server and the model server. Make sure you can open the address of the annotation server, for example 127.0.0.1:8080, and that you can visit "http://0.0.0.0:8080/login". Note that the link "http://0.0.0.0:8080/" redirects to our homepage.
First, the administrator should log in to the user management system (http://127.0.0.1:8080/admin/) with the admin account name and password (created when you installed the framework).
Then, the admin can add accounts for annotators (with permission only to annotate sentences) by following the guide on the page.
The admin can then go to the project management system (http://127.0.0.1:8080/projects/) to create a new project that hosts the annotation of a corpus.
The admin needs to convert the dataset into CSV or JSON format and then upload it to the server. For JSON, write one object per line, each with a "text" field and an "external_id" field:
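import json

# `sentences` is assumed to be a list of strings, one per sentence to annotate.
# Mode 'w' overwrites the file, so re-running the script does not duplicate entries.
with open('dataset.txt', 'w', encoding='utf-8') as dataset_file:
    for i, sent in enumerate(sentences):
        data = {}
        data['text'] = sent
        data['external_id'] = i
        # One JSON object per line.
        json.dump(data, dataset_file)
        dataset_file.write('\n')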
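If you prefer CSV, the same data can be written with Python's `csv` module. This is only a minimal sketch: it assumes the server expects the same two fields (`text` and `external_id`) as column headers, which this tutorial does not state. The sample sentences, the function name `write_csv_dataset`, and the file name `dataset.csv` are all hypothetical.

```python
import csv

# Hypothetical sample data; replace with your own corpus.
sentences = ["Barack Obama visited Paris.", "Apple released a new phone."]


def write_csv_dataset(sentences, path):
    """Write one row per sentence with 'text' and 'external_id' columns.

    The column names mirror the JSON fields; that the server accepts
    them as CSV headers is an assumption.
    """
    with open(path, "w", newline="", encoding="utf-8") as dataset_file:
        writer = csv.DictWriter(dataset_file, fieldnames=["text", "external_id"])
        writer.writeheader()
        for i, sent in enumerate(sentences):
            writer.writerow({"text": sent, "external_id": i})


write_csv_dataset(sentences, "dataset.csv")
```

Using `csv.DictWriter` rather than joining strings by hand keeps sentences that contain commas or quotes correctly escaped.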
When you create the project, you need to pre-define the labels for online learning, because online learning needs the mappings from words and labels to feature indices at the initial stage.
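To illustrate what such mappings look like, here is a minimal sketch that assigns an integer index to each label and each word. It is not the framework's actual code: the helper `build_index`, the BIO-style label set, the toy sentences, and the `<PAD>`/`<UNK>` reserved tokens are all assumptions made for illustration.

```python
def build_index(items, reserved=()):
    """Assign an integer index to each distinct item, in first-seen order."""
    index = {}
    for item in list(reserved) + list(items):
        if item not in index:
            index[item] = len(index)
    return index


# Hypothetical label set and tokenized corpus.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
sentences = [["Obama", "visited", "Paris"], ["Paris", "is", "large"]]

label_to_index = build_index(labels)
word_to_index = build_index(
    (word for sent in sentences for word in sent),
    reserved=["<PAD>", "<UNK>"],  # assumed special tokens
)


def encode(sentence):
    """Map words to indices; unseen words fall back to the <UNK> index."""
    unk = word_to_index["<UNK>"]
    return [word_to_index.get(word, unk) for word in sentence]


print(label_to_index)
print(encode(["Paris", "is", "new"]))
```

Because the label indices are fixed before any annotation arrives, a model built on them can keep the same output size while it is updated incrementally, which is one plausible reason the labels must be defined up front.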
The annotation page has two sections: the Annotation section (upper) and the Recommendation section (lower).
- You can annotate directly in the upper section by selecting spans.
- To use the recommendations in the lower section, click a suggested span, which is underlined. The suggested type is outlined in red. To confirm the annotation, click a suggested type or press its shortcut key.
You can optionally enable embeddings, recommendation options, and active learning methods, and set the batch size, number of epochs, and acquire size.
- Supported embeddings: GloVe, Word2Vec, FastText, ELMo, GPT, BERT
- Recommendation options: Noun Chunk, Model Inference, Dictionary Match
- Active learning options: MNLP (Maximum Normalized Log-Probability)
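As a rough illustration of MNLP, the sketch below scores each unlabeled sentence by the average log-probability of its predicted tags and picks the lowest-scoring (least confident) sentences for annotation. It assumes the model exposes one probability per token for the predicted tag sequence; the function names, the sentence ids, and the probabilities are hypothetical, and the framework's own implementation may differ.

```python
import math


def mnlp_score(token_probs):
    """Length-normalized log-probability of the predicted tag sequence."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def select_batch(predictions, acquire_size):
    """Return the ids of the `acquire_size` sentences with the lowest MNLP score."""
    ranked = sorted(predictions, key=lambda sid: mnlp_score(predictions[sid]))
    return ranked[:acquire_size]


# Hypothetical per-token probabilities of the predicted tags.
predictions = {
    "s1": [0.99, 0.98, 0.97],
    "s2": [0.60, 0.55, 0.90, 0.70],
    "s3": [0.95, 0.40],
}

print(select_batch(predictions, acquire_size=2))
```

Dividing by the sentence length keeps the method from favoring long sentences, whose unnormalized log-probability is lower simply because it sums over more tokens.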
The final recommendations are merged from the three options with the following priority, from lowest to highest:
Noun Chunk < Model Inference < Dictionary Match
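One way to picture this merge is sketched below. How the tool actually resolves conflicts is not described here, so the sketch assumes that a span from a higher-priority source replaces any overlapping span from a lower-priority source. The span format `(start, end, label)` with an exclusive end offset, the source keys, and the function names are hypothetical.

```python
# Sources listed from lowest to highest priority.
PRIORITY = ["noun_chunk", "model_inference", "dictionary_match"]


def overlaps(a, b):
    """True if two (start, end, label) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def merge_recommendations(by_source):
    """Keep higher-priority spans and drop lower-priority spans that overlap them."""
    merged = []
    # Visit the highest-priority source first so its spans are kept.
    for source in reversed(PRIORITY):
        for span in by_source.get(source, []):
            if not any(overlaps(span, kept) for kept in merged):
                merged.append(span)
    return sorted(merged, key=lambda span: (span[0], span[1]))


# Hypothetical suggestions for "Barack Obama visited Paris today".
by_source = {
    "noun_chunk": [(0, 12, None), (21, 26, None)],
    "model_inference": [(0, 12, "PER"), (21, 26, "ORG")],
    "dictionary_match": [(21, 26, "LOC")],
}

print(merge_recommendations(by_source))
```

In this example the dictionary match decides the label of "Paris", the model supplies the label for "Barack Obama", and the untyped noun chunks are dropped because both are already covered.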