Interactive usage of Doccano for semisupervised learning and interactive machine teaching #6
Comments
Thank you for the nice feature request! We do want doccano to support active learning and semi-supervised workflows. You can already drive it programmatically, because doccano is built on a REST API architecture. However, we are currently refactoring the code and documentation, so please give us a little time to provide an example 🙏 We are also considering a feature for importing an already-labeled dataset, which is easier than labeling dynamically through the API. Would that import feature be enough for your request?
- API-based labeling
- Labeled data import
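To make the "labeled data import" option concrete, here is a small, hypothetical sketch of what an importable file could look like for a sequence-labeling project: one JSON object per line, each with the document text and character-offset labels. The exact field names accepted by a given doccano version may differ; treat this as an illustration of the idea, not a spec.

```python
import json

# Hypothetical pre-labeled documents: (start_offset, end_offset, label) triples.
examples = [
    {"text": "Obama was born in Hawaii.",
     "labels": [[0, 5, "PERSON"], [18, 24, "LOCATION"]]},
    {"text": "Google is based in Mountain View.",
     "labels": [[0, 6, "ORG"]]},
]

def to_jsonl(records):
    """Serialize records to a JSONL string, one document per line."""
    return "\n".join(json.dumps(r) for r in records)

jsonl = to_jsonl(examples)
```

A file produced this way could be written once by an external model and uploaded through the import UI, which is the "static" variant discussed below.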
Thank you for the answers. The labeled data import is definitely easier, as you say, both to develop and to use. On the other hand, it would not offer as many opportunities as the other approach: if doccano could be used for online/dynamic learning, I think it could become an even more interesting and flexible application.
If the API piece is tricky because of security and the inconvenience of needing a separate service, and the labeled data import is static, what about an admin-only page for applying a model to unlabeled examples and potentially also updating the model? It might be a middle ground in terms of both functionality and implementation difficulty.
I implemented a Command class that lets you run […]. The main problem with this implementation is that I don't have a great way to specify that a document has been […]. I'm imagining that the simplest way to do that would be to either […].
How does the feature/autolabeling branch intend to handle the difference between a […]? Here's the implementation. Once I come up with a decent means of iterating on this, I'll PR something similar.
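The Command class referenced above is not shown in this thread, but the core of such an auto-labeling pass can be sketched independently of Django. The helper below is hypothetical: it assumes the user supplies a `predict` callable and filters predicted spans by a confidence threshold, producing annotation payloads shaped like the API example later in this thread.

```python
# Hypothetical sketch of an auto-labeling pass. The `predict` callable and
# `prob_threshold` parameter are assumptions, not part of doccano itself;
# a real Django management command would wrap something like this.
def auto_label(documents, predict, prob_threshold=0.5):
    """Return annotation payloads for spans predicted above a threshold.

    documents: iterable of (doc_id, text) pairs.
    predict: callable text -> list of (start, end, label_id, prob) tuples.
    """
    payloads = []
    for doc_id, text in documents:
        for start, end, label_id, prob in predict(text):
            if prob >= prob_threshold:  # keep only confident predictions
                payloads.append({
                    "doc_id": doc_id,
                    "start_offset": start,
                    "end_offset": end,
                    "label": label_id,
                    "prob": prob,
                })
    return payloads
```

Each payload could then be POSTed to the annotations endpoint shown in the API example below in this thread, or written out for a static import.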
Thank you for the great suggestion. As you propose, we considered a way to set a project-specific model. One demerit of this approach is that the user has to decide on the model structure before starting annotation. You might think we can change the model during annotation (active learning), but there is research suggesting that datasets collected by active learning are not transferable (so the auto-labeling feature is pending): "How transferable are the datasets collected by active learners?" Because of this, we plan to implement the simple/easy way first; support for additional features will depend on future research.
@icoxfog417 That piece of research is really interesting. Practically speaking, though, IMO people will still use active learning even in the face of these considerations, because it is much, much less expensive. I would ask you to reconsider including this pull request.
Hi, is anyone working on this feature?
The internal processing of doccano is handled through its Web API. Therefore, annotation from a program can be written as follows:

```python
import requests


class Client(object):
    def __init__(self, entrypoint='http://127.0.0.1:8000', username=None, password=None):
        self.entrypoint = entrypoint
        self.client = requests.Session()
        self.client.auth = (username, password)

    def fetch_projects(self):
        url = f'{self.entrypoint}/v1/projects'
        response = self.client.get(url)
        return response

    def create_project(self, name, description, project_type):
        mapping = {'SequenceLabeling': 'SequenceLabelingProject',
                   'DocumentClassification': 'TextClassificationProject',
                   'Seq2seq': 'Seq2seqProject'}
        data = {
            'name': name,
            'project_type': project_type,
            'description': description,
            'guideline': 'Hello',
            'resourcetype': mapping[project_type]
        }
        url = f'{self.entrypoint}/v1/projects'
        response = self.client.post(url, json=data)
        return response.json()

    def fetch_documents(self, project_id):
        url = f'{self.entrypoint}/v1/projects/{project_id}/docs'
        response = self.client.get(url)
        return response.json()

    def add_document(self, project_id, text):
        data = {
            'text': text
        }
        url = f'{self.entrypoint}/v1/projects/{project_id}/docs'
        response = self.client.post(url, json=data)
        return response.json()

    def fetch_labels(self, project_id):
        url = f'{self.entrypoint}/v1/projects/{project_id}/labels'
        response = self.client.get(url)
        return response.json()

    def add_label(self, project_id, text):
        data = {
            'text': text
        }
        url = f'{self.entrypoint}/v1/projects/{project_id}/labels'
        response = self.client.post(url, json=data)
        return response.json()

    def fetch_annotations(self, project_id, doc_id):
        url = f'{self.entrypoint}/v1/projects/{project_id}/docs/{doc_id}/annotations'
        response = self.client.get(url)
        return response.json()

    def annotate(self, project_id, doc_id, data):
        url = f'{self.entrypoint}/v1/projects/{project_id}/docs/{doc_id}/annotations'
        response = self.client.post(url, json=data)
        return response.json()


if __name__ == '__main__':
    client = Client(username='username', password='password')
    project = client.create_project(name='NER project',
                                    description='example',
                                    project_type='SequenceLabeling')
    doc = client.add_document(project_id=project['id'],
                              text='Obama')
    label = client.add_label(project_id=project['id'],
                             text='PERSON')
    data = {
        'start_offset': 0,
        'end_offset': 5,
        'label': label['id'],
        'prob': 0.8
    }
    client.annotate(project_id=project['id'],
                    doc_id=doc['id'],
                    data=data)
    annotations = client.fetch_annotations(project_id=project['id'],
                                           doc_id=doc['id'])
```

In the future, we plan to enable automatic labeling from the web.
Continue the discussion on #6. |
Hi, I implemented this service to automatically fetch documents that are already annotated. But when I run the program, why do I always get a 403 response? This is quite strange, because when I visit the Django REST endpoint in a browser, e.g. v1/projects, it does not return any error.
This is caused by the authorization. Simply modify the […]
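The reply above is truncated in this thread, but since the cause is stated to be authorization, the likely fix is attaching credentials to the programmatic requests (the browser works because you are already logged in to a session there). A hedged sketch of building the relevant `Authorization` header, without assuming any particular doccano configuration:

```python
import base64

# Hypothetical helper: build an Authorization header for requests to a
# Django REST Framework backend. Token auth requires the server to have
# TokenAuthentication enabled; basic auth encodes "user:pass" per RFC 7617.
def auth_headers(username=None, password=None, token=None):
    if token:
        return {"Authorization": f"Token {token}"}
    if username and password:
        cred = base64.b64encode(f"{username}:{password}".encode()).decode()
        return {"Authorization": f"Basic {cred}"}
    return {}
```

With the `requests` library, the equivalent of the basic-auth branch is setting `session.auth = (username, password)`, as the Client class above already does.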
Is it possible to use doccano interactively, i.e. have it exchange data with another service, for example a Python program (standalone or inside a Jupyter notebook), through an API?
The use case I am envisioning is to connect doccano to a spaCy NLP pipeline: annotated sequences from the spaCy processing are fed into doccano for review, allowing the user to correct mistakes in the assignments (e.g. NER labels or POS tags), and whenever a new sentence is confirmed by the user, it is sent back to the pipeline controller, which uses it for dynamic training of the internal model to improve its accuracy.
In other words, using doccano for semi-supervised learning with interactive machine teaching.
If yes, please explain a little how to proceed with the integration and data exchange. Right now I can only see it exporting the performed annotations as CSV in IOB format.
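The spaCy-to-doccano direction of this loop amounts to converting NER spans into annotation payloads. A minimal sketch, assuming a `label_map` from spaCy entity labels to doccano label ids; the `ents` argument mimics spaCy's `(ent.start_char, ent.end_char, ent.label_)` triples so the example runs without spaCy installed.

```python
# Hypothetical converter: spaCy-style entity spans -> doccano annotation
# payloads (shaped like the API example elsewhere in this thread).
def ents_to_annotations(ents, label_map):
    """ents: iterable of (start_char, end_char, label) triples.
    label_map: spaCy entity label -> doccano label id."""
    annotations = []
    for start_char, end_char, label in ents:
        if label not in label_map:
            continue  # skip entity types with no corresponding doccano label
        annotations.append({
            "start_offset": start_char,
            "end_offset": end_char,
            "label": label_map[label],
        })
    return annotations
```

With real spaCy, `ents` would be `[(e.start_char, e.end_char, e.label_) for e in nlp(text).ents]`, and each payload would be POSTed to the document's annotations endpoint.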