abtExp/doccano_to_bilou

doccano_to_bilou

Convert Doccano annotation output (JSONL) to spaCy training-ready BILOU format.

Problem

Doccano exports annotation data in JSONL format, which spaCy does not accept directly for training. Doccano does have an official conversion tool, doccano_transformer, but it has many open issues and is not actively maintained.
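To make the input side concrete, here is a minimal sketch of reading a JSONL export, one JSON object per line. The field names `text` and `label`, the `[start, end, tag]` span layout, and the helper `read_doccano_jsonl` are assumptions for illustration; they are not taken from this repository, and the actual keys may differ between Doccano versions.

```python
import json

# Hypothetical sample of a Doccano JSONL export: one JSON object per line.
# The field names "text" and "label" are assumptions; check your own export.
sample_jsonl = "\n".join([
    '{"text": "Alice visited Paris", "label": [[0, 5, "PERSON"], [14, 19, "GPE"]]}',
    '{"text": "No entities here", "label": []}',
])


def read_doccano_jsonl(raw):
    """Parse JSONL text into a list of (text, spans) pairs."""
    records = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        obj = json.loads(line)
        # Each span is assumed to be [start_char, end_char, tag].
        spans = [(start, end, tag) for start, end, tag in obj.get("label", [])]
        records.append((obj["text"], spans))
    return records


records = read_doccano_jsonl(sample_jsonl)
```

Each record pairs the raw text with character-offset spans, which is the form a BILOU conversion needs as input.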

Solution

This script converts Doccano's JSONL output to spaCy-compatible JSON in BILOU (Begin, Inside, Last, Unit, Out) format, a variant of IOB encoding.
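As an illustration of the BILOU scheme itself, the sketch below tags whitespace-separated tokens from character-offset spans. It is not the code in convert.py: the function name `bilou_tags` and the whitespace tokenization are assumptions made to keep the example self-contained, and spaCy's own tokenizer would split text differently.

```python
import re


def bilou_tags(text, spans):
    """Return (tokens, tags) for whitespace tokens and character-offset spans."""
    # Record each token together with its start and end character offsets.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)  # "Out" is the default for every token
    for start, end, label in spans:
        # Indices of the tokens that lie fully inside this span.
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        if not inside:
            continue  # span does not align with any token
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"  # Unit: single-token entity
        else:
            tags[inside[0]] = f"B-{label}"  # Begin
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"  # Inside
            tags[inside[-1]] = f"L-{label}"  # Last
    return [tok for tok, _, _ in tokens], tags


tokens, tags = bilou_tags(
    "New York City is home to Alice",
    [(0, 13, "GPE"), (25, 30, "PERSON")],
)
```

Here the three-token entity receives B, I and L tags, the single-token entity receives a U tag, and every other token is tagged O.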

Steps to use

    1. Clone the repo
    2. Run the script
> python convert.py 'file_path'

The script saves the output in the same directory under the name annotation_iob.json.
