Create a Spacy version of the rule based tagger #4

Closed
apmoore1 opened this issue Sep 28, 2021 · 2 comments
Labels
enhancement New feature or request license Relevance to the license of the repository.

Comments

@apmoore1
Member

We want the spaCy version to support multiple languages, following spaCy's language-specific factory setup. Because the rule-based tagger requires a lexicon resource, we need to load data. To do this we are going to follow option 2: save data with the pipeline and load it in once on initialization. We chose option 2 because it lets us ship the models without the user having to specify where the data has come from. In the shipped models we can state where the data has come from, along with the license for the data, which will reflect the license of the model.

With option 1 we would either have to create download functions that save data to the user's file system, or ship the data with the Python package. Shipping the data with the package would require different licenses for the code and the data even though they are in the same repository.
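To make option 2 concrete, here is a minimal sketch of the data lifecycle, not the eventual implementation. It deliberately avoids importing spaCy so that it runs on its own; only the method names `initialize`, `to_disk` and `from_disk` mirror the hooks spaCy calls on a pipeline component. The class name `RuleBasedTagger`, the `lexicon` argument, the `lexicon.json` file name and the `tag` helper are all hypothetical, and registering the component per language with `Language.factory` is left out.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


class RuleBasedTagger:
    """Hypothetical component: only the hook names follow spaCy's conventions."""

    def __init__(self) -> None:
        # Stays empty until initialize() or from_disk() fills it.
        self.lexicon: Dict[str, List[str]] = {}

    def initialize(self, get_examples=None, nlp=None,
                   lexicon: Optional[Dict[str, List[str]]] = None) -> None:
        # Runs once when the pipeline is first built: the lexicon is handed in here.
        self.lexicon = dict(lexicon or {})

    def to_disk(self, path, exclude=tuple()) -> None:
        # Save the lexicon next to the rest of the pipeline so it ships with the model.
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        (path / "lexicon.json").write_text(json.dumps(self.lexicon), encoding="utf-8")

    def from_disk(self, path, exclude=tuple()) -> "RuleBasedTagger":
        # Restore the lexicon from the saved pipeline; no user-supplied path needed.
        text = (Path(path) / "lexicon.json").read_text(encoding="utf-8")
        self.lexicon = json.loads(text)
        return self

    def tag(self, tokens: List[str]) -> List[List[str]]:
        # Hypothetical lookup; "Z99" is assumed here as the tag for unmatched tokens.
        return [self.lexicon.get(token.lower(), ["Z99"]) for token in tokens]
```

With this shape, a user who loads a shipped model only triggers `from_disk`, so they never have to say where the lexicon came from; the data and its license travel with the model.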

@apmoore1 apmoore1 added enhancement New feature or request license Relevance to the license of the repository. labels Sep 28, 2021
@apmoore1
Member Author

Whether to add the data for each language through the pymusas package, or whether it should be automatically downloaded when creating a resource.

@apmoore1
Member Author

> Whether to add the data for each language through the pymusas package, or whether it should be automatically downloaded when creating a resource.

We are going to go with the approach of allowing users to supply their own resources. For example, they could download USAS lexicon files from the Multilingual USAS repository.
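As a rough illustration of what supplying your own resource could look like, the sketch below reads a lexicon file the user has already downloaded. It assumes a tab-separated file with a header row containing `lemma` and `semantic_tags` columns and an optional `pos` column, with tags separated by spaces. Those column names, the `lemma|pos` key format and the `load_lexicon` function are assumptions for the example, not a confirmed pymusas API.

```python
import csv
from typing import Dict, List


def load_lexicon(tsv_path: str) -> Dict[str, List[str]]:
    """Read a user-supplied TSV lexicon into a lookup table (hypothetical helper)."""
    lexicon: Dict[str, List[str]] = {}
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quote characters inside lemmas as literal text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            lemma = (row.get("lemma") or "").strip()
            tags = (row.get("semantic_tags") or "").split()
            if not lemma or not tags:
                continue  # skip incomplete rows rather than guess
            pos = (row.get("pos") or "").strip()
            # Assumed key format: "lemma|pos" when a POS tag is present, else the lemma.
            key = f"{lemma}|{pos}" if pos else lemma
            lexicon[key] = tags
    return lexicon
```

Because the function takes only a file path, the package itself never has to bundle or download lexicon data.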

@apmoore1 apmoore1 closed this as completed Dec 7, 2021