Use of augmenty with spacy config files for training #78

Closed
Giles-Billenness opened this issue Apr 30, 2022 · 3 comments
Labels
documentation Improvements or additions to documentation

Giles-Billenness commented Apr 30, 2022

I didn't see any documentation on how to import these augmenters when using spaCy 3.0's config and command-line system for training.
Is it possible to use augmenty in this way?
If so, how?

Upon further review: for the command line to register new augmenters, the flag
--code <code.py>
needs to be set when calling the training command. I tried pointing it at the specific file that contains the keystroke augmenter I wanted, but it complained about not knowing a parent package for relative imports. I also tried the various __init__.py files, but those failed as well.
It seems to work when you take the code out, place it in a new file without relative imports, and point to that.
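A minimal sketch of why the relative import fails, assuming that `--code` executes the given file as a standalone top-level module through Python's `importlib` machinery (the loader function and the module name `python_code` below are illustrative assumptions, not spaCy's exact code). A module loaded this way has an empty parent package, so any `from .x import y` inside it cannot be resolved:

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_as_standalone_module(path: Path, name: str = "python_code") -> ModuleType:
    """Execute a single .py file as a top-level module with no parent package.

    Assumption: this approximates how a file passed via --code is loaded.
    """
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def try_load(source: str) -> str:
    """Write `source` to a temporary my_augmenters.py and report whether it loads."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "my_augmenters.py"
        path.write_text(source)
        try:
            load_as_standalone_module(path)
        except ImportError as err:
            return f"ImportError: {err}"
    return "ok"


# A relative import has no parent package to resolve against:
print(try_load("from .keystroke import augment\n"))
# -> ImportError: attempted relative import with no known parent package

# The same file with only absolute imports loads fine:
print(try_load("import json\n"))
# -> ok
```

This is consistent with the workaround above: copying the code into a file that uses only absolute imports (for example `import augmenty`) avoids the error.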


Which page or section is this issue related to?

https://spacy.io/usage/training#data-augmentation-custom

https://kennethenevoldsen.github.io/augmenty/tutorials/introduction.html#Applying-the-augmentation

Giles-Billenness added the documentation label (Improvements or additions to documentation) Apr 30, 2022
KennethEnevoldsen (Owner)

Hi @Giles-Billenness,

Yes, you are correct: you need to supply the --code flag, e.g. --code my_augmenters.py.

I believe the script my_augmenters.py could simply contain:

```python
# my_augmenters.py

import augmenty
```

Importing augmenty adds all of its augmenters to spaCy's augmenter registry, which should allow you to add the following to your config:

```ini
[corpora.train.augmenter]
@augmenters = "keystroke_error.v1"
level = 0.1
keyboard = "en_qwerty.v1"
```

If you want slightly more complex augmentation, you can combine multiple augmenters using augmenty.combine. This could look something like this:

```python
# my_augmenters.py

import augmenty
import spacy

# Add the augmenter to the spaCy registry so that it can be referenced from the config
@spacy.registry.augmenters("my_custom_augmenter")
def combined_augmenters():
    """A combined augmenter that adds semi-realistic keystroke errors and swaps 2% of tokens."""
    key_aug = augmenty.load("keystroke_error.v1", level=0.02, keyboard="en_qwerty.v1")
    swap_aug = augmenty.load("token_swap.v1", level=0.02)
    augmenters = [key_aug, swap_aug]
    return augmenty.combine(augmenters)
```
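To illustrate what "combining" augmenters can mean, here is a minimal sketch of sequential chaining, where each augmenter is applied to every example produced by the previous one. The helper `combine_sketch` is hypothetical and is not necessarily how `augmenty.combine` is implemented; it only assumes the spaCy augmenter convention of a callable taking `(nlp, example)` and yielding examples. The toy augmenters below use plain strings as stand-ins for `Example` objects.

```python
from typing import Any, Callable, Iterable, Iterator, List

# A spaCy-style augmenter: called with (nlp, example), yields zero or more examples.
Augmenter = Callable[[Any, Any], Iterable[Any]]


def combine_sketch(augmenters: List[Augmenter]) -> Augmenter:
    """Apply `augmenters` in order; each one sees every output of the previous one.

    Hypothetical helper for illustration; augmenty.combine may differ in its details.
    """

    def combined(nlp: Any, example: Any) -> Iterator[Any]:
        examples: List[Any] = [example]
        for augment in augmenters:
            # Build the next list eagerly so each step uses the current `augment`.
            examples = [out for ex in examples for out in augment(nlp, ex)]
        yield from examples

    return combined


# Toy stand-ins that treat plain strings as "examples" (not real spaCy augmenters):
def add_variant(nlp: Any, example: str) -> Iterator[str]:
    yield example
    yield example + "!"


def upper(nlp: Any, example: str) -> Iterator[str]:
    yield example.upper()


combined = combine_sketch([add_variant, upper])
print(list(combined(None, "hello")))  # -> ['HELLO', 'HELLO!']
```

Building each intermediate list eagerly sidesteps a subtle bug: a lazy generator expression inside the loop would look up `augment` only when iterated, so every stage would end up using the last augmenter.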

Then you should be able to add the following to your config:

```ini
[corpora.train.augmenter]
@augmenters = "my_custom_augmenter"
```
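A minimal, self-contained sketch of what such a config line does, assuming spaCy v3 (with thinc) is installed: the name after `@augmenters` is looked up in `spacy.registry.augmenters`, and the registered function is called to build the augmenter. The name `sketch_lowercase_augmenter.v1` and its lower-casing logic are hypothetical stand-ins for `my_custom_augmenter`, and the `[corpora.train]` section here omits the corpus reader (`@readers`) that a real training config would have. In the setup from this thread, the registration step happens when `--code my_augmenters.py` imports the file.

```python
import spacy
from spacy.training import Example
from thinc.api import Config


# Hypothetical augmenter name, registered only for this sketch.
@spacy.registry.augmenters("sketch_lowercase_augmenter.v1")
def create_sketch_lowercase_augmenter():
    """Factory referenced from the config; returns the actual augmenter callable."""

    def augment(nlp, example):
        # Always keep the original example.
        yield example
        # Add a lower-cased copy that reuses the original annotations.
        example_dict = example.to_dict()
        example_dict["token_annotation"]["ORTH"] = [t.lower_ for t in example.reference]
        doc = nlp.make_doc(example.text.lower())
        yield example.from_dict(doc, example_dict)

    return augment


# Parent sections must be declared before [corpora.train.augmenter].
CONFIG_STR = """
[corpora]

[corpora.train]

[corpora.train.augmenter]
@augmenters = "sketch_lowercase_augmenter.v1"
"""

# Resolving the config calls the registered factory by name.
resolved = spacy.registry.resolve(Config().from_str(CONFIG_STR))
augmenter = resolved["corpora"]["train"]["augmenter"]

nlp = spacy.blank("en")
text = "Hello World"
example = Example(nlp.make_doc(text), nlp.make_doc(text))
augmented_texts = [eg.text for eg in augmenter(nlp, example)]
print(augmented_texts)
```

If the name in the config does not match any registered augmenter (for example because the file was never imported), resolving the config fails with a registry error instead of returning a callable.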

For more inspiration, I have some files here where I train the Danish spaCy pipeline DaCy. For the command, you can check out the yml file, and for the augmenters, the script danish_augmenter.py.

Let me know if it works; otherwise, I will have another look at it.

KennethEnevoldsen self-assigned this May 2, 2022
Giles-Billenness (Author)

Yeah, that worked for me, thank you.

KennethEnevoldsen (Owner)

Good to know. I will close the issue; do let me know if there are any other issues with the package.
