
feature: end-to-end NER pipeline#664

Merged
JulesBelveze merged 17 commits into release/1.2.0 from feature/end-to-end-pipelines on Aug 1, 2023


@JulesBelveze
Contributor

@JulesBelveze JulesBelveze commented Jul 24, 2023

Description

This PR provides an end-to-end pipeline that performs the following workflow:

- train a model on a given dataset
- evaluate the model on a given test dataset
- test the trained model on a set of tests
- augment the training set based on the test outcomes
- retrain the model on the freshly generated augmented training set
- evaluate the retrained model on the test dataset
- compare the performance of the two models

This way, the user can train a model and test the behaviours that matter using langtest. Based on the outcome of those tests, langtest augments the original training set with samples on which the model failed. The model is then retrained on this augmented dataset and compared to the original on the generated set of tests.
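The steps above can be sketched as a small driver function. This is only an illustration of the control flow, not the code added in this PR: `run_end_to_end`, `toy_train`, `toy_evaluate` and `toy_find_failures` are hypothetical names, the "model" is a plain token-to-tag lookup, and the comparison here is done on the test dataset.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[str], List[str]]  # (tokens, NER tags)


def run_end_to_end(
    train: Callable[[List[Sample]], dict],
    evaluate: Callable[[dict, List[Sample]], float],
    find_failures: Callable[[dict], List[Sample]],
    train_set: List[Sample],
    test_set: List[Sample],
) -> Dict[str, float]:
    """Train, evaluate, test, augment, retrain, re-evaluate, compare."""
    model = train(train_set)
    baseline = evaluate(model, test_set)
    failures = find_failures(model)        # samples on which the model failed
    augmented_set = train_set + failures   # augment the training set
    retrained_model = train(augmented_set)
    retrained = evaluate(retrained_model, test_set)
    return {"baseline": baseline, "retrained": retrained, "delta": retrained - baseline}


# Toy stand-ins (hypothetical): the "model" memorises a tag per token.
def toy_train(samples: List[Sample]) -> dict:
    lookup: dict = {}
    for tokens, tags in samples:
        lookup.update(zip(tokens, tags))
    return lookup


def toy_evaluate(model: dict, samples: List[Sample]) -> float:
    pairs = [(tok, tag) for tokens, tags in samples for tok, tag in zip(tokens, tags)]
    correct = sum(model.get(tok, "O") == tag for tok, tag in pairs)
    return correct / len(pairs)  # token-level accuracy


def toy_find_failures(model: dict) -> List[Sample]:
    # A single uppercase probe, standing in for a robustness test.
    probes: List[Sample] = [(["PARIS"], ["B-LOC"])]
    return [(toks, tags) for toks, tags in probes
            if [model.get(t, "O") for t in toks] != tags]


train_set = [(["Paris", "is", "nice"], ["B-LOC", "O", "O"])]
test_set = [(["PARIS", "is", "big"], ["B-LOC", "O", "O"])]
report = run_end_to_end(toy_train, toy_evaluate, toy_find_failures, train_set, test_set)
print(report)
```

In this toy run the baseline model misses the uppercase token, the failed probe is added to the training set, and the retrained model scores higher on the same test data.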

For now, it supports the transformers library and the NER task. Datasets can be passed in CoNLL or CSV format.
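For readers unfamiliar with the CoNLL layout, here is a minimal sketch of turning such a file into token and tag lists. It assumes the CoNLL-2003 convention (one token per line, the NER tag in the last column, blank lines between sentences); `read_conll` is a hypothetical helper, and the loader used by the pipeline may behave differently.

```python
from typing import List, Tuple


def read_conll(text: str) -> List[Tuple[List[str], List[str]]]:
    """Parse CoNLL-style text into (tokens, tags) pairs, one per sentence."""
    sentences: List[Tuple[List[str], List[str]]] = []
    tokens: List[str] = []
    tags: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])   # first column: the token
        tags.append(columns[-1])    # last column: the NER tag
    if tokens:
        sentences.append((tokens, tags))
    return sentences


sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O

Peter NNP B-NP B-PER
"""
parsed = read_conll(sample)
print(parsed)
```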

Usage

To use the end-to-end pipeline, you can run the following one-liner with your own parameters:

python langtest/pipelines/transformers_pipelines.py run \
    --model-name=MODEL_NAME \
    --train-data=TRAIN_FILE \
    --eval-data=EVAL_FILE \
    --training-args=ARGS_DICT \
    --feature-col=NAME_OF_FEATURE_COL \
    --target-col=NAME_OF_TARGET_COL

For example:

python langtest/pipelines/transformers_pipelines.py run \
    --model-name="bert-base-uncased" \
    --train-data=train.csv \
    --eval-data=test.csv \
    --training-args='{"per_device_train_batch_size": 4}' \
    --feature-col="tokens" \
    --target-col="ner_tags"
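Quoting the `--training-args` dictionary by hand is error-prone, so the command can also be assembled programmatically. The sketch below only builds and prints the command using the flags shown above; it assumes the script accepts the dictionary as a JSON string, as in the example, and the file names are placeholders.

```python
import json
import shlex

# Placeholder values; replace them with your own model and files.
training_args = {"per_device_train_batch_size": 4}

command = [
    "python",
    "langtest/pipelines/transformers_pipelines.py",
    "run",
    "--model-name=bert-base-uncased",
    "--train-data=train.csv",
    "--eval-data=test.csv",
    f"--training-args={json.dumps(training_args)}",  # serialised as JSON
    "--feature-col=tokens",
    "--target-col=ner_tags",
]

# shlex.join adds the shell quoting needed around the JSON argument.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` as-is, which avoids shell quoting altogether.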

Checklist:

  • I've added Google style docstrings to my code.
  • I've used pydantic for typing when/where necessary.
  • I have linted my code.
  • I have added tests to cover my changes.

@JulesBelveze JulesBelveze self-assigned this Jul 24, 2023
@JulesBelveze JulesBelveze linked an issue Jul 24, 2023 that may be closed by this pull request
@JulesBelveze
Contributor Author

I am stuck trying to update the poetry.lock file. For some reason, poetry hangs while resolving the dependencies.

@JulesBelveze JulesBelveze merged commit 1c120c9 into release/1.2.0 Aug 1, 2023
@JulesBelveze JulesBelveze deleted the feature/end-to-end-pipelines branch August 1, 2023 15:45

Development

Successfully merging this pull request may close these issues.

Provide users with NER HF end-to-end pipeline

3 participants