
KoichiYasuoka
Contributor

Overview

The pinned pytorch-lightning==1.1.7 is too old to support recent torchtext releases (torchtext>=0.9.0).

Notes

As mentioned in Lightning-AI/pytorch-lightning#6211 and #100.
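One way to surface this incompatibility early is a start-up version check. The Python sketch below assumes that torchtext 0.9.0 is the breaking release (per the "torchtext>=0.9.0" PR title). The minimum Lightning version `ASSUMED_MIN_PL = (1, 2, 0)` is a hypothetical placeholder, not a value taken from this PR or the linked issues. The helper names `parse_version` and `is_compatible` are illustrative only and are not part of the project.

```python
from typing import Tuple

# Assumption: torchtext 0.9.0 is the first release that breaks
# pytorch-lightning==1.1.7 (per the "torchtext>=0.9.0" PR title).
BREAKING_TORCHTEXT: Tuple[int, int, int] = (0, 9, 0)

# Hypothetical placeholder: the first pytorch-lightning release assumed to
# work with torchtext>=0.9.0. Check the Lightning changelog before relying on it.
ASSUMED_MIN_PL: Tuple[int, int, int] = (1, 2, 0)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse '1.1.7', '0.9.0+cpu' or '1.2' into a 3-tuple of ints.

    Only the leading numeric part of each dot-separated segment is kept,
    so pre-release suffixes such as 'rc1' are ignored. Missing segments
    are padded with zeros so (1, 2) compares equal to (1, 2, 0).
    """
    release = text.split("+", 1)[0]  # drop local tags like "+cpu"
    parts = []
    for piece in release.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    parts = (parts + [0, 0, 0])[:3]
    return (parts[0], parts[1], parts[2])


def is_compatible(pl_version: str, torchtext_version: str,
                  min_pl: Tuple[int, int, int] = ASSUMED_MIN_PL) -> bool:
    """Return False only when torchtext is new enough to break an older Lightning."""
    if parse_version(torchtext_version) < BREAKING_TORCHTEXT:
        return True  # pre-0.9.0 torchtext: assumed to work with 1.1.7
    return parse_version(pl_version) >= min_pl


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        ok = is_compatible(version("pytorch-lightning"), version("torchtext"))
        print("compatible" if ok else "upgrade pytorch-lightning")
    except PackageNotFoundError as exc:
        print(f"package not installed: {exc}")
```

The sketch compares integer tuples instead of raw strings because, as strings, "1.10.0" would sort before "1.2.0". For production use, `packaging.version.Version` is the more robust choice because it handles the full PEP 440 grammar.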

tiberiu44 merged commit b561d7f into adobe:3.0 on Aug 15, 2021
tiberiu44 added a commit that referenced this pull request Aug 27, 2021
* Partial update

* Bugfix

* API update

* Bugfixing and API

* Bugfix

* Fix long words OOM by skipping sentences

* bugfixing and api update

* Added language flavour

* Added early stopping condition

* Corrected naming

* Corrected permissions

* Bugfix

* Added GPU support at runtime

* Wrong config package

* Refactoring

* refactoring

* add lightning to dependencies

* Dummy test

* Dummy test

* Tweak

* Tweak

* Update test

* Test

* Finished loading for UD CONLL-U format

* Working on tagger

* Work on tagger

* tagger training

* tagger training

* tagger training

* Sync

* Sync

* Sync

* Sync

* Tagger working

* Better weight for aux loss

* Better weight for aux loss

* Added save and printing for tagger and shared options class

* Multilanguage evaluation

* Saving multiple models

* Updated ignore list

* Added XLM-Roberta support

* Using custom ro model

* Score update

* Bugfixing

* Code refactor

* Refactor

* Added option to load external config

* Added option to select LM-model from CLI or config

* added option to overwrite config lm from CLI

* Bugfix

* Working on parser

* Sync work on parser

* Parser working

* Removed load limit

* Bugfix in evaluation

* Added bi-affine attention

* Added experimental ChuLiuEdmonds tree decoding

* Better config for parser and bugfix

* Added residuals to tagging

* Model update

* Switched to AdamW optimizer

* Working on tokenizer

* Working on tokenizer

* Training working - validation to do

* Bugfix in language id

* Working on tokenization validation

* Tokenizer working

* YAML update

* Bug in LMHelper

* Tagger is working

* Tokenizer is working

* Bugfix

* Bugfix

* Bugfix for bugfix :)

* Sync

* Tokenizer worker

* Tagger working

* Trainer updates

* Trainer process now working

* Added .DS_Store

* Added datasets for Compound Word Expander and Lemmatizer

* Added collate function for lemma+compound

* Added training and validation step

* Updated config for Lemmatizer

* Minor fixes

* Removed duplicate entries from lemma and cwe

* Added training support for lemmatizer

* Removed debug directives

* Lemmatizer in testing phase

* removed unused line

* Bugfix in Lemma dataset

* Corrected validation issue with gs labels being sent to the forward method and removed loss computation during testing

* Lemmatizer training done

* Compound word expander ready

* Sync

* Added support for FastText, Transformers and Languasito LM models

* Added multi-lm support for tokenizer

* Added support for multiword tokens

* Sync

* Bugfix in evaluation

* Added Languasito as a subpackage

* Added path to local Languasito

* Bugfixing all around

* Removed debug printing

* Bugfix for no-space languages that actually contain spaces :)

* Bugfix for no-space languages that actually contain spaces :)

* Fixed GPU support

* Biaffine transform for LAS and relative head location (RHL) for UAS

* Bugfix

* Tweaks

* moved rhl to lower layer

* Added configurable option for RHL

* Safety net for spaces in languages that should use no spaces

* Better defaults

* Sync

* Cleanup parser

* Bilinear xpos and attrs

* Added Biaffine module from Stanza

* Tagger with reduced number of parameters

* Parser with conditional attrs

* Working on tokenizer runtime

* Tokenizer process 90% done

* Added runtime for parser, tokenizer and tagger

* Added quick test for runtime

* Test for e2e

* Added support for multiple word embeddings at the same time

* Bugfix

* Added multiple word representations for tokenizer

* moved mask_concat to utils.py

* Added XPOS prediction to pipeline

* Bugfix in tokenizer shifted word embeddings

* Using Languasito tokenizer for HF tokenization

* Bugfix

* Bugfixing

* Bugfixing

* Bugfix

* Runtime fixing

* Sync

* Added spa for FT and Languasito

* Added spa for FT and Languasito

* Minor tweaks

* Added configuration for RNN layers

* Bugfix for spa

* HF runtime fix

* Mixed test fasttext+transformer

* Added word reconstruction and MHA

* Sync

* Bugfix

* bugfix

* Added masked attention

* Sync

* Added test for runtime

* Bugfix in mask values

* Updated test

* Added full mask dropout

* Added resume option

* Removed useless printouts

* Removed useless printouts

* Switched to eval at runtime

* multiprocessing added

* Added full mask dropout for word decoder

* Bugfix

* Residual

* Added lexical-contextual cosine loss

* Removed full mask dropout from WordDecoder

* Bugfix

* Training script generation update

* Added residual

* Updated languasito to pickle tokenized lines

* Updated languasito to pickle tokenized lines

* Updated languasito to pickle tokenized lines

* Not training for seq len > max_seq_len

* Added seq limits for collates

* Passing seq limits from collate to tokenizer

* Skipping complex parsing

* Working on word decomposer

* Model update

* Sync

* Bugfix

* Bugfix

* Bugfix

* Using all reprs

* Dropped immediate context

* Multi train script added

* Changed GPU parameter type to string, since an int failed for multiple GPUs

* Updated pytorch_lightning callback method to work with newer version

* Updated pytorch_lightning callback method to work with newer version

* Transparently pass PL args from the command line; skip over empty compound word datasets

* Fix typo

* Refactoring and on the way to working API

* API load working

* Partial __call__ working

* Partial __call__ working

* Added partly working api and refactored everything back to cube/. Compound not working yet and tokenizer needs retraining.

* api is working

* Fixing api

* Updated readme

* Update Readme to include flavours

* Device support

* api update

* Updated package

* Tweak + results

* Clarification

* Test update

* Update

* Sync

* Update README

* Bugfixing

* Bugfix and api update

* Fixed compound

* Evaluation update

* Bugfix

* Package update

* Bugfix for large sentences

* Pip package update

* Corrected Spanish evaluation

* Package version update

* Fixed tokenization issues on transformers

* Removed pinned memory

* Bugfix for GPU tensors

* Update package version

* Automatically detecting hidden state size

* Automatically detecting hidden state size

* Automatically detecting hidden state size

* Sync

* Evaluation update

* Package update

* Bugfix

* Bugfixing

* Package version update

* Bugfix

* Package version update

* Update evaluation for Italian

* Tentative support for torchtext>=0.9.0 (#127)

As mentioned in Lightning-AI/pytorch-lightning#6211 and #100.

* Update package dependencies

Co-authored-by: Stefan Dumitrescu <sdumitre@adobe.com>
Co-authored-by: dumitrescustefan <dumitrescu.stefan@gmail.com>
Co-authored-by: Tiberiu Boros <boros@adobe.com>
Co-authored-by: Tiberiu Boros <boros@boros-macos.local>
Co-authored-by: Koichi Yasuoka <yasuoka@kanji.zinbun.kyoto-u.ac.jp>
tiberiu44 added a commit that referenced this pull request Feb 17, 2023
* Corrected permissions

* Bugfix

* Added GPU support at runtime

* Wrong config package

* Refactoring

* refactoring

* add lightning to dependencies

* Dummy test

* Dummy test

* Tweak

* Tweak

* Update test

* Test

* Finished loading for UD CONLL-U format

* Working on tagger

* Work on tagger

* tagger training

* tagger training

* tagger training

* Sync

* Sync

* Sync

* Sync

* Tagger working

* Better weight for aux loss

* Better weight for aux loss

* Added save and printing for tagger and shared options class

* Multilanguage evaluation

* Saving multiple models

* Updated ignore list

* Added XLM-Roberta support

* Using custom ro model

* Score update

* Bugfixing

* Code refactor

* Refactor

* Added option to load external config

* Added option to select LM-model from CLI or config

* added option to overwrite config lm from CLI

* Bugfix

* Working on parser

* Sync work on parser

* Parser working

* Removed load limit

* Bugfix in evaluation

* Added bi-affine attention

* Added experimental ChuLiuEdmonds tree decoding

* Better config for parser and bugfix

* Added residuals to tagging

* Model update

* Switched to AdamW optimizer

* Working on tokenizer

* Working on tokenizer

* Training working - validation to do

* Bugfix in language id

* Working on tokenization validation

* Tokenizer working

* YAML update

* Bug in LMHelper

* Tagger is working

* Tokenizer is working

* Bugfix

* Bugfix

* Bugfix for bugfix :)

* Sync

* Tokenizer worker

* Tagger working

* Trainer updates

* Trainer process now working

* Added .DS_Store

* Added datasets for Compound Word Expander and Lemmatizer

* Added collate function for lemma+compound

* Added training and validation step

* Updated config for Lemmatizer

* Minor fixes

* Removed duplicate entries from lemma and cwe

* Added training support for lemmatizer

* Removed debug directives

* Lemmatizer in testing phase

* removed unused line

* Bugfix in Lemma dataset

* Corrected validation issue with gs labels being sent to the forward method and removed loss computation during testing

* Lemmatizer training done

* Compound word expander ready

* Sync

* Added support for FastText, Transformers and Languasito LM models

* Added multi-lm support for tokenizer

* Added support for multiword tokens

* Sync

* Bugfix in evaluation

* Added Languasito as a subpackage

* Added path to local Languasito

* Bugfixing all around

* Removed debug printing

* Bugfix for no-space languages that actually contain spaces :)

* Bugfix for no-space languages that actually contain spaces :)

* Fixed GPU support

* Biaffine transform for LAS and relative head location (RHL) for UAS

* Bugfix

* Tweaks

* moved rhl to lower layer

* Added configurable option for RHL

* Safety net for spaces in languages that should use no spaces

* Better defaults

* Sync

* Cleanup parser

* Bilinear xpos and attrs

* Added Biaffine module from Stanza

* Tagger with reduced number of parameters

* Parser with conditional attrs

* Working on tokenizer runtime

* Tokenizer process 90% done

* Added runtime for parser, tokenizer and tagger

* Added quick test for runtime

* Test for e2e

* Added support for multiple word embeddings at the same time

* Bugfix

* Added multiple word representations for tokenizer

* moved mask_concat to utils.py

* Added XPOS prediction to pipeline

* Bugfix in tokenizer shifted word embeddings

* Using Languasito tokenizer for HF tokenization

* Bugfix

* Bugfixing

* Bugfixing

* Bugfix

* Runtime fixing

* Sync

* Added spa for FT and Languasito

* Added spa for FT and Languasito

* Minor tweaks

* Added configuration for RNN layers

* Bugfix for spa

* HF runtime fix

* Mixed test fasttext+transformer

* Added word reconstruction and MHA

* Sync

* Bugfix

* bugfix

* Added masked attention

* Sync

* Added test for runtime

* Bugfix in mask values

* Updated test

* Added full mask dropout

* Added resume option

* Removed useless printouts

* Removed useless printouts

* Switched to eval at runtime

* multiprocessing added

* Added full mask dropout for word decoder

* Bugfix

* Residual

* Added lexical-contextual cosine loss

* Removed full mask dropout from WordDecoder

* Bugfix

* Training script generation update

* Added residual

* Updated languasito to pickle tokenized lines

* Updated languasito to pickle tokenized lines

* Updated languasito to pickle tokenized lines

* Not training for seq len > max_seq_len

* Added seq limits for collates

* Passing seq limits from collate to tokenizer

* Skipping complex parsing

* Working on word decomposer

* Model update

* Sync

* Bugfix

* Bugfix

* Bugfix

* Using all reprs

* Dropped immediate context

* Multi train script added

* Changed GPU parameter type to string, since an int failed for multiple GPUs

* Updated pytorch_lightning callback method to work with newer version

* Updated pytorch_lightning callback method to work with newer version

* Transparently pass PL args from the command line; skip over empty compound word datasets

* Fix typo

* Refactoring and on the way to working API

* API load working

* Partial __call__ working

* Partial __call__ working

* Added partly working api and refactored everything back to cube/. Compound not working yet and tokenizer needs retraining.

* api is working

* Fixing api

* Updated readme

* Update Readme to include flavours

* Device support

* api update

* Updated package

* Tweak + results

* Clarification

* Test update

* Update

* Sync

* Update README

* Bugfixing

* Bugfix and api update

* Fixed compound

* Evaluation update

* Bugfix

* Package update

* Bugfix for large sentences

* Pip package update

* Corrected Spanish evaluation

* Package version update

* Fixed tokenization issues on transformers

* Removed pinned memory

* Bugfix for GPU tensors

* Update package version

* Automatically detecting hidden state size

* Automatically detecting hidden state size

* Automatically detecting hidden state size

* Sync

* Evaluation update

* Package update

* Bugfix

* Bugfixing

* Package version update

* Bugfix

* Package version update

* Update evaluation for Italian

* Tentative support for torchtext>=0.9.0 (#127)

As mentioned in Lightning-AI/pytorch-lightning#6211 and #100.

* Update package dependencies

* Dummy word embeddings

* Update params

* Better dropout values

* Skipping long words

* Skipping long words

* Dummy word embeddings -> float

* Added gradient clipping

* Update tokenizer

* Update tokenizer

* Sync

* DCWE

* Working on DCWE

---------

Co-authored-by: Stefan Dumitrescu <sdumitre@adobe.com>
Co-authored-by: Tiberiu Boros <boros@adobe.com>
Co-authored-by: Koichi Yasuoka <yasuoka@kanji.zinbun.kyoto-u.ac.jp>