fix: use renamed tokenizer file name #56

Merged
merged 1 commit into from
Feb 19, 2024

Conversation

lllAlexanderlll
Contributor

No description provided.

@le1nux requested review from le1nux and flxst February 19, 2024 09:49
@lllAlexanderlll merged commit f16c409 into main Feb 19, 2024
4 checks passed
@lllAlexanderlll deleted the fix/tests branch February 19, 2024 10:05
luzian-hahn added a commit that referenced this pull request Mar 11, 2024
commit 0807555
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Mar 7 18:33:39 2024 +0100

    refactor: deleted failing legacy test

commit dd0db07
Merge: 095e491 4821804
Author: Luzian Hahn <145655920+luzian-hahn@users.noreply.github.com>
Date:   Thu Mar 7 10:29:09 2024 +0100

    Merge pull request #48 from Modalities/feat/merge-pbin-files

    feat: merge utility for pbin files

commit 4821804
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Thu Mar 7 10:27:28 2024 +0100

    docs: add hint about updated header structure

commit b34d6cb
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Thu Mar 7 10:19:54 2024 +0100

    refactor: remove unused utility

commit 7d05448
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 27 16:00:38 2024 +0100

    refactor: remove redundant check for valid pbin files

commit 2e27335
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Mon Feb 5 18:21:51 2024 +0100

    feat: add entrypoint for pbin-merge

commit 8ffc095
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Mon Feb 5 18:16:06 2024 +0100

    refactor: introduce entrypoint group "data"

commit a0d13a3
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Mon Feb 5 15:06:18 2024 +0100

    feat: add pbin-merger

commit 9f853cf
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Mon Feb 5 11:36:49 2024 +0100

    refactor: introduce abstraction for stream data below packed Datasets

commit 095e491
Merge: 419fc9e 0f3846a
Author: Luzian Hahn <145655920+luzian-hahn@users.noreply.github.com>
Date:   Thu Mar 7 09:38:53 2024 +0100

    Merge pull request #40 from Modalities/perf/benchmark-datasets-again-megatronlm

    perf: benchmark datasets against megatronlm

commit 0f3846a
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Thu Mar 7 09:28:27 2024 +0100

    test: prevent unnecessary warnings during tests

commit f2232c3
Merge: 9095ac5 419fc9e
Author: Luzian Hahn <145655920+luzian-hahn@users.noreply.github.com>
Date:   Thu Mar 7 08:45:14 2024 +0100

    Merge branch 'main' into perf/benchmark-datasets-again-megatronlm

commit 419fc9e
Merge: 8ab29d0 d192331
Author: Max Lübbering <2804731+le1nux@users.noreply.github.com>
Date:   Mon Mar 4 12:25:00 2024 +0100

    Merge pull request #65 from David-Berghaus/Fix-typos

    Fixed typos

commit d192331
Author: David Berghaus <machs3ll@gmail.com>
Date:   Mon Mar 4 12:12:47 2024 +0100

    Fixed typos

commit 8ab29d0
Merge: d71bceb f9b0f41
Author: Mehdi Ali <33023925+mali-git@users.noreply.github.com>
Date:   Fri Mar 1 15:59:01 2024 +0100

    Merge pull request #45 from Modalities/hierarchical_instantiation

    Hierarchical instantiation

commit f9b0f41
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Mon Feb 26 16:53:36 2024 +0100

    chore: fix linting

commit 042e3a0
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Mon Feb 26 16:00:39 2024 +0100

    refactor: fix typos

commit 8345e06
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 15:24:25 2024 +0100

    refactor: fixed the library usage example

commit cd2128d
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 15:24:00 2024 +0100

    refactor: replaced absolute paths with relative ones

commit 9ab6654
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 15:23:06 2024 +0100

    fix: fixed add_custom_component in Main

commit 64b785a
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Mon Feb 26 14:18:06 2024 +0100

    fix: skipping of tests in non-distributed environment

commit c7f7a7b
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 13:35:24 2024 +0100

    chore: minor changes in TestFSDPToDiscCheckpointing

commit 10538ac
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 13:24:10 2024 +0100

    refactor: also using ComponentEntity now in the tests

commit 432426b
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 13:23:46 2024 +0100

    refactor: fixed failing test_e2e_training_run_wout_ckpt

commit 63829e1
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 13:20:43 2024 +0100

    chore: excluded openGPTx from test cov

commit 12632fd
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 12:57:16 2024 +0100

    refactor:  introduced ComponentEntity

commit c15de17
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 26 12:09:14 2024 +0100

    refactor: various smaller changes

commit 973909d
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Mon Feb 26 10:58:27 2024 +0100

    refactor: sort classes in config

commit bc64ee0
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Mon Feb 26 10:52:21 2024 +0100

    refactor: remove RegistryFactory

commit b9dbe2e
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Mon Feb 26 10:19:24 2024 +0100

    refactor: rename and fix readme for getting started example

commit ca74340
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Feb 25 16:00:17 2024 +0100

    feat: added activation checkpointing to __main__.py

commit 7ae2234
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 24 21:19:44 2024 +0100

    refactor: fixed some of the configs

commit bcd6e5b
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 24 21:16:08 2024 +0100

    feat: experiment_id now set in the config via omega conf resolver

commit a6ea22a
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 24 14:03:46 2024 +0100

    refactor: gpt2 config for checkpointing tests

commit ff3eb52
Merge: 64617dd fb0aea5
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 24 14:01:15 2024 +0100

    chore: Merge branch 'hierarchical_instantiation' of github.com:Modalities/modalities into hierarchical_instantiation

commit 64617dd
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 24 14:00:35 2024 +0100

    feat: added add_custom_component function to Main

commit df4f971
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 24 13:59:33 2024 +0100

    test: fixed fsdp test, but cannot be run directly via pytest as it needs torchrun

commit fb0aea5
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Sat Feb 24 10:51:51 2024 +0100

    fix: replace conint/confloat correctly

commit fd07cb0
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 23 19:39:12 2024 +0100

    refactor: made base_model_to_dict public as it is great for testing

commit aa0d64f
Author: Max Lübbering <2804731+le1nux@users.noreply.github.com>
Date:   Fri Feb 23 18:31:54 2024 +0100

    Update README.md

commit e70f3a0
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Fri Feb 23 17:57:15 2024 +0100

    fix: replace conint/confloat for pydantic 3.0 compatibility

commit 70d9e63
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 23 17:40:38 2024 +0100

    chore: more documentation

commit 2396020
Merge: a68ddf4 021b7c2
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 23 17:39:09 2024 +0100

    chore: Merge branch 'hierarchical_instantiation' of github.com:Modalities/modalities into hierarchical_instantiation

commit a68ddf4
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 23 16:57:23 2024 +0100

    feat: added example for registering a custom component

commit 021b7c2
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Fri Feb 23 11:38:32 2024 +0100

    refactor: restored base_model_to_dict

commit b619b41
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Fri Feb 23 09:32:31 2024 +0100

    refactor: replace base_model_to_dict by pydantic built-in method

commit 34c6498
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Fri Feb 23 09:26:44 2024 +0100

    refactor: fixed typing for registry

commit 52ffea4
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 22 17:59:20 2024 +0100

    fix: fixed failing end 2 end test

commit b0bd296
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 22 17:58:38 2024 +0100

    fix: eval_dataloaders are now treated as list instead of dict. This was not reflected yet in the subscriber factory

commit cbf905b
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 22 17:47:53 2024 +0100

    fix: checkpointing test

commit a42a479
Merge: 26b8b82 e3b50f6
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 22 17:33:21 2024 +0100

    chore: Merge branch 'hierarchical_instantiation' of github.com:Modalities/modalities into hierarchical_instantiation

commit 26b8b82
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 22 17:32:41 2024 +0100

    refactor: we fully support the configs again for hierarchical instantiation

commit 9dfd100
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 22 17:31:45 2024 +0100

    refactor: eval_dataloaders are subsumed in a list now

commit e3b50f6
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Thu Feb 22 12:39:17 2024 +0100

    refactor: unification of Pydantic*IF classes

commit 7c4fafb
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Thu Feb 22 09:24:42 2024 +0000

    chore: enabled pytest discovery with all tests. Some tests still need to be fixed!

commit 34dc796
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Thu Feb 22 10:24:09 2024 +0100

    refactor: renaming for consistency

commit 2d8349d
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Thu Feb 22 08:45:23 2024 +0000

    fix: e2e test

commit cc60608
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Thu Feb 22 08:10:43 2024 +0000

    fix: set FIXME for fsdp_to_disc_checkpointing_test and fix outdated config test

commit fdfb90a
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 19:03:04 2024 +0100

    chore: fixed variable naming

commit 1de69c3
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 18:59:23 2024 +0100

    refactor: merged remote to local and refactored callback_interval_in_batches to callback_interval_in_samples in the config

commit e1dd046
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Wed Feb 21 15:22:33 2024 +0000

    fix: test discovery under vscode. TODO: replace PretrainedGPTConfig by correct class

commit cd5ec46
Merge: 281f20f e16dec9
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 13:15:31 2024 +0100

    chore: Merge branch 'hierarchical_instantiation' of github.com:Modalities/modalities into hierarchical_instantiation

commit 281f20f
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:56:45 2024 +0100

    refactor: moved LookupEnum to dedicated file to fix circular imports

commit e433913
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:55:34 2024 +0100

    refactor: removed types.py

commit 2c3762b
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:48:12 2024 +0100

    chore: import fix

commit 4f07fc9
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:47:50 2024 +0100

    feat: added checkpointed model and fsdp wrapped model to registry factory

commit 2ba8edd
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:46:26 2024 +0100

    chore: fixed import in registry factory

commit 76b4240
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:46:03 2024 +0100

    chore: minor fix

commit 417e0ed
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:45:48 2024 +0100

    refactor: deleted checkpointing factory

commit b056ddd
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:45:09 2024 +0100

    refactor: we always instantiate the LLMDataloader with a ResumableBatchSampler now

commit cd5e6fe
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:43:20 2024 +0100

    refactor: config_new.py renamed to config.py

commit f39051f
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:41:48 2024 +0100

    refactor: deleted lookup_types

commit c971bb0
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 21 12:39:47 2024 +0100

    refactor: removed resolver_register

commit 3371b39
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:37:11 2024 +0100

    refactor: __main__.py now is capable of instantiating hierarchical configs

commit b5f3d4d
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:34:25 2024 +0100

    refactor: refactored FSDPToDiscCheckpointing to use ModelFactory.get_fsdp_wrapped_model

commit 29aee7d
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:33:06 2024 +0100

    chore: ProcessGroupBackendType inherits now from LookupEnum

commit 197f863
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:32:36 2024 +0100

    feat: implemented OptimizerFactory

commit 8d1bb9e
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:32:12 2024 +0100

    feat: added model factory

commit 8b9dc20
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:31:40 2024 +0100

    feat: introduced CudaEnv

commit 89fa61c
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:31:15 2024 +0100

    chore: MixedPrecisionSettings inherits now from LookupEnum

commit 4037db2
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:30:50 2024 +0100

    refactor: removed running env

commit eb9f5b5
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:30:32 2024 +0100

    feat: added Settings basemodel to config and refactored FSDPToDiscCheckpointingConfig

commit c60d689
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 21:29:29 2024 +0100

    refactor: restructured config lorem ipsum

commit d9d8925
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 20 20:22:49 2024 +0100

    fix: bug fix in component factory

commit e16dec9
Merge: 4c17abb d71bceb
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Mon Feb 19 20:58:25 2024 +0100

    chore: merge main into hierarchical_instantiation

commit 4c17abb
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Mon Feb 19 15:52:59 2024 +0100

    refactor: unification of component registry and config registry

commit d71bceb
Author: Alexander Weber <alex.a.weber@gmx.de>
Date:   Mon Feb 19 15:29:09 2024 +0100

    Update README.md

commit 95bfc55
Merge: f16c409 a0b799a
Author: Alexander Weber <alex.a.weber@gmx.de>
Date:   Mon Feb 19 15:25:29 2024 +0100

    Merge pull request #52 from Modalities/chore/add-pytest-coverage

    chore: add pytest coverage

commit a0b799a
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 14:22:49 2024 +0000

    chore: clean gitignore

commit 5361ca5
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 14:11:00 2024 +0000

    chore: add toml support

commit 4047b67
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 14:07:13 2024 +0000

    chore: try fix from 2021

commit 20b1460
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 13:54:42 2024 +0000

    chore: remove outdated .coverage.toml

commit ec495a3
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 13:45:58 2024 +0000

    chore: remove --cov from github action

commit 920ccab
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 13:45:13 2024 +0000

    chore: add coverage options in pyproject.toml

commit a3ce9b1
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 19 14:44:59 2024 +0100

    feat: integrated message subscribers

commit b324c3f
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 19 14:41:37 2024 +0100

    refactor: refactored dataloader and its factory

commit f686268
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 13:41:30 2024 +0000

    chore: add pytest --cov arguments by default

commit f1e3155
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 13:36:22 2024 +0000

    chore: search for coverage bug

commit 4122c6c
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 13:31:43 2024 +0000

    chore: search for coverage bug

commit f43b81f
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 13:08:27 2024 +0000

    chore: fix coveralls github action

commit 81292e8
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 19 14:03:40 2024 +0100

    refactor: moved OpenGPTXDatasetWrapper to DatasetFactory

commit bc56246
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 12:41:54 2024 +0000

    chore: add pytest-cov execution as github action

commit f16c409
Merge: a0513e3 bc03021
Author: Alexander Weber <alex.a.weber@gmx.de>
Date:   Mon Feb 19 11:05:36 2024 +0100

    Merge pull request #56 from Modalities/fix/tests

    fix: use renamed tokenizer file name

commit bc03021
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 09:48:47 2024 +0000

    fix: use renamed tokenizer file name

commit a0513e3
Merge: b8117b1 76e0518
Author: Alexander Weber <alex.a.weber@gmx.de>
Date:   Mon Feb 19 10:26:45 2024 +0100

    Merge pull request #38 from Modalities/fix/tests-on-cpu

commit 76e0518
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 09:24:48 2024 +0000

    chore: moved if statement into torch.device

commit b8117b1
Merge: 1c99963 78b9645
Author: Alexander Weber <alex.a.weber@gmx.de>
Date:   Mon Feb 19 10:11:56 2024 +0100

    Merge pull request #42 from Modalities/fix/linting

    fix: lint all files

commit 78b9645
Merge: 5b60c2f 1c99963
Author: Alexander Weber <12560547+lllAlexanderlll@users.noreply.github.com>
Date:   Mon Feb 19 09:05:44 2024 +0000

    chore: local merge

commit 2267605
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Feb 18 23:27:27 2024 +0100

    feat: towards subscriber support with hierarchical instantiation

commit a449119
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Feb 18 23:25:40 2024 +0100

    chore: minor changes

commit aab3fa2
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Feb 18 23:24:58 2024 +0100

    feat: implemented subscriber factory

commit 1c99963
Merge: a8b6563 cf27873
Author: Max Lübbering <2804731+le1nux@users.noreply.github.com>
Date:   Sun Feb 18 22:45:14 2024 +0100

    Merge pull request #29 from Modalities/feat/contrastive_loss

    Add Noise Contrastive Estimation Loss

commit 6baf221
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 17:11:11 2024 +0100

    feat: added LLM dataloader support

commit 8ab04a5
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 17:10:16 2024 +0100

    feat: introduced CollateFnIF for collate functions

commit 018c278
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 17:00:02 2024 +0100

    feat: added resumable batch sampler

commit 1273c31
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 16:53:57 2024 +0100

    feat: added gpt_2 collator support

commit 536447c
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 16:44:57 2024 +0100

    feat: added batch sampler support

commit 771eab1
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 16:18:51 2024 +0100

    feat: added PydanticDatasetIF for SamplerConfig

commit f1c1be4
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 15:55:47 2024 +0100

    feat: added support for the different dataset formats

commit 0824bb0
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 15:38:51 2024 +0100

    refactor: added adaptations that were injected in the dataloader factory previously

commit 6985fad
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 15:26:22 2024 +0100

    feat: implemented dataset factory for various dataset types

commit 81022f4
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 14:33:37 2024 +0100

    feat: added gpt2 tokenizer support

commit 55c0110
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 13:35:19 2024 +0100

    feat: added adamw support

commit 4a6a415
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 13:32:30 2024 +0100

    feat: implemented OptimizerFactory

commit c2bd570
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sat Feb 17 13:31:59 2024 +0100

    fix: added root-level to dict function for basemodel to prevent recursive model dumps

commit 90207ed
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 16 20:05:47 2024 +0100

    refactor: started refactoring the lorem ipsum config towards the new hierarchical configs

commit 1304241
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 16 20:05:24 2024 +0100

    refactor: Main makes partially use of the hierarchical instantiation now

commit f7dfe31
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 16 20:04:54 2024 +0100

    refactor: Refactored CheckpointingFactory

commit 38499c4
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 16 20:04:26 2024 +0100

    refactor: removed unused attribute in Checkpointing

commit 542ba75
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 16 20:04:08 2024 +0100

    fix: bugfix in component factory

commit d446260
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 16 20:03:54 2024 +0100

    feat: added new configs in separate file for now

commit 6d121f3
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 16 20:03:18 2024 +0100

    feat: added more components to registry factory

commit fb3b35f
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 16 20:02:48 2024 +0100

    refactor: refactored FSDPRunningEnvConfig

commit 8eda99c
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 15 23:37:33 2024 +0100

    refactor: refactored component factory to use the registry

commit 41be773
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 15 23:36:57 2024 +0100

    feat: added registry factory

commit 3ebb656
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 15 23:36:34 2024 +0100

    feat: implemented registry

commit f2164a8
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 15 23:36:00 2024 +0100

    test: configs now use the new format without typehints

commit 5fb2199
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 15 23:35:39 2024 +0100

    test: added registry testing

commit 623f847
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 15 23:34:49 2024 +0100

    test: updated test configs to the new  format

commit 372947b
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Wed Feb 14 21:45:11 2024 +0100

    chore: add pytest coverage (locally)

commit 36bc7ae
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 14 13:45:11 2024 +0100

    refactor: renamed config_types to custom_config_types in ComponentFactory

commit babd597
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 14 13:35:58 2024 +0100

    feat: added support custom types in component factory

commit 1639a6a
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 14 12:04:00 2024 +0100

    refactor: simplified ComponentFactory

commit aa9e040
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 14 10:33:11 2024 +0100

    test: removed code duplication in test_component_factory

commit 44677c6
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Wed Feb 14 10:30:43 2024 +0100

    test: refactored test_custom_component

commit 71de3ff
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 13 21:44:53 2024 +0100

    test: added testing for custom components

commit 2a54f84
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 13 20:57:25 2024 +0100

    test: added test yaml configs for component factory

commit 35236d0
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 13 20:56:22 2024 +0100

    test: implemented test_non_existing_reference

commit bb4bcb3
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 13 20:53:21 2024 +0100

    test: implemented test_component_filter

commit 0dfbbcb
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 13 20:49:36 2024 +0100

    test: implemented test_hierarchical_component_instantiation

commit 3a66b65
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Tue Feb 13 20:41:18 2024 +0100

    test: implemented forward and backward referencing test

commit c0c877c
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:52:13 2024 +0100

    chore: fixed imports in component factory

commit a9781a3
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:42:28 2024 +0100

    refactor: added drafted test code for component factory

commit b1cbb46
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:42:00 2024 +0100

    refactor: moved trial component factory code to test module

commit c115b2b
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:41:21 2024 +0100

    refactor: moved component factory into parent module

commit e678d78
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:26:55 2024 +0100

    refactor: renamed hierarchical DI module to hierarchical_instantiation

commit 45f7ff4
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:19:24 2024 +0100

    refactor: removed legacy code and added comments to component factory.

commit da88895
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:03:26 2024 +0100

    feat: added referencing to config

commit b42aeeb
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:02:50 2024 +0100

    feat: added ReferenceConfig and PassType

commit 72f0524
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 12 14:02:24 2024 +0100

    feat: implemented forward and backward component referencing

commit 43e1134
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Feb 11 19:56:18 2024 +0100

    chore: added documentation for generate_text text CMD interface

commit cf27873
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Fri Feb 9 17:38:31 2024 +0100

    refactor: adapt nce_loss function to reflect loss from CoCa paper

commit d388d21
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Fri Feb 9 17:37:35 2024 +0100

    test: adapt test_nce_loss_correctness to uni and bidirectional loss

commit a8b6563
Merge: da65493 00e10ae
Author: Max Lübbering <2804731+le1nux@users.noreply.github.com>
Date:   Fri Feb 9 16:46:36 2024 +0100

    Merge pull request #30 from Modalities/huggingface_models_support

    feat: Generic huggingface transformer support

commit 00e10ae
Author: Max Lübbering <2804731+le1nux@users.noreply.github.com>
Date:   Fri Feb 9 16:24:02 2024 +0100

    Update preprocess_dataset.py

commit e93e767
Merge: f435fc8 da65493
Author: Max Lübbering <2804731+le1nux@users.noreply.github.com>
Date:   Fri Feb 9 15:50:03 2024 +0100

    Merge branch 'main' into huggingface_models_support

commit f435fc8
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 9 15:46:59 2024 +0100

    feat: introduced huggingface_prediction_subscription_key to HuggingFacePretrainedModelConfig to support different output formats

commit e6f4aac
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Fri Feb 9 15:46:08 2024 +0100

    refactor: moved lookup_enum to dedicated file.

commit ebbe8c5
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Fri Feb 9 13:49:42 2024 +0100

    test: add test for nce_loss using a manually calculated example

commit 7d5c095
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:17:36 2024 +0100

    chore: removed legacy code

commit dad3ea4
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:16:40 2024 +0100

    chore: added legacy trials for  hierarchical  DI

commit 3ab9ff3
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:14:10 2024 +0100

    chore: added __init__.py

commit 3dfdb2a
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:13:51 2024 +0100

    feat: implemented factory for hierarchical component instantiation

commit dc7c1a2
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:13:17 2024 +0100

    feat: added example yaml config file for hierarchical instantiation

commit 099979b
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:12:58 2024 +0100

    feat: added configs for the test components

commit c4292ce
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:12:25 2024 +0100

    feat: added components for testing

commit fc5cb96
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:11:28 2024 +0100

    chore: minor debugging improvement in parse_enum_by_name in utils

commit 783ad81
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 8 20:10:57 2024 +0100

    chore: removed legacy trials

commit 9095ac5
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 16:17:22 2024 +0100

    docs: update times in table after perf upgrade

commit 91ec38e
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 16:07:46 2024 +0100

    fix: make encoding specification obsolete and improve perf of index creation

commit afae858
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 15:48:19 2024 +0100

    feat: make encoding configurable

commit 71f77e2
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 14:51:57 2024 +0100

    refactor: remove parameter-artifact

commit a668620
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 14:47:52 2024 +0100

    refactor: remove TODO-artifact

commit a08518f
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 14:43:31 2024 +0100

    refactor: rename queue for token-writing

commit 2e535a3
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 14:25:35 2024 +0100

    fix: derive default value for cpu count automatically

commit 03d3f47
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 14:24:48 2024 +0100

    perf: share FileIOStream among process calls - not threadsafe!

commit bc086ca
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 14:13:12 2024 +0100

    docs: remove auto execution of benchmarks, while sourcing bench utils

commit fb04dc8
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 14:08:14 2024 +0100

    fix: typo in warning

commit faa2eff
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 14:05:35 2024 +0100

    docs: unify time units in measurement table

commit 26ade7c
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Feb 6 13:37:22 2024 +0100

    docs: add definitions of benchmarking experiments

commit 463872d
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 5 18:55:00 2024 +0100

    refactor: drafted hierarchical instantiation

commit bd39244
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 5 18:52:20 2024 +0100

    chore: removed unused properties in config.py

commit a908e7a
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Mon Feb 5 18:50:26 2024 +0100

    refactor: moved resolver register

commit 540afe2
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Thu Feb 1 17:04:22 2024 +0100

    refactor: add keyword arguments

commit 57ccaf9
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Thu Feb 1 17:03:18 2024 +0100

    refactor: introduce nce_loss function and add asymmetry parameter in NCELoss

commit 35ca235
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Thu Feb 1 15:57:13 2024 +0100

    feat: drafted hierarchical instantiation

commit 5b60c2f
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Tue Jan 30 22:48:35 2024 +0100

    fix: lint all files

commit d84353f
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Jan 30 17:11:02 2024 +0100

    docs: add details about dataloading performance benchmarks

commit 93d9241
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Jan 30 17:10:12 2024 +0100

    perf: use one large memmap for PackedDatasets

commit e6cb130
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Tue Jan 30 16:18:50 2024 +0100

    refactor: apply ruff refactor comment

commit dfbefcb
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Tue Jan 30 15:23:42 2024 +0100

    fix: get rid of reduce mocking (for testing)

commit f4e3c56
Author: Felix Stollenwerk <felix.stollenwerk@ai.se>
Date:   Tue Jan 30 15:17:10 2024 +0100

    fix: training and evaluation on CPU (for testing)

commit 69e2050
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Jan 30 12:14:21 2024 +0100

    feat: infer smallest tokensize automatically for packing

commit a96a5f4
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Tue Jan 30 09:17:35 2024 +0100

    perf: use parallelized tokenization when creating .pbin files

commit ee08a01
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Mon Jan 29 15:35:55 2024 +0100

    perf: increase memmap index creation speed

commit 8e30e00
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 23:22:39 2024 +0100

    chore: added documentation

commit abb63aa
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 22:08:45 2024 +0100

    refactor: fixed configs due to latest changes

commit f83da11
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 22:07:26 2024 +0100

    feat: wired up huggingface transformer models

commit 9309505
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 22:01:29 2024 +0100

    chore: renamed Block to GPT2Block

commit 4d6a5ff
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 22:01:17 2024 +0100

    feat: fully implemented HuggingFacePretrainedModel with respective configuration

commit 88c4fdb
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 22:00:33 2024 +0100

    feat: implemented automatic FSDP wrapping

commit 3b51117
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 21:56:48 2024 +0100

    refactor: renamed tokenizer.json to tokenizer_gpt2.json

commit 95e67a0
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 21:55:52 2024 +0100

    feat: renamed redpajama memmap datasets (added tokenizer info)

commit 0992d21
Author: Max Luebbering <le1nux@users.noreply.github.com>
Date:   Sun Jan 28 00:26:36 2024 +0100

    feat: towards generic huggingface transformer support

commit ba65580
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Fri Jan 26 13:36:38 2024 +0100

    refactor: refactor docstrings

commit e459321
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Thu Jan 25 17:48:32 2024 +0100

    test: add test for contrastive loss

commit bb14749
Author: Sogol Haghighat <sogol.haghighat@iais.fraunhofer.de>
Date:   Thu Jan 25 17:47:43 2024 +0100

    feat: add contrastive loss for coca model training

commit c9e4e08
Author: Luzian Hahn <luzian.hahn@iis.fraunhofer.de>
Date:   Mon Jan 22 13:43:46 2024 +0100

    fix: rely again on iso-8859-1 instead of utf8

    the OpenGPT-X data seems to come with problematic chars, which cannot be decoded via utf8.
    The former fix of using iso-8859-1 resolves this. However, the issue probably lies with the dataset conversions
le1nux pushed a commit that referenced this pull request Mar 13, 2024
fix: use renamed tokenizer file name