Add llama training support #2055

andreaskoepf · 2023-03-13T11:44:21Z

further work based on new-format-experiment branch

github-actions · 2023-03-13T12:31:03Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-13T19:03:17Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-13T19:19:48Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-14T10:22:53Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-15T11:37:49Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-15T11:43:03Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-15T11:54:26Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-22T04:05:25Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-22T04:12:29Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-22T04:41:46Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

theblackcat102 · 2023-03-22T04:45:37Z

@andreaskoepf @sanagno After resolving the merge conflicts with main, pre-commit is behaving very strangely: my local pre-commit check passes, but the GitHub Action is not happy.

diff --git a/model/model_training/tools/model_chat.py b/model/model_training/tools/model_chat.py
index f125a3d..434066b 100755
--- a/model/model_training/tools/model_chat.py
+++ b/model/model_training/tools/model_chat.py
@@ -19,9 +19,8 @@ if __name__ == "__main__":
 
     sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
 from tokenizers import pre_tokenizers
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
 
 class ChatRole(str, Enum):
diff --git a/model/model_training/tools/model_cli.py b/model/model_training/tools/model_cli.py
index a3fac77..b89e4c0 100755
--- a/model/model_training/tools/model_cli.py
+++ b/model/model_training/tools/model_cli.py
@@ -6,9 +6,8 @@ import torch
 import transformers
 from custom_datasets.formatting import QA_SPECIAL_TOKENS, format_pairs, format_system_prefix
 from models import get_specific_model
-from utils import _strtobool
-
 from tokenizers import pre_tokenizers
+from utils import _strtobool
 
 if __name__ == "__main__":
     import warnings
diff --git a/model/model_training/utils.py b/model/model_training/utils.py
index 7a710ee..92eccc4 100644
--- a/model/model_training/utils.py
+++ b/model/model_training/utils.py
@@ -15,11 +15,10 @@ from models import freeze_top_n_layers, get_specific_model
 from models.reward_model import RewardModel, RewardModelConfig
 from models.tokenization_llama import LLaMATokenizer
 from sklearn.model_selection import train_test_split
+from tokenizers import pre_tokenizers
 from torch.utils.data import ConcatDataset, Subset
 from torch.utils.data.distributed import DistributedSampler
 
-from tokenizers import pre_tokenizers
-
 
 def _strtobool(x):
     return bool(strtobool(x))
diff --git a/model/reward/instructor/utils.py b/model/reward/instructor/utils.py
index d1fcc2a..1e97df9 100644
--- a/model/reward/instructor/utils.py
+++ b/model/reward/instructor/utils.py
@@ -3,11 +3,10 @@ from typing import AnyStr, List
 
 import yaml
 from sklearn.model_selection import train_test_split
+from tokenizers import pre_tokenizers
 from torch.utils.data import Subset
 from transformers import AutoTokenizer, T5Tokenizer
 
-from tokenizers import pre_tokenizers
-

Are these differences simply a change in the order of the import lines?
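
A possible explanation, offered only as an assumption since the thread does not confirm the cause: an import sorter can produce exactly this kind of diff when one environment classifies `tokenizers` as a first-party module and the other classifies it as third-party. The sketch below is a toy model of section-based sorting, not the real sorter's algorithm; `sort_imports` and its `first_party` parameter are hypothetical names used for illustration.

```python
def sort_imports(lines, first_party):
    """Toy model of section-based import sorting (NOT a real tool's algorithm).

    Third-party imports come first, then a blank line, then first-party
    imports. Each section is sorted by module name.
    """

    def module(line):
        # "from a.b import c" -> "a.b"
        return line.split()[1]

    def is_first_party(line):
        # Classify by the top-level package name only.
        return module(line).split(".")[0] in first_party

    third = sorted((l for l in lines if not is_first_party(l)), key=module)
    first = sorted((l for l in lines if is_first_party(l)), key=module)
    separator = [""] if third and first else []
    return third + separator + first


imports = [
    "from transformers import AutoModelForCausalLM, AutoTokenizer",
    "from tokenizers import pre_tokenizers",
]

# Hypothetical environment A: "tokenizers" is treated as first-party.
as_first_party = sort_imports(imports, first_party={"tokenizers"})
# Hypothetical environment B: "tokenizers" is treated as third-party.
as_third_party = sort_imports(imports, first_party=set())

print("\n".join(as_first_party))
print("---")
print("\n".join(as_third_party))
```

In the first case `tokenizers` lands in its own block after `transformers`; in the second it sorts alphabetically before `transformers`, which matches the two sides of the diff above. If this is indeed the cause, pinning the classification in the sorter's configuration (for isort, the `known_third_party` setting) would be one way to make both environments agree.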

github-actions · 2023-03-22T23:53:51Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

github-actions · 2023-03-23T21:22:01Z

❌ pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

theblackcat102 and others added 9 commits March 4, 2023 03:47

[feature] change to new conversation format

6cd8336

[feature] add chat toolkit for v3 format

ebbe2a3

[merge] fix conflict with main

794f586

try new format

bfbc7da

[feature] add llama

ee2d2e1

[fix] Add missing llama files, my bad

a5f60b9

[feature] add llama to chat tool

2c709b1

[merge] fix model_chat conflict, wtf is that sys append

83cebfa

Integrate latest llama tokenizer changes, revert to v2.5 formatting

e9cefd0

andreaskoepf requested review from theblackcat102, sanagno and yk as code owners March 13, 2023 11:44

andreaskoepf and others added 2 commits March 13, 2023 11:45

Merge branch 'main' into llama_experiment

8e3c338

add patching support for llama

75b0a7d

andreaskoepf added 2 commits March 13, 2023 13:39

pre-commit fixes

9f68464

params for llama_13b_mask_lr8e-6 without dropout

be807b2

fix lazy _no_prefix_space_tokens generation

115eedd

sanagno added the ml label Mar 14, 2023

add latin_cyrillic config + llama export

1e53464

andreaskoepf and others added 4 commits March 14, 2023 15:25

pre-commit fixes

1f34914

config for llama_13b_mask_latcyr_lr8_do1_bs64 run

24c609e

Merge branch 'llama_experiment' of https://github.com/LAION-AI/Open-Assistant into llama_experiment

1281b6d

Merge branch 'main' into llama_experiment

e1d12d0

patch model

5790fe4

merge fixes

a509715

pre-commits

9b37123

sanagno approved these changes Mar 15, 2023

yk approved these changes Mar 20, 2023

.vscode/settings.json Outdated

theblackcat102 approved these changes Mar 21, 2023

model/reward/instructor/configs/zero_config.json Outdated

[fix] revert vscode settings change

19c5df3

theblackcat102 enabled auto-merge (squash) March 21, 2023 14:53

Merge branch 'main' into llama_experiment

839850a

[fix] manually rerun precommit

9c91bfb

Merge branch 'main' into llama_experiment

23f90b0

Merge remote-tracking branch 'origin/main' into llama_experiment

e1dffbb

andreaskoepf requested a review from dvruette as a code owner March 23, 2023 21:19

fix pre-commit

5f514a1

theblackcat102 merged commit 42d9a5f into main Mar 23, 2023
1 check passed

theblackcat102 deleted the llama_experiment branch March 23, 2023 21:24
