
Add llama training support #2055

Merged
merged 28 commits into from Mar 23, 2023


andreaskoepf
Collaborator

  • Further work based on the new-format-experiment branch

@github-actions

pre-commit failed.
Please run pre-commit run --all-files locally and commit the changes.
Find more information in the repository's CONTRIBUTING.md

sanagno added the ml label Mar 14, 2023
.vscode/settings.json (outdated review thread, resolved)
theblackcat102 enabled auto-merge (squash) March 21, 2023 14:53
@theblackcat102
Collaborator

theblackcat102 commented Mar 22, 2023

@andreaskoepf @sanagno After resolving the merge conflicts with main, pre-commit is behaving very strangely: my local pre-commit check passes, but the GitHub Action fails with this diff:

index f125a3d..434066b 100755
--- a/model/model_training/tools/model_chat.py
+++ b/model/model_training/tools/model_chat.py
@@ -19,9 +19,8 @@ if __name__ == "__main__":
 
     sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 
-from transformers import AutoModelForCausalLM, AutoTokenizer
-
 from tokenizers import pre_tokenizers
+from transformers import AutoModelForCausalLM, AutoTokenizer
 
 
 class ChatRole(str, Enum):
diff --git a/model/model_training/tools/model_cli.py b/model/model_training/tools/model_cli.py
index a3fac77..b89e4c0 100755
--- a/model/model_training/tools/model_cli.py
+++ b/model/model_training/tools/model_cli.py
@@ -6,9 +6,8 @@ import torch
 import transformers
 from custom_datasets.formatting import QA_SPECIAL_TOKENS, format_pairs, format_system_prefix
 from models import get_specific_model
-from utils import _strtobool
-
 from tokenizers import pre_tokenizers
+from utils import _strtobool
 
 if __name__ == "__main__":
     import warnings
diff --git a/model/model_training/utils.py b/model/model_training/utils.py
index 7a710ee..92eccc4 100644
--- a/model/model_training/utils.py
+++ b/model/model_training/utils.py
@@ -15,11 +15,10 @@ from models import freeze_top_n_layers, get_specific_model
 from models.reward_model import RewardModel, RewardModelConfig
 from models.tokenization_llama import LLaMATokenizer
 from sklearn.model_selection import train_test_split
+from tokenizers import pre_tokenizers
 from torch.utils.data import ConcatDataset, Subset
 from torch.utils.data.distributed import DistributedSampler
 
-from tokenizers import pre_tokenizers
-
 
 def _strtobool(x):
     return bool(strtobool(x))
diff --git a/model/reward/instructor/utils.py b/model/reward/instructor/utils.py
index d1fcc2a..1e97df9 100644
--- a/model/reward/instructor/utils.py
+++ b/model/reward/instructor/utils.py
@@ -3,11 +3,10 @@ from typing import AnyStr, List
 
 import yaml
 from sklearn.model_selection import train_test_split
+from tokenizers import pre_tokenizers
 from torch.utils.data import Subset
 from transformers import AutoTokenizer, T5Tokenizer
 
-from tokenizers import pre_tokenizers
-

Are these differences really just changes in import order?
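
The diff does look like pure import reordering. In all four files, `from tokenizers import pre_tokenizers` is removed from its own trailing group and re-added in alphabetical position among the neighbouring imports. The sketch below illustrates how this could happen. It assumes the hook is an isort-style sorter that groups imports into sections and sorts within each section. It also assumes the local run placed `tokenizers` in a later section than CI did. The section names, the `sort_imports` and `same_statements` helpers, and both classifications are hypothetical illustrations, not the repository's actual configuration.

```python
from collections import Counter

# Simplified section order of an isort-style sorter (real tools also have
# sections such as FUTURE and LOCALFOLDER). Hypothetical, for illustration.
SECTION_ORDER = ["stdlib", "thirdparty", "firstparty"]


def module_name(line: str) -> str:
    """Module named by an 'import x' or 'from x import y' line."""
    return line.split()[1]


def sort_imports(lines: list[str], sections: dict[str, str]) -> str:
    """Group import lines by section and sort each group (simplified model).

    Within a section, plain 'import x' lines precede 'from x import y' lines,
    each sorted by module name; sections are separated by one blank line.
    Top-level modules missing from `sections` default to third party.
    """
    groups: dict[str, list[str]] = {name: [] for name in SECTION_ORDER}
    for line in lines:
        top_level = module_name(line).split(".")[0]
        groups[sections.get(top_level, "thirdparty")].append(line)
    blocks = []
    for name in SECTION_ORDER:
        plain = sorted((ln for ln in groups[name] if ln.startswith("import ")), key=module_name)
        from_imports = sorted((ln for ln in groups[name] if ln.startswith("from ")), key=module_name)
        if plain or from_imports:
            blocks.append("\n".join(plain + from_imports))
    return "\n\n".join(blocks)


def same_statements(a: str, b: str) -> bool:
    """True if a and b contain the same non-blank lines, ignoring order."""

    def nonblank(text: str) -> Counter:
        return Counter(line for line in text.splitlines() if line.strip())

    return nonblank(a) == nonblank(b)


# Visible part of the import block of model/model_training/tools/model_cli.py,
# copied from the diff above.
IMPORTS = [
    "import torch",
    "import transformers",
    "from custom_datasets.formatting import QA_SPECIAL_TOKENS, format_pairs, format_system_prefix",
    "from models import get_specific_model",
    "from utils import _strtobool",
    "from tokenizers import pre_tokenizers",
]

# Hypothetical classifications: the two runs differ only in where `tokenizers` goes.
local = sort_imports(IMPORTS, {"tokenizers": "firstparty"})
ci = sort_imports(IMPORTS, {"tokenizers": "thirdparty"})

print(local)
print("---")
print(ci)
print("same statements:", same_statements(local, ci))
```

Both orderings contain the same statements, so the CI diff is cosmetic. If the mismatch really comes from section classification, declaring the module's section explicitly in the sorter's configuration may make local and CI runs agree. isort, for instance, supports a `known_third_party` setting for this.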

theblackcat102 merged commit 42d9a5f into main Mar 23, 2023
1 check passed
theblackcat102 deleted the llama_experiment branch March 23, 2023 21:24

5 participants