Enhancements for Dataset Handling, Training Splits, Checkpoint Management, and Model Sharing #51

MubarakHAlketbi · 2023-05-29T07:34:41Z

Import Dataset and get_dataset_split_names from datasets to enable more flexible data handling
Add --train_split and --valid_split arguments to get_args() for user-defined train and validation splits
Add a --save_limit argument to get_args() to control the number of model checkpoints saved
Read local_rank from the LOCAL_RANK environment variable set by torchrun, enabling distributed training
Update prepare_sample_text() to check whether the example is a dictionary, improving data compatibility
Update create_datasets() to select the train and validation splits based on the --train_split and --valid_split arguments
Update run_training() to pass save_strategy, save_total_limit, hub_strategy, and load_best_model_at_end to TrainingArguments, giving more control over model saving and loading
Add a step in run_training() to push the model to the hub after saving the final checkpoint, facilitating model sharing
Add a print statement to indicate when the model is being pushed to the hub, improving user feedback
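
For readers without the diff at hand, the changes listed above could look roughly like the sketch below. It is not the code from this PR: the default values, the fallback of -1 for the local rank, the "prompt"/"completion" keys, the text template, the helper name build_training_kwargs, and the chosen save_strategy and hub_strategy values are all assumptions made for illustration. Only the argument names and the TrainingArguments parameter names come from the description.

```python
import argparse
import os


def get_args(argv=None):
    """Parse the options named in the description (defaults are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_split", type=str, default="train")
    parser.add_argument("--valid_split", type=str, default="validation")
    parser.add_argument("--save_limit", type=int, default=2)
    return parser.parse_args(argv)


def get_local_rank():
    """torchrun exports LOCAL_RANK per worker; -1 is an assumed fallback."""
    return int(os.environ.get("LOCAL_RANK", -1))


def prepare_sample_text(example, input_key="prompt", output_key="completion"):
    """Format dictionary samples; pass anything else through as plain text.

    The key names and the template are hypothetical.
    """
    if isinstance(example, dict):
        return f"Question: {example[input_key]}\n\nAnswer: {example[output_key]}"
    return str(example)


def build_training_kwargs(args):
    """Hypothetical helper: keyword arguments meant for TrainingArguments.

    load_best_model_at_end also needs an evaluation strategy that matches
    save_strategy; that setting is left out of this sketch.
    """
    return {
        "save_strategy": "steps",             # assumed value
        "save_total_limit": args.save_limit,  # driven by --save_limit
        "hub_strategy": "checkpoint",         # assumed value
        "load_best_model_at_end": True,
    }
```

A launcher would then pass the result on with something like TrainingArguments(output_dir=..., **build_training_kwargs(args)), with transformers installed and the evaluation settings filled in.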

adding torchrun support

dataset splits sanity checks

sanity check 2

arguments updated

added save limit argument

few updates to arguments

possible fix for push to hub

run training, testing saving

debugging code

MubarakHAlketbi · 2023-06-01T06:56:42Z

more debugging required

MubarakHAlketbi added 15 commits May 28, 2023 03:46

modified arguments and splits

73be6ef

changed parser

e214063

Update finetune.py

d87fe0f

adding torchrun support

Update finetune.py

97ef64d

dataset splits sanity checks

Update finetune.py

a7f5aa7

sanity check 2

Update finetune.py

83e54fc

arguments updated

Update finetune.py

b813a31

added save limit argument

Update README.md

ad02700

few updates to arguments

Update finetune.py

cf181d5

possible fix for push to hub

still some issues

94d339b

under testing

b8fdcc6

testing

74f737c

run training, testing saving

issues with saving

d712713

debug level code

bc2ee98

debugging code

checkpoint

ee22c3c

MubarakHAlketbi closed this Jun 1, 2023
