
Enhancements for Dataset Handling, Training Splits, Checkpoint Management, and Model Sharing #51


Closed
wants to merge 15 commits into from


MubarakHAlketbi

  • Import Dataset and get_dataset_split_names from datasets to support more flexible data handling
  • Add --train_split and --valid_split arguments to get_args() so users can choose the training and validation splits
  • Add a --save_limit argument to get_args() to cap the number of model checkpoints kept on disk
  • Read local_rank from the LOCAL_RANK environment variable that torchrun sets, enabling distributed training
  • Update prepare_sample_text() to check whether the example is a dictionary before formatting it, improving compatibility with different data formats
  • Update create_datasets() to build the training and validation sets from the --train_split and --valid_split arguments, giving more control over dataset splitting
  • Update run_training() to pass save_strategy, save_total_limit, hub_strategy, and load_best_model_at_end to TrainingArguments, giving more control over how checkpoints are saved and loaded
  • Add a step to run_training() that pushes the model to the Hub after the final checkpoint is saved, making the model easier to share
  • Add a print statement that announces when the model is being pushed to the Hub
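A minimal sketch of how the new options and the torchrun rank lookup might fit together in get_args(). The defaults shown ("train", "validation", and no save limit) are assumptions, not values taken from this PR. torchrun exports LOCAL_RANK for each worker process it launches; -1 is used here as the usual "not distributed" fallback.

```python
import argparse
import os


def get_args(argv=None):
    """Hypothetical sketch of the new command-line options (defaults are assumptions)."""
    parser = argparse.ArgumentParser()
    # Name of the split used for training (assumed default).
    parser.add_argument("--train_split", type=str, default="train")
    # Name of the split used for validation (assumed default).
    parser.add_argument("--valid_split", type=str, default="validation")
    # Maximum number of checkpoints to keep; None means "keep all",
    # which is also the meaning of save_total_limit=None in TrainingArguments.
    parser.add_argument("--save_limit", type=int, default=None)
    args = parser.parse_args(argv)
    # torchrun sets LOCAL_RANK for every worker; fall back to -1 when the
    # script is started with plain `python` instead of torchrun.
    args.local_rank = int(os.environ.get("LOCAL_RANK", -1))
    return args
```

Passing argv explicitly, for example get_args(["--save_limit", "3"]), makes the parser easy to exercise without a real command line.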
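The dictionary check in prepare_sample_text() could look roughly like the sketch below. The column names "question" and "answer" and the "Question:/Answer:" prompt layout are hypothetical placeholders; the real script's fields and template may differ. Rows taken from a datasets.Dataset one at a time arrive as plain dicts keyed by column name, which is why the isinstance check is useful.

```python
def prepare_sample_text(example, question_key="question", answer_key="answer"):
    """Hypothetical sketch: turn one dataset row into a training string.

    The field names and prompt layout are assumptions for illustration.
    """
    if isinstance(example, dict):
        # Dataset rows are dicts keyed by column name.
        return f"Question: {example[question_key]}\n\nAnswer: {example[answer_key]}"
    if isinstance(example, str):
        # Text that is already formatted is passed through unchanged.
        return example
    # Anything else is rejected explicitly instead of failing later.
    raise TypeError(f"Unsupported example type: {type(example).__name__}")
```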
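One plausible way for create_datasets() to honor --train_split and --valid_split is sketched below in plain Python. Here a dict of lists stands in for the loaded dataset; in the real script the available names would come from datasets.get_dataset_split_names(). The fallback of holding out a seeded random fraction when no separate validation split exists, and the helper name select_splits, are assumptions rather than the PR's confirmed behavior.

```python
import random


def select_splits(splits, train_split="train", valid_split="validation",
                  valid_fraction=0.05, seed=0):
    """Hypothetical sketch of split selection for create_datasets().

    `splits` maps split names to lists of rows.
    """
    if train_split not in splits:
        raise ValueError(
            f"train split {train_split!r} not found; available: {sorted(splits)}"
        )
    train_rows = list(splits[train_split])
    # Prefer a dedicated validation split when it exists and is distinct.
    if valid_split in splits and valid_split != train_split:
        return train_rows, list(splits[valid_split])
    # Otherwise hold out a seeded random fraction of the training rows.
    if len(train_rows) < 2:
        raise ValueError("need at least two training rows to hold out a validation set")
    rng = random.Random(seed)
    rng.shuffle(train_rows)
    n_valid = max(1, int(len(train_rows) * valid_fraction))
    return train_rows[n_valid:], train_rows[:n_valid]
```

Seeding the shuffle keeps the held-out rows identical across runs and across torchrun workers, so every process evaluates on the same validation set.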
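To illustrate what a checkpoint limit such as --save_limit controls, here is a simplified sketch of checkpoint rotation. It is not the transformers implementation: it only orders "checkpoint-<step>" directories by step number, protects an optional best checkpoint (the idea behind combining save_total_limit with load_best_model_at_end), and reports which older directories would be removed. The function name checkpoints_to_delete is hypothetical.

```python
import re


def checkpoints_to_delete(names, save_limit, best=None):
    """Simplified, hypothetical sketch of save-limit rotation.

    Returns the checkpoint directories that exceed `save_limit`, oldest first.
    """
    # None (or a non-positive value) is treated here as "no limit".
    if save_limit is None or save_limit <= 0:
        return []

    def step(name):
        # Extract the numeric step from names like "out/checkpoint-500".
        match = re.search(r"checkpoint-(\d+)$", name)
        return int(match.group(1)) if match else -1

    # Sort numerically so checkpoint-1000 comes after checkpoint-200.
    ordered = sorted(names, key=step)
    # Move the best checkpoint to the newest end so it is never rotated out.
    if best in ordered:
        ordered.remove(best)
        ordered.append(best)
    excess = len(ordered) - save_limit
    return ordered[:excess] if excess > 0 else []
```

Sorting by the parsed step rather than by the raw string matters: a lexicographic sort would place "checkpoint-1000" before "checkpoint-200" and delete the wrong directory.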
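Because load_best_model_at_end only works when evaluation and saving happen on the same schedule, a quick pre-flight check can catch a common source of TrainingArguments errors before a long run starts. The helper below is hypothetical and only mirrors that rule as commonly documented for transformers (matching save and evaluation strategies, and with "steps", save_steps being a multiple of eval_steps); the evaluation argument is called evaluation_strategy in older transformers releases and eval_strategy in newer ones.

```python
def validate_save_args(save_strategy, eval_strategy, load_best_model_at_end,
                       save_steps=500, eval_steps=500):
    """Hypothetical pre-flight check for the checkpoint-related arguments.

    Mirrors the rule described above; this is not the library's own code.
    """
    if not load_best_model_at_end:
        # Without best-model loading, save and eval schedules are independent.
        return
    if save_strategy != eval_strategy:
        raise ValueError(
            "load_best_model_at_end needs matching strategies, got "
            f"save={save_strategy!r}, eval={eval_strategy!r}"
        )
    if save_strategy == "steps" and save_steps % eval_steps != 0:
        raise ValueError(
            f"save_steps ({save_steps}) must be a multiple of eval_steps ({eval_steps})"
        )
```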

adding torchrun support
dataset splits sanity checks
sanity check 2
arguments updated
added save limit argument
few updates to arguments
possible fix for push to hub
run training, testing saving
debugging code
@MubarakHAlketbi (Author)

more debugging required
