Load lang id any2any training mengruw #8961

Closed

Commits on May 17, 2023

  1. Cut branch r1.19.0

    Signed-off-by: smajumdar <titu1994@gmail.com>
    titu1994 committed May 17, 2023
    4084669

Commits on May 19, 2023

  1. 1dc8b37

Commits on May 23, 2023

  1. Fix k2 installation in Docker with CUDA 12 (NVIDIA#6707)

    Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
    artbataev committed May 23, 2023
    0ca1dd3

Commits on May 24, 2023

  1. Tutorial fixes (NVIDIA#6717)

    Signed-off-by: smajumdar <titu1994@gmail.com>
    titu1994 committed May 24, 2023
    db6e29b

Commits on May 26, 2023

  1. VP Fixes for converter + Config management (NVIDIA#6698) (NVIDIA#6738)

    * [Temp] VP Fixes
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Revert logging
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    ---------
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    (cherry picked from commit b6f46a0)
    titu1994 committed May 26, 2023
    2e2df4a
  2. Fix fastpitch test nightly (NVIDIA#6742)

    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    hsiehjackson committed May 26, 2023
    4df8f33
  3. check for first or last stage (NVIDIA#6708)

    * check for first or last stage
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * remove redundant check
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * fix typo
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add map_location
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    ---------
    
    Signed-off-by: ericharper <complex451@gmail.com>
    ericharper committed May 26, 2023
    e806e11

Commits on May 29, 2023

  1. Bug fix to restore act ckpt (NVIDIA#6753)

    * Bug fix to restore act ckpt
    
    Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    markelsanz14 and pre-commit-ci[bot] committed May 29, 2023
    dbd6a56

Commits on May 31, 2023

  1. Bug fix to reset sequence parallelism (NVIDIA#6756)

    * Bug fix to reset sequence parallelism
    
    Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
    
    * Update seq par reset/restore
    
    Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
    
    * Add nested loop
    
    Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
    
    ---------
    
    Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
    markelsanz14 committed May 31, 2023
    a0f757e
  2. Fix checkpointed forward and add test for full activation checkpointing (NVIDIA#6744)
    
    * fix checkpointed forward and add test for full activation checkpointing
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * add method
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * add method
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    ---------
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    aklife97 committed May 31, 2023
    39dd654
  3. Fix Links (NVIDIA#6777)

    Signed-off-by: smajumdar <titu1994@gmail.com>
    titu1994 committed May 31, 2023
    216bcab

Commits on Jun 1, 2023

  1. add call to p2p overlap (NVIDIA#6779)

    * add call to p2p overlap
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    * update Jenkins for test
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    ---------
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    aklife97 committed Jun 1, 2023
    4ecc769
  2. Fix get_parameters when using main params optimizer (NVIDIA#6764)

    * fix get param
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * change name
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    ---------
    
    Signed-off-by: ericharper <complex451@gmail.com>
    ericharper committed Jun 1, 2023
    1486b12
  3. Lddl bert (NVIDIA#6761)

    * initial POC for LDDL Bert
    
    * Finish LDDL POC
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * address comments
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix merge head
    
    * resolving merge
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add support for  val/test loaders
    
    * change to new LDDL class + add winding
    
    * fix logging level
    
    * fix winding
    
    * test fix
    
    * fixes to winding
    
    * add file system
    
    * add preemption optimizations
    
    * more logging
    
    * more prints
    
    * better logging
    
    * asfsf
    
    * add barrier
    
    * removing prints
    
    * working with mb lddl loader
    
    * final changes
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * update requirements file with LDDL
    
    Signed-off-by: wdykas <wdykas@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * revert adding to requirements
    
    ---------
    
    Signed-off-by: wdykas <wdykas@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Eric Harper <complex451@gmail.com>
    3 people committed Jun 1, 2023
    aff5217
  4. Debug Transformer Engine FP8 support with Megatron-core infrastructure (NVIDIA#6740)
    
    * Construct FP8 amax reduction group
    
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    
    * update core for CI
    
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    
    ---------
    
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
    timmoon10 and aklife97 committed Jun 1, 2023
    4bbb3c6
  5. Tensor-parallel communication overlap with userbuffer backend (NVIDIA#6780)
    
    * add interfaces for tp_communication overlap
    
    [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    Interface to provide custom userbuffer communicator settings by yaml file
    
    [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * Construct MPI process group for userbuffers support
    
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    
    ---------
    
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    Co-authored-by: Tim Moon <tmoon@nvidia.com>
    Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
    3 people committed Jun 1, 2023
    e4460d1

Commits on Jun 2, 2023

  1. Fix adapter tutorial r1.19.0 (NVIDIA#6776)

    * Fix TTS adapter tutorial
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    * Fix version
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    
    ---------
    
    Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
    hsiehjackson committed Jun 2, 2023
    9bd8ecd
  2. Fix check (NVIDIA#6798)

    Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
    MaximumEntropy committed Jun 2, 2023
    913e5e5
  3. Bug fix for reset_sequence_parallel_args (NVIDIA#6802)

    Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
    markelsanz14 committed Jun 2, 2023
    a8aa8f1

Commits on Jun 5, 2023

  1. 0e0253e

Commits on Jun 6, 2023

  1. update core version (NVIDIA#6817)

    Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
    aklife97 committed Jun 6, 2023
    41bb941
  2. Add trainer.validate example for GPT (NVIDIA#6794)

    * add trainer.validate example
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * clean up white space
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add mbs and gbs to the config
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    ---------
    
    Signed-off-by: ericharper <complex451@gmail.com>
    ericharper committed Jun 6, 2023
    45144f5

Commits on Jun 8, 2023

  1. fix notebook error (NVIDIA#6840)

    Signed-off-by: Yi Dong <yidong@nvidia.com>
    yidong72 committed Jun 8, 2023
    dc52b94
  2. fix (NVIDIA#6842)

    Signed-off-by: Yi Dong <yidong@nvidia.com>
    yidong72 committed Jun 8, 2023
    4239b80

Commits on Jun 13, 2023

  1. Add API docs for NeMo Megatron (NVIDIA#6850)

    * add model pretraining and customization classes
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * fix
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * test width
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * increase middle pane width
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add modules and datasets
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * remove global in t5 dataset s and fix formatting in megatron base model
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    ---------
    
    Signed-off-by: ericharper <complex451@gmail.com>
    ericharper committed Jun 13, 2023
    87e1b81

Commits on Jun 14, 2023

  1. Apply garbage collection interval to validation steps (NVIDIA#6870)

    * Apply garbage collection interval to validation steps
    
    Signed-off-by: Sangkug Lym <slym@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    ---------
    
    Signed-off-by: Sangkug Lym <slym@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    erhoo82 and pre-commit-ci[bot] committed Jun 14, 2023
    f875702

Commits on Jun 15, 2023

  1. update mcore version (NVIDIA#6875)

    Signed-off-by: ericharper <complex451@gmail.com>
    ericharper committed Jun 15, 2023
    2331b06

Commits on Apr 17, 2024

  1. fix language id issue for any2any training. This fix loads language ids from .yaml file

    Dockerfile builder committed Apr 17, 2024
    f0fa541