Conversation

@RaymondLi0 RaymondLi0 commented Apr 1, 2025

✨ Description

Follow-up to #179 with some fixes and improvements:

  • Add more labels during data sampling so that inputs are not truncated (each extra prediction head needs one more label token; see the sketch below this list)
  • Handle sequence parallelism and cross-entropy splits (required as a consequence of the extra labels)
  • Convert the additional LM heads and add a corresponding Hugging Face Transformers implementation
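
As an illustration of the first point, here is a minimal, self-contained sketch (not the actual Fast-LLM code; all names are hypothetical) of multi-token prediction with one LM head per predicted offset. Head `k` is trained against tokens shifted by `k + 1`, which is why the sampler has to provide `num_heads` extra label tokens beyond the input length:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTokenPredictionHeads(nn.Module):
    """Toy multi-token prediction: one linear LM head per predicted offset."""

    def __init__(self, hidden_size: int, vocab_size: int, num_heads: int):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(hidden_size, vocab_size, bias=False) for _ in range(num_heads)
        )

    def forward(self, hidden_states: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden); labels: (batch, seq_len + num_heads)
        seq_len = hidden_states.size(1)
        losses = []
        for k, head in enumerate(self.heads):
            logits = head(hidden_states)  # (batch, seq_len, vocab)
            # Head k predicts token t + k + 1, so its targets are shifted by k + 1.
            targets = labels[:, k + 1 : k + 1 + seq_len]
            losses.append(
                F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            )
        # Unweighted mean over heads; per-head weights are listed as future work below.
        return torch.stack(losses).mean()


if __name__ == "__main__":
    batch, seq_len, hidden, vocab, num_heads = 2, 8, 16, 32, 3
    mtp = MultiTokenPredictionHeads(hidden, vocab, num_heads)
    hidden_states = torch.randn(batch, seq_len, hidden)
    # The sampler provides num_heads extra label tokens, so no position lacks a target.
    labels = torch.randint(0, vocab, (batch, seq_len + num_heads))
    print(mtp(hidden_states, labels))
```

Without the extra labels, head `k` would have no targets for its last `k + 1` positions, and those positions would have to be truncated from the loss.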

Potential future work:

  • Add a weight to each future-token prediction loss (e.g. the next token could count more than tokens further in the future); see the sketch after this list
  • Improve throughput
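
One possible shape for the per-head weighting (purely illustrative, since this is listed as future work; the function name and the geometric decay scheme are hypothetical):

```python
import torch


def weighted_mtp_loss(per_head_losses: torch.Tensor, decay: float = 0.5) -> torch.Tensor:
    """Combine per-head losses with geometrically decaying weights.

    per_head_losses[k] is the loss of the head predicting token t + k + 1;
    a decay below 1 makes the immediate next token count more than later ones.
    """
    weights = decay ** torch.arange(per_head_losses.numel(), dtype=per_head_losses.dtype)
    weights = weights / weights.sum()  # keep the overall loss scale comparable
    return (weights * per_head_losses).sum()


if __name__ == "__main__":
    print(weighted_mtp_loss(torch.tensor([2.1, 2.4, 2.7])))
```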

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@RaymondLi0 RaymondLi0 marked this pull request as ready for review April 14, 2025 17:10
@RaymondLi0 RaymondLi0 changed the title from "WIP: improvements to MTP implementation" to "improvements to MTP implementation" on Apr 14, 2025
@RaymondLi0

Since there is no available implementation in the transformers library, I wrote a custom modeling.py for Llama with multi-token prediction. The converter then copies these files into the export directory.
I wasn't sure what the best approach would be here, but it's the only one I came up with. Let me know if there is a better way.
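
For context, a hedged sketch of how such an export is typically consumed on the Transformers side, assuming the copied modeling files are registered through the `auto_map` entries in the exported `config.json` (the path below is hypothetical):

```python
from transformers import AutoModelForCausalLM

# Hypothetical location of a checkpoint produced by the converter.
export_dir = "exports/llama-mtp"

# trust_remote_code=True lets Transformers import the custom modeling.py
# that the converter copied next to the weights.
model = AutoModelForCausalLM.from_pretrained(export_dir, trust_remote_code=True)
```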

@jlamypoirier jlamypoirier left a comment

Looks good, some minor comments

@tscholak tscholak left a comment

LGTM!

@jlamypoirier jlamypoirier left a comment

Looks great, thanks!

@RaymondLi0 RaymondLi0 merged commit 6ad0a96 into main Apr 17, 2025
4 checks passed
@RaymondLi0 RaymondLi0 deleted the raymond/mtp-improvements branch April 17, 2025 17:22