

jlamypoirier commented Oct 30, 2025

✨ Description

  • Add a patch sample type to handle image patches and similar objects.
  • Add optional image patches to language model samples
    • Update the GPT preparator to support images and perform the necessary offline preprocessing.
  • Image normalization can't be done offline because of file size issues, so it is done in the language model reader instead. This works, but it is not ideal and is not yet configurable.
  • TODO: Add image separators for patches (lengths?) so we can compute cu_seqlens.
  • TODO: Add tests for images.
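The offline preprocessing step for images typically includes cutting each image into fixed-size patches. The sketch below shows one way to do that with NumPy. It is only an illustration: the helper name `extract_patches`, the channels-last `(H, W, C)` layout, and the requirement that `H` and `W` be multiples of the patch size are all assumptions, not details taken from the preparator.

```python
import numpy as np


def extract_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Hypothetical helper: assumes H and W are multiples of patch_size
    (in practice the image would be resized or padded first).
    Returns shape (num_patches, patch_size * patch_size * C), with patches
    ordered top-to-bottom, then left-to-right.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) so each patch is contiguous.
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Usage: a 4x6 RGB image with 2x2 patches yields 2 * 3 = 6 patches of 2 * 2 * 3 = 12 values.
example = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
print(extract_patches(example, patch_size=2).shape)  # (6, 12)
```

Keeping the patches as raw `uint8` values here leaves any floating-point conversion to read time.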
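Normalizing at read time can look roughly like the following: raw `uint8` pixels are scaled to `[0, 1]`, then standardized per channel. This is a hedged sketch rather than the reader's actual code. The helper name `normalize_patches`, the channels-last flattening of each patch, and the idea of passing `mean` and `std` in as arguments are assumptions made for illustration.

```python
import numpy as np


def normalize_patches(
    patches: np.ndarray,
    mean: tuple[float, ...],
    std: tuple[float, ...],
) -> np.ndarray:
    """Convert flattened uint8 patches to per-channel normalized float32.

    Hypothetical helper. Assumes each row of `patches` is a (P, P, C) patch
    flattened in row-major order, so the channel index varies fastest and
    C == len(mean) == len(std).
    """
    channels = len(mean)
    num_patches = patches.shape[0]
    # Scale raw pixels from [0, 255] to [0, 1], grouping values by channel.
    pixels = patches.astype(np.float32).reshape(num_patches, -1, channels) / 255.0
    mean_arr = np.asarray(mean, dtype=np.float32)
    std_arr = np.asarray(std, dtype=np.float32)
    # Broadcast the per-channel statistics over every pixel of every patch.
    normalized = (pixels - mean_arr) / std_arr
    return normalized.reshape(num_patches, -1)
```

Taking `mean` and `std` as parameters instead of hard-coding them is one possible route to the configurability the current implementation lacks.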
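Once per-image patch counts (lengths) are known, `cu_seqlens` is simply their exclusive prefix sum with a leading zero. Variable-length attention kernels such as FlashAttention's `flash_attn_varlen_func` take sequence boundaries in this form (as an int32 tensor of shape `(batch + 1,)`). The pure-Python sketch below assumes the lengths are already available as a list; the helper name is hypothetical.

```python
from itertools import accumulate


def cu_seqlens_from_lengths(lengths: list[int]) -> list[int]:
    """Return cumulative boundaries [0, l0, l0 + l1, ...] for packed segments.

    Segment i occupies the half-open range [cu[i], cu[i + 1]) of the
    concatenated sequence.
    """
    if any(length < 0 for length in lengths):
        raise ValueError("lengths must be non-negative")
    return [0, *accumulate(lengths)]


# Hypothetical example: three images contributing 4, 9 and 2 patches.
print(cu_seqlens_from_lengths([4, 9, 2]))  # [0, 4, 13, 15]
```

This is why storing per-image lengths (or separators from which they can be recovered) is enough: the boundaries follow directly from them.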
