min-patchnizer

Minimal, clean code for video/image "patchnization", a process commonly used to tokenize visual data for a Transformer encoder. The code first extracts still images (frames) from a video, splits each frame into smaller fixed-size patches, linearly embeds each patch, adds position embeddings, and then saves the resulting sequence of vectors for use in a Vision Transformer encoder. I tried training the resulting sequence vectors with Karpathy's minbpe and it took 2173.45 seconds per frame to tokenize. The whole "patchnization" took ~77.40s for a 20s video on my M2 Air.
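
As a rough illustration of the patch-splitting step described above, here is a minimal NumPy sketch. The function name `image_to_patches`, the random stand-in frame, and the requirement that the frame size be a multiple of the patch size are assumptions made for this example; they are not taken from patchnizer.py.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row (left to right, top to bottom).
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Stand-in 64x96 RGB frame (assumption); a real frame would come from the video.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 96, 3), dtype=np.uint8)

patches = image_to_patches(frame, patch_size=16)
print(patches.shape)  # (24, 768): 4 x 6 patches, each 16 * 16 * 3 values
```

Each row of `patches` is one flattened 16x16 patch, so the frame becomes a sequence that can be embedded patch by patch.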

The files in this repo work as follows:

patchnizer.py: Holds a simple implementation of the three stages involved: extract_image_frames from a video, reduce image_frames_to_patches of fixed size (16x16 pixels), then linearly_embed_patches into a 1D vector sequence with additional position embeddings.
patchnize.py: Runs the whole process with custom configs (patch_size, created dirs, video; I am using the "dogs playing in snow" video by Sora).
train.py: Trains Karpathy's minbpe (a minimal implementation of the byte-pair encoding algorithm) on the resulting one-dimensional vector sequence (linear_patch_embeddings + position_embeddings).
check.py: Checks that the patch embeddings match the original image patches, then reconstructs the original image frames. This basically just does the reverse of the linear embedding.
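
To make the embed-then-reverse idea concrete, here is a small NumPy sketch of a linear patch embedding with added position embeddings, followed by the inverse step. It is only an illustration: the `embed` and `unembed` helpers, the random weights standing in for learned parameters, and the choice of an embedding width at least as large as the patch length (so the projection can be undone with a pseudo-inverse) are all assumptions, not the code in patchnizer.py or check.py.

```python
import numpy as np

rng = np.random.default_rng(0)

num_patches = 24            # e.g. a 64x96 frame cut into 16x16 patches
patch_dim = 16 * 16 * 3     # length of one flattened RGB patch
embed_dim = 1024            # assumption: >= patch_dim so the projection is invertible

# Stand-in flattened patches with values in [0, 1).
patches = rng.random((num_patches, patch_dim))

# Hypothetical stand-ins for learned parameters.
projection = rng.standard_normal((patch_dim, embed_dim)) / np.sqrt(patch_dim)
position_embeddings = 0.02 * rng.standard_normal((num_patches, embed_dim))


def embed(patches, projection, position_embeddings):
    """Project each patch to embed_dim and add its position embedding."""
    return patches @ projection + position_embeddings


def unembed(embeddings, projection, position_embeddings):
    """Reverse embed(): remove position embeddings, then undo the projection."""
    return (embeddings - position_embeddings) @ np.linalg.pinv(projection)


embeddings = embed(patches, projection, position_embeddings)
recovered = unembed(embeddings, projection, position_embeddings)

print(embeddings.shape)                 # (24, 1024)
print(np.allclose(recovered, patches))  # True
```

If the embedding width were smaller than the patch length, the projection would lose information and the reconstruction would only be approximate.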

The whole process builds on the approach introduced in the Vision Transformer paper: "An image is worth 16x16 words: Transformers for image recognition at scale."
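
In the paper's notation (my reading of it, not something taken from this repo's code), an image of height H, width W and C channels cut into P x P patches gives a sequence of N flattened patches, each projected by a matrix E and offset by a position embedding. The paper also prepends a class token, which is left out here because the description above does not mention one:

```latex
N = \frac{HW}{P^2}, \qquad
\mathbf{x}_p \in \mathbb{R}^{N \times (P^2 C)}, \qquad
\mathbf{z}_0 = \mathbf{x}_p \mathbf{E} + \mathbf{E}_{pos}, \quad
\mathbf{E} \in \mathbb{R}^{(P^2 C) \times D}, \;
\mathbf{E}_{pos} \in \mathbb{R}^{N \times D}
```

For example, a 224x224 RGB image with P = 16 gives N = 224 * 224 / 256 = 196 patches, each of length 16 * 16 * 3 = 768.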

Youtube Video: Watch Demo

Usage

First patchnize:

python patchnize.py

Next check:

python check.py

Then train:

python train.py

License

MIT
