
DALL-E

About

Re-implementation of DALL-E.

Model

This repository is loosely based on the original DALL-E paper by OpenAI. Instead of a GPT-2/GPT-3-style autoregressive transformer decoder, it uses the MEGABYTE-based model from lucidrains.
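For reference, a minimal sketch of how that model might be instantiated. The `MEGABYTE_pytorch` import path, the constructor arguments (`num_tokens`, `dim`, `depth`, `max_seq_len`), and the input shape are assumptions based on the lucidrains MEGABYTE-pytorch README and may differ between versions:

```python
import torch
from MEGABYTE_pytorch import MEGABYTE  # assumed import path for lucidrains/MEGABYTE-pytorch

# Hypothetical configuration: one shared vocabulary covering character-level
# text tokens plus VQ-VAE codebook indices.
model = MEGABYTE(
    num_tokens = 256 + 512,   # e.g. 256 char/byte tokens + 512 VQ-VAE codes (assumed split)
    dim = (512, 256),         # global and local model dimensions
    depth = (6, 4),           # global and local transformer depths
    max_seq_len = (1024, 4),  # up to 1024 patches of 4 tokens each
)

# Training feeds the token sequence reshaped into (batch, patches, patch_size);
# return_loss gives the autoregressive next-token loss.
tokens = torch.randint(0, 768, (1, 1024, 4))
loss = model(tokens, return_loss = True)
loss.backward()
```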

Method

  • Use a VQ-VAE to encode and decode images.
  • Ingest text tokens and predict VQ-VAE codes (sketched after this list).
    • Use the MEGABYTE model (this also allows a very long context).
    • For now, text is encoded at the character level.
    • Auto-regressively predict VQ-VAE codes from the text tokens.
  • CIFAR-10 results were poor. This was initially suspected to be because VQ-VAEs struggle with images below 64x64, prompting a switch to Tiny ImageNet. (NOTE: the real cause was an issue in the data processing, not CIFAR-10 itself.)
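The sketch below shows the shape of that pipeline in PyTorch. It is not this repository's actual code: `vqvae.encode` and the autoregressive `prior` model are hypothetical placeholders for whatever VQ-VAE and MEGABYTE modules are used.

```python
import torch
import torch.nn.functional as F

NUM_TEXT_TOKENS = 256  # character/byte vocabulary size (assumed)

def char_tokenize(caption: str) -> torch.Tensor:
    # Character-level text encoding: each byte of the caption is a token in [0, 256).
    return torch.tensor(list(caption.encode("utf-8")), dtype=torch.long)

def training_step(vqvae, prior, image, caption):
    # 1. Encode the image into a grid of discrete VQ-VAE codebook indices.
    #    `vqvae.encode` is a hypothetical API returning (1, H', W') integer codes.
    with torch.no_grad():
        codes = vqvae.encode(image.unsqueeze(0))
    codes = codes.flatten(1) + NUM_TEXT_TOKENS   # shift codes past the text vocabulary

    # 2. Prepend the character-level caption tokens to form one flat sequence.
    text = char_tokenize(caption).unsqueeze(0)
    seq = torch.cat([text, codes], dim=1)        # (1, T_text + T_codes)

    # 3. Autoregressive next-token prediction over the combined sequence;
    #    only positions that predict image codes contribute to the loss.
    logits = prior(seq[:, :-1])                  # (1, L-1, vocab) assumed output shape
    targets = seq[:, 1:].clone()
    targets[:, : text.size(1) - 1] = -100        # ignore pure-text targets
    return F.cross_entropy(logits.transpose(1, 2), targets, ignore_index=-100)
```

At sampling time the same prior would be fed only the caption tokens and asked to generate the code positions, which the VQ-VAE decoder then turns back into an image.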

Datasets

Tiny ImageNet

  • Validate Tiny ImageNet captions and images (so they match up).
    • Labels needed to be sorted in the same order as the train/test loaders (see the sketch below).
  • Overfit the DALL-E model on a single caption-image pair.
  • Overfit the DALL-E model on one batch of caption-image pairs.
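A minimal sketch of that caption/label alignment check, assuming the standard tiny-imagenet-200 layout with a tab-separated words.txt (the dataset path and caption format are assumptions):

```python
from pathlib import Path
from torchvision import datasets

root = Path("tiny-imagenet-200")  # assumed dataset location

# torchvision's ImageFolder sorts class directories (wnids) alphabetically,
# so the caption table must follow the same order to stay aligned with the
# integer labels yielded by the train/test loaders.
train_set = datasets.ImageFolder(root / "train")

# words.txt maps each wnid to a human-readable phrase ("wnid<TAB>words" per line).
wnid_to_words = {}
with open(root / "words.txt") as f:
    for line in f:
        wnid, words = line.rstrip("\n").split("\t", 1)
        wnid_to_words[wnid] = words

# Class index -> caption, following ImageFolder's sorted class order.
idx_to_caption = [wnid_to_words[wnid] for wnid in train_set.classes]

# Spot check: the caption should describe the class of the returned image.
img, label = train_set[0]
print(label, idx_to_caption[label])
```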
