TinyCLIP training log #215

Open
Gumpest opened this issue Jan 10, 2024 · 8 comments

Gumpest commented Jan 10, 2024

In my reproduction of auto_weight_inherit_100to75.sh, the imagenet-zeroshot-val-top1 is 0.0010 at Train Epoch: 0 [2501/48828]. I wonder whether this is normal.
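For context, 0.0010 is exactly chance-level top-1 accuracy for ImageNet-1k's 1000 classes, i.e. the model is performing no better than random guessing at this point; a quick check:

```shell
# A model guessing uniformly at random among 1000 ImageNet classes
# gets expected top-1 accuracy of 1/1000 = 0.0010.
awk 'BEGIN { printf "%.4f\n", 1 / 1000 }'
# → 0.0010
```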


Gumpest commented Jan 10, 2024

The wandb log.


Gumpest commented Jan 10, 2024

It happened during the cross-modal distillation stage.


wkcn commented Jan 10, 2024

@Gumpest I observed --train-data synthetic in the training command.

Did you replace the dataloader with the one loading LAION-400M image-text pairs?

wkcn added the TinyCLIP label Jan 10, 2024

Gumpest commented Jan 13, 2024

@wkcn Oh, I didn't do that. This step is not mentioned in the docs. Do you have detailed information?


wkcn commented Jan 13, 2024

Sorry about that. Regarding the data loader, you can refer to the OpenCLIP repo (https://github.com/mlfoundations/open_clip?tab=readme-ov-file#data).


Gumpest commented Jan 15, 2024

@wkcn Sorry to bother you. The link (https://github.com/mlfoundations/open_clip?tab=readme-ov-file#data) tells me how to download the LAION-400M dataset, but what does "replace the dataloader with the one loading LAION-400M image-text pairs" mean? 😂


Gumpest commented Jan 16, 2024

@wkcn Or could you please provide the script for training with YFCC?


wkcn commented Jan 21, 2024

@Gumpest Sorry for the late reply.

> @wkcn Sorry to bother you, (https://github.com/mlfoundations/open_clip?tab=readme-ov-file#data) tells me how to download the laion-400m dataset, and "replace the dataloader with the one loading LAION-400M image-text pairs" means what😂

In our scripts, --train-data and --dataset-type are both synthetic. You need to replace them in order to load the LAION-400M or YFCC-15M datasets.
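A minimal sketch of that replacement, assuming the data has been downloaded as webdataset tar shards (the path and shard range below are placeholders, and --train-num-samples should match the dataset actually used):

```shell
# In the released script, replace the synthetic placeholders:
#   --train-data synthetic \
#   --dataset-type synthetic \
# with a real webdataset source, for example:
#   --train-data '<your_laion_path>/{00000..00999}.tar' \
#   --dataset-type webdataset \
#   --train-num-samples 400000000
```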

> @wkcn or please provide the script to train with YFCC.

Here are the hyper-parameters for YFCC.

On YFCC-15M, training consists of two compression stages of 25 epochs each: from 100% to 50% of the parameters, and from 50% to 10%. We follow the hyper-parameters of CLIP, except that the learning rate is set to 10^−4 when using weight inheritance.

Fig. 7 in Supplementary Material

Stage 1: CLIP-VIT-16 to TinyCLIP-ViT-39M-16-Text-19M (manual inheritance, 100% to 50%)

```shell
export NNODES=1
export GPUS_PER_NODE=8

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES"
torchrun $DISTRIBUTED_ARGS src/training/main.py \
 --save-frequency 1 \
 --report-to wandb \
 --train-data <your_yfcc_path/> \
 --dataset-type webdataset \
 --imagenet-val ./ImageNet \
 --warmup 2000 \
 --batch-size 512 \
 --epochs 25 \
 --workers 8 \
 --model TinyCLIP-ViT-39M-16-Text-19M \
 --name exp_name \
 --seed 0 \
 --local-loss \
 --grad-checkpointing \
 --logs ./outputs/TinyCLIP-ViT-39M-16-Text-19M \
 --lr 0.0001 \
 --gather-with-grad \
 --pretrained-image-file ViT-B-16@openai \
 --pretrained-text-file ViT-B-16@openai \
 --distillation-teacher ViT-B-32@laion2b_e16 \
 --logit-scale 50 \
 --norm_gradient_clip 5 \
 --train-num-samples 15000000
```

Stage 2: TinyCLIP-ViT-39M-16-Text-19M to TinyCLIP-ViT-8M-16-Text-3M (manual inheritance, 50% to 10%)

```shell
export NNODES=1
export GPUS_PER_NODE=8

DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES"
torchrun $DISTRIBUTED_ARGS src/training/main.py \
 --save-frequency 1 \
 --report-to wandb \
 --train-data <your_yfcc_path/> \
 --dataset-type webdataset \
 --imagenet-val ./ImageNet \
 --warmup 2000 \
 --batch-size 512 \
 --epochs 25 \
 --workers 8 \
 --model TinyCLIP-ViT-8M-16-Text-3M \
 --name exp_name \
 --seed 0 \
 --local-loss \
 --grad-checkpointing \
 --logs ./outputs/TinyCLIP-ViT-8M-16-Text-3M \
 --lr 0.0001 \
 --gather-with-grad \
 --pretrained-image-file checkpoints/TinyCLIP-ViT-39M-16-Text-19M-YFCC15M.pt \
 --pretrained-text-file checkpoints/TinyCLIP-ViT-39M-16-Text-19M-YFCC15M.pt \
 --distillation-teacher ViT-B-32@laion2b_e16 \
 --logit-scale 50 \
 --norm_gradient_clip 5 \
 --train-num-samples 15000000
```
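As a sanity check on the schedule above: assuming --batch-size is per GPU (as in OpenCLIP) and a single 8-GPU node, --train-num-samples 15000000 works out to roughly 3662 optimizer steps per epoch:

```shell
# steps per epoch ≈ train_num_samples / (batch_size * gpus_per_node * nnodes)
#                 = 15,000,000 / (512 * 8)
awk 'BEGIN { printf "%d\n", 15000000 / (512 * 8) }'
# → 3662
```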
