How to create, train and evaluate DTD dataset #5

Closed
dreamflasher opened this issue Oct 8, 2020 · 3 comments

Comments

@dreamflasher

Hi, would you be so kind as to explain how to do the DTD training and evaluation?
In the paper you mention that DTD images are the inliers and ImageNet-30 images the outliers. What is the folder structure of "~/data/dtd/" supposed to look like?

For training, am I right to assume unlabeled multi-class? I.e.:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset dtd --model resnet18 --mode simclr_CSI --shift_trans_type rotation --batch_size 32 --one_class_idx None

And for evaluation, how do I specify the out-distribution? I only see the "dataset" flag, but I would need to specify in-distribution and out-of-distribution datasets, right?

Thank you again for your help!

@dreamflasher
Author

And the same question for the steel dataset in the appendix; it looks like this didn't make it into the code?

@jihoontack
Collaborator

jihoontack commented Oct 9, 2020

Hi! Thank you again for your interest!

Before answering the question: we found a minor mistake in the values reported in Table 6 (DTD to ImageNet detection). Even after fixing the bug, our conclusion doesn't change.

  • reported: Base 96.4, CSI(Rotation) 65.4
  • fixed: Base 90.0, CSI(Rotation) 79.9
  • The mistake was due to the evaluation code. Please be aware of this if you are using an ImageNet-sized inlier dataset (add the dataset to the condition at line 146 of evals/ood_pre.py, e.g., P.dataset == 'dtd').

To use DTD as inliers, you should first divide the DTD dataset into train/test sets. The following is the code I used to split it. Run it from the ~/data/dtd folder (and note that you should create the test folder before running the code).

import os
import shutil

# Run from ~/data/dtd: move every image listed in labels/test1.txt
# from ./images/<class>/ to ./test/<class>/.
with open('labels/test1.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines

        test_class, test_sample_name = line.split('/')

        # makedirs also creates ./test itself if it is missing
        os.makedirs(f'./test/{test_class}', exist_ok=True)

        shutil.move(f'./images/{line}', f'./test/{line}')
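
As a quick sanity check after the split, something like the sketch below can count the files per class in each folder. The folder names images/ and test/ come from the script above. Treating the files left in images/ as the training set is an assumption, and `count_images_per_class` is a hypothetical helper, not part of the repository.

```python
import os


def count_images_per_class(split_dir):
    """Return {class_name: number_of_files} for a class-per-subfolder directory."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files at the top level
        counts[class_name] = sum(
            1 for name in os.listdir(class_dir)
            if os.path.isfile(os.path.join(class_dir, name))
        )
    return counts


if __name__ == "__main__":
    # Assumed layout: ~/data/dtd/images/<class>/... and ~/data/dtd/test/<class>/...
    root = os.path.expanduser("~/data/dtd")
    for split in ("images", "test"):
        split_dir = os.path.join(root, split)
        if os.path.isdir(split_dir):
            counts = count_images_per_class(split_dir)
            print(split, len(counts), "classes,", sum(counts.values()), "files")
```

If the two folders list the same classes and the test counts match the number of lines in labels/test1.txt, the split probably went through as intended.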

After dividing the set, you can use DTD as a training dataset with some modifications to your dataset.py code (in the same way as loading cifar10 or ImageNet). Also, I believe you will need to modify the code a little, since the argument parser restricts the values accepted by --dataset.
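
To illustrate the argument parser point, here is a minimal sketch of how a `choices` restriction in argparse rejects an unlisted dataset name and how extending the list lifts it. The choice lists and the `build_parser` helper are hypothetical; the repository's actual parser may be organised differently.

```python
import argparse


def build_parser(allowed_datasets):
    """Build a parser whose --dataset flag only accepts the given names."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset", type=str, choices=allowed_datasets)
    return parser


# Hypothetical choice lists, for illustration only.
original_choices = ["cifar10", "imagenet"]
extended_choices = original_choices + ["dtd"]

# With the restricted list, argparse prints a usage error and exits.
try:
    build_parser(original_choices).parse_args(["--dataset", "dtd"])
    rejected = False
except SystemExit:
    rejected = True

# With 'dtd' added to the list, the same command line is accepted.
args = build_parser(extended_choices).parse_args(["--dataset", "dtd"])
print(rejected, args.dataset)
```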

For CSI training, we have used unlabeled multiclass training for DTD.
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py --dataset dtd --model resnet18_imagenet --mode simclr_CSI --shift_trans_type rotation --batch_size 32

For CSI evaluation:
python eval.py --mode ood_pre --dataset dtd --ood_dataset imagenet --model resnet18_imagenet --ood_score CSI --shift_trans_type rotation --print_score --ood_samples 10 --resize_factor 0.54 --resize_fix --load_path <MODEL_PATH>

For the steel dataset, we didn't release the code since it shows similar results to the DTD dataset. Of course, you can download the dataset and run the code: https://www.kaggle.com/c/severstal-steel-defect-detection/data

Thank you again for your interest and feel free to ask if you have any questions!

@dreamflasher
Author

Thank you again for your great support and responsiveness and your great work!


2 participants