
image preprocessing #3

Closed
bakachan19 opened this issue May 10, 2023 · 4 comments

Comments


bakachan19 commented May 10, 2023

Hi.
Thank you for this interesting package.

I was looking at the examples in the tools, more precisely at docta_rare_pattern.py, running it with the following command, following the demo in the notebook docta_rare_pattern_clothes.ipynb:
%run ./tools/docta_rare_pattern.py --feature_type 'embedding' --suffix 'c1m_subset'

In this case the dataset is created using:
dataset = Customize_Image_Folder(root=cfg.data_root, transform=None).

Looking at Customize_Image_Folder(), given that transform=None, the image will be transformed using the following function:

if transform is None:
            self.transform = transforms.Compose([
                transforms.Resize(256),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                transforms.Normalize((0.6959, 0.6537, 0.6371), (0.3113, 0.3192, 0.3214)),
            ])

and followed up by:

feature_all.append(tmp_feature.permute(1, 2, 0).numpy().astype(np.uint8))  

But then, in the docta_rare_pattern.py file, pre_processor.encode_feature() is used to get the image embeddings, and it also customizes the data (it performs another normalization step before extracting the embeddings) with:
dataset_list += [CustomizedDataset(feature=self.dataset.feature, label=self.dataset.label, preprocess=preprocess)], where the preprocess function is retrieved from get_encoder():
model_embedding, _, preprocess = open_clip.create_model_and_transforms(self.cfg.embedding_model)

The CLIP preprocessing function seems to be this one:

Compose(
    Resize(size=224, interpolation=bicubic, max_size=None, antialias=warn)
    CenterCrop(size=(224, 224))
    <function _convert_to_rgb at 0x7ff56ae71f30>
    ToTensor()
    Normalize(mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711])
)

I am wondering why the images are normalized twice: once within Customize_Image_Folder() and a second time with the preprocessing function from open_clip.
Are there any steps that I am missing? I cannot fully understand the reason for the double normalization & preprocessing.
Could you please explain why this strategy is used in the method?
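To make the concern concrete, here is a toy numeric check (not Docta's code; the per-channel stats are copied from the two pipelines above, reduced to the first channel) showing that normalizing with the dataset stats and then again with the CLIP stats is not equivalent to the CLIP normalization alone:

```python
import numpy as np

# Toy illustration: double normalization vs. single normalization.
mean1, std1 = 0.6959, 0.3113          # Customize_Image_Folder stats (1st channel)
mean2, std2 = 0.48145466, 0.26862954  # open_clip preprocess stats (1st channel)

x = np.array([0.2, 0.5, 0.8])         # toy pixel values in [0, 1]

once = (x - mean2) / std2                     # CLIP normalization alone
twice = ((x - mean1) / std1 - mean2) / std2   # dataset norm, then CLIP norm

print(np.allclose(once, twice))  # prints False: the two pipelines disagree
```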

I am greatly thankful for any hints and insights that you can provide.
Thank you for your time.

Later edit: I also noticed this in docta_cifar10.py, where the data is loaded with Cifar10_noisy(cfg, train=True), which has a transform function defined, and then pre_process.encode_feature() is called, which customizes the data with another preprocessing step.

I apologize for the long post.

weijiaheng added a commit that referenced this issue May 10, 2023
@weijiaheng
Contributor

Hi

Thanks for the great catch and your detailed post! We have fixed this "double processing" in our latest commit.

Briefly speaking, in the file customize_img_folder.py, we removed the data-preprocessing code that ran when transform=None, to avoid double preprocessing. In this case, Docta simply appends features by converting each image from a PIL.Image.Image to a numpy array. Thus, the revised code processes & normalizes the image data only once when transform=None (the default value).
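For illustration, a minimal sketch of this behavior (function name assumed, not the actual commit) could look like the following: with transform=None, the image is only converted to a raw uint8 array, and all resizing/normalization is left to the CLIP preprocess applied later.

```python
import numpy as np
from PIL import Image

# Hedged sketch of the revised transform=None behavior (names assumed).
def load_feature(img, transform=None):
    if transform is None:
        return np.asarray(img)   # raw HWC uint8 pixels, no normalization
    return transform(img)        # user-supplied pipeline runs instead

img = Image.new("RGB", (32, 32), color=(128, 64, 32))
feat = load_feature(img)
print(feat.shape, feat.dtype)  # (32, 32, 3) uint8
```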

Please feel free to let us know if you have any additional concerns!

Best,
Jiaheng

@bakachan19
Author

Hi @weijiaheng.
Thank you for your reply and for helping me clear my doubts.

I also went over the CIFAR10 preprocessing again, because I thought the same double-processing issue was present there as well.
But I missed the following:

self.feature = self.data

and here self.data is not normalized yet (since the normalization is applied by the dataloader, not by the dataset); it then simply undergoes the CLIP preprocessing with:
model_embedding, _, preprocess = open_clip.create_model_and_transforms(self.cfg.embedding_model)
and
CustomizedDataset(..., preprocess = preprocess) in line:

dataset_list += [CustomizedDataset(feature=self.dataset.feature, label=self.dataset.label, preprocess=preprocess)]
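In other words, the raw features are stored as-is and the CLIP preprocess is applied exactly once, at access time. A small sketch of that pattern (class name and signature assumed, not Docta's actual class):

```python
import numpy as np

# Sketch of a CustomizedDataset-style wrapper: features stay raw until
# __getitem__, where the preprocess runs as the single normalization step.
class CustomizedDatasetSketch:
    def __init__(self, feature, label, preprocess=None):
        self.feature, self.label, self.preprocess = feature, label, preprocess

    def __len__(self):
        return len(self.feature)

    def __getitem__(self, idx):
        x = self.feature[idx]
        if self.preprocess is not None:
            x = self.preprocess(x)   # applied once, at access time
        return x, self.label[idx]

raw = np.full((2, 2, 3), 255, dtype=np.uint8)  # toy raw CIFAR-like image
ds = CustomizedDatasetSketch([raw, raw], [0, 1],
                             preprocess=lambda a: a.astype(np.float32) / 255.0)
x, y = ds[0]
print(x.max(), y)  # 1.0 0
```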

Thank you once again for your help.
Have an amazing day.

@zwzhu-d
Contributor

zwzhu-d commented May 11, 2023

Yes, the CIFAR data is not double-processed. The transforms in the CIFAR dataloader are left in place for future extensions. Thank you for the good catch.

@bakachan19
Author

Thank you for the clarification @zwzhu-d and @weijiaheng.
It was extremely useful.
