How to load side information in Tensorflow? #4

Closed
gortizji opened this issue Aug 12, 2022 · 3 comments


@gortizji

Hi,

I would like to load the side information and associate it with the correct sample in Tensorflow. Which order do side_info_cifar10N.csv and side_info_cifar100N.csv follow: that of the PyTorch files or that of the Tensorflow files?

And if they don't come in the Tensorflow order, should I load them like this?

import numpy as np
import pandas as pd
import tensorflow_datasets as tfds

# Human-annotated noisy labels
noise_file = np.load('./data/CIFAR-10_human_ordered.npy', allow_pickle=True)
random_label1 = noise_file.item().get('random_label1')

# Load the full train/test splits as single batches and convert to NumPy
train_ds, test_ds = tfds.load('cifar10', split=['train', 'test'], as_supervised=True, batch_size=-1)
train_images, train_labels = tfds.as_numpy(train_ds)

# Side information table
side_info_df = pd.read_csv('side_info_cifar10N.csv')
worker1_id = side_info_df['Worker1-id'].to_numpy()

# Reorder side information with correct order
image_order = np.load('image_order_c10.npy')
worker1_id_ordered = worker1_id[image_order // 10]

# Now, the indexing of all arrays matches correctly
first_example = (train_images[0], train_labels[0], worker1_id_ordered[0])

Thank you very much!

@weijiaheng
Collaborator

Hi,

Thanks for your question. To clarify:
(1) The order of the side information matches the PyTorch version.
(2) We did not provide per-sample side information. Each entry refers to a small batch of 10 images rather than a single image, because we wanted each worker to contribute a reasonable number of annotations, i.e., to annotate at least 10 images.
(3) The suggested way to use the side information is to load the image_order_c10.npy file and obtain the index mapping between the PyTorch and Tensorflow versions. Your solution seems reasonable to me!
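
The mapping in (3) can be sketched on toy data as follows. This is only an illustrative sketch under assumptions that the thread does not spell out: that `image_order[i]` holds the PyTorch-order index of the i-th Tensorflow sample, and that row `k` of the side information covers PyTorch indices `10*k` through `10*k + 9`. The helper name `align_side_info` and the toy arrays are hypothetical and are not part of the repository.

```python
import numpy as np


def align_side_info(batch_values, image_order, batch_size=10):
    """Expand per-batch side information to one value per sample.

    Assumptions (hypothetical, not confirmed by the repository):
      * batch_values[k] describes PyTorch-order samples
        k * batch_size ... (k + 1) * batch_size - 1
      * image_order[i] is the PyTorch-order index of the i-th
        sample in the target (e.g. Tensorflow) order
    """
    batch_values = np.asarray(batch_values)
    image_order = np.asarray(image_order)

    # Integer division maps a sample index to the batch it belongs to.
    batch_index = image_order // batch_size
    if batch_index.max() >= len(batch_values):
        raise ValueError("image_order refers to a batch with no side information")

    # Fancy indexing yields one side-information value per sample,
    # arranged in the target order.
    return batch_values[batch_index]


# Toy stand-ins: 3 batches of 10 images and a random permutation
# playing the role of image_order_c10.npy.
toy_worker_ids = np.array(["w0", "w1", "w2"])
toy_image_order = np.random.default_rng(0).permutation(30)

aligned_worker_ids = align_side_info(toy_worker_ids, toy_image_order)
```

With the real files, `toy_worker_ids` would be replaced by a column such as `Worker1-id` and `toy_image_order` by the contents of image_order_c10.npy, provided the assumptions above hold for those files.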

Best,
Jiaheng

@gortizji
Author

gortizji commented Feb 3, 2023

Thanks a lot Jiaheng, that's what I assumed.

FYI, CIFAR-10/100N is now part of tensorflow_datasets. You might want to link to it in your README.md 😄 .

@gortizji gortizji closed this as completed Feb 3, 2023
@weijiaheng
Collaborator

Thanks for the information, Guille! I will add this to README.md.
