Data split fails #2

Closed
yonatansverdlov opened this issue Jul 10, 2024 · 26 comments

Comments

@yonatansverdlov

Hi, I have two issues:
First, I ran python experiments/utils/data/generate_splits.py --data-root datasets/mnist_classifiers --save-path datasets/splits.json to create the splits and got the following error:
raise ValueError(
ValueError: With n_samples=0, test_size=0.25 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
Second, could you add the networks for the other datasets, such as CIFAR10 and LST?
Thanks!

@AvivNavon
Owner

Hi, could you please make sure that the all_files list (here: https://github.com/AvivNavon/deep-align/blob/main/experiments/utils/data/generate_splits.py#L16) is not empty?
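A minimal pre-flight check along these lines, assuming all_files is built by globbing --data-root for *.pth files as in the script linked above. The helper name collect_checkpoints is hypothetical and not part of deep-align. It raises a descriptive error when the folder is missing or holds no .pth files, instead of letting an empty list reach the train/test split (the n_samples=0 message appears to come from that split step).

```python
import tempfile
from pathlib import Path


def collect_checkpoints(data_root):
    """Return sorted POSIX paths of every *.pth file under data_root.

    Hypothetical helper (not part of deep-align): fails early with a
    readable message instead of letting an empty list reach the split.
    """
    root = Path(data_root)
    if not root.is_dir():
        # Path.glob on a missing directory silently yields nothing,
        # so check existence explicitly.
        raise FileNotFoundError(f"{root} is not a directory (cwd: {Path.cwd()})")
    all_files = sorted(p.as_posix() for p in root.glob("**/*.pth"))
    if not all_files:
        raise ValueError(f"no *.pth files found under {root}")
    return all_files


if __name__ == "__main__":
    # Example usage on a throwaway folder holding a single checkpoint.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "model_0.pth").touch()
        print(collect_checkpoints(tmp))
```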

@yonatansverdlov
Author

It's empty, but I followed the instructions.

@AvivNavon
Owner

Could you please provide the structure of the datasets/mnist_classifiers folder?

@yonatansverdlov
Author

yonatansverdlov commented Jul 10, 2024

It contains around 10K models whose filenames end with .pth.
The all_files variable is an empty list.

@AvivNavon
Owner

But what is the structure of the datasets/mnist_classifiers folder? Are there other folders inside, or just the *.pth files (e.g., datasets/mnist_classifiers/model_xx.pth)?
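One quick way to answer the structure question is to count the .pth files in each folder under the data root. This is a sketch with a hypothetical helper, pth_counts_by_folder; the key "." stands for files sitting directly in data_root, and any other key is a nested subfolder.

```python
import tempfile
from collections import Counter
from pathlib import Path


def pth_counts_by_folder(data_root):
    """Map each folder (relative to data_root) to its number of *.pth files.

    Hypothetical diagnostic helper: shows whether checkpoints sit directly
    in data_root (key ".") or in nested subfolders.
    """
    root = Path(data_root)
    counts = Counter(p.parent.relative_to(root).as_posix() for p in root.rglob("*.pth"))
    return dict(counts)


if __name__ == "__main__":
    # On a real checkout you would call e.g. pth_counts_by_folder("datasets").
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "mnist_models").mkdir()
        (Path(tmp) / "mnist_models" / "model_0.pth").touch()
        print(pth_counts_by_folder(tmp))  # {'mnist_models': 1}
```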

@yonatansverdlov
Author

yonatansverdlov commented Jul 10, 2024 via email

@AvivNavon
Owner

Are you sure you are passing the correct path as --data-root?
Please try this:

from pathlib import Path
data_root = "datasets/mnist_models"  # the folder passed as --data-root
data_root = Path(data_root)
all_files = [p.as_posix() for p in data_root.glob("**/*.pth")]  # every .pth file, recursively
print(all_files[:10])  # first ten matches

The output should look like this:

['datasets/mnist_models/model_899.pth', 'datasets/mnist_models/model_3082.pth', 'datasets/mnist_models/model_641.pth', 'datasets/mnist_models/model_4935.pth', 'datasets/mnist_models/model_1695.pth', 'datasets/mnist_models/model_7582.pth', 'datasets/mnist_models/model_6844.pth', 'datasets/mnist_models/model_8869.pth', 'datasets/mnist_models/model_5395.pth', 'datasets/mnist_models/model_127.pth']

@yonatansverdlov
Author

yonatansverdlov commented Jul 10, 2024 via email

@AvivNavon
Owner

Try providing the full (absolute) path to datasets/mnist_classifiers instead of a relative one.
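A relative --data-root is resolved against the directory the command is launched from, not against the script's location. The hypothetical helper below (describe_root, not part of the repository) is a sketch that shows where a given path actually points and whether a directory exists there.

```python
from pathlib import Path


def describe_root(data_root):
    """Resolve data_root against the current working directory and report it.

    Hypothetical helper: a relative path is interpreted relative to the
    directory the script is launched from, not the script's own location.
    """
    root = Path(data_root).expanduser().resolve()  # non-strict: works for missing paths
    return {
        "cwd": Path.cwd().as_posix(),
        "resolved": root.as_posix(),
        "is_dir": root.is_dir(),
    }


if __name__ == "__main__":
    print(describe_root("datasets/mnist_classifiers"))
```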

@yonatansverdlov
Author

['datasets/mnist_models/model_3302.pth', 'datasets/mnist_models/model_2930.pth', 'datasets/mnist_models/model_2542.pth', 'datasets/mnist_models/model_1457.pth', 'datasets/mnist_models/model_1825.pth', 'datasets/mnist_models/model_4309.pth', 'datasets/mnist_models/model_5549.pth', 'datasets/mnist_models/model_9289.pth', 'datasets/mnist_models/model_2123.pth', 'datasets/mnist_models/model_2684.pth']
Using the full path yields the same result.

@AvivNavon
Owner

Try running the generate_splits.py command with the full path (and perhaps also specify the test/val sizes).

@yonatansverdlov
Author

yonatansverdlov commented Jul 10, 2024 via email

@yonatansverdlov
Author

yonatansverdlov commented Jul 10, 2024 via email

@AvivNavon
Owner

Could you share the exact command you are using and the full traceback?
Also, could you please try to debug why the file structure does not match data_root.glob("**/*.pth")?
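For reference when debugging, the "**/*.pth" pattern matches files both directly inside data_root and in nested subfolders ("**" matches zero or more directories), but only files whose extension is exactly .pth. This small sketch on a temporary folder illustrates that behavior; the file names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model_0.pth").touch()                 # directly in data_root
    (root / "nested").mkdir()
    (root / "nested" / "model_1.pth").touch()      # one level down
    (root / "model_2.pt").touch()                  # different extension: not matched
    matched = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.pth"))
    print(matched)  # ['model_0.pth', 'nested/model_1.pth']
```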

@yonatansverdlov
Author

yonatansverdlov commented Jul 11, 2024 via email

@AvivNavon
Owner

I think I see the problem: the subfolder is called mnist_models, not mnist_classifiers.
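This also explains why there was no "directory not found" error: globbing a directory that does not exist simply yields nothing, so all_files ends up empty and the failure only surfaces later at the split. The sketch below reproduces that on a temporary folder, assuming datasets/mnist_classifiers does not exist at all.

```python
import tempfile
from pathlib import Path

# The checkpoints live in mnist_models, but --data-root pointed at
# mnist_classifiers (assumed not to exist). Globbing a missing directory
# raises no error; it just yields nothing.
with tempfile.TemporaryDirectory() as tmp:
    datasets = Path(tmp) / "datasets"
    (datasets / "mnist_models").mkdir(parents=True)
    (datasets / "mnist_models" / "model_0.pth").touch()

    right = [p.name for p in (datasets / "mnist_models").glob("**/*.pth")]
    wrong = [p.name for p in (datasets / "mnist_classifiers").glob("**/*.pth")]
    print(right)  # ['model_0.pth']
    print(wrong)  # []
```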

@yonatansverdlov
Author

yonatansverdlov commented Jul 11, 2024 via email

@AvivNavon
Owner

AvivNavon commented Jul 11, 2024

python experiments/utils/data/generate_splits.py --data-root datasets/mnist_models --save-path datasets/splits.json

@AvivNavon
Owner

Also, I suggest providing exact sizes for the test/val splits using --test-size and --val-size.
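To illustrate what fixed-size splits look like, here is a sketch using only the standard library. It is a hypothetical stand-in (split_files is not the repository's function, and generate_splits.py may implement splitting differently); it treats test_size and val_size as absolute counts, so check the script for how it actually interprets --test-size and --val-size.

```python
import random


def split_files(all_files, test_size, val_size, seed=42):
    """Shuffle all_files and carve out fixed-size test and val sets.

    Hypothetical stand-in for the split step; sizes are absolute counts,
    not fractions.
    """
    if test_size + val_size >= len(all_files):
        # Mirrors the n_samples=0 failure: nothing would be left for training.
        raise ValueError(
            f"need more than {test_size + val_size} files, found {len(all_files)}"
        )
    files = sorted(all_files)           # deterministic order before shuffling
    random.Random(seed).shuffle(files)  # seeded so the split is reproducible
    test = files[:test_size]
    val = files[test_size:test_size + val_size]
    train = files[test_size + val_size:]
    return {"train": train, "val": val, "test": test}


splits = split_files([f"model_{i}.pth" for i in range(10)], test_size=2, val_size=1)
print({name: len(files) for name, files in splits.items()})  # {'train': 7, 'val': 1, 'test': 2}
```

Sorting before the seeded shuffle keeps the split reproducible even if the filesystem returns files in a different order.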

@yonatansverdlov
Author

yonatansverdlov commented Jul 11, 2024 via email

@AvivNavon
Owner

I believe we provide the full experimental details in the Appendix of the paper.

@yonatansverdlov
Author

yonatansverdlov commented Jul 11, 2024 via email

@AvivNavon
Owner

We will make an effort to release the other datasets and the supporting code in the future.

@AvivNavon
Owner

We've released the code for the CNN experiments.

@yonatansverdlov
Author

yonatansverdlov commented Jul 30, 2024 via email

@AvivNavon
Owner

Yes.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants