Fix unfair split in mnist metric learning dataset #949
Conversation
looks good
```python
    root: root directory of the dataset
    train: for MnistMLDataset should always be True

Raises:
```
I think we should remove `train` from args and consider 2 situations:
- `train` is in `kwargs`: then we should check that train is True, and otherwise raise an error;
- `train` is not in `kwargs`: then we should add it as `train=True`.

In the current implementation we allow passing a `train` value directly to init, but don't allow changing it, which is confusing behaviour. A sketch of what I mean follows below.
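A minimal sketch of that suggestion, assuming `MnistMLDataset` subclasses torchvision's `MNIST` (the base class and the exact exception type are assumptions, not confirmed by this thread):

```python
from torchvision.datasets import MNIST


class MnistMLDataset(MNIST):
    """Metric-learning MNIST; only the train split makes sense here."""

    def __init__(self, root: str, **kwargs):
        # Reject an explicit train=False instead of silently ignoring it.
        if kwargs.get("train", True) is not True:
            raise ValueError("MnistMLDataset can only be used with train=True")
        kwargs["train"] = True  # always load the train split
        super().__init__(root, **kwargs)
```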
what is your concern about? something like this?

```python
kwargs = {"train": False}
dataset = MnistMLDataset("./data", True, **kwargs)
```
it will fail with an exception about multiple values for the `train` argument.
I thought it would be a good idea to remind in the docs that `train` should always be True in this case 🤔
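For reference, that call fails during Python's argument binding, before any dataset code runs; the message is roughly:

```
TypeError: __init__() got multiple values for argument 'train'
```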
oh, i think i understood, ok
```diff
@@ -277,5 +345,15 @@ def query_size(self) -> int:
         """Query Gallery dataset should have query_size property"""
         return self._query_size

+    @property
+    def data(self) -> torch.Tensor:
```
why do we need it?
why not?
I think it's ok when you can get data and targets from your dataset directly without using private fields. Moreover, it gives us some coherence across the MNIST datasets (the train MNIST dataset has these properties, and classic MNIST has them too).
```diff
+        return self._mnist.data
+
+    @property
+    def targets(self) -> torch.Tensor:
```
why do we need it?
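To illustrate the coherence point above, a hypothetical usage sketch (the `MnistQGDataset` name, import path, and constructor arguments are assumptions, not confirmed by this thread):

```python
from catalyst.contrib.datasets import MnistQGDataset  # assumed import path

dataset = MnistQGDataset(root="./data", gallery_fraq=0.2)

# Public read-only access instead of reaching into dataset._mnist directly.
images = dataset.data     # torch.Tensor of MNIST images, shape [N, 28, 28]
labels = dataset.targets  # torch.Tensor of labels, shape [N]
```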
Pull request has been modified.
we need to update the readme also :)
```diff
@@ -29,7 +29,7 @@ def run_ml_pipeline(sampler_inbatch: data.IInbatchTripletSampler) -> float:
         root=dataset_root, train=True, download=True, transform=transforms,
     )
     sampler = data.BalanceBatchSampler(
-        labels=dataset_train.get_labels(), p=10, k=10
+        labels=dataset_train.get_labels(), p=5, k=10
```
btw, could you please also change this example in the Readme.md?
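For context (an inference from the PR title, not stated explicitly in this thread): `BalanceBatchSampler(labels, p, k)` builds batches of `p` distinct labels with `k` samples each, so `p` cannot exceed the number of classes available, and the fixed split presumably leaves only 5 of the 10 digit classes in the train part, hence `p=5`:

```python
# Hypothetical sanity check for picking p after the split.
labels = dataset_train.get_labels()
num_classes = len(set(labels))  # presumably 5 after the fixed split
p, k = 5, 10
assert p <= num_classes, "p must not exceed the number of distinct labels"
batch_size = p * k  # 50 samples per batch
```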
Pull request has been modified.
This pull request is now in conflicts. @julia-shenshina, could you fix it? 🙏
…to feature/ml_mnist_dataset
Before submitting
- Did you check the code style with catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle)?
- Did you check the docs with make check-docs?

Description
Related Issue
Type of Change
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.