Fix unfair split in mnist metric learning dataset #949
Conversation
looks good
```python
    root: root directory of the dataset
    train: for MnistMLDataset should always be True

Raises:
```
I think we should remove `train` from args and consider 2 situations:
- `train` is in `kwargs`: then we should check that train is True, and otherwise raise an error;
- `train` is not in `kwargs`: then we should add it as `train=True`.

In the current implementation we allow passing a `train` value directly to init, but don't allow changing it, which is confusing behaviour. A sketch of what I mean follows below.
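A minimal sketch of that suggestion, assuming `MnistMLDataset` subclasses torchvision's `MNIST` (the base class and the exact exception type are assumptions, not confirmed by this thread):

```python
from torchvision.datasets import MNIST


class MnistMLDataset(MNIST):
    """Metric-learning MNIST; only the train split makes sense here."""

    def __init__(self, root: str, **kwargs):
        # Reject an explicit train=False instead of silently ignoring it.
        if kwargs.get("train", True) is not True:
            raise ValueError("MnistMLDataset can only be used with train=True")
        kwargs["train"] = True  # always load the train split
        super().__init__(root, **kwargs)
```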
what is your concern about? something like this?

```python
kwargs = {"train": False}
dataset = MnistMLDataset("./data", True, **kwargs)
```
it will fail with an exception about multiple values for the `train` argument.
I thought it would be a good idea to remind in the docs that `train` should always be True in this case 🤔
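For reference, that call fails during Python's argument binding, before any dataset code runs; the message is roughly:

```
TypeError: __init__() got multiple values for argument 'train'
```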
oh, i think i understood, ok
```diff
@@ -277,5 +345,15 @@ def query_size(self) -> int:
         """Query Gallery dataset should have query_size property"""
         return self._query_size

+    @property
+    def data(self) -> torch.Tensor:
```
why do we need it?
why not?
I think it's ok when you can get data and targets from your dataset directly without using private fields. Moreover, it gives us some coherence across the MNIST datasets (the train MNIST dataset has these properties, and classic MNIST has them too).
```diff
+        return self._mnist.data
+
+    @property
+    def targets(self) -> torch.Tensor:
```
why do we need it?
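To illustrate the coherence point above, a hypothetical usage sketch (the `MnistQGDataset` name, import path, and constructor arguments are assumptions, not confirmed by this thread):

```python
from catalyst.contrib.datasets import MnistQGDataset  # assumed import path

dataset = MnistQGDataset(root="./data", gallery_fraq=0.2)

# Public read-only access instead of reaching into dataset._mnist directly.
images = dataset.data     # torch.Tensor of MNIST images, shape [N, 28, 28]
labels = dataset.targets  # torch.Tensor of labels, shape [N]
```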
Pull request has been modified.
we need to update the readme also :)
```diff
@@ -29,7 +29,7 @@ def run_ml_pipeline(sampler_inbatch: data.IInbatchTripletSampler) -> float:
         root=dataset_root, train=True, download=True, transform=transforms,
     )
     sampler = data.BalanceBatchSampler(
-        labels=dataset_train.get_labels(), p=10, k=10
+        labels=dataset_train.get_labels(), p=5, k=10
```
btw, could you please also change this example in the Readme.md?
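For context (an inference from the PR title, not stated explicitly in this thread): `BalanceBatchSampler(labels, p, k)` builds batches of `p` distinct labels with `k` samples each, so `p` cannot exceed the number of classes available, and the fixed split presumably leaves only 5 of the 10 digit classes in the train part, hence `p=5`:

```python
# Hypothetical sanity check for picking p after the split.
labels = dataset_train.get_labels()
num_classes = len(set(labels))  # presumably 5 after the fixed split
p, k = 5, 10
assert p <= num_classes, "p must not exceed the number of distinct labels"
batch_size = p * k  # 50 samples per batch
```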
Pull request has been modified.
This pull request is now in conflicts. @julia-shenshina, could you fix it? 🙏
…to feature/ml_mnist_dataset
Before submitting
- Did you check the code style with catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle)?
- Did you check the docs with make check-docs?

Description
Related Issue
Type of Change
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.