
Fix unfair split in mnist metric learning dataset #949

Merged


julia-shenshina (Contributor) commented Oct 3, 2020

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contribution guide?
  • Did you check the code style? catalyst-make-codestyle && catalyst-check-codestyle (pip install -U catalyst-codestyle).
  • Did you make sure to update the docs? We use Google format for all the methods and classes.
  • Did you check the docs with make check-docs?
  • Did you write any new necessary tests?
  • Did you add your new functionality to the docs?
  • Did you update the CHANGELOG?
  • You can use 'Login as guest' to see Teamcity build logs.

Description

Related Issue

Type of Change

  • Examples / docs / tutorials / contributors update
  • Bug fix (non-breaking change which fixes an issue)
  • Improvement (non-breaking change which improves an existing feature)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

AlekseySh (Contributor) left a comment


looks good

root: root directory of the dataset
train: for MnistMLDataset should always be True

Raises:
Contributor

I think we should remove train from the args and consider two situations:
  • train is in kwargs: then we should check that train is True, otherwise raise an error;
  • train is not in kwargs: then we should add it as train=True.

In the current implementation we allow passing a train value directly to __init__ but don't allow changing it, which is confusing behaviour.
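The two situations above could be handled roughly as follows. This is a minimal sketch, not the merged Catalyst code: `_MnistStub` is a hypothetical stand-in for torchvision's MNIST so the snippet runs offline, and the class name `MnistMLDatasetSketch` and the choice of `ValueError` are assumptions. The docstring uses the Google format the PR template asks for.

```python
from typing import Any


class _MnistStub:
    """Hypothetical stand-in for torchvision.datasets.MNIST; it only stores its arguments."""

    def __init__(self, root: str, train: bool = True, **kwargs: Any) -> None:
        self.root = root
        self.train = train
        self.extra = kwargs


class MnistMLDatasetSketch(_MnistStub):
    """Hypothetical sketch of the proposed MnistMLDataset constructor.

    Args:
        root: root directory of the dataset
        **kwargs: other arguments for the base dataset; if ``train`` is
            given, it must be True

    Raises:
        ValueError: if ``train`` is passed with a value other than True
    """

    def __init__(self, root: str, **kwargs: Any) -> None:
        if "train" in kwargs:
            # Situation 1: the caller passed train explicitly, so validate it.
            if kwargs["train"] is not True:
                raise ValueError("MnistMLDataset can be used only with train=True")
        else:
            # Situation 2: the caller omitted train, so force the train split.
            kwargs["train"] = True
        super().__init__(root, **kwargs)


print(MnistMLDatasetSketch("./data").train)              # True
print(MnistMLDatasetSketch("./data", train=True).train)  # True
```

With this shape there is only one way to pass train, and MnistMLDatasetSketch("./data", train=False) fails with a clear ValueError instead of silently building the wrong split.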

Contributor Author

What is your concern? Something like this?

    kwargs = {"train": False}
    dataset = MnistMLDataset("./data", True, **kwargs)

It will fail with an exception about multiple values for the train argument.

I thought it would be a good idea to remind users in the docs that train should always be True in this case 🤔
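The failure described above can be reproduced with a tiny stub. `MnistMLDatasetOld` is a hypothetical stand-in with the signature under review (train as an explicit parameter), not the real Catalyst class. Python raises TypeError at call time because the same parameter receives a value both positionally and from **kwargs.

```python
from typing import Any


class MnistMLDatasetOld:
    """Hypothetical stub with the reviewed signature: train is an explicit argument."""

    def __init__(self, root: str, train: bool = True, **kwargs: Any) -> None:
        self.root = root
        self.train = train


kwargs = {"train": False}
try:
    # `True` binds to `train` positionally, and **kwargs supplies `train` again.
    dataset = MnistMLDatasetOld("./data", True, **kwargs)
except TypeError as err:
    print(f"TypeError: {err}")
```

So this particular call cannot slip through; the reviewer's point is rather that MnistMLDatasetOld("./data", train=False) is accepted by the signature even though it is not a valid configuration.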

Contributor Author

Oh, I think I understand now, OK.

@@ -277,5 +345,15 @@ def query_size(self) -> int:
"""Query Gallery dataset should have query_size property"""
return self._query_size

@property
def data(self) -> torch.Tensor:
Contributor

why do we need it?

Contributor Author

Why not? I think it's fine to be able to get data and targets from the dataset directly, without touching private fields.
Moreover, it makes the MNIST datasets consistent with each other: the train MNIST dataset has these properties, and so does the classic MNIST.

return self._mnist.data

@property
def targets(self) -> torch.Tensor:
Contributor

why do we need it?
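The data and targets properties discussed in the two threads above simply expose the wrapped dataset read-only. A minimal sketch of that delegation pattern, assuming the wrapper keeps the underlying MNIST object in a private `_mnist` field as the diff suggests; `_InnerMnistStub` and `QueryGalleryDatasetSketch` are hypothetical names, and plain lists replace torch tensors so the snippet runs without PyTorch.

```python
from typing import List


class _InnerMnistStub:
    """Hypothetical stand-in for the wrapped MNIST object (lists instead of tensors)."""

    def __init__(self, data: List[List[int]], targets: List[int]) -> None:
        self.data = data
        self.targets = targets


class QueryGalleryDatasetSketch:
    """Hypothetical wrapper that keeps the underlying dataset in a private field."""

    def __init__(self, mnist: _InnerMnistStub, query_size: int) -> None:
        self._mnist = mnist
        self._query_size = query_size

    @property
    def query_size(self) -> int:
        """Query Gallery dataset should have query_size property"""
        return self._query_size

    @property
    def data(self) -> List[List[int]]:
        # Read-only public access; callers no longer need to reach into `_mnist`.
        return self._mnist.data

    @property
    def targets(self) -> List[int]:
        return self._mnist.targets


ds = QueryGalleryDatasetSketch(_InnerMnistStub([[0, 1], [2, 3]], [7, 3]), query_size=1)
print(ds.targets)  # [7, 3]
```

Because the properties have no setters, assigning ds.data raises AttributeError, so the public names stay in sync with the wrapped dataset.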

mergify bot dismissed AlekseySh’s stale review on October 8, 2020 15:14

Pull request has been modified.

Scitator (Member) left a comment


We also need to update the README :)

@@ -29,7 +29,7 @@ def run_ml_pipeline(sampler_inbatch: data.IInbatchTripletSampler) -> float:
  root=dataset_root, train=True, download=True, transform=transforms,
  )
  sampler = data.BalanceBatchSampler(
- labels=dataset_train.get_labels(), p=10, k=10
+ labels=dataset_train.get_labels(), p=5, k=10
Member

By the way, could you please also update this example in README.md?
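In a P×K batch sampler such as BalanceBatchSampler, p is the number of classes per batch and k the number of samples per class, so one batch holds p * k items (50 with the new p=5, k=10), and p cannot exceed the number of distinct labels available. The function below is a simplified, hypothetical sketch of that idea, not Catalyst's implementation; the name `pk_batch`, its seed argument and the toy labels are assumptions, and the thread does not state why p was lowered from 10 to 5.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def pk_batch(labels: Sequence[int], p: int, k: int, seed: int = 0) -> List[int]:
    """Hypothetical P x K batch: pick p distinct classes, then k indices from each."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    if p > len(by_class):
        raise ValueError(f"p={p} exceeds the number of classes ({len(by_class)})")
    rng = random.Random(seed)
    batch: List[int] = []
    for cls in rng.sample(sorted(by_class), p):
        pool = by_class[cls]
        # Without replacement when possible; with replacement if a class is too small.
        if len(pool) >= k:
            batch.extend(rng.sample(pool, k))
        else:
            batch.extend(rng.choices(pool, k=k))
    return batch


labels = [i % 5 for i in range(500)]  # toy labels: 5 classes, 100 samples each
print(len(pk_batch(labels, p=5, k=10)))  # 50 == p * k
```

With these toy labels, pk_batch(labels, p=10, k=10) raises ValueError, which is the kind of mismatch to watch for whenever the number of classes in a split changes.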

mergify bot dismissed Scitator’s stale review on October 11, 2020 13:04

Pull request has been modified.

mergify bot commented Oct 11, 2020

This pull request is now in conflicts. @julia-shenshina, could you fix it? 🙏

Scitator merged commit fc5c4ca into catalyst-team:master on Oct 11, 2020
Labels: None yet
Projects: None yet
Linked issues: None yet
3 participants