Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi io #30

Merged
merged 13 commits into from Apr 30, 2019
Merged

Multi io #30

merged 13 commits into from Apr 30, 2019

Conversation

atremblay
Copy link
Contributor

The core modification was to create a new TensorData that would recursively fetch the proper index value and a function that can aggregate batches of output as one big output.

I disabled one test test_disable_batch_size_warning as I am not sure if it's required anymore. I modified _get_batch_size to account for this new input/output structure, but I am not sure what the original intention was.

@freud14
Copy link
Collaborator

freud14 commented Apr 27, 2019

Hi,
From what I could see, here are a few comments. I might do more later.

  • Since we put back x, y, we should put back validation_x, validation_y in fit_generator.
  • Could you remove type documentation in the *_on_batch and *_generator methods since we don't constrain them anyway.
  • In TensorDataset, there must be cleaner way than with nonlocal to assert the length of the tensors (like return the length in the recursion and asserting that the lengths in the list/tuple are all the same for instance).
  • In the test for TensorDataset, could you test with not only zeros?

Thank you.
Frédérik

@atremblay
Copy link
Contributor Author

  • Since we put back x, y, we should put back validation_x, validation_y in fit_generator.

in fit_generator? Do you mean fit? If so, Keras has validation_data that contains a tuple of (x, y). I can change it to validation_{x|y}, we don't have to stick to Keras, but keeping as many similarities as possible is nice.

  • Could you remove type documentation in the *_on_batch and *_generator methods since we don't constrain them anyway.

We constrain it a little bit, no dictionaries. But it is getting a bit too verbose. I'll remove it. Should I remove it in fit as well?

  • In TensorDataset, there must be cleaner way than with nonlocal to assert the length of the tensors (like return the length in the recursion and asserting that the lengths in the list/tuple are all the same for instance).

I didn't see an elegant way the first time around to make sure everything everywhere has the same batch size, but I'll look into it again.

  • In the test for TensorDataset, could you test with not only zeros?

Sure thing.

@freud14
Copy link
Collaborator

freud14 commented Apr 29, 2019

  • Since we put back x, y, we should put back validation_x, validation_y in fit_generator.

in fit_generator? Do you mean fit? If so, Keras has validation_data that contains a tuple of (x, y). I can change it to validation_{x|y}, we don't have to stick to Keras, but keeping as many similarities as possible is nice.

Yeah, I meant fit. Hmmm, let me think about it.

  • Could you remove type documentation in the *_on_batch and *_generator methods since we don't constrain them anyway.

We constrain it a little bit, no dictionaries. But it is getting a bit too verbose. I'll remove it. Should I remove it in fit as well?

We constrain the types in fit, evaluate and predict but we do not in the *_on_batch and *_generator methods as far as I can tell.

  • In TensorDataset, there must be cleaner way than with nonlocal to assert the length of the tensors (like return the length in the recursion and asserting that the lengths in the list/tuple are all the same for instance).

I didn't see an elegant way the first time around to make sure everything everywhere has the same batch size, but I'll look into it again.

The change you did is fine.

@atremblay
Copy link
Contributor Author

  • Could you remove type documentation in the *_on_batch and *_generator methods since we don't constrain them anyway.

We constrain it a little bit, no dictionaries. But it is getting a bit too verbose. I'll remove it. Should I remove it in fit as well?

We constrain the types in fit, evaluate and predict but we do not in the *_on_batch and *_generator methods as far as I can tell.

Do we directly constrain it? I may be having a major brain fart, I don't see where it's done. Those types are assumed, but not directly enforced, are they?

@freud14
Copy link
Collaborator

freud14 commented Apr 29, 2019

Do we directly constrain it? I may be having a major brain fart, I don't see where it's done. Those types are assumed, but not directly enforced, are they?

In the fit, evaluate and predict methods, your _concat function constrains (or assumes if you will) y and TensorDataset constrains x. In the other methods, there are no constraints.

@atremblay
Copy link
Contributor Author

Ahh gotcha! Ok yeah makes sense.

@freud14
Copy link
Collaborator

freud14 commented Apr 29, 2019

Hi,
Few last points (I think):

  • In the doc, could you specify that validation_data is a tuple (x, y) and say that they are the same thing as x and y previously described? Right now, the type documentation for validation data is different than x and y.
  • In predict_generator, could you specify that the generator should not yield the y contrary to the other *_generator methods ? This is because this confuses people when they try to pass a generator that yields (x, y) and there is an error.
  • In tests/test_utils.py, could you remove the automatically added comment header?
  • In TensorDataset, just put self._len = _rabbit_hole(self.tensors) instead of two lines.

Thank you.
Frédérik Paradis

@atremblay
Copy link
Contributor Author

Sure thing. And damn vim plugin, a bit too eager on the file header.

@freud14 freud14 merged commit 804cb97 into GRAAL-Research:master Apr 30, 2019
@atremblay atremblay deleted the multi_IO branch April 30, 2019 13:41
@freud14
Copy link
Collaborator

freud14 commented Apr 30, 2019

Pull request merged! I might be a good idea to try make a pull request to PyTorch with the extended version of the TensorDataset class.

Merci bien pour ta collaboration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants