New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dl4j iterators #4855

Merged
merged 51 commits into from Mar 29, 2018

Conversation

Projects
None yet
2 participants
@agibsonccc
Copy link
Member

agibsonccc commented Mar 27, 2018

What changes were proposed in this pull request?

Split out dl4j iterators from deeplearning4j-core (but still keep same transitive deps)
(Please fill in changes proposed in this fix)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

Quick checklist

The following checklist helps ensure your PR is complete:

  • Reviewed the Contributing Guidelines and followed the steps within.
  • Created tests for any significant new code additions.
  • Relevant tests for your changes are passing.
  • Ran mvn formatter:format (see formatter instructions for targeting your specific files).

agibsonccc added some commits Jan 29, 2018

agibsonccc added some commits Feb 21, 2018

@agibsonccc

This comment has been minimized.

Copy link
Member

agibsonccc commented Mar 27, 2018

@AlexDBlack when you review this just look at the last few commits, I deleted all the changes related to the PR and split them out. The datavec iterators and specialized iterators are the main focus of this PR.

@agibsonccc agibsonccc requested a review from AlexDBlack Mar 27, 2018

agibsonccc added some commits Mar 27, 2018

@AlexDBlack
Copy link
Member

AlexDBlack left a comment

I'm ignoring all the non-data/iterator related changes here.
Overall this makes a lot of sense - though on reflection I'd argue it probably doesn't go far enough :)

Some suggested changes:

  1. Call deeplearning4j-iterators (parent) module deeplearning4j-data instead. More generic/usable in the future with that name
  2. Call deeplearning4j-specialized-iterators something like deeplearning4j-datasets (I'd basically want it to signify to users that this module is for common datasets that are built in to DL4J - "specialized" is too abstract a term for these standard datasets)
  3. Create another module - maybe deeplearning4j-utility-iterators and dump most of the utility/wrapper iterators in there (link) - perhaps excluding a few like Async that are used in MultiLayerNetwork/ComputationGraph etc directly
  4. RR(M)DSI tests should also be moved to deeplearning4j-datavec-iterators link
  5. Relevant (MNIST, EMNIST etc) tests should be moved to deeplearning4j-specialized-iterators link and some of link
@agibsonccc

This comment has been minimized.

Copy link
Member

agibsonccc commented Mar 28, 2018

@AlexDBlack go ahead and do another round

@AlexDBlack
Copy link
Member

AlexDBlack left a comment

Overall, I like how this is looking. Noted a couple of issues, but not much.

Only other general issue here is the removal of the utility classes - i.e., the ones that just extended the ND4J utility classes with no implementation... I did that to avoid breaking changes in user code. I mean we've got plenty already with this release, so I'm not sure we should care much at this point?

<artifactId>deeplearning4j-data</artifactId>
<packaging>pom</packaging>

<name>deeplearning4j-iterators</name>

This comment has been minimized.

@AlexDBlack

AlexDBlack Mar 28, 2018

Member

name should match artifactId (i.e., both should be deeplearning4j-data)

<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-utility-iterators</artifactId>
<version>${project.version}</version>

This comment has been minimized.

@AlexDBlack

AlexDBlack Mar 28, 2018

Member

Do we want deeplearning4j-nn relying on deeplearning4j-utility-iterators?
My first thought here would be no - i.e., -nn should be very minimal.
That's why I was suggesting we keep the async iterators in -nn but move everything else - I think the async ones are the only ones that are used directly in -nn. If we move them to -nn, we can drop this dependency here?
Not sure how strict we want to be with the "-nn should be minimal" design...

This comment has been minimized.

@agibsonccc

agibsonccc Mar 29, 2018

Member

The reason we do this is because the neural net api itself requires the async iterators and the like.

</dependency>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-specialized-iterators</artifactId>

This comment has been minimized.

@AlexDBlack

AlexDBlack Mar 28, 2018

Member

Update to deeplearning4j-datasets

<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-specialized-iterators</artifactId>
<version>${project.version}</version>
<scope>test</scope>

This comment has been minimized.

@AlexDBlack

AlexDBlack Mar 28, 2018

Member

I'd be inclined to make this default (i.e., compile) scope not test scope to make things easier for users - my thinking is that deeplearning4j-core is "all the usual stuff you might need" - things like MNIST iterator included.
It also saves users wondering what's going on when they upgrade after release - otherwise where did their MNIST iterator go, and now how do they find out about the change?

agibsonccc added some commits Mar 29, 2018

fml

@agibsonccc agibsonccc merged commit 988630b into master Mar 29, 2018

1 of 2 checks passed

codeclimate 132 issues to fix
Details
continuous-integration/jenkins/pr-merge This commit looks good
Details

@agibsonccc agibsonccc deleted the dl4jiterators branch Mar 29, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment