
Data extension framework #731

Merged 3 commits into NVIDIA:master from dev/data-extensions on May 20, 2016

Conversation

@gheinrich (Contributor) commented May 10, 2016

Summary

This adds a data extension framework along the lines of:

[diagram: data-classes]

This also adds an image gradient extension (same as this tutorial).

Sample extensions will be added in separate pull requests.

This pull request depends on #723.

Progress

  • add data extension framework
  • add templates
  • add views
  • rebase to tip of master branch / squash commits
  • add sample extension
  • add tests

Documentation may be added later when we want to advertise the feature.





Contributor (inline review comment):

This PR looks awesome! Here though, it's not immediately obvious to me what we're trying to do. Also it's unclear what I should see in the example.

@gheinrich (Contributor, Author)

Thanks for the feedback. Indeed, the image reconstruction tutorial is a work in progress. The idea is to train a network that learns how to reconstruct distorted images, but the model isn't working well yet. Mostly, the example is there to show that you can easily add a data extension in DIGITS to create a dataset for this task. Getting the model to actually work will require more effort.

The main idea behind this PR is to allow anybody to import any data into DIGITS, so you don't have to write a script to create the LMDB yourself. You could imagine importing data from text, CSV, NumPy, or anything you can read from Python. The possibilities are endless.
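
To make that concrete, here is a minimal sketch of what such a data extension could look like. The shape (a per-stage itemization step plus a per-entry encoding step) follows this discussion, but the class and method names are illustrative rather than the exact DIGITS interface:

import csv
import numpy as np

class CsvDataIngestion(object):
    """Hypothetical extension ingesting (feature..., label) rows from a CSV file."""

    def __init__(self, csv_path):
        self.csv_path = csv_path

    def itemize_entries(self, stage):
        # return one lightweight descriptor per entry for the given stage
        # ('train', 'val' or 'test'); here, one descriptor per CSV row
        if stage != 'train':
            return []  # this toy extension only provides training data
        with open(self.csv_path) as f:
            return list(csv.reader(f))

    def encode_entry(self, entry):
        # convert one descriptor into (data, label) arrays that the generic
        # dataset job can write to the backend database (e.g. LMDB)
        data = np.array(entry[:-1], dtype=np.float32)
        label = np.float32(entry[-1])
        return data, label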

A subsequent PR will introduce a visualization framework, allowing DIGITS users to easily add and select view extensions to visualize the output of a network.

@TimZaman (Contributor)

Sounds amazing, man, and it will be very useful! We were in dire need of this a few weeks back while working on upscaling and colorization.

@gheinrich force-pushed the dev/data-extensions branch 2 times, most recently from c7262cb to ad8325c on May 13, 2016, 12:12
@gheinrich force-pushed the dev/data-extensions branch 6 times, most recently from cbf634a to ee1df82 on May 18, 2016, 07:07
@gheinrich (Contributor, Author)

Finally got coverage to increase by adding more tests. I think this is ready for review. Thanks!

@lukeyeager (Member) commented May 18, 2016

Something funky happened in your rebase with master, and now I can't get to any of the new stuff from the homepage.

Edit: oh, I found this comment:

# set show=True if extension should be listed in known extensions





Member (inline review comment):

Bunch of extra space here

@gheinrich (Contributor, Author)

> Something funky happened in your rebase with master, and now I can't get to any of the new stuff from the homepage.

Actually, I've made it so that the gradients extension doesn't show on the home page, via this line. I thought this was an interesting extension to play with, but maybe not something we want all users to see by default. Sorry, I should have mentioned this.

try:
    job_id = self.create_dataset(json=True, dsopts_num_threads=0)
    assert self.dataset_wait_completion(job_id) != 'Done', 'should have failed'
except RuntimeError:
    pass  # expected failure path (body assumed; snippet was truncated)
Member (inline review comment):

Your checks seem weird. Do you mean to allow the job to fail in one of two ways (either status != 'Done' or RuntimeError)?

@gheinrich (Contributor, Author), inline reply:

Actually, I am expecting the call to create_dataset to fail. I could remove the assert on line 142.

Member (inline review comment):

Or just add an assert False if you don't expect to ever get there.
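
For reference, a sketch of that suggestion, reusing the helpers from the snippet above and assuming create_dataset raises RuntimeError when it fails:

try:
    self.create_dataset(json=True, dsopts_num_threads=0)
    # reaching this point means dataset creation unexpectedly succeeded
    assert False, 'create_dataset with zero encoder threads should have failed'
except RuntimeError:
    pass  # expected failure path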

@lukeyeager (Member)

Yeah, I found it eventually (edited my post above)

@lukeyeager (Member)

Since you're hiding the only new feature you added, I'm not sure how to test the UI, exactly. This is just one of several "enabler" pull requests, right? Not much real added value yet?

@gheinrich (Contributor, Author)

I have fixed a lot of PEP8 errors in the latest commits. Hopefully I didn't break anything. I'll do another round of testing tomorrow but I'm still interested in your feedback.

To test you can enable the gradient extension in /digits/extensions/data/__init__.py. I am thinking we don't want to show this by default.
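
For illustration, enabling it could amount to flipping a flag in that file; the layout below is an assumption based on the "set show=True if extension should be listed in known extensions" comment quoted earlier, not the file's exact contents:

# digits/extensions/data/__init__.py (assumed layout, illustrative only)
from .imageGradients import DataIngestion as ImageGradientDataIngestion

data_extensions = [
    # set show=True if extension should be listed in known extensions
    (ImageGradientDataIngestion, True),  # True exposes the gradients extension in the UI
]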

Note that in the dataset creation form, one subset of the fields comes from the data extension, while the other subset comes from /digits/dataset/generic/forms.py and corresponds to the options that apply to all datasets (DB backend, encoding, number of encoder threads, ...).
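
A rough sketch of how those two subsets could fit together; the field names and defaults below are illustrative, not the real forms:

from wtforms import Form, IntegerField, SelectField

class GenericDatasetForm(Form):
    # options that apply to all datasets, regardless of the data extension
    dsopts_backend = SelectField('DB backend', choices=[('lmdb', 'LMDB'), ('hdf5', 'HDF5')])
    dsopts_encoding = SelectField('Image encoding', choices=[('none', 'None'), ('png', 'PNG'), ('jpg', 'JPEG')])
    dsopts_num_threads = IntegerField('Number of encoder threads', default=4)

class GradientDataForm(Form):
    # fields contributed by the data extension itself
    train_image_count = IntegerField('Number of training images', default=1000)
    image_width = IntegerField('Image width', default=32)
    image_height = IntegerField('Image height', default=32)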

@gheinrich force-pushed the dev/data-extensions branch 2 times, most recently from 1b5fb3e to a313d38 on May 19, 2016, 11:37
@gheinrich force-pushed the dev/data-extensions branch 5 times, most recently from 12815c3 to b98aa96 on May 20, 2016, 19:27
Integrated with new dataset common interface
This creates a dataset of gradient images.
Gradients are randomly chosen in the x and y directions.
Labels are set as in the regression tutorial.
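
For context, a minimal sketch of the kind of sample such a gradients dataset contains; this is illustrative, not the extension's actual code:

import numpy as np

def make_gradient_sample(width=32, height=32):
    # pick random slopes in the x and y directions; the pair (gx, gy) is the
    # regression label, as in the regression tutorial mentioned above
    gx, gy = np.random.uniform(-0.5, 0.5, size=2)
    xs = np.arange(width)[np.newaxis, :]
    ys = np.arange(height)[:, np.newaxis]
    image = 0.5 + gx * xs + gy * ys  # a plane whose slopes encode the label
    return image.astype(np.float32), np.array([gx, gy], dtype=np.float32)
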
@gheinrich (Contributor, Author)

Rebased again...

@lukeyeager lukeyeager merged commit e6c1ff2 into NVIDIA:master May 20, 2016
@gheinrich gheinrich deleted the dev/data-extensions branch May 24, 2016 11:19
gheinrich added a commit to gheinrich/DIGITS that referenced this pull request May 27, 2016
The common dataset interface exposes a get_mean_file() method to retrieve the mean file.
Using the dataset.mean_file attribute only works for "images/generic" datasets.
This should have been changed in NVIDIA#731
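
In code, the change described by that commit message amounts to something like this sketch (the surrounding dataset object is assumed):

# before: only valid for "images/generic" datasets
# mean_file = dataset.mean_file
# after: works for any dataset exposing the common interface
mean_file = dataset.get_mean_file()
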
@lukeyeager (Member)

It's kinda weird that there's a noop test_db task created for object detection datasets:

@gheinrich was this your intention?

[screenshot: object-detection-test-dataset]

@gheinrich (Contributor, Author)

For data extensions we always create a test_db task, which spawns a create_generic_db.py process. What happens within this process depends on the data extension's specifics: in the case of object detection there is no provision to create a test DB, so this never actually does anything. The gradient extension, for example, may create an actual test DB if instructed to do so.

So to answer the question: we always need to create a test_db task because we don't know what the extension will choose to do about it. Are you saying we shouldn't show it in the UI if it didn't actually create a database? That would certainly make sense.
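
A sketch of how an extension can make the test_db task a no-op, assuming itemization is the per-stage hook; the method and stage names are illustrative:

class ObjectDetectionDataIngestion(object):
    def itemize_entries(self, stage):
        if stage == 'test':
            # no provision for a test DB: returning no entries means the
            # spawned create_generic_db.py process writes nothing, so the
            # test_db task completes as a no-op
            return []
        return self._list_image_label_pairs(stage)  # hypothetical helper

    def _list_image_label_pairs(self, stage):
        raise NotImplementedError  # placeholder for the real per-stage enumeration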

@lukeyeager (Member)

> Are you saying we shouldn't show it in the UI if it didn't actually create a database? That would certainly make sense.

I guess that would work, yeah. Either detect that we won't need the task and don't create it (sounds nice), or just hide the fact that we created a useless task (also works).

@gheinrich (Contributor, Author)

We can't detect that we don't need the task, because the itemization runs in the external process. But we can avoid showing the task if it finished without errors and its entry count is zero. This is implemented in #813.
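
The display rule described here could look roughly like this; the attribute names are assumptions, not the actual #813 implementation:

def should_display_task(task):
    # hide a test_db task that finished cleanly but wrote no entries
    if task.name == 'test_db' and task.status == 'Done' and task.entry_count == 0:
        return False
    return True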

SlipknotTN pushed a commit to cynnyx/DIGITS that referenced this pull request Mar 30, 2017