
Data extension framework #731

Merged 3 commits into NVIDIA:master from dev/data-extensions on May 20, 2016

Conversation

@gheinrich (Contributor) commented May 10, 2016

Summary

This adds a data extension framework along the lines of:

[diagram: data-classes]

This also adds an image gradient extension (same as this tutorial).

Sample extensions will be added in separate pull requests.

This pull request depends on #723.

Progress

  • add data extension framework
  • add templates
  • add views
  • rebase to tip of master branch / squash commits
  • add sample extension
  • add tests

Documentation may be added later when we want to advertise the feature.





Contributor (inline review comment):

This PR looks awesome! Here though, it's not immediately obvious to me what we're trying to do. Also it's unclear what I should see in the example.

@gheinrich (Contributor, Author)

Thanks for the feedback. Indeed, the image reconstruction tutorial is a work in progress. The idea is to train a network that learns how to reconstruct distorted images, but the model isn't working well yet. Mostly, the example is there to show that you can easily add a data extension in DIGITS to create a dataset for this task. Getting the model to actually work will require more effort.

The main idea behind this PR is to allow anybody to import any data into DIGITS, so you don't have to write a script to create the LMDB yourself. You could imagine importing data from text, CSV, NumPy, or anything you can read from Python. The possibilities are endless.
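
To make that concrete, here is a minimal sketch of what such a data extension could look like. The shape (a per-stage itemization step plus a per-entry encoding step) follows this discussion, but the class and method names are illustrative rather than the exact DIGITS interface:

import csv
import numpy as np

class CsvDataIngestion(object):
    """Hypothetical extension ingesting (feature..., label) rows from a CSV file."""

    def __init__(self, csv_path):
        self.csv_path = csv_path

    def itemize_entries(self, stage):
        # return one lightweight descriptor per entry for the given stage
        # ('train', 'val' or 'test'); here, one descriptor per CSV row
        if stage != 'train':
            return []  # this toy extension only provides training data
        with open(self.csv_path) as f:
            return list(csv.reader(f))

    def encode_entry(self, entry):
        # convert one descriptor into (data, label) arrays that the generic
        # dataset job can write to the backend database (e.g. LMDB)
        data = np.array(entry[:-1], dtype=np.float32)
        label = np.float32(entry[-1])
        return data, label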

A subsequent PR will introduce a visualization framework, allowing DIGITS users to easily add and select view extensions to visualize the output of a network.

@TimZaman (Contributor)

Sounds amazing, man, and it will be very useful! We were in dire need of this a few weeks back while working on upscaling and colorization.

@gheinrich force-pushed the dev/data-extensions branch 2 times, most recently from c7262cb to ad8325c on May 13, 2016, 12:12
@gheinrich force-pushed the dev/data-extensions branch 6 times, most recently from cbf634a to ee1df82 on May 18, 2016, 07:07
@gheinrich (Contributor, Author)

Finally got coverage to increase by adding more tests. I think this is ready for review. Thanks!

@lukeyeager (Member) commented May 18, 2016

Something funky happened in your rebase with master, and now I can't get to any of the new stuff from the homepage.

Edit: oh, I found this comment:

# set show=True if extension should be listed in known extensions





Member (inline review comment):

Bunch of extra space here

@gheinrich (Contributor, Author)

> Something funky happened in your rebase with master, and now I can't get to any of the new stuff from the homepage.

Actually, I've made it so that the gradients extension doesn't show on the home page, via this line. I thought this was an interesting extension to play with, but maybe not something we want all users to see by default. Sorry, I should have mentioned this.

try:
    job_id = self.create_dataset(json=True, dsopts_num_threads=0)
    assert self.dataset_wait_completion(job_id) != 'Done', 'should have failed'
except RuntimeError:
    pass  # expected failure path (body assumed; snippet was truncated)
Member (inline review comment):

Your checks seem weird. Do you mean to allow the job to fail in one of two ways (either status != 'Done' or RuntimeError)?

@gheinrich (Contributor, Author), inline reply:

Actually, I am expecting the call to create_dataset to fail. I could remove the assert on line 142.

Member (inline review comment):

Or just add an assert False if you don't expect to ever get there.
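
For reference, a sketch of that suggestion, reusing the helpers from the snippet above and assuming create_dataset raises RuntimeError when it fails:

try:
    self.create_dataset(json=True, dsopts_num_threads=0)
    # reaching this point means dataset creation unexpectedly succeeded
    assert False, 'create_dataset with zero encoder threads should have failed'
except RuntimeError:
    pass  # expected failure path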

@lukeyeager (Member)

Yeah, I found it eventually (edited my post above)

@lukeyeager (Member)

Since you're hiding the only new feature you added, I'm not sure how to test the UI, exactly. This is just one of several "enabler" pull requests, right? Not much real added value yet?

@gheinrich (Contributor, Author)

I have fixed a lot of PEP8 errors in the latest commits. Hopefully I didn't break anything. I'll do another round of testing tomorrow but I'm still interested in your feedback.

To test you can enable the gradient extension in /digits/extensions/data/__init__.py. I am thinking we don't want to show this by default.
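
For illustration, enabling it could amount to flipping a flag in that file; the layout below is an assumption based on the "set show=True if extension should be listed in known extensions" comment quoted earlier, not the file's exact contents:

# digits/extensions/data/__init__.py (assumed layout, illustrative only)
from .imageGradients import DataIngestion as ImageGradientDataIngestion

data_extensions = [
    # set show=True if extension should be listed in known extensions
    (ImageGradientDataIngestion, True),  # True exposes the gradients extension in the UI
]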

Note that in the dataset creation form, one subset of the fields comes from the data extension, while the other subset comes from /digits/dataset/generic/forms.py and corresponds to the options that apply to all datasets (DB backend, encoding, number of encoder threads, ...).
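
A rough sketch of how those two subsets could fit together; the field names and defaults below are illustrative, not the real forms:

from wtforms import Form, IntegerField, SelectField

class GenericDatasetForm(Form):
    # options that apply to all datasets, regardless of the data extension
    dsopts_backend = SelectField('DB backend', choices=[('lmdb', 'LMDB'), ('hdf5', 'HDF5')])
    dsopts_encoding = SelectField('Image encoding', choices=[('none', 'None'), ('png', 'PNG'), ('jpg', 'JPEG')])
    dsopts_num_threads = IntegerField('Number of encoder threads', default=4)

class GradientDataForm(Form):
    # fields contributed by the data extension itself
    train_image_count = IntegerField('Number of training images', default=1000)
    image_width = IntegerField('Image width', default=32)
    image_height = IntegerField('Image height', default=32)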

@gheinrich force-pushed the dev/data-extensions branch 2 times, most recently from 1b5fb3e to a313d38 on May 19, 2016, 11:37
@gheinrich force-pushed the dev/data-extensions branch 5 times, most recently from 12815c3 to b98aa96 on May 20, 2016, 19:27
Integrated with new dataset common interface
This creates a dataset of gradient images.
Gradients are randomly chosen in the x and y directions.
Labels are set as in the regression tutorial.
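
For context, a minimal sketch of the kind of sample such a gradients dataset contains; this is illustrative, not the extension's actual code:

import numpy as np

def make_gradient_sample(width=32, height=32):
    # pick random slopes in the x and y directions; the pair (gx, gy) is the
    # regression label, as in the regression tutorial mentioned above
    gx, gy = np.random.uniform(-0.5, 0.5, size=2)
    xs = np.arange(width)[np.newaxis, :]
    ys = np.arange(height)[:, np.newaxis]
    image = 0.5 + gx * xs + gy * ys  # a plane whose slopes encode the label
    return image.astype(np.float32), np.array([gx, gy], dtype=np.float32)
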
@gheinrich (Contributor, Author)

Rebased again...

@lukeyeager lukeyeager merged commit e6c1ff2 into NVIDIA:master May 20, 2016
@gheinrich gheinrich deleted the dev/data-extensions branch May 24, 2016 11:19
gheinrich added a commit to gheinrich/DIGITS that referenced this pull request May 27, 2016
The common dataset interface exposes a get_mean_file() method to retrieve the mean file.
Using the dataset.mean_file attribute only works for "images/generic" datasets.
This should have been changed in NVIDIA#731
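
In code, the change described by that commit message amounts to something like this sketch (the surrounding dataset object is assumed):

# before: only valid for "images/generic" datasets
# mean_file = dataset.mean_file
# after: works for any dataset exposing the common interface
mean_file = dataset.get_mean_file()
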
@lukeyeager (Member)

It's kinda weird that there's a noop test_db task created for object detection datasets:

@gheinrich was this your intention?

[screenshot: object-detection-test-dataset]

@gheinrich (Contributor, Author)

For data extensions we always create a test_db task, which spawns a create_generic_db.py process. What happens within this process depends on the data extension's specifics: in the case of object detection there is no provision to create a test DB, so this never actually does anything. The gradient extension, for example, may create an actual test DB if instructed to do so.

So to answer the question: we always need to create a test_db task because we don't know what the extension will choose to do about it. Are you saying we shouldn't show it in the UI if it didn't actually create a database? That would certainly make sense.
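
A sketch of how an extension can make the test_db task a no-op, assuming itemization is the per-stage hook; the method and stage names are illustrative:

class ObjectDetectionDataIngestion(object):
    def itemize_entries(self, stage):
        if stage == 'test':
            # no provision for a test DB: returning no entries means the
            # spawned create_generic_db.py process writes nothing, so the
            # test_db task completes as a no-op
            return []
        return self._list_image_label_pairs(stage)  # hypothetical helper

    def _list_image_label_pairs(self, stage):
        raise NotImplementedError  # placeholder for the real per-stage enumeration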

@lukeyeager (Member)

> Are you saying we shouldn't show it in the UI if it didn't actually create a database? That would certainly make sense.

I guess that would work, yeah. Either detect that we won't need the task and don't create it (sounds nice), or just hide the fact that we created a useless task (also works).

@gheinrich (Contributor, Author)

We can't detect that we don't need the task, because the itemization runs in the external process. But we can avoid showing the task if it finished without errors and its entry count is zero. This is implemented in #813.
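
The display rule described here could look roughly like this; the attribute names are assumptions, not the actual #813 implementation:

def should_display_task(task):
    # hide a test_db task that finished cleanly but wrote no entries
    if task.name == 'test_db' and task.status == 'Done' and task.entry_count == 0:
        return False
    return True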

SlipknotTN pushed a commit to cynnyx/DIGITS that referenced this pull request Mar 30, 2017