Data extension framework #731
This PR looks awesome! Here though, it's not immediately obvious to me what we're trying to do. Also, it's unclear what I should see in the example.
Thanks for the feedback. Indeed, the image reconstruction tutorial is a work in progress. The idea is to train a network that learns how to reconstruct distorted images, but the model isn't working well. Mostly the example is there to show that you can add a data extension in DIGITS to create a dataset for this task pretty easily. Getting the model to actually work will require more effort. The main idea behind this PR is to allow anybody to import any data in DIGITS so you don't have to write a script to create the LMDB yourself. You could imagine importing data from text, CSV, NumPy, or anything you can read from Python. Possibilities are endless. A subsequent PR should introduce a visualization framework, allowing DIGITS users to easily add and select view extensions to visualize the output of a network.
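To make the idea concrete, here is a minimal sketch of what a data extension could look like. It is an illustration only: the class name, the method names `itemize_entries` and `encode_entry`, and the `build_dataset` helper are assumptions for this example and are not taken from the PR or the DIGITS API.

```python
import csv
import io


class CsvDataExtension:
    """Hypothetical extension: one CSV row becomes one (features, label) entry."""

    def __init__(self, csv_text):
        self.csv_text = csv_text

    def itemize_entries(self):
        # List the raw items to encode; here, one item per CSV row.
        return list(csv.reader(io.StringIO(self.csv_text)))

    def encode_entry(self, entry):
        # Convert one raw item into (features, label); the last column is the label.
        *features, label = (float(value) for value in entry)
        return features, label


def build_dataset(extension):
    # Stand-in for the framework side, which would write entries to a database
    # instead of returning them in a list.
    return [extension.encode_entry(e) for e in extension.itemize_entries()]


dataset = build_dataset(CsvDataExtension("0.1,0.2,1\n0.3,0.4,0\n"))
```

The point of the split is that the extension only knows how to list and encode items, while the framework owns storage, so the user never writes the database script.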
Sounds amazing, man, and will be very useful! We were in dire need of this a few weeks back, working on upscaling and colorization.
Finally got coverage to increase by adding more tests. I think this is ready for review. Thanks!
Something funky happened in your rebase with Edit: oh, I found this comment:
Bunch of extra space here
Actually, I've made it such that the gradients extension doesn't show on the home page with this line. I thought this was an interesting extension to play with, but maybe not something we want all users to see by default. Sorry, I should have mentioned this.
try:
    job_id = self.create_dataset(json=True, dsopts_num_threads=0)
    assert self.dataset_wait_completion(job_id) != 'Done', 'should have failed'
except RuntimeError:
Your checks seem weird. Do you mean to allow the job to fail in one of two ways (either status != 'Done' or RuntimeError)?
Actually, I am expecting the call to create_dataset to fail. I could remove the assert on line 142.
Or just add an assert False if you don't expect to ever get there.
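The pattern under discussion can be sketched as follows. The `create_dataset` function here is a hypothetical stand-in for the test helper, not the one in the PR; the sketch only shows that an `assert False` placed right after the call fails the test when no exception is raised, because `AssertionError` is not caught by `except RuntimeError`.

```python
def create_dataset(num_threads):
    # Hypothetical stand-in for the test helper: rejects an invalid thread count.
    if num_threads < 1:
        raise RuntimeError("num_threads must be at least 1")
    return "job-id"


def check_rejects_zero_threads(create):
    try:
        create(num_threads=0)
        # Only reached if the call above did not raise.
        assert False, "should have failed"
    except RuntimeError:
        # Expected path: the call was rejected.
        return True
```

With this shape there is a single accepted failure mode (the `RuntimeError`), which addresses the "one of two ways" concern raised above.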
Yeah, I found it eventually (edited my post above)
Since you're hiding the only new feature you added, I'm not sure how to test the UI, exactly. This is just one of several "enabler" pull requests, right? Not much real added value yet?
I have fixed a lot of PEP8 errors in the latest commits. Hopefully I didn't break anything. I'll do another round of testing tomorrow but I'm still interested in your feedback. To test you can enable the gradient extension in Note that in the dataset creation form, one subset of the fields is coming from the data extension. The other subset is coming from |
Integrated with new dataset common interface
This creates a dataset of gradient images. Gradients are randomly chosen in the x and y directions. Labels are set as in the regression tutorial.
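A rough sketch of how one such image could be generated is shown below. It is not the extension's code: the function name, the slope range, and the choice to use the (x, y) slope pair as the label are assumptions made for illustration.

```python
import random


def make_gradient_image(size, rng):
    # Draw independent slopes for the x and y directions in [-0.5, 0.5].
    gx = rng.uniform(-0.5, 0.5)
    gy = rng.uniform(-0.5, 0.5)
    last = size - 1
    # Pixel values stay within [0, 1]: 0.5 plus at most 0.25 per direction.
    image = [
        [0.5 + gx * (x / last - 0.5) + gy * (y / last - 0.5) for x in range(size)]
        for y in range(size)
    ]
    # Assumed label: the pair of slopes the network should regress.
    return image, (gx, gy)


image, label = make_gradient_image(8, random.Random(0))
```

Each row changes by `gx` from left to right and each column by `gy` from top to bottom, so the label can be read back from the image itself.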
Rebased again...
The common dataset interface exposes a get_mean_file() method to retrieve the mean file. Using the dataset.mean_file attribute only works for "images/generic" datasets. This should have been changed in NVIDIA#731
It's kinda weird that there's a noop test_db task created for object detection datasets: @gheinrich was this your intention?
For data extensions we always spawn a So to answer the question: we always need to create a |
I guess that would work, yeah. Either detect that we won't need the task and don't create it (sounds nice), or just hide the fact that we created a useless task (also works).
We can't detect that we don't need the task because the itemization runs in the external process. But we can hide the task if it's done (without errors) and its entry count is zero. This is implemented in #813.
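The display rule described here can be sketched as a small predicate. The `Task` class and its field names are hypothetical and are not the DIGITS task model or the code in #813.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical fields for illustration only.
    status: str
    entry_count: int


def should_show(task):
    # Hide tasks that finished without errors and produced no entries.
    return not (task.status == "Done" and task.entry_count == 0)
```

A task that failed with zero entries is still shown, so errors are never hidden by this rule.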
Data extension framework
Summary
This adds a data extension framework along the lines of:
This also adds an image gradient extension (same as this tutorial).
Sample extensions will be added in separate pull requests.
This pull request depends on #723
Progress
Documentation may be added later when we want to advertise the feature.