Conversation
Cool! It would be nice to have the option to view the confusion matrix either as a bullet-point list or as a color-coded table, with the cell color indicating the percentage (e.g. white = 0, green = 100, with a gradient in between).
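The white-to-green gradient could be computed server-side before rendering. A minimal sketch, assuming a linear interpolation between the two color stops (the function name and exact stops are my own choice, not from this PR):

```python
def cell_color(percentage):
    """Map a percentage in [0, 100] to a white-to-green hex color.

    0 -> '#ffffff' (white), 100 -> '#00ff00' (green), with a linear
    gradient in between: the red and blue channels fade out together
    as the value grows, while green stays at full intensity.
    """
    p = max(0.0, min(100.0, percentage)) / 100.0
    fade = int(round(255 * (1.0 - p)))
    return '#%02xff%02x' % (fade, fade)
```

For example, a cell at 50% would render as `#80ff80`, a half-way green.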
Please squash your patches.

Thanks for the contribution, @groar! I hope we can get this merged soon.

When I try to evaluate AlexNet trained on ImageNet, I get this error:

When I try to evaluate LeNet trained on MNIST, I get this error (probably something to do with grayscale images?):

What dataset/network are you using? I can't get anything to work.
digits/evaluation/tasks/accuracy.py
Outdated
English variable names, please :)
I'm currently fixing the grayscale problem, but I can't manage to reproduce your problem with AlexNet. I tested it with AlexNet, GoogLeNet and VGG on some standard color datasets, with no problem.
@lukeyeager I fixed the grayscale bug and tested it on MNIST / LeNet. My changes may also have fixed the problem you had with AlexNet. On my side, it works with every standard network. Let me know if you still have those issues.
Thanks for the fixes! LeNet on MNIST works for me now. But I'm still getting the same error with AlexNet:

It looks like an issue with cropping. I think you meant to reshape the input, like I do here, rather than setting the input to zeros. I'm curious why this works for you, though. The standard AlexNet network crops to 227.
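For what it's worth, an IndexError on a 227-crop network is consistent with slicing an image that is smaller than the crop window. A hedged sketch of a center crop that fails loudly instead of mis-indexing (the helper name is mine, not code from this PR):

```python
import numpy as np

def center_crop(image, crop_size):
    """Center-crop an HxWxC array to crop_size x crop_size.

    Raises a clear error when the image is smaller than the crop
    window, instead of producing confusing index behavior downstream.
    """
    h, w = image.shape[:2]
    if h < crop_size or w < crop_size:
        raise ValueError('image %dx%d is smaller than crop size %d'
                         % (h, w, crop_size))
    top = (h - crop_size) // 2
    left = (w - crop_size) // 2
    return image[top:top + crop_size, left:left + crop_size]

# A 256x256 input cropped to the standard AlexNet size
cropped = center_crop(np.zeros((256, 256, 3)), 227)
```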
Ok, this is weird, because I really can't reproduce the bug. That part of the code actually comes from Caffe (there is a small modification, but it is pretty much the original Classifier class), so it should not be problematic. I don't see why it should raise an IndexError exception... Which dataset are you using?
tools/compute_accuracy.py
Outdated
Why aren't you passing the prebuilt database instead of the textfile? Those images are already loaded and resized. Then all you would need to know is the crop size.
I don't really like having Evaluations as Jobs; it feels like they should be Tasks rather than Jobs. But in the interest of getting this new functionality merged, I'm ok with keeping them as a new Job type for now. Displaying them at the bottom of the homepage is a problem, though: nobody is going to be able to find them. Can we have the list of EvaluationJobs shown somewhere on the model page?
Also, if you delete a ModelJob that has dependent EvaluationJobs, you run into issues. There should be some sort of check for EvaluationJobs that depend on a ModelJob, like the one we have for DatasetJobs.
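A minimal sketch of such a dependency check (the class and helper names here are illustrative only; DIGITS' real scheduler tracks job relationships differently):

```python
class Job(object):
    """Tiny stand-in for a DIGITS job (illustrative only)."""
    def __init__(self, job_id, parent_id=None):
        self.id = job_id
        self.parent_id = parent_id  # id of the job this one depends on

def dependent_jobs(jobs, job_id):
    """Return the jobs that directly depend on job_id."""
    return [j for j in jobs if j.parent_id == job_id]

jobs = [Job('model-1'),
        Job('eval-1', parent_id='model-1'),
        Job('eval-2', parent_id='model-1')]

# Refuse (or warn about) the delete while dependents exist
blockers = dependent_jobs(jobs, 'model-1')
```

Before deleting `model-1`, the UI could list `blockers` and ask for confirmation, mirroring the existing DatasetJob check.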
tools/compute_accuracy.py
Outdated
As I'm trying to track down my issue with the …
Ok for pickle instead of joblib, and I'm going to write some tests as well! I will also take care of the dependency between jobs. I don't really know about the evaluation being a job: at first I designed it as a task, and then changed my mind for some reason (I was thinking about future performance evaluation tasks that involve more than one model, in which case a model task is not well suited).
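Swapping joblib for pickle for the stored results could look roughly like this (the results layout below is made up for illustration; the real files live in the job directory):

```python
import os
import pickle
import tempfile

# Hypothetical results layout for an evaluation job
results = {'accuracy': 0.87, 'confusion': {0: {0: 95, 1: 5}}}

# Write to a temporary file standing in for the job directory
fd, path = tempfile.mkstemp(suffix='.pickle')
os.close(fd)
with open(path, 'wb') as f:
    pickle.dump(results, f, protocol=pickle.HIGHEST_PROTOCOL)

# Reading the results back for display
with open(path, 'rb') as f:
    loaded = pickle.load(f)
os.remove(path)
```

Plain pickle avoids the extra joblib dependency; joblib mainly pays off for large numpy arrays, which these summary results are not.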
Concerning the test, it's the same for me: the test succeeds on my machine although it doesn't on Travis... @lukeyeager I was thinking about the way to display the related EvaluationJobs on the model page. In general, it would be nice to be able to see all the dependencies on a job page. For example, I'd love to see directly in the Dataset view a summary of all the Models I've trained on it. What do you think about displaying a "Related jobs" section under the "Job Status" cell? I could do that instead of just adding an ad-hoc section in the Model view.
Great, thanks!
Verified. You could test for this with a test like this one.
That's an interesting thought. Just stick with the EvaluationJob for now.
Well, now the tests are failing for other reasons. Fixes here: https://github.com/Deepomatic/DIGITS/pull/4. It might be a good idea to turn on Travis for your fork so you can see which tests fail before updating this pull request.
Yep, sounds great!
I've added:
When I have some time, I will probably add some tests for compute_accuracy.py, but I think pretty much everything else works. I have still not been able to reproduce the "index out of range" error, though.
Toggling PR status to retrigger Travis build ...
Maybe I just need to merge master into perf to remove the conflicts?
Yeah, I think that's correct. So do this:

```sh
git checkout perf
# Assuming NVIDIA/DIGITS is called "upstream"
git fetch upstream
# You can squash down to fewer commits while you're at it
git rebase -i upstream/master
# A force-push is needed after rewriting history with rebase
git push --force
```
There remain some problems with the tests, but I don't know how to fix them (the generate_docs.py script yields an error on my side...).
Did you install …

On my side, I'm still trying to debug why your code is giving me that …
That looks great, nice work!
tools/compute_accuracy.py
Outdated
I don't know what's supposed to be in img_matrix, but this isn't right. For my dataset with 20 classes, num_classes is getting set to 16. Are you assuming that the images are sorted by class?
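Counting distinct labels with a set sidesteps any ordering assumption. A sketch contrasting that with a run-counting approach, which only works when images are grouped by class and could explain a wrong `num_classes` (both helpers are hypothetical, not code from this PR):

```python
def count_classes(labels):
    """Number of distinct class labels, independent of image order."""
    return len(set(labels))

def count_runs(labels):
    """Count contiguous runs of equal labels.

    This only equals the class count when the images are grouped by
    class; on an unsorted dataset it gives a wrong answer.
    """
    runs = 1 if labels else 0
    for prev, cur in zip(labels, labels[1:]):
        if cur != prev:
            runs += 1
    return runs

labels = [3, 3, 0, 7, 7, 3]   # 3 distinct classes, not grouped by class
```

Here `count_classes(labels)` is 3 while `count_runs(labels)` is 4, showing how an ordering assumption skews the result.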
You moved the listing of evaluations to the …
+1 for this feature
I've been very busy, but I'm back trying to get this pull request merged! I think I've fixed everything that was not working, and I've updated my branch to keep up with the current master. I still need to port the old tests and refactor some code, and then it should be ok. Oh, and now when you delete a model, it deletes all the dependent evaluation jobs. I think this is the expected behavior.
I don't think you need to make any changes to this file. Merge error?
Thanks for the rebase work, @groar! @gheinrich and @jmancewicz, will y'all take a look at this? We need some sort of framework for model "evaluations", and this is not a bad way to do it.
Ideally the core DIGITS code should be framework independent and communicate with the DL framework using the Framework class interface (see https://github.com/NVIDIA/DIGITS/blob/master/digits/frameworks/framework.py).
Perhaps you can do something like:

```python
fw_id = job.model_job.train_task().get_framework_id()
fw = frameworks.get_framework_by_id(fw_id)
job.tasks.append(fw.create_accuracy_task(...))
```
You might also want to add a function to the framework interface that tells whether the underlying framework supports this type of task.
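A rough sketch of what that capability probe could look like (all names here are hypothetical, not the actual DIGITS framework interface):

```python
class Framework(object):
    """Sketch of a framework interface with a capability probe."""
    def supports_accuracy_evaluation(self):
        # Default: a framework does not support accuracy tasks
        return False
    def create_accuracy_task(self, **kwargs):
        raise NotImplementedError

class CaffeFramework(Framework):
    def supports_accuracy_evaluation(self):
        return True
    def create_accuracy_task(self, **kwargs):
        # Stand-in for constructing a real accuracy task
        return ('accuracy-task', kwargs)

def add_accuracy_task(fw, tasks, **kwargs):
    """Append an accuracy task, probing the framework first."""
    if not fw.supports_accuracy_evaluation():
        raise NotImplementedError('framework cannot run accuracy evaluations')
    tasks.append(fw.create_accuracy_task(**kwargs))

tasks = []
add_accuracy_task(CaffeFramework(), tasks, snapshot_epoch=30)
```

The probe keeps the core DIGITS code framework-agnostic: the evaluation job only asks "can you do this?" rather than assuming Caffe.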
Ok, thanks for the info. I had in mind that something like that would be needed, but I hadn't looked into it. I'll do that.
The tests for EvaluationJob are still not passing on Travis, but they do pass when I launch them locally. I thought it was a missing package, but it doesn't seem to be. Could someone tell me whether they pass in another environment?
The tests do not pass for me.
Ok, thanks, that will probably help me.
- Epoch can be a float
- Remove debug statement
- Match style changes added in 3642aed and 7103d8b
- Add basic test for evaluations
- replaced joblib by pickle
- Check dependencies between EvaluationJob and parent ModelJob
- Only use GPU for inference when CUDA enabled
- Update documentation with EvaluationJob routes
- Added a related jobs section to jobs show views.
- Added unit tests for evaluations
- bugfix
- fixing docs
- changes config
- fix doc
- fixing tests
- adding evaluation tests
- test
- minor fixes
- perf
- added package
- fixing evaluation test
- compute seek fix
- accuracy
- last

Closing as abandoned.
See #17
Adds a new kind of job for performance evaluation of trained classifiers. It is now possible to visualize:

- Accuracy / recall curve
- Confusion matrix

Accuracy and the confusion matrix are computed against a chosen snapshot of a training task, and against both the validation set and the test set (if it exists). An "evaluate performance" button has been added on the training view; this is currently the only way to run an evaluation job. The results are stored in the job directory in the form of two pickle files.

I chose a very simple representation of the confusion matrix (not in the form of a matrix!) because it is better suited to datasets with many classes. For each class, the top 10 most represented predicted classes are displayed, with their respective percentages.
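The per-class top-10 view can be computed from a plain confusion matrix along these lines (the function name and example matrix are illustrative, not the PR's actual code):

```python
def top_confusions(cm, k=10):
    """For each true class (a row of confusion matrix cm), return the
    k most frequently predicted classes as (class, percentage) pairs,
    where the percentage is relative to that row's total."""
    out = []
    for row in cm:
        total = float(sum(row))
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        out.append([(c, 100.0 * row[c] / total)
                    for c in ranked[:k] if row[c] > 0])
    return out

cm = [[90, 8, 2],   # true class 0
      [5, 95, 0],   # true class 1
      [10, 0, 90]]  # true class 2
summary = top_confusions(cm, k=2)
```

Classes a row never predicts are dropped, which keeps the display compact for datasets with many classes.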
Related jobs
I added a "Related jobs" section to each job's show view. It displays the jobs that depend on the current job: for example, the models trained on a specific dataset, or the evaluations run on a specific model.
Let me know what you think; critiques and comments are more than welcome.