This repository was archived by the owner on Jan 7, 2025. It is now read-only.

Accuracy & confusion matrix #137

Closed: groar wants to merge 2 commits into NVIDIA:master from Deepomatic:perf

Conversation

@groar (Contributor) commented Jun 8, 2015

See #17

Adds a new kind of job for performance evaluation of trained classifiers. It is now possible to visualize:

  • accuracy / recall curve
  • confusion matrix

Accuracy and the confusion matrix are computed against a chosen snapshot of a training task, and against both the validation set and testing set (if it exists). An "evaluate performance" button has been added on the training view. This is currently the only way to run an evaluation job. The results are stored in the job directory in the form of two pickle files.

[Screenshot: the "Evaluate performance" button]

Accuracy / recall curve

[Screenshot: accuracy/recall curve]
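As an illustration of how such a curve can be computed (a minimal sketch, not the code from this pull request; `accuracy_recall_curve` and its inputs are hypothetical): for each confidence threshold, only predictions whose top probability clears the threshold are kept; accuracy is measured on the kept samples, and recall is the fraction of samples kept.

```python
import numpy as np

def accuracy_recall_curve(probs, labels, thresholds):
    """For each confidence threshold, keep only predictions whose top
    probability reaches the threshold; report accuracy on the kept
    samples and the fraction of samples kept (recall)."""
    preds = probs.argmax(axis=1)        # predicted class per sample
    confidence = probs.max(axis=1)      # top probability per sample
    points = []
    for t in thresholds:
        kept = confidence >= t
        recall = kept.mean()
        accuracy = (preds[kept] == labels[kept]).mean() if kept.any() else 0.0
        points.append((t, accuracy, recall))
    return points
```

Sweeping `thresholds` from 0 to 1 then traces out the accuracy/recall trade-off shown in the screenshot.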

Confusion matrix

I chose a very simple representation of the confusion matrix (not actually in matrix form!) because it is better suited to datasets with many classes. For each class, the 10 most frequently predicted classes are displayed, with their respective percentages.

[Screenshot: confusion matrix view]
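The list-style view above could be generated from a full confusion-count matrix along these lines (an illustrative sketch; `top_confusions` is a hypothetical helper, not the PR's code):

```python
import numpy as np

def top_confusions(confusion, class_names, k=10):
    """For each true class (a row of counts), list the k most frequently
    predicted classes with the percentage of that class's samples they
    account for."""
    summary = {}
    for i, name in enumerate(class_names):
        row = confusion[i]
        total = row.sum()
        if total == 0:
            summary[name] = []
            continue
        top = row.argsort()[::-1][:k]   # indices of the k largest counts
        summary[name] = [(class_names[j], 100.0 * row[j] / total)
                         for j in top if row[j] > 0]
    return summary
```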

Related jobs

I added a "Related jobs" section to each job's show view. It displays the jobs that depend on the current job: for example, models trained on a specific dataset, or evaluations run on a specific model.

[Screenshot: related jobs section]

Let me know what you think, critiques and comments are more than welcome.

@jcohenpersonal

Cool!

It would be nice to have the option to view the confusion matrix in either bullet-point format, or as a color-coded table (color of cell to indicate percentages, like white = 0, green = 100, with a gradient in between).
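The white-to-green gradient suggested here is a simple linear interpolation; a sketch (the `cell_color` helper is hypothetical) mapping a cell's percentage to a hex color:

```python
def cell_color(pct):
    """Map a percentage (0-100) to a hex color on a white-to-green
    gradient: 0% -> #ffffff (white), 100% -> #00ff00 (green)."""
    # Red and blue channels fade out linearly; green stays at full.
    rb = int(round(255 * (1.0 - pct / 100.0)))
    return '#%02x%02x%02x' % (rb, 255, rb)
```

Each table cell's background could then be set from its percentage, so dense confusions stand out at a glance.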

@flx42 (Member) commented Jun 8, 2015

Please squash your patches.

@lukeyeager (Member)

Thanks for the contribution, @groar! I hope we can get this merged soon.

@lukeyeager (Member)

When I try to evaluate AlexNet trained on ImageNet, I get this error:

Traceback (most recent call last):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 223, in <module>
    args['resize_mode']):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 162, in compute_accuracy
    probas[i] = net.predict([input_image], oversample=False)
File "/home/lyeager/digits/tools/compute_accuracy.py", line 79, in predict
    self.image_dims[0], self.image_dims[1], inputs[0].shape[2]),
IndexError: tuple index out of range

When I try to evaluate LeNet trained on MNIST, I get this error (probably something to do with grayscale images?):

Traceback (most recent call last):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 223, in <module>
    args['resize_mode']):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 147, in compute_accuracy
    raw_scale=255)
File "/home/lyeager/digits/tools/compute_accuracy.py", line 56, in __init__
    self.transformer.set_channel_swap(in_, channel_swap)
File "/home/lyeager/caffe/python/caffe/io.py", line 212, in set_channel_swap
    raise Exception('Channel swap needs to have the same number of '
Exception: Channel swap needs to have the same number of dimensions as the input channels.

What dataset/network are you using? I can't get anything to work.

Review comment (Member):

English variable names, please :)

@groar (Contributor, Author) commented Jun 9, 2015

I'm currently fixing the grayscale problem, but I haven't managed to reproduce your problem with AlexNet. I tested it with AlexNet, GoogLeNet and VGG on some standard color datasets, with no problem.

@groar (Contributor, Author) commented Jun 9, 2015

@lukeyeager I fixed the grayscale bug and tested it on MNIST / LeNet. My changes may also have fixed the problem you had with AlexNet. On my side, it works with every standard network. Let me know if you still have those issues.

@lukeyeager (Member)

Thanks for the fixes!

LeNet on MNIST works for me now. But I'm still getting the same error with AlexNet:

Traceback (most recent call last):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 233, in <module>
    grayscale=args['grayscale']):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 172, in compute_accuracy
    probas[i] = net.predict([input_image], oversample=False)
File "/home/lyeager/digits/tools/compute_accuracy.py", line 79, in predict
    self.image_dims[0], self.image_dims[1], inputs[0].shape[2]),
IndexError: tuple index out of range

It looks like an issue with cropping.

len(inputs):      1
inputs[0].shape:  (256, 256, 3)
image_dims:       [227 227]

I think you meant to reshape the input, like I do here, rather than setting the input to zeros.

I'm curious why this works for you, though. The standard AlexNet network crops to 227.
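For reference, the shape mismatch above (a 256×256 dataset image fed to a network expecting 227×227 input) is usually resolved with a center crop before inference; a minimal sketch (not the PR's code):

```python
import numpy as np

def center_crop(image, crop_h, crop_w):
    """Crop the spatial center of an (H, W, C) image array."""
    h, w = image.shape[:2]
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]

# A 256x256x3 "image" cropped to the 227x227 AlexNet expects.
image = np.zeros((256, 256, 3), dtype=np.float32)
cropped = center_crop(image, 227, 227)
```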

@groar (Contributor, Author) commented Jun 11, 2015

Ok, this is weird, because I really can't reproduce the bug. That part of the code actually comes from Caffe (there is a small modification, but it is pretty much the original Classifier class), so it should not be problematic.

I don't see why this line should raise an IndexError...

np.zeros((1, 227, 227, 3), dtype=np.float32)

Which dataset are you using?

Review comment (Member):

Why aren't you passing the prebuilt database instead of the textfile? Those images are already loaded and resized. Then all you would need to know is the crop size.

@lukeyeager (Member)

I don't really like having Evaluations as Jobs; it feels like they should be Tasks instead. But in the interest of getting this new functionality merged, I'm ok with keeping them as a new Job type for now.

But displaying them at the bottom of the homepage is a problem. Nobody is going to be able to find them. Can we have the list of EvaluationJobs shown somewhere on the model page?

@lukeyeager (Member)

Also, if you delete a ModelJob that has dependent EvaluationJobs, you run into issues. There should be some sort of check for EvaluationJobs that depend on a ModelJob, like the one we have for DatasetJobs.

Review comment (Member):

Can we use Pickle instead of joblib to avoid adding a new dependency? See usage here and here.
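Swapping joblib for the standard-library pickle module is a small change; a sketch of the dump/load round-trip (the file layout and helper names here are illustrative, not the PR's actual code):

```python
import pickle

def save_results(path, results):
    # Serialize the evaluation results (e.g. confusion counts, accuracies).
    with open(path, 'wb') as f:
        pickle.dump(results, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_results(path):
    # Read the results back for display in the web view.
    with open(path, 'rb') as f:
        return pickle.load(f)
```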

@lukeyeager (Member)

While trying to track down my issue with the IndexError, I added a basic test for EvaluationJobs (see https://github.com/Deepomatic/DIGITS/pull/2) which works on my machine but fails on Travis. I'm not sure why it fails, but can you try the test on your machine? Also, some tests for tools/compute_accuracy.py would be awesome.

@groar (Contributor, Author) commented Jun 13, 2015

Ok for pickle instead of joblib, and I'm going to write some tests as well! I will also take care of the dependency between jobs.

I'm not sure about the evaluation being a job. At first I designed it as a task, then changed my mind (I was thinking about future performance-evaluation tasks that involve more than one model, for which a model task is not well suited).

@groar (Contributor, Author) commented Jun 15, 2015

Concerning the test, it's the same for me: it succeeds on my machine but not on Travis...

@lukeyeager I was thinking about how to display the related EvaluationJobs on the model page. In general, it would be nice to be able to see all the dependencies on a job page. For example, I'd love to see, directly in the Dataset view, a summary of all the models I've trained on it. What do you think about displaying a "Related jobs" section under the "Job Status" cell? I could do that instead of just adding an ad-hoc section to the Model view.

@lukeyeager (Member)

Ok for pickle instead of joblib

Great, thanks!

I will also take care of the dependency between jobs.

Verified. You could test for this with a test like this one.

I was thinking about future performance evaluations tasks that involve more than one model, in which case a model task is not so adapted.

That's an interesting thought. Just stick with the EvaluationJob for now.

Concerning the test, it's the same for me, the test succeeds on my machine although it doesn't for Travis...

Well, now the tests are failing for other reasons. Fixes here: https://github.com/Deepomatic/DIGITS/pull/4. It might be a good idea to turn on Travis for your fork so you can see which tests fail before updating this pull request.

In general, it would be nice to be able to see all the dependencies on a job page. For example, I'd love to see directly in the Dataset view a summary of all Models I've trained on it. What do you think about displaying a "Related jobs" section under the "Job Status" cell ? I could do that instead of just adding a ad-hoc section in the Model view.

Yep, sounds great!

@groar (Contributor, Author) commented Jun 16, 2015

I've added:

  • the "related jobs" section
  • some tests on evaluations

When I have some time, I will probably add tests for compute_accuracy.py, but I think pretty much everything else works. I still haven't been able to reproduce the "index out of range" error, though.

@lukeyeager (Member)

Toggling PR status to retrigger Travis build ...

@groar (Contributor, Author) commented Jun 16, 2015

Maybe I just need to merge master into perf to resolve the conflicts?

@lukeyeager (Member)

Maybe I just need to merge master into perf to resolve the conflicts?

Yeah, I think that's correct.
travis-ci/travis-ci#4102 (comment)

So do this:

git checkout perf
# Assuming NVIDIA/DIGITS is called "upstream"
git fetch upstream
# You can squash down to fewer commits while you're at it
git rebase -i upstream/master
git push

@groar (Contributor, Author) commented Jun 18, 2015

Some problems remain with the tests, but I don't know how to fix them (the generate_docs.py script yields an error on my side...).

@lukeyeager (Member)

the generate_docs.py script yields an error on my side

Did you install Flask-Autodoc with pip? I had to create a custom version to get what I needed. See this comment.

$ pip install -r requirements_test.txt
$ ./scripts/generate_docs.py

On my side, I'm still trying to debug why your code gives me that IndexError, but the use of the "classify" code from Caffe adds some unnecessary complexity. I just wrote a classification example that shows how to do classification a little more simply.

@lukeyeager (Member)

What do you think about displaying a "Related jobs" section under the "Job Status" cell?

That looks great, nice work!

Review comment (Member):

I don't know what's supposed to be in img_matrix, but this isn't right. For my dataset with 20 classes, num_classes gets set to 16. Are you assuming that the images are sorted by class?

@lukeyeager (Member)

You moved the listing of evaluations to the models#show page, and that's great, but you still left the changes to home.html. Also, can you add a "Delete All" button above the listing of dependent jobs?

@jmozah commented Jul 25, 2015

+1 for this feature

@groar (Contributor, Author) commented Sep 8, 2015

I've been very busy, but I'm back and trying to get this pull request merged! I think I've fixed everything that wasn't working, and I've updated my branch to keep up with the current master. I still need to port the old tests and refactor some code, and then it should be ok.

Oh, and now when you delete a model, it deletes all the dependent evaluation jobs. I think this is the expected behavior.
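The cascading delete described above might look something like this (a sketch only; the scheduler and job attributes are hypothetical stand-ins for the DIGITS internals):

```python
from types import SimpleNamespace

def delete_job(scheduler, job_id):
    """Delete a job and, recursively, every job that depends on it.
    `scheduler.jobs` and `model_job_id` are illustrative attributes,
    not the actual DIGITS scheduler API."""
    for other in list(scheduler.jobs):
        if getattr(other, 'model_job_id', None) == job_id:
            delete_job(scheduler, other.id)   # remove dependents first
    scheduler.jobs = [j for j in scheduler.jobs if j.id != job_id]

# Example: deleting the model also removes its dependent evaluation.
scheduler = SimpleNamespace(jobs=[
    SimpleNamespace(id='model-1', model_job_id=None),
    SimpleNamespace(id='eval-1', model_job_id='model-1'),
    SimpleNamespace(id='model-2', model_job_id=None),
])
delete_job(scheduler, 'model-1')
```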

Review comment (Member):

I don't think you need to make any changes to this file. Merge error?

Review comment (Contributor, Author):

Yes, gonna fix that.

@lukeyeager (Member)

Thanks for the rebase work @groar!

@gheinrich and @jmancewicz, will y'all take a look at this? We need some sort of a framework for model "evaluations" and this is not a bad way to do it.

Review comment (Contributor):

Ideally, the core DIGITS code should be framework-independent and communicate with the DL framework through the Framework class interface (see https://github.com/NVIDIA/DIGITS/blob/master/digits/frameworks/framework.py). Perhaps you can do something like:

fw_id = job.model_job.train_task().get_framework_id()
fw = frameworks.get_framework_by_id(fw_id)
job.tasks.append(fw.create_accuracy_task(...))

You might also want to add a function to the framework interface that tells whether the underlying framework supports this type of task.
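The capability check suggested above could be a simple overridable method on the Framework interface; a sketch (the method name `supports_accuracy_tasks` and the class shapes are hypothetical, not the actual DIGITS classes):

```python
class Framework(object):
    """Simplified stand-in for the DIGITS Framework interface."""
    def supports_accuracy_tasks(self):
        # Frameworks opt in by overriding this.
        return False

class CaffeFramework(Framework):
    def supports_accuracy_tasks(self):
        return True
```

A view could then hide the "evaluate performance" button whenever the model's framework reports that it does not support accuracy tasks.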

Review comment (Contributor, Author):

Ok, thanks for the info. I had in mind that something like that would be needed but hadn't looked into it. I'll do that.

@groar (Contributor, Author) commented Sep 9, 2015

The tests for EvaluationJob are still not passing on Travis, but they pass when I run them locally. I thought it was a missing package, but it doesn't seem so. If someone can tell me if it works on another environment...

@lukeyeager (Member)

If someone can tell me if it works on another environment...

The tests do not pass for me.

1) ERROR: test suite for <class 'digits.evaluation.images.classification.test_views.TestCaffeCreated'>

   Traceback (most recent call last):
    /usr/lib/python2.7/dist-packages/nose/suite.py line 208 in run
      self.setUp()
    /usr/lib/python2.7/dist-packages/nose/suite.py line 291 in setUp
      self.setupContext(ancestor)
    /usr/lib/python2.7/dist-packages/nose/suite.py line 314 in setupContext
      try_run(context, names)
    /usr/lib/python2.7/dist-packages/nose/util.py line 471 in try_run
      return func()
    digits/evaluation/images/classification/test_views.py line 147 in setUpClass
      assert cls.evaluation_wait_completion(cls.evaluation_id) == 'Done', 'create failed'
   AssertionError: create failed
   -------------------- >> begin captured logging << --------------------
   digits.webapp: INFO: Parse Folder (train/val) task started.
   digits.webapp: WARNING: Parse Folder (train/val) unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: INFO: Parse Folder (train/val) task completed.
   digits.webapp: INFO: Create DB (train) task started.
   digits.webapp: INFO: Create DB (val) task started.
   digits.webapp: WARNING: Create DB (train) unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: WARNING: Create DB (val) unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: DEBUG: 24 images written to database
   digits.webapp: INFO: Create DB (train) task completed.
   digits.webapp: DEBUG: 6 images written to database
   digits.webapp: INFO: Create DB (val) task completed.
   digits: INFO: Job complete.
   digits.webapp: INFO: Train Caffe Model task started.
   digits.webapp: DEBUG: Training 0.0% complete.
   digits.webapp: DEBUG: Snapshot saved.
   digits.webapp: INFO: Train Caffe Model task completed.
   digits: INFO: Job complete.
   digits.webapp: INFO: Compute performance on val_db task started.
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: Traceback (most recent call last):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 291, in <module>
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: grayscale=args['grayscale']):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 238, in compute_accuracy
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: input_image = PIL.Image.open(s)
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 2028, in open
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: raise IOError("cannot identify image file")
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: IOError: cannot identify image file
   digits.webapp: ERROR: Compute performance on val_db task failed with error code 1
   --------------------- >> end captured logging << ---------------------

-----------------------------------------------------------------------------
2) FAIL: digits.evaluation.images.classification.test_views.TestCaffeCreation.test_create_wait_delete

   Traceback (most recent call last):
    /usr/lib/python2.7/dist-packages/nose/case.py line 197 in runTest
      self.test(*self.arg)
    digits/evaluation/images/classification/test_views.py line 174 in test_create_wait_delete
      assert self.evaluation_wait_completion(job_id) == 'Done', 'create failed'
   AssertionError: create failed

   -------------------- >> begin captured stdout << ---------------------
   /evaluations/images/classification?job_id=20150915-093712-7c5c

   --------------------- >> end captured stdout << ----------------------
   -------------------- >> begin captured logging << --------------------
   digits.webapp: INFO: Compute performance on val_db task started.
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: Traceback (most recent call last):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 291, in <module>
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: grayscale=args['grayscale']):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 238, in compute_accuracy
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: input_image = PIL.Image.open(s)
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 2028, in open
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: raise IOError("cannot identify image file")
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: IOError: cannot identify image file
   digits.webapp: ERROR: Compute performance on val_db task failed with error code 1
   --------------------- >> end captured logging << ---------------------

-----------------------------------------------------------------------------
3) FAIL: digits.evaluation.images.classification.test_views.TestCaffeCreation.test_evaluate_snapshot_2

   Traceback (most recent call last):
    /usr/lib/python2.7/dist-packages/nose/case.py line 197 in runTest
      self.test(*self.arg)
    digits/evaluation/images/classification/test_views.py line 189 in test_evaluate_snapshot_2
      assert self.evaluation_wait_completion(job_id) == 'Done', 'evaluation create failed'
   AssertionError: evaluation create failed

   -------------------- >> begin captured stdout << ---------------------
   /evaluations/images/classification?job_id=20150915-093715-daf6

   --------------------- >> end captured stdout << ----------------------
   -------------------- >> begin captured logging << --------------------
   digits.webapp: INFO: Train Caffe Model task started.
   digits.webapp: DEBUG: Training 0.0% complete.
   digits.webapp: DEBUG: Snapshot saved.
   digits.webapp: DEBUG: Snapshot saved.
   digits.webapp: INFO: Train Caffe Model task completed.
   digits: INFO: Job complete.
   digits.webapp: INFO: Compute performance on val_db task started.
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: Traceback (most recent call last):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 291, in <module>
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: grayscale=args['grayscale']):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 238, in compute_accuracy
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: input_image = PIL.Image.open(s)
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 2028, in open
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: raise IOError("cannot identify image file")
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: IOError: cannot identify image file
   digits.webapp: ERROR: Compute performance on val_db task failed with error code 1
   --------------------- >> end captured logging << ---------------------


-----------------------------------------------------------------------------
523 tests run in 187.0 seconds. 
2 FAILED, 1 error, 2 skipped (518 tests passed)

@groar (Contributor, Author) commented Sep 15, 2015

Ok, thanks, that will probably help me.

@groar mentioned this pull request Sep 28, 2015: "Epoch can be a float"

Commits in this pull request:

  • Remove debug statement
  • Match style changes added in 3642aed and 7103d8b
  • Add basic test for evaluations
  • Replaced joblib by pickle
  • Check dependencies between EvaluationJob and parent ModelJob
  • Only use GPU for inference when CUDA enabled
  • Update documentation with EvaluationJob routes
  • Added a related jobs section to jobs show views
  • Added unit tests for evaluations
  • bugfix
  • fixing docs
  • changes
  • config fix
  • doc
  • fixing tests
  • adding evaluation tests
  • test
  • minor fixes
  • perf
  • added package
  • fixing evaluation test
  • compute seek fix
  • accuracy last
@lukeyeager (Member)

Closing as abandoned.
