This repository was archived by the owner on Jan 7, 2025. It is now read-only.

Accuracy & confusion matrix #137

Closed: groar wants to merge 2 commits into NVIDIA:master from Deepomatic:perf

Conversation

@groar (Contributor) commented Jun 8, 2015

See #17

Adds a new kind of job for performance evaluation of trained classifiers. It is now possible to visualize:

  • accuracy / recall curve
  • confusion matrix

Accuracy and the confusion matrix are computed against a chosen snapshot of a training task, and against both the validation set and testing set (if it exists). An "evaluate performance" button has been added on the training view. This is currently the only way to run an evaluation job. The results are stored in the job directory in the form of two pickle files.

[Screenshot: the "Evaluate performance" button]

Accuracy / recall curve

[Screenshot: accuracy/recall curve]
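As an illustration of how such a curve can be computed (a minimal sketch, not the code from this pull request; `accuracy_recall_curve` and its inputs are hypothetical): for each confidence threshold, only predictions whose top probability clears the threshold are kept; accuracy is measured on the kept samples, and recall is the fraction of samples kept.

```python
import numpy as np

def accuracy_recall_curve(probs, labels, thresholds):
    """For each confidence threshold, keep only predictions whose top
    probability reaches the threshold; report accuracy on the kept
    samples and the fraction of samples kept (recall)."""
    preds = probs.argmax(axis=1)        # predicted class per sample
    confidence = probs.max(axis=1)      # top probability per sample
    points = []
    for t in thresholds:
        kept = confidence >= t
        recall = kept.mean()
        accuracy = (preds[kept] == labels[kept]).mean() if kept.any() else 0.0
        points.append((t, accuracy, recall))
    return points
```

Sweeping `thresholds` from 0 to 1 then traces out the accuracy/recall trade-off shown in the screenshot.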

Confusion matrix

I chose a very simple representation of the confusion matrix (not actually in matrix form!) because it is better suited to datasets with many classes. For each class, the 10 most frequently predicted classes are displayed, with their respective percentages.

[Screenshot: confusion matrix view]
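The list-style view above could be generated from a full confusion-count matrix along these lines (an illustrative sketch; `top_confusions` is a hypothetical helper, not the PR's code):

```python
import numpy as np

def top_confusions(confusion, class_names, k=10):
    """For each true class (a row of counts), list the k most frequently
    predicted classes with the percentage of that class's samples they
    account for."""
    summary = {}
    for i, name in enumerate(class_names):
        row = confusion[i]
        total = row.sum()
        if total == 0:
            summary[name] = []
            continue
        top = row.argsort()[::-1][:k]   # indices of the k largest counts
        summary[name] = [(class_names[j], 100.0 * row[j] / total)
                         for j in top if row[j] > 0]
    return summary
```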

Related jobs

I added a "Related jobs" section to each job's show view. It displays the jobs that depend on the current job: for example, models trained on a specific dataset, or evaluations run on a specific model.

[Screenshot: related jobs section]

Let me know what you think, critiques and comments are more than welcome.

@jcohenpersonal

Cool!

It would be nice to have the option to view the confusion matrix in either bullet-point format, or as a color-coded table (color of cell to indicate percentages, like white = 0, green = 100, with a gradient in between).
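The white-to-green gradient suggested here is a simple linear interpolation; a sketch (the `cell_color` helper is hypothetical) mapping a cell's percentage to a hex color:

```python
def cell_color(pct):
    """Map a percentage (0-100) to a hex color on a white-to-green
    gradient: 0% -> #ffffff (white), 100% -> #00ff00 (green)."""
    # Red and blue channels fade out linearly; green stays at full.
    rb = int(round(255 * (1.0 - pct / 100.0)))
    return '#%02x%02x%02x' % (rb, 255, rb)
```

Each table cell's background could then be set from its percentage, so dense confusions stand out at a glance.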

@flx42 (Member) commented Jun 8, 2015

Please squash your patches.

@lukeyeager (Member)

Thanks for the contribution, @groar! I hope we can get this merged soon.

@lukeyeager (Member)

When I try to evaluate AlexNet trained on ImageNet, I get this error:

Traceback (most recent call last):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 223, in <module>
    args['resize_mode']):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 162, in compute_accuracy
    probas[i] = net.predict([input_image], oversample=False)
File "/home/lyeager/digits/tools/compute_accuracy.py", line 79, in predict
    self.image_dims[0], self.image_dims[1], inputs[0].shape[2]),
IndexError: tuple index out of range

When I try to evaluate LeNet trained on MNIST, I get this error (probably something to do with grayscale images?):

Traceback (most recent call last):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 223, in <module>
    args['resize_mode']):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 147, in compute_accuracy
    raw_scale=255)
File "/home/lyeager/digits/tools/compute_accuracy.py", line 56, in __init__
    self.transformer.set_channel_swap(in_, channel_swap)
File "/home/lyeager/caffe/python/caffe/io.py", line 212, in set_channel_swap
    raise Exception('Channel swap needs to have the same number of '
Exception: Channel swap needs to have the same number of dimensions as the input channels.

What dataset/network are you using? I can't get anything to work.

Review comment (Member):

English variable names, please :)

@groar (Contributor, Author) commented Jun 9, 2015

I'm currently fixing the grayscale problem, but I haven't managed to reproduce your problem with AlexNet. I tested it with AlexNet, GoogLeNet and VGG on some standard color datasets, with no problem.

@groar (Contributor, Author) commented Jun 9, 2015

@lukeyeager I fixed the grayscale bug and tested it on MNIST / LeNet. My changes may also have fixed the problem you had with AlexNet. On my side, it works with every standard network. Let me know if you still have those issues.

@lukeyeager (Member)

Thanks for the fixes!

LeNet on MNIST works for me now. But I'm still getting the same error with AlexNet:

Traceback (most recent call last):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 233, in <module>
    grayscale=args['grayscale']):
File "/home/lyeager/digits/tools/compute_accuracy.py", line 172, in compute_accuracy
    probas[i] = net.predict([input_image], oversample=False)
File "/home/lyeager/digits/tools/compute_accuracy.py", line 79, in predict
    self.image_dims[0], self.image_dims[1], inputs[0].shape[2]),
IndexError: tuple index out of range

It looks like an issue with cropping.

len(inputs):      1
inputs[0].shape:  (256, 256, 3)
image_dims:       [227 227]

I think you meant to reshape the input, like I do here, rather than setting the input to zeros.

I'm curious why this works for you, though. The standard AlexNet network crops to 227.
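For reference, the shape mismatch above (a 256×256 dataset image fed to a network expecting 227×227 input) is usually resolved with a center crop before inference; a minimal sketch (not the PR's code):

```python
import numpy as np

def center_crop(image, crop_h, crop_w):
    """Crop the spatial center of an (H, W, C) image array."""
    h, w = image.shape[:2]
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]

# A 256x256x3 "image" cropped to the 227x227 AlexNet expects.
image = np.zeros((256, 256, 3), dtype=np.float32)
cropped = center_crop(image, 227, 227)
```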

@groar (Contributor, Author) commented Jun 11, 2015

Ok, this is weird, because I really can't reproduce the bug. That part of the code actually comes from Caffe (there is a small modification, but it is pretty much the original Classifier class), so it should not be problematic.

I don't see why this line should raise an IndexError...

np.zeros((1, 227, 227, 3), dtype=np.float32)

Which dataset are you using?

Review comment (Member):

Why aren't you passing the prebuilt database instead of the textfile? Those images are already loaded and resized. Then all you would need to know is the crop size.

@lukeyeager (Member)

I don't really like having Evaluations as Jobs; it feels like they should be Tasks instead. But in the interest of getting this new functionality merged, I'm ok with keeping them as a new Job type for now.

But displaying them at the bottom of the homepage is a problem. Nobody is going to be able to find them. Can we have the list of EvaluationJobs shown somewhere on the model page?

@lukeyeager (Member)

Also, if you delete a ModelJob that has dependent EvaluationJobs, you run into issues. There should be some sort of check for EvaluationJobs that depend on a ModelJob, like the one we have for DatasetJobs.

Review comment (Member):

Can we use Pickle instead of joblib to avoid adding a new dependency? See usage here and here.
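Swapping joblib for the standard-library pickle module is a small change; a sketch of the dump/load round-trip (the file layout and helper names here are illustrative, not the PR's actual code):

```python
import pickle

def save_results(path, results):
    # Serialize the evaluation results (e.g. confusion counts, accuracies).
    with open(path, 'wb') as f:
        pickle.dump(results, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_results(path):
    # Read the results back for display in the web view.
    with open(path, 'rb') as f:
        return pickle.load(f)
```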

@lukeyeager (Member)

While trying to track down my issue with the IndexError, I added a basic test for EvaluationJobs (see https://github.com/Deepomatic/DIGITS/pull/2) which works on my machine but fails on Travis. I'm not sure why it fails, but can you try the test on your machine? Also, some tests for tools/compute_accuracy.py would be awesome.

@groar (Contributor, Author) commented Jun 13, 2015

Ok for pickle instead of joblib, and I'm going to write some tests as well! I will also take care of the dependency between jobs.

I'm not sure about the evaluation being a job. At first I designed it as a task, then changed my mind (I was thinking about future performance-evaluation tasks that involve more than one model, for which a model task is not well suited).

@groar (Contributor, Author) commented Jun 15, 2015

Concerning the test, it's the same for me: it succeeds on my machine but not on Travis...

@lukeyeager I was thinking about how to display the related EvaluationJobs on the model page. In general, it would be nice to be able to see all the dependencies on a job page. For example, I'd love to see, directly in the Dataset view, a summary of all the models I've trained on it. What do you think about displaying a "Related jobs" section under the "Job Status" cell? I could do that instead of just adding an ad-hoc section to the Model view.

@lukeyeager (Member)

Ok for pickle instead of joblib

Great, thanks!

I will also take care of the dependency between jobs.

Verified. You could test for this with a test like this one.

I was thinking about future performance evaluations tasks that involve more than one model, in which case a model task is not so adapted.

That's an interesting thought. Just stick with the EvaluationJob for now.

Concerning the test, it's the same for me, the test succeeds on my machine although it doesn't for Travis...

Well, now the tests are failing for other reasons. Fixes here: https://github.com/Deepomatic/DIGITS/pull/4. It might be a good idea to turn on Travis for your fork so you can see which tests fail before updating this pull request.

In general, it would be nice to be able to see all the dependencies on a job page. For example, I'd love to see directly in the Dataset view a summary of all Models I've trained on it. What do you think about displaying a "Related jobs" section under the "Job Status" cell ? I could do that instead of just adding a ad-hoc section in the Model view.

Yep, sounds great!

@groar (Contributor, Author) commented Jun 16, 2015

I've added:

  • the "related jobs" section
  • some tests on evaluations

When I have some time, I will probably add tests for compute_accuracy.py, but I think pretty much everything else works. I still haven't been able to reproduce the "index out of range" error, though.

@lukeyeager (Member)

Toggling PR status to retrigger Travis build ...

@groar (Contributor, Author) commented Jun 16, 2015

Maybe I just need to merge master into perf to resolve the conflicts?

@lukeyeager (Member)

Maybe I just need to merge master into perf to resolve the conflicts?

Yeah, I think that's correct.
travis-ci/travis-ci#4102 (comment)

So do this:

git checkout perf
# Assuming NVIDIA/DIGITS is called "upstream"
git fetch upstream
# You can squash down to fewer commits while you're at it
git rebase -i upstream/master
git push

@groar (Contributor, Author) commented Jun 18, 2015

Some problems remain with the tests, but I don't know how to fix them (the generate_docs.py script yields an error on my side...).

@lukeyeager (Member)

the generate_docs.py script yields an error on my side

Did you install Flask-Autodoc with pip? I had to create a custom version to get what I needed. See this comment.

$ pip install -r requirements_test.txt
$ ./scripts/generate_docs.py

On my side, I'm still trying to debug why your code gives me that IndexError, but the use of the "classify" code from Caffe adds some unnecessary complexity. I just wrote a classification example that shows how to do classification a little more simply.

@lukeyeager (Member)

What do you think about displaying a "Related jobs" section under the "Job Status" cell?

That looks great, nice work!

Review comment (Member):

I don't know what's supposed to be in img_matrix, but this isn't right. For my dataset with 20 classes, num_classes gets set to 16. Are you assuming that the images are sorted by class?

@lukeyeager (Member)

You moved the listing of evaluations to the models#show page, and that's great, but you still left the changes to home.html. Also, can you add a "Delete All" button above the listing of dependent jobs?

@jmozah commented Jul 25, 2015

+1 for this feature

@groar (Contributor, Author) commented Sep 8, 2015

I've been very busy, but I'm back and trying to get this pull request merged! I think I've fixed everything that wasn't working, and I've updated my branch to keep up with the current master. I still need to port the old tests and refactor some code, and then it should be ok.

Oh, and now when you delete a model, it deletes all the dependent evaluation jobs. I think this is the expected behavior.
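The cascading delete described above might look something like this (a sketch only; the scheduler and job attributes are hypothetical stand-ins for the DIGITS internals):

```python
from types import SimpleNamespace

def delete_job(scheduler, job_id):
    """Delete a job and, recursively, every job that depends on it.
    `scheduler.jobs` and `model_job_id` are illustrative attributes,
    not the actual DIGITS scheduler API."""
    for other in list(scheduler.jobs):
        if getattr(other, 'model_job_id', None) == job_id:
            delete_job(scheduler, other.id)   # remove dependents first
    scheduler.jobs = [j for j in scheduler.jobs if j.id != job_id]

# Example: deleting the model also removes its dependent evaluation.
scheduler = SimpleNamespace(jobs=[
    SimpleNamespace(id='model-1', model_job_id=None),
    SimpleNamespace(id='eval-1', model_job_id='model-1'),
    SimpleNamespace(id='model-2', model_job_id=None),
])
delete_job(scheduler, 'model-1')
```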

Review comment (Member):

I don't think you need to make any changes to this file. Merge error?

Review comment (Contributor, Author):

Yes, gonna fix that.

@lukeyeager (Member)

Thanks for the rebase work @groar!

@gheinrich and @jmancewicz, will y'all take a look at this? We need some sort of a framework for model "evaluations" and this is not a bad way to do it.

Review comment (Contributor):

Ideally, the core DIGITS code should be framework-independent and communicate with the DL framework through the Framework class interface (see https://github.com/NVIDIA/DIGITS/blob/master/digits/frameworks/framework.py). Perhaps you can do something like:

fw_id = job.model_job.train_task().get_framework_id()
fw = frameworks.get_framework_by_id(fw_id)
job.tasks.append(fw.create_accuracy_task(...))

You might also want to add a function to the framework interface that tells whether the underlying framework supports this type of task.
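The capability check suggested above could be a simple overridable method on the Framework interface; a sketch (the method name `supports_accuracy_tasks` and the class shapes are hypothetical, not the actual DIGITS classes):

```python
class Framework(object):
    """Simplified stand-in for the DIGITS Framework interface."""
    def supports_accuracy_tasks(self):
        # Frameworks opt in by overriding this.
        return False

class CaffeFramework(Framework):
    def supports_accuracy_tasks(self):
        return True
```

A view could then hide the "evaluate performance" button whenever the model's framework reports that it does not support accuracy tasks.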

Review comment (Contributor, Author):

Ok, thanks for the info. I had in mind that something like that would be needed but hadn't looked into it. I'll do that.

@groar (Contributor, Author) commented Sep 9, 2015

The tests for EvaluationJob are still not passing on Travis, but they pass when I run them locally. I thought it was a missing package, but it doesn't seem so. If someone can tell me if it works on another environment...

@lukeyeager (Member)

If someone can tell me if it works on another environment...

The tests do not pass for me.

1) ERROR: test suite for <class 'digits.evaluation.images.classification.test_views.TestCaffeCreated'>

   Traceback (most recent call last):
    /usr/lib/python2.7/dist-packages/nose/suite.py line 208 in run
      self.setUp()
    /usr/lib/python2.7/dist-packages/nose/suite.py line 291 in setUp
      self.setupContext(ancestor)
    /usr/lib/python2.7/dist-packages/nose/suite.py line 314 in setupContext
      try_run(context, names)
    /usr/lib/python2.7/dist-packages/nose/util.py line 471 in try_run
      return func()
    digits/evaluation/images/classification/test_views.py line 147 in setUpClass
      assert cls.evaluation_wait_completion(cls.evaluation_id) == 'Done', 'create failed'
   AssertionError: create failed
   -------------------- >> begin captured logging << --------------------
   digits.webapp: INFO: Parse Folder (train/val) task started.
   digits.webapp: WARNING: Parse Folder (train/val) unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: INFO: Parse Folder (train/val) task completed.
   digits.webapp: INFO: Create DB (train) task started.
   digits.webapp: INFO: Create DB (val) task started.
   digits.webapp: WARNING: Create DB (train) unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: WARNING: Create DB (val) unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: DEBUG: 24 images written to database
   digits.webapp: INFO: Create DB (train) task completed.
   digits.webapp: DEBUG: 6 images written to database
   digits.webapp: INFO: Create DB (val) task completed.
   digits: INFO: Job complete.
   digits.webapp: INFO: Train Caffe Model task started.
   digits.webapp: DEBUG: Training 0.0% complete.
   digits.webapp: DEBUG: Snapshot saved.
   digits.webapp: INFO: Train Caffe Model task completed.
   digits: INFO: Job complete.
   digits.webapp: INFO: Compute performance on val_db task started.
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: Traceback (most recent call last):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 291, in <module>
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: grayscale=args['grayscale']):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 238, in compute_accuracy
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: input_image = PIL.Image.open(s)
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 2028, in open
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: raise IOError("cannot identify image file")
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: IOError: cannot identify image file
   digits.webapp: ERROR: Compute performance on val_db task failed with error code 1
   --------------------- >> end captured logging << ---------------------

-----------------------------------------------------------------------------
2) FAIL: digits.evaluation.images.classification.test_views.TestCaffeCreation.test_create_wait_delete

   Traceback (most recent call last):
    /usr/lib/python2.7/dist-packages/nose/case.py line 197 in runTest
      self.test(*self.arg)
    digits/evaluation/images/classification/test_views.py line 174 in test_create_wait_delete
      assert self.evaluation_wait_completion(job_id) == 'Done', 'create failed'
   AssertionError: create failed

   -------------------- >> begin captured stdout << ---------------------
   /evaluations/images/classification?job_id=20150915-093712-7c5c

   --------------------- >> end captured stdout << ----------------------
   -------------------- >> begin captured logging << --------------------
   digits.webapp: INFO: Compute performance on val_db task started.
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: Traceback (most recent call last):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 291, in <module>
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: grayscale=args['grayscale']):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 238, in compute_accuracy
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: input_image = PIL.Image.open(s)
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 2028, in open
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: raise IOError("cannot identify image file")
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: IOError: cannot identify image file
   digits.webapp: ERROR: Compute performance on val_db task failed with error code 1
   --------------------- >> end captured logging << ---------------------

-----------------------------------------------------------------------------
3) FAIL: digits.evaluation.images.classification.test_views.TestCaffeCreation.test_evaluate_snapshot_2

   Traceback (most recent call last):
    /usr/lib/python2.7/dist-packages/nose/case.py line 197 in runTest
      self.test(*self.arg)
    digits/evaluation/images/classification/test_views.py line 189 in test_evaluate_snapshot_2
      assert self.evaluation_wait_completion(job_id) == 'Done', 'evaluation create failed'
   AssertionError: evaluation create failed

   -------------------- >> begin captured stdout << ---------------------
   /evaluations/images/classification?job_id=20150915-093715-daf6

   --------------------- >> end captured stdout << ----------------------
   -------------------- >> begin captured logging << --------------------
   digits.webapp: INFO: Train Caffe Model task started.
   digits.webapp: DEBUG: Training 0.0% complete.
   digits.webapp: DEBUG: Snapshot saved.
   digits.webapp: DEBUG: Snapshot saved.
   digits.webapp: INFO: Train Caffe Model task completed.
   digits: INFO: Job complete.
   digits.webapp: INFO: Compute performance on val_db task started.
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: WARNING: log_file config option not found - no log file is being saved
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: Traceback (most recent call last):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 291, in <module>
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: grayscale=args['grayscale']):
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/home/lyeager/digits/tools/compute_accuracy.py", line 238, in compute_accuracy
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: input_image = PIL.Image.open(s)
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 2028, in open
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: raise IOError("cannot identify image file")
   digits.webapp: WARNING: Compute performance on val_db unrecognized output: IOError: cannot identify image file
   digits.webapp: ERROR: Compute performance on val_db task failed with error code 1
   --------------------- >> end captured logging << ---------------------


-----------------------------------------------------------------------------
523 tests run in 187.0 seconds. 
2 FAILED, 1 error, 2 skipped (518 tests passed)

@groar (Contributor, Author) commented Sep 15, 2015

Ok, thanks, that will probably help me.

@groar mentioned this pull request Sep 28, 2015: "Epoch can be a float"

Commits in this pull request:

  • Remove debug statement
  • Match style changes added in 3642aed and 7103d8b
  • Add basic test for evaluations
  • Replaced joblib by pickle
  • Check dependencies between EvaluationJob and parent ModelJob
  • Only use GPU for inference when CUDA enabled
  • Update documentation with EvaluationJob routes
  • Added a related jobs section to jobs show views
  • Added unit tests for evaluations
  • bugfix
  • fixing docs
  • changes
  • config fix
  • doc
  • fixing tests
  • adding evaluation tests
  • test
  • minor fixes
  • perf
  • added package
  • fixing evaluation test
  • compute seek fix
  • accuracy last
@lukeyeager (Member)

Closing as abandoned.
