
Make it optional to subtract mean pixel or mean image #321

Merged: jmancewicz merged 1 commit into NVIDIA:master from the mean branch on Oct 30, 2015

Conversation

@jmancewicz
Contributor

Closes #169

    t.set_mean('data', mean_pixel)
elif self.use_mean == 'image':
    mean_image = self.get_mean_image(mean_file)
    t.set_mean('data', mean_image)
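For context, caffe.io.Transformer.set_mean accepts either a 1-D per-channel pixel (broadcast across the image) or a full mean array whose shape must match the input exactly; that distinction is what this PR exposes as a user-facing option. A hedged usage sketch (the shapes are illustrative, and the second call simply replaces the first):

import numpy as np
import caffe

t = caffe.io.Transformer(inputs={'data': (1, 3, 256, 256)})

# 'pixel': a 1-D array with one value per channel, broadcast over H x W
t.set_mean('data', np.array([104.0, 117.0, 123.0]))

# 'image': a full (C, H, W) array whose shape must match the input exactly
t.set_mean('data', np.zeros((3, 256, 256), dtype=np.float32))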
Member

I think you need to crop the image here.

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/lyeager/digits/digits/model/images/classification/views.py", line 276, in image_classification_model_classify_one
    predictions, visualizations = job.train_task().infer_one(image, snapshot_epoch=epoch, layers=layers)
  File "/home/lyeager/digits/digits/model/tasks/caffe_train.py", line 972, in infer_one
    layers=layers,
  File "/home/lyeager/digits/digits/model/tasks/caffe_train.py", line 1002, in classify_one
    preprocessed = self.get_transformer().preprocess(
  File "/home/lyeager/digits/digits/model/tasks/caffe_train.py", line 1473, in get_transformer
    t.set_mean('data', mean_image)
  File "/home/lyeager/caffe/nv-0.13.2/python/caffe/io.py", line 255, in set_mean
    raise ValueError('Mean shape incompatible with input shape.')
ValueError: Mean shape incompatible with input shape.
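A minimal sketch of the center-crop fix suggested above, assuming a (C, H, W) mean array; the helper name crop_mean is hypothetical, not the code that was merged:

import numpy as np

def crop_mean(mean_image, crop_height, crop_width):
    # center-crop a (C, H, W) mean image to the network's input size,
    # so that set_mean's shape check passes
    _, height, width = mean_image.shape
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return mean_image[:, top:top + crop_height, left:left + crop_width]

# example: crop a 256x256 mean down to a 227x227 network input
mean = np.zeros((3, 256, 256), dtype=np.float32)
print(crop_mean(mean, 227, 227).shape)  # (3, 227, 227)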

@jmancewicz
Contributor Author

get_mean_image does an imresize. I'll look into what's going wrong.

@jmancewicz
Contributor Author

@gheinrich, please look at the changes to torch_train.py and see if they are correct.

@gheinrich
Contributor

@jmancewicz the parameter mappings look OK to me. I have trained LeNet on MNIST using all three methods (subtractMeanImage, subtractMeanPixel, none) and they all converged in approximately the same time. Maybe MNIST is too trivial to actually observe the effect of mean subtraction... anyway, the Torch part looks OK to me, thanks!

@lukeyeager
Member

You're getting a bunch of test failures.

https://travis-ci.org/NVIDIA/DIGITS/builds/83967654
https://s3.amazonaws.com/archive.travis-ci.org/jobs/83967655/log.txt

There are a bunch of weird PIL messages too - any idea what that's about?

PIL.PngImagePlugin: DEBUG: STREAM IHDR 16 13

@gheinrich
Contributor

@jmancewicz the support for generic inference with Torch is now on the master branch. There might be some rebasing to do for this PR. Let me know when you have rebased as I now have a network for which mean subtraction is critical so I will be able to test how the network behaves depending on the mean subtraction method.

@lukeyeager
Member

@jmancewicz can you make sure that #365 is fixed in this PR too?

@jmancewicz
Contributor Author

To handle #365, I'm now setting mean_pixel (rather than appending to it) and deleting mean_file when using mean_pixel. Using mean_file will similarly delete mean_pixel and set mean_file.
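A minimal sketch of what the mean-pixel side of that could look like, mirroring the set_mean_file snippet quoted later in this thread; the function name and body here are assumptions, not the merged code (mean_value and mean_file are the actual fields in Caffe's TransformationParameter, and Caffe rejects networks that set both):

def set_mean_pixel(self, layer, mean_pixel):
    # replace, rather than append to, any per-channel values already set
    layer.transform_param.ClearField('mean_value')
    layer.transform_param.mean_value.extend(mean_pixel)
    # mean_file and mean_value are mutually exclusive in Caffe
    layer.transform_param.ClearField('mean_file')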

@@ -136,6 +135,8 @@ def task_arguments(self, resources, env):
        '--save=%s' % self.job_dir,
        '--snapshotPrefix=%s' % self.snapshot_prefix,
        '--snapshotInterval=%s' % self.snapshot_interval,
        '--mean=%s' % self.dataset.path(constants.MEAN_FILE_IMAGE),
        '--labels=%s' % self.dataset.path(self.dataset.labels_file),
Member

Datasets don't necessarily have a labels_file. That's why you're getting these TravisCI errors for generic Torch tests:

======================================================================
FAIL: digits.model.images.generic.test_views.TestTorchCreation.test_create_wait_delete
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/NVIDIA/DIGITS/digits/model/images/generic/test_views.py", line 245, in test_create_wait_delete
    assert self.model_wait_completion(job_id) == 'Done', 'create failed'
AssertionError: create failed
-------------------- >> begin captured logging << --------------------
digits: ERROR: AttributeError: 'GenericImageDatasetJob' object has no attribute 'labels_file'
--------------------- >> end captured logging << ---------------------

Member

Why are you adding labels at all? It's already being done below.

Contributor

These two lines were moved to lines 150-151.
Line 192 addresses the generic dataset case (where we need to create an image file out of the .protobuf from the dataset).

Contributor

line 174:

if self.use_mean:

Maybe this should be:

if self.use_mean != 'none':
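The suggestion matters because use_mean is now a tri-state string: any non-empty string, including 'none', is truthy in Python, so the original test would enable mean subtraction even when the user selected 'none'. A quick illustration:

use_mean = 'none'

if use_mean:            # True: 'none' is a non-empty string
    print('mean subtraction enabled (wrong)')

if use_mean != 'none':  # False: correctly skips subtraction
    print('mean subtraction enabled')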

@lukeyeager
Member

Looks like I added a bug way back in #41. At line 1427, this:

t = caffe.io.Transformer(
    inputs = {'data':  data_shape}
    )

needs to change to this:

t = caffe.io.Transformer(
    inputs = {'data':  tuple(data_shape)}
    )

It was originally me who added the overly strict check to Caffe with BVLC/caffe#2031. Now I wish I had done it in such a way that it worked with either lists or tuples.

We didn't notice the issue in DIGITS because we haven't done mean file subtraction for so long.
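The failure mode is easy to reproduce: the shape check compares the mean's shape (a tuple) against a slice of the registered input shape, and in Python a tuple never compares equal to a list, even with identical elements:

data_shape = [1, 3, 256, 256]   # a list, as DIGITS was passing it
mean_shape = (3, 256, 256)      # numpy shapes are tuples

print(mean_shape == data_shape[1:])          # False: tuple vs. list
print(mean_shape == tuple(data_shape[1:]))   # True after the tuple() fix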

def set_mean_file(self, layer, mean_file):
    # remove any values that may already be in the network
    layer.transform_param.ClearField('mean_value')
    layer.transform_param.mean_file = mean_file
Member

These two functions do the correct thing from a functional point of view (at least I think they do - haven't tested it), but I'd like to print some WARNINGs to the log if any of these ClearField() calls are actually overwriting any information.
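A minimal sketch of what that logging could look like, assuming a module-level logger; the check and message wording are assumptions, not the merged code:

import logging

logger = logging.getLogger('digits')

def set_mean_file(self, layer, mean_file):
    # warn before clearing mean_value if it would discard values already set
    if len(layer.transform_param.mean_value) > 0:
        logger.warning('Discarding mean_value %s in favor of mean_file %s',
                       list(layer.transform_param.mean_value), mean_file)
    layer.transform_param.ClearField('mean_value')
    layer.transform_param.mean_file = mean_file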

jmancewicz added a commit to jmancewicz/DIGITS that referenced this pull request Oct 17, 2015
@jmancewicz
Contributor Author

@lukeyeager I've added tests for mean image and mean pixel, for both training and inference. Also, the dependency on deploy.prototxt was avoidable, because I didn't need to imresize the mean file at that point.

@@ -317,6 +317,20 @@ def check_select_gpus(self, gpu_list):
        job_id = self.create_model(select_gpus_list=','.join(gpu_list), batch_size=len(gpu_list))
        assert self.model_wait_completion(job_id) == 'Done', 'create failed'

    def test_mean_image(self):
        options = {
            'use_mean': 'image',
Contributor

If my understanding is correct, the default value for use_mean is 'image'. Therefore, isn't this test already covered by the default case?
Either way, it is worth considering adding a test case for use_mean='none'.

Member

Either way it is worth considering adding a test case for use_mean='none'

👍

Contributor Author

Any objection to leaving the image test in there?

Contributor

None from me; I think it's good to explicitly test all three cases.

@gheinrich
Contributor

My auto-encoder network in Torch is quite sensitive to initialization, so I thought this was a good candidate to try the various mean subtraction methods on. Along the way I found another issue in the mean pixel subtraction code. Besides, I wasn't entirely happy about the Lua wrapper command-line flags (some were mutually exclusive), so I pushed another commit gheinrich@559f6a5.
So, using mean image subtraction:
[screenshot: mean-image loss curve]
Using mean pixel subtraction (note the loss decreases less rapidly):
[screenshot: mean-pixel loss curve]
And using no mean subtraction, the network diverges:
[screenshot: mean-none loss curve]

    loss = nn.ClassNLLCriterion(),
    trainBatchSize = 64,
    validationBatchSize = 100,
    loss = nn.ClassNLLCriterion()
Contributor

you need to keep the version from master here too

@jmancewicz
Contributor Author

@gheinrich, I have made the changes from your comments. Now, it's failing on digits.model.images.classification.test_views.TestTorchCreation.test_classify_one_mean_pixel 8 out of 10 times with AssertionError: image misclassified.

The command that is being issued is:


So --subtractMean=pixel is being passed. If that failure is expected, I can ignore the classification error.

@@ -216,9 +212,9 @@ function DBSource:new (backend, db_path, labels_db_path, mirror, crop, croplen,
  self.dbs = {}
  self.total = 0
  for line in file:lines() do
    local fileName = paths.concat(db_path, paths.basename(line))
    db_path = paths.concat(db_path, paths.basename(line))
Contributor

I think you have overwritten the master version here and on line 217.

@gheinrich
Contributor

Joe, if you wish I can try to integrate my patch again over 84653db?

@jmancewicz
Contributor Author

@gheinrich, if this doesn't look correct, then maybe it would be best for you to integrate your patch. At some point my branch got out of sync, and it's become difficult to know which changes to use from master.

@gheinrich
Contributor

@jmancewicz I have pushed this change.
The Travis test would have passed if it weren't for this strange (unrelated?) error (another code -11!) on:

ERROR: test suite for <class 'digits.model.images.classification.test_views.TestCaffeCreatedCropInForm'>
...
digits.webapp: ERROR: Create DB (val) task failed with error code -11

Errno 11 is EAGAIN ("try again") in C, though a negative return code from a subprocess usually means the child was killed by that signal, so -11 is more likely SIGSEGV.
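For what it's worth, Python's subprocess module encodes death-by-signal as a negative return code, which is easy to check directly (a quick demonstration, assuming a POSIX system where dereferencing NULL segfaults):

import signal
import subprocess

# a child process that dereferences NULL and is killed by SIGSEGV
proc = subprocess.Popen(
    ['python', '-c', 'import ctypes; ctypes.string_at(0)'])
proc.wait()

# a negative returncode means the child was killed by that signal number
print(proc.returncode)                      # -11 on Linux
print(proc.returncode == -signal.SIGSEGV)   # True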

if self.trained_on_cpu:
    args.append('--type=float')

# input image has been resized to network input dimensions by caller
args.append('--crop=no')
Contributor

That --crop=no has disappeared, but it isn't an issue because that flag defaults to no.

@gheinrich
Contributor

The changes look OK to me.

jmancewicz added a commit that referenced this pull request Oct 30, 2015
Make it optional to subtract mean pixel or mean image
@jmancewicz merged commit dd9927b into NVIDIA:master on Oct 30, 2015
@jmancewicz deleted the mean branch on November 4, 2015