Make it optional to subtract mean pixel or mean image #321
Conversation
```python
    t.set_mean('data', mean_pixel)
elif self.use_mean == 'image':
    mean_image = self.get_mean_image(mean_file)
    t.set_mean('data', mean_image)
```
I think you need to crop the image here.
```
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/lyeager/digits/digits/model/images/classification/views.py", line 276, in image_classification_model_classify_one
    predictions, visualizations = job.train_task().infer_one(image, snapshot_epoch=epoch, layers=layers)
  File "/home/lyeager/digits/digits/model/tasks/caffe_train.py", line 972, in infer_one
    layers=layers,
  File "/home/lyeager/digits/digits/model/tasks/caffe_train.py", line 1002, in classify_one
    preprocessed = self.get_transformer().preprocess(
  File "/home/lyeager/digits/digits/model/tasks/caffe_train.py", line 1473, in get_transformer
    t.set_mean('data', mean_image)
  File "/home/lyeager/caffe/nv-0.13.2/python/caffe/io.py", line 255, in set_mean
    raise ValueError('Mean shape incompatible with input shape.')
ValueError: Mean shape incompatible with input shape.
```
`get_mean_image` does an imresize. I'll look into what's going wrong.
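For the crop suggested above, one option is to center-crop the mean image to the network's input dimensions before handing it to `set_mean`. A hedged sketch (the function name and the center-crop choice are my assumptions, not DIGITS code):

```python
import numpy as np

def fit_mean_to_shape(mean_image, target_hw):
    """Center-crop a (C, H, W) mean image to a target (H, W).

    Assumes the mean image is at least as large as the target.
    """
    c, h, w = mean_image.shape
    th, tw = target_hw
    top = (h - th) // 2
    left = (w - tw) // 2
    return mean_image[:, top:top + th, left:left + tw]

# Example: a 256x256 mean image fitted to a 224x224 network input.
mean = np.zeros((3, 256, 256), dtype=np.float32)
cropped = fit_mean_to_shape(mean, (224, 224))
assert cropped.shape == (3, 224, 224)
```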
@gheinrich, please look at the changes to torch_train.py and see if they are correct.
@jmancewicz the parameter mappings look OK to me. I have trained LeNet on MNIST using all three methods (subtractMeanImage, subtractMeanPixel, none) and they all converged in approximately the same time. Maybe MNIST is too trivial to actually observe the effect of mean subtraction. Anyway, the Torch part looks OK to me, thanks!
You're getting a bunch of test failures: https://travis-ci.org/NVIDIA/DIGITS/builds/83967654. There are a bunch of weird PIL messages too - any idea what that's about?
@jmancewicz the support for generic inference with Torch is now on the master branch. There might be some rebasing to do for this PR. Let me know when you have rebased - I now have a network for which mean subtraction is critical, so I will be able to test how the network behaves depending on the mean subtraction method.
@jmancewicz can you make sure that #365 is fixed in this PR too? |
To handle #365, when using mean_pixel I now set mean_pixel (rather than appending to it) and delete mean_file. Using mean_file similarly deletes mean_pixel and sets mean_file.
```diff
@@ -136,6 +135,8 @@ def task_arguments(self, resources, env):
     '--save=%s' % self.job_dir,
     '--snapshotPrefix=%s' % self.snapshot_prefix,
     '--snapshotInterval=%s' % self.snapshot_interval,
+    '--mean=%s' % self.dataset.path(constants.MEAN_FILE_IMAGE),
+    '--labels=%s' % self.dataset.path(self.dataset.labels_file),
```
Datasets don't necessarily have a `labels_file`. That's why you're getting these TravisCI errors for generic Torch tests:
```
======================================================================
FAIL: digits.model.images.generic.test_views.TestTorchCreation.test_create_wait_delete
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/NVIDIA/DIGITS/digits/model/images/generic/test_views.py", line 245, in test_create_wait_delete
    assert self.model_wait_completion(job_id) == 'Done', 'create failed'
AssertionError: create failed
-------------------- >> begin captured logging << --------------------
digits: ERROR: AttributeError: 'GenericImageDatasetJob' object has no attribute 'labels_file'
--------------------- >> end captured logging << ---------------------
```
Why are you adding labels at all? It's already being done below.
These two lines were moved to lines 150-151. Line 192 addresses the generic dataset case (where we need to create an image file out of the .protobuf from the dataset).
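For context on that generic-dataset case: Caffe stores a mean image in a BlobProto as flat C·H·W floats, so producing an image file from it is essentially a reshape plus a CHW-to-HWC transpose. A numpy-only sketch (the protobuf parsing and file I/O are omitted, and the dimensions here are made up):

```python
import numpy as np

# Stand-ins for BlobProto fields (channels/height/width plus flat float data).
channels, height, width = 3, 2, 2
flat = np.arange(channels * height * width, dtype=np.float32)

mean_chw = flat.reshape(channels, height, width)  # Caffe stores planar CHW
mean_hwc = mean_chw.transpose(1, 2, 0)            # image libraries expect HWC
assert mean_hwc.shape == (height, width, channels)
```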
Line 174:

```python
if self.use_mean:
```

Maybe this should be:

```python
if self.use_mean != 'none':
```
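The reason the explicit comparison matters: `use_mean` here is a string, and any non-empty string (including `'none'`) is truthy, so the bare `if` would still run the mean-subtraction branch. A quick illustration:

```python
use_mean = 'none'

# Non-empty strings are truthy, so `if use_mean:` takes the branch even
# when the user asked for no mean subtraction.
assert bool(use_mean) is True

# The explicit comparison does the right thing.
assert (use_mean != 'none') is False
assert ('image' != 'none') is True
```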
Looks like I added a bug way back in #41. At line 1427, this:

```python
t = caffe.io.Transformer(
    inputs = {'data': data_shape}
)
```

needs to change to this:

```python
t = caffe.io.Transformer(
    inputs = {'data': tuple(data_shape)}
)
```

It was originally me that added the overly strict check to Caffe with BVLC/caffe#2031. Now I wish I had done it in such a way that it worked with either. We didn't notice the issue in DIGITS because we haven't done mean file subtraction for so long.
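A minimal illustration of the failure mode (my reconstruction, not the exact Caffe check): an equality comparison between a list shape and a tuple shape is False in Python even when the elements match, so passing `data_shape` as a list trips a strict shape check.

```python
data_shape = [1, 3, 256, 256]   # e.g. a shape that arrives as a list
mean_shape = (1, 3, 256, 256)

# A list never compares equal to a tuple, even element-for-element.
assert data_shape != mean_shape

# Converting to a tuple makes the comparison behave as intended.
assert tuple(data_shape) == mean_shape
```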
```python
def set_mean_file(self, layer, mean_file):
    # remove any values that may already be in the network
    layer.transform_param.ClearField('mean_value')
    layer.transform_param.mean_file = mean_file
```
These two functions do the correct thing from a functional point of view (at least I think they do - haven't tested it), but I'd like to print some WARNINGs to the log if any of these `ClearField()` calls are actually overwriting any information.
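A sketch of what that warning could look like (assuming a protobuf-style `transform_param`; the logger name and message wording are my own):

```python
import logging

logger = logging.getLogger('digits')

def set_mean_file(layer, mean_file):
    # Warn if ClearField() would actually discard existing mean values.
    if len(layer.transform_param.mean_value) > 0:
        logger.warning('Overwriting existing mean_value with mean_file')
    layer.transform_param.ClearField('mean_value')
    layer.transform_param.mean_file = mean_file
```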
Force-pushed from 295df25 to 0315054.
@lukeyeager I've added tests for mean image and pixel, for training and inference. Also, the dependency on the deploy.prototxt was avoidable, because I didn't need to imresize the mean file at that time.
```diff
@@ -317,6 +317,20 @@ def check_select_gpus(self, gpu_list):
         job_id = self.create_model(select_gpus_list=','.join(gpu_list), batch_size=len(gpu_list))
         assert self.model_wait_completion(job_id) == 'Done', 'create failed'

+    def test_mean_image(self):
+        options = {
+            'use_mean': 'image',
```
If my understanding is correct, the default value for `use_mean` is `'image'`. Therefore, is this test not already covered by the default case? Either way, it is worth considering adding a test case for `use_mean='none'`.
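One way to cover all three modes without copy-pasting tests (a sketch only: `create_model` and `model_wait_completion` mirror the DIGITS test helper names seen above; this is not the suite's actual code):

```python
# Generate one test method per mean-subtraction mode on a test class.
MEAN_MODES = ('image', 'pixel', 'none')

def add_mean_tests(cls):
    for mode in MEAN_MODES:
        def test(self, mode=mode):
            job_id = self.create_model(use_mean=mode)
            assert self.model_wait_completion(job_id) == 'Done', 'create failed'
        setattr(cls, 'test_mean_%s' % mode, test)
    return cls
```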
> Either way it is worth considering adding a test case for `use_mean='none'`

👍
Any objection to leaving the image test in there?
None from me; I think it's good to explicitly test all three cases.
My auto-encoder network in Torch is quite sensitive to initialization, so I thought this was a good candidate to try the various mean subtraction methods on. Along the way I found another issue in the mean pixel subtraction code. Besides, I wasn't entirely happy about the Lua wrapper command-line flags (some were mutually exclusive), so I pushed another commit: gheinrich@559f6a5.
```lua
loss = nn.ClassNLLCriterion(),
trainBatchSize = 64,
validationBatchSize = 100,
loss = nn.ClassNLLCriterion()
```
You need to keep the version from master here too.
@gheinrich, I have made the changes from your comments. Now it's failing on … The command that is being issued is … So …
```diff
@@ -216,9 +212,9 @@ function DBSource:new (backend, db_path, labels_db_path, mirror, crop, croplen,
     self.dbs = {}
     self.total = 0
     for line in file:lines() do
-        local fileName = paths.concat(db_path, paths.basename(line))
+        db_path = paths.concat(db_path, paths.basename(line))
```
I think you have overwritten the master version here, and on line 217.
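The practical effect of that overwrite, shown as a standalone Python analogue of the Lua loop (the paths are made up): reassigning the base path inside the loop makes every iteration after the first concatenate onto the previous result.

```python
import posixpath

db_path = '/data/db'
lines = ['train_a.lmdb', 'train_b.lmdb']

# Buggy pattern: reusing the base path as the loop's accumulator.
base = db_path
buggy = []
for line in lines:
    base = posixpath.join(base, posixpath.basename(line))
    buggy.append(base)
assert buggy[1] == '/data/db/train_a.lmdb/train_b.lmdb'  # second path is wrong

# Master's version: a fresh local per iteration leaves db_path intact.
fixed = [posixpath.join(db_path, posixpath.basename(line)) for line in lines]
assert fixed == ['/data/db/train_a.lmdb', '/data/db/train_b.lmdb']
```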
Joe, if you wish, I can try to integrate my patch again over 84653db?
@gheinrich, if this doesn't look correct, then maybe it would be best for you to integrate your patch. At some point my branch got out of sync, and it's become difficult to know which changes to use from master.
@jmancewicz I have pushed this change. The C standard error …
```python
if self.trained_on_cpu:
    args.append('--type=float')

# input image has been resized to network input dimensions by caller
args.append('--crop=no')
```
That `--crop=no` has disappeared, but it isn't an issue because that flag defaults to `no`.
The changes look OK to me.
Closes #169