preprocessing? #5

Closed
ry opened this Issue Feb 11, 2016 · 6 comments

ry commented Feb 11, 2016

Hi @KaimingHe,
I'm wondering about the exact preprocessing involved. The paper just says you subtract the test set mean? Do you switch RGB to BGR as in the VGG nets? I assume the input is in [0, 255]? Do you use the same means as VGG?

Experimentation seems to indicate that the same preprocessing used for VGG works:

import numpy as np

def preprocess(img):
    # img is RGB, float, values in [0, 1]
    VGG_MEAN = [103.939, 116.779, 123.68]  # BGR channel means
    out = np.copy(img) * 255.0             # rescale to [0, 255]
    out = out[:, :, [2, 1, 0]]             # swap channels from RGB to BGR
    out[:, :, 0] -= VGG_MEAN[0]
    out[:, :, 1] -= VGG_MEAN[1]
    out[:, :, 2] -= VGG_MEAN[2]
    return out

KaimingHe commented Feb 12, 2016

Hi @ry,
The order is BGR. The mean is a per-pixel mean image (224x224x3) provided in the download links, named "ResNet_mean.binaryproto" (Caffe's binaryproto format). For a 3-channel mean, you can average this per-pixel mean image along the spatial dimensions.
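
For example, here is a minimal pycaffe sketch of loading the per-pixel mean and reducing it to a 3-channel BGR mean (assuming pycaffe is importable and the mean file has been downloaded locally as "ResNet_mean.binaryproto"):

import numpy as np
import caffe

# Load the per-pixel mean stored in Caffe's binaryproto format.
blob = caffe.proto.caffe_pb2.BlobProto()
with open('ResNet_mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())
mean_pixel = np.squeeze(caffe.io.blobproto_to_array(blob))  # shape (3, 224, 224), BGR

# Average over the spatial dimensions to get one mean value per channel.
mean_channel = mean_pixel.mean(axis=(1, 2))  # shape (3,), BGR
print(mean_channel)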

ShaoqingRen commented Feb 12, 2016

@ry,
"ResNet_mean.binaryproto" is a (224 * 224 * 3, BGR) mean file.
Our preprocessing is Crop(img) - mean. Both img and mean are in ([0, 255]).

Now caffe's implemenation is Crop (img - mean) in data_transformer.cpp. With simple modification, the test phase can be done with Caffe's code.
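
To make the ordering concrete, a small numpy sketch (center_crop is a hypothetical helper; img is assumed to be a BGR image in [0, 255], already resized, and mean the 224x224x3 per-pixel mean in the same HWC layout):

import numpy as np

def center_crop(x, size):
    # Hypothetical helper: take a centered size x size crop.
    h, w = x.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return x[top:top + size, left:left + size]

# ResNet preprocessing: crop first, then subtract the 224x224 per-pixel mean.
out = center_crop(img, 224) - mean

# Caffe's data_transformer instead subtracts the mean before cropping, so there
# the mean blob would need to match the uncropped image size; with this 224x224
# per-pixel mean the two orders are not interchangeable without a modification.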

ry commented Feb 12, 2016

@ShaoqingRen @KaimingHe Thank you.
This is a little off topic, but why do you use a per-pixel mean instead of a per-channel mean? The variance within each channel of ResNet_mean.binaryproto is small.

KaimingHe commented Feb 13, 2016

@ry We do so just for historical reasons: the original AlexNet paper did this. For these ResNet models the per-channel mean is sufficient; the numerical differences are marginal.
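
As a sketch of the two options (mean_pixel is the (3, 224, 224) BGR array from ResNet_mean.binaryproto and img_bgr a cropped 224x224x3 BGR image in [0, 255]; both names are just illustrative):

import numpy as np

# Per-pixel subtraction, as used for the released models.
out_per_pixel = img_bgr - mean_pixel.transpose(1, 2, 0)

# Per-channel subtraction: average the mean image over space first, then
# broadcast one value per channel across the whole crop.
out_per_channel = img_bgr - mean_pixel.mean(axis=(1, 2))

# The two results differ only by the small spatial variation of the mean image.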

taey16 commented Feb 23, 2016

@KaimingHe, @ShaoqingRen
I also tried to reproduce your results on the ILSVRC2012 validation set:

https://gist.github.com/taey16/04f019dac78deada5e21

import numpy as np
import caffe

MODEL_ORIGINAL_INPUT_SIZE = 256, 256
MODEL_INPUT_SIZE = 224, 224
MODEL_MEAN_FILE = '/storage/ImageNet/ILSVRC2012/model/resnet/ResNet_caffe_models/ResNet_mean.binaryproto'
MODEL_DEPLOY_FILE = '/storage/ImageNet/ILSVRC2012/model/resnet/ResNet_caffe_models/ResNet-152-deploy.prototxt'
MODEL_WEIGHT_FILE = '/storage/ImageNet/ILSVRC2012/model/resnet/ResNet_caffe_models/ResNet-152-model.caffemodel'

# Load the per-pixel BGR mean image.
blob = caffe.proto.caffe_pb2.BlobProto()
data = open(MODEL_MEAN_FILE, 'rb').read()
blob.ParseFromString(data)
MODEL_MEAN_VALUE = np.squeeze(np.array(caffe.io.blobproto_to_array(blob)))

net = caffe.Classifier(
    model_file=MODEL_DEPLOY_FILE,
    pretrained_file=MODEL_WEIGHT_FILE,
    image_dims=(MODEL_ORIGINAL_INPUT_SIZE[0], MODEL_ORIGINAL_INPUT_SIZE[1]),
    raw_scale=255.,    # scale before mean subtraction
    input_scale=None,  # scale after mean subtraction
    mean=MODEL_MEAN_VALUE,
    channel_swap=(2, 1, 0))

im = caffe.io.load_image(name)
oversample = False
scores = net.predict([im], oversample)

However, we got the following results:
acc@1: 75.067003(37532/49998) acc@5: 92.237690(46117/49998) in pred:123.0000, load:12.0000, resize:0.0000 msec.
acc@1: 75.067501(37533/49999) acc@5: 92.237845(46118/49999) in pred:122.0000, load:10.0000, resize:0.0000 msec.
acc@1: 75.068000(37534/50000) acc@5: 92.238000(46119/50000) in pred:122.0000, load:15.0000, resize:0.0000 sec.

Our result is slightly lower than yours.

Could you please give me any comments or advice?

Thanks.

dieterichlawson commented Mar 2, 2016

So @KaimingHe @ShaoqingRen, just to clarify: the weights in the Caffe model zoo are trained on BGR images?

