performance regression? caffe -> caffe2 #503

Open
tdp2110 opened this Issue May 4, 2017 · 13 comments


tdp2110 commented May 4, 2017

I have an install of caffe and caffe2 on my desktop Linux machine (specs at the end of this post). I have an NVIDIA GPU and I believe both builds (caffe and caffe2) are using the GPU and CUDNN (both projects were built from source). I compared the performance of caffe and caffe2 running squeezenet and found significantly better performance in caffe (using GPU mode) than in the new caffe2. Here are my scripts:

Caffe on GPU

import caffe
import numpy as np
import time


net_file = '/home/tom/caffe_models/SqueezeNet/SqueezeNet_v1.0/deploy.prototxt'
caffemodel_file = '/home/tom/caffe_models/SqueezeNet/SqueezeNet_v1.0/squeezenet_v1.0.caffemodel'


height, width = 227, 227

caffe.set_mode_gpu()
caffe.set_device(0)

net = caffe.Net(net_file, caffemodel_file, caffe.TEST)
net.blobs['data'].reshape(1, 3, height, width)

times = []
np.random.seed(123456789)
for _ in xrange(100):
    image = np.random.rand(1, 3, height, width)
    net.blobs['data'].data[...] = image

    t0 = time.time()
    net.forward()
    times.append(time.time() - t0)

print "min time %s, mean time %s, max time %s, time stdev %s" % (
    min(times), np.mean(times), max(times), np.std(times))

and output:
min time 0.0024528503418, mean time 0.0026821231842, max time 0.0160629749298, time stdev 0.00134584111746

Caffe2

from caffe2.python import workspace
import numpy as np
import time

def get_predictor():
    with open("init_net.pb", "rb") as f:
        init_net = f.read()
    with open("predict_net.pb", "rb") as f:
        predict_net = f.read()

    return workspace.Predictor(init_net, predict_net)

predictor = get_predictor()

input_image_size = 227
height, width = input_image_size, input_image_size

times = []
np.random.seed(123456789)
for _ in xrange(100):
    image = np.random.rand(1, 3, height, width).astype(np.float32)
    arg = [image]

    t0 = time.time()
    results = predictor.run(arg)
    times.append(time.time() - t0)

print "min time %s, mean time %s, max time %s, time stdev %s" % (
    min(times), np.mean(times), max(times), np.std(times))

and output:
min time 0.0587921142578, mean time 0.0777468800545, max time 0.131857872009, time stdev 0.0156148011639

(I saw similar performance between the caffe2 model zoo protocol buffers and those generated from a caffe model using caffe2.python.caffe_translator).

I'm pretty sure my caffe2 setup is using the GPU, because python -m caffe2.python.operator_test.relu_op_test runs OK and references "engine=CUDNN".

So the question is, is this expected? Am I doing something wrong? I've tried a few other models, and caffe+GPU seems to beat caffe2 every time.


Here are my machine specs: 64-bit Ubuntu 16.04, Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz x 8, 16 GB RAM. My GPU is a GeForce GTX 1080 with driver version 375.51.

salexspb commented May 4, 2017

Have you checked which engine you are using in Caffe2? You could verify that by looking into predict_net.Proto(). It should have "engine=CUDNN" for every op that you expect to run on CUDA.

tdp2110 commented May 4, 2017

Thanks. So when I parse that string (predict_net) into a caffe2_pb2.NetDef protobuf and iterate over its op fields, none of them has the CUDNN engine (in fact, the engine strings are all empty).
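
(For reference, that inspection was roughly along these lines; a sketch using the predict_net.pb from my script above:)

from caffe2.proto import caffe2_pb2

with open("predict_net.pb", "rb") as f:
    predict_net = caffe2_pb2.NetDef()
    predict_net.ParseFromString(f.read())

# engine is an empty string unless the net was explicitly tagged (e.g.
# engine: "CUDNN"), and device_option defaults to CPU unless set.
for op in predict_net.op:
    print("%s engine=%r device_type=%d" % (
        op.type, op.engine, op.device_option.device_type))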

Does this mean that in order to run on GPU, the NetDef protobuf explicitly has to say so?

tdp2110 commented May 4, 2017

Hmm, this seems to be related to #323, which doesn't seem to be resolved yet.

sergey-serebryakov commented May 4, 2017

@tdp2110 It seems your code, the way it's written, must run on the CPU. You have to manually specify the appropriate device placement (at least, for now).
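
Something along these lines (a sketch in the spirit of the load_net_def helper from #323; the file names match the scripts above, and gpu_id is illustrative):

from caffe2.proto import caffe2_pb2
from caffe2.python import core

def load_net_def(path, device_opts):
    # Parse a serialized NetDef and stamp the desired device onto the
    # net and onto every op.
    net_def = caffe2_pb2.NetDef()
    with open(path, "rb") as f:
        net_def.ParseFromString(f.read())
    net_def.device_option.CopyFrom(device_opts)
    for op in net_def.op:
        op.device_option.CopyFrom(device_opts)
    return net_def

gpu_id = 0
device_opts = core.DeviceOption(caffe2_pb2.CUDA, gpu_id)
init_net = load_net_def("init_net.pb", device_opts)
predict_net = load_net_def("predict_net.pb", device_opts)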

tdp2110 commented May 5, 2017

Hi @sergey-serebryakov, thanks. I tried your load_net_def from #323, with device_opts = core.DeviceOption(caffe2_pb2.CUDA, gpu_id) (with gpu_id = 0), on my predict_net from this example, and I get:

Traceback (most recent call last):
  File "squeezenet/run_in_caffe2_simple.py", line 41, in <module>
    results = predictor.run(arg)
RuntimeError: [enforce fail at blob.h:76] IsType<T>(). wrong type for the Blob instance. Blob contains caffe2::Tensor<caffe2::CPUContext> while caller expects caffe2::Tensor<caffe2::CUDAContext> .
Offending Blob name: data.
Error from operator:
input: "data" input: "conv1_w" input: "conv1_b" output: "conv1" type: "Conv" arg { name: "stride" i: 2 } arg { name: "pad" i: 0 } arg { name: "kernel" i: 7 } device_option { device_type: 1 cuda_gpu_id: 0 }

If I use the same device_opts on both predict_net and init_net, I get

Traceback (most recent call last):
  File "squeezenet/run_in_caffe2_simple.py", line 39, in <module>
    results = predictor.run(arg)
RuntimeError: [enforce fail at predictor.cc:11] blob->template IsType<TensorCPU>(). Blob is not a CPU Tensor: data

I'll try to dig into the source to make sense of this, but maybe you already know.

sergey-serebryakov commented May 5, 2017

@tdp2110 Do you have tensors that you feed into the workspace? You can pass device_opts to the FeedBlob method as well:

workspace.FeedBlob(
    '%sdata' % (var_scope),
    np.random.randn(*shape).astype(np.float32),
    device_option=device_opt
)

I think these are the only places (loading predict and init nets and feeding blobs) where I use device opts.

tdp2110 commented May 5, 2017

I wasn't explicitly pushing into the workspace with FeedBlob; I was using the Predictor interface, as in some of the tutorials:

    image = np.random.rand(1, 3, height, width).astype(np.float32)

    results = predictor.run([image])

I'll try looking into FeedBlob.

salexspb commented May 9, 2017

@bwasti, did you have a chance to look into the Predictor interface and running on GPU? You have been looking into this, if I am not mistaken.

salexspb commented May 9, 2017

@tdp2110, could you try the manual approach (without Predictor) for now?

tdp2110 commented May 10, 2017

@salexspb thanks, I'll look into the lower-level approach for now.

raininglixinyu commented May 22, 2017

@tdp2110 Hi, I hit the same problem when I try to use the GPU:
blob->template IsType<TensorCPU>(). Blob is not a CPU Tensor: data
Did you find the cause of the problem?

rams16592 commented May 22, 2017

@raininglixinyu Assign CPU and GPU separately for the different layers. Try something like what KeyKy implemented here: https://github.com/KeyKy/caffe2/blob/master/caffe2/python/tutorials/Run_Alexnet_in_CPU_and_GPU_mode.ipynb

tdp2110 commented May 23, 2017

Hi @raininglixinyu, I haven't looked at this in a few weeks, and caffe2 is moving pretty fast. The last time I was working on it, I believe my problem was basically that the Predictor interface didn't seem to expose a GPU mode. @salexspb is referring to a lower-level API for constructing nets which does allow you to run on the GPU, but I haven't set up an example using it. If I do get a working example, I'll be happy to post it. The blob error you're getting is presumably an incompatibility between the tensor type the net expects and the type you're trying to feed into it.
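
For anyone who wants to try it, the lower-level path sketched in the comments above would look roughly like this (an untested sketch that combines the device_option and FeedBlob suggestions, and assumes the squeezenet init_net.pb / predict_net.pb from earlier):

from caffe2.proto import caffe2_pb2
from caffe2.python import core, workspace
import numpy as np

device_opts = core.DeviceOption(caffe2_pb2.CUDA, 0)

def load_net_def(path):
    # Parse a serialized NetDef and force every op onto the GPU.
    net_def = caffe2_pb2.NetDef()
    with open(path, "rb") as f:
        net_def.ParseFromString(f.read())
    net_def.device_option.CopyFrom(device_opts)
    for op in net_def.op:
        op.device_option.CopyFrom(device_opts)
    return net_def

init_net = load_net_def("init_net.pb")
predict_net = load_net_def("predict_net.pb")

# Feed the input on the GPU, run the parameter-init net once, then
# create and run the predict net (assumes predict_net has a name).
image = np.random.rand(1, 3, 227, 227).astype(np.float32)
workspace.FeedBlob("data", image, device_option=device_opts)
workspace.RunNetOnce(init_net)
workspace.CreateNet(predict_net)
workspace.RunNet(predict_net.name)
output = workspace.FetchBlob(predict_net.external_output[-1])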
