
Data Input by MemoryDataLayer example? #1168

Closed
slaweks17 opened this issue Sep 27, 2014 · 3 comments

Comments

@slaweks17

Hi,
Could someone please explain how I could convert, say, a .csv file with 20 columns, of which the first 15 are features and the last 5 are outputs, into something Caffe would accept?
I am trying to use the Windows version, where, it appears, the HDF5 data layer has not been ported.
Should I use MemoryDataLayer? But then how do I specify the number of inputs and outputs?

Thanks,
Slawek
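The split the question describes (20 numeric columns, first 15 as features, last 5 as targets) can be sketched in plain Python before handing the arrays to any Caffe layer. This is only an illustrative sketch; the helper name `split_csv` and the column counts are taken from the question, not from any Caffe API:

```python
import csv
import io

def split_csv(text, n_features=15, n_targets=5):
    """Split each CSV row into a feature vector and a target vector."""
    features, targets = [], []
    for row in csv.reader(io.StringIO(text)):
        values = [float(v) for v in row]
        assert len(values) == n_features + n_targets, "unexpected column count"
        features.append(values[:n_features])
        targets.append(values[n_features:])
    return features, targets

# Two example rows with 20 numeric columns each.
sample = "\n".join(",".join(str(i + 20 * r) for i in range(20)) for r in range(2))
X, y = split_csv(sample)
print(len(X), len(X[0]), len(y[0]))  # 2 rows, 15 features, 5 targets per row
```

The resulting lists can then be converted to float32 arrays for whatever input mechanism is available (HDF5, LMDB, or a memory layer).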

@shelhamer
Member

@longjon the people need a Python solving + MemoryDataLayer example... once our models are serenely training, of course.

@shelhamer shelhamer changed the title data input from a text file Data Input by MemoryDataLayer example? Oct 10, 2014
@vpestret

The #1665 pull request should be merged to enable MemoryDataLayer to load data from Python.
For demonstration purposes I'll use the cifar10_quick net and the examples/images/cat.jpg image shipped with Caffe.
First we create a database containing just one image (for the sake of the example). This DB will be used by both the Python and the DATA layer paths.
The first step is to create a file list (assuming that the current directory is the Caffe root):

echo 'examples/images/cat.jpg 0' > examples/cifar10/file_list.txt

then convert it into a database:

build/tools/convert_imageset --resize_height=32 --resize_width=32 ./ examples/cifar10/file_list.txt examples/cifar10/test_lmdb

Second, we create two prototxt net specifications: one for Python, the other for the caffe test command.
First prototxt (place it into examples/cifar10/cifar10_quick_deploy.prototxt):

name: "CIFAR10_quick_test"
input: "dummy_data"
input_dim: 1
input_dim: 3
input_dim: 32
input_dim: 32
layers {
  name: "data"
  type: MEMORY_DATA
  top: "data"
  top: "label"
  memory_data_param {
    batch_size: 1
    channels: 3
    height: 32
    width: 32
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "pool1"
  top: "pool1"
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "pool2"
  top: "conv3"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "pool3"
  type: POOLING
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool3"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 64
  }
}
layers {
  name: "ip2"
  type: INNER_PRODUCT
  bottom: "ip1"
  top: "ip2"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 10
  }
}
layers {
  name: "prob"
  type: SOFTMAX
  bottom: "ip2"
  top: "prob"
}

Second prototxt (place it into examples/cifar10/cifar10_quick_data.prototxt):

name: "CIFAR10_quick_test"
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    backend: LMDB
    source: "examples/cifar10/test_lmdb"
    batch_size: 1
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
}
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "pool1"
  top: "pool1"
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layers {
  name: "relu2"
  type: RELU
  bottom: "conv2"
  top: "conv2"
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "conv3"
  type: CONVOLUTION
  bottom: "pool2"
  top: "conv3"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layers {
  name: "relu3"
  type: RELU
  bottom: "conv3"
  top: "conv3"
}
layers {
  name: "pool3"
  type: POOLING
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool3"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 64
  }
}
layers {
  name: "ip2"
  type: INNER_PRODUCT
  bottom: "ip1"
  top: "ip2"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 10
  }
}
layers {
  name: "prob"
  type: SOFTMAX
  bottom: "ip2"
  top: "prob"
}

One can check how little they differ from the original examples/cifar10/cifar10_quick.prototxt.

The third prerequisite is that CIFAR-10 training must have been run to produce examples/cifar10/cifar10_quick_iter_5000.caffemodel. For more info on how to do that, see the Caffe docs.
The Python script below loads the image into the MemoryDataLayer (don't forget to make pycaffe and to install the lmdb module, e.g. via pip):

import numpy as np
import lmdb
import sys
sys.path.insert(0, './python') # just to import caffe python library

import caffe
import caffe.proto
import caffe.io

# init caffe classifier
net = caffe.Classifier('examples/cifar10/cifar10_quick_deploy.prototxt',
                       'examples/cifar10/cifar10_quick_iter_5000.caffemodel')
net.set_phase_test()
net.set_mode_cpu()

# read image from database
env = lmdb.open('examples/cifar10/test_lmdb')

with env.begin() as txn:
    with txn.cursor() as curs:
        for key, value in curs:
            dat = value

datum = caffe.proto.caffe_pb2.Datum()
datum.ParseFromString(dat)
arr = np.array(caffe.io.datum_to_array(datum))
env.close()

# assign input
net.add_datum_vector(np.array([arr], dtype=np.float32),
                     np.array([0], dtype=np.float32))

# run
net._forward(0, len(net.layers) - 1)

# print result
print net.blobs[net.outputs[-1]].data.squeeze(axis=(2,3))

It will print this at the end:

[[  2.82827328e-04   8.86387534e-06   1.71649411e-01   5.60809523e-02
    7.10635662e-01   5.55399507e-02   1.31429185e-03   2.25703022e-03
    5.62651367e-05   2.17476999e-03]]
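As a sanity check on that printout: the vector is a softmax distribution over the 10 CIFAR-10 classes, so it should sum to ~1, and its argmax is the predicted class index. A small sketch with the numbers copied from the output above:

```python
# Probabilities printed by the script above (one row, 10 classes).
probs = [2.82827328e-04, 8.86387534e-06, 1.71649411e-01, 5.60809523e-02,
         7.10635662e-01, 5.55399507e-02, 1.31429185e-03, 2.25703022e-03,
         5.62651367e-05, 2.17476999e-03]

# Index of the largest probability, i.e. the predicted class.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)                      # 4
print(abs(sum(probs) - 1.0) < 1e-3)   # True: softmax output sums to ~1
```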

Finally, this printout can be compared with:

build/tools/caffe test --model=examples/cifar10/cifar10_quick_data.prototxt --weights=examples/cifar10/cifar10_quick_iter_5000.caffemodel --iterations 1 2>&1 | grep 'prob ='

However, the program crashes due to the SOFTMAX layer at the end of the prototxt, BUT it still manages to output the correct log, which shows that the Python and C++ versions match. Hurray!

I1231 18:08:40.211896  3112 caffe.cpp:169] Batch 0, prob = 0.000282827
I1231 18:08:40.211917  3112 caffe.cpp:169] Batch 0, prob = 8.86388e-06
I1231 18:08:40.211932  3112 caffe.cpp:169] Batch 0, prob = 0.171649
I1231 18:08:40.211947  3112 caffe.cpp:169] Batch 0, prob = 0.056081
I1231 18:08:40.211961  3112 caffe.cpp:169] Batch 0, prob = 0.710636
I1231 18:08:40.211971  3112 caffe.cpp:169] Batch 0, prob = 0.05554
I1231 18:08:40.211983  3112 caffe.cpp:169] Batch 0, prob = 0.00131429
I1231 18:08:40.211998  3112 caffe.cpp:169] Batch 0, prob = 0.00225703
I1231 18:08:40.212013  3112 caffe.cpp:169] Batch 0, prob = 5.62651e-05
I1231 18:08:40.212025  3112 caffe.cpp:169] Batch 0, prob = 0.00217477
I1231 18:08:40.212116  3112 caffe.cpp:186] prob = 0.000282827

@shelhamer
Member

Closing as solved by the Python layer interface.

With Python layers, almost all of these special data format issues are moot: if you have a line of Python that can load your data, you can make a Python data layer to load it too.
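For reference, a Python data layer is declared in the net prototxt roughly like this (newer layer syntax; the module and layer names here are hypothetical placeholders, and the class they point to must subclass caffe.Layer and implement setup/reshape/forward):

```
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  python_param {
    module: "my_data_layer"  # Python module on PYTHONPATH (hypothetical)
    layer: "CsvDataLayer"    # class inside that module (hypothetical)
  }
}
```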
