This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

How to bind 3 inputs using mxnet.io.NDArrayIter? #4159

Closed
fungtion opened this issue Dec 8, 2016 · 17 comments

@fungtion

fungtion commented Dec 8, 2016


Environment info

Operating System: Ubuntu 14.04

Compiler: Python 2.7

Package used (Python/R/Scala/Julia): Python

I want to feed a convolutional network three inputs: an image, an annotation, and a label. Following #2929, I defined the input with mxnet.io.NDArrayIter:

import mxnet as mx
import numpy as np

train = mx.io.NDArrayIter(data=np.zeros((120000, 3, 224, 224), dtype='float32'),
                          label={'label1': np.zeros((120000, 81), dtype='int8'),
                                 'label2': np.zeros((120000, ), dtype='int8')},
                          batch_size=10)

However, I got this error: include/mxnet/./tensor_blob.h:742: Check failed: (this->shape_.Size()) == (shape.Size()) TBlob.get_with_shape: new and old shape do not match total elements, followed by
TypeError: Invalid type '<type 'numpy.ndarray'>' for data, should be NDArray or numpy.ndarray

Does this mean data and label must match in shape? And how can I feed three inputs (one data array and two kinds of label) to the network?

@kevinthesun
Contributor

kevinthesun commented Dec 8, 2016

I didn't get this error when running your code. Did it come from code later in your script? In general, you can pass a list or a dict of arrays for both data and label: http://mxnet.io/api/python/io.html#mxnet.io.NDArrayIter
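A minimal numpy-only sketch of what passing a dict of arrays for data and label means when batching: every named array is sliced along its first axis in lockstep, so all arrays must share the same number of samples, while the remaining dimensions can differ freely. The helper iterate_batches is hypothetical and is not MXNet's implementation; for simplicity it drops the final partial batch (NDArrayIter has its own last_batch_handle option for that case).

```python
import numpy as np


def iterate_batches(data, label, batch_size):
    """Yield (data_batch, label_batch) dicts, slicing every array in lockstep.

    Hypothetical helper for illustration; not MXNet's NDArrayIter.
    """
    arrays = list(data.values()) + list(label.values())
    num_samples = arrays[0].shape[0]
    # Only the first (sample) axis has to agree across inputs and labels.
    for arr in arrays:
        if arr.shape[0] != num_samples:
            raise ValueError("all arrays must have the same first dimension")
    # Drop the final partial batch for simplicity.
    for start in range(0, num_samples - batch_size + 1, batch_size):
        end = start + batch_size
        yield ({k: v[start:end] for k, v in data.items()},
               {k: v[start:end] for k, v in label.items()})


# Small stand-ins for the image and the two label arrays in the question.
data = {'data': np.zeros((25, 3, 8, 8), dtype='float32')}
label = {'label1': np.zeros((25, 81), dtype='int8'),
         'label2': np.zeros((25,), dtype='int8')}
batches = list(iterate_batches(data, label, batch_size=10))
print(len(batches), batches[0][0]['data'].shape, batches[0][1]['label1'].shape)
```

So an image array of shape (N, 3, 224, 224) and label arrays of shape (N, 81) and (N,) are compatible; only N has to match.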

@fungtion
Author

fungtion commented Dec 8, 2016

@kevinthesun I made a script containing only the code above, and it still reports this error.

@kevinthesun
Contributor

@fungtion Can you paste the full traceback message for your error?

@fungtion
Author

fungtion commented Dec 9, 2016

@kevinthesun This is the traceback message:

/home/wfx/mxnet/dmlc-core/include/dmlc/logging.h:235: [07:54:27] include/mxnet/./tensor_blob.h:742: Check failed: (this->shape_.Size()) == (shape.Size()) TBlob.get_with_shape: new and old shape do not match total elements
Traceback (most recent call last):
  File "/home/wfx/PycharmProjects/kb/mxnet_triple_input/tmp.py", line 7, in <module>
    train = mx.io.NDArrayIter(data=np.zeros((120000, 3, 224, 224), dtype='float32'),label={'label1': np.zeros((120000, 81), dtype='int8'), 'label2': np.zeros((120000, ), dtype='int8')},batch_size=10)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/io.py", line 420, in __init__
    self.data = _init_data(data, allow_empty=False, default_name='data')
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/io.py", line 391, in _init_data
    "should be NDArray or numpy.ndarray")
TypeError: Invalid type '<type 'numpy.ndarray'>' for data, should be NDArray or numpy.ndarray

Process finished with exit code 1

@shuokay
Contributor

shuokay commented Dec 9, 2016

I implemented a triple input when doing image retrieval with a triplet loss. I used a custom DataIter instead of NDArrayIter; the following code may give you a hint:

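import logging
import multiprocessing

import mxnet as mx
import numpy as np

# Assumed values; the original comment does not define these.
HEIGHT, WIDTH = 224, 224
dev = mx.cpu()

class DataBatch(object):
    def __init__(self, data, label):
        self.data = data
        self.label = label

class DataIter(mx.io.DataIter):
    def __init__(self, names, batch_size):
        super(DataIter, self).__init__()
        self.batch_size = batch_size  # write() reads self.batch_size
        self.provide_data = [('same', (batch_size, 3, HEIGHT, WIDTH)),
                             ('diff', (batch_size, 3, HEIGHT, WIDTH)),
                             ('one', (batch_size, ))]
        self.provide_label = [('anchor', (batch_size, 3, HEIGHT, WIDTH))]
        self.started = True
        # Two worker processes keep up to 4 prepared batches in the queue.
        self.q = multiprocessing.Queue(4)
        self.pws = [multiprocessing.Process(target=self.write) for i in range(2)]
        for pw in self.pws:
            pw.daemon = True  # do not keep the program alive after training ends
            pw.start()

    def write(self):
        while self.started:
            # generate_batch() is not shown here; it should return a list of
            # batch_size (anchor, same, diff) image triples.
            batch = self.generate_batch(self.batch_size)
            batch_anchor = [x[0] for x in batch]
            batch_same = [x[1] for x in batch]
            batch_diff = [x[2] for x in batch]
            batch_one = np.ones(self.batch_size)
            data_all = [mx.nd.array(batch_same, ctx=dev),
                        mx.nd.array(batch_diff, ctx=dev),
                        mx.nd.array(batch_one, ctx=dev)]
            label_all = [mx.nd.array(batch_anchor, ctx=dev)]
            self.q.put(DataBatch(data_all, label_all))

    def iter_next(self):
        # The base DataIter.iter_next() does not report a next batch, so
        # without this override next() would stop at once. The stream is endless.
        return True

    def next(self):
        if self.q.empty():
            logging.debug("waiting for data")
        if self.iter_next():
            return self.q.get(True)
        else:
            raise StopIteration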
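The core of the iterator above is a bounded producer/consumer queue that prepares batches ahead of the training loop. Below is a stripped-down, MXNet-free sketch of the same prefetch pattern. It swaps multiprocessing for threading so it runs in a single process, uses a hypothetical make_batch in place of the unshown generate_batch, and stops after a fixed number of batches instead of streaming forever.

```python
import queue
import threading


def make_batch(i, batch_size):
    # Hypothetical stand-in for generate_batch: returns (anchor, same, diff) triples.
    return [(i, i + 1, i + 2) for _ in range(batch_size)]


def prefetch(num_batches, batch_size, maxsize=4):
    """Produce batches on a background thread; consume them from a bounded queue."""
    q = queue.Queue(maxsize)  # bounded, so the producer blocks when it gets ahead
    done = object()           # sentinel marking the end of the stream

    def producer():
        for i in range(num_batches):
            batch = make_batch(i, batch_size)
            # Split each triple into separate inputs and a label.
            anchor = [x[0] for x in batch]
            same = [x[1] for x in batch]
            diff = [x[2] for x in batch]
            q.put(((same, diff), anchor))
        q.put(done)

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    while True:
        item = q.get()
        if item is done:
            break
        yield item
    t.join()


batches = list(prefetch(num_batches=3, batch_size=2))
print(len(batches), batches[1])
```

The sentinel gives the consumer a clean end-of-data signal; an endless training stream, as in the iterator above, can omit it.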

@kevinthesun
Contributor

kevinthesun commented Dec 9, 2016

@fungtion Is your mxnet package up to date? I have no problem running your code. Also, the traceback shows the exception raised at line 391 of io.py, which is line 397 in the current version.

Another possible cause is that your mxnet Python path is not set correctly. It should be .../mxnet/python
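One quick way to follow up on the path question is to ask Python which copy of a package it actually imports; the traceback above shows io.py loaded from an mxnet-0.7.0 egg under /usr/local/lib/python2.7/dist-packages rather than from .../mxnet/python. The sketch below uses importlib.util.find_spec, which needs Python 3.4 or later (not the Python 2.7 in this report); on any version, running import mxnet as mx and then printing mx.__file__ gives the same information. The helper name package_location is hypothetical.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Replace "json" with "mxnet" on a machine where MXNet is installed.
print(package_location("json"))
print(package_location("no_such_package_xyz"))
```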

@fungtion
Author

fungtion commented Dec 9, 2016

@shuokay Thank you, I will try it later

@fungtion
Author

fungtion commented Dec 9, 2016

@kevinthesun I am sure the path is correct, and every other mxnet operation I use works well except this one. Could the problem be gcc? I used gcc-4.8.2.

@kevinthesun
Contributor

@fungtion I'm wondering why the traceback reported the error at line 391 of io.py. This exception should be at line 397.

@fungtion
Author

fungtion commented Dec 12, 2016

@kevinthesun When I reduced the number of samples to 12000, it ran without error, but with 120000 it still fails, and a memory error occurs in http://mxnet.io/zh/api/python/ndarray.html#mxnet.ndarray.array. My computer has 32GB of RAM; is that not enough?
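A quick back-of-the-envelope check of the two sizes in this report, for a float32 array of shape (N, 3, 224, 224). It also tests one hypothesis that is not confirmed anywhere in this thread: that the TBlob shape check fails because the total element count overflows a signed 32-bit index. That hypothesis is consistent with what is observed here (12000 samples work, 120000 fail).

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit value


def footprint(num_samples, shape=(3, 224, 224), bytes_per_elem=4):
    """Return (element count, size in GiB) of a float32 array of num_samples images."""
    elems = num_samples
    for d in shape:
        elems *= d
    return elems, elems * bytes_per_elem / 2**30


for n in (12000, 120000):
    elems, gib = footprint(n)
    print(n, elems, round(gib, 1), elems > INT32_MAX)
# 12000 1806336000 6.7 False
# 120000 18063360000 67.3 True
```

On either count the full 120000-sample array is a problem: about 67 GiB of float32 data cannot fit in 32 GB of RAM even before converting it to an NDArray, which likely needs a second copy.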

@fungtion
Author

@shuokay where is the function generate_batch?

@szha
Member

szha commented Sep 28, 2017

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!

@szha szha closed this as completed Sep 28, 2017
@anjishnu
Contributor

I just hit this issue:

TypeError: Invalid type '<type 'numpy.ndarray'>' for x_2, should be NDArray, numpy.ndarray or h5py.Dataset

I only seem to get it when my array exceeds a certain number of elements; when I reduce the array size, the code runs fine. Is the suggested solution to write my own DataIter?
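One way to sidestep both the RAM limit and any per-array element-count limit is to keep the full dataset on disk and materialize only one batch at a time. Below is a minimal numpy sketch under that assumption: the data is saved as a .npy file and opened with np.load(..., mmap_mode='r'), so slicing reads only the requested rows. Inside a custom DataIter, each yielded batch could then be wrapped with mx.nd.array. The helper memmap_batches and the file layout are hypothetical.

```python
import os
import tempfile

import numpy as np


def memmap_batches(path, batch_size):
    """Yield successive batches from a .npy file without loading it all into RAM."""
    arr = np.load(path, mmap_mode='r')  # memory-mapped, read-only view of the file
    for start in range(0, arr.shape[0] - batch_size + 1, batch_size):
        # np.array(...) copies just this slice into an ordinary in-memory array.
        yield np.array(arr[start:start + batch_size])


# Small demo file standing in for a large (N, 3, 224, 224) dataset.
path = os.path.join(tempfile.mkdtemp(), 'images.npy')
np.save(path, np.arange(24, dtype='float32').reshape(6, 2, 2))
batches = list(memmap_batches(path, batch_size=2))
print(len(batches), batches[1][0, 0].tolist())
```

Each batch holds only batch_size samples, so its size stays far below both the machine's RAM and a 32-bit element count regardless of how large the file on disk is.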

@anjishnu
Contributor

I am also using a network architecture like the OP's, which takes two input streams.

@szha szha reopened this Nov 10, 2017
@szha
Member

szha commented Feb 11, 2018

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and its Chinese version) is a good place to get help.

@kalyc
Contributor

kalyc commented Jun 13, 2018

Hey @anjishnu thanks for submitting the issue. Were you able to resolve it?

@lanking520
Member

Closing this issue due to inactivity. Please feel free to reopen it if you are still facing the same problem.
