This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

index out of bound error when update eval metric #7664

Open
wenhe-jia opened this issue Aug 30, 2017 · 9 comments

Comments

@wenhe-jia

wenhe-jia commented Aug 30, 2017

Hi, I am training a binary classification model on my own dataset. I use the mx.image.ImageIter API to load raw images from a .lst file that I generated myself (without using im2rec.py).
I set up the data iterator as follows:

train_iter = mx.image.ImageIter(
        batch_size   = batch_size,
        data_shape   = data_shape,
        path_imglist = '/database/liveness/data_prepare/liveness_train.lst',
        path_root    = '/',
        data_name    = 'data',
        label_name   = 'softmax_label',
        mean         = np.array([123.68, 116.78, 103.94]),
        resize       = 224,
        rand_mirror  = True,
        shuffle      = False,
        inter_method = 1)

My .lst file looks like this:

16      1.0     /database/liveness/lf_face/real/5905c125337f3131b4f0856a_image_0.jpg
17      1.0     /database/liveness/lf_face/real/58dd0a71337f311c56026c9e_image_0.jpg
18      1.0     /database/liveness/lf_face/real/58fb11d6de6c741b3501fd52_image_0.jpg
19      1.0     /database/liveness/lf_face/real/59060ef0337f317a0f74a42e_image_0.jpg
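
Since the .lst file here was written by hand rather than by the stock tool, one cheap sanity check is to scan it for labels that a two-class metric cannot index. The sketch below is only an illustration: `check_lst_labels` is a hypothetical helper, not part of MXNet, and it assumes the usual tab-separated layout with a single label in the second column.

```python
import io
import math

def check_lst_labels(lines, num_classes):
    """Return (line_number, reason) pairs for entries whose label is unusable."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")  # assumed layout: index, label, path
        if len(fields) < 3:
            problems.append((lineno, "expected index, label and path"))
            continue
        try:
            label = float(fields[1])
        except ValueError:
            problems.append((lineno, "label is not a number"))
            continue
        # A usable class label is a finite whole number in [0, num_classes).
        if (not math.isfinite(label) or label != int(label)
                or not 0 <= int(label) < num_classes):
            problems.append((lineno, "label %r is outside 0..%d" % (label, num_classes - 1)))
    return problems

# Two entries copied from the listing above, joined with tabs.
sample = io.StringIO(
    "16\t1.0\t/database/liveness/lf_face/real/5905c125337f3131b4f0856a_image_0.jpg\n"
    "17\t1.0\t/database/liveness/lf_face/real/58dd0a71337f311c56026c9e_image_0.jpg\n"
)
print(check_lst_labels(sample, num_classes=2))
```

An empty result only rules out bad labels in the file itself; it says nothing about what the iterator does with the final, partially filled batch.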

Then I started training. The first epoch went well, but an error was reported in the second epoch (epoch 1), as follows:

epoch 0 / batch 3766 ======>> ('cross-entropy', 0.00092357182168862099)
2017-08-30 14:39:53.567102
Time cost(ms) on one batch: 797260
DataBatch: data shapes: [(128L, 3L, 224L, 224L)] label shapes: [(128L,)]
2017-08-30 14:39:54.532427
Traceback (most recent call last):
  File "train.py", line 111, in <module>
    mod.update_metric(metric, batch.label)
  File "/dlproject/incubator-mxnet/python/mxnet/module/module.py", line 735, in update_metric
    self._exec_group.update_metric(eval_metric, labels)
  File "/dlproject/incubator-mxnet/python/mxnet/module/executor_group.py", line 582, in update_metric
    eval_metric.update_dict(labels_, preds)
  File "/dlproject/incubator-mxnet/python/mxnet/metric.py", line 108, in update_dict
    self.update(label, pred)
  File "/dlproject/incubator-mxnet/python/mxnet/metric.py", line 916, in update
    prob = pred[numpy.arange(label.shape[0]), numpy.int64(label)]
IndexError: index 8285818191872 is out of bounds for axis 1 with size 2
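
The failing expression in the last frame of the traceback can be exercised on its own with NumPy. The sketch below uses made-up predictions and copies the huge value from the error message into one label slot to imitate a label that was never written. Whether that is what actually happens in the final batch is an assumption; nothing in this thread confirms it. `pick_true_class_prob` is a hypothetical name for illustration.

```python
import numpy as np

# Two-class softmax output for a batch of four samples (made-up values).
pred = np.array([[0.9, 0.1],
                 [0.2, 0.8],
                 [0.6, 0.4],
                 [0.3, 0.7]])

# The last entry imitates a garbage label; the value is taken from the error message.
label = np.array([0.0, 1.0, 0.0, 8285818191872.0])

def pick_true_class_prob(pred, label):
    # Same indexing expression as metric.py line 916 in the traceback:
    # for each row, select the predicted probability of the labelled class.
    return pred[np.arange(label.shape[0]), np.int64(label)]

try:
    pick_true_class_prob(pred, label)
except IndexError as err:
    # Any label outside {0, 1} makes the column index invalid for a 2-column array.
    print("IndexError:", err)
```

With labels restricted to 0 and 1 the same call returns one probability per sample, so the indexing itself is sound and the problem lies in the label values that reach it.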

Batch 3766 is the second-to-last batch of an epoch, and batch 3767 is the last batch.
I set up the eval metric in my training script with two components:

eval_metric = mx.metric.CompositeEvalMetric()
eval_metric.add(mx.metric.CrossEntropy())
eval_metric.add(mx.metric.Accuracy())

So what is wrong with my usage?
Thanks for your answer!
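
One way to narrow down a failure like this is to inspect the labels just before they reach the metric, and to drop any padded tail of the batch first. The following is a minimal NumPy sketch, not MXNet code: `valid_part` and `find_bad_labels` are hypothetical helpers, and the `pad` argument assumes the iterator reports how many slots of the batch were left unfilled (MXNet's DataBatch has a `pad` attribute intended for this).

```python
import numpy as np

def valid_part(label, pred, pad=0):
    """Drop the last `pad` samples of a batch before they reach a metric."""
    keep = label.shape[0] - pad
    return label[:keep], pred[:keep]

def find_bad_labels(label, num_classes):
    """Return positions whose label is not a whole number in [0, num_classes)."""
    label = np.asarray(label, dtype=np.float64)
    ok = (np.isfinite(label) & (label == np.floor(label))
          & (label >= 0) & (label < num_classes))
    return np.flatnonzero(~ok)

# Made-up batch: the last label imitates an unfilled slot.
label = np.array([1.0, 0.0, 1.0, 8285818191872.0])
pred = np.full((4, 2), 0.5)

print(find_bad_labels(label, num_classes=2))          # positions of unusable labels
trimmed_label, trimmed_pred = valid_part(label, pred, pad=1)
print(find_bad_labels(trimmed_label, num_classes=2))  # empty once the tail is dropped
```

If the bad positions always fall in the last `pad` slots of the final batch, the iterator's padding is the likely culprit; if they appear elsewhere, the label source deserves a closer look.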

@techzhou

techzhou commented Sep 5, 2017

I have the same problem.

@changss

changss commented Sep 25, 2017

Same problem, too.

@tobechao

I have the same problem, too.

@wlbksy
Contributor

wlbksy commented Nov 29, 2017

Same problem here.

@tobechao

tobechao commented Jan 8, 2018

I added a while loop in image.py:
def next(self):
    ...
    try:
        while i < batch_size:
            label, s = self.next_sample()
            data = self.imdecode(s)
            try:
                self.check_valid_image(data)
            except RuntimeError as e:
                logging.debug('Invalid image, skipping: %s', str(e))
                continue
            data = self.augmentation_transform(data)
            assert i < batch_size, 'Batch size must be multiples of augmenter output length'
            batch_data[i] = self.postprocess_data(data)
            batch_label[i] = label
            i += 1
    except StopIteration:
        if not i:
            raise StopIteration
        while i < batch_size:
            import copy
            batch_data[i] = copy.deepcopy(batch_data[0])
            batch_label[i] = copy.deepcopy(batch_label[0])
            i += 1
    ...

I copy the first sample batch_size - i times to fill the rest of the batch, and it works.
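
The idea behind this workaround can be shown in isolation with plain NumPy arrays standing in for the batch buffers. This is a sketch under that assumption, not the image.py code itself; `fill_remainder` is a hypothetical helper.

```python
import numpy as np

def fill_remainder(batch_data, batch_label, filled):
    """Overwrite every slot from `filled` onward with a copy of sample 0, in place."""
    if filled == 0:
        raise ValueError("cannot fill a batch that holds no samples")
    batch_data[filled:] = batch_data[0]    # broadcasts sample 0 over the empty slots
    batch_label[filled:] = batch_label[0]
    return batch_data, batch_label

# A batch of four slots where the data ran out after two samples.
batch_size, filled = 4, 2
batch_data = np.empty((batch_size, 3, 2, 2), dtype=np.float32)   # uninitialized buffer
batch_label = np.empty((batch_size,), dtype=np.float32)
batch_data[:filled] = 1.0
batch_label[:filled] = [1.0, 0.0]

fill_remainder(batch_data, batch_label, filled)
print(batch_label)
```

After the fill, every label in the batch is a real class index, so the metric no longer sees uninitialized values. The trade-off is that the copied sample is counted again by the metric and the optimizer, so the final batch slightly over-weights it.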

reminisce added the Bug label Mar 5, 2018

@wewan

wewan commented Sep 14, 2018

I was training a binary classifier using a .rec file and met the same problem.

@wenhe-jia
Author

Maybe we should generate our .rec files ourselves to make sure they have no problems.

@vandanavk
Contributor

@mxnet-label-bot add [Metric]

@anirudhacharya
Member

@LeonJWH @techzhou @changss @tobechao @wlbksy can one of you please share a minimal reproducible example for this bug?
