This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

use 224x224 size to train mobilenet will encounter memory problem #8391

Closed
HaoLiuHust opened this issue Oct 23, 2017 · 7 comments

Comments

@HaoLiuHust
Contributor

HaoLiuHust commented Oct 23, 2017

I was trying to train a MobileNet on the CIFAR dataset. Since the original images are 32x32, I resize them to 224x224, which is the MobileNet input size:

def ExtractImageAndLabel(path, file):
    # Load one CIFAR-10 python batch (10,000 images of shape 3x32x32)
    # and resize every image to 224x224 for the MobileNet input.
    with open(path + file, 'rb') as f:
        data = cPickle.load(f)
        images = data['data']
        images = np.reshape(images, (10000, 3, 32, 32))
        images_resize = mx.nd.zeros(shape=(10000, 3, 224, 224))
        imagearray = mx.nd.array(images)

        for i, img in enumerate(imagearray):
            # move channels last for mx.image.imresize (H == W here, so (2,1,0) works)
            img = mx.nd.transpose(img, (2, 1, 0))
            images_rs = mx.image.imresize(img, 224, 224)
            # back to channels-first for the network
            images_resize[i] = mx.nd.transpose(images_rs, (2, 1, 0))

        labels = data['labels']
        labelarray = mx.nd.array(labels)

        return images_resize, labelarray

But after I resize them, I run into this error:
mxnet_source/dmlc-core/include/dmlc/./logging.h:308: [16:20:13] include/mxnet/././tensor_blob.h:275: Check failed: this->shape_.Size() == shape.Size() (4515840000 vs. 220872704) TBlob.get_with_shape: new and old shape do not match total elements

If I do not resize the images, everything works fine. I also checked the memory usage and it looks like a memory leak: my computer has 64GB of memory and the program uses all of it.
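For scale, here is a rough sketch of the memory that the resized training set alone needs, assuming all 50,000 CIFAR-10 training images are kept in RAM as float32 NDArrays (the default dtype); the numbers are only back-of-the-envelope arithmetic, not a measurement of this script:

bytes_per_float = 4
n_images = 50000

# holding the whole training set in memory at once
original = n_images * 3 * 32 * 32 * bytes_per_float      # ~0.6 GB at 32x32
resized = n_images * 3 * 224 * 224 * bytes_per_float     # ~30 GB at 224x224

print("32x32  : %.1f GB" % (original / 1e9))
print("224x224: %.1f GB" % (resized / 1e9))

# Note: each mx.nd.concatenate below allocates a new array, so while the
# dataset is being assembled several partial copies may be alive at once.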

The rest of my script is:

import mxnet as mx
from mxnet import nd, autograd, gluon
import numpy as np
import time
import Utils
import logging
import mobilenet

def ReadTrainSets(path):
    # Load all five CIFAR-10 training batches and stack them into one NDArray.
    train_data = []
    train_label = []

    data_pre = 'data_batch_'
    for i in range(1,6):
        file_name = data_pre+str(i)
        data_batch, label_batch = Utils.ExtractImageAndLabel(path, file_name)
        if not len(train_data):
            train_data = data_batch
            train_label = label_batch

        else:
            train_data = mx.nd.concatenate([train_data,data_batch])
            train_label = mx.nd.concatenate([train_label,label_batch])

    return train_data, train_label

path = './cifar-10-batches-py/'
batch = 128
ctx =(mx.gpu(2),mx.gpu(3))
train_data, train_label=ReadTrainSets(path)

print(train_data.shape)
print(train_label.shape)

train_iter = mx.io.NDArrayIter(data=train_data,label=train_label,batch_size=batch,shuffle=True)

net = gluon.model_zoo.vision.get_mobilenet(pretrained=False,ctx=ctx, multiplier = 1,classes=10)
data=mx.sym.Variable('data')
output = net(data)
softmax = mx.symbol.SoftmaxOutput(data=output, name='softmax')
#softmax = mobilenet.get_symbol(10)

logging.basicConfig(level=logging.INFO)

mod = mx.mod.Module(symbol=softmax, context=ctx)

print(mod.symbol.get_internals())

mod.bind(data_shapes=train_iter.provide_data, label_shapes=train_iter.provide_label)
mod.init_params(initializer=mx.init.Xavier(magnitude=2.))

mod.fit(train_iter, optimizer_params={'learning_rate':0.1, 'momentum':0.9}, num_epoch=100)
mod.save_checkpoint("mobilenet", epoch=100)

Can anyone help?

@chinakook
Contributor

Use Tiny ImageNet instead of CIFAR.

@HaoLiuHust
Contributor Author

I tried to use ImageRecordIter and it works fine, so maybe it is a bug in NDArray.
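For reference, a minimal sketch of that workaround: ImageRecordIter decodes and resizes each batch on the fly instead of materializing the whole 224x224 dataset in memory. It assumes a RecordIO file has already been packed with the im2rec tool; the file name cifar_train.rec is only illustrative.

import mxnet as mx

# Resize per batch at load time rather than preloading resized NDArrays.
train_iter = mx.io.ImageRecordIter(
    path_imgrec='cifar_train.rec',  # assumed path, created with tools/im2rec.py
    data_shape=(3, 224, 224),       # shape fed to the network
    resize=224,                     # resize the shorter edge before cropping
    batch_size=128,
    shuffle=True)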

@szha
Member

szha commented Jan 23, 2018

@apache/mxnet-committers: This issue has been inactive for the past 90 days. It has no label and needs triage.

For general "how-to" questions, our user forum (and Chinese version) is a good place to get help.

@lanking520
Member

@sandeep-krishnamurthy please add the Python, Bug, NDArray, and Memory labels to this issue.

@vandanavk
Contributor

@HaoLiuHust are you still seeing this issue?

With the latest code, I don't see the error "Check failed: this->shape_.Size() == shape.Size() (4515840000 vs. 220872704) TBlob.get_with_shape: new and old shape do not match total elements".

Could you share the system configuration on which you see this issue?

@HaoLiuHust
Contributor Author

@vandanavk It has been a long time... I have lost the program, so maybe it is OK now.

@sandeep-krishnamurthy
Contributor

Closing the issue as it is not reproducible. Please reopen if the issue still persists.
