This repository has been archived by the owner on Dec 21, 2023. It is now read-only.

Illegal instruction on training image similarity model from example #20

Closed
taflahi opened this issue Dec 9, 2017 · 11 comments

@taflahi

taflahi commented Dec 9, 2017

I am getting this error while training:

snip

I am using Ubuntu 16.04 on Windows 10 WSL.

The code is as follows:

import turicreate as tc

reference_data = tc.image_analysis.load_images('./101_ObjectCategories')
reference_data = reference_data.add_row_number()

reference_data.save('caltech-101.sframe')

model = tc.image_similarity.create(reference_data)

model.save('img_sim_model.model')
@srikris
Contributor

srikris commented Dec 9, 2017

Can you share some more information about your system for us to be able to debug this issue?

  • Are you on an x86_64 machine?
  • How much memory do you have on this machine? I have a suspicion that the default batch size (set to 512)

I have opened another issue #26 to track the issue related to memory.

@srikris
Contributor

srikris commented Dec 9, 2017

If your machine is not x86_64, we don't support it. I've opened #27 to make sure future users get an error earlier.

@taflahi
Author

taflahi commented Dec 9, 2017

Yes, I am using x86_64, but the machine has only modest memory (8 GB). I also suspect something like the batch size, since the tutorial does not explicitly state the number.

I have tried a much smaller dataset (<100 images) and it works just fine.

@znation
Contributor

znation commented Dec 13, 2017

Assigning to you @gustavla - but if this turns out to be a WSL-specific issue feel free to assign to me. Thanks!

@francisbitontistudio

francisbitontistudio commented Dec 17, 2017

I have a similar problem. I'm using Paperspace Ubuntu 16.04 ML-in-a-box; it is x86_64 with 30 GB RAM. It happens no matter how large the batch size is. Any insights would be much appreciated!
[image attached]

@gustavla
Collaborator

I have still not been able to reproduce this issue.

@francisbitontistudio Thanks for adding another sample point to this investigation. I also see that you are using the object detector, which means it is not limited to image similarity.

@taflahi @francisbitontistudio Can you please try the following?

>>> import mxnet as mx
>>> a = mx.nd.random_uniform(shape=(2, 3), ctx=mx.cpu())
>>> b = mx.nd.random_uniform(shape=(3, 2), ctx=mx.cpu())
>>> mx.nd.dot(a, b).asnumpy()

Can you also try this but replace mx.cpu() with mx.gpu() in case your machines have GPUs?
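
Because an illegal instruction kills the whole Python process, it can be easier to run the check above in a child process and look at how it exited. The following is only a sketch: `describe_exit` and `run_isolated` are hypothetical helper names (not part of MXNet or Turi Create), and it assumes Python 3.5+ on a POSIX-like system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys

# The MXNet snippet from this thread, kept as a string so that it can be
# executed in a separate interpreter (assumes mxnet is installed there).
MXNET_CHECK = """
import mxnet as mx
a = mx.nd.random_uniform(shape=(2, 3), ctx=mx.cpu())
b = mx.nd.random_uniform(shape=(3, 2), ctx=mx.cpu())
print(mx.nd.dot(a, b).asnumpy())
"""


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code is the number of the fatal signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with status %d" % returncode


def run_isolated(code):
    """Run `code` in a fresh Python process and describe how it ended."""
    result = subprocess.run([sys.executable, "-c", code])
    return describe_exit(result.returncode)
```

Calling `run_isolated(MXNET_CHECK)` should then report "killed by SIGILL" on an affected machine instead of crashing the calling session.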

@francisbitontistudio

@gustavla Thanks for the reply. With mx.gpu() it works, but I get the same problem when using the CPU for this script.

[two images attached]

@gustavla
Collaborator

@francisbitontistudio Thanks, that is really useful! After further investigation, it turns out that the MXNet binaries for the 0.11.0 release were compiled with x86_64 extensions not available on all hardware. This seems to have been fixed in the 0.12.0 release, which unfortunately is not yet supported by Turi Create. We are working on this and consider it high priority (#17).

Until then, I really wish I had a better work-around, but the only sure way to fix it is to compile MXNet yourself from source. This could get tricky (instructions), so I will let you know if I think of a better work-around. Sorry about this and thanks again for helping track this down.
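
To see which instruction-set extensions a given Linux (or WSL) machine reports, the CPU flags can be read from `/proc/cpuinfo`. This is a rough sketch: the helper names are hypothetical, and the default list of extensions to look for is an assumption, since this thread does not say which extensions the 0.11.0 binaries actually require.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough; every core repeats the same list.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(flags, wanted=("sse4_2", "avx", "avx2", "fma")):
    """List the wanted extensions (an assumed, illustrative set) not in flags."""
    return [name for name in wanted if name not in flags]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read and parse the flags from the given file (Linux/WSL only)."""
    with open(path) as handle:
        return parse_cpu_flags(handle.read())
```

For example, `missing_extensions(read_cpu_flags())` returns the names from the assumed list that the CPU does not advertise; an empty list would suggest the problem lies elsewhere.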

@gustavla
Collaborator

I may have found a work-around, although proceed with caution. MXNet puts up what looks to be nightly builds. I tried one of the 0.11 ones and this seems to resolve this issue. I do not know much about these versions, so I can't guarantee everything else will work correctly. Turi Create will also still complain that you have the wrong MXNet version.

pip install -U mxnet==0.11.1b20170822

Another thing you can do is install version 0.12.1. I am actually working on support for that version right now and the only issue seems to be in the Activity Classifier. So, if you are not using that model, then it should work and you can ignore the warning. This is of course only a temporary work-around until official support has been added.
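
The version advice above can be expressed as a quick check. This is a sketch only: `release_tuple` and `likely_affected` are hypothetical helper names, and the rule they encode (flag only the 0.11.0 release) reflects what has been reported in this thread, not an official compatibility list.

```python
import re


def release_tuple(version):
    """Extract the leading (major, minor, patch) numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part) for part in match.groups())


def likely_affected(version):
    """True for the 0.11.0 release, the only one reported as affected here."""
    return release_tuple(version) == (0, 11, 0)
```

With MXNet installed, the string to pass in is available as `mx.__version__` after `import mxnet as mx`.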

@francisbitontistudio

@gustavla using 0.12.1 works for me. Thanks a lot!

@gustavla
Collaborator

gustavla commented Jan 8, 2018

Since MXNet support will be extended beyond 0.11 in the next release (#129, #164), I'm closing this issue.

@gustavla gustavla closed this as completed Jan 8, 2018
shantanuchhabra pushed a commit to shantanuchhabra/turicreate that referenced this issue Mar 20, 2018
Models Store Train/Validation Accuracy/RMSE in Consistent Way