Illegal instruction on training image similarity model from example #20

taflahi · 2017-12-09T11:04:59Z

I am getting an "Illegal instruction" error while training.

I am using Ubuntu 16.04 on Windows 10 WSL.

The code is as follows:

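import turicreate as tc

# Load the Caltech-101 images and add a row-number column to use as an id
reference_data = tc.image_analysis.load_images('./101_ObjectCategories')
reference_data = reference_data.add_row_number()
reference_data.save('caltech-101.sframe')

# Train the image similarity model (the crash happens here) and save it
model = tc.image_similarity.create(reference_data)
model.save('img_sim_model.model')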


srikris · 2017-12-09T17:41:47Z

Can you share some more information about your system for us to be able to debug this issue?

Are you on an x86_64 machine?
How much memory do you have on this machine? I suspect that the default batch size (set to 512) may be too large for the available memory.

I have opened another issue #26 to track the issue related to memory.

srikris · 2017-12-09T17:45:30Z

If your machine is not x86_64, we do not support it. I've opened #27 to make sure future users get an error earlier in that case.

taflahi · 2017-12-09T22:43:23Z

Yes, I am using x86_64, but the machine has only a modest amount of memory (8 GB). I also suspect the batch size, since the tutorial does not state the number explicitly.

I have tried a much smaller dataset (<100 images) and it works fine.

znation · 2017-12-13T02:06:29Z

Assigning to you @gustavla - but if this turns out to be a WSL-specific issue feel free to assign to me. Thanks!

francisbitontistudio · 2017-12-17T23:45:30Z

I have a similar problem. I'm using Paperspace Ubuntu 16.04 ML-in-a-Box; it is x86_64 with 30 GB of RAM. It happens no matter how large the batch size is. Any insights would be much appreciated!

gustavla · 2017-12-18T22:55:32Z

I have still not been able to reproduce this issue.

@francisbitontistudio Thanks for adding another sample point to this investigation. I also see that you are using the object detector, which means it is not limited to image similarity.

@taflahi @francisbitontistudio Can you please try the following?

>>> import mxnet as mx
>>> a = mx.nd.random_uniform(shape=(2, 3), ctx=mx.cpu())
>>> b = mx.nd.random_uniform(shape=(3, 2), ctx=mx.cpu())
>>> mx.nd.dot(a, b).asnumpy()

Can you also try this but replace mx.cpu() with mx.gpu() in case your machines have GPUs?
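
Because an illegal instruction kills the Python process outright, it can be convenient to run the check above in a child interpreter and inspect how it exited. The following is a minimal sketch, not part of Turi Create or MXNet; the helper name `run_isolated` is hypothetical, and it assumes a POSIX system, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys

# The same MXNet check as above, wrapped so it can run in a child interpreter.
MXNET_CHECK = """
import mxnet as mx
a = mx.nd.random_uniform(shape=(2, 3), ctx=mx.cpu())
b = mx.nd.random_uniform(shape=(3, 2), ctx=mx.cpu())
print(mx.nd.dot(a, b).asnumpy())
"""


def run_isolated(code):
    """Run `code` in a fresh Python process and describe how it ended.

    `run_isolated` is a hypothetical helper written for this thread.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode == 0:
        return "ok"
    # On POSIX, a negative return code is the number of the fatal signal.
    if result.returncode < 0:
        sig = -result.returncode
        if sig == signal.SIGILL:
            return "illegal instruction (SIGILL)"
        return "killed by signal %d" % sig
    return "failed with exit code %d" % result.returncode


if __name__ == "__main__":
    print(run_isolated(MXNET_CHECK))
```

If this reports an illegal instruction for the CPU check, the crash is in MXNet's CPU code path rather than in Turi Create itself.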

francisbitontistudio · 2017-12-19T00:05:56Z

@gustavla Thanks for the reply. With mx.gpu() the script works, but I get the same problem when using mx.cpu().

gustavla · 2017-12-19T06:20:38Z

@francisbitontistudio Thanks, that is really useful! After further investigation, it turns out that the MXNet binaries for the 0.11.0 release were compiled with x86_64 extensions not available on all hardware. This seems to have been fixed in the 0.12.0 release, which unfortunately is not yet supported by Turi Create. We are working on this and consider it high priority (#17).
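
To see which instruction-set extensions a Linux machine reports, one option is to read the `flags` line of `/proc/cpuinfo`. The sketch below assumes a Linux or WSL system. The thread does not say which extensions the 0.11.0 binaries require, so the default list (`sse4_2`, `avx`, `avx2`) is only an illustrative guess, and both function names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_extensions(cpuinfo_text, wanted=("sse4_2", "avx", "avx2")):
    """List the extensions in `wanted` that the CPU does not report.

    The default `wanted` tuple is an assumption for illustration only.
    """
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("missing:", missing_extensions(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

An empty list does not prove the binaries will run, since the required extensions are not confirmed here, but a missing flag is a strong hint.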

Until then, I really wish I had a better work-around, but the only sure way to fix it is to compile MXNet yourself from source. This could get tricky (instructions), so I will let you know if I think of a better work-around. Sorry about this and thanks again for helping track this down.

gustavla · 2017-12-19T17:55:41Z

I may have found a work-around, although proceed with caution. MXNet puts up what looks to be nightly builds. I tried one of the 0.11 ones and this seems to resolve this issue. I do not know much about these versions, so I can't guarantee everything else will work correctly. Turi Create will also still complain that you have the wrong MXNet version.

pip install -U mxnet==0.11.1b20170822

Another thing you can do is install version 0.12.1. I am actually working on support for that version right now and the only issue seems to be in the Activity Classifier. So, if you are not using that model, then it should work and you can ignore the warning. This is of course only a temporary work-around until official support has been added.
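
After switching versions, it can help to confirm which MXNet release is actually installed. The sketch below compares a version string against the 0.11.0 release named earlier in this thread. The helper names are hypothetical, the parsing keeps only the leading numeric part (so pre-releases of 0.11.0 would also match), and reading `mx.__version__` assumes MXNet is importable.

```python
import re


def release_tuple(version):
    """Extract the numeric release, e.g. '0.11.1b20170822' -> (0, 11, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing third component (as in '0.12') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_affected_release(version):
    """True only for the 0.11.0 release discussed in this thread."""
    return release_tuple(version) == (0, 11, 0)


if __name__ == "__main__":
    try:
        import mxnet as mx
        print(mx.__version__, "affected:", is_affected_release(mx.__version__))
    except ImportError:
        print("mxnet is not installed")
```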

francisbitontistudio · 2017-12-21T21:14:28Z

@gustavla using 0.12.1 works for me. Thanks a lot!

gustavla · 2018-01-08T16:20:07Z

Since MXNet support will be extended beyond 0.11 in the next release (#129, #164), I'm closing this issue.


srikris added the need user repro label Dec 9, 2017

znation assigned gustavla Dec 13, 2017

znation added the bug label Dec 13, 2017

gustavla removed the need user repro label Dec 19, 2017

gustavla closed this as completed Jan 8, 2018

shantanuchhabra pushed a commit to shantanuchhabra/turicreate that referenced this issue Mar 20, 2018

Merge pull request apple#20 from TobyRoseman/eval-metric-consistency

828d799

Models Store Train/Validation Accuracy/RMSE in Consistent Way
