
Retrained Image model doesnt work #10

Closed
iiidefektiii opened this issue Jul 27, 2018 · 19 comments

Comments

@iiidefektiii

I went through the process of creating my own model, and everything appeared to work. It output the .pb file, but the model doesn't work when I run it. One thing I noticed is that your .bytes file looks like this:

0a36 0a05 696e 7075 7412 0b50 6c61 6365
686f 6c64 6572 2206 2f63 7075 3a30 2a0b
0a05 6474 7970 6512 0230 012a 0b0a 0573
6861 7065 1202 3a00 0ad2 a602 0a09 636f
6e76 3264 305f 7712 0543 6f6e 7374 2206
2f63 7075 3a30 2a0b 0a05 6474 7970 6512
0230 012a a7a6 020a 0576 616c 7565 129c

while the one that retrain.py output for me looks like this all the way through (the squares are symbols that wouldn't display here):
O
�Placeholder��Placeholder
�dtype��0�
&
�shape��:����ÿÿÿÿÿÿÿÿÿ����«����«�����
å�
(module/InceptionV3/Conv2d_1a_3x3/weights��Const¤�
�value�š�B—�������������������� "€�Û�뾧�
½ áI»I!ʽù(
½�<F=ÞŒm>K÷¥>á

Any ideas on how to get it into the format that you have? I'm thinking this may be the problem.
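One possibility worth ruling out is that the two listings are the same kind of binary data shown by different viewers: the first as a hex dump, the second as raw bytes rendered as text. The sketch below is only an illustration of that idea; it decodes the first two rows of the hex dump quoted above back into raw bytes. The helper name `hex_dump_to_bytes` is made up for this example and is not part of any tool mentioned in the thread.

```python
# First two rows of the hex dump quoted above (pairs of bytes, space separated).
HEX_DUMP = """
0a36 0a05 696e 7075 7412 0b50 6c61 6365
686f 6c64 6572 2206 2f63 7075 3a30 2a0b
"""


def hex_dump_to_bytes(dump: str) -> bytes:
    """Turn a whitespace-separated hex dump back into the raw bytes it displays.

    Hypothetical helper for illustration only.
    """
    # Remove all spaces and newlines, then decode the remaining hex digits.
    return bytes.fromhex("".join(dump.split()))


raw = hex_dump_to_bytes(HEX_DUMP)

# The readable strings ("input", "Placeholder", "/cpu:0") sit between
# non-printable length and tag bytes, which a text editor shows as symbols.
print(raw)
```

If the decoded bytes contain readable names next to unprintable characters, the hex listing and the "symbols" listing are likely two views of the same serialized format, and the difference in appearance alone would not explain why a model fails to run.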

@iiidefektiii
Author

I did run the freeze graph step, and it output a new frozen file that now looks like the one in your project, but it detects nothing. Any ideas, or can you point me in the right direction?

@Syn-McJ
Owner

Syn-McJ commented Jul 28, 2018

Could you send me your model and labels to check? Also, since your model is Inception you might have different Input and Output names.

@Syn-McJ
Owner

Syn-McJ commented Jul 30, 2018

It seems like the problem might be in mismatching versions of TensorFlow you used to train the model and the one that is used in the Unity plugin. You can check this issue for details: #6

@iiidefektiii
Author

iiidefektiii commented Jul 30, 2018 via email

@iiidefektiii
Author

iiidefektiii commented Jul 30, 2018 via email

@iiidefektiii
Author

iiidefektiii commented Jul 30, 2018

I've also figured out my input and output names (Placeholder, final_result) through TensorBoard, and the input has a size of {"size":299},{"size":299},{"size":3}]}}

I am guessing the 299 goes in the "classifyImageSize" or "detectImageSize" variable in PhoneCamera.cs, depending on which you are using? I ask because I cannot find private static int INPUT_SIZE anymore.

That leaves me with IMAGE_MEAN and IMAGE_STD. What are those?

Your readme also says that you didn't know whether you used 1.4 or 1.5. I cannot get the retrain scripts to work for 1.4, only 1.5, and so far it hasn't worked on the Unity end. Any ideas? Maybe send me the retrain.py you were using for 1.4?

Thanks!

@Syn-McJ
Owner

Syn-McJ commented Jul 31, 2018

Yes, the input size goes in the classifyImageSize variable. For mean and std you might want to try 128, although the label_image.py script uses input_mean = 0 and input_std = 255, so try that as well.
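To make the two suggested settings concrete, here is a minimal sketch of the usual normalization, `(pixel - mean) / std`, which is what label_image.py applies to its input. The function name `normalize` and the sample pixel values are assumptions for illustration; the Unity project's own transform code may differ in detail.

```python
def normalize(pixels, mean, std):
    """Scale raw 0-255 channel values as (pixel - mean) / std.

    Illustrative helper; not taken from the plugin or the example project.
    """
    return [(p - mean) / std for p in pixels]


sample = [0, 128, 255]  # darkest, middle, and brightest channel values

# mean = 128, std = 128 maps 0..255 to roughly -1..1
print(normalize(sample, mean=128, std=128))

# mean = 0, std = 255 (the label_image.py defaults) maps 0..255 to 0..1
print(normalize(sample, mean=0, std=255))
```

Which range a model expects depends on how it was trained, so a mismatch here tends to show up as confident but wrong predictions, not as a crash.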

If 1.5 doesn't work in Unity, then the plugin probably doesn't support it yet, so you're gonna have to use 1.4. To be sure, though, it's better to check exactly what the problem is with adb logcat.

I see in the tensorflow repo that they changed the retrain.py script in version 1.7, so you should try to retrain using the script from the 1.4 release branch: https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/image_retraining/retrain.py

@Syn-McJ
Owner

Syn-McJ commented Jul 31, 2018

Also, I can see here that the latest version of the plugin (0.4) seems to use TensorFlow 1.7.1, so you can try installing that version (my readme links to 0.3). Be aware that there is a migration guide.

@iiidefektiii
Author

Getting closer. I used 1.4 and the 1.4 retrain script, and it retrained on the images; the output looks like yours. There were no checkpoints with this version, just output_graph and output_labels. I still can't get it to run in Unity, though. I ran adb logcat and now get this over and over until the app crashes. It may be the values. I am trying to get them, but this time TensorBoard won't run; it just says there are no active graphs when I know it's running. So I'm figuring that out now.

07-31 10:26:08.933 15494 15515 E Unity : [[Node: DecodeJpeg = DecodeJpegacceptable_fraction=1, channels=3, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false]]
07-31 10:26:08.933 15494 15515 E Unity : at TensorFlow.TFStatus.CheckMaybeRaise (TensorFlow.TFStatus incomingStatus, System.Boolean last) [0x0004a] in <252020d87a4e4581ad2cfe3f9cc7a0ac>:0
07-31 10:26:08.933 15494 15515 E Unity : at TensorFlow.TFSession.Run (TensorFlow.TFOutput[] inputs, TensorFlow.TFTensor[] inputValues, TensorFlow.TFOutput[] outputs, TensorFlow.TFOperation[] targetOpers, TensorFlow.TFBuffer runMetadata, TensorFlow.TFBuffer runOptions, TensorFlow.TFStatus status) [0x00144] in <252020d87a4e4581ad2cfe3f9cc7a0ac>:0
07-31 10:26:08.933 15494 15515 E Unity : at TensorFlow.TFSession+Runner.Run (TensorFlow.TFStatus status) [0x00033] in <252020d87a4e4581ad2cfe3f9cc7a0ac>:0
07-31 10:26:08.933 15494 15515 E Unity : at TFClassify.Classifier+c__AnonStorey0.<>m__0 () [0x00084] in <4bbe071d8c97431dad031894333c811e>:0

@Syn-McJ
Owner

Syn-McJ commented Jul 31, 2018

The DecodeJpeg operation seems suspicious to me. It isn't supported on mobile, and that's why I have the TransformInput method for transforming the image. Maybe tomorrow I'll check the example you sent me and see whether any problems arise from it.

@iiidefektiii
Author

I got rid of that error. Not sure what it was, but I rebuilt and now I get a null reference exception. I need to find the input mean, input std, size, and input/output names, but with the 1.4 version .pb file I'm told there is no event data for the graph, which is weird. I am going to upgrade and migrate to 1.7 and see if it works. I'm running out of ideas.

I can send you a zip of the project if you want. All I've done at this point is retrain on the flower images from the example and put them in the resources folder.

@Syn-McJ
Owner

Syn-McJ commented Jul 31, 2018

Sure, send it. I'll try to check tomorrow, but no promises. Definitely will check before next week.

@iiidefektiii
Author

If not no worries. Pulling my hair out trying to figure out why it won't run. haha

@iiidefektiii
Author

iiidefektiii commented Jul 31, 2018

@Syn-McJ
Owner

Syn-McJ commented Aug 3, 2018

Hi @iiidefektiii,

I checked your model in my project and I still see the DecodeJpeg error. This is definitely a problem: the DecodeJpeg operation simply isn't supported on mobile.

I think the issue might be the Inception model architecture, which uses that operation. I only tried my example with the Mobilenet architecture, and it would make sense if it doesn't have the DecodeJpeg operation, since it was created specifically to work on mobile platforms.
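A rough way to check a model file for this operation before deploying it is to search the file's bytes for the op name. This is only a heuristic sketch: it assumes the .pb or .bytes file is a serialized graph in which op names are stored as plain ASCII strings, and a match could also come from a node that merely has that text in its name. The function names and the example file name are made up for illustration.

```python
from pathlib import Path


def mentions_op(graph_bytes: bytes, op_name: str = "DecodeJpeg") -> bool:
    """Return True if the op name appears anywhere in the raw graph bytes.

    Crude substring check; it does not parse the graph.
    """
    return op_name.encode("ascii") in graph_bytes


def file_mentions_op(path, op_name: str = "DecodeJpeg") -> bool:
    """Read a model file from disk and apply the same substring check."""
    return mentions_op(Path(path).read_bytes(), op_name)


# Tiny stand-in for graph contents, just to show the call shape.
fake_graph = b"\n\x0aDecodeJpeg\x12\x0aDecodeJpeg"
print(mentions_op(fake_graph))        # substring is present
print(mentions_op(b"\n\x05input"))    # substring is absent
```

For a real file the call would look like `file_mentions_op("output_graph.pb")`, where the file name is whatever the retraining run produced. A positive result is a hint to inspect the graph more closely, not proof that the op will be executed.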

So please try to train a Mobilenet model and check it with my example again. To be sure, use the 1.4 version of both TensorFlow and the script.

You can train Mobilenet model by specifying architecture with a flag --architecture mobilenet_1.0_224, for example: python retrain.py --image_dir ~/flower_photos --architecture mobilenet_1.0_224

Let me know how it goes.

@Syn-McJ
Owner

Syn-McJ commented Aug 12, 2018

Hi @iiidefektiii, I'm gonna close this issue for now. Feel free to reopen if you try the Mobilenet model and still have problems.

@Syn-McJ Syn-McJ closed this as completed Aug 12, 2018
@iiidefektiii
Author

iiidefektiii commented Aug 12, 2018 via email

@iiidefektiii
Author

I deleted the project and started from scratch.
Verified TensorFlow 1.4 was installed.
Verified that TensorFlowSharp 0.3 was in the project
Retrained using the mobilenet_1.0_224 architecture
Verified that the classify image size was 224 in TensorBoard
Verified that input/output was input/final_result in Tensorboard
Changed the .pb to bytes and dropped it and the labels file into the project
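The last step in the list above, changing the .pb to .bytes, can be sketched as a small copy helper. This assumes the goal is simply a byte-for-byte copy with a .bytes extension so that Unity imports it as a binary asset; the function name and the folder names in the usage note are placeholders, not paths from the example project.

```python
import shutil
from pathlib import Path


def copy_as_bytes_asset(pb_path, resources_dir):
    """Copy a .pb file into resources_dir under the same name with a .bytes extension.

    Hypothetical helper; the contents are copied unchanged.
    """
    pb_path = Path(pb_path)
    target = Path(resources_dir) / (pb_path.stem + ".bytes")
    # Create the destination folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(pb_path, target)
    return target
```

A call such as `copy_as_bytes_asset("output_graph.pb", "Assets/Resources")` would produce `Assets/Resources/output_graph.bytes`. Only the extension changes, so the model itself is not converted in any way.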

Everything works now. Not sure where it was getting stuck but knowing that you used mobilenet_1.0_224 architecture was probably a HUGE help.

Thanks for the help.

@Syn-McJ
Owner

Syn-McJ commented Aug 14, 2018

Hi @iiidefektiii , that's awesome, glad you fixed it. I should probably check inception model again to confirm that it won't work and update the readme.
