
problem in the output #31

Closed

rmmal opened this issue Sep 11, 2017 · 6 comments

Comments

@rmmal

rmmal commented Sep 11, 2017

I have trained a model with this command:

python multigpu_train.py --gpu_list=0,1,2 --input_size=512 --batch_size_per_gpu=14 --checkpoint_path=/backup/EAST/
--text_scale=1024 --training_data_path=/DATA/EAST/data/ --geometry=RBOX --learning_rate=0.0001 --num_readers=12

and I waited until:

Step 007130, model loss 0.0316, total loss 0.0827, 7.33 seconds/step, 5.73 examples/second

First question: should I run more iterations, or is this enough?

Second question:
The output for all the images seems to be a single size. Why is this happening?
I couldn't see much variation in the output dimensions.

examples:
[Four screenshots of example detection outputs, captured 2017-09-11]

So what is missing to be able to detect blocks of text?

@argman
Owner

argman commented Sep 11, 2017

@rmmal, thanks for trying EAST, this is interesting.
I think you can get a reasonable result by training until 7000+ iterations.
Can you provide a sample annotation of your data?

In #18, the result seems nice.

@rmmal
Author

rmmal commented Sep 11, 2017

You're welcome, @argman. Great effort, btw.

As I said before, I had x1,y1 and x2,y2 (two opposite corners),
which I converted to x1,y1, x2,y1, x2,y2, x1,y2.
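Roughly, the conversion looks like the sketch below (a simplified illustration, not my actual script; boxes.csv and gt_img_1.txt are placeholder file names):

import csv

# Each input row is x1,y1,x2,y2,label; each output row is the 8-coordinate quad
# (clockwise from the top-left corner) followed by the label.
def box_to_quad(x1, y1, x2, y2):
    return [x1, y1, x2, y1, x2, y2, x1, y2]

with open("boxes.csv") as src, open("gt_img_1.txt", "w", newline="") as dst:
    writer = csv.writer(dst)
    for x1, y1, x2, y2, label in csv.reader(src):
        writer.writerow(box_to_quad(int(x1), int(y1), int(x2), int(y2)) + [label])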

[Six screenshots of annotated training images, captured 2017-09-11]

So, for example, the annotations of the last image are:

550,1667,833,1667,833,1737,550,1737,text
213,1679,493,1679,493,1731,213,1731,text
230,1741,802,1741,802,1777,230,1777,text
357,1330,687,1330,687,1458,357,1458,text
253,1492,790,1492,790,1539,253,1539,text
159,984,864,984,864,1093,159,1093,text
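(For context, lines like these can be read back into 4-point polygons and labels with something along the lines of the sketch below; this is only an illustration, not the repo's exact loader.)

import numpy as np

# Parse 8-coordinate annotation lines into (4, 2) corner arrays plus text labels.
def load_annotation(path):
    polys, labels = [], []
    with open(path) as f:
        for line in f:
            parts = line.strip().split(",")
            if len(parts) < 9:
                continue  # skip malformed lines
            polys.append(np.array(parts[:8], dtype=np.float32).reshape(4, 2))
            labels.append(",".join(parts[8:]))  # a label may itself contain commas
    return np.array(polys), labels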

What could be wrong? Any suggestions?

@rmmal
Author

rmmal commented Sep 11, 2017

Could setting text_scale=1024 and input_size=512 cause such a problem?

@argman
Owner

argman commented Sep 12, 2017

@rmmal, I think your problem is different. The definition of "text" may be confusing for the network to learn. In the paper, text is defined as a single line, and from your annotations I cannot tell what your standard for a text line is.
E.g., in the last image, why do RADO TRUE AUTOMATIC DIAMONDS and HIGH-TECH form a single block instead of two lines?
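To make that concrete, you need one annotation per visual text line rather than one per block. A purely hypothetical sketch (the even split below is only for illustration; real line boundaries should come from the actual layout):

# Hypothetical illustration only: split an axis-aligned block box into one quad
# per text line. n_lines and the even split are assumptions for the example.
def split_block_into_lines(x1, y1, x2, y2, n_lines):
    h = (y2 - y1) / n_lines
    return [
        [x1, y1 + i * h, x2, y1 + i * h,
         x2, y1 + (i + 1) * h, x1, y1 + (i + 1) * h]
        for i in range(n_lines)
    ]

# e.g. a two-line block such as 159,984,864,984,864,1093,159,1093 would become
# two quads, one per line, instead of a single block-level annotation.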

@rmmal
Author

rmmal commented Oct 16, 2017

I edited the dataset and have now started training again; I will see what happens and let you know.

Thanks for your help, @argman.

argman closed this as completed Dec 6, 2017
@ghost

ghost commented Nov 13, 2018

@rmmal any updates regarding your training & results?
