chinese font problem #8

Closed
DLUTfangping opened this issue May 7, 2018 · 10 comments

@DLUTfangping

Some Chinese fonts do not generate good samples (for example, some characters are not rendered at all). Do you have any suggestions for solving this problem? Thank you in advance.

@Belval
Owner

Belval commented May 7, 2018

Hi!

Please include the Chinese words you were unable to print.

Thanks

@DLUTfangping
Author

Thank you for replying!
If I use the font that you provided, I can generate good samples, but when I change the font (e.g. use another .ttf file), the results may not be as good.

@Belval
Owner

Belval commented May 7, 2018

That would be because the font you used does not support all characters. I'll try to provide more fonts in the future.

In the meantime, I know https://github.com/JarveeLee/SynthText_Chinese_version/tree/master/data/fonts has a lot of choices, but I cannot add them to this project over copyright infringement concerns.
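
One way to confirm that a font is missing glyphs, before generating a whole dataset, is to compare the text against the font's character map. The following is only a sketch: it assumes the third-party `fontTools` package is installed, and the helper names `load_supported_codepoints` and `missing_characters` and the font path are hypothetical, not part of this project.

```python
from typing import Iterable, List, Set


def load_supported_codepoints(font_path: str) -> Set[int]:
    """Return the Unicode code points that a .ttf/.otf file maps to glyphs."""
    # fontTools is a third-party package (pip install fonttools). It is
    # imported here so that missing_characters() works without it.
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    try:
        # getBestCmap() returns {code point: glyph name}, or None if the
        # font has no Unicode character map.
        cmap = font.getBestCmap() or {}
    finally:
        font.close()
    return set(cmap.keys())


def missing_characters(text: str, supported: Iterable[int]) -> List[str]:
    """List the distinct non-space characters in `text` that are not covered."""
    supported_set = set(supported)
    missing: List[str] = []
    for ch in text:
        if ch.isspace():
            continue
        if ord(ch) not in supported_set and ch not in missing:
            missing.append(ch)
    return missing


# Example usage (the path is a placeholder, not a file shipped with the project):
# supported = load_supported_codepoints("fonts/my_font.ttf")
# print(missing_characters("你好 world", supported))
```

Strings for which `missing_characters` returns a non-empty list could be skipped for that font, or the font could be dropped from the pool. Note that a `.ttc` collection would need a `fontNumber` argument when opened with `TTFont`.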

@DLUTfangping
Author

Thank you very much!!!

@Belval Belval closed this as completed Jun 4, 2018
@liangshuang1993

@DLUTfangping, have you tried training on a dataset generated by this tool? I'm training a CRNN on one, but the results are not good.

@Belval
Owner

Belval commented Jun 20, 2018

@liangshuang1993 I can't speak for Chinese, but I got decent results in English when using lowercase only (lowercase plus uppercase was a challenge). Also, while I don't know which implementation of CRNN you used, mine takes a long time to train (50+ hours on a GTX 1080 Ti), so it's quite normal for the initial performance to be very poor.

@liangshuang1993

Hi @Belval, thanks for your answer.

Maybe I generated the training data incorrectly.

First, I generated one dataset with a word length of 5, using Gaussian noise as the background. Performance is good on the training and validation datasets, but bad on some real pictures.

Then I generated another dataset with a word length of 8, using the provided pictures as backgrounds, and trained the CRNN on the combined dataset (dataset 1 and dataset 2). I have trained for 7000 epochs, but the training accuracy is still 0.5590. The strange thing is that when I ran some tests on the training data, I found it can barely recognize the words. Does this mean all the data must have the same length? Thanks a lot.

By the way, each dataset has 500,000 pictures containing English, Chinese, and digits (they may appear in the same picture).
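
When two datasets with different label lengths are mixed like this, breaking the accuracy down by label length can show whether one of them is dragging the overall number down. The sketch below is a generic diagnostic, not something taken from this project or from any particular CRNN implementation; `accuracy_by_label_length` is a hypothetical helper, and "length" defaults to the character count unless another measure is passed in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def accuracy_by_label_length(
    pairs: Iterable[Tuple[str, str]],
    length_of: Callable[[str], int] = len,
) -> Dict[int, float]:
    """Exact-match accuracy grouped by the length of the ground-truth label.

    `pairs` yields (ground_truth, prediction) strings. `length_of` decides
    what "length" means: characters by default, or for example
    `lambda s: len(s.split())` for whitespace-separated words.
    """
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for truth, predicted in pairs:
        n = length_of(truth)
        total[n] += 1
        if truth == predicted:
            correct[n] += 1
    return {n: correct[n] / total[n] for n in sorted(total)}


# Toy data, for illustration only (not results from the thread):
example = [
    ("hello", "hello"),
    ("world", "w0rld"),
    ("training", "training"),
    ("datasets", "datasets"),
]
print(accuracy_by_label_length(example))
```

If one length bucket scores far below the other, that points at the corresponding dataset (or at the model's handling of that length) rather than at the training setup as a whole.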

@Belval
Owner

Belval commented Jun 20, 2018

@liangshuang1993 Learning both English and Chinese is indeed a rather ambitious idea. Most implementations I know of only handle one.

But yes, the same word count is required. The original author even went as far as only recognizing single words instead of multi-word sentences.
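
If samples with different word counts have already been generated, one way to follow this advice is to bucket them by word count and train on a single bucket at a time. This is a minimal sketch under stated assumptions: samples are `(image path, label)` tuples, words are separated by whitespace (which will not hold for unsegmented Chinese text), and `split_by_word_count` is a hypothetical helper rather than part of this project.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed sample layout: (image path, label text).
Sample = Tuple[str, str]


def split_by_word_count(samples: Iterable[Sample]) -> Dict[int, List[Sample]]:
    """Group samples so that every bucket holds labels with the same word count."""
    buckets: Dict[int, List[Sample]] = defaultdict(list)
    for image_path, label in samples:
        # str.split() with no argument splits on any run of whitespace.
        buckets[len(label.split())].append((image_path, label))
    return dict(buckets)


# Placeholder file names, for illustration only:
samples = [
    ("a.jpg", "one two"),
    ("b.jpg", "three"),
    ("c.jpg", "four five"),
]
for count, bucket in sorted(split_by_word_count(samples).items()):
    print(count, [label for _, label in bucket])
```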

@liangshuang1993

OK. Thanks a lot!!

@yts19871111

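Hi @Belval, thanks for your answer.

Maybe I generated the training data incorrectly.

First, I generated one dataset with a word length of 5, using Gaussian noise as the background. Performance is good on the training and validation datasets, but bad on some real pictures.

Then I generated another dataset with a word length of 8, using the provided pictures as backgrounds, and trained the CRNN on the combined dataset (dataset 1 and dataset 2). I have trained for 7000 epochs, but the training accuracy is still 0.5590. The strange thing is that when I ran some tests on the training data, I found it can barely recognize the words. Does this mean all the data must have the same length? Thanks a lot.

By the way, each dataset has 500,000 pictures containing English, Chinese, and digits (they may appear in the same picture).

Hello, have you solved this problem? I have run into it too: the results on real samples are very poor.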
