Chinese font problem #8

DLUTfangping · 2018-05-07T05:54:10Z

Some Chinese fonts do not generate good samples (for example, some words cannot be generated). Do you have any suggestions for solving this problem? Thank you in advance.

Belval · 2018-05-07T10:13:04Z

Hi!

Please include the Chinese words you were unable to print.

Thanks

DLUTfangping · 2018-05-07T10:49:41Z

Thank you for your reply!
If I use the font that you provided, I can generate good samples, but when I change the font (e.g. use another .ttf file), the results may not be as good.

Belval · 2018-05-07T11:18:44Z

That would be because the font you used does not support all characters. I'll try to provide more fonts in the future.

In the meantime, I know https://github.com/JarveeLee/SynthText_Chinese_version/tree/master/data/fonts has a lot of choices, but I cannot add them to this project because of copyright infringement concerns.
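To see which characters a given .ttf file is missing before generating samples, something along these lines may help. It is only a sketch, not part of this project: it assumes the third-party fontTools package is installed, and the function names and the font path in the usage note are made up for illustration.

```python
def supported_codepoints(font_path):
    """Return the set of Unicode code points the font maps to a glyph.

    Assumption: fontTools is installed (pip install fonttools).
    """
    from fontTools.ttLib import TTFont  # imported lazily; third-party

    font = TTFont(font_path)
    try:
        # getBestCmap() returns {code point: glyph name}, or None when the
        # font has no Unicode character map.
        cmap = font.getBestCmap() or {}
    finally:
        font.close()
    return set(cmap)


def missing_chars(text, codepoints):
    """Return the sorted, de-duplicated characters of `text` not in `codepoints`.

    Whitespace is ignored because fonts often have no glyph for it.
    """
    return sorted(
        {ch for ch in text if not ch.isspace() and ord(ch) not in codepoints}
    )
```

Usage would look like `missing_chars("你好 world", supported_codepoints("my_font.ttf"))`, where `my_font.ttf` is a placeholder. Any characters returned are ones that font cannot draw, so words containing them should be skipped or rendered with another font.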

DLUTfangping · 2018-05-07T11:21:51Z

Thank you very much !!!

liangshuang1993 · 2018-06-20T01:47:07Z

@DLUTfangping, have you tried training on a dataset generated by this project? I'm training a CRNN on it, but the results are not good.

Belval · 2018-06-20T02:09:51Z

@liangshuang1993 I can't say for Chinese, but I got decent results in English when using lowercase only (lowercase plus uppercase was a challenge). Also, while I don't know which implementation of CRNN you used, mine takes a long time to train (50+ hours on a GTX 1080 Ti), so it's quite normal for the initial performance to be very poor.

liangshuang1993 · 2018-06-20T10:14:13Z

Hi @Belval, thanks for your answer.

Maybe I generated the training data incorrectly.

First I generated one dataset with a word length of 5, using Gaussian noise as the background. Performance is good on the training and validation datasets, but bad on some real pictures.

Then I generated another dataset with a word length of 8, using the given pictures as backgrounds, and trained the CRNN on the combined dataset (dataset 1 and dataset 2). I have trained for 7000 epochs, but the training accuracy is still 0.5590. The strange thing is that when I ran some tests on the training data, the model could barely recognize the words. Does this mean the datasets must all have the same word length? Thanks a lot.

By the way, each dataset has 500,000 pictures containing English, Chinese, and digits (they may appear in the same picture).

Belval · 2018-06-20T10:25:13Z

@liangshuang1993 Learning both English and Chinese is indeed a rather ambitious idea. Most implementations I know of only do one.

But yes, the same word count is required. The original author even went as far as only recognizing single words instead of multi-word sentences.
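A quick way to check whether a combined dataset mixes label lengths, and to keep only one length, could look like the sketch below. It assumes the samples are already loaded as `(image_path, label)` pairs; the function names are hypothetical and not part of this project or of any CRNN implementation.

```python
from collections import Counter


def label_length_histogram(labels):
    """Count how many labels there are of each length (in characters)."""
    return Counter(len(label) for label in labels)


def keep_fixed_length(samples, length):
    """Keep only the (image_path, label) pairs whose label has `length` characters."""
    return [(path, label) for path, label in samples if len(label) == length]
```

If the histogram shows more than one length (for example both 5 and 8), `keep_fixed_length(samples, 5)` gives a subset with a single length to train on. Note that `len` counts Unicode code points, so a Chinese character and an ASCII letter each count as one.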

liangshuang1993 · 2018-06-20T13:23:23Z

OK. Thanks a lot!!

yts19871111 · 2020-10-14T03:17:20Z

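Hi @Belval, thanks for your answer.

Maybe I generated the training data incorrectly.

First I generated one dataset with a word length of 5, using Gaussian noise as the background. Performance is good on the training and validation datasets, but bad on some real pictures.

Then I generated another dataset with a word length of 8, using the given pictures as backgrounds, and trained the CRNN on the combined dataset (dataset 1 and dataset 2). I have trained for 7000 epochs, but the training accuracy is still 0.5590. The strange thing is that when I ran some tests on the training data, the model could barely recognize the words. Does this mean the datasets must all have the same word length? Thanks a lot.

By the way, each dataset has 500,000 pictures containing English, Chinese, and digits (they may appear in the same picture).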

Hello, have you solved this problem? I have run into it too: the results on real samples are very poor.

Belval closed this as completed Jun 4, 2018
