
Procedure for training a new dataset #13

Open · rohit12 opened this issue Mar 13, 2018 · 28 comments

rohit12 commented Mar 13, 2018

Hello Christian,

I am starting a new issue here so that others who want to train your code on a new dataset can refer to it in the future.

As I understand it, the steps to train a new model are as follows:

  1. Create a char_map for the model.
  2. Create a curriculum (which is optional).
  3. Create a ground truth for the dataset.

How do I actually create a char_map? I want the model to predict words rather than individual characters. I guess I am sort of confused about the nature of the char_map.

What should the ground truth (GT) look like? I have an image with 4-5 text regions, and each text region contains some number of words. So should my GT file start with "5 40", indicating that there are 5 text regions with 40 characters each? What if the number of text regions I indicate is far greater than the actual number in some image?

When I create the GT file, do I just write down each character separated by a tab? Can I include whole words in the GT file?

Thanks!

Bartzi commented Mar 14, 2018

Let's go through this step by step:

  1. New char_map:
  • You might not need to create a new char_map for each new dataset; most of the time it is enough to have a look at the existing ones and extend one of them, shrink one of them, or just use one as it is.
  • Remember: the char_map is just a mapping from a predicted class (of the classifier) to the semantic meaning of that class! So basically you could do whatever you want. I would not recommend using entire words, as this would mean that you will have to train a very large classifier in the end.
  • The char_map is also not required for training the model; it is only important for decoding the results. Without it you won't be able to make sense of the model's predictions, as you have no way to recover their semantics!
  • For an example of how to create a char_map, have a look at this post (a small sketch also follows after this list).
  2. Curriculum:
  • Yes, a curriculum is optional, but you will need the curriculum.json file that describes the dataset you want to use for training.
  3. Creating a ground truth:
  • The way you create a new ground truth is up to you! You'll only need to write some code to load the data and assign the labels (see for instance this file). You could also get rid of the curriculum entirely (which should make things easier, but means that you have to modify the code).
  • I do have some recommendations for your data:
    • Don't try to make the model learn to find more than one word at a time. This won't work with the current code.
    • You should rather create your ground truth in a way that handles each word as one text region in the image. This also means that you'll need to find a way to distinguish between several text regions, but you should be able to find a way around this problem.
  • It is not a problem if the declared number of text regions is greater than the actual number of text regions in some images. As long as you represent the missing text regions with the blank_label, meaning the network should produce the "no character" output, the network will learn to give the correct output for those cases.
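To make the char_map and curriculum.json formats concrete, here is a minimal sketch that writes both files. The index-to-codepoint layout and the use of class 0 as the blank label are assumptions modeled on the existing maps in the repository; all file names and paths are placeholders:

import json

# Hypothetical char_map: maps the classifier's class index to the Unicode
# codepoint of the character that class stands for. Class 0 is assumed to
# be the blank label ("no character"); check the repo's existing maps for
# the exact convention.
chars = "0123456789abcdefghijklmnopqrstuvwxyz"
char_map = {"0": 9250}  # 9250 = U+2422 "blank symbol" (assumed blank label)
char_map.update({str(i + 1): ord(c) for i, c in enumerate(chars)})

with open("my_char_map.json", "w") as f:
    json.dump(char_map, f, indent=4)

# curriculum.json: a list of train/validation file pairs, one entry per
# curriculum step (a single entry is fine if you don't want a curriculum).
curriculum = [
    {
        "train": "/path/to/my_dataset/train.csv",
        "validation": "/path/to/my_dataset/valid.csv",
    }
]

with open("curriculum.json", "w") as f:
    json.dump(curriculum, f, indent=4)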

rohit12 commented Mar 16, 2018

Thanks a lot! I am close to training it on a new dataset. However, I have one question. I have been trying to duplicate your results on the svhn dataset. I used the data that you linked to in the initial comments in #6. However, I am unable to get much accuracy. You can see the accuracy and loss graphs here. Any ideas on why this might be happening?

Also, can I see what happens internally in the network while training? Does the bboxes folder contain the improvements over time?

[attached: loss and accuracy plots]

Bartzi commented Mar 16, 2018

Your plots look quite good... I suggest letting the training run until loss/accuracy level out.
Once that happens, you should restart the training, loading the weights you trained using -r <path to trained model>, and also add the command-line switch --load-localization; you should see further improvements this way (an example invocation follows below).
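A restart invocation might look like the following, where ... stands for the same dataset and logging arguments used in the first run and the model path is a placeholder:

python3 chainer/train_svhn.py ... -r <path to trained model> --load-localization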

Every image in the bboxes folder contains the predictions of the network at a given iteration (indicated by the filename), so if there are any improvements over time, you should see them in the images. You can create a video out of those images using the script utils/create_video.py.

rohit12 commented Mar 18, 2018

So, I looked at the video created by create_video.py. I have a couple of questions:

  1. How do I actually interpret what is happening in the network? I am trying to understand what the network does and why so I can better understand the limitations of the model.
  2. The video is really huge. It was 200MB and an hour long. How can I shorten it (if possible)?
  3. Can I create the video for any custom image? I see that the video is created only for a specific image.
  4. Can you tell me how many epochs you trained for? I have trained for 30 epochs and I don't think the network has converged yet. Also, by the trained model, you mean the .npz file, right? What does --load-localization do?

Bartzi commented Mar 19, 2018

  1. The video shows you how the network evolves over time: you can see what the network predicts on a validation image at each iteration of the training process. The video shows the original input image with the locations of the predicted bboxes; you should also see the cropped text regions and, in the top-right corner, the current prediction of the text in the image. Furthermore, you might see (depending on the configuration) a visualization of all the things the network deems interesting (produced using visual backprop).
  2. Yes, the video is huge, that is right^^ You can change the start and stop iterations from which the images are taken; other than that, you can currently only increase the playback speed.
  3. Yes, with the command-line switch --test-image you can choose an image. This image will then be used during your training to create these visualizations.
  4. Normally I trained for around 20 to 40 epochs, then restarted the training using the produced .npz file (yes, that is the model) and the --load-localization switch. This switch tells the program to load only the weights of the localization part of the network, but not the weights of the recognition net; the recognition net will be randomly initialized (a toy sketch of this idea follows below). I found that this trick makes it possible to reach reasonable accuracies.
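To illustrate what --load-localization does, here is a self-contained toy sketch in Chainer (not the repository's actual loading code) that restores only one sub-network from a saved .npz snapshot while the other keeps its random initialization:

import chainer
import chainer.links as L


class TwoPartNet(chainer.Chain):
    """Toy stand-in for a network with localization and recognition parts."""

    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.localization_net = L.Linear(4, 4)
            self.recognition_net = L.Linear(4, 4)


trained = TwoPartNet()
chainer.serializers.save_npz('snapshot.npz', trained)

fresh = TwoPartNet()
# Restore only the localization sub-network from the snapshot; the
# recognition net keeps its random initialization, which is the same
# idea as the --load-localization switch.
chainer.serializers.load_npz(
    'snapshot.npz', fresh.localization_net, path='localization_net/')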

By the way: which SVHN dataset are you using?

rohit12 commented Mar 19, 2018

I am using the SVHN dataset that you linked to in this comment.

rohit12 commented Mar 19, 2018

BTW, I tried running the text recognition model on an image in my dataset. I used the following command:

python3 chainer/fsns_demo.py text_rec/model/ model_190000.npz test_img/9.png datasets/fsns/fsns_char_map.json

I got the following error:
Traceback (most recent call last):
File "chainer/fsns_demo.py", line 140, in <module>
network = create_network(args, log_data)
File "chainer/fsns_demo.py", line 60, in create_network
localization_net = build_localization_net(localization_net_class, args)
File "chainer/fsns_demo.py", line 48, in build_localization_net
return localization_net_class(args.dropout_ratio, args.timesteps)
TypeError: __init__() missing 2 required positional arguments: 'num_refinement_steps' and 'target_shape'

Is there any reason for this? I remember successfully running this once.

Bartzi commented Mar 19, 2018

Yeah, but which of the three datasets in this archive are you using?

You should not use fsns_demo.py, but rather text_recognition_demo.py (an example call follows below).
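Assuming text_recognition_demo.py takes the same kind of arguments as the fsns_demo.py call above (model directory, snapshot, image, char_map), the call might look like this:

python3 chainer/text_recognition_demo.py text_rec/model/ model_190000.npz test_img/9.png <path to char_map.json>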

rohit12 commented Mar 19, 2018

I am using easy and centered. My curriculum file is as follows:

[
    {
        "train": "/home/rohit_shinde12194/see/data/generated/easy/train.csv",
        "validation": "/home/rohit_shinde12194/see/data/generated/easy/valid.csv"
    },
    {
        "train": "/home/rohit_shinde12194/see/data/generated/centered/train.csv",
        "validation": "/home/rohit_shinde12194/see/data/generated/centered/valid.csv"
    }
]

rohit12 commented Mar 19, 2018

I am getting the following output for this image.
[attached: test image 9.png]

OrderedDict([('mu',
              [OrderedDict([('top_left', (0.0, 0.0)),
                            ('bottom_right', (95.19923400878906, 64.0))]),
               OrderedDict([('top_left', (53.05111312866211, 0.0)),
                            ('bottom_right',
                             (141.64654541015625, 63.123931884765625))]),
               OrderedDict([('top_left', (96.989990234375, 0.0)),
                            ('bottom_right',
                             (188.36154174804688, 62.051673889160156))]),
               OrderedDict([('top_left', (129.80397033691406, 0.0)),
                            ('bottom_right', (200.0, 62.11241912841797))]),
               OrderedDict([('top_left',
                             (155.7512969970703, 1.4289875030517578)),
                            ('bottom_right', (200.0, 62.311012268066406))]),
               OrderedDict([('top_left',
                             (174.31436157226562, 2.9148826599121094)),
                            ('bottom_right', (200.0, 61.07978820800781))]),
               OrderedDict([('top_left',
                             (189.35855102539062, 5.093990325927734)),
                            ('bottom_right', (200.0, 60.43505096435547))]),
               OrderedDict([('top_left',
                             (199.89474487304688, 8.387855529785156)),
                            ('bottom_right', (200.0, 61.70392990112305))]),
               OrderedDict([('top_left', (200.0, 10.520776748657227)),
                            ('bottom_right', (200.0, 62.61314392089844))]),
               OrderedDict([('top_left', (200.0, 11.79239273071289)),
                            ('bottom_right', (200.0, 63.15799331665039))]),
               OrderedDict([('top_left', (200.0, 12.502008438110352)),
                            ('bottom_right', (200.0, 63.433509826660156))]),
               OrderedDict([('top_left', (200.0, 12.910751342773438)),
                            ('bottom_right', (200.0, 63.58705139160156))]),
               OrderedDict([('top_left', (200.0, 13.149986267089844)),
                            ('bottom_right', (200.0, 63.673980712890625))]),
               OrderedDict([('top_left', (200.0, 13.29361343383789)),
                            ('bottom_right', (200.0, 63.72489547729492))]),
               OrderedDict([('top_left', (200.0, 13.382709503173828)),
                            ('bottom_right', (200.0, 63.75628662109375))]),
               OrderedDict([('top_left', (200.0, 13.440217971801758)),
                            ('bottom_right', (200.0, 63.776947021484375))]),
               OrderedDict([('top_left', (200.0, 13.47906494140625)),
                            ('bottom_right', (200.0, 63.791587829589844))]),
               OrderedDict([('top_left', (200.0, 13.506628036499023)),
                            ('bottom_right', (200.0, 63.80271911621094))]),
               OrderedDict([('top_left', (200.0, 13.527135848999023)),
                            ('bottom_right', (200.0, 63.811683654785156))]),
               OrderedDict([('top_left', (200.0, 13.543067932128906)),
                            ('bottom_right', (200.0, 63.81919860839844))]),
               OrderedDict([('top_left', (200.0, 13.555877685546875)),
                            ('bottom_right', (200.0, 63.825653076171875))]),
               OrderedDict([('top_left', (200.0, 13.566448211669922)),
                            ('bottom_right', (200.0, 63.83125305175781))]),
               OrderedDict([('top_left', (200.0, 13.575319290161133)),
                            ('bottom_right', (200.0, 63.83612823486328))])])])

I think it is just giving me the bounding boxes. I guess I am missing something again. Is it correct?

Bartzi commented Mar 19, 2018

Oh okay,
I suggest not mixing the easy and centered datasets, because they are visually very different! Use either easy or centered, but not both, for the same model.

Your prediction is 'mu' (the first key of the dictionary).

rohit12 commented Mar 19, 2018

Oh! I get it. I used the model that you provided on your site. I thought it might give good results on non-FSNS datasets. I know that models aren't really interchangeable across datasets, but I thought I could expect a bit more accuracy.

What are the remaining dictionaries for?

Bartzi commented Mar 19, 2018

You tried to use the FSNS model for your image? Yeah, that won't work -.- deep learning systems are unfortunately not good enough for that kind of transfer (at least, not yet). If you try the text recognition model, it should work better 😄

It is actually a dictionary of lists. Each key in the dictionary is the predicted text, and the value for that key is a list of bboxes with top_left and bottom_right corners (a small reading example follows below).
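Based on that description, the output printed above could be consumed like this (a sketch; the values are truncated from the dictionary shown earlier):

# The demo's result: predicted text -> list of bounding boxes.
result = {
    'mu': [
        {'top_left': (0.0, 0.0), 'bottom_right': (95.2, 64.0)},
        {'top_left': (53.1, 0.0), 'bottom_right': (141.6, 63.1)},
    ],
}

for predicted_text, boxes in result.items():
    print('prediction:', predicted_text)
    for box in boxes:
        (x1, y1), (x2, y2) = box['top_left'], box['bottom_right']
        print('  bbox: ({:.1f}, {:.1f}) -> ({:.1f}, {:.1f})'.format(x1, y1, x2, y2))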

rohit12 commented Mar 19, 2018

Ah! I thought so! I hoped that the FSNS model could at least recognize latin characters! :D I guess the only thing remaining is to train it on my own dataset.

Bartzi commented Mar 19, 2018

The FSNS model is able to recognize Latin characters, but the sample you are using is totally different from everything the model has seen before...

lmolhw5252 commented

Hi @rohit12, I got the same problem as in your closed issue:
Traceback (most recent call last):
File "chainer_model/train_svhn.py", line 76, in <module>
train_dataset, validation_dataset = curriculum.load_dataset(0)
File "/home/user/PycharmProjects/see/chainer_model/utils/baby_step_curriculum.py", line 38, in load_dataset
train_dataset = self.dataset_class(self.train_curriculum[level], **self.dataset_args)
File "/home/user/PycharmProjects/see/chainer_model/datasets/file_dataset.py", line 31, in __init__
self.num_timesteps, self.num_labels = (int(i) for i in next(reader))
File "/home/user/PycharmProjects/see/chainer_model/datasets/file_dataset.py", line 31, in <genexpr>
self.num_timesteps, self.num_labels = (int(i) for i in next(reader))
ValueError: invalid literal for int() with base 10: '/home/user/Dataset/SVHN-stnocr/Random_Dataset/train/0.png'
Did you fix it? Can you help me figure out what the problem is? Thanks a lot!!

Bartzi commented Mar 23, 2018

You forgot to add the meta information to the label file. This is described here.

lmolhw5252 commented

@Bartzi Thank you for your reply!!! Do you mean this item?
Add one line to the beginning of each ground truth file: <number of house numbers in image> <max number of chars per house number> (both values need to be separated by a tab character). If you are using the grid dataset it could look like that: 4 4.
And I created the Random Dataset with the command
python datasets/svhn/create_svhn_dataset.py /home/user/Dataset/SVHN/train /home/user/Dataset/SVHN/train/digitStruct.json /home/user/Dataset/SVHN-stnocr/Random_Dataset --numbers_per_image 1 10000 2
Where is the ground truth file?

rohit12 commented Mar 23, 2018

@lmolhw5252 If you take a look at #6, the second or third comment from Christian points to a download. That is the SVHN dataset he trained on. Add "4 4" to the train.csv file as the very first line.

This tells the network that there are 4 text regions with a maximum of 4 characters each (an example file is sketched below).
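For illustration, a ground truth file in that format might look like the lines below. The fields are tab-separated; the image path and label indices are placeholders, and it is assumed here that each line carries 4 x 4 = 16 class indices, with the blank label's class (taken to be 0) filling unused slots:

4	4
/path/to/image/0.png	1	5	0	0	2	3	7	0	0	0	0	0	0	0	0	0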

lmolhw5252 commented

@rohit12 The train.csv file's format should be like this?
4 4 /path/to/image/0.png 1 1 1 1 1 1 1
I can't download the dataset from #6, so I want to change the train file.
Could you please show me your train.csv? I want to know the right format. Thank you very much!!

rohit12 commented Mar 29, 2018

Hi Christian,
What image dimensions does the network expect? Do I need to normalize my images?

I am getting the following error while training on a new dataset. Do you have any hints for me?

Exception in main training loop: 
Invalid operation is performed in: SoftmaxCrossEntropy (Forward)
Expect: in_types[0].shape[0] == in_types[1].shape[0]
Actual: 2048 != 896
Traceback (most recent call last):
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 206, in update_core
    loss = _calc_loss(self._master, batch)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in _calc_loss
    return model(*in_arrays)
  File "/home/rohit_shinde12194/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
    self.loss = self.lossfun(self.y, t)
  File "/home/rohit_shinde12194/see/chainer/metrics/svhn_softmax_metrics.py", line 26, in calc_loss
    losses.append(F.softmax_cross_entropy(predictions, labels))
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 381, in softmax_cross_entropy
    normalize, cache_score, class_weight, ignore_label, reduce)(x, t)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/function.py", line 235, in __call__
    ret = node.apply(inputs)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/function_node.py", line 230, in apply
    self._check_data_type_forward(in_data)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/function_node.py", line 298, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/function.py", line 130, in check_type_forward
    self._function.check_type_forward(in_types)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 78, in check_type_forward
    x_type.shape[2:] == t_type.shape[1:],
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/utils/type_check.py", line 524, in expect
    expr.expect()
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/utils/type_check.py", line 482, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "chainer/train_svhn.py", line 260, in <module>
    trainer.run()
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/trainer.py", line 313, in run
    six.reraise(*sys.exc_info())
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
    update()
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
    self.update_core()
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 206, in update_core
    loss = _calc_loss(self._master, batch)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in _calc_loss
    return model(*in_arrays)
  File "/home/rohit_shinde12194/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
    self.loss = self.lossfun(self.y, t)
  File "/home/rohit_shinde12194/see/chainer/metrics/svhn_softmax_metrics.py", line 26, in calc_loss
    losses.append(F.softmax_cross_entropy(predictions, labels))
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 381, in softmax_cross_entropy
    normalize, cache_score, class_weight, ignore_label, reduce)(x, t)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/function.py", line 235, in __call__
    ret = node.apply(inputs)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/function_node.py", line 230, in apply
    self._check_data_type_forward(in_data)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/function_node.py", line 298, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/function.py", line 130, in check_type_forward
    self._function.check_type_forward(in_types)
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/functions/loss/softmax_cross_entropy.py", line 78, in check_type_forward
    x_type.shape[2:] == t_type.shape[1:],
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/utils/type_check.py", line 524, in expect
    expr.expect()
  File "/home/rohit_shinde12194/.local/lib/python3.5/site-packages/chainer/utils/type_check.py", line 482, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType: 
Invalid operation is performed in: SoftmaxCrossEntropy (Forward)
Expect: in_types[0].shape[0] == in_types[1].shape[0]
Actual: 2048 != 896

Bartzi commented Apr 3, 2018

Yes, the input size is defined by the variable image_size here; all images will be resized to this size. They will also be normalized automatically, so there shouldn't be anything extra you need to do.

It could be that you are experiencing the same bug as somebody else. I fixed this in the repository.

The problem seems to be that the shape of the supplied labels does not fit the shape of your predictions.

Bartzi commented Nov 28, 2018

Yes, you'll need to create the curriculum.json on your own, as indicated in the README. You should then point to the location of the curriculum.json file instead of using the files .../train.csv and .../valid.csv.
Look here.

swatiyadav4420 commented Jun 2, 2019

I am facing the same issue. I have pulled the updated repository. Kindly help.
I am trying to train my own model using train_text_recognition.py.

Traceback (most recent call last):
  File "chainer/train_text_recognition.py", line 299, in <module>
    trainer.run()
  File "/usr/lib/python3.6/dist-packages/chainer/training/trainer.py", line 349, in run
    six.reraise(*exc_info)
  File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/dist-packages/chainer/training/trainer.py", line 316, in run
    update()
  File "/usr/lib/python3.6/dist-packages/chainer/training/updaters/standard_updater.py", line 175, in update
    self.update_core()
  File "/usr/lib/python3.6/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 235, in update_core
    loss = _calc_loss(self._master, batch)
  File "/usr/lib/python3.6/dist-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 269, in _calc_loss
    return model(*in_arrays)
  File "/opt/see/chainer/utils/multi_accuracy_classifier.py", line 45, in __call__
    self.loss = self.lossfun(self.y, t)
  File "/opt/see/chainer/metrics/textrec_metrics.py", line 14, in calc_loss
    loss = self.calc_actual_loss(batch_predictions, None, t)
  File "/opt/see/chainer/metrics/textrec_metrics.py", line 96, in calc_actual_loss
    return F.softmax_cross_entropy(predictions, labels)
  File "/usr/lib/python3.6/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 500, in softmax_cross_entropy
    loss, = func.apply((x, t))
  File "/usr/lib/python3.6/dist-packages/chainer/function_node.py", line 297, in apply
    self._check_data_type_forward(in_data)
  File "/usr/lib/python3.6/dist-packages/chainer/function_node.py", line 400, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/usr/lib/python3.6/dist-packages/chainer/functions/loss/softmax_cross_entropy.py", line 92, in check_type_forward
    x_type.shape[2:] == t_type.shape[1:],
  File "/usr/lib/python3.6/dist-packages/chainer/utils/type_check.py", line 550, in expect
    expr.expect()
  File "/usr/lib/python3.6/dist-packages/chainer/utils/type_check.py", line 483, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: SoftmaxCrossEntropy (Forward)

Expect: x.shape[0] == t.shape[0]
Actual: 1196 != 2080

Bartzi commented Jun 3, 2019

The shapes of the arrays do not match, as we can see from the error.
What was the command-line call you used to start the training? It is likely that your ground truth file is wrong. Are you using a different set of characters for recognition?

swatiyadav4420 commented

Thanks for the response. Yes, my data has some punctuation. I have added it to the character map.

vamsiadari95 commented

Are there any punctuation characters that are not allowed in the data labels? @Bartzi

Bartzi commented May 20, 2020

No, there is no restriction on punctuation. It might be difficult for the network to learn to predict the punctuation, but this could be mitigated with extra training data containing such punctuation (a sketch of adding punctuation to a char_map follows below).
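As an illustration of adding punctuation, extending an existing char_map could look like the following sketch; the index-to-codepoint layout is assumed from the repository's existing maps, and the file names are placeholders:

import json

# Load an existing char_map (assumed layout: class index -> Unicode codepoint).
with open('datasets/fsns/fsns_char_map.json') as f:
    char_map = json.load(f)

# Append a new class for each punctuation character that is not mapped yet.
next_class = max(int(k) for k in char_map) + 1
for char in '.,!?;:-':
    if ord(char) not in char_map.values():
        char_map[str(next_class)] = ord(char)
        next_class += 1

with open('my_char_map.json', 'w') as f:
    json.dump(char_map, f, indent=4)

The ground truth labels then need to use these new class indices, and the size of the classifier's output layer must match the enlarged map.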
