
Floating point exception #1961

Closed
sanjaymeena opened this issue May 2, 2017 · 36 comments

@sanjaymeena

Docker image: docker.paddlepaddle.org/book:0.10.0rc3

I am training my own semantic role labeling model using modified code from the 06 semantic role labeling demo. My training data is 24,000 sentences. If I train on a sample of ~1,000 sentences, I can train a model without errors and save all parameters properly. However, with the full ~25k sentences I get the following error.

[screenshot: training log ending in a floating point exception]

I have been stuck here for many days now :( . I verified that this is probably not due to bad data, as all sentences were iterated over in previous passes.

@lcy-seso lcy-seso self-assigned this May 2, 2017
@lcy-seso
Contributor

lcy-seso commented May 3, 2017

I can reproduce this error. I am checking it now and will provide the solution as soon as possible.

@lcy-seso
Contributor

lcy-seso commented May 3, 2017

From the last two lines of the log file, the cost increases from about 590 to 2974, so I think you have encountered the gradient explosion problem. One possible solution is gradient clipping, but I am really sorry: PaddlePaddle currently has a problem with it (it may not be activated even if you specify the clipping threshold). We are working on it.

Gradient explosion can often be avoided by carefully tuning the training parameters: the weight initialization, the learning algorithm, the learning rate, the length of the training samples, the batch size, and so on. All of these are worth trying.

@sanjaymeena
Author

Hi @lcy-seso,

  • I had tried changing the batch size and the amount of training data, and still hit gradient explosion. With a smaller batch size it takes more passes, but it still happens.
  • The other parameters are the defaults that came with the demo train.py.
  • I will try changing the other parameters.

Gradient clipping seems like an important problem to resolve. Thanks!

@lcy-seso
Contributor

lcy-seso commented May 3, 2017

Thank you, and I am very sorry for the inconvenience; we will fix this as soon as possible. In the meantime, maybe you can try reducing the learning rate or switching to another optimization algorithm. That is what I usually do.

@sanjaymeena
Author

Thanks @lcy-seso, I reduced the learning rate to 1e-2. What other optimization algorithm would you suggest?

@lcy-seso
Contributor

lcy-seso commented May 3, 2017

  • Roughly speaking, I like to use Adam or Adadelta when training a complicated RNN model.
  • In PaddlePaddle, the gradient is not divided by the batch size, so 1e-2 is probably a large learning rate.
  • If you use momentum with a learning rate of 1e-2, it is recommended to divide by the batch size explicitly, like this:

learning_rate = 1e-2 / batch_size
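
For example, a minimal sketch of the trainer setup under these suggestions (the momentum value is illustrative, not from this thread; cost and parameters are assumed to come from your training script):

import paddle.v2 as paddle

batch_size = 32

# Divide the learning rate by the batch size explicitly, since
# Paddle does not average the gradient over the batch.
optimizer = paddle.optimizer.Momentum(
    momentum=0.9,  # illustrative value
    learning_rate=1e-2 / batch_size)

# Alternatively: paddle.optimizer.Adam(learning_rate=...)

# trainer = paddle.trainer.SGD(
#     cost=cost, parameters=parameters, update_equation=optimizer)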

@lcy-seso
Contributor

lcy-seso commented May 3, 2017

I will check whether gradient clipping is problematic in PaddlePaddle and get back to you. Thanks very much for reporting this issue.

@sanjaymeena
Author

Thanks, I am trying now with momentum and the learning rate you suggested.

@sanjaymeena
Author

Hi @lcy-seso, is there a way to get precision, recall, and F1 from paddle.v2.trainer.test, or can they be obtained through some other function?

PS: I changed the learning parameters and training seems to be going fine so far. Thanks.

@lcy-seso
Contributor

lcy-seso commented May 4, 2017

You can use the chunk_evaluator to get precision and recall. But I am sorry to find that the evaluators are not covered in our docs, so I created issue #2009; we are fixing it.

For now, maybe you can read the explanations in the source code. I will also try it and give an example later.

Thanks for your question.

@sanjaymeena
Author

sanjaymeena commented May 9, 2017

Hi @lcy-seso, one more question. I am trying to run inference on about 1000 sentences using paddle.infer.

# Build the reverse label map once, outside the loop.
labels_reverse = dict((v, k) for (k, v) in label_dict.items())

for element in test_datax:
    # Run inference on one sample; field='id' returns label indices.
    probs = paddle.infer(
        output_layer=predict,
        parameters=parameters,
        input=element,
        field='id')
    print probs
    # map indices back to tag names
    pre_lab = [labels_reverse[i] for i in probs]

I can get the results. However, inference seems very slow.
I have the following questions:

  • Is some tuning required on my side?
  • How do I run inference on batched data?

For reference, the word embedding I use has dimension 150, compared to 32 in the demo semantic role labeling program.

@sanjaymeena
Author

sanjaymeena commented May 9, 2017 via email

@lcy-seso
Contributor

lcy-seso commented May 9, 2017

Sorry, I deleted my last reply because I don't find any problem in your code. I am asking others for help. @reyoung

@sanjaymeena
Author

Hi Cao, it would also be very helpful if you could share an example of using the chunk evaluator to calculate precision and recall in the result metrics :)

@luotao1 luotao1 added this to Existing Bugs in V2 API Enhancement May 9, 2017
@luotao1 luotao1 added Bug and removed enhancement labels May 9, 2017
@lcy-seso
Contributor

lcy-seso commented May 9, 2017

Sorry for the late reply. I added a chunk_evaluator for SRL: https://github.com/lcy-seso/book/blob/add_chunk_evalator/07.label_semantic_roles/train.py#L140

@lcy-seso
Contributor

Hi @sanjaymeena, I am really sorry: I found that the chunk evaluator has bugs in the V2 API (#2078). This will cause trouble for you...

@sanjaymeena
Author

@lcy-seso Yeah, I found that it always prints 0 in the results. Thanks for the proactive measure. On another note, if I want to use a Paddle model behind a web server, do I need to build the Docker container from scratch?

@lcy-seso
Contributor

lcy-seso commented May 10, 2017

There is documentation about running PaddlePaddle in Docker, and we provide pre-built Docker images. You can check that page and tell us what we could do better.

P.S.
I also find the documentation of the chunk evaluator is awful (#2079).

We are now working out a schedule for fixing the bugs found in the V2 API, and I promise to keep replying to this issue with progress updates. Thanks so much for your reports, and sorry for the inconvenience.

I also created issue #2080. It will be assigned to an owner to fix this bug.

@sanjaymeena
Author

Thanks @lcy-seso, looking forward to using Paddle for more NLP-related projects :)

@sanjaymeena
Author

@lcy-seso Hi, is there any update on the model's inference speed? It would be great if this item could get a higher priority. Thanks for your help!

@lcy-seso
Contributor

@sanjaymeena I will fix this.

@lcy-seso
Contributor

lcy-seso commented May 17, 2017

Hi @sanjaymeena, sorry for the late reply. I have updated the SRL demo; you can check this.

One reason the inference is much slower than it needs to be is that every call to paddle.infer loads the model again, once per test batch.

  1. Initialize the inferer once: https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L242

inferer = paddle.inference.Inference(
    output_layer=predict, parameters=parameters)

  2. Then test batch by batch: https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L206

probs = inferer.infer(input=test_data, field='id')
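
Putting the two steps together, a minimal sketch (assuming test_datax is a flat list of prepared input samples, as in your earlier snippet; the batch size of 50 is arbitrary):

# Create the inferer once, then feed batches instead of single sentences.
inferer = paddle.inference.Inference(
    output_layer=predict, parameters=parameters)

batch_size = 50
for start in range(0, len(test_datax), batch_size):
    batch = test_datax[start:start + batch_size]
    # one forward pass per batch instead of one per sentence
    probs = inferer.infer(input=batch, field='id')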

I used the model saved after the first pass and tested the entire test set, which contains 5267 samples. By avoiding reloading the parameters for every tested batch, the inference time dropped from 96.5s to 78.19s.

There may still be other things that should be optimized. I will keep profiling. Thanks very much for reporting this.

@sanjaymeena
Author

@lcy-seso thank you so much!! The speedup for me is almost 100x, since my batches previously contained one sentence each :)

@sanjaymeena
Author

sanjaymeena commented May 18, 2017

@lcy-seso is it possible to calculate precision, recall, and F1 given gold labels in IOB format and predicted labels in IOB format? This would be after the model has been created.

@lcy-seso
Contributor

@sanjaymeena PR #2165 fixes the bug in the chunk evaluator, but we haven't merged it yet.

After it is merged, precision, recall, and F1 can be printed easily. We will try our best to merge it as soon as possible and give an example of how to use it.
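
Until the fix lands, a minimal pure-Python sketch of chunk precision/recall/F1 over gold and predicted IOB sequences (independent of Paddle; tags are assumed to be 'O' or typed 'B-X'/'I-X', and a stray I- tag is leniently treated as a chunk start):

def iob_chunks(tags):
    # Extract (type, start, end) chunks from an IOB tag sequence.
    chunks, start, ctype = [], None, None
    for i, tag in enumerate(tags + ['O']):  # sentinel flushes the last chunk
        if tag == 'O' or tag.startswith('B-') or tag[2:] != ctype:
            if start is not None:
                chunks.append((ctype, start, i))
            start, ctype = (i, tag[2:]) if tag != 'O' else (None, None)
    return set(chunks)

def chunk_prf(gold_tags, pred_tags):
    # Precision/recall/F1 over exact chunk matches, conlleval-style.
    gold, pred = iob_chunks(gold_tags), iob_chunks(pred_tags)
    tp = len(gold & pred)
    precision = tp / float(len(pred) or 1)
    recall = tp / float(len(gold) or 1)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1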

@sanjaymeena
Author

sanjaymeena commented May 22, 2017

@lcy-seso Thank you for your help. I am also using the demo/sequence_labelling project.
Docker image: paddle:0.10.0

The demo program can run and save the model just by following the code. However, I cannot work out how to load the saved model and use it on test data:

File: srl_linear_crf.py

inputs(word, pos, chunk, features)
outputs(crf)

In other words, I am having trouble figuring out how to run prediction with the saved model.

@lcy-seso
Contributor

Hi @sanjaymeena, I am not quite sure how you are using srl_linear_crf.py; this file does not seem to be in PaddlePaddle's repo.

To predict with a saved model there are usually three steps (sketched together after this list; you can also check this):

  1. define the topology for inference (lines 233 ~ 234)
    https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L233

  2. load the saved model (lines 240 ~ 243)
    https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L240

  3. infer a batch (line 206)
    https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L206
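
Put together, a rough sketch (the v2 API is assumed; db_lstm() stands in for whatever function builds your network, the parameter file name is an example, and test_batch is a list of prepared input samples):

import gzip
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# 1. define the topology for inference
predict = db_lstm()  # hypothetical builder returning the output layer

# 2. load the saved model
with gzip.open('params_pass_0.tar.gz', 'r') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

# 3. infer a batch of test samples
probs = paddle.infer(
    output_layer=predict, parameters=parameters,
    input=test_batch, field='id')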

But as far as I know, you are already familiar with this script, so I think I don't understand your question well...

Is it a problem with the Docker environment, or with how to use the paddle.infer interface?

@lcy-seso
Contributor

In the directory demo/semantic_role_labeling there exist two versions of the SRL demo, written against the V1 and V2 PaddlePaddle API interfaces. (And I think we should clean out the old version.)

  1. the V1 API version
  • dataprovider.py, the data provider used in training
  • db_lstm.py, defines the neural network for training and inference
  • predict.py, a Python script that loads the saved model and predicts the test samples
  • train.sh, a script to run training, because in the V1 version Paddle is a standalone executable, not a Python module
  • test.sh, a script to run testing, for the same reason
  2. the V2 API version
  • api_train_v2.py, a single Python script covering both training and inference

@sanjaymeena
Author

sanjaymeena commented May 22, 2017

@lcy-seso I am sorry, what I meant is that I am using the
sequence tagging demo: https://github.com/PaddlePaddle/Paddle/tree/develop/demo/sequence_tagging

I can use the provided linear_crf.py and train_linear.sh to train my own sequence labeling model. However, I cannot find any code for prediction or for loading the model :(

I am planning to use the model created by sequence_tagging in the semantic role labeler project, so my earlier explanation was confusing.

@lcy-seso
Contributor

Very sorry for the late reply; I did not notice this demo before. I am checking it.

@lcy-seso
Contributor

P.S. some updates:

  1. the chunk evaluator has been merged; this is an example (see also the sketch after this list): https://github.com/PaddlePaddle/models/blob/develop/sequence_tagging_for_ner/ner.py#L178
  2. error clipping is also fixed; my partner wrote this guide:
    A temporary guide to stablize models' training #2262
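
Following the linked ner.py, a sketch of attaching the evaluator (predict and target stand in for your CRF decoding and label layers; label_dict is your tag dictionary, assumed to contain O plus B-/I- pairs):

# Attach a chunk evaluator so trainer.test reports precision/recall/F1.
paddle.evaluator.chunk(
    name="chunk_evaluator",
    input=predict,
    label=target,
    chunk_scheme="IOB",
    num_chunk_types=(len(label_dict) - 1) / 2)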

@sanjaymeena
Author

@lcy-seso thanks for your help :) I will try the NER sequence tagging example for my work :) What I need is also sequence tagging, for identifying predicates in a sentence.

@lcy-seso
Contributor

lcy-seso commented May 25, 2017

Hi @sanjaymeena, sorry for the inconvenience: there actually exist two versions of the PaddlePaddle demos in our repo (the demo directory). Let me try to explain.

  1. The demo/sequence_tagging directory only provides a V1-version demo; demo/semantic_role_labeling is an SRL demo that provides both V1 and V2 versions.
  2. In the V1 version, PaddlePaddle is a standalone executable, while in the V2 version we made it a Python module.
  3. Because the V2 APIs are still being enhanced, we have not yet deleted the V1 demos. Sorry for the inconvenience.
  4. Indeed, there is no testing or decoding process provided in the demo/sequence_tagging directory.

To test or to decode (get the tagging result), there are two solutions (I recommend the second one):

  1. Rewrite the demo against the V2 APIs, as the SRL demo does, and use the V2 paddle.infer interface. This requires extra work.

  2. If you already use this demo by running sh train.sh to train the model, there is a very simple way to test that requires almost no extra work, just some simple postprocessing.
    You can check this:
    develop...lcy-seso:add_infer_for_sequence_tagging

Some explanations of the second way:

  • In the V1 version, PaddlePaddle is a standalone executable, and you can test with the following script:

    paddle train \
           --job=test \
           --config=rnn_crf.py \
           --parallel_nn=1 \
           --model_list="model.list" \
           --use_gpu=true \
           --predict_output_dir="./" \
           --config_args="is_predict=1" \
           --trainer_count=1 \
    2>&1 | tee 'test.log'

    • explanations of the parameters:
      1. job=test: start a testing process, which only executes the forward pass
      2. config: path to the network configuration file
      3. parallel_nn=1: optional; currently the CRF layer is not implemented for GPU, and parallel_nn means some layers can run on the CPU while others run on the GPU
      4. model_list: a plain-text file containing the path of the trained model, e.g. ./output/model/pass-00000
      5. use_gpu: whether to use the GPU
      6. predict_output_dir=A: if specified, the prediction results are saved to the file A/rank-00000
      7. trainer_count: how many threads to use for testing

@lcy-seso
Contributor

lcy-seso commented May 25, 2017

Also, about prediction for demo/sequence_tagging:

  • By calling the following function you can print the output of any layer defined in the network:

outputs(layer1_in_the_network, layer2_in_the_network, ...)

Please check this line:
https://github.com/lcy-seso/Paddle/blob/add_infer_for_sequence_tagging/demo/sequence_tagging/rnn_crf.py#L105

  • The prediction results will be written to the file A/rank-00000 (A is the directory specified by predict_output_dir).

  • A/rank-00000 looks like this:

    10;
    11;
    11;
    10;
    11;
    11;
    20;
    10;
    20;
    10;
    

    One caveat: the prediction results in A/rank-00000 are flattened. Each line of A/rank-00000 is the index of the tagging label at one time step, so to get the final tagging results you still need some simple post-processing.

  • format of the file A/rank-00000

    • Each line of A/rank-00000 contains the outputs of all the layers specified in outputs. Multiple outputs are separated by ";". If a layer's output is a vector, its elements are separated by " " (a space).
    • If you put two layers in outputs, like this:
      outputs(crf_decoding, crf_input)
      A/rank-00000 looks like this:
      10;1.66071 0.441494 1.11911 -0.0893781 -0.99124 -1.61581 -1.05307 -1.56547 -1.21166 -1.29977 7.85641 4.47389 -1.51868 -1.88707 -1.66359 -1.6472 -0.0668767 -0.473394 -1.64615 -1.28447 1.11682 -0.0599329 2.5225;
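
A small sketch of that post-processing (assuming the per-sentence lengths are known from your test data, and labels_reverse is the index-to-tag map from the earlier snippet):

def read_predictions(path, sentence_lengths, labels_reverse):
    # Each line of A/rank-00000 starts with the label index for one
    # time step; any outputs after the first ';' are ignored here.
    with open(path) as f:
        ids = [int(line.split(';')[0]) for line in f if line.strip()]
    # Regroup the flat index stream into per-sentence tag sequences.
    sentences, pos = [], 0
    for n in sentence_lengths:
        sentences.append([labels_reverse[i] for i in ids[pos:pos + n]])
        pos += n
    return sentences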
      

@sanjaymeena
Author

Hi @lcy-seso, thank you for the detailed answer. My goal was to do real-time prediction on each user input. I was interested in this demo because the sequence tagger (a chunker) uses part-of-speech (POS) tags as features. Thanks for your help :) I will try to use the NER sequence tagger for now :)

@lcy-seso lcy-seso moved this from BUG to Done in V2 API Enhancement Jun 7, 2017
@wanghaoshuang
Contributor

Closing this issue due to inactivity; feel free to reopen it.
