
Floating point exception #1961

Closed
sanjaymeena opened this issue May 2, 2017 · 36 comments

@sanjaymeena

Docker image: docker.paddlepaddle.org/book:0.10.0rc3

I am training my own semantic role labeling model using modified code from the 06 semantic role labeling demo. My training data is 24,000 sentences. If I train on a sample of ~1,000 sentences, I can train a model without errors and save all parameters properly. However, with the full ~25k sentences I get the following error.

[screenshot: training log ending in a floating point exception]

I have been stuck here for many days now :( . I verified that this is probably not due to bad data, as all sentences were iterated over in previous passes.

@lcy-seso lcy-seso self-assigned this May 2, 2017
@lcy-seso
Contributor

lcy-seso commented May 3, 2017

I can reproduce this error. I am checking it now and will provide the solution as soon as possible.

@lcy-seso
Contributor

lcy-seso commented May 3, 2017

From the last two lines of the log file, the cost increases from about 590 to 2974, so I think you have encountered the gradient explosion problem. One possible solution is gradient clipping, but I am really sorry: PaddlePaddle currently has a problem with it (it may not be activated even if you specify the clipping threshold). We are working on it.

Gradient explosion can often be avoided by carefully tuning the training parameters: the weight initialization, the learning algorithm, the learning rate, the length of the training samples, the batch size, and so on. All of these are worth trying.

@sanjaymeena
Author

Hi @lcy-seso,

  • I had tried changing the batch size and the amount of training data, and still hit gradient explosion. With a smaller batch size it takes more passes, but it still happens.
  • The other parameters are the defaults that came with the demo train.py.
  • I will try changing the other parameters.

Gradient clipping seems like an important problem to resolve. Thanks!

@lcy-seso
Contributor

lcy-seso commented May 3, 2017

Thank you, and I am very sorry for the inconvenience; we will fix this as soon as possible. In the meantime, maybe you can try reducing the learning rate or switching to another optimization algorithm. That is what I usually do.

@sanjaymeena
Author

Thanks @lcy-seso, I reduced the learning rate to 1e-2. What other optimization algorithm would you suggest?

@lcy-seso
Contributor

lcy-seso commented May 3, 2017

  • Roughly speaking, I like to use Adam or Adadelta when training a complicated RNN model.
  • In PaddlePaddle, the gradient is not divided by the batch size, so 1e-2 is probably a large learning rate.
  • If you use momentum with a learning rate of 1e-2, it is recommended to divide by the batch size explicitly, like this:

learning_rate = 1e-2 / batch_size
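
For example, a minimal sketch of the trainer setup under these suggestions (the momentum value is illustrative, not from this thread; cost and parameters are assumed to come from your training script):

import paddle.v2 as paddle

batch_size = 32

# Divide the learning rate by the batch size explicitly, since
# Paddle does not average the gradient over the batch.
optimizer = paddle.optimizer.Momentum(
    momentum=0.9,  # illustrative value
    learning_rate=1e-2 / batch_size)

# Alternatively: paddle.optimizer.Adam(learning_rate=...)

# trainer = paddle.trainer.SGD(
#     cost=cost, parameters=parameters, update_equation=optimizer)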

@lcy-seso
Contributor

lcy-seso commented May 3, 2017

I will check whether gradient clipping is problematic in PaddlePaddle and get back to you. Thanks very much for reporting this issue.

@sanjaymeena
Author

Thanks, I am trying now with momentum and the learning rate you suggested.

@sanjaymeena
Author

Hi @lcy-seso, is there a way to get precision, recall, and F1 from paddle.v2.trainer.test, or can they be obtained through some other function?

PS: I changed the learning parameters and training seems to be going fine so far. Thanks.

@lcy-seso
Contributor

lcy-seso commented May 4, 2017

You can use the chunk_evaluator to get precision and recall. But I am sorry to find that the evaluators are not covered in our docs, so I created issue #2009; we are fixing it.

For now, maybe you can read the explanations in the source code. I will also try it and give an example later.

Thanks for your question.

@sanjaymeena
Author

sanjaymeena commented May 9, 2017

Hi @lcy-seso, one more question. I am trying to run inference on about 1000 sentences using paddle.infer.

# Build the reverse label map once, outside the loop.
labels_reverse = dict((v, k) for (k, v) in label_dict.items())

for element in test_datax:
    # Run inference on one sample; field='id' returns label indices.
    probs = paddle.infer(
        output_layer=predict,
        parameters=parameters,
        input=element,
        field='id')
    print probs
    # map indices back to tag names
    pre_lab = [labels_reverse[i] for i in probs]

I can get the results. However, inference seems very slow.
I have the following questions:

  • Is some tuning required on my side?
  • How do I run inference on batched data?

For reference, the word embedding I use has dimension 150, compared to 32 in the demo semantic role labeling program.

@sanjaymeena
Author

sanjaymeena commented May 9, 2017 via email

@lcy-seso
Contributor

lcy-seso commented May 9, 2017

Sorry, I deleted my last reply because I don't find any problem in your code. I am asking others for help. @reyoung

@sanjaymeena
Author

Hi Cao, it would also be very helpful if you could share an example of using the chunk evaluator to calculate precision and recall in the result metrics :)

@luotao1 luotao1 added this to Existing Bugs in V2 API Enhancement May 9, 2017
@luotao1 luotao1 added Bug and removed enhancement labels May 9, 2017
@lcy-seso
Contributor

lcy-seso commented May 9, 2017

Sorry for the late reply. I added a chunk_evaluator for SRL: https://github.com/lcy-seso/book/blob/add_chunk_evalator/07.label_semantic_roles/train.py#L140

@lcy-seso
Contributor

Hi @sanjaymeena, I am really sorry: I found that the chunk evaluator has bugs in the V2 API (#2078). This will cause trouble for you...

@sanjaymeena
Author

@lcy-seso Yeah, I found that it always prints 0 in the results. Thanks for the proactive measure. On another note, if I want to use a Paddle model behind a web server, do I need to build the Docker container from scratch?

@lcy-seso
Contributor

lcy-seso commented May 10, 2017

There is documentation about running PaddlePaddle in Docker, and we provide pre-built Docker images. You can check that page and tell us what we could do better.

P.S.
I also find the documentation of the chunk evaluator is awful (#2079).

We are now working out a schedule for fixing the bugs found in the V2 API, and I promise to keep replying to this issue with progress updates. Thanks so much for your reports, and sorry for the inconvenience.

I also created issue #2080. It will be assigned to an owner to fix this bug.

@sanjaymeena
Author

Thanks @lcy-seso, looking forward to using Paddle for more NLP-related projects :)

@sanjaymeena
Author

@lcy-seso Hi, is there any update on the model's inference speed? It would be great if this item could get a higher priority. Thanks for your help!

@lcy-seso
Contributor

@sanjaymeena I will fix this.

@lcy-seso
Contributor

lcy-seso commented May 17, 2017

Hi @sanjaymeena, sorry for the late reply. I have updated the SRL demo; you can check this.

One reason the inference is much slower than it needs to be is that every call to paddle.infer loads the model again, once per test batch.

  1. Initialize the inferer once: https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L242

inferer = paddle.inference.Inference(
    output_layer=predict, parameters=parameters)

  2. Then test batch by batch: https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L206

probs = inferer.infer(input=test_data, field='id')
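
Putting the two steps together, a minimal sketch (assuming test_datax is a flat list of prepared input samples, as in your earlier snippet; the batch size of 50 is arbitrary):

# Create the inferer once, then feed batches instead of single sentences.
inferer = paddle.inference.Inference(
    output_layer=predict, parameters=parameters)

batch_size = 50
for start in range(0, len(test_datax), batch_size):
    batch = test_datax[start:start + batch_size]
    # one forward pass per batch instead of one per sentence
    probs = inferer.infer(input=batch, field='id')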

I used the model saved after the first pass and tested the entire test set, which contains 5267 samples. By avoiding reloading the parameters for every tested batch, the inference time dropped from 96.5s to 78.19s.

There may still be other things that should be optimized. I will keep profiling. Thanks very much for reporting this.

@sanjaymeena
Author

@lcy-seso thank you so much!! The speedup for me is almost 100x, since my batches previously contained one sentence each :)

@sanjaymeena
Author

sanjaymeena commented May 18, 2017

@lcy-seso is it possible to calculate precision, recall, and F1 given gold labels in IOB format and predicted labels in IOB format? This would be after the model has been created.

@lcy-seso
Contributor

@sanjaymeena PR #2165 fixes the bug in the chunk evaluator, but we haven't merged it yet.

After it is merged, precision, recall, and F1 can be printed easily. We will try our best to merge it as soon as possible and give an example of how to use it.
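
Until the fix lands, a minimal pure-Python sketch of chunk precision/recall/F1 over gold and predicted IOB sequences (independent of Paddle; tags are assumed to be 'O' or typed 'B-X'/'I-X', and a stray I- tag is leniently treated as a chunk start):

def iob_chunks(tags):
    # Extract (type, start, end) chunks from an IOB tag sequence.
    chunks, start, ctype = [], None, None
    for i, tag in enumerate(tags + ['O']):  # sentinel flushes the last chunk
        if tag == 'O' or tag.startswith('B-') or tag[2:] != ctype:
            if start is not None:
                chunks.append((ctype, start, i))
            start, ctype = (i, tag[2:]) if tag != 'O' else (None, None)
    return set(chunks)

def chunk_prf(gold_tags, pred_tags):
    # Precision/recall/F1 over exact chunk matches, conlleval-style.
    gold, pred = iob_chunks(gold_tags), iob_chunks(pred_tags)
    tp = len(gold & pred)
    precision = tp / float(len(pred) or 1)
    recall = tp / float(len(gold) or 1)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1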

@sanjaymeena
Author

sanjaymeena commented May 22, 2017

@lcy-seso Thank you for your help. I am also using the demo/sequence_labelling project.
Docker image: paddle:0.10.0

The demo program can run and save the model just by following the code. However, I cannot work out how to load the saved model and use it on test data:

File: srl_linear_crf.py

inputs(word, pos, chunk, features)
outputs(crf)

In other words, I am having trouble figuring out how to run prediction with the saved model.

@lcy-seso
Contributor

Hi @sanjaymeena, I am not quite sure how you are using srl_linear_crf.py; this file does not seem to be in PaddlePaddle's repo.

To predict with a saved model there are usually three steps (sketched together after this list; you can also check this):

  1. define the topology for inference (lines 233 ~ 234)
    https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L233

  2. load the saved model (lines 240 ~ 243)
    https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L240

  3. infer a batch (line 206)
    https://github.com/PaddlePaddle/Paddle/blob/develop/demo/semantic_role_labeling/api_train_v2.py#L206
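
Put together, a rough sketch (the v2 API is assumed; db_lstm() stands in for whatever function builds your network, the parameter file name is an example, and test_batch is a list of prepared input samples):

import gzip
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# 1. define the topology for inference
predict = db_lstm()  # hypothetical builder returning the output layer

# 2. load the saved model
with gzip.open('params_pass_0.tar.gz', 'r') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

# 3. infer a batch of test samples
probs = paddle.infer(
    output_layer=predict, parameters=parameters,
    input=test_batch, field='id')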

But as far as I know, you are already familiar with this script, so I think I don't understand your question well...

Is it a problem with the Docker environment, or with how to use the paddle.infer interface?

@lcy-seso
Contributor

In the directory demo/semantic_role_labeling there exist two versions of the SRL demo, written against the V1 and V2 PaddlePaddle API interfaces. (And I think we should clean out the old version.)

  1. the V1 API version
  • dataprovider.py, the data provider used in training
  • db_lstm.py, defines the neural network for training and inference
  • predict.py, a Python script that loads the saved model and predicts the test samples
  • train.sh, a script to run training, because in the V1 version Paddle is a standalone executable, not a Python module
  • test.sh, a script to run testing, for the same reason
  2. the V2 API version
  • api_train_v2.py, a single Python script covering both training and inference

@sanjaymeena
Author

sanjaymeena commented May 22, 2017

@lcy-seso I am sorry, what I meant is that I am using the
sequence tagging demo: https://github.com/PaddlePaddle/Paddle/tree/develop/demo/sequence_tagging

I can use the provided linear_crf.py and train_linear.sh to train my own sequence labeling model. However, I cannot find any code for prediction or for loading the model :(

I am planning to use the model created by sequence_tagging in the semantic role labeler project, so my earlier explanation was confusing.

@lcy-seso
Contributor

Very sorry for the late reply; I did not notice this demo before. I am checking it.

@lcy-seso
Contributor

P.S. some updates:

  1. the chunk evaluator has been merged; this is an example (see also the sketch after this list): https://github.com/PaddlePaddle/models/blob/develop/sequence_tagging_for_ner/ner.py#L178
  2. error clipping is also fixed; my partner wrote this guide:
    A temporary guide to stablize models' training #2262
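
Following the linked ner.py, a sketch of attaching the evaluator (predict and target stand in for your CRF decoding and label layers; label_dict is your tag dictionary, assumed to contain O plus B-/I- pairs):

# Attach a chunk evaluator so trainer.test reports precision/recall/F1.
paddle.evaluator.chunk(
    name="chunk_evaluator",
    input=predict,
    label=target,
    chunk_scheme="IOB",
    num_chunk_types=(len(label_dict) - 1) / 2)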

@sanjaymeena
Author

@lcy-seso thanks for your help :) I will try the NER sequence tagging example for my work :) What I need is also sequence tagging, for identifying predicates in a sentence.

@lcy-seso
Contributor

lcy-seso commented May 25, 2017

Hi @sanjaymeena, sorry for the inconvenience: there actually exist two versions of the PaddlePaddle demos in our repo (the demo directory). Let me try to explain.

  1. The demo/sequence_tagging directory only provides a V1-version demo; demo/semantic_role_labeling is an SRL demo that provides both V1 and V2 versions.
  2. In the V1 version, PaddlePaddle is a standalone executable, while in the V2 version we made it a Python module.
  3. Because the V2 APIs are still being enhanced, we have not yet deleted the V1 demos. Sorry for the inconvenience.
  4. Indeed, there is no testing or decoding process provided in the demo/sequence_tagging directory.

To test or to decode (get the tagging result), there are two solutions (I recommend the second one):

  1. Rewrite the demo against the V2 APIs, as the SRL demo does, and use the V2 paddle.infer interface. This requires extra work.

  2. If you already use this demo by running sh train.sh to train the model, there is a very simple way to test that requires almost no extra work, just some simple postprocessing.
    You can check this:
    develop...lcy-seso:add_infer_for_sequence_tagging

Some explanations of the second way:

  • In the V1 version, PaddlePaddle is a standalone executable, and you can test with the following script:

    paddle train \
           --job=test \
           --config=rnn_crf.py \
           --parallel_nn=1 \
           --model_list="model.list" \
           --use_gpu=true \
           --predict_output_dir="./" \
           --config_args="is_predict=1" \
           --trainer_count=1 \
    2>&1 | tee 'test.log'

    • explanations of the parameters:
      1. job=test: start a testing process, which only executes the forward pass
      2. config: path to the network configuration file
      3. parallel_nn=1: optional; currently the CRF layer is not implemented for GPU, and parallel_nn means some layers can run on the CPU while others run on the GPU
      4. model_list: a plain-text file containing the path of the trained model, e.g. ./output/model/pass-00000
      5. use_gpu: whether to use the GPU
      6. predict_output_dir=A: if specified, the prediction results are saved to the file A/rank-00000
      7. trainer_count: how many threads to use for testing

@lcy-seso
Contributor

lcy-seso commented May 25, 2017

Also, about prediction for demo/sequence_tagging:

  • By calling the following function you can print the output of any layer defined in the network:

outputs(layer1_in_the_network, layer2_in_the_network, ...)

Please check this line:
https://github.com/lcy-seso/Paddle/blob/add_infer_for_sequence_tagging/demo/sequence_tagging/rnn_crf.py#L105

  • The prediction results will be written to the file A/rank-00000 (A is the directory specified by predict_output_dir).

  • A/rank-00000 looks like this:

    10;
    11;
    11;
    10;
    11;
    11;
    20;
    10;
    20;
    10;
    

    One caveat: the prediction results in A/rank-00000 are flattened. Each line of A/rank-00000 is the index of the tagging label at one time step, so to get the final tagging results you still need some simple post-processing.

  • format of the file A/rank-00000

    • Each line of A/rank-00000 contains the outputs of all the layers specified in outputs. Multiple outputs are separated by ";". If a layer's output is a vector, its elements are separated by " " (a space).
    • If you put two layers in outputs, like this:
      outputs(crf_decoding, crf_input)
      A/rank-00000 looks like this:
      10;1.66071 0.441494 1.11911 -0.0893781 -0.99124 -1.61581 -1.05307 -1.56547 -1.21166 -1.29977 7.85641 4.47389 -1.51868 -1.88707 -1.66359 -1.6472 -0.0668767 -0.473394 -1.64615 -1.28447 1.11682 -0.0599329 2.5225;
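
A small sketch of that post-processing (assuming the per-sentence lengths are known from your test data, and labels_reverse is the index-to-tag map from the earlier snippet):

def read_predictions(path, sentence_lengths, labels_reverse):
    # Each line of A/rank-00000 starts with the label index for one
    # time step; any outputs after the first ';' are ignored here.
    with open(path) as f:
        ids = [int(line.split(';')[0]) for line in f if line.strip()]
    # Regroup the flat index stream into per-sentence tag sequences.
    sentences, pos = [], 0
    for n in sentence_lengths:
        sentences.append([labels_reverse[i] for i in ids[pos:pos + n]])
        pos += n
    return sentences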
      

@sanjaymeena
Author

Hi @lcy-seso, thank you for the detailed answer. My goal was to do real-time prediction on each user input. I was interested in this demo because the sequence tagger (a chunker) uses part-of-speech (POS) tags as features. Thanks for your help :) I will try to use the NER sequence tagger for now :)

@lcy-seso lcy-seso moved this from BUG to Done in V2 API Enhancement Jun 7, 2017
@wanghaoshuang
Contributor

Closing this issue due to inactivity; feel free to reopen it.
