bugfix on dev branch: cpu batchnnorm num of args does not match #589

Closed
wants to merge 2 commits into from

chrishkchris
Contributor

Some DNNL-related errors may still need to be fixed before the code works correctly.

I have resolved a build error caused by a variable being reassigned, so the code now builds. The C test case test_singa.o and the Python script mlp.py now pass.

I ran into two more errors that I have not solved yet:

  1. mnist_cnn hangs (after switching it to CPU).
  2. test_operation.py crashes with a segmentation fault:
ubuntu@ip-172-31-24-48:~/singa/test/python$ python3 test_operation.py
.......................................Segmentation fault (core dumped)

@nudles
Member

nudles commented Feb 3, 2020

error: 'transform' is not a member of 'std'
Did you add #include <algorithm>?

@chrishkchris
Contributor Author

@dcslin
I tried it on panda 5 using Docker; the result was the same as on AWS without Docker:

root@04484c49d78c:~/singa/test/python# python3 test_operation.py
.......................................Segmentation fault (core dumped)

@dcslin
Member

dcslin commented Feb 3, 2020

Hi @chrishkchris, there is an error in TestPythonOperation.test_batchnorm2d_cpu. I am checking it.

@chrishkchris
Contributor Author

@dcslin
test_batchnorm2d_gpu also fails with a DNNL error, which seems to be caused by the control logic:

ubuntu@ip-172-31-35-63:~/singa/test/python$ python3 test_operation.py TestPythonOperation.test_batchnorm2d_gpu
terminate called after throwing an instance of 'dnnl::error'
  what():  attempt to use uninitialized object
Aborted (core dumped)

@dcslin
Member

dcslin commented Feb 3, 2020

Hi @chris, some of the errors occur when both USE_DNNL and USE_CUDNN are enabled. I fixed them in #590.

@dcslin
Member

dcslin commented Feb 3, 2020

test/python/test_api.py now passes with no errors, so the C++ side should be fine. As for test_operation.py, I think the failure is related to some changes in autograd.py...

@chrishkchris
Contributor Author

@dcslin great! thank you so much!

@nudles merging this PR (#589) and PR #590 will fix two errors. I will retest after the two bug-fix PRs are merged.

@chrishkchris chrishkchris changed the title [WIP] hotfix on dev branch (DNNL) hotfix on dev branch (remove of duplicate vatiable) Feb 3, 2020
@chrishkchris chrishkchris changed the title hotfix on dev branch (remove of duplicate vatiable) hotfix on dev branch (remove of duplicate variable) Feb 3, 2020
@chrishkchris
Contributor Author

The build error caused by the duplicate variable has been resolved in #590, so this PR can be closed. I will retest the code.

@chrishkchris chrishkchris changed the title hotfix on dev branch (remove of duplicate variable) bugfix on dev branch: cpu batchnnorm num of args does not match Feb 9, 2020
@chrishkchris
Contributor Author

chrishkchris commented Feb 9, 2020

There is a new error in CPU batchnorm: the number of values it returns does not match what the Python autograd code expects. This PR simply fixes that mismatch.

root@b8dfa5473dea:~/dcsysh/singa/test/python# python3 test_operation.py
................................................................E............................................................
======================================================================
ERROR: test_batchnorm2d_cpu (__main__.TestPythonOperation)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_operation.py", line 286, in test_batchnorm2d_cpu
    y = batchnorm_0(cpu_input_tensor)
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 1502, in __call__
    self.running_var,
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 1581, in batchnorm_2d
    return _BatchNorm2d(handle, running_mean, running_var)(x, scale, bias)[0]
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 244, in __call__
    return self._do_forward(*xs)
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 294, in _do_forward
    ys = self.forward(*xs)
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 1532, in forward
    self.running_var)
ValueError: too many values to unpack (expected 3)

----------------------------------------------------------------------
Ran 125 tests in 0.766s

FAILED (errors=1)

root@b8dfa5473dea:~/dcsysh/singa/examples/autograd# python3 resnet.py
Start intialization............
  0%|                                                                                                                                                                                  | 0/100 [00:26<?, ?it/s]
Traceback (most recent call last):
  File "resnet.py", line 278, in <module>
    x = model(tx)
  File "resnet.py", line 179, in __call__
    x = self.bn1(x)
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 1502, in __call__
    self.running_var,
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 1581, in batchnorm_2d
    return _BatchNorm2d(handle, running_mean, running_var)(x, scale, bias)[0]
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 244, in __call__
    return self._do_forward(*xs)
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 294, in _do_forward
    ys = self.forward(*xs)
  File "/root/dcsysh/singa/build/python/singa/autograd.py", line 1532, in forward
    self.running_var)
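
The `ValueError` in the first traceback is Python's standard complaint when a call returns more values than the assignment on the left-hand side unpacks. The sketch below reproduces that failure in isolation. `backend_batchnorm_forward` is a hypothetical stand-in rather than the SINGA function, and the count of five returned values is an assumption, since the thread only shows that the caller expected three.

```python
def backend_batchnorm_forward(x):
    """Hypothetical stand-in for a backend call; returns five values (assumed count)."""
    y = [2.0 * v for v in x]               # placeholder output, not real batchnorm
    mean, var = 0.0, 1.0                   # placeholder batch statistics
    running_mean, running_var = 0.0, 1.0   # placeholder running statistics
    return y, mean, var, running_mean, running_var


def forward_mismatched(x):
    # Caller written against a three-value contract: unpacking raises ValueError.
    y, mean, var = backend_batchnorm_forward(x)
    return y


def forward_matched(x):
    # Caller that unpacks exactly as many values as the backend returns.
    y, mean, var, running_mean, running_var = backend_batchnorm_forward(x)
    return y


if __name__ == "__main__":
    try:
        forward_mismatched([1.0, 2.0])
    except ValueError as err:
        print("mismatched:", err)
    print("matched:", forward_matched([1.0, 2.0]))
```

Either side of the call can be changed to restore agreement. The thread does not say which side this PR touched, only that the two counts were made to match.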
