ipdevpoll multiprocess mode broken in v4.8.0 #1618

Closed
Jo-Oiongen opened this issue Nov 24, 2017 · 6 comments

Comments

@Jo-Oiongen

Jo-Oiongen commented Nov 24, 2017

Hi!

Just upgraded from 4.7.3 to 4.8.0 and now ipdevpoll no longer spawns multiple processes.

In /etc/nav/init.d/ipdevpoll I have the following setting enabled.

# Run time options to ipdevpoll. E.g. -m to enable multiprocess mode.
OPTIONS="-m 10"

Since there is no longer any multiprocessing going on, I'm losing out on lots of collected data and graphs.

nav status gives:

Up: activeip alertengine dbclean emailreports eventengine ipdevpoll logengine mactrace maintengine navstats netbiostracker pping psuwatch servicemon smsd snmptrapd thresholdmon topology
@lunkwill42
Member

lunkwill42 commented Nov 24, 2017

Multithreading and multiprocess are two different concepts, so I suppose you are referring to multiprocess, since this is what you are enabling with the -m option.

How, exactly, have you confirmed that ipdevpoll is not running multiple processes?

(Have you read https://nav.uninett.no/doc/4.8/reference/ipdevpoll.html#multiprocess-mode , BTW?)

@Jo-Oiongen
Author

By observing the difference in behaviour between the previous 4.7.3 install and the upgraded 4.8.0 install.

I'm running two separate ssh sessions, one of which is running top. When I do "nav start ipdevpoll" I can see ipdevpolld spawning separate processes in top. These disappear as fast as they appear, and I'm left with a single ipdevpolld process that now and then runs at 100%. Earlier I had ten ipdevpolld processes running all the time, each using 50 - 100% of a core. Looking at NAV and its various graphs, there is missing data all over the place, just like there used to be before I enabled "-m".

But further troubleshooting using "pstree -g" shows ipdevpolld branching out to ten processes, all with the same PID(?). I'm on a bit of thin ice here with regard to what pstree is actually showing.
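
On the pstree question: `pstree -g` prints process group IDs (PGIDs), not PIDs, so children that share a process group with their parent all show the same number; `pstree -p` prints PIDs instead. A rough sketch for listing the processes with both values, assuming the command name is `ipdevpolld` as seen in top (the helper names `parse_ps` and `list_processes` are made up for illustration):

```python
import subprocess


def parse_ps(output, name="ipdevpolld"):
    """Return (pid, ppid, pgid) tuples for processes whose command equals name.

    Expects headerless `ps` output with the columns pid, ppid, pgid, comm.
    The default name is an assumption taken from the top output described above.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[3].strip() == name:
            pid, ppid, pgid = (int(field) for field in fields[:3])
            rows.append((pid, ppid, pgid))
    return rows


def list_processes(name="ipdevpolld"):
    """Run ps and return the matching processes (hypothetical helper)."""
    # One -o option per column; the trailing "=" suppresses the header line.
    output = subprocess.check_output(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "pgid=", "-o", "comm="],
        universal_newlines=True,
    )
    return parse_ps(output, name)
```

Calling `list_processes()` on the NAV host should return one tuple per running process. Ten distinct PIDs that share one PGID would be consistent with what `pstree -g` displayed.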

@lunkwill42
Member

Troubleshooting rule no. 1: Check the logs; in this case: ipdevpoll.log

@Jo-Oiongen
Author

Jo-Oiongen commented Nov 24, 2017

The log file grows by 200+ MB every minute and contains a lot of:

2017-11-24 15:23:36,059 [4454] [ERROR schedule.netboxjobscheduler] [statuscheck IPREMOVED] Unhandled exception raised by JobHandler
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nav/ipdevpoll/daemon.py", line 99, in run
    reactor.run()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1192, in run
    self.mainLoop()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/nav/ipdevpoll/schedule.py", line 140, in run_job
    interval=self.job.interval)
  File "/usr/lib/python2.7/dist-packages/nav/ipdevpoll/pool.py", line 289, in execute_job
    plugins=plugins, interval=interval)
  File "/usr/lib/python2.7/dist-packages/nav/ipdevpoll/pool.py", line 273, in _execute
    deferred = worker.execute(self.serial, command, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nav/ipdevpoll/pool.py", line 225, in execute
    deferred = self.process.callRemote(command, serial=serial, **kwargs)
  File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 821, in callRemote
    return co._doCommand(self)
  File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 1778, in _doCommand
    self.requiresAnswer)
  File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 752, in _sendBoxCommand
    box._sendTo(self.boxSender)
  File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 577, in _sendTo
    proto.sendBox(self)
  File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 2153, in sendBox
    self.transport.write(box.serialize())
  File "/usr/lib/python2.7/dist-packages/twisted/protocols/amp.py", line 551, in serialize
    "Unicode value for key %r not allowed: %r" % (k, v))
exceptions.TypeError: Unicode value for key 'plugins' not allowed: u'\x00\tlinkstate\x00\x06entity\x00\x07modules\x00\x03bgp\x00\x03poe'

To upgrade I had to run apt-get install python-pynetsnmp-2. Could this be the source of the problems?

@lunkwill42
Member

No, it seems you've found a bug related to our work on porting NAV to Python 3, which we didn't find ourselves - and the problem seems to be specifically located in the code that marshals data between the master process and the child processes.

I don't have time to debug this ATM, so it will unfortunately have to wait until Monday - but I will assign this to the developer who worked on the multiprocess code changes for 4.7, as he should know this best.

Until then, I'm afraid it seems you may have to make do with single-process mode.

(I took the liberty of editing your comment markup to make things a bit more legible)
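
The value quoted in the TypeError looks like the plugin names packed with 2-byte big-endian length prefixes (`\x00\t` is 9, the length of "linkstate"), held in a unicode string where the AMP serializer expects a byte string. A minimal sketch of that distinction, for illustration only: `encode_string_list` and `check_box_values` are hypothetical helpers, not NAV or Twisted code, and this is not necessarily how the actual fix works.

```python
import struct


def encode_string_list(items):
    """Pack strings as 2-byte length-prefixed byte strings (hypothetical helper).

    Text items are encoded to UTF-8 first, so the result is always bytes.
    """
    chunks = []
    for item in items:
        if not isinstance(item, bytes):
            item = item.encode("utf-8")  # text -> bytes before packing
        chunks.append(struct.pack("!H", len(item)) + item)
    return b"".join(chunks)


def check_box_values(box):
    """Stand-in for a serializer type check: accept byte-string values only."""
    for key, value in box.items():
        if not isinstance(value, bytes):
            raise TypeError(
                "Unicode value for key %r not allowed: %r" % (key, value))
    return True


# Plugin names taken from the value shown in the traceback.
plugins = [u"linkstate", u"entity", u"modules", u"bgp", u"poe"]
encoded = encode_string_list(plugins)
check_box_values({"plugins": encoded})  # passes: the value is bytes
```

Passing the same content as a text string to `check_box_values` raises a TypeError of the same shape as the one in the log.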

lunkwill42 changed the title from "ipdevpoll no longer multithreads after upgrade to 4.8.0" to "ipdevpoll multiprocess mode broken in v4.8.0" on Nov 24, 2017
lunkwill42 added a commit that referenced this issue Nov 27, 2017
Tested and found ok. Fixes #1618 and adds useful tests.
@lunkwill42
Member

This was fixed by #1620 - specifically the commit 61ff008. NAV 4.8.1 will be released on Thursday, at the earliest, but the patch can be applied manually if desired.
