Skip to content

implement netcat operating mode#1222

Open
f97ada87 wants to merge 1 commit intoBitmessage:v0.6from
f97ada87:opmode-netcat-v2
Open

implement netcat operating mode#1222
f97ada87 wants to merge 1 commit intoBitmessage:v0.6from
f97ada87:opmode-netcat-v2

Conversation

@f97ada87
Copy link
Copy Markdown
Contributor

This PR enables a special headless operating mode (which I named "netcat" mode due to similarities with the Unix utility) where all object processing is disabled. Instead, raw objects received from the network are output to STDOUT unprocessed, also, any valid raw objects received on STDIN are broadcast to the network.

This is a re-implementation of PR #1149 , using the switches defined in PR #1214 . The discussions from the linked PRs should provide useful background.

@f97ada87 f97ada87 force-pushed the opmode-netcat-v2 branch 2 times, most recently from c6cfe51 to e9425c8 Compare April 16, 2018 11:09
@f97ada87
Copy link
Copy Markdown
Contributor Author

a line-too-long codacy issue was fixed

Copy link
Copy Markdown
Collaborator

@g1itch g1itch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How it stops? Maybe handle SIGINT? I cannot stop it either by SIGINT or SIGTERM.

Comment thread src/std_io.py
sql.execute('''INSERT INTO objectprocessorqueue VALUES (?,?)''',
objectType, data)
numberOfObjectsInObjProcQueue += 1
logger.debug('Saved %s objects from the objectProcessorQueue to disk. objectProcessorThread exiting.' %
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 validation missing here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread src/std_io.py
outputs to STDOUT in format: hex_timestamp - tab - hex-encoded_object
"""

import threading
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maintain a order all the imports should be first.
from statements should be written after import statements.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread src/std_io.py Outdated
while True:
# read a line in hex encoding
line = self.inputSrc.readline()
if len(line) == 0:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we use
if not line or if not line.strip() ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if not line" - can't strip an EOF :)

Comment thread src/std_io.py
self.inputSrc = inputSrc
logger.info('stdInput thread started.')

def run(self):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write a method with only 15 or 20 lines.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Go big or go home :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over longer term, I would also prefer to have shorter methods, some parts of the old code, like the class_objectProcessor are too long, but for this PR it's fine the way it is.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted with thanks. I think the ad-hoc object parsing accounts for a lot of avoidable LLOC wastage, however, as discussed in PR #1149 , there was no readily usable parser function to use instead.

Comment thread src/std_io.py
"""
The objectStdOut thread receives network objects from the receiveDataThreads.
"""
def __init__(self):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow Pep8 styling format.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment thread src/std_io.py
objectType, data = row
queues.objectProcessorQueue.put((objectType, data))
sqlExecute('''DELETE FROM objectprocessorqueue''')
logger.debug('Loaded %s objects from disk into the objectProcessorQueue.' % str(len(queryreturn)))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pep8 styling missing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@f97ada87
Copy link
Copy Markdown
Contributor Author

@g1itch it stops clean on Ctrl-D (EOF) on STDIN or dirty with SIGKILL. Clean exit by signal is not available because of blocking read on STDIN.
Do you think it's needed? If yes, I can use a non-blocking read instead.

@f97ada87
Copy link
Copy Markdown
Contributor Author

@omkar1117 human or robot? If human, we need to talk.

@g1itch
Copy link
Copy Markdown
Collaborator

g1itch commented Apr 16, 2018

Of course, Ctrl+D. Yes, it works. It's enough for me.

@f97ada87 f97ada87 force-pushed the opmode-netcat-v2 branch 2 times, most recently from 63893b5 to 7d6a0b5 Compare April 16, 2018 16:00
@f97ada87
Copy link
Copy Markdown
Contributor Author

all pep8 issues are now fixed (except line length which is endemic) - thanks @omkar1117

@g1itch
Copy link
Copy Markdown
Collaborator

g1itch commented Apr 19, 2018

It could be useful to have an option to dump the messages which caused an exceptions with help of this std_io module.

@f97ada87
Copy link
Copy Markdown
Contributor Author

I'm sorry, I'm not sure what you mean, in a number of ways :)

  1. by "messages which caused an exception", are you referring to:
  • the objects received from userspace via stdin thread, which fail unhexlify or other pre-parsing operations, or
  • the objects received from the network layer, which fail early sanity checks in bmproto.py and are dropped before even reaching inventory?
  1. by "an option to dump", do you mean the objects should be:
  • displayed in raw form, and if yes, where exactly, considering that all STDIO is busy; just send them to logger.info?, or
  • dropped and not processed further, which is pretty much what we already do? :)

Asking because these are all things that I've considered. :)

@g1itch
Copy link
Copy Markdown
Collaborator

g1itch commented Apr 19, 2018

  1. messages received from network which caused an exception in bmproto (with ERROR - Error processing in debug.log)
  2. dump them to files in format produced by your std_io module for analysis

It may be another application for this new code when it will be merged.

@f97ada87
Copy link
Copy Markdown
Contributor Author

Yes, this makes sense. As a matter of fact, the initial commit from PR #1149 was tapping into bmproto.py upstream of the sanity checks, and was capturing broken objects along with the good ones. It was only after updating the code to tap into the ObjectProcessorQueue instead (downstream of bmproto) that we stopped capturing them. Anyway, this should be easy to add in the future if needed, with a couple of conditionals inside bmproto.py

@f97ada87
Copy link
Copy Markdown
Contributor Author

f97ada87 commented May 2, 2018

Hi guys, not trying to rush things, just double-checking if there are any open issues requiring attention from my end.

@PeterSurda
Copy link
Copy Markdown
Member

Haven't had time yet to check it out

Copy link
Copy Markdown
Contributor

@coffeedogs coffeedogs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested yet, will do after the sql refactor. Any reason to think it wouldn't work cross-platform? Note I'm new to the codebase so apologies for any dumb points I raise.

Comment thread src/bitmessagemain.py
opts, args = getopt.getopt(
sys.argv[1:], "hcdt",
["help", "curses", "daemon", "test"])
sys.argv[1:], "hcdtn",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option '-n' often has connotations of 'dry run' or 'numeric only'. While we don't have anything using 'n' right now I'm thinking of avoiding future confusion if we do. Is there another letter that you would say makes as much sense as 'n'?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of that and decided that none of the common uses of "-n" (dry-run, no-resolve, no-output, numeric etc) are applicable in the Bitmessage context.
A second best option would be "-c" (netCat) which is already taken.

Comment thread src/bitmessagemain.py
sys.argv[1:], "hcdt",
["help", "curses", "daemon", "test"])
sys.argv[1:], "hcdtn",
["help", "curses", "daemon", "test", "mode-netcat"])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other options are modes too. Perhaps you could drop the "mode-"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See general comment, there's a naming theme.

Comment thread src/std_io.py Outdated
# unhex the input with error rejection
try:
binObject = unhexlify(hexObject)
except Exception:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs unhexlify might throw TypeError. Can we be more specific here otherwise codacy will complain of a bare except.

Comment thread src/std_io.py
logger.info("STDIN: Invalid object size")
continue

if not state.enableNetwork and state.enableGUI:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You set state.enableGUI = False above. Will this conditional ever be reached?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state.enableGUI is only false in netcat mode; std_io may have other uses outside of netcat mode, which are not currently included. Yes, the conditional makes sense from a logical perspective.

Comment thread src/std_io.py Outdated
try:
# stdioStamp, = unpack('>Q', unhexlify(hexTStamp))
_, = unpack('>Q', unhexlify(hexTStamp))
except Exception:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just catch struct.error here? Oh, plus TypeError for unhexlify.

Comment thread src/std_io.py Outdated

# duplicate check on inventory hash (both netcat and airgap)
if inventoryHash in Inventory():
logger.info("STDIN: Already got object " + hexlify(inventoryHash))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how many of these one might expect during 'normal' operation. Is 'info' the right log level? Maybe this and the next log statement could be debug level if it would otherwise lead to info being spammed. Maybe we expect people running in this mode to not care about other info level statements but be very interested in knowing that messages were or were not added, I don't know.

Copy link
Copy Markdown
Contributor Author

@f97ada87 f97ada87 May 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're probably right. It's only useful when running in manual mode and pasting objects in console, useless in most other scenarios. I'll change it to debug.
I guess what I really need here is not logger.info pollution, but the ability to switch logging levels as needed via getopt; I might actually open an issue about that.
EDIT - Done: #1246

Comment thread src/std_io.py
# REFACTOR THIS with objectProcessor into objectProcessorQueue
queryreturn = sqlQuery(
'''SELECT objecttype, data FROM objectprocessorqueue''')
for row in queryreturn:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By waiting until we have synchronously pulled all objects from the database and put all objects on the queue we are missing out on some benefits of parallelism and we'll hit a memory limit with large data sets. Maybe this is unavoidable or related to your suggested refactoring.

Also, keeping the number of places where raw SQL is used to a minimum is a good idea. Again, I assume this would be part of your suggested refactoring.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block marked by refactor ... /refactor is duplicated verbatim from class objectProcessor; the comment indicates my intention to refactor it into class objectProcessorQueue, a task which is outside the scope of this PR.

Comment thread src/std_io.py
# /REFACTOR THIS

def run(self):
while True:
Copy link
Copy Markdown
Contributor

@coffeedogs coffeedogs May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the queue is empty would this not cause unnecessary 100% CPU? Perhaps a small sleep is needed inside the while loop?

Edit: no it wouldn't, Queue.get(block=True) blocks when the queue is empty. I was thinking of the AMQP library I was most recently using.

Comment thread src/std_io.py
numberOfObjectsInObjProcQueue += 1
logger.debug('Saved %s objects from the objectProcessorQueue to disk. objectProcessorThread exiting.' %
str(numberOfObjectsInObjProcQueue))
# /REFACTOR THIS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please!

Comment thread src/std_io.py
objectType, data = queues.objectProcessorQueue.get()

# Output in documented format
print "%016x" % int(time.time()) + '\t' + hexlify(data)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use sys.stdout.write() instead of print here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be indeed the logical choice, I'm just not sure how cross-platform it would be. Print is 100% portable and not wrong.

@f97ada87
Copy link
Copy Markdown
Contributor Author

@coffeedogs a general comment before I address some specific points; the answer to most your queries is along the lines of "backward compat", "minimizing the diff" and "one change at a time".
This PR is ported from of a private fork that has several special operating modes:

  • --mode-netcat / -n described here,
  • --mode-airgap / -a (no network, STDIO to ObjectProcessor) which will be ported next
  • --mode-router / -r and --mode-seeder / -s which may never be ported

As much as possible, I tried to preserve compatibility with both PyBitmessage and the original fork.

@f97ada87 f97ada87 force-pushed the opmode-netcat-v2 branch from 7d6a0b5 to 1b61750 Compare May 13, 2018 04:43
@f97ada87 f97ada87 force-pushed the opmode-netcat-v2 branch from 1b61750 to 5fd8990 Compare May 13, 2018 10:33
@PeterSurda PeterSurda removed the request for review from MahendraNG May 21, 2018 15:03
@PeterSurda PeterSurda added the enhancement New feature label May 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants