Skip to content

define and enable netcat operating mode#1149

Closed
f97ada87 wants to merge 1 commit intoBitmessage:v0.6from
f97ada87:opmode-netcat
Closed

define and enable netcat operating mode#1149
f97ada87 wants to merge 1 commit intoBitmessage:v0.6from
f97ada87:opmode-netcat

Conversation

@f97ada87
Copy link
Copy Markdown
Contributor

@f97ada87 f97ada87 commented Mar 9, 2018

This PR enables a special headless operating mode ("netcat" mode) where all object processing is disabled. Instead, raw objects received from the network are output to STDOUT unprocessed, also, any valid raw objects received on STDIN are broadcast to the network.

The STDOUT format is one object per line, formatted as:

  • current timestamp as a zero-padded 16-digit hex number
  • a TAB character
  • the object in HEX encoding

The STDIN format is one object per line, which can be either a single HEX-encoded object or the format described for STDOUT above (in the latter case the timestamp will be ignored).

Example:
000000005aa29ada 00000000014f5b6a000000005aa7c646000000000401c5a40f3411817a9ba03df27d7839dd98b854d4fea089f6ffa9c251456e1f61e5

This is part of a larger effort to compartmentalize the PyBitmessage operating layers in order to better contain network-borne attacks, and from this perspective it is an incomplete solution.
However, it also provides some immediate benefits such as the ability to generate a timestamped full record of network traffic for later examination/replay, as well as enabling developers to broadcast protocol objects generated externally.

As usual, all comments and questions will be appreciated.

@PeterSurda
Copy link
Copy Markdown
Member

Can you check the Codacy report and fix the issues?

@PeterSurda PeterSurda self-requested a review March 9, 2018 22:43
@PeterSurda PeterSurda self-assigned this Mar 9, 2018
@PeterSurda PeterSurda added the enhancement New feature label Mar 9, 2018
@f97ada87
Copy link
Copy Markdown
Contributor Author

I fixed all the Codacy warnings that made sense. IMO the two remaining "issues" are needed to retain human readability of the code and fixing them will not improve the source code quality.
I'm open to any suggestions.

@PeterSurda
Copy link
Copy Markdown
Member

Apparently the best practice for an unused variable in situations like this is to use the _ variable. This makes it clear to the reader that the variable is unused.

@f97ada87
Copy link
Copy Markdown
Contributor Author

I know, I'm just being pragmatic here. As both a code writer and reader, if I were to choose between:

  • versionNumber, versionLength = decodeVarint( ... )
  • _, versionLength = decodeVarint( ... )

I will always prefer the former, even if it causes a lint warning.
It's a matter of being respectful to the next guy; one line says clearly "I'm extracting this element and not using it as it's not needed", the other says "Nothing to see here, keep going".
I feel that, in this case, we're sacrificing human readability in order to keep the robots happy. Can you please confirm that this is required in order to have the PR accepted?

Thanks.

@PeterSurda
Copy link
Copy Markdown
Member

I need to think about it as I'm in the process of formalising the coding standards.

@PeterSurda
Copy link
Copy Markdown
Member

PeterSurda commented Mar 10, 2018

How about this: you use _ but add (for example in the line immediately above) a comment which lists both variables? This would both keep the linter quiet as well as explain to the potential reader what is happening. (S)he'll understand that there is an unused variable, but if (s)he needed it, this is what it should be called.

Like this:

# versionNumber, versionLength = ...
_, versionLength = decodeVarint( binObject[readPosition:readPosition + 10])

Even though in this case, even better would be to use some sort of parser function and then work on the result of that function.

@PeterSurda
Copy link
Copy Markdown
Member

Great. Please allow a couple of days for the review, I'll explain afterwards.

@f97ada87
Copy link
Copy Markdown
Contributor Author

Thanks, and agreed on the parser function. I saw your work in 96d58f3 etc but wasn't 100% confident to use it, so I reverted to old-style parsing instead, for now. Can be refactored later in a separate PR.

@PeterSurda
Copy link
Copy Markdown
Member

wasn't 100% confident to use it, so I reverted to old-style parsing instead, for now.

It probably wouldn't work correctly as that parser is integrated into the network buffer handling.

Can be refactored later in a separate PR.

Yes, that's for a later stage, at this moment I'd prefer to handle new features and refactoring separately.

@PeterSurda PeterSurda requested a review from MahendraNG March 14, 2018 06:39
Comment thread src/class_stdInput.py Outdated
if state.netcatmode:
# publish object to inventory and advertise
Inventory()[inventoryHash] = (objectType, streamNumber, binObject, eTime,'')
PendingUpload().add(inventoryHash)
Copy link
Copy Markdown
Member

@PeterSurda PeterSurda Mar 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PendingUpload is deprecated, just remove this line.

@PeterSurda
Copy link
Copy Markdown
Member

I have an uneasy feeling about this.

  • printing new object data shouldn't be done from bmproto. It should rather signal a separate thread with the inventory hash, and that should print the data from Inventory(). Maybe there should be a separate mode in object processor which would print it out? Then you can get rid of stdInput class.

  • can't sending new objects be done from API instead? Just add a new method similar to HandleDisseminatePreEncryptedMsg, it could also automatically detect if the PoW is missing and calculate it if necessary.

  • why hex encoded and not something that is already used elsewhere, like JSON or msgpack? And why one object per line instead of using a transport protocol that is already used elsewhere (like HTTP)?

@f97ada87
Copy link
Copy Markdown
Contributor Author

Most decisions are inline with the Unix doctrine: do one thing and do it well, design for interoperation, your output is someone else's input.

The purpose of the netcat mode is to minimize the footprint and exploitable surface of the PyBitmessage network-facing process, for security reasons. This is achieved by short-circuiting several classes and process threads, including the objectProcessor in its entirety. There is no API, no POW either.

The patchset is designed to minimize codebase bloat. The stdInput class (and thread) is provided as a single, clear and easily auditable ingress port for raw objects in the STDIO special modes (netcat/airgap/etc). I don't see a security benefit in merging its functionality inside another file or class, as it would lead to reduced transparency.
The unqueued print from bmproto was the least intrusive option, and has worked without issue for nearly a year in real-life installations. I agree, queued output is better but unqueued is not wrong. Do you have a specific situation in mind, or just best practices?

I chose the hex one-line format for several reasons:

  • native support cross-platform, cross-language
  • simple and strict, no dark corners to hide bad stuff
  • 100% lossless – everything is recorded as received, even invalid data
  • easy to inspect visually (in monospace font)
  • can be processed with standard utilities (cat, tee, grep, sed, awk, head, tail, sort, perl, wc etc)

More exotic I/O formats may be introduced later if needed, selected by command-line options (--netcat-format=json), although for bloat reduction I would recommend using external conversion utilities instead (pybm –mode-netcat | hex_to_json_converter | json_app)

Happy to discuss any objections or counterpoints to the above.

@PeterSurda
Copy link
Copy Markdown
Member

PeterSurda commented Mar 22, 2018

I would like this to work more in a pluggable model rather than a wide range of code flow changes triggered bye a single variable. For this, we need a new class which acts like a queue, but allows multiple "subscribers". Then objectProcessor and the stdOut (or whatever you want to call it) could subscribe/launch independently. The same problem exists with the UI queue which prevents the GUI and the SMTP thread from working simulaneously. The output shouldn't happen in the receivedata thread but in a separate one. So this requires some refactoring of existing code. You could use threading.local to store the queues. Here is some pseudocode:

# subscriberqueue.py
SubscriberQueue(Queue):
    def subscribe():
      threading.local = Queue()
      self._queues.append(threading.local)
    def put(data):
      for i in self._queues:
          i.put(data)
    def get():
      return threading.local.get()
...
# queues.py:
objectOutputQueue = SubscriberQueue()
...
# class_objectProcessor.py:
class objectProcessorThread():
     def __init():
        queues.objectOutputQueue.subscribe()

Regarding input, I suppose it's ok for now.

Perhaps at first you could add more runtime variables, one for enabling stdin/out IO, one for enabling/disabling object processor, one for the worker thread, and so on. The netcat mode would then set all the variables accordingly.

@f97ada87
Copy link
Copy Markdown
Contributor Author

@PeterSurda - I have added the modular switches as suggested (as state.enable*). Is this what you had in mind?

@PeterSurda
Copy link
Copy Markdown
Member

@f97ada87 I need to review it in a bit more detail, but in general it appears to be like I asked.

@f97ada87
Copy link
Copy Markdown
Contributor Author

Thanks @PeterSurda . Please do not merge right now as I have some fixes to upload, mainly relevant to your output queuing suggestion. I'll confirm when it's ready to go.

@f97ada87
Copy link
Copy Markdown
Contributor Author

f97ada87 commented Mar 29, 2018

Hi @PeterSurda , this is complete and ready to go.

  • code sections and threads are controlled by separate boolean switches
  • object output uses the existing objectProcessorQueue
  • all changes to bmproto.py have been removed.

As the conventional objectProcessor and the netcat mode are mutually exclusive by definition, the subscriberQueue logic seems YAGNI at this stage. I used a simple if/else construct instead, hope it's OK.

@PeterSurda
Copy link
Copy Markdown
Member

Not having a SubscriberQueue is ok for the time being. The rest I'll look at tomorrow.

@PeterSurda PeterSurda requested a review from g1itch April 5, 2018 14:49
@g1itch
Copy link
Copy Markdown
Collaborator

g1itch commented Apr 5, 2018

There are a couple of conflicts currently:

  • textual conflict can be solved by rebase
  • and logical one is related to test-mode which is been merged before this PR ¯\_(ツ)_/¯

You can solve them like me or maybe propose a better solution.

Some style questions also:

  • why you dropped the class_startInput module and named it std_io instead?
  • what is the point of "0.6.3+ SPECIALOPMODES" prefix in every comment?

@f97ada87
Copy link
Copy Markdown
Contributor Author

f97ada87 commented Apr 6, 2018

Short answer: Scope creep :)

Long answers:

  • I renamed the file class_stdInput to std_io because the original name was no longer truly reflective of the current content. Originally the file contained a single class for the standard input thread. Currently, to enable queued stdout (discussion above), the file contains a variable, two classes (one for input, one for netcat output) and there's more to come.
  • the "special operating modes" comments were used inhouse to tag the relevant changes from the original 0.6 tree; 0.6.2+ and 0.6.3+ indicate successful forward-porting by me. They are indeed useless to anyone else and I should remove them from further pushes.

As for conflict solving, I'm planning to close this PR and resubmit as a series of smaller ones, more targeted and less scope-creepy :)

@PeterSurda
Copy link
Copy Markdown
Member

As for conflict solving, I'm planning to close this PR and resubmit as a series of smaller ones, more targeted and less scope-creepy :)

I would definitely prefer it this way. I promise I'll allocate time earlier for review so that you don't have to wait that long again for feedback.

@f97ada87
Copy link
Copy Markdown
Contributor Author

f97ada87 commented Apr 6, 2018

@PeterSurda , by "this way", do you mean as it is now, just rebased and with conflicts resolved as suggested by @g1itch ?

@PeterSurda
Copy link
Copy Markdown
Member

@f97ada87 I mean I prefer it split into separate PRs. In case I end up wanting changes, I can still merge a part of the PRs and make progress.

@f97ada87
Copy link
Copy Markdown
Contributor Author

f97ada87 commented Apr 8, 2018

Sounds good. Closing this for now.

@f97ada87 f97ada87 closed this Apr 8, 2018
@f97ada87 f97ada87 deleted the opmode-netcat branch April 8, 2018 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants