define and enable netcat operating mode by f97ada87 · Pull Request #1149 · Bitmessage/PyBitmessage

f97ada87 · 2018-03-09T14:43:51Z

This PR enables a special headless operating mode ("netcat" mode) in which all object processing is disabled. Instead, raw objects received from the network are written to STDOUT unprocessed, and any valid raw objects received on STDIN are broadcast to the network.

The STDOUT format is one object per line, formatted as:

current timestamp as a zero-padded 16-digit hex number
a TAB character
the object in HEX encoding

The STDIN format is one object per line, which can be either a single HEX-encoded object or the format described for STDOUT above (in the latter case the timestamp will be ignored).

Example:
000000005aa29ada 00000000014f5b6a000000005aa7c646000000000401c5a40f3411817a9ba03df27d7839dd98b854d4fea089f6ffa9c251456e1f61e5
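The line format described above can be sketched in a few lines of Python. This is an illustration only, not the code from this PR: the helper names `format_line` and `parse_line` are hypothetical, and the sketch assumes the timestamp is a Unix time in seconds and that input fields are separated by whitespace.

```python
import binascii
import time


def format_line(raw_object, timestamp=None):
    """Return one STDOUT line: 16-digit hex timestamp, TAB, hex-encoded object."""
    if timestamp is None:
        timestamp = int(time.time())  # assumption: Unix time in seconds
    return "%016x\t%s" % (timestamp, binascii.hexlify(raw_object).decode("ascii"))


def parse_line(line):
    """Return the raw object bytes from one STDIN line.

    Accepts either a bare hex object or "<timestamp> <hex object>";
    in the latter case the timestamp is ignored.
    """
    fields = line.split()
    if len(fields) not in (1, 2):
        raise ValueError("expected 1 or 2 fields, got %d" % len(fields))
    # The object is always the last field; unhexlify rejects non-hex input.
    return binascii.unhexlify(fields[-1])
```

Taking the last field in `parse_line` is what lets a recorded STDOUT stream be fed back to STDIN unchanged.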

This is part of a larger effort to compartmentalize the PyBitmessage operating layers in order to better contain network-borne attacks, and from this perspective it is an incomplete solution.
However, it also provides some immediate benefits such as the ability to generate a timestamped full record of network traffic for later examination/replay, as well as enabling developers to broadcast protocol objects generated externally.

As usual, all comments and questions will be appreciated.

PeterSurda · 2018-03-09T22:43:45Z

Can you check the Codacy report and fix the issues?

f97ada87 · 2018-03-10T02:31:50Z

I fixed all the Codacy warnings that made sense. IMO the two remaining "issues" are needed to retain human readability of the code and fixing them will not improve the source code quality.
I'm open to any suggestions.

PeterSurda · 2018-03-10T06:56:00Z

Apparently the best practice for an unused variable in situations like this is to use the _ variable. This makes it clear to the reader that the variable is unused.

f97ada87 · 2018-03-10T09:35:01Z

I know, I'm just being pragmatic here. As both a code writer and reader, if I were to choose between:

versionNumber, versionLength = decodeVarint( ... )
_, versionLength = decodeVarint( ... )

I will always prefer the former, even if it causes a lint warning.
It's a matter of being respectful to the next guy; one line says clearly "I'm extracting this element and not using it as it's not needed", the other says "Nothing to see here, keep going".
I feel that, in this case, we're sacrificing human readability in order to keep the robots happy. Can you please confirm that this is required in order to have the PR accepted?

Thanks.

PeterSurda · 2018-03-10T09:40:03Z

I need to think about it as I'm in the process of formalising the coding standards.

PeterSurda · 2018-03-10T09:50:25Z

How about this: you use _ but add (for example in the line immediately above) a comment which lists both variables? This would both keep the linter quiet as well as explain to the potential reader what is happening. (S)he'll understand that there is an unused variable, but if (s)he needed it, this is what it should be called.

Like this:

# versionNumber, versionLength = ...
_, versionLength = decodeVarint( binObject[readPosition:readPosition + 10])

Even though in this case, even better would be to use some sort of parser function and then work on the result of that function.
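One way the "parser function" idea could look is sketched below. It is not PyBitmessage's parser: `ObjectHeader`, `decode_varint` and `parse_object` are hypothetical names, and the sketch assumes the object layout of an 8-byte nonce, 8-byte expiry time, 4-byte object type, then varint version and varint stream number, with big-endian integers. Validation (for example, rejecting non-minimal varints) is omitted. It targets Python 3.

```python
import struct
from collections import namedtuple

# Hypothetical result type: callers pick the fields they need by name.
ObjectHeader = namedtuple(
    "ObjectHeader", "nonce expires object_type version stream payload")


def decode_varint(data):
    """Decode a Bitmessage-style varint; return (value, bytes_consumed)."""
    first = data[0]
    if first < 0xfd:
        return first, 1
    # Prefix byte selects a 2-, 4- or 8-byte big-endian integer.
    size, fmt = {0xfd: (2, ">H"), 0xfe: (4, ">I"), 0xff: (8, ">Q")}[first]
    return struct.unpack(fmt, data[1:1 + size])[0], 1 + size


def parse_object(raw):
    """Split a raw object into named header fields plus the remaining payload."""
    nonce, expires, object_type = struct.unpack(">8sQI", raw[:20])
    pos = 20
    version, length = decode_varint(raw[pos:pos + 10])
    pos += length
    stream, length = decode_varint(raw[pos:pos + 10])
    pos += length
    return ObjectHeader(nonce, expires, object_type, version, stream, raw[pos:])
```

With a named result, a caller that only needs the stream number writes `parse_object(raw).stream`, and the question of how to name an unused tuple element does not arise.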

PeterSurda · 2018-03-10T10:03:59Z

Great. Please allow a couple of days for the review, I'll explain afterwards.

f97ada87 · 2018-03-10T10:12:50Z

Thanks, and agreed on the parser function. I saw your work in 96d58f3 etc but wasn't 100% confident to use it, so I reverted to old-style parsing instead, for now. Can be refactored later in a separate PR.

PeterSurda · 2018-03-10T10:25:18Z

> wasn't 100% confident to use it, so I reverted to old-style parsing instead, for now.

It probably wouldn't work correctly as that parser is integrated into the network buffer handling.

> Can be refactored later in a separate PR.

Yes, that's for a later stage, at this moment I'd prefer to handle new features and refactoring separately.

PeterSurda · 2018-03-22T08:29:47Z

+            if state.netcatmode:
+                # publish object to inventory and advertise
+                Inventory()[inventoryHash] = (objectType, streamNumber, binObject, eTime,'')
+                PendingUpload().add(inventoryHash)


PendingUpload is deprecated, just remove this line.

PeterSurda · 2018-03-22T08:42:48Z

I have an uneasy feeling about this.

printing new object data shouldn't be done from bmproto. It should rather signal a separate thread with the inventory hash, and that should print the data from Inventory(). Maybe there should be a separate mode in object processor which would print it out? Then you can get rid of stdInput class.
can't sending new objects be done from API instead? Just add a new method similar to HandleDisseminatePreEncryptedMsg, it could also automatically detect if the PoW is missing and calculate it if necessary.
why hex encoded and not something that is already used elsewhere, like JSON or msgpack? And why one object per line instead of using a transport protocol that is already used elsewhere (like HTTP)?

f97ada87 · 2018-03-22T14:14:43Z

Most decisions are in line with the Unix doctrine: do one thing and do it well, design for interoperation, your output is someone else's input.

The purpose of the netcat mode is to minimize the footprint and exploitable surface of the PyBitmessage network-facing process, for security reasons. This is achieved by short-circuiting several classes and process threads, including the objectProcessor in its entirety. There is no API, no POW either.

The patchset is designed to minimize codebase bloat. The stdInput class (and thread) is provided as a single, clear and easily auditable ingress port for raw objects in the STDIO special modes (netcat/airgap/etc). I don't see a security benefit in merging its functionality inside another file or class, as it would lead to reduced transparency.
The unqueued print from bmproto was the least intrusive option, and has worked without issue for nearly a year in real-life installations. I agree, queued output is better but unqueued is not wrong. Do you have a specific situation in mind, or just best practices?

I chose the hex one-line format for several reasons:

native support cross-platform, cross-language
simple and strict, no dark corners to hide bad stuff
100% lossless – everything is recorded as received, even invalid data
easy to inspect visually (in monospace font)
can be processed with standard utilities (cat, tee, grep, sed, awk, head, tail, sort, perl, wc etc)

More exotic I/O formats may be introduced later if needed, selected by command-line options (--netcat-format=json), although for bloat reduction I would recommend using external conversion utilities instead (pybm --mode-netcat | hex_to_json_converter | json_app).
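As a rough illustration of such an external converter, the core of a hex-to-JSON filter might look like the sketch below. The function name `line_to_json` and the JSON field names are assumptions made for this example; nothing in the PR defines a JSON schema.

```python
import json


def line_to_json(line):
    """Convert one netcat-mode line to a JSON string (schema is illustrative)."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected '<timestamp> <hex object>'")
    timestamp_hex, object_hex = fields
    bytes.fromhex(object_hex)  # raises ValueError if the object is not valid hex
    return json.dumps(
        {"timestamp": int(timestamp_hex, 16), "object": object_hex},
        sort_keys=True)
```

A filter script would apply this function to each line of `sys.stdin` and print the result, so the conversion stays outside the network-facing process.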

Happy to discuss any objections or counterpoints to the above.

PeterSurda · 2018-03-22T15:43:20Z

I would like this to work more in a pluggable model rather than a wide range of code flow changes triggered by a single variable. For this, we need a new class which acts like a queue, but allows multiple "subscribers". Then objectProcessor and the stdOut (or whatever you want to call it) could subscribe/launch independently. The same problem exists with the UI queue, which prevents the GUI and the SMTP thread from working simultaneously. The output shouldn't happen in the receivedata thread but in a separate one. So this requires some refactoring of existing code. You could use threading.local to store the queues. Here is some pseudocode:

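# subscriberqueue.py
import threading
try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2

class SubscriberQueue(object):
    """Queue-like fan-out: each subscribed thread gets its own copy of every item."""
    def __init__(self):
        self._local = threading.local()  # each thread sees only its own queue here
        self._queues = []
    def subscribe(self):
        # must be called from the thread that will later call get()
        self._local.queue = Queue()
        self._queues.append(self._local.queue)
    def put(self, data):
        for q in list(self._queues):
            q.put(data)
    def get(self):
        return self._local.queue.get()
...
# queues.py:
objectOutputQueue = SubscriberQueue()
...
# class_objectProcessor.py:
class objectProcessorThread(threading.Thread):
    def run(self):
        # subscribe here rather than in __init__, which runs in the creating thread
        queues.objectOutputQueue.subscribe()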

Regarding input, I suppose it's ok for now.

Perhaps at first you could add more runtime variables, one for enabling stdin/out IO, one for enabling/disabling object processor, one for the worker thread, and so on. The netcat mode would then set all the variables accordingly.
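The idea of per-component switches with netcat mode as a preset could be sketched as follows. The flag names and the `State` class are invented for illustration; the choice of what gets disabled follows the earlier description (no object processor, no API, no PoW).

```python
class State(object):
    """Hypothetical stand-in for the module-level runtime flags."""
    enable_object_processor = True
    enable_worker = True   # PoW / sending thread
    enable_api = True
    enable_stdio = False   # raw objects on STDIN/STDOUT


def apply_netcat_mode(state):
    """Netcat mode is just a preset: STDIO on, everything else off."""
    state.enable_object_processor = False
    state.enable_worker = False
    state.enable_api = False
    state.enable_stdio = True
    return state
```

Start-up code would then test each flag separately before launching the corresponding thread, instead of branching on a single mode variable throughout the codebase.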

f97ada87 · 2018-03-24T02:22:35Z

@PeterSurda - I have added the modular switches as suggested (as state.enable*). Is this what you had in mind?

PeterSurda · 2018-03-28T08:18:23Z

@f97ada87 I need to review it in a bit more detail, but in general it appears to be like I asked.

f97ada87 · 2018-03-28T08:40:02Z

Thanks @PeterSurda . Please do not merge right now as I have some fixes to upload, mainly relevant to your output queuing suggestion. I'll confirm when it's ready to go.

f97ada87 · 2018-03-29T15:14:38Z

Hi @PeterSurda , this is complete and ready to go.

code sections and threads are controlled by separate boolean switches
object output uses the existing objectProcessorQueue
all changes to bmproto.py have been removed.

As the conventional objectProcessor and the netcat mode are mutually exclusive by definition, the subscriberQueue logic seems YAGNI at this stage. I used a simple if/else construct instead, hope it's OK.
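A minimal sketch of that if/else arrangement is shown below, assuming a single consumer thread reading from one queue. The function name `consume`, its callback parameters and the `None` shutdown sentinel are assumptions for this example, not the PR's actual code.

```python
import binascii
import queue
import threading
import time


def consume(object_queue, netcat_mode, emit, process):
    """Drain the queue until a None sentinel; route each object one way only."""
    while True:
        item = object_queue.get()
        if item is None:  # assumed shutdown signal
            break
        if netcat_mode:
            # Output path: timestamp, TAB, hex-encoded object.
            emit("%016x\t%s" % (int(time.time()),
                                binascii.hexlify(item).decode("ascii")))
        else:
            # Conventional path: hand the object to normal processing.
            process(item)
```

Because the two branches are mutually exclusive, one queue with one consumer is enough; a multi-subscriber queue only becomes necessary once both consumers have to run at the same time.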

PeterSurda · 2018-03-29T23:32:36Z

Not having a SubscriberQueue is ok for the time being. The rest I'll look at tomorrow.

g1itch · 2018-04-05T17:11:02Z

There are a couple of conflicts currently:

textual conflict can be solved by rebase
a logical one, related to test-mode, which was merged before this PR ¯\_(ツ)_/¯

You can solve them like me or maybe propose a better solution.

Some style questions also:

why did you drop the class_stdInput module and name it std_io instead?
what is the point of "0.6.3+ SPECIALOPMODES" prefix in every comment?

f97ada87 · 2018-04-06T02:42:18Z

Short answer: Scope creep :)

Long answers:

I renamed the file class_stdInput to std_io because the original name was no longer truly reflective of the current content. Originally the file contained a single class for the standard input thread. Currently, to enable queued stdout (discussion above), the file contains a variable, two classes (one for input, one for netcat output) and there's more to come.
The "special operating modes" comments were used in-house to tag the relevant changes from the original 0.6 tree; 0.6.2+ and 0.6.3+ indicate successful forward-porting by me. They are indeed useless to anyone else and I should remove them from further pushes.

As for conflict solving, I'm planning to close this PR and resubmit as a series of smaller ones, more targeted and less scope-creepy :)

PeterSurda · 2018-04-06T06:02:47Z

> As for conflict solving, I'm planning to close this PR and resubmit as a series of smaller ones, more targeted and less scope-creepy :)

I would definitely prefer it this way. I promise I'll allocate time earlier for review so that you don't have to wait that long again for feedback.

f97ada87 · 2018-04-06T14:32:24Z

@PeterSurda , by "this way", do you mean as it is now, just rebased and with conflicts resolved as suggested by @g1itch ?

PeterSurda · 2018-04-06T17:00:48Z

@f97ada87 I mean I prefer it split into separate PRs. In case I end up wanting changes, I can still merge a part of the PRs and make progress.

f97ada87 · 2018-04-08T14:22:17Z

Sounds good. Closing this for now.

PeterSurda self-requested a review March 9, 2018 22:43

PeterSurda self-assigned this Mar 9, 2018

PeterSurda added the enhancement New feature label Mar 9, 2018

PeterSurda requested a review from MahendraNG March 14, 2018 06:39

PeterSurda reviewed Mar 22, 2018

define and enable netcat operating mode

e8af4d8

PeterSurda requested a review from g1itch April 5, 2018 14:49

f97ada87 closed this Apr 8, 2018

f97ada87 deleted the opmode-netcat branch April 8, 2018 14:22

f97ada87 mentioned this pull request Apr 9, 2018

component control switches #1214

Merged

f97ada87 mentioned this pull request Apr 16, 2018

implement netcat operating mode #1222

Open


3 participants
