Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Communication protocol redesign #16

Closed
metall0id opened this issue Jun 8, 2012 · 22 comments
Closed

Communication protocol redesign #16

metall0id opened this issue Jun 8, 2012 · 22 comments
Labels
Milestone

Comments

@metall0id
Copy link
Contributor

We are looking at redesigning the communications protocol to take this project to the next level.

The communication protocol is rather simple at the moment and is using a defined XML communication in 1 direction only, from client to server. In order to reach a point where Mercury can be used for many purposes, the following would need to be supported by the communications protocol:

  • Encryption - I suspect something like SSL would be best
  • Bi-directional communications. The ability to have the server connect back to the client OR the client connect to the server would (i.e. Reverse and bind connections) be needed in many applications of Mercury.

By implementing these, it would be the most flexible. Examples of when a bind connection is needed: current "assessment suite". Examples of when a reverse connection is needed: full exploitation suite

Any ideas are welcome
Tyrone

@ikelos
Copy link
Contributor

ikelos commented Jun 8, 2012

I'd recommend adding a protocol version identifier early on in whatever protocol you decide to go for, so you can make changes in the future without too much breakage. Currently since Mercury is one-way, you only need to worry about a new client attempting to connect to an old server, which I'd guess will just error out with an exception. Since all existing Mercury calls should start with "<", we ought to be able to build a reasonable detection mechanism for a new server responding to an old client. In the future both sides should indicate what version of the Mercury protocol they support.

I'd recommend checking out http://code.google.com/p/protobuf/ as a serialization library, it seems to have a fair number of useful bindings (java, python, c++), so could be good for this.

@rchiossi
Copy link

I also like the idea of using protobuf. I'ts very simple to define data models and it has support for Java and Python. Fits pretty well in mercury.

@metall0id
Copy link
Contributor Author

Agreed, this looks like the way forward.

How about the following definitions on messages in both directions.

Bytes   Field
1       Protocol Version
1       Message Type
4       ProtoBuf Length
x       ProtoBuf Structure

The pro about having the message type field is that a completely new protobuf definition can be made for other incorporated features. At the moment it would probably be enum values for messages like:

  • Command-Request
  • Command-Reply
  • Reflective-Request
  • Reflective-Reply

The ProtoBuf Length is an easy way on the receiving side so that we don't need to look for silly delimiters in the TCP stream. A protobuf definition can be made for each message type and parsed as such by checking the message type field.

Thoughts and suggestions?

@ikelos
Copy link
Contributor

ikelos commented Jun 21, 2012

This would be a break from the existing API, as it wouldn't interoperate with the old XML formatted code. We'd also need to be careful not to accept a mercury version of '<'. I haven't looked into protocol buffers closely enough to know whether they could encapsulate the buffer length themselves (or even the message type and mercury version). Are there any existing examples of how other people construct protocols using them? We should probably do a bit more reading before we dive right in...

@metall0id
Copy link
Contributor Author

I don't see the need for it to inter-operate with existing XML code. The project is still in early phases and we need proper structures to take the project forward.

A complete redesign of the comms protocol that breaks backward compatibility with the existing protocol is fine in my opinion.

Protobuf has no way of knowing its own size and so a size header is recommended. I found these 2 threads helpful:

@metall0id
Copy link
Contributor Author

For reference, my proposed layout follows the Type-length-value structure found in many protocols.

See http://en.wikipedia.org/wiki/Type-length-value

@gjunquera
Copy link

I appreciate using protocol buffers too.

I wrote a .proto file as an example (https://github.com/gjunquera/mercury/blob/pb2_communication/Message.proto).
The structure is basically:
• A general request that has section, function and the arguments;
• A general response that has a data and an error field;
• Specific responses that returns specific fields for each section on Mercury.

I also modify "connect" and "provider->info" implementations to use protocol buffers as a PoC (https://github.com/gjunquera/mercury/blob/pb2_communication)

Please give your thoughts.

@metall0id
Copy link
Contributor Author

Hi,

Thank you very much for our first code contribution on this matter. I don't want to be harsh on it but I do not believe that it is the best way to do this.

By implementing specific responses in the protocol itself, we are losing flexibility all the way. Instead why not have something more generalised like this as the Response:

message Response {
    optional bytes data = 1;
    optional bytes error = 2;
    repeated KVPair structured_data = 3;
}

message KVPair {
    required bytes key = 1;
    required bytes value = 2;
}

This way, it is up to the method that received the Request to fill in the structured_data fields with appropriate data, like various columns and fields etc. If the method that received the Request does not need to send back specific ordered information it can merely fill in the data field and leave structured_data blank.

Let me know if anyone sees any obvious faults in that idea?
Tyrone

EDIT: It also might be worth using the bytes proto type instead of string because this protocol should also be able to transfer binary files.

EDIT 2: Rethought the structure a bit

@ikelos
Copy link
Contributor

ikelos commented Aug 15, 2012

I'll need a bit of time to process this, please give me until after the weekend to have a think on it...

@gjunquera
Copy link

No problem to change my solution, it was just an example to restart the discussion. I was not sure if it was a good.
I agree that using key-value pairs is much better than specify each field like I did.

On KVPair, why did you use key as bytes? Key is always a string, isn't it?

What about the Request? Follows my suggestion:

message Request {
    optional string section = 1;
    optional string function = 2;
    repeated KVPair args = 3;
}

message KVPair {
    required string key = 1;
    required bytes value = 2;
}

What do you think?

Another point, Protocol Buffers would add some dependencies to mercury, is it a problem?

@metall0id
Copy link
Contributor Author

Agreed that the key will be a string. There might be a case for having a data field in the Request as well for unstructured data requests. We would like to keep it as open for extension as possible.

I think that these are good first steps but there are still many things to consider. The dependencies will have to be looked at as well, if it is cross-platform and reliably simple to install then it shouldn't be a problem.

@gjunquera
Copy link

About the response:

message Response {
    optional bytes data = 1;
    optional bytes error = 2;
    repeated KVPair structured_data = 3;
}

I think the structure above is not a good solution for all kind of Responses.

For example:
The function "info" from section "provider" returns the following response:

Package Name: ******
Authority: *******
Permissions: *******
(...)
Package Name: ******
Authority: *******
Permissions: *******
(...)
Package Name: ******
Authority: *******
Permissions: *******
(...)

How would we map this kind of response in a KVPair list?

For some cases we could create a specific .proto.
For example:

ProviderResponse{
  Info {
    optional string package_name = 1;
    optional string authority = 2;
    optional string permissions = 3;
    (...)
  }
  repeated Info info = 1;
}

We could wrapper this structure in the data field of an usual Response, then all commands always would return the same structure.

I like the idea of creating specific responses because Protocol Buffers automatically generates the parser which makes data manipulation really easy.

What do you think?

@metall0id
Copy link
Contributor Author

As an example (probably not nearly the best way), that response could be structured as follows:

Key              | Value
row1:packagename | *******
row1:authority   | *******
row1:permissions | *******
row2:packagename | *******
row2:authority   | *******
row2:permissions | *******

@gjunquera
Copy link

Well.... I don't like too much the idea of using separators in the Key or in the Value.
Of course in this case it is really simple to parse, but maybe some Responses would require more separators making it harder to parse and to manipulate the response data.

Today I can't say one case where a Response would require adding a lot of separators, but maybe in the future Mercury could have more complex Responses and making specific .proto could make a lot of sense.

I agree with the Response structure suggested, but I think we could create new .protos for complex Responses and wrapper them in the data field =).

@metall0id
Copy link
Contributor Author

I completely agree with you. New .protos could even be wrapped inside a value in the KVPair with a nice key name.

The point is that we need a very generalised Response structure in order to be able to do these sorts of things and not limit ourselves at all by the structures on the outer layers :)

@gjunquera
Copy link

OK!

What about encryption? Did you think something about it?

I am not a cryptography expert, but follows my first thoughts about it:
I think encryption should be configurable, the user should activate encryption, then he should input a secret. In Mercury I don't see a secure way to do encryption without requesting user input.
The secret could be just a password that would be used as basis for symmetric encryption, or he could generate RSA public/private key pairs that would be "installed" in both sides (client and server). Maybe the user could generate and install his own certificates.
At the server side the key could be protected by Android uid sandboxing, at the client side I have no idea how to protect the key =(,

Well.... just an idea to start the discussion about encryption. As I said I don't know too much about encryption, maybe there are much better ways to do it.

@metall0id
Copy link
Contributor Author

I would choose user configurable client and server certificates as the way to go.

This is probably not functionality that is required right now though, so I think we will let the idea rest for a while :)

@ikelos
Copy link
Contributor

ikelos commented Aug 29, 2012

Right, so having built up a tech-preview of a system built on protocol buffers, it turns out that whilst they produce extremely small data transmissions for the network, they're pretty slow to construct (and add a significant overhead). I'm currently investigating thrift to see if that gets the balance of bandwidth to computation time a little better. I'll report back with how I get on...

@luander
Copy link
Contributor

luander commented Aug 29, 2012

I really did not write a proof of concept to measure the performance of Protocol Buffers over XML parsers. But Google says that "[Protocol Buffers] are 3 to 10 times smaller, and 20 to 100 times faster". (source: https://developers.google.com/protocol-buffers/docs/overview#whynotxml )
So I think the performance could be even better with Procol Buffers than it is with XML.
What I know is that Protobuffers are a little bit trickier to construct. At least for python.

@gjunquera
Copy link

I implemented Mercury working with protobuffers on this branch: https://github.com/gjunquera/mercury/tree/temp
I didn't test too much but it seems that all Mercury commands are working well with no significant performance impact. Only commands are working, all modules must be broken.

If you have some time, run this implementation and give your thoughts about the performance.

@ikelos
Copy link
Contributor

ikelos commented Aug 31, 2012

So I was trying out protocol buffers with the reflection code (which has a very high communications requirement, hence the performance needs). @gjunquera, it doesn't look like you've converted the reflection code over, and the performance of the remaining elements isn't going to be anywhere near as sensitive as that is, so testing it out won't help a great deal. It turns out, after some tweaking, that the performance with protocol buffers is marginally better than with the XML parsing code we had in there, but not by much (maybe 6% faster). Still, it saves me having to re-implement it in thrift, so I'm going to keep pushing ahead on that front.

@gjunquera
Copy link

I forgot to say that I didn't change reflection code, sorry about it.

If it is defined that protobuf is the way to go, when are you planning to start the implementation? Our team (samsung sidi) can help with the implementation.
Do you have more points to discuss about the .proto file?

@ghost ghost closed this as completed Dec 14, 2012
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants