Don't deal with Encodings internally #48

stouset · 2013-03-14T18:35:27Z

I've brought it up incidentally in other bug reports, but it's worth bringing up separately: I don't believe that RbNaCl should concern itself with encodings.

It's a nice thought, but it has a lot of downsides IMHO. Every class that takes input or produces output now needs to have boilerplate code handling encoding or decoding. It makes the API less "clean". And it's incidental to (what I believe) should be the main focus of this library: crypto.

Cryptographic primitives understand one concept: bytes. Encoding data is, conceptually, a concern of transmission and storage layers. By handling encodings directly, this library takes on responsibilities which I believe are best suited for handling by other layers of an application.

For ease of use by tests (as @tarcieri mentioned earlier), a simple pair of helpers to convert between hex and binary is an easy solution.

In my mind, there is a clear separation of concerns for different parts of the "stack".

NaCl: high-quality crypto implementations
libsodium: cross-platform bundling of NaCl
RbNaCl: straight ruby bindings to libsodium
(various projects): extremely high-level user-facing Ruby libraries built on RbNaCl

I still think that RbNaCl is far too low-level for an average programmer to use safely (and that's okay!). There's still too much that can go wrong: users can use poor sources of randomness for keys, they can reuse nonces, they can use a key across cryptographic contexts, and all sorts of other rookie mistakes. Having documentation that warns against these sorts of things has been repeatedly proven not to be good enough.

I'd much rather RbNaCl be considered a building-block for experienced folk to build "crypto for the masses" on top of, than for it to try and take on too much responsibility.

tarcieri · 2013-03-14T18:51:09Z

Every class that takes input or produces output now needs to have boilerplate code handling encoding or decoding.

Take a look at the commit that added it. The encoding system just replaced type conversion boilerplate code for on the input arguments:

9055dc5

At the same time, the encoding system let us change the test vectors from ugly string literals to hex strings easily.

By handling encodings directly, this library takes on responsibilities which I believe are best suited for handling by other layers of an application.

If the ability to easily use bytes isn't impacted, what's the problem with doing both? Supporting hex helped us out as library authors.

As I've said many times, all you have to do is pretend the encoding system isn't there, and you will get bytes in and bytes out by default without even trying. This is why I find your complaints much-ado-about-nothing.

Now, if the default encoding were hex, instead of bytes, or if I tacked some crazy nonstandard things onto the outputs like "$xsalsa20poly1305$hexencoded$cyphertextgo" that would be a reason to complain.

I still think that RbNaCl is far too low-level for an average programmer to use safely (and that's okay!)

I don't, and understanding how the library works will be important to any of its prospective users, whether they're leveraging it through Krypt, your hypothetical library, or bare-to-the-metal.

I'd much rather RbNaCl be considered a building-block for experienced folk to build "crypto for the masses" on top of, than for it to try and take on too much responsibility.

That's great, and I've been talking to @emboss about writing a Krypt provider for RbNaCl. Have you considered helping out with Krypt instead of writing a competing library?

tarcieri · 2013-03-14T19:00:30Z

I'm not normally one to cite Node, but here's a similar API:

http://nodejs.org/api/crypto.html#crypto_cipher_update_data_input_encoding_output_encoding

Not saying that's a good thing, just that this sort of thing isn't entirely unprecedented.

stouset · 2013-03-14T21:05:50Z

Take a look at the commit that added it. The encoding system just replaced type conversion boilerplate code for on the input arguments:

To be fair, that's because the existing conversion boilerplate in a lot of cases already expected use hex as a default.

If the ability to easily use bytes isn't impacted, what's the problem with doing both? Supporting hex helped us out as library authors.

It ties every single API down to having to accept an encoding parameter. One that, IMHO, will probably end up becoming a named parameter / options hash, necessitating a breaking API change. It makes every API responsible for something that's conceptually outside of its scope.

As I've said many times, all you have to do is pretend the encoding system isn't there, and you will get bytes in and bytes out by default without even trying. This is why I find your complaints much-ado-about-nothing.

As a user of the library, I can ignore it. As someone who's interested in contributing to the library itself, I think it's a bad decision. It's not catastrophic, I can't think of any vulnerabilities it would introduce, and it certainly seems like a clean implementation of the concept. But it places the responsibility of encoding in the wrong place (along the lines of attr_accessible/attr_protected in Rails 3 vs. strong_parameters in Rails 4).

It's a solution to the problem, but not the right solution to the problem.

I still think that RbNaCl is far too low-level for an average programmer to use safely (and that's okay!)

I don't, and understanding how the library works will be important to any of its prospective users, whether they're leveraging it through Krypt, your hypothetical library, or bare-to-the-metal.

If users can simply ignore documentation to use a cryptographic library in dangerous ways, they will do so regularly, emphatically, and without hesitation. Simply refer to the seemingly-infinite number of webpages out there that demonstrate using crypto in ECB mode, or using static nonces. For instance, the highest-voted answer on the topic on StackOverflow encourages exactly this. And the author's response when called on it? "In most commercial encryption settings, though, [using a new initialization vector every time] is a bit of overkill." And this is a contributor with over 16k reputation: an experienced developer by any metric.

Good documentation is not enough for a programmer without a background in cryptography to use cryptographic libraries correctly. The only way is to effectively force them to use it correctly (as much as is possible) by default. I am utterly convinced that any library which expects users to generate their own nonces and ensure they are used only once, or that allows users to use any string of appropriate length as a cryptographic key is simply not fit for use by cryptographic novices.

Put another way, if bad cryptography could have been solved by adequate documentation, it would have been done so by now.

That's great, and I've been talking to @emboss about writing a Krypt provider for RbNaCl. Have you considered helping out with Krypt instead of writing a competing library?

Krypt appears to have completely different goals than I do. Exposing concepts like PKCS5 and HMAC to novices is the wrong way to go in my opinion, and I have no interest in having a library with pluggable backends. My primary goal is a library where the default is to be extremely hard to use incorrectly. It should support high-level use-cases like "hide this data" or "prove that this data hasn't been tampered with", as opposed to merely providing good choices for cryptographic primitives. Details like unique, one-use nonces or keys that must be generated from cryptographically random sources should be opaque and require active effort to bypass built-in protections.

As Thomas Ptacek said, "I don't care what you write, but if you go into it thinking that the dangerous stuff is in the primitives like the AES core, and if you just stick to the glue you'll be safe, you're gonna have a bad time." By supporting high-level use-cases, I intend to reduce the glue code users have to write to implement tasks involving crypto to near-zero.

I'm not normally one to cite Node, but here's a similar API:

Encoding aside, I think anyone who looks at that API can agree that it should probably not be considered a template on which to build crypto APIs.

Better might be to consider NaCl itself. Or OpenSSL. OpenSSL allows for encoding, but only on its command-line interface and not the API itself. The only "encodings" the OpenSSL APIs support directly are to convert things like keys to standardized interchange formats: PEM, PKCS7, PKCS12, etc.

On the other hand, Google's KeyCzar's Python bindings treat all cryptographic inputs as byte buffers, and all cryptographic outputs as Base64 strings. The Java bindings have overloaded method calls that input bytes and output bytes, input Strings and output Base64 strings, and input byte buffers and output byte buffers. Keys are managed out-of-process by a command-line tool that stores keys internally with metadata about their acceptable uses (a protected key store is something an extremely-high-level crypto library should provide).

I'm not sure what that points to other than: other libraries have approached this in various ways, and there's no well-trod ground here.

tarcieri · 2013-03-14T21:22:29Z

Well, I'm afraid we'll have to agree to disagree. That said, RbNaCl is shipped, the API is 1.0, and I'd like to honor Semantic Versioning if possible. If the encoding API were to be removed (which I probably wouldn't agree with) it would have to be in version 2.0. So if discussion of shipping RbNaCl 2.0 comes up, that would be the time to broach the issue. Feel free to reopen this ticket at this time. I'm afraid you've missed the window here though.

Exposing concepts like PKCS5 and HMAC to novices is the wrong way to go in my opinion

PKCS5 is perhaps a bit obsolete, but it's a very important feature (used by Rails among other things). JRuby recently vendored Krypt to provide PKCS5 support to Rails. Sodium will ship the PHC winner... how is it any different from that besides PKCS5 being a NIST standard (and a bit of a build-your-own PHF toolkit instead of a standard like PHC)?

HMAC is definitely useful to end-users as the symmetric equivalent of digital signatures. djb saw fit to include it in NaCl.

Overall, I think you're just being too hyperbolic. NaCl's goals are the same as yours, yet you seem to be arguing that djb didn't go far enough, and that by including things like HMAC (complete with built-in constant time comparison functions) he's exposing user to potential troubles. You could make that case for Poly1305, but I think NaCl includes HMAC because it has fewer sharper edges, while still being useful.

I wish you good luck on your novice-oriented crypto library. You will probably find me contributing to Krypt instead though.

stouset · 2013-03-14T21:46:27Z

To be clear, I'm not arguing against supporting things like PKCS5, HMAC, etc. But those are cryptographic implementation details and have no business being exposed to end-users who are just trying to do things like "check that this data hasn't been forged" or "hide this data from eavesdroppers".

When I'm writing a webapp, I'm not constantly having to deal with TCP packets. It's been abstracted away from me. Nobody would rightfully consider a web framework that directly exposed TCP details to be "high-level", or suitable for use by average developers. And yet we expect novices to understand, choose between, and correctly use PKCS5, PKCS7, PKCS12, AES, SHA2, HMAC, PBKDF2, bcrypt, scrypt, CSPRNGs, IVs, keys, nonces, salts, et al ad infinitum.

NaCl, libsodium, and RbNaCl are a huge step in the right direction. But they are still relatively low-level libraries that are extremely dangerous if left in the wrong hands.

namelessjon · 2013-03-14T22:18:46Z

I do kinda agree with @stouset here. Hell, I was never terribly happy with the encoding in the first place. They do complicate the API and for not that much gain. It's particularly noticable on the Box API, where you're decoding two things. I think I'd support removing it from the API in 2.0, especially for the primitives, though there could be higher level entities which support it, and delegate the crypto lower.

tarcieri · 2013-03-15T03:20:05Z

They do complicate the API and for not that much gain

Again, they certainly made incorporating hex literals as the test vectors much easier

tarcieri · 2013-03-15T03:51:20Z

Here is what I expect you will find if you try to remove the Encoding system:

There's very little code, and in fact, very little repeated code, most of it in constructors. The serialization component is DRY. You will need to add #to_s conversions every place the encoding system is used to ensure proper conversion to strings.

For every line of code dealing with encodings you delete, you will likely need to add another line to the test suite that handles converting the test vectors from hex to binary. Is this really helping, or is it churn? And it this feature really so... not helpful to end users of RbNaCl that it's worth taking away and hiding from them?

You keep making claims about how it harms the codebase. I think it really helped converting the test vectors to hex, and that conversion made the code a lot more readable.

I think there are further ways the encoding system could be combined with some of the existing argument checking code to further simplify input validation.

I would veto any attempts to change the test vectors from hex back to binary string literals.

stouset · 2013-03-15T07:18:17Z

I would veto any attempts to change the test vectors from hex back to binary string literals.

As well you should.

namelessjon · 2013-03-15T09:43:24Z

No objection from me either on keeping those as hex (That was a sensible transition), but it's a trivial two lines of helper code in the specs (see https://github.com/cryptosphere/rbnacl/blob/master/spec/spec_helper.rb#L17) to make a make a method which returns the raw test vector when given a name, which also eliminates the 'sometimes we hex decode a test vector for e.g. :message, but sometimes we don't e.g. :key' so you have

let(:key) { test_vector :secret_key }
# and 
let(:box) { SecretBox.new(key) }

# instead of
let(:key) { TestVectors[:secret_key] }
# and
let(:box) { SecretBox.new(key, :hex) }

In the latter case, many methods need to be aware the vectors are hex encoded. In the former, they need no such knowledge.

I also suspect it will break in a probably not uncommon case with the API as it is: Your private key is a PrivateKey, but the public key you got from a message and is hex encoded. So you pass it into the constructor and I think at that point it goes boom, as it tries to hex decode a binary key.

tarcieri closed this as completed Mar 14, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't deal with Encodings internally #48

Don't deal with Encodings internally #48

stouset commented Mar 14, 2013

tarcieri commented Mar 14, 2013

tarcieri commented Mar 14, 2013

stouset commented Mar 14, 2013

tarcieri commented Mar 14, 2013

stouset commented Mar 14, 2013

namelessjon commented Mar 14, 2013

tarcieri commented Mar 15, 2013

tarcieri commented Mar 15, 2013

stouset commented Mar 15, 2013

namelessjon commented Mar 15, 2013

Don't deal with Encodings internally #48

Don't deal with Encodings internally #48

Comments

stouset commented Mar 14, 2013

tarcieri commented Mar 14, 2013

tarcieri commented Mar 14, 2013

stouset commented Mar 14, 2013

tarcieri commented Mar 14, 2013

stouset commented Mar 14, 2013

namelessjon commented Mar 14, 2013

tarcieri commented Mar 15, 2013

tarcieri commented Mar 15, 2013

stouset commented Mar 15, 2013

namelessjon commented Mar 15, 2013