Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Part 1 of NUL bytes in strings and explicit string lengths #63

Merged
merged 0 commits into from

4 participants

@chipdude

This compiles, but the tests still use the deprecated string function so they don't compile, and the docs aren't updated. What think?

@akheron
Owner

Hi and thanks for the patch. It looks good in general, but a few points:

  • I'd like to see the changes in utf.c and load.c that are irrelevant to the new functionality as a separate patch.

  • Same for the new S format character in pack_unpack.c.

I also don't like some (most) of the names and signatures of the new functions. I think these would be better (signatures simplified):

  • json_string_len(value, len)json_string_ex(value, len, size_t steal) (steal described below)

  • json_string_nocheck_len(value, len)json_string_nocheck_ex(value, len, size_t steal)

  • Add size_t json_string_len(json_t *str) that returns the length.

  • json_string_set_len(value, len)json_string_set_ex(value, len, size_t steal)

  • json_string_set_nocheck_len(value, len)json_string_set_nocheck_ex(value, len, size_t steal)

steal is used to control whether the value should be stealed, to implement your second suggestion. Its type is size_t so that it can be "upgraded" to be a flags parameter later, if a need for more options arises.

Lastly, I don't want json_string_value() to get deprecated, as zero bytes inside strings is quite a rare use case after all.

What do you think?

@chipdude

Happy to defer to your naming conventions, and happy to implement stealing as you suggest.

For leaving room for flags later, I suggest uint32_t or uint64_t or just unsigned, not size_t, since the latter is specialized for object sizes which flags aren't. Tiny point, don't care that much.

I am concerned about leaving json_string_value() undeprecated because, while embedded NULs are rare, you want to encourage users to write reliable code. As such you can't just leave the function that breaks on valid data sitting there begging to be used. That's what I'd call an attractive nuisance. OTOH with _len() available, you will want a function that just returns the pointer. Perhaps _string_ptr() instead? And besides, now that strings can contain NULs, it's actually lying to say that a simple pointer is the string's "value".

@akheron
Owner

uint32_t and uint64_t aren't as portable as size_t, and size_t is already used for flags in other parts of the API.

What comes to json_string_value() and reliable code, the decoder (load.c) currently doesn't allow embedded NULs, so you won't get strings with NUL bytes when parsing JSON. When this limitation is removed (by you or by someone else), I want a decoding flag to be added that explicitly allows parsing strings with NUL bytes. This is the best solution for backwards compatibility, as existing programs won't break because NUL bytes start appearing all over the place. New code can enable the support by using the new decoding flag.

When it's time for the next backwards incompatible version (if ever), this should of course be considered again. But at that point, we'll be wiser about how often and how the NUL byte support is actually used.

@chipdude

All questions answered. Will update patch.

@chipdude

OK, it's updated. Take a gander.

src/dump.c
@@ -364,7 +366,8 @@ static int do_dump(const json_t *json, size_t flags, int depth,
{
void *next = json_object_iter_next((json_t *)json, iter);
- dump_string(json_object_iter_key(iter), ascii, dump, data);
+ const char *key = json_object_iter_key(iter);
@akheron Owner
akheron added a note

Note to self: Move the variable declaration to the beginning of the block.

@chipdude
chipdude added a note

It's a fair cop, but society's to blame.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/pack_unpack.c
((5 lines not shown))
{
const char *str = va_arg(*ap, const char *);
+ size_t len;
@akheron Owner
akheron added a note

Note to self: Move to beginning of block (although va_arg(,,,) isn't a function call)

@chipdude
chipdude added a note

No, this one is fine. Declarations with initializers are still declarations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/memory.c
@@ -12,6 +12,10 @@
#include <jansson.h>
#include "jansson_private.h"
+/* C89 allows these to be macros */
@akheron Owner
akheron added a note

This hunk is not relevant to the NUL byte support.

@chipdude
chipdude added a note
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/value.c
@@ -599,33 +599,59 @@ static int json_array_equal(json_t *array1, json_t *array2)
/*** string ***/
+/* XXX: this breaks const guarantee */
+static char *steal_or_copy(const char *value, size_t len, size_t flags) {
+ if(flags & JSON_STEAL) {
+ if(value[len])
+ ((char *)value)[len] = 0; /* only if needed, in case of const */
@akheron Owner
akheron added a note

This cannot be done. If value has been malloc'd to have exactly len bytes, we're writing memory that we're not allowed to write.

The correct way is to make users pass NUL terminated strings for stealing (by documenting the behavior).

@chipdude
chipdude added a note

My intention was to document the N+1 requirement, but I can work with the must-be-terminated requirement. Breaking the const guarantee is still needed, though, because you can't pass a const char * to free().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/jansson.h
@@ -249,6 +254,7 @@ int json_array_insert(json_t *array, size_t index, json_t *value)
#define JSON_SORT_KEYS 0x80
#define JSON_PRESERVE_ORDER 0x100
#define JSON_ENCODE_ANY 0x200
+#define JSON_STEAL 0x400
@akheron Owner
akheron added a note

These are encoding flags. JSON_STEAL should be defined to 0x1 and the definition moved to near the string constructor functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/load.c
@@ -367,7 +367,7 @@ static void lex_scan_string(lex_t *lex, json_error_t *error)
p++;
if(*p == 'u') {
char buffer[4];
- int length;
+ size_t length;
@akheron Owner
akheron added a note

I'd still like to see this change in a different patch as it's irrelevant to NUL support. You could submit this and the malloc/free #undefs in a separate "cleanup" patch.

@chipdude
chipdude added a note

Dude, I'm just trying to help. I know puppies aren't free, but do you really need a separate carrier for each puppy? I changed the API to exactly what you wanted.

Also this is related. The unicode checking API changed specifically to support strings with size_t lengths -- the old usage of -1 as a magic length no longer works. And the only reason I did any of that is because the lengths come in from the user as size_t.

@akheron Owner
akheron added a note

Well, I like clean history. But it's OK, I can change it while merging.

EDIT: Oops, I didn't noticed you had changed your message. It's OK then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@akheron
Owner

Hi and thanks for the update. It seems you already noticed my inline comments :)

@akheron
Owner

The changes that are left are pretty small. Do you want me to make them or will you send one more update?

@dcreager

There hasn't been much movement on this issue recently, is there anything I (or others) can do to help? Testing, documentation? We encounter NUL bytes quite often and it would be great if Jansson supported them.

@akheron
Owner

Tests and documentation wouldn't hurt, at least :) It also seems that this pull request wouldn't merge cleanly anymore.

What do you mean by encountering NUL bytes? This patch doesn't change the decoder to be able to handle them correctly.

@lano1106

I was just browsing jansson code and I had more or less the same improvement idea.

If there is interest, I would like to offer an updated pull request for the explicit string lengths API change.

I have 3 question/remarks.

  1. In memory.c:

    len = strlen(str);
    if(len == (size_t)-1)
    return NULL;

At best, this is non standard that strlen() returns -1. At worst, this could make the code nonportable. IMO, a small comment to clarify this would be helpful.

  1. What is the policy of the project concerning inline funcs?

I see a couple str funcs that are implemented in terms of the _ex equivalent that could possibly be great candidates for inlining

  1. there is no good or bad answer except except trying to make API less error prone as possible. What should the creating functions do if they failed with the value param that was set to be stolen?

IMHO, it should free it inconditionally to avoid leaks in user apps.

@akheron
Owner
  1. It doesn't suppose that strlen() would return -1, it just avoids the theoretical integer overflow if the length of str is the maximum value of size_t minus one. In that case, len + 1 would overflow to zero.

  2. Static inline functions are ok, just remember to use the JSON_INLINE macro to be portable. However, to retain ABI compatibility, a function that has been non-inline needs to stay so, so that dynamic linking with binaries linked against an older version of Jansson still work. (Or at least the non-inline version still needs to be available as a symbol in libjansson.)

  3. You're right. Looking at the code now, some functions decref it on error but some don't. This should be fixed.

What comes to explicit string lengths, the s# format specifier that was recently added to json_pack() already allows to specify an explicit string length. Extending Jansson to fully support embedded NUL bytes in strings is a bit hard if we're gonna have to make new versions of every function that deal with C strings. It's not limited to functions that deal with JSON string values, as object keys are also strings.

Adding this support only to json_pack() (and perhaps json_unpack()) would not be as intrusive, and that would probably be the correct place to start.

@akheron akheron merged commit e4d6a9f into from
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Sorry, commit information is not available for this pull request.

Something went wrong with that request. Please try again.