Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Part 1 of NUL bytes in strings and explicit string lengths #63

Merged
merged 0 commits into from Sep 29, 2013

Conversation

Projects
None yet
4 participants
Contributor

chipdude commented Mar 27, 2012

This compiles, but the tests still use the deprecated string function so they don't compile, and the docs aren't updated. What think?

Owner

akheron commented Mar 27, 2012

Hi and thanks for the patch. It looks good in general, but a few points:

  • I'd like to see the changes in utf.c and load.c that are irrelevant to the new functionality as a separate patch.
  • Same for the new S format character in pack_unpack.c.

I also don't like some (most) of the names and signatures of the new functions. I think these would be better (signatures simplified):

  • json_string_len(value, len)json_string_ex(value, len, size_t steal) (steal described below)
  • json_string_nocheck_len(value, len)json_string_nocheck_ex(value, len, size_t steal)
  • Add size_t json_string_len(json_t *str) that returns the length.
  • json_string_set_len(value, len)json_string_set_ex(value, len, size_t steal)
  • json_string_set_nocheck_len(value, len)json_string_set_nocheck_ex(value, len, size_t steal)

steal is used to control whether the value should be stealed, to implement your second suggestion. Its type is size_t so that it can be "upgraded" to be a flags parameter later, if a need for more options arises.

Lastly, I don't want json_string_value() to get deprecated, as zero bytes inside strings is quite a rare use case after all.

What do you think?

Contributor

chipdude commented Mar 27, 2012

Happy to defer to your naming conventions, and happy to implement stealing as you suggest.

For leaving room for flags later, I suggest uint32_t or uint64_t or just unsigned, not size_t, since the latter is specialized for object sizes which flags aren't. Tiny point, don't care that much.

I am concerned about leaving json_string_value() undeprecated because, while embedded NULs are rare, you want to encourage users to write reliable code. As such you can't just leave the function that breaks on valid data sitting there begging to be used. That's what I'd call an attractive nuisance. OTOH with _len() available, you will want a function that just returns the pointer. Perhaps _string_ptr() instead? And besides, now that strings can contain NULs, it's actually lying to say that a simple pointer is the string's "value".

Owner

akheron commented Mar 28, 2012

uint32_t and uint64_t aren't as portable as size_t, and size_t is already used for flags in other parts of the API.

What comes to json_string_value() and reliable code, the decoder (load.c) currently doesn't allow embedded NULs, so you won't get strings with NUL bytes when parsing JSON. When this limitation is removed (by you or by someone else), I want a decoding flag to be added that explicitly allows parsing strings with NUL bytes. This is the best solution for backwards compatibility, as existing programs won't break because NUL bytes start appearing all over the place. New code can enable the support by using the new decoding flag.

When it's time for the next backwards incompatible version (if ever), this should of course be considered again. But at that point, we'll be wiser about how often and how the NUL byte support is actually used.

Contributor

chipdude commented Mar 28, 2012

All questions answered. Will update patch.

Contributor

chipdude commented Apr 4, 2012

OK, it's updated. Take a gander.

@akheron akheron and 1 other commented on an outdated diff Apr 4, 2012

@@ -364,7 +366,8 @@ static int do_dump(const json_t *json, size_t flags, int depth,
{
void *next = json_object_iter_next((json_t *)json, iter);
- dump_string(json_object_iter_key(iter), ascii, dump, data);
+ const char *key = json_object_iter_key(iter);
@akheron

akheron Apr 4, 2012

Owner

Note to self: Move the variable declaration to the beginning of the block.

@chipdude

chipdude Apr 4, 2012

Contributor

It's a fair cop, but society's to blame.

@akheron akheron and 1 other commented on an outdated diff Apr 4, 2012

src/pack_unpack.c
{
const char *str = va_arg(*ap, const char *);
+ size_t len;
@akheron

akheron Apr 4, 2012

Owner

Note to self: Move to beginning of block (although va_arg(,,,) isn't a function call)

@chipdude

chipdude Apr 4, 2012

Contributor

No, this one is fine. Declarations with initializers are still declarations.

@akheron akheron and 1 other commented on an outdated diff Apr 4, 2012

@@ -12,6 +12,10 @@
#include <jansson.h>
#include "jansson_private.h"
+/* C89 allows these to be macros */
@akheron

akheron Apr 4, 2012

Owner

This hunk is not relevant to the NUL byte support.

@chipdude

chipdude Apr 4, 2012

Contributor

Well, not strictly, but I added it because without the correct malloc
and free, string stealing doesn't work.

@akheron akheron and 1 other commented on an outdated diff Apr 4, 2012

@@ -599,33 +599,59 @@ static int json_array_equal(json_t *array1, json_t *array2)
/*** string ***/
+/* XXX: this breaks const guarantee */
+static char *steal_or_copy(const char *value, size_t len, size_t flags) {
+ if(flags & JSON_STEAL) {
+ if(value[len])
+ ((char *)value)[len] = 0; /* only if needed, in case of const */
@akheron

akheron Apr 4, 2012

Owner

This cannot be done. If value has been malloc'd to have exactly len bytes, we're writing memory that we're not allowed to write.

The correct way is to make users pass NUL terminated strings for stealing (by documenting the behavior).

@chipdude

chipdude Apr 4, 2012

Contributor

My intention was to document the N+1 requirement, but I can work with the must-be-terminated requirement. Breaking the const guarantee is still needed, though, because you can't pass a const char * to free().

@akheron akheron commented on an outdated diff Apr 4, 2012

src/jansson.h
@@ -249,6 +254,7 @@ int json_array_insert(json_t *array, size_t index, json_t *value)
#define JSON_SORT_KEYS 0x80
#define JSON_PRESERVE_ORDER 0x100
#define JSON_ENCODE_ANY 0x200
+#define JSON_STEAL 0x400
@akheron

akheron Apr 4, 2012

Owner

These are encoding flags. JSON_STEAL should be defined to 0x1 and the definition moved to near the string constructor functions.

@akheron akheron and 1 other commented on an outdated diff Apr 4, 2012

@@ -367,7 +367,7 @@ static void lex_scan_string(lex_t *lex, json_error_t *error)
p++;
if(*p == 'u') {
char buffer[4];
- int length;
+ size_t length;
@akheron

akheron Apr 4, 2012

Owner

I'd still like to see this change in a different patch as it's irrelevant to NUL support. You could submit this and the malloc/free #undefs in a separate "cleanup" patch.

@chipdude

chipdude Apr 4, 2012

Contributor

Dude, I'm just trying to help. I know puppies aren't free, but do you really need a separate carrier for each puppy? I changed the API to exactly what you wanted.

Also this is related. The unicode checking API changed specifically to support strings with size_t lengths -- the old usage of -1 as a magic length no longer works. And the only reason I did any of that is because the lengths come in from the user as size_t.

@akheron

akheron Apr 5, 2012

Owner

Well, I like clean history. But it's OK, I can change it while merging.

EDIT: Oops, I didn't noticed you had changed your message. It's OK then.

Owner

akheron commented Apr 4, 2012

Hi and thanks for the update. It seems you already noticed my inline comments :)

Owner

akheron commented Apr 5, 2012

The changes that are left are pretty small. Do you want me to make them or will you send one more update?

There hasn't been much movement on this issue recently, is there anything I (or others) can do to help? Testing, documentation? We encounter NUL bytes quite often and it would be great if Jansson supported them.

Owner

akheron commented Jul 29, 2013

Tests and documentation wouldn't hurt, at least :) It also seems that this pull request wouldn't merge cleanly anymore.

What do you mean by encountering NUL bytes? This patch doesn't change the decoder to be able to handle them correctly.

Contributor

lano1106 commented Aug 28, 2013

I was just browsing jansson code and I had more or less the same improvement idea.

If there is interest, I would like to offer an updated pull request for the explicit string lengths API change.

I have 3 question/remarks.

  1. In memory.c:

    len = strlen(str);
    if(len == (size_t)-1)
    return NULL;

At best, this is non standard that strlen() returns -1. At worst, this could make the code nonportable. IMO, a small comment to clarify this would be helpful.

  1. What is the policy of the project concerning inline funcs?

I see a couple str funcs that are implemented in terms of the _ex equivalent that could possibly be great candidates for inlining

  1. there is no good or bad answer except except trying to make API less error prone as possible.
    What should the creating functions do if they failed with the value param that was set to be stolen?

IMHO, it should free it inconditionally to avoid leaks in user apps.

Owner

akheron commented Aug 28, 2013

  1. It doesn't suppose that strlen() would return -1, it just avoids the theoretical integer overflow if the length of str is the maximum value of size_t minus one. In that case, len + 1 would overflow to zero.
  2. Static inline functions are ok, just remember to use the JSON_INLINE macro to be portable. However, to retain ABI compatibility, a function that has been non-inline needs to stay so, so that dynamic linking with binaries linked against an older version of Jansson still work. (Or at least the non-inline version still needs to be available as a symbol in libjansson.)
  3. You're right. Looking at the code now, some functions decref it on error but some don't. This should be fixed.

What comes to explicit string lengths, the s# format specifier that was recently added to json_pack() already allows to specify an explicit string length. Extending Jansson to fully support embedded NUL bytes in strings is a bit hard if we're gonna have to make new versions of every function that deal with C strings. It's not limited to functions that deal with JSON string values, as object keys are also strings.

Adding this support only to json_pack() (and perhaps json_unpack()) would not be as intrusive, and that would probably be the correct place to start.

@akheron akheron merged commit e4d6a9f into akheron:master Sep 29, 2013

1 check passed

default The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment