Clean up the cJSON struct #63

FSMaxB · 2016-11-13T17:32:56Z

Currently theres three ways the cJSON struct can be improved:

Get rid of valueint
Use a union for valuestring and valuedouble
Using good names for the elements

If you don't think this is a good idea, please tell me why.

Let me explain:

Getting rid of `valueint`

Having both valueint and valuedouble introduces unnecessary complexity in cJSON.
Having both valueint and valuedouble increases the chance of misuse by users of the library
With an IEEE754 compliant floating point implementation, valueint is not bringing any extra value on platforms that have integers smaller than 32 bit
- double can store integers up to 53 bits (> 32)
- even if you have a 64 bit integer, this code prevents it from being useful, because it only works if the integer fits into the double type completely.
- that means, that even on a floating point implementation that is not IEEE754 compliant, the range of supported numbers is completely bounded by the implementation of double, cJSON will never print an exact value of an integer bigger than the biggest exact integer that can be represented by double
The valueint is using between 2-8 bytes that are completely wasted
- it might be even more because of alignment, if we have a processor that requires 8 byte alignment and has a 64 bit pointer type and a 32 bit integer type, the struct will look like this
  - next: 8 byte
  - prev 8 byte
  - child 8 byte
  - type 4 byte
  - 4 byte padding
  - valuestring 8 byte
  - valueint 4 byte
  - 4 byte padding
  - valuedouble 8 byte
  - string 8 byte
  - even though valueint has only 4 bytes, it takes up 8 bytes in the struct

Using a union of valuestring and valuedouble

As an instance of the cJSON struct can only be either a string type or a number type, not both at the same time. Therefore they can be stored at the same place in memory.

New version of the cJSON struct

typedef struct cJSON
{
    /* next/prev allow you to walk array/object chains. Alternatively, use GetArraySize/GetArrayItem/GetObjectItem */
    struct cJSON *next;
    struct cJSON *previous;
    /* An array or object item will have a child pointer pointing to a chain of the items in the array/object. */
    struct cJSON *child;
    /* The item's name string, if this item is the child of, or is in the list of subitems of an object. */
    char *name;
    union {
        /* The item's string, if type==cJSON_String */
        char *string;
        /* The item's number, if type==cJSON_Number */
        double number;
    }
    /* The type of the item, as above. */
    int type;
} cJSON;

renamed prev -> previous
valueint removed
renamed valuedouble -> number
renamed string -> name
renamed valuestring -> string
put string and number in a union
reorder the elements in the struct

The text was updated successfully, but these errors were encountered:

DaveGamble · 2016-11-13T18:29:20Z

With fairly minimal rework, one could eliminate previous too. It's barely ever read in the library, and when it is, a single temp variable could eliminate it.

FSMaxB · 2016-11-13T18:35:42Z

Yes, now that I think about it the library only ever uses it as single linked list.

iMobs · 2016-11-15T18:16:53Z

You might also want to play around with the order of the variables for better struct packing. Depending on the word/pointer size of the system, this could reduce the size of the object.

FSMaxB · 2016-11-15T18:25:03Z

There are potentially 3 different sizes in the new struct. The size of a pointer, sizeof(int) and sizeof(double). The order I used in the suggested version of the new struct already optimizes that for x86_64 with still being the best version for 32 bit x86. But I haven't considered all the other possible size combinations on other processors.

This would change when removing the previous pointer though.

FSMaxB · 2016-11-15T18:38:18Z

With previous removed, the following variant has no padding on all architectures I can think of:

typedef struct cJSON
{
    union {
        char *string;
        double number;
    }
    char *name;
    struct cJSON *next;
    struct cJSON *child;
    int type;
} cJSON;

ffontaine · 2016-11-15T19:01:50Z

I would suggest to give a name to the union such as data or u indeed anonymous structures and unions are a C11 feature.

FSMaxB · 2016-11-15T19:05:05Z

I consider this C11 compatible (if even) pseudo code XD. That shows you that I never put this code through a compiler. But correct syntax doesn't matter for the purpose of this discussion.

ffontaine · 2016-11-15T19:25:51Z

Sorry, I think my comment was incomplete and didn't give you enough background. I wanted to point out that I would like to keep cjson compatible with C99 and to avoid any C11 features such as anonymous structures. Indeed some old gcc like gcc 4.3 (which is used to compile cjson for blackfin target on buildroot) will not be able to compile cjson anymore.

FSMaxB · 2016-11-15T19:30:19Z

Actually, cJSON is keeping compatible with C89 and avoids any C99 and C11 features (that haven't been available in C89). Don't worry, this GitHub issue is not cJSONs source code!

Dreaster · 2016-11-18T10:31:44Z

How about changing the type to be an enum instead so that the the code will be clearer?

FSMaxB · 2016-11-18T12:27:02Z

That sounds like a really good idea, but I thought that it is undefined behavior to use an enum with another than the predefined values. Let me check if that is actually true.

Because type is supposed to be used as bit flags, so you can check if something is array or object for example (e.g. if (json->type & (cJSON_Array | cJSON_Object)).

Dreaster · 2016-11-18T12:35:30Z

You are very correct. I have completly missed that the type was used as bit flags. I thought that it truly was one of the cJSON_types defines. In that case. A different name perhaps inroder to clairify. "used_types" or something like that.

FSMaxB · 2016-11-18T14:55:55Z

Renaming it to typeinfo instead of type might actually be a good idea, because it not only stores the type, but also cJSON_IsReference and cJSON_IsConst.

Another possible solution: Make it a bitfield, so cJSON_IsReference and cJSON_IsConst are not part of the type.

Also cJSON_False, cJSON_True ... can be declared ad const int instead of using a preprocessor macro.

So my current suggestion based on using a bitfield:

typedef struct cJSON
{
    union {
        char *string;
        double number;
    } data; /* the union is now named `data` */
    char *name;
    struct cJSON *next;
    struct cJSON *child;
    unsigned int type : 12; /* 12 bits allow for 5 more types to be added later */
    int is_reference : 1;
    int is_const : 1;
    /* reserve the remaining two bits, not sure if this should be done or not */
    int unused1 : 1;
    int unused2 : 1;
 } cJSON;

Notes:

type is unsigned in order to force the use of unsigned int for the bitfield (avoiding implementation defined behavior)
string and number are only accessible via data.XXX because anonymous unions and structs are not allowed in C89.
the bitfield uses 16 bits because that is the minimum requirement for an int.

FSMaxB · 2016-11-18T14:57:13Z

Using a bitfield fixes ->type == cJSON_XXXX if a node is a reference or contains a constant string. So it is not necessary to use & 0xFF anymore when comparing.

FSMaxB · 2016-11-18T15:36:24Z

Defining the types as const instead of #define doesn't work because it is not supported in switch case.

FSMaxB · 2016-11-18T20:47:52Z

Well, let's change all of the bit fields from int to unsigned int. For this reason!

FSMaxB · 2017-04-30T12:12:23Z

In order to be fully JSON conformant, cJSON needs to support '\0' in strings. This requires storing the length for name and string. Also it should be possible to use an enum for the type.

I also think that cJSON_False and cJSON_True should not be separate types. There should be a boolean type instead and an added value in the data union.

cJSON_Raw can be removed and replaced by a flag in the bitfield. This allows specifying the type of raw JSON. But this also means that an unknown type needs to be added. Although this increases the potential of misuse (because you cannot rely on just the type anymore to determine what union member to access).

New draft of the new cJSON struct (also let's rename it to cJSON_item):

enum cJSON_Type
{ 
    CJSON_INVALID = 0,
    CJSON_BOOLEAN = 1,
    CJSON_NULL = 2,
    CJSON_NUMBER = 4,
    CJSON_STRING = 8,
    CJSON_ARRAY = 16,
    CJSON_OBJECT = 32,
    CJSON_UNKNOWN = 64
};

typedef struct cJSON_string
{
    unsigned char *data;
    size_t length;
}

typedef struct cJSON_item
{
    union 
    {
        cJSON_string string;
        double number;
        cJSON_bool boolean;
    } data; /* the union is now named `data` */
    cJSON_string name;
    struct cJSON_item *next;
    struct cJSON_item *child;
    enum cJSON_type type : 12; /* 12 bits allow adding 3 types later, not sure about the 4th because enum is signed int and it behaves weird in bit fields, maybe use unsigned int here instead of enum */
    unsigned int read_only_name : 1;
    unsigned int read_only_string : 1;
    unsigned int read_only_struct : 1; /* the cJSON struct itself should not be deallocated, useful for cJSON_PrintPreallocated */
    unsigned int raw : 1;
 } cJSON_item;

Although cJSON_string has a length property, data should always be length + 1 bytes where the last one is '\0'. Just as a precaution if somebody feeds this to a function that expects zero terminated strings.

Because unions are dangerous, macros and functions should exist for accessing everything, hiding the implementation details.

FSMaxB · 2018-02-11T11:20:21Z

I thought of a better idea than using a union. Since cJSON structs are not stored in arrays, a dynamicly sized struct can be used. There can be a base struct called cJSON with type information and a next pointer for the linked list. Other types would then have a cJSON as their first member and add additional data after that.

This would have mostly two benefits:

Users won't use union members that they shouldn't. Accessing data always requires casting to the correct type first.
It uses less space.

Example:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef int cJSON_bool;

typedef enum cJSON_Type {
    CJSON_INVALID = 0,
    CJSON_BOOLEAN = 1,
    CJSON_NULL = 2,
    CJSON_NUMBER = 3,
    CJSON_STRING = 4,
    CJSON_ARRAY = 5,
    CJSON_OBJECT = 6
} cJSON_Type;

typedef enum cJSON_NumberType {
    CJSON_NUMBER_DOUBLE = 0,
    CJSON_NUMBER_INT = 1,
    CJSON_NUMBER_LONG = 2
} cJSON_NumberType;

typedef struct cJSON_string {
    const char *string;
    size_t length;
} cJSON_string;

typedef struct cJSON {
    struct cJSON *next;
    cJSON_string key;
    cJSON_Type type : 4; /* leaves room for 9 more types */
    unsigned int is_reference : 1;
    unsigned int key_is_reference : 1;
    unsigned int data_is_reference : 1;
    unsigned int is_raw : 1;
    cJSON_NumberType number_type : 2;
    unsigned int reserved : 6;
} cJSON;

typedef cJSON cJSON_Null;

typedef struct cJSON_Boolean {
    cJSON base;
    cJSON_bool value : 8;
} cJSON_Boolean;

typedef struct cJSON_Array {
    cJSON base;
    cJSON *child;
} cJSON_Array;

typedef cJSON_Array cJSON_Object;

typedef struct cJSON_String {
    cJSON base;
    cJSON_string value;
} cJSON_String;

typedef cJSON_String cJSON_Raw;

typedef struct cJSON_Number {
    cJSON base;
    union {
        double double_value;
        int integer;
        long int long_integer;
    };
} cJSON_Number;

int main(void) {
    printf("sizeof(cJSON) = %zu\n", sizeof(cJSON));
    printf("sizeof(cJSON_Boolean) = %zu\n", sizeof(cJSON_Boolean));
    printf("sizeof(cJSON_Array) = %zu\n", sizeof(cJSON_Array));
    printf("sizeof(cJSON_Object) = %zu\n", sizeof(cJSON_Object));
    printf("sizeof(cJSON_String) = %zu\n", sizeof(cJSON_String));
    printf("sizeof(cJSON_Raw) = %zu\n", sizeof(cJSON_Raw));
    printf("sizeof(cJSON_Number) = %zu\n", sizeof(cJSON_Number));
    printf("sizeof(cJSON_Null) = %zu\n", sizeof(cJSON_Null));

    return EXIT_SUCCESS;
}

On x86_64 the sizes are:

sizeof(cJSON) = 32
sizeof(cJSON_Boolean) = 40
sizeof(cJSON_Array) = 40
sizeof(cJSON_Object) = 40
sizeof(cJSON_String) = 48
sizeof(cJSON_Raw) = 48
sizeof(cJSON_Number) = 40
sizeof(cJSON_Null) = 32

The names are not final.

I think this change would be especially beneficial for environments where heap space is limited.

FSMaxB · 2018-02-11T11:21:54Z

Also note that I added the possibility of having int or long int numbers, since there have been many requests to support scenarios where double is not available.

me-axentia · 2020-05-05T09:08:55Z

What happened to this?
Its been more than 2 years and I can see it is planned for 2.00 but will it ever happen?
Reducing the size of cJSON would be very nice for embedded systems, like the ones I'm developing. I would like to be able to handle quite large datastructures with as low memory footprint as possible.
Just making a union of the data, removing the previous pointer and making a bitfiled of the type would go a long way.
I know it would break the ABI, but is that really a big problem?

Alanscut · 2020-05-05T09:27:04Z

indeed, It seems that there are often cases where embedded systems fail due to insufficient memory, reducing the lib size really makes sense for embeded systems. But as Dave said #16 (comment), it increases the complexity of the lib, while increasing the user's debugging costs.

Nevertheless, I will try my best to continue Milestone 2.0.0, it will take some time. If you have a good solution, welcome to submit a PR.

me-axentia · 2020-05-06T11:55:45Z

Ok, I see. I have not started working with cJSON yet, but I plan to do it within a few months. As you say these changes may not be welcomed by everyone as they involve some trade-offs. I don't know if the changes I would like to see are suitable for everyone.
I will probably start of using cJSON as it is but may be required to make my own fork optimized for embedded systems in general and my case in particular.

oneonce · 2020-07-31T02:58:50Z

char *string;
“string” is the keyword/type of C++, recommend use str replace, or others

maximkulkin · 2020-08-10T04:18:19Z

@oneonce No, it's not. C++ has no keyword "string" and the standard string type is in "std" namespace.

RudyFisher7 · 2024-01-28T01:39:13Z

Holy cow, this issue has been waiting a while. Is cJSON even being maintained at this point? Or should I go elsewhere? (There are also a ton of really old open issues. Not very confidence inspiring, no offense....)

iMobs · 2024-01-28T17:03:35Z

Holy cow, this issue has been waiting a while. Is cJSON even being maintained at this point? Or should I go elsewhere? (There are also a ton of really old open issues. Not very confidence inspiring, no offense....)

@RudyFisher7 What are you talking about, there was a release in December. This is an open source project run by volunteers, don't judge the health of it based on if there are old discussions of a feature that never got resolved. It's a bad metric.

RudyFisher7 · 2024-01-30T15:01:28Z

No need to get defensive. If you actually look at the issues, more than half of them ARE bug reports actually, quite a few dating a few years back and still marked as open. (Names starting with _ are not found, inconsistent error reporting, etc.) If they WERE all simply enhancements I would of course agree with you. Also, the latest version is a 1.7.x version, just FYI. I guess I typically find, even community driven, repositories resolving bugs with new releases, but they remain open here, so raised a flag for me.

FSMaxB · 2024-01-30T15:25:40Z

I did give up maintaining it because of burnout a long time ago, but I've seen @Alanscut do some maintenance work since then.

RudyFisher7 · 2024-02-29T01:56:39Z

That's understandable.

mbratch · 2024-02-29T03:17:46Z

Actually, I like having valueint. Some embedded platforms don't do a good job of accurately representing the their largest int with a double. Just recently I had to patch our copy of cJSON to render the ints directly because rendering to double then assigning to int internally was giving inaccurate int values. Possibly the fault of the compiler, but the do ect int parse fixed it.

FSMaxB added the enhancement label Nov 13, 2016

FSMaxB added this to the 1.0.0 milestone Nov 13, 2016

FSMaxB mentioned this issue Nov 13, 2016

Breaking changes in 2.0.0 #60

Open

11 tasks

FSMaxB mentioned this issue Dec 19, 2016

valueint support uint64_t #87

Closed

FSMaxB mentioned this issue Mar 1, 2017

Precision problem of cJSON_AddNumberToObject(); with double and float. #117

Closed

FSMaxB referenced this issue Apr 2, 2017

cJSON.h: Deprecate valueint

5916f70

FSMaxB mentioned this issue Feb 25, 2018

Making some features compile time optional to reduce the code size. #245

Open

FSMaxB mentioned this issue Apr 22, 2018

cJSON is more wasteful of memory #258

Closed

FSMaxB mentioned this issue Nov 27, 2018

Tightly pack the cJSON struct #313

Closed

Alanscut mentioned this issue Jul 31, 2020

Waste memory #491

Open

daschfg mentioned this issue Dec 17, 2023

An 'outside the range of int' bug found in cJSON_CreateNumber #806

Open

raphael-bmec-co mentioned this issue Apr 23, 2024

cJSON_CreateNumber is truncating numbers outside the bounds of INT_MIN - INT_MAX arduino-libraries/Arduino_JSON#62

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clean up the cJSON struct #63

Clean up the cJSON struct #63

FSMaxB commented Nov 13, 2016

DaveGamble commented Nov 13, 2016

FSMaxB commented Nov 13, 2016

iMobs commented Nov 15, 2016

FSMaxB commented Nov 15, 2016 •

edited

Loading

FSMaxB commented Nov 15, 2016

ffontaine commented Nov 15, 2016

FSMaxB commented Nov 15, 2016

ffontaine commented Nov 15, 2016

FSMaxB commented Nov 15, 2016

Dreaster commented Nov 18, 2016

FSMaxB commented Nov 18, 2016 •

edited

Loading

Dreaster commented Nov 18, 2016

FSMaxB commented Nov 18, 2016

FSMaxB commented Nov 18, 2016

FSMaxB commented Nov 18, 2016

FSMaxB commented Nov 18, 2016

FSMaxB commented Apr 30, 2017 •

edited

Loading

FSMaxB commented Feb 11, 2018

FSMaxB commented Feb 11, 2018

me-axentia commented May 5, 2020

Alanscut commented May 5, 2020

me-axentia commented May 6, 2020

oneonce commented Jul 31, 2020

maximkulkin commented Aug 10, 2020

RudyFisher7 commented Jan 28, 2024

iMobs commented Jan 28, 2024

RudyFisher7 commented Jan 30, 2024 •

edited

Loading

FSMaxB commented Jan 30, 2024

RudyFisher7 commented Feb 29, 2024

mbratch commented Feb 29, 2024

Clean up the cJSON struct #63

Clean up the cJSON struct #63

Comments

FSMaxB commented Nov 13, 2016

Getting rid of valueint

Using a union of valuestring and valuedouble

New version of the cJSON struct

DaveGamble commented Nov 13, 2016

FSMaxB commented Nov 13, 2016

iMobs commented Nov 15, 2016

FSMaxB commented Nov 15, 2016 • edited Loading

FSMaxB commented Nov 15, 2016

ffontaine commented Nov 15, 2016

FSMaxB commented Nov 15, 2016

ffontaine commented Nov 15, 2016

FSMaxB commented Nov 15, 2016

Dreaster commented Nov 18, 2016

FSMaxB commented Nov 18, 2016 • edited Loading

Dreaster commented Nov 18, 2016

FSMaxB commented Nov 18, 2016

FSMaxB commented Nov 18, 2016

FSMaxB commented Nov 18, 2016

FSMaxB commented Nov 18, 2016

FSMaxB commented Apr 30, 2017 • edited Loading

FSMaxB commented Feb 11, 2018

FSMaxB commented Feb 11, 2018

me-axentia commented May 5, 2020

Alanscut commented May 5, 2020

me-axentia commented May 6, 2020

oneonce commented Jul 31, 2020

maximkulkin commented Aug 10, 2020

RudyFisher7 commented Jan 28, 2024

iMobs commented Jan 28, 2024

RudyFisher7 commented Jan 30, 2024 • edited Loading

FSMaxB commented Jan 30, 2024

RudyFisher7 commented Feb 29, 2024

mbratch commented Feb 29, 2024

Getting rid of `valueint`

FSMaxB commented Nov 15, 2016 •

edited

Loading

FSMaxB commented Nov 18, 2016 •

edited

Loading

FSMaxB commented Apr 30, 2017 •

edited

Loading

RudyFisher7 commented Jan 30, 2024 •

edited

Loading