Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up the cJSON struct #63

Open
FSMaxB opened this issue Nov 13, 2016 · 30 comments
Open

Clean up the cJSON struct #63

FSMaxB opened this issue Nov 13, 2016 · 30 comments
Milestone

Comments

@FSMaxB
Copy link
Collaborator

FSMaxB commented Nov 13, 2016

Currently theres three ways the cJSON struct can be improved:

  1. Get rid of valueint
  2. Use a union for valuestring and valuedouble
  3. Using good names for the elements

If you don't think this is a good idea, please tell me why.

Let me explain:

Getting rid of valueint

  • Having both valueint and valuedouble introduces unnecessary complexity in cJSON.
  • Having both valueint and valuedouble increases the chance of misuse by users of the library
  • With an IEEE754 compliant floating point implementation, valueint is not bringing any extra value on platforms that have integers smaller than 32 bit
    • double can store integers up to 53 bits (> 32)
    • even if you have a 64 bit integer, this code prevents it from being useful, because it only works if the integer fits into the double type completely.
    • that means, that even on a floating point implementation that is not IEEE754 compliant, the range of supported numbers is completely bounded by the implementation of double, cJSON will never print an exact value of an integer bigger than the biggest exact integer that can be represented by double
  • The valueint is using between 2-8 bytes that are completely wasted
    • it might be even more because of alignment, if we have a processor that requires 8 byte alignment and has a 64 bit pointer type and a 32 bit integer type, the struct will look like this
      • next: 8 byte
      • prev 8 byte
      • child 8 byte
      • type 4 byte
      • 4 byte padding
      • valuestring 8 byte
      • valueint 4 byte
      • 4 byte padding
      • valuedouble 8 byte
      • string 8 byte
      • even though valueint has only 4 bytes, it takes up 8 bytes in the struct

Using a union of valuestring and valuedouble

As an instance of the cJSON struct can only be either a string type or a number type, not both at the same time. Therefore they can be stored at the same place in memory.

New version of the cJSON struct

typedef struct cJSON
{
    /* next/prev allow you to walk array/object chains. Alternatively, use GetArraySize/GetArrayItem/GetObjectItem */
    struct cJSON *next;
    struct cJSON *previous;
    /* An array or object item will have a child pointer pointing to a chain of the items in the array/object. */
    struct cJSON *child;
    /* The item's name string, if this item is the child of, or is in the list of subitems of an object. */
    char *name;
    union {
        /* The item's string, if type==cJSON_String */
        char *string;
        /* The item's number, if type==cJSON_Number */
        double number;
    }
    /* The type of the item, as above. */
    int type;
} cJSON;
  • renamed prev -> previous
  • valueint removed
  • renamed valuedouble -> number
  • renamed string -> name
  • renamed valuestring -> string
  • put string and number in a union
  • reorder the elements in the struct
@FSMaxB FSMaxB added this to the 1.0.0 milestone Nov 13, 2016
@FSMaxB FSMaxB mentioned this issue Nov 13, 2016
11 tasks
@DaveGamble
Copy link
Owner

With fairly minimal rework, one could eliminate previous too. It's barely ever read in the library, and when it is, a single temp variable could eliminate it.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 13, 2016

Yes, now that I think about it the library only ever uses it as single linked list.

@iMobs
Copy link

iMobs commented Nov 15, 2016

You might also want to play around with the order of the variables for better struct packing. Depending on the word/pointer size of the system, this could reduce the size of the object.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 15, 2016

There are potentially 3 different sizes in the new struct. The size of a pointer, sizeof(int) and sizeof(double). The order I used in the suggested version of the new struct already optimizes that for x86_64 with still being the best version for 32 bit x86. But I haven't considered all the other possible size combinations on other processors.

This would change when removing the previous pointer though.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 15, 2016

With previous removed, the following variant has no padding on all architectures I can think of:

typedef struct cJSON
{
    union {
        char *string;
        double number;
    }
    char *name;
    struct cJSON *next;
    struct cJSON *child;
    int type;
} cJSON;

@ffontaine
Copy link
Contributor

I would suggest to give a name to the union such as data or u indeed anonymous structures and unions are a C11 feature.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 15, 2016

I consider this C11 compatible (if even) pseudo code XD. That shows you that I never put this code through a compiler. But correct syntax doesn't matter for the purpose of this discussion.

@ffontaine
Copy link
Contributor

Sorry, I think my comment was incomplete and didn't give you enough background. I wanted to point out that I would like to keep cjson compatible with C99 and to avoid any C11 features such as anonymous structures. Indeed some old gcc like gcc 4.3 (which is used to compile cjson for blackfin target on buildroot) will not be able to compile cjson anymore.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 15, 2016

Actually, cJSON is keeping compatible with C89 and avoids any C99 and C11 features (that haven't been available in C89). Don't worry, this GitHub issue is not cJSONs source code!

@Dreaster
Copy link

How about changing the type to be an enum instead so that the the code will be clearer?

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 18, 2016

That sounds like a really good idea, but I thought that it is undefined behavior to use an enum with another than the predefined values. Let me check if that is actually true.

Because type is supposed to be used as bit flags, so you can check if something is array or object for example (e.g. if (json->type & (cJSON_Array | cJSON_Object)).

@Dreaster
Copy link

You are very correct. I have completly missed that the type was used as bit flags. I thought that it truly was one of the cJSON_types defines. In that case. A different name perhaps inroder to clairify. "used_types" or something like that.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 18, 2016

Renaming it to typeinfo instead of type might actually be a good idea, because it not only stores the type, but also cJSON_IsReference and cJSON_IsConst.

Another possible solution: Make it a bitfield, so cJSON_IsReference and cJSON_IsConst are not part of the type.

Also cJSON_False, cJSON_True ... can be declared ad const int instead of using a preprocessor macro.

So my current suggestion based on using a bitfield:

typedef struct cJSON
{
    union {
        char *string;
        double number;
    } data; /* the union is now named `data` */
    char *name;
    struct cJSON *next;
    struct cJSON *child;
    unsigned int type : 12; /* 12 bits allow for 5 more types to be added later */
    int is_reference : 1;
    int is_const : 1;
    /* reserve the remaining two bits, not sure if this should be done or not */
    int unused1 : 1;
    int unused2 : 1;
 } cJSON;

Notes:

  • type is unsigned in order to force the use of unsigned int for the bitfield (avoiding implementation defined behavior)
  • string and number are only accessible via data.XXX because anonymous unions and structs are not allowed in C89.
  • the bitfield uses 16 bits because that is the minimum requirement for an int.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 18, 2016

Using a bitfield fixes ->type == cJSON_XXXX if a node is a reference or contains a constant string. So it is not necessary to use & 0xFF anymore when comparing.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 18, 2016

Defining the types as const instead of #define doesn't work because it is not supported in switch case.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Nov 18, 2016

Well, let's change all of the bit fields from int to unsigned int. For this reason!

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Apr 30, 2017

In order to be fully JSON conformant, cJSON needs to support '\0' in strings. This requires storing the length for name and string. Also it should be possible to use an enum for the type.

I also think that cJSON_False and cJSON_True should not be separate types. There should be a boolean type instead and an added value in the data union.

cJSON_Raw can be removed and replaced by a flag in the bitfield. This allows specifying the type of raw JSON. But this also means that an unknown type needs to be added. Although this increases the potential of misuse (because you cannot rely on just the type anymore to determine what union member to access).

New draft of the new cJSON struct (also let's rename it to cJSON_item):

enum cJSON_Type
{ 
    CJSON_INVALID = 0,
    CJSON_BOOLEAN = 1,
    CJSON_NULL = 2,
    CJSON_NUMBER = 4,
    CJSON_STRING = 8,
    CJSON_ARRAY = 16,
    CJSON_OBJECT = 32,
    CJSON_UNKNOWN = 64
};

typedef struct cJSON_string
{
    unsigned char *data;
    size_t length;
}

typedef struct cJSON_item
{
    union 
    {
        cJSON_string string;
        double number;
        cJSON_bool boolean;
    } data; /* the union is now named `data` */
    cJSON_string name;
    struct cJSON_item *next;
    struct cJSON_item *child;
    enum cJSON_type type : 12; /* 12 bits allow adding 3 types later, not sure about the 4th because enum is signed int and it behaves weird in bit fields, maybe use unsigned int here instead of enum */
    unsigned int read_only_name : 1;
    unsigned int read_only_string : 1;
    unsigned int read_only_struct : 1; /* the cJSON struct itself should not be deallocated, useful for cJSON_PrintPreallocated */
    unsigned int raw : 1;
 } cJSON_item;

Although cJSON_string has a length property, data should always be length + 1 bytes where the last one is '\0'. Just as a precaution if somebody feeds this to a function that expects zero terminated strings.

Because unions are dangerous, macros and functions should exist for accessing everything, hiding the implementation details.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Feb 11, 2018

I thought of a better idea than using a union. Since cJSON structs are not stored in arrays, a dynamicly sized struct can be used. There can be a base struct called cJSON with type information and a next pointer for the linked list. Other types would then have a cJSON as their first member and add additional data after that.

This would have mostly two benefits:

  • Users won't use union members that they shouldn't. Accessing data always requires casting to the correct type first.
  • It uses less space.

Example:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef int cJSON_bool;

typedef enum cJSON_Type {
    CJSON_INVALID = 0,
    CJSON_BOOLEAN = 1,
    CJSON_NULL = 2,
    CJSON_NUMBER = 3,
    CJSON_STRING = 4,
    CJSON_ARRAY = 5,
    CJSON_OBJECT = 6
} cJSON_Type;

typedef enum cJSON_NumberType {
    CJSON_NUMBER_DOUBLE = 0,
    CJSON_NUMBER_INT = 1,
    CJSON_NUMBER_LONG = 2
} cJSON_NumberType;

typedef struct cJSON_string {
    const char *string;
    size_t length;
} cJSON_string;

typedef struct cJSON {
    struct cJSON *next;
    cJSON_string key;
    cJSON_Type type : 4; /* leaves room for 9 more types */
    unsigned int is_reference : 1;
    unsigned int key_is_reference : 1;
    unsigned int data_is_reference : 1;
    unsigned int is_raw : 1;
    cJSON_NumberType number_type : 2;
    unsigned int reserved : 6;
} cJSON;

typedef cJSON cJSON_Null;

typedef struct cJSON_Boolean {
    cJSON base;
    cJSON_bool value : 8;
} cJSON_Boolean;

typedef struct cJSON_Array {
    cJSON base;
    cJSON *child;
} cJSON_Array;

typedef cJSON_Array cJSON_Object;

typedef struct cJSON_String {
    cJSON base;
    cJSON_string value;
} cJSON_String;

typedef cJSON_String cJSON_Raw;

typedef struct cJSON_Number {
    cJSON base;
    union {
        double double_value;
        int integer;
        long int long_integer;
    };
} cJSON_Number;

int main(void) {
    printf("sizeof(cJSON) = %zu\n", sizeof(cJSON));
    printf("sizeof(cJSON_Boolean) = %zu\n", sizeof(cJSON_Boolean));
    printf("sizeof(cJSON_Array) = %zu\n", sizeof(cJSON_Array));
    printf("sizeof(cJSON_Object) = %zu\n", sizeof(cJSON_Object));
    printf("sizeof(cJSON_String) = %zu\n", sizeof(cJSON_String));
    printf("sizeof(cJSON_Raw) = %zu\n", sizeof(cJSON_Raw));
    printf("sizeof(cJSON_Number) = %zu\n", sizeof(cJSON_Number));
    printf("sizeof(cJSON_Null) = %zu\n", sizeof(cJSON_Null));

    return EXIT_SUCCESS;
}

On x86_64 the sizes are:

sizeof(cJSON) = 32
sizeof(cJSON_Boolean) = 40
sizeof(cJSON_Array) = 40
sizeof(cJSON_Object) = 40
sizeof(cJSON_String) = 48
sizeof(cJSON_Raw) = 48
sizeof(cJSON_Number) = 40
sizeof(cJSON_Null) = 32

The names are not final.

I think this change would be especially beneficial for environments where heap space is limited.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Feb 11, 2018

Also note that I added the possibility of having int or long int numbers, since there have been many requests to support scenarios where double is not available.

@me-axentia
Copy link

What happened to this?
Its been more than 2 years and I can see it is planned for 2.00 but will it ever happen?
Reducing the size of cJSON would be very nice for embedded systems, like the ones I'm developing. I would like to be able to handle quite large datastructures with as low memory footprint as possible.
Just making a union of the data, removing the previous pointer and making a bitfiled of the type would go a long way.
I know it would break the ABI, but is that really a big problem?

@Alanscut
Copy link
Collaborator

Alanscut commented May 5, 2020

indeed, It seems that there are often cases where embedded systems fail due to insufficient memory, reducing the lib size really makes sense for embeded systems. But as Dave said #16 (comment), it increases the complexity of the lib, while increasing the user's debugging costs.

Nevertheless, I will try my best to continue Milestone 2.0.0, it will take some time. If you have a good solution, welcome to submit a PR.

@me-axentia
Copy link

Ok, I see. I have not started working with cJSON yet, but I plan to do it within a few months. As you say these changes may not be welcomed by everyone as they involve some trade-offs. I don't know if the changes I would like to see are suitable for everyone.
I will probably start of using cJSON as it is but may be required to make my own fork optimized for embedded systems in general and my case in particular.

@oneonce
Copy link

oneonce commented Jul 31, 2020

char *string;
“string” is the keyword/type of C++, recommend use str replace, or others

@maximkulkin
Copy link

@oneonce No, it's not. C++ has no keyword "string" and the standard string type is in "std" namespace.

@RudyFisher7
Copy link

Holy cow, this issue has been waiting a while. Is cJSON even being maintained at this point? Or should I go elsewhere? (There are also a ton of really old open issues. Not very confidence inspiring, no offense....)

@iMobs
Copy link

iMobs commented Jan 28, 2024

Holy cow, this issue has been waiting a while. Is cJSON even being maintained at this point? Or should I go elsewhere? (There are also a ton of really old open issues. Not very confidence inspiring, no offense....)

@RudyFisher7 What are you talking about, there was a release in December. This is an open source project run by volunteers, don't judge the health of it based on if there are old discussions of a feature that never got resolved. It's a bad metric.

@RudyFisher7
Copy link

RudyFisher7 commented Jan 30, 2024

No need to get defensive. If you actually look at the issues, more than half of them ARE bug reports actually, quite a few dating a few years back and still marked as open. (Names starting with _ are not found, inconsistent error reporting, etc.) If they WERE all simply enhancements I would of course agree with you. Also, the latest version is a 1.7.x version, just FYI. I guess I typically find, even community driven, repositories resolving bugs with new releases, but they remain open here, so raised a flag for me.

@FSMaxB
Copy link
Collaborator Author

FSMaxB commented Jan 30, 2024

I did give up maintaining it because of burnout a long time ago, but I've seen @Alanscut do some maintenance work since then.

@RudyFisher7
Copy link

That's understandable.

@mbratch
Copy link

mbratch commented Feb 29, 2024

Actually, I like having valueint. Some embedded platforms don't do a good job of accurately representing the their largest int with a double. Just recently I had to patch our copy of cJSON to render the ints directly because rendering to double then assigning to int internally was giving inaccurate int values. Possibly the fault of the compiler, but the do ect int parse fixed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests