Using Fleece

Jens Alfke edited this page Jan 27, 2024 · 9 revisions

Version 1.3 … Feb 11, 2022

This document is a guide to the Fleece APIs for C and C++, at a higher level than the API docs in the headers.

I. Concepts

To begin with, I'll explain the concepts behind the API, without the language-specific details of types and methods. Those will come next.

1. Values

Fleece's data model is almost identical to JSON's, with the addition of a binary data (blob) type. This means Fleece has seven data types: null, boolean, numbers, strings, data, arrays, and dictionaries. Arrays can contain any types, and dictionaries have strings as keys and values of any types.

2. Parsing

When Fleece-encoded data is parsed, it isn't converted into heap-allocated objects. Instead, the Fleece objects used in the API point directly into the encoded data. This makes parsing very fast, since nothing is copied or converted, and it allocates no memory!

Warning: The downside of this is that if the encoded data is invalidated, for example by freeing the heap block containing it, all the Fleece objects are invalidated too, and accessing them will likely return garbage or crash. There is a Doc class that helps by acting as a safer ref-counted container for the data.
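To make the "points directly into the encoded data" idea concrete, here's a minimal sketch in plain C++. It does not use the Fleece API or its real encoding: the one-byte length prefix and the names ToyStringView and parseToyString are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model only: NOT the Fleece format or API.
// Hypothetical encoding: one length byte followed by that many characters.
struct ToyStringView {
    const char* chars;    // points INTO the encoded buffer; nothing is copied
    size_t      length;

    // Copies the characters out, only when the caller asks for it.
    std::string toString() const { return std::string(chars, length); }
};

// "Parsing" allocates nothing: it only computes a pointer into `encoded`.
inline ToyStringView parseToyString(const std::vector<uint8_t>& encoded) {
    assert(!encoded.empty());
    assert(static_cast<size_t>(encoded[0]) + 1 <= encoded.size());
    return ToyStringView{reinterpret_cast<const char*>(encoded.data() + 1),
                         encoded[0]};
}
```

Because the view only borrows the buffer, changing a byte in the buffer changes what the view sees, and destroying the buffer would leave the view dangling. That is the same hazard the warning above describes.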

These Fleece objects are immutable, since they're frozen inside the encoded data block. So how do you create new ones? There are two ways: with an Encoder, or using mutable objects.

3. Encoding

An Encoder is an object that generates encoded Fleece data using a streaming API. You write a value to it, and get the encoded data at the end. The value you write is probably a collection, so you call beginArray or beginDict, then write the values one at a time, then endArray or endDict. Those values can be scalars, given as C/C++ types like float or string, or they can themselves be collections that require a nested begin and end call. (Details are below.)

4. Mutable Collections

Fleece also supports mutable collections. Unlike the ones parsed from encoded data, these are individually allocated from the heap, like the objects in a typical collection API.

You can create an empty mutable array or dictionary from scratch and use it as the root of an object tree, adding values to it (including immutable value references.) You can also create a mutable copy of an immutable collection. This copy operation can be shallow (only the collection is copied; its contents will be the same values as the originals) or deep (values inside the collection are mutable-copied too, recursively.)

Note: Under the hood, making a shallow mutable copy is very cheap: instead of copying the entire collection into heap memory, it just allocates a small stub that points back to the immutable object. Its contents are inherited from the source object, much like prototype inheritance in JavaScript. As you make changes to the mutable collection, those are added to its heap data, shadowing the original contents. (You don't need to know about this to use collections, but it's pretty cool.)
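If the shadowing idea is hard to picture, here's a rough sketch of it in plain C++. This is not how Fleece is actually implemented, and none of these names exist in Fleece: ToyImmutableDict and ToyShadowDict are hypothetical, and values are limited to ints to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model only: NOT how Fleece is implemented, just the shadowing idea.
using ToyImmutableDict = std::map<std::string, int>;

class ToyShadowDict {
public:
    // "Shallow mutable copy": stores a pointer to the source, copies nothing.
    explicit ToyShadowDict(const ToyImmutableDict* source) : _source(source) {
        assert(source != nullptr);
    }

    void set(const std::string& key, int value) { _changes[key] = Change{false, value}; }
    void remove(const std::string& key)         { _changes[key] = Change{true, 0}; }

    // Looks in the local changes first, then falls back to the source.
    // Returns nullptr if the key is absent (or was removed).
    const int* get(const std::string& key) const {
        auto changed = _changes.find(key);
        if (changed != _changes.end())
            return changed->second.removed ? nullptr : &changed->second.value;
        auto original = _source->find(key);
        return original != _source->end() ? &original->second : nullptr;
    }

    size_t changeCount() const { return _changes.size(); }

private:
    struct Change { bool removed; int value; };
    const ToyImmutableDict*       _source;    // not owned; must outlive this object
    std::map<std::string, Change> _changes;   // heap data that shadows the source
};
```

Creating the copy costs one pointer, and the source is never modified. The flip side is the same lifetime rule as before: the copy is only usable while its source is alive.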

To save a mutable object as encoded data, you write it to an Encoder. (See “Workflow”, below.)

5. Memory Management

Mutable Values

Mutable values are reference-counted: Array and Dictionary each have retain and release functions that increment/decrement the reference count. Adding a mutable value to a collection also increments its reference count. The value (and its contents) will remain alive as long as its reference count is positive.

Note: In C++ the reference counting is done for you. MutableArray and MutableDict are a type of smart pointer that retains the value while in scope.
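For readers who haven't worked with manual reference counting, here's a small sketch of the rules just described, written in plain C++. It is a toy model, not Fleece code: ToyCounted, toyRetain, toyRelease, toyAppend and the gToyLiveCount counter are all hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy model only: manual retain/release, NOT the Fleece implementation.
static int gToyLiveCount = 0;            // how many toy objects currently exist

struct ToyCounted {
    int refCount = 1;                    // the creator holds the first reference
    std::vector<ToyCounted*> children;   // values this collection has retained
    ToyCounted()  { ++gToyLiveCount; }
    ~ToyCounted() { --gToyLiveCount; }
};

inline ToyCounted* toyRetain(ToyCounted* value) {
    if (value) ++value->refCount;
    return value;
}

// Drops one reference; frees the object (and releases its children) at zero.
inline void toyRelease(ToyCounted* value) {
    if (!value) return;
    assert(value->refCount > 0);
    if (--value->refCount == 0) {
        for (ToyCounted* child : value->children)
            toyRelease(child);
        delete value;
    }
}

// Adding a value to a collection retains it, as described above.
inline void toyAppend(ToyCounted* collection, ToyCounted* value) {
    assert(collection != nullptr && value != nullptr);
    collection->children.push_back(toyRetain(value));
}
```

After toyAppend, the caller can release its own reference to the item and the item stays alive until the collection itself is released.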

Immutable Values

Immutable values are, as explained above, at the mercy of the memory block they're parsed from. Their lifespan is the same as the lifespan of that block; they cannot be individually retained or released. You must watch out for this, and avoid freeing that memory prematurely!

Warning: A mutable collection may contain immutable values; this happens when you make a shallow mutable copy of a nested immutable collection. The mutable collection cannot retain those values — they're still limited to the lifespan of their parsed memory block. If that block gets invalidated, the mutable collection remains valid but its contents will be garbage! If this is a problem, making a deep copy of the collection with the kFLDeepCopyImmutables mode will ensure that all values within the collection are copied to the heap.

6. Workflow

A typical workflow for updating persistent Fleece data is:

  1. Read the data into memory
  2. Parse the data, which returns a reference to the root collection
  3. Make a mutable copy of the root
  4. Make changes to the copy (possibly making mutable copies of nested collections)
  5. Encode the copy
  6. Write the encoded data back to storage
  7. Free the original and updated data blocks

If the persistent data doesn't exist yet, you'd initialize by creating a new empty mutable collection, then jumping to step 4.

II. C & C++ Types and Methods

Now let's go into the actual details of the API.

0. Boilerplate

The first thing you need to do is include the right headers. Make sure to add the Fleece source tree's API subdirectory to the compiler's header search path.

In C

#include <fleece/Fleece.h>

In C++

#include <fleece/Fleece.hh>
#include <fleece/Mutable.hh>  // Only needed if you use mutable classes

using namespace fleece;       // Optional but recommended :)

Note: There is also an internal C++ API, in the package fleece::impl. This used to be the public API, but it's been superseded. Please don't use it!

1. Value Types

The basic Fleece API type is Value (FLValue in C.) This is a reference to a value of any type. Of the other data types, Array and Dictionary have their own C++ types (FLArray and FLDict in C) which are subclasses of Value. Scalar values are just accessed using C/C++ types.

In C

In C, FLValue, FLArray and FLDict are typedefs for opaque pointers. The methods on them are functions that take the receiver as the first parameter. The name of the function reflects the type it operates on; for example, Dict's count method is called as the function FLDict_Count(FLDict).

Note: C doesn't support inheritance, so FLArray and FLDict are not type-compatible with FLValue. If you need to pass one of those to a function parameter that expects an FLValue, just type-cast.

In C++

In C++, they're classes that act as “smart pointers”, even though you don't use pointer syntax (*, ->) with them. They are reference types, not value types like std::vector or std::map.

2. Supporting Types: Slices

We need to mention a few support types that are lower-level than values. These are used to represent both strings and binary data blobs. They have more extensive documentation, but here's a short intro:

A slice is a simple struct consisting of a pointer and a length. All it does is point to a range of memory. It doesn't imply any ownership; it just says “over here, for this many bytes.” Nonetheless, it's very useful, and there are a lot of utility methods on it, including ones to convert to and from C++ and C strings.

alloc_slice is a subclass that does own memory. It always points to a heap block that it manages. The heap block is reference-counted, so it's freed when the last alloc_slice pointing to it goes out of scope.

Note: The null slice is {NULL, 0}. You test a slice for null by comparing its pointer (buf) with NULL. Comparing its size with 0 isn't the same: it's possible to have an empty but non-null slice. (C++ slice has a bool conversion operator that tests for null.) It is, however, illegal to have a slice with a null pointer but nonzero size.
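The difference between a null slice and an empty one is easy to get wrong, so here's a sketch that spells it out. It uses a stand-in struct rather than the real slice type: ToySlice, toySliceOf and toyToString are hypothetical names, and only the pointer-plus-length layout is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Toy model only: a stand-in for the idea of a slice, NOT Fleece's own type.
struct ToySlice {
    const void* buf;    // start of the range (not owned)
    size_t      size;   // number of bytes

    bool isNull() const  { return buf == nullptr; }
    bool isEmpty() const { return size == 0; }
};

// Wraps a C string without copying it; a null pointer gives the null slice.
inline ToySlice toySliceOf(const char* cstr) {
    return cstr ? ToySlice{cstr, std::strlen(cstr)} : ToySlice{nullptr, 0};
}

// Copies the bytes into a std::string; the null slice becomes "".
inline std::string toyToString(ToySlice s) {
    assert(s.buf != nullptr || s.size == 0);   // null pointer + nonzero size is illegal
    return s.buf ? std::string(static_cast<const char*>(s.buf), s.size)
                 : std::string();
}
```

Here toySliceOf("") is empty but not null, while toySliceOf(nullptr) is both null and empty, which is why testing the size alone can't tell you whether a slice is null.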

In C++

A slice literal can be written as a string literal with _sl appended, e.g. "something"_sl.

The constant nullslice represents a null slice.

The C++ alloc_slice manages ref-counting automatically.

There are a great number of utility methods on slices; look in fleece/slice.hh for details. Comparisons, matching, splitting, hex or base64 encoding...

In C

In C, these types are called FLSlice and FLSliceResult, and an FLSlice literal can be written as FLSTR("something").

The constant kFLSliceNull represents a null slice.

As usual, reference-counting is up to you in C: whenever an FLSliceResult is returned from an API call, you are responsible for calling FLSliceResult_Release when you're done with it.

There are only a couple of utility functions in C. FLSlice_Equal compares two slices for equality. FLSlice_Compare is a 3-way comparison, like strcmp(). You can create new ref-counted FLSliceResults with FLSliceResult_New and FLSlice_Copy.

3. Type Testing and Conversion

Value's type property returns an enumeration that identifies which type of value it really is.

There are methods to get a scalar from a Value, or to cast it to a more specific (collection) type. They all return an empty default result if the value is not of the assumed type:

  • Boolean: asBool (returns false if the value is not boolean)
  • Numbers: asInt, asUnsigned, asFloat, asDouble (returns 0 if the value is not numeric)
  • Strings: asString (returns a null slice if the value is not a string)
  • Data: asData (returns a null slice if the value is not data)
  • Arrays: asArray (returns NULL if the Value is not an Array)
  • Dicts: asDict (returns NULL if the Value is not a Dict)

4. The Dreaded NULL

A Value can be a NULL pointer (i.e. a reference to address 0.) This is different from a JSON null! It means that there is literally no value. It's equivalent to JavaScript's undefined. It's returned from collection getters when the requested index or key doesn't exist, or from asDict / asArray when the value is not of the required type. It's also the initial state of a Value in C++.

It's safe to operate on a NULL Value — in general, any operation on it will return NULL, or false, or zero. (If you're used to Objective-C, it acts like nil. And it might remind you of ?. in Swift or Kotlin.) This is unusual, but it has the benefit of making it easy and safe to work with values whose schema is unknown or can't be guaranteed. For example, you can dive into nested properties like this:

double width = root.asDict()["dimensions"].asArray()[0].asDouble();

There are six things that could go wrong here, if the data isn't in the expected form: root might be NULL, or it might not be a dictionary, or it might not have a dimensions property, or the value of that property might not be an array, or that array might be empty, or its first value might not be a number. If anything goes wrong, all that happens is that width is set to 0. (If NULL weren't safe, you'd have to insert six error checks, turning that one line of code into about 18, or else risk crashing!)

If you want to distinguish between those failures and the case where width exists but really is 0, you can do this:

Value widthVal = root.asDict()["dimensions"].asArray()[0];
if (widthVal.type() != kFLNumber)
    throw "missing or invalid width!";
width = widthVal.asDouble();

This works because type() called on a NULL Value returns kFLUndefined.
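If you're curious how a reference type can make every step of such a chain safe, here's a minimal sketch in plain C++. It is a toy model and not the Fleece classes: ToyNode, ToyRef, get, at and the toyNumber / toyArray / toyDict helpers are all hypothetical, and only numbers, arrays and dicts are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: illustrates NULL-safe chaining, NOT the Fleece classes.
struct ToyNode;
using ToyNodePtr = std::shared_ptr<ToyNode>;

struct ToyNode {
    enum Type { kNumber, kArray, kDict };
    Type type = kNumber;
    double number = 0;
    std::vector<ToyNodePtr> items;               // used when type == kArray
    std::map<std::string, ToyNodePtr> entries;   // used when type == kDict
};

// A reference that may be null; every accessor tolerates that.
class ToyRef {
public:
    explicit ToyRef(const ToyNode* node = nullptr) : _node(node) {}

    bool isNull() const     { return _node == nullptr; }
    bool isNumber() const   { return _node && _node->type == ToyNode::kNumber; }
    double asDouble() const { return isNumber() ? _node->number : 0.0; }

    // Dict lookup: null if this isn't a dict or the key is missing.
    ToyRef get(const std::string& key) const {
        if (!_node || _node->type != ToyNode::kDict) return ToyRef();
        auto found = _node->entries.find(key);
        return found != _node->entries.end() ? ToyRef(found->second.get()) : ToyRef();
    }
    // Array lookup: null if this isn't an array or the index is out of range.
    ToyRef at(size_t index) const {
        if (!_node || _node->type != ToyNode::kArray || index >= _node->items.size())
            return ToyRef();
        assert(_node->items[index] != nullptr);   // toy arrays never hold null pointers
        return ToyRef(_node->items[index].get());
    }

private:
    const ToyNode* _node;
};

inline ToyNodePtr toyNumber(double n) {
    auto node = std::make_shared<ToyNode>();
    node->number = n;
    return node;
}
inline ToyNodePtr toyArray(std::vector<ToyNodePtr> items) {
    auto node = std::make_shared<ToyNode>();
    node->type = ToyNode::kArray;
    node->items = std::move(items);
    return node;
}
inline ToyNodePtr toyDict(std::map<std::string, ToyNodePtr> entries) {
    auto node = std::make_shared<ToyNode>();
    node->type = ToyNode::kDict;
    node->entries = std::move(entries);
    return node;
}
```

With this, root.get("dimensions").at(0).asDouble() yields the number when the data has the expected shape and 0 when any step fails, and isNumber() on the intermediate reference distinguishes "missing" from "really zero".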

5. Accessing Collections

The collection API should be pretty familiar if you've used other frameworks...

  • Array and Dict both have a count property, and a boolean empty property (for convenience, and because in some cases it can take longer to determine the actual count than to just check if the collection is empty.)
  • Arrays are indexed by (unsigned) integers starting at zero. Getting an index past the end of an array just returns NULL.
  • Dicts are indexed by strings. If the Dict doesn't contain the key you requested, it returns NULL.

Mutable Collections

All collections have an isMutable property that tells you if the instance is actually mutable, and an asMutable property that type-casts to the appropriate mutable subclass, or returns NULL if it's not mutable.

You can create a mutable collection from scratch by calling MutableArray::newArray or MutableDict::newDict. Or you can copy an existing collection by calling its mutableCopy method. There are three modes for copying, which are progressively more expensive (but safer):

  • kFLDefaultCopy: A shallow copy that makes a new mutable collection object but leaves its values the same.
  • kFLDeepCopy: Nested mutable collections will also be copied. This is useful if you want to ensure that no other references can modify the object tree.
  • kFLDeepCopyImmutables: Immutable collections (and scalars) are also copied. The resulting object tree is now entirely heap-based, detached from any parsed Fleece data, so there's no danger of dangling references if that data is invalidated.

Mutable collections have a set method to store a value at a particular index/key, and a remove method to remove one. MutableArray also has append to add a value at the end. The setters return a special Slot type, which is a reference to where the value is stored; Slot has setter methods that store different types into it. For example, to store 17 into the first item of an array, call array.set(0).setInt(17).

Note: Collections can contain null values, but not NULL.

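MutableArray and MutableDict also have the slightly-confusing-but-useful methods getMutableArray and getMutableDict. Each one looks up the value at an index or key and returns it in mutable form: if the value is already a mutable collection of the requested type, it is returned as-is; if it is an immutable one, a mutable copy is made, stored in its place, and returned; otherwise the result is NULL. These are very useful when you have an immutable collection and want to make a mutable copy of it with some nested values changed.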

6. Iteration

Arrays and Dicts have iterators that let you look at their values one by one. Regular iterators are “shallow”, but there's a DeepIterator class for when you need to recursively visit every value in a tree.

It's OK to iterate over a NULL collection reference; it acts like an empty collection.

Warning: As with most other collection APIs, it's illegal to modify a mutable collection while you're iterating it. There's no explicit check for this, but the results will be, as they say, “undefined”.

In C++

The idiom is that you use a for loop to construct the iterator, test whether it's done, and move it to the next item:

for (Array::iterator i(myArray); i; ++i) {
    doSomethingWith( *i );
}

for (Dict::iterator i(myDict); i; ++i) {
    doSomethingWith( i.key(), i.value() );
}

In C

Everything's a bit more awkward in C, isn't it? 😝

FLArrayIterator iter;
FLArrayIterator_Begin(myArray, &iter);
FLValue value;
while (NULL != (value = FLArrayIterator_GetValue(&iter))) {
    doSomethingWith( value );
    FLArrayIterator_Next(&iter);
}

FLDictIterator iter;
FLDictIterator_Begin(myDict, &iter);
FLValue value;
while (NULL != (value = FLDictIterator_GetValue(&iter))) {
    FLString key = FLDictIterator_GetKeyString(&iter);
    doSomethingWith( key, value );
    FLDictIterator_Next(&iter);
}

Warning: It is illegal to call FLArrayIterator_Next or FLDictIterator_Next when the iterator's already at the end! In particular, do not do this:

// Incorrect code, for demonstration only:
do {
    value = FLDictIterator_GetValue(&iter);
    if (value) { ... }
} while (FLDictIterator_Next(&iter));    // wrong! ☠️

This looks reasonable, but if myDict is empty the iterator starts out at the end, so the first call to FLDictIterator_Next is already illegal. The recommended loop in the first listing avoids this problem.

III. Reading And Writing

1. Generating Fleece

As described previously, an Encoder is an object that generates encoded Fleece data using a streaming API. You use it like this:

  1. Construct an Encoder
  2. Tell the encoder to begin a collection: beginArray or beginDict
  3. Write values to the encoder, which adds them to the collection:
       • If the collection is a dictionary, call writeKey to define the key for the value.
       • To write a scalar, call: writeNull, writeBool, writeInt, etc.
       • To write a collection, recursively perform steps 2–4: Begin the collection, write values, end it.
  4. End the collection: endArray or endDict
  5. Call finish, which returns the encoded data

The most recently begun collection is the “current collection” that values will be added to. When that collection is ended, the containing collection becomes current.

Example:

Encoder enc;
enc.beginDict();
enc.writeKey("dimensions");
enc.beginArray();
enc.writeInt(10);
enc.writeInt(16);
enc.endArray();
enc.writeKey("color");
enc.writeString("blue");
enc.endDict();
alloc_slice encodedData = enc.finish();

If you already have the root collection as an Array or Dict object, just write it as the only value, without a begin or end call:

Encoder enc;
enc.writeValue(myRootDict);
alloc_slice encodedData = enc.finish();

Encoder Errors

Encoding can fail for a number of reasons, mostly through programmer error (like not nesting begin/end calls properly), but also if memory runs out.

The individual begin/end/write methods return false on error, but it's easiest to just ignore those until the end of encoding and then check whether finish returned a null slice. If so, you can check the Encoder's error and errorMessage properties for details.
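The "ignore errors until finish" pattern works because the encoder remembers its first error. Here's a sketch of that pattern, together with the stack of open collections described earlier, in plain C++. It is a toy and not the Fleece Encoder: ToyEncoder and its private helpers are hypothetical, it only writes JSON text, and it does no string escaping.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: mimics the begin/write/end/finish flow, NOT the Fleece Encoder.
class ToyEncoder {
public:
    bool beginArray() { return openCollection('[', kArray); }
    bool beginDict()  { return openCollection('{', kDict); }
    bool endArray()   { return closeCollection(']', kArray); }
    bool endDict()    { return closeCollection('}', kDict); }

    bool writeKey(const std::string& key) {
        if (!_error.empty()) return false;
        if (_stack.empty() || _stack.back().kind != kDict || _stack.back().haveKey)
            return fail("writeKey is only valid before a value inside a dict");
        separator();
        _out += '"' + key + "\":";
        _stack.back().haveKey = true;
        return true;
    }
    bool writeInt(long long n) {
        if (!beforeValue()) return false;
        _out += std::to_string(n);
        return true;
    }
    bool writeString(const std::string& s) {
        if (!beforeValue()) return false;
        _out += '"' + s + '"';   // toy: no escaping
        return true;
    }

    // Returns the output, or an empty string if any earlier call failed.
    std::string finish() {
        if (_error.empty() && !_stack.empty()) fail("unclosed collection");
        return _error.empty() ? _out : std::string();
    }
    const std::string& errorMessage() const { return _error; }

private:
    enum Kind { kArray, kDict };
    struct Level { Kind kind; int count; bool haveKey; };

    bool fail(const char* message) {
        if (_error.empty()) _error = message;   // keep only the first error
        return false;
    }
    // Emits a comma before every item of the current collection except the first.
    void separator() {
        assert(!_stack.empty());
        if (_stack.back().count++ > 0) _out += ',';
    }
    // Checks that a value may be written here.
    bool beforeValue() {
        if (!_error.empty()) return false;
        if (_stack.empty())
            return _out.empty() ? true : fail("only one top-level value allowed");
        Level& top = _stack.back();
        if (top.kind == kDict) {
            if (!top.haveKey) return fail("dict value written without a key");
            top.haveKey = false;
        } else {
            separator();
        }
        return true;
    }
    bool openCollection(char c, Kind kind) {
        if (!beforeValue()) return false;
        _out += c;
        _stack.push_back(Level{kind, 0, false});
        return true;
    }
    bool closeCollection(char c, Kind kind) {
        if (!_error.empty()) return false;
        if (_stack.empty() || _stack.back().kind != kind || _stack.back().haveKey)
            return fail("mismatched end call");
        _stack.pop_back();
        _out += c;
        return true;
    }

    std::string        _out;
    std::string        _error;
    std::vector<Level> _stack;   // the last entry is the "current collection"
};
```

Once an error is recorded, every later call becomes a no-op that returns false, so checking the result of finish and then errorMessage is enough to catch a mistake made anywhere in the sequence.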

2. Parsing Fleece

The fastest, most dangerous way to parse Fleece is to call Value::fromData, which takes a slice pointing to a block of Fleece data, and returns a pointer to its root object. (This pointer is not heap-allocated; it points inside the input data.) If the data isn't valid, NULL is returned.

On the plus side, this allocates no memory; its only overhead is a quick scan through the data to make sure it’s not corrupted. The drawback is that you bear full responsibility for making sure the lifespan of that block of encoded data outlives any of the object pointers you got from it.

The better way to parse Fleece is to use a Doc object. Its constructor takes an alloc_slice containing Fleece data – this has to be heap-allocated, but it’s ref-counted, and the Doc takes ownership and ensures the data stays alive as long as it does. And Doc itself is ref-counted, so it’s easier to manage. There’s more documentation of Doc on the Advanced Fleece page.

Trust

Both methods of parsing take a trust parameter, whose value can be kFLTrusted or kFLUntrusted. This determines how much checking is done. Untrusted data is thoroughly scanned to make sure it's valid, at least to the extent that it won't lead to a crash. Trusted data goes through less (but some) checking. It's a speed vs. security tradeoff.

Warning: For security reasons, always use kFLUntrusted if the data comes from the network, or from an arbitrary file. Only use kFLTrusted if the data is under your control — e.g. a record inside a database inside your app, or if the data has already passed a previous untrusted parse, or if you just encoded it yourself. Trusting corrupt or malicious data could cause Fleece to read outside the data's bounds in memory, resulting in garbage or crashes.
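To see why untrusted data needs a scan, consider this sketch of the kind of bounds check involved. It is a toy in plain C++ with a made-up format (a sequence of records, each one length byte plus payload), and ToyTrust, toyValidate and toyParse are hypothetical names. Unlike Fleece, the toy's trusted mode does no checking at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: NOT the Fleece format, trust levels, or validator.
enum ToyTrust { kToyUntrusted, kToyTrusted };

// Returns true if every record lies entirely inside [data, data + size).
inline bool toyValidate(const uint8_t* data, size_t size) {
    size_t pos = 0;
    while (pos < size) {
        size_t len = data[pos];
        if (len > size - pos - 1)
            return false;          // record claims bytes past the end of the data
        pos += 1 + len;
    }
    return true;
}

// Returns the data pointer if it may be used, or nullptr if validation failed.
inline const uint8_t* toyParse(const uint8_t* data, size_t size, ToyTrust trust) {
    assert(data != nullptr);
    if (trust == kToyUntrusted && !toyValidate(data, size))
        return nullptr;
    return data;                   // trusted: accepted without looking (toy behavior)
}
```

A corrupt length byte is rejected in untrusted mode but sails through in trusted mode, where a later reader would follow it past the end of the buffer. That is the tradeoff: skipping the scan saves time, but only data you already know to be well-formed is safe to skip it for.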

3. JSON

Fleece interoperates well with JSON!

Parsing JSON

Fleece has a JSON converter that takes JSON data and returns it translated to Fleece. The usual next step is to parse the Fleece (in trusted mode) to get to its root object. The Doc class encapsulates this for convenience:

slice jsonData("{\"hello\":12345}");
Doc convertedDoc = Doc::fromJSON(jsonData);
Dict root = convertedDoc.root().asDict();

Generating JSON

There are two ways to generate JSON from Fleece:

  • You can call toJSON() on any Value and get back a JSON string.
  • You can create an Encoder whose output format is JSON, by passing the format value kFLEncodeJSON to its constructor. The Encoder works just as usual, except that its output will be JSON instead of Fleece.

Note: Remember that binary-data type that isn't in JSON? Those values turn into base64-encoded strings.
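In case base64 is unfamiliar, here's a short standalone encoder that shows what happens to the bytes. It is a sketch and not Fleece's own code: toBase64 is a hypothetical helper, and it assumes the standard RFC 4648 alphabet with '=' padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 (RFC 4648 alphabet, '=' padding). Not Fleece's own code.
inline std::string toBase64(const std::vector<uint8_t>& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);
    for (size_t i = 0; i < bytes.size(); i += 3) {
        size_t remaining = bytes.size() - i;
        // Pack up to three bytes into a 24-bit chunk, zero-filling what's missing.
        uint32_t chunk = uint32_t(bytes[i]) << 16;
        if (remaining > 1) chunk |= uint32_t(bytes[i + 1]) << 8;
        if (remaining > 2) chunk |= uint32_t(bytes[i + 2]);
        // Emit four 6-bit groups, padding with '=' where no input byte contributed.
        out += kAlphabet[(chunk >> 18) & 0x3F];
        out += kAlphabet[(chunk >> 12) & 0x3F];
        out += remaining > 1 ? kAlphabet[(chunk >> 6) & 0x3F] : '=';
        out += remaining > 2 ? kAlphabet[chunk & 0x3F] : '=';
    }
    assert(out.size() % 4 == 0);
    return out;
}
```

Every three bytes of data become four ASCII characters, so the JSON form of a blob is about a third larger than the blob itself. Note also that a reader of the JSON sees an ordinary string and can't tell that it used to be binary data.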

JSON5

JSON5 is a superset of JSON syntax that adds some JavaScript sugar for convenience. You can use single or double quotes; you can omit the quotes around keys; you can leave trailing commas at the end of a collection ... it's wonderful. 🤩

All of Fleece's JSON APIs support JSON5. You just need to change the method name slightly or pass an optional parameter; see the API docs for details.

IV. Advanced Topics

Ready for more? Continue to the Advanced Fleece document, if you dare!