Vision for Dexie #427

dfahlander · 2016-12-29T14:13:32Z

TL;DR

Just want to share my thoughts and visions for dexie. This issue is a working draft and I may update this text going forward.

Support for complex expressions

Chained where clauses and orderBy() on collections are something that have been asked for (#87, #155, #167, #168, #171, #228, #297). These can be realized using compound indices and cursor joining techniques, and the resulting code does not need to be that complex. I've been designing a lot already about how an arbritary expression engine could function to allow queries like the one below:

db.friends
    .where('name').startsWithIgnoreCase('a')
    .where('age').above(25)
    .orderBy('lastName')
    .toArray();

This would require index [name+age+lastName] (or [age+name+lastName])

The query could be realized by executing the following steps internally:

let uniqueNames = await db.friends.where('name').startsWithIgnoreCase('a').uniqueKeys();
let uniqueNamesAndAges = await db.friends.where('[name+age]')
    .inAnyRange(
        uniqueNames.reduce((result, name) =>
            result.concat([[name, 25], [name, Infinity]]), [])
        , false)
    .uniqueKeys();

let collectionsToJoin = uniqueNamesAndAges.map(tuple =>
    db.friends.where('[name+age+lastName]').between(
        tuple.concat(Dexie.minKey),
        tuple.concat(Dexie.maxKey)));

// Here, we should spawn parallell cursors on collectionsToJoin (this part is missing
// in Dexie api today) but call .continue() on the cursor with the lowest value each
// time. That would make the end result sorted on the last keyPath ('lastName').

The above example will need to be generalized. The key to make it work would be to build everything on cursors, so that the result of this operation also generates a cursor to call .continue() on etc.

I have some thoughts about implementing this algorithm in a separate repository and npm package and let dexie 3.0 depend on it. The benefits of that would be that the algorithm could be shared by other indexedDB libraries as well.

The core would be to execute a complex expression. Then Collection api could generate such an expression.

One challenge is the API for expression complex expressions like:

(name = 'foo' AND (age = 99 OR (emailAddress = 'foo@bar' AND shoeSize < 7)))

There is no problem implementing the search for such query. Every expression needs to be canonicalized into a matrix with ANDs on the Y-axis and ORs on the X-axis, then execute compound queries like the one samplified earlier and do that for each column of OR statements.

The issue is more of how to find an API that allows the user to express any complex expressions. I have three possible solutions for this. Do we have any better ones to add to my three proposals?

Option 1: Extending the Collection API to put together complex collections:

db.friends.where({name: 'foo'}).and (
    db.friends.where({age: 99}).or (
        db.friends.where({emailAddress: 'foo@bar'}).and('shoeSize').below(7)))

Option 2: Allowing to execute the complex expression with object notation

db.friends.expression({
    op: 'and',
    left: {
        keyPath: 'name',
        op: 'equals',
        value: 'foo'
    },
    right: {
        op: 'or',
        left: {
            keyPath: 'age',
            op: 'equals',
            value: 99
        },
        right: {
            op: 'and',
            left: {
                keyPath: 'emailAddress',
                op: 'equals',
                value: 'foo@bar'
            },
            right: {
                keyPath: 'shoeSize',
                op: 'below',
                value: 7
            }
        }
    }
});

Option 3: Allow plain javascript expressions (á la LINQ) and parse using jsep

function (name, age, email, maxShoeSize) {
  return db.friends.filter(f =>
    f.name == name && (
        f.age == age ||
        (f.emailAddress == email && f.shoeSize < maxShoeSize)),
    {
        name, age, email, maxShoeSize // Provide closures in second argument.
    })
}

I admin that this 3rd option would boost some coolness if realized, but being pragmatic, I think it will be less intuitive than the 1rst option anyway. Arrow function are not that great on expression stuff like startsWithIgnoreCase().

The main drawback with this last option is we would need the closures to be given in a second argument to the filter method. Otherwise we couldn't query for parameters.'

After writing down these options I am leaning towards the first approach only though. Even though it is a little redundant having db.friends mentioned several time. Complex expressions are rare in real life. Most expressions could be expressed without complex parentheeses anyway. So let's optimize for the common cases but have a way to express the complex ones when needed.

Let Table derive from Collection.

With the possiblity to chain where-clauses, we would at least be able to make Table a subclass of Collection, if not just removing Table and only have Collection. There would be some API challenges of doing so. Table.prototype.delete() has a different meaning from Collection.prototype.delete() today. However, this could easily be distinguished by the presence of a parameter in the arguments to delete(). Other challenges are Table.hook and Table.mapToClass(). It would complicate the API allowing subscribing to a filtered collection and I do not see the use case right now. In that sense, it would be easier to let hook be a Table-only property and let Table derive from Collection rather than incorporating Table into Collection in total.

Non-Sparse Indexes

IndexedDB will only index certain types (numbers, strings, Dates and arrays of such). It won't index fields of type boolean, null, undefined or Object. This is good for the most parts, but some methods won't work as expected. WhereClause.notEqual() and WhereClause.noneOf() won't find objects where the properties are of a non-indexable type.

It would be great if an index could be marked as "non-sparse" using some sigil. For example an exclamation mark:

db.version(x).stores({
    friends: '++id, !isGoodFriend'
});

// Now we could use this index in not-equal and we can provide 
// non-indexable types in queries to it:
db.friends.where('isGoodFriend').notEqual(true).delete();

The challenge is how to rewrite fields of non-sparse indexes. It could be done using hooks. Before an object is saved, non-sparse indexes are type-checked. They could then be converted to a string prefixed with '\uffff' followed by a string representation of the object (such as JSON.stringify). This would lead to that non-indexable types receive a sort order between strings and arrays. When an object is retrieved from the database, it must be rewritten back to it's original form.

Another challenge is how to specify non-sparse compound indexes and non-sparse multiEntry indexes. Probably we should support the exclamation mark both within a compound index and for the whole. The type check should check for array and go through the array items and check each item.

// Allow either:
db.version(x).stores({
    friends: '++id, ![isGoodFriend+id]'
});
// or:
db.version(x).stores({
    friends: '++id, [!isGoodFriend+id]'
});

Whether or not to allow non-sparse primary keys is an outstanding question. Don't see the same benefit for that as primary keys are typically more controlled than document fields. They don't have booleans, null or undefined anyway.

Depending on the code size needed to implement this, it may suite better in an addon than included with dexie.

A generic bulk method

The idea is to narrow down all writing operations into a single bulk() method that takes an Array of operations. Each operation could be an add, put, update or delete. This is all about 1) Optimizing for bulk, 2) Minimize code, 3) Enabling new features like asynchronic bulk hooks. Let the existing methods put(), add(), update(), modify() and delete() just use that single bulk() method to perform the actual changes. This way we might actually lower the code size as a bonus. The main goal for this would be what is described below - to support asynchronic hooks #14, #372.

A simple 'writing' hook

Dexie hooks are triggers that are executed when an operation happens. They execute within the same transaction as the operation is happening within. Current 'updating' hook provide the existing object along with the changes to apply to it. IndexedDB API is not designed for this, so we have to fetch the current object prior to calling the hook. This extra fetch only happens if the hook has any subscribers. However, in many cases the user would only want to be able to inject changes whenever an add() or put() is about to be performed and doesnt care about the previous state of the object.

A 'writing' hook would be called before Dexie is about to call IDBObjectStore.add(), IDBObjectStore.put() or IDBCursor.update(). All three cases will have the full object to put.

Computed fields would be easier to implement on a 'writing' hook rather than the existing 'creating' and 'updating' hooks. For example, take the full-text search sample. It needs to subscribe to both 'creating' and 'updating' and the 'updating' case is cumbersome to write. If we had a 'writing' hook, the sample would just subscribe to that one in place of the 'creating' hook. Also, it wouldn't slow down the performance as much as with the current hooks as the fetch() wouldn't be needed.

Asynchronic bulk-hooks

As requested in #14 and #372, it would be very powerful if the hooks could return a promise and perform asynchronic tasks before resuming the underlying operation. If we enable this, addons could implement constraints on foreign keys (ON DELETE CASCADE etc). A reading hook could resolve foreign keyed members on returned instances. There are other use cases as well. However to optimize performance for bulk operations, the hooks should rather take an array of operations than being called operation-by-operation.

Thus, a redesign of the hook API should be done. Considerations:

Today we call the hooks when the system is about to perform an operation. Hooks can then subscribe to onsuccess and onerror to get notified again when the operation completes or fails. How should this be handled if we have the opportunity to rethink it?
The new hooks should just be a layer between the user and the generic bulk() method. As argument, it should take an array of operations DatabaseOperation[] and a commit function to forward the operations to, modified or not, then return the promise from the commit function, or do attitional tasks after the commit() function has resolved. Possible interface of DatabaseOperation could be:

type DatabaseOperation = AddOperation | PutOperation | DeleteOperation;

interface AddOperation<T, TPrimaryKey> {
    op: 'add';
    key?: TPrimaryKey; // May be omitted if autoIncrement.
    obj: T;
}

interface PutOperation<T, TPrimaryKey> {
    op: 'put';
    key?: TPrimaryKey; // May be omitted in add() operation with autoIncrement.
    obj: T;
    curr?: T; // Current object before the change.
}

interface DeleteOperation<T, TPrimaryKey> {
    op: 'delete';
    key: TPrimaryKey;
    curr?: T; // Current object before the deletion.
}

Hooks doesn't always need to know the current state of the object. User could provide an option whether it requests current object or not. Not fetching current object improves performance.
Maybe we're good enough with 'writing' and 'reading' hooks. 'writing' hook could replace 'creating'/'updating' in case user opts in for fetching current object - making a diff would reveal whether a create or update is taking place and what the updates would be.
The new reading hook could need to inspect the collection being queried before modifying the returned instance. Specifically for the case of auto-resolving foreign keys into properties, this might be done conditionally. For example. the dexie-relationships addon could implement Collection.with() as a lazy method setting the with-options as properties onto the returned collection, then have a 'reading' hook that will resolve foreign keys if with() was used and according to the options provided to with() method.
Backward compatiblity to current CRUD hooks are important, so we should have an adapter for them and continue testing the current unit tests for CRUD hooks.

Naming. Today we have db.table.hook('creating' | 'reading' | 'updating' | 'deleting'). Would need new API for the new hooks. What about: db.table.hook('bulk', subscriber, options), db.table.hook('read', subscriber).

type BulkSubscriber = (
        operations: DatabaseOperation[],
        commit: (operations: DatabaseOperation[]) =>Promise<TPrimaryKey>
    ) => Promise<TPrimaryKey>;

type ReadSubscruber = (
        objects: T[],
        collection: Collection<T,TPrimaryKey>
    ) => T[] | Promise<T[]>;

Collection.observe()

Extend Dexie.Observable with Collection.prototype.observe() that would let the query become a live query. In practice, it would subscribe to db.on('changes') and filter each change against the collection to find out whether the change would affect the result or not. If it would, re-execute the query (collection.toArray()) and notify the subscriber.

Challenges: The API. A simple API could be something like:

db.orders.where('customerId').equals('foo').observe(orders => {
    if (!stillWantToObserve) return false;
    updateOrdersDiv(orders); // Update view.
}).catch(error => {
   // Failed!
});

This would cover the parts that ES-Observable covers, but with a simpler API. It could be converted to an ES-Observable or Rx.Observable if needed. But should we use the ES-Observable API built-in instead? That would complicate the API.

let subsription = db.orders.where('customerId').equals('foo')
    .toObservable()
    .subscribe({
        next(orders) {
            if (!stillWantToObserve) return subscription.unsubscribe();
            updateOrdersDiv(orders); // Update view.
        }
        error(error => {
            // Failed!
        }
    });

Which one is best? Should we allow both or just one of them?

db.observe()

Sometimes you'd need to do a complex query upon several async calls in a chain and observe the result. Not always will this be possible to express in a single collection. Using zones, we could allow a very intuitive API that allow non-reactive patterns be converted to reactive patterns.

I'm taking the sample from API Reference on how to join on foreign keys:

var db = new Dexie('music');
db.version(1).stores({
    genres: '++id,name',
    albums: '++id,name,year,*tracks',
    bands: '++id,name,*albumIds,genreId'
});

// This method is non-reactive. It produces a Promise. Not an observable.
// Later down, I'll show how this could be observed anyway!
function getBandsStartingWithA () {

    // Query
    return db.bands.where('name').startsWith('A').toArray(bands => {
        return Promise.all (bands.map (band =>
          Promise.all([
            db.genres.get (band.genreId),
            db.albums.where('id').anyOf(band.albumIds).toArray()
          ]).then (result => {
            // Set genre and albums as direct properties on each result
            [band.genre, band.albums] = result;
            return band;
          });
        ));
    });
}

The sample above is nothing new.
Below comes the magic. We can reuse promise-based sequences
and convert the whole thing to an observable using db.observe():

let subscription = db.observe(getBandsStartingWithA).subscribe({
    onNext (bands) {
        if (!stillWantToObserve) return subscription.unsubscribe();
        updateBandsListDiv(bands); // Update view.
    },
    onError () {
        // fail!
    }
});

How would it be implemented?

db.observe() creates a new zone with state saying "we are in observation mode"
When any collection within the zone is realized into a promise, it checks if current zone is an observation zone. If so, it will internally call observe() on itself.
When anything changes in any of the queries involved, the following happens:
A. All internal Collection observers are unsubscribed to.
B. The entire function is re-executed and the result of the end-promise is propagated to the subscriber to db.observe().

Conflict-free sync

There's a lot research being done about Conflict-free replication data and CRDT types. These types are quite complex and require the API user to design their model for it. However, I believe I have a concept for conflict free replication based on isomorphism. The idea is this:

Application Developer encapsulates database-facing code in an isomorphic class (a service)
Each method can optionally be decorated for conflict-free replication.
When called, the decorator generates an entry in the change log about the operation and the parameters given to it. When changes reach the server, it executes the same operation using the same parameters. This is possible only if the server has the same isomorphic service and can lookup the same code to execute.

However, it is not always that conflict-free is needed. Most of the times, it is good enough with merging changes as Dexie.Syncable does now. Typically you would only need certain operations to be conflict free. Decorator expresses whether a method wants conflict-free operation or not. If conflict-free, then the method call and its arguments are registered in the change log. Otherwise, we would do as we do now - just record CREATEs, UPDATEs and DELETEs.

This requires we have an isomorphic API - a database with dexie API at the server. For some database engines it is possible to execute javascript within the database engine itself (postgres, mongodb and SqlServer with .NET stored procedures and Chakra). That way, a decorated method may be looked at as a stored procedure written in javascript.

Isomorphic Database

Have Collection implemented for other backends like PostgreSQL, MongoDB or SQLServer. The point with this is to be able to write components that could be used also when offline is not essential. The database-facing code could run on either server or client, work both offline and traditionally with fast-loading server-side rendered pages.

The text was updated successfully, but these errors were encountered:

nponiros · 2016-12-29T18:29:34Z

Lots of nice ideas here. Complex expressions and async hooks are definitely nice to haves.

Combining Table and Collection is also a nice idea. It would be nice when we have one API to care about. I find it a bit confusing now that some functions exist on Table others on Collection and some Table methods are actually Collection methods and so on. Removing WhereClause would also be nice if you ask me, maybe with complex expressions it would be possible to only have Collection and Table as a subclass of Collection and no WhereClause.

You mentioned in an other issue that you would like to refactor Dexie a bit. Do you have any thoughts if this would this be done as part of some of these ideas or before any of the things here are implemented?

Do you have any time frame in mind when you would like to start implementing the things you mention here? I would love to help but I don't know if my IndexedDB/Dexie knowledge and time (my vacation will be over soon) is enough.

Ps: Projections is also a nice feature in my opinion. So basically a way of only getting a part of an object from the database. For example if the db has an object {foo:'bar', baz:true} the projection {foo: true} would not return the baz attribute of the object.

dfahlander · 2017-01-04T02:14:54Z

Thanks @nponiros for this feedback. These are long-term wishes. Nearest in future is probably the support for complex expressions to let Table be subclass of Collection. After that, or possibly in parallell, the new bulk() method and the new hooks. Collection.prototype.observe() and db.observe() would be possible to implement in current version of Dexie.Observable. A change entry from db.on('changes') can be tested against a collection using _ctx.isMatch, _ctx.algorithm and _ctx.keyRange. However, it will be much easier when the new engine is in-place.

I've started the implementation of the new expression engine. Still not published the branch but will soon do even though it's very WIP. When done, it will replace the current Table, Collection and WhereClause classes with new ones (same names but new code). Doing it in typescript this time. Will be a big rewrite and a total refactoring, using small modules. Will still encourage PRs for the master branch. Will include everything that is being merged into master during the rewrite.

nponiros · 2017-01-04T07:14:42Z

Sounds good. I'm looking forward to seeing what you come up with. TypeScript also sounds good. Once there is a built process in place, I can start porting Syncable and Observable to TypeScript.

nponiros · 2017-01-04T21:43:57Z

@dfahlander just an idea, if you don't have a preferred order for the features maybe we could create a poll and ask the community what the most important features are/what people need the most.

dfahlander · 2017-01-05T08:13:31Z

That would be good. Do you know a good tool for that?

nponiros · 2017-01-05T08:23:14Z

Hmm I thought github supported polls. I guess I was wrong. Unfortunately I don't know any tools but this seems intersting: https://github.com/adambutler/poll . You create a poll and share the generated link. Another possibility would be to open an issue, have each feature in its own comment and let the users give thumbs up/down as a voting method. This is not very efficient I guess but at least it would be directly in the Dexie issue tracker.

dfahlander · 2017-01-05T10:21:58Z

What about this?

http://feathub.com/dfahlander/Dexie.js

nponiros · 2017-01-05T10:30:01Z

Works for me :)

lddubeau · 2017-02-20T12:41:32Z

I've added "A simple 'writing' hook" to the list on Feathub. That's the one feature I care about most among those mentioned in the list of features mentioned at the start of this issue. (I don't mean to say the other features are bad but writing hook support is a more pressing issue for me.) I recently had to customize a Dexie instance and this required writing a "create" and an "update" hook whereas a "writing" hook would have been sufficient.

bboydflo · 2017-06-23T13:33:31Z

regarding adding support for other types as indexes I agree that this makes more sense as part of an addon.

dfahlander added the Epic label Dec 29, 2016

dfahlander changed the title ~~Vision for Dexie 3.0~~ Vision for Dexie Dec 29, 2016

nponiros mentioned this issue Jan 4, 2017

Possibility to add exists with where #437

Closed

dfahlander mentioned this issue Mar 6, 2017

table.where() not working when chaining other 'table' functions #482

Open

dfahlander mentioned this issue Apr 17, 2017

Loading relationships with get()? dfahlander/dexie-relationships#19

Open

cfilipov mentioned this issue Oct 4, 2017

Is Dexie.Syncable using OT? #587

Open

nponiros mentioned this issue Oct 29, 2017

Refactoring Dexie #602

Closed

dfahlander mentioned this issue Nov 1, 2017

Contribution to Dexie #606

Open

dfahlander added a commit that referenced this issue Nov 1, 2017

Simple writing hook (part of vision for Dexie #427)

6689f1e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vision for Dexie #427

Vision for Dexie #427

dfahlander commented Dec 29, 2016 •

edited

Loading

nponiros commented Dec 29, 2016 •

edited

Loading

dfahlander commented Jan 4, 2017 •

edited

Loading

nponiros commented Jan 4, 2017

nponiros commented Jan 4, 2017

dfahlander commented Jan 5, 2017

nponiros commented Jan 5, 2017

dfahlander commented Jan 5, 2017

nponiros commented Jan 5, 2017

lddubeau commented Feb 20, 2017

bboydflo commented Jun 23, 2017

Vision for Dexie #427

Vision for Dexie #427

Comments

dfahlander commented Dec 29, 2016 • edited Loading

Support for complex expressions

Let Table derive from Collection.

Non-Sparse Indexes

A generic bulk method

A simple 'writing' hook

Asynchronic bulk-hooks

Collection.observe()

db.observe()

Conflict-free sync

Isomorphic Database

nponiros commented Dec 29, 2016 • edited Loading

dfahlander commented Jan 4, 2017 • edited Loading

nponiros commented Jan 4, 2017

nponiros commented Jan 4, 2017

dfahlander commented Jan 5, 2017

nponiros commented Jan 5, 2017

dfahlander commented Jan 5, 2017

nponiros commented Jan 5, 2017

lddubeau commented Feb 20, 2017

bboydflo commented Jun 23, 2017

dfahlander commented Dec 29, 2016 •

edited

Loading

nponiros commented Dec 29, 2016 •

edited

Loading

dfahlander commented Jan 4, 2017 •

edited

Loading