Datomic Query Optimizer

Rationale

Sometimes people new to Datomic datalog write queries that perform poorly. Other times queries are generated by application logic, and it is much simpler to generate them naively and let some other part of the application decide how to improve them later. In other cases, the optimal form of a Datomic query changes over the lifetime of an application, e.g. due to real changes in the domain data being tracked, the addition of batch jobs, etc. Slow queries can significantly impact service health and slow down development and data science feedback cycles, so query performance is a justified optimization target.

This project provides a "good enough" Datomic query improver based on two heuristics to handle these problems. The logic helps avoid basic mistakes rather than finding unique high-performance edge cases. If you are writing awesome Datomic queries, it likely won't make them more awesome and might make some of them less awesome. If you're writing or generating bad Datomic queries, it will probably make them significantly less bad, and might make a large set of them awesome.

The Heuristics

Two heuristics are used, both described in the Datomic docs: join along (prefer clauses that share variables with what is already bound), and most selective clause first. They are applied in that order to any :where clause. The original query's :where clauses are treated as an unordered set, and the next clause is drawn from that set as follows:

(1) Rank clauses by the number of new variables they introduce.
(2) Among the clauses that introduce the fewest new variables, pick the clause whose attribute has the fewest datoms.

If a query contains arguments to :in, those are used as the initial bindings. Otherwise, the first clause is chosen by criterion (2) alone. Each subsequent clause is then chosen by the same ranking, measured against the variables bound by the clauses already placed, until every clause has been ordered. No guarantees are made about ordering when exact ties occur.
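The selection loop described above can be sketched in a few lines. This is an illustration only, not the project's implementation (which is Clojure): the function names, the restriction to three-element (entity, attribute, value) data patterns written as strings, and the example attributes and counts are all assumptions made for the sketch. Attributes missing from the counts are treated as maximally expensive, which is also an assumption.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical representation: a data pattern as (entity, attribute, value) strings.
Clause = Tuple[str, str, str]


def clause_vars(clause: Clause) -> Set[str]:
    """Return the datalog variables (terms starting with '?') in a clause."""
    return {term for term in clause if term.startswith("?")}


def order_clauses(
    clauses: Sequence[Clause],
    attr_counts: Dict[str, int],
    in_vars: Sequence[str] = (),
) -> List[Clause]:
    """Greedily order :where clauses using the two heuristics.

    (1) Prefer the clause that introduces the fewest new variables.
    (2) Break ties by the fewest datoms for the clause's attribute.
    With no initial bindings, the first clause is chosen by (2) alone.
    """
    bound: Set[str] = set(in_vars)  # variables bound so far, seeded from :in
    remaining = list(clauses)
    ordered: List[Clause] = []

    def rank(clause: Clause) -> Tuple[int, float]:
        # Assumption: an attribute with no known count sorts last.
        datoms = attr_counts.get(clause[1], float("inf"))
        if not bound:
            return (0, datoms)  # nothing bound yet: selectivity only
        return (len(clause_vars(clause) - bound), datoms)

    while remaining:
        best = min(remaining, key=rank)  # tie order is unspecified by the heuristics
        ordered.append(best)
        remaining.remove(best)
        bound |= clause_vars(best)
    return ordered


# Made-up attributes and counts, for illustration only.
example_clauses = [
    ("?release", ":release/artists", "?artist"),
    ("?artist", ":artist/name", "?name"),
    ("?release", ":release/year", "?year"),
]
example_counts = {
    ":release/artists": 20000,
    ":artist/name": 5000,
    ":release/year": 15000,
}

for clause in order_clauses(example_clauses, example_counts, in_vars=["?name"]):
    print(clause)
```

With ?name supplied through :in, the sketch places the :artist/name clause first (one new variable), then :release/artists (joins on ?artist), then :release/year.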

To compute (2), the number of datoms for each attribute must be known. These counts are available through the Datomic client API using db-stats (although not yet documented). If you are using the peer API, or do not want to depend on db-stats, an example approach to calculating the statistics is in dev/statics.clj. These statistics are updated lazily using datoms and stored in the database.
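As a rough picture of what the per-attribute statistics contain, the sketch below counts datoms by attribute over an in-memory list. It is a stand-in only: it assumes datoms are available as (entity, attribute, value) tuples, and it does not reproduce db-stats or the approach in dev/statics.clj. The function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

# Hypothetical in-memory datom: (entity id, attribute, value).
Datom = Tuple[int, str, Hashable]


def attribute_counts(datoms: Iterable[Datom]) -> Counter:
    """Count how many datoms exist for each attribute."""
    return Counter(attribute for _entity, attribute, _value in datoms)


# Made-up sample data, for illustration only.
sample_datoms = [
    (1, ":artist/name", "Alice"),
    (2, ":artist/name", "Bob"),
    (3, ":release/year", 1999),
]
print(attribute_counts(sample_datoms))
```

A mapping of this shape, attribute to datom count, is what criterion (2) consults when comparing clauses.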

ParkerICI/datomic-query-improver
