Queries

André Peric Tavares edited this page Feb 13, 2018 · 34 revisions
Clone this wiki locally

Sample files

We provide the queries below as EDN data. For your convenience, we have included a java file and a clojure file containing the corresponding query API calls.

Running the Queries

You can query the database from Java:

package datomic.samples.mbrainz;

import java.util.*;

import datomic.Connection;
import datomic.Database;
import static datomic.Peer.*;

public class Query {

  public static void main(String[] args) {
    String uri = "datomic:free://localhost:4334/mbrainz-1968-1973";
    Connection conn = connect(uri);

    Database db = conn.db();

    Collection<List<Object>> result = q("[:find ?id ?type ?gender" +
      ":in $ ?name " +
      ":where [?e :artist/name ?name] " +
      "[?e :artist/gid ?id]" +
      "[?e :artist/type ?teid]" +
      "[?teid :db/ident ?type]" +
      "[?e :artist/gender ?geid]" +
      "[?geid :db/ident ?gender]]",
      db, "Janis Joplin");

    System.out.println(result);
  }
}

or query it from Clojure:

(require '[datomic.api :as d])

(def uri "datomic:free://localhost:4334/mbrainz-1968-1973")
(def conn (d/connect uri))

(def db (d/db conn))

(d/q '[:find ?id ?type ?gender
       :in $ ?name
       :where
       [?e :artist/name ?name]
       [?e :artist/gid ?id]
       [?e :artist/type ?teid]
       [?teid :db/ident ?type]
       [?e :artist/gender ?geid]
       [?geid :db/ident ?gender]]
     db
     "Janis Joplin")

The first parameter to the Peer.q() function is the query itself. The remaining arguments are the data sources to be queried. They are bound, in the order they are passed, to the names in the :in clause of the query, according to the following rules:

  • Names starting with (or consisting of only) $ bind database values
  • Names starting with (or consisting of only) % bind rule sets
  • Names starting with ? bind data literals (e.g. the string "Janis Joplin" in the example above).
  • Any number of data sources can be passed into a query

Now we're ready to start the fun stuff.

Queries

Let's start with a simple query:

What are the titles of all the tracks John Lennon played on?

Query:

[:find ?title
 :in $ ?artist-name
 :where
 [?a :artist/name ?artist-name]
 [?t :track/artists ?a]
 [?t :track/name ?title]]

Query args:

db, "John Lennon"

Sample results:

 ["Meat City"]
 ["Intuition"]
 ["Radio Play"]
 ["Dizzy Miss Lizzy"]
 ["Sunday Bloody Sunday"]
 ["I Found Out"]
 ["It's So Hard"]
 ["Nutopian International Anthem"]
 ["Jealous Guy"]
 ["Imagine"]
 ...

Results

The results themselves are returned as a java.util.HashSet containing tuples of the results specified in the :find clause, in this case just a 1-tuple containing the track title.

Cache

Run the query for a second time, and you'll notice it executes much faster. Datomic has a local cache in each Peer, which warms up as we execute queries. These queries perform better as the JVM heap gets bigger, up to 4GB, at which size the JVM starts to have its well-known long GC pauses.

Joins

If you're not familiar with John Lennon's post-Beatles work, many of the titles might not be familiar. Let's figure out what albums these tracks are on, and in what year those albums were released. To figure out how to navigate from track to album release, and figure out what attributes are available on each entity, you can refer to the Schema.

What are the titles, album names, and release years of John Lennon's tracks?

Query:

[:find ?title ?album ?year
 :in $ ?artist-name
 :where
 [?a :artist/name   ?artist-name]
 [?t :track/artists ?a]
 [?t :track/name    ?title]
 [?m :medium/tracks ?t]
 [?r :release/media ?m]
 [?r :release/name  ?album]
 [?r :release/year  ?year]]

Query args:

db, "John Lennon"

Sample results:

 ["New York City" "Some Time in New York City" 1972]
 [["Open Your Box" "Power to the People" 1971]]
 ["Happy X-Mas (War Is Over)" "Rock Christmas, Volume 5" 1996]
 ["Imagine" "Imagine" 1971]
 ["Sisters, O Sisters" "Some Time in New York City" 1972]
 ["The Luck of the Irish" "Some Time in New York City" 1973]
 ["Well (Baby Please Don't Go)" "Live Jam" 1973]
 ["We're All Water" "Some Time in New York City" 1972]
 ...

Most of these results are after the break up of the Beatles in 1970. Can we limit the results to only include releases before 1970?

Expression Clauses

So far we've only seen where clauses that look like:

[?entity :some/attr ?value]

The full version of these clauses is:

[?entity ?attribute ?value ?transaction-time]

and they allow us to specify or query for these various aspects of our stored datoms.

However, Datomic also allows us to execute arbitrary code as part of our queries. To figure out which tracks were released before John Lennon's death, we can use a Java method or Clojure function.

What are the titles, album names, and release years of the John Lennon tracks released before 1970?

Query:

[:find ?title ?album ?year
 :in $ ?artist-name
 :where
 [?a :artist/name   ?artist-name]
 [?t :track/artists ?a]
 [?t :track/name    ?title]
 [?m :medium/tracks ?t]
 [?r :release/media ?m]
 [?r :release/name  ?album]
 [?r :release/year  ?year]
 [(< ?year 1970)]]

Query args:

db, "John Lennon"

Sample results:

 ["John & Yoko" "Unfinished Music No. 3: Wedding Album" 1969]
 ["Radio Play" "Unfinished Music No. 2: Life With the Lions" 1969]
 ["No Bed for Beatle John" "Unfinished Music No. 2: Life With the Lions" 1969]
 ["Two Minutes Silence" "Unfinished Music No. 2: Life With the Lions" 1969]
 ...

Here we use the < function to filter the results to only the John Lennon albums released before 1970.

Rules

To reuse query logic across many queries, you can create and use rules like the following:

[;; Given ?t bound to track entity-ids, binds ?r to the corresponding
 ;; set of album release entity-ids
 [(track-release ?t ?r)
  [?m :medium/tracks ?t]
  [?r :release/media ?m]]]

This rule is called track-release, and as defined in the first clause when given a bound set of track eids ?t, will bind ?r to the eids of the releases on which the tracks appear. So we can re-write the John Lennon's albums query above as:

What are the titles, album names, and release years of John Lennon's tracks?

Query:

[:find ?title ?album ?year
 :in $ % ?artist-name
 :where
 [?a :artist/name   ?artist-name]
 [?t :track/artists ?a]
 [?t :track/name    ?title]
 (track-release ?t ?r)
 [?r :release/name  ?album]
 [?r :release/year  ?year]]

Query args:

db, rules, "John Lennon"

Sample results:

["My Mummy's Dead" "John Lennon/Plastic Ono Band" 1970]
["The Luck of the Irish" "Some Time in New York City" 1972]
["Nutopian International Anthem" "Mind Games" 1973]
["Amsterdam" "Unfinished Music No. 3: Wedding Album" 1969]
["Happy Xmas (War Is Over)" "Happy Xmas (War Is Over)" 1972]
 ...

Notice the results are the same as the rule-less version above.

The rule set goes in as %, and when you supply it, you can invoke the rules by name. It is idiomatic to use round brackets for rule invocations, and square brackets for other clauses.

For the queries that follow, when rules is specified as a query arg, it refers to the set of example rules in Appendix below.

Fulltext

Let's find all of the tracks with the word "always" in the title.

What are the titles, artists, album names, and release years of all tracks having the word "always" in their titles?

Query:

[:find ?title ?artist ?album ?year
 :in $ % ?search
 :where
 (track-search ?search ?track)
 (track-info ?track ?title ?artist ?album ?year)]

Query args:

db, rules, "always"

Sample results:

["She Always Leaves Me Laughing" "The Turtles" "Turtle Soup" 1969]
["Theme Three: The Grass Is Always Greener" "Colosseum" "Valentyne Suite" 1969]
["Always, Always" "Dolly Parton" "Always, Always / No Need to Hurry Home" 1969]
["Always Coming Back to You" "Scott Walker" "Joanna" 1968]
["The Best Girls Don't Always Win" "Betty Wright" "My First Time Around" 1968]
["Always Something There to Remind Me" "Terry Baxter & His Orchestra" "The Best of '70" 1971]
["You Know I'll Always Love You" "Budgie" "Never Turn Your Back on a Friend" 1973]
...

Graph walk

Using Datomic joins and rules, we can do graph walks to determine who collaborated with whom. We can also pass in data literals of various shapes, as described in the "Bindings" section of the query docs.

Who collaborated with one of the Beatles?

Query:

[:find ?aname ?aname2
 :in $ % [?aname ...]
 :where (collab ?aname ?aname2)]

Query args:

db, rules, ["John Lennon" "Paul McCartney" "George Harrison" "Ringo Starr"]

Sample results:

["John Lennon" "The Plastic Ono Band"]
["George Harrison" "Ravi Shankar"]
["John Lennon" "Yoko Ono"]
["George Harrison" "Bob Dylan"]
["Paul McCartney" "Linda McCartney"]

What about collaborators of collaborators? Let's find Paul McCartney's collaboration network to a depth of 2.

Who either directly collaborated with Paul McCartney, or collaborated with one of his collaborators?

Query:

[:find ?aname2
 :in $ % ?aname
 :where (collab-net-2 ?aname ?aname2)]

Query args:

db, rules, "George Harrison"

Sample results:

["George Harrison" "Ravi Shankar"]
["George Harrison" "Ali Akbar Khan"]
["George Harrison" "Bob Dylan"]

We used graph-walk rules to find the network above, but we could just as easily chain queries together, passing the results of one query into another query.

Who either directly collaborated with Diana Ross, or collaborated with one of her collaborators?

Query:

[:find ?aname2
 :in $ % [[?aname]]
 :where (collab ?aname ?aname2)]

Query args:

First time:

db, rules, [["Diana Ross"]]

Second time:

db, rules, resultsOfFirstQuery

Sample Results:

First time:

["Marvin Gaye"]

Second time:

["Diana Ross"]
["Tammi Terrell"]

Who Covered Bill Withers?

Which artists have songs that might be covers of Bill Withers (or vice versa)?

Query:

[:find ?aname ?tname
 :in $ ?artist-name
 :where
 [?a :artist/name ?artist-name]
 [?t :track/artists ?a]
 [?t :track/name ?tname]
 [(!= "Outro" ?tname)]
 [(!= "[outro]" ?tname)]
 [(!= "Intro" ?tname)]
 [(!= "[intro]" ?tname)]
 [?t2 :track/name ?tname]
 [?t2 :track/artists ?a2]
 [(!= ?a2 ?a)]
 [?a2 :artist/name ?aname]]

Query args:

db, "Bill Withers"

Sample results:

["Ken Boothe" "Ain't No Sunshine"]
["Grover Washington, Jr." "Ain't No Sunshine"]
["Sivuca" "Ain't No Sunshine"]
["Frederick Knight" "Lean on Me"]
["Grover Washington, Jr." "Lean on Me"]
 ...

Appendix: Example Rules

Some of the queries above use a rule set (supplied to query as %). The example rules contain basic lookups, joins, and a graph-walking rule for finding the collaboration network of an artist to a depth of 4. You could increase the depth by adding more rules according to the recursive pattern. The rules are generated from the functions in rules.clj, are included as an EDN file in the project, and are also listed here:

[;; Given ?t bound to track entity-ids, binds ?r to the corresponding
 ;; set of album release entity-ids
 [(track-release ?t ?r)
  [?m :medium/tracks ?t]
  [?r :release/media ?m]]

 ;; Supply track entity-ids as ?t, and the other parameters will be
 ;; bound to the corresponding information about the tracks
 [(track-info ?t ?track-name ?artist-name ?album ?year)
  [?t :track/name    ?track-name]
  [?t :track/artists ?a]
  [?a :artist/name   ?artist-name]
  (track-release ?t ?r)
  [?r :release/name  ?album]
  [?r :release/year  ?year]]

 ;; Supply ?a (artist entity-ids) and and integer ?max track duration,
 ;; and ?t, ?len will be bound to track entity-ids and lengths
 ;; (respectively) of tracks shorter than the given ?max
 [(short-track ?a ?t ?len ?max)
  [?t :track/artists ?a]
  [?t :track/duration ?len]
  [(< ?len ?max)]]

 ;; Fulltext search on track.  Supply the query string ?q, and ?track
 ;; will be bound to entity-ids of tracks whose title matches the
 ;; search.
 [(track-search ?q ?track)
  [(fulltext $ :track/name ?q) [[?track ?tname]]]]

 ;; Generic transitive network walking, used by collaboration network
 ;; rule below

 ;; Supply:
 ;; ?e1 -- an entity-id
 ;; ?attr -- an attribute ident
 ;; and ?e2 will be bound to entity-ids such that ?e1 and ?e2 are both
 ;; values of the given attribute for some entity (?x)
 [(transitive-net-1 ?attr ?e1 ?e2)
  [?x ?attr ?e1]
  [?x ?attr ?e2]
  [(!= ?e1 ?e2)]]

 ;; Same as transitive-net-1, but search one more level of depth.  We
 ;; define this rule twice, once for each case, and the rule
 ;; represents the union of the two cases:
 ;; - The entities are directly related via the attribute
 ;; - The entities are related to the given depth (in this case 2) via the attribute
 [(transitive-net-2 ?attr ?e1 ?e2)
  (transitive-net-1 ?attr ?e1 ?e2)]
 [(transitive-net-2 ?attr ?e1 ?e2)
  (transitive-net-1 ?attr ?e1 ?x)
  (transitive-net-1 ?attr ?x ?e2)
  [(!= ?e1 ?e2)]]

 ;; Same as transitive-net-2 but to depth 3
 [(transitive-net-3 ?attr ?e1 ?e2)
  (transitive-net-1 ?attr ?e1 ?e2)]
 [(transitive-net-3 ?attr ?e1 ?e2)
  (transitive-net-2 ?attr ?e1 ?x)
  (transitive-net-2 ?attr ?x ?e2)
  [(!= ?e1 ?e2)]]

 ;; Same as transitive-net-2 but to depth 4
 [(transitive-net-4 ?attr ?e1 ?e2)
  (transitive-net-1 ?attr ?e1 ?e2)]
 [(transitive-net-4 ?attr ?e1 ?e2)
  (transitive-net-3 ?attr ?e1 ?x)
  (transitive-net-3 ?attr ?x ?e2)
  [(!= ?e1 ?e2)]]

 ;; Artist collaboration graph-walking rules, based on generic
 ;; graph-walk rule above

 ;; Supply an artist name as ?artist-name-1, an ?artist-name-2 will be
 ;; bound to the names of artists who directly collaborated with the
 ;; artist(s) having that name
 [(collab ?artist-name-1 ?artist-name-2)
  [?a1 :artist/name ?artist-name-1]
  (transitive-net-1 :track/artists ?a1 ?a2)
  [?a2 :artist/name ?artist-name-2]]

 ;; Alias for collab
 [(collab-net-1 ?artist-name-1 ?artist-name-2)
  (collab ?artist-name-1 ?artist-name-2)]

 ;; Collaboration network walk to depth 2
 [(collab-net-2 ?artist-name-1 ?artist-name-2)
  [?a1 :artist/name ?artist-name-1]
  (transitive-net-2 :track/artists ?a1 ?a2)
  [?a2 :artist/name ?artist-name-2]]

 ;; Collaboration network walk to depth 3
 [(collab-net-3 ?artist-name-1 ?artist-name-2)
  [?a1 :artist/name ?artist-name-1]
  (transitive-net-3 :track/artists ?a1 ?a2)
  [?a2 :artist/name ?artist-name-2]]

 ;; Collaboration network walk to depth 4
 [(collab-net-4 ?artist-name-1 ?artist-name-2)
  [?a1 :artist/name ?artist-name-1]
  (transitive-net-4 :track/artists ?a1 ?a2)
  [?a2 :artist/name ?artist-name-2]]]