+++
title = "Query Language"
+++
Dgraph's GraphQL+- is based on Facebook's GraphQL. GraphQL wasn't developed for graph databases, but its graph-like query syntax, schema validation and subgraph-shaped response make it a great language choice. We've modified the language to better support graph operations, adding and removing features to get the best fit for graph databases. We call this simplified, feature-rich language *GraphQL+-*.
GraphQL+- is a work in progress. We're adding more features and we might further simplify existing ones.
Take a Tour - https://tour.dgraph.io
This document is the Dgraph query reference material. It is not a tutorial. It's designed as a reference for users who already know how to write queries in GraphQL+- but need to check syntax, or indices, or functions, etc.
{{% notice "note" %}}If you are new to Dgraph and want to learn how to use Dgraph and GraphQL+-, take the tour - https://tour.dgraph.io{{% /notice %}}
The examples in this reference use a database of 21 million triples about movies and actors. The example queries run and return results. The queries are executed by an instance of Dgraph running at https://play.dgraph.io/. To run the queries locally or experiment a bit more, see the [Getting Started]({{< relref "get-started/index.md" >}}) guide, which also shows how to load the datasets used in the examples here.
A GraphQL+- query finds nodes based on search criteria, matches patterns in a graph and returns a graph as a result.
A query is composed of nested blocks, starting with a query root. The root finds the initial set of nodes against which the following graph matching and filtering is applied.
{{% notice "note" %}}See more about Queries in [Queries design concept]({{< relref "design-concepts/index.md#queries" >}}) {{% /notice %}}
Each query has a name, specified at the query root, and the same name identifies the results.
If an edge is of a value type, the value can be returned by giving the edge name.
Query Example: In the example dataset, as well as edges that link movies to directors and actors, movies have a name, release date and identifiers for a number of well-known movie databases. This query, with name `bladerunner` and a root matching a movie name, returns those values for the early 80's sci-fi classic "Blade Runner".
{{< runnable >}} { bladerunner(func: eq(name@en, "Blade Runner")) { uid name@en initial_release_date netflix_id } } {{< /runnable >}}
The query first searches the graph, using indexes to make the search efficient, for all nodes with a `name` edge equaling "Blade Runner". For the found node the query then returns the listed outgoing edges.
Every node has a unique 64-bit identifier. The `uid` edge in the query above returns that identifier. If the required node is already known, then the function `uid` finds the node.
Query Example: "Blade Runner" movie data found by UID.
{{< runnable >}} { bladerunner(func: uid(0x394c)) { uid name@en initial_release_date netflix_id } } {{< /runnable >}}
A query can match many nodes and return the values for each.
Query Example: All nodes that have either "Blade" or "Runner" in the name.
{{< runnable >}} { bladerunner(func: anyofterms(name@en, "Blade Runner")) { uid name@en initial_release_date netflix_id } } {{< /runnable >}}
Multiple IDs can be specified in a list to the `uid` function.
Query Example: {{< runnable >}} { movies(func: uid(0xb5849, 0x394c)) { uid name@en initial_release_date netflix_id } } {{< /runnable >}}
{{% notice "note" %}} If your predicate has special characters, wrap it in angle brackets when asking for it in the query, e.g. `<first:name>`.{{% /notice %}}
A query expands edges from node to node by nesting query blocks with `{ }`.
Query Example: The actors and characters played in "Blade Runner". The query first finds the node with name "Blade Runner", then follows outgoing `starring` edges to nodes representing an actor's performance as a character. From there the `performance.actor` and `performance.character` edges are expanded to find the actor names and roles for every actor in the movie.
{{< runnable >}}
{
  brCharacters(func: eq(name@en, "Blade Runner")) {
    name@en
    initial_release_date
    starring {
      performance.actor {
        name@en  # actor name
      }
      performance.character {
        name@en  # character name
      }
    }
  }
}
{{< /runnable >}}
Anything on a line following a `#` is a comment.
The query root finds an initial set of nodes and the query proceeds by returning values and following edges to further nodes: any node reached in the query is found by traversal after the search at root. The nodes found can be filtered by applying `@filter`, either after the root or at any edge.
Query Example: "Blade Runner" director Ridley Scott's movies released before the year 2000. {{< runnable >}} { scott(func: eq(name@en, "Ridley Scott")) { name@en initial_release_date director.film @filter(le(initial_release_date, "2000")) { name@en initial_release_date } } } {{< /runnable >}}
Query Example: Movies with either "Blade" or "Runner" in the title and released before the year 2000.
{{< runnable >}} { bladerunner(func: anyofterms(name@en, "Blade Runner")) @filter(le(initial_release_date, "2000")) { uid name@en initial_release_date netflix_id } } {{< /runnable >}}
{{% notice "note" %}}A `@lang` directive must be specified in the schema to query or mutate predicates with language tags.{{% /notice %}}
Dgraph supports UTF-8 strings.
In a query, for a string valued edge `edge`, the syntax `edge@lang1:...:langN` specifies the preference order for returned languages, with the following rules.
- At most one result will be returned (except in the case where the language list is set to *).
- The preference list is considered left to right: if a value in given language is not found, the next language from the list is considered.
- If there are no values in any of the specified languages, no value is returned.
- A final `.` means that a value without a specified language is returned or, if there is no value without a language, a value in *some* language is returned.
- Setting the language list value to `*` will return all the values for that predicate along with their language. Values without a language tag are also returned.
For example:
- `name` => Look for an untagged string; return nothing if no untagged value exists.
- `name@.` => Look for an untagged string, then any language.
- `name@en` => Look for an `en` tagged string; return nothing if no `en` tagged string exists.
- `name@en:.` => Look for `en`, then untagged, then any language.
- `name@en:pl` => Look for `en`, then `pl`, otherwise nothing.
- `name@en:pl:.` => Look for `en`, then `pl`, then untagged, then any language.
- `name@*` => Look for all the values of this predicate and return them along with their language. For example, if there are two values with languages `en` and `hi`, this query will return two keys named "name@en" and "name@hi".
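The preference rules above can be modelled in a few lines of Python. This is only a sketch of the rules as stated, not Dgraph's implementation: the function name `resolve_language` and the dictionary layout (the empty tag `""` holding the untagged value) are assumptions made for illustration, and so is the choice of which tagged value wins when the final `.` falls back to "some" language.

```python
def resolve_language(values, spec):
    """Model the language preference rules for one string-valued edge.

    values maps a language tag to a string; the empty tag "" holds the
    untagged value. spec is the text after the predicate name and "@",
    for example "" (no tag at all), "en", "en:pl:." or "*".
    """
    if spec == "*":
        # Every value is returned, keyed by its language; untagged included.
        return dict(values)
    if spec == "":
        # Plain `name`: only the untagged value qualifies.
        return values.get("")
    for lang in spec.split(":"):
        if lang == ".":
            # Final dot: prefer the untagged value, then some tagged value.
            if "" in values:
                return values[""]
            tagged = sorted(tag for tag in values if tag)
            # Picking the alphabetically first tag is this sketch's choice.
            return values[tagged[0]] if tagged else None
        if lang in values:
            return values[lang]
    # No value in any of the listed languages.
    return None
```

For instance, `resolve_language({"pl": "B"}, "en")` yields `None`, while `resolve_language({"pl": "B"}, "en:.")` falls through to the Polish value.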
{{% notice "note" %}}In functions, language lists (including the `@*` notation) are not allowed. Untagged predicates, single language tags, and the `.` notation work as described above.
In [full-text search functions]({{< relref "#full-text-search" >}}) (`alloftext`, `anyoftext`), when no language is specified (untagged or `@.`), the default (English) full-text tokenizer is used.{{% /notice %}}
Query Example: Some of Bollywood director and actor Farhan Akhtar's movies have a name stored in Russian as well as Hindi and English, others do not.
{{< runnable >}} { q(func: allofterms(name@en, "Farhan Akhtar")) { name@hi name@en
director.film {
name@ru:hi:en
name@en
name@hi
name@ru
}
} } {{< /runnable >}}
{{% notice "note" %}}Functions can only be applied to [indexed]({{< relref "#indexing">}}) predicates.{{% /notice %}}
Functions allow filtering based on properties of nodes or variables. Functions can be applied in the query root or in filters.
For functions on string valued predicates, if no language preference is given, the function is applied to all languages and strings without a language tag; if a language preference is given, the function is applied only to strings of the given language.
Syntax Example: allofterms(predicate, "space-separated term list")
Schema Types: string
Index Required: term
Matches strings that have all specified terms in any order; case insensitive.
Query Example: All nodes that have `name` containing terms `indiana` and `jones`, returning the English name and genre in English.
{{< runnable >}} { me(func: allofterms(name@en, "jones indiana")) { name@en genre { name@en } } } {{< /runnable >}}
Query Example: All Steven Spielberg films that contain the words `indiana` and `jones`. The `@filter(has(director.film))` removes nodes with name Steven Spielberg that aren't the director; the data also contains a character in a film called Steven Spielberg.
{{< runnable >}} { me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) { name@en director.film @filter(allofterms(name@en, "jones indiana")) { name@en } } } {{< /runnable >}}
Syntax Example: anyofterms(predicate, "space-separated term list")
Schema Types: string
Index Required: term
Matches strings that have any of the specified terms in any order; case insensitive.
Query Example: All nodes that have a `name` containing either `poison` or `peacock`. Many of the returned nodes are movies, but people like Joan Peacock also meet the search terms because without a [cascade directive]({{< relref "#cascade-directive">}}) the query doesn't require a genre.
{{< runnable >}} { me(func:anyofterms(name@en, "poison peacock")) { name@en genre { name@en } } } {{< /runnable >}}
Query Example: All Steven Spielberg movies that contain `war` or `spies`. The `@filter(has(director.film))` removes nodes with name Steven Spielberg that aren't the director; the data also contains a character in a film called Steven Spielberg.
{{< runnable >}} { me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) { name@en director.film @filter(anyofterms(name@en, "war spies")) { name@en } } } {{< /runnable >}}
Syntax Examples: `regexp(predicate, /regular-expression/)` or case insensitive `regexp(predicate, /regular-expression/i)`
Schema Types: string
Index Required: trigram
Matches strings by regular expression. The regular expression language is that of Go regular expressions.
Query Example: At root, match nodes with `Steven Sp` at the start of `name`, followed by any characters. For each such matched uid, match the films containing `ryan`. Note the difference with `allofterms`, which would match only the term `ryan`, whereas regular expression search also matches within terms, such as `bryan`.
{{< runnable >}} { directors(func: regexp(name@en, /^Steven Sp.*$/)) { name@en director.film @filter(regexp(name@en, /ryan/i)) { name@en } } } {{< /runnable >}}
A trigram is a substring of three consecutive runes. For example, `Dgraph` has trigrams `Dgr`, `gra`, `rap`, `aph`.
To ensure efficiency of regular expression matching, Dgraph uses trigram indexing. That is, Dgraph converts the regular expression to a trigram query, uses the trigram index and trigram query to find possible matches, and applies the full regular expression search only to those candidates.
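The two-stage idea (shortlist with trigrams, then confirm with the full expression) can be sketched in Python. This is a simplified illustration, not Dgraph's code: the helper names are hypothetical, the sample strings are made up, and instead of deriving a trigram query from an arbitrary regular expression the sketch assumes the caller supplies a literal that every match must contain.

```python
import re


def trigrams(text):
    """Return every substring of three consecutive characters."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def build_index(strings):
    """Map each trigram to the positions of the strings that contain it."""
    index = {}
    for pos, value in enumerate(strings):
        for gram in trigrams(value):
            index.setdefault(gram, set()).add(pos)
    return index


def candidates(index, literal):
    """Positions of strings containing every trigram of the literal."""
    required = trigrams(literal)
    if not required:
        raise ValueError("need at least one trigram (3 or more characters)")
    return set.intersection(*(index.get(g, set()) for g in required))


def search(strings, index, literal, pattern):
    """Shortlist with the trigram index, then apply the full regex."""
    hits = candidates(index, literal)
    return [strings[i] for i in sorted(hits) if re.search(pattern, strings[i])]


# Made-up sample values; "Cyan Surya" contains both "rya" and "yan"
# but not "ryan", so it survives the index stage and fails the regex.
names = ["Saving Private Ryan", "Bryan's Song", "Cyan Surya", "Jurassic Park"]
index = build_index(names)
print(trigrams("Dgraph"))
print(search(names, index, "ryan", r"ryan"))
```

The index stage only narrows the field; a string can contain every required trigram without matching, which is why the full expression is still checked against each candidate.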
Keep the following in mind when designing regular expression queries.
- At least one trigram must be matched by the regular expression (patterns shorter than 3 runes are not supported). That is, Dgraph requires regular expressions that can be converted to a trigram query.
- The number of alternative trigrams matched by the regular expression should be as small as possible (`[a-zA-Z][a-zA-Z][0-9]` is not a good idea). Many possible matches means the full regular expression is checked against many strings, whereas if the expression enforces more trigrams to match, Dgraph can make better use of the index and check the full regular expression against a smaller set of possible matches.
- Thus, the regular expression should be as precise as possible. Matching longer strings means more required trigrams, which helps to effectively use the index.
- If repeat specifications (`*`, `+`, `?`, `{n,m}`) are used, the entire regular expression must not match the empty string or any string: for example, `*` may be used like `[Aa]bcd*` but not like `(abcd)*` or `(abcd)|((defg)*)`.
- Repeat specifications after bracket expressions (e.g. `[fgh]{7}`, `[0-9]+` or `[a-z]{3,5}`) are often considered as matching any string because they match too many trigrams.
- If the partial result (for a subset of trigrams) exceeds 1000000 uids during the index scan, the query is stopped to prohibit expensive queries.
Syntax: match(predicate, string, distance)
Schema Types: string
Index Required: trigram
Matches predicate values by calculating the Levenshtein distance to the string, also known as fuzzy matching. The distance parameter must be greater than zero (0). Using a greater distance value can yield more but less accurate results.
Query Example: At root, fuzzy match nodes similar to `Stephen`, with a distance value of less than or equal to 8.
{{< runnable >}} { directors(func: match(name@en, Stephen, 8)) { name@en } } {{< /runnable >}}
Same query with a Levenshtein distance of 3.
{{< runnable >}} { directors(func: match(name@en, Stephen, 3)) { name@en } } {{< /runnable >}}
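To make the distance parameter concrete, here is a minimal Python sketch of the Levenshtein distance and a filter built on it. The function names are hypothetical and the sketch compares whole strings; how Dgraph applies the distance to multi-word values is not covered here.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(values, term, distance):
    """Keep the values whose edit distance to term is at most distance."""
    return [v for v in values if levenshtein(v, term) <= distance]
```

With these definitions `levenshtein("Steven", "Stephen")` is 2, so `Steven` passes a distance of 3 but would be rejected at a distance of 1. A larger distance admits more values, which is why it yields more but less accurate results.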
Syntax Examples: `alloftext(predicate, "space-separated text")` and `anyoftext(predicate, "space-separated text")`
Schema Types: string
Index Required: fulltext
Apply full-text search with stemming and stop words to find strings matching all or any of the given text.
The following steps are applied during index generation and to process full-text search arguments:
- Tokenization (according to Unicode word boundaries).
- Conversion to lowercase.
- Unicode-normalization (to Normalization Form KC).
- Stemming using language-specific stemmer (if supported by language).
- Stop words removal (if supported by language).
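The steps listed above can be sketched as a small Python pipeline. Everything here is a toy stand-in: the stop word set, the suffix-stripping `toy_stem` function and the `\w+` tokenizer are assumptions made for illustration and are far cruder than a real language-specific analyzer.

```python
import re
import unicodedata

# Toy data for illustration only; real stop word lists are language
# specific and much larger.
STOP_WORDS = {"the", "which", "a", "an", "of"}
SUFFIXES = ("ing", "s")


def toy_stem(token):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token


def fulltext_tokens(text):
    """Run the five steps in the order given in the list above."""
    tokens = re.findall(r"\w+", text)          # 1. tokenize (approximation)
    tokens = [t.lower() for t in tokens]       # 2. lowercase
    tokens = [unicodedata.normalize("NFKC", t) for t in tokens]  # 3. NFKC
    tokens = [toy_stem(t) for t in tokens]     # 4. stem
    return [t for t in tokens if t not in STOP_WORDS]  # 5. drop stop words
```

Because the same steps run at index time and on the search argument, `fulltext_tokens("the dog which barks")` and `fulltext_tokens("Barking DOGS")` reduce to the same two tokens, `dog` and `bark`, in different order.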
Dgraph uses bleve for its full-text search indexing. See also the bleve language specific stop word lists.
The following table lists all supported languages with their country codes and whether stemming and stop word filtering are supported.
Language | Country Code | Stemming | Stop words |
---|---|---|---|
Arabic | ar | ✓ | ✓ |
Armenian | hy | ✓ | |
Basque | eu | ✓ | |
Bulgarian | bg | ✓ | |
Catalan | ca | ✓ | |
Chinese | zh | ✓ | ✓ |
Czech | cs | ✓ | |
Danish | da | ✓ | ✓ |
Dutch | nl | ✓ | ✓ |
English | en | ✓ | ✓ |
Finnish | fi | ✓ | ✓ |
French | fr | ✓ | ✓ |
Gaelic | ga | ✓ | |
Galician | gl | ✓ | |
German | de | ✓ | ✓ |
Greek | el | ✓ | |
Hindi | hi | ✓ | ✓ |
Hungarian | hu | ✓ | ✓ |
Indonesian | id | ✓ | |
Italian | it | ✓ | ✓ |
Japanese | ja | ✓ | ✓ |
Korean | ko | ✓ | ✓ |
Norwegian | no | ✓ | ✓ |
Persian | fa | ✓ | |
Portuguese | pt | ✓ | ✓ |
Romanian | ro | ✓ | ✓ |
Russian | ru | ✓ | ✓ |
Spanish | es | ✓ | ✓ |
Swedish | sv | ✓ | ✓ |
Turkish | tr | ✓ | ✓ |
Query Example: All names that have `dog`, `dogs`, `bark`, `barks`, `barking`, etc. Stop word removal eliminates `the` and `which`.
{{< runnable >}} { movie(func:alloftext(name@en, "the dog which barks")) { name@en } } {{< /runnable >}}
Syntax Examples:
eq(predicate, value)
eq(val(varName), value)
eq(predicate, val(varName))
eq(count(predicate), value)
eq(predicate, [val1, val2, ..., valN])
eq(predicate, [$var1, "value", ..., $varN])
Schema Types: `int`, `float`, `bool`, `string`, `dateTime`
Index Required: An index is required for the `eq(predicate, ...)` forms (see table below). For `count(predicate)` at the query root, the `@count` index is required. For variables the values have been calculated as part of the query, so no index is required.
Type | Index Options |
---|---|
int | int |
float | float |
bool | bool |
string | exact, hash |
dateTime | dateTime |
Test for equality of a predicate or variable to a value or find in a list of values.
The boolean constants are `true` and `false`, so with `eq` this becomes, for example, `eq(boolPred, true)`.
Query Example: Movies with exactly thirteen genres.
{{< runnable >}} { me(func: eq(count(genre), 13)) { name@en genre { name@en } } } {{< /runnable >}}
Query Example: Directors called Steven who have directed 1, 2 or 3 movies.
{{< runnable >}} { steve as var(func: allofterms(name@en, "Steven")) { films as count(director.film) }
stevens(func: uid(steve)) @filter(eq(val(films), [1,2,3])) { name@en numFilms : val(films) } } {{< /runnable >}}
Syntax Examples: for inequality IE
IE(predicate, value)
IE(val(varName), value)
IE(predicate, val(varName))
IE(count(predicate), value)
With `IE` replaced by
- `le` less than or equal to
- `lt` less than
- `ge` greater than or equal to
- `gt` greater than
Schema Types: `int`, `float`, `string`, `dateTime`
Index Required: An index is required for the `IE(predicate, ...)` forms (see table below). For `count(predicate)` at the query root, the `@count` index is required. For variables the values have been calculated as part of the query, so no index is required.
Type | Index Options |
---|---|
int | int |
float | float |
string | exact |
dateTime | dateTime |
Query Example: Ridley Scott movies released before 1980.
{{< runnable >}} { me(func: eq(name@en, "Ridley Scott")) { name@en director.film @filter(lt(initial_release_date, "1980-01-01")) { initial_release_date name@en } } } {{< /runnable >}}
Query Example: Directors with `Steven` in `name` who have directed more than `100` actors.
{{< runnable >}} { ID as var(func: allofterms(name@en, "Steven")) { director.film { num_actors as count(starring) } total as sum(val(num_actors)) }
dirs(func: uid(ID)) @filter(gt(val(total), 100)) { name@en total_actors : val(total) } } {{< /runnable >}}
Query Example: A movie in each genre that has over 30000 movies. Because there is no order specified on genres, the order will be by UID. The [count index]({{< relref "#count-index">}}) records the number of edges out of nodes and makes such queries more efficient.
{{< runnable >}} { genre(func: gt(count(~genre), 30000)){ name@en ~genre (first:1) { name@en } } } {{< /runnable >}}
Query Example: The director named Steven Spielberg and his movies whose `initial_release_date` is greater than or equal to that of the movie Minority Report.
{{< runnable >}} { var(func: eq(name@en,"Minority Report")) { d as initial_release_date }
me(func: eq(name@en, "Steven Spielberg")) { name@en director.film @filter(ge(initial_release_date, val(d))) { initial_release_date name@en } } } {{< /runnable >}}
Syntax Examples:
q(func: uid(<uid>))
predicate @filter(uid(<uid1>, ..., <uidn>))
predicate @filter(uid(a)) for variable a
q(func: uid(a,b)) for variables a and b
Filters nodes at the current query level to only nodes in the given set of UIDs.
For query variable `a`, `uid(a)` represents the set of UIDs stored in `a`. For value variable `b`, `uid(b)` represents the UIDs from the UID to value map. With two or more variables, `uid(a,b,...)` represents the union of all the variables.
`uid(<uid>)`, like an identity function, will return the requested UID even if the node does not have any edges.
Query Example: If the UID of a node is known, values for the node can be read directly. The films of Priyanka Chopra by known UID.
{{< runnable >}} { films(func: uid(0x2c964)) { name@hi actor.film { performance.film { name@hi } } } } {{< /runnable >}}
Query Example: The films of Taraji Henson by genre. {{< runnable >}} { var(func: allofterms(name@en, "Taraji Henson")) { actor.film { F as performance.film { G as genre } } }
Taraji_films_by_genre(func: uid(G)) { genre_name : name@en films : ~genre @filter(uid(F)) { film_name : name@en } } } {{< /runnable >}}
Query Example: Taraji Henson films ordered by number of genres, with genres listed in order of how many films Taraji has made in each genre. {{< runnable >}} { var(func: allofterms(name@en, "Taraji Henson")) { actor.film { F as performance.film { G as count(genre) genre { C as count(~genre @filter(uid(F))) } } } }
Taraji_films_by_genre_count(func: uid(G), orderdesc: val(G)) { film_name : name@en genres : genre (orderdesc: val(C)) { genre_name : name@en } } } {{< /runnable >}}
Syntax Examples:
q(func: ...) @filter(uid_in(predicate, <uid>))
predicate1 @filter(uid_in(predicate2, <uid>))
Schema Types: UID
Index Required: none
While the `uid` function filters nodes at the current level based on UID, function `uid_in` allows looking ahead along an edge to check that it leads to a particular UID. This can often save an extra query block and avoids returning the edge.
`uid_in` cannot be used at root; it accepts one UID constant as its argument (not a variable).
Query Example: The collaborations of Marc Caro and Jean-Pierre Jeunet (UID 0x99706). If the UID of Jean-Pierre Jeunet is known, querying this way removes the need to have a block extracting his UID into a variable and the extra edge traversal and filter for `~director.film`.
{{< runnable >}}
{
caro(func: eq(name@en, "Marc Caro")) {
name@en
director.film @filter(uid_in(~director.film, 0x99706)) {
name@en
}
}
}
{{< /runnable >}}
Syntax Examples: has(predicate)
Schema Types: all
Determines if a node has a particular predicate.
Query Example: First five directors and all their movies that have a release date recorded. Directors have directed at least one film, so this has equivalent semantics to `gt(count(director.film), 0)`.
{{< runnable >}}
{
me(func: has(director.film), first: 5) {
name@en
director.film @filter(has(initial_release_date)) {
initial_release_date
name@en
}
}
}
{{< /runnable >}}
{{% notice "note" %}} As of now we only support indexing Point, Polygon and MultiPolygon geometry types. However, Dgraph can store other types of geolocation data. {{% /notice %}}
Note that for geo queries, any polygon with holes is replaced with the outer loop, ignoring holes. Also, as of version 0.7.7, polygon containment checks are approximate.
To make use of the geo functions, you need an index on your predicate.
loc: geo @index(geo) .
Here is how you would add a `Point`.
{
set {
<_:0xeb1dde9c> <loc> "{'type':'Point','coordinates':[-122.4220186,37.772318]}"^^<geo:geojson> .
<_:0xeb1dde9c> <name> "Hamon Tower" .
<_:0xeb1dde9c> <dgraph.type> "Location" .
}
}
Here is how you would associate a `Polygon` with a node. Adding a `MultiPolygon` is similar.
{
set {
<_:0xf76c276b> <loc> "{'type':'Polygon','coordinates':[[[-122.409869,37.7785442],[-122.4097444,37.7786443],[-122.4097544,37.7786521],[-122.4096334,37.7787494],[-122.4096233,37.7787416],[-122.4094004,37.7789207],[-122.4095818,37.7790617],[-122.4097883,37.7792189],[-122.4102599,37.7788413],[-122.409869,37.7785442]],[[-122.4097357,37.7787848],[-122.4098499,37.778693],[-122.4099025,37.7787339],[-122.4097882,37.7788257],[-122.4097357,37.7787848]]]}"^^<geo:geojson> .
<_:0xf76c276b> <name> "Best Western Americana Hotel" .
<_:0xf76c276b> <dgraph.type> "Location" .
}
}
The above examples have been picked from our SF Tourism dataset.
Syntax Example: near(predicate, [long, lat], distance)
Schema Types: geo
Index Required: geo
Matches all entities where the location given by `predicate` is within `distance` meters of geojson coordinate `[long, lat]`.
Query Example: Tourist destinations within 1000 meters (1 kilometer) of a point in Golden Gate Park in San Francisco.
{{< runnable >}} { tourist(func: near(loc, [-122.469829, 37.771935], 1000) ) { name } } {{< /runnable >}}
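As a rough model of what "within `distance` meters" means for points, the following Python sketch uses the haversine great-circle formula. This is an assumption made for illustration: the source does not say how Dgraph measures distance, and the function names, the Earth radius constant and the label "Golden Gate Park point" are hypothetical. The coordinates are the ones that appear in the examples above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumption of this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two [long, lat] points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def near(places, center, distance):
    """Names of places whose point lies within distance meters of center."""
    lon, lat = center
    return [name for name, (plon, plat) in places.items()
            if haversine_m(lon, lat, plon, plat) <= distance]


places = {
    "Hamon Tower": (-122.4220186, 37.772318),
    "Golden Gate Park point": (-122.469829, 37.771935),
}
print(near(places, (-122.469829, 37.771935), 1000))
```

Note the argument order: as in the query syntax, longitude comes before latitude.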
Syntax Example: within(predicate, [[[long1, lat1], ..., [longN, latN]]])
Schema Types: geo
Index Required: geo
Matches all entities where the location given by `predicate` lies within the polygon specified by the geojson coordinate array.
Query Example: Tourist destinations within the specified area of Golden Gate Park, San Francisco.
{{< runnable >}} { tourist(func: within(loc, [[[-122.47266769409178, 37.769018558337926 ], [ -122.47266769409178, 37.773699921075135 ], [ -122.4651575088501, 37.773699921075135 ], [ -122.4651575088501, 37.769018558337926 ], [ -122.47266769409178, 37.769018558337926]]] )) { name } } {{< /runnable >}}
Syntax Examples: `contains(predicate, [long, lat])` or `contains(predicate, [[long1, lat1], ..., [longN, latN]])`
Schema Types: geo
Index Required: geo
Matches all entities where the polygon describing the location given by `predicate` contains geojson coordinate `[long, lat]` or the given geojson polygon.
Query Example : All entities that contain a point in the flamingo enclosure of San Francisco Zoo. {{< runnable >}} { tourist(func: contains(loc, [ -122.50326097011566, 37.73353615592843 ] )) { name } } {{< /runnable >}}
Syntax Example: intersects(predicate, [[[long1, lat1], ..., [longN, latN]]])
Schema Types: geo
Index Required: geo
Matches all entities where the polygon describing the location given by `predicate` intersects the given geojson polygon.
{{< runnable >}} { tourist(func: intersects(loc, [[[-122.503325343132, 37.73345766902749 ], [ -122.503325343132, 37.733903134117966 ], [ -122.50271648168564, 37.733903134117966 ], [ -122.50271648168564, 37.73345766902749 ], [ -122.503325343132, 37.73345766902749]]] )) { name } } {{< /runnable >}}
Within `@filter`, multiple functions can be combined with boolean connectives.
Connectives `AND`, `OR` and `NOT` join filters and can be built into arbitrarily complex filters, such as `(NOT A OR B) AND (C AND NOT (D OR E))`. Note that `NOT` binds more tightly than `AND`, which binds more tightly than `OR`.
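The binding order can be checked with a short Python sketch, since Python's `not`, `and` and `or` happen to follow the same relative precedence. The function names are hypothetical; the point is only to show which grouping an unparenthesised filter such as `NOT A OR B AND C` is read as.

```python
from itertools import product


def implicit(a, b, c):
    # Written without parentheses, relying on precedence alone.
    return not a or b and c


def explicit(a, b, c):
    # The grouping implied by NOT > AND > OR.
    return (not a) or (b and c)


def other_grouping(a, b, c):
    # A different reading, which needs explicit parentheses.
    return ((not a) or b) and c


cases = list(product([False, True], repeat=3))
same = all(implicit(*v) == explicit(*v) for v in cases)
differs = any(implicit(*v) != other_grouping(*v) for v in cases)
print(same, differs)
```

When a filter is meant to group differently from the default, parentheses make the intent explicit, as in the `(NOT A OR B) AND (...)` example above.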
Query Example : All Steven Spielberg movies that contain either both "indiana" and "jones" OR both "jurassic" and "park".
{{< runnable >}} { me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) { name@en director.film @filter(allofterms(name@en, "jones indiana") OR allofterms(name@en, "jurassic park")) { uid name@en } } } {{< /runnable >}}
Syntax Examples:
aliasName : predicate
aliasName : predicate { ... }
aliasName : varName as ...
aliasName : count(predicate)
aliasName : max(val(varName))
An alias provides an alternate name in results. Predicates, variables and aggregates can be aliased by prefixing with the alias name and `:`. Aliases do not have to be different to the original predicate name, but, within a block, an alias must be distinct from predicate names and other aliases returned in the same block. Aliases can be used to return the same predicate multiple times within a block.
Query Example: Directors with `name` matching term `Steven`, their UID, English name, average number of actors per movie, total number of films, and the name of each film in English and French.
{{< runnable >}}
{
ID as var(func: allofterms(name@en, "Steven")) @filter(has(director.film)) {
director.film {
num_actors as count(starring)
}
average as avg(val(num_actors))
}
films(func: uid(ID)) { director_id : uid english_name : name@en average_actors : val(average) num_films : count(director.film)
films : director.film {
name : name@en
english_name : name@en
french_name : name@fr
}
} } {{< /runnable >}}
Pagination allows returning only a portion, rather than the whole, result set. This can be useful for top-k style queries as well as to reduce the size of the result set for client side processing or to allow paged access to results.
Pagination is often used with [sorting]({{< relref "#sorting">}}).
{{% notice "note" %}}Without a sort order specified, the results are sorted by `uid`, which is assigned randomly. So the ordering, while deterministic, might not be what you expected.{{% /notice %}}
Syntax Examples:
q(func: ..., first: N)
predicate (first: N) { ... }
predicate @filter(...) (first: N) { ... }
For positive `N`, `first: N` retrieves the first `N` results, by sorted or UID order.
For negative `N`, `first: N` retrieves the last `N` results, by sorted or UID order. Currently, negative is only supported when no order is applied. To achieve the effect of a negative with a sort, reverse the order of the sort and use a positive `N`.
Query Example: Last two films, by UID order, directed by Steven Spielberg and the first three genres of those movies, sorted alphabetically by English name.
{{< runnable >}} { me(func: allofterms(name@en, "Steven Spielberg")) { director.film (first: -2) { name@en initial_release_date genre (orderasc: name@en) (first: 3) { name@en } } } } {{< /runnable >}}
Query Example: The three directors named Steven who have directed the most actors of all directors named Steven.
{{< runnable >}} { ID as var(func: allofterms(name@en, "Steven")) @filter(has(director.film)) { director.film { stars as count(starring) } totalActors as sum(val(stars)) }
mostStars(func: uid(ID), orderdesc: val(totalActors), first: 3) { name@en stars : val(totalActors)
director.film {
name@en
}
} } {{< /runnable >}}
Syntax Examples:
q(func: ..., offset: N)
predicate (offset: N) { ... }
predicate (first: M, offset: N) { ... }
predicate @filter(...) (offset: N) { ... }
With `offset: N` the first `N` results are not returned. Used in combination with first, `first: M, offset: N` skips over `N` results and returns the following `M`.
Query Example: Order Hark Tsui's films by English title, skip over the first 4 and return the following 6.
{{< runnable >}} { me(func: allofterms(name@en, "Hark Tsui")) { name@zh name@en director.film (orderasc: name@en) (first:6, offset:4) { genre { name@en } name@zh name@en initial_release_date } } } {{< /runnable >}}
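The combined effect of `first` and `offset` on an already ordered result list can be modelled with Python slicing. This is a sketch of the semantics described above, not of how Dgraph evaluates them; the function name `paginate` is hypothetical, and the treatment of `first=0` (returning nothing) is this sketch's own choice.

```python
def paginate(results, first=None, offset=0):
    """Apply offset, then first, to an already ordered list of results."""
    remaining = results[offset:]   # skip the first `offset` results
    if first is None:
        return remaining           # no limit requested
    if first >= 0:
        return remaining[:first]   # the following `first` results
    return remaining[first:]       # negative first: the last N results


films = list("abcdefghijkl")       # stand-ins for twelve ordered films
print(paginate(films, first=6, offset=4))
print(paginate(films, first=-2))
```

The first call mirrors `(first:6, offset:4)` from the query above: four results are skipped and the next six are returned.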
Syntax Examples:
q(func: ..., after: UID)
predicate (first: N, after: UID) { ... }
predicate @filter(...) (first: N, after: UID) { ... }
Another way to get results after skipping over some results is to use the default UID ordering and skip directly past a node specified by UID. For example, a first query could be of the form `predicate (after: 0x0, first: N)`, or just `predicate (first: N)`, with subsequent queries of the form `predicate(after: <uid of last entity in last result>, first: N)`.
Query Example: The first five of Baz Luhrmann's films, sorted by UID order.
{{< runnable >}} { me(func: allofterms(name@en, "Baz Luhrmann")) { name@en director.film (first:5) { uid name@en } } } {{< /runnable >}}
The fifth movie is the Australian movie classic Strictly Ballroom. It has UID `0x99e44`. The results after Strictly Ballroom can now be obtained with `after`.
{{< runnable >}} { me(func: allofterms(name@en, "Baz Luhrmann")) { name@en director.film (first:5, after: 0x99e44) { uid name@en } } } {{< /runnable >}}
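The paging pattern with `after` amounts to "give me the next N UIDs greater than the last one I saw". A minimal Python model follows; the function names are hypothetical and the UIDs other than `0x99e44` are made up for the example.

```python
def page_after(uids, first, after=0):
    """Return the first `first` UIDs, in UID order, greater than `after`."""
    return [uid for uid in sorted(uids) if uid > after][:first]


def all_pages(uids, first):
    """Walk the whole result set one page at a time."""
    pages, last = [], 0
    while True:
        page = page_after(uids, first, last)
        if not page:
            return pages
        pages.append(page)
        last = page[-1]   # UID of the last entity in the last result


# Made-up UIDs, apart from 0x99e44 which appears in the text above.
uids = [0x99e44, 0x10, 0x2a, 0x7, 0x300, 0x41]
print(all_pages(uids, 4))
```

Unlike `offset`, each request only needs the UID of the last entity from the previous page, not a count of how many results have been consumed so far.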
Syntax Examples:
count(predicate)
count(uid)
The form `count(predicate)` counts how many `predicate` edges lead out of a node.
The form `count(uid)` counts the number of UIDs matched in the enclosing block.
Query Example: The number of films acted in by each actor with `Orlando` in their name.
{{< runnable >}} { me(func: allofterms(name@en, "Orlando")) @filter(has(actor.film)) { name@en count(actor.film) } } {{< /runnable >}}
Count can be used at root and [aliased]({{< relref "#alias">}}).
Query Example: Count of directors who have directed more than five films. When used at the query root, the [count index]({{< relref "#count-index">}}) is required.
{{< runnable >}} { directors(func: gt(count(director.film), 5)) { totalDirectors : count(uid) } } {{< /runnable >}}
Count can be assigned to a [value variable]({{< relref "#value-variables">}}).
Query Example: The actors of Ang Lee's "Eat Drink Man Woman" ordered by the number of movies acted in.
{{< runnable >}} { var(func: allofterms(name@en, "eat drink man woman")) { starring { actors as performance.actor { totalRoles as count(actor.film) } } }
edmw(func: uid(actors), orderdesc: val(totalRoles)) { name@en name@zh totalRoles : val(totalRoles) } } {{< /runnable >}}
Syntax Examples:
q(func: ..., orderasc: predicate)
q(func: ..., orderdesc: val(varName))
predicate (orderdesc: predicate) { ... }
predicate @filter(...) (orderasc: N) { ... }
q(func: ..., orderasc: predicate1, orderdesc: predicate2)
Sortable Types: `int`, `float`, `string`, `dateTime`, `default`
Results can be sorted in ascending order (`orderasc`) or descending order (`orderdesc`) by a predicate or variable.
For sorting on predicates with [sortable indices]({{< relref "#sortable-indices">}}), Dgraph sorts on the values and with the index in parallel and returns whichever result is computed first.
Sorted queries retrieve up to 1000 results by default. This can be changed with [first]({{< relref "#first">}}).
Query Example: French director Jean-Pierre Jeunet's movies sorted by release date.
{{< runnable >}} { me(func: allofterms(name@en, "Jean-Pierre Jeunet")) { name@fr director.film(orderasc: initial_release_date) { name@fr name@en initial_release_date } } } {{< /runnable >}}
Sorting can be performed at root and on value variables.
Query Example: All genres sorted alphabetically and the five movies in each genre with the most genres.
{{< runnable >}} { genres as var(func: has(~genre)) { ~genre { numGenres as count(genre) } }
genres(func: uid(genres), orderasc: name@en) { name@en ~genre (orderdesc: val(numGenres), first: 5) { name@en genres : val(numGenres) } } } {{< /runnable >}}
Sorting can also be performed by multiple predicates as shown below. If the values are equal for the first predicate, then they are sorted by the second predicate and so on.
Query Example: Find all nodes which have type Person, sort them by their first_name and among those that have the same first_name sort them by last_name in descending order.
{
me(func: type("Person"), orderasc: first_name, orderdesc: last_name) {
first_name
last_name
}
}
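The tie-breaking rule above can be sketched in Python on a made-up list of people. The records and the helper name `sort_people` are illustrative assumptions, not Dgraph output or API. The sketch relies on Python's stable sort: sorting by the secondary key first and the primary key second leaves ties on first_name ordered by last_name descending.

```python
def sort_people(people):
    """Order by first_name ascending, then last_name descending on ties."""
    # Stable sort: apply the secondary key first, then the primary key.
    by_last = sorted(people, key=lambda p: p["last_name"], reverse=True)
    return sorted(by_last, key=lambda p: p["first_name"])


# Hypothetical data standing in for nodes of type Person.
people = [
    {"first_name": "Ann", "last_name": "Baker"},
    {"first_name": "Bob", "last_name": "Adams"},
    {"first_name": "Ann", "last_name": "Young"},
]
ordered = [(p["first_name"], p["last_name"]) for p in sort_people(people)]
print(ordered)
```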
Inside a single query, multiple query blocks are allowed. The result is all blocks with corresponding block names.
Multiple query blocks are executed in parallel.
The blocks need not be related in any way.
Query Example: All of Angelina Jolie's films, with genres, and Peter Jackson's films since 2008.
{{< runnable >}} { AngelinaInfo(func:allofterms(name@en, "angelina jolie")) { name@en actor.film { performance.film { genre { name@en } } } }
DirectorInfo(func: eq(name@en, "Peter Jackson")) { name@en director.film @filter(ge(initial_release_date, "2008")) { Release_date: initial_release_date Name: name@en } } } {{< /runnable >}}
If queries contain some overlap in answers, the result sets are still independent.
Query Example: The movies Mackenzie Crook has acted in and the movies Jack Davenport has acted in. The result sets overlap because both have acted in the Pirates of the Caribbean movies, but the results are independent and both contain the full answer sets.
{{< runnable >}} { Mackenzie(func:allofterms(name@en, "Mackenzie Crook")) { name@en actor.film { performance.film { uid name@en } performance.character { name@en } } }
Jack(func:allofterms(name@en, "Jack Davenport")) { name@en actor.film { performance.film { uid name@en } performance.character { name@en } } } } {{< /runnable >}}
Var blocks start with the keyword var and are not returned in the query results.
Query Example: Angelina Jolie's movies ordered by genre.
{{< runnable >}} { var(func:allofterms(name@en, "angelina jolie")) { name@en actor.film { A AS performance.film { B AS genre } } }
films(func: uid(B), orderasc: name@en) { name@en ~genre @filter(uid(A)) { name@en } } } {{< /runnable >}}
Syntax Examples:
varName as q(func: ...) { ... }
varName as var(func: ...) { ... }
varName as predicate { ... }
varName as predicate @filter(...) { ... }
Types : uid
Nodes (UIDs) matched at one place in a query can be stored in a variable and used elsewhere. Query variables can be used in other query blocks or in a child node of the defining block.
Query variables do not affect the semantics of the query at the point of definition. Query variables are evaluated to all nodes matched by the defining block.
In general, query blocks are executed in parallel, but variables impose an evaluation order on some blocks. Cycles induced by variable dependence are not permitted.
If a variable is defined, it must be used elsewhere in the query.
A query variable is used by extracting the UIDs in it with uid(var-name).
The syntax func: uid(A,B) or @filter(uid(A,B)) means the union of UIDs for variables A and B.
Query Example: The movies of Angelina Jolie and Brad Pitt where both have acted in movies of the same genre. Note that B and D match all genres for all movies, not genres per movie.
{{< runnable >}}
{
  var(func:allofterms(name@en, "angelina jolie")) {
    actor.film {
      A AS performance.film {  # All films acted in by Angelina Jolie
        B As genre  # Genres of all the films acted in by Angelina Jolie
      }
    }
  }

  var(func:allofterms(name@en, "brad pitt")) {
    actor.film {
      C AS performance.film {  # All films acted in by Brad Pitt
        D as genre  # Genres of all the films acted in by Brad Pitt
      }
    }
  }

  films(func: uid(D)) @filter(uid(B)) {  # Genres from both Angelina and Brad
    name@en
    ~genre @filter(uid(A, C)) {  # Movies in either A or C.
      name@en
    }
  }
}
{{< /runnable >}}
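As a rough model of the set semantics in this query, the Python sketch below treats each query variable as a set of UIDs. The UID values and the helper `uid` are hypothetical stand-ins, not Dgraph client API. It shows that uid(A, C) behaves as a union, while using one variable at the root and another in a filter behaves as an intersection.

```python
# Hypothetical UID sets standing in for the query variables A, B, C and D.
A = {"0x1", "0x2"}  # films of the first actor
C = {"0x2", "0x3"}  # films of the second actor
B = {"0xa", "0xb"}  # genres of all films in A
D = {"0xb", "0xc"}  # genres of all films in C


def uid(*variables):
    """Model of uid(X, Y, ...): the union of the UIDs in all variables."""
    result = set()
    for variable in variables:
        result |= variable
    return result


# films(func: uid(D)) @filter(uid(B)) keeps genres present in both sets.
shared_genres = {g for g in uid(D) if g in uid(B)}
# ~genre @filter(uid(A, C)) keeps films that are in A or in C.
either_film = uid(A, C)
print(shared_genres, sorted(either_film))
```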
Syntax Examples:
varName as scalarPredicate
varName as count(predicate)
varName as avg(...)
varName as math(...)
Types : int, float, String, dateTime, default, geo, bool
Value variables store scalar values. Value variables are a map from the UIDs of the enclosing block to the corresponding values.
It therefore only makes sense to use the values from a value variable in a context that matches the same UIDs - if used in a block matching different UIDs the value variable is undefined.
It is an error to define a value variable but not use it elsewhere in the query.
Value variables are used by extracting the values with val(var-name), or by extracting the UIDs with uid(var-name).
[Facet]({{< relref "#facets-edge-attributes">}}) values can be stored in value variables.
Query Example: The number of movie roles played by the actors of the 80's classic "The Princess Bride". Query variable pbActors matches the UIDs of all actors from the movie. Value variable roles is thus a map from actor UID to number of roles. Value variable roles can be used in the totalRoles query block because that query block also matches the pbActors UIDs, so the actor to number of roles map is available.
{{< runnable >}} { var(func:allofterms(name@en, "The Princess Bride")) { starring { pbActors as performance.actor { roles as count(actor.film) } } } totalRoles(func: uid(pbActors), orderasc: val(roles)) { name@en numRoles : val(roles) } } {{< /runnable >}}
Value variables can be used in place of UID variables by extracting the UID list from the map.
Query Example: The same query as the previous example, but using value variable roles for matching UIDs in the totalRoles query block.
{{< runnable >}} { var(func:allofterms(name@en, "The Princess Bride")) { starring { performance.actor { roles as count(actor.film) } } } totalRoles(func: uid(roles), orderasc: val(roles)) { name@en numRoles : val(roles) } } {{< /runnable >}}
Like query variables, value variables can be used in other query blocks and in blocks nested within the defining block. When used in a block nested within the block that defines the variable, the value is computed as a sum of the variable for parent nodes along all paths to the point of use. This is called variable propagation.
For example:
{
q(func: uid(0x01)) {
myscore as math(1) # A
friends { # B
friends { # C
...myscore...
}
}
}
}
At line A, a value variable myscore is defined as mapping the node with UID 0x01 to value 1. At B, the value for each friend is still 1: there is only one path to each friend. Traversing the friend edge twice reaches the friends of friends. The variable myscore gets propagated such that each friend of a friend receives the sum of its parents' values: if a friend of a friend is reachable from only one friend, the value is still 1; if they are reachable from two friends, the value is 2, and so on. That is, the value of myscore for each friend of friends inside the block marked C will be the number of paths to them.
The value that a node receives for a propagated variable is the sum of the values of all its parent nodes.
This propagation is useful, for example, in normalizing a sum across users, finding the number of paths between nodes and accumulating a sum through a graph.
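The propagation rule can be modeled in a few lines of Python. This is a sketch of the semantics described above, not of Dgraph's implementation. The friend graph and the helper name `propagate` are made up for illustration.

```python
from collections import defaultdict


def propagate(values, edges):
    """Push a value variable one level down: each child receives the sum
    of the values of all parents that have an edge to it."""
    out = defaultdict(int)
    for parent, value in values.items():
        for child in edges.get(parent, []):
            out[child] += value
    return dict(out)


# Hypothetical friend edges: 0x01 has two friends who share one friend.
friends = {
    "0x01": ["0x02", "0x03"],
    "0x02": ["0x04"],
    "0x03": ["0x04", "0x05"],
}
level_a = {"0x01": 1}                  # myscore as math(1)
level_b = propagate(level_a, friends)  # values seen in block B
level_c = propagate(level_b, friends)  # values seen in block C
print(level_b, level_c)
```

Here 0x04 is reachable through two friends, so it ends up with the value 2, which matches the number of paths to it.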
Query Example: For each Harry Potter movie, the number of roles played by actor Warwick Davis. {{< runnable >}} { num_roles(func: eq(name@en, "Warwick Davis")) @cascade @normalize {
paths as math(1) # records number of paths to each character
actor : name@en
actor.film {
performance.film @filter(allofterms(name@en, "Harry Potter")) {
film_name : name@en
characters : math(paths) # how many paths (i.e. characters) reach this film
}
}
} } {{< /runnable >}}
Query Example: Each actor who has been in a Peter Jackson movie and the fraction of Peter Jackson movies they have appeared in. {{< runnable >}} { movie_fraction(func:eq(name@en, "Peter Jackson")) @normalize {
paths as math(1)
total_films : num_films as count(director.film)
director : name@en
director.film {
starring {
performance.actor {
fraction : math(paths / (num_films/paths))
actor : name@en
}
}
}
} } {{< /runnable >}}
More examples can be found in two Dgraph blog posts about using variable propagation for recommendation engines (post 1, post 2).
Syntax Example: AG(val(varName))
For AG replaced with
- min : select the minimum value in the value variable varName
- max : select the maximum value
- sum : sum all values in value variable varName
- avg : calculate the average of values in varName
Schema Types:
| Aggregation | Schema Types |
|---|---|
| min / max | int, float, string, dateTime, default |
| sum / avg | int, float |
Aggregation can only be applied to [value variables]({{< relref "#value-variables">}}). An index is not required (the values have already been found and stored in the value variable mapping).
An aggregation is applied at the query block enclosing the variable definition. As opposed to query variables and value variables, which are global, aggregation is computed locally. For example:
A as predicateA {
...
B as predicateB {
x as ...some value...
}
min(val(x))
}
Here, A and B are the lists of all UIDs that match these blocks. Value variable x is a mapping from UIDs in B to values. The aggregation min(val(x)), however, is computed for each UID in A. That is, it has the semantics of: for each UID in A, take the slice of x that corresponds to A's outgoing predicateB edges and compute the aggregation for those values.
Aggregations can themselves be assigned to value variables, making a UID to aggregation map.
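To make the "computed locally" rule concrete, here is a small Python model. The edge map, the values and the helper `aggregate_per_parent` are assumptions made for illustration. For every UID in A it aggregates only the slice of x reachable over that UID's predicateB edges, producing a UID to aggregation map.

```python
def aggregate_per_parent(edges, x, agg=min):
    """For each UID in A, aggregate x over the B nodes reached through that
    UID's predicateB edges. Parents with no valued children are skipped."""
    result = {}
    for a_uid, b_uids in edges.items():
        values = [x[b] for b in b_uids if b in x]
        if values:
            result[a_uid] = agg(values)
    return result


# Hypothetical predicateB edges from A nodes to B nodes, and values for x.
predicate_b = {"0xa1": ["0xb1", "0xb2"], "0xa2": ["0xb3"]}
x = {"0xb1": 7, "0xb2": 3, "0xb3": 5}

mins = aggregate_per_parent(predicate_b, x)           # min(val(x))
maxs = aggregate_per_parent(predicate_b, x, agg=max)  # max(val(x))
print(mins, maxs)
```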
Query Example: Get the min initial release date for any Harry Potter movie.
The release date is assigned to a variable, then it is aggregated and fetched in an empty block. {{< runnable >}} { var(func: allofterms(name@en, "Harry Potter")) { d as initial_release_date } me() { min(val(d)) } } {{< /runnable >}}
Query Example: Directors called Steven and the date of release of their first movie, in ascending order of first movie.
{{< runnable >}}
{
  stevens as var(func: allofterms(name@en, "steven")) {
    director.film {
      ird as initial_release_date
      # ird is a value variable mapping a film UID to its release date
    }
    minIRD as min(val(ird))
    # minIRD is a value variable mapping a director UID to their first release date
  }

  byIRD(func: uid(stevens), orderasc: val(minIRD)) {
    name@en
    firstRelease: val(minIRD)
  }
}
{{< /runnable >}}
Query Example: Get the max initial release date for any Harry Potter movie.
The release date is assigned to a variable, then it is aggregated and fetched in an empty block. {{< runnable >}} { var(func: allofterms(name@en, "Harry Potter")) { d as initial_release_date } me() { max(val(d)) } } {{< /runnable >}}
Query Example: Quentin Tarantino's movies and date of release of the most recent movie.
{{< runnable >}} { director(func: allofterms(name@en, "Quentin Tarantino")) { director.film { name@en x as initial_release_date } max(val(x)) } } {{< /runnable >}}
Query Example: Get the sum and average of the number of movies directed by people who have Steven or Tom in their name.
{{< runnable >}} { var(func: anyofterms(name@en, "Steven Tom")) { a as count(director.film) }
me() { avg(val(a)) sum(val(a)) } } {{< /runnable >}}
Query Example: Steven Spielberg's movies, with the number of recorded genres per movie, and the total number of genres and average genres per movie.
{{< runnable >}} { director(func: eq(name@en, "Steven Spielberg")) { name@en director.film { name@en numGenres : g as count(genre) } totalGenres : sum(val(g)) genresPerMovie : avg(val(g)) } } {{< /runnable >}}
Aggregations can be assigned to value variables, and so these variables can in turn be aggregated.
Query Example: For each actor in a Peter Jackson film, find the number of roles played in any movie. Sum these to find the total number of roles ever played by all actors in the movie. Then sum the lot to find the total number of roles ever played by actors who have appeared in Peter Jackson movies. Note that this demonstrates how to aggregate aggregates; the answer in this case isn't quite precise though, because actors that have appeared in multiple Peter Jackson movies are counted more than once.
{{< runnable >}}
{
  PJ as var(func:allofterms(name@en, "Peter Jackson")) {
    director.film {
      starring {  # starring an actor
        performance.actor {
          movies as count(actor.film)
          # number of roles for this actor
        }
        perf_total as sum(val(movies))
      }
      movie_total as sum(val(perf_total))
      # total roles for all actors in this movie
    }
    gt as sum(val(movie_total))
  }

  PJmovies(func: uid(PJ)) {
    name@en
    director.film (orderdesc: val(movie_total), first: 5) {
      name@en
      totalRoles : val(movie_total)
    }
    grandTotal : val(gt)
  }
}
{{< /runnable >}}
Value variables can be combined using mathematical functions. For example, this could be used to associate a score which is then used to order or perform other operations, such as might be used in building news feeds, simple recommendation systems, and so on.
Math statements must be enclosed within math( <exp> ) and must be stored to a value variable.
The supported operators are as follows:
| Operators | Types accepted | What it does |
|---|---|---|
| + - * / % | int, float | performs the corresponding operation |
| min max | All types except geo, bool (binary functions) | selects the min/max value among the two |
| < > <= >= == != | All types except geo, bool | Returns true or false based on the values |
| floor ceil ln exp sqrt | int, float (unary function) | performs the corresponding operation |
| since | dateTime | Returns the number of seconds in float from the time specified |
| pow(a, b) | int, float | Returns a to the power b |
| logbase(a,b) | int, float | Returns log(a) to the base b |
| cond(a, b, c) | first operand must be a boolean | selects b if a is true else c |
Query Example: Form a score for each of Steven Spielberg's movies as the sum of number of actors, number of genres and number of countries. List the top five such movies in order of decreasing score.
{{< runnable >}} { var(func:allofterms(name@en, "steven spielberg")) { films as director.film { p as count(starring) q as count(genre) r as count(country) score as math(p + q + r) } }
TopMovies(func: uid(films), orderdesc: val(score), first: 5){
name@en
val(score)
}
} {{< /runnable >}}
Value variables and aggregations of them can be used in filters.
Query Example: Calculate a score for each Steven Spielberg movie with a condition on release date to penalize movies that are more than 10 years old, filtering on the resulting score.
{{< runnable >}} { var(func:allofterms(name@en, "steven spielberg")) { films as director.film { p as count(starring) q as count(genre) date as initial_release_date years as math(since(date)/(365*24*60*60)) score as math(cond(years > 10, 0, ln(p)+q-ln(years))) } }
TopMovies(func: uid(films), orderdesc: val(score)) @filter(gt(val(score), 2)){ name@en val(score) val(date) } } {{< /runnable >}}
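For readers who want to check the arithmetic of this score outside Dgraph, the following Python sketch mirrors the expression. It assumes, as the operator table states, that since returns elapsed seconds, and that a year is taken as 365 days. The function name `score` and the sample inputs are illustrative only.

```python
import math

# Assumption: one year is 365 days, matching the divisor used in the query.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def score(p, q, seconds_since_release):
    """Mirror of math(cond(years > 10, 0, ln(p) + q - ln(years)))."""
    years = seconds_since_release / SECONDS_PER_YEAR
    if years > 10:
        return 0.0  # cond(...) selects 0 for films older than 10 years
    return math.log(p) + q - math.log(years)


recent = score(p=1, q=2, seconds_since_release=SECONDS_PER_YEAR)
old = score(p=50, q=3, seconds_since_release=11 * SECONDS_PER_YEAR)
print(recent, old)
```

A film exactly one year old with one actor and two genres scores ln(1) + 2 - ln(1) = 2, and any film older than ten years scores 0.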
Values calculated with math operations are stored to value variables and so can be aggregated.
Query Example: Compute a score for each Steven Spielberg movie and then aggregate the score.
{{< runnable >}} { steven as var(func:eq(name@en, "Steven Spielberg")) @filter(has(director.film)) { director.film { p as count(starring) q as count(genre) r as count(country) score as math(p + q + r) } directorScore as sum(val(score)) }
score(func: uid(steven)){
name@en
val(directorScore)
}
} {{< /runnable >}}
Syntax Examples:
q(func: ...) @groupby(predicate) { min(...) }
predicate @groupby(pred) { count(uid) }
A groupby query aggregates query results given a set of properties on which to group elements. For example, a query containing the block friend @groupby(age) { count(uid) } finds all nodes reachable along the friend edge, partitions these into groups based on age, then counts how many nodes are in each group. The returned result is the grouped edges and the aggregations.
Inside a groupby block, only aggregations are allowed and count may only be applied to uid.
If the groupby is applied to a uid predicate, the resulting aggregations can be saved in a variable (mapping the grouped UIDs to aggregate values) and used elsewhere in the query to extract information other than the grouped or aggregated edges.
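The partition-then-count behaviour of a block such as friend @groupby(age) { count(uid) } can be pictured with this Python sketch. The friend records and the helper `groupby_count` are hypothetical; the code models the described semantics rather than calling Dgraph.

```python
from collections import Counter


def groupby_count(nodes, key):
    """Model of `@groupby(key) { count(uid) }`: count nodes per key value.
    Nodes that lack the key fall into no group."""
    return Counter(node[key] for node in nodes if key in node)


# Hypothetical nodes reached along the friend edge.
friends = [
    {"uid": "0x2", "age": 17},
    {"uid": "0x3", "age": 19},
    {"uid": "0x4", "age": 17},
    {"uid": "0x5"},  # no age value, so it is not counted
]
counts = groupby_count(friends, "age")
print(dict(counts))
```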
Query Example: For Steven Spielberg movies, count the number of movies in each genre and for each of those genres return the genre name and the count. The name can't be extracted in the groupby because it is not an aggregate, but uid(a) can be used to extract the UIDs from the UID to value map and thus organize the byGenre query by genre UID.
{{< runnable >}}
{
  var(func:allofterms(name@en, "steven spielberg")) {
    director.film @groupby(genre) {
      a as count(uid)
      # a is a genre UID to count value variable
    }
  }

  byGenre(func: uid(a), orderdesc: val(a)) {
    name@en
    total_movies : val(a)
  }
}
{{< /runnable >}}
Query Example: Actors from Tim Burton movies and how many roles they have played in Tim Burton movies.
{{< runnable >}}
{
  var(func:allofterms(name@en, "Tim Burton")) {
    director.film {
      starring @groupby(performance.actor) {
        a as count(uid)
        # a is an actor UID to count value variable
      }
    }
  }

  byActor(func: uid(a), orderdesc: val(a)) {
    name@en
    val(a)
  }
}
{{< /runnable >}}
The expand() function can be used to expand the predicates out of a node. To use expand(), the [type system]({{< relref "#type-system" >}}) is required. Refer to the section on the type system to check how to set the types of nodes. The rest of this section assumes familiarity with that section.
There are two ways to use the expand function.
- Predicates can be stored in a variable and passed to expand() to expand all the predicates in the variable.
- If _all_ is passed as an argument to expand(), the predicates to be expanded will be the union of fields in the types assigned to a given node.
The _all_ keyword requires that the nodes have types. Dgraph will look for all the types that have been assigned to a node, query the types to check which attributes they have, and use those to compute the list of predicates to expand.
For example, consider a node that has types Animal and Pet, which have the following definitions:
type Animal {
name
species
dob
}
type Pet {
owner
veterinarian
}
When expand(_all_) is called on this node, Dgraph will first check which types the node has (Animal and Pet). Then it will get the definitions of Animal and Pet and build a list of predicates from their type definitions.
name
species
dob
owner
veterinarian
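The way this list is assembled can be sketched in Python. The `TYPES` table copies the Animal and Pet definitions above; the helper `expand_all` is a hypothetical model of the union step, not Dgraph code, and the ordering it produces is just first-seen order.

```python
# Type definitions copied from the Animal and Pet example above.
TYPES = {
    "Animal": ["name", "species", "dob"],
    "Pet": ["owner", "veterinarian"],
}


def expand_all(node_types, type_defs=TYPES):
    """Union of the fields of every type assigned to a node, in
    first-seen order and without duplicates."""
    predicates = []
    for type_name in node_types:
        for field in type_defs.get(type_name, []):
            if field not in predicates:
                predicates.append(field)
    return predicates


predicates = expand_all(["Animal", "Pet"])
print(predicates)
```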
For string predicates, expand only returns values not tagged with a language (see [language preference]({{< relref "#language-support" >}})). So it's often required to add name@fr or name@. as well to an expand query.
With the @cascade directive, nodes that don't have all predicates specified in the query are removed. This can be useful in cases where some filter was applied or if nodes might not have all listed predicates.
Query Example: Harry Potter movies, with each actor and characters played. With @cascade, any character not played by an actor called Warwick is removed, as is any Harry Potter movie without any actors called Warwick. Without @cascade, every character is returned, but only those played by actors called Warwick also have the actor name.
{{< runnable >}}
{
HP(func: allofterms(name@en, "Harry Potter")) @cascade {
name@en
starring{
performance.character {
name@en
}
performance.actor @filter(allofterms(name@en, "Warwick")){
name@en
}
}
}
}
{{< /runnable >}}
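The pruning behaviour of @cascade can be approximated by the recursive Python sketch below. It is a simplified model under stated assumptions: results are plain dictionaries, the query shape is a nested dictionary, and the helper `cascade` and the sample film are invented for illustration.

```python
def cascade(node, shape):
    """Return a pruned copy of node, or None if any predicate in shape is
    missing. shape maps predicate -> None (a value) or a nested shape."""
    pruned = {}
    for pred, sub in shape.items():
        if pred not in node:
            return None
        if sub is None:
            pruned[pred] = node[pred]
            continue
        children = [cascade(child, sub) for child in node[pred]]
        children = [child for child in children if child is not None]
        if not children:
            return None
        pruned[pred] = children
    return pruned


shape = {"name": None, "starring": {"character": {"name": None},
                                    "actor": {"name": None}}}
# Hypothetical result before @cascade: the second performance has no actor
# left after the Warwick filter.
film = {"name": "HP", "starring": [
    {"character": [{"name": "Flitwick"}], "actor": [{"name": "Warwick Davis"}]},
    {"character": [{"name": "Harry"}]},
]}
kept = cascade(film, shape)
print(kept)
```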
You can apply @cascade on inner query blocks as well.
{{< runnable >}}
{
HP(func: allofterms(name@en, "Harry Potter")) {
name@en
genre {
name@en
}
starring @cascade {
performance.character {
name@en
}
performance.actor @filter(allofterms(name@en, "Warwick")){
name@en
}
}
}
}
{{< /runnable >}}
With the @normalize directive, only aliased predicates are returned and the result is flattened to remove nesting.
Query Example: Film name, country and first two actors (by UID order) of every Steven Spielberg movie, without initial_release_date because no alias is given and flattened by @normalize.
{{< runnable >}}
{
director(func:allofterms(name@en, "steven spielberg")) @normalize {
director: name@en
director.film {
film: name@en
initial_release_date
starring(first: 2) {
performance.actor {
actor: name@en
}
performance.character {
character: name@en
}
}
country {
country: name@en
}
}
}
}
{{< /runnable >}}
You can also apply @normalize on nested query blocks. It will work similarly but only flatten the result of the nested query block where @normalize has been applied. @normalize will return a list irrespective of the type of attribute on which it is applied.
{{< runnable >}}
{
director(func:allofterms(name@en, "steven spielberg")) {
director: name@en
director.film {
film: name@en
initial_release_date
starring(first: 2) @normalize {
performance.actor {
actor: name@en
}
performance.character {
character: name@en
}
}
country {
country: name@en
}
}
}
}
{{< /runnable >}}
The @ignorereflex directive forces the removal of child nodes that are reachable from themselves as a parent, through any path in the query result.
Query Example: All the co-actors of Rutger Hauer. Without @ignorereflex, the result would also include Rutger Hauer for every movie.
{{< runnable >}} { coactors(func: eq(name@en, "Rutger Hauer")) @ignorereflex { actor.film { performance.film { starring { performance.actor { name@en } } } } } } {{< /runnable >}}
For the purposes of debugging, you can attach a query parameter debug=true to a query. Attaching this parameter lets you retrieve the uid attribute for all the entities along with the server_latency and start_ts information under the extensions key of the response.
- parsing_ns: Latency in nanoseconds to parse the query.
- processing_ns: Latency in nanoseconds to process the query.
- encoding_ns: Latency in nanoseconds to encode the JSON response.
- start_ts: The logical start timestamp of the transaction.
Query with debug as a query parameter
curl -H "Content-Type: application/graphql+-" http://localhost:8080/query?debug=true -XPOST -d $'{
tbl(func: allofterms(name@en, "The Big Lebowski")) {
name@en
}
}' | python -m json.tool | less
Returns uid and server_latency
{
"data": {
"tbl": [
{
"uid": "0x41434",
"name@en": "The Big Lebowski"
},
{
"uid": "0x145834",
"name@en": "The Big Lebowski 2"
},
{
"uid": "0x2c8a40",
"name@en": "Jeffrey \"The Big\" Lebowski"
},
{
"uid": "0x3454c4",
"name@en": "The Big Lebowski"
}
]
},
"extensions": {
"server_latency": {
"parsing_ns": 18559,
"processing_ns": 802990982,
"encoding_ns": 1177565
},
"txn": {
"start_ts": 40010
}
}
}
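A response like this can be post-processed with a few lines of Python, for example to report total server latency in milliseconds. The helper name `total_latency_ms` is an assumption for illustration. It looks for "extensions" at the top level first and then inside "data", so it does not depend on where the key is placed.

```python
import json


def total_latency_ms(response_text):
    """Sum the server_latency fields (nanoseconds) and convert to ms."""
    body = json.loads(response_text)
    # Check the top level first, then fall back to a nested placement.
    ext = body.get("extensions") or body.get("data", {}).get("extensions", {})
    latency = ext.get("server_latency", {})
    return sum(latency.values()) / 1_000_000


# Sample trimmed from the response shown above.
sample = json.dumps({
    "data": {"tbl": [{"uid": "0x41434", "name@en": "The Big Lebowski"}]},
    "extensions": {
        "server_latency": {
            "parsing_ns": 18559,
            "processing_ns": 802990982,
            "encoding_ns": 1177565,
        },
        "txn": {"start_ts": 40010},
    },
})
print(total_latency_ms(sample))
```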
For each predicate, the schema specifies the target's type. If a predicate p has type T, then for all subject-predicate-object triples s p o the object o is of schema type T.
- On mutations, scalar types are checked and an error thrown if the value cannot be converted to the schema type.
- On query, value results are returned according to the schema type of the predicate.
If a schema type isn't specified before a mutation adds triples for a predicate, then the type is inferred from the first mutation. This type is either:
- type uid, if the first mutation for the predicate has nodes for the subject and object, or
- derived from the [RDF type]({{< relref "#rdf-types" >}}), if the object is a literal and an RDF type is present in the first mutation, or
- default type, otherwise.
Dgraph supports scalar types and the UID type.
For all triples with a predicate of scalar types the object is a literal.
| Dgraph Type | Go type |
|---|---|
| default | string |
| int | int64 |
| float | float64 |
| string | string |
| bool | bool |
| dateTime | time.Time (RFC3339 format [Optional timezone] eg: 2006-01-02T15:04:05.999999999+10:00 or 2006-01-02T15:04:05.999999999) |
| geo | go-geom |
| password | string (encrypted) |
{{% notice "note" %}}Dgraph supports date and time formats for the dateTime scalar type only if they are RFC 3339 compatible, which is different from ISO 8601 (as defined in the RDF spec). You should convert your values to RFC 3339 format before sending them to Dgraph.{{% /notice %}}
The uid type denotes a node-node edge; internally each node is represented as a uint64 id.
| Dgraph Type | Go type |
|---|---|
| uid | uint64 |
Schema mutations add or modify schema.
Multiple scalar values can also be added for a S P by specifying the schema to be of list type. Occupations in the example below can store a list of strings for each S P.
An index is specified with @index, with arguments to specify the tokenizer. When specifying an index for a predicate it is mandatory to specify the type of the index. For example:
name: string @index(exact, fulltext) @count .
multiname: string @lang .
age: int @index(int) .
friend: [uid] @count .
dob: dateTime .
location: geo @index(geo) .
occupations: [string] @index(term) .
If no data has been stored for the predicates, a schema mutation sets up an empty schema ready to receive triples.
If data is already stored before the mutation, existing values are not checked to conform to the new schema. On query, Dgraph tries to convert existing values to the new schema types, ignoring any that fail conversion.
If data exists and new indices are specified in a schema mutation, any index not in the updated list is dropped and a new index is created for every new tokenizer specified.
Reverse edges are also computed if specified by a schema mutation.
Any alphanumeric combination of a predicate name is permitted. Dgraph also supports Internationalized Resource Identifiers (IRIs). You can read more in Predicates i18n.
Single special characters are not accepted, which includes the special characters from IRIs. They have to be prefixed/suffixed with alphanumeric characters.
][&*()_-+=!#$%
Note: You are not restricted to use @ suffix, but the suffix character gets ignored.
The special characters below are not accepted.
^}|{`\~
If your predicate is a URI or has language-specific characters, then enclose it with angle brackets <> when executing the schema mutation.
{{% notice "note" %}}Dgraph supports Internationalized Resource Identifiers (IRIs) for predicate names and values.{{% /notice %}}
Schema syntax:
<职业>: string @index(exact) .
<年龄>: int @index(int) .
<地点>: geo @index(geo) .
<公司>: string .
This syntax allows for internationalized predicate names, but full-text indexing still defaults to English.
To use the right tokenizer for your language, you need to use the @lang directive and enter values using your language tag.
Schema:
<公司>: string @index(fulltext) @lang .
Mutation:
{
set {
_:a <公司> "Dgraph Labs Inc"@en .
_:b <公司> "夏新科技有限责任公司"@zh .
_:a <dgraph.type> "Company" .
}
}
Query:
{
q(func: alloftext(<公司>@zh, "夏新科技有限责任公司")) {
uid
<公司>@.
}
}
To use [upsert operations]({{< relref "howto/index.md#upserts">}}) on a
predicate, specify the @upsert
directive in the schema. When committing
transactions involving predicates with the @upsert
directive, Dgraph checks
index keys for conflicts, helping to enforce uniqueness constraints when running
concurrent upserts.
This is how you specify the upsert directive for a predicate.
email: string @index(exact) @upsert .
To use the noconflict directive on a predicate, specify the @noconflict directive in the schema. It prevents conflict detection at the predicate level. This is an experimental feature and not a recommended directive, but it exists to help avoid conflicts for predicates that don't have high correctness requirements. This can cause data loss, especially when used for predicates with a count index.
This is how you specify the @noconflict directive for a predicate.
email: string @index(exact) @noconflict .
Dgraph supports a number of [RDF types in mutations]({{< relref "mutations/index.md#language-and-rdf-types" >}}).
As well as implying a schema type for a [first mutation]({{< relref "#schema" >}}), an RDF type can override a schema type for storage.
If a predicate has a schema type and a mutation has an RDF type with a different underlying Dgraph type, the convertibility to schema type is checked, and an error is thrown if they are incompatible, but the value is stored in the RDF type's corresponding Dgraph type. Query results are always returned in schema type.
For example, if no schema is set for the age predicate, then given the mutation
{
set {
_:a <age> "15"^^<xs:int> .
_:b <age> "13" .
_:c <age> "14"^^<xs:string> .
_:d <age> "14.5"^^<xs:string> .
_:e <age> "14.5" .
}
}
Dgraph:
- sets the schema type to int, as implied by the first triple,
- converts "13" to int on storage,
- checks "14" can be converted to int, but stores as string,
- throws an error for the remaining two triples, because "14.5" can't be converted to int.
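The storage rules in the list above can be mimicked for the int and string cases with the Python sketch below. The helper `store` and its converter table are illustrative assumptions that cover only this example; Dgraph's real conversion logic handles more types.

```python
def store(schema_type, value, rdf_type=None):
    """Return (stored_type, stored_value), or raise ValueError when the
    value cannot be converted to the schema type."""
    converters = {"int": int, "string": str}
    converted = converters[schema_type](value)  # convertibility check
    if rdf_type is None or rdf_type == schema_type:
        return schema_type, converted
    # A differing RDF type passes the check but is stored in its own type.
    return rdf_type, converters[rdf_type](value)


print(store("int", "13"))            # untyped literal, converted to int
print(store("int", "14", "string"))  # convertible, stored as string
try:
    store("int", "14.5", "string")
except ValueError as err:
    print("rejected:", err)
```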
The following types are also accepted.
A password for an entity is set by setting the schema for the attribute to be of type password. Passwords cannot be queried directly, only checked for a match using the checkpwd function.
The passwords are encrypted using bcrypt.
For example: to set a password, first set schema, then the password:
pass: password .
{
set {
<0x123> <name> "Password Example" .
<0x123> <pass> "ThePassword" .
}
}
to check a password:
{
check(func: uid(0x123)) {
name
checkpwd(pass, "ThePassword")
}
}
output:
{
"data": {
"check": [
{
"name": "Password Example",
"checkpwd(pass)": true
}
]
}
}
You can also use alias with password type.
{
check(func: uid(0x123)) {
name
secret: checkpwd(pass, "ThePassword")
}
}
output:
{
"data": {
"check": [
{
"name": "Password Example",
"secret": true
}
]
}
}
{{% notice "note" %}}Filtering on a predicate by applying a [function]({{< relref "#functions" >}}) requires an index.{{% /notice %}}
When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient.
All scalar types can be indexed.
Types int, float, bool and geo have only a default index each: with tokenizers named int, float, bool and geo.
Types string and dateTime have a number of indices.
The indices available for strings are as follows.
| Dgraph function | Required index / tokenizer | Notes |
|---|---|---|
| eq | hash, exact, term, or fulltext | The most performant index for eq is hash. Only use term or fulltext if you also require term or full-text search. If you're already using term, there is no need to use hash or exact as well. |
| le, ge, lt, gt | exact | Allows faster sorting. |
| allofterms, anyofterms | term | Allows searching by a term in a sentence. |
| alloftext, anyoftext | fulltext | Matching with language specific stemming and stopwords. |
| regexp | trigram | Regular expression matching. Can also be used for equality checking. |
{{% notice "warning" %}} Incorrect index choice can impose performance penalties and an increased transaction conflict rate. Use only the minimum number of and simplest indexes that your application needs. {{% /notice %}}
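To give a feel for what a term index makes possible, the Python sketch below models allofterms and anyofterms as set operations on lowercased, whitespace-split words. This is a deliberate simplification: the helper names mirror the Dgraph functions, but the tokenization here is an assumption and the real term tokenizer may normalize text differently.

```python
def terms(text):
    """Simplified term tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


def allofterms(value, query):
    """True when every query term occurs in the value."""
    return terms(query) <= terms(value)


def anyofterms(value, query):
    """True when at least one query term occurs in the value."""
    return bool(terms(query) & terms(value))


name = "Eat Drink Man Woman"
print(allofterms(name, "eat drink man woman"))
print(anyofterms(name, "woman poem"))
```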
The indices available for dateTime are as follows.
| Index name / Tokenizer | Part of date indexed |
|---|---|
| year | index on year (default) |
| month | index on year and month |
| day | index on year, month and day |
| hour | index on year, month, day and hour |
The choices of dateTime index allow selecting the precision of the index. Applications, such as the movies examples in these docs, that require searching over dates but have relatively few nodes per year may prefer the year tokenizer; applications that are dependent on fine grained date searches, such as real-time sensor readings, may prefer the hour index.
All the dateTime indices are sortable.
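The effect of choosing a coarser or finer tokenizer can be illustrated by truncating a timestamp to the indexed precision. In this Python sketch the token strings and the helper `index_token` are illustrative assumptions, not Dgraph's internal encoding, and only inputs accepted by `datetime.fromisoformat` are handled.

```python
from datetime import datetime

# Precision kept by each tokenizer, expressed as a strftime pattern.
FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%dT%H",
}


def index_token(value, tokenizer):
    """Truncate a timestamp to the precision of the given tokenizer."""
    return datetime.fromisoformat(value).strftime(FORMATS[tokenizer])


stamp = "2006-01-02T15:04:05+10:00"
for name in FORMATS:
    print(name, index_token(stamp, name))
```

Two values that share a token fall into the same index bucket, which is why the year tokenizer suits datasets with few nodes per year.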
Not all the indices establish a total order among the values that they index. Sortable indices allow inequality functions and sorting.
- Indexes int and float are sortable.
- string index exact is sortable.
- All dateTime indices are sortable.
For example, given an edge name of string type, to sort by name or perform inequality filtering on names, the exact index must have been specified. In that case, a schema query would return at least the following tokenizers.
{
"predicate": "name",
"type": "string",
"index": true,
"tokenizer": [
"exact"
]
}
For predicates with the @count directive, Dgraph indexes the number of edges out of each node. This enables fast queries of the form:
{
q(func: gt(count(pred), threshold)) {
...
}
}
Predicates with scalar types can also store a list of values if specified in the schema. The scalar type needs to be enclosed within [] to indicate that it's a list type. These lists behave like an unordered set.
occupations: [string] .
score: [int] .
- A set operation adds to the list of values. The order of the stored values is non-deterministic.
- A delete operation deletes the value from the list.
- Querying for these predicates would return the list in an array.
- Indexes can be applied on predicates which have a list type and you can use [Functions]({{<ref "#functions">}}) on them.
- Sorting is not allowed using these predicates.
A graph edge is unidirectional. For node-node edges, sometimes modeling requires reverse edges. If only some subject-predicate-object triples have a reverse, these must be manually added. But if a predicate always has a reverse, Dgraph computes the reverse edges if @reverse
is specified in the schema.
The reverse edge of anEdge
is ~anEdge
.
For existing data, Dgraph computes all reverse edges. For data added after the schema mutation, Dgraph computes and stores the reverse edge for each added triple.
A schema query queries for the whole schema:
schema {}
{{% notice "note" %}} Unlike regular queries, the schema query is not surrounded by curly braces. Also, schema queries and regular queries cannot be combined. {{% /notice %}}
You can query for particular schema fields in the query body.
schema {
type
index
reverse
tokenizer
list
count
upsert
lang
}
You can also query for particular predicates:
schema(pred: [name, friend]) {
type
index
reverse
tokenizer
list
count
upsert
lang
}
Types can also be queried. Below are some example queries.
schema(type: Movie) {}
schema(type: [Person, Animal]) {}
Note that type queries do not contain anything between the curly braces. The output will be the entire definition of the requested types.
Dgraph supports a type system that can be used to categorize nodes and query them based on their type. The type system is also used during expand queries.
Types are defined using a GraphQL-like syntax. For example:
type Student {
name
dob
home_address
year
friends
}
Types are declared along with the schema using the Alter endpoint. In order to properly support the above type, a predicate for each of the attributes in the type is also needed, such as:
name: string @index(term) .
dob: datetime .
home_address: string .
year: int .
friends: [uid] .
Reverse predicates can also be included inside a type definition. For example, the type above
could be expanded to include the parent of the student if there's a predicate children
with
a reverse edge (the brackets around the predicate name are needed to properly understand the
special character ~
).
children: [uid] @reverse .
type Student {
name
dob
home_address
year
friends
<~children>
}
Edges can be used in multiple types: for example, name
might be used for both
a person and a pet. Sometimes, however, it's required to use a different
predicate for each type to represent a similar concept. For example, if student
names and book names required different indexes, then the predicates must be
different.
type Student {
student_name
}
type Textbook {
textbook_name
}
student_name: string @index(exact) .
textbook_name: string @lang @index(fulltext) .
Altering the schema for a type that already exists overwrites the existing definition.
Scalar nodes cannot have types since they only have one attribute and its type
is the type of the node. UID nodes can have a type. The type is set by setting
the value of the dgraph.type
predicate for that node. A node can have multiple
types. Here's an example of how to set the types of a node:
{
set {
_:a <name> "Garfield" .
_:a <dgraph.type> "Pet" .
_:a <dgraph.type> "Animal" .
}
}
dgraph.type
is a reserved predicate and cannot be removed or modified.
Types can be used as a top level function in the query language. For example:
{
q(func: type(Animal)) {
uid
name
}
}
This query will only return nodes whose type is set to Animal
.
Types can also be used to filter results inside a query. For example:
{
q(func: has(parent)) {
uid
parent @filter(type(Person)) {
uid
name
}
}
}
This query returns the nodes that have a parent predicate and, for each of them, only the parent nodes of type Person.
Type definitions can be deleted using the Alter endpoint. All that is needed is to send an operation object with the field DropOp (or drop_op, depending on the client) set to the enum value TYPE and the field DropValue (or drop_value) set to the name of the type to be deleted.
Below is an example deleting the type Person
using the Go client:
err := c.Alter(context.Background(), &api.Operation{
DropOp: api.Operation_TYPE,
DropValue: "Person"})
Queries using [expand]({{< relref "#expand-predicates" >}}) (i.e.:
expand(_all_)
) require that the nodes to be expanded have types.
Dgraph supports facets --- key value pairs on edges --- as an extension to RDF triples. That is, facets add properties to edges, rather than to nodes.
For example, a friend
edge between two nodes may have a boolean property of close
friendship.
Facets can also be used as weights
for edges.
Although facets are often tempting, they should not be misused. It wouldn't be correct modeling to give the friend edge a facet date_of_birth; that should be an edge on the friend node. However, a facet like start_of_friendship might be appropriate. Note that facets, unlike predicates, are not first-class citizens in Dgraph.
Facet keys are strings and values can be string
, bool
, int
, float
and dateTime
.
For int
and float
, only 32-bit signed integers and 64-bit floats are accepted.
The following mutation is used throughout this section on facets. The mutation adds data for some people and, for example, records a since facet on the mobile and car edges to record when Alice started using the mobile number and bought the car.
First we add some schema.
curl localhost:8080/alter -XPOST -d $'
name: string @index(exact, term) .
rated: [uid] @reverse @count .
' | python -m json.tool | less
curl -H "Content-Type: application/rdf" localhost:8080/mutate?commitNow=true -XPOST -d $'
{
set {
# -- Facets on scalar predicates
_:alice <name> "Alice" .
_:alice <dgraph.type> "Person" .
_:alice <mobile> "040123456" (since=2006-01-02T15:04:05) .
_:alice <car> "MA0123" (since=2006-02-02T13:01:09, first=true) .
_:bob <name> "Bob" .
_:bob <dgraph.type> "Person" .
_:bob <car> "MA0134" (since=2006-02-02T13:01:09) .
_:charlie <name> "Charlie" .
_:charlie <dgraph.type> "Person" .
_:dave <name> "Dave" .
_:dave <dgraph.type> "Person" .
# -- Facets on UID predicates
_:alice <friend> _:bob (close=true, relative=false) .
_:alice <friend> _:charlie (close=false, relative=true) .
_:alice <friend> _:dave (close=true, relative=true) .
# -- Facets for variable propagation
_:movie1 <name> "Movie 1" .
_:movie1 <dgraph.type> "Movie" .
_:movie2 <name> "Movie 2" .
_:movie2 <dgraph.type> "Movie" .
_:movie3 <name> "Movie 3" .
_:movie3 <dgraph.type> "Movie" .
_:alice <rated> _:movie1 (rating=3) .
_:alice <rated> _:movie2 (rating=2) .
_:alice <rated> _:movie3 (rating=5) .
_:bob <rated> _:movie1 (rating=5) .
_:bob <rated> _:movie2 (rating=5) .
_:bob <rated> _:movie3 (rating=5) .
_:charlie <rated> _:movie1 (rating=2) .
_:charlie <rated> _:movie2 (rating=5) .
_:charlie <rated> _:movie3 (rating=1) .
}
}' | python -m json.tool | less
Querying name
, mobile
and car
of Alice gives the same result as without facets.
{{< runnable >}} { data(func: eq(name, "Alice")) { name mobile car } } {{</ runnable >}}
The syntax @facets(facet-name)
is used to query facet data. For Alice the since
facet for mobile
and car
are queried as follows.
{{< runnable >}} { data(func: eq(name, "Alice")) { name mobile @facets(since) car @facets(since) } } {{</ runnable >}}
Facets are returned at the same level as the corresponding edge and have keys like edge|facet.
All facets on an edge are queried with @facets
.
{{< runnable >}} { data(func: eq(name, "Alice")) { name mobile @facets car @facets } } {{</ runnable >}}
Facet keys and values can use language-specific characters directly when mutating. But facet keys need to be enclosed in angle brackets <> when querying. This is similar to predicates. See Predicates i18n for more info.
{{% notice "note" %}}Dgraph supports Internationalized Resource Identifiers (IRIs) for facet keys when querying.{{% /notice %}}
Example:
{
set {
_:person1 <name> "Daniel" (वंश="स्पेनी", ancestry="Español") .
_:person1 <dgraph.type> "Person" .
_:person2 <name> "Raj" (वंश="हिंदी", ancestry="हिंदी") .
_:person2 <dgraph.type> "Person" .
_:person3 <name> "Zhang Wei" (वंश="चीनी", ancestry="中文") .
_:person3 <dgraph.type> "Person" .
}
}
Query, notice the <>
's:
{
q(func: has(name)) {
name @facets(<वंश>)
}
}
An alias can be specified when requesting specific facets. The syntax is similar to how you would request an alias for other predicates. orderasc and orderdesc are not allowed as aliases because they have a special meaning. Apart from that, anything else can be set as an alias.
Here we set car_since
, close_friend
alias for since
, close
facets respectively.
{{< runnable >}}
{
data(func: eq(name, "Alice")) {
name
mobile
car @facets(car_since: since)
friend @facets(close_friend: close) {
name
}
}
}
{{</ runnable >}}
Facets on UID edges work similarly to facets on value edges.
For example, friend
is an edge with facet close
.
It was set to true for friendship between Alice and Bob
and false for friendship between Alice and Charlie.
A query for friends of Alice.
{{< runnable >}} { data(func: eq(name, "Alice")) { name friend { name } } } {{</ runnable >}}
A query for friends and the facet close
with @facets(close)
.
{{< runnable >}} { data(func: eq(name, "Alice")) { name friend @facets(close) { name } } } {{</ runnable >}}
For uid edges like friend
, facets go to the corresponding child under the key edge|facet. In the above
example you can see that the close
facet on the edge between Alice and Bob appears with the key friend|close
along with Bob's results.
{{< runnable >}} { data(func: eq(name, "Alice")) { name friend @facets { name car @facets } } } {{</ runnable >}}
Bob has a car
and it has a facet since
, which, in the results, is part of the same object as Bob
under the key car|since.
Also, the close
relationship between Bob and Alice is part of Bob's output object.
Charlie does not have car
edge and thus only UID facets.
Dgraph supports filtering edges based on facets. Filtering works similarly to how it works on edges without facets and has the same available functions.
Find Alice's close friends {{< runnable >}} { data(func: eq(name, "Alice")) { friend @facets(eq(close, true)) { name } } } {{</ runnable >}}
To return facets as well as filter, add another @facets(<facetname>)
to the query.
{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
    # filter close friends and give relative status
    friend @facets(eq(close, true)) @facets(relative) {
      name
    }
  }
}
{{</ runnable >}}
Facet queries can be composed with AND
, OR
and NOT
.
{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
    # filter close friends in my relation
    friend @facets(eq(close, true) AND eq(relative, true)) @facets(relative) {
      name
    }
  }
}
{{</ runnable >}}
Sorting is possible for a facet on a uid edge. Here we sort the movies rated by Alice, Bob and
Charlie by their rating
which is a facet.
{{< runnable >}} { me(func: anyofterms(name, "Alice Bob Charlie")) { name rated @facets(orderdesc: rating) { name } } } {{</ runnable >}}
Facets on UID edges can be stored in [value variables]({{< relref "#value-variables" >}}). The variable is a map from the edge target to the facet value.
Alice's friends reported by variables for close
and relative
.
{{< runnable >}}
{
var(func: eq(name, "Alice")) {
friend @facets(a as close, b as relative)
}
friend(func: uid(a)) { name val(a) }
relative(func: uid(b)) { name val(b) } } {{</ runnable >}}
Facet values of int
and float
can be assigned to variables and thus the [values propagate]({{< relref "#variable-propagation" >}}).
Alice, Bob and Charlie each rated every movie. A value variable on facet rating
maps movies to ratings. A query that reaches a movie through multiple paths sums the ratings on each path. The following sums Alice, Bob and Charlie's ratings for the three movies.
{{< runnable >}}
{
  var(func: anyofterms(name, "Alice Bob Charlie")) {
    num_raters as math(1)
    rated @facets(r as rating) {
      total_rating as math(r) # sum of the 3 ratings
      average_rating as math(total_rating / num_raters)
    }
  }
  data(func: uid(total_rating)) {
    name
    val(total_rating)
    val(average_rating)
  }
}
{{</ runnable >}}
Facet values assigned to value variables can be aggregated.
{{< runnable >}} { data(func: eq(name, "Alice")) { name rated @facets(r as rating) { name } avg(val(r)) } } {{</ runnable >}}
Note though that r
is a map from movies to the sum of ratings on edges in the query reaching the movie. Hence, the following does not correctly calculate the average ratings for Alice and Bob individually --- it calculates 2 times the average of both Alice and Bob's ratings.
{{< runnable >}}
{ data(func: anyofterms(name, "Alice Bob")) { name rated @facets(r as rating) { name } avg(val(r)) } } {{</ runnable >}}
Calculating the average ratings of users requires a variable that maps users to the sum of their ratings.
{{< runnable >}}
{ var(func: has(rated)) { num_rated as math(1) rated @facets(r as rating) { avg_rating as math(r / num_rated) } }
data(func: uid(avg_rating)) { name val(avg_rating) } } {{</ runnable >}}
The shortest path between a source (from
) node and destination (to
) node can be found using the keyword shortest
for the query block name. It requires the source node UID, destination node UID and the predicates (at least one) that have to be considered for traversal. A shortest
query block returns the shortest path under _path_
in the query response. The path can also be stored in a variable which is used in other query blocks.
By default the shortest path is returned. With numpaths: k
, the k-shortest paths are returned. With depth: n
, the shortest paths up to n
hops away are returned.
{{% notice "note" %}}
- If no predicates are specified in the shortest block, no path can be fetched as no edge is traversed.
- If you're seeing queries take a long time, you can set a gRPC deadline to stop the query after a certain amount of time.
{{% /notice %}}
For example:
curl localhost:8080/alter -XPOST -d $'
name: string @index(exact) .
' | python -m json.tool | less
curl -H "Content-Type: application/rdf" localhost:8080/mutate?commitNow=true -XPOST -d $'
{
set {
_:a <friend> _:b (weight=0.1) .
_:b <friend> _:c (weight=0.2) .
_:c <friend> _:d (weight=0.3) .
_:a <friend> _:d (weight=1) .
_:a <name> "Alice" .
_:a <dgraph.type> "Person" .
_:b <name> "Bob" .
_:b <dgraph.type> "Person" .
_:c <name> "Tom" .
_:c <dgraph.type> "Person" .
_:d <name> "Mallory" .
_:d <dgraph.type> "Person" .
}
}' | python -m json.tool | less
The shortest path between Alice and Mallory (assuming UIDs 0x2 and 0x5 respectively) can be found with query:
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
path as shortest(from: 0x2, to: 0x5) {
friend
}
path(func: uid(path)) {
name
}
}' | python -m json.tool | less
Which returns the following results. (Note: when the weight facet is not used, each edge's weight is considered to be 1.)
{
"data": {
"path": [
{
"name": "Alice"
},
{
"name": "Mallory"
}
],
"_path_": [
{
"uid": "0x2",
"friend": [
{
"uid": "0x5"
}
]
}
]
}
}
We can return more paths by specifying numpaths
. Setting numpaths: 2
returns the shortest two paths:
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
A as var(func: eq(name, "Alice"))
M as var(func: eq(name, "Mallory"))
path as shortest(from: uid(A), to: uid(M), numpaths: 2) {
friend
}
path(func: uid(path)) {
name
}
}' | python -m json.tool | less
{{% notice "note" %}}In the query above, instead of using UID literals, we query both people using var blocks and the uid()
function. You can also combine it with [GraphQL Variables]({{< relref "#graphql-variables" >}}).{{% /notice %}}
Edge weights are included by using facets on the edges as follows.
{{% notice "note" %}}Only one facet per predicate is allowed in the shortest query block.{{% /notice %}}
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
path as shortest(from: 0x2, to: 0x5) {
friend @facets(weight)
}
path(func: uid(path)) {
name
}
}' | python -m json.tool | less
{
"data": {
"path": [
{
"name": "Alice"
},
{
"name": "Bob"
},
{
"name": "Tom"
},
{
"name": "Mallory"
}
],
"_path_": [
{
"uid": "0x2",
"friend": [
{
"uid": "0x3",
"friend|weight": 0.1,
"friend": [
{
"uid": "0x4",
"friend|weight": 0.2,
"friend": [
{
"uid": "0x5",
"friend|weight": 0.3
}
]
}
]
}
]
}
]
}
}
Constraints can be applied to the intermediate nodes as follows.
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
path as shortest(from: 0x2, to: 0x5) {
friend @filter(not eq(name, "Bob")) @facets(weight)
relative @facets(liking)
}
relationship(func: uid(path)) {
name
}
}' | python -m json.tool | less
The k-shortest path algorithm (used when numpaths > 1) also accepts the arguments minweight and maxweight, which take a float as their value. When they are passed, only paths within the weight range [minweight, maxweight] will be considered as valid paths. Because each edge counts as weight 1 when no facet is given, this can be used, for example, to query the shortest paths that traverse between 2 and 4 edges.
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
path as shortest(from: 0x2, to: 0x5, numpaths: 2, minweight: 2, maxweight: 4) {
friend
}
path(func: uid(path)) {
name
}
}' | python -m json.tool | less
Some points to keep in mind for shortest path queries:
- Weights must be non-negative. Dijkstra's algorithm is used to calculate the shortest paths.
- Only one facet per predicate in the shortest query block is allowed.
- Only one shortest path block is allowed per query. Only one _path_ is returned in the result.
- For k-shortest paths (when numpaths > 1), the result of the shortest path query variable will only return a single path. All k paths are returned in _path_.
Recurse
queries let you traverse a set of predicates (with filter, facets, etc.) until we reach all leaf nodes or we reach the maximum depth which is specified by the depth
parameter.
To get 10 movies from a genre that has more than 30000 films and then get two actors for those movies, we would use a query like the following:
{{< runnable >}}
{
  me(func: gt(count(~genre), 30000), first: 1) @recurse(depth: 5, loop: true) {
    name@en
    ~genre (first:10) @filter(gt(count(starring), 2))
    starring (first: 2)
    performance.actor
  }
}
{{< /runnable >}}
Some points to keep in mind while using recurse queries are:
- You can specify only one level of predicates after root. These would be traversed recursively. Both scalar and entity-nodes are treated similarly.
- Only one recurse block is advised per query.
- Be careful as the result size could explode quickly and an error would be returned if the result set gets too large. In such cases use more filters, limit results using pagination, or provide a depth parameter at root as shown in the example above.
- The loop parameter can be set to false, in which case paths which lead to a loop would be ignored while traversing.
- If not specified, the value of the loop parameter defaults to false.
- If the value of the loop parameter is false and depth is not specified, depth will default to math.MaxUint64, which means that the entire graph might be traversed until all the leaf nodes are reached.
The fragment keyword allows you to define new fragments that can be referenced in a query, as per the GraphQL specification. If multiple parts of a query select the same set of fields, you can define a fragment and refer to it multiple times instead. Fragments can be nested inside fragments, but no cycles are allowed. Here is one contrived example.
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'
query {
debug(func: uid(1)) {
name@en
...TestFrag
}
}
fragment TestFrag {
initial_release_date
...TestFragB
}
fragment TestFragB {
country
}' | python -m json.tool | less
Variables
can be defined and used in queries which helps in query reuse and avoids costly string building in clients at runtime by passing a separate variable map. A variable starts with a $
symbol.
For HTTP requests with GraphQL Variables, we must use Content-Type: application/json
header and pass data with a JSON object containing query
and variables
.
curl -H "Content-Type: application/json" localhost:8080/query -XPOST -d $'{
"query": "query test($a: string) { test(func: eq(name, $a)) { \n uid \n name \n } }",
"variables": { "$a": "Alice" }
}' | python -m json.tool | less
{{< runnable vars="{"$a": "5", "$b": "10", "$name": "Steven Spielberg"}" >}} query test($a: int, $b: int, $name: string) { me(func: allofterms(name@en, $name)) { name@en director.film (first: $a, offset: $b) { name @en genre(first: $a) { name@en } } } } {{< /runnable >}}
- Variables can have default values. In the example below, $a has a default value of 2. Since the value for $a isn't provided in the variable map, $a takes on the default value.
- Variables whose type is suffixed with a ! can't have a default value but must have a value as part of the variables map.
- The value of the variable must be parsable to the given type; if not, an error is thrown.
- The variable types that are supported as of now are: int, float, bool and string.
- Any variable that is being used must be declared in the named query clause in the beginning.
{{< runnable vars="{"$b": "10", "$name": "Steven Spielberg"}" >}} query test($a: int = 2, $b: int!, $name: string) { me(func: allofterms(name@en, $name)) { director.film (first: $a, offset: $b) { genre(first: $a) { name@en } } } } {{< /runnable >}}
You can also use arrays with GraphQL Variables.
{{< runnable vars="{"$b": "10", "$aName": "Steven Spielberg", "$bName": "Quentin Tarantino"}" >}} query test($a: int = 2, $b: int!, $aName: string, $bName: string) { me(func: eq(name@en, [$aName, $bName])) { director.film (first: $a, offset: $b) { genre(first: $a) { name@en } } } } {{< /runnable >}}
Variable substitution is also supported in facets.
{{< runnable vars="{"$name": "Alice"}" >}}
query test($name: string = "Alice") {
  data(func: eq(name, $name)) {
    friend @facets(eq(close, true)) {
      name
    }
  }
}
{{</ runnable >}}
{{% notice "note" %}}
If you want to input a list of uids as a GraphQL variable value, you can have the variable as string type and
have the value surrounded by square brackets like ["13", "14"]
.
{{% /notice %}}
Dgraph comes with a large toolkit of builtin indexes, but for niche use cases they're not always enough.
Dgraph allows you to implement custom tokenizers via a plugin system in order to fill the gaps.
The plugin system uses Go's pkg/plugin
.
This brings some restrictions to how plugins can be used.
- Plugins must be written in Go.
- As of Go 1.9, pkg/plugin only works on Linux. Therefore, plugins will only work on Dgraph instances deployed in a Linux environment.
- The version of Go used to compile the plugin should be the same as the version of Go used to compile Dgraph itself. Dgraph always uses the latest version of Go (and so should you!).
{{% notice "note" %}} You should consider Go's plugin documentation to be supplementary to the documentation provided here. {{% /notice %}}
Plugins are implemented as their own main package. They must export a particular symbol that allows Dgraph to hook into the custom logic the plugin provides.
The plugin must export a symbol named Tokenizer
. The type of the symbol must
be func() interface{}
. When the function is called the result returned should
be a value that implements the following interface:
type PluginTokenizer interface {
// Name is the name of the tokenizer. It should be unique among all
// builtin tokenizers and other custom tokenizers. It identifies the
// tokenizer when an index is set in the schema and when search/filter
// is used in queries.
Name() string
// Identifier is a byte that uniquely identifies the tokenizer.
// Bytes in the range 0x80 to 0xff (inclusive) are reserved for
// custom tokenizers.
Identifier() byte
// Type is a string representing the type of data that is to be
// tokenized. This must match the schema type of the predicate
// being indexed. Allowable values are shown in the table below.
Type() string
// Tokens should implement the tokenization logic. The input is
// the value to be tokenized, and will always have a concrete type
// corresponding to Type(). The return value should be a list of
// the tokens generated.
Tokens(interface{}) ([]string, error)
}
The return value of Type()
corresponds to the concrete input type of
Tokens(interface{})
in the following way:
Type() return value | Tokens(interface{}) input type |
---|---|
"int" | int64 |
"float" | float64 |
"string" | string |
"bool" | bool |
"datetime" | time.Time |
The plugin has to be built using the plugin
build mode so that an .so
file
is produced instead of a regular executable. For example:
go build -buildmode=plugin -o myplugin.so ~/go/src/myplugin/main.go
When starting Dgraph, use the --custom_tokenizers
flag to tell Dgraph which
tokenizers to load. It accepts a comma separated list of plugins. E.g.
dgraph ...other-args... --custom_tokenizers=plugin1.so,plugin2.so
{{% notice "note" %}} Plugin validation is performed on startup. If a problem is detected, Dgraph will refuse to initialise. {{% /notice %}}
To use a tokenization plugin, an index has to be created in the schema.
The syntax is the same as adding any built-in index. To add a custom index
using a tokenizer plugin named foo
to a string
predicate named
my_predicate
, use the following in the schema:
my_predicate: string @index(foo) .
There are two functions that can use custom indexes:
Mode | Behaviour |
---|---|
anyof |
Returns nodes that match on any of the tokens generated |
allof |
Returns nodes that match on all of the tokens generated |
The functions can be used either at the query root or in filters.
Their behaviour is analogous to anyofterms/allofterms and anyoftext/alloftext.
The following examples should make the process of writing a tokenization plugin more concrete.
This example shows the type of tokenization that is similar to term tokenization of full-text search. Instead of being broken down into terms or stem words, the text is instead broken down into its constituent unicode codepoints (in Go terminology these are called runes).
{{% notice "note" %}} This tokenizer would create a very large index that would be expensive to manage and store. That's one of the reasons that text indexing usually occurs at a higher level; stem words for full-text search or terms for term search. {{% /notice %}}
The implementation of the plugin looks like this:
package main
import "encoding/binary"
func Tokenizer() interface{} { return RuneTokenizer{} }
type RuneTokenizer struct{}
func (RuneTokenizer) Name() string { return "rune" }
func (RuneTokenizer) Type() string { return "string" }
func (RuneTokenizer) Identifier() byte { return 0xfd }
func (t RuneTokenizer) Tokens(value interface{}) ([]string, error) {
var toks []string
for _, r := range value.(string) {
var buf [binary.MaxVarintLen32]byte
n := binary.PutVarint(buf[:], int64(r))
tok := string(buf[:n])
toks = append(toks, tok)
}
return toks, nil
}
Hints and tips:
- Inside Tokens, you can assume that value will have the concrete type corresponding to that specified by Type(). It's safe to do a type assertion.
- Even though the return value is []string, you can always store non-unicode data inside the string. See this blogpost for some interesting background on how strings are implemented in Go and why they can be used to store non-textual data. By storing arbitrary data in the string, you can make the index more compact. In this case, varints are stored in the return values.
Setting up the indexing and adding data:
name: string @index(rune) .
{
set{
_:ad <name> "Adam" .
_:ad <dgraph.type> "Person" .
_:aa <name> "Aaron" .
_:aa <dgraph.type> "Person" .
_:am <name> "Amy" .
_:am <dgraph.type> "Person" .
_:ro <name> "Ronald" .
_:ro <dgraph.type> "Person" .
}
}
Now queries can be performed.
The only person that has all of the runes A
and n
in their name
is Aaron:
{
q(func: allof(name, rune, "An")) {
name
}
}
=>
{
"data": {
"q": [
{ "name": "Aaron" }
]
}
}
But there are multiple people who have both of the runes A
and m
:
{
q(func: allof(name, rune, "Am")) {
name
}
}
=>
{
"data": {
"q": [
{ "name": "Amy" },
{ "name": "Adam" }
]
}
}
Case is taken into account, so if you search for all names containing "ron"
,
you would find "Aaron"
, but not "Ronald"
. But if you were to search for
"no"
, you would match both "Aaron"
and "Ronald"
. The order of the runes in
the strings doesn't matter.
It's possible to search for people that have any of the supplied runes in
their names (rather than all of the supplied runes). To do this, use anyof
instead of allof
:
{
q(func: anyof(name, rune, "mr")) {
name
}
}
=>
{
"data": {
"q": [
{ "name": "Adam" },
{ "name": "Aaron" },
{ "name": "Amy" }
]
}
}
"Ronald"
doesn't contain m
or r
, so isn't found by the search.
{{% notice "note" %}}
Understanding what's going on under the hood can help you intuitively understand how the Tokens method should be implemented.
When Dgraph sees new edges that are to be indexed by your tokenizer, it will tokenize the value. The resultant tokens are used as keys for posting lists. The edge subject is then added to the posting list for each token.
When a query root search occurs, the search value is tokenized. The result of
the search is all of the nodes in the union or intersection of the corresponding
posting lists (depending on whether anyof
or allof
was used).
{{% /notice %}}
Tokenizers don't always have to be about splitting text up into its constituent parts. This example indexes IP addresses into their CIDR ranges. This allows you to search for all IP addresses that fall into a particular CIDR range.
The plugin code is more complicated than the rune example. The input is an IP address stored as a string, e.g. "100.55.22.11/32". The output is the set of CIDR ranges that the IP address could possibly fall into. There could be up to 32 different outputs ("100.55.22.11/32" does indeed have 32 possible ranges, one for each mask size).
package main
import "net"
func Tokenizer() interface{} { return CIDRTokenizer{} }
type CIDRTokenizer struct{}
func (CIDRTokenizer) Name() string { return "cidr" }
func (CIDRTokenizer) Type() string { return "string" }
func (CIDRTokenizer) Identifier() byte { return 0xff }
func (t CIDRTokenizer) Tokens(value interface{}) ([]string, error) {
_, ipnet, err := net.ParseCIDR(value.(string))
if err != nil {
return nil, err
}
ones, bits := ipnet.Mask.Size()
var toks []string
for i := ones; i >= 1; i-- {
m := net.CIDRMask(i, bits)
tok := net.IPNet{
IP: ipnet.IP.Mask(m),
Mask: m,
}
toks = append(toks, tok.String())
}
return toks, nil
}
An example of using the tokenizer:
Setting up the indexing and adding data:
ip: string @index(cidr) .
{
set{
_:a <ip> "100.55.22.11/32" .
_:b <ip> "100.33.81.19/32" .
_:c <ip> "100.49.21.25/32" .
_:d <ip> "101.0.0.5/32" .
_:e <ip> "100.176.2.1/32" .
}
}
{
q(func: allof(ip, cidr, "100.48.0.0/12")) {
ip
}
}
=>
{
"data": {
"q": [
{ "ip": "100.55.22.11/32" },
{ "ip": "100.49.21.25/32" }
]
}
}
The 12 bit CIDR range of both 100.55.22.11/32 and 100.49.21.25/32 is 100.48.0.0/12. The other IP addresses in the database aren't included in the search result, since they have different CIDR ranges for 12 bit masks (100.32.0.0/12, 101.0.0.0/12, 100.176.0.0/12 for 100.33.81.19/32, 101.0.0.5/32, and 100.176.2.1/32 respectively).
Note that we're using allof instead of anyof. Only allof will work correctly with this index. Remember that the tokenizer generates all possible CIDR ranges for an IP address. If we were to use anyof then the search result would include all IP addresses under the 1 bit mask (in this case, 0.0.0.0/1, which would match all IPs in this dataset).
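The difference between allof and anyof for this index can be checked with a small standalone sketch. It copies the token generation from the Tokens method above into a plain function; `cidrTokens` and `matches` are hypothetical helper names used only for this illustration, and the matching is done in memory rather than through Dgraph.

```go
package main

import (
	"fmt"
	"net"
)

// cidrTokens mirrors the Tokens method above as a plain function:
// one token per mask size, from the input's own mask down to /1.
func cidrTokens(value string) ([]string, error) {
	_, ipnet, err := net.ParseCIDR(value)
	if err != nil {
		return nil, err
	}
	ones, bits := ipnet.Mask.Size()
	var toks []string
	for i := ones; i >= 1; i-- {
		m := net.CIDRMask(i, bits)
		tok := net.IPNet{IP: ipnet.IP.Mask(m), Mask: m}
		toks = append(toks, tok.String())
	}
	return toks, nil
}

// matches reports whether a stored value would be returned for a query:
// with all set, every query token must be among the stored tokens (allof);
// otherwise a single shared token is enough (anyof).
func matches(stored, query string, all bool) (bool, error) {
	st, err := cidrTokens(stored)
	if err != nil {
		return false, err
	}
	qt, err := cidrTokens(query)
	if err != nil {
		return false, err
	}
	have := map[string]bool{}
	for _, t := range st {
		have[t] = true
	}
	shared := 0
	for _, t := range qt {
		if have[t] {
			shared++
		}
	}
	if all {
		return shared == len(qt), nil
	}
	return shared > 0, nil
}

func main() {
	query := "100.48.0.0/12"
	stored := []string{
		"100.55.22.11/32", "100.33.81.19/32", "100.49.21.25/32",
		"101.0.0.5/32", "100.176.2.1/32",
	}
	for _, ip := range stored {
		all, _ := matches(ip, query, true)
		some, _ := matches(ip, query, false)
		fmt.Printf("%-16s allof=%-5v anyof=%v\n", ip, all, some)
	}
}
```

Under these assumptions, only 100.55.22.11/32 and 100.49.21.25/32 satisfy allof, while every address satisfies anyof because they all share the 0.0.0.0/1 token with the search value.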
Tokenizers don't always have to return multiple tokens. If you just want to index data into groups, have the tokenizer just return an identifying member of that group.
In this example, we want to find groups of words that are anagrams of each other.
A token that identifies a group of anagrams can simply be the letters of the word in sorted order, as implemented below:
package main

import "sort"

func Tokenizer() interface{} { return AnagramTokenizer{} }

type AnagramTokenizer struct{}

func (AnagramTokenizer) Name() string     { return "anagram" }
func (AnagramTokenizer) Type() string     { return "string" }
func (AnagramTokenizer) Identifier() byte { return 0xfc }

// Tokens returns a single token: the bytes of the input in sorted order.
// All anagrams of a word therefore share the same token.
func (t AnagramTokenizer) Tokens(value interface{}) ([]string, error) {
	b := []byte(value.(string))
	sort.Slice(b, func(i, j int) bool { return b[i] < b[j] })
	return []string{string(b)}, nil
}
In action:
Setting up the indexing and adding data:
word: string @index(anagram) .
{
set{
_:1 <word> "airmen" .
_:2 <word> "marine" .
_:3 <word> "beat" .
_:4 <word> "beta" .
_:5 <word> "race" .
_:6 <word> "care" .
}
}
{
q(func: allof(word, anagram, "remain")) {
word
}
}
=>
{
"data": {
"q": [
{ "word": "airmen" },
{ "word": "marine" }
]
}
}
Since only a single token is ever generated, it doesn't matter whether anyof or allof is used. The result will always be the same.
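As a quick check of why the search for "remain" returns "airmen" and "marine", the sketch below computes the token for a few of the words used above. `anagramToken` is a hypothetical helper that repeats the sorting step of the tokenizer outside of the plugin interface.

```go
package main

import (
	"fmt"
	"sort"
)

// anagramToken returns the bytes of word in sorted order,
// the same value the tokenizer above emits as its only token.
func anagramToken(word string) string {
	b := []byte(word)
	sort.Slice(b, func(i, j int) bool { return b[i] < b[j] })
	return string(b)
}

func main() {
	for _, w := range []string{"remain", "airmen", "marine", "beat", "race"} {
		fmt.Printf("%-6s -> %s\n", w, anagramToken(w))
	}
}
```

The first three words all map to "aeimnr", so they land in the same posting list, while "beat" and "race" map to "abet" and "acer" respectively.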
All of the custom tokenizers shown previously have worked with strings. However, other data types can be used as well. This example is contrived, but nonetheless shows some advanced usages of custom tokenizers.
The tokenizer creates a token for each prime factor in the input.
package main

import (
	"encoding/binary"
	"fmt"
)

func Tokenizer() interface{} { return FactorTokenizer{} }

type FactorTokenizer struct{}

func (FactorTokenizer) Name() string     { return "factor" }
func (FactorTokenizer) Type() string     { return "int" }
func (FactorTokenizer) Identifier() byte { return 0xfe }

// Tokens returns one token for each distinct prime factor of the input.
func (FactorTokenizer) Tokens(value interface{}) ([]string, error) {
	x := value.(int64)
	if x <= 1 {
		return nil, fmt.Errorf("Cannot factor int <= 1: %d", x)
	}
	var toks []string
	// Trial division: each p that divides x at this point is prime,
	// because all smaller factors have already been divided out.
	for p := int64(2); x > 1; p++ {
		if x%p == 0 {
			toks = append(toks, encodeInt(p))
			for x%p == 0 {
				x /= p
			}
		}
	}
	return toks, nil
}

// encodeInt encodes x as a varint and returns the bytes as a string.
func encodeInt(x int64) string {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutVarint(buf[:], x)
	return string(buf[:n])
}
{{% notice "note" %}}
Notice that the return of Type() is "int", corresponding to the concrete type of the input to Tokens (which is int64).
{{% /notice %}}
This allows you to do things like search for all numbers that share prime factors with a particular number.
In particular, we search for numbers that contain any of the prime factors of 15, i.e. any numbers that are divisible by either 3 or 5.
Setting up the indexing and adding data:
num: int @index(factor) .
{
set{
_:2 <num> "2"^^<xs:int> .
_:3 <num> "3"^^<xs:int> .
_:4 <num> "4"^^<xs:int> .
_:5 <num> "5"^^<xs:int> .
_:6 <num> "6"^^<xs:int> .
_:7 <num> "7"^^<xs:int> .
_:8 <num> "8"^^<xs:int> .
_:9 <num> "9"^^<xs:int> .
_:10 <num> "10"^^<xs:int> .
_:11 <num> "11"^^<xs:int> .
_:12 <num> "12"^^<xs:int> .
_:13 <num> "13"^^<xs:int> .
_:14 <num> "14"^^<xs:int> .
_:15 <num> "15"^^<xs:int> .
_:16 <num> "16"^^<xs:int> .
_:17 <num> "17"^^<xs:int> .
_:18 <num> "18"^^<xs:int> .
_:19 <num> "19"^^<xs:int> .
_:20 <num> "20"^^<xs:int> .
_:21 <num> "21"^^<xs:int> .
_:22 <num> "22"^^<xs:int> .
_:23 <num> "23"^^<xs:int> .
_:24 <num> "24"^^<xs:int> .
_:25 <num> "25"^^<xs:int> .
_:26 <num> "26"^^<xs:int> .
_:27 <num> "27"^^<xs:int> .
_:28 <num> "28"^^<xs:int> .
_:29 <num> "29"^^<xs:int> .
_:30 <num> "30"^^<xs:int> .
}
}
{
q(func: anyof(num, factor, 15)) {
num
}
}
=>
{
"data": {
"q": [
{ "num": 3 },
{ "num": 5 },
{ "num": 6 },
{ "num": 9 },
{ "num": 10 },
{ "num": 12 },
{ "num": 15 },
{ "num": 18 },
{ "num": 20 },
{ "num": 21 },
{ "num": 25 },
{ "num": 24 },
{ "num": 27 },
{ "num": 30 }
]
}
}
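The result above can be reproduced outside of Dgraph with a short sketch. It reuses the trial-division loop from the tokenizer, but returns the factors as integers instead of varint-encoded strings; `primeFactors` and `sharesFactor` are hypothetical helper names introduced only for this illustration.

```go
package main

import "fmt"

// primeFactors returns the distinct prime factors of x using the same
// trial-division loop as the tokenizer above. It returns nil for x <= 1.
func primeFactors(x int64) []int64 {
	var fs []int64
	for p := int64(2); x > 1; p++ {
		if x%p == 0 {
			fs = append(fs, p)
			for x%p == 0 {
				x /= p
			}
		}
	}
	return fs
}

// sharesFactor reports whether a and b have at least one prime factor in
// common, which is what an anyof match on the factor index amounts to.
func sharesFactor(a, b int64) bool {
	seen := map[int64]bool{}
	for _, p := range primeFactors(a) {
		seen[p] = true
	}
	for _, p := range primeFactors(b) {
		if seen[p] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(primeFactors(15)) // [3 5]
	var hits []int64
	for n := int64(2); n <= 30; n++ {
		if sharesFactor(n, 15) {
			hits = append(hits, n)
		}
	}
	fmt.Println(hits) // [3 5 6 9 10 12 15 18 20 21 24 25 27 30]
}
```

The 14 numbers printed are the same set as in the query result, although the sketch lists them in ascending order rather than in the order the query returned them.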