diff --git a/content/dql/predicate-indexing.md b/content/dql/predicate-indexing.md new file mode 100644 index 00000000..db203408 --- /dev/null +++ b/content/dql/predicate-indexing.md @@ -0,0 +1,217 @@ ++++ +date = "2017-03-20T22:25:17+11:00" +title = "Predicate indexing" +weight = 4 +[menu.main] + parent = "dql" ++++ + +Filtering on a predicate by applying a [function]({{< relref "query-language/functions.md" >}}) requires an index. + +When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient. + +All scalar types can be indexed. + +Types `int`, `float`, `bool` and `geo` have only a default index each: with tokenizers named `int`, `float`, `bool` and `geo`. + +Types `string` and `dateTime` have a number of indices. + +## String Indices +The indices available for strings are as follows. + +| Dgraph function | Required index / tokenizer | Notes | +| :----------------------- | :------------ | :--- | +| `eq` | `hash`, `exact`, `term`, or `fulltext` | The most performant index for `eq` is `hash`. Only use `term` or `fulltext` if you also require term or full-text search. If you're already using `term`, there is no need to use `hash` or `exact` as well. | +| `le`, `ge`, `lt`, `gt` | `exact` | Allows faster sorting. | +| `allofterms`, `anyofterms` | `term` | Allows searching by a term in a sentence. | +| `alloftext`, `anyoftext` | `fulltext` | Matching with language specific stemming and stopwords. | +| `regexp` | `trigram` | Regular expression matching. Can also be used for equality checking. | + +{{% notice "warning" %}} +Incorrect index choice can impose performance penalties and an increased +transaction conflict rate. Use only the minimum number of and simplest indexes +that your application needs. +{{% /notice %}} + +## DateTime Indices + +The indices available for `dateTime` are as follows. 
+ +| Index name / Tokenizer | Part of date indexed | +| :----------- | :------------------------------------------------------------------ | +| `year` | index on year (default) | +| `month` | index on year and month | +| `day` | index on year, month and day | +| `hour` | index on year, month, day and hour | + +The choices of `dateTime` index allow selecting the precision of the index. Applications, such as the movies examples in these docs, that require searching over dates but have relatively few nodes per year may prefer the `year` tokenizer; applications that are dependent on fine-grained date searches, such as real-time sensor readings, may prefer the `hour` index. + + +All the `dateTime` indices are sortable. + + +## Sortable Indices + +Not all the indices establish a total order among the values that they index. Sortable indices allow inequality functions and sorting. + +* Indexes `int` and `float` are sortable. +* `string` index `exact` is sortable. +* All `dateTime` indices are sortable. + +For example, given an edge `name` of `string` type, to sort by `name` or perform inequality filtering on names, the `exact` index must have been specified. In that case, a schema query returns at least the following tokenizers. + +``` +{ + "predicate": "name", + "type": "string", + "index": true, + "tokenizer": [ + "exact" + ] +} +``` + +## Count index + +For predicates with the `@count` directive, Dgraph indexes the number of edges out of each node. This enables fast queries of the form: +``` +{ + q(func: gt(count(pred), threshold)) { + ... + } +} +``` + +## List Type + +Predicates with scalar types can also store a list of values if specified in the schema. The scalar +type needs to be enclosed within `[]` to indicate that it's a list type. + +``` +occupations: [string] . +score: [int] . +``` + +* A set operation adds to the list of values. The order of the stored values is non-deterministic. +* A delete operation deletes the value from the list. 
+* Querying for these predicates would return the list in an array. +* Indexes can be applied on predicates which have a list type and you can use [Functions]({{}}) on them. +* Sorting is not allowed using these predicates. +* These lists are like an unordered set. For example: `["e1", "e1", "e2"]` may get stored as `["e2", "e1"]`, i.e., duplicate values will not be stored and order may not be preserved. + +## Filtering on list + +Dgraph supports filtering based on the list. +Filtering works similarly to how it works on edges and has the same available functions. + +For example, `@filter(eq(occupations, "Teacher"))` at the root of the query or the +parent edge will display all the occupations from a list of each node in an array but +will only include nodes which have `Teacher` as one of the occupations. However, filtering +on a value edge is not supported. + +## Reverse Edges + +A graph edge is unidirectional. For node-node edges, sometimes modeling requires reverse edges. If only some subject-predicate-object triples have a reverse, these must be manually added. But if a predicate always has a reverse, Dgraph computes the reverse edges if `@reverse` is specified in the schema. + +The reverse edge of `anEdge` is `~anEdge`. + +For existing data, Dgraph computes all reverse edges. For data added after the schema mutation, Dgraph computes and stores the reverse edge for each added triple. + +``` +type Person { + name string +} +type Car { + regnbr string + owner Person +} +owner: uid @reverse . +regnbr: string @index(exact) . +name: string @index(exact) . +``` + +This makes it possible to query Persons and their cars by using: +``` +q(func: type(Person)) { + name + ~owner { name } +} +``` +To get a different key than `~owner` in the result, the query can be written with the wanted label +(`cars` in this case): + +``` +q(func: type(Person)) { + name + cars: ~owner { name } +} +``` + +This also works if there are multiple "owners" of a `car`: +``` +owner: [uid] @reverse . 
+``` + +In both cases the `owner` edge should be set on the `Car`: +``` +_:p1 <name> "Mary" . +_:p1 <dgraph.type> "Person" . +_:c1 <regnbr> "ABC123" . +_:c1 <dgraph.type> "Car" . +_:c1 <owner> _:p1 . +``` + +## Querying Schema + +A schema query queries for the whole schema: + +``` +schema {} +``` + +{{% notice "note" %}} Unlike regular queries, the schema query is not surrounded +by curly braces. Also, schema queries and regular queries cannot be combined. +{{% /notice %}} + +You can query for particular schema fields in the query body. + +``` +schema { + type + index + reverse + tokenizer + list + count + upsert + lang +} +``` + +You can also query for particular predicates: + +``` +schema(pred: [name, friend]) { + type + index + reverse + tokenizer + list + count + upsert + lang +} +``` + +{{% notice "note" %}} If ACL is enabled, then the schema query returns only the +predicates for which the logged-in ACL user has read access. {{% /notice %}} + +Types can also be queried. Below are some example queries. + +``` +schema(type: Movie) {} +schema(type: [Person, Animal]) {} +``` + +Note that type queries do not contain anything between the curly braces. The +output will be the entire definition of the requested types. diff --git a/content/dql/predicate-types.md b/content/dql/predicate-types.md index 0c8788ec..254d6242 100644 --- a/content/dql/predicate-types.md +++ b/content/dql/predicate-types.md @@ -6,26 +6,25 @@ weight = 3 parent = "dql" +++ -For each predicate, the schema specifies the target's type. If a predicate `p` has type `T`, then for all subject-predicate-object triples `s p o` the object `o` is of schema type `T`. +A predicate is the smallest piece of information about an object. A predicate can hold a literal value or can describe a relation to another entity: +- when we store that an entity's name is "Alice", the predicate is ``name`` and the predicate value is the string "Alice". +- when we store that Alice knows Bob, we may use a predicate ``knows`` with the node representing Alice. 
The value of this predicate would be the [uid]{{}}) of the node representing Bob. In that case, ``knows`` is a [relationship](#relationship). -* On mutations, scalar types are checked and an error thrown if the value cannot be converted to the schema type. -* On query, value results are returned according to the schema type of the predicate. +Dgraph maintains a list of all predicate names and their types in the Dgraph schema. -If a schema type isn't specified before a mutation adds triples for a predicate, then the type is inferred from the first mutation. This type is either: +A predicate type is either created +- by an alter operation (see [Update Dgraph types]({{}})) +or +- during a mutation: + If a predicate type isn't specified, then the type is inferred from the first mutation. + If the mutation is using [RDF format]({{}}) with an RDF type, Dgraph uses this information to infer the predicate type. -* type `uid`, if the first mutation for the predicate has nodes for the subject and object, or -* derived from the [RDF type]({{< relref "#rdf-types" >}}), if the object is a literal and an RDF type is present in the first mutation, or +If no type can be inferred, the predicate type is set to `default`. -* `default` type, otherwise. - -## Schema Types - -Dgraph supports scalar types and the UID type. - -### Scalar Types +## Scalar Types For all triples with a predicate of scalar types the object is a literal. @@ -45,119 +44,15 @@ For all triples with a predicate of scalar types the object is a literal. are RFC 3339 compatible which is different from ISO 8601(as defined in the RDF spec). You should convert your values to RFC 3339 format before sending them to Dgraph.{{% /notice %}} -### UID Type +## UID Type -The `uid` type denotes a node-node edge; internally each node is represented as a `uint64` id. +The `uid` type denotes a relationship; internally each node is identified by its UID, which is a `uint64`. 
| Dgraph Type | Go type | | ------------|:--------| | `uid` | uint64 | -## Adding or Modifying Schema - -Schema mutations add or modify schema. - -Multiple scalar values can also be added for a `S P` by specifying the schema to be of -list type. Occupations in the example below can store a list of strings for each `S P`. - -An index is specified with `@index`, with arguments to specify the tokenizer. When specifying an -index for a predicate it is mandatory to specify the type of the index. For example: - -``` -name: string @index(exact, fulltext) @count . -multiname: string @lang . -age: int @index(int) . -friend: [uid] @count . -dob: dateTime . -location: geo @index(geo) . -occupations: [string] @index(term) . -``` - -If no data has been stored for the predicates, a schema mutation sets up an empty schema ready to receive triples. - -If data is already stored before the mutation, existing values are not checked to conform to the new schema. On query, Dgraph tries to convert existing values to the new schema types, ignoring any that fail conversion. - -If data exists and new indices are specified in a schema mutation, any index not in the updated list is dropped and a new index is created for every new tokenizer specified. - -Reverse edges are also computed if specified by a schema mutation. - -{{% notice "note" %}}You can't define predicate names starting with `dgraph.`, it is reserved as the -namespace for Dgraph's internal types/predicates. For example, defining `dgraph.name` as a -predicate is invalid.{{% /notice %}} - - -## Indexes in Background - -Indexes may take long time to compute depending upon the size of the data. -Starting Dgraph version `20.03.0`, indexes can be computed in the background, -and thus indexing may still be running after an Alter operation returns. -This requires that you wait for indexing to complete before running queries -that require newly created indices. 
Such queries will fail with an error -notifying that a given predicate is not indexed or doesn't have reverse edges. - -An alter operation will also fail if one is already in progress with an error -`schema is already being modified. Please retry`. Though, mutations can -be successfully executed while indexing is going on. - -For example, let's say we execute an Alter operation with the following schema: - -``` -name: string @index(fulltext, term) . -age: int @index(int) @upsert . -friend: [uid] @count @reverse . -``` - -Once the Alter operation returns, Dgraph will report the following schema -and start background tasks to compute all the new indexes: - -``` -name: string . -age: int @upsert . -friend: [uid] . -``` - -When indexes are done computing, Dgraph will start reporting the indexes in the -schema. In a multi-node cluster, it is possible that the alphas will finish -computing indexes at different times. Alphas may return different schema in such -a case until all the indexes are done computing on all the Alphas. - -Background indexing task may fail if an unexpected error occurs while computing -the indexes. You should retry the Alter operation in order to update the schema, -or sync the schema across all the alphas. - -To learn about how to check background indexing status, see -[Querying Health](https://dgraph.io/docs/main/deploy/dgraph-alpha/#querying-health). - -### HTTP API - -You can specify the flag `runInBackground` to `true` to run -index computation in the background. - -```sh -curl localhost:8080/alter?runInBackground=true -XPOST -d $' - name: string @index(fulltext, term) . - age: int @index(int) @upsert . - friend: [uid] @count @reverse . -' | python -m json.tool | less -``` - -### Grpc API - -You can set `RunInBackground` field to `true` of the `api.Operation` -struct before passing it to the `Alter` function. - -```go -op := &api.Operation{} -op.Schema = ` - name: string @index(fulltext, term) . - age: int @index(int) @upsert . 
- friend: [uid] @count @reverse . -` -op.RunInBackground = true -err = dg.Alter(context.Background(), op) -``` - ## Predicate name rules @@ -231,7 +126,7 @@ Query: ## Upsert directive -To use [upsert operations]({{< relref "howto/upserts.md">}}) on a +To use [upsert operations]({{}}) on a predicate, specify the `@upsert` directive in the schema. When committing transactions involving predicates with the `@upsert` directive, Dgraph checks index keys for conflicts, helping to enforce uniqueness constraints when running @@ -281,11 +176,9 @@ Dgraph: * checks `"14"` can be converted to `int`, but stores as `string`, * throws an error for the remaining two triples, because `"14.5"` can't be converted to `int`. -## Extended Types -The following types are also accepted. -### Password type +## Password type A password for an entity is set with setting the schema for the attribute to be of type `password`. Passwords cannot be queried directly, only checked for a match using the `checkpwd` function. The passwords are encrypted using [bcrypt](https://en.wikipedia.org/wiki/Bcrypt). @@ -353,214 +246,3 @@ output: } ``` -## Indexing - -{{% notice "note" %}}Filtering on a predicate by applying a [function]({{< relref "query-language/functions.md" >}}) requires an index.{{% /notice %}} - -When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient. - -All scalar types can be indexed. - -Types `int`, `float`, `bool` and `geo` have only a default index each: with tokenizers named `int`, `float`, `bool` and `geo`. - -Types `string` and `dateTime` have a number of indices. - -### String Indices -The indices available for strings are as follows. - -| Dgraph function | Required index / tokenizer | Notes | -| :----------------------- | :------------ | :--- | -| `eq` | `hash`, `exact`, `term`, or `fulltext` | The most performant index for `eq` is `hash`. 
Only use `term` or `fulltext` if you also require term or full-text search. If you're already using `term`, there is no need to use `hash` or `exact` as well. | -| `le`, `ge`, `lt`, `gt` | `exact` | Allows faster sorting. | -| `allofterms`, `anyofterms` | `term` | Allows searching by a term in a sentence. | -| `alloftext`, `anyoftext` | `fulltext` | Matching with language specific stemming and stopwords. | -| `regexp` | `trigram` | Regular expression matching. Can also be used for equality checking. | - -{{% notice "warning" %}} -Incorrect index choice can impose performance penalties and an increased -transaction conflict rate. Use only the minimum number of and simplest indexes -that your application needs. -{{% /notice %}} - -### DateTime Indices - -The indices available for `dateTime` are as follows. - -| Index name / Tokenizer | Part of date indexed | -| :----------- | :------------------------------------------------------------------ | -| `year` | index on year (default) | -| `month` | index on year and month | -| `day` | index on year, month and day | -| `hour` | index on year, month, day and hour | - -The choices of `dateTime` index allow selecting the precision of the index. Applications, such as the movies examples in these docs, that require searching over dates but have relatively few nodes per year may prefer the `year` tokenizer; applications that are dependent on fine grained date searches, such as real-time sensor readings, may prefer the `hour` index. - - -All the `dateTime` indices are sortable. - - -### Sortable Indices - -Not all the indices establish a total order among the values that they index. Sortable indices allow inequality functions and sorting. - -* Indexes `int` and `float` are sortable. -* `string` index `exact` is sortable. -* All `dateTime` indices are sortable. - -For example, given an edge `name` of `string` type, to sort by `name` or perform inequality filtering on names, the `exact` index must have been specified. 
In which case a schema query would return at least the following tokenizers. - -``` -{ - "predicate": "name", - "type": "string", - "index": true, - "tokenizer": [ - "exact" - ] -} -``` - -### Count index - -For predicates with the `@count` Dgraph indexes the number of edges out of each node. This enables fast queries of the form: -``` -{ - q(func: gt(count(pred), threshold)) { - ... - } -} -``` - -## List Type - -Predicate with scalar types can also store a list of values if specified in the schema. The scalar -type needs to be enclosed within `[]` to indicate that its a list type. - -``` -occupations: [string] . -score: [int] . -``` - -* A set operation adds to the list of values. The order of the stored values is non-deterministic. -* A delete operation deletes the value from the list. -* Querying for these predicates would return the list in an array. -* Indexes can be applied on predicates which have a list type and you can use [Functions]({{}}) on them. -* Sorting is not allowed using these predicates. -* These lists are like an unordered set. For example: `["e1", "e1", "e2"]` may get stored as `["e2", "e1"]`, i.e., duplicate values will not be stored and order may not be preserved. - -## Filtering on list - -Dgraph supports filtering based on the list. -Filtering works similarly to how it works on edges and has the same available functions. - -For example, `@filter(eq(occupations, "Teacher"))` at the root of the query or the -parent edge will display all the occupations from a list of each node in an array but -will only include nodes which have `Teacher` as one of the occupations. However, filtering -on value edge is not supported. - -## Reverse Edges - -A graph edge is unidirectional. For node-node edges, sometimes modeling requires reverse edges. If only some subject-predicate-object triples have a reverse, these must be manually added. But if a predicate always has a reverse, Dgraph computes the reverse edges if `@reverse` is specified in the schema. 
- -The reverse edge of `anEdge` is `~anEdge`. - -For existing data, Dgraph computes all reverse edges. For data added after the schema mutation, Dgraph computes and stores the reverse edge for each added triple. - -``` -type Person { - name string -} -type Car { - regnbr string - owner Person -} -owner uid @reverse . -regnbr string @index(exact) . -name string @index(exact) . -``` - -This makes it possible to query Persons and their cars by using: -``` -q(func type(Person)) { - name - ~owner { name } -} -``` -To get a different key than `~owner` in the result, the query can be written with the wanted label -(`cars` in this case): - -``` -q(func type(Person)) { - name - cars: ~owner { name } -} -``` - -This also works if there are multiple "owners" of a `car`: -``` -owner [uid] @reverse . -``` - -In both cases the `owner` edge should be set on the `Car`: -``` -_:p1 "Mary" . -_:p1 "Person" . -_:c1 "ABC123" . -_:c1 "Car" . -_:c1 _:p1 -``` - -## Querying Schema - -A schema query queries for the whole schema: - -``` -schema {} -``` - -{{% notice "note" %}} Unlike regular queries, the schema query is not surrounded -by curly braces. Also, schema queries and regular queries cannot be combined. -{{% /notice %}} - -You can query for particular schema fields in the query body. - -``` -schema { - type - index - reverse - tokenizer - list - count - upsert - lang -} -``` - -You can also query for particular predicates: - -``` -schema(pred: [name, friend]) { - type - index - reverse - tokenizer - list - count - upsert - lang -} -``` - -{{% notice "note" %}} If ACL is enabled, then the schema query returns only the -predicates for which the logged-in ACL user has read access. {{% /notice %}} - -Types can also be queried. Below are some example queries. - -``` -schema(type: Movie) {} -schema(type: [Person, Animal]) {} -``` - -Note that type queries do not contain anything between the curly braces. The -output will be the entire definition of the requested types. 
diff --git a/content/dql/type-system.md b/content/dql/type-system.md index d6049a5a..d58df768 100644 --- a/content/dql/type-system.md +++ b/content/dql/type-system.md @@ -1,7 +1,7 @@ +++ date = "2017-03-20T22:25:17+11:00" title = "Node types" -weight = 4 +weight = 5 [menu.main] parent = "dql" +++ diff --git a/content/howto/dql-schema-request.md b/content/howto/dql-schema-request.md new file mode 100644 index 00000000..0544b25c --- /dev/null +++ b/content/howto/dql-schema-request.md @@ -0,0 +1,61 @@ ++++ +date = "2017-03-20T22:25:17+11:00" +title = "Query Dgraph types" +weight = 14 +[menu.main] + parent = "howto" ++++ + + +The list of predicates and node types is retrieved using a query on the `/query` endpoint. + +``` +schema {} +``` + +{{% notice "note" %}} Unlike regular queries, the schema query is not surrounded +by curly braces. Also, schema queries and regular queries cannot be combined. +{{% /notice %}} + +You can query for particular schema fields in the query body. + +``` +schema { + type + index + reverse + tokenizer + list + count + upsert + lang +} +``` + +You can also query for particular predicates: + +``` +schema(pred: [name, friend]) { + type + index + reverse + tokenizer + list + count + upsert + lang +} +``` + +{{% notice "note" %}} If ACL is enabled, then the schema query returns only the +predicates for which the logged-in ACL user has read access. {{% /notice %}} + +Types can also be queried. Below are some example queries. + +``` +schema(type: Movie) {} +schema(type: [Person, Animal]) {} +``` + +Note that type queries do not contain anything between the curly braces. The +output will be the entire definition of the requested types. 
diff --git a/content/howto/update-dgraph-types.md b/content/howto/update-dgraph-types.md new file mode 100644 index 00000000..e716c687 --- /dev/null +++ b/content/howto/update-dgraph-types.md @@ -0,0 +1,100 @@ ++++ +date = "2017-03-20T22:25:17+11:00" +title = "Update Dgraph types" +weight = 15 +[menu.main] + parent = "howto" ++++ + + +## Adding or Modifying Dgraph types + +You modify Dgraph types (node types and predicate types) using the `/alter` endpoint over raw HTTP or the alter operation of a client library. +### HTTP API + +You can set the flag `runInBackground` to `true` to run +index computation in the background. + +```sh +curl localhost:8080/alter?runInBackground=true -XPOST -d $' + name: string @index(fulltext, term) . + age: int @index(int) @upsert . + friend: [uid] @count @reverse . +' | python -m json.tool | less +``` + +### gRPC API + +You can set the `RunInBackground` field of the `api.Operation` struct to `true` +before passing it to the `Alter` function. + +```go +op := &api.Operation{} +op.Schema = ` + name: string @index(fulltext, term) . + age: int @index(int) @upsert . + friend: [uid] @count @reverse . +` +op.RunInBackground = true +err = dg.Alter(context.Background(), op) +``` + + + +If no data has been stored for the predicates, a schema mutation sets up an empty schema ready to receive triples. + +If data is already stored before the mutation, existing values are not checked to conform to the new schema. + +On query, Dgraph tries to convert existing values to the new schema types, ignoring any that fail conversion. + +If data exists and new indices are specified in a schema mutation, any index not in the updated list is dropped and a new index is created for every new tokenizer specified. + + + + +## Indexes in Background + +Indexes may take a long time to compute, depending on the size of the data. + +Indexes can be computed in the background, so indexing may still be running after an Alter operation returns. 
+ +To run index computation in the background set the flag `runInBackground` to `true` . + +```sh +curl localhost:8080/alter?runInBackground=true -XPOST -d $' + name: string @index(fulltext, term) . + age: int @index(int) @upsert . + friend: [uid] @count @reverse . +' | python -m json.tool | less +``` + +```go +op := &api.Operation{} +op.Schema = ` + name: string @index(fulltext, term) . + age: int @index(int) @upsert . + friend: [uid] @count @reverse . +` +op.RunInBackground = true +err = dg.Alter(context.Background(), op) +``` + +{{% notice "note" %}}If executed before the indexing finishes, queries that require the new indices will fail with an error +notifying that a given predicate is not indexed or doesn't have reverse edges.{{% /notice %}} + +You can check the background indexing status using the [Health](https://dgraph.io/docs/main/deploy/dgraph-alpha/#querying-health) query on the `/admin` endpoint. + + +{{% notice "note" %}}An alter operation will also fail if one is already in progress with an error +`schema is already being modified. Please retry`.{{% /notice %}} + +For example, let's say we execute an Alter operation with the following schema: + +Dgraph reports + +Dgraph will report the indexes in the schema only when the indexes are done computing. + +In a multi-node cluster, it is possible that the alphas will finish computing indexes at different times. Alphas may return different schema in such a case until all the indexes are done computing on all the Alphas. + + +