diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index 12bc66c1..a35148a3 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -1,4 +1,4 @@
 # CODEOWNERS info: https://help.github.com/en/articles/about-code-owners
 # Owners are automatically requested for review for PRs that changes code
 # that they own.
-* @rderbier @MichelDiz @damonfeldman @rarvikar @Rajakavitha1
+* @dgraph-io/committers @rderbier
diff --git a/README.md b/README.md
index b4e09fd7..4131a2c6 100644
--- a/README.md
+++ b/README.md
@@ -106,5 +106,6 @@ Pass custom Go-GRPC example to the runnable by passing a `customExampleGoGRPC` t
 **Note:** Runnable doesn't support passing a multiline string as an argument to a shortcode. Therefore, you have to create the whole custom example in a single line string by replacing newlines with `\n`.
 
 ## History
+v24.0:
+=======
 add Hypermode banner by updating the hugo-docs repository with topbat template.
-v24.0:
\ No newline at end of file
diff --git a/content/dql/dql-schema.md b/content/dql/dql-schema.md
index c55ca531..c52c2bfe 100644
--- a/content/dql/dql-schema.md
+++ b/content/dql/dql-schema.md
@@ -16,6 +16,9 @@ revenue: float .
 running_time: int .
 starring: [uid] .
 director: [uid] .
+description: string .
+
+description_vector: float32vector @index(hnsw(metric:"cosine")) .
 
 type Person {
   name
@@ -28,6 +31,8 @@ type Film {
   running_time
   starring
   director
+  description
+  description_vector
 }
 ```
 
@@ -112,6 +117,15 @@ For all triples with a predicate of scalar types the object is a literal.
 are RFC 3339 compatible which is different from ISO 8601(as defined in the RDF spec). You should convert your values to RFC 3339 format before sending them to Dgraph.{{% /notice %}}
 
+### Vector Type
+
+The `float32vector` type denotes a vector of floating-point numbers, i.e., an ordered array of float32 values. A node type can contain more than one vector predicate.
+
+Vectors are normally used to store embeddings that an ML model computes from other data.
+When a `float32vector` is [indexed]({{}}), the DQL [similar_to]({{}}) function can be used for similarity search.
+
+
+
+
 ### UID Type
 
 The `uid` type denotes a relationship; internally each node is identified by it's UID which is a `uint64`.
diff --git a/content/dql/predicate-indexing.md b/content/dql/predicate-indexing.md
index 2f7669c4..17daa541 100644
--- a/content/dql/predicate-indexing.md
+++ b/content/dql/predicate-indexing.md
@@ -9,6 +9,15 @@ weight = 4
 
 Filtering on a predicate by applying a [function]({{< relref "query-language/functions.md" >}}) requires an index.
 
+Indices are defined in the [Dgraph types schema]({{}}) using the `@index` directive.
+
+Here are some examples:
+```
+name: string @index(term) .
+release_date: datetime @index(year) .
+description_vector: float32vector @index(hnsw(metric:"cosine")) .
+```
+
 When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient.
 
 All scalar types can be indexed.
@@ -17,6 +26,8 @@ Types `int`, `float`, `bool` and `geo` have only a default index each: with toke
 
 Types `string` and `dateTime` have a number of indices.
 
+Type `float32vector` supports the `hnsw` index.
+
 ## String Indices
 
 The indices available for strings are as follows.
@@ -34,6 +45,30 @@ transaction conflict rate.
 Use only the minimum number of and simplest indexes that your application needs.
 {{% /notice %}}
 
+## Vector Indices
+
+The indices available for `float32vector` are as follows.
+
+| Dgraph function | Required index / tokenizer | Notes |
+| :----------------------- | :------------ | :--- |
+| `similar_to` | `hnsw` | The HNSW index supports the parameters `metric` and `exponent`. |
+
+
+
+
+The `hnsw` (**Hierarchical Navigable Small World**) index supports the following parameters:
+- `metric`: the metric used to compute vector similarity. One of `cosine`, `euclidean`, or `dotproduct`. Default is `euclidean`.
+
+- `exponent`: an integer, passed as a string, that gives the approximate number of vectors expected in the index as a power of 10. The exponent value is used to set reasonable defaults for the HNSW internal tuning parameters. Default is "4" (10^4 vectors).
+
+
+Here are some examples:
+```
+simple_vector: float32vector @index(hnsw) .
+description_vector: float32vector @index(hnsw(metric:"cosine")) .
+large_vector: float32vector @index(hnsw(metric:"euclidean",exponent:"6")) .
+```
+
 ## DateTime Indices
 
 The indices available for `dateTime` are as follows.
diff --git a/content/query-language/functions.md b/content/query-language/functions.md
index f05f0f94..b410f922 100644
--- a/content/query-language/functions.md
+++ b/content/query-language/functions.md
@@ -177,6 +177,21 @@ Same query with a Levenshtein distance of 3.
 }
 {{< /runnable >}}
 
+## Vector Similarity Search
+
+Syntax Examples: `similar_to(predicate, 3, "[0.9, 0.8, 0, 0]")`
+
+Alternatively, the vector can be passed as a variable: `similar_to(predicate, 3, $vec)`
+
+This function finds the nodes whose `predicate` value is close to the provided vector. The search is based on the distance metric specified in the index (`cosine`, `euclidean`, or `dotproduct`). A shorter distance indicates greater similarity.
+The second parameter, `3`, specifies that the top 3 matches are returned.
+
+Schema Types: `float32vector`
+
+Index Required: `hnsw`
+
+
+
 ## Full-Text Search
 
 Syntax Examples: `alloftext(predicate, "space-separated text")` and `anyoftext(predicate, "space-separated text")`
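The block below is a minimal sketch of a complete DQL query built on `similar_to`. It runs against the `description_vector` predicate that this change adds to the example schema in `dql-schema.md`, where that predicate is declared with `@index(hnsw(metric:"cosine"))`. The block name `similar_films` is a hypothetical, arbitrary choice. The four-element vector literal is taken from the syntax example above and assumes the stored embeddings also have four dimensions.

```
{
  # similar_films is a hypothetical block name.
  # The literal assumes description_vector holds 4-dimensional vectors.
  similar_films(func: similar_to(description_vector, 3, "[0.9, 0.8, 0, 0]")) {
    uid
    description
  }
}
```

Because the index on `description_vector` declares `metric:"cosine"`, the three nodes returned should be the ones whose stored vectors are closest to the query vector under that metric. Each result shows its `uid` and `description` text.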