2 changes: 1 addition & 1 deletion docs/ad/index.md
@@ -52,7 +52,7 @@ In this case, a feature is the field in your index that you want to check for anomalies.

For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature.

You can add a maximum of five features for a detector.
A multi-feature model correlates anomalies across all of its features. Because of the [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality), a multi-feature model is less likely than a single-feature model to identify smaller anomalies. Adding more features might also negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model, and a higher proportion of noise in your data can amplify this effect. For higher accuracy, we recommend adding fewer features to your detector. By default, a detector supports a maximum of 5 features. You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting.
{: .note }
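
For example, you could raise the limit through the cluster settings API. The following is a minimal sketch, assuming the setting can be updated dynamically; the value `6` is a placeholder:

```json
PUT _cluster/settings
{
  "persistent": {
    "opendistro.anomaly_detection.max_anomaly_features": 6
  }
}
```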

1. On the **Model configuration** page, enter the **Feature name**.
1 change: 1 addition & 0 deletions docs/im/ism/policies.md
@@ -26,6 +26,7 @@ Field | Description | Type | Required | Read Only
:--- | :--- | :--- | :--- | :---
`policy_id` | The name of the policy. | `string` | Yes | Yes
`description` | A human-readable description of the policy. | `string` | Yes | No
`ism_template` | An ISM template pattern that specifies which indices the policy applies to (see the example after this table). | `nested list of objects` | No | No
`last_updated_time` | The time the policy was last updated. | `timestamp` | Yes | Yes
`error_notification` | The destination and message template for error notifications. The destination could be Amazon Chime, Slack, or a webhook URL. | `object` | No | No
`default_state` | The default starting state for each index that uses this policy. | `string` | Yes | No
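
For illustration, the following sketch shows where `ism_template` fits in a policy body. The endpoint, index pattern, priority, and state shown here are placeholder assumptions, and depending on your version `ism_template` may be a single object rather than a list:

```json
PUT _opendistro/_ism/policies/example_policy
{
  "policy": {
    "description": "Example policy automatically applied to matching indices",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": []
      }
    ],
    "ism_template": [
      {
        "index_patterns": ["log-*"],
        "priority": 100
      }
    ]
  }
}
```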
37 changes: 24 additions & 13 deletions docs/knn/api.md
@@ -7,38 +7,43 @@ has_children: false
---

# API

The k-NN plugin adds two API operations that help you manage the plugin's functionality.


## Stats

The k-NN `stats` API provides information about the current status of the k-NN plugin. The plugin keeps track of both cluster-level and node-level stats. Cluster-level stats have a single value for the entire cluster. Node-level stats have a single value for each node in the cluster. You can filter the query by `nodeId` and `statName` in the following way:
```
GET /_opendistro/_knn/nodeId1,nodeId2/stats/statName1,statName2
```

Statistic | Description
:--- | :---
`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This is only relevant to approximate k-NN search.
`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This is only relevant to approximate k-NN search.
`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. *note:* explicit evictions that occur because of index deletion are not counted. This is only relevant to approximate k-NN search.
`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This is only relevant to approximate k-NN search.
`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This is only relevant to approximate k-NN search.
`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This is only relevant to approximate k-NN search.
`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search.
`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search.
`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. Note: Explicit evictions that occur because of index deletion are not counted. This statistic is only relevant to approximate k-NN search.
`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This statistic is only relevant to approximate k-NN search.
`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This statistic is only relevant to approximate k-NN search.
`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This statistic is only relevant to approximate k-NN search.
`graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity.
`graph_index_requests` | The number of requests to add the knn_vector field of a document into a graph.
`graph_index_errors` | The number of requests to add the knn_vector field of a document into a graph that have produced an error.
`graph_query_requests` | The number of graph queries that have been made.
`graph_query_errors` | The number of graph queries that have produced an error.
`knn_query_requests` | The number of KNN query requests received.
`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This is only relevant to approximate k-NN search.
`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This is only relevant to approximate k-NN search.
`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This is only relevant to approximate k-NN search.
`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This statistic is only relevant to approximate k-NN search.
`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search.
`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This statistic is only relevant to approximate k-NN search.
`indices_in_cache` | For each index that has graphs in the cache, this stat provides the number of graphs for that index and the total `graph_memory_usage` that index is using, in kilobytes.
`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This is only relevant to k-NN score script search.
`script_compilation_errors` | The number of errors during script compilation. This is only relevant to k-NN score script search.
`script_query_requests` | The total number of script queries. This is only relevant to k-NN score script search.
`script_query_errors` | The number of errors during script queries. This is only relevant to k-NN score script search.
`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This statistic is only relevant to k-NN score script search.
`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search.
`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search.
`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search.


### Usage

```json
GET /_opendistro/_knn/stats?pretty
{
Expand Down Expand Up @@ -99,7 +104,9 @@ GET /_opendistro/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,gra
}
```


## Warmup operation

The Hierarchical Navigable Small World (HNSW) graphs used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files alongside other Apache Lucene segment files. To search these graphs using the k-NN plugin, the files must first be loaded into native memory.

If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort.
@@ -108,7 +115,9 @@ As an alternative, you can avoid this latency issue by running the k-NN plugin w

After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory.
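
As a quick sketch, a warmup request takes a comma-separated list of indices; the index names below are placeholders (see the Usage section that follows for the full example):

```json
GET /_opendistro/_knn/warmup/index1,index2,index3?pretty
```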


### Usage

This request performs a warmup on three indices:

```json
@@ -132,7 +141,9 @@ GET /_tasks

After the operation finishes, use the [k-NN `_stats` API operation](#stats) to see which graphs the k-NN plugin loaded into memory.


### Best practices

For the warmup operation to function properly, follow these best practices.

First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present.
1 change: 1 addition & 0 deletions docs/knn/performance-tuning.md
@@ -84,6 +84,7 @@ Recall depends on multiple factors like number of vectors, number of dimensions,
Recall can be configured by adjusting the parameters of the HNSW algorithm exposed through index settings. The algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more details on how these parameters influence indexing and search recall, see the [HNSW algorithm parameters document](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). Increasing these values can improve recall (leading to better search results), but at the cost of higher memory utilization and increased indexing time. In our experiments, the default values work well for a broad set of use cases, but we encourage you to run your own experiments on your data sets and choose appropriate values. For index-level settings, see the [settings page](../settings#index-settings). We will add details on our experiments here shortly.
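
As an illustration only, an index that sets these parameters explicitly might look like the following sketch. The setting names (`index.knn.algo_param.*`) and the values shown are assumptions; confirm the exact names on the settings page:

```json
PUT /my-knn-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.m": 16,
      "knn.algo_param.ef_construction": 512,
      "knn.algo_param.ef_search": 512
    }
  }
}
```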

## Estimating Memory Usage

Typically, in an Elasticsearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the circuit breaker limit is set at 50%.
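
For instance, you might adjust this limit through the cluster settings API. This is a sketch only; the full setting name `knn.memory.circuit_breaker.limit` and the `60%` value are assumptions to verify against the settings page:

```json
PUT /_cluster/settings
{
  "persistent": {
    "knn.memory.circuit_breaker.limit": "60%"
  }
}
```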

The memory required for graphs is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector.
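
For example, assuming hypothetical values of 1 million vectors with a dimension of 256 and `M` set to 16, the estimate works out to roughly `1.1 * (4 * 256 + 8 * 16) * 1,000,000` bytes, or about 1.27 GB of native memory for the graphs.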
8 changes: 7 additions & 1 deletion docs/security/access-control/api.md
@@ -402,12 +402,15 @@ DELETE _opendistro/_security/api/internalusers/<username>

Creates or replaces the specified user. You must specify either `password` (plain text) or `hash` (the hashed user password). If you specify `password`, the security plugin automatically hashes the password before storing it.

Note that any role you supply in the `opendistro_security_roles` array must already exist for the security plugin to map the user to that role. To see predefined roles, refer to [the list of predefined roles](../users-roles/#predefined-roles). For instructions on how to create a role, refer to [creating a role](./#create-role).

#### Request

```json
PUT _opendistro/_security/api/internalusers/<username>
{
"password": "kirkpass",
"opendistro_security_roles": ["maintenance_staff", "weapons"],
"backend_roles": ["captains", "starfleet"],
"attributes": {
"attribute1": "value1",
@@ -428,7 +431,7 @@

### Patch user

Updates individual attributes of an internal user.

#### Request

@@ -438,6 +441,9 @@ PATCH _opendistro/_security/api/internalusers/<username>
{
"op": "replace", "path": "/backend_roles", "value": ["klingons"]
},
{
"op": "replace", "path": "/opendistro_security_roles", "value": ["ship_manager"]
},
{
"op": "replace", "path": "/attributes", "value": { "newattribute": "newvalue" }
}
22 changes: 11 additions & 11 deletions docs/security/access-control/cross-cluster-search.md
@@ -117,13 +117,13 @@ networks:
After the clusters start, verify the names of each:

```json
curl -XGET -u 'admin:admin' -k https://localhost:9200
curl -XGET -u 'admin:admin' -k 'https://localhost:9200'
{
"cluster_name" : "odfe-cluster1",
...
}

curl -XGET -u 'admin:admin' -k https://localhost:9250
curl -XGET -u 'admin:admin' -k 'https://localhost:9250'
{
"cluster_name" : "odfe-cluster2",
...
@@ -151,7 +151,7 @@ docker inspect --format='{% raw %}{{range .NetworkSettings.Networks}}{{.IPAddres
On the coordinating cluster, add the remote cluster name and the IP address (with port 9300) for each "seed node." In this case, you only have one seed node:

```json
curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' https://localhost:9250/_cluster/settings -d '
curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9250/_cluster/settings' -d '
{
"persistent": {
"search.remote": {
@@ -166,13 +166,13 @@ curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' https://local
On the remote cluster, index a document:

```bash
curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' https://localhost:9200/books/_doc/1 -d '{"Dracula": "Bram Stoker"}'
curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/books/_doc/1' -d '{"Dracula": "Bram Stoker"}'
```

At this point, cross-cluster search works. You can test it using the `admin` user:

```bash
curl -XGET -k -u 'admin:admin' https://localhost:9250/odfe-cluster1:books/_search?pretty
curl -XGET -k -u 'admin:admin' 'https://localhost:9250/odfe-cluster1:books/_search?pretty'
{
...
"hits": [{
@@ -190,14 +190,14 @@ curl -XGET -k -u 'admin:admin' https://localhost:9250/odfe-cluster1:books/_searc
To continue testing, create a new user on both clusters:

```bash
curl -XPUT -k -u 'admin:admin' https://localhost:9200/_opendistro/_security/api/internalusers/booksuser -H 'Content-Type: application/json' -d '{"password":"password"}'
curl -XPUT -k -u 'admin:admin' https://localhost:9250/_opendistro/_security/api/internalusers/booksuser -H 'Content-Type: application/json' -d '{"password":"password"}'
curl -XPUT -k -u 'admin:admin' 'https://localhost:9200/_opendistro/_security/api/internalusers/booksuser' -H 'Content-Type: application/json' -d '{"password":"password"}'
curl -XPUT -k -u 'admin:admin' 'https://localhost:9250/_opendistro/_security/api/internalusers/booksuser' -H 'Content-Type: application/json' -d '{"password":"password"}'
```

Then run the same search as before with `booksuser`:

```json
curl -XGET -k -u booksuser:password https://localhost:9250/odfe-cluster1:books/_search?pretty
curl -XGET -k -u booksuser:password 'https://localhost:9250/odfe-cluster1:books/_search?pretty'
{
"error" : {
"root_cause" : [
@@ -216,8 +216,8 @@
Note the permissions error. On the remote cluster, create a role with the appropriate permissions, and map `booksuser` to that role:

```bash
curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' https://localhost:9200/_opendistro/_security/api/roles/booksrole -d '{"index_permissions":[{"index_patterns":["books"],"allowed_actions":["indices:admin/shards/search_shards","indices:data/read/search"]}]}'
curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' https://localhost:9200/_opendistro/_security/api/rolesmapping/booksrole -d '{"users" : ["booksuser"]}'
curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' 'https://localhost:9200/_opendistro/_security/api/roles/booksrole' -d '{"index_permissions":[{"index_patterns":["books"],"allowed_actions":["indices:admin/shards/search_shards","indices:data/read/search"]}]}'
curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' 'https://localhost:9200/_opendistro/_security/api/rolesmapping/booksrole' -d '{"users" : ["booksuser"]}'
```

Both clusters must have the user, but only the remote cluster needs the role and mapping; in this case, the coordinating cluster handles authentication (i.e. "Does this request include valid user credentials?"), and the remote cluster handles authorization (i.e. "Can this user access this data?").
@@ -226,7 +226,7 @@ Both clusters must have the user, but only the remote cluster needs the role and
Finally, repeat the search:

```bash
curl -XGET -k -u booksuser:password https://localhost:9250/odfe-cluster1:books/_search?pretty
curl -XGET -k -u booksuser:password 'https://localhost:9250/odfe-cluster1:books/_search?pretty'
{
...
"hits": [{
6 changes: 2 additions & 4 deletions docs/security/access-control/users-roles.md
@@ -26,7 +26,7 @@ Unless you need to create new [read-only or hidden users](../api/#read-only-and-

## Create users

You can create users using Kibana, `internal_users.yml`, or the REST API.
You can create users using Kibana, `internal_users.yml`, or the REST API. When creating a user, you can map the user to roles using `internal_users.yml` or the REST API, but that feature is not currently available in Kibana.

### Kibana

@@ -38,7 +38,6 @@ You can create users using Kibana, `internal_users.yml`, or the REST API.

1. Choose **Submit**.


### internal_users.yml

See [YAML files](../../configuration/yaml/#internal_usersyml).
@@ -77,11 +76,10 @@ See [Create role](../api/#create-role).

## Map users to roles

After creating roles, you map users (or backend roles) to them. Intuitively, people often think of this process as giving a user one or more roles, but in the security plugin, the process is reversed; you select a role and then map one or more users to it.
If you didn't specify roles when you created your user, you can map roles to it afterwards.

Just like users and roles, you create role mappings using Kibana, `roles_mapping.yml`, or the REST API.


### Kibana

1. Choose **Security**, **Roles**, and a role.
4 changes: 2 additions & 2 deletions docs/security/configuration/yaml.md
@@ -34,9 +34,9 @@ new-user:
reserved: false
hidden: false
opendistro_security_roles:
- "some-security-role"
- "specify-some-security-role-here"
backend_roles:
- "some-backend-role"
- "specify-some-backend-role-here"
attributes:
attribute1: "value1"
static: false