From 413f77e96998b478b26b31556d1c85b9599146f3 Mon Sep 17 00:00:00 2001 From: ashwinkumar12345 Date: Tue, 16 Mar 2021 23:55:06 -0700 Subject: [PATCH 1/7] added ism_template to policy attributes --- docs/im/ism/policies.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/im/ism/policies.md b/docs/im/ism/policies.md index 3b70e1f7..f2bb36a7 100644 --- a/docs/im/ism/policies.md +++ b/docs/im/ism/policies.md @@ -26,6 +26,7 @@ Field | Description | Type | Required | Read Only :--- | :--- |:--- |:--- | `policy_id` | The name of the policy. | `string` | Yes | Yes `description` | A human-readable description of the policy. | `string` | Yes | No +`ism_template` | Specify an ISM template pattern that matches the index to apply the policy. | `nested list of objects` | No | No `last_updated_time` | The time the policy was last updated. | `timestamp` | Yes | Yes `error_notification` | The destination and message template for error notifications. The destination could be Amazon Chime, Slack, or a webhook URL. | `object` | No | No `default_state` | The default starting state for each index that uses this policy. | `string` | Yes | No From 8954688ff1cbcbe18ca4a22f3ae37f54ca502c5d Mon Sep 17 00:00:00 2001 From: aetter Date: Thu, 18 Mar 2021 09:26:43 -0700 Subject: [PATCH 2/7] Add some single quotes for safety --- .../access-control/cross-cluster-search.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/security/access-control/cross-cluster-search.md b/docs/security/access-control/cross-cluster-search.md index 54ed2a02..30c4155c 100644 --- a/docs/security/access-control/cross-cluster-search.md +++ b/docs/security/access-control/cross-cluster-search.md @@ -117,13 +117,13 @@ networks: After the clusters start, verify the names of each: ```json -curl -XGET -u 'admin:admin' -k https://localhost:9200 +curl -XGET -u 'admin:admin' -k 'https://localhost:9200' { "cluster_name" : "odfe-cluster1", ... 
} -curl -XGET -u 'admin:admin' -k https://localhost:9250 +curl -XGET -u 'admin:admin' -k 'https://localhost:9250' { "cluster_name" : "odfe-cluster2", ... @@ -151,7 +151,7 @@ docker inspect --format='{% raw %}{{range .NetworkSettings.Networks}}{{.IPAddres On the coordinating cluster, add the remote cluster name and the IP address (with port 9300) for each "seed node." In this case, you only have one seed node: ```json -curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' https://localhost:9250/_cluster/settings -d ' +curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9250/_cluster/settings' -d ' { "persistent": { "search.remote": { @@ -166,13 +166,13 @@ curl -k -XPUT -H 'Content-Type: application/json' -u 'admin:admin' https://local On the remote cluster, index a document: ```bash -curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' https://localhost:9200/books/_doc/1 -d '{"Dracula": "Bram Stoker"}' +curl -XPUT -k -H 'Content-Type: application/json' -u 'admin:admin' 'https://localhost:9200/books/_doc/1' -d '{"Dracula": "Bram Stoker"}' ``` At this point, cross-cluster search works. You can test it using the `admin` user: ```bash -curl -XGET -k -u 'admin:admin' https://localhost:9250/odfe-cluster1:books/_search?pretty +curl -XGET -k -u 'admin:admin' 'https://localhost:9250/odfe-cluster1:books/_search?pretty' { ... 
"hits": [{ @@ -190,14 +190,14 @@ curl -XGET -k -u 'admin:admin' https://localhost:9250/odfe-cluster1:books/_searc To continue testing, create a new user on both clusters: ```bash -curl -XPUT -k -u 'admin:admin' https://localhost:9200/_opendistro/_security/api/internalusers/booksuser -H 'Content-Type: application/json' -d '{"password":"password"}' -curl -XPUT -k -u 'admin:admin' https://localhost:9250/_opendistro/_security/api/internalusers/booksuser -H 'Content-Type: application/json' -d '{"password":"password"}' +curl -XPUT -k -u 'admin:admin' 'https://localhost:9200/_opendistro/_security/api/internalusers/booksuser' -H 'Content-Type: application/json' -d '{"password":"password"}' +curl -XPUT -k -u 'admin:admin' 'https://localhost:9250/_opendistro/_security/api/internalusers/booksuser' -H 'Content-Type: application/json' -d '{"password":"password"}' ``` Then run the same search as before with `booksuser`: ```json -curl -XGET -k -u booksuser:password https://localhost:9250/odfe-cluster1:books/_search?pretty +curl -XGET -k -u booksuser:password 'https://localhost:9250/odfe-cluster1:books/_search?pretty' { "error" : { "root_cause" : [ @@ -216,8 +216,8 @@ curl -XGET -k -u booksuser:password https://localhost:9250/odfe-cluster1:books/_ Note the permissions error. 
On the remote cluster, create a role with the appropriate permissions, and map `booksuser` to that role: ```bash -curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' https://localhost:9200/_opendistro/_security/api/roles/booksrole -d '{"index_permissions":[{"index_patterns":["books"],"allowed_actions":["indices:admin/shards/search_shards","indices:data/read/search"]}]}' -curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' https://localhost:9200/_opendistro/_security/api/rolesmapping/booksrole -d '{"users" : ["booksuser"]}' +curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' 'https://localhost:9200/_opendistro/_security/api/roles/booksrole' -d '{"index_permissions":[{"index_patterns":["books"],"allowed_actions":["indices:admin/shards/search_shards","indices:data/read/search"]}]}' +curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' 'https://localhost:9200/_opendistro/_security/api/rolesmapping/booksrole' -d '{"users" : ["booksuser"]}' ``` Both clusters must have the user, but only the remote cluster needs the role and mapping; in this case, the coordinating cluster handles authentication (i.e. "Does this request include valid user credentials?"), and the remote cluster handles authorization (i.e. "Can this user access this data?"). @@ -226,7 +226,7 @@ Both clusters must have the user, but only the remote cluster needs the role and Finally, repeat the search: ```bash -curl -XGET -k -u booksuser:password https://localhost:9250/odfe-cluster1:books/_search?pretty +curl -XGET -k -u booksuser:password 'https://localhost:9250/odfe-cluster1:books/_search?pretty' { ... 
"hits": [{ From 038ef3d72de2440e1b87e7fa2e98794f872ad79c Mon Sep 17 00:00:00 2001 From: keithhc2 Date: Fri, 19 Mar 2021 14:08:33 -0700 Subject: [PATCH 3/7] Updated to include instructions on how to map opendistro_security_roles --- docs/security/access-control/api.md | 8 +++++++- docs/security/access-control/users-roles.md | 6 ++---- docs/security/configuration/yaml.md | 4 ++-- 3 files changed, 11 insertions(+), 7 deletions(-) diff --git a/docs/security/access-control/api.md b/docs/security/access-control/api.md index 13ebbfc9..5d62febc 100644 --- a/docs/security/access-control/api.md +++ b/docs/security/access-control/api.md @@ -402,12 +402,15 @@ DELETE _opendistro/_security/api/internalusers/ Creates or replaces the specified user. You must specify either `password` (plain text) or `hash` (the hashed user password). If you specify `password`, the security plugin automatically hashes the password before storing it. +Note that any role you supply in the `opendistro_security_roles` array must already exist for the security plugin to map the user to that role. To see predefined roles, refer to [the list of predefined roles](../users-roles/#predefined-roles). For instructions on how to create a role, refer to [creating a role](./#create-role). + #### Request ```json PUT _opendistro/_security/api/internalusers/ { "password": "kirkpass", + "opendistro_security_roles": ["maintenance_staff", "weapons"], "backend_roles": ["captains", "starfleet"], "attributes": { "attribute1": "value1", @@ -428,7 +431,7 @@ PUT _opendistro/_security/api/internalusers/ ### Patch user -Updates individual attributes of an internal user. +Updates individual attributes of an internal user. 
#### Request @@ -438,6 +441,9 @@ PATCH _opendistro/_security/api/internalusers/ { "op": "replace", "path": "/backend_roles", "value": ["klingons"] }, + { + "op": "replace", "path": "/opendistro_security_roles", "value": ["ship_manager"] + }, { "op": "replace", "path": "/attributes", "value": { "newattribute": "newvalue" } } diff --git a/docs/security/access-control/users-roles.md b/docs/security/access-control/users-roles.md index 2cc850f8..211b0a05 100644 --- a/docs/security/access-control/users-roles.md +++ b/docs/security/access-control/users-roles.md @@ -26,7 +26,7 @@ Unless you need to create new [read-only or hidden users](../api/#read-only-and- ## Create users -You can create users using Kibana, `internal_users.yml`, or the REST API. +You can create users using Kibana, `internal_users.yml`, or the REST API. When creating a user, you can map users to roles using `internal_users.yml` or the REST API, but that feature is not currently available in Kibana. ### Kibana @@ -38,7 +38,6 @@ You can create users using Kibana, `internal_users.yml`, or the REST API. 1. Choose **Submit**. - ### internal_users.yml See [YAML files](../../configuration/yaml/#internal_usersyml). @@ -77,11 +76,10 @@ See [Create role](../api/#create-role). ## Map users to roles -After creating roles, you map users (or backend roles) to them. Intuitively, people often think of this process as giving a user one or more roles, but in the security plugin, the process is reversed; you select a role and then map one or more users to it. +If you didn't specify a role while creating your user, you can map a role to it afterwards. Just like users and roles, you create role mappings using Kibana, `roles_mapping.yml`, or the REST API. - ### Kibana 1. Choose **Security**, **Roles**, and a role. 
diff --git a/docs/security/configuration/yaml.md b/docs/security/configuration/yaml.md index 4d0dd4e8..dbdc077d 100644 --- a/docs/security/configuration/yaml.md +++ b/docs/security/configuration/yaml.md @@ -34,9 +34,9 @@ new-user: reserved: false hidden: false opendistro_security_roles: - - "some-security-role" + - "specify-some-security-role-here" backend_roles: - - "some-backend-role" + - "specify-some-backend-role-here" attributes: attribute1: "value1" static: false From 07d435592e71cdb1397611eaafee4765e10fe010 Mon Sep 17 00:00:00 2001 From: aetter Date: Mon, 22 Mar 2021 15:07:09 -0700 Subject: [PATCH 4/7] Spacing and wording. --- docs/knn/api.md | 37 ++++++++++++++++++++++------------ docs/knn/performance-tuning.md | 1 + 2 files changed, 25 insertions(+), 13 deletions(-) diff --git a/docs/knn/api.md b/docs/knn/api.md index 3a4580c2..af9207da 100644 --- a/docs/knn/api.md +++ b/docs/knn/api.md @@ -7,9 +7,12 @@ has_children: false --- # API + The k-NN plugin adds two API operations in order to allow users to better manage the plugin's functionality. + ## Stats + The k-NN `stats` API provides information about the current status of the k-NN Plugin. The plugin keeps track of both cluster level and node level stats. Cluster level stats have a single value for the entire cluster. Node level stats have a single value for each node in the cluster. You can filter their query by nodeID and statName in the following way: ``` GET /_opendistro/_knn/nodeId1,nodeId2/stats/statName1,statName2 @@ -17,28 +20,30 @@ GET /_opendistro/_knn/nodeId1,nodeId2/stats/statName1,statName2 Statistic | Description :--- | :--- -`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This is only relevant to approximate k-NN search. -`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This is only relevant to approximate k-NN search. 
-`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. *note:* explicit evictions that occur because of index deletion are not counted. This is only relevant to approximate k-NN search. -`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This is only relevant to approximate k-NN search. -`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This is only relevant to approximate k-NN search. -`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This is only relevant to approximate k-NN search. +`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This statistic is only relevant to approximate k-NN search. +`total_load_time` | The time in nanoseconds that KNN has taken to load graphs into the cache. This statistic is only relevant to approximate k-NN search. +`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. Note: Explicit evictions that occur because of index deletion are not counted. This statistic is only relevant to approximate k-NN search. +`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. This statistic is only relevant to approximate k-NN search. +`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. This statistic is only relevant to approximate k-NN search. +`graph_memory_usage` | Current cache size (total size of all graphs in memory) in kilobytes. This statistic is only relevant to approximate k-NN search. `graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity. 
`graph_index_requests` | The number of requests to add the knn_vector field of a document into a graph. `graph_index_errors` | The number of requests to add the knn_vector field of a document into a graph that have produced an error. `graph_query_requests` | The number of graph queries that have been made. `graph_query_errors` | The number of graph queries that have produced an error. `knn_query_requests` | The number of KNN query requests received. -`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This is only relevant to approximate k-NN search. -`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This is only relevant to approximate k-NN search. -`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This is only relevant to approximate k-NN search. +`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This statistic is only relevant to approximate k-NN search. +`load_success_count` | The number of times KNN successfully loaded a graph into the cache. This statistic is only relevant to approximate k-NN search. +`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This statistic is only relevant to approximate k-NN search. `indices_in_cache` | For each index that has graphs in the cache, this stat provides the number of graphs that index has and the total graph_memory_usage that index is using in Kilobytes. -`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This is only relevant to k-NN score script search. -`script_compilation_errors` | The number of errors during script compilation. This is only relevant to k-NN score script search. 
-`script_query_requests` | The total number of script queries. This is only relevant to k-NN score script search. -`script_query_errors` | The number of errors during script queries. This is only relevant to k-NN score script search. +`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This statistic is only relevant to k-NN score script search. +`script_compilation_errors` | The number of errors during script compilation. This statistic is only relevant to k-NN score script search. +`script_query_requests` | The total number of script queries. This statistic is only relevant to k-NN score script search. +`script_query_errors` | The number of errors during script queries. This statistic is only relevant to k-NN score script search. + ### Usage + ```json GET /_opendistro/_knn/stats?pretty { @@ -99,7 +104,9 @@ GET /_opendistro/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,gra } ``` + ## Warmup operation + The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory. If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort. 
@@ -108,7 +115,9 @@ As an alternative, you can avoid this latency issue by running the k-NN plugin w After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory. + ### Usage + This request performs a warmup on three indices: ```json @@ -132,7 +141,9 @@ GET /_tasks After the operation has finished, use the [k-NN `_stats` API operation](#Stats) to see what the k-NN plugin loaded into the graph. + ### Best practices + For the warmup operation to function properly, follow these best practices. First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present. diff --git a/docs/knn/performance-tuning.md b/docs/knn/performance-tuning.md index 58218730..d6f9669d 100644 --- a/docs/knn/performance-tuning.md +++ b/docs/knn/performance-tuning.md @@ -84,6 +84,7 @@ Recall depends on multiple factors like number of vectors, number of dimensions, Recall can be configured by adjusting the algorithm parameters of the HNSW algorithm exposed through index settings. Algorithm params that control recall are m, ef_construction, ef_search. For more details on influence of algorithm parameters on the indexing and search recall, please refer to the [HNSW algorithm parameters document](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). 
Increasing these values could help recall (leading to better search results) but at the cost of higher memory utilization and increased indexing time. Our default values work on a broader set of use cases from our experiments, but we encourage users to run their own experiments on their data sets and choose the appropriate values. For index-level settings, please refer to the [settings page](../settings#index-settings). We will add details on our experiments here shortly. ## Estimating Memory Usage + Typically, in an Elasticsearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs to a portion of the remaining RAM. This portion's size is determined by the circuit_breaker_limit cluster setting. By default, the circuit breaker limit is set at 50%. The memory required for graphs is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector. From dc88e4cbc7aeda6d6fc2cfbd5c0776a12d0f5c99 Mon Sep 17 00:00:00 2001 From: keithhc2 Date: Mon, 22 Mar 2021 15:15:26 -0700 Subject: [PATCH 5/7] Addressed comment --- docs/security/access-control/users-roles.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/security/access-control/users-roles.md b/docs/security/access-control/users-roles.md index 211b0a05..3973214c 100644 --- a/docs/security/access-control/users-roles.md +++ b/docs/security/access-control/users-roles.md @@ -76,7 +76,7 @@ See [Create role](../api/#create-role). ## Map users to roles -If you didn't specify a role while creating your user, you can map a role to it afterwards. +If you didn't specify roles when you created your user, you can map roles to it afterwards. Just like users and roles, you create role mappings using Kibana, `roles_mapping.yml`, or the REST API. 
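The memory estimate in the k-NN performance-tuning change above, `1.1 * (4 * dimension + 8 * M)` bytes/vector, can be checked with a short calculation. This is a minimal sketch, not part of the patch: `DIMENSION`, `M`, and `VECTORS` are hypothetical example values, and the result covers graph memory only.

```bash
#!/usr/bin/env bash
# Hypothetical example inputs; substitute the values for your own index.
DIMENSION=128     # vector dimension
M=16              # HNSW "m" parameter
VECTORS=1000000   # number of vectors in the index

# Apply the estimate from the docs: 1.1 * (4 * dimension + 8 * M) bytes per vector.
BYTES_PER_VECTOR=$(awk -v d="$DIMENSION" -v m="$M" \
  'BEGIN { printf "%.0f", 1.1 * (4 * d + 8 * m) }')

# Scale to the whole index and convert bytes to GiB.
TOTAL_GIB=$(awk -v b="$BYTES_PER_VECTOR" -v n="$VECTORS" \
  'BEGIN { printf "%.2f", b * n / (1024 * 1024 * 1024) }')

echo "${BYTES_PER_VECTOR} bytes/vector, ${TOTAL_GIB} GiB for ${VECTORS} vectors"
```

With these example values the script prints `704 bytes/vector, 0.66 GiB for 1000000 vectors`. That figure would then be compared against the portion of non-heap RAM that the circuit breaker limit allows for graphs.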
From 07d587a753c02b776e2c11fbe1ece94548ff8261 Mon Sep 17 00:00:00 2001 From: ashwinkumar12345 Date: Tue, 23 Mar 2021 00:58:36 -0700 Subject: [PATCH 6/7] added an explanation on the impact of more features on a model --- docs/ad/index.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/ad/index.md b/docs/ad/index.md index 937a9d47..2494d9eb 100644 --- a/docs/ad/index.md +++ b/docs/ad/index.md @@ -52,7 +52,10 @@ In this case, a feature is the field in your index that you to check for anomali For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature. -You can add a maximum of five features for a detector. +A multi-feature model correlates anomalies across all its features. It's difficult for the model to identify anomalies for an individual feature. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. +A higher proportion of noise in your data might further amplify this negative impact. +We recommend adding fewer features to your detector for a higher accuracy. By default, the maximum number of features for a detector is 5. +You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting. {: .note } 1. On the **Model configuration** page, enter the **Feature name**. 
From 1771677e7f817c63ea42e566ab363263d529fc65 Mon Sep 17 00:00:00 2001 From: ashwinkumar12345 Date: Wed, 24 Mar 2021 00:15:38 -0700 Subject: [PATCH 7/7] incorporated feedback --- docs/ad/index.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/docs/ad/index.md b/docs/ad/index.md index 2494d9eb..f483a39d 100644 --- a/docs/ad/index.md +++ b/docs/ad/index.md @@ -52,10 +52,7 @@ In this case, a feature is the field in your index that you to check for anomali For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature. -A multi-feature model correlates anomalies across all its features. It's difficult for the model to identify anomalies for an individual feature. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. -A higher proportion of noise in your data might further amplify this negative impact. -We recommend adding fewer features to your detector for a higher accuracy. By default, the maximum number of features for a detector is 5. -You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting. +A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. We recommend adding fewer features to your detector for a higher accuracy. By default, the maximum number of features for a detector is 5. 
You can adjust this limit with the `opendistro.anomaly_detection.max_anomaly_features` setting. {: .note } 1. On the **Model configuration** page, enter the **Feature name**.
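The `opendistro.anomaly_detection.max_anomaly_features` setting named in the change above could be adjusted along the following lines. This is a hedged sketch, not part of the patch: it assumes the setting can be updated dynamically through the standard cluster settings API, `MAX_FEATURES` is a hypothetical example value, and `ES_URL` is an assumed environment variable holding the cluster address.

```bash
#!/usr/bin/env bash
# Hypothetical example value, not a recommendation; fewer features generally
# give higher accuracy according to the note above.
MAX_FEATURES=8

# Request body for PUT _cluster/settings.
BODY="{\"persistent\":{\"opendistro.anomaly_detection.max_anomaly_features\":${MAX_FEATURES}}}"
echo "$BODY"

# Send the request only when ES_URL (assumed, e.g. https://localhost:9200) is set.
if [ -n "${ES_URL:-}" ]; then
  curl -XPUT -k -u 'admin:admin' -H 'Content-Type: application/json' \
    "${ES_URL}/_cluster/settings" -d "$BODY"
fi
```

Run without `ES_URL`, the script only prints the request body, so the payload can be reviewed before it is sent to a cluster.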