Add challenge for long-running benchmarks

Add a new challenge `elasticlogs-continuous-index-and-query` suitable for long-running benchmarks. This commit also includes:

* an updated `deleteindex_runner.py` that keeps the number of rolled-over indices to a defined limit by deleting the oldest ones.
* updates to `README.md`.
* a test/helper script, `tests/validate_challanges.py`, to assist with the JSON validation of challenges that contain embedded Jinja2 (j2) syntax.

Relates elastic#18
henningandersen · Jul 27, 2018 · 1a549f1
1 parent 78d4baa

Showing 8 changed files with 410 additions and 85 deletions.
diff --git a/README.md b/README.md
@@ -49,6 +49,55 @@ This challenge assumes that the *elasticlogs-1bn-load* track has been executed a
 
 In this challenge rate-limited indexing at varying levels is combined with a fixed level of querying. If metrics from the run are stored in Elasticsearch, it is possible to analyse these in Kibana in order to identify how indexing rate affects query latency and vice versa.
 
+### 7) elasticlogs-continuous-index-and-query
+
+This challenge is suitable for long-running execution and runs in two phases. Both phases (`p1`, `p2`) index auto-generated events; however, `p1` indexes at the maximum possible speed, whereas `p2` throttles indexing to a specified rate and, in parallel, executes four queries simulating Kibana dashboard and Discover queries. The write index is rolled over once it reaches the configured maximum size or age, and the maximum number of rolled-over indices to retain is also configurable.
+
+The table below shows the track parameters that can be adjusted along with default values:
+
+| Parameter | Explanation | Type | Default Value |
+| --------- | ----------- | ---- | ------------- |
+| `number_of_replicas` | Number of index replicas | `int` | `0` |
+| `shard_count` | Number of primary shards | `int` | `2` |
+| `p1_bulk_indexing_clients` | Number of [clients](https://esrally.readthedocs.io/en/stable/track.html?highlight=number%20of%20clients#schedule) used to index during phase 1 | `int` | `40` |
+| `p1_bulk_size` | The [bulk size](https://esrally.readthedocs.io/en/stable/track.html?highlight=number%20of%20clients#bulk) (documents per bulk request) for the autogenerated events during phase 1 | `int` | `1000` |
+| `p1_duration_secs` | Duration of phase 1 in seconds | `int` | `7200` |
+| `p2_bulk_indexing_clients` | Number of [clients](https://esrally.readthedocs.io/en/stable/track.html?highlight=number%20of%20clients#schedule) used to index during phase 2 | `int` | `16` |
+| `p2_bulk_size` | The [bulk size](https://esrally.readthedocs.io/en/stable/track.html?highlight=number%20of%20clients#bulk) (documents per bulk request) for the autogenerated events during phase 2 | `int` | `1000` |
+| `p2_duration_secs` | Duration of phase 2 in seconds | `int` | `2505600` |
+| `p2_ops` | Number of bulk indexing operations per second for phase 2. A value of `10` with `p2_bulk_size=1000` throttles indexing to 10000 docs/s | `int` | `10` |
+| `index_alias` | Specifies default index alias. | `str` | `elasticlogs_q_write` |
+| `rollover_max_size` | Max index size condition for [rollover API](https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html#indices-rollover-index) | `str` | `30gb` |
+| `rollover_max_age` | Max age condition for [rollover API](https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html#indices-rollover-index) | `str` | `1d` |
+| `p2_query1_target_interval` | Mean interval in seconds between executions (Poisson schedule) of Kibana query: `kibana-traffic-country-dashboard_60m` | `int` | `30` |
+| `p2_query2_target_interval` | Mean interval in seconds between executions (Poisson schedule) of Kibana query: `kibana-discover_30m` | `int` | `30` |
+| `p2_query3_target_interval` | Mean interval in seconds between executions (Poisson schedule) of Kibana query: `kibana-traffic-dashboard_30m` | `int` | `30` |
+| `p2_query4_target_interval` | Mean interval in seconds between executions (Poisson schedule) of Kibana query: `kibana-content_issues-dashboard_30m` | `int` | `30` |
+| `max_rolledover_indices` | Maximum number of the most recent rolled-over indices to retain | `int` | `20` |
+| `indices_delete_pattern` | Pattern used to match old rolled-over indices for deletion. See also `rolledover_indices_suffix_separator`. | `str` | `elasticlogs_q-*` |
+| `rolledover_indices_suffix_separator` | Separator used to extract the index-name suffix (e.g. `000001`) that determines which rolled-over indices to delete | `str` | `-` |
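The rollover parameters above correspond to the conditions of the Elasticsearch rollover API. The runner behind the `rollover_custom_alias` operation is not part of this excerpt, so the following is only a minimal sketch, assuming it forwards `index_alias`, `rollover_max_size` and `rollover_max_age` unchanged as rollover conditions. `build_rollover_request` is a hypothetical helper that only builds the request and sends nothing.

```python
import json
from typing import Tuple


def build_rollover_request(alias: str = "elasticlogs_q_write",
                           max_size: str = "30gb",
                           max_age: str = "1d") -> Tuple[str, str]:
    """Return (path, JSON body) for a rollover call on `alias`.

    Hypothetical helper: assumes the track parameters are passed through
    unchanged as rollover conditions. Nothing is sent to a cluster.
    """
    path = f"/{alias}/_rollover"
    body = {
        # Elasticsearch rolls the alias over when ANY condition is met.
        "conditions": {
            "max_size": max_size,
            "max_age": max_age,
        }
    }
    return path, json.dumps(body)


# Defaults from the table: rollover_max_size=30gb, rollover_max_age=1d.
path, body = build_rollover_request()
print("POST", path, body)
```

Because the challenge only checks for rollover every 30 seconds (`target-interval: 30`), an index can end up somewhat larger than `rollover_max_size` before it is rolled over.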
+
+The indices use the alias `elasticlogs_q_write`, and the first index is `elasticlogs_q-000001`. For example, in a cluster with the rolled-over indices `elasticlogs_q-000001`, `elasticlogs_q-000002`, ..., `elasticlogs_q-000010`, a value of `max_rolledover_indices=8` results in the removal of `elasticlogs_q-000001` and `elasticlogs_q-000002`.
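The retention rule in that example can be sketched as follows. This is not the code of `deleteindex_runner.py`, which is not shown here; `indices_to_delete` is a hypothetical helper that assumes index names end in a numeric suffix after `rolledover_indices_suffix_separator` and, like the example, counts every index matching `indices_delete_pattern` toward `max_rolledover_indices`.

```python
from typing import List


def indices_to_delete(indices: List[str], max_rolledover_indices: int,
                      suffix_separator: str = "-") -> List[str]:
    """Return the oldest indices to delete so that at most
    `max_rolledover_indices` of them remain.

    Hypothetical helper: orders names by the integer after the last
    `suffix_separator` (e.g. "elasticlogs_q-000003" -> 3).
    """
    def suffix(name: str) -> int:
        return int(name.rsplit(suffix_separator, 1)[1])

    ordered = sorted(indices, key=suffix)  # oldest (lowest suffix) first
    excess = len(ordered) - max_rolledover_indices
    return ordered[:excess] if excess > 0 else []


# The README example: ten indices, keep eight.
names = [f"elasticlogs_q-{n:06d}" for n in range(1, 11)]
print(indices_to_delete(names, max_rolledover_indices=8))
```

Sorting on the parsed integer rather than on the raw name keeps the order correct even if suffixes were not zero-padded.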
+
+A value of `max_rolledover_indices=20` on a three-node bare-metal cluster with [these specifications](https://elasticsearch-benchmarks-internal.elastic.co/app/kibana#/visualize/edit/02c3be00-8a66-11e8-8558-f33069e7a81e?_g=()&_a=(filters:!(),linked:!f,query:(language:lucene,query:(query_string:(analyze_wildcard:!t,query:'*'))),uiState:(),vis:(aggs:!(),params:(fontSize:12,markdown:'%23%23%23%20Benchmarking%20Methodology%0A%0AAll%20benchmarks%20are%20run%20by%20Rally%20against%20the%20Elasticsearch%20latest%20snapshot%20as%20of%20the%20start%20date.%20Each%20benchmark%20runs%20for%2030%20days.%0A%0AThe%20benchmark%20uses%20four%20machines.%20On%20one%20we%20run%20the%20benchmark%20driver%20(Rally),%20on%20the%20other%20three%20the%20benchmark%20candidates.%0A%0AThe%20Elasticsearch%20node%20uses%20default%20settings%20except%20for:%0A%0AAdapted%20JVM%20settings:%0A%0A*%20Heap%20is%20increased%20to%208GB%20(%60-Xms8G%20-Xmx8G%60)%0A*%20Assertions%20are%20enabled%20(%60-ea%60)%0A*%20GC%20log%20is%20enabled%20(rolling)%0A%0AAdapted%20Elasticsearch%20settings:%0A%0A*%20%60network.host:%200.0.0.0%60%0A*%20%60bootstrap.memory_lock:%20true%60%0A%0AWe%20also%20run%20this%20node%20with%20the%20following%20plugins:%0A*%20x-pack%20(authentication%20backed%20by%20a%20file%20store%20%2B%20SSL%20enabled%20with%20self-signed%20certificates)%0A*%20ingest-geoip%0A%0AAll%20benchmarks%20are%20run%20on%20a%20bare%20metal%20machine%20with%20the%20following%20specifications:%0A%0A*%20CPU:%20Intel(R)%20Core(TM)%20i7-6700%20CPU%20@%203.40GHz%0A*%20RAM:%2032%20GB%0A*%20SSD:%20Crucial%20MX200%0A*%20OS:%20Linux%20Kernel%20version%204.13.0-38%0A*%20OS%20tuning:%0A%20%20*%20Turbo%20boost%20disabled%20(%60%2Fsys%2Fdevices%2Fsystem%2Fcpu%2Fintel_pstate%2Fno_turbo%60)%0A%20%20*%20THP%20at%20default%20%60madvise%60%20(%60%2Fsys%2Fkernel%2Fmm%2Ftransparent_hugepage%2F%7Bdefrag,enabled%7D%60)%0A*%20JVM:%20Oracle%20JDK%201.8.0_131%0A%0A%23%23%23%20Benchmark%0A%0AThese%20benchmarks%20run%20the%20%5Bcontinuous%20index%20and%20query%20challenge%5D(https:%2F%2Fgithub.com%2Fdliappis%2Frally-eventdata-track%2Fblob%2Flongrun-benchmarks%2Feventdata%2Fchallenges%2Felasticlogs-continuous-index-and-query.json)%20from%20the%20%5Brally-eventdata-track%5D(https:%2F%2Fgithub.com%2Fdliappis%2Frally-eventdata-track%2Ftree%2Flongrun-benchmarks)%20with%20the%20following%20parameters:%0A%0A%60%60%60%0A%7B%0A%20%20%22number_of_replicas%22:%201,%0A%20%20%22shard_count%22:%203,%0A%20%20%22p1_bulk_indexing_clients%22:%2032,%0A%20%20%22p1_bulk_size%22:%201000,%0A%20%20%22p1_duration_secs%22:%2028800,%0A%20%20%22p2_bulk_indexing_clients%22:%2012,%0A%20%20%22p2_bulk_size%22:%201000,%0A%20%20%22p2_ops%22:%2030,%0A%20%20%22max_rolledover_indices%22:%2020,%0A%20%20%22rollover_max_size%22:%20%2230gb%22%0A%7D%0A%60%60%60',type:markdown),title:'Benchmarking%20Methodology%20v2',type:markdown))) keeps disk usage constant at `407GiB` per node.
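As a rough plausibility check of that figure, the steady-state footprint can be estimated from the example parameters. This back-of-envelope sketch assumes that `rollover_max_size` bounds the primary data of each index (Elasticsearch byte units are binary, so `30gb` is 30 GiB), that one replica doubles the footprint, and that shards are spread evenly across nodes. It ignores overshoot between rollover checks and merge overhead. `estimate_disk_per_node_gib` is a hypothetical name.

```python
def estimate_disk_per_node_gib(max_rolledover_indices: int,
                               rollover_max_size_gib: float,
                               number_of_replicas: int,
                               nodes: int,
                               active_index_fill: float = 0.0) -> float:
    """Back-of-envelope steady-state disk usage per node, in GiB.

    Assumptions (not measured): every retained index holds about
    `rollover_max_size_gib` of primary data, the active write index is
    `active_index_fill` (0.0 to 1.0) full, each primary has
    `number_of_replicas` copies, and shards are spread evenly over `nodes`.
    """
    primary_gib = (max_rolledover_indices + active_index_fill) * rollover_max_size_gib
    total_gib = primary_gib * (1 + number_of_replicas)
    return total_gib / nodes


# Parameters from the example: 20 indices, 30gb, 1 replica, 3 nodes.
low = estimate_disk_per_node_gib(20, 30, 1, 3)
high = estimate_disk_per_node_gib(20, 30, 1, 3, active_index_fill=1.0)
print(f"{low:.0f} to {high:.0f} GiB per node")
```

With 20 full indices this gives 400 GiB per node, or 420 GiB if the active write index is also counted as full, which brackets the reported `407GiB`.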
+
+It is recommended to store any track parameters in a JSON file and pass them to Rally using `--track-params=./params-file.json`. Example content:
+
+``` shell
+$ cat params-file.json
+{
+  "number_of_replicas": 1,
+  "shard_count": 3,
+  "p1_bulk_indexing_clients": 32,
+  "p1_bulk_size": 1000,
+  "p1_duration_secs": 28800,
+  "p2_bulk_indexing_clients": 12,
+  "p2_bulk_size": 1000,
+  "p2_ops": 30,
+  "max_rolledover_indices": 20,
+  "rollover_max_size": "30gb"
+}
+```
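Any parameter left out of such a file falls back to the defaults in the `{% set %}` and `default(...)` expressions of `elasticlogs-continuous-index-and-query.json`. The sketch below mirrors those defaults in Python to show how the effective values are derived, including the phase 2 target indexing rate (`p2_ops * p2_bulk_size`). `resolve_track_params` and `load_track_params` are hypothetical helpers, not part of Rally.

```python
import json
from typing import Any, Dict

# Defaults copied from the {% set %} / default(...) expressions in
# elasticlogs-continuous-index-and-query.json.
TEMPLATE_DEFAULTS: Dict[str, Any] = {
    "p1_bulk_indexing_clients": 40,
    "p1_bulk_size": 1000,
    "p1_duration_secs": 7200,
    "p2_bulk_indexing_clients": 16,
    "p2_bulk_size": 1000,
    "p2_duration_secs": 2505600,  # 29 days
    "p2_ops": 10,
}


def resolve_track_params(user_params: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user params on the template defaults and add derived values."""
    params = {**TEMPLATE_DEFAULTS, **user_params}
    # Mirrors {% set p2_rate = (p2_ops * (p2_bulk_size | default(1000))) %}
    params["p2_rate"] = params["p2_ops"] * params["p2_bulk_size"]
    params["total_duration_secs"] = params["p1_duration_secs"] + params["p2_duration_secs"]
    return params


def load_track_params(path: str) -> Dict[str, Any]:
    """Read a params file such as params-file.json and resolve it."""
    with open(path, encoding="utf-8") as f:
        return resolve_track_params(json.load(f))


example = resolve_track_params({"p1_duration_secs": 28800, "p2_ops": 30, "p2_bulk_size": 1000})
print(example["p2_rate"], example["total_duration_secs"] / 86400)
```

For the example file, phase 2 targets 30 × 1000 = 30000 docs/s. Because `p2_duration_secs` keeps its default, the run lasts 28800 + 2505600 seconds, about 29.3 days.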
+
 ## Custom parameter sources
 
 ### elasticlogs\_bulk\_source
@@ -59,59 +108,59 @@ The generator allows data to be generated in real-time or against a set date/tin
 
 ```
 {
-	"@timestamp": "2017-06-01T00:01:08.866644Z",
-	"offset": 7631775,
-	"user_name": "-",
-	"source": "/usr/local/var/log/nginx/access.log",
-	"fileset": {
-		"module": "nginx",
-		"name": "access"
-	},
-	"input": {
-		"type": "log"
-	},
-	"beat": {
-		"version": "6.3.0",
-		"hostname": "web-EU-1.elastic.co",
-		"name": "web-EU-1.elastic.co"
-	},
-	"prospector": {
-		"type": "log"
-	},
-	"nginx": {
-		"access": {
-			"user_agent": {
-				"major": "44",
-				"os": "Mac OS X",
-				"os_major": "10",
-				"name": "Firefox",
-				"os_name": "Mac OS X",
-				"device": "Other"
-			},
-			"remote_ip": "5.134.208.0",
-			"remote_ip_list": [
-				"5.134.208.0"
-			],
-			"geoip": {
-				"continent_name": "Europe",
-				"city_name": "Grupa",
-				"country_name": "Poland",
-				"country_iso_code": "PL",
-				"location": {
-					"lat": 53.5076,
-					"lon": 18.6358
-				}
-			},
-			"referrer": "https://www.elastic.co/guide/en/marvel/current/getting-started.html",
-			"url": "/guide/en/kibana/current/images/autorefresh-pause.png",
-			"body_sent": {
-				"bytes": 2122
-			},
-			"method": "GET",
-			"response_code": "200",
-			"http_version": "1.1"
-		}
-	}
+  "@timestamp": "2017-06-01T00:01:08.866644Z",
+  "offset": 7631775,
+  "user_name": "-",
+  "source": "/usr/local/var/log/nginx/access.log",
+  "fileset": {
+    "module": "nginx",
+    "name": "access"
+  },
+  "input": {
+    "type": "log"
+  },
+  "beat": {
+    "version": "6.3.0",
+    "hostname": "web-EU-1.elastic.co",
+    "name": "web-EU-1.elastic.co"
+  },
+  "prospector": {
+    "type": "log"
+  },
+  "nginx": {
+    "access": {
+      "user_agent": {
+        "major": "44",
+        "os": "Mac OS X",
+        "os_major": "10",
+        "name": "Firefox",
+        "os_name": "Mac OS X",
+        "device": "Other"
+      },
+      "remote_ip": "5.134.208.0",
+      "remote_ip_list": [
+        "5.134.208.0"
+      ],
+      "geoip": {
+        "continent_name": "Europe",
+        "city_name": "Grupa",
+        "country_name": "Poland",
+        "country_iso_code": "PL",
+        "location": {
+          "lat": 53.5076,
+          "lon": 18.6358
+        }
+      },
+      "referrer": "https://www.elastic.co/guide/en/marvel/current/getting-started.html",
+      "url": "/guide/en/kibana/current/images/autorefresh-pause.png",
+      "body_sent": {
+        "bytes": 2122
+      },
+      "method": "GET",
+      "response_code": "200",
+      "http_version": "1.1"
+    }
+  }
 }
 ```
 
@@ -155,7 +204,7 @@ As you can see, branches can match exact release numbers but Rally is also lenie
 
 Apart from that, the master branch is always considered to be compatible with the Elasticsearch master branch.
 
-To specify the version to check against, add `--distribution-version` when running Rally. It it is not specified, Rally assumes that you want to benchmark against the Elasticsearch master version. 
+To specify the version to check against, add `--distribution-version` when running Rally. If it is not specified, Rally assumes that you want to benchmark against the Elasticsearch master version.
 
 Example: If you want to benchmark Elasticsearch 6.2.4, run the following command:
 
@@ -167,12 +216,12 @@ How to Contribute
 -----------------
 
 If you want to contribute to this track, please ensure that it works against the master version of Elasticsearch (i.e. submit PRs against the master branch). We can then check whether it's feasible to backport the track to earlier Elasticsearch versions.
- 
+
 See all details in the [contributor guidelines](https://github.com/elastic/rally/blob/master/CONTRIBUTING.md).
 
 License
 -------
- 
+
 This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.
 
 Copyright 2015-2018 Elasticsearch <https://www.elastic.co>

diff --git a/eventdata/challenges/combined-indexing-and-querying.json b/eventdata/challenges/combined-indexing-and-querying.json
@@ -10,7 +10,7 @@
   "name": "combined-indexing-and-querying",
   "description": "This challenge simulates a set of Kibana queries against historical data (elasticlogs_q-* indices) as well as against the most recent data currently being indexed. It combines this with rate-limited indexing at varying levels. It assumes that one of the challenges creating elasticlogs_q-* indices has been run.",
   "meta": {
-    "benchmark_type": "indexing/querying",  
+    "benchmark_type": "indexing/querying",
     "target_kibana_queries_per_minute": 7
   },
   "schedule": [
@@ -25,7 +25,7 @@
     },
     {
       "operation": "relative-kibana-content_issues-dashboard_50%",
-      "target-interval": 60,  
+      "target-interval": 60,
       "warmup-time-period": 0,
       "clients": 1,
       "time-period": {{ p_rate_limit_duration_secs }},
@@ -40,14 +40,14 @@
       "warmup-iterations": 0,
       "iterations": 1
     },
-    {# Add some data to index so it does not start empty #}   
+    {# Add some data to index so it does not start empty #}
     {
       "operation": "index-append-1000-elasticlogs_i_write",
       "time-period": {{ p_rate_limit_duration_secs }},
       "target-throughput": 10,
       "clients": {{ p_client_count }}
     },
-    
+
     {% for ops in range(p_rate_limit_step, p_rate_limit_max, p_rate_limit_step) %}
 
 
@@ -94,8 +94,8 @@
             },
             "schedule": "poisson"
           },
-          { 
-            "name": "current-kibana-content_issues-dashboard_30m-{{rate}}",     
+          {
+            "name": "current-kibana-content_issues-dashboard_30m-{{rate}}",
             "operation": "current-kibana-content_issues-dashboard_30m",
             "target-interval": 60,
             "clients": 2,
@@ -105,7 +105,7 @@
             },
             "schedule": "poisson"
           },
-          { 
+          {
             "name": "current-kibana-traffic-dashboard_15m-{{rate}}",
             "operation": "current-kibana-traffic-dashboard_15m",
             "target-interval": 30,

diff --git a/eventdata/challenges/elasticlogs-continuous-index-and-query.json b/eventdata/challenges/elasticlogs-continuous-index-and-query.json
@@ -0,0 +1,143 @@
+{% set p1_bulk_indexing_clients = (p1_bulk_indexing_clients | default(40)) %}
+{% set p2_bulk_indexing_clients = (p2_bulk_indexing_clients | default(16)) %}
+{# Phase 1 is indexing only at max speed for 2 hours #}
+{% set p1_duration = (p1_duration_secs | default(7200)) %}
+{# Phase 2 is indexing and querying for 29 days #}
+{% set p2_duration = (p2_duration_secs | default(2505600)) %}
+{% set p2_ops = (p2_ops | default(10)) %}
+{% set p2_rate = (p2_ops * (p2_bulk_size | default(1000))) %}
+{
+  "name": "elasticlogs-continuous-index-and-query",
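+  "description": "Continuously indexes auto-generated documents into elasticlogs_q-* indices that are rolled over via the elasticlogs_q_write alias. Phase 1 indexes at maximum speed; phase 2 combines rate-limited indexing with Kibana queries. IDs are autogenerated by Elasticsearch, meaning there are no conflicts.",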
+  "meta": {
+    "benchmark_type": "indexing"
+  },
+  "schedule": [
+    {
+      "operation": "deleteindex_elasticlogs_q-*",
+      "clients": 1,
+      "warmup-iterations": 0,
+      "iterations": 1
+    },
+    {
+      "operation": "create_elasticlogs_q_write",
+      "clients": 1,
+      "warmup-iterations": 0,
+      "iterations": 1
+    },
+    {
+      "parallel": {
+        "time-period": {{ p1_duration }},
+        "warmup-time-period": 0,
+        "tasks": [
+          {
+            "name": "index-append-elasticlogs_q_write-phase1",
+            "operation": {
+              "operation-type": "bulk",
+              "index": "elasticlogs_q_write",
+              "param-source": "elasticlogs_bulk",
+              "bulk-size": {{ p1_bulk_size | default(1000) | int }}
+            },
+            "clients": {{ p1_bulk_indexing_clients }},
+            "meta": {
+              "querying": "no"
+            }
+          },
+          {
+            "#COMMENT": "Check every 30 seconds whether the index alias needs to be rolled over.",
+            "name": "rollover-indices-phase1",
+            "operation": "rollover_custom_alias",
+            "clients": 1,
+            "target-interval": 30
+          },
+          {
+            "#COMMENT": "Delete the oldest rolled-over indices so that at most max_rolledover_indices (default 20) are retained",
+            "name": "delete-rolledover-indices-phase1",
+            "operation": "delete_rolledover_index_pattern",
+            "clients": 1,
+            "target-interval": 30
+          }
+        ]
+      }
+    },
+    {
+      "parallel": {
+        "time-period": {{ p2_duration }},
+        "warmup-time-period": 0,
+        "tasks": [
+          {
+            "name": "index-append-elasticlogs_q_write-phase2",
+            "operation": {
+              "operation-type": "bulk",
+              "index": "elasticlogs_q_write",
+              "param-source": "elasticlogs_bulk",
+              "bulk-size": {{ p2_bulk_size | default(1000) | int }}
+            },
+            "target-throughput": {{ p2_ops }},
+            "clients": {{ p2_bulk_indexing_clients }},
+            "meta": {
+              "target_indexing_rate": {{ p2_rate }}
+            }
+          },
+          {
+            "name": "rollover-indices-phase2",
+            "operation": "rollover_custom_alias",
+            "clients": 1,
+            "target-interval": 30
+          },
+          {
+            "name": "delete_rolled_over_indices-phase2",
+            "operation": "delete_rolledover_index_pattern",
+            "clients": 1,
+            "target-interval": 30
+          },
+          {
+            "name": "current-kibana-traffic-country-dashboard_60m-querying",
+            "operation": "current-kibana-traffic-country-dashboard_60m",
+            "clients": 1,
+            "target-interval": {{ p2_query1_target_interval | default(30) | int }},
+            "meta": {
+              "querying": "yes",
+              "query_type": "current"
+            },
+            "schedule": "poisson"
+          },
+          {
+            "name": "current-kibana-discover_30m-querying",
+            "operation": "current-kibana-discover_30m",
+            "clients": 1,
+            "target-interval": {{ p2_query2_target_interval | default(30) | int }},
+            "meta": {
+              "querying": "yes",
+              "query_type": "current"
+            },
+            "schedule": "poisson"
+          },
+          {
+            "name": "current-kibana-traffic-dashboard_30m-querying",
+            "operation": "current-kibana-traffic-dashboard_30m",
+            "clients": 1,
+            "target-interval": {{ p2_query3_target_interval | default(30) | int }},
+            "meta": {
+              "querying": "yes",
+              "query_type": "current"
+            },
+            "schedule": "poisson"
+          },
+          {
+            "name": "current-kibana-content_issues-dashboard_30m-querying",
+            "#COMMENT": "Only matches 404 responses, which are about 1-1.5% of the data",
+            "operation": "current-kibana-content_issues-dashboard_30m",
+            "clients": 1,
+            "target-interval": {{ p2_query4_target_interval | default(30) | int }},
+            "meta": {
+              "querying": "yes",
+              "query_type": "current"
+            },
+            "schedule": "poisson"
+          }
+        ]
+      }
+    }
+  ]
+}
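The commit message mentions `tests/validate_challanges.py` for validating challenges that embed Jinja2; its contents are not part of this excerpt. One possible stdlib-only approach, shown here as a hypothetical sketch, is to strip Jinja comments and statements, replace each `{{ ... }}` expression with a numeric placeholder, and parse the result with `json.loads`.

```python
import json
import re
from typing import Any

# Hypothetical regex stand-ins for Jinja2 syntax (not a real Jinja parser).
JINJA_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)
JINJA_STATEMENT = re.compile(r"\{%.*?%\}", re.DOTALL)
JINJA_EXPRESSION = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def strip_jinja(template: str) -> str:
    """Remove {# #} comments and {% %} statements; replace {{ }} with 0."""
    text = JINJA_COMMENT.sub("", template)
    text = JINJA_STATEMENT.sub("", text)
    return JINJA_EXPRESSION.sub("0", text)


def validate_challenge(template: str) -> Any:
    """Parse the stripped template; raises json.JSONDecodeError if invalid."""
    return json.loads(strip_jinja(template))


SAMPLE = """{% set p2_ops = (p2_ops | default(10)) %}
{# Phase 2 is indexing and querying #}
{
  "name": "demo",
  "schedule": [
    {"operation": "bulk", "target-throughput": {{ p2_ops }}, "clients": {{ c | default(1) | int }}}
  ]
}"""

print(validate_challenge(SAMPLE)["name"])
```

This only works for templates like the one above, whose Jinja statements do not change the JSON structure. A challenge with `{% for %}` loops, such as `combined-indexing-and-querying.json`, has to be rendered with Jinja2 itself before it can be checked this way.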