
Various cleanups for Elasticsearch test #51
dakrone: Add logging configuration (aa66375)
This increases the default logging to DEBUG, and sets TRACE logging for the gateway and discovery packages.
dakrone: Move `nuke!` to before the test starts (efc15e4)
This allows someone to collect the ES logs and data *after* a test run. Otherwise the logs and data are removed and ES is stopped, making further debugging impossible.
dakrone: Wait for index to become green after creation (22a50b2)
After an index is created, clients should always wait for the index to be fully created (the create request returns immediately) before starting the test.
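The behavior this commit relies on can be sketched as a polling loop against the cluster health endpoint. This is an illustrative sketch, not the PR's code: `get_health` is a hypothetical stand-in for a `GET /_cluster/health/<index>` call, and in the PR itself this is a single blocking `esa/cluster-health` request with `:wait_for_status "green"`.

```python
import time

def wait_for_green(get_health, index, expected_nodes, timeout=30.0, interval=0.5):
    """Poll cluster health until the index reports green status and the
    expected number of nodes have joined, then return the health response."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        health = get_health(index)  # stand-in for GET /_cluster/health/<index>
        if (health.get("status") == "green"
                and health.get("number_of_nodes") == expected_nodes):
            return health
        time.sleep(interval)
    raise TimeoutError(f"index {index!r} did not reach green within {timeout}s")
```

Checking the node count as well as the status matters here: an index can report green while some cluster nodes have not yet joined, which would let the test start before the topology it assumes actually exists.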
dakrone: Use more reasonable settings for scroll (b1cb921)
There's no need for the `query_then_fetch` setting; use a ten-second scroll timeout instead of one minute, and a more reasonable page size of 20 rather than 2.
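For context on these settings: the scroll timeout is how long Elasticsearch keeps the server-side cursor alive between page fetches (it only needs to cover the gap between two requests, not the whole scan), and the size is the per-page hit count. A minimal sketch of the client-side drain loop, with hypothetical `search` and `scroll` callables standing in for the HTTP transport:

```python
def scroll_all(search, scroll, index, query, size=20, keep_alive="10s"):
    """Yield every hit from a scrolling search. `search` opens the scroll
    and returns the first page; `scroll` fetches each subsequent page
    until an empty page signals the end of the result set."""
    page = search(index, query, size=size, scroll=keep_alive)
    while page["hits"]:
        yield from page["hits"]
        page = scroll(page["scroll_id"], scroll=keep_alive)
```

A tiny page size like 2 means many round trips for the same data, while an overlong keep-alive pins cluster resources; 20 hits and 10 seconds are the more reasonable middle ground the commit picks.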
I'd really prefer to know about connection errors etc. that happen here; the only reason it's appropriate to noop is if the index already exists.
dakrone added a note:
Sure, I will remove this to only handle

    (try+
      ...
      (catch [:status 400]
        ;; ignore
Yes, that's much more sensible, thank you. :)
Same here; I want to be conservative about which errors I'll ignore, haha. :)
Should "5" be a string instead of a number? Either way, we should use
dakrone added a note:
It can be either a string or a number, I will change this to be
dakrone: Catch only the `IndexAlreadyExistsException`, use dynamic node count (4f3a27f)
Instead of catching any throwable during index creation, catch a specific exception and ensure it was only because the index already existed. Additionally, this changes the hardcoded node count of 5 to `(count (:nodes test))` so a dynamic number of nodes can be used.
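The narrowed error handling in this commit boils down to: attempt the create, and swallow the failure only when it is an HTTP 400 whose body names `IndexAlreadyExistsException`; everything else, including connection errors and other statuses, still propagates. A sketch of that logic under those assumptions, with a hypothetical `HttpError` class and `create` callable standing in for the real client:

```python
class HttpError(Exception):
    """Hypothetical stand-in for the client's HTTP error exception."""
    def __init__(self, status, body):
        super().__init__(f"{status}: {body}")
        self.status, self.body = status, body

def index_already_exists_error(err):
    """True only when the failure is a 400 caused by the index existing."""
    return err.status == 400 and "IndexAlreadyExistsException" in err.body

def create_index_idempotently(create, name):
    """Create the index, treating 'already exists' as success."""
    try:
        create(name)
    except HttpError as e:
        if not index_already_exists_error(e):
            raise  # connection errors and other 4xx/5xx still surface
```

This mirrors the PR's `index-already-exists-error?` predicate: both conditions are checked, since a bare 400 could also mean a malformed request that should not be silently ignored.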
Pushed another commit addressing your feedback, thanks for taking a look!
Do these tests run for you? Elasticsearch doesn't even start on my nodes any more; times out waiting for cluster recovery.
The added logging configuration:

```diff
@@ -0,0 +1,67 @@
+# you can override this using by setting a system property, for example -Des.logger.level=DEBUG
+es.logger.level: DEBUG
+rootLogger: ${es.logger.level}, console, file
+logger:
+  # log action execution errors for easier debugging
+  action: DEBUG
+  # reduce the logging for aws, too much is logged under the default INFO
+  com.amazonaws: WARN
+
+  # gateway
+  gateway: TRACE
+  index.gateway: TRACE
+
+  # peer shard recovery
+  #indices.recovery: DEBUG
+
+  # discovery
+  discovery: TRACE
+
+  index.search.slowlog: TRACE, index_search_slow_log_file
+  index.indexing.slowlog: TRACE, index_indexing_slow_log_file
+
+additivity:
+  index.search.slowlog: false
+  index.indexing.slowlog: false
+
+appender:
+  console:
+    type: console
+    layout:
+      type: consolePattern
+      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
+
+  file:
+    type: dailyRollingFile
+    file: ${path.logs}/${cluster.name}.log
+    datePattern: "'.'yyyy-MM-dd"
+    layout:
+      type: pattern
+      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
+
+  # Use the following log4j-extras RollingFileAppender to enable gzip compression of log files.
+  # For more information see https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/RollingFileAppender.html
+  #file:
+    #type: extrasRollingFile
+    #file: ${path.logs}/${cluster.name}.log
+    #rollingPolicy: timeBased
+    #rollingPolicy.FileNamePattern: ${path.logs}/${cluster.name}.log.%d{yyyy-MM-dd}.gz
+    #layout:
+      #type: pattern
+      #conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
+
+  index_search_slow_log_file:
+    type: dailyRollingFile
+    file: ${path.logs}/${cluster.name}_index_search_slowlog.log
+    datePattern: "'.'yyyy-MM-dd"
+    layout:
+      type: pattern
+      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
+
+  index_indexing_slow_log_file:
+    type: dailyRollingFile
+    file: ${path.logs}/${cluster.name}_index_indexing_slowlog.log
+    datePattern: "'.'yyyy-MM-dd"
+    layout:
+      type: pattern
+      conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
```
The changes to the test namespace:

```diff
@@ -20,6 +20,7 @@
             [jepsen.os.debian :as debian]
             [clj-http.client :as http]
             [clojurewerkz.elastisch.rest :as es]
+            [clojurewerkz.elastisch.rest.admin :as esa]
             [clojurewerkz.elastisch.rest.document :as esd]
             [clojurewerkz.elastisch.rest.index :as esi]
             [clojurewerkz.elastisch.rest.response :as esr]))
@@ -161,34 +162,45 @@
   (reify db/DB
     (setup! [_ test node]
       (doto node
+        (nuke!)
         (install! version)
         (configure! test)
         (start!)))

     (teardown! [_ test node]
-      (nuke! node))))
+      ;; Leave system up, to collect logs, analyze post mortem, etc
+      )))

 (def index-name "jepsen-index")

+(defn index-already-exists-error?
+  "Return true if the error is due to the index already existing, false
+  otherwise."
+  [error]
+  (and
+    (-> error .getData :status (= 400))
+    (re-find #"IndexAlreadyExistsException"
+             (http-error error))))
+
 (defrecord CreateSetClient [client]
   client/Client
   (setup! [_ test node]
     (let [; client (es/connect [[(name node) 9300]])]
           client (es/connect (str "http://" (name node) ":9200"))]
-      ; Create index
+      ;; Create index
       (try
         (esi/create client index-name
                     :mappings {"number" {:properties
                                          {:num {:type "integer"
                                                 :store "yes"}}}}
-                    :settings {"index" {"refresh_interval" "1"}})
+                    :settings {"index" {"refresh_interval" "1s"}})
         (catch clojure.lang.ExceptionInfo e
-          ; Is this seriously how you're supposed to do idempotent
-          ; index creation? I've gotta be doing this wrong.
-          (let [err (http-error e)]
-            (when-not (re-find #"IndexAlreadyExistsException" err)
-              (throw (RuntimeException. err))))))
-
+          (when-not (index-already-exists-error? e)
+            (throw e))))
+      (esa/cluster-health client
+                          {:index [index-name] :level "indices"
+                           :wait_for_status "green"
+                           :wait_for_nodes (count (:nodes test))})
       (CreateSetClient. client)))

   (invoke! [this test op]
@@ -220,9 +232,8 @@
       (esi/flush client index-name)
       (assoc op :type :ok
              :value (->> (esd/search client index-name "number"
-                                     :search_type "query_then_fetch"
-                                     :scroll "1m"
-                                     :size 2)
+                                     :scroll "10s"
+                                     :size 20)
                          (esd/scroll-seq client)
                          (map (comp :num :_source))
                          (into (sorted-set))))
@@ -243,16 +254,17 @@
   (setup! [_ test node]
     (let [; client (es/connect [[(name node) 9300]])]
           client (es/connect (str "http://" (name node) ":9200"))]
-      ; Create index
+      ;; Create index
       (try
         (esi/create client index-name
                     :mappings {mapping-type {:properties {}}})
         (catch clojure.lang.ExceptionInfo e
-          ; Is this seriously how you're supposed to do idempotent
-          ; index creation? I've gotta be doing this wrong.
-          (let [err (http-error e)]
-            (when-not (re-find #"IndexAlreadyExistsException" err)
-              (throw (RuntimeException. err))))))
+          (when-not (index-already-exists-error? e)
+            (throw e))))
+      (esa/cluster-health client
+                          {:index [index-name] :level "indices"
+                           :wait_for_status "green"
+                           :wait_for_nodes (count (:nodes test))})
       ; Initial empty set
       (esd/create client index-name mapping-type {:values []} :id doc-id)
```
This PR makes four changes:

- Add logging configuration: In order to actually determine the root cause of issues, more verbose logging is needed. This defaults to more verbose logging for Elasticsearch and adds the ability to change it from Jepsen in the future (instead of manually by hand).
- Move `nuke!` to before tests: Without this, Jepsen deletes all traces of itself after running, which makes debugging much more difficult (no logs and no data left).
- Use more reasonable settings for scroll: An optional change, but I figured I would make this change anyway.
- Wait for index to become green after creation: This is a key part of testing Elasticsearch, and clients should always do this when creating indices.

Note that I was able to reproduce the failures in elastic/elasticsearch#10426 (about half of the time) without these changes; however, after the change that waits for green after index creation, I am no longer able to reproduce data loss with the `create-pause` test (I am still evaluating the other tests).