Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Clone in Desktop Download ZIP

Loading…

Various cleanups for Elasticsearch test #51

Merged
merged 5 commits into from

2 participants

@dakrone

This change makes four changes:

  • Adding a logging configuration

In order to actually determine the root cause of issues, more verbose logging is needed. This defaults to more verbose logging for Elasticsearch and adds the ability to change it from Jepsen in the future (instead of manually by hand).

  • Moving nuke! to before tests

Without this, Jepsen deletes all traces of itself after running, which makes debugging much more difficult (no logs and no data left).

  • Use more reasonable settings for scroll

An optional change, but I figured I would make this change anyway.

  • Wait for index to become green after creation

This is a key part of testing Elasticsearch, and clients should always do this when creating indices.


Note that I was able to reproduce the failures in elastic/elasticsearch#10426 (about half of the time) without these changes, however after the change which waits for green after index creation, I am no longer able to reproduce data loss with the create-pause test (still evaluating the other tests).

dakrone added some commits
@dakrone dakrone Add logging configuration
This increases the default logging to DEBUG, and sets TRACE logging for
gateway and discovery packages.
aa66375
@dakrone dakrone Move `nuke!` to before the test starts
This allows someone to collect the ES logs and data *after* a test run.
Otherwise the logs and data is removed and ES is stopped, making further
debugging impossible.
efc15e4
@dakrone dakrone Wait for index to become green after creation
After an index is created, clients should always wait for the index to
be fully created (the request returns immediately) before starting the
test.
22a50b2
@dakrone dakrone Use more reasonable settings for scroll
No need for the `query_then_fetch` setting, use ten seconds instead of
one minute, and a more reasonable size of 20 rather than 2.
b1cb921
@dakrone

@aphyr also, how would you feel about me replacing Elastisch with vanilla clj-http? Elastisch uses clj-http internally anyway and it would reduce the number of moving parts in this test. I'm happy to submit another PR if you are interested.

elasticsearch/src/elasticsearch/core.clj
((6 lines not shown))
(try
(esi/create client index-name
:mappings {"number" {:properties
{:num {:type "integer"
:store "yes"}}}}
- :settings {"index" {"refresh_interval" "1"}})
- (catch clojure.lang.ExceptionInfo e
- ; Is this seriously how you're supposed to do idempotent
- ; index creation? I've gotta be doing this wrong.
- (let [err (http-error e)]
- (when-not (re-find #"IndexAlreadyExistsException" err)
- (throw (RuntimeException. err))))))
-
+ :settings {"index" {"refresh_interval" "1s"}})
+ (catch Throwable t))
@aphyr Owner
aphyr added a note

I'd really prefer to know about connection errors etc that happen here; the only reason it's appropriate to noop is if the index already exists.

@dakrone
dakrone added a note

Sure, I will remove this to only handle IndexAlreadyExistsException as you previously did. I do think using basic clj-http would be easier, as it'd allow you to use:

(try+
  ...
  (catch [:status 400]
    ;; ignore
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@aphyr aphyr commented on the diff
elasticsearch/src/elasticsearch/core.clj
@@ -220,9 +221,8 @@
(esi/flush client index-name)
(assoc op :type :ok
:value (->> (esd/search client index-name "number"
- :search_type "query_then_fetch"
- :scroll "1m"
- :size 2)
+ :scroll "10s"
+ :size 20)
@aphyr Owner
aphyr added a note

Yes, that's much more sensible, thank you. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
elasticsearch/src/elasticsearch/core.clj
((6 lines not shown))
(try
(esi/create client index-name
:mappings {mapping-type {:properties {}}})
- (catch clojure.lang.ExceptionInfo e
- ; Is this seriously how you're supposed to do idempotent
- ; index creation? I've gotta be doing this wrong.
- (let [err (http-error e)]
- (when-not (re-find #"IndexAlreadyExistsException" err)
- (throw (RuntimeException. err))))))
+ (catch Throwable t))
@aphyr Owner
aphyr added a note

Same here; I want to be conservative about which errors I'll ignore, haha. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
elasticsearch/src/elasticsearch/core.clj
((6 lines not shown))
(try
(esi/create client index-name
:mappings {"number" {:properties
{:num {:type "integer"
:store "yes"}}}}
- :settings {"index" {"refresh_interval" "1"}})
- (catch clojure.lang.ExceptionInfo e
- ; Is this seriously how you're supposed to do idempotent
- ; index creation? I've gotta be doing this wrong.
- (let [err (http-error e)]
- (when-not (re-find #"IndexAlreadyExistsException" err)
- (throw (RuntimeException. err))))))
-
+ :settings {"index" {"refresh_interval" "1s"}})
+ (catch Throwable t))
+ (esa/cluster-health client
+ {:index [index-name] :level "indices"
+ :wait_for_status "green"
+ :wait_for_nodes "5"})
@aphyr Owner
aphyr added a note

Should "5" be a string instead of a number? Either way, we should use (count (:nodes test)).

@dakrone
dakrone added a note

It can be either a string or a number, I will change this to be (count (:nodes test)) as you said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@dakrone dakrone Catch only the `IndexAlreadyExistsException`, use dynamic node count
Instead of catching any throwable during index creation, catch a
specific exception and ensure it was only because the index already
existed.

Additionally, this changes the hardcoded node count of 5 to
`(count (:nodes test))` so a dynamic number of nodes can be used.
4f3a27f
@dakrone

Pushed another commit addressing your feedback, thanks for taking a look!

@aphyr
Owner

Excellent, thanks @dakrone :)

@aphyr aphyr merged commit 8bbd973 into aphyr:master
@aphyr
Owner

Do these tests run for you? Elasticsearch doesn't even start on my nodes any more; times out waiting for cluster recovery.

@dakrone

@aphyr I just double-checked this and the tests are still running for me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Apr 7, 2015
  1. @dakrone

    Add logging configuration

    dakrone authored
    This increases the default logging to DEBUG, and sets TRACE logging for
    gateway and discovery packages.
  2. @dakrone

    Move `nuke!` to before the test starts

    dakrone authored
    This allows someone to collect the ES logs and data *after* a test run.
    Otherwise the logs and data is removed and ES is stopped, making further
    debugging impossible.
  3. @dakrone

    Wait for index to become green after creation

    dakrone authored
    After an index is created, clients should always wait for the index to
    be fully created (the request returns immediately) before starting the
    test.
  4. @dakrone

    Use more reasonable settings for scroll

    dakrone authored
    No need for the `query_then_fetch` setting, use ten seconds instead of
    one minute, and a more reasonable size of 20 rather than 2.
Commits on Apr 17, 2015
  1. @dakrone

    Catch only the `IndexAlreadyExistsException`, use dynamic node count

    dakrone authored
    Instead of catching any throwable during index creation, catch a
    specific exception and ensure it was only because the index already
    existed.
    
    Additionally, this changes the hardcoded node count of 5 to
    `(count (:nodes test))` so a dynamic number of nodes can be used.
This page is out of date. Refresh to see the latest.
View
67 elasticsearch/resources/logging.yml
@@ -0,0 +1,67 @@
+# you can override this using by setting a system property, for example -Des.logger.level=DEBUG
+es.logger.level: DEBUG
+rootLogger: ${es.logger.level}, console, file
+logger:
+ # log action execution errors for easier debugging
+ action: DEBUG
+ # reduce the logging for aws, too much is logged under the default INFO
+ com.amazonaws: WARN
+
+ # gateway
+ gateway: TRACE
+ index.gateway: TRACE
+
+ # peer shard recovery
+ #indices.recovery: DEBUG
+
+ # discovery
+ discovery: TRACE
+
+ index.search.slowlog: TRACE, index_search_slow_log_file
+ index.indexing.slowlog: TRACE, index_indexing_slow_log_file
+
+additivity:
+ index.search.slowlog: false
+ index.indexing.slowlog: false
+
+appender:
+ console:
+ type: console
+ layout:
+ type: consolePattern
+ conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
+
+ file:
+ type: dailyRollingFile
+ file: ${path.logs}/${cluster.name}.log
+ datePattern: "'.'yyyy-MM-dd"
+ layout:
+ type: pattern
+ conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
+
+ # Use the following log4j-extras RollingFileAppender to enable gzip compression of log files.
+ # For more information see https://logging.apache.org/log4j/extras/apidocs/org/apache/log4j/rolling/RollingFileAppender.html
+ #file:
+ #type: extrasRollingFile
+ #file: ${path.logs}/${cluster.name}.log
+ #rollingPolicy: timeBased
+ #rollingPolicy.FileNamePattern: ${path.logs}/${cluster.name}.log.%d{yyyy-MM-dd}.gz
+ #layout:
+ #type: pattern
+ #conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
+
+ index_search_slow_log_file:
+ type: dailyRollingFile
+ file: ${path.logs}/${cluster.name}_index_search_slowlog.log
+ datePattern: "'.'yyyy-MM-dd"
+ layout:
+ type: pattern
+ conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
+
+ index_indexing_slow_log_file:
+ type: dailyRollingFile
+ file: ${path.logs}/${cluster.name}_index_indexing_slowlog.log
+ datePattern: "'.'yyyy-MM-dd"
+ layout:
+ type: pattern
+ conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"
View
48 elasticsearch/src/elasticsearch/core.clj
@@ -20,6 +20,7 @@
[jepsen.os.debian :as debian]
[clj-http.client :as http]
[clojurewerkz.elastisch.rest :as es]
+ [clojurewerkz.elastisch.rest.admin :as esa]
[clojurewerkz.elastisch.rest.document :as esd]
[clojurewerkz.elastisch.rest.index :as esi]
[clojurewerkz.elastisch.rest.response :as esr]))
@@ -161,34 +162,45 @@
(reify db/DB
(setup! [_ test node]
(doto node
+ (nuke!)
(install! version)
(configure! test)
(start!)))
(teardown! [_ test node]
- (nuke! node))))
+ ;; Leave system up, to collect logs, analyze post mortem, etc
+ )))
(def index-name "jepsen-index")
+(defn index-already-exists-error?
+ "Return true if the error is due to the index already existing, false
+ otherwise."
+ [error]
+ (and
+ (-> error .getData :status (= 400))
+ (re-find #"IndexAlreadyExistsException"
+ (http-error error))))
+
(defrecord CreateSetClient [client]
client/Client
(setup! [_ test node]
(let [; client (es/connect [[(name node) 9300]])]
client (es/connect (str "http://" (name node) ":9200"))]
- ; Create index
+ ;; Create index
(try
(esi/create client index-name
:mappings {"number" {:properties
{:num {:type "integer"
:store "yes"}}}}
- :settings {"index" {"refresh_interval" "1"}})
+ :settings {"index" {"refresh_interval" "1s"}})
(catch clojure.lang.ExceptionInfo e
- ; Is this seriously how you're supposed to do idempotent
- ; index creation? I've gotta be doing this wrong.
- (let [err (http-error e)]
- (when-not (re-find #"IndexAlreadyExistsException" err)
- (throw (RuntimeException. err))))))
-
+ (when-not (index-already-exists-error? e)
+ (throw e))))
+ (esa/cluster-health client
+ {:index [index-name] :level "indices"
+ :wait_for_status "green"
+ :wait_for_nodes (count (:nodes test))})
(CreateSetClient. client)))
(invoke! [this test op]
@@ -220,9 +232,8 @@
(esi/flush client index-name)
(assoc op :type :ok
:value (->> (esd/search client index-name "number"
- :search_type "query_then_fetch"
- :scroll "1m"
- :size 2)
+ :scroll "10s"
+ :size 20)
@aphyr Owner
aphyr added a note

Yes, that's much more sensible, thank you. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
(esd/scroll-seq client)
(map (comp :num :_source))
(into (sorted-set))))
@@ -243,16 +254,17 @@
(setup! [_ test node]
(let [; client (es/connect [[(name node) 9300]])]
client (es/connect (str "http://" (name node) ":9200"))]
- ; Create index
+ ;; Create index
(try
(esi/create client index-name
:mappings {mapping-type {:properties {}}})
(catch clojure.lang.ExceptionInfo e
- ; Is this seriously how you're supposed to do idempotent
- ; index creation? I've gotta be doing this wrong.
- (let [err (http-error e)]
- (when-not (re-find #"IndexAlreadyExistsException" err)
- (throw (RuntimeException. err))))))
+ (when-not (index-already-exists-error? e)
+ (throw e))))
+ (esa/cluster-health client
+ {:index [index-name] :level "indices"
+ :wait_for_status "green"
+ :wait_for_nodes (count (:nodes test))})
; Initial empty set
(esd/create client index-name mapping-type {:values []} :id doc-id)
Something went wrong with that request. Please try again.