Problem: Occasional corrupt journal likely triggered by sleep/wake cycle #341

trevorbernard · 2017-02-20T13:50:21Z

I've been able to trigger this on OS X and Linux, usually after a few sleep/wake cycles.

new SingleChronicleQueueBuilder(path).build()

08:55:16.130 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.141 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.151 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.161 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.172 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.182 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.192 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.203 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2

peter-lawrey · 2017-02-20T14:05:14Z

Which version of chronicle queue are you using?

trevorbernard · 2017-02-20T14:50:06Z

net.openhft/chronicle-queue v4.5.25

trevorbernard · 2017-02-25T06:37:12Z

I don't understand the root cause, but it happens far more often than just after a sleep/wake cycle. I'm trying to find a minimal test case.

dpisklov · 2017-11-23T17:48:18Z

@trevorbernard Hi, have you been able to identify a test case? Does this problem still happen with the latest queue version?

trevorbernard · 2017-11-23T20:34:25Z

@dpisklov Not a minimal test case, but it's very reproducible. A few sleep/wake cycles of my laptop and closing/re-opening the application/chronicle can put it in this state. I don't think it corrupts the journal as much as it just spams the logs. Writing to the journal or rolling over usually fixes the problem. This also happens in v4.5.27.

RobAustin · 2017-12-04T13:23:15Z

@trevorbernard Are you able to provide a failing test case for this that you can commit via a pull request? I see that your thread name is [engine-tailer]; were you using Chronicle Engine? We have not been able to reproduce this issue, so we may have to close it, but if you can help us out with a test case, that would be very helpful. Thank you in advance.

trevorbernard · 2017-12-04T14:10:36Z

@RobAustin I'm using Chronicle Queue; that's just the name of my thread. Unfortunately, I don't have a minimal test case. This is my tailer code. We operate in an SPSC (single-producer, single-consumer) environment.

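;; Imports this snippet appears to rely on (package names are assumed).
;; `log` is assumed to alias clojure.tools.logging, and `p` is the
;; application's own event-processing namespace (not shown).
(import '(java.util.concurrent TimeUnit)
        '(net.openhft.affinity AffinityLock)
        '(net.openhft.chronicle.queue.impl.single SingleChronicleQueueBuilder)
        '(net.openhft.chronicle.threads LongPauser))

(defn engine-tailer-thread
  "Returns an unstarted thread named \"engine-tailer\" that polls the journal."
  [^String journal-path listener]
  ;; Back-off pauser used whenever no event was available
  (let [pauser (LongPauser. 1 100 500 10000 TimeUnit/MICROSECONDS)]
    (doto (Thread.
            ;; Both the affinity lock and the queue are closed when the loop exits
            #(with-open [lock (AffinityLock/acquireLock)
                         queue (.build (SingleChronicleQueueBuilder. journal-path))]
               ;; Start at the end of the queue so only new entries are read
               (let [tailer (.toEnd (.createTailer queue))]
                 (log/info "Starting Matching Engine Tailer...")
                 (while (not (Thread/interrupted))
                   (try
                     (if (p/process-engine-event tailer listener)
                       (.reset pauser)
                       (.pause pauser))
                     (catch Throwable t
                       (log/error t "Uncaught exception in tailer"))))
                 ;; Should we system exit here? Tailer should never shutdown
                 (log/info "Matching Engine Tailer has been shutdown..."))))
      (.setName "engine-tailer"))))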

If I recreate this issue with a fresh dev chronicle, I'll submit that in lieu of a minimal test case.

RobAustin · 2017-12-04T15:33:27Z

Thanks for this pseudo code (I appreciate the time you have taken in writing this up). However, real Java code is preferable; submitting a failing test case will make it more likely that this gets fixed.

trevorbernard · 2017-12-04T15:58:49Z

@RobAustin It's not pseudo code but the actual Clojure code we use. When I have free cycles, I'll try to create a reproducible test case in Java.

dpisklov · 2017-12-04T16:09:27Z

@trevorbernard And what do you mean by a corrupt journal? Our rolling mechanism writes an EOF marker to the end of a queue file when it needs to roll over to a new cycle (e.g. if you use RollCycles.HOURLY rolling, it will roll at the end of each hour).

RobAustin · 2017-12-04T16:09:57Z

Thanks :-)

trevorbernard · 2017-12-04T16:15:58Z

@dpisklov

I don't think it corrupts the journal as much as it just spams the logs.

08:55:16.130 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.141 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.151 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.161 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.172 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.182 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.192 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.203 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2

This prints out every 10ms until something is written to the appender or the log rolls over.

dpisklov · 2017-12-04T16:19:34Z

I think it's fixed in 4.6.xx. Can you try with the latest version? (Unfortunately it's not on Maven Central, but you should be able to easily build it locally and use the jar.)

trevorbernard · 2017-12-04T16:23:53Z

Can you point me to the commit please? I'll just cherry pick it on top of v4.5.27. I probably won't be using 4.6.xx until it's stabilized and on central or there is a very compelling reason to.

dpisklov · 2017-12-04T16:30:35Z

It's not that easy: in 4.6 the file format is different. 4.6.xx is stable for general purposes; it is used by a number of our clients, so you can use it in production without a problem. It was decided some time ago that we publish recent versions to a private repo available to our clients, so if you want support for Chronicle Queue, you can contact sales@chronicle.software to get a tailored solution. At the moment you can just build from the latest tag to assess it for your app.

dpisklov · 2017-12-05T10:39:10Z

@trevorbernard BTW 4.6.55 is on Maven Central (although users are encouraged to use the BOM file; chronicle-bom-1.15.6 is the version you need, as it will also specify all the correct dependencies).

trevorbernard · 2017-12-05T16:47:26Z

@dpisklov From what I gather, it stands for bill of materials? How do I use it? So far v4.6.55 has been running stably (re: log spam) and imported our older chronicle journals just fine.

dpisklov · 2017-12-05T17:01:59Z

Add this to the dependencyManagement section of your pom file:

<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-bom</artifactId>
    <version>1.15.6</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>

You can then omit the version in your dependencies section, and whenever
you need to upgrade, you just upgrade the Chronicle BOM. That way you
ensure all dependencies have the correct versions.
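
For context, a minimal sketch of how these pieces might fit together in a pom file is shown below. The dependencyManagement and dependencies wrappers are standard Maven; the chronicle-queue entry is only an assumed example of a dependency whose version is left out so that it is resolved from the BOM.

```xml
<!-- Sketch only: the BOM import comes from the snippet above;
     the chronicle-queue dependency is an illustrative assumption. -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>net.openhft</groupId>
            <artifactId>chronicle-bom</artifactId>
            <version>1.15.6</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- No <version> element here: it is taken from the imported BOM -->
    <dependency>
        <groupId>net.openhft</groupId>
        <artifactId>chronicle-queue</artifactId>
    </dependency>
</dependencies>
```

With a layout like this, upgrading should only require changing the chronicle-bom version in one place.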

PS: if it works in the latest version, do you mind closing this issue?

Thanks
D

trevorbernard · 2017-12-18T13:44:06Z

Can confirm, no longer seeing this issue with v4.6.55

dpisklov · 2017-12-18T14:00:49Z

@trevorbernard Great, thanks!

trevorbernard closed this as completed Dec 18, 2017
