Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: Occasional corrupt journal likely triggered by sleep/wake cycle #341

Closed
trevorbernard opened this issue Feb 20, 2017 · 20 comments
Closed

Comments

@trevorbernard
Copy link

I've been able to trigger this on OSX and Linux usually after a few sleep wake cycles.

SingleChronicleQueueBuilder(path).build()

08:55:16.130 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.141 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.151 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.161 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.172 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.182 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.192 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.203 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
@peter-lawrey
Copy link
Member

peter-lawrey commented Feb 20, 2017 via email

@trevorbernard
Copy link
Author

net.openhft/chronicle-queue v4.5.25

@trevorbernard
Copy link
Author

trevorbernard commented Feb 25, 2017

I don't understand the root cause but it happens far more then the sleep/wake cycle. Attempting to try and find a minimal test case.

@dpisklov
Copy link
Contributor

@trevorbernard Hi, have you been able to identify a test case? Does this problem still happen with latest queue version?

@trevorbernard
Copy link
Author

@dpisklov not a minimal test case but it's very reproducible one. A few awake/sleep cycles of my laptop and closing/re-opening the application/chronicle can put it in the state. I don't think it corrupts the journal as much as it just spam the logs. Writing to the journal/rolling over usually fixes the problem. This also happens in v4.5.27

@RobAustin
Copy link
Member

@trevorbernard are you able to provide a failing test case for this, that you can commit via a pull request. I see that your thread name is [engine-tailer] where you using chronicle engine, we have not been able to reproduce this issue as such we may have to close it, but if you can help us out with a test case that would be very helpful, thank you in advance.

@trevorbernard
Copy link
Author

@RobAustin I'm using the chronicle queue -- that's just the name of my thread. Unfortunately, I don't have a minimal test case. This is my tailer code. We operating in a SPSC environment

(defn engine-tailer-thread
  [^String journal-path listener]
  (let [pauser (LongPauser. 1 100 500 10000 TimeUnit/MICROSECONDS)]
    (doto (Thread.
            #(with-open [lock (AffinityLock/acquireLock)
                         queue (.build (SingleChronicleQueueBuilder. journal-path))]
               (let [tailer (.toEnd (.createTailer queue))]
                 (log/info "Starting Matching Engine Tailer...")
                 (while (not (Thread/interrupted))
                   (try
                     (if (p/process-engine-event tailer listener)
                       (.reset pauser)
                       (.pause pauser))
                     (catch Throwable t
                       (log/error t "Uncaught exception in tailer"))))
                 ;; Should we system exit here? Tailer should never shutdown
                 (log/info "Matching Engine Tailer has been shutdown..."))))
      (.setName "engine-tailer"))))

If I recreate this issue with a fresh dev chronicle, I'll submit that in lieu of a minimal test case.

@RobAustin
Copy link
Member

thanks for this pseudo code ( I appreciate the time you have taken in writing this up ), however real java code is more preferable, submitting a failing test case is going to make it more likely that it will be fixed.

@trevorbernard
Copy link
Author

@RobAusti it's not pseudo code but the actual Clojure code we use. When I have free cycles, I'll try to create a reproducible test case in Java.

@dpisklov
Copy link
Contributor

dpisklov commented Dec 4, 2017

@trevorbernard and what you mean by corrupt journal? Our rolling mechanism will write EOF marker to the end of a queue file when it needs to roll over to new cycle (e.g. if you use RollCycles.HOURLY rolling, at the end of an hour it will roll).

@RobAustin
Copy link
Member

RobAustin commented Dec 4, 2017 via email

@trevorbernard
Copy link
Author

trevorbernard commented Dec 4, 2017

@dpisklov

I don't think it corrupts the journal as much as it just spam the logs.

08:55:16.130 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.141 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.151 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.161 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.172 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.182 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.192 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2
08:55:16.203 [engine-tailer] DEBUG n.o.c.q.i.s.SingleChronicleQueueExcerpts$StoreTailer - moveToIndex: 433e 2

This prints out every 10ms until something is written to appender or the log rolls over.

@dpisklov
Copy link
Contributor

dpisklov commented Dec 4, 2017 via email

@trevorbernard
Copy link
Author

trevorbernard commented Dec 4, 2017

Can you point me to the commit please? I'll just cherry pick it on top of v4.5.27. I probably won't be using 4.6.xx until it's stabilized and on central or there is a very compelling reason to.

@dpisklov
Copy link
Contributor

dpisklov commented Dec 4, 2017 via email

@dpisklov
Copy link
Contributor

dpisklov commented Dec 5, 2017

@trevorbernard BTW 4.6.55 is on maven central (although users are encouraged to use bom file, chronicle-bom-1.15.6 is the version you need, as it will also specify all the correct dependencies).

@trevorbernard
Copy link
Author

@dpisklov From what I gather it stands for bill of materials? How do I use it? So far v4.6.55 has been running stable (re: log spam) and imported our older chronicle journals just fine.

@dpisklov
Copy link
Contributor

dpisklov commented Dec 5, 2017

Add this in dependencyManagement section of your pom file:

<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-bom</artifactId>
    <version>1.15.6</version>
    <type>pom</type>
    <scope>import</scope>
</dependency>

And then you can omit version in your dependencies section, and whenever
you need to upgrade, you just upgrade chronicle bom. That way you will
ensure all dependencies have correct versions.

PS of it works in latest, do you mind closing this issue?

Thanks
D

@trevorbernard
Copy link
Author

Can confirm, no longer seeing this issue with v4.6.55

@dpisklov
Copy link
Contributor

@trevorbernard Great thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants