Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make setting index.translog.sync_interval be dynamic #37382

Merged
merged 9 commits into from Mar 20, 2019

Conversation

liketic
Copy link
Contributor

@liketic liketic commented Jan 12, 2019

Currently, we cannot update index setting index.translog.sync_interval if index is open, because it's not dynamic which can be updated for closed index only.

Closes #32763

@colings86 colings86 added the :Distributed/Engine Anything around managing Lucene and the Translog in an open shard. label Jan 14, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@ywelsch ywelsch self-requested a review January 14, 2019 10:33
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has the same flaw as the attempt in #34104. Repeating my comment here:
While there's testing on the IndexSettings level, I don't think that the update logic actually takes effect on a live shard. The reason for this is that the fsyncTask (see IndexService) is currently only rescheduled when translog durability changes, not when the translog sync interval changes.

@liketic
Copy link
Contributor Author

liketic commented Jan 19, 2019

Thanks @ywelsch ! I pushed f5a6343 , could you please review again ?

@ywelsch ywelsch self-requested a review January 21, 2019 09:33
@ywelsch
Copy link
Contributor

ywelsch commented Jan 21, 2019

@elasticmachine test this please

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @liketic. I think there's still a bug in this code, which means we should also look at beefing up the tests around setting both durability and the sync interval. I've added a comment to detail how to simplify the implementation. Let me know if you need more assistance on this.

if (durability != oldTranslogDurability) {
final TimeValue syncInterval = indexSettings.getTranslogSyncInterval();
if (syncInterval.equals(oldSyncInterval) == false) {
rescheduleFsyncTask(() -> new AsyncTranslogFSync(IndexService.this));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if both durability and sync interval are changed in the same update? This update here ignores the change to durability. In particular, if durability is SYNC, this will now suddenly schedule a sync task when the sync interval is changed even though there's no such sync task supposed to run due to durability SYNC.

It might be better to restructure the sync task scheduling code to not look at oldTranslogDurability but just at whether we have a current task and whether we're supposed to have one, and if we have both, check that the intervals match (you can get the current tasks interval directly from the task (see refreshTask.getInterval() as an example a few lines above it), and then just do a plain rescheduleFsyncTask without parameters where it gets all it needs from the indexSettings (which have been updated).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ywelsch , so are we going to do something like:

            if (fsyncTask != null && durability == Translog.Durability.ASYNC
                && fsyncTask.getInterval().equals(indexSettings.getTranslogSyncInterval()) == false) {
                rescheduleFsyncTask();
            }

    private void rescheduleFsyncTask() {
        try {
            if (fsyncTask != null) {
                fsyncTask.close();
            }
        } finally {
            fsyncTask = indexSettings.getTranslogDurability() == Translog.Durability.REQUEST ? null : new AsyncTranslogFSync(this);
        }
    }

? I'm a little confused, what if the current fsyncTask is null or durability is changed from ASYNC to REQUEST, the rescheduleFsyncTask() will not be called. Or we can check if durability is changed first, like:

final Translog.Durability oldTranslogDurability = indexSettings.getTranslogDurability();
            if (oldTranslogDurability != durability) {
                rescheduleFsyncTask();
            } else if (fsyncTask != null && fsyncTask.getInterval().equals(indexSettings.getTranslogSyncInterval()) == false) {
                rescheduleFsyncTask();
            }
        }
    }

    private void rescheduleFsyncTask() {
        try {
            if (fsyncTask != null) {
                fsyncTask.close();
            }
        } finally {
            fsyncTask = indexSettings.getTranslogDurability() == Translog.Durability.REQUEST ? null : new AsyncTranslogFSync(this);
        }
    }

Could you help me? Thanks in advance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the following:

diff --git a/server/src/main/java/org/elasticsearch/index/IndexService.java b/server/src/main/java/org/elasticsearch/index/IndexService.java
index 604d27a1e70..da95ef3eff2 100644
--- a/server/src/main/java/org/elasticsearch/index/IndexService.java
+++ b/server/src/main/java/org/elasticsearch/index/IndexService.java
@@ -197,7 +197,7 @@ public class IndexService extends AbstractIndexComponent implements IndicesClust
         this.refreshTask = new AsyncRefreshTask(this);
         this.trimTranslogTask = new AsyncTrimTranslogTask(this);
         this.globalCheckpointTask = new AsyncGlobalCheckpointTask(this);
-        rescheduleFsyncTask(indexSettings.getTranslogDurability());
+        updateFsyncTaskIfNecessary();
     }
 
     public int numberOfShards() {
@@ -636,8 +636,6 @@ public class IndexService extends AbstractIndexComponent implements IndicesClust
 
     @Override
     public synchronized void updateMetaData(final IndexMetaData currentIndexMetaData, final IndexMetaData newIndexMetaData) {
-        final Translog.Durability oldTranslogDurability = indexSettings.getTranslogDurability();
-
         final boolean updateIndexMetaData = indexSettings.updateIndexMetaData(newIndexMetaData);
 
         if (Assertions.ENABLED
@@ -689,20 +687,23 @@ public class IndexService extends AbstractIndexComponent implements IndicesClust
                 });
                 rescheduleRefreshTasks();
             }
-            final Translog.Durability durability = indexSettings.getTranslogDurability();
-            if (durability != oldTranslogDurability) {
-                rescheduleFsyncTask(durability);
-            }
+            updateFsyncTaskIfNecessary();
         }
     }
 
-    private void rescheduleFsyncTask(Translog.Durability durability) {
-        try {
-            if (fsyncTask != null) {
-                fsyncTask.close();
+    private void updateFsyncTaskIfNecessary() {
+        if (indexSettings.getTranslogDurability() == Translog.Durability.REQUEST) {
+            try {
+                if (fsyncTask != null) {
+                    fsyncTask.close();
+                }
+            } finally {
+                fsyncTask = null;
             }
-        } finally {
-            fsyncTask = durability == Translog.Durability.REQUEST ? null : new AsyncTranslogFSync(this);
+        } else if (fsyncTask == null) {
+            fsyncTask = new AsyncTranslogFSync(this);
+        } else {
+            fsyncTask.updateIfNeeded();
         }
     }
 
@@ -841,6 +842,13 @@ public class IndexService extends AbstractIndexComponent implements IndicesClust
             super(indexService, indexService.getIndexSettings().getTranslogSyncInterval());
         }
 
+        void updateIfNeeded() {
+            final TimeValue newInterval = indexService.getIndexSettings().getTranslogSyncInterval();
+            if (newInterval.equals(getInterval()) == false) {
+                setInterval(newInterval);
+            }
+        }
+
         @Override
         protected String getThreadPool() {
             return ThreadPool.Names.FLUSH;

@michaelbaamonde
Copy link
Contributor

@liketic Are you still interested in moving this forward?

@liketic
Copy link
Contributor Author

liketic commented Mar 1, 2019

@michaelbaamonde Yes, will update soon.

@ywelsch
Copy link
Contributor

ywelsch commented Mar 18, 2019

ping @liketic

@liketic
Copy link
Contributor Author

liketic commented Mar 19, 2019

@ywelsch please review again.

@ywelsch ywelsch self-requested a review March 19, 2019 09:41
Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @liketic!

@ywelsch
Copy link
Contributor

ywelsch commented Mar 19, 2019

@elasticmachine retest this please

@ywelsch
Copy link
Contributor

ywelsch commented Mar 19, 2019

@elasticmachine retest this please

@ywelsch
Copy link
Contributor

ywelsch commented Mar 20, 2019

@elasticmachine retest this please

@ywelsch
Copy link
Contributor

ywelsch commented Mar 20, 2019

CI run failed for unrelated reasons:

Could not determine the dependencies of task ':distribution:extractWindowsJdk'.
[7.1.0] > Could not resolve all files for configuration ':distribution:jdk_windows'.
[7.1.0] > Could not download windows.zip (jdk:windows:12)
[7.1.0] > Could not get resource 'https://download.java.net/java/GA/jdk12/33/GPL/openjdk-12_windows-x64_bin.zip'.
[7.1.0] > Premature end of Content-Length delimited message body (expected: 196405895; received: 2097152
[7.1.0]

@elasticmachine retest this please

@ywelsch
Copy link
Contributor

ywelsch commented Mar 20, 2019

another unrelated test failure:

ERROR 0.17s J2 | InternalEngineTests.testDeleteWithFatalError <<< FAILURES!
Throwable #1: java.io.IOException: could not remove the following files (in the order of attempts):
/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/server/build/testrun/unitTest/J2/temp/org.elasticsearch.index.engine.InternalEngineTests_D5B77D5B191D2491-001/translog-primary-001/translog-2.tlog: java.io.IOException: access denied: /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/server/build/testrun/unitTest/J2/temp/org.elasticsearch.index.engine.InternalEngineTests_D5B77D5B191D2491-001/translog-primary-001/translog-2.tlog
/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/server/build/testrun/unitTest/J2/temp/org.elasticsearch.index.engine.InternalEngineTests_D5B77D5B191D2491-001/translog-primary-001/translog-1.tlog: java.io.IOException: access denied: /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/server/build/testrun/unitTest/J2/temp/org.elasticsearch.index.engine.InternalEngineTests_D5B77D5B191D2491-001/translog-primary-001/translog-1.tlog
/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/server/build/testrun/unitTest/J2/temp/org.elasticsearch.index.engine.InternalEngineTests_D5B77D5B191D2491-001/translog-primary-001: java.nio.file.DirectoryNotEmptyException: /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/server/build/testrun/unitTest/J2/temp/org.elasticsearch.index.engine.InternalEngineTests_D5B77D5B191D2491-001/translog-primary-001
at __randomizedtesting.SeedInfo.seed([D5B77D5B191D2491:39307CC9B117DA9]:0)
at org.elasticsearch.core.internal.io.IOUtils.rm(IOUtils.java:196)
at org.elasticsearch.index.translog.Translog.createEmptyTranslog(Translog.java:1865)
at org.elasticsearch.index.translog.Translog.createEmptyTranslog(Translog.java:1860)
at org.elasticsearch.index.engine.EngineTestCase.createEngine(EngineTestCase.java:511)
at org.elasticsearch.index.engine.InternalEngineTests.testDeleteWithFatalError(InternalEngineTests.java:3318)
at java.lang.Thread.run(Thread.java:748)

/cc: @dnhatn

@elasticmachine run elasticsearch-ci/1

@ywelsch ywelsch merged commit 3c352a8 into elastic:master Mar 20, 2019
ywelsch pushed a commit that referenced this pull request Mar 20, 2019
Currently, we cannot update index setting index.translog.sync_interval if index is open, because it's
not dynamic which can be updated for closed index only.

Closes #32763
pgomulka pushed a commit to pgomulka/elasticsearch that referenced this pull request Mar 25, 2019
Currently, we cannot update index setting index.translog.sync_interval if index is open, because it's
not dynamic which can be updated for closed index only.

Closes elastic#32763
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Documentation claims "index.translog.sync_interval" is dynamic, but it can not be set dynamically
6 participants