
HADOOP-17552. Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang #2727

Merged: 7 commits merged into apache:trunk on Mar 6, 2021

Conversation

functioner
Contributor

I proposed a fix for HADOOP-17552.

Commit: HADOOP-17552. The read method of hadoop.ipc.Client$Connection$PingInputStream may swallow java.net.SocketTimeoutException due to the mistaken usage of the rpcTimeout configuration
@ferhui
Contributor

ferhui commented Mar 1, 2021

@functioner Good catch, and thanks for the contribution.

@functioner
Contributor Author

@ferhui Thanks!
I was wondering whether the fix I propose is satisfactory. To correct the rpcTimeout usage, I basically followed the way rpcTimeout is used in the constructor of the hadoop.ipc.Client$Connection class. But I suspect the timeout value could be improved further, for example by using a multiple of pingInterval rather than just a single pingInterval.

@ferhui
Contributor

ferhui commented Mar 1, 2021

@functioner see TestRPC#testClientRpcTimeout.
Maybe it is expected that the client keeps sending pings whenever ipc.client.rpc-timeout.ms is 0 or less.
@iwasakims Do you have any ideas? Thanks.

@functioner
Contributor Author

functioner commented Mar 1, 2021

@functioner see TestRPC#testClientRpcTimeout.
Maybe it is expected that the client keeps sending pings whenever ipc.client.rpc-timeout.ms is 0 or less.

@ferhui If this is expected, then the client should be able to automatically request a new connection rather than blindly waiting on the problematic connection it currently uses. Otherwise this endless waiting is meaningless, because it is now impossible for the server to add this connection to its Reader. In this case, timing out is better, because the user (client) can at least become aware of the issue.

@functioner
Contributor Author

functioner commented Mar 1, 2021

@functioner see TestRPC#testClientRpcTimeout.
Maybe it is expected that the client keeps sending pings whenever ipc.client.rpc-timeout.ms is 0 or less.
@iwasakims Do you have any ideas? Thanks.

@ferhui See TestRPC.java lines 1419-1426 (within testClientRpcTimeout):

        try {
          // should not time out because effective rpc-timeout is
          // multiple of ping interval: 1600 (= 800 * (1000 / 800 + 1))
          proxy.sleep(null, newSleepRequest(1300));
        } catch (ServiceException e) {
          LOG.info("got unexpected exception.", e);
          fail("RPC should not time out.");
        }

According to the comment there, I think the correct behavior is to set the effective rpc-timeout to a multiple of the ping interval.
In the current implementation (e.g., the constructor of hadoop.ipc.Client$Connection), rpc-timeout is set to a single ping interval rather than a multiple of it.
Therefore, timing out is the expected behavior.

@ferhui
Contributor

ferhui commented Mar 1, 2021

@functioner see TestRPC#testClientRpcTimeout.
Maybe it is expected that the client keeps sending pings whenever ipc.client.rpc-timeout.ms is 0 or less.

@ferhui If this is expected, then the client should be able to automatically request a new connection rather than blindly waiting on the problematic connection it currently uses. Otherwise this endless waiting is meaningless, because it is now impossible for the server to add this connection to its Reader. In this case, timing out is better, because the user (client) can at least become aware of the issue.

In this case, we can set ipc.client.rpc-timeout.ms to a suitable value and resolve it. Let's hear the author's (@iwasakims) thoughts first and then resolve it carefully.

@functioner
Contributor Author

@functioner see TestRPC#testClientRpcTimeout.
Maybe it is expected that the client keeps sending pings whenever ipc.client.rpc-timeout.ms is 0 or less.

@ferhui If this is expected, then the client should be able to automatically request a new connection rather than blindly waiting on the problematic connection it currently uses. Otherwise this endless waiting is meaningless, because it is now impossible for the server to add this connection to its Reader. In this case, timing out is better, because the user (client) can at least become aware of the issue.

In this case, we can set ipc.client.rpc-timeout.ms to a suitable value and resolve it. Let's hear the author's (@iwasakims) thoughts first and then resolve it carefully.

Agree.
And just for reference, the explanation of the ipc.client.rpc-timeout.ms configuration in core-default.xml also emphasizes that "the effective value of the rpc-timeout is rounded up to multiple of ipc.ping.interval".

@iwasakims
Member

iwasakims commented Mar 1, 2021

@functioner @ferhui, I made the timeout configurable by introducing ipc.client.rpc-timeout.ms in HADOOP-11252. Since the code was intended to keep the behavior of existing deployments that do not set ipc.client.rpc-timeout.ms in their configuration, you may feel it is not straightforward. IPC_CLIENT_RPC_TIMEOUT_DEFAULT was set to 0 for the same reason. Since there have been no error reports around ipc.client.rpc-timeout.ms for a long time, changing the default value should be OK now. If just updating the value of IPC_CLIENT_RPC_TIMEOUT_DEFAULT and the relevant tests is enough, that would be better than modifying Client.java, in order to avoid introducing new bugs.

@iwasakims
Member

I think the long default timeout (kicked in by tcp_retries2) surfaced as an RM-HA (YARN) issue like YARN-2578 rather than an HDFS issue. Maybe the NN-HA client proxy injecting another timeout worked as a mitigation.

@iwasakims
Member

From the description of HADOOP-17552:

We are doing some systematic fault injection testing in Hadoop-3.2.2 and when we try to run a client (e.g., bin/hdfs dfs -ls /) to our HDFS cluster (1 NameNode, 2 DataNodes), the client gets stuck forever.

@functioner I think the issue was surfaced by a half-closed TCP connection (connection loss without an RST packet) caused by a HW issue like a power fault. What kind of fault injection caused this?

@functioner
Contributor Author

From the description of HADOOP-17552:

We are doing some systematic fault injection testing in Hadoop-3.2.2 and when we try to run a client (e.g., bin/hdfs dfs -ls /) to our HDFS cluster (1 NameNode, 2 DataNodes), the client gets stuck forever.

@functioner I think the issue was surfaced by a half-closed TCP connection (connection loss without an RST packet) caused by a HW issue like a power fault. What kind of fault injection caused this?

@iwasakims In Server.java, the socket channel is accepted at line 1400, and then the fault (an IOException) is injected at line 1402 (1403 or 1404 would also work).

    public void run() {
      while (running) {
        SelectionKey key = null;
        try {
          getSelector().select();
          Iterator<SelectionKey> iter = getSelector().selectedKeys().iterator();
          while (iter.hasNext()) {
            key = iter.next();
            iter.remove();
            try {
              if (key.isValid()) {
                if (key.isAcceptable())
                  doAccept(key);
              }
            } catch (IOException e) {                         // line 1350
            }
            key = null;
          }
        } catch (OutOfMemoryError e) {
          // ...
        } catch (Exception e) {
          // ...
        }
      }
    }
    void doAccept(SelectionKey key) throws InterruptedException, IOException, 
        OutOfMemoryError {
      ServerSocketChannel server = (ServerSocketChannel) key.channel();
      SocketChannel channel;
      while ((channel = server.accept()) != null) {           // line 1400

        channel.configureBlocking(false);                     // line 1402
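        // Fault-injection point in the experiment described above: an
        // IOException raised here (or at line 1403/1404) is swallowed by the
        // catch at line 1350, so the accepted channel never reaches a Reader.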
        channel.socket().setTcpNoDelay(tcpNoDelay);           // line 1403
        channel.socket().setKeepAlive(true);                  // line 1404
        
        Reader reader = getReader();
        Connection c = connectionManager.register(channel,
            this.listenPort, this.isOnAuxiliaryPort);
        // If the connectionManager can't take it, close the connection.
        if (c == null) {
          if (channel.isOpen()) {
            IOUtils.cleanup(null, channel);
          }
          connectionManager.droppedConnections.getAndIncrement();
          continue;
        }
        key.attach(c);  // so closeCurrentConnection can get the object
        reader.addConnection(c);
      }
    }

The basic idea of this fault injection is to let the server accept the connection at line 1400 but prevent it from being added to a Reader, so that the server never becomes aware of data from this client. The injected IOException is swallowed by the catch at line 1350.

@functioner
Contributor Author

@functioner see TestRPC#testClientRpcTimeout.
Maybe it is expected that the client keeps sending pings whenever ipc.client.rpc-timeout.ms is 0 or less.
@iwasakims Do you have any ideas? Thanks.

@ferhui See TestRPC.java lines 1419-1426 (within testClientRpcTimeout):

        try {
          // should not time out because effective rpc-timeout is
          // multiple of ping interval: 1600 (= 800 * (1000 / 800 + 1))
          proxy.sleep(null, newSleepRequest(1300));
        } catch (ServiceException e) {
          LOG.info("got unexpected exception.", e);
          fail("RPC should not time out.");
        }

According to the comment there, I think the correct behavior is to set the effective rpc-timeout to a multiple of the ping interval.
In the current implementation (e.g., the constructor of hadoop.ipc.Client$Connection), rpc-timeout is set to a single ping interval rather than a multiple of it.
Therefore, timing out is the expected behavior.

@iwasakims The current codebase can pass this test, but in the wrong way. The current behavior is to always swallow the SocketTimeoutException at line 548:

     /* Process timeout exception
       * if the connection is not going to be closed or 
       * the RPC is not timed out yet, send a ping.
       */
      private void handleTimeout(SocketTimeoutException e, int waiting)
          throws IOException {
        if (shouldCloseConnection.get() || !running.get() ||
            (0 < rpcTimeout && rpcTimeout <= waiting)) {            // line 545
          throw e;
        } else {
          sendPing();                                               // line 548
        }
      }

According to the comment in that test case, it "should not time out because effective rpc-timeout is multiple of ping interval: 1600 (= 800 * (1000 / 800 + 1))"; that does not mean the call should never time out.
Hence, I think the root cause of the bug is that the code (line 545) does not enforce the semantics of the ipc.client.rpc-timeout.ms configuration described in core-default.xml.

@iwasakims
Member

According to the comment in that test case, it "should not time out because effective rpc-timeout is multiple of ping interval: 1600 (= 800 * (1000 / 800 + 1))"; that does not mean the call should never time out.

The SocketTimeoutException is thrown on Socket#read based on the timeout value set by Socket#setSoTimeout. It is set to 800 (pingInterval) in the test because pingInterval < rpcTimeout. The SocketTimeoutException is first thrown at 800ms and swallowed because 800 < rpcTimeout. It is thrown again at 1600ms and then handled because 1600 > rpcTimeout.
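
To make the arithmetic concrete, here is a minimal standalone sketch (not Hadoop code, just an illustration of the behavior described above), assuming ipc.client.ping is enabled and pingInterval < rpcTimeout:

    // The socket timeout is pingInterval, so a SocketTimeoutException fires
    // every pingInterval ms; handleTimeout swallows it (and sends a ping)
    // until the accumulated waiting time reaches rpcTimeout, so the effective
    // timeout is rounded up to a multiple of pingInterval.
    public class EffectiveRpcTimeoutSketch {
      static int effectiveTimeout(int rpcTimeout, int pingInterval) {
        int waiting = 0;
        do {
          waiting += pingInterval;        // the next SocketTimeoutException
        } while (waiting < rpcTimeout);   // swallowed while waiting < rpcTimeout
        return waiting;                   // the first exception that is rethrown
      }

      public static void main(String[] args) {
        // The TestRPC case: rpcTimeout = 1000, pingInterval = 800 -> 1600.
        System.out.println(effectiveTimeout(1000, 800));
      }
    }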

@functioner
Contributor Author

functioner commented Mar 2, 2021

According to the comment in that test case, it "should not time out because effective rpc-timeout is multiple of ping interval: 1600 (= 800 * (1000 / 800 + 1))"; that does not mean the call should never time out.

The SocketTimeoutException is thrown on Socket#read based on the timeout value set by Socket#setSoTimeout. It is set to 800 (pingInterval) in the test because pingInterval < rpcTimeout. The SocketTimeoutException is first thrown at 800ms and swallowed because 800 < rpcTimeout. It is thrown again at 1600ms and then handled because 1600 > rpcTimeout.

Oh, I see: the fix I currently propose does not handle the case of rpcTimeout != 0; it always assumes rpcTimeout == 0 and then calculates the timeout from pingInterval.
I'm now going to propose:

      private void handleTimeout(SocketTimeoutException e, int waiting)
          throws IOException {
        int timeout = 0;
        if (rpcTimeout > 0) {
          // effective rpc timeout is rounded up to a multiple of pingInterval
          // if pingInterval < rpcTimeout.
          timeout = (doPing && pingInterval < rpcTimeout) ?
              (rpcTimeout + pingInterval - 1) / pingInterval * pingInterval
              : rpcTimeout;
        } else {
          timeout = pingInterval;
        }
        if (shouldCloseConnection.get() || !running.get() ||
            (0 < timeout && timeout <= waiting)) {
          throw e;
        } else {
          sendPing();
        }
      }

@iwasakims Do you agree?

@iwasakims
Member

@functioner Just update the value of CommonConfigurationKeys.IPC_CLIENT_RPC_TIMEOUT_DEFAULT and core-default.xml if you want to change the default timeout. "ipc.client.rpc-timeout.ms=0" should mean "no timeout".

@functioner
Contributor Author

@functioner Just update the value of CommonConfigurationKeys.IPC_CLIENT_RPC_TIMEOUT_DEFAULT and core-default.xml if you want to change the default timeout. "ipc.client.rpc-timeout.ms=0" should mean "no timeout".

@iwasakims Thanks for your explanation. Now I understand your design.
I agree that we should avoid modifying Client.java so as not to introduce new bugs.
I also agree that "ipc.client.rpc-timeout.ms=0" should mean "no timeout".

I think we should update the description of ipc.client.rpc-timeout.ms to specify that "ipc.client.rpc-timeout.ms=0" means no timeout. Otherwise users might get confused: for example, if they set ipc.client.ping to true and ipc.client.rpc-timeout.ms to 0, they might think the effective rpcTimeout is pingInterval.

    <property>
      <name>ipc.client.rpc-timeout.ms</name>
      <value>0</value>
      <description>Timeout on waiting response from server, in milliseconds.
        If ipc.client.ping is set to true and this rpc-timeout is greater than
        the value of ipc.ping.interval, the effective value of the rpc-timeout is
        rounded up to multiple of ipc.ping.interval.
      </description>
    </property>

For the default value of ipc.client.rpc-timeout.ms, I think 0 is okay, as long as we make the description of ipc.client.rpc-timeout.ms clear.

@iwasakims I'm going to update the description of ipc.client.rpc-timeout.ms to eliminate the confusion, without modifying any Java code. Do you agree?

@functioner
Contributor Author

@iwasakims I've updated the description of ipc.client.rpc-timeout.ms in hadoop/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml

You can double-check whether the description is unambiguous, and whether there are other files that require a similar update.

@functioner
Contributor Author

I guess many users may not change the default configuration. If they hit this endless-hang issue, it may take them a while to figure out what is wrong, even with the clarified parameter description.

Therefore, I think we should consider changing the default value of ipc.client.rpc-timeout.ms.

@iwasakims I advocate setting a reasonable default for ipc.client.rpc-timeout.ms, say 180s (three ping intervals). What do you think?

@ferhui
Contributor

ferhui commented Mar 3, 2021

@functioner Thanks for the comments.
You can change the title here and on the JIRA issue.
And I think 60s is more reasonable in a production environment. You can push a commit and see the CI result here.

@iwasakims
Member

Considering it affects the failover time (for the NM to reconnect to the new active RM) of YARN-2578, 60 or 120 sounds reasonable. Since 60 (= the default ipc.ping.interval) effectively disables the ipc ping, how about using 120?

@functioner
Contributor Author

Considering it affects the failover time (for the NM to reconnect to the new active RM) of YARN-2578, 60 or 120 sounds reasonable. Since 60 (= the default ipc.ping.interval) effectively disables the ipc ping, how about using 120?

That makes sense to me. Let's try 120s and see what happens (e.g., the test results).
I will push a commit soon.

@functioner
Contributor Author

@functioner Thanks for the comments.
You can change the title here and on the JIRA issue.
And I think 60s is more reasonable in a production environment. You can push a commit and see the CI result here.

If the current title is not appropriate or concise, what title would be better? Any ideas? I can change the title following your suggestions.
The current title is "The read method of hadoop.ipc.Client$Connection$PingInputStream may swallow java.net.SocketTimeoutException due to the mistaken usage of the rpcTimeout configuration", and it does seem a little long.

@ferhui
Contributor

ferhui commented Mar 3, 2021

How about "Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang"

@functioner
Contributor Author

How about "Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang"

Sounds good. Done.

@functioner changed the title from "HADOOP-17552. The read method of hadoop.ipc.Client$Connection$PingInputStream may swallow java.net.SocketTimeoutException due to the mistaken usage of the rpcTimeout configuration" to "HADOOP-17552. Change ipc.client.rpc-timeout.ms from 0 to 120000 by default to avoid potential hang" on Mar 3, 2021
@ferhui
Contributor

ferhui commented Mar 3, 2021

@functioner According to the CI results, TestIPC#testClientGetTimeout fails. It is related; please check.

@functioner
Contributor Author

@functioner According to the CI results, TestIPC#testClientGetTimeout fails. It is related; please check.

It fails at line 1459:

    @Test
    public void testClientGetTimeout() throws IOException {
      Configuration config = new Configuration();
      assertThat(Client.getTimeout(config)).isEqualTo(-1);  // line 1459
    }

Client.getTimeout is:

    /**
     * The time after which a RPC will timeout.
     * If ping is not enabled (via ipc.client.ping), then the timeout value is the
     * same as the pingInterval.
     * If ping is enabled, then there is no timeout value.
     *
     * @param conf Configuration
     * @return the timeout period in milliseconds. -1 if no timeout value is set
     * @deprecated use {@link #getRpcTimeout(Configuration)} instead
     */
    @Deprecated
    final public static int getTimeout(Configuration conf) {
      int timeout = getRpcTimeout(conf);
      if (timeout > 0) {
        return timeout;                       // line 251
      }
      if (!conf.getBoolean(CommonConfigurationKeys.IPC_CLIENT_PING_KEY,
          CommonConfigurationKeys.IPC_CLIENT_PING_DEFAULT)) {
        return getPingInterval(conf);         // line 255
      }
      return -1;                              // line 257
    }

Before we change the default rpcTimeout:
rpcTimeout is 0, so it does not return at line 251.
CommonConfigurationKeys.IPC_CLIENT_PING_DEFAULT is true, so it does not return at line 255 either.
Finally, it returns -1 at line 257 and passes the test case.

After we change the default rpcTimeout to 120000:
it returns at line 251, and the test fails because 120000 is not -1.

Conclusion:
This test essentially checks the default value of rpcTimeout.
Since we modified this value, we should also modify this test to assertThat(Client.getTimeout(config)).isEqualTo(120000).
What do you think? @ferhui @iwasakims

@ferhui
Contributor

ferhui commented Mar 3, 2021

@functioner That's OK

@iwasakims
Member

@functioner The TestIPC#testClientGetTimeout tests the deprecated Client#getTimeout, which was used before ipc.client.rpc-timeout.ms and Client#getRpcTimeout were introduced. Given that context, testClientGetTimeout should check the value of Client#getTimeout when ipc.client.rpc-timeout.ms is set to 0 (-1 is expected if ipc.client.ping is true, which is the default).
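
For reference, a minimal sketch of the check described above could look like the following, assuming the override is applied to the same Configuration instance that is passed to Client.getTimeout and keeping the existing assertThat style:

    @Test
    public void testClientGetTimeout() throws IOException {
      Configuration config = new Configuration();
      // Pin the legacy "no rpc-timeout" behavior explicitly so the deprecated
      // Client#getTimeout still falls through to -1 when ipc.client.ping is
      // true (the default).
      config.setInt(CommonConfigurationKeys.IPC_CLIENT_RPC_TIMEOUT_KEY, 0);
      assertThat(Client.getTimeout(config)).isEqualTo(-1);
    }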

@ferhui
Contributor

ferhui commented Mar 4, 2021

@functioner As @iwasakims said, you can add
conf.setInt(CommonConfigurationKeys.IPC_CLIENT_RPC_TIMEOUT_KEY, 0);
before
assertEquals(Client.getTimeout(config), -1);

@functioner
Contributor Author

@functioner As @iwasakims said, you can add
conf.setInt(CommonConfigurationKeys.IPC_CLIENT_RPC_TIMEOUT_KEY, 0);
before
assertEquals(Client.getTimeout(config), -1);

It seems it doesn't work; the obtained timeout is still 120000. Any ideas?

@functioner
Contributor Author

Are we ready to merge? @ferhui @iwasakims

@iwasakims
Member

@functioner You should address the checkstyle warning. I think we don't need the comment.

./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java:61:  public static final int IPC_CLIENT_RPC_TIMEOUT_DEFAULT = 120000; // 120 seconds: Line is longer than 80 characters (found 81). [LineLength]
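
Dropping the trailing comment, as suggested, keeps the flagged declaration within the 80-character limit, for example:

    public static final int IPC_CLIENT_RPC_TIMEOUT_DEFAULT = 120000;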

Member

@iwasakims left a comment


+1. Thanks, @functioner and @ferhui.

@iwasakims merged commit ef7ab53 into apache:trunk Mar 6, 2021
@functioner deleted the HADOOP-17552 branch March 6, 2021 18:34
kiran-maturi pushed a commit to kiran-maturi/hadoop that referenced this pull request Nov 24, 2021