Write operations can lock the nfs server #7

Closed
nickb937 opened this issue Apr 23, 2012 · 11 comments

Comments

@nickb937

Hadoop version: 1.0.0

Copying files to Hadoop through the NFS proxy with 'cp' or 'mv' can result in write deadlocks.

I can lock up the NFS server fairly easily by running a few parallel write operations:

cd /hdfs/mountpoint/tmp
for x in $(seq 1 5); do ( dd if=/dev/urandom of=foo$x.test bs=1048576 count=500 & ); done
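
A rough Python equivalent of the dd loop above, for anyone who wants the reproduction to report a stall instead of hanging the shell. This is only a sketch and not part of hdfs-nfs-proxy: the function names, the 600 second default timeout, and the finished/stalled split are assumptions made for illustration. The file names, file count, and sizes mirror the dd command.

```python
import os
import threading
import time


def write_random_file(path, total_bytes, chunk_bytes, done):
    """Write total_bytes of random data to path, then record path in done."""
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(os.urandom(n))
            remaining -= n
    # Reached only after the file has been closed without an exception.
    done.append(path)


def parallel_write(directory, count=5, total_bytes=500 * 1048576,
                   chunk_bytes=1048576, timeout=600.0):
    """Start `count` writers at once and return (finished, stalled) path lists.

    A writer is reported as stalled if its thread is still running when the
    shared timeout expires (hypothetical threshold, tune as needed).
    """
    done = []
    threads = {}
    for i in range(1, count + 1):
        path = os.path.join(directory, f"foo{i}.test")
        t = threading.Thread(
            target=write_random_file,
            args=(path, total_bytes, chunk_bytes, done),
            daemon=True,  # Python does not wait for daemon threads at exit
        )
        t.start()
        threads[path] = t

    deadline = time.monotonic() + timeout
    for t in threads.values():
        t.join(max(0.0, deadline - time.monotonic()))

    stalled = [p for p, t in threads.items() if t.is_alive()]
    return sorted(done), stalled
```

Calling `parallel_write("/hdfs/mountpoint/tmp")` would write the same five 500 MiB files as the dd loop. Any path that comes back in the second list was still being written when the timeout expired.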

Once the server is locked, the Pending Write Offsets list in the log never changes, and kill -3 provides no output.

12/04/23 17:56:30 INFO rpc.RPCServer: class com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler got client 192.168.1.10:944
12/04/23 17:56:30 INFO rpc.ClientWorker: Got SecurityHandler of type com.cloudera.hadoop.hdfs.nfs.security.SecurityHandler
12/04/23 17:56:30 WARN handlers.OperationRequestHandler: 0x7ffffff0 Error for client /192.168.1.10 and OPENResponse
12/04/23 17:56:30 WARN nfs4.NFS4Handler: 0x7ffffff0 Quitting due to 1 on OPENRequest for nickb
12/04/23 17:56:49 INFO handlers.OPENHandler: 0x7ffffff0 Opened /tmp/foo2.test for write org.apache.hadoop.fs.FSDataOutputStream@50703755
12/04/23 17:56:49 INFO handlers.OPENHandler: 0x7ffffff0 Opened /tmp/foo3.test for write org.apache.hadoop.fs.FSDataOutputStream@4dcac775
12/04/23 17:56:49 INFO handlers.OPENHandler: 0x7ffffff0 Opened /tmp/foo4.test for write org.apache.hadoop.fs.FSDataOutputStream@746cd30c
12/04/23 17:56:49 INFO handlers.OPENHandler: 0x7ffffff0 Opened /tmp/foo5.test for write org.apache.hadoop.fs.FSDataOutputStream@3dc81f66
12/04/23 17:57:08 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@3dc81f66 and 34799616
12/04/23 17:57:08 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@50703755 and 38600704
12/04/23 17:57:08 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@746cd30c and 34603008
12/04/23 17:57:08 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@4dcac775 and 49414144
12/04/23 17:57:09 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@3dc81f66 and 105979904
12/04/23 17:57:09 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@50703755 and 133173248
12/04/23 17:57:09 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@746cd30c and 131076096
12/04/23 17:57:09 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@3dc81f66 and 106012672
12/04/23 17:57:09 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@50703755 and 133206016
12/04/23 17:57:09 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@746cd30c and 131108864

12/04/23 17:57:18 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@746cd30c and 34603008
12/04/23 17:57:18 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@3dc81f66 and 34799616
12/04/23 17:57:18 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@50703755 and 38600704
12/04/23 17:57:18 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@4dcac775 and 85655552
12/04/23 17:57:24 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@746cd30c and 161632256
12/04/23 17:57:24 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@50703755 and 38600704
12/04/23 17:57:24 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@3dc81f66 and 90570752
12/04/23 17:57:24 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@4dcac775 and 103546880
12/04/23 17:57:24 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@746cd30c and 34603008
12/04/23 17:57:24 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@746cd30c and 161636352
12/04/23 17:57:24 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@746cd30c and 161640448
12/04/23 17:57:26 INFO nfs4.WriteOrderHandler: Sync for org.apache.hadoop.fs.FSDataOutputStream@50703755 and 163704832
12/04/23 17:57:36 INFO nfs4.WriteOrderHandler: Pending Write Offsets: [38928384, 38993920, 39059456, 39124992, 39190528, 39256064, 39321600, 39387136,
39452672, 39518208, 39583744, 39649280, 39714816, 39780352, 39845888, 39911424, 39976960, 40042496, 40108032, 40173568, 40239104, 40304640, 40370176,
40435712, 40501248, 40566784, 40632320, 40697856, 40763392, 40828928, 40894464, 40960000, 41025536, 41091072, 41156608, 41222144, 41287680,
41353216, 41418752, 41484288, 41549824, 41615360, 41680896, 41746432, 41811968, 41877504, 41943040, 42008576, 42074112, 42139648, 42205184, 42270720,
42336256, 42401792, 42467328, 42532864, 42598400, 42663936, 42729472, 42795008, 42860544, 42926080, 42991616, 43057152, 43122688, 43188224,
43253760, 43319296, 43384832, 43450368, 43515904, 43581440, 43646976, 43712512, 43778048, 43843584, 43909120, 43974656, 44040192, 44105728, 44171264,
44236800, 44302336, 44367872, 44433408, 44498944, 44564480, 44630016, 44695552, 44761088, 44826624, 44892160, 44957696, 45023232, 45088768, 4
... and lots more

Followed by many of these:

12/04/23 18:00:12 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1845449483
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1828672267
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1811895051
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1795117835
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1778340619
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1761563403
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1744786187
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1728008971
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1711231755
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1694454539
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1677677323
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1660900107
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1644122891
12/04/23 18:00:13 INFO rpc.ClientWorker: 0x7fffffe6 ignoring request 1627345675
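
The offsets in the Pending Write Offsets entry earlier in this log step by 65536 bytes (64 KiB) from one entry to the next. A small sketch for checking that on a longer paste, assuming the log entry has the same shape as the one above. The function names are made up for illustration and are not from hdfs-nfs-proxy. The sample string is the start of the list from the log, including a number that a terminal wrapped across two lines.

```python
import re


def parse_pending_offsets(log_text):
    """Return the offsets listed after 'Pending Write Offsets:' as integers.

    All whitespace is removed first, so a number that a terminal wrapped
    across two lines is joined back together before parsing.
    """
    _, _, tail = log_text.partition("Pending Write Offsets:")
    compact = re.sub(r"\s+", "", tail)
    return [int(tok) for tok in compact.strip("[]").split(",") if tok.isdigit()]


def summarize(offsets):
    """Report the first and last offset, the count, and the distinct strides."""
    strides = sorted({b - a for a, b in zip(offsets, offsets[1:])})
    return {
        "first": offsets[0],
        "last": offsets[-1],
        "count": len(offsets),
        "strides": strides,
    }


# Start of the list from the log above, with one wrapped number.
sample = (
    "12/04/23 17:57:36 INFO nfs4.WriteOrderHandler: Pending Write Offsets: "
    "[38928384, 38993920, 39059456, 39124992, 39190528, 39256064, "
    "39321600, 39387\n136, 39452672]"
)

summary = summarize(parse_pending_offsets(sample))
print(summary)
```

A single stride value means the queued writes are contiguous with each other. Whether the write just below the first queued offset ever arrived is not something this log entry shows on its own.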

@brockn
Owner

brockn commented Apr 23, 2012

Which OS version, JVM version, and Hadoop version are you running?

Does kill -3 print nothing to the .out or .err file?

@nickb937
Author

OS: Ubuntu 11.10
Hadoop: 1.0.0
$ java -version
java version "1.6.0_23"
OpenJDK Runtime Environment (IcedTea6 1.11pre) (6b23pre11-0ubuntu1.11.10.2)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

Looks like I was looking in the wrong place for kill -3 output:

Full thread dump OpenJDK 64-Bit Server VM (20.0-b11 mixed mode):

"OutputStreamHandler-192.168.1.10:944" daemon prio=10 tid=0x0000000001629000 nid=0x6549 waiting on condition [0x00007f7093599000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000caf508f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
at com.cloudera.hadoop.hdfs.nfs.rpc.OutputStreamHandler.run(OutputStreamHandler.java:76)

"RPCServer-192.168.1.10:944" prio=10 tid=0x0000000001635000 nid=0x6548 runnable [0x00007f709086c000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at com.cloudera.hadoop.hdfs.nfs.rpc.RPCBuffer.readFully(RPCBuffer.java:106)
at com.cloudera.hadoop.hdfs.nfs.rpc.RPCBuffer.from(RPCBuffer.java:73)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker.run(ClientWorker.java:147)

"ResponseProcessor for block blk_-1335885186091250665_38679" daemon prio=10 tid=0x0000000001a28800 nid=0x64d8 runnable [0x00007f7090e72000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:83)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000dc55d298> (a sun.nio.ch.Util$1)
- locked <0x00000000dc55d2a8> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000dc55d250> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2964)

"ResponseProcessor for block blk_1791805563866326323_38679" daemon prio=10 tid=0x0000000001d56000 nid=0x64d2 runnable [0x00007f7091175000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:83)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000dc55e280> (a sun.nio.ch.Util$1)
- locked <0x00000000dc55e290> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000dc55e238> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2964)

"WriteOrderHandler-/tmp/foo5.test" daemon prio=10 tid=0x00000000014b8000 nid=0x64c1 waiting on condition [0x00007f709076b000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000dc55d6f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.run(WriteOrderHandler.java:72)

"ResponseProcessor for block blk_-1767613122637356870_38679" daemon prio=10 tid=0x00000000019c9000 nid=0x64c0 runnable [0x00007f7091e82000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:83)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000dc55d808> (a sun.nio.ch.Util$1)
- locked <0x00000000dc55d818> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000dc55d7c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2964)

"WriteOrderHandler-/tmp/foo4.test" daemon prio=10 tid=0x00000000019c7000 nid=0x64b9 waiting on condition [0x00007f709288c000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000dc55da60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.run(WriteOrderHandler.java:72)

"WriteOrderHandler-/tmp/foo3.test" daemon prio=10 tid=0x00000000018ca800 nid=0x64b5 waiting on condition [0x00007f7091276000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000dc55db58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.run(WriteOrderHandler.java:72)

"ResponseProcessor for block blk_1423889860524175386_38679" daemon prio=10 tid=0x0000000001aa4000 nid=0x64b4 runnable [0x00007f7090c70000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:83)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00000000dc55dc70> (a sun.nio.ch.Util$1)
- locked <0x00000000dc55dc80> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000dc55dc28> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2964)

"WriteOrderHandler-/tmp/foo2.test" daemon prio=10 tid=0x00000000014cd000 nid=0x64af waiting on condition [0x00007f7090a6e000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000dc55dec8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:423)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.run(WriteOrderHandler.java:72)

"DataStreamer for file /tmp/foo5.test block blk_-1335885186091250665_38679" daemon prio=10 tid=0x0000000001e69800 nid=0x64ab in Object.wait() [0x00007f7091c80000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2803)
- locked <0x00000000dc55df60> (a java.util.LinkedList)

"DataStreamer for file /tmp/foo4.test block blk_-1767613122637356870_38679" daemon prio=10 tid=0x0000000001ff5800 nid=0x64a7 in Object.wait() [0x00007f7092286000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2803)
- locked <0x00000000dc55dff8> (a java.util.LinkedList)

"DataStreamer for file /tmp/foo3.test block blk_1791805563866326323_38679" daemon prio=10 tid=0x0000000001470000 nid=0x64a2 in Object.wait() [0x00007f7091f83000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2803)
- locked <0x00000000dc55e090> (a java.util.LinkedList)

"DataStreamer for file /tmp/foo2.test block blk_1423889860524175386_38679" daemon prio=10 tid=0x0000000001604800 nid=0x649f in Object.wait() [0x00007f709278b000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2803)
- locked <0x00000000dc55e128> (a java.util.LinkedList)

"LeaseChecker" daemon prio=10 tid=0x0000000001e50800 nid=0x6173 waiting on condition [0x00007f709096d000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-10" prio=10 tid=0x00000000013b2800 nid=0x6116 sleeping[0x00007f7092b8e000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-9" prio=10 tid=0x00000000013b0800 nid=0x6115 sleeping[0x00007f7092c8f000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-8" prio=10 tid=0x00000000013ae800 nid=0x6114 sleeping[0x00007f7092d90000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-7" prio=10 tid=0x00000000013ac800 nid=0x6113 sleeping[0x00007f7092e91000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-6" prio=10 tid=0x0000000001563000 nid=0x6112 sleeping[0x00007f7092f92000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-5" prio=10 tid=0x0000000001560800 nid=0x6111 sleeping[0x00007f7093093000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-4" prio=10 tid=0x0000000001425000 nid=0x6110 sleeping[0x00007f7093194000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-3" prio=10 tid=0x0000000001423000 nid=0x610f sleeping[0x00007f7093295000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-2" prio=10 tid=0x000000000162b800 nid=0x610e sleeping[0x00007f7093396000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"pool-1-thread-1" prio=10 tid=0x0000000001567800 nid=0x610d sleeping[0x00007f7093497000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.pause(WriteOrderHandler.java:183)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.sync(WriteOrderHandler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.write(WriteOrderHandler.java:235)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:59)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.WRITEHandler.doHandle(WRITEHandler.java:39)
at com.cloudera.hadoop.hdfs.nfs.nfs4.handlers.OperationRequestHandler.handle(OperationRequestHandler.java:57)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:150)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$1.run(NFS4Handler.java:137)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:137)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler.process(NFS4Handler.java:65)
at com.cloudera.hadoop.hdfs.nfs.rpc.ClientWorker$ClientTask.run(ClientWorker.java:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)

"RPCServer-NFS4Handler-2050" prio=10 tid=0x000000000180a000 nid=0x610a runnable [0x00007f7093bc3000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
at java.net.ServerSocket.implAccept(ServerSocket.java:470)
at java.net.ServerSocket.accept(ServerSocket.java:438)
at com.cloudera.hadoop.hdfs.nfs.rpc.RPCServer.run(RPCServer.java:85)

"MetricsPrinter" daemon prio=10 tid=0x0000000001625000 nid=0x60fb waiting on condition [0x00007f70942ea000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Handler$MetricsPrinter.run(NFS4Handler.java:849)

"Low Memory Detector" daemon prio=10 tid=0x00000000013f1000 nid=0x60f9 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00000000013ed800 nid=0x60f8 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00000000013eb800 nid=0x60f7 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00000000013dd000 nid=0x60f6 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00000000013c5000 nid=0x60f5 in Object.wait() [0x00007f7094faf000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
- locked <0x00000000caf50e50> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

"Reference Handler" daemon prio=10 tid=0x00000000013c3800 nid=0x60f4 in Object.wait() [0x00007f70950b0000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x00000000caf50680> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x000000000135d800 nid=0x60f2 waiting on condition [0x00007f709b957000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Server.run(NFS4Server.java:73)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.cloudera.hadoop.hdfs.nfs.nfs4.NFS4Server.main(NFS4Server.java:52)

"VM Thread" prio=10 tid=0x00000000013bc800 nid=0x60f3 runnable

"VM Periodic Task Thread" prio=10 tid=0x00000000013f6000 nid=0x60fa waiting on condition

JNI global references: 1763

Heap
def new generation total 314560K, used 214776K [0x00000000b5a00000, 0x00000000caf50000, 0x00000000caf50000)
eden space 279616K, 64% used [0x00000000b5a00000, 0x00000000c099e098, 0x00000000c6b10000)
from space 34944K, 99% used [0x00000000c6b10000, 0x00000000c8d2fff8, 0x00000000c8d30000)
to space 34944K, 0% used [0x00000000c8d30000, 0x00000000c8d30000, 0x00000000caf50000)
tenured generation total 699072K, used 558101K [0x00000000caf50000, 0x00000000f5a00000, 0x00000000f5a00000)
the space 699072K, 79% used [0x00000000caf50000, 0x00000000ed0554a0, 0x00000000ed055600, 0x00000000f5a00000)
compacting perm gen total 21248K, used 12985K [0x00000000f5a00000, 0x00000000f6ec0000, 0x0000000100000000)
the space 21248K, 61% used [0x00000000f5a00000, 0x00000000f66ae480, 0x00000000f66ae600, 0x00000000f6ec0000)
No shared spaces configured.

@brockn
Owner

brockn commented Apr 23, 2012

Awesome.

VM, EC2, or hardware? Can you describe the topology (e.g., a single-node cluster with the proxy running on the namenode)?

@nickb937
Author

This is a cluster consisting of 12 datanodes with the nfs-proxy running on the namenode. The datanodes/namenode are on two Intel 64-bit Linux machines running Ubuntu 11.10. No virtual machines, EC2 or anything like that. All running on a local 1GigE network and connected into the same switch.

I'm writing to the NFS proxy over the network, mounted like this:

192.168.1.2:/ /export/hdfs nfs4 rw,relatime,vers=4,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,port=2050,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.10,minorversion=0,local_lock=none,addr=192.168.1.2 0 0

@cmccabe

cmccabe commented Oct 1, 2012

@brockn: Out of curiosity, do you use a reordering buffer to ensure that we always write a contiguous range of bytes to HDFS? I vaguely remember reading somewhere that NFS clients can reorder writes (?)

@brockn
Owner

brockn commented Oct 1, 2012

We do re-order the writes, but without question this is the weakest area of the code, as I wasn't sure what kind of data structure to use. Basically, each open file has a background thread waiting for write requests. It stores the current file offset and any writes received at offsets higher than the current offset. Once it finally gets the write for the current offset, it writes out any queued write requests that are sequential, then goes back to waiting for the write request at the new current offset. I'd bet money this is where the bug is.

https://github.com/brockn/hdfs-nfs-proxy/blob/master/src/main/java/com/cloudera/hadoop/hdfs/nfs/nfs4/WriteOrderHandler.java
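
For readers who do not want to open the file, the queueing scheme described above can be sketched roughly as follows. This is an illustration only, not the actual WriteOrderHandler code: the class name `ReorderBuffer` and its methods are hypothetical, an in-memory stream stands in for the HDFS output stream, the background thread is replaced by a synchronized method, and writes are assumed never to overlap.

```java
import java.io.ByteArrayOutputStream;
import java.util.TreeMap;

// Hypothetical sketch of a per-file reorder buffer; not the project's WriteOrderHandler.
// Assumes writes never overlap and every byte range eventually arrives.
class ReorderBuffer {
    // Stands in for the HDFS output stream, which only accepts sequential appends.
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Out-of-order writes keyed by file offset, so the lowest offset is cheap to find.
    private final TreeMap<Long, byte[]> pending = new TreeMap<>();
    private long currentOffset = 0;

    // Accepts a write at any offset and flushes every queued write that became contiguous.
    synchronized void write(long offset, byte[] data) {
        if (offset < currentOffset) {
            return; // already flushed (for example a retransmission); dropped in this sketch
        }
        pending.put(offset, data);
        // Drain while the lowest queued offset is exactly the one we are waiting for.
        while (!pending.isEmpty() && pending.firstKey() == currentOffset) {
            byte[] next = pending.pollFirstEntry().getValue();
            sink.write(next, 0, next.length);
            currentOffset += next.length;
        }
    }

    synchronized long currentOffset() {
        return currentOffset;
    }

    // Bytes held in memory while waiting for the write at currentOffset.
    synchronized long pendingBytes() {
        long total = 0;
        for (byte[] chunk : pending.values()) {
            total += chunk.length;
        }
        return total;
    }

    synchronized byte[] written() {
        return sink.toByteArray();
    }
}
```

Note that `pendingBytes()` grows without bound for as long as the write at `currentOffset` is missing, so heap usage in a scheme like this depends on how far the client runs ahead.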

@brockn
Owner

brockn commented Oct 1, 2012

On RHEL 5.8 I am only able to reproduce this when the heap on the NFS proxy is too small, using the following:

for x in $(seq 1 10); do ( dd if=/root/foo.test of=/mnt/hdfs/foo$x.test bs=1048576 count=500 & ); done

I had to set the heap to 5.5GB to get the test to pass. The reason was the NFS write re-ordering: the proxy had to buffer 194MB of writes for this single file before the prerequisite write showed up. See logs below.

2012-10-01 18:03:02,115 (WriteOrderHandler-/foo4.test) [INFO - com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.doWrite(WriteOrderHandler.java:130)] Writing to /foo4.test 5046272, 65536, new offset = 5111808, hash = -1280890681
2012-10-01 18:03:46,876 (WriteOrderHandler-/foo4.test) [DEBUG - com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.run(WriteOrderHandler.java:88)] Pending writes 194MB, current offset = 5111808
2012-10-01 18:03:46,876 (WriteOrderHandler-/foo4.test) [INFO - com.cloudera.hadoop.hdfs.nfs.nfs4.WriteOrderHandler.doWrite(WriteOrderHandler.java:130)] Writing to /foo4.test 5111808, 65536, new offset = 5177344, hash = 985612019

@brockn
Owner

brockn commented Oct 2, 2012

The test passed 150 times overnight on a RHEL 5.8 host. I'll see what I can do with an Ubuntu client host.

@cmccabe

cmccabe commented Oct 2, 2012

It seems like with a lot of clients, this amount of memory use per client
might not be sustainable. Can we somehow force the NFS clients to send the
writes in order in this scenario? Perhaps by somehow forcing a
retransmission when the writes are reordered? If not, perhaps it's worth
considering storing the buffered data in temporary files instead of in
memory.

Colin

@brockn
Owner

brockn commented Oct 2, 2012

Without doing synchronous I/O, I don't think it's possible to force the clients not to re-order the writes. I have thought about the temporary file solution and I think it's probably the best option.
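
A rough sketch of what the temporary-file idea could look like, to make the trade-off concrete. It is not code from this project: the class `SpillingReorderBuffer` and its methods are hypothetical. It assumes non-overlapping writes and parks each out-of-order write in a temp file at its own offset (which relies on the filesystem handling sparse files reasonably), keeping only offset and length in memory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.TreeMap;

// Hypothetical sketch: out-of-order writes are parked on local disk, not on the heap.
class SpillingReorderBuffer implements AutoCloseable {
    private final OutputStream sink;   // stands in for the sequential HDFS stream
    private final FileChannel spill;   // temp file holding the parked bytes
    private final TreeMap<Long, Integer> pending = new TreeMap<>(); // offset -> length
    private long currentOffset = 0;

    SpillingReorderBuffer(OutputStream sink) {
        this.sink = sink;
        try {
            Path file = Files.createTempFile("pending-writes", ".spill");
            spill = FileChannel.open(file, StandardOpenOption.READ,
                    StandardOpenOption.WRITE, StandardOpenOption.DELETE_ON_CLOSE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized void write(long offset, byte[] data) {
        try {
            if (offset < currentOffset) {
                return; // already flushed; dropped in this sketch
            }
            if (offset == currentOffset) {
                sink.write(data); // in order: pass straight through
                currentOffset += data.length;
            } else {
                // Out of order: store the bytes in the temp file at the same offset.
                ByteBuffer buf = ByteBuffer.wrap(data);
                long pos = offset;
                while (buf.hasRemaining()) {
                    pos += spill.write(buf, pos);
                }
                pending.put(offset, data.length);
            }
            drain();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies parked writes back out of the temp file once they become contiguous.
    private void drain() throws IOException {
        while (!pending.isEmpty() && pending.firstKey() == currentOffset) {
            ByteBuffer buf = ByteBuffer.allocate(pending.pollFirstEntry().getValue());
            long pos = currentOffset;
            while (buf.hasRemaining()) {
                int n = spill.read(buf, pos);
                if (n < 0) {
                    throw new IOException("spill file shorter than expected");
                }
                pos += n;
            }
            sink.write(buf.array());
            currentOffset = pos;
        }
    }

    synchronized int pendingWrites() {
        return pending.size();
    }

    @Override
    public synchronized void close() {
        try {
            spill.close(); // DELETE_ON_CLOSE removes the temp file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The cost is an extra disk write and read for every re-ordered request, in exchange for heap usage that stays proportional to the number of pending writes rather than their total size.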

@brockn
Owner

brockn commented Oct 5, 2012

tl;dr I don't think the heap filling up was necessarily your issue. I resolved that on the async-writes branch and will be merging that to master soon. I also resolved the issue described below. More research needs to be done on this, but for now the approach is working.

I think your issue was something rather odd I found. I have only observed this on Ubuntu 12.1 but I am guessing it behaves similarly in 11.1 and newer RH/SuSE. The situation is as follows:

  1. Writes are re-ordered and a prerequisite write is not sent, say offset 0-128, but 129-512 are sent.
  2. Because we don't have the write 0-128, we queue up 129-512.
  3. The client sends writes 513-768 and 769-1024 with the SYNC bit flipped, but still has not sent 0-128.

I am not entirely sure why it flips to sync mode. I think it might be because it feels we are responding too slowly. I do know that if the NFS server responds with an error, the client will flip over to sync mode so that the error can be propagated back up to the client application during a write(). That is a patch I tested years back. So I am guessing we don't respond in time, the re-transmit trips that condition, and it starts writing with the SYNC flag. Why it doesn't then forward along 0-128 at that point, I don't know.

However, the issue is that the client will stop sending writes while it waits for a response to the SYNC writes. As such, in this scenario we cannot honor the SYNC flag on the write.
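
To make the scenario concrete, here is a small sketch using the byte ranges from the list above. It is not the code on the async-writes branch: `SyncAwareQueue` and its `Reply` values are hypothetical names, and only offsets and lengths are tracked. The point it illustrates is that a SYNC write arriving beyond the current offset has to be acknowledged while still queued, because blocking until it reaches HDFS would wait on a write (0-128) that the client will not send until it gets the reply.

```java
import java.util.TreeMap;

// Hypothetical sketch of the reply decision; tracks offsets and lengths only.
class SyncAwareQueue {
    enum Reply { WRITTEN, QUEUED, QUEUED_SYNC_NOT_HONORED }

    private final TreeMap<Long, Integer> pending = new TreeMap<>(); // offset -> length
    private long currentOffset = 0;

    synchronized Reply accept(long offset, int length, boolean sync) {
        pending.put(offset, length);
        // Flush every queued write that is now contiguous with the current offset.
        while (!pending.isEmpty() && pending.firstKey() == currentOffset) {
            currentOffset += pending.pollFirstEntry().getValue();
        }
        if (offset < currentOffset) {
            return Reply.WRITTEN; // this write was flushed by the loop above
        }
        // Still behind a gap: reply now, since waiting would stall the client.
        return sync ? Reply.QUEUED_SYNC_NOT_HONORED : Reply.QUEUED;
    }

    synchronized long currentOffset() {
        return currentOffset;
    }
}
```

Replaying the scenario: `accept(129, 384, false)` is queued, the two SYNC writes `accept(513, 256, true)` and `accept(769, 256, true)` are queued without the sync guarantee, and once `accept(0, 129, false)` arrives everything drains and the current offset moves to 1025.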

@brockn closed this as completed Oct 5, 2012