Another MultiAction issue #25

Closed
ylangisc opened this issue Apr 23, 2012 · 10 comments

@ylangisc

I see the following issue with the tsuna/asynchbase fork. I did not test stumbleupon/asynchbase but it's probably the same. I'm on HBase 0.92.0.

2012-04-22 23:23:38,978 ERROR org.apache.hadoop.hbase.io.HbaseObjectWritable: Error in readFields
java.io.IOException: Error in readFields
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:503)
        at org.apache.hadoop.hbase.client.MultiAction.readFields(MultiAction.java:116)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
        at org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1238)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1167)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:703)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:495)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:470)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Error in readFields
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:503)
        at org.apache.hadoop.hbase.client.Action.readFields(Action.java:101)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
        ... 12 more
Caused by: java.lang.NegativeArraySizeException: -1
        at org.apache.hadoop.hbase.util.Bytes.readByteArray(Bytes.java:147)
        at org.apache.hadoop.hbase.client.OperationWithAttributes.readAttributes(OperationWithAttributes.java:102)
        at org.apache.hadoop.hbase.client.Delete.readFields(Delete.java:262)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
        ... 15 more
ghost assigned tsuna Apr 24, 2012

tsuna commented Apr 24, 2012

Can you help me reproduce the bug? Is your code doing anything special? Are you able to reproduce the problem consistently? If so, can you share the DEBUG level log?

@ylangisc

This happened on our production cluster, and locally I'm no longer able to reproduce the bug. Nothing special; the request is constructed as follows:

new DeleteRequest(TOTAL_BYTECOUNT_TABLE_NAME,
                  rr.getRow(),
                  TOTAL_BYTECOUNT_TABLE_COLUMN_FAMILY,
                  UNPROCESSED_QUALIFIER)
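
Spelled out as a minimal, self-contained sketch (the quorum address, row key, table name, column family, and qualifier below are illustrative placeholders, not our production values), the call looks roughly like this:

import org.hbase.async.DeleteRequest;
import org.hbase.async.HBaseClient;

public final class DeleteSketch {
  // Illustrative placeholders; the real values come from the application.
  private static final byte[] TOTAL_BYTECOUNT_TABLE_NAME = "total_bytecount".getBytes();
  private static final byte[] TOTAL_BYTECOUNT_TABLE_COLUMN_FAMILY = "d".getBytes();
  private static final byte[] UNPROCESSED_QUALIFIER = "unprocessed".getBytes();

  public static void main(final String[] args) throws Exception {
    final HBaseClient client = new HBaseClient("localhost:2181");  // ZooKeeper quorum spec
    final byte[] row = "some-row-key".getBytes();                  // stands in for rr.getRow()
    try {
      // Issues a single-qualifier delete; asynchbase may batch it with other
      // buffered edits into a MultiAction, as seen in the stack trace above.
      client.delete(new DeleteRequest(TOTAL_BYTECOUNT_TABLE_NAME,
                                      row,
                                      TOTAL_BYTECOUNT_TABLE_COLUMN_FAMILY,
                                      UNPROCESSED_QUALIFIER))
            .joinUninterruptibly();  // block here only for the sake of the example
    } finally {
      client.shutdown().joinUninterruptibly();
    }
  }
}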

According to the stack trace, the server tries to deserialize attributes, which only exist as of version 2 (DeleteRequest). Looking into your code, you still serialize v1 delete requests:

buf.writeByte(1); // Delete#DELETE_VERSION. Stick to v1 here for now.

HBase does not try to deserialize any attributes if the version is 1:

org.apache.hadoop.hbase.client.Delete:

if (version > 1) {
  readAttributes(in);
}

I do not understand how this exception can happen at all.


tsuna commented Apr 30, 2012

Generally these issues indicate that there is a problem with the way the RPC got serialized. If the RPC isn't serialized properly, then HBase can fail to deserialize it because the fields aren't where it expects them to be. The problem is that it's really hard to track down what the problem is without having more details about the failure: what was the RPC exactly, what did it contain, how did it get serialized? asynchbase's DEBUG log would tell us all of that. But so far we can only deplore the lack of details logged by HBase itself when this problem occurs.
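
As a toy illustration of that failure mode (plain java.io here, not HBase or asynchbase code, and the fields are made up): if the writer drops a single field, the reader ends up consuming a later value as an array length, which produces exactly the kind of NegativeArraySizeException shown in the original report:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public final class MisalignedRpcSketch {
  public static void main(final String[] args) throws IOException {
    final ByteArrayOutputStream buf = new ByteArrayOutputStream();
    final DataOutputStream out = new DataOutputStream(buf);
    out.writeByte(1);    // version byte
    // Bug on the writer side: a length-prefixed "row" field is never written.
    out.writeLong(-1L);  // next field (say, a timestamp sentinel)
    out.flush();

    final DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    final int version = in.readByte();       // reads 1, looks perfectly normal
    final int rowLength = in.readInt();      // consumes half of the -1L: yields -1
    final byte[] row = new byte[rowLength];  // NegativeArraySizeException: -1
    in.readFully(row);
    final long timestamp = in.readLong();    // never reached
  }
}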


tsuna commented May 3, 2012

So I'm not sure what we should do with this issue...


dingwa commented Sep 15, 2012

I have the exact same issue running asynchbase 1.3.2 and HBase 0.92.1-cdh4.0.1. I have been able to reliably reproduce the issue, but only from within our application; I have not been able to create a standalone test case that reproduces it.

Our DEBUG log is as follows. For brevity, I have not included all the other failed RPC stack traces. From what I can see, every PutRequest / DeleteRequest inside the MultiAction has failed.

2012-09-15 10:11:09.460 [New I/O client worker #2-1] DEBUG org.hbase.async.RegionClient - ------------------>> ENTERING DECODE >>------------------
2012-09-15 10:11:09.461 [New I/O client worker #2-1] DEBUG org.hbase.async.RegionClient - rpcid=2188, response size=102 bytes, 0 readable bytes left, rpc=MultiAction(batch=[PutRequest(table="BaseServerActiveSessions", key="APPLICATION.SUBSYSTEM:DEMO:59d7497c-ba41-4810-9984-57bbfdf1a17c", family="data", qualifiers=["sessionToken"], values=["59d7497c-ba41-4810-9984-57bbfdf1a17c"], timestamp=9223372036854775807, lockid=-1, durable=true, bufferable=true, attempt=1, region=RegionInfo(table="BaseServerActiveSessions", region_name="BaseServerActiveSessions,,1346901071674.a4e77a5c9400ec1f3908f40e7f66fff1.", stop_key="")), PutRequest(table="BaseServerActiveSessions", key="APPLICATION.SUBSYSTEM:DEMO:59d7497c-ba41-4810-9984-57bbfdf1a17c", family="data", qualifiers=["username"], values=["DEMO"], timestamp=9223372036854775807, lockid=-1, durable=true, bufferable=true, attempt=1, region=RegionInfo(table="BaseServerActiveSessions", region_name="BaseServerActiveSessions,,1346901071674.a4e77a5c9400ec1f3908f40e7f66fff1.", stop_key="")), PutRequest(table="BaseServerActiveSessions", key="APPLICATION.SUBSYSTEM:DEMO:59d7497c-ba41-4810-9984-57bbfdf1a17c", family="data", qualifiers=["subsystem"], values=["APPLICATION.SUBSYSTEM"], timestamp=9223372036854775807, lockid=-1, durable=true, bufferable=true, attempt=1, region=RegionInfo(table="BaseServerActiveSessions", region_name="BaseServerActiveSessions,,1346901071674.a4e77a5c9400ec1f3908f40e7f66fff1.", stop_key="")), ... 34 RPCs not shown ..., PutRequest(table="StaticDataUserNotifications", key="com.celertech.staticdata.dal.exchange.UpdateExchangeRequestListener:59d7497c-ba41-4810-9984-57bbfdf1a17c", family="data", qualifiers=["id"], values=["59d7497c-ba41-4810-9984-57bbfdf1a17c"], timestamp=9223372036854775807, lockid=-1, durable=true, bufferable=true, attempt=1, region=RegionInfo(table="StaticDataUserNotifications", region_name="StaticDataUserNotifications,,1346901097009.d7e2f67469be2d2a47fd2545bbcbc88b.", stop_key=""))])
2012-09-15 10:11:09.474 [New I/O client worker #2-1] ERROR com.celertech.baseserver.persist.hbase.CelerHBaseClient - Exception thrown sending RPC PutRequest(table="BaseServerActiveSessions", key="APPLICATION.SUBSYSTEM:DEMO:59d7497c-ba41-4810-9984-57bbfdf1a17c", family="data", qualifiers=["sessionToken"], values=["59d7497c-ba41-4810-9984-57bbfdf1a17c"], timestamp=9223372036854775807, lockid=-1, durable=true, bufferable=true, attempt=0, region=RegionInfo(table="BaseServerActiveSessions", region_name="BaseServerActiveSessions,,1346901071674.a4e77a5c9400ec1f3908f40e7f66fff1.", stop_key="")) whilst connected to quorum cloudera-dev.localdomain:2181. thrown:
org.hbase.async.RemoteException: IPC server unable to read call parameters: Error in readFields
        at org.hbase.async.RegionClient.deserializeException(RegionClient.java:1186) [asynchbase-1.3.2.jar:na]
        at org.hbase.async.RegionClient.deserialize(RegionClient.java:1161) [asynchbase-1.3.2.jar:na]
        at org.hbase.async.RegionClient.decode(RegionClient.java:1082) [asynchbase-1.3.2.jar:na]
        at org.hbase.async.RegionClient.decode(RegionClient.java:82) [asynchbase-1.3.2.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:470) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:80) [netty-3.2.7.Final.jar:na]
        at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:938) [asynchbase-1.3.2.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.2.7.Final.jar:na]
        at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:2408) [asynchbase-1.3.2.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:351) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:282) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:202) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.2.7.Final.jar:na]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44) [netty-3.2.7.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [na:1.7.0_10-ea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_10-ea]
        at java.lang.Thread.run(Thread.java:722) [na:1.7.0_10-ea]

Please let me know if you require any more information.


dingwa commented Sep 16, 2012

Just some more information on this from further testing: I pointed my application at an earlier version of HBase (0.90.3) and have not been able to reproduce the issue I was seeing above. Has something in the HBase binary API changed between 0.90.3 and 0.92.1 that we haven't catered for?


tsuna commented Sep 16, 2012

Yes, HBase regularly changes the on-wire details of its RPCs, so it is very possible that this bug can only be triggered with 0.92.x.


tsuna commented Sep 24, 2012

I think I nailed down this bug. Can you please confirm that tsuna/asynchbase@d3a9f74 looks good and works for you too?

Thank you for your patience, and sorry it took me so long to set some time aside to fix this bug. To top it off, the fix is only a one-line change.


tsuna commented Sep 24, 2012

For the record, this issue is a duplicate of #33.


dingwa commented Sep 26, 2012

Hi Benoit, thanks for the patch. I have tested it using the same test case I was using before, and it looks like your patch has fixed the issue. Thank you for the quick turnaround.

Scott

tsuna closed this as completed in d3a9f74 Sep 29, 2012