tsd fsck warning message #895

Open
lordang opened this issue Nov 23, 2016 · 4 comments

@lordang

lordang commented Nov 23, 2016

It seems I have a TSD name-to-UID mapping error in the tsdb-uid table.
Our cluster has a very large number of tagv values because we use the client IP as a tagv.
When I ran the uid fsck command, the uid Java process used all of our RAM (we have 128G of RAM),
ran GC continuously, and consumed all CPU and RAM.
Then I got the following warning message.

2016-11-23 10:19:16,882 WARN [New I/O worker #6] Scanner: RegionInfo(table="tsdb-uid", region_name="tsdb-uid,\x1Bx\xB4\xB7,1475154085093.65338521f3a7a06523eec77f11e2ca23.", stop_key="168220431") pretends to not know Scanner(table="tsdb-uid", start_key="!q\x10\xB2", stop_key="", columns=org.hbase.async.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 1484925, already closed?
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1966)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30438)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2016)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:110)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:90)
at java.lang.Thread.run(Thread.java:745)

Caused by RPC: GetNextRowsRequest(scanner_id=0x000000000006A87D, max_num_rows=1024, region=null, attempt=0), populate_blockcache=true, max_num_rows=1024, max_num_kvs=4096, region=null, filter=null, scanner_id=0x000000000006A87D). I will retry to open a scanner but this is typically because you've been holding the scanner open and idle for too long (possibly due to a long GC pause on your side or in the RegionServer)
2016-11-23 10:19:16,887 ERROR [main] UidManager: Duplicate reverse tagv mapping: 284081517 -> 284081517 and 284081517 -> 217110B2. kv=KeyValue(key="!q\x10\xB2", family="name", qualifier="tagv", value="284081517", timestamp=1460540461278)
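To read the error above: the `217110B2` reported by fsck appears to be the hex form of the escaped row key `!q\x10\xB2` shown in the `kv=` dump (`!` is 0x21, `q` is 0x71). The sketch below is a hypothetical helper, not part of OpenTSDB or asynchbase; it assumes non-printable bytes are printed as `\xHH` and that UIDs are unsigned big-endian integers. Under those assumptions the reverse-mapping row in question is UID 561057970, which would be consistent with the very large number of tag values described above.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper (not OpenTSDB code): decodes row keys printed in the
// escaped form seen in the log (printable ASCII as-is, other bytes as \xHH).
public class UidKeyDecoder {

    /** Turns an escaped key such as "!q\x10\xB2" back into its raw bytes. */
    public static byte[] unescape(String escaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 3 < escaped.length() && escaped.charAt(i + 1) == 'x') {
                // "\xHH": two hex digits encode one byte
                out.write(Integer.parseInt(escaped.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(c);  // printable ASCII byte stored as-is
                i += 1;
            }
        }
        return out.toByteArray();
    }

    /** Upper-case hex form of the key, as printed in the fsck error. */
    public static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Unsigned big-endian value of the key (assumed UID encoding). */
    public static long toUnsigned(byte[] key) {
        long value = 0;
        for (byte b : key) {
            value = (value << 8) | (b & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] key = unescape("!q\\x10\\xB2");
        System.out.println(toHex(key) + " = " + toUnsigned(key));
    }
}
```

The same decoder applies to the region boundary in the warning, e.g. `\x1Bx\xB4\xB7` decodes to `1B78B4B7`.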

Can I ignore this message, keep fsck running, and wait for it to finish?
Or must I add more RAM and try again?

manolama added the bug label Nov 27, 2016
@manolama
Member

Hello @lordang, the scanner exception you're seeing is normal for a JVM undergoing massive GC, as the underlying scanner on the HBase side will be killed after a timeout period.
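The timeout described here can be pictured as a lease table: the RegionServer keeps each open scanner alive only while the client keeps calling next() within a lease period, so a client frozen in a long GC pause comes back to a scanner the server has already discarded (hence "already closed?" in the log). The toy model below is not HBase or asynchbase code; the class names and the 60-second lease are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not HBase code) of server-side scanner leases.
public class ScannerLeaseDemo {

    static final long LEASE_MS = 60_000;  // assumed lease period, illustration only

    public static class UnknownScannerException extends RuntimeException {
        public UnknownScannerException(long id) {
            super("Name: " + id + ", already closed?");
        }
    }

    private final Map<Long, Long> lastCall = new HashMap<>();  // scanner id -> time of last call
    private long nextId = 1;

    /** Opens a scanner and starts its lease. */
    public long open(long nowMs) {
        long id = nextId++;
        lastCall.put(id, nowMs);
        return id;
    }

    /** A next() call: fails if the lease ran out since the previous call. */
    public void next(long id, long nowMs) {
        Long last = lastCall.get(id);
        if (last == null || nowMs - last > LEASE_MS) {
            lastCall.remove(id);  // server has already discarded this scanner
            throw new UnknownScannerException(id);
        }
        lastCall.put(id, nowMs);  // each call renews the lease
    }

    public static void main(String[] args) {
        ScannerLeaseDemo server = new ScannerLeaseDemo();
        long id = server.open(0);
        server.next(id, 1_000);        // regular call, lease renewed
        try {
            server.next(id, 91_000);   // client stalled 90 s, e.g. in a GC pause
        } catch (UnknownScannerException e) {
            System.out.println("lease expired: " + e.getMessage());
            id = server.open(91_000);  // client opens a new scanner and carries on
            server.next(id, 92_000);
            System.out.println("reopened as scanner " + id);
        }
    }
}
```

Reopening after the exception mirrors the "I will retry to open a scanner" part of the warning, which is why the message by itself is not fatal; the GC pressure that caused the stall is the real problem.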

But fsck shouldn't eat up 128G of RAM, so it sounds like there's a bug in there. If you could restart it and take a heap dump of the JVM when it reaches around 4G, I'd love to see it. Then we can fix it up. Thanks!
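For the heap dump requested here, the usual route is the JDK's `jmap -dump:live,format=b,file=heap.hprof <pid>` once the process reaches the target size. If catching that moment by hand is awkward, a small in-process watchdog can trigger the dump itself. The sketch below assumes a HotSpot JVM (it uses `com.sun.management.HotSpotDiagnosticMXBean`); `HeapDumpWatchdog`, its threshold, and its polling interval are hypothetical names, not OpenTSDB options.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: writes one heap dump once used heap crosses a threshold.
public class HeapDumpWatchdog {

    /** True once used heap has reached the threshold (e.g. about 4 GB). */
    public static boolean shouldDump(MemoryUsage heap, long thresholdBytes) {
        return heap.getUsed() >= thresholdBytes;
    }

    /** Writes an .hprof dump of this JVM; live=true keeps only reachable objects. */
    public static void dump(String path) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(path, true);  // fails if the file already exists
    }

    /** Starts a daemon thread that polls heap usage and dumps once. */
    public static Thread start(long thresholdBytes, String path, long pollMs) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                    if (shouldDump(heap, thresholdBytes)) {
                        dump(path);
                        return;  // one dump is enough
                    }
                    Thread.sleep(pollMs);
                }
            } catch (IOException | InterruptedException e) {
                System.err.println("heap dump watchdog stopped: " + e);
            }
        }, "heap-dump-watchdog");
        t.setDaemon(true);  // never keeps the JVM alive on its own
        t.start();
        return t;
    }
}
```

Calling `HeapDumpWatchdog.start(4L << 30, "fsck-heap.hprof", 1_000)` early in a process would write one dump once used heap passes about 4 GB. Polling keeps the sketch simple; a `MemoryPoolMXBean` usage threshold with a notification listener would avoid the loop.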

@lordang
Author

lordang commented Dec 1, 2016

I took a heap dump, but at about 4GB it's too big to attach to GitHub. How can I share it?

@manolama
Member

If you can put it on Dropbox or Google Drive, that would be great.

@lordang
Author

lordang commented Dec 26, 2016


2 participants