Search before asking
Fluss version
0.9.0 (latest release)
Please describe the bug 🐞
We have a Flink job that has been consuming a Fluss log table for about a week, but recently its throughput dropped to zero. After analyzing the subtask thread dump, it appears that remote log fetching is stuck because a semaphore permit cannot be acquired.
The thread dump of one subtask is as follows.
We then used Arthas to print the internal state of the LogFetchBuffer:
vmtool -c 312787f0 --action getInstances --className org.apache.fluss.client.table.scanner.log.LogFetchBuffer --limit 5 --express '
instances.length == 0 ? "no LogFetchBuffer" : (
#buf = instances[0],
#f = #buf.getClass().getDeclaredField("pendingFetches"),
#f.setAccessible(true),
#pmap = #f.get(#buf),
#pmap.isEmpty() ? "pendingFetches empty" : (
#firstList = #pmap.values().iterator().next(),
#firstList.isEmpty() ? "first list empty" : #firstList.get(0).toString()
)
)'
We then printed some fields of the RemoteLogDownloader:
vmtool -c 312787f0 --action getInstances --className org.apache.fluss.client.table.scanner.log.RemoteLogDownloader --limit 10 --express 'instances.length==0 ? "no RemoteLogDownloader" : (#d=instances[0],#c=#d.getClass(),#f1=#c.getDeclaredField("segmentsToFetch"),#f1.setAccessible(true),#f2=#c.getDeclaredField("segmentsToRecycle"),#f2.setAccessible(true),#f3=#c.getDeclaredField("prefetchSemaphore"),#f3.setAccessible(true),#m=new java.util.LinkedHashMap(),#m.put("segmentsToFetch_size",#f1.get(#d).size()),#m.put("segmentsToRecycle_size",#f2.get(#d).size()),#m.put("availablePermits",#f3.get(#d).availablePermits()),#m)'
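For readers less familiar with OGNL, the expression above is roughly equivalent to the following plain-Java reflection. This is only a sketch: `FakeDownloader` is a hypothetical stand-in that reuses the three field names from the command, and its field types are assumptions, not the actual Fluss class.

```java
import java.lang.reflect.Field;
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Semaphore;

public class ReflectiveDump {
    // Hypothetical stand-in; only the field names mirror the command above.
    public static class FakeDownloader {
        private final Queue<String> segmentsToFetch = new ArrayDeque<>();
        private final Queue<String> segmentsToRecycle = new ArrayDeque<>();
        private final Semaphore prefetchSemaphore = new Semaphore(0);
    }

    // Reads a private field by name, like getDeclaredField + setAccessible in OGNL.
    static Object read(Object target, String name) throws ReflectiveOperationException {
        Field f = target.getClass().getDeclaredField(name);
        f.setAccessible(true);
        return f.get(target);
    }

    // Collects the same three values the vmtool expression puts into its map.
    public static Map<String, Object> dump(Object d) throws ReflectiveOperationException {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("segmentsToFetch_size", ((Collection<?>) read(d, "segmentsToFetch")).size());
        m.put("segmentsToRecycle_size", ((Collection<?>) read(d, "segmentsToRecycle")).size());
        m.put("availablePermits", ((Semaphore) read(d, "prefetchSemaphore")).availablePermits());
        return m;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(dump(new FakeDownloader()));
    }
}
```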
There are no download failure logs in the TaskManager logs.
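To illustrate the symptom only (not the root cause, which is still unknown), here is a minimal sketch of how a permit-limited prefetch stalls once every permit is held and none is returned. The class and method names are hypothetical and this is not the Fluss implementation; a bounded `tryAcquire` is used so the demo terminates instead of hanging.

```java
import java.util.Arrays;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PermitStallSketch {
    // Hypothetical prefetch step: takes one permit per segment, waiting at most 100 ms.
    static boolean tryFetch(Semaphore permits) throws InterruptedException {
        return permits.tryAcquire(100, TimeUnit.MILLISECONDS);
    }

    public static boolean[] run() throws InterruptedException {
        Semaphore permits = new Semaphore(2);  // assumed prefetch limit of 2
        boolean first = tryFetch(permits);     // takes permit 1
        boolean second = tryFetch(permits);    // takes permit 2
        boolean third = tryFetch(permits);     // nothing was released, so this times out
        permits.release();                     // a consumed segment returns its permit
        boolean fourth = tryFetch(permits);    // fetching can proceed again
        return new boolean[] {first, second, third, fourth};
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(run())); // prints [true, true, false, true]
    }
}
```

With an unbounded `acquire()` in place of `tryAcquire`, the third call would block indefinitely, which matches the kind of stuck thread seen in the dump.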
Solution
No response
Are you willing to submit a PR?