
Input/output error when read file through alluxio-fuse #9345

Closed

FrankYu opened this issue Jun 20, 2019 · 12 comments

Labels
area-fuse (Alluxio fuse integration), needs-response (waiting on alluxio response), priority-medium, type-bug (This issue is about a bug)

Comments

@FrankYu

FrankYu commented Jun 20, 2019

Alluxio Version:
2.0-rc3

Describe the bug

Reading a file through the alluxio-fuse mount point fails with 'Input/output error', while the Alluxio CLI works fine.

To Reproduce
Mount alluxio-fuse on a client node (which also runs an Alluxio worker, with 20GB of MEM as tiered-store level 0 and 2000GB of SSD as tiered-store level 1), then read a 100GB file through the mount point with "cat /path/to/file". The read fails with the error below:

Caused by: io.grpc.StatusRuntimeException: UNKNOWN
    at io.grpc.Status.asRuntimeException(Status.java:530)
    at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at alluxio.grpc.GrpcChannel$ChannelResponseTracker$1$1.onClose(GrpcChannel.java:151)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:694)
    at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
    at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 4278190359, max: 4294967296)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:652)
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:606)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:226)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:146)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
    at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:113)
    at io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:529)
    at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:89)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
    at io.netty.channel.epoll.EpollDomainSocketChannel$EpollDomainUnsafe.epollInReady(EpollDomainSocketChannel.java:138)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:382)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
    ... 1 more
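
For reference, a minimal sketch of the reproduction steps, assuming a standard Alluxio 2.x layout (the mount point and file path are placeholders):

# Mount the Alluxio root namespace at /mnt/alluxio (paths are placeholders)
integration/fuse/bin/alluxio-fuse mount /mnt/alluxio /
# Stream a large (~100GB) file through the FUSE mount point
cat /mnt/alluxio/path/to/file > /dev/null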

Expected behavior

The read should succeed; failing that, a clearer error message should be surfaced.

Urgency
High

Additional context

alluxio fs cat /path/to/file works fine.

@FrankYu added the type-bug label Jun 20, 2019
@apc999
Contributor

apc999 commented Jul 2, 2019

@FrankYu do you still see this issue with Alluxio 2.0.0 release?

@FrankYu
Author

FrankYu commented Jul 4, 2019

Yes, I still see this error with the Alluxio 2.0.0 release. I'll post more info once I figure out how to reproduce it reliably.

@apc999
Contributor

apc999 commented Jul 11, 2019

@FrankYu once you have more info, please update here so we can improve 2.0.0.

@gpang
Contributor

gpang commented Aug 14, 2019

@FrankYu Could you provide more details for a small example on how to reproduce this issue? Thanks!

@rastogiasr added the needs-info label Sep 10, 2019
@BobLiu20

I hit this issue too. Any news? Version: 2.0.1

@BobLiu20

I can list the files through FUSE, but cat fails:

root@ff:/data/alluxio_client/default_tests_files# ls
BASIC_CACHE_ASYNC_THROUGH          BASIC_NO_CACHE_ASYNC_THROUGH                       BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH
BASIC_CACHE_CACHE_THROUGH          BASIC_NO_CACHE_CACHE_THROUGH                       BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE
BASIC_CACHE_MUST_CACHE             BASIC_NO_CACHE_MUST_CACHE                          BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH
BASIC_CACHE_PROMOTE_ASYNC_THROUGH  BASIC_NO_CACHE_THROUGH                             BASIC_NON_BYTE_BUFFER_CACHE_THROUGH
BASIC_CACHE_PROMOTE_CACHE_THROUGH  BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH          BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH
BASIC_CACHE_PROMOTE_MUST_CACHE     BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH          BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH
BASIC_CACHE_PROMOTE_THROUGH        BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE             BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE
BASIC_CACHE_THROUGH                BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH  BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH
root@ff:/data/alluxio_client/default_tests_files# cat BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH
cat: BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH: Input/output error

@cheyang
Contributor

cheyang commented Oct 9, 2019

I think it's caused by:

Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 4278190359, max: 4294967296) 

No direct memory is available. But my real concern is why 4080MB is not enough for alluxio-fuse.
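
As a stopgap, the direct-memory cap for the standalone FUSE JVM can be raised. A hedged sketch, assuming the deployment picks up conf/alluxio-env.sh and the ALLUXIO_FUSE_JAVA_OPTS hook (the 8g value is illustrative; this only postpones the error if chunks are buffered without bound):

# conf/alluxio-env.sh -- illustrative workaround, not a fix for the underlying buffering
ALLUXIO_FUSE_JAVA_OPTS="-XX:MaxDirectMemorySize=8g"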

@bf8086
Contributor

bf8086 commented Oct 9, 2019

@LuQQiu Can you take a look?

@bf8086 added the needs-response label and removed the needs-info label Oct 9, 2019
@jiacheliu3
Contributor

/ping @LuQQiu

@Binyang2014
Contributor

Is there any update on this issue? For production deployments, we need to bound Alluxio's resource usage.

@Binyang2014
Contributor

This looks like a gRPC flow-control issue.
The worker decides whether to pause sending based on !isReady() && tooManyPendingChunks():

if (eof || cancel || error != null || (!mResponse.isReady() && tooManyPendingChunks())) {

Sometimes the pending chunks already exceed the buffer limit while isReady() still returns true; in that situation the worker keeps sending chunks to the client.

The client puts each response into a queue and does not block until the queue is full, so many chunks end up buffered on the client side. This seems to be why the client consumes so much direct memory.
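
To illustrate the direction of a fix: grpc-java's ServerCallStreamObserver exposes isReady() and setOnReadyHandler(), which let a server write only while the transport has room and resume once the client catches up. A minimal generic sketch (my own illustration, not Alluxio's actual worker code; the class and method names are placeholders):

import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

/** Backpressure-aware server-side streaming; a generic sketch, not Alluxio's code. */
public final class FlowControlledWriter {
  /** Streams items to the client, pausing whenever gRPC's send buffer is full. */
  public static <T> void stream(Iterator<T> items, StreamObserver<T> rawObserver) {
    ServerCallStreamObserver<T> observer = (ServerCallStreamObserver<T>) rawObserver;
    AtomicBoolean done = new AtomicBoolean(false);
    Runnable drain = () -> {
      if (done.get()) {
        return; // already completed; ignore late onReady callbacks
      }
      // isReady() turns false once the outbound buffer reaches the flow-control
      // window; stopping here keeps pending chunks (and direct memory) bounded.
      while (observer.isReady() && items.hasNext()) {
        observer.onNext(items.next());
      }
      if (!items.hasNext() && done.compareAndSet(false, true)) {
        observer.onCompleted();
      }
    };
    // gRPC runs the handler again once the client drains and isReady() flips back.
    observer.setOnReadyHandler(drain);
    drain.run();
  }
}

Under this pattern a chunk is handed to gRPC only while the transport reports readiness, so the worker cannot race ahead of a slow client the way the condition above allows when isReady() keeps returning true.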

Setting alluxio.user.streaming.reader.buffer.size.messages, alluxio.user.streaming.reader.chunk.size.bytes, and alluxio.user.network.streaming.flowcontrol.window to small values seems to limit the memory usage on the client.
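
For example, in conf/alluxio-site.properties (the values here are only illustrative starting points, not tuned recommendations):

alluxio.user.streaming.reader.buffer.size.messages=4
alluxio.user.streaming.reader.chunk.size.bytes=1MB
alluxio.user.network.streaming.flowcontrol.window=2MB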

BTW, the option alluxio.worker.network.reader.buffer.size is quite confusing; it seems we cannot limit the memory usage with this config.

@LuQQiu
Contributor

LuQQiu commented Jan 9, 2023

This issue has been closed due to inactivity. Please re-open if this still requires investigation.

@LuQQiu closed this as completed Jan 9, 2023