[FEATURE] Expect to support the filesystem not implementing the append-mode. #391
Comments
Do you mean that one dataFlushEvent could directly write a single new OSS file? If so, how would the index file be maintained?
Aren't several blocks appended to one data file at present? I want to write each block into its own data file, so that it corresponds to one index file.
That would provide a way to support non-append storage.
Could you provide a diagram of the data layout that shows the relationship between data blocks and OSS file names? Besides, we should also consider how to support local_order in an object store.
Okay, I will provide a design diagram to describe it later.
Yes. I know it’s better to avoid append mode to improve performance. But if the data layout is changed, do we still need an index file? In the local file or HDFS storage type, the index file maintains the offset of each block within a single data file.
Great, thanks for reminding me. The index file no longer seems to be needed.
I have another question: will we use the object store API directly, or use the Hadoop FileSystem API to support this? If it is the former, maybe it’s better to introduce a Uniffle-dedicated filesystem API to wrap the different concrete filesystems, including COS.
Emm... Maybe not. Without an index structure, we don’t have a global view for finding the files we need in the object store, especially for local order. WDYT? @jerqi
Every data file should have an index file.
The performance will not be good, as we would have to read all the index files at once to split segments.
We can read only one index file.
In fact, could an abstract storage layer be provided in Uniffle, with each vendor implementing the necessary storage interface? It would not matter whether the concrete class uses a native API (such as S3) or an HCFS file system.
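To make the suggestion above concrete, here is a minimal sketch of what such a storage layer might look like. Everything in it is an assumption for illustration: `ShuffleStorage`, `InMemoryShuffleStorage`, `LocalFileShuffleStorage`, and `StorageSketch` are hypothetical names, not existing Uniffle classes, and the interface only covers whole-object write, read, and delete. The in-memory class stands in for a native object store client; the local-file class stands in for a filesystem-backed implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface (not a Uniffle API): no append, only whole-object operations.
interface ShuffleStorage {
    void write(String key, byte[] data);
    byte[] read(String key);      // returns null if the key does not exist
    boolean delete(String key);   // returns true if something was deleted
}

// Stand-in for a backend that talks to an object store through its native client.
class InMemoryShuffleStorage implements ShuffleStorage {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    public void write(String key, byte[] data) {
        objects.put(key, data.clone());
    }

    public byte[] read(String key) {
        byte[] data = objects.get(key);
        return data == null ? null : data.clone();
    }

    public boolean delete(String key) {
        return objects.remove(key) != null;
    }
}

// Stand-in for a filesystem-backed implementation; each key maps to one file under root.
class LocalFileShuffleStorage implements ShuffleStorage {
    private final Path root;

    LocalFileShuffleStorage(Path root) {
        this.root = root;
    }

    public void write(String key, byte[] data) {
        try {
            Path target = root.resolve(key);
            Files.createDirectories(target.getParent());
            Files.write(target, data); // creates or truncates, never appends
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public byte[] read(String key) {
        try {
            Path target = root.resolve(key);
            return Files.exists(target) ? Files.readAllBytes(target) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public boolean delete(String key) {
        try {
            return Files.deleteIfExists(root.resolve(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class StorageSketch {
    // Caller code depends only on the interface, not on the concrete backend.
    static boolean roundTrip(ShuffleStorage storage) {
        storage.write("app/0/1/block-1.data", new byte[] {1, 2, 3});
        byte[] back = storage.read("app/0/1/block-1.data");
        boolean ok = back != null && back.length == 3 && back[2] == 3;
        return ok && storage.delete("app/0/1/block-1.data");
    }

    static ShuffleStorage tempDirStorage() {
        try {
            return new LocalFileShuffleStorage(Files.createTempDirectory("storage-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new InMemoryShuffleStorage()));
        System.out.println(roundTrip(tempDirStorage()));
    }
}
```

The point of the sketch is only that `roundTrip` runs unchanged against either backend; a real design would still need to settle listing, index handling, and error semantics.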
It may require a detailed design doc to illustrate your idea and proposal. In fact, I believe Uniffle (and other RSS systems too) makes a big assumption about filesystem capabilities, such as append support. If we want to support new storages and drop the append requirement, we should reconsider the data layout patterns and all the features they require, such as read operations, data distribution, etc. cc @zuston and @LuciferYang. P.S.: I think it's nice to have object store supported as a new storage type. We just need to think it through thoroughly, make sure it doesn't introduce too much complexity, and maintain the flexibility to extend to more storages.
Thanks for pinging me, @advancedxy. I am not very familiar with object storage, but I think it is better to design it separately, to avoid incompatibility with the current data layout and a negative impact on the performance of the current implementation.
Any update on this? @yuyang733 cc @jerqi @advancedxy If object store is supported, I will use it to store huge partitions to reduce HDFS pressure at iQiyi. This is an important feature for Uniffle.
Describe the feature
Cloud object storage and its corresponding accelerated caches are widely used in storage-compute separation architectures.
However, many of them either do not implement append mode or implement it with write amplification that is hard to manage.
Would it be possible to also support storing each block independently as its own data file?
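As a rough illustration of the block-per-file idea, the sketch below writes each block as its own object and reads a partition back by listing keys under a prefix. It is only a sketch under stated assumptions: the class name `BlockPerObjectLayout`, the key scheme `appId/shuffleId/partitionId/blockId.data`, and the sorted in-memory map standing in for an object store are all hypothetical, and block IDs are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a sorted in-memory map stands in for an object store
// that only supports whole-object PUT and prefix listing (no append).
class BlockPerObjectLayout {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    // Assumed key scheme. Zero-padding the (non-negative) block ID keeps the
    // lexicographic key order equal to the numeric block ID order.
    static String blockKey(String appId, int shuffleId, int partitionId, long blockId) {
        return String.format("%s/%d/%d/%020d.data", appId, shuffleId, partitionId, blockId);
    }

    // One flush of one block becomes one new object; nothing is ever appended.
    void writeBlock(String appId, int shuffleId, int partitionId, long blockId, byte[] data) {
        objects.put(blockKey(appId, shuffleId, partitionId, blockId), data.clone());
    }

    // Reads all blocks of a partition by scanning the keys that share its prefix.
    List<byte[]> readPartition(String appId, int shuffleId, int partitionId) {
        String prefix = String.format("%s/%d/%d/", appId, shuffleId, partitionId);
        List<byte[]> blocks = new ArrayList<>();
        for (Map.Entry<String, byte[]> entry : objects.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // keys are sorted, so the prefix range has ended
            }
            blocks.add(entry.getValue());
        }
        return blocks;
    }

    public static void main(String[] args) {
        BlockPerObjectLayout store = new BlockPerObjectLayout();
        store.writeBlock("app", 0, 1, 10L, new byte[] {10});
        store.writeBlock("app", 0, 1, 2L, new byte[] {2});
        store.writeBlock("app", 0, 10, 1L, new byte[] {99});
        System.out.println(store.readPartition("app", 0, 1).size());
    }
}
```

In this sketch the prefix listing plays the role of locating a partition's blocks; whether a real implementation should rely on listing or keep a separate index is a design question the sketch does not answer.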
Motivation
Cloud object storage and its corresponding accelerated caches are widely used in storage-compute separation architectures.
However, many of them either do not implement append mode or implement it with write amplification that is hard to manage.
Describe the solution
The initially envisaged solution is to implement non-append abstract storage types, such as `AbstractObjectStorageWriteHandler`, `AbstractObjectStorageReadHandler`, and `AbstractObjectDeleteHandler`, and to implement the basic interfaces required by RSS in them. The implementation of the `read`, `write`, and `delete` interfaces that belong to the specific storage layer can be left to subclasses.
Additional context
No response