[FEATURE] Support filesystems that do not implement append mode #391

yuyang733 · 2022-12-07T08:48:10Z

Code of Conduct

I agree to follow this project's Code of Conduct

Search before asking

I have searched in the issues and found no similar issues.

Describe the feature

Cloud object storage, together with its accelerated caches, is widely used in architectures that separate storage from compute.
However, many of these systems do not implement append mode, or their append implementation suffers from significant write amplification.

Could Uniffle also support storing each block independently as its own data file, in addition to the existing append-based layout?
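To put the write-amplification concern in numbers: one naive way to emulate append on a store that lacks it is to re-upload the whole object on every append. With n equal blocks of size b, that writes b·n(n+1)/2 bytes, versus n·b when each block is its own object. The class and method names below are hypothetical, and real stores may offer other mechanisms (for example multipart upload) that change the numbers.

```java
import java.util.List;

class AppendCostDemo {
    public static void main(String[] args) {
        List<Long> blocks = List.of(4L, 4L, 4L, 4L); // illustrative block sizes in MB
        System.out.println("emulated append:  " + emulatedAppendBytes(blocks) + " MB");
        System.out.println("object per block: " + objectPerBlockBytes(blocks) + " MB");
    }

    // Bytes written when every "append" re-uploads the entire object so far.
    static long emulatedAppendBytes(List<Long> blockSizes) {
        long objectSize = 0;
        long written = 0;
        for (long size : blockSizes) {
            objectSize += size;    // the object grows by one block
            written += objectSize; // but the whole object is uploaded again
        }
        return written;
    }

    // Bytes written when each block is stored as its own object.
    static long objectPerBlockBytes(List<Long> blockSizes) {
        long written = 0;
        for (long size : blockSizes) {
            written += size;
        }
        return written;
    }
}
```

For four 4 MB blocks this prints 40 MB for emulated append and 16 MB for one object per block.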

Motivation

Cloud object storage, together with its accelerated caches, is widely used in architectures that separate storage from compute.
However, many of these systems do not implement append mode, or their append implementation suffers from significant write amplification.

Describe the solution

The initial idea is to implement a non-append abstract storage type, made up of classes such as AbstractObjectStorageWriteHandler, AbstractObjectStorageReadHandler, and AbstractObjectDeleteHandler, which implement the basic interfaces required by RSS.

The storage-specific read, write, and delete operations can then be implemented by subclasses for each concrete storage layer.
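A minimal template-method sketch of that shape, assuming the abstract handler owns the RSS-facing logic (here, naming one object per block) and a vendor subclass only supplies the upload call. Only the class name AbstractObjectStorageWriteHandler comes from the proposal; writeBlock, putObject, the key format, and the in-memory subclass are hypothetical. Read and delete handlers would follow the same pattern.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class WriteHandlerDemo {
    public static void main(String[] args) {
        InMemoryWriteHandler handler = new InMemoryWriteHandler("app1/shuffle0/partition3");
        String key = handler.writeBlock(42L, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(key + " -> " + handler.objects.get(key).length + " bytes");
    }
}

// The abstract class owns the RSS-facing logic: one object per block, never appended to.
abstract class AbstractObjectStorageWriteHandler {
    private final String basePath;

    protected AbstractObjectStorageWriteHandler(String basePath) {
        this.basePath = basePath;
    }

    // Hypothetical RSS-facing entry point; returns the key the block was stored under.
    public final String writeBlock(long blockId, byte[] data) {
        String key = basePath + "/" + blockId + ".data";
        putObject(key, data);
        return key;
    }

    // Store-specific upload, one implementation per vendor.
    // In this sketch, failures surface as runtime exceptions.
    protected abstract void putObject(String key, byte[] data);
}

// Hypothetical subclass standing in for a real object store client.
class InMemoryWriteHandler extends AbstractObjectStorageWriteHandler {
    final Map<String, byte[]> objects = new HashMap<>();

    InMemoryWriteHandler(String basePath) {
        super(basePath);
    }

    @Override
    protected void putObject(String key, byte[] data) {
        objects.put(key, data.clone()); // copy so callers cannot mutate stored data
    }
}
```

Running it prints `app1/shuffle0/partition3/42.data -> 5 bytes`.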

Additional context

No response

Are you willing to submit PR?

Yes I am willing to submit a PR!


zuston · 2022-12-07T12:34:56Z

Do you mean that one dataFlushEvent could directly write a new single OSS file? If so, how would the index file be maintained?

yuyang733 · 2022-12-07T12:44:04Z

> Do you mean that one dataFlushEvent could directly write a new single OSS file? If so, how would the index file be maintained?

Aren't several blocks currently appended to one data file? I want to write each block into its own data file, with a corresponding index file.

yuyang733 · 2022-12-07T12:44:56Z

> Do you mean that one dataFlushEvent could directly write a new single OSS file? If so, how would the index file be maintained?

The goal is to provide a way to support storage that does not allow appends.

zuston · 2022-12-07T12:44:59Z

Could you provide a data layout diagram that shows the relationship between data blocks and OSS file names?

We should also consider how to support local_order in an object store.

yuyang733 · 2022-12-07T12:46:42Z

> Could you provide a data layout diagram that shows the relationship between data blocks and OSS file names?

> We should also consider how to support local_order in an object store.

OK, I will provide a design diagram later.

zuston · 2022-12-07T12:47:36Z

> Aren't several blocks currently appended to one data file?

Yes. I know it's better to avoid append mode for performance. But if the data layout changes, do we still need an index file? In the local file and HDFS storage types, the index file records the offset of each block within the single data file.
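To make the difference concrete, a simplified sketch of both layouts. The index here records only blockId, offset, and length; Uniffle's actual index format carries additional fields and is not reproduced. The object key format is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LayoutDemo {
    public static void main(String[] args) {
        long[] blockIds = {101L, 102L, 103L};
        int[] lengths = {10, 20, 5};
        // Append layout: one data file plus an index of (blockId, offset, length).
        appendLayoutIndex(blockIds, lengths).forEach(System.out::println);
        // Per-block layout: one object per block, no offsets needed.
        perBlockObjectKeys("app1/shuffle0/partition3", blockIds).forEach(System.out::println);
    }

    // Simplified append-layout index: each block starts where the previous one ended.
    static List<String> appendLayoutIndex(long[] blockIds, int[] lengths) {
        List<String> index = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < blockIds.length; i++) {
            index.add("block=" + blockIds[i] + " offset=" + offset + " length=" + lengths[i]);
            offset += lengths[i];
        }
        return index;
    }

    // Per-block layout: the key alone identifies a block, so no offsets are stored,
    // but a reader still has to discover which keys exist.
    static List<String> perBlockObjectKeys(String prefix, long[] blockIds) {
        List<String> keys = new ArrayList<>();
        for (long id : blockIds) {
            keys.add(prefix + "/" + id + ".data");
        }
        return keys;
    }
}
```

The last index line reads `block=103 offset=30 length=5`. The per-block layout drops offsets, but it leaves open the discovery problem raised later in the thread.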

yuyang733 · 2022-12-07T12:49:36Z

> > Aren't several blocks currently appended to one data file?

> Yes. I know it's better to avoid append mode for performance. But if the data layout changes, do we still need an index file? In the local file and HDFS storage types, the index file records the offset of each block within the single data file.

Good point, thanks for the reminder. It seems the index file is no longer needed.

zuston · 2022-12-07T12:53:16Z

Another question: will we use the object store API directly, or the Hadoop FileSystem API, to support this?

If the former, it may be better to introduce a dedicated Uniffle filesystem API that wraps the different concrete filesystems, including COS.

zuston · 2022-12-07T12:55:48Z

> It seems the index file is no longer needed.

Hmm, maybe not. Without an index structure, we have no global view for finding the files we need in the object store, especially for local order.

WDYT? @jerqi

jerqi · 2022-12-07T13:13:10Z

Every data file should have an index file.

zuston · 2022-12-07T13:21:48Z

> Every data file should have an index file.

Performance will suffer, because we would have to read all the index files at once to split segments.

jerqi · 2022-12-07T13:23:16Z

> > Every data file should have an index file.

> Performance will suffer, because we would have to read all the index files at once to split segments.

We can read only one index file.
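One possible reading of the segment-splitting concern, as a sketch: if each index entry also names its data file, a reader can load a single (hypothetical) merged index and still group blocks into contiguous read segments per data file. This is not Uniffle's segment splitter; IndexEntry, ReadSegment, and the maxBytes policy are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentSplitDemo {
    public static void main(String[] args) {
        // Entries as they might appear in one merged index covering two data files.
        List<IndexEntry> merged = List.of(
                new IndexEntry("data-0", 1L, 0, 40),
                new IndexEntry("data-0", 2L, 40, 40),
                new IndexEntry("data-0", 3L, 80, 40),
                new IndexEntry("data-1", 4L, 0, 30));
        for (ReadSegment segment : split(merged, 100)) {
            System.out.println(segment);
        }
    }

    // Groups consecutive, contiguous entries of the same data file into read
    // segments of at most maxBytes (an oversized single entry gets its own segment).
    static List<ReadSegment> split(List<IndexEntry> entries, long maxBytes) {
        List<ReadSegment> segments = new ArrayList<>();
        ReadSegment current = null;
        for (IndexEntry e : entries) {
            boolean extendable = current != null
                    && current.file.equals(e.file)
                    && current.offset + current.length == e.offset
                    && current.length + e.length <= maxBytes;
            if (!extendable) {
                current = new ReadSegment(e.file, e.offset);
                segments.add(current);
            }
            current.length += e.length;
            current.blockIds.add(e.blockId);
        }
        return segments;
    }
}

// Hypothetical index entry: which data file holds a block, and at what byte range.
class IndexEntry {
    final String file;
    final long blockId;
    final long offset;
    final long length;

    IndexEntry(String file, long blockId, long offset, long length) {
        this.file = file;
        this.blockId = blockId;
        this.offset = offset;
        this.length = length;
    }
}

// One contiguous byte range to fetch from a single data file.
class ReadSegment {
    final String file;
    final long offset;
    long length;
    final List<Long> blockIds = new ArrayList<>();

    ReadSegment(String file, long offset) {
        this.file = file;
        this.offset = offset;
    }

    @Override
    public String toString() {
        return file + "[" + offset + ", " + (offset + length) + ") blocks=" + blockIds;
    }
}
```

With a 100-byte limit, the four entries become three segments: `data-0[0, 80)`, `data-0[80, 120)`, and `data-1[0, 30)`.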

yuyang733 · 2022-12-07T13:24:57Z

> Another question: will we use the object store API directly, or the Hadoop FileSystem API, to support this?

> If the former, it may be better to introduce a dedicated Uniffle filesystem API that wraps the different concrete filesystems, including COS.

Could Uniffle provide an abstract storage layer, with each vendor implementing the necessary storage interfaces?

The layer would not care whether a concrete class uses a native API (such as S3) or an HCFS (Hadoop-compatible file system).
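A minimal sketch of the kind of vendor-neutral contract being discussed, assuming whole-object writes, ranged reads, prefix listing, and prefix deletion are enough. The ObjectStorage interface and its method names are hypothetical, not an existing Uniffle API. A vendor could back it with a native SDK or with an HCFS client; the in-memory implementation only exercises the contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class StorageDemo {
    public static void main(String[] args) {
        ObjectStorage storage = new InMemoryObjectStorage();
        storage.put("app1/0/3/101.data", new byte[] {1, 2, 3, 4});
        storage.put("app1/0/3/102.data", new byte[] {5, 6});
        System.out.println(storage.list("app1/0/3/"));
        System.out.println(Arrays.toString(storage.read("app1/0/3/101.data", 1, 2)));
        storage.deletePrefix("app1/");
        System.out.println(storage.list("app1/"));
    }
}

// Hypothetical contract: no append, only whole-object writes and ranged reads.
interface ObjectStorage {
    void put(String key, byte[] data);

    byte[] read(String key, long offset, int length);

    List<String> list(String prefix);

    void deletePrefix(String prefix);
}

// In-memory implementation; a sorted map keeps keys sharing a prefix adjacent.
class InMemoryObjectStorage implements ObjectStorage {
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    public void put(String key, byte[] data) {
        objects.put(key, data.clone()); // whole-object write, replaces any previous value
    }

    public byte[] read(String key, long offset, int length) {
        byte[] obj = objects.get(key);
        if (obj == null) {
            throw new IllegalArgumentException("no such object: " + key);
        }
        int from = (int) offset;
        int to = Math.min(obj.length, from + length);
        return Arrays.copyOfRange(obj, from, to);
    }

    public List<String> list(String prefix) {
        List<String> keys = new ArrayList<>();
        for (String k : objects.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) {
                break; // past the last key with this prefix
            }
            keys.add(k);
        }
        return keys;
    }

    public void deletePrefix(String prefix) {
        objects.keySet().removeIf(k -> k.startsWith(prefix));
    }
}
```

Prefix listing is one possible way to recover the "global view" zuston asks about without a separate index, though whether its latency is acceptable would depend on the store.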

advancedxy · 2022-12-07T15:01:53Z

> Could Uniffle provide an abstract storage layer, with each vendor implementing the necessary storage interfaces?

This probably needs a detailed design doc to describe your idea and proposal.

In fact, I believe Uniffle (like other RSS systems) makes strong assumptions about filesystem capabilities, such as append support. If we want to support new storage types and drop the append requirement, we should reconsider the data layout and all the features that depend on it, such as read operations and data distribution. cc @zuston and @LuciferYang.

P.S. I think supporting object stores as a new storage type would be nice. We just need to think it through, make sure it doesn't introduce too much complexity, and keep the flexibility to add more storage types.

LuciferYang · 2022-12-08T04:28:22Z

Thanks for pinging me, @advancedxy. I am not very familiar with object storage, but I think it is better to design it separately, to avoid data layout incompatibilities and any negative impact on the performance of the current implementation.

zuston · 2023-01-16T06:36:25Z

Any update on this? @yuyang733 cc @jerqi @advancedxy

Once object store is supported, I will use it to store huge partitions and reduce HDFS pressure at iQiyi. This is an important feature for Uniffle.

zuston mentioned this issue Feb 2, 2023: [FEATURE] Introduce the general remote fs access layer #317 (Closed)

xianjingfeng mentioned this issue May 22, 2023: [Improvement] Merge data file and index file #892 (Open)

jerqi mentioned this issue Jul 23, 2023: [Umbrella] Object Storage Support (Help Wanted) #1030 (Open)
