[FLINK-12378][docs] Consolidate FileSystem Documentation #8326
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
Bot commands
The @flinkbot bot supports the following commands:
Big +1 for reorganizing these docs.
Looks good in general. Here are some suggestions to polish it off:
- The spelling is sometimes "filesystem" and sometimes "file system". Would be nice to consolidate all occurrences to one of these spellings.
- For the S3 docs, it might be worth pointing out more prominently that the Hadoop S3 FS supports the streaming file sink, but the Presto one does not.
- It may also be worth pointing out that for checkpoints, we typically recommend the presto fs.
docs/ops/deployment/aws.md
{% panel **Note:** You don't have to configure this manually if you are running [Flink on EMR](#emr-elastic-mapreduce). %}

This setup is a bit more complex and we recommend using our shaded Hadoop/Presto file systems
instead (see above) unless required otherwise, e.g. for using S3 as YARN's resource storage dir
Apache Flink provides native [S3 FileSystem's](../filesystems/s3.html) out of the box and we recomend using them unless required otherwise, e.g. for using S3 as YARN's resource storage dir
recomend --> recommend
"native" --> "built-in" ?
I am also wondering if we should simply drop the section below, about Hadoop's S3 file systems.
We can mention that Flink also supports Hadoop's file systems and refer to Hadoop docs for details.
I think that's reasonable. I believe most users are using the built-in S3 filesystems at this point.
docs/ops/filesystems/index.md
under the License.
-->

Apache Flink uses to consume and persistently store data, both for results of applications and for fault tolerance and recovery.
Apache Flink uses file systems ?
docs/ops/filesystems/s3.md
Note that these examples are *not* exhaustive and you can use S3 in other places as well, including your [high availability setup](../jobmanager_high_availability.html) or the [RocksDBStateBackend]({{ site.baseurl }}/ops/state/state_backends.html#the-rocksdbstatebackend); everywhere that Flink expects a FileSystem URI.

For most use cases, you may use one of our shaded `flink-s3-fs-hadoop` and `flink-s3-fs-presto` S3filesystem wrappers which are self-contained and easy to set up.
S3 filesystem
You can use S3 objects like regular files by specifying paths in the following format:

{% highlight plain %}
s3://<your-bucket>/<endpoint>
--> ?
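For context on the path format quoted above, here is a minimal sketch, not part of the PR, of how an `s3://` path splits into a bucket and an object key using only `java.net.URI`. The class name `S3PathExample`, the bucket `my-bucket`, and the key `checkpoints/job-1` are made up for illustration; Flink's own path handling may differ.

```java
import java.net.URI;

// Hypothetical helper, for illustration only; not Flink code.
final class S3PathExample {

    /** Returns the bucket name, which java.net.URI exposes as the authority. */
    static String bucket(String s3Path) {
        return URI.create(s3Path).getAuthority();
    }

    /** Returns the object key: the URI path without its leading slash. */
    static String key(String s3Path) {
        String path = URI.create(s3Path).getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        // Example path with made-up bucket and key names.
        String p = "s3://my-bucket/checkpoints/job-1";
        System.out.println("scheme = " + URI.create(p).getScheme());
        System.out.println("bucket = " + bucket(p));
        System.out.println("key    = " + key(p));
    }
}
```

Under this reading, the part after the bucket names an object (or a prefix of objects) inside that bucket.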
Thanks for the review. I consolidated on "file systems", removed the Hadoop references from the AWS and OSS pages, and replaced them with an extra section about configuring Hadoop in filesystems/index.md.
What is the purpose of the change
Currently, Flink's filesystem documentation is spread across a number of pages without any clear connection. A non-exhaustive list of issues includes:
We should create a filesystem subsection under deployments with multiple pages containing all relevant information about Flink's filesystem abstraction.
This PR also resolves FLINK-8513 and FLINK-10249, which were minor additions to the S3 documentation.
Verifying this change
N/A
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes / no)
This does not touch S3 file system code but does touch the documentation.
Documentation