[FLINK-12378][docs] Consolidate FileSystem Documentation #8326

sjwiesman · 2019-04-30T22:02:56Z

What is the purpose of the change

Currently, Flink's filesystem documentation is spread across a number of pages without any clear connection. A non-exhaustive list of issues includes:

S3 documentation spread across many pages
OSS filesystem is listed under deployments when it is an object store
deployments/filesystem.md has a lot of unrelated information

We should create a filesystem subsection under deployments with multiple pages containing all relevant information about Flink's filesystem abstraction.

This PR also resolves FLINK-8513 and FLINK-10249, which were minor additions to the S3 documentation.

Verifying this change

N/A

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)
This does not touch S3 file system code but does touch the documentation.

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

flinkbot · 2019-04-30T22:03:45Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into Flink.
❗ 3. Needs [attention] from.
- Needs attention by @fhueske [PMC]
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

sjwiesman · 2019-04-30T22:04:05Z

@flinkbot attention @fhueske

StephanEwen

Big +1 for reorganizing these docs.

Looks good in general. Here are some suggestions to polish it off:

The spelling is sometimes "filesystem" and sometimes "file system". It would be nice to consolidate all occurrences to one of these spellings.
For the S3 docs, it might be worth pointing out more prominently that the Hadoop S3 FS supports the streaming file sink, but the Presto one does not.
It may also be worth pointing out that for checkpoints, we typically recommend the Presto FS.

StephanEwen · 2019-05-03T07:45:06Z

docs/ops/deployment/aws.md


 {% panel **Note:** You don't have to configure this manually if you are running [Flink on EMR](#emr-elastic-mapreduce). %}

-This setup is a bit more complex and we recommend using our shaded Hadoop/Presto file systems
-instead (see above) unless required otherwise, e.g. for using S3 as YARN's resource storage dir
+Apache Flink provides native [S3 FileSystem's](../filesystems/s3.html) out of the box and we recomend using them unless required otherwise, e.g. for using S3 as YARN's resource storage dir


recomend --> recommend

"native" --> "built-in" ?

I am also wondering if we should simply drop the section below, about Hadoop's S3 file systems.
We can mention that Flink also supports Hadoop's file systems and refer to Hadoop docs for details.

I think that's reasonable. I believe most users are using the built-in S3 filesystems at this point.

StephanEwen · 2019-05-03T07:48:09Z

docs/ops/filesystems/index.md

+under the License.
+-->
+
+Apache Flink uses to consume and persistently store data, both for results of applications and for fault tolerance and recovery.


Apache Flink uses file systems?

StephanEwen · 2019-05-03T07:51:34Z

docs/ops/filesystems/s3.md

+
+Note that these examples are *not* exhaustive and you can use S3 in other places as well, including your [high availability setup](../jobmanager_high_availability.html) or the [RocksDBStateBackend]({{ site.baseurl }}/ops/state/state_backends.html#the-rocksdbstatebackend); everywhere that Flink expects a FileSystem URI.
+
+For most use cases, you may use one of our shaded `flink-s3-fs-hadoop` and `flink-s3-fs-presto` S3filesystem wrappers which are self-contained and easy to set up.


S3 filesystem

StephanEwen · 2019-05-03T07:51:50Z

docs/ops/filesystems/s3.md

+You can use S3 objects like regular files by specifying paths in the following format:
+
+{% highlight plain %}
+s3://<your-bucket>/<endpoint>
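
For illustration only, here is a minimal sketch of how a path in the `s3://` format quoted above splits into a bucket and an object key. It uses Python's standard `urllib.parse` rather than any Flink API. The function name `split_s3_uri` and the example bucket and key are hypothetical and do not come from the PR.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket, key).

    Hypothetical helper for illustration; not part of Flink.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got scheme {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("URI has no bucket component")
    # The bucket is the authority part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


# Example with made-up names:
bucket, key = split_s3_uri("s3://my-bucket/checkpoints/job-1")
```

The same URI string is what would be passed anywhere Flink expects a file system URI, such as a checkpoint directory.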


sjwiesman · 2019-05-06T13:25:05Z

Thanks for the review. I consolidated on "file systems", removed the Hadoop references from the AWS and OSS pages, and replaced them with an extra section about configuring Hadoop in filesystems/index.md.

sjwiesman added 3 commits April 30, 2019 16:58

[FLINK-12378][docs] Consolidate FileSystem Documentation

6ffdbe4

[FLINK-8513][docs] Add documentation for connecting to non-AWS S3 endpoints

8d58020

[FLINK-10249][docs] Document hadoop/presto s3 file system configuration forwarding

c482a4b

sjwiesman changed the title ~~Flink 12378~~ [Flink-12378][docs] Consolidate FileSystem Documentation Apr 30, 2019

sjwiesman changed the title ~~[Flink-12378][docs] Consolidate FileSystem Documentation~~ [FLINK-12378][docs] Consolidate FileSystem Documentation Apr 30, 2019

rmetzger added the review=description? label Apr 30, 2019

rmetzger requested a review from fhueske April 30, 2019 22:05

rmetzger added component=Documentation component=FileSystems labels Apr 30, 2019

[FLINK-12378][docs] Consolidate FileSystem Documentation

d065e74

StephanEwen reviewed May 3, 2019

fixup! [FLINK-12378][docs] Consolidate FileSystem Documentation

837f30f

asfgit closed this in 4c0bbc4 May 10, 2019
