Updates list of built-in IO transforms #394
retest this please
melap left a comment
Thanks for these updates! A few minor comments below.
src/documentation/io/built-in.md
Outdated
<td>Java</td>
<td>
<p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system">Apache Hadoop File System</a></p>
<p>Beam Java supports HDFS, S3, GCS and local filesystems out of the
By "out of the box", is that saying "you don't need to specify a dependency"? Could we simplify it to:
Apache HDFS, Amazon S3, Google Cloud Storage, and local filesystems (no dependency needed)
I think this whole table is overdue for an overhaul -- perhaps as one IO per line that checks off what languages support it and lists the required dependency if applicable. I'll put that on my list.
Done, but without "no dependency needed": I'm not aware of anyone developing filesystem support for Beam that requires a dependency.
src/documentation/io/built-in.md
Outdated
<p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system">Apache Hadoop File System</a></p>
<p>Beam Java supports HDFS, S3, GCS and local filesystems out of the
box.</p>
<p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java">FileIO</a> (general-purpose reading, writing and matching of files)</p>
src/documentation/io/built-in.md
Outdated
<tr>
<td>Python</td>
<td>
<p>Beam Python supports GCS and local filesystems out of the box. Support for HDFS is in progress (<a href="https://issues.apache.org/jira/browse/BEAM-3099">BEAM-3099</a>).</p>
Similarly, though not sure about the equivalent dependency statement for Python?
Google Cloud Storage and local filesystems
src/documentation/io/built-in.md
Outdated
<tr>
<td>Python</td>
<td>
<p>Beam Python supports GCS and local filesystems out of the box. Support for HDFS is in progress (<a href="https://issues.apache.org/jira/browse/BEAM-3099">BEAM-3099</a>).</p>
Apache HDFS
Should this second part be added to the table below instead?
src/documentation/io/built-in.md
Outdated
<tr>
<td>Python</td>
<td>
<p>Beam Python supports GCS and local filesystems out of the box. Support for HDFS is in progress (<a href="https://issues.apache.org/jira/browse/BEAM-3099">BEAM-3099</a>).</p>
Should we add Variant Call Format (VCF) (I think added in 2.3.0?) for Python?
Thanks, PTAL.
@asfgit merge
Some new IO transforms have been added, some in-progress ones have been finished, and some (JSON) aren't being worked on.