
Updates list of built-in IO transforms #394

Closed
jkff wants to merge 1 commit into apache:asf-site from jkff:builtin-io

jkff commented Feb 26, 2018

Some new IOs have been added, some in-progress ones have been finished, and some are not being worked on (JSON).

jkff requested a review from melap February 26, 2018 20:33

melap commented Feb 26, 2018

retest this please

melap self-assigned this Feb 26, 2018

melap commented Feb 26, 2018

melap left a comment

Thanks for these updates! A few minor comments below.

<td>Java</td>
<td>
<p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system">Apache Hadoop File System</a></p>
<p>Beam Java supports HDFS, S3, GCS and local filesystems out of the

By "out of the box", is that saying "you don't need to specify a dependency"? Could we simplify it to:
Apache HDFS, Amazon S3, Google Cloud Storage, and local filesystems (no dependency needed)

I think this whole table is overdue for an overhaul -- perhaps as one IO per line that checks off what languages support it and lists the required dependency if applicable. I'll put that on my list.

Author

Done (but without "no dependency needed": I'm not aware of anyone developing filesystem support for Beam that requires a dependency)

<p><a href="https://github.com/apache/beam/tree/master/sdks/java/io/hadoop-file-system">Apache Hadoop File System</a></p>
<p>Beam Java supports HDFS, S3, GCS and local filesystems out of the
box.</p>
<p><a href="https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java">FileIO</a> (general-purpose reading, writing and matching of files)</p>

add comma: writing,

Author

Done

<tr>
<td>Python</td>
<td>
<p>Beam Python supports GCS and local filesystems out of the box. Support for HDFS is in progress (<a href="https://issues.apache.org/jira/browse/BEAM-3099">BEAM-3099</a>).</p>

Similarly here, though I'm not sure about the equivalent dependency statement for Python:
Google Cloud Storage and local filesystems

Author

Same.

<tr>
<td>Python</td>
<td>
<p>Beam Python supports GCS and local filesystems out of the box. Support for HDFS is in progress (<a href="https://issues.apache.org/jira/browse/BEAM-3099">BEAM-3099</a>).</p>

Apache HDFS
Should this second part be added to the table below instead?

Author

Good point, moved.

<tr>
<td>Python</td>
<td>
<p>Beam Python supports GCS and local filesystems out of the box. Support for HDFS is in progress (<a href="https://issues.apache.org/jira/browse/BEAM-3099">BEAM-3099</a>).</p>

Should we add Variant Call Format (VCF) (I think added in 2.3.0?) for Python?

Author

Done.

melap assigned jkff and unassigned melap Feb 26, 2018

jkff commented Feb 27, 2018

Thanks, PTAL.

jkff assigned melap and unassigned jkff Feb 27, 2018

melap left a comment

LGTM, thanks!

melap commented Feb 27, 2018

@asfgit merge

asfgit closed this in d77143a Feb 27, 2018
jkff deleted the builtin-io branch February 27, 2018 19:31
robertwb pushed a commit to robertwb/incubator-beam that referenced this pull request Jun 5, 2018
charlesccychen pushed a commit to cosmoskitten/beam that referenced this pull request Jun 14, 2018