[SPARK-29028][DOCS] Add links to IBM Cloud Object Storage connector in cloud-integration.md #25737

Closed
wants to merge 2 commits into from

@dilipbiswal dilipbiswal commented Sep 9, 2019

What changes were proposed in this pull request?

Add links to IBM Cloud Storage connector in cloud-integration.md

Why are the changes needed?

This page lists the connectors to cloud providers. Currently, the connector for
IBM Cloud Object Storage is not listed. This PR adds the necessary links for
completeness.

Does this PR introduce any user-facing change?

Yes.

Before: [screenshot: Screen Shot 2019-09-09 at 3 52 44 PM]

After: [screenshot: Screen Shot 2019-09-10 at 8 16 49 AM]

How was this patch tested?

Tested using jekyll build --serve

@dilipbiswal
Contributor Author

cc @srowen Please let me know whether this can be added.

@@ -257,4 +257,5 @@ Here is the documentation on the standard connectors both from Apache and the cl
* [Amazon EMR File System (EMRFS)](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html). From Amazon
* [Google Cloud Storage Connector for Spark and Hadoop](https://cloud.google.com/hadoop/google-cloud-storage-connector). From Google
* [The Azure Blob Filesystem driver (ABFS)](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-abfs-driver)
* IBM Cloud Object Storage connector for Apache Spark : [Stocator,](https://github.com/CODAIT/stocator) [IBM Object Storage,](https://www.ibm.com/cloud/object-storage) [how-to-use-connector](https://developer.ibm.com/code/2018/08/16/installing-running-stocator-apache-spark-ibm-cloud-object-storage)
Member

If this is more or less the equivalent of "S3 connector docs" for AWS, but for the IBM cloud, it could be OK. However, I think this doc is a little more about what Spark directly supports, particularly through hadoop-cloud. (In any event I think you need to fix the anchors? they have commas in them.) Would this be more appropriate at https://github.com/apache/spark-website/blob/asf-site/third-party-projects.md? It seems to refer to a third-party integration, not first-party cloud docs.

Contributor Author

@srowen Thank you very much for your quick response. Here, I was trying to follow the way the Google Cloud Storage Connector is listed in this section.

> In any event I think you need to fix the anchors? they have commas in them.

Sean, I had three links here: 1) to the connector, 2) to IBM cloud storage, and 3) to a devworks article that ties them together, so I separated them with commas. Should I just remove the commas and use a space as the separator?

Member

Yes, but I am wondering first whether this is the right place. hadoop-cloud and thus Spark doesn't have special support for this connector, and that's what this doc is about.

I'm also just noting that it seemed odd to put the comma within the hyperlinked text.

Contributor Author

@srowen Thanks. I am not 100% sure whether this is the right place :-). Could you please help me understand why the Google Cloud Storage Connector for Spark and Hadoop is listed here? When I follow that entry to the connector link, I end up at `https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs`, which is the connector for Google Cloud Storage, and I thought it was similar to the Stocator link.

Member

Actually I think this place is fine, after re-reading the doc. It is a more general reference. I would just fix the links a bit. [...](...), not [...,](...)
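
For illustration only, the `[...](...)`, not `[...,](...)` point above could be checked mechanically. The sketch below is a hypothetical helper and not part of Spark's docs tooling: the names `LINK_RE`, `links_with_trailing_punctuation`, and `move_punctuation_outside` are assumptions made for this example, and the regex only handles simple inline links without nested brackets.

```python
import re

# Simple inline markdown link: [text](url). Assumes no nested brackets in the text
# and no spaces or parentheses in the URL. Hypothetical helper for illustration.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
TRAILING_PUNCT = ",.;:"


def links_with_trailing_punctuation(markdown: str) -> list[tuple[str, str]]:
    """Return (text, url) pairs whose link text ends in punctuation."""
    return [
        (text, url)
        for text, url in LINK_RE.findall(markdown)
        if text.rstrip().endswith(tuple(TRAILING_PUNCT))
    ]


def move_punctuation_outside(markdown: str) -> str:
    """Rewrite [text,](url) as [text](url), keeping the punctuation after the link."""
    def fix(match: re.Match) -> str:
        text, url = match.group(1), match.group(2)
        stripped = text.rstrip()
        if stripped and stripped[-1] in TRAILING_PUNCT:
            return f"[{stripped[:-1]}]({url}){stripped[-1]}"
        return match.group(0)  # leave well-formed links untouched

    return LINK_RE.sub(fix, markdown)


# Two of the links from the diff line under review.
line = (
    "[Stocator,](https://github.com/CODAIT/stocator) "
    "[IBM Object Storage,](https://www.ibm.com/cloud/object-storage)"
)
flagged = links_with_trailing_punctuation(line)
fixed = move_punctuation_outside(line)
```

Here `flagged` holds both links from the diff line, and `fixed` carries the same links with each comma moved outside the hyperlinked text.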

Contributor Author

@srowen Thank you. I have updated the links per your advice. I have also updated the screenshot.

@SparkQA

SparkQA commented Sep 9, 2019

Test build #110375 has finished for PR 25737 at commit 8f9da49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -257,4 +257,5 @@ Here is the documentation on the standard connectors both from Apache and the cl
* [Amazon EMR File System (EMRFS)](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-fs.html). From Amazon
* [Google Cloud Storage Connector for Spark and Hadoop](https://cloud.google.com/hadoop/google-cloud-storage-connector). From Google
* [The Azure Blob Filesystem driver (ABFS)](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-abfs-driver)
* IBM Cloud Object Storage connector for Apache Spark : [Stocator,](https://github.com/CODAIT/stocator) [IBM Object Storage,](https://www.ibm.com/cloud/object-storage) [how-to-use-connector](https://developer.ibm.com/code/2018/08/16/installing-running-stocator-apache-spark-ibm-cloud-object-storage)
Member

nit: Would it be better to add `. From IBM`, like the Amazon and Google entries?

@SparkQA

SparkQA commented Sep 10, 2019

Test build #110420 has finished for PR 25737 at commit dc6dd1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen srowen closed this in 7309e02 Sep 10, 2019
@srowen
Member

srowen commented Sep 10, 2019

Merged to master

@dilipbiswal
Contributor Author

Thank you very much @srowen

PavithraRamachandran pushed a commit to PavithraRamachandran/spark that referenced this pull request Sep 15, 2019
…n cloud-integration.md

### What changes were proposed in this pull request?
Add links to IBM Cloud Storage connector in cloud-integration.md

### Why are the changes needed?
This page lists the connectors to cloud providers. Currently, the connector for
IBM Cloud Object Storage is not listed. This PR adds the necessary links for
completeness.

### Does this PR introduce any user-facing change?
Yes.

**Before:**
<img width="1234" alt="Screen Shot 2019-09-09 at 3 52 44 PM" src="https://user-images.githubusercontent.com/14225158/64571863-11a2c080-d31a-11e9-82e3-78c02675adb9.png">

**After.**

<img width="1234" alt="Screen Shot 2019-09-10 at 8 16 49 AM" src="https://user-images.githubusercontent.com/14225158/64626857-663e4e00-d3a3-11e9-8fa3-15ebf52ea832.png">

### How was this patch tested?
Tested using jekyll build --serve

Closes apache#25737 from dilipbiswal/ibm-cloud-storage.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>