[SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page #32204

Closed
wants to merge 9 commits into from

itholic (Contributor) commented Apr 16, 2021

What changes were proposed in this pull request?

This PR proposes moving the JSON data source options from the Python, Scala, and Java API documentation into a single page.

Why are the changes needed?

So far, the documentation for the JSON data source options has been split across separate pages, one per language API. This makes the many options inconvenient to maintain, so it is more efficient to manage all of them on a single page and link to that page from the API documentation of each language.

Does this PR introduce any user-facing change?

Yes, the documentation will look as shown below after this change:

  • "JSON Files" page

Screen Shot 2021-05-20 at 8 48 27 PM

  • Python

Screen Shot 2021-04-16 at 5 04 11 PM

  • Scala

Screen Shot 2021-04-16 at 5 04 54 PM

  • Java

Screen Shot 2021-04-16 at 5 06 11 PM

How was this patch tested?

Manually built the docs and confirmed the pages.

@github-actions

Test build #754872701 for PR 32204 at commit 89d9be1.

SparkQA commented Apr 16, 2021

Test build #137474 has finished for PR 32204 at commit 89d9be1.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 16, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42050/

HyukjinKwon (Member) left a comment

Let's address the same comments in #32161. cc @MaxGekk too

SparkQA commented Apr 19, 2021

Test build #137548 has finished for PR 32204 at commit 0a5412c.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42123/

SparkQA commented Apr 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42123/

SparkQA commented Apr 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42126/

SparkQA commented Apr 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42126/

SparkQA commented Apr 19, 2021

Test build #137551 has finished for PR 32204 at commit c31c6f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 19, 2021

Test build #137573 has finished for PR 32204 at commit 89d9be1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 19, 2021

Test build #137597 has finished for PR 32204 at commit c31c6f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon changed the title from "[SPARK-34494] Move JSON data source options from Python and Scala into a single page" to "[SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page" on Apr 20, 2021

SparkQA commented Apr 29, 2021

Test build #138063 has finished for PR 32204 at commit 538b4cc.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Apr 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42582/

SparkQA commented Apr 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42582/

SparkQA commented May 11, 2021

Test build #138344 has finished for PR 32204 at commit cbecf61.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42866/

SparkQA commented May 11, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42866/

HyukjinKwon (Member) commented May 13, 2021

@itholic:

  1. Please check the options one by one and confirm that each one exists and matches.
  2. Document the general options in https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html if any are missing.
  3. If you're going to do step 2 separately in another PR and JIRA, don't remove the general options from the API documentation for now.
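
A one-by-one check like step 1 can be partly mechanized with a set comparison. The following is only a rough sketch: the helper name `compare_options` and both sample sets are hypothetical, and the option names would in practice be collected by hand or scraped from the API docstrings and the new "JSON Files" page.

```python
from typing import Dict, List, Set


def compare_options(api_doc: Set[str], single_page: Set[str]) -> Dict[str, List[str]]:
    """Report option names that appear on only one side (hypothetical helper)."""
    return {
        # Documented in the language API docs but absent from the single page.
        "missing_from_page": sorted(api_doc - single_page),
        # Listed on the single page but not found in the language API docs.
        "only_on_page": sorted(single_page - api_doc),
    }


# Hypothetical sample data, not the actual option lists from this PR.
python_api = {"primitivesAsString", "multiLine", "encoding", "lineSep"}
json_page = {"primitivesAsString", "multiLine", "encoding", "timeZone"}

report = compare_options(python_api, json_page)
print(report)
```

Names that show up in either list still need a manual look, since the description and default value of each option have to match as well.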

SparkQA commented May 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43275/

SparkQA commented May 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43275/

SparkQA commented May 20, 2021

Test build #138741 has finished for PR 32204 at commit f4d9843.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 20, 2021

Test build #138753 has finished for PR 32204 at commit e3bf606.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented May 21, 2021

Test build #138778 has started for PR 32204 at commit 2379a6d.

SparkQA commented May 21, 2021

Test build #138784 has started for PR 32204 at commit a10586c.

SparkQA commented May 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43302/

SparkQA commented May 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43302/

@HyukjinKwon
Member

Merged to master.

<tr>
<td><code>encoding</code></td>
<td>None</td>
<td>For reading, allows to forcibly set one of standard basic or extended encoding for the JSON files. For example UTF-16BE, UTF-32LE. If None is set, the encoding of input JSON will be detected automatically when the multiLine option is set to <code>true</code>. For writing, Specifies encoding (charset) of saved json files. If None is set, the default UTF-8 charset will be used.</td>

Member

Also fix the docs properly by changing None to something else; None only applies to the Python side.
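
For context, the row under review distinguishes a forced encoding from automatic detection. The snippet below is a stdlib-only Python analogy of that distinction, not Spark code and not Spark's detection logic: it relies on `json.loads` accepting bytes in UTF-8, UTF-16 or UTF-32, and the sample record is made up.

```python
import json

# Made-up sample record with a non-ASCII character.
record = {"name": "Spark", "note": "café"}

# Encode the document as UTF-16BE, one of the encodings the row mentions.
raw = json.dumps(record, ensure_ascii=False).encode("utf-16-be")

# Forced encoding: the caller states the charset explicitly.
forced = json.loads(raw.decode("utf-16-be"))

# Automatic detection: json.loads accepts bytes and works out the encoding.
detected = json.loads(raw)

print(forced == detected == record)
```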

* `DataFrameReader`
* `DataFrameWriter`
* `DataStreamReader`
* `DataStreamWriter`

Member

Can you add the JSON functions here too?

* `DataFrameReader`
* `DataFrameWriter`
* `DataStreamReader`
* `DataStreamWriter`

Member

Also mention:

* `OPTIONS` clause at [CREATE TABLE USING DATA_SOURCE](sql-ref-syntax-ddl-create-table-datasource.html)
