[SPARK-12146] [SparkR] SparkR jsonFile should support multiple input files #10145

Closed
wants to merge 5 commits

Conversation

yanboliang
Contributor

  • jsonFile should support multiple input files, such as (see the usage sketch after this list):
jsonFile(sqlContext, c("path1", "path2")) # character vector as argument
jsonFile(sqlContext, "path1,path2")
  • Meanwhile, jsonFile has been deprecated by Spark SQL and will be removed in Spark 2.0, so we mark jsonFile as deprecated and use read.json on the SparkR side.
  • Replace all jsonFile calls with read.json in test_sparkSQL.R, but still keep a jsonFile test case.
  • If this PR is accepted, we should also make almost the same change for parquetFile.
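
A minimal usage sketch of the vector form (hypothetical file paths; assumes an initialized sqlContext):

```R
# Hypothetical local JSON files; a character vector of paths yields one DataFrame.
people <- read.json(sqlContext, c("/tmp/people1.json", "/tmp/people2.json"))

# The deprecated jsonFile still works but now emits a .Deprecated warning.
people2 <- suppressWarnings(jsonFile(sqlContext, c("/tmp/people1.json", "/tmp/people2.json")))
```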

cc @felixcheung @sun-rui @shivaram

@SparkQA

SparkQA commented Dec 4, 2015

Test build #47193 has finished for PR 10145 at commit 012e7ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -206,7 +206,7 @@ setMethod("toDF", signature(x = "RDD"),
#' It goes through the entire dataset once to determine the schema.
#'
#' @param sqlContext SQLContext to use
#' @param path Path of file to read. A vector of multiple paths is allowed.
Contributor


Could you move this change to the planned new JIRA issue about parquetFile? Let's focus this PR on jsonFile

@SparkQA

SparkQA commented Dec 7, 2015

Test build #47264 has finished for PR 10145 at commit 4decf22.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Dec 7, 2015

@yanboliang We moved the test file locations in #10030 -- so you'll need to rebase onto the master branch

# Convert a string vector of paths to a string containing comma separated paths
path <- paste(path, collapse = ",")
sdf <- callJMethod(sqlContext, "jsonFile", path)
paths <- as.list(suppressWarnings(normalizePath(splitString(path))))
Member


I thought @sun-rui noted we should take a list or vector? In such case we should change this code to

paths <- as.list(suppressWarnings(normalizePath(path)))

Contributor


yes
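
For reference, a sketch of what the read.json body looks like with this suggestion applied (assuming SparkR's internal callJMethod and dataFrame helpers; not a verbatim copy of the merged code):

```R
read.json <- function(sqlContext, path) {
  # Accept a character vector of paths; normalizePath warns on non-local
  # paths such as HDFS URIs, hence the suppressWarnings.
  paths <- as.list(suppressWarnings(normalizePath(path)))
  read <- callJMethod(sqlContext, "read")
  sdf <- callJMethod(read, "json", paths)
  dataFrame(sdf)
}
```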

@yanboliang
Contributor Author

I found that tests calling functions marked with .Deprecated now fail with errors after rebasing onto master. Do we still keep the test cases for deprecated functions, or do we wrap them in suppressWarnings? @sun-rui @felixcheung @shivaram

2. Error: read.json()/jsonFile() on a local file returns a DataFrame -----------
(converted from warning) 'jsonFile' is deprecated.
Use 'read.json' instead.
See help("Deprecated")
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: jsonFile(sqlContext, c(jsonPath, jsonPath2)) at test_sparkSQL.R:384
5: .Deprecated("read.json")
6: warning(paste(msg, collapse = ""), call. = FALSE, domain = NA)
7: .signalSimpleWarning("'jsonFile' is deprecated.\nUse 'read.json' instead.\nSee help(\"Deprecated\")", 
       quote(NULL))
8: withRestarts({
       .Internal(.signalCondition(simpleWarning(msg, call), msg, call))
       .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
9: withOneRestart(expr, restarts[[1L]])
10: doWithOneRestart(return(expr), restart)

@sun-rui
Contributor

sun-rui commented Dec 8, 2015

I vote for adding suppressWarnings, and adding a comment about this in the test cases.

@felixcheung
Member

Hmm, I guess deprecation is a warning which is now getting turned into an error.
I think it's fine for the test of the deprecated function to use suppressWarnings.
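
A sketch of how the deprecated-function test case could look with the suggested suppressWarnings, using the jsonPath and jsonPath2 fixtures from the traceback above (expectations abbreviated):

```R
test_that("read.json()/jsonFile() on a local file returns a DataFrame", {
  df <- read.json(sqlContext, c(jsonPath, jsonPath2))
  expect_is(df, "DataFrame")

  # jsonFile is deprecated; suppress the .Deprecated warning so it is not
  # escalated to a test error.
  df2 <- suppressWarnings(jsonFile(sqlContext, c(jsonPath, jsonPath2)))
  expect_is(df2, "DataFrame")
})
```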

@SparkQA

SparkQA commented Dec 8, 2015

Test build #47308 has finished for PR 10145 at commit 47c7ee1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

looks good, thanks for making these changes

@SparkQA

SparkQA commented Dec 8, 2015

Test build #47313 has finished for PR 10145 at commit 06ae53d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sun-rui
Contributor

sun-rui commented Dec 8, 2015

LGTM

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47516 has finished for PR 10145 at commit 06ae53d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Dec 10, 2015
…etFile

SparkR supports ```read.parquet``` and deprecates ```parquetFile```. This change is similar to #10145 for ```jsonFile```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10191 from yanboliang/spark-12198.
asfgit pushed a commit that referenced this pull request Dec 10, 2015
…etFile

SparkR supports ```read.parquet``` and deprecates ```parquetFile```. This change is similar to #10145 for ```jsonFile```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10191 from yanboliang/spark-12198.

(cherry picked from commit eeb5872)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@shivaram
Contributor

@yanboliang Could you bring this PR up to date with master ?

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47563 has finished for PR 10145 at commit 1d74b18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

LGTM. Merging this to master and branch-1.6

asfgit pushed a commit that referenced this pull request Dec 11, 2015
…iles

* ```jsonFile``` should support multiple input files, such as:
```R
jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
jsonFile(sqlContext, "path1,path2")
```
* Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0, so we mark ```jsonFile``` as deprecated and use ```read.json``` on the SparkR side.
* Replace all ```jsonFile``` calls with ```read.json``` in test_sparkSQL.R, but still keep a jsonFile test case.
* If this PR is accepted, we should also make almost the same change for ```parquetFile```.

cc felixcheung sun-rui shivaram

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10145 from yanboliang/spark-12146.

(cherry picked from commit 0fb9825)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
asfgit closed this in 0fb9825 on Dec 11, 2015
yanboliang deleted the spark-12146 branch on May 5, 2016