[SPARK-12146] [SparkR] SparkR jsonFile should support multiple input files #10145
Conversation
Test build #47193 has finished for PR 10145 at commit
```diff
@@ -206,7 +206,7 @@ setMethod("toDF", signature(x = "RDD"),
 #' It goes through the entire dataset once to determine the schema.
 #'
 #' @param sqlContext SQLContext to use
 #' @param path Path of file to read. A vector of multiple paths is allowed.
```
Could you move this change to the planned new JIRA issue about parquetFile? Let's focus this PR on jsonFile
Test build #47264 has finished for PR 10145 at commit
@yanboliang We moved the test file locations in #10030, so you'll need to rebase onto the master branch.
```diff
-  # Convert a string vector of paths to a string containing comma separated paths
-  path <- paste(path, collapse = ",")
-  sdf <- callJMethod(sqlContext, "jsonFile", path)
+  paths <- as.list(suppressWarnings(normalizePath(splitString(path))))
```
I thought @sun-rui noted we should take a list or vector? In that case we should change this code to:

```R
paths <- as.list(suppressWarnings(normalizePath(path)))
```
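For illustration, a standalone R sketch (the file names are hypothetical and Spark is not involved) of what the suggested line does: `normalizePath()` is already vectorized over a character vector, and `suppressWarnings()` muffles the warnings it emits for paths it cannot resolve locally (such as HDFS URIs):

```r
# Hypothetical input paths; they do not need to exist on the local filesystem.
paths <- c("path1.json", "path2.json")

# normalizePath() warns for unresolvable paths; suppressWarnings() muffles that.
# The result is converted to a list, one element per input path, since the
# JVM side expects a list of strings.
normalized <- as.list(suppressWarnings(normalizePath(paths)))

length(normalized)  # one element per input path
```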
yes
I found that it raises errors if we use the deprecated functions:

```
2. Error: read.json()/jsonFile() on a local file returns a DataFrame -----------
(converted from warning) 'jsonFile' is deprecated.
Use 'read.json' instead.
See help("Deprecated")
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: jsonFile(sqlContext, c(jsonPath, jsonPath2)) at test_sparkSQL.R:384
5: .Deprecated("read.json")
6: warning(paste(msg, collapse = ""), call. = FALSE, domain = NA)
7: .signalSimpleWarning("'jsonFile' is deprecated.\nUse 'read.json' instead.\nSee help(\"Deprecated\")",
   quote(NULL))
8: withRestarts({
       .Internal(.signalCondition(simpleWarning(msg, call), msg, call))
       .Internal(.dfltWarn(msg, call))
   }, muffleWarning = function() NULL)
9: withOneRestart(expr, restarts[[1L]])
10: doWithOneRestart(return(expr), restart)
```
I vote for adding suppressWarnings, and adding a comment about this in the test cases.
hmm, I guess deprecation is a warning which is now getting turned into an error.
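A minimal standalone sketch (with a made-up function, not the real SparkR code) of why `suppressWarnings()` fixes this: `.Deprecated()` signals a warning, and the test harness escalates warnings to errors, so the deprecated alias has to be called with the warning muffled:

```r
# Made-up stand-in for the deprecated SparkR alias.
oldJsonFile <- function(path) {
  .Deprecated("read.json")  # signals a deprecation warning
  path
}

# Without suppressWarnings(), a warnings-as-errors test harness would fail here.
result <- suppressWarnings(oldJsonFile("people.json"))
```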
Force-pushed from 4decf22 to 47c7ee1.
Test build #47308 has finished for PR 10145 at commit
looks good, thanks for making these changes
Test build #47313 has finished for PR 10145 at commit
LGTM
Test build #47516 has finished for PR 10145 at commit
…etFile

SparkR support ```read.parquet``` and deprecate ```parquetFile```. This change is similar to #10145 for ```jsonFile```.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10191 from yanboliang/spark-12198.

(cherry picked from commit eeb5872)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
@yanboliang Could you bring this PR up to date with master?
Force-pushed from 06ae53d to 1d74b18.
Test build #47563 has finished for PR 10145 at commit
LGTM. Merging this to master and branch-1.6
…iles

* ```jsonFile``` should support multiple input files, such as:

```R
jsonFile(sqlContext, c("path1", "path2"))  # character vector as arguments
jsonFile(sqlContext, "path1,path2")
```

* Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed at Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` at SparkR side.
* Replace all ```jsonFile``` with ```read.json``` at test_sparkSQL.R, but still keep jsonFile test case.
* If this PR is accepted, we should also make almost the same change for ```parquetFile```.

cc felixcheung sun-rui shivaram

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #10145 from yanboliang/spark-12146.

(cherry picked from commit 0fb9825)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
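The two call styles from the description can be sketched as follows (sqlContext and the JSON files are assumed to exist, so the Spark calls are shown only as comments; just the plain-R part is runnable):

```r
# A character vector of input paths (hypothetical file names).
paths <- c("path1.json", "path2.json")

# Both forms are accepted per the PR description:
# df1 <- read.json(sqlContext, paths)                    # character vector
# df2 <- read.json(sqlContext, "path1.json,path2.json")  # comma-separated string

# jsonFile() remains as a deprecated alias until Spark 2.0, so its warning
# would need to be muffled in tests:
# df3 <- suppressWarnings(jsonFile(sqlContext, paths))
```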