[SPARK-13327][SPARKR] Added parameter validations for colnames<- #11220
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -303,8 +303,28 @@ setMethod("colnames", | |
#' @rdname columns | ||
#' @name colnames<- | ||
setMethod("colnames<-", | ||
signature(x = "DataFrame", value = "character"), | ||
signature(x = "DataFrame"), | ||
function(x, value) { | ||
|
||
# Check parameter integrity | ||
if (class(value) != "character") { | ||
stop("Invalid column names.") | ||
} | ||
|
||
if (length(value) != ncol(x)) { | ||
stop( | ||
"Column names must have the same length as the number of columns in the dataset.") | ||
} | ||
|
||
if (any(is.na(value))) { | ||
stop("Column names cannot be NA.") | ||
} | ||
|
||
# Check if the column names have . in it | ||
if (any(regexec(".", value, fixed=TRUE)[[1]][1] != -1)) { | ||
This seems rather restrictive, as it is possibly fixed by https://issues.apache.org/jira/browse/SPARK-11976.

I think this is OK, as SPARK-11976 is not fixed. This restriction can be removed later, once columns containing '.' can be accessed transparently.

Then I would suggest creating a new test case with colnames to check that a DataFrame created from iris has column names with . replaced with _.

@felixcheung @sun-rui Thanks for your input. Right now, if I assign column names containing the "." character, any subsequent operation on the DataFrame will fail. Regarding @felixcheung's comment on the test case: there are already two test cases, with str() and with(), that expect the colnames of iris to be "Sepal_Length", etc. Those will break when SPARK-11976 is fixed. No need to add more.

To clarify, I think having a specific test of colnames with iris will help us realize that this check is in place and should be removed, if and when SPARK-11976 is fixed. Otherwise only the tests with str and with will be fixed. Does that make sense?

@felixcheung Not sure I follow your idea. Is this what you refer to?
# Note: if this test is broken, remove check for "." character on colnames<- method
expect_equal(colnames(irisDF)[1], "Sepal_Length")

That is, what you have here is also a good test for covering this case.

Done. @sun-rui, @felixcheung. Shall we merge this PR?

@sun-rui @felixcheung @shivaram Folks: this is a really simple thing. Shall we merge it?
||
stop("Column names cannot contain the '.' symbol.") | |
} | ||
|
||
sdf <- callJMethod(x@sdf, "toDF", as.list(value)) | ||
dataFrame(sdf) | ||
}) | ||
|
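A note on the '.' check in the diff above: `regexec(".", value, fixed=TRUE)[[1]][1]` reads only the match result for the first element of `value`, so a dot in a later name would not be caught. Below is a minimal sketch of the same four checks in vectorized form, using base R only. The names `validate_colnames` and `n_cols` are hypothetical, chosen for illustration, and are not part of SparkR.

```r
# Hypothetical standalone validator mirroring the checks in the diff.
# `value` is the proposed column names; `n_cols` stands in for ncol(x).
validate_colnames <- function(value, n_cols) {
  if (!is.character(value)) {
    stop("Invalid column names.")
  }
  if (length(value) != n_cols) {
    stop("Column names must have the same length as the number of columns in the dataset.")
  }
  if (anyNA(value)) {
    stop("Column names cannot be NA.")
  }
  # grepl() is vectorized, so every name is inspected, not just the first.
  if (any(grepl(".", value, fixed = TRUE))) {
    stop("Column names cannot contain the '.' symbol.")
  }
  invisible(TRUE)
}

# A dot in the second name is caught.
dotted <- tryCatch(validate_colnames(c("a", "b.c"), 2L),
                   error = function(e) conditionMessage(e))

# Replacing '.' with '_' (the form the iris tests in the discussion expect) passes.
clean <- gsub(".", "_", names(iris), fixed = TRUE)
ok <- validate_colnames(clean, length(clean))
```

Inside the real method, the same conditions could be applied to `value` and `ncol(x)` directly; whether to do so is a choice for the PR author.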
I would prefer keeping the current signature instead of checking the class of the parameter inside the method.
@felixcheung, what's your opinion?
I agree - letting R do type matching for the method signature seems like a better approach?
The reason I added that check was because of this:
After I ran that, I saw this:
So it looks like R automatically adds definitions of colnames<- if value is anything other than character.
This does not happen with coltypes<-, as it's not part of base package and doesn't have an (ANY,ANY) signature.
Therefore, I believe we do need to do this data type check inside colnames<-.
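The fallback described above can be reproduced without Spark. The sketch below is an assumption-laden illustration: `ToyFrame` is a hypothetical S4 class standing in for SparkR's DataFrame, and the generic is created with a plain `setGeneric` call, which may differ from how SparkR sets it up.

```r
library(methods)

# Hypothetical toy class standing in for SparkR's DataFrame.
setClass("ToyFrame", representation(cols = "character"))

# Turn base::`colnames<-` into an S4 generic; the base function becomes
# the default (ANY, ANY) method.
setGeneric("colnames<-")

# Register a method only for character values.
setMethod("colnames<-", signature(x = "ToyFrame", value = "character"),
          function(x, value) {
            x@cols <- value
            x
          })

tf <- new("ToyFrame", cols = c("a", "b"))

# Character value: dispatches to the method defined above.
colnames(tf) <- c("x", "y")

# Integer value: no (ToyFrame, integer) method exists, so dispatch falls
# back to the base default, which is expected to fail for this object.
fallback <- tryCatch({
  colnames(tf) <- 1:2
  "no error"
}, error = function(e) conditionMessage(e))
```

With the narrower signature, a non-character `value` never reaches the custom method, so any error the user sees is raised by the base implementation. Checking the type inside a method registered for `signature(x = "DataFrame")` is one way to control that message.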
It is interesting. @felixcheung, do you know the reason behind this?
I think in this case the base:: definition is implemented in such a way that it should (or at least it attempts to) handle different parameter types:
So from what I can see, the error you see actually comes from the base implementation, as "sort of" expected. That said, I'm OK with adding explicit checks to make the error clearer for the user.
Thanks @felixcheung for investigating this further!