[SPARK-13137][SQL] NullPointerException in schema inference for CSV when the first line is empty

https://issues.apache.org/jira/browse/SPARK-13137

This PR adds a filter to schema inference so that it no longer throws a NullPointerException.

Also, I removed `MAX_COMMENT_LINES_IN_HEADER` and instead chained `filter()` and `first()` to find the first usable line.

Lastly, rather than adding a new test file, I simply added an empty first line to the existing `cars.csv` so that this case is covered by the original tests.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #11023 from HyukjinKwon/SPARK-13137.
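
The `filter()` and `first()` chaining described above can be sketched without Spark. This is a minimal illustration, not the code in the commit: `FirstLineSketch`, its `findFirstLine` signature, and the use of a plain `Seq[String]` with `headOption` in place of an `RDD[String]` with `first()` are all assumptions made so the example runs standalone.

```scala
// Hypothetical stand-in for the commit's findFirstLine: a plain Seq replaces
// the RDD so the sketch runs without Spark on the classpath.
object FirstLineSketch {

  // Returns the first line that is neither blank nor (when a comment
  // character is given) prefixed with that character.
  def findFirstLine(lines: Seq[String], comment: Option[Char]): Option[String] = {
    val candidates = comment match {
      case Some(c) =>
        val prefix = c.toString
        lines.view.filter(line => line.trim.nonEmpty && !line.startsWith(prefix))
      case None =>
        lines.view.filter(line => line.trim.nonEmpty)
    }
    // headOption is an assumption of this sketch: it yields None when
    // nothing survives the filter instead of throwing.
    candidates.headOption
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("", "# generated file", "year,make,model,comment,blank")
    println(findFirstLine(lines, Some('#')))
    println(findFirstLine(Seq("", "   "), None))
  }
}
```

One difference worth noting: `RDD.first()` throws on an empty RDD, so the committed code fails with an exception if every line is blank or commented, whereas this sketch returns `None` in that case.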
HyukjinKwon authored and rxin committed Feb 21, 2016
1 parent b6a873d commit 7eb83fe
Showing 3 changed files with 8 additions and 8 deletions.
@@ -75,9 +75,6 @@ private[sql] class CSVOptions(
   val ignoreLeadingWhiteSpaceFlag = getBool("ignoreLeadingWhiteSpace")
   val ignoreTrailingWhiteSpaceFlag = getBool("ignoreTrailingWhiteSpace")
 
-  // Limit the number of lines we'll search for a header row that isn't comment-prefixed
-  val MAX_COMMENT_LINES_IN_HEADER = 10
-
   // Parse mode flags
   if (!ParseModes.isValidMode(parseMode)) {
     logWarning(s"$parseMode is not a valid parse mode. Using ${ParseModes.DEFAULT}.")
@@ -154,12 +154,14 @@ private[csv] class CSVRelation(
    */
   private def findFirstLine(rdd: RDD[String]): String = {
     if (params.isCommentSet) {
-      rdd.take(params.MAX_COMMENT_LINES_IN_HEADER)
-        .find(!_.startsWith(params.comment.toString))
-        .getOrElse(sys.error(s"No uncommented header line in " +
-          s"first ${params.MAX_COMMENT_LINES_IN_HEADER} lines"))
+      val comment = params.comment.toString
+      rdd.filter { line =>
+        line.trim.nonEmpty && !line.startsWith(comment)
+      }.first()
     } else {
-      rdd.first()
+      rdd.filter { line =>
+        line.trim.nonEmpty
+      }.first()
     }
   }
 }
1 change: 1 addition & 0 deletions sql/core/src/test/resources/cars.csv
@@ -1,3 +1,4 @@
+
 year,make,model,comment,blank
 "2012","Tesla","S","No comment",
 