[CARBONDATA-2659] Support partition table by DataFrame API #2415
Conversation
options.getOrElse("partitionClass",
  "org.apache.carbondata.processing.partition.impl.SampleDataPartitionerImpl")
}

-def tempCSV: Boolean = options.getOrElse("tempCSV", "false").toBoolean
+lazy val tempCSV: Boolean = options.getOrElse("tempCSV", "false").toBoolean
I remember that the 'tempCsv' option has been deprecated
removed
options.partitionColumns.get.map { column =>
  val c = schema.fields.find(_.name.equalsIgnoreCase(column))
  if (c.isEmpty) {
    throw new MalformedCarbonCommandException(s"invalid partition column: $column")
missing validation for duplicated column names?
fixed
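The fix itself is not shown in this excerpt. A minimal sketch of what a case-insensitive duplicate check for the partition columns might look like (the helper name and exception type here are illustrative, not the PR's actual code):

```scala
// Hypothetical sketch: reject duplicated partition column names.
// Lower-casing before grouping makes the check case-insensitive,
// matching how column names are compared elsewhere in the diff.
def validateNoDuplicates(partitionColumns: Seq[String]): Unit = {
  val duplicates = partitionColumns
    .groupBy(_.toLowerCase)
    .collect { case (name, occurrences) if occurrences.size > 1 => name }
  if (duplicates.nonEmpty) {
    throw new IllegalArgumentException(
      s"duplicated partition column: ${duplicates.mkString(", ")}")
  }
}
```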
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5387/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6561/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5395/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6568/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5468/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5471/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5475/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7105/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5881/
}

val schemaWithoutPartition = if (options.partitionColumns.isDefined) {
  val fields = schema.filterNot(field => options.partitionColumns.get.contains(field.name))
Better to use exists with equalsIgnoreCase inside the filterNot, instead of contains.
fixed
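The suggested change can be sketched as follows (a simplified stand-in for Spark's StructField is used so the snippet is self-contained; the real code operates on the DataFrame schema):

```scala
// Sketch of the reviewer's suggestion: drop partition columns from the
// schema with a case-insensitive match (exists + equalsIgnoreCase)
// instead of the case-sensitive Seq.contains.
case class Field(name: String) // stand-in for Spark's StructField

def withoutPartitionColumns(schema: Seq[Field],
    partitionColumns: Seq[String]): Seq[Field] =
  schema.filterNot(field =>
    partitionColumns.exists(_.equalsIgnoreCase(field.name)))
```

With contains, a partition column declared as "id" would fail to match a schema field named "ID"; the exists form handles that case.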
@jackylk Please rebase it
Force-pushed from ca5feb2 to 4a26a2c
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7268/
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5896/
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6037/
retest this please
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7291/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6058/
retest this please
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6200/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7826/
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6551/
LGTM
Currently, partition tables are only supported through SQL; they should also be supported by the Spark DataFrame API. This PR adds an option to specify the partition columns when writing a DataFrame to a carbon table. This closes #2415
For example:
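(The original example from the PR description was not captured in this page. A hedged sketch of how the new option might be used, with the "partitionColumns" option name inferred from the diff above and the table name and data made up for illustration; verify against the merged code:)

```scala
// Illustrative only: write a DataFrame as a partitioned carbon table.
// "partitionColumns" is the new option this PR adds; "tableName" and
// the "carbondata" format follow CarbonData's DataFrame-write conventions.
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("partition-example").getOrCreate()
import spark.implicits._

val df = Seq(("us", 100), ("uk", 200)).toDF("country", "amount")

df.write
  .format("carbondata")
  .option("tableName", "sales")
  .option("partitionColumns", "country") // hypothetical usage of the new option
  .mode(SaveMode.Overwrite)
  .save()
```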
Any interfaces changed?
Added an option for DataFrame.write
Any backward compatibility impacted?
No
Document update required?
Testing done
Added one test case
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA