[CARBONDATA-2659] Support partition table by DataFrame API #2415

Closed
wants to merge 2 commits into from

@jackylk (Contributor) commented Jun 26, 2018

Currently, partition tables are supported only through SQL; they should also be supported through the Spark DataFrame API.
This PR adds an option to specify the partition columns when writing a DataFrame to a carbon table.
For example:

    df.write
      .format("carbondata")
      .option("tableName", "carbon_df_table")
      .option("partitionColumns", "c1, c2")  // a list of column names
      .mode(SaveMode.Overwrite)
      .save()
  • Any interfaces changed?
    Added an option for DataFrame.write

  • Any backward compatibility impacted?
    No

  • Document update required?

  • Testing done
    Added one test case

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA
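
The `partitionColumns` option in the example above carries a comma-separated list of column names. A minimal sketch of how such a value could be split into individual names, assuming a hypothetical helper `parsePartitionColumns` (the name and the exact trimming rules are illustrative, not taken from this PR):

```scala
object PartitionOptionSketch {
  // Hypothetical helper, not part of the PR: turns the raw "partitionColumns"
  // option value (for example "c1, c2") into a list of trimmed column names.
  // Returns None when the option is not set at all.
  def parsePartitionColumns(options: Map[String, String]): Option[Seq[String]] =
    options.get("partitionColumns").map { raw =>
      raw.split(",").map(_.trim).filter(_.nonEmpty).toSeq
    }
}
```

With this sketch, `Map("partitionColumns" -> "c1, c2")` yields the two names `c1` and `c2`, and a map without the key yields `None`.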

options.getOrElse("partitionClass",
"org.apache.carbondata.processing.partition.impl.SampleDataPartitionerImpl")
}

def tempCSV: Boolean = options.getOrElse("tempCSV", "false").toBoolean
lazy val tempCSV: Boolean = options.getOrElse("tempCSV", "false").toBoolean
Contributor

I remember that the 'tempCSV' option has been deprecated.

Contributor Author

removed

options.partitionColumns.get.map { column =>
val c = schema.fields.find(_.name.equalsIgnoreCase(column))
if (c.isEmpty) {
throw new MalformedCarbonCommandException(s"invalid partition column: $column")
Contributor

missing validation for duplicated column names?

Contributor Author

fixed
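
The review point above asks for a duplicate check alongside the existing "invalid partition column" check. A minimal sketch of both checks on plain column names, assuming a hypothetical `validate` helper and using `IllegalArgumentException` as a stand-in for `MalformedCarbonCommandException` so the example stays self-contained (the actual fix in the PR may look different):

```scala
import java.util.Locale

object PartitionColumnValidationSketch {
  // Hypothetical helper, not the PR's code. Returns the partition columns
  // resolved to the schema's own spelling, or throws if a column is listed
  // more than once or does not exist in the schema. Matching ignores case.
  def validate(schemaFields: Seq[String], partitionColumns: Seq[String]): Seq[String] = {
    // Group by lower-cased name so that "c1" and "C1" count as duplicates.
    val duplicates = partitionColumns
      .groupBy(_.toLowerCase(Locale.ROOT))
      .collect { case (_, names) if names.size > 1 => names.head }
    if (duplicates.nonEmpty) {
      val listed = duplicates.mkString(", ")
      throw new IllegalArgumentException(s"duplicate partition column: $listed")
    }
    // Resolve each requested column against the schema, ignoring case.
    partitionColumns.map { column =>
      schemaFields.find(_.equalsIgnoreCase(column)).getOrElse(
        throw new IllegalArgumentException(s"invalid partition column: $column"))
    }
  }
}
```

Checking duplicates before resolving names keeps the two error messages distinct, so a caller can tell a repeated column from a misspelled one.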

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5387/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6561/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5395/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6568/

@ravipesala
Contributor

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5468/

@ravipesala
Contributor

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5471/

@ravipesala
Contributor

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5475/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7105/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5881/

}

val schemaWithoutPartition = if (options.partitionColumns.isDefined) {
val fields = schema.filterNot(field => options.partitionColumns.get.contains(field.name))
Contributor

Better to use exists with equalsIgnoreCase inside filterNot instead of contains.

Contributor Author

fixed
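
The suggestion above matters because `contains` compares names exactly, so a partition column given as `c2` would not remove a schema field named `C2`. A minimal sketch of the case-insensitive variant on plain field names, assuming a hypothetical `withoutPartitionColumns` helper (the PR itself filters Spark schema fields, which is not reproduced here):

```scala
object SchemaFilterSketch {
  // Hypothetical helper, not the PR's code: drops every field whose name
  // matches one of the partition columns, ignoring case.
  def withoutPartitionColumns(
      fieldNames: Seq[String],
      partitionColumns: Seq[String]): Seq[String] =
    fieldNames.filterNot(field => partitionColumns.exists(_.equalsIgnoreCase(field)))

  // The exact-match variant, kept only for comparison with the one above.
  def withoutPartitionColumnsExact(
      fieldNames: Seq[String],
      partitionColumns: Seq[String]): Seq[String] =
    fieldNames.filterNot(field => partitionColumns.contains(field))
}
```

For fields `c1`, `C2`, `c3` and partition column `c2`, the case-insensitive version removes `C2`, while the exact-match version leaves all three fields in place.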

@ravipesala
Contributor

@jackylk Please rebase it

fix

fix comment

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7268/

@ravipesala
Contributor

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5896/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6037/

@ravipesala
Contributor

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7291/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6058/

@ravipesala
Contributor

retest this please

@ravipesala
Contributor

SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6200/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/7826/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6551/

@ravipesala
Contributor

LGTM

@asfgit asfgit closed this in 8f7b594 Aug 8, 2018
asfgit pushed a commit that referenced this pull request Aug 9, 2018
Currently, partition tables are supported only through SQL; they should also be supported through the Spark DataFrame API.
This PR adds an option to specify the partition columns when writing a DataFrame to a carbon table.

This closes #2415

4 participants