[SPARK-3572] [SQL] Internal API for User-Defined Types #3063

Closed
wants to merge 48 commits into from

marmbrus
Contributor

marmbrus commented Nov 2, 2014

This PR adds User-Defined Types (UDTs) to SQL. It is a precursor to using SchemaRDD as a Dataset for the new MLlib API. Currently, the UDT API is private since there is incomplete support (e.g., no Java or Python support yet).
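
The description does not spell out what a UDT does, so here is a minimal, stand-alone Java sketch of the general idea, assuming the usual design: a UDT maps a user class to a value the SQL engine can already store, and back again. `SqlType`, `Point`, `PointUDT`, and the method names are hypothetical stand-ins loosely modeled on the `UserDefinedType<UserType>` class listed in the build output below; they are not Spark's actual API, which this PR keeps private.

```java
import java.util.Arrays;
import java.util.List;

public class UdtSketch {
    // Hypothetical tag for the built-in SQL type a UDT stores its values as.
    // This is NOT Spark's DataType hierarchy; it only keeps the sketch self-contained.
    enum SqlType { ARRAY_OF_DOUBLE }

    // Hypothetical mirror of the UDT idea: translate a user class to and from
    // a value that the SQL engine already knows how to store.
    abstract static class UserDefinedType<UserType> {
        abstract SqlType sqlType();                  // storage type used by the engine
        abstract Object serialize(UserType obj);     // user object -> SQL-native datum
        abstract UserType deserialize(Object datum); // SQL-native datum -> user object
        abstract Class<UserType> userClass();        // user class handled by this UDT
    }

    // Example user type: an immutable 2-D point.
    static final class Point {
        final double x;
        final double y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // UDT that stores a Point as a two-element list of doubles.
    static final class PointUDT extends UserDefinedType<Point> {
        @Override SqlType sqlType() { return SqlType.ARRAY_OF_DOUBLE; }
        @Override Object serialize(Point p) { return Arrays.asList(p.x, p.y); }
        @Override Point deserialize(Object datum) {
            List<?> values = (List<?>) datum;
            return new Point((Double) values.get(0), (Double) values.get(1));
        }
        @Override Class<Point> userClass() { return Point.class; }
    }

    public static void main(String[] args) {
        PointUDT udt = new PointUDT();
        Object stored = udt.serialize(new Point(1.0, 2.0));
        Point restored = udt.deserialize(stored);
        System.out.println(stored + " -> (" + restored.x + ", " + restored.y + ")");
    }
}
```

Running `main` prints `[1.0, 2.0] -> (1.0, 2.0)`: the engine only ever handles the list of doubles, while user code gets a `Point` back.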

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22777 has started for PR 3063 at commit a7888b0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22778 has started for PR 3063 at commit e369b91.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22778 has finished for PR 3063 at commit e369b91.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • // in some cases, such as when a class is enclosed in an object (in which case
    • abstract class GenericStrategy[PhysicalPlan <: TreeNode[PhysicalPlan]] extends Logging
    • abstract class UserDefinedType[UserType] extends DataType with Serializable
    • public abstract class UserDefinedType<UserType> extends DataType implements Serializable
    • trait RunnableCommand extends logical.Command
    • case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan
    • protected case class Keyword(str: String)
    • sys.error(s"Failed to load class for data source: $provider")
    • case class EqualTo(attribute: String, value: Any) extends Filter
    • case class GreaterThan(attribute: String, value: Any) extends Filter
    • case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter
    • case class LessThan(attribute: String, value: Any) extends Filter
    • case class LessThanOrEqual(attribute: String, value: Any) extends Filter
    • trait RelationProvider
    • abstract class BaseRelation
    • abstract class TableScan extends BaseRelation
    • abstract class PrunedScan extends BaseRelation
    • abstract class PrunedFilteredScan extends BaseRelation
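
The build output also lists the data source filter classes (`EqualTo`, `GreaterThan`, and so on) next to `PrunedFilteredScan`. As a rough, stand-alone illustration of how a source might use such filters, the Java sketch below evaluates two of them against rows held in memory. The `Filter` interface with an `accepts` method, the `scan` helper, and the `Map`-based rows are all hypothetical; in the Scala code the filters are plain case classes, and how a real relation applies them is up to that relation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FilterSketch {
    // Hypothetical Java counterparts of the Scala filter case classes listed above.
    interface Filter {
        boolean accepts(Map<String, Object> row);
    }

    static final class EqualTo implements Filter {
        final String attribute;
        final Object value;
        EqualTo(String attribute, Object value) { this.attribute = attribute; this.value = value; }
        public boolean accepts(Map<String, Object> row) {
            return value.equals(row.get(attribute));
        }
    }

    static final class GreaterThan implements Filter {
        final String attribute;
        final Number value;
        GreaterThan(String attribute, Number value) { this.attribute = attribute; this.value = value; }
        public boolean accepts(Map<String, Object> row) {
            Object v = row.get(attribute);
            // Missing or non-numeric values never satisfy the comparison in this sketch.
            return v instanceof Number && ((Number) v).doubleValue() > value.doubleValue();
        }
    }

    // Hypothetical in-memory "source": keeps a row only if every filter accepts it (logical AND).
    static List<Map<String, Object>> scan(List<Map<String, Object>> rows, List<Filter> filters) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            boolean keep = true;
            for (Filter f : filters) {
                if (!f.accepts(row)) { keep = false; break; }
            }
            if (keep) result.add(row);
        }
        return result;
    }

    // Builds a two-column row: name and age.
    static Map<String, Object> row(String name, int age) {
        Map<String, Object> r = new HashMap<>();
        r.put("name", name);
        r.put("age", age);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(row("ann", 30), row("bob", 17), row("cy", 45));
        List<Filter> filters = Arrays.asList(new GreaterThan("age", 18), new EqualTo("name", "cy"));
        System.out.println(scan(rows, filters).size());
    }
}
```

With both filters applied only the `cy` row survives, so `main` prints `1`.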

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22778/

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22783 has started for PR 3063 at commit 46a3aee.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22786 has started for PR 3063 at commit 7ccfc0d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22777 has finished for PR 3063 at commit a7888b0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Params(
    • // in some cases, such as when a class is enclosed in an object (in which case
    • abstract class GenericStrategy[PhysicalPlan <: TreeNode[PhysicalPlan]] extends Logging
    • abstract class UserDefinedType[UserType] extends DataType with Serializable
    • public abstract class UserDefinedType<UserType> extends DataType implements Serializable
    • trait RunnableCommand extends logical.Command
    • case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan
    • protected case class Keyword(str: String)
    • sys.error(s"Failed to load class for data source: $provider")
    • case class EqualTo(attribute: String, value: Any) extends Filter
    • case class GreaterThan(attribute: String, value: Any) extends Filter
    • case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter
    • case class LessThan(attribute: String, value: Any) extends Filter
    • case class LessThanOrEqual(attribute: String, value: Any) extends Filter
    • trait RelationProvider
    • abstract class BaseRelation
    • abstract class TableScan extends BaseRelation
    • abstract class PrunedScan extends BaseRelation
    • abstract class PrunedFilteredScan extends BaseRelation

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22777/

@jkbradley
Member

Oops, those test failures are from the MLlib Dataset tests being left in, even though I removed the UDTs. (That was part of the commit which I couldn't push to GitHub earlier.) Basically, you can remove all of the mllib/ changes in this PR.

@marmbrus
Contributor Author

marmbrus commented Nov 3, 2014

The failure is stale, the most recent push is still running.

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22783 has finished for PR 3063 at commit 46a3aee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ExecutorLostFailure(execId: String) extends TaskFailedReason
    • // in some cases, such as when a class is enclosed in an object (in which case
    • abstract class GenericStrategy[PhysicalPlan <: TreeNode[PhysicalPlan]] extends Logging
    • abstract class UserDefinedType[UserType] extends DataType with Serializable
    • public abstract class UserDefinedType<UserType> extends DataType implements Serializable
    • trait RunnableCommand extends logical.Command
    • case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan
    • protected case class Keyword(str: String)
    • sys.error(s"Failed to load class for data source: $provider")
    • case class EqualTo(attribute: String, value: Any) extends Filter
    • case class GreaterThan(attribute: String, value: Any) extends Filter
    • case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter
    • case class LessThan(attribute: String, value: Any) extends Filter
    • case class LessThanOrEqual(attribute: String, value: Any) extends Filter
    • trait RelationProvider
    • abstract class BaseRelation
    • abstract class TableScan extends BaseRelation
    • abstract class PrunedScan extends BaseRelation
    • abstract class PrunedFilteredScan extends BaseRelation

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22783/

jkbradley added a commit that referenced this pull request Nov 3, 2014
This PR adds User-Defined Types (UDTs) to SQL. It is a precursor to using SchemaRDD as a Dataset for the new MLlib API. Currently, the UDT API is private since there is incomplete support (e.g., no Java or Python support yet).

Author: Joseph K. Bradley <joseph@databricks.com>
Author: Michael Armbrust <michael@databricks.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #3063 from marmbrus/udts and squashes the following commits:

7ccfc0d [Michael Armbrust] remove println
46a3aee [Michael Armbrust] Slightly easier to read test output.
6cc434d [Michael Armbrust] Recursively convert rows.
e369b91 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into udts
15c10a6 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into sql-udt2
f3c72fe [Joseph K. Bradley] Fixing merge
e13cd8a [Joseph K. Bradley] Removed Vector UDTs
5817b2b [Joseph K. Bradley] style edits
30ce5b2 [Joseph K. Bradley] updates based on code review
d063380 [Joseph K. Bradley] Cleaned up Java UDT Suite, and added warning about element ordering when creating schema from Java Bean
a571bb6 [Joseph K. Bradley] Removed old UDT code (registry and Java UDTs).  Cleaned up other code.  Extended JavaUserDefinedTypeSuite
6fddc1c [Joseph K. Bradley] Made MyLabeledPoint into a Java Bean
20630bc [Joseph K. Bradley] fixed scalastyle
fa86b20 [Joseph K. Bradley] Removed Java UserDefinedType, and made UDTs private[spark] for now
8de957c [Joseph K. Bradley] Modified UserDefinedType to store Java class of user type so that registerUDT takes only the udt argument.
8b242ea [Joseph K. Bradley] Fixed merge error after last merge.  Note: Last merge commit also removed SQL UDT examples from mllib.
7f29656 [Joseph K. Bradley] Moved udt case to top of all matches.  Small cleanups
b028675 [Xiangrui Meng] allow any type in UDT
4500d8a [Xiangrui Meng] update example code
87264a5 [Xiangrui Meng] remove debug code
3143ac3 [Xiangrui Meng] remove unnecessary changes
cfbc321 [Xiangrui Meng] support UDT in parquet
db16139 [Joseph K. Bradley] Added more doc for UserDefinedType.  Removed unused code in Suite
759af7a [Joseph K. Bradley] Added more doc to UserDefineType
63626a4 [Joseph K. Bradley] Updated ScalaReflectionsSuite per @marmbrus suggestions
51e5282 [Joseph K. Bradley] fixed 1 test
f025035 [Joseph K. Bradley] Cleanups before PR.  Added new tests
85872f6 [Michael Armbrust] Allow schema calculation to be lazy, but ensure its available on executors.
dff99d6 [Joseph K. Bradley] Added UDTs for Vectors in MLlib, plus DatasetExample using the UDTs
cd60cb4 [Joseph K. Bradley] Trying to get other SQL tests to run
34a5831 [Joseph K. Bradley] Added MLlib dependency on SQL.
e1f7b9c [Joseph K. Bradley] blah
2f40c02 [Joseph K. Bradley] renamed UDT types
3579035 [Joseph K. Bradley] udt annotation now working
b226b9e [Joseph K. Bradley] Changing UDT to annotation
fea04af [Joseph K. Bradley] more cleanups
964b32e [Joseph K. Bradley] some cleanups
893ee4c [Joseph K. Bradley] udt finallly working
50f9726 [Joseph K. Bradley] udts
04303c9 [Joseph K. Bradley] udts
39f8707 [Joseph K. Bradley] removed old udt suite
273ac96 [Joseph K. Bradley] basic UDT is working, but deserialization has yet to be done
8bebf24 [Joseph K. Bradley] commented out convertRowToScala for debugging
53de70f [Joseph K. Bradley] more udts...
982c035 [Joseph K. Bradley] still working on UDTs
19b2f60 [Joseph K. Bradley] still working on UDTs
0eaeb81 [Joseph K. Bradley] Still working on UDTs
105c5a3 [Joseph K. Bradley] Adding UserDefinedType to SQL, not done yet.
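
The squashed commit list mentions recursively converting rows (6cc434d). Without claiming this is what that commit does, the Java sketch below shows the general technique under our own assumptions: walk a schema alongside a stored row, descend into nested rows, and hand UDT-typed values to their deserializer. `FieldType`, `Plain`, `Struct`, `Udt`, and `convert` are hypothetical names, and rows are modeled as plain lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RowConversionSketch {
    // Hypothetical schema nodes describing how each stored value should be converted.
    interface FieldType {}

    // A built-in value that needs no conversion.
    static final class Plain implements FieldType {}

    // A nested row whose fields have their own types.
    static final class Struct implements FieldType {
        final List<FieldType> fields;
        Struct(FieldType... fields) { this.fields = Arrays.asList(fields); }
    }

    // A UDT-backed field: holds a function that turns the stored datum into a user object.
    static final class Udt implements FieldType {
        final Function<Object, Object> deserialize;
        Udt(Function<Object, Object> deserialize) { this.deserialize = deserialize; }
    }

    // Convert an internal value to its user-facing form, recursing into nested rows.
    static Object convert(Object value, FieldType type) {
        if (value == null) return null;
        if (type instanceof Udt) return ((Udt) type).deserialize.apply(value);
        if (type instanceof Struct) {
            List<FieldType> fields = ((Struct) type).fields;
            List<?> row = (List<?>) value;
            List<Object> converted = new ArrayList<>(row.size());
            for (int i = 0; i < row.size(); i++) {
                converted.add(convert(row.get(i), fields.get(i)));
            }
            return converted;
        }
        return value; // Plain values pass through unchanged.
    }

    public static void main(String[] args) {
        // Stand-in deserializer: labels the stored list so the conversion is visible.
        FieldType point = new Udt(datum -> "Point" + datum);
        // Schema: (id, (label, point)) with the UDT value nested one level down.
        Struct schema = new Struct(new Plain(), new Struct(new Plain(), point));
        List<Object> internal =
            Arrays.<Object>asList(1, Arrays.<Object>asList("a", Arrays.asList(1.0, 2.0)));
        System.out.println(convert(internal, schema));
    }
}
```

`main` prints `[1, [a, Point[1.0, 2.0]]]`; without the recursive step, the UDT value nested inside the inner row would come back in its stored form.
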

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22786 has finished for PR 3063 at commit 7ccfc0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class ExecutorLostFailure(execId: String) extends TaskFailedReason
    • // in some cases, such as when a class is enclosed in an object (in which case
    • abstract class GenericStrategy[PhysicalPlan <: TreeNode[PhysicalPlan]] extends Logging
    • abstract class UserDefinedType[UserType] extends DataType with Serializable
    • public abstract class UserDefinedType<UserType> extends DataType implements Serializable
    • trait RunnableCommand extends logical.Command
    • case class ExecutedCommand(cmd: RunnableCommand) extends SparkPlan
    • protected case class Keyword(str: String)
    • sys.error(s"Failed to load class for data source: $provider")
    • case class EqualTo(attribute: String, value: Any) extends Filter
    • case class GreaterThan(attribute: String, value: Any) extends Filter
    • case class GreaterThanOrEqual(attribute: String, value: Any) extends Filter
    • case class LessThan(attribute: String, value: Any) extends Filter
    • case class LessThanOrEqual(attribute: String, value: Any) extends Filter
    • trait RelationProvider
    • abstract class BaseRelation
    • abstract class TableScan extends BaseRelation
    • abstract class PrunedScan extends BaseRelation
    • abstract class PrunedFilteredScan extends BaseRelation

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22786/

@marmbrus
Contributor Author

marmbrus commented Nov 3, 2014

Not sure why this didn't get auto closed...?

5 participants