
[SPARK-5123][SQL] Expose only one version of the data type APIs (i.e. remove Java-specific APIs) #3925

Closed
wants to merge 10 commits

Conversation

rxin
Contributor

@rxin rxin commented Jan 7, 2015

Having two versions of the data type APIs (one for Java, one for Scala) requires downstream libraries to also have two versions of the APIs if the library wants to support both Java and Scala. I took a look at the Scala version of the data type APIs - it can actually work out pretty well for Java out of the box.

As part of the PR, I created a sql.types package and moved all type definitions there. I then removed the Java-specific data type API along with a lot of the conversion code.
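As a rough illustration of the point (a sketch written against the unified org.apache.spark.sql.types API; the DataTypes factory names follow the follow-up API and are not taken from this diff), the same Scala-defined classes serve both audiences:

```scala
import org.apache.spark.sql.types._

// Scala callers build schemas from the case classes and singletons directly ...
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)))

// ... while Java callers reach the very same classes through the DataTypes
// factory methods, so a downstream library needs only one schema-building path.
val schemaViaFactory = DataTypes.createStructType(Array(
  DataTypes.createStructField("id", DataTypes.IntegerType, false),
  DataTypes.createStructField("name", DataTypes.StringType, true)))
```

Both values describe the same two-field schema, which is why downstream libraries no longer need parallel Java and Scala code paths.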

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25141 has started for PR 3925 at commit 8eb5dc9.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25141 has finished for PR 3925 at commit 8eb5dc9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25141/
Test FAILed.

@pwendell
Contributor

pwendell commented Jan 7, 2015

Is this something where we need to make sure to include upgrade details in the release notes?

@rxin
Contributor Author

rxin commented Jan 7, 2015

Yes - is there a way to label these things in JIRA?

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25142 has started for PR 3925 at commit 4674585.

  • This patch merges cleanly.

@rxin
Contributor Author

rxin commented Jan 7, 2015

cc @marmbrus, @yhuai for SQL changes, and @mengxr, @jkbradley for MLlib changes ...

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25142 has finished for PR 3925 at commit 4674585.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25142/
Test FAILed.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25145 has started for PR 3925 at commit a2bb038.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25145 has finished for PR 3925 at commit a2bb038.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25145/
Test FAILed.

}
} else {
// ISO8601 with GMT insert
val ISO8601GMT: SimpleDateFormat = new SimpleDateFormat( "yyyy-MM-dd'T'HH:mm:ss.SSSz" )
Contributor

Make ISO8601GMT a thread-local? Or leave a TODO for a future improvement.

Contributor Author

What do you mean? I don't think I changed this; I simply copied it from a file that was deleted.
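For background on the suggestion: SimpleDateFormat keeps mutable parsing state and is not thread-safe, which is what motivates a thread-local wrapper. A minimal sketch of that pattern, with illustrative names (this is not the change made in the PR):

```scala
import java.text.SimpleDateFormat
import java.util.Date

object Iso8601Formats {
  // Each thread gets its own SimpleDateFormat instance, avoiding the corrupted
  // results that sharing one mutable instance across threads can cause.
  private val iso8601Gmt = new ThreadLocal[SimpleDateFormat] {
    override def initialValue(): SimpleDateFormat =
      new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")
  }

  def parse(s: String): Date = iso8601Gmt.get().parse(s)
}
```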

@chenghao-intel
Contributor

That's a very cool idea to make a unified DataType API for JVM-based languages. It means less code to maintain!

/**
* :: DeveloperApi ::
*
* The data type representing `NULL` values. Please use the singleton [[DataTypes.NullType]].
Contributor

Are Scala users expected to use DataTypes?

Contributor Author

Sure, why not ...
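For what it's worth, both spellings resolve to the same singletons after the change; a small sketch from the Scala side (assuming the DataTypes holder exposed by the follow-up API):

```scala
import org.apache.spark.sql.types._

// Scala code can keep referring to the singleton objects directly ...
val direct: DataType = StringType

// ... or go through the DataTypes holder that the docs point Java users at;
// both names resolve to the same instance.
val viaHolder: DataType = DataTypes.StringType

assert(direct eq viaHolder)
```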

@yhuai
Contributor

yhuai commented Jan 7, 2015

SQL changes look good to me.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25169 has started for PR 3925 at commit a1c1864.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25170 has started for PR 3925 at commit 0cc1a5d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25169 has finished for PR 3925 at commit a1c1864.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25169/
Test FAILed.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25170 has finished for PR 3925 at commit 0cc1a5d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25170/
Test FAILed.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25173 has started for PR 3925 at commit 5b77c6a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 7, 2015

Test build #25173 has finished for PR 3925 at commit 5b77c6a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25173/
Test FAILed.

* // nonExisting: StructField = null
*
* // Extract multiple StructFields. Field names are provided in a set.
* // A StructType object will be returned.
Member

Should the "Preserve the original order of fields." comment in the apply() method be moved to the doc?

Contributor Author

done
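To make the documented behaviour concrete, a short sketch of the lookups this scaladoc describes, using illustrative field names:

```scala
import org.apache.spark.sql.types._

val struct = StructType(Seq(
  StructField("a", IntegerType, nullable = true),
  StructField("b", LongType, nullable = false),
  StructField("c", StringType, nullable = true)))

// Extract a single StructField by name.
val bField: StructField = struct("b")

// Extract multiple StructFields by passing a Set of names; per the comment
// moved into the doc, the returned StructType preserves the original field
// order ("a" before "c"), not the order implied by the Set.
val projected: StructType = struct(Set("c", "a"))
```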

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25184 has started for PR 3925 at commit c130913.

  • This patch merges cleanly.

@jkbradley
Member

@rxin MLlib changes look fine to me, and FWIW the other parts did too. My only remaining comments are about import ordering, but I'll leave those out for now. LGTM
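For context on the import-ordering comment: Spark's style groups imports as java/javax, scala, third-party, then org.apache.spark, alphabetized within each group and separated by blank lines. A sketch of the layout with example imports (the grouping is the project convention, not anything changed in this PR):

```scala
// java and javax imports first ...
import java.util.Date

// ... then scala imports ...
import scala.collection.mutable

// ... then third-party libraries ...
import org.json4s.JsonAST.JValue

// ... and finally org.apache.spark, each group alphabetized.
import org.apache.spark.SparkContext
import org.apache.spark.sql.types._
```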

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25184 has finished for PR 3925 at commit c130913.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25184/
Test PASSed.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25193 has started for PR 3925 at commit e98a7c0.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25193 has finished for PR 3925 at commit e98a7c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25193/
Test PASSed.

@@ -93,7 +92,6 @@ class HiveInspectorSuite extends FunSuite with HiveInspectors {
val row = data.map(_.eval(null))
val dataTypes = data.map(_.dataType)

import scala.collection.JavaConversions._
Contributor

Most of the code in Spark SQL uses JavaConversions... I don't think performance is an issue here or anything.
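For context on the import being discussed: scala.collection.JavaConversions brings implicit Java-to-Scala collection conversions into scope. A self-contained sketch of the effect (not the test code itself):

```scala
import java.util.{ArrayList => JArrayList}

// JavaConversions puts implicit conversions between Java and Scala collections
// in scope, so Java collections pick up Scala collection methods.
import scala.collection.JavaConversions._

val javaList = new JArrayList[Int]()
javaList.add(1)
javaList.add(2)

// The java.util.List is implicitly viewed as a Scala Buffer here,
// which is why map is available on it.
val doubled: Seq[Int] = javaList.map(_ * 2)  // Buffer(2, 4)
```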

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25260 has started for PR 3925 at commit 603919f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 8, 2015

Test build #25260 has finished for PR 3925 at commit 603919f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25260/
Test PASSed.

@rxin
Contributor Author

rxin commented Jan 8, 2015

See #3958

@rxin rxin closed this Jan 8, 2015
asfgit pushed a commit that referenced this pull request Jan 14, 2015
Having two versions of the data type APIs (one for Java, one for Scala) requires downstream libraries to also have two versions of the APIs if the library wants to support both Java and Scala. I took a look at the Scala version of the data type APIs - it can actually work out pretty well for Java out of the box.

As part of the PR, I created a sql.types package and moved all type definitions there. I then removed the Java specific data type API along with a lot of the conversion code.

This subsumes #3925

Author: Reynold Xin <rxin@databricks.com>

Closes #3958 from rxin/SPARK-5123-datatype-2 and squashes the following commits:

66505cc [Reynold Xin] [SPARK-5123] Expose only one version of the data type APIs (i.e. remove the Java-specific API).
@@ -15,77 +15,74 @@
* limitations under the License.
*/

-package org.apache.spark.sql.api.java;
+package org.apache.spark.sql.types;
Contributor

Since this is a Java source file, should we move it to sql/catalyst/src/main/java?

asfgit pushed a commit that referenced this pull request Jan 18, 2015
Follow up of #3925
/cc rxin

Author: scwf <wangfei1@huawei.com>

Closes #4095 from scwf/sql-doc and squashes the following commits:

97e311b [scwf] update sql doc since now expose only one version of the data type APIs