[SPARK-8784] [SQL] Add Python API for hex and unhex #7181

Closed
wants to merge 6 commits into from

davies
Contributor

@davies davies commented Jul 2, 2015

Also improve the performance of hex/unhex

@davies
Contributor Author

davies commented Jul 2, 2015

cc @rxin @zhichao-li

* Resulting characters are returned as a byte array.
*/
case class Unhex(child: Expression)
extends UnaryExpression with AutoCastInputTypes with Serializable {
Contributor

Can you use ExpectsInputTypes here? I am going to remove AutoCastInputTypes.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

checkEvaluation(UnHex(Literal("737472696E67")), "string".getBytes)
checkEvaluation(UnHex(Literal("")), new Array[Byte](0))
checkEvaluation(UnHex(Literal("0")), Array[Byte](0))
checkEvaluation(Unhex(Literal("737472696E67")), "string".getBytes)
Contributor

Add a test for a null literal of string type.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36358 has started for PR 7181 at commit 1a24082.

}

// lookup table to translate '0' -> 0 ... 'F'/'f' -> 15
private[this] val unhexDigits = {
Contributor

Can we move the two tables into static fields in an object?

Contributor Author

We can, but keeping them here is clearer. There is only one instance per Expression, so I think it's fine.

Contributor

Why would it be clearer? Are you thinking about the distance between its definition and where it is used?

This is one case where Java beats Scala, thanks to static fields. Ideally, all the string functions should just be member functions in UTF8String.

Contributor Author

Less code is better than more code. From a performance point of view, there is always more to optimize. I think we can go with something that's good enough (that is, something that won't be the bottleneck).

Contributor

How is this less code? It is just a bad idea to create unnecessary state. Just move both tables into two fields in the Hex object.

Contributor

BTW, I absolutely disagree with "less code is better than more code" as an ethos. While it can be true in many cases, there are plenty of counterexamples:

  • putting everything into a single file without any namespacing reduces the number of import statements, but it creates a bad logical structure
  • not writing any test cases also reduces the amount of code, but that's obviously bad
  • using arcane Scala features can substantially reduce code size in certain cases, at the expense of readability

In this case, I don't see how this creates less code (it takes two lines of code to define a Scala object, and you can even put the tables in an existing Java class such as UTF8String as static fields).
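
For illustration, a minimal sketch of what moving the two tables into an object could look like. The object name `HexTables`, the member names, and the table layouts are assumptions made for this example, not the code from this PR:

```scala
// Hypothetical holder object; the name and members are illustrative only.
object HexTables {
  // Maps a nibble (0 to 15) to its upper-case ASCII hex digit.
  val hexDigits: Array[Byte] = "0123456789ABCDEF".getBytes("US-ASCII")

  // Maps an ASCII byte to its hex value, or -1 if the byte is not a hex digit.
  val unhexDigits: Array[Byte] = {
    val table = Array.fill[Byte](128)(-1)
    (0 to 9).foreach(i => table('0' + i) = i.toByte)
    (0 to 5).foreach { i =>
      table('A' + i) = (10 + i).toByte
      table('a' + i) = (10 + i).toByte
    }
    table
  }
}
```

Because a Scala `object` is a singleton, the tables are built once and shared by every expression instance, instead of being allocated per `Hex` or `Unhex` node.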

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36362 has started for PR 7181 at commit c3af78c.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36358 has finished for PR 7181 at commit 1a24082.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ExpectsInputTypes
    • trait AutoCastInputTypes
    • abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]
    • abstract class BinaryOperator extends BinaryExpression
    • abstract class BinaryArithmetic extends BinaryOperator
    • case class Unhex(child: Expression)
    • abstract class BinaryComparison extends BinaryOperator with Predicate

@AmplabJenkins

Merged build finished. Test FAILed.

@davies
Contributor Author

davies commented Jul 2, 2015

The test failed because of ExpectsInputTypes; this is blocked by #7175.

}
// two characters form the hex value.
while (i < bytes.length) {
if (bytes(i) < 0 || bytes(i + 1) < 0) {
Contributor

My previous logic would throw an exception on non-ASCII characters. Thanks for fixing this.
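
To make the point about non-ASCII input concrete, here is a rough, self-contained sketch of a table-driven unhex loop with that guard. The names `UnhexSketch` and `unhex`, the choice to return null on invalid input, and the handling of odd-length input are assumptions for illustration, not the code merged in this PR:

```scala
// Hypothetical standalone sketch; not the Catalyst expression itself.
object UnhexSketch {
  // Maps an ASCII byte to its hex value, or -1 if it is not a hex digit.
  private val unhexDigits: Array[Byte] = {
    val table = Array.fill[Byte](128)(-1)
    (0 to 9).foreach(i => table('0' + i) = i.toByte)
    (0 to 5).foreach { i =>
      table('A' + i) = (10 + i).toByte
      table('a' + i) = (10 + i).toByte
    }
    table
  }

  /** Decodes hex characters into bytes; returns null on any invalid character. */
  def unhex(bytes: Array[Byte]): Array[Byte] = {
    val out = new Array[Byte]((bytes.length + 1) >> 1)
    var i = 0
    // Odd-length input is treated as if it had a leading '0'.
    if ((bytes.length & 1) != 0) {
      // A negative byte is non-ASCII and would be an invalid table index.
      if (bytes(0) < 0) return null
      val v = unhexDigits(bytes(0))
      if (v == -1) return null
      out(0) = v
      i = 1
    }
    // Two characters form one output byte.
    while (i < bytes.length) {
      if (bytes(i) < 0 || bytes(i + 1) < 0) return null
      val hi = unhexDigits(bytes(i))
      val lo = unhexDigits(bytes(i + 1))
      if (hi == -1 || lo == -1) return null
      out((i + 1) >> 1) = ((hi << 4) | lo).toByte
      i += 2
    }
    out
  }
}
```

Without the `< 0` checks, a byte from a multi-byte UTF-8 sequence would be used as a negative array index and raise `ArrayIndexOutOfBoundsException` rather than yielding null.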

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36362 has finished for PR 7181 at commit c3af78c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Unhex(child: Expression)

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Build triggered.

@AmplabJenkins

Build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36416 has started for PR 7181 at commit 25156b7.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36416 has finished for PR 7181 at commit 25156b7.

  • This patch fails to build.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class Unhex(child: Expression)

@AmplabJenkins

Build finished. Test FAILed.

@AmplabJenkins

Build triggered.

@AmplabJenkins

Build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36423 has started for PR 7181 at commit b31fc9a.

Davies Liu added 2 commits July 2, 2015 13:12
Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36429 has started for PR 7181 at commit f032fbb.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36423 has finished for PR 7181 at commit b31fc9a.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class Unhex(child: Expression)

@AmplabJenkins

Build finished. Test PASSed.

@SparkQA

SparkQA commented Jul 2, 2015

Test build #36429 has finished for PR 7181 at commit f032fbb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class CreateNamedStruct(children: Seq[Expression]) extends Expression
    • case class Unhex(child: Expression)
    • case class ShiftLeft(left: Expression, right: Expression) extends BinaryExpression
    • case class ShiftRight(left: Expression, right: Expression) extends BinaryExpression

@AmplabJenkins

Merged build finished. Test PASSed.

@rxin
Contributor

rxin commented Jul 2, 2015

Thanks - merging this in master.

@asfgit asfgit closed this in fc7aebd Jul 2, 2015
@rxin
Contributor

rxin commented Jul 2, 2015

I reverted the patch since it broke the build. Can you submit a new PR?

@JoshRosen
Contributor

Was the build break caused by racing merges?

@rxin
Contributor

rxin commented Jul 2, 2015

Not 100% sure.
