@vectorijk
Contributor

No description provided.

@SparkQA

SparkQA commented Dec 30, 2015

Test build #48484 has finished for PR 10527 at commit 9edeba7.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class AesDecrypt(left: Expression, right: Expression)

@vectorijk
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 31, 2015

Test build #48511 has finished for PR 10527 at commit 9edeba7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class AesDecrypt(left: Expression, right: Expression)

@vectorijk vectorijk force-pushed the spark-12567 branch 2 times, most recently from 27693b4 to 0558bf8 on January 5, 2016 08:07
@SparkQA

SparkQA commented Jan 5, 2016

Test build #48748 has finished for PR 10527 at commit 0558bf8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class AesEncrypt(left: Expression, right: Expression)
    • case class AesDecrypt(left: Expression, right: Expression)

@SparkQA

SparkQA commented Jan 5, 2016

Test build #48744 has finished for PR 10527 at commit 27693b4.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • case class AesDecrypt(left: Expression, right: Expression)

@vectorijk
Contributor Author

cc @cloud-fan @marmbrus @davies

Contributor

No need for with Serializable; a case class is already Serializable.
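A quick way to see why the explicit mixin is redundant, using a hypothetical stand-in case class (AesDecryptLike is illustrative only, not the class in this patch): the Scala compiler already makes every case class extend Serializable.

```scala
// Hypothetical stand-in for the expression node; not the class from this patch.
case class AesDecryptLike(input: String, key: String)

object SerializableCheck {
  // True when the value implements java.io.Serializable.
  // Case classes get this automatically, so `with Serializable` adds nothing.
  def isSerializable(x: Any): Boolean = x.isInstanceOf[java.io.Serializable]
}
```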

Contributor Author

done.

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48890 has finished for PR 10527 at commit ed38390.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

So when an exception happens, the codegen version will throw it, but the interpreted version will return null?

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48943 has finished for PR 10527 at commit 9476822.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Why does only this test need an extra installation?

Contributor Author

Because the Oracle JRE/JDK supports AES-128 out of the box, while AES-192 and AES-256 are only supported if the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files are installed. In this test case, the key size is 256 bits. JCE is needed not only for this test but for any test that uses a 192- or 256-bit key. I commented out this test by default in case it fails.
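A minimal sketch, assuming a plain JDK and no Spark classes, of how a test could detect which AES key sizes the installed crypto policy allows, instead of being commented out. The object name AesKeySupport is hypothetical; Cipher.getMaxAllowedKeyLength is a standard javax.crypto method that returns Integer.MAX_VALUE when the policy is unlimited.

```scala
import javax.crypto.Cipher

object AesKeySupport {
  // Key sizes (in bits) defined by AES itself.
  val AesKeySizes: Seq[Int] = Seq(128, 192, 256)

  // AES key sizes the current JRE's cryptographic policy permits.
  // A restricted policy typically caps AES at 128 bits; an unlimited
  // policy (e.g. with the JCE policy files installed) reports Integer.MAX_VALUE.
  def supportedKeySizes(): Seq[Int] = {
    val max = Cipher.getMaxAllowedKeyLength("AES")
    AesKeySizes.filter(_ <= max)
  }
}
```

In a ScalaTest suite, the 256-bit case could then be guarded with assume(AesKeySupport.supportedKeySizes().contains(256)), which cancels rather than fails the test when the policy files are missing.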

Contributor Author

ping @cloud-fan?

Contributor

I'm not sure if we can introduce a feature that may require extra installation... cc @rxin @marmbrus

Contributor

I think it might be okay, as long as the failure is clear and it doesn't break other things when it's not installed. What happens when it's not installed?

Contributor Author

@marmbrus When the Java Cryptography Extension (JCE) is not installed on the Oracle JDK, it returns null and does not break anything else. Also, we don't need to install this extension on OpenJDK; it works fine there.

Contributor Author

@cloud-fan
Contributor

retest this please.

@SparkQA

SparkQA commented Jan 27, 2016

Test build #50212 has finished for PR 10527 at commit 9476822.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vectorijk
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 28, 2016

Test build #50243 has finished for PR 10527 at commit 9476822.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Contributor

we should follow the style of other classes in this file, i.e. use double quotation marks and \n

@vectorijk
Contributor Author

@cloud-fan Okay, addressed comments.

Contributor

I'm not quite sure how we should deal with errors: return null or throw an exception? cc @marmbrus

Contributor

I checked some similar expressions: Encode/Decode throw an exception if their charset string parameter is invalid, while Sha2 returns null if its bit length parameter is invalid.

For AesEncrypt, what should we do if its secret key parameter is invalid? @vectorijk, can any other errors be thrown during execution?

cc @rxin , what's the general rule for this kind of stuff?

Contributor

It's not during construction but during execution. The parameters may be non-foldable, so we can only know whether they are valid during execution. BTW, the encryption process may fail and throw an exception; should we silently ignore it and return null, or rethrow the exception and fail the execution?

Contributor

I'd say in this case we should rethrow the exception to fail everything.

Contributor Author

@cloud-fan What I implemented follows the way Sha2 does it.
I think if the secret key is invalid, it should return null, just as Sha2 does. No other errors should be thrown during execution.

Contributor

yea, you followed Sha2, but Sha2 may also be wrong. In this case we think rethrow exception makes more sense, and we can also fix Sha2 later.

Contributor Author

@cloud-fan Okay, I will try to rethrow the exception as Encode/Decode does.
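For comparison, a plain-JDK sketch of the two error policies debated here (AesErrorPolicy and its methods are hypothetical names, not the patch's code): encrypt lets the InvalidKeyException from Cipher.init propagate and fail the caller, while encryptOrNull swallows it the way Sha2 returns null. It assumes the AES/ECB/PKCS5Padding transformation, which is what a bare "AES" cipher typically resolves to in the default JDK provider; ECB is used only to keep the sketch small and is not a recommendation.

```scala
import java.security.InvalidKeyException
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

object AesErrorPolicy {
  // Assumed transformation, spelled out explicitly.
  private val Transformation = "AES/ECB/PKCS5Padding"

  // Rethrowing policy: a key of an invalid length (or one the crypto policy
  // forbids) makes Cipher.init throw InvalidKeyException, which propagates.
  def encrypt(input: Array[Byte], key: Array[Byte]): Array[Byte] = {
    val cipher = Cipher.getInstance(Transformation)
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"))
    cipher.doFinal(input)
  }

  // Null-returning policy (Sha2-style): the key error is silently swallowed.
  def encryptOrNull(input: Array[Byte], key: Array[Byte]): Array[Byte] =
    try encrypt(input, key)
    catch { case _: InvalidKeyException => null }
}
```

With a 10-byte (80-bit) key, encrypt throws InvalidKeyException and encryptOrNull returns null; the rethrowing variant is the one the thread settles on.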

@SparkQA

SparkQA commented Jan 31, 2016

Test build #50456 has finished for PR 10527 at commit 3a2510b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 31, 2016

Test build #50457 has finished for PR 10527 at commit 04a14cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51235 has finished for PR 10527 at commit 6bc1b63.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


// key length (80 bits) is not one of the permitted values (128, 192 or 256 bits)
intercept[java.security.InvalidKeyException] {
evaluate(AesDecrypt(UnBase64(Literal("y6Ss+zCYObpCbgfWfyNWTw==")),
Contributor

This only tests the interpreted version; we also need to test the codegen version, i.e. create an unsafe projection using AesDecrypt and execute it.

Contributor Author

@cloud-fan I added codegen version tests that create an unsafe projection, but I'm not sure whether I'm doing it right. Could you take another look?
Meanwhile, I looked at checkEvalutionWithUnsafeProjection() in ExpressionEvalHelper.scala. Does it do the same thing you described for testing the codegen version?

@SparkQA

SparkQA commented Feb 18, 2016

Test build #51476 has finished for PR 10527 at commit dc76057.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 19, 2016

Test build #51508 has finished for PR 10527 at commit 8e2960c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(instance1.apply(null).getString(0) === "ABC")

val instance2 = UnsafeProjection.create(expr2 :: Nil)
assert(instance2.apply(null).getString(0) === "")
Contributor

The above two are not needed. Normal cases are already covered by checkEvaluation; we only need to test the error case.

Contributor Author

Okay, done.

evaluate(expr3)
}
intercept[java.security.InvalidKeyException] {
UnsafeProjection.create(expr3::Nil).apply(null)
Contributor

nit: expr3 :: Nil, we should add spaces around ::

Contributor Author

done

@cloud-fan
Contributor

LGTM overall

@SparkQA

SparkQA commented Feb 19, 2016

Test build #51549 has finished for PR 10527 at commit 60b9da7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 19, 2016

Test build #51551 has finished for PR 10527 at commit 0856fb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Feb 20, 2016

LGTM, merging this into master, thanks!

* @group misc_funcs
* @since 2.0.0
*/
def aes_encrypt(input: Column, key: Column): Column = withExpr {
Contributor

key should just be an int

Contributor

The key is not an integer; it's binary. See the examples.

Contributor Author

I also think so. The example is just one such case; the key could also be abcdef1234567890.

Contributor

then we should probably just take a string.

the thing is we want to facilitate the most common cases, e.g. if 99% of the time people are passing in keys directly rather than relying on some other columns' values, we should just let them pass the literals.

Contributor

Some of the functions take literals, some don't. In 2.0, should we clean up all the mess?

Contributor

What are the other examples?

Contributor

Actually, we do have a mess. Generally we have 3 kinds of parameters: Column, column name string, and literal. For example, pow takes 2 parameters, and we have 8 overloaded versions of it, covering all combinations of these 3 kinds of parameters except both literals. However, some functions, like pmod, only have the version where both parameters are Columns. Some functions, like sha2, only have the (Column, literal) combination.

Personally, I think for something like sha2 and this one, where one parameter will be a literal 99% of the time, we should just provide the (Column, literal) combination, i.e. def aes_encrypt(input: Column, key: String). For something like pow, we only need to provide the version where both parameters are Columns, as a column name string and a literal are easy to convert to a Column with col() and lit().
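To make the proposed API shape concrete, here is a toy sketch with hypothetical stand-ins for Column, col and lit (ToyFunctions, ColumnRef, Literal and AesEncryptExpr are illustrative names, not Spark's classes): the general both-Column overload does the work, and the literal-key overload just wraps its argument with lit().

```scala
// Toy expression tree standing in for Spark's Column; illustration only.
sealed trait Column
final case class ColumnRef(name: String) extends Column
final case class Literal(value: Any) extends Column
final case class AesEncryptExpr(input: Column, key: Column) extends Column

object ToyFunctions {
  def col(name: String): Column = ColumnRef(name)
  def lit(value: Any): Column = Literal(value)

  // General form: both arguments are columns.
  def aes_encrypt(input: Column, key: Column): Column = AesEncryptExpr(input, key)

  // Convenience overload for the common case of a literal key;
  // it simply delegates to the general form.
  def aes_encrypt(input: Column, key: String): Column = aes_encrypt(input, lit(key))
}
```

Keeping only these two overloads means a caller with a key stored in another column still has a path (col("k")), while the 99% literal case needs no lit() call.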

@asfgit asfgit closed this in 4f9a664 Feb 20, 2016
@rxin
Contributor

rxin commented Feb 20, 2016

@davies does this implementation even make sense? decrypt always returns a string, while encrypt always takes in a binary? The two are not symmetric.

@rxin
Contributor

rxin commented Feb 20, 2016

Sorry I'm not convinced this is correct (decrypt returning string type), and there are other issues with it. I'm just going to revert the patch, since it is unlikely other things will conflict with this.

@rxin
Contributor

rxin commented Feb 20, 2016

@vectorijk I've reverted the patch. Can you reopen the pull request and fix the return types?

@vectorijk
Contributor Author

Ok, Sure. I will do that.

@davies
Copy link
Contributor

davies commented Feb 20, 2016

I did not look into this closely, since @cloud-fan already reviewed this many rounds. sorry for the rush.

@ExpressionDescription(
usage = "_FUNC_(input, key) - Decrypts input using AES.",
extended = "> SELECT _FUNC_(UnBase64('y6Ss+zCYObpCbgfWfyNWTw=='),'1234567890123456');\n 'ABC'")
case class AesDecrypt(left: Expression, right: Expression)
Contributor

I find it confusing to have left/right here (e.g. input, key).

Let's give them a proper name, and then just override def left: Expression = input.
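A sketch of the suggested naming, using a hypothetical minimal BinaryExpr trait in place of Spark's BinaryExpression (Expr, Lit, BinaryExpr and AesDecryptExpr are illustrative names only): the constructor fields carry domain names, and left/right are derived from them.

```scala
// Toy expression tree; not Spark's Expression hierarchy.
sealed trait Expr
final case class Lit(value: String) extends Expr

// Stand-in for a binary-expression base type defined in terms of left/right.
trait BinaryExpr extends Expr {
  def left: Expr
  def right: Expr
}

// Constructor fields use domain names; left/right are just aliases for them,
// so generic code that walks left/right keeps working.
final case class AesDecryptExpr(input: Expr, key: Expr) extends BinaryExpr {
  override def left: Expr = input
  override def right: Expr = key
}
```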

Contributor

Also, if the key is a literal, I'd just do some input data type checking in analysis (override checkInputTypes) to make sure the key is in an acceptable range.
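A minimal sketch of such an analysis-time check for a literal key, written as a plain function rather than a real checkInputTypes override (AesKeyCheck and checkLiteralKey are hypothetical names). It assumes the key string is converted to bytes as UTF-8; passing Set(16) would restrict it to 128-bit keys only.

```scala
import java.nio.charset.StandardCharsets

object AesKeyCheck {
  // AES key lengths in bytes (128, 192 and 256 bits).
  val AllowedKeyBytes: Set[Int] = Set(16, 24, 32)

  // Validate a literal key once, before any row is processed:
  // Right(keyBytes) if the length is acceptable, Left(error message) otherwise.
  def checkLiteralKey(key: String, allowed: Set[Int] = AllowedKeyBytes): Either[String, Array[Byte]] = {
    val bytes = key.getBytes(StandardCharsets.UTF_8)
    if (allowed.contains(bytes.length)) Right(bytes)
    else {
      val bits = allowed.toSeq.sorted.map(_ * 8).mkString(", ")
      Left(s"AES key must be one of $bits bits, got ${bytes.length * 8} bits")
    }
  }
}
```

Failing at analysis time gives the user a clear message up front, whereas a non-literal key can still only be checked during execution.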

@rxin
Contributor

rxin commented Feb 20, 2016

I would also just remove support for 192/256, so we don't have to explain the JCE stuff.

@cloud-fan
Contributor

Sorry, I missed the symmetry part. It is symmetric underneath, except that AesDecrypt wraps the bytes into a string at the end. We should return binary directly; sorry for the rush.
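A plain-JDK sketch of the symmetric shape described here (AesSymmetric is a hypothetical name, not the patch's code): both directions take and return binary, and turning decrypted bytes back into text is left to the caller. It assumes the AES/ECB/PKCS5Padding transformation and a 16-byte key such as "1234567890123456"; ECB only keeps the sketch small.

```scala
import javax.crypto.Cipher
import javax.crypto.spec.SecretKeySpec

object AesSymmetric {
  // Assumed transformation; not a recommendation of ECB mode.
  private val Transformation = "AES/ECB/PKCS5Padding"

  // Shared helper: mode is Cipher.ENCRYPT_MODE or Cipher.DECRYPT_MODE.
  private def run(mode: Int, data: Array[Byte], key: Array[Byte]): Array[Byte] = {
    val cipher = Cipher.getInstance(Transformation)
    cipher.init(mode, new SecretKeySpec(key, "AES"))
    cipher.doFinal(data)
  }

  // Binary in, binary out.
  def encrypt(input: Array[Byte], key: Array[Byte]): Array[Byte] =
    run(Cipher.ENCRYPT_MODE, input, key)

  // Binary in, binary out: no implicit conversion to a string.
  def decrypt(input: Array[Byte], key: Array[Byte]): Array[Byte] =
    run(Cipher.DECRYPT_MODE, input, key)
}
```

A caller that wants text decodes explicitly, e.g. new String(AesSymmetric.decrypt(ct, key), StandardCharsets.UTF_8), which keeps decrypt usable for payloads that are not valid UTF-8.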

@vectorijk
Contributor Author

Thanks so much for the suggestions! I will open a new PR with the update.

@maver1ck
Contributor

@vectorijk Is this PR dead?

7 participants