Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-6764 Implement Binary and Hexadecimal literals #1490

Closed
wants to merge 12 commits into from

Conversation

stoty
Copy link
Contributor

@stoty stoty commented Aug 19, 2022

also use VarBinaryFormatter for BINARY type

Copy link
Contributor

@gjacoby126 gjacoby126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions:

  1. This doesn't seem to have a test covering the new feature
  2. For numeric literal types, Phoenix does some custom serialization so that they sort lexicographically within HBase so that we can do range scans and order by clauses properly. Do we need to do something like this here, or will PBinary take care of that for us?
  3. What if a user wants to treat a hexadecimal value as a numeric type rather than a fixed byte array?

@stoty
Copy link
Contributor Author

stoty commented Aug 23, 2022

  1. Forgot to add it to the commit, I've pushed it now.

  2. Yes, PBinary should take care of that. This only affects the parser, otherwise everything should behave as if you specified a query parameter and set it with setObject(byte[]).

  3. The standard BINMARY->NUMERIC coercion mechanism should apply. I will add a few tests for that.

@stoty
Copy link
Contributor Author

stoty commented Aug 23, 2022

I've checked the SQL2011 standard

On page 188 it says

  1. It is implementation-defined whether the declared type of a is a fixed-length binary
    string type, a variable-length binary string type, or a binary large object string type.

So using the BINARY is OK.

Phoenix won't coerce binary types to numeric types, so you cannot treat a binary string literal as a number.

@stoty
Copy link
Contributor Author

stoty commented Aug 23, 2022

I have also found some more requirements in the standrar regarding the literal handling (multi-line literals and whitespace handling)

I am currently fighting ANTLR to implement those.

@stoty
Copy link
Contributor Author

stoty commented Aug 25, 2022

I've updated the patch with a more flexible implementation.
As we don't enable backtracking, implementing the multi-line literals was a major pain, ended up quite hacky.

The query dumping BINARY format change is probably going to break a lot of tests, I'll have to do another pass on those.

@stoty
Copy link
Contributor Author

stoty commented Aug 26, 2022

There two open issues:

  1. The new litarals don't work with BINARY ARRAYS. I suspect that this is not a new problem, so I'm not terribly concerned about that. This may be related to the fact the parser generates an ARRAY of VARBINARY nodes, which is not a supported

  2. The way we convert binaries to strings is inconsitent:
    On one hand, we have the Formatters that are used for VARBINARY (and now BINARY) results, on the other we have the toStringLiteral() method in the P*BBinary classes, which are used for Statement.toString(), and for BINARY_ARRAY (which doesn't have an explicit Formatter set)

Before the patch, the formatter generated 'AABB', which the toStringLiteral() methods returned '[170, 187]'

With my patch, the formatter is the same, only it's now also set for BINARY, while previously it was only set for VARBINARY.
I've changed the toStringLiteral() formatter to return a valid binary string literal, X'AABB' , so now stmt.toString() generates valid SQL.

As BINARY_ARRAY doesn't have a formatter, it uses the toStringLiteral() format, this is not changed by the patch.

The complicate things, it seems that Oracle and Mysql simply returns something like new String(byte[]) for getString(), so the Phoenix behaviour is unique.

I don't think that this is show-stopper either, the patch doesn't make any of those problems any worse.



/** Mask for bit 0 of a byte. */
private static final int BIT_0 = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we have 0x01 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Becuse that's how it's written in commons-codec where I copied this from.

But you're right, it's more consistent to use 0x01 here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@stoty
Copy link
Contributor Author

stoty commented Aug 30, 2022

  1. The new litarals don't work with BINARY ARRAYS. I suspect that this is not a new problem, so I'm not terribly concerned about that. This may be related to the fact the parser generates an ARRAY of VARBINARY nodes, which is not a supported

I have changed the literals to be interpreted as BINARY instead of VARBINARY.
We can still coerce them to varbinary, but this allows using them for BINARY ARRAYs.

I also had to change the size estimation code.
While the BINARY type is fixed length, we don't know / pass its length when trying to estimate its size.

stmt.execute(ddl);
conn.commit();

//FIXME why does not this work ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: comment can be removed if this has been fixed

@stoty
Copy link
Contributor Author

stoty commented Sep 8, 2022

I have fixed the space handling, and also adopted the ITs to expect the new literal format.

Copy link
Contributor

@gjacoby126 gjacoby126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks @stoty

Copy link
Contributor

@gjacoby126 gjacoby126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed a test failure in the new BinaryLiteralStringIT:

[ERROR] org.apache.phoenix.end2end.BinaryStringLiteralIT.testBinary Time elapsed: 2.254 s <<< ERROR! org.apache.phoenix.exception.PhoenixParserException: ERROR 602 (42P00): Syntax error. Missing "LPAREN" at line 1, column 32. at org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33) at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111) at org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1961) at org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:2044) at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:2142) at org.apache.phoenix.end2end.BinaryStringLiteralIT.insertRow(BinaryStringLiteralIT.java:53) at org.apache.phoenix.end2end.BinaryStringLiteralIT.testBinary(BinaryStringLiteralIT.java:71) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.phoenix.SystemExitRule$1.evaluate(SystemExitRule.java:40) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.junit.runners.Suite.runChild(Suite.java:128) at org.junit.runners.Suite.runChild(Suite.java:27) at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at org.junit.runners.ParentRunner.run(ParentRunner.java:413) at org.apache.maven.surefire.junitcore.JUnitCore.run(JUnitCore.java:55) at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.createRequestAndRun(JUnitCoreWrapper.java:137) at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.executeLazy(JUnitCoreWrapper.java:119) at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:87) at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.execute(JUnitCoreWrapper.java:75) at org.apache.maven.surefire.junitcore.JUnitCoreProvider.invoke(JUnitCoreProvider.java:158) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:456) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:169) at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:595) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:581) Caused by: MissingTokenException(inserted [@-1,0:0='<missing LPAREN>',<105>,1:31] at x) at org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:382) at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) at org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:8328) at org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:8142) at org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:8079) at org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:8044) at org.apache.phoenix.parse.PhoenixSQLParser.one_or_more_expressions(PhoenixSQLParser.java:10358) at org.apache.phoenix.parse.PhoenixSQLParser.upsert_node(PhoenixSQLParser.java:5987) at org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:883) at org.apache.phoenix.parse.PhoenixSQLParser.statement(PhoenixSQLParser.java:532) at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:108) ... 49 more

@stoty
Copy link
Contributor Author

stoty commented Sep 9, 2022

Thanks, I haven't fixed the extra space in the other test method.
Pushed the fix now.

Copy link
Contributor

@gjacoby126 gjacoby126 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks @stoty

@stoty stoty closed this Sep 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants