Escape discovered avro field name if it is a ksql lexer token literal #(1043) #1050
It looks like @gjimher hasn't signed our Contributor License Agreement, yet.
You can read and sign our full Contributor License Agreement here. Once you've signed, reply with [clabot:check].
Appreciation of efforts, clabot
[clabot:check]
@confluentinc It looks like @gjimher just signed our Contributor License Agreement. 👍
Always at your service, clabot
Hi @gjimher - thanks for the PR!
There are a few nits from me. I also think this PR would benefit from some more testing. It may be worth adding a suitable test to QueryTranslationTest, for example, to show the full statement parsing working as expected.
I'll leave it to others to comment on the overall approach, as I'm not that familiar with this code.
public class FormatterUtil {

  private static final HashSet<String> literalsSet = new HashSet<String>();
Switch the type of literalsSet to be the interface Set<String>, rather than the implementation.
if (literal == null) {
  continue;
}
if (literal.startsWith("'") && literal.endsWith("'")) {
Don't literals always start and end with a single quote?
private static final HashSet<String> literalsSet = new HashSet<String>();

static {
You can avoid the need for a static block and make it immutable by using streams:
final Set<String> literalsSet = ImmutableSet.copyOf(
    IntStream.range(0, SqlBaseLexer.VOCABULARY.getMaxTokenType())
        .mapToObj(SqlBaseLexer.VOCABULARY::getLiteralName)
        .filter(Objects::nonNull)
        .map(l -> l.substring(1, l.length() - 1))
        .collect(Collectors.toSet())
);
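The suggestion above relies on the ANTLR-generated lexer vocabulary and on Guava. As a rough, self-contained sketch of the same idea using only the JDK, the example below assumes a hypothetical `LITERAL_NAMES` array standing in for the vocabulary, where each entry is either a single-quoted literal such as `'FROM'` or `null` for a token with no literal form. `Collections.unmodifiableSet` is swapped in for `ImmutableSet.copyOf`; the class and field names are illustrative only.

```java
import java.util.Collections;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class LiteralSetSketch {

  // Hypothetical stand-in for the lexer vocabulary: index = token type,
  // value = quoted literal name, or null when the token has no literal form.
  static final String[] LITERAL_NAMES = {null, "'FROM'", "'GROUP'", null, "'HAVING'"};

  // Built in a single expression (no static block) and wrapped so callers cannot modify it.
  static final Set<String> LITERALS = Collections.unmodifiableSet(
      IntStream.range(0, LITERAL_NAMES.length)
          .mapToObj(i -> LITERAL_NAMES[i])
          .filter(Objects::nonNull)                  // skip tokens without a literal
          .map(l -> l.substring(1, l.length() - 1))  // strip the surrounding single quotes
          .collect(Collectors.toSet()));

  public static void main(String[] args) {
    System.out.println(LITERALS.contains("GROUP"));
    System.out.println(LITERALS.contains("NOLIT"));
  }
}
```

The `substring` step assumes every non-null entry is wrapped in single quotes, which is the point the earlier review question about `startsWith("'")` is probing.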
f49ac11 to 27ab97b
I pushed some improvements:
Thanks @gjimher, left a couple of comments
import java.util.stream.IntStream;
import com.google.common.collect.ImmutableSet;

public class FormatterUtil {
Do we really need a class for this? It is only used in SqlFormatter, so can we just include it there?
I think it will also be used by ExpressionFormatter to correct new bugs similar to this one.
For now I'd just include it in SqlFormatter; if the same functionality is needed elsewhere we can pull it out later.
ok
    new StringLiteral("topic_test")
));
String sql = SqlFormatter.formatSql(createStream);
Assert.assertTrue("literal escaping failure", sql.contains("`GROUP` STRING"));
assertThat(sql, containsString("`GROUP` STRING"));
Same below.
Assert.assertTrue("not literal escaping failure", sql.contains("NOLIT STRING"));
Assert.assertTrue("lowercase literal escaping failure", sql.contains("`Having` STRING"));
List<Statement> statements = new KsqlParser().buildAst(sql, MetaStoreFixture.getNewMetaStore());
Assert.assertTrue("formatted sql parsing error", statements != null && ! statements.isEmpty());
assertFalse(.., statements.isEmpty())
public class SqlFormatterTest {
  @Test
  public void testFormatSql() throws Exception {
can remove throws Exception
27ab97b to ef19d3b
ef19d3b to 2443b7d
Thanks @gjimher, LGTM
@big-andy-coates ?
LGTM! Thanks for the PR!
In CTAS and CSAS statements with Avro format, field names are discovered in the schema registry and added as table elements to generate a SQL statement. If a field name is a token literal (e.g. FROM, GROUP, CREATE, ...), the generated statement is not parsed correctly and the creation fails.
This PR fixes that by escaping the field name with backtick (`) quotes. Quotes are added only when the field name is a token literal.
Issue #1043
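To illustrate the escaping rule described above, here is a minimal sketch rather than the PR's actual code. The `LITERALS` set is a hypothetical hand-written subset (the PR derives its set from the lexer), `escapeIfLiteral` is an illustrative name, and the case-insensitive match is an assumption inferred from the review test that expects `Having` to be quoted while keeping its original case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class FieldNameEscapeSketch {

  // Hypothetical subset of token literals, hard-coded here for illustration only.
  static final Set<String> LITERALS = Collections.unmodifiableSet(
      new HashSet<>(Arrays.asList("FROM", "GROUP", "CREATE", "HAVING")));

  // Wraps the name in backticks only when it collides with a token literal;
  // the comparison ignores case (an assumption), but the original case is preserved.
  static String escapeIfLiteral(String name) {
    return LITERALS.contains(name.toUpperCase(Locale.ROOT)) ? "`" + name + "`" : name;
  }

  public static void main(String[] args) {
    for (String field : Arrays.asList("GROUP", "NOLIT", "Having")) {
      System.out.println(escapeIfLiteral(field) + " STRING");
    }
  }
}
```

Escaping only colliding names keeps the generated statement unchanged for ordinary field names.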