Escape discovered avro field name if it is a ksql lexer token literal #(1043) #1050


@gjimher gjimher commented Mar 26, 2018

In CTAS and CSAS statements that use the Avro format, field names are discovered in the Schema Registry and added as table elements when the SQL statement is generated. If a field name is a token literal (e.g. FROM, GROUP, CREATE, ...), the generated statement is not parsed correctly and the creation fails.

This PR fixes that by escaping the field name with backquotes (`). The quotes are added only when the field name is a token literal.

Issue #1043
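
As a rough illustration of the escaping described above, the sketch below wraps a field name in backquotes only when it matches a reserved word, comparing case-insensitively and keeping the original spelling. The class name `FieldNameEscaper`, the hand-written `RESERVED` set, and the `columnList` helper are hypothetical stand-ins for this example; they are not the code in this PR, which works against the KSQL lexer's token literals.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class FieldNameEscaper {

  // Hypothetical, hand-written subset of reserved words (an assumption for this sketch).
  private static final Set<String> RESERVED =
      new HashSet<>(Arrays.asList("FROM", "GROUP", "CREATE", "HAVING"));

  // Wraps the name in backquotes only if it is a reserved word (case-insensitive match).
  static String escape(final String fieldName) {
    return RESERVED.contains(fieldName.toUpperCase(Locale.ROOT))
        ? "`" + fieldName + "`"
        : fieldName;
  }

  // Renders "name TYPE" pairs as a comma-separated column list.
  static String columnList(final Map<String, String> fields) {
    return fields.entrySet().stream()
        .map(e -> escape(e.getKey()) + " " + e.getValue())
        .collect(Collectors.joining(", "));
  }

  public static void main(final String[] args) {
    final Map<String, String> fields = new LinkedHashMap<>();
    fields.put("GROUP", "STRING");
    fields.put("NOLIT", "STRING");
    fields.put("Having", "STRING");
    // prints: `GROUP` STRING, NOLIT STRING, `Having` STRING
    System.out.println(columnList(fields));
  }
}
```

Quoting only the names that collide with a token keeps the generated statement readable for the common case where no field name is reserved.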

ghost commented Mar 26, 2018

It looks like @gjimher hasn't signed our Contributor License Agreement, yet.

The purpose of a CLA is to ensure that the guardian of a project's outputs has the necessary ownership or grants of rights over all contributions to allow them to distribute under the chosen licence.
Wikipedia

You can read and sign our full Contributor License Agreement here.

Once you've signed reply with [clabot:check] to prove it.

Appreciation of efforts,

clabot

gjimher commented Mar 26, 2018

[clabot:check]

ghost commented Mar 26, 2018

@confluentinc It looks like @gjimher just signed our Contributor License Agreement. 👍

Always at your service,

clabot

@apurvam apurvam requested a review from a team March 26, 2018 15:25

@big-andy-coates big-andy-coates left a comment

Hi @gjimher - thanks for the PR!

There are a few nits from me. I also think this PR would benefit from some more testing. It may be worth adding a suitable test to QueryTranslationTest, for example, to show the full statement parsing working as expected.

I'll leave it to others to comment on the overall approach, as I'm not that familiar with this code.


public class FormatterUtil {

private static final HashSet<String> literalsSet = new HashSet<String>();

Switch the type of literalsSet to be the interface Set<String>, rather than the implementation.

if (literal == null) {
continue;
}
if (literal.startsWith("'") && literal.endsWith("'")) {

don't literals explicitly start and end with a single quote?


private static final HashSet<String> literalsSet = new HashSet<String>();

static {

@big-andy-coates big-andy-coates Mar 27, 2018

You can avoid the need for a static block, and make the set immutable, by using streams:

final Set<String> literalsSet = ImmutableSet.copyOf(
  IntStream.range(0, SqlBaseLexer.VOCABULARY.getMaxTokenType())
        .mapToObj(SqlBaseLexer.VOCABULARY::getLiteralName)
        .filter(Objects::nonNull)
        .map(l -> l.substring(1, l.length() - 1))
        .collect(Collectors.toSet())
);
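
To make the suggestion concrete without depending on the generated lexer, here is a minimal, self-contained sketch of the same pattern. The `LITERAL_NAMES` array is a hypothetical stand-in for what a lexer vocabulary might return (quoted literal names, with `null` for tokens that have no literal form), and `Collections.unmodifiableSet` is swapped in for Guava's `ImmutableSet` so that the example needs only the JDK.

```java
import java.util.Collections;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class LiteralSetSketch {

  // Hypothetical stand-in for the lexer vocabulary: index = token type,
  // value = quoted literal name, or null when the token has no literal form.
  private static final String[] LITERAL_NAMES = {null, "'SELECT'", "'FROM'", null, "'GROUP'"};

  // Built once in a field initializer, so no static block is needed.
  static final Set<String> LITERALS = Collections.unmodifiableSet(
      IntStream.range(0, LITERAL_NAMES.length)
          .mapToObj(i -> LITERAL_NAMES[i])
          .filter(Objects::nonNull)                 // skip tokens without a literal
          .map(l -> l.substring(1, l.length() - 1)) // strip the surrounding quotes
          .collect(Collectors.toSet()));

  public static void main(final String[] args) {
    System.out.println(LITERALS.contains("GROUP"));   // prints: true
    System.out.println(LITERALS.contains("'GROUP'")); // prints: false
  }
}
```

Because the set is wrapped as unmodifiable, a later call such as `LITERALS.add(...)` throws `UnsupportedOperationException`.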

gjimher commented Apr 4, 2018

I pushed some improvements:

  • the style changes you suggested
  • a fix for escaping lowercase literals
  • a test that checks the generated SQL is parsable

@dguy dguy left a comment

Thanks @gjimher, left a couple of comments

import java.util.stream.IntStream;
import com.google.common.collect.ImmutableSet;

public class FormatterUtil {

Do we really need a class for this? It is only used in SqlFormatter, so can we just include it there?

Contributor Author

I think it will also be used by ExpressionFormatter to correct new bugs similar to this one.

Contributor

For now I'd just include it in SqlFormatter. If the same functionality is needed elsewhere, we can pull it out later.

Contributor Author

ok

new StringLiteral("topic_test")
));
String sql = SqlFormatter.formatSql(createStream);
Assert.assertTrue("literal escaping failure", sql.contains("`GROUP` STRING"));

assertThat(sql, containsString("`GROUP` STRING"));
Same for the assertions below.

Assert.assertTrue("not literal escaping failure", sql.contains("NOLIT STRING"));
Assert.assertTrue("lowercase literal escaping failure", sql.contains("`Having` STRING"));
List<Statement> statements = new KsqlParser().buildAst(sql, MetaStoreFixture.getNewMetaStore());
Assert.assertTrue("formatted sql parsing error", statements != null && ! statements.isEmpty());

assertFalse(.., statements.isEmpty())


public class SqlFormatterTest {
@Test
public void testFormatSql() throws Exception {

can remove throws Exception

@gjimher gjimher force-pushed the avro-escape-infered-schema-field-names branch from 27ab97b to ef19d3b Compare April 5, 2018 16:58
@gjimher gjimher force-pushed the avro-escape-infered-schema-field-names branch from ef19d3b to 2443b7d Compare April 9, 2018 09:06

@dguy dguy left a comment

Thanks @gjimher, LGTM
@big-andy-coates ?

@big-andy-coates big-andy-coates left a comment

LGTM! Thanks for the PR!

@big-andy-coates big-andy-coates merged commit a42d7e8 into confluentinc:master Apr 26, 2018