Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support variable substitution in SQL statements #6504

Merged
merged 4 commits into from
Oct 27, 2020

Conversation

spena
Copy link
Member

@spena spena commented Oct 22, 2020

Description

This is part of KLIP-38 - Variable Substitution

It implements variable substitution on SQL statements, both the Server-side and CLI-side. Variables are referenced in statements with the ${} characters. For instance:

For the CLI-side, the scope is per-session. The following example will create the s1 stream that reads from topic t1.

ksql> define topicName = 't1';
ksql> define streamName = 's1';
ksql> create stream ${streamName} with (kafka_topic='${topicName}');

For the server-side, the variable scope is seen per-request. The following example will create the s1 stream that reads from topic t1.
Request 1 will create stream s1 from topic t1:

{ 
  statement: "
    define topicName = 't1';
    create stream s1 with (kafka_topic='${topicName}');
  "
}

Request 2 will FAIL because ${topicName} is not a valid topic name

{ 
  statement: "
    create stream s1 with (kafka_topic='${topicName}');
  "
}

Variable substitution can be enabled or disabled with the ksql.variable.substitution.enable configuration. It can be set in the ksql-server.properties or can be overridden from the CLI. By default is true.

Implementation:

A new VariableSubstitutor class is created that replaces variables wrapped in ${} from a parsed statement (ParsedStatement). This VarialbeSubstitutor is called from the CLI to replace variables before sending to the Server; and from the KsqlEngine.prepare() too in case there are variables defined in the KSQL request.

During the CLI session, variables are substituted before sending the request to the Server. I initially thought of sending the DEFINE statements, but it was more work to identify the variables and create a new request with them. Doing it in the CLI saves some time on parsing.

I didn't do the substitution in the AstBuilder because there were several places I needed to change to provide substitution. If a new syntax was added in the future, the developer would have to know about variables and call the right visit or ParserUtil methods to get the variables substituted. Also, I needed to rewrite the statement text, which required more work in the builder for doing that. Having a simple VariableSubstitutor that returns the replaced statement text was simple.

Because substitution happens by replacing text, I had to make sure of escape the single-quote characters to avoid using variables for SQL injection. The escape is done in the ParserUtil when called by the VariableSubstitutor.

Testing done

Added unit tests
Verified manually

ksql> show streams;

 Stream Name         | Kafka Topic                 | Key Format | Value Format | Windowed 
------------------------------------------------------------------------------------------
 KSQL_PROCESSING_LOG | default_ksql_processing_log | KAFKA      | JSON         | false    
------------------------------------------------------------------------------------------
ksql> define topicName = 't1';
ksql> define streamName = 's1';
ksql> create stream ${streamName} (id int) with (kafka_topic='${topicName}', value_format='json', replicas=1, partitions=1);

 Message        
----------------
 Stream created 
----------------
sql> show streams;

 Stream Name         | Kafka Topic                 | Key Format | Value Format | Windowed 
------------------------------------------------------------------------------------------
 KSQL_PROCESSING_LOG | default_ksql_processing_log | KAFKA      | JSON         | false    
 S1                  | t1                          | KAFKA      | JSON         | false    
------------------------------------------------------------------------------------------

Server side

$ curl -X "POST" "http://localhost:8088/ksql" \
>      -H "Accept: application/vnd.ksql.v1+json" \
>      -d $'{
>   "ksql": "define topicName = \'t1\';define streamName = \'s3\';create stream ${streamName} (id int) with (kafka_topic=\'${topicName}\', value_format=\'json\');",
>   "streamsProperties": {}
> }' | jq .
[
  {
    "@type": "currentStatus",
    "statementText": "CREATE STREAM S3 (ID INTEGER) WITH (KAFKA_TOPIC='t1', KEY_FORMAT='KAFKA', VALUE_FORMAT='JSON');",
    "commandId": "stream/`S3`/create",
    "commandStatus": {
      "status": "SUCCESS",
      "message": "Stream created",
      "queryId": null
    },
    "commandSequenceNumber": 36,
    "warnings": []
  }
]

Reviewer checklist

  • Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
  • Ensure relevant issues are linked (description should include text like "Fixes #")

@spena spena requested a review from a team October 22, 2020 21:37
@spena spena added this to the 0.14.0 milestone Oct 22, 2020
@spena spena changed the title Sql variable subst feat: support variable substitution in SQL statements Oct 22, 2020
throw new ParseFailedException(
"Illegal argument at " + location.map(NodeLocation::toString).orElse("?")
+ ". Identifier names may only contain alphanumeric values, '_' "
+ "or not starting with '@'. Got: '" + value + "'",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text is a bit confusing. You mean: "Identifier names cannot start with '@' and may only contain alphanumeric values and '_'. "

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That sounds much better.
Done

}

@Test
public void shouldThrowOnSQLInjection() {
Copy link
Member

@vpapavas vpapavas Oct 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works (identifying sql injection) because variable names must not contain spaces, etc? Basically, only alphanumeric and '_'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable names do not contain spaces, but values can. However, it all depends on how to reference the values in the statement. All values must be enclosed into single-quotes, but single-quotes are not part of the value in the statement.

For instance:

// This should throw an error because ${id} has spaces.
ksql> define id = '5 and id != 5';
ksql> SELECT * FROM s1 WHERE id = ${id};

// This is a valid statement. The query will print | 5 and id != 5 | in column 1 because it is referenced inside the single-quotes, which defines the column as String.
ksql> define id = '5 and id != 5';
ksql> SELECT '${id}' FROM ...

@vpapavas
Copy link
Member

Where is the define topicName statement defined? Was that in a previous PR? Is define tableName not supported?

@spena
Copy link
Member Author

spena commented Oct 26, 2020

@vpapavas which define topicName and define tableName you're asking? topicName and tableName are variables names a user can use, they're not reserved names.

@vpapavas
Copy link
Member

Ah ok, I got confused. I thought there are reserved words.

@vpapavas vpapavas self-assigned this Oct 27, 2020
Copy link
Member

@vpapavas vpapavas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@spena spena merged commit e185c1f into confluentinc:master Oct 27, 2020
@spena
Copy link
Member Author

spena commented Oct 27, 2020

Thanks @vpapavas for the review

@spena spena deleted the sql_variable_subst branch October 27, 2020 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants