Fix: Fix and enhance SQL injection prevention #40

Merged
merged 11 commits into from Sep 13, 2022

oscar60310
Contributor

@oscar60310 oscar60310 commented Aug 24, 2022

What's fixed

  1. Since Feature: SQL Injection Prevention (Flow Simulation) #23, we replace input parameters with parameterized placeholders such as $1, $2, etc. But template logic needs the raw value, for example:

    select * from user
    {% if context.params.age %} where age > {{ context.params.age }} {% endif %}

    The template above always renders the where clause because context.params.age is "$1", a non-empty string that is always truthy. This PR fixes the issue by replacing values only when needed.

  2. We sent unused bindings with sub-queries. For example:

    {% req user %}
    select id from users where userName = {{ context.params.name }}
    {% endreq %}
    select * from groups where groupName = {{ context.params.groupName }}

    For the first query, we sent select id from users where userName = $1 with the binding ['someUserName', 'someGroupName'], which might cause a driver failure (the number of bindings does not match the number of placeholders in the query). This PR sends only the required parameters as bindings.
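
To make the second fix concrete, here is a minimal sketch of how bindings could be scoped to each query. `QueryParameterizer` and its methods are hypothetical names used for illustration only, not the actual vulcan-sql classes, and the sketch is synchronous for brevity.

```typescript
// Hypothetical sketch: each query owns its parameterizer, so it only
// collects the values that are actually rendered inside that query.
class QueryParameterizer {
  private index = 1;
  private bindings = new Map<string, unknown>();

  // Register a value and return its placeholder ($1, $2, ...).
  parameterize(value: unknown): string {
    const id = `$${this.index++}`;
    this.bindings.set(id, value);
    return id;
  }

  getBinding(): Map<string, unknown> {
    return this.bindings;
  }
}

const params = { name: 'someUserName', groupName: 'someGroupName' };

// The sub-query ({% req user %} ... {% endreq %}) uses its own parameterizer.
const subQuery = new QueryParameterizer();
const subSql = `select id from users where userName = ${subQuery.parameterize(params.name)}`;

// The main query uses another one, so the two binding lists never mix.
const mainQuery = new QueryParameterizer();
const mainSql = `select * from groups where groupName = ${mainQuery.parameterize(params.groupName)}`;

console.log(subSql, Array.from(subQuery.getBinding().values()));
console.log(mainSql, Array.from(mainQuery.getBinding().values()));
```

With this shape, the sub-query is sent with a single binding and the driver never sees a value the query does not use.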

Enhancement

  1. Parameterize all values, including sub-query results, user attributes, etc.

    {% req user %}
    select id from users where userName = {{ context.params.name }}
    {% endreq %}
    select * from groups where userId = {{ user.value()[0].id }}; --- this value will be parameterized too.

  2. Provide a raw filter that forces the raw value to be rendered.

    {{ context.params.name }} --- $1
    {{ context.params.name | raw }} --- someName
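
One way to picture both the first fix and the raw filter: keep the raw value in the template context and turn it into a placeholder only at output time. The sketch below works under that assumption; `renderOutput` and `renderRaw` are hypothetical helpers, not vulcan-sql APIs.

```typescript
// Hypothetical sketch, not the vulcan-sql implementation.
const bindings: unknown[] = [];

// Output path ({{ value }}): store the value and emit a placeholder.
function renderOutput(value: unknown): string {
  bindings.push(value);
  return `$${bindings.length}`;
}

// Raw path ({{ value | raw }}): emit the value itself, with no binding.
function renderRaw(value: unknown): string {
  return String(value);
}

const context = {
  params: { age: undefined as number | undefined, name: 'someName' },
};

// Template logic sees the raw value, so a missing age skips the where clause.
let sql = 'select * from user';
if (context.params.age) {
  sql += ` where age > ${renderOutput(context.params.age)}`;
}

const parameterized = renderOutput(context.params.name); // "$1"
const raw = renderRaw(context.params.name); // "someName"
console.log(sql, parameterized, raw, bindings);
```

Because the raw path inlines the value into the SQL text, it gives up the protection that parameterization provides and is only appropriate for values that are already trusted.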

@oscar60310 oscar60310 force-pushed the chore/vulcan-lab branch 2 times, most recently from 5e131f8 to e01221b Compare August 25, 2022 03:38
@oscar60310 oscar60310 force-pushed the fix/sampler-params branch 4 times, most recently from 0bbedc7 to 7b8451f Compare August 30, 2022 09:20
@oscar60310 oscar60310 changed the title [WIP] Fix: Fix sampler parameters inputs [WIP] Fix: Fix and enhance SQL injection prevention Aug 31, 2022
@codecov-commenter

codecov-commenter commented Aug 31, 2022

Codecov Report

Base: 91.36% // Head: 92.40% // Increases project coverage by +1.03% 🎉

Coverage data is based on head (4285267) compared to base (dc2cf86).
Patch coverage: 90.85% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #40      +/-   ##
===========================================
+ Coverage    91.36%   92.40%   +1.03%     
===========================================
  Files          199      221      +22     
  Lines         2630     3040     +410     
  Branches       280      354      +74     
===========================================
+ Hits          2403     2809     +406     
+ Misses         177      169       -8     
- Partials        50       62      +12     
Flag Coverage Δ
build 94.87% <100.00%> (+0.63%) ⬆️
core 93.13% <94.26%> (+2.01%) ⬆️
extension-dbt 97.43% <ø> (ø)
extension-debug-tools 98.11% <98.11%> (?)
integration-testing 95.00% <100.00%> (-1.43%) ⬇️
serve 88.78% <88.19%> (+0.67%) ⬆️
test-utility ∅ <ø> (∅)


Impacted Files Coverage Δ
packages/build/src/containers/types.ts 100.00% <ø> (ø)
...generator/spec-generator/oas3/oas3SpecGenerator.ts 97.33% <ø> (-0.04%) ⬇️
packages/build/src/models/index.ts 100.00% <ø> (ø)
packages/build/src/options/index.ts 100.00% <ø> (ø)
packages/core/src/containers/types.ts 100.00% <ø> (ø)
packages/core/src/lib/data-source/pg.ts 42.85% <0.00%> (+22.85%) ⬆️
packages/core/src/models/extensions/dataSource.ts 100.00% <ø> (ø)
...ackages/core/src/models/extensions/filterRunner.ts 100.00% <ø> (ø)
packages/core/src/models/extensions/tagRunner.ts 95.23% <ø> (ø)
packages/serve/src/containers/types.ts 100.00% <ø> (ø)
... and 78 more


@oscar60310 oscar60310 marked this pull request as ready for review August 31, 2022 08:04
@oscar60310 oscar60310 changed the title [WIP] Fix: Fix and enhance SQL injection prevention Fix: Fix and enhance SQL injection prevention Aug 31, 2022
Base automatically changed from chore/vulcan-lab to develop September 6, 2022 08:28
Contributor

@kokokuo kokokuo left a comment

Apart from a few questions (some parts I couldn't figure out...), everything else LGTM. Awesome 👍

Comment on lines +10 to +11
/** The index (starts from 1) of parameters, it's useful to generate parameter id like $1, $2 ...etc. */
parameterIndex: number;
Contributor

Should we also consider non-indexed parameters? BigQuery, for example, seems to support only the ? placeholder and named placeholders, e.g. @name, @url, and so on.

Contributor Author

You can simply ignore this input argument and return whatever identifier you want:

  public async prepare() {
    return `@${someRandom()}`
  }

I can't give you a meaningful parameter name because the value might not be an input parameter, for example:

{% for val in someArray %}
{{ val }}  --- I don't know the name of the value, only know the order (index)
{% endfor %}

So for BigQuery, one alternative is to generate named identifiers from the index too:

  public async prepare({parameterIndex}) {
    return `@var${parameterIndex}`
  }
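
For comparison, here is how an index-based prepare function and a BigQuery-style named one might sit side by side. The `RequestParameter` shape below is a simplified assumption (only `parameterIndex` appears in this PR's diff; the `value` field is added for illustration), and the type name follows the `PrepareParameterFunc` rename agreed in this review.

```typescript
// Simplified, assumed shape: only parameterIndex is confirmed by the diff.
interface RequestParameter {
  parameterIndex: number; // starts from 1
  value: unknown; // hypothetical field, for illustration
}

type PrepareParameterFunc = (param: RequestParameter) => Promise<string>;

// PostgreSQL-style positional placeholders: $1, $2, ...
const preparePg: PrepareParameterFunc = async ({ parameterIndex }) =>
  `$${parameterIndex}`;

// BigQuery-style named placeholders derived from the index: @var1, @var2, ...
const prepareNamed: PrepareParameterFunc = async ({ parameterIndex }) =>
  `@var${parameterIndex}`;

async function demo(): Promise<string[]> {
  const param: RequestParameter = { parameterIndex: 2, value: 'someName' };
  return [await preparePg(param), await prepareNamed(param)];
}

demo().then((ids) => console.log(ids));
```

Deriving the name from the index keeps identifiers unique even for values that have no natural name, such as loop variables.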

Contributor

Thanks @oscar60310 for answering my question and explaining it!

bindParams: BindParameters;
pagination?: Pagination;
}

export type PrepareParameter = { (param: RequestParameter): Promise<string> };
Contributor

Could we rename it to PrepareParameterFunc or PrepareParameterAsyncFunc to make this explicit?

Otherwise it looks like an object type, so when reading prepare: PrepareParameter in the IExecutor interface it is easy to miss that it is an async function, until you notice there is no await on dataSource.prepare in the executor's prepare method.

Contributor Author

Sure, I'll update it to PrepareParameterFunc, thanks!

Contributor

Thanks so much !

Comment on lines +131 to +134
function (this: any, value: any, ...args) {
// use classic function to receive context
extension.__transform(this, value, ...args);
},
Contributor

Why did we change this part? I couldn't figure out what the effect is; could you explain it? Thanks!

Contributor Author

Nunjucks binds "context" to "this" (JS this): https://github.com/mozilla/nunjucks/blob/master/nunjucks/src/compiler.js#L499

I'd like to pass the context to transform functions, but we bound the extension object to the __transform function, which makes this the extension instead of the context provided by upstream.

extension.__transform.bind(extension),

So I used a classic function here to receive this first and then pass it to the __transform function as an argument.
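
The difference between the two approaches can be reproduced in isolation. The sketch below uses a stand-in `Extension` class with illustrative method names (`transform`, `whoAmI`); it is not the vulcan-sql extension API, only a demonstration of how `bind` and a classic function treat the caller's `this`.

```typescript
// Stand-in for the extension object; names are illustrative only.
class Extension {
  name = 'extension';

  // Receives the caller's context explicitly as the first argument.
  transform(context: { name: string }, value: string): string {
    return `${this.name} handled ${value} with ${context.name}`;
  }

  // Relies on `this`, so it reports whatever it was bound to.
  whoAmI(this: { name: string }): string {
    return this.name;
  }
}

const extension = new Extension();
const templateContext = { name: 'context' };

// A bound function ignores the `this` supplied by the caller.
const bound = extension.whoAmI.bind(extension);
const seenByBound = bound.call(templateContext);

// A classic function receives the caller's `this` and forwards it.
const wrapper = function (this: { name: string }, value: string): string {
  return extension.transform(this, value);
};
const seenByWrapper = wrapper.call(templateContext, 'id');

console.log(seenByBound); // extension
console.log(seenByWrapper); // extension handled id with context
```

An arrow function would not work for the wrapper either, because arrow functions capture `this` lexically instead of taking it from the caller.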

// the value is real param data
[identifier: string]: string;
};
export type BindParameters = Map<string, string>;

export type IdentifierParameters = {
Contributor

We could also remove it, since it seems it will no longer be used after this PR.

Contributor Author

Removed, thanks!

Contributor

Thanks for your help, @oscar60310!

import { RAW_FILTER_NAME, SANITIZER_NAME } from './constants';

@VulcanInternalExtension()
export class SanitizerBuilder extends FilterBuilder {
Contributor

Could you explain what SanitizerBuilder is responsible for, why we need to add the sanitizer, and why we put the sanitize symbol there? I couldn't figure out the workflow and the domain context for sanitize, and the description doesn't introduce it. Thanks so much!

Contributor Author

This builder automatically adds the sanitizer filter after "lookup"-like nodes, e.g. LookupVal, FunctionCall, etc. (as the other filter builders do), in order to prevent SQL injection.

{{ context.params.id }} -> {{ context.params.id | sanitizer }}

I've added these comments to the class too.

Contributor

Thanks @oscar60310 for answering my question

}

if (this.isNodeNeedToBeSanitize(child)) {
if (!parentHasOutputNode && !(node instanceof nunjucks.nodes.Output))
Contributor

Does it mean we only add sanitize when the node is the output or the parent has an output node? But why do we add the sanitize after the output?

Contributor Author

Does it mean we only add sanitize when the node is the output or the parent has an output node?

Yes.

This is the ordinary AST of the template {{ context.params.id }}:

[Image: Vucal-sql-injection-des drawio]

When we traverse the AST from the root:

  1. We first meet the Output node.
  2. We then find that its child is a LookupVal.
  3. We wrap the LookupVal node in the sanitizer filter (via the replace function): {{ context.params.id | sanitizer }}
    [Image: Vucal-sql-injection-des drawio]

This is what this builder mainly does.

But why do we add the sanitize after the output?

Output nodes are meant to "render" strings; in our case, that means generating the SQL to execute. These values should be sanitized (parameterized) before execution, so we apply the filter before the output is rendered.
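
The traversal described above can be sketched on a toy AST. The node types below are minimal stand-ins, not the real nunjucks node classes, and `addSanitizer` is a hypothetical function; it only illustrates the rule "wrap lookups below an Output node, leave template logic alone" as I read it from this thread.

```typescript
// Minimal stand-in node types; the real builder works on nunjucks AST nodes.
type AstNode =
  | { type: 'Output'; children: AstNode[] }
  | { type: 'LookupVal'; path: string }
  | { type: 'Filter'; name: string; target: AstNode }
  | { type: 'If'; cond: AstNode; body: AstNode[] };

// Wrap lookup nodes in a sanitizer filter, but only below an Output node.
function addSanitizer(node: AstNode, underOutput = false): AstNode {
  switch (node.type) {
    case 'Output':
      return {
        ...node,
        children: node.children.map((child) => addSanitizer(child, true)),
      };
    case 'LookupVal':
      return underOutput
        ? { type: 'Filter', name: 'sanitizer', target: node }
        : node;
    case 'If':
      // The condition is template logic, so it keeps the raw lookup.
      return {
        ...node,
        cond: addSanitizer(node.cond, false),
        body: node.body.map((child) => addSanitizer(child, underOutput)),
      };
    case 'Filter':
      // Already filtered; this sketch leaves it untouched.
      return node;
  }
}

// {% if context.params.age %}{{ context.params.age }}{% endif %}
const ast: AstNode = {
  type: 'If',
  cond: { type: 'LookupVal', path: 'context.params.age' },
  body: [
    {
      type: 'Output',
      children: [{ type: 'LookupVal', path: 'context.params.age' }],
    },
  ],
};

const result = addSanitizer(ast);
console.log(JSON.stringify(result, null, 2));
```

In the result, the lookup inside the Output node is wrapped in the sanitizer filter while the lookup in the if condition stays raw, which matches the first fix in the PR description.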

Contributor

Thanks @oscar60310 for explaining in detail so that I could understand it 😃 !!

query = query
.split(/\r?\n/)
.filter((line) => line.trim().length > 0)
.join('\n');
// Get bind real parameters and pass to data query builder for data source used.
- const binds = (context.ctx || {})['_paramBinds'] || {};
+ const binds = parameterizer.getBinding();
Contributor

We call parameterizer.getBinding(), but do the binds already contain the idToValueMapping data from when SanitizerRunner ran await input.parameterize(parameterizer)?

I'm really confused about each filter's execution order, and about when transform is called versus run ~ " ~

Contributor Author

We call parameterizer.getBinding(), but do the binds already contain the idToValueMapping data from when SanitizerRunner ran await input.parameterize(parameterizer)?

Yes! The magic happened because of the shared context. Please have a look at the image below.

I'm really confused about each filter's execution order, and about when transform is called versus run ~ " ~

We're using DFS to compile and execute the nodes, so we run the req tag first, then the filter.

Comment on lines +32 to +34
// parameterizer from parent, we should set it back after rendered our context.
const parentParameterizer = context.lookup(PARAMETERIZER_VAR_NAME);
context.setVariable(PARAMETERIZER_VAR_NAME, parameterizer);
Contributor

Sorry, I don't understand why we need to set the parameterizer back on the context when it comes from the parent. Could you give more information? Thanks

Contributor Author

Because the parent and its children share the same context. If the children don't set it back, the parent will lose its parameterizer because it will have been overridden.
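
The save-and-restore pattern can be sketched as follows. `SharedContext`, the `Parameterizer` interface, and `runWithParameterizer` are hypothetical stand-ins, and the value of `PARAMETERIZER_VAR_NAME` is assumed; only the `lookup` and `setVariable` calls mirror the diff above. Using try/finally is a choice made for this sketch so that the parent's value is restored even if rendering throws.

```typescript
// Assumed value; the real constant lives in the vulcan-sql source.
const PARAMETERIZER_VAR_NAME = 'parameterizer';

// Hypothetical stand-in for the context shared by parent and child tags.
class SharedContext {
  private vars = new Map<string, unknown>();

  lookup<T>(name: string): T | undefined {
    return this.vars.get(name) as T | undefined;
  }

  setVariable(name: string, value: unknown): void {
    this.vars.set(name, value);
  }
}

interface Parameterizer {
  label: string;
}

// Run a nested query with its own parameterizer, then restore the parent's.
function runWithParameterizer<T>(
  context: SharedContext,
  own: Parameterizer,
  render: () => T
): T {
  const parent = context.lookup<Parameterizer>(PARAMETERIZER_VAR_NAME);
  context.setVariable(PARAMETERIZER_VAR_NAME, own);
  try {
    return render();
  } finally {
    // Without this, the parent would keep seeing the child's parameterizer.
    context.setVariable(PARAMETERIZER_VAR_NAME, parent);
  }
}

const context = new SharedContext();
context.setVariable(PARAMETERIZER_VAR_NAME, { label: 'main' });

const seenInside = runWithParameterizer(context, { label: 'sub' }, () =>
  context.lookup<Parameterizer>(PARAMETERIZER_VAR_NAME)?.label
);
const seenAfter = context.lookup<Parameterizer>(PARAMETERIZER_VAR_NAME)?.label;

console.log(seenInside, seenAfter); // sub main
```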

Comment on lines +24 to +27
// Parameterizer should be set by req tag runner
const parameterizer = context.lookup<Parameterizer>(PARAMETERIZER_VAR_NAME);
if (!parameterizer) throw new Error(`No parameterizer found`);
return await input.parameterize(parameterizer);
Contributor

I'm curious: the parameterizer is only set by the req tag runner, and the req tag seems to be the one in the sample you showed in the description:

{% req user %}
    select id from users where userName = {{ context.params.name }}
{% endreq %}

but I didn't see where the normal {{ ... }} case is handled, e.g.:

select * from user where name = {{ context.params.name }}

Or does the req tag runner also include the case?

Contributor Author

Or does the req tag runner also include the case?

Yes, we'll wrap the template with a builder if there is no main builder.
https://github.com/Canner/vulcan-sql/blob/develop/packages/core/src/lib/template-engine/built-in-extensions/query-builder/reqTagBuilder.ts#L146-L158

@oscar60310
Contributor Author

oscar60310 commented Sep 13, 2022

Hi @kokokuo, I've drawn a diagram to describe what happens with our filters and tags.
[Image: vulcan-ext]

@oscar60310
Contributor Author

Hi @kokokuo all issues have been fixed.

@kokokuo
Contributor

kokokuo commented Sep 13, 2022

Hi @kokokuo , I've drawn a diagram to describe what's happened with our filters and tags. vulcan-ext

Thanks for drawing the detailed step-by-step picture and explaining the flow to me. It's really helpful, thanks so much 😃

@kokokuo kokokuo merged commit b727780 into develop Sep 13, 2022
@kokokuo kokokuo deleted the fix/sampler-params branch September 13, 2022 10:41