C++: Model indirect data flow through external functions

I'm trying to model indirect flow through external functions, similar to the example below. I want to follow the taint from `taint_source` to `taint_sink` through `process_taint` and `process_taint2`. Therefore, I need to model that the taint of the `data` member is propagated to the outputs.

```c++
struct S {
  int data;
  int dummy;
};

int taint_source();

S* process_taint(S* input);
void process_taint2(S* input, S* output);

void taint_sink(int tainted);

void df1() {
  S* s = new S();
  S* t = new S();

  s->data = taint_source();

  process_taint2(s, t);

  taint_sink(t->data);
  taint_sink(t->dummy);
}

void df2() {
  S* u = new S();
  S* v;

  u->data = taint_source();

  v = process_taint(u);

  taint_sink(v->data);
  taint_sink(v->dummy);
}

int main(int argc, char* argv[]) {
  df1();
  df2();

  return 0;
}
```
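To make the intended semantics concrete, the external functions can be thought of as behaving roughly like the stubs below. This is only a sketch under an assumption: the real implementations are not visible to the analysis, and the bodies here are hypothetical stand-ins that copy `data` and leave `dummy` untouched, which is the behaviour I want the model to express.

```c++
struct S {
  int data;
  int dummy;
};

// Hypothetical stand-in for the external function: only `data` is
// propagated to the returned object; `dummy` is value-initialized to 0.
S* process_taint(S* input) {
  S* output = new S();
  output->data = input->data;
  return output;
}

// Hypothetical stand-in for the external function: only `data` is
// propagated to the caller-provided output; `dummy` is left as it was.
void process_taint2(S* input, S* output) {
  output->data = input->data;
}
```

With these semantics, `taint_sink(t->data)` and `taint_sink(v->data)` should be reported, while the calls on `dummy` should not.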

I use this basic query for the example:
```codeql
/**
 * @kind path-problem
 */

import cpp
import semmle.code.cpp.dataflow.new.TaintTracking
import MyFlow::PathGraph

module MyFlowConf implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr() = any(Call c | c.getTarget().hasName("taint_source"))
  }

  predicate isSink(DataFlow::Node sink) {
    sink.asExpr() = any(Call c | c.getTarget().hasName("taint_sink")).getAnArgument()
  }
}

module MyFlow = TaintTracking::Global<MyFlowConf>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink, source, sink, "Flow"
```

Using the following models-as-data (MaD) extension, I get the correct taint flow:
```yml
extensions:
  - addsTo:
      pack: codeql/cpp-all
      extensible: summaryModel
    data:
      - ["", "", False, "process_taint", "", "", "Argument[*0].Field[data]", "ReturnValue[*].Field[data]", "taint", "manual"]
      - ["", "", False, "process_taint2", "", "", "Argument[*0].Field[data]", "Argument[*1].Field[data]", "taint", "manual"]
```
However, as far as I can see, this would require copying the rule for every member of `S` using the same pattern. For JS there seems to be an `AnyMember` keyword, but it does not appear to be available for C++. Is there a wildcard to specify the same field/access path in both the input and the output?

Alternatively, I tried to model it as an additional flow step like this:
```codeql
predicate isAdditionalFlowStep2(DataFlow::Node source, DataFlow::Node sink) {
  exists(Assignment a |
    a.getLValue().getAChild() = sink.asIndirectExpr()
    and a.getRValue() = source.asExpr()
    and source.asExpr().(Call).getTarget().hasName("taint_source")
  )
  or
  exists(Call c |
    c.getTarget().hasName("process_taint")
    and sink.asIndirectExpr() = c
    and source.asIndirectExpr() = c.getAnArgument()
  )
  or
  exists(Call c |
    c.getTarget().hasName("process_taint2")
    and source.asIndirectExpr() = c.getArgument(0)
    and sink.asDefiningArgument() = c.getArgument(1)
  )
  or
  source.asIndirectExpr() = sink.asExpr().(FieldAccess).getQualifier()
}

```
This finds the flows but also produces false positives: `dummy` is reported as flowing into `taint_sink`, although it should not be tainted.

How can I model the propagation of indirect data flow with the correct access path for external functions?

C++: Model indirect data flow through external functions #19151