C++: Model indirect data flow through external functions #19151

Closed
@fbesler

I'm trying to model indirect flow through external functions, similar to the example below. I want to follow the taint from taint_source to taint_sink through process_taint and process_taint2. Therefore, I need to model that the taint of the data member is propagated to the outputs.

struct S {
  int data;
  int dummy;
};

int taint_source();

S* process_taint(S* input);
void process_taint2(S* input, S* output);

void taint_sink(int tainted);

void df1() {
  S* s = new S();
  S* t = new S();  // allocate the output object so process_taint2 can write to it

  s->data = taint_source();

  process_taint2(s, t);

  taint_sink(t->data);
  taint_sink(t->dummy);
}

void df2() {
  S* u = new S();
  S* v;

  u->data = taint_source();

  v = process_taint(u);

  taint_sink(v->data);
  taint_sink(v->dummy);
}

int main(int argc, char* argv[]) {
  df1();
  df2();

  return 0;
}
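
The external functions are only declared above. To make the intended behaviour concrete, here is a minimal sketch of what they might do, assuming they copy only the data member and leave dummy alone. The function bodies, the marker value 42 and the helper check_only_data_propagates are hypothetical and not part of the real external library.

```cpp
#include <cassert>

struct S {
  int data;
  int dummy;
};

// Hypothetical stand-in for the external source: returns a marker value.
int taint_source() { return 42; }

// Hypothetical body: copies only `data` into a fresh object.
// `new S()` value-initializes the object, so `dummy` stays 0.
S* process_taint(S* input) {
  S* output = new S();
  output->data = input->data;
  return output;
}

// Hypothetical body: copies only `data` from *input to *output.
void process_taint2(S* input, S* output) {
  output->data = input->data;
}

// Checks that the marker value reaches `data` but never `dummy`.
void check_only_data_propagates() {
  S in{taint_source(), 7};
  S out{0, 0};
  process_taint2(&in, &out);
  assert(out.data == 42 && out.dummy == 0);

  S* ret = process_taint(&in);
  assert(ret->data == 42 && ret->dummy == 0);
  delete ret;
}
```

Under this assumption, only the taint_sink calls that receive a data member should be reported; the calls that receive dummy should not.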

I use this basic query for the example:

/**
 * @kind path-problem
 */

import cpp
import semmle.code.cpp.dataflow.new.TaintTracking
import MyFlow::PathGraph

module MyFlowConf implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source.asExpr() = any(Call c | c.getTarget().hasName("taint_source"))
  }

  predicate isSink(DataFlow::Node sink) {
    sink.asExpr() = any(Call c | c.getTarget().hasName("taint_sink")).getAnArgument()
  }
}

module MyFlow = TaintTracking::Global<MyFlowConf>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink, source, sink, "Flow"

Using the following models-as-data (MaD) extension, I get the correct taint flow:

extensions:
  - addsTo:
      pack: codeql/cpp-all
      extensible: summaryModel
    data:
      - ["", "", False, "process_taint", "", "", "Argument[*0].Field[data]", "ReturnValue[*].Field[data]", "taint", "manual"]
      - ["", "", False, "process_taint2", "", "", "Argument[*0].Field[data]", "Argument[*1].Field[data]", "taint", "manual"]

However, as far as I can see, this would require copying the rule for every member of S using the same pattern. For JS there seems to be an AnyMember keyword, but it does not appear to be available for C++. Is there a wildcard that specifies the same field/access path in both the input and the output?
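
To illustrate what such a wildcard would need to express: every field of the input flows to the field of the same name in the output, and to no other field. A minimal C++ sketch of a function with that behaviour, where copy_all is a hypothetical name and not part of the example above:

```cpp
#include <cassert>

struct S {
  int data;
  int dummy;
};

// Hypothetical external function: a whole-object copy, so each member of
// *input ends up in the member of the same name in *output.
void copy_all(const S* input, S* output) {
  *output = *input;
}

// Checks that each field lands in the matching field and is not mixed up.
void check_field_preserving_copy() {
  S in{1, 2};
  S out{0, 0};
  copy_all(&in, &out);
  assert(out.data == 1);
  assert(out.dummy == 2);
}
```

Modelling copy_all field by field would need one row per member of S, which is the duplication the question is trying to avoid.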

Alternatively, I tried to model it as an additional flow step like this:

predicate isAdditionalFlowStep2(DataFlow::Node source, DataFlow::Node sink) {
  exists(Assignment a |
    a.getLValue().getAChild() = sink.asIndirectExpr()
    and a.getRValue() = source.asExpr()
    and source.asExpr().(Call).getTarget().hasName("taint_source")
  )
  or
  exists(Call c |
    c.getTarget().hasName("process_taint")
    and sink.asIndirectExpr() = c
    and source.asIndirectExpr() = c.getAnArgument()
  )
  or
  exists(Call c |
    c.getTarget().hasName("process_taint2")
    and source.asIndirectExpr() = c.getArgument(0)
    and sink.asDefiningArgument() = c.getArgument(1)
  )
  or
  source.asIndirectExpr() = sink.asExpr().(FieldAccess).getQualifier()
}

This finds the flows but also produces false positives: dummy is reported as flowing into taint_sink even though it should not be tainted.

How can I model the propagation of indirect data flow with the correct access path for external functions?
