Argument level granularity in data-flow tracking to calls #729

Open
jaiverma opened this issue May 10, 2020 · 2 comments

@jaiverma

I am trying to get the data flow to a specific argument of a function call.
For example, consider the following snippet:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

int main() {
    uint32_t a = 28;
    uint32_t b = 42;
    uint32_t a_n = ntohl(a);
    uint32_t b_n = ntohl(b);

    char *buf;
    uint32_t offset = a_n + 5;

    memcpy(buf + offset, buf, b_n);
}

I want to get the data flow from calls to ntohl to the size argument of memcpy. In this example, I would expect only the flow b_n = ntohl(b) -> ... -> memcpy(buf + offset, buf, b_n).

My query is:

def networkToMemcpy() = {
    val source = cpg.call.name("ntoh(s|l|ll)")
    val sink = cpg.call.name("memcpy").argument(3)
    val paths = sink.reachableByFlows(source)
    paths.l.map(
        l => l.elements.map(
            call => (
                call.asInstanceOf[Call].name,
                call.asInstanceOf[Call].code,
                call.location.filename,
                call.location.lineNumber match {
                    case Some(n) => n.toString
                    case None => "n/a"
                }
            )
        )
    )
}

The problem is that, apart from the expected flow, I also get a flow from the identifier a_n to buf + offset, which is the first argument of memcpy.

joern> networkToMemcpy
res100: List[List[(String, String, String, String)]] = List(
  List(
    ("ntohl", "ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("<operator>.assignment", "b_n = ntohl(b)", "/mnt/c/wd/tmp/t/a.c", "10"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  ),
  List(
    ("ntohl", "ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.assignment", "a_n = ntohl(a)", "/mnt/c/wd/tmp/t/a.c", "9"),
    ("<operator>.addition", "a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.assignment", "offset = a_n + 5", "/mnt/c/wd/tmp/t/a.c", "13"),
    ("<operator>.addition", "buf + offset", "/mnt/c/wd/tmp/t/a.c", "15"),
    ("memcpy", "memcpy(buf + offset, buf, b_n)", "/mnt/c/wd/tmp/t/a.c", "15")
  )
)

It seems that the .argument(3) step in val sink = cpg.call.name("memcpy").argument(3) doesn't change the result.

Is there currently a way to get the data flow to just one argument of a call?

@fabsx00
Contributor

fabsx00 commented Sep 2, 2020

@jaiverma I recently made some changes to the data-flow engine that should address this problem. Could you retest this by any chance?

@jaiverma
Author

jaiverma commented Sep 3, 2020

Hi @fabsx00

I tested this again with Joern v1.1.1, but it still gives the same result.

def networkToMemcpy() = {
    val source = cpg.call.name("ntoh(s|l|ll)")
    val sink = cpg.call.name("memcpy").argument(3)
    val paths = sink.reachableByFlows(source)
    paths.p
}

This still returns a flow into the first argument of memcpy.

joern> networkToMemcpy
res4: List[String] = List(
  """_______________________________________________________________________
| tracked                       | lineNumber| method| file             |
|======================================================================|
| ntohl(b)                      | 10        | main  | /tmp/c/arg/main.c|
| b_n = ntohl(b)                | 10        | main  | /tmp/c/arg/main.c|
| memcpy(buf + offset, buf, b_n)| 15        | main  | /tmp/c/arg/main.c|
""",
  """_______________________________________________________________________
| tracked                       | lineNumber| method| file             |
|======================================================================|
| ntohl(a)                      | 9         | main  | /tmp/c/arg/main.c|
| a_n = ntohl(a)                | 9         | main  | /tmp/c/arg/main.c|
| a_n + 5                       | 13        | main  | /tmp/c/arg/main.c|
| offset = a_n + 5              | 13        | main  | /tmp/c/arg/main.c|
| buf + offset                  | 15        | main  | /tmp/c/arg/main.c|
| memcpy(buf + offset, buf, b_n)| 15        | main  | /tmp/c/arg/main.c|
"""
)
