Java: Identify more APIs as supported in the telemetry queries. #16297

michaelnebel · 2024-04-22T09:21:43Z

In this PR we improve the telemetry queries such that sources- and sink APIs defined in QL will be taken into account when deciding whether an API is supported or not.
To this end a couple of new modules ApiSinks and ApiSources are introduced.
In each of these modules a class of nodes are introduced where the intention is to add all sources and sinks pertaining to APIs.

Defined in Models as Data.
Defined in QL re-usable libraries and used in data flow configurations as sources or sinks for queries.

Only source- and sink definitions that are sufficiently concrete have been added (that is, sources and sinks that just "guesses" based on string pattern matching on names are not included).

This is a best effort attempt to improve the telemetry reporting.

java/ql/lib/semmle/code/java/dataflow/FlowSources.qll

java/ql/lib/semmle/code/java/dataflow/ApiSources.qll

java/ql/lib/semmle/code/java/dataflow/ApiSinks.qll

java/ql/lib/semmle/code/java/dataflow/ApiSources.qll

michaelnebel · 2024-04-24T07:03:57Z

DCA looks.
As expected the DIL size and execution (stage timings) of the telemetry queries increased. This is not surprising, especially if the compute time for some of the sources and sinks now contribute to the telemetry query stage timings.
Overall DCA reports a slowdown of around 2%, which I believe is an upper limit (as the sources and sinks needs to be computed anyways for the queries to run).

owen-mc · 2024-04-24T08:55:54Z

Will this need to be updated when a new query is added/promoted? If so, how will we remember to do that?

michaelnebel · 2024-04-24T14:36:56Z

Will this need to be updated when a new query is added/promoted? If so, how will we remember to do that?

Excellent question.
I hope we won't have to update this whenever a query is added/promoted.
(1) If a query is based purely on MaD then all these are by default when models are added (or promoted out of experimental).(2) It is only necessary to add the sources or sinks of a query in case we think accurately define a group or a specific set of sources or sinks (based on API calls). When I looked through the java data flow configurations, I tried to avoid adding sources and sinks that was defined "with a broad net" (for instance pattern matching in parts of a method name).

The query doesn't have to be 100% accurate. We just need to at least have a best effort of incorporating QL sources and sinks.

owen-mc · 2024-04-25T09:22:53Z

I have skimmed this, but not reviewed in great detail. Here is the approach that I thought you were taking, but then I saw you weren't: define an abstract class SinkNode extending DataFlow::Node, put it somewhere that's imported by everything, make all the sink definitions you are interested in extend that, and then use that in the telemetry query. The advantage of this is that it's more discoverable: people will see SinkNode in various places, maybe in queries that they are copying when they're making a new query, they'll see the QLDoc which explains what it does, and then they are more likely to use it in their new queries. The other advantage is that it's easier to add new sinks. Is there a reason why that approach wouldn't work, or that your approach is preferable?

michaelnebel · 2024-04-26T07:13:22Z

I have skimmed this, but not reviewed in great detail. Here is the approach that I thought you were taking, but then I saw you weren't: define an abstract class SinkNode extending DataFlow::Node, put it somewhere that's imported by everything, make all the sink definitions you are interested in extend that, and then use that in the telemetry query. The advantage of this is that it's more discoverable: people will see SinkNode in various places, maybe in queries that they are copying when they're making a new query, they'll see the QLDoc which explains what it does, and then they are more likely to use it in their new queries. The other advantage is that it's easier to add new sinks. Is there a reason why that approach wouldn't work, or that your approach is preferable?

That is a fair point and I will look into it - thank you! The reason I picked the approach here was to avoid have bidirectional imports between the Query like modules.

aschackmull · 2024-04-26T07:54:12Z

There are a couple of different design options: There needs to be a central qll file that imports all of the sink definitions in order for the telemetry queries to see them (so we can't just have query-specific files extend SinkNode and be done).
A. This can be done either as it's currently done with a single qll that references the relevant sink definitions from their query-specific qlls and forms a big union.
B. Or it could be with a diamond-shaped import pattern such that one qll defines an abstract SinkNode class, which all of the sink definitions then extend, and then there's a separate telemetry-specific qll that imports all of the files containing extensions of SinkNode. (We probably don't want all the query-file imports to be in the file containing the SinkNode definition as that would collapse the import graph to one big SCC - that's the bi-directional import pattern, which we probably don't want here.)
C. A third alternative is to refactor all the sink definitions into something like Concepts, i.e. a combined collection of definitions that all of the query files would reference. Then the SinkNode definition and all of the sink definitions could be in the same file (or import-SCC of files). That's somewhat close to the bi-directional import that I just claimed we didn't want, except that it'll be much more clear which parts of the query logic that's import-reachable from all queries and which parts that remain local to the individual queries, i.e. putting something in scope of all queries will be an explicit choice rather than an accidental consequence of a massive bidirectional import-SCC.

michaelnebel · 2024-04-26T08:10:52Z

There are a couple of different design options: There needs to be a central qll file that imports all of the sink definitions in order for the telemetry queries to see them (so we can't just have query-specific files extend SinkNode and be done). A. This can be done either as it's currently done with a single qll that references the relevant sink definitions from their query-specific qlls and forms a big union. B. Or it could be with a diamond-shaped import pattern such that one qll defines an abstract SinkNode class, which all of the sink definitions then extend, and then there's a separate telemetry-specific qll that imports all of the files containing extensions of SinkNode. (We probably don't want all the query-file imports to be in the file containing the SinkNode definition as that would collapse the import graph to one big SCC - that's the bi-directional import pattern, which we probably don't want here.) C. A third alternative is to refactor all the sink definitions into something like Concepts, i.e. a combined collection of definitions that all of the query files would reference. Then the SinkNode definition and all of the sink definitions could be in the same file (or import-SCC of files). That's somewhat close to the bi-directional import that I just claimed we didn't want, except that it'll be much more clear which parts of the query logic that's import-reachable from all queries and which parts that remain local to the individual queries, i.e. putting something in scope of all queries will be an explicit choice rather than an accidental consequence of a massive bidirectional import-SCC.

Thank you! I am already working on alternative B ;-)

java/ql/lib/semmle/code/java/dataflow/FlowSources.qll

java/ql/lib/semmle/code/java/dataflow/ApiSinks.qll

java/ql/lib/semmle/code/java/dataflow/FlowSources.qll

java/ql/lib/semmle/code/java/security/InsecureTrustManager.qll

java/ql/lib/semmle/code/java/dataflow/ApiSinks.qll

… defined sources).

… defined sinks).

java/ql/lib/semmle/code/java/dataflow/FlowSinks.qll

michaelnebel · 2024-04-29T07:15:43Z

DCA looks good.

owen-mc

This looks good. Some minor nitpicks/questions. Also I had a quick look through and thought you might also want to include these sinks:

CookieSink in java/ql/lib/semmle/code/java/security/InsecureRandomnessQuery.qll
FileCreationSink in java/ql/lib/semmle/code/java/security/TempDirLocalInformationDisclosureQuery.qll

java/ql/lib/semmle/code/java/dataflow/ApiSinks.qll

java/ql/lib/semmle/code/java/dataflow/FlowSources.qll

java/ql/lib/semmle/code/java/security/AndroidSensitiveCommunicationQuery.qll

Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com>

michaelnebel · 2024-05-03T09:18:56Z

@owen-mc : Thank you for the feedback!

You suggestion on incorporating FileCreationSink and CookieSink has been included as well. I removed Api nodes that were just used for "safe" data flow configurations (used to exclude results). I am aware that if such configs had been using the MaD models, they would not be excluded, but I just want to draw the line somewhere.
The Api Source and Sink nodes are "just" used for telemetry purposes and the telemetry queries are mostly relevant for us to discover "unsupported" APIs. If it turns out that we should consider more QL sources and sinks as supported, we can add them (but then I would like to see those as results in the top 100 of unsupported APIs).

github-actions bot added the Java label Apr 22, 2024

michaelnebel force-pushed the java/improveapitelemetry branch from 481e289 to b8dd047 Compare April 22, 2024 09:22

github-advanced-security bot found potential problems Apr 22, 2024

View reviewed changes

michaelnebel force-pushed the java/improveapitelemetry branch from b8dd047 to 67d292c Compare April 23, 2024 11:56

github-advanced-security bot found potential problems Apr 23, 2024

View reviewed changes

java/ql/lib/semmle/code/java/dataflow/ApiSinks.qll Fixed Show fixed Hide fixed

java/ql/lib/semmle/code/java/dataflow/ApiSources.qll Fixed Show fixed Hide fixed

michaelnebel force-pushed the java/improveapitelemetry branch from 67d292c to 8f940fc Compare April 23, 2024 13:13

michaelnebel changed the title ~~Java: Indentify more APIs as supported in the telemetry queries (as QL …~~ Java: Identify more APIs as supported in the telemetry queries. Apr 23, 2024

michaelnebel force-pushed the java/improveapitelemetry branch from 8f940fc to ac3f035 Compare April 24, 2024 06:59

michaelnebel added the no-change-note-required This PR does not need a change note label Apr 24, 2024

michaelnebel marked this pull request as ready for review April 24, 2024 07:04

michaelnebel requested a review from a team as a code owner April 24, 2024 07:04

michaelnebel marked this pull request as draft April 26, 2024 07:14

github-advanced-security bot found potential problems Apr 26, 2024

View reviewed changes

michaelnebel added 4 commits April 26, 2024 12:39

Java: Add test example of a supported sink defined in QL.

06f987a

Java: Identify more APIs as supported in the telemetry queries (as QL…

acb2bbb

… defined sources).

Java: Identify more APIs as supported in the telemetry queries (as QL…

9db32f4

… defined sinks).

Java: Update SupportedExternalApi expected test output.

b754706

michaelnebel force-pushed the java/improveapitelemetry branch from 79f54fb to 5f660c4 Compare April 26, 2024 10:43

github-advanced-security bot found potential problems Apr 26, 2024

View reviewed changes

java/ql/lib/semmle/code/java/dataflow/FlowSinks.qll Fixed Show fixed Hide fixed

java/ql/lib/semmle/code/java/dataflow/FlowSinks.qll Fixed Show fixed Hide fixed

michaelnebel force-pushed the java/improveapitelemetry branch 4 times, most recently from fa4912f to fe654be Compare April 26, 2024 11:40

michaelnebel force-pushed the java/improveapitelemetry branch from fe654be to 8758dec Compare April 26, 2024 11:43

Java: Improve the Api sources and sinks implementation.

f95b330

michaelnebel force-pushed the java/improveapitelemetry branch from 8758dec to f95b330 Compare April 26, 2024 12:17

michaelnebel marked this pull request as ready for review April 29, 2024 07:15

michaelnebel mentioned this pull request Apr 30, 2024

C#: Base telemetry Api Source/Sink nodes on abstract classes. #16356

Merged

owen-mc reviewed May 2, 2024

View reviewed changes

java/ql/lib/semmle/code/java/dataflow/ApiSinks.qll Show resolved Hide resolved

java/ql/lib/semmle/code/java/dataflow/FlowSources.qll Outdated Show resolved Hide resolved

java/ql/lib/semmle/code/java/security/AndroidSensitiveCommunicationQuery.qll Outdated Show resolved Hide resolved

Java: Address review comments and some other code quality improvements.

8def1c2

michaelnebel force-pushed the java/improveapitelemetry branch from 5245fd8 to 8def1c2 Compare May 3, 2024 09:12

Update java/ql/lib/semmle/code/java/dataflow/FlowSources.qll

c07bf65

Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com>

owen-mc approved these changes May 3, 2024

View reviewed changes

michaelnebel merged commit 95ff5ba into github:main May 3, 2024
15 checks passed

michaelnebel deleted the java/improveapitelemetry branch May 3, 2024 10:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Java: Identify more APIs as supported in the telemetry queries. #16297

Java: Identify more APIs as supported in the telemetry queries. #16297

michaelnebel commented Apr 22, 2024 •

edited

michaelnebel commented Apr 24, 2024

owen-mc commented Apr 24, 2024

michaelnebel commented Apr 24, 2024

owen-mc commented Apr 25, 2024

michaelnebel commented Apr 26, 2024

aschackmull commented Apr 26, 2024

michaelnebel commented Apr 26, 2024 •

edited

michaelnebel commented Apr 29, 2024

owen-mc left a comment

michaelnebel commented May 3, 2024

Java: Identify more APIs as supported in the telemetry queries. #16297

Java: Identify more APIs as supported in the telemetry queries. #16297

Conversation

michaelnebel commented Apr 22, 2024 • edited

michaelnebel commented Apr 24, 2024

owen-mc commented Apr 24, 2024

michaelnebel commented Apr 24, 2024

owen-mc commented Apr 25, 2024

michaelnebel commented Apr 26, 2024

aschackmull commented Apr 26, 2024

michaelnebel commented Apr 26, 2024 • edited

michaelnebel commented Apr 29, 2024

owen-mc left a comment

Choose a reason for hiding this comment

michaelnebel commented May 3, 2024

michaelnebel commented Apr 22, 2024 •

edited

michaelnebel commented Apr 26, 2024 •

edited