Acceptance tests #166

Closed
harpocrates opened this issue Jan 13, 2017 · 9 comments

Comments

@harpocrates
Contributor

We will communicate our expectations about their data to TA1 providers via acceptance tests (basically unit tests that enforce the syntactic/semantic validity of their data, with respect to our understanding of it). These are to be delivered using the adapt-tester, and should be constantly refined until April.

This issue will track progress on this front and provide a place for keeping note of things suggested.

So far, we have come up with the following ideas of things to test as a bare minimum:

  • parsing
  • ground truth tests and assertions
  • TA1 specific tests
  • one of every CDM type (as makes sense per TA1) - aka the "Zen cheeseburger"
  • deduplication

Later on, once CDM14 is out, we will be able to leverage "epoch" also.

@harpocrates
Contributor Author

harpocrates commented Jan 13, 2017

WRT the Zen cheeseburger, these are the CDM statement types that are missing for each TA1 (from the engagement data):

  • SOURCE_FREEBSD_DTRACE_CADETS: MemoryObject, ProvenanceTagNode, RegistryKeyObject, SrcSinkObject, TagEntity, Value
  • SOURCE_ANDROID_JAVA_CLEARSCOPE: MemoryObject, RegistryKeyObject, TagEntity, Value
  • SOURCE_WINDOWS_DIFT_FAROS: MemoryObject, RegistryKeyObject, TagEntity, Value
  • SOURCE_WINDOWS_FIVEDIRECTIONS: MemoryObject, TagEntity, Value
  • SOURCE_LINUX_THEIA: Value
  • SOURCE_LINUX_AUDIT_TRACE: Value

Obviously no one has implemented anything about Value, so it seems a bit foolish to have tests for that. That said, I'm a bit surprised SOURCE_WINDOWS_DIFT_FAROS doesn't have RegistryKeyObject. Should we include a test for this anyways?

Also, it appears there are a bunch more sources for which we have no samples. What should the default be for those? Expect statements of every type?
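
A minimal sketch of how such a "one of every type" check could be computed, in Python. The record shape (a `(source, type_name)` pair per statement), the `missing_types` helper, and the expectation table are all assumptions made for illustration; they are not taken from the adapt-tester, and the expected types per source would have to be agreed with each TA1.

```python
from collections import defaultdict


def missing_types(statements, expected_by_source):
    """Return {source: sorted expected type names that were never observed}.

    `statements` is an iterable of (source, type_name) pairs (hypothetical shape).
    Sources with nothing missing are left out of the result.
    """
    seen = defaultdict(set)
    for source, type_name in statements:
        seen[source].add(type_name)
    return {
        source: sorted(expected - seen[source])
        for source, expected in expected_by_source.items()
        if expected - seen[source]
    }


# Illustrative expectations only; the real per-source list is still undecided.
expected = {
    "SOURCE_LINUX_THEIA": {"MemoryObject", "SrcSinkObject"},
    "SOURCE_WINDOWS_DIFT_FAROS": {"MemoryObject", "RegistryKeyObject"},
}
statements = [
    ("SOURCE_LINUX_THEIA", "MemoryObject"),
    ("SOURCE_LINUX_THEIA", "SrcSinkObject"),
    ("SOURCE_WINDOWS_DIFT_FAROS", "MemoryObject"),
]
report = missing_types(statements, expected)
```

Keeping the expectations in a per-source table makes it easy to leave out types such as Value that nobody emits yet, instead of failing every source on them.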

@davearcher
Contributor

davearcher commented Jan 14, 2017 via email

@rrwright
Contributor

For now, I think we should immediately fail when detecting a source for which we haven't planned tests. I think Dave is right that CDM13 started off being a bit more aspirational. But I don't think most teams are going to add that in CDM14. So let's just make an early test that fails (and aborts the remainder?) unless the source data claims to be one of the types we plan to see.
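
One way this fail-fast gate could look, sketched in Python. The source names are the six listed earlier in this thread; `UnplannedSourceError`, `check_source`, and `run_suite` are hypothetical names, and whether the remaining tests should really abort is still the open question above.

```python
# Sources we have engagement samples for (taken from the list in this thread).
PLANNED_SOURCES = frozenset({
    "SOURCE_FREEBSD_DTRACE_CADETS",
    "SOURCE_ANDROID_JAVA_CLEARSCOPE",
    "SOURCE_WINDOWS_DIFT_FAROS",
    "SOURCE_WINDOWS_FIVEDIRECTIONS",
    "SOURCE_LINUX_THEIA",
    "SOURCE_LINUX_AUDIT_TRACE",
})


class UnplannedSourceError(Exception):
    """Raised when the data claims a source we have not planned tests for."""


def check_source(source):
    if source not in PLANNED_SOURCES:
        raise UnplannedSourceError(f"no tests planned for source {source!r}")


def run_suite(source, tests):
    """Run the gate first; if it raises, none of the other tests are executed."""
    check_source(source)
    return [test() for test in tests]
```

Raising before any other test runs gives the TA1 provider a single, unambiguous failure rather than a long list of follow-on errors.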

@davearcher
Contributor

davearcher commented Jan 14, 2017 via email

@harpocrates
Contributor Author

Consolidating from email, here are more (syntactic) tests to do eventually

A few below are labeled “HOLD”, because (as of Jeff’s recognition this morning that CDM is slightly broken) we need to let a fix settle out. The others should be good to go.

  • the UUID field of a PTN must not match that of any other PTN
  • the srcSinkObject field of a PTN must refer to a subject or object type dependent on the type of event pointed to by the event field, as follows
    • bind, connect, accept, sendmsg, recvfrom, sendto, recv events: srcSinkObject = NetFlow
    • open, close: srcSinkObject = File or NetFlow
    • read, write, unlink, create-object, dup, fcntl, mmap, modify attributes, truncate, update: srcSinkObject = File
    • mprotect, shm: srcSinkObject = Memory
    • change-principal, clone, create-thread, execute, fork, signal, unit, wait, exit: srcSinkObject = Subject, with subjectType = Process or Thread
  • the subject field of a PTN must refer to a subject, with subjectType = Process, Thread, or Unit
  • (HOLD this one - CDM currently under definition) the event field of a PTN must refer to an event object, where eventType is one of the above event types
  • (HOLD also) the subject field of a PTN must match the subject field of the event referred to in the event field
  • (HOLD also) the srcSinkObject field of a PTN must match the predicateObject field of the referenced event
  • the prevTagId field of a PTN must refer to another PTN, or be zero (the "no provenance" value)
  • (HOLD also) if the subject field of a PTN A is the same as the subject field of the PTN B referenced in A's prevTagId field, then the sequence number of the event referred to in A must be less than the sequence number of the event referred to in B
  • exactly one of the prevTagId field and the tagIds field must be non-null (that is, one of the two must be populated, but only one)
  • if opcode of a PTN is null, then tagIds must be empty (and thus prevTagId must not be null)
  • if opcode of a PTN is non-null, then it must be UNION
  • if PTN.prevTagId is not null, then the cTag and iTag values of the PTN must match those of the PTN referred to in prevTagId
  • if PTN.prevTagId is zero, then cTag and iTag must be populated (if the TA1 provider claims to support confidentiality and trustworthiness tagging)
  • (HOLD also) events of type read, write, sendmsg, sendto, recvfrom, recv, dup, mmap must all be referenced as the relevant event of some PTN
  • Each non-null parameter (Value record) in an event must contain at least 1 runLengthTuple
  • Each runLengthTuple in a Value record must have both a non-zero natural number (for length) and the UUID of a valid PTN
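
A few of the rules above (the event-type to srcSinkObject mapping, the prevTagId/tagIds exclusivity, and the opcode rules) can be sketched as a single check in Python. This is an illustration under stated assumptions, not the adapt-tester code: a PTN is modelled as a plain dict, the event type and the kind of the referenced srcSinkObject are passed in already resolved, event names are spelled as hypothetical lowercase keys, an empty `tagIds` list is treated the same as null, and the subjectType refinement for Subject is left out.

```python
# Event type -> allowed kinds of srcSinkObject, transcribed from the list above.
# The key spellings (e.g. "modify-attributes") are assumptions for this sketch.
_GROUPS = [
    ("bind connect accept sendmsg recvfrom sendto recv", {"NetFlow"}),
    ("open close", {"File", "NetFlow"}),
    ("read write unlink create-object dup fcntl mmap "
     "modify-attributes truncate update", {"File"}),
    ("mprotect shm", {"Memory"}),
    ("change-principal clone create-thread execute fork "
     "signal unit wait exit", {"Subject"}),
]
ALLOWED_SRC_SINK = {
    event: kinds for names, kinds in _GROUPS for event in names.split()
}


def ptn_violations(ptn, event_type, src_sink_kind):
    """Return a list of rule violations for one PTN (empty list means it passes)."""
    problems = []

    allowed = ALLOWED_SRC_SINK.get(event_type)
    if allowed is not None and src_sink_kind not in allowed:
        problems.append(f"{event_type}: srcSinkObject must be one of {sorted(allowed)}")

    # prevTagId == 0 is the "no provenance" value, so it counts as populated.
    has_prev = ptn.get("prevTagId") is not None
    has_tags = bool(ptn.get("tagIds"))
    if has_prev == has_tags:
        problems.append("exactly one of prevTagId and tagIds must be populated")

    opcode = ptn.get("opcode")
    if opcode is None and has_tags:
        problems.append("tagIds must be empty when opcode is null")
    if opcode is not None and opcode != "UNION":
        problems.append("a non-null opcode must be UNION")
    return problems


good = {"prevTagId": 0, "tagIds": None, "opcode": None}
bad = {"prevTagId": None, "tagIds": [7], "opcode": None}
```

Here `ptn_violations(good, "read", "File")` returns an empty list, while `ptn_violations(bad, "connect", "File")` reports both the wrong srcSinkObject kind and the non-empty `tagIds` with a null opcode. The rules marked HOLD are deliberately not covered.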

@rrwright
Contributor

@harpocrates We need to add a test to ensure UUID uniqueness (ironically)—to ensure that only one node ever claims to have any given UUID. There is talk by TA1s about updating data by issuing a new event that reuses an old UUID (and so it would replace earlier data). We need to shut that down every way possible. So let's encode it in a test ASAP.
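
A rough sketch of such a uniqueness check in Python, assuming each node is a dict with a `uuid` key (a hypothetical shape, and the `duplicate_uuids` name is made up for this example):

```python
from collections import Counter


def duplicate_uuids(nodes):
    """Return the sorted UUIDs that are claimed by more than one node."""
    counts = Counter(node["uuid"] for node in nodes)
    return sorted(uuid for uuid, n in counts.items() if n > 1)


nodes = [
    {"uuid": "a1", "type": "Subject"},
    {"uuid": "b2", "type": "Event"},
    {"uuid": "a1", "type": "Event"},  # reuses an earlier UUID: should fail
]
offenders = duplicate_uuids(nodes)
```

A test built on this would fail whenever `offenders` is non-empty. Counting in one pass keeps every UUID in memory, which may or may not be acceptable for full-size traces.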

@rrwright
Contributor

PS - @harpocrates after that's done, you should close this issue. :-)

@harpocrates
Contributor Author

Fixed (not super efficiently) with 1a25b8c.

@rrwright
Contributor

@harpocrates did you deploy these tests yet? I don't see them being run in the version I just ran a moment ago.


3 participants