Semantic Code Property Graph: specification, query language, and utilities

Code Property Graph - Specification and Tooling

Note: for first-time users, we recommend building "joern" at https://github.com/ShiftLeftSecurity/joern/ instead. It combines this repo with a C/C++ language frontend to construct a complete code analysis platform.

A Code Property Graph (CPG) is an extensible and language-agnostic representation of program code designed for incremental and distributed code analysis. This repository hosts the base specification together with a build process that generates data structure definitions for accessing the graph with different programming languages.

We are publishing the Code Property Graph specification as a suggestion for an open standard for the exchange of code in intermediate representations along with analysis results. With this goal in mind, the specification consists of a minimal base schema that can be augmented via extension schemas to enable storage of application-specific data.

Building the code

Note: for first-time users, we recommend building "joern" at https://github.com/ShiftLeftSecurity/joern/ instead. It contains a code property graph generator for C/C++, a component for querying the code property graph, as well as a few helpful examples to get started.

The build process has been verified on Linux, and building on OS X and BSD systems should work as well. It requires the Scala build tool sbt (and therefore a Java runtime), which drives every build command below.

Some binary files required for testing are managed through git-lfs. If you haven't cloned this repository yet, simply run git lfs install. If you have cloned it already, additionally run git lfs pull (from within the repository).
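The two git-lfs cases above can be sketched as a small shell helper. This is only an illustration, assuming git and the git-lfs extension are installed; the function name lfs_setup_commands is hypothetical and not part of this repository. It prints the commands instead of running them, so you can review them first.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this repository): prints the
# git-lfs commands described above for the current directory.
lfs_setup_commands() {
  # Always needed: registers the git-lfs filters so that a later clone
  # downloads the LFS-managed binaries.
  echo "git lfs install"
  # Only needed for an existing checkout: fetch the LFS objects that
  # were skipped when the repository was cloned without git-lfs.
  if [ -e .git ]; then
    echo "git lfs pull"
  fi
}
```

Run from the directory in question, lfs_setup_commands | sh would execute the printed commands.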

Additional build-time dependencies are automatically downloaded as part of the build process. To build and install into your local Maven cache, issue the command sbt publishM2.

This command will install the following artifacts:

  • codepropertygraph-VERSION.jar: Java and Scala classes to be used in combination with the ShiftLeft Tinkergraph [3].

  • codepropertygraph-protos-VERSION.jar: Java bindings for Google's Protocol Buffer definitions
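To confirm that the artifacts were installed, you can search the local Maven cache for them. The following is a minimal sketch, assuming Maven's default local repository location (~/.m2/repository); the function name list_cpg_artifacts is hypothetical, and the exact directory layout below the cache root is not assumed.

```shell
#!/bin/sh
# Hypothetical helper: list the jars that `sbt publishM2` installed.
# Uses Maven's default local repository unless another root directory
# is passed as the first argument.
list_cpg_artifacts() {
  repo_root="${1:-$HOME/.m2/repository}"
  # The name pattern matches both codepropertygraph-VERSION.jar and
  # codepropertygraph-protos-VERSION.jar.
  find "$repo_root" -type f -name 'codepropertygraph*.jar' | sort
}
```

Calling list_cpg_artifacts with no arguments after sbt publishM2 should print at least the two jars listed above.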

Creating Protocol Buffer bindings for different languages

The codepropertygraph-VERSION.jar artifact contains a Protocol Buffer definition file cpg.proto that you can use to generate your own language-specific bindings. For instance, to create C++ and Python bindings, issue the following series of commands:

sbt package
cd codepropertygraph/target
unzip codepropertygraph-*.jar cpg.proto
mkdir cpp python
protoc --cpp_out=cpp --python_out=python cpg.proto
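As a quick sanity check after the commands above, you can verify that protoc wrote the files you expect. For an input named cpg.proto, protoc's C++ generator produces cpg.pb.h and cpg.pb.cc, and its Python generator produces cpg_pb2.py. The sketch below assumes you are still in codepropertygraph/target; the function name check_bindings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that protoc produced the C++ and Python
# bindings for cpg.proto in the cpp/ and python/ output directories.
check_bindings() {
  missing=0
  for f in cpp/cpg.pb.h cpp/cpg.pb.cc python/cpg_pb2.py; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Exit status 0 means all three files exist, 1 means at least one is absent.
  return "$missing"
}
```

check_bindings prints nothing and returns 0 when all bindings are present, so it can be chained as check_bindings && echo "bindings ok".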

Base schema for the Code Property Graph

You can find the code property graph specification at:

https://docs.shiftleft.io/shiftleft/using-shiftleft-ocular/getting-started/cpg-deep-dive

Loading a codepropertygraph

Here's how you can load a cpg into ShiftLeft Tinkergraph [3] from the sbt console. The next section lists some queries you can run interactively from there.

There are some sample cpgs in this repository in the resources/cpgs directory.

Tinkergraph (in-memory reference db)

sbt semanticcpg/console
val cpg = io.shiftleft.cpgloading.CpgLoader.load("cpg.bin.zip")

Querying the cpg

Once you've loaded a cpg you can run queries, which are provided by the query-primitives subproject. Note that if you're in the sbt shell you can play with it interactively: TAB completion is your friend. Otherwise your IDE will assist.

Here are some simple traversals to get all the base nodes. Running all of these without errors is a good test to ensure that your cpg is valid:

cpg.literal.toList
cpg.file.toList
cpg.namespace.toList
cpg.types.toList
cpg.methodReturn.toList
cpg.parameter.toList
cpg.member.toList
cpg.call.toList
cpg.local.toList
cpg.identifier.toList
cpg.argument.toList
cpg.typeDecl.toList
cpg.method.toList
cpg.methodInstance.toList

From here you can traverse through the cpg. The query-primitives DSL ensures that only valid steps are available - anything else will result in a compile error:

cpg.method.name("getAccountList").parameter.toList
/* List(
 *   MethodParameterIn(Some(v[7054781587948444580]),this,0,this,BY_SHARING,io.shiftleft.controller.AccountController,Some(28),None,None,None),
 *   MethodParameterIn(Some(v[7054781587948444584]),request,2,request,BY_SHARING,javax.servlet.http.HttpServletRequest,Some(28),None,None,None),
 *   MethodParameterIn(Some(v[7054781587948444582]),response,1,response,BY_SHARING,javax.servlet.http.HttpServletResponse,Some(28),None,None,None)
 *   )
 **/

cpg.method.name("getAccountList").definingTypeDecl.toList.head
// TypeDecl(Some(v[464]),AccountController,io.shiftleft.controller.AccountController,false,List(java.lang.Object))

References

[1] Rodriguez and Neubauer - The Graph Traversal Pattern: https://pdfs.semanticscholar.org/ae6d/dcba8c848dd0a30a30c5a895cbb491c9e445.pdf

[2] Yamaguchi et al. - Modeling and Discovering Vulnerabilities with Code Property Graphs: https://www.sec.cs.tu-bs.de/pubs/2014-ieeesp.pdf

[3] The ShiftLeft Tinkergraph: https://github.com/ShiftLeftSecurity/tinkergraph-gremlin
