# KubeHound 101 - Gremlin and DSL

A step-by-step walkthrough of basic Gremlin and KubeHound DSL queries.

## Getting started

Set the connection variables for the KubeHound graph database (mandatory). No connection is opened at this step; it is established on the first query. Run the cell below to configure the connection to the `kubegraph` server.

```
%%graph_notebook_config
{
  "host": "kubegraph",
  "port": 8182,
  "ssl": false,
  "gremlin": {
    "traversal_source": "g",
    "username": "",
    "password": "",
    "message_serializer": "graphsonv3"
  }
}
```

Next, set how graphs are rendered. **This step is also mandatory.**

The cell below sets the appearance customizations for the notebook. See [this guide](https://github.com/aws/graph-notebook/blob/623d43827f798c33125219e8f45ad1b6e5b29513/src/graph_notebook/notebooks/01-Neptune-Database/02-Visualization/Grouping-and-Appearance-Customization-Gremlin.ipynb#L680) for the available options.

```
%%graph_notebook_vis_options
{
  "edges": {
    "smooth": {
      "enabled": true,
      "type": "dynamic"
    },
    "arrows": {
      "to": {
        "enabled": true,
        "type": "arrow"
      }
    }
  }
}
```

To run a query, start the cell with the `%%gremlin` magic:

```
%%gremlin
kh        // traversal source (KubeHound DSL)
.V()      // retrieve all the vertices
.count()  // count the number of results
```

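To render the results as a graph, wrap them in a path and pass display options to the magic so the graph stays readable, e.g. `%%gremlin -d class -g critical -le 50 -p inv,oute`. Here `-d class` labels each vertex with its `class` property, `-g critical` groups (colors) vertices by their `critical` property, `-le 50` caps edge labels at 50 characters, and `-p` describes the path pattern.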

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh                // traversal source (KubeHound DSL)
.V()              // retrieve all the vertices
.path()           // wrap each result in a path (needed for the graph view)
.by(elementMap()) // get the details (properties and values) of each vertex
```

The Console tab shows the raw results, which can be downloaded as CSV or XLSX. The search box matches against every field in the results.

The Graph tab lets you navigate the results; when a vertex is selected, its properties are available through the menu (hamburger) button.


## Constructing requests

Every vertex has a label describing the type of Kubernetes resource it represents (labels can also be accessed through the KubeHound DSL).

Raw Gremlin query to select all the pods in a Kubernetes cluster:

```
%%gremlin
kh               // traversal source (KubeHound DSL)
.V()             // retrieve all the vertices
.hasLabel("Pod") // keep only the pods
.valueMap()      // return each pod's property values as a map (JSON)
```

Equivalent in KubeHound DSL:

```
%%gremlin
kh          // traversal source (KubeHound DSL)
.pods()     // retrieve all the pods
.valueMap() // return each pod's property values as a map (JSON)
```

List all nodes:

```
%%gremlin
kh          // traversal source (KubeHound DSL)
.nodes()    // retrieve all the nodes
.valueMap() // return each node's property values as a map (JSON)
```

List all volumes:

```
%%gremlin
kh          // traversal source (KubeHound DSL)
.volumes()  // retrieve all the volumes
.valueMap() // return each volume's property values as a map (JSON)
```

List all endpoints:

```
%%gremlin
kh           // traversal source (KubeHound DSL)
.endpoints() // retrieve all the endpoints
.valueMap()  // return each endpoint's property values as a map (JSON)
```

List all containers:

```
%%gremlin
kh            // traversal source (KubeHound DSL)
.containers() // retrieve all the containers
.valueMap()   // return each container's property values as a map (JSON)
```

List all users:

```
%%gremlin
kh          // traversal source (KubeHound DSL)
.users()    // retrieve all the users
.valueMap() // return each user's property values as a map (JSON)
```

List all groups:

```
%%gremlin
kh          // traversal source (KubeHound DSL)
.groups()   // retrieve all the groups
.valueMap() // return each group's property values as a map (JSON)
```

List all service accounts:

```
%%gremlin
kh          // traversal source (KubeHound DSL)
.sas()      // retrieve all the service accounts
.valueMap() // return each service account's property values as a map (JSON)
```

Each of these DSL steps can also select specific resources by name, given one or more names.

Let's select 3 containers with specific names:

```
%%gremlin
kh          // traversal source (KubeHound DSL)
.containers("nsenter-pod", "pod-create-pod", "host-read-exploit-pod") // select 3 containers by name
.valueMap() // return each container's property values as a map (JSON)
```

To list every property available on a resource, along with its value, use `.properties()`:

```
%%gremlin
kh            // traversal source (KubeHound DSL)
.containers() // select all the containers
.limit(1)     // keep a single container
.properties() // print its properties and their values
```

The most important properties, common to all KubeHound resources:

```
%%gremlin
kh.containers().limit(1) // a single container
.properties("runID", "app", "cluster", "isNamespaced", "namespace") // properties shared by all KubeHound resources
```

To select resources by property value, use the `.has()` and `.not()` steps:

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.containers()
.has("image", "ubuntu")           // keep containers whose image is exactly "ubuntu"
.not(has("namespace", "default")) // skip containers in the default namespace
.path().by(elementMap())          // convert to graph output
```
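Note that `has(key, value)` is an exact match. The plain-Python sketch below mimics the filter above on made-up containers to show which elements survive; the data and the `select` helper are hypothetical and not part of KubeHound.

```python
# Hypothetical containers standing in for KubeHound Container vertices.
containers = [
    {"name": "a", "image": "ubuntu", "namespace": "default"},
    {"name": "b", "image": "ubuntu", "namespace": "prod"},
    {"name": "c", "image": "ubuntu:22.04", "namespace": "prod"},
]

def select(elements):
    """Mimic .has("image", "ubuntu").not(has("namespace", "default"))."""
    return [
        e for e in elements
        if e.get("image") == "ubuntu"             # has(): exact equality
        and not e.get("namespace") == "default"   # not(has(...)): reject matches
    ]

print([e["name"] for e in select(containers)])
```

Container `c` is dropped because `ubuntu:22.04` is not equal to `ubuntu`; partial matches need the `TextP` predicates covered later.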

## Gremlin introduction

Basic Gremlin steps for exploring KubeHound resources. All Gremlin steps can also be used from the KubeHound DSL.

* `properties()`: get all specified properties for the current element
* `values()`: get all specified property values for the current element
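* `valueMap()` or `elementMap()`: get the specified property values of the current element as a map; `valueMap()` wraps each value in a list, while `elementMap()` also includes the element's id and label and returns single values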
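To make the difference in output shape concrete, here is a plain-Python model of these steps on one made-up vertex. The vertex, its `props` layout and the helper names are hypothetical; this is not how KubeHound or TinkerPop implement them.

```python
# Hypothetical vertex: Gremlin properties can hold several values,
# so each key maps to a list here.
vertex = {
    "id": 42,
    "label": "Container",
    "props": {"name": ["nginx"], "namespace": ["default"]},
}

def _keys(v, keys):
    """Requested keys, or every key when none are given (as in Gremlin)."""
    return keys or tuple(v["props"])

def values(v, *keys):
    """Model of values(keys...): a flat list of property values."""
    return [val for k in _keys(v, keys) for val in v["props"].get(k, [])]

def value_map(v, *keys):
    """Model of valueMap(keys...): every value stays wrapped in a list."""
    return {k: list(v["props"][k]) for k in _keys(v, keys) if k in v["props"]}

def element_map(v, *keys):
    """Model of elementMap(keys...): id and label plus single values.

    This toy model assumes one value per key and returns it unwrapped.
    """
    out = {"id": v["id"], "label": v["label"]}
    out.update({k: v["props"][k][0] for k in _keys(v, keys) if k in v["props"]})
    return out

print(values(vertex, "name"))
print(value_map(vertex))
print(element_map(vertex, "name"))
```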

Grouping results by key and value is a convenient way to summarize important values:

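* `group()`: `group([key]).by(keySelector).by(valueSelector)` groups the elements by the key selector and collects the values picked by the value selector
* `unfold()`: unfolds the incoming list (or a map, into its key/value entries) so each element is processed individually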

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.pods()                 // get all the pods
.group().by("namespace")  // group by namespace
.by("name")               // keep only the pod name as the value
.unfold()                 // one result per namespace instead of a single map
```
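The plain-Python model below mirrors this query on a few made-up pods, assuming the map-of-lists result that `group()` produces when given a value modulator. The pod names and the `group`/`unfold` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical pods standing in for KubeHound Pod vertices.
pods = [
    {"name": "coredns-1", "namespace": "kube-system"},
    {"name": "web-1", "namespace": "default"},
    {"name": "kube-proxy-1", "namespace": "kube-system"},
]

def group(elements, key_selector, value_selector):
    """Model of group().by(key).by(value): each key maps to a list of values."""
    groups = defaultdict(list)
    for e in elements:
        groups[e[key_selector]].append(e[value_selector])
    return dict(groups)

def unfold(mapping):
    """Model of unfold() on a map: one (key, value) entry per result."""
    return list(mapping.items())

for entry in unfold(group(pods, "namespace", "name")):
    print(entry)
```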

Use `groupCount()` to group and count results by key. This is handy for computing metrics and KPIs about Kubernetes resources:

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.pods()                      // get all the pods
.groupCount().by("namespace")  // group and count by namespace
.unfold()                      // one result per namespace
```
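In plain Python, `groupCount().by(key)` behaves like counting the key values. The sketch below uses made-up pods and a hypothetical `group_count` helper, not KubeHound's API.

```python
from collections import Counter

# Hypothetical pods standing in for KubeHound Pod vertices.
pods = [
    {"name": "coredns-1", "namespace": "kube-system"},
    {"name": "web-1", "namespace": "default"},
    {"name": "kube-proxy-1", "namespace": "kube-system"},
]

def group_count(elements, key):
    """Model of groupCount().by(key): how many elements share each key value."""
    return dict(Counter(e[key] for e in elements))

for namespace, count in group_count(pods, "namespace").items():  # like unfold()
    print(namespace, count)
```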

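On text values, you can do pattern matching with the `TextP` predicates, such as `TextP.containing()`, `TextP.startingWith()` and `TextP.endingWith()`.

_Note:_ these predicates can slow the query down considerably, because they do not use an index.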

```
%%gremlin -d name -g class -le 50 -p inv,oute
kh.containers()                                    // get all the containers
.has("image", TextP.containing("registry.k8s.io")) // keep images whose name contains "registry.k8s.io"
.path().by(elementMap())                           // format the result as a graph
```
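The predicates can be modeled as plain-Python string tests. In this sketch the containers and image strings are made up, and `containing`, `starting_with`, `ending_with` and `has` are hypothetical stand-ins named after the `TextP` predicates; note that every element has to be tested, which is consistent with the performance note above.

```python
# Hypothetical containers; image strings are made up for illustration.
containers = [
    {"name": "kube-proxy", "image": "registry.k8s.io/kube-proxy:v1.27.0"},
    {"name": "app", "image": "ubuntu"},
    {"name": "sidecar", "image": "docker.io/library/busybox"},
]

# Python stand-ins named after TextP.containing / startingWith / endingWith.
def containing(sub):
    return lambda s: sub in s

def starting_with(prefix):
    return lambda s: s.startswith(prefix)

def ending_with(suffix):
    return lambda s: s.endswith(suffix)

def has(elements, key, predicate):
    """Model of has(key, predicate): every element is tested (a full scan)."""
    return [e for e in elements if key in e and predicate(e[key])]

print([c["name"] for c in has(containers, "image", containing("registry.k8s.io"))])
```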

Classic steps that are useful to narrow down a search:

* `limit()`: Limit the number of results 
* `or()`: Classic `OR` operator, useful when selecting resources by properties
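* `dedup()`: removes duplicates from the output. Since every vertex is distinct, scope it to a property for it to have an effect, either by extracting the property first (e.g. with `values()`) or with a `by()` modulator such as `dedup().by("image")`.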

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.containers()  // get all the containers
.values("image") // extract the image property
.dedup()         // deduplicate the results
```
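The plain-Python model below shows why `dedup()` needs to be scoped: deduplicating distinct vertices removes nothing, while deduplicating their image values does. The containers (with `id` standing in for vertex identity) and the `dedup`/`dedup_by` helpers are hypothetical.

```python
# Hypothetical containers; "id" stands in for the vertex identity.
containers = [
    {"id": 1, "image": "nginx"},
    {"id": 2, "image": "nginx"},
    {"id": 3, "image": "redis"},
]

def dedup(stream):
    """Model of dedup(): keep the first occurrence of each item, preserving order."""
    seen, out = set(), []
    for item in stream:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

def dedup_by(elements, key):
    """Model of dedup().by(key): keep the first element for each key value."""
    seen, out = set(), []
    for e in elements:
        if e[key] not in seen:
            seen.add(e[key])
            out.append(e)
    return out

print(dedup([c["id"] for c in containers]))     # vertices are all distinct
print(dedup([c["image"] for c in containers]))  # like values("image").dedup()
print([c["id"] for c in dedup_by(containers, "image")])
```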

The `by()` step modulator can be appended to other steps, once or several times, to modulate their results. It is commonly used with the `aggregate()`, `dedup()`, `group()`, `groupCount()`, `order()`, `path()`, `select()` and `tree()` steps, among others.

* `by()`: if a step accepts functions, comparators, etc., `by()` is the means by which they are supplied (as with the `group()` step)

With a single `by()` modulator, `group()` uses it to compute the key; the values are the grouped elements themselves:

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.endpoints() // get all the endpoints
.group()
.by("port")    // key: the port number
```

A second `by()` modulator computes the value stored for each key:

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.endpoints()  // get all the endpoints
.group()
.by("port")     // key: the port number
.by("portName") // value: the port name
```
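The difference between one and two `by()` modulators can be modeled in plain Python as below. The endpoints and the `group` helper are hypothetical; the sketch only mirrors the shape of the result, not KubeHound's implementation.

```python
# Hypothetical endpoints standing in for KubeHound Endpoint vertices.
endpoints = [
    {"name": "ep-a", "port": 443, "portName": "https"},
    {"name": "ep-b", "port": 80, "portName": "http"},
    {"name": "ep-c", "port": 443, "portName": "webhook"},
]

def group(elements, key, value=None):
    """Model of group().by(key) and group().by(key).by(value).

    Without a value modulator each key maps to the elements themselves;
    with one, each key maps to the selected property values.
    """
    out = {}
    for e in elements:
        out.setdefault(e[key], []).append(e if value is None else e[value])
    return out

print([e["name"] for e in group(endpoints, "port")[443]])
print(group(endpoints, "port", "portName"))
```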

Gremlin defines tokens to access specific “properties” of elements:

* `label()` or `label`: takes an element and extracts its label
* `key()` or `key`: takes a property and extracts its key
* `value()` or `value`: takes a property and extracts its value

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.V()        // get all the vertices
.groupCount() // group and count occurrences
.by(label)    // count by vertex label
.unfold()     // output as a list
```

## KubeHound RBAC

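In KubeHound, a permission set is the combination of a role and a role binding. They are combined because a RoleBinding can “downgrade” the scope of a ClusterRole: when a RoleBinding references a ClusterRole, the permissions are granted only within the RoleBinding's namespace.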
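The Kubernetes rule behind this can be sketched in plain Python. `effective_scope` is a hypothetical helper that only encodes Kubernetes RBAC binding semantics; it is not KubeHound's data model, but it shows why the scope can only be known once the role and the binding are considered together.

```python
def effective_scope(role_kind, binding_kind, binding_namespace=None):
    """Where the permissions of a (role, binding) pair apply, per Kubernetes RBAC rules."""
    if binding_kind == "ClusterRoleBinding":
        if role_kind != "ClusterRole":
            raise ValueError("a ClusterRoleBinding can only reference a ClusterRole")
        return "cluster-wide"
    if binding_kind == "RoleBinding":
        # A RoleBinding only grants access in its own namespace, even when it
        # references a ClusterRole: this is the "downgrade".
        return f"namespace:{binding_namespace}"
    raise ValueError(f"unknown binding kind: {binding_kind}")

print(effective_scope("ClusterRole", "ClusterRoleBinding"))
print(effective_scope("ClusterRole", "RoleBinding", "dev"))
print(effective_scope("Role", "RoleBinding", "dev"))
```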

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.permissions() // get the permission sets
.valueMap()      // return their property values as a map
```

List all permission sets flagged as critical with `critical()`, which is equivalent to `.has("critical", true)`:

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.permissions()                   // get the permission sets
.critical()                        // keep critical assets only
.valueMap("name", "role", "rules") // keep specific properties only
```

## Intermediate Gremlin

When building a path, you step through edges and vertices and decide at each step whether the path should stop.

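* `outV()`: from an edge, get its outgoing (source) vertex
* `inV()`: from an edge, get its incoming (target) vertex
* `outE()`: from a vertex, get its outgoing edges
* `inE()`: from a vertex, get its incoming edges
* `out()`: from a vertex, get the adjacent vertices reached through outgoing edges (a shorthand for `outE().inV()`)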

Note: these steps accept edge labels as arguments to restrict which edges are followed.
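A plain-Python model of these steps on a tiny made-up graph may help; the vertices, edges and labels (`EDGE_A`, `EDGE_B`) and all helper names are hypothetical. The last helper mirrors the one-hop `outE().inV().path()` shape used in the next example.

```python
# Hypothetical directed graph; labels EDGE_A / EDGE_B are made up.
edges = [
    {"id": "e1", "label": "EDGE_A", "out": "v1", "in": "v2"},
    {"id": "e2", "label": "EDGE_B", "out": "v1", "in": "v3"},
    {"id": "e3", "label": "EDGE_A", "out": "v2", "in": "v3"},
]

def out_e(v, *labels):
    """outE(labels...): edges leaving v, optionally filtered by label."""
    return [e for e in edges if e["out"] == v and (not labels or e["label"] in labels)]

def in_e(v, *labels):
    """inE(labels...): edges arriving at v, optionally filtered by label."""
    return [e for e in edges if e["in"] == v and (not labels or e["label"] in labels)]

def out_v(e):
    """outV(): the vertex an edge leaves from."""
    return e["out"]

def in_v(e):
    """inV(): the vertex an edge points to."""
    return e["in"]

def out(v, *labels):
    """out(labels...): same as outE(labels...).inV()."""
    return [in_v(e) for e in out_e(v, *labels)]

def one_hop_paths(v):
    """Like outE().inV().path(): (vertex, edge, vertex) for each outgoing edge."""
    return [(v, e["id"], in_v(e)) for e in out_e(v)]

print(out("v1"))
print(out("v1", "EDGE_A"))
print([out_v(e) for e in in_e("v3")])
print(one_hop_paths("v1"))
```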

Example using `outE()` and `inV()`, rebuilding the `attacks()` DSL step:

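```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.containers()   // start from all the containers
.outE().inV()     // follow every outgoing edge to the next vertex
.path()           // keep the container -> edge -> vertex path
.by(elementMap()) // output as a graph
```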

The `attacks()` DSL step is equivalent and should return the same results:

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.containers()   // start from all the containers
.attacks()        // DSL equivalent of outE().inV().path()
.by(elementMap()) // output as a graph
```

To build longer paths, iterate over the elements and check at every step whether to stop:

* `loops()`: returns the number of iterations performed so far by the enclosing `repeat()`
* `repeat()`: defines the traversal to iterate
* `until()`: sets the condition that ends the loop
* `simplePath()`: filters out paths that revisit an element, so cycles cannot make the traversal loop forever
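The loop in the next query can be modeled in plain Python as a depth-first walk. The graph, the `critical` set and `critical_paths` are hypothetical; the sketch only mirrors the semantics of `repeat(...).until(...)` with `simplePath()` and is not how KubeHound computes `criticalPaths()`.

```python
# Hypothetical attack graph: vertex -> list of (edge, next vertex).
graph = {
    "endpoint": [("E1", "container")],
    "container": [("E2", "node"), ("E3", "endpoint")],  # E3 closes a cycle
    "node": [("E4", "admin-sa")],
    "admin-sa": [],
}
critical = {"admin-sa"}  # vertices with critical == true

def critical_paths(start, max_hops=4):
    """Model of repeat(outE().inV().simplePath())
    .until(has("critical", true).or().loops().is(max_hops))
    .has("critical", true).path()"""
    results = []

    def walk(path, hops):
        for edge, nxt in graph.get(path[-1], []):
            if nxt in path[::2]:          # simplePath(): vertex already on the path
                continue
            new_path = path + [edge, nxt]
            if nxt in critical:           # until(): critical asset reached, keep it
                results.append(new_path)
            elif hops + 1 < max_hops:     # until(): loops() has not hit the limit
                walk(new_path, hops + 1)
            # otherwise the loop stops on a non-critical vertex and has() drops it

    walk([start], 0)
    return results

print(critical_paths("endpoint"))
print(critical_paths("endpoint", max_hops=2))
```

The `E3` edge back to `endpoint` is skipped by the `simplePath()` check, and with only 2 hops allowed the walk stops at `node` before reaching a critical asset.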

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.endpoints()
.repeat(
    outE().inV().simplePath()  // build the path one hop at a time
).until(
    has("critical", true)      // stop when reaching a critical asset
    .or().loops().is(4)        // or after 4 iterations
).has("critical", true)        // keep only paths ending on a critical asset
.path().by(elementMap())       // output as a graph
```

The `criticalPaths(4)` DSL step is equivalent:

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.endpoints()    // start with all the endpoints
.criticalPaths(4) // build critical paths of at most 4 hops
.by(elementMap()) // output as a graph
```

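To extract the first element of each path, use the `local` scope, which applies a step to the contents of each path rather than to the whole stream:

* `local()`: executes a child traversal on a single element within the stream. The `local` scope in `limit(local, 1)` follows the same idea: it keeps the first object of each path.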

```
%%gremlin -d class -g critical -le 50 -p inv,oute
kh.endpoints()   // list all the endpoints
.criticalPaths() // generate the critical paths
.limit(local, 1) // keep the first element of each path
.dedup()         // deduplicate the results
.valueMap()      // JSON output of the vertex properties
```
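In plain Python, `limit(local, 1)` followed by `dedup()` can be modeled as below. The paths are made up, and returning the single first object unwrapped is an assumption of this sketch, consistent with the query calling `valueMap()` on the result; `first_of_each` and `dedup` are hypothetical helpers.

```python
# Hypothetical critical paths (vertex, edge, vertex, ...), as criticalPaths() might return.
paths = [
    ["endpoint-a", "E1", "container-x", "E2", "node-1"],
    ["endpoint-a", "E3", "container-y", "E4", "node-1"],
    ["endpoint-b", "E5", "container-z", "E6", "node-2"],
]

def first_of_each(paths):
    """Model of limit(local, 1): the first object of every path, unwrapped."""
    return [p[0] for p in paths if p]

def dedup(stream):
    """Model of dedup(): drop repeats, keep first-seen order."""
    return list(dict.fromkeys(stream))

print(dedup(first_of_each(paths)))
print(paths[:1])  # by contrast, limit(1) without local keeps only the first path
```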