# Using joern-lib with Joern server

This cell sets up the common imports, console logging, and connection settings used in the rest of this notebook.

In [1]:
import asyncio

from joern_lib import client, workspace
from joern_lib.detectors import common as cpg, js

# Configure console logging
from rich.console import Console

console = Console(
    log_time=False,
    log_path=False,
    width=280,
    color_system="auto"
)

joern_host = "http://joern:9000"
cpggen_host = "http://cpggen:7072"
joern_username = "admin"
joern_password = "admin"

If you're running this notebook via docker compose, you can reach the Joern server at the hostname `joern` on port 9000, which matches the `joern_host` value set above.

In [3]:
async def connect():
    connection = await client.get(joern_host, cpggen_host, joern_username, joern_password)

asyncio.run(connect())

## Test connection with a simple query

We can begin by testing the connection with a simple query.

In [5]:
async def test_eval():
    connection = await client.get(joern_host, cpggen_host, joern_username, joern_password)
    res = await client.q(connection, "val a=1")
    print(res)
asyncio.run(test_eval())

╭─ CPGQL Query ─╮
│ val a=1       │
╰───────────────╯
{'response': 'a: Int = 1\n'}


**NOTE:** joern-lib uses asyncio and websockets to interact with the Joern server. Therefore, every library call must be awaited inside a coroutine using the async-await syntax.
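As a rough illustration of the async-await requirement, the sketch below awaits two calls in sequence inside one coroutine and drives it with a single `asyncio.run`. The `fake_query` function is a hypothetical stand-in for an awaitable library call such as `client.q`; it does not talk to a Joern server.

```python
import asyncio


async def fake_query(statement: str) -> dict:
    """Hypothetical stand-in for an awaitable call such as client.q."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return {"response": f"echo: {statement}"}


async def main() -> list:
    # Each call is awaited in turn inside the same coroutine.
    first = await fake_query("val a=1")
    second = await fake_query("val b=2")
    return [first, second]


# A single asyncio.run drives the whole coroutine chain.
results = asyncio.run(main())
print(results)
```

Grouping related calls in one coroutine like this keeps the number of `asyncio.run` invocations down, although the cells in this notebook use one coroutine per step for clarity.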

## Interacting with Joern workspace

joern-lib offers several useful methods to query and create Joern workspaces. The full list can be obtained by calling Python's built-in `dir` function on the `workspace` module.

In [8]:
list(filter(lambda m: not m.startswith("__"), dir(workspace)))

['client', 'cpg_exists', 'create_cpg', 'delete_project', 'dir_exists', 'extract_dir', 'from_string', 'get_active_project', 'get_overlay_dir', 'get_path', 'import_code', 'import_cpg', 'json', 'ls', 'os', 'reset', 'set_active_project', 'slice_cpg']
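The `dir` listing above also includes imported modules such as `client`, `json`, and `os`. A possible refinement, sketched here with the standard-library `json` module as a stand-in so that it runs without joern-lib, is to keep only public functions by using `inspect`. The helper name `public_functions` is made up for this example.

```python
import inspect
import json  # stand-in module; in the notebook you would pass `workspace`


def public_functions(module) -> list:
    """Return sorted names of public functions found in the module's namespace."""
    return sorted(
        name
        for name, _obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


names = public_functions(json)
print(names)
```

Note that `inspect.isfunction` also picks up functions that a module imported from elsewhere, so the result is a namespace view rather than a strict list of what the module defines.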

### List existing workspaces

In [10]:
async def test_workspace():
    connection = await client.get(joern_host, cpggen_host, joern_username, joern_password)
    res = await workspace.ls(connection)
    print(res)
asyncio.run(test_workspace())

╭─ CPGQL Query ─╮
│ workspace     │
╰───────────────╯
res3: workspacehandling.WorkspaceManager[io.joern.joerncli.console.JoernProject] = 
_______________________________________________________________________________________________________________________________________________
| name                | overlays                                       | inputPath                                                     | open  |
| commons-text-1.10.0 |                                                | /tmp/cpggen_cpg_outzjw1_pa5/cpggenqldt9ve7-jimple-cpg.bin.zip | false |
| NodeGoat            | base,controlflow,typerel,callgraph,dataflowOss | /tmp/NodeGoat                                                 | false |



If the list is empty, it is time to create a new workspace by importing some code.

## Clone and scan NodeGoat

What better way to learn joern and this library than scanning a real application? This polynote comes with the GitPython package installed so that we can clone and import repos effortlessly. The directory /tmp is shared between polynote and joern containers to help with this exercise.

Let's clone the OWASP [NodeGoat](https://github.com/OWASP/NodeGoat) repo.

In [12]:
import git
import os

if not os.path.exists("/tmp/NodeGoat"):
    repo = git.Repo.clone_from("https://github.com/OWASP/NodeGoat.git", "/tmp/NodeGoat", branch="master", depth=1)
else:
    print("/tmp/NodeGoat already exists")

/tmp/NodeGoat already exists


Now that we have the source code, let's import this into joern.

In [14]:
async def create_workspace():
    connection = await client.get(joern_host, cpggen_host, joern_username, joern_password)
    res = await workspace.import_code(connection, "/tmp/NodeGoat", "NodeGoat")
    print(res)
asyncio.run(create_workspace())

╭──────────── CPGQL Query ────────────╮
│ os.exists(os.Path("/tmp/NodeGoat")) │
╰─────────────────────────────────────╯
╭────────────── CPGQL Query ──────────────╮
│ importCode("/tmp/NodeGoat", "NodeGoat") │
╰─────────────────────────────────────────╯
╭─ CPGQL Query ─╮
│ save          │
╰───────────────╯
True


### Queries

With the source code for NodeGoat imported, it is time to query Joern about our application. We can ask for generic information, such as the list of files, or for more advanced JavaScript-specific information.

In [16]:
async def files_lister():
    connection = await client.get(joern_host, cpggen_host, joern_username, joern_password)
    # detectors.common is imported as cpg in the beginning of this notebook
    # list_files api is used to retrieve the list of files in the cpg
    res = await cpg.list_files(connection)
    print([f["name"] for f in res])
asyncio.run(files_lister())

╭───── CPGQL Query ─────╮
│ cpg.file.toJsonPretty │
╰───────────────────────╯
['Gruntfile.js', 'app/assets/js/chart/chart-data-morris.js', 'app/assets/js/tour/redirects-steps.js', 'app/data/allocations-dao.js', 'app/data/benefits-dao.js', 'app/data/contributions-dao.js', 'app/data/memos-dao.js', 'app/data/profile-dao.js', 'app/data/research-dao.js', 'app/data/user-dao.js', 'app/routes/allocations.js', 'app/routes/benefits.js', 'app/routes/contributions.js', 'app/routes/error.js', 'app/routes/index.js', 'app/routes/memos.js', 'app/routes/profile.js', 'app/routes/research.js', 'app/routes/session.js', 'artifacts/db-reset.js', 'config/config.js', 'config/env/all.js', 'config/env/development.js', 'config/env/production.js', 'server.js', 'builtintypes']
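The returned names are plain strings, so ordinary Python is enough to slice them further. The sketch below works on a hand-copied subset of the list printed above (not on a live query result), drops the non-source `builtintypes` entry, and groups the remaining files by top-level directory.

```python
from collections import defaultdict

# Subset of the file names printed above, copied by hand for illustration.
files = [
    "Gruntfile.js",
    "app/data/user-dao.js",
    "app/routes/index.js",
    "app/routes/session.js",
    "config/config.js",
    "server.js",
    "builtintypes",
]

# Keep JavaScript sources only, which also drops the "builtintypes" entry.
js_files = [f for f in files if f.endswith(".js")]

# Group by top-level directory; files in the repository root go under ".".
by_dir = defaultdict(list)
for path in js_files:
    top = path.split("/", 1)[0] if "/" in path else "."
    by_dir[top].append(path)

# Route definitions are a natural starting point for an Express.js review.
route_files = [f for f in js_files if f.startswith("app/routes/")]

print(dict(by_dir))
print(route_files)
```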


NodeGoat is a Node.js application that uses the [Express.js framework](https://expressjs.com). We can use joern-lib to query for the HTTP routes defined by the application.

In [18]:
async def js_insights():
    connection = await client.get(joern_host, cpggen_host, joern_username, joern_password)
    res = await js.list_http_routes(connection, include_middlewares=False)
    print([n["node"]["code"] for n in res])
asyncio.run(js_insights())

╭─────────────────────────────────────────────────── CPGQL Query ────────────────────────────────────────────────────╮
│ cpg.call.code(".*(app|router).(head|get|post|put|patch|delete|options).*").argument.order(3).location.toJsonPretty │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
['/', '/login', '/login', '/signup', '/signup', '/logout', '/dashboard', '/profile', '/profile', '/contributions', '/contributions', '/benefits', '/benefits', '/allocations/:userId', '/memos', '/memos', '/learn', '/tutorial', '/tutorial/:page', '/research']


The `list_http_routes` API includes [middlewares](https://expressjs.com/en/guide/using-middleware.html) by default. In the above example, we exclude middlewares by setting `include_middlewares=False`.
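To get a feel for what the regular expression in the CPGQL query above selects, it can be tried against a few code strings with Python's `re` module. This is only an approximation: Joern evaluates the pattern with its own regex engine, and the sketch assumes the whole code string has to match, hence `re.fullmatch`. The sample strings are invented for illustration.

```python
import re

# Pattern copied from the CPGQL query shown above.
ROUTE_PATTERN = r".*(app|router).(head|get|post|put|patch|delete|options).*"

# Hypothetical call snippets, not taken from NodeGoat.
samples = [
    'app.get("/login", handler)',
    'router.post("/signup", handler)',
    'console.log("started")',
]

# Assumption: the entire code string must match the pattern.
matches = [s for s in samples if re.fullmatch(ROUTE_PATTERN, s)]
print(matches)

# The dot between the two groups is unescaped, so it matches any character.
loose_match = re.fullmatch(ROUTE_PATTERN, "appXget") is not None
print(loose_match)
```

The last check shows that the pattern is deliberately loose, so results are best treated as candidates to review rather than a definitive route list.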

Next, we identify the various HTTP sources and sinks exposed by the application.

In [20]:
async def identify_sources_sinks():
    connection = await client.get(joern_host, cpggen_host, joern_username, joern_password)
    sources = await js.get_http_sources(connection)
    print(len(sources))
    sinks = await js.get_http_sinks(connection)
    print(len(sinks))
asyncio.run(identify_sources_sinks())

╭────────────────────────────────────────────────────────────────────────────────── CPGQL Query ───────────────────────────────────────────────────────────────────────────────────╮
│ cpg.call.code("(?i)(?s)(?i).*(req|ctx)\\.(originalUrl|path|protocol|route|secure|signedCookies|stale|subdomains|xhr|app|pipe|file|files|baseUrl|fresh|hostname|ip|url|ips|method │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
56
╭────────────────────────────────────────────────────────────────────────────────── CPGQL Query ───────────────────────────────────────────────────────────────────────────────────╮
│ cpg.call.code("(?i)(?s)(?i).*res\\.(append|attachment|cookie|clearCookie|download|end|format|get|json|jsonp|links|location|redirect|render|send|sendFile|sendStatus|set|status|t │
╰───────────────────────────────────────────────────────────────────────────────────────────

The js detector comes with two useful patterns, `REQUEST_PATTERN` and `RESPONSE_PATTERN`, which are used by the above methods.
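The idea behind these patterns can be sketched with abbreviated versions built from a few of the alternatives visible in the truncated output above. The two constants below are illustrative subsets, not the actual values of `js.REQUEST_PATTERN` and `js.RESPONSE_PATTERN`, and the `classify` helper is hypothetical.

```python
import re

# Abbreviated, illustrative patterns; NOT the full joern-lib values.
# (?si) enables DOTALL and IGNORECASE, like the (?s)(?i) prefix shown above.
REQUEST_SUBSET = r"(?si).*(req|ctx)\.(originalUrl|path|url|method).*"
RESPONSE_SUBSET = r"(?si).*res\.(redirect|render|send|json).*"


def classify(code: str) -> str:
    """Label a code string as an HTTP source, an HTTP sink, or neither."""
    if re.fullmatch(REQUEST_SUBSET, code):
        return "source"
    if re.fullmatch(RESPONSE_SUBSET, code):
        return "sink"
    return "other"


# Hypothetical snippets for illustration.
for snippet in ["var u = req.url;", 'res.redirect("/login")', "var x = 1"]:
    print(snippet, "->", classify(snippet))
```

A source is code that reads from the request object and a sink is code that writes to the response object, which is why the two patterns anchor on `req`/`ctx` and `res` respectively.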

### Reachability Queries

The real power of Joern lies in connecting sources with sinks to identify vulnerable data flows, using the CPGQL method `reachableByFlows`. An example query using this method is below.

In [22]:
async def vulnerable_flows():
    connection = await client.get(joern_host, cpggen_host, joern_username, joern_password)
    await client.df(connection, f'cpg.call.code("{js.REQUEST_PATTERN}")', f'cpg.call.code("{js.RESPONSE_PATTERN}")')
asyncio.run(vulnerable_flows())

╭────────────────────────────────────────────────────────────────────────────────── CPGQL Query ───────────────────────────────────────────────────────────────────────────────────╮
│ def source = cpg.call.code("(?s)(?i).*(req|ctx)\\.(originalUrl|path|protocol|route|secure|signedCookies|stale|subdomains|xhr|app|pipe|file|files|baseUrl|fresh|hostname|ip|url|i │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────────────────────────────────────────── CPGQL Query ───────────────────────────────────────────────────────────────────────────────────╮
│ def sink = cpg.call.code("(?s)(?i).*res\\.(append|attachment|cookie|clearCookie|download|end|format|get|json|jsonp|links|location|redirect|render|send|sendFile|sendStatus|set|s │
╰──────────────────────────────────────────────────────────────────────────────────────────────
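The output above shows a `def source` and a `def sink` statement being sent, with the remainder cut off. As a minimal sketch of how such statements could be assembled, the hypothetical helper below builds the two definitions and a final `sink.reachableByFlows(source).p` statement. The final statement is an assumption based on common CPGQL usage, not a confirmed description of what `client.df` sends, and the patterns are abbreviated for illustration.

```python
def build_flow_queries(source_expr: str, sink_expr: str) -> list:
    """Hypothetical helper: assemble CPGQL statements for a reachability check."""
    return [
        f"def source = {source_expr}",
        f"def sink = {sink_expr}",
        # Assumed final step: print the flows from source to sink.
        "sink.reachableByFlows(source).p",
    ]


# Abbreviated patterns; the doubled backslash survives into the CPGQL string
# literal, where it escapes the dot.
source_pattern = r"(?s)(?i).*(req|ctx)\\.(url|path).*"
sink_pattern = r"(?s)(?i).*res\\.(redirect|send).*"

queries = build_flow_queries(
    f'cpg.call.code("{source_pattern}")',
    f'cpg.call.code("{sink_pattern}")',
)
for q in queries:
    print(q)
```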

## Closing Thoughts

This notebook introduced joern-lib, a high-level Python library for interacting with the Joern server. While this notebook focused on a JavaScript application, Joern supports many other languages, such as Java, PHP, and Python.