# CBTH Search, Feed & Watchlist Short Demo

This Jupyter Notebook provides a brief walkthrough of the search, feed, and watchlist functionality in CB ThreatHunter. For a more detailed walkthrough of what is possible via the API, please take a look at the other notebooks available in this repo.

## Prerequisites

There are two prerequisites for using this code: first, you need credentials to log into the API for your
Cb PSC organization; and second, you need the `cbapi` bindings to use this Python code directly. If you
want to use another language, or to call the REST API endpoints manually, you won't need to install `cbapi`.

### API Credentials

The first step is to create a connector in your Cb PSC organization. Log into the console and follow the
instructions at https://developer.carbonblack.com/reference/cb-defense/authentication/ to create an `API` type connector.

Once you have your connector, you'll need the following information:

1. URL endpoint (e.g. `defense-prod05.conferdeploy.net`) for the APIs. This is the same URL you would use for the PSC Web UI.
2. Connector ID and API key for the API connector.
3. "Org key": a unique identifier for your org, displayed at the top of the API Keys page.
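
To see how these values fit together when calling the REST API without `cbapi`, here is a minimal sketch that only builds a request object and never sends it. It assumes the PSC convention of an `X-Auth-Token` header of the form `API key/Connector ID`. The hostname is the example from the list above; the API key, connector ID, org key, and report ID are placeholders, and the path follows the report path used in the cleanup cell at the end of this notebook.

```python
from urllib.request import Request

# Placeholder values: substitute the ones from your own PSC console.
HOSTNAME = "defense-prod05.conferdeploy.net"
API_KEY = "ABCDEFGHIJKLMNOPQRSTUVWX"
CONNECTOR_ID = "ABCDE12345"
ORG_KEY = "ABCD1234"
REPORT_ID = "REPORT_ID"


def build_request(path):
    """Build (but do not send) an authenticated GET request for a PSC API path."""
    url = f"https://{HOSTNAME}{path}"
    # Assumed header format: "<API key>/<Connector ID>"
    headers = {"X-Auth-Token": f"{API_KEY}/{CONNECTOR_ID}"}
    return Request(url, headers=headers, method="GET")


req = build_request(f"/threathunter/watchlistmgr/v3/orgs/{ORG_KEY}/reports/{REPORT_ID}")
print(req.full_url)
# urllib stores header names capitalized, hence the lowercase "auth-token" here.
print(req.get_header("X-auth-token"))
```

Nothing is transmitted here; passing `req` to `urllib.request.urlopen` would perform the actual call.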

### Install cbapi

The second step is only needed if you want to run this code directly. This Python script uses the `cbapi`
module. Support for ThreatHunter in `cbapi` is being actively developed in a fork available from
https://github.com/trailofbits/cbapi-python/tree/tob-cbth. To run this code as-is, you need to `git clone`
that repository, check out the `tob-cbth` branch, and install `cbapi` in a virtualenv.

`cbapi` uses a credential file to read the API secret keys. Whenever you write scripts to interact with the
Cb APIs (or any API for that matter) you should **always** keep your API secret keys separate from your script.
If your script is ever exposed, either intentionally (by sharing it), or accidentally, then your API token
could be compromised if it were embedded inside your script.

To learn more about credential files and `cbapi`, see the docs at https://cbapi.readthedocs.io/en/latest/#api-credentials.
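
As a rough illustration of keeping secrets out of the script, the sketch below parses an INI-style credentials profile with the standard library. The layout (`url`, `token`, `org_key` under a profile name) is an assumption based on the `cbapi` documentation linked above, the values are placeholders, and `load_profile` is a hypothetical helper, not part of `cbapi`. The text is parsed from a string here only so the example is self-contained.

```python
import configparser

# Example file contents with placeholder secrets. A real credentials file
# lives outside your script and is never committed alongside it.
EXAMPLE_CREDENTIALS = """
[devday]
url=https://defense-prod05.conferdeploy.net
token=ABCDEFGHIJKLMNOPQRSTUVWX/ABCDE12345
org_key=ABCD1234
"""

REQUIRED_KEYS = ("url", "token", "org_key")


def load_profile(text, profile):
    """Return the settings for one profile, raising if a required key is missing."""
    # interpolation=None so that a '%' inside a token is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if profile not in parser:
        raise KeyError(f"profile {profile!r} not found")
    section = parser[profile]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError(f"profile {profile!r} is missing: {', '.join(missing)}")
    return dict(section)


settings = load_profile(EXAMPLE_CREDENTIALS, "devday")
print(sorted(settings))
```

The profile name `devday` matches the one passed to `CbThreatHunterAPI(profile="devday")` in the Setup section.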

## Documentation

More information on configuring `cbapi`:
https://cbapi.readthedocs.io/en/latest/installation.html

Documentation for the ThreatHunter APIs is now available on the Developer Network website at: https://developer.carbonblack.com/reference/cb-threathunter/

## Story Line

To make things a little more interesting, the API calls used today are coupled with a hypothetical, but entirely plausible, story line that warrants the use of search, watchlists, and feeds in CB ThreatHunter.

We will focus on a public source of intel that comes from a CBTH user and is already formatted as CBTH queries. In this scenario the user shares searches through a Twitter account: `@Heinzeralli`

## Setup

First, let's add a paragraph to get the initial configuration of the `cbapi` objects ready to go.
If desired, you can enable debug logging, which prints the underlying REST API calls that are made to the backend.

In [1]:
from cbapi.psc.threathunter import *
# pretty printer for formatting the JSON responses
import pprint
import time

# for debug logging import the logging module, and configure `cbapi` to DEBUG logging level
#import logging
#logging.basicConfig()
#logging.getLogger("cbapi").setLevel(logging.DEBUG)
#bump logging back to info after debug session 
#logging.getLogger("cbapi").setLevel(logging.INFO)


# The following will fail if you have not yet set up your credentials file
th = CbThreatHunterAPI(profile="devday") # profile is the name of your config block in your credentials file
orgkey = th.credentials['org_key'] # makes for shorter URLs in future paragraphs

## Search

Now that we are set up, let's take a quick look at how search works in CB ThreatHunter. ThreatHunter is based on a scalable, multi-tenant architecture that is capable of ingesting tens of millions of events per second. Searches are built to scale and perform.
Searches in ThreatHunter are asynchronous. This gives both API and UI users a better experience, since a search can be initiated quickly and its results gathered later and incrementally. Another advantage is that results can be referenced over and over without re-running the search.
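
The submit-then-poll pattern described above can be sketched with a toy, in-memory stand-in. This is not the ThreatHunter API: `ToySearchBackend`, `submit`, and `poll` are hypothetical names used only to show why a job ID lets results be gathered incrementally and re-read later.

```python
import itertools


class ToySearchBackend:
    """In-memory stand-in for an asynchronous search service (not the real API)."""

    def __init__(self, dataset, batch_size=2):
        self._dataset = dataset
        self._batch_size = batch_size
        self._jobs = {}
        self._ids = itertools.count(1)

    def submit(self, predicate):
        """Start a search and return a job ID immediately, without any results."""
        job_id = f"job-{next(self._ids)}"
        matches = [item for item in self._dataset if predicate(item)]
        self._jobs[job_id] = {"pending": matches, "found": []}
        return job_id

    def poll(self, job_id):
        """Advance the job by one batch; return (completed, results found so far)."""
        job = self._jobs[job_id]
        batch = job["pending"][: self._batch_size]
        job["pending"] = job["pending"][self._batch_size:]
        job["found"].extend(batch)
        return not job["pending"], list(job["found"])


backend = ToySearchBackend(["excel.exe", "winword.exe", "excel.exe", "excel.exe"])
job_id = backend.submit(lambda name: name == "excel.exe")

completed = False
while not completed:
    completed, results = backend.poll(job_id)
    print(f"{job_id}: {len(results)} result(s) so far, completed={completed}")

# Results stay attached to the job, so they can be read again without re-running it.
_, results_again = backend.poll(job_id)
```

The `cbapi` query objects used in the rest of this notebook hide this bookkeeping behind simple iteration.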

For the purposes of today's demo we are going to skip over how to access these APIs directly, either with something like `curl` or by making raw requests with the `cbapi` supporting functions. Instead we will jump right to the full `cbapi` supported objects.

**NOTE** For more information on the search API, including architectural diagrams, see the [Search Notebook](search_demo.ipynb)

After following `@Heinzeralli` for some time, we finally get the urge to start looking in our own environment for some of the "IOC" data that he is sharing. In his most recent tweet, `@Heinzeralli` shared some indications of potentially malicious Microsoft Excel activity, in particular Excel launching an instance of PowerShell.

In [5]:
query = th.select(Process).where("process_name:excel.exe childproc_name: powershell.exe -enriched:True")
query_results = list(query) # get all results from this query into a list
pprint.pprint(query_results)

[<cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-000008b0-00000000-1d57793423b0e30> @ https://defense-prod05.conferdeploy.net,
 <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-00001694-00000000-1d577930b7260e2> @ https://defense-prod05.conferdeploy.net,
 <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-00000f10-00000000-1d57793dd82137a> @ https://defense-prod05.conferdeploy.net,
 <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-000008b0-00000000-1d57793423b0e30> @ https://defense-prod05.conferdeploy.net,
 <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-00000f10-00000000-1d57793dd82137a> @ https://defense-prod05.conferdeploy.net,
 <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-00000f10-00000000-1d57793dd82137a> @ https://defense-prod05.conferdeploy.net,
 <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-00000f10-00000000-1d57793dd82137a> @ https://defense-prod05.conferdeploy.net,
 <cbapi.psc.threathu

## Results Segmentation
Like Cb Response, each query result actually represents a process "segment" - that is, a set of events
associated with a process.
If you issue a default search request, all the segments are returned, which can produce many results that appear to be duplicates for long-lived processes.

For more information about process segments, see https://developer.carbonblack.com/reference/enterprise-response/6.1/process-api-changes/#new-immutable-model.

**To create a unique list of process IDs (process_guid) we will create a map ourselves**

In [6]:
unique_processes = {r.process_guid:r for r in query_results}
pprint.pprint(unique_processes)

{'N4LFP2KN-00c0fdce-000008b0-00000000-1d57793423b0e30': <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-000008b0-00000000-1d57793423b0e30> @ https://defense-prod05.conferdeploy.net,
 'N4LFP2KN-00c0fdce-00000f10-00000000-1d57793dd82137a': <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-00000f10-00000000-1d57793dd82137a> @ https://defense-prod05.conferdeploy.net,
 'N4LFP2KN-00c0fdce-00001694-00000000-1d577930b7260e2': <cbapi.psc.threathunter.models.Process: id N4LFP2KN-00c0fdce-00001694-00000000-1d577930b7260e2> @ https://defense-prod05.conferdeploy.net}
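
To make the effect of that dict comprehension concrete without a live connection, here is a small sketch using stand-in records. `Segment` and its `segment_id` field are hypothetical; real results are `cbapi` `Process` objects, and only `process_guid` is taken from the cell above.

```python
from collections import defaultdict, namedtuple

# Stand-in for a search result; real results are cbapi Process objects.
Segment = namedtuple("Segment", ["process_guid", "segment_id"])

query_results = [
    Segment("guid-a", 1),
    Segment("guid-b", 1),
    Segment("guid-a", 2),
    Segment("guid-c", 1),
    Segment("guid-a", 3),
]

# Same idea as the dict comprehension above: one entry per GUID, last segment wins.
unique_processes = {r.process_guid: r for r in query_results}

# Alternative: keep every segment, grouped under its process GUID.
segments_by_process = defaultdict(list)
for r in query_results:
    segments_by_process[r.process_guid].append(r.segment_id)

print(len(query_results), "results,", len(unique_processes), "unique processes")
print(dict(segments_by_process))
```

The comprehension keeps only the last segment seen for each GUID; the grouped form is a better fit when events from every segment matter.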


Now let's take a deeper look at one of the process objects to see what sort of information has been retrieved.

In [7]:
interesting_process_guid='N4LFP2KN-00c0fdce-000008b0-00000000-1d57793423b0e30'
process = unique_processes[interesting_process_guid]
print(process)

Process object, bound to https://defense-prod05.conferdeploy.net.
-------------------------------------------------------------------------------

       backend_timestamp: 2019-09-30T13:40:09.029Z
         childproc_count: 0
         crossproc_count: 16
      device_external_ip: 96.234.213.61
            device_group: 
               device_id: 12647886
      device_internal_ip: 
             device_name: pscr-training
               device_os: WINDOWS
        device_policy_id: 12202
        device_timestamp: 2019-09-30T13:34:44.118Z
                enriched: False
           filemod_count: 12
             index_class: default
            ingress_time: 1569850764922
                  legacy: False
           modload_count: 130
           netconn_count: 2
                  org_id: N4LFP2KN
             parent_guid: N4LFP2KN-00c0fdce-00000eb4-00000000-1d57790a364...
             parent_hash: ['40ee6feb000be5ef2be0b850cc85a4d7', '7aa9355ce...
             parent_name: c:\windows\explorer

# Reports, Watchlists, and Feeds

We confirmed that some of the activity `@Heinzeralli` was tweeting about is present in our environment. Rather than setting up something like a local cron job to repeatedly run the search for us, we would like to use the similar functionality that is already built into CB ThreatHunter.

To do this we will explore the concepts of IOCs, reports, watchlists, and feeds.

**NOTE** For more detailed information on the feeds and watchlists data model please take a look at [Watchlist Demo](watchlist_demo.ipynb)

## Create a Query Based Report

In [13]:
report_dict = {
    "id":"randomidentifier",
    "timestamp": int(time.time()),
    "link": "https://devday2019.carbonblack.com/excelPowershellReport",
    "title": "Excel with powershell child process",
    "description": "Detected an instance of Excel spawning powershell.exe",
    "severity": 8,
    "iocs_v2": [
        {
            "id":"excel_powershell_child",
            "match_type":"query",
            "values":["process_name:excel.exe childproc_name:powershell.exe"],
            "link": "https://devday2019.carbonblack.com/excelPowershellReport"
        }
    ]
}

report_obj = th.create(Report,report_dict)
report_obj.save_watchlist()
print(report_obj)

Report created without feed ID or not from watchlist


Report object, bound to https://defense-prod05.conferdeploy.net.
-------------------------------------------------------------------------------

             description: Detected an instance of Excel spawning powershe...
                      id: uPKw3bRTQQCcZVWICV3HQg
                    iocs: None
                 iocs_v2: [{'id': 'notepad_child_proc', 'match_type': 'qu...
                    link: https://devday2019.carbonblack.com/excelPowersh...
                severity: 8
                    tags: None
               timestamp: 1569855699
                   title: Excel with powershell child process
              visibility: None


## Create a Custom Watchlist

You will have noticed the warning that we receive from the API call stating that the report we just created is not part of a watchlist or feed. Just creating a report is not enough to have this query run repeatedly against our data, so we will now create a watchlist to introduce this recurring behavior.

To start, our watchlist will have a single report, and we can reference the ID of the object that we created above.

We want to leverage this watchlist both to tag data that matches our query and, because we see the results as highly suspicious, to generate an alert in the PSC console.

In [2]:
ts = int(time.time())
watchlist_dict = {
    "create_timestamp": ts,
    "last_update_timestamp":ts,
    "name": "DevDay Example Watchlist",
    "description": "Pretty cool, its a watchlist",
    "tags_enabled": True,
    "alerts_enabled": True,
    "report_ids": [report_obj.id]
}

watchlist_obj = th.create(Watchlist,watchlist_dict)
watchlist_obj.save()


Now let's take a look to make sure the report was added correctly.

In [15]:
for report in watchlist_obj.reports:
    print(report)

Report object, bound to https://defense-prod05.conferdeploy.net.
-------------------------------------------------------------------------------

             description: Detected an instance of Excel spawning powershe...
                      id: uPKw3bRTQQCcZVWICV3HQg
                    iocs: None
                 iocs_v2: [{'id': 'notepad_child_proc', 'match_type': 'qu...
                    link: https://devday2019.carbonblack.com/excelPowersh...
                severity: 8
                    tags: None
               timestamp: 1569855699
                   title: Excel with powershell child process
              visibility: None


## Create an Ingress Based Report

`@Heinzeralli` has recently published an additional IOC from their research, indicating that there has been an increase in threat actors attempting to obfuscate their activity by naming executables in a way that may slip past a quick inspection of the process name.

Since this IOC focuses on a single field in the CB ThreatHunter data, we can create a report that will be processed as data is being ingested, indexed, and stored. While these types of reports must be created against a single field, each field supports all the "match types" that would be expected for query based reports.

In [16]:
ingress_report_dict = {
    "id":"randomidentifier",
    "timestamp": int(time.time()),
    "link": "https://devday2019.carbonblack.com/notExcel",
    "title": "Misspelled Excel",
    "description": "Detected an instance of suspicious Msft Excel",
    "severity": 8,
    "iocs_v2": [
        {
            "id":"winword_name_ioc",
            "match_type":"regex",
            "field":"process_name",
            "values": ["excl.exe"],
            "link": "https://devday2019.carbonblack.com/notExcel"
        }
    ]
}
ingress_report_obj = th.create(Report,ingress_report_dict)
ingress_report_obj.save_watchlist()
print(ingress_report_obj)

Report created without feed ID or not from watchlist


Report object, bound to https://defense-prod05.conferdeploy.net.
-------------------------------------------------------------------------------

             description: Detected an instance of suspicious powershell.exe
                      id: Qjcj5NfQvyDVmHvTq2bjg
                    iocs: None
                 iocs_v2: [{'id': 'notepad_name_ioc', 'match_type': 'rege...
                    link: https://devday2019.carbonblack.com/notPowershell
                severity: 8
                    tags: None
               timestamp: 1569855794
                   title: Misspelled Notepad
              visibility: None
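
The difference between the query based IOC and the single-field ingress IOC can be sketched as a small pre-flight check on the `iocs_v2` entries. The set of match types and the rule that non-query IOCs need a `field` are assumptions drawn from the two report dicts in this notebook and from the ThreatHunter API documentation; `check_ioc` is a hypothetical helper, not part of `cbapi`.

```python
# Match types assumed for the iocs_v2 model used in this notebook.
VALID_MATCH_TYPES = {"query", "equality", "regex"}


def check_ioc(ioc):
    """Return a list of problems found in one iocs_v2 entry (empty if none)."""
    problems = []
    match_type = ioc.get("match_type")
    if match_type not in VALID_MATCH_TYPES:
        problems.append(f"unknown match_type {match_type!r}")
    if not ioc.get("values"):
        problems.append("values must be a non-empty list")
    # Ingress-style IOCs are evaluated against exactly one field.
    if match_type in {"equality", "regex"} and not ioc.get("field"):
        problems.append(f"{match_type} IOCs need a single field")
    return problems


query_ioc = {"id": "excel_powershell_child", "match_type": "query",
             "values": ["process_name:excel.exe childproc_name:powershell.exe"]}
ingress_ioc = {"id": "winword_name_ioc", "match_type": "regex",
               "field": "process_name", "values": ["excl.exe"]}
broken_ioc = {"id": "no_field", "match_type": "regex", "values": ["excl.exe"]}

for ioc in (query_ioc, ingress_ioc, broken_ioc):
    print(ioc["id"], check_ioc(ioc) or "ok")
```

Running a check like this before `th.create(Report, ...)` catches simple mistakes locally rather than at the API.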


Now we are going to update our watchlist to include our ingress report as well. Note that currently with `cbapi` you need to call `update`, which overwrites the existing reports tied to this watchlist, so pass the complete list of report IDs.

In [17]:
watchlist_obj.id
watchlist_obj.update(report_ids=[report_obj.id,ingress_report_obj.id])
for report in watchlist_obj.reports:
    print(report)

Report object, bound to https://defense-prod05.conferdeploy.net.
-------------------------------------------------------------------------------

             description: Detected an instance of suspicious powershell.exe
                      id: Qjcj5NfQvyDVmHvTq2bjg
                    iocs: None
                 iocs_v2: [{'id': 'notepad_name_ioc', 'match_type': 'rege...
                    link: https://devday2019.carbonblack.com/notPowershell
                severity: 8
                    tags: None
               timestamp: 1569855794
                   title: Misspelled Notepad
              visibility: None
Report object, bound to https://defense-prod05.conferdeploy.net.
-------------------------------------------------------------------------------

             description: Detected an instance of suspicious powershell.exe
                      id: Qjcj5NfQvyDVmHvTq2bjg
                    iocs: None
                 iocs_v2: [{'id': 'notepad_name_ioc', 'match_type': 'rege.
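
Because the update replaces the whole list of report IDs, it can help to build the combined list explicitly before calling it. The following is a minimal sketch with a hypothetical helper, `merged_report_ids`; the IDs are the ones shown in the outputs above.

```python
def merged_report_ids(existing_ids, new_ids):
    """Combine two ID lists, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*existing_ids, *new_ids]))


existing = ["uPKw3bRTQQCcZVWICV3HQg"]  # report already on the watchlist
to_add = ["Qjcj5NfQvyDVmHvTq2bjg", "uPKw3bRTQQCcZVWICV3HQg"]

all_ids = merged_report_ids(existing, to_add)
print(all_ids)
```

The resulting list could then be passed as `report_ids` to `watchlist_obj.update(...)`, as in the cell above.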

## Create a Feed 

It turns out that all the intel published by `@Heinzeralli` that we just leveraged to create our watchlist is actually part of a long-running effort to give back to the CB ThreatHunter community, and he is constantly pushing out new intel. Maybe this should all actually be a feed.

Note that when creating a feed, you need to put the report definitions in place - you cannot reference existing reports. You will also need to provide a report ID for each one that is unique within the feed.

The first thing that we need to do is create a feed that will hold all of `@Heinzeralli`'s intel.

In [19]:
init_feed = {
    "feedinfo":{
        "name": "Office Attacks", 
        "provider_url": "https://devday2019.carbonblack.com/officeAttacksFeed",
        "summary": "Various attacks using Office",
        "category": "DevDay demo",
        "access":"private",
        "owner":th.credentials['org_key']
    }, 
    "reports":[{"timestamp": int(time.time()),
    "id": "report1",
    "link": "https://devday2019.carbonblack.com/excelIOC",
      "title": "Misspelled Excel",
      "description": "Detected an instance of suspicious excel.exe",
      "severity": 7,
      "iocs_v2": [
          {
              "id": "regex_excel_ioc2",
              "match_type": "regex",
              "field": "process_name",
              "values": [".+/3xc3l.exe"],
              "link": "https://devday2019.carbonblack.com/excelIOCReport"
          }
      ]
    },                    
   {"timestamp": int(time.time()),
    "id": "report2",
    "link": "https://devday2019.carbonblack.com/excelChildReport",
      "title": "Excel spawning PowerShell",
      "description": "Detected an instance of excel.exe spawning powershell.exe",
      "severity": 8,
      "iocs_v2": [
          {
              "id": "query_excel_ioc",
              "match_type": "query",
              "values": ["process_name:excel.exe childproc_name:powershell.exe"],
              "link": "https://devday2019.carbonblack.com/excelPowershellReport"
          }
      ]
   }]
}

feed_obj = th.create(Feed,init_feed)
feed_obj.save()
print(feed_obj)

Report created without feed ID or not from watchlist
Report created without feed ID or not from watchlist


Feed object, bound to https://defense-prod05.conferdeploy.net.
-------------------------------------------------------------------------------

                  access: private
                category: DevDay demo
                      id: c25UcBHQQbmtTIariUJzuw
                    name: Office Attacks
                   owner: N4LFP2KN
            provider_url: https://devday2019.carbonblack.com/officeAttack...
            source_label: None
                 summary: Various attacks using Office


In order to apply the intel in the feed to our data, we need to "subscribe" by creating a watchlist that is essentially a reference to the feed.
Rather than creating a brand new watchlist, we will just modify the one from our previous example to use the feed.

In [20]:
watchlist_obj.update(description="It's now from a feed!",classifier={'key':'feed_id','value':feed_obj.id})
print(watchlist_obj)

Watchlist object, bound to https://defense-prod05.conferdeploy.net.
-------------------------------------------------------------------------------

          alerts_enabled: False
              classifier: {'key': 'feed_id', 'value': 'c25UcBHQQbmtTIariU...
        create_timestamp: 1569855715
             description: It's now a feed!
                      id: 4onPxYiySOigkvODRDu6Aw
    last_update_timestamp: 1569859881
                    name: DevDay Test Watchlist
              report_ids: ['Qjcj5NfQvyDVmHvTq2bjg', 'uPKw3bRTQQCcZVWICV3H...
            tags_enabled: True


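Because `@Heinzeralli` keeps publishing new intel, we can also append a new report to the existing feed.

In [21]: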
new_report_dict = {
    "id":"randomidentifier1",
    "timestamp": int(time.time()),
    "link": "https://devday2019.carbonblack.com/excelCmdReport",
    "title": "Excel spawns cmd.exe",
    "description": "Excel creating a cmd.exe process",
    "severity": 8,
    "iocs_v2": [
        {
            "id":"excel_cmd_child",
            "match_type":"query",
            "values":["process_name:excel.exe childproc_name:cmd.exe"],
            "link": "https://devday2019.carbonblack.com/excelCmdReport"
        }
    ]
}

new_report_obj = th.create(Report,new_report_dict)
feed_obj.append_reports([new_report_obj])

Report created without feed ID or not from watchlist


# Clean up your org

In [105]:
#delete the watchlist
print(f"Deleting watchlist {watchlist_obj.id}")
watchlist_obj.delete()
#delete the reports as part of the feed
for report in feed_obj.reports:
    print(f"Deleting report {report.id}")
    report.delete()
#delete the feed
print(f"Deleting feed {feed_obj.id}")
feed_obj.delete()

In [106]:
#clean up the orphaned reports from our first watchlist
#`cbapi` has not yet implemented feedsearch so we use the raw object references
ret = th.get_object(f"/threathunter/feedsearch/v1/orgs/{orgkey}/search?query=devday2019")
for rid in ([r["_id"] for r in ret["hits"]["hits"]]):
    print(f"Deleting report {rid}")
    try:
        th.delete_object(f"/threathunter/watchlistmgr/v3/orgs/{orgkey}/reports/{rid}")
    except Exception as e:
        print(f'...Failed to delete report {rid}: {e}')

Deleting report pZ0hFfpRzqcwSgRkFiTgQ
Deleting report u0LxdkjSRo2pHorhsXnpEA
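
The cleanup cell above indexes straight into the raw search response. As a slightly more defensive sketch, the ID extraction can be isolated so that an empty or partial response does not raise. The response shape (`hits.hits[*]._id`) is taken from that cell; `report_ids_from_search` is a hypothetical helper and the sample response is hand-written.

```python
def report_ids_from_search(response):
    """Pull report IDs out of a feedsearch-style response, skipping malformed hits."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_id"] for hit in hits if "_id" in hit]


# Hand-written sample using the two IDs printed above, plus one malformed hit.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "pZ0hFfpRzqcwSgRkFiTgQ"},
            {"_id": "u0LxdkjSRo2pHorhsXnpEA"},
            {"unexpected": "shape"},
        ]
    }
}

for rid in report_ids_from_search(sample_response):
    print(f"Would delete report {rid}")
```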
