# Comparison of collaboration methods between MSTICpy and Splunk SIEM

In [2]:
from IPython.display import HTML
HTML('<iframe width="950" height="550" src="http://localhost:8000/images/TitlePicture.jpg"></iframe>')

## WHOAMI

### Tatsuya Hasegawa

- **Threat Hunter**, **Application Developer**
- **Self-employed** 
- - mainly contracting with *"GoAhead Inc."*, a small Japanese company (log analysis and security consulting using Splunk)
- **MSTICpy lover & contributor**
- - especially the Splunk provider (driver & uploader)
- Infosec Career
- - University master's degree, majoring in antivirus
- - - MSSP at NTT                                     (3y)
- - - - CSIRT at a domestic recruit company          (3y)
- - - - - Security Researcher at Cylance/BlackBerry  (3y)
- - - - - - Security Consultant at GoAhead           (3y)
- SNS
- - X: @T_8ase
- - LinkedIn: tatsuya-hasegawa-aa3279142
- **Speaker of APAC SANS DFIR Summit 2023**
- - [Practical msticpy use ~ rainbow bridge to SIEM for advanced threat hunting ~](https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/Tatsuya_Hasegawa_msticpy_SANS_APAC_DFIR_SUMMIT_2023.pdf)

### MSTICpy is a very cool tool for data analysis in threat hunting!
### I explained why I use MSTICpy and Jupyter for advanced threat hunting in my DFIR Summit slides.

## Agenda

1. Splunk Query Provider & Uploader (4 min)

    How does MSTICpy's Splunk Query Provider work?

2. Splunk App for DSDL (5 min)

    How does Splunk DSDL work?

    How can we inject MSTICpy into the DSDL framework?

3. Let's compare the advantages and disadvantages of both with a matrix! (5 min)

    In fact, it depends on the case.

4. Conclusion (1 min)

    





# 1. Splunk Query Provider & Uploader

In [2]:
import msticpy as mp
print(mp.__version__)
mp.init_notebook()


2.10.0


In [3]:
# Splunk Query Provider

splunk_prov = mp.QueryProvider("Splunk")
splunk_prov.connect() # from msticpyconfig.yaml
## splunk_prov.connect(host="HOST",port="PORT",username="USERNAME",password="PASSWORD")

# An auth token is a more secure way to access Splunk
splunk_token_prov = mp.QueryProvider("Splunk")
splunk_token_prov.connect() # from msticpyconfig.yaml
## splunk_token_prov.connect(host="HOST",port="PORT",bearer_token="JWT")


Connected.
Connected.


### Inside Splunk Driver

`The Splunk driver uses the Splunk Python SDK (splunk-sdk) to connect to Splunk via its REST API.`

- /msticpy/msticpy/data/drivers/splunk_driver.py

```python
try:
    import splunklib.client as sp_client
    import splunklib.results as sp_results
    from splunklib.client import AuthenticationError, HTTPError
except ImportError as imp_err:
    raise MsticpyImportExtraError(
        "Cannot use this feature without splunk-sdk installed",
        title="Error importing splunk-sdk",
        extra="splunk",
    ) from imp_err
    
# ...
        # Different required parameters for the REST API authentication method
        # between user/pass and authorization bearer token
        if "username" in cs_dict:
            self._required_params = ["host", "username", "password"]
        else:
            self._required_params = ["host", "bearer_token"]

# ...
        # Replace to Splunk python sdk's parameter name of sp_client.connect()
        if arg_dict.get("bearer_token"):
            arg_dict["splunkToken"] = arg_dict.pop("bearer_token")

        try:
            self.service = sp_client.connect(**arg_dict)
        except AuthenticationError as err:
            ...  # (error handling omitted)
```

> Note: This means username/password authentication takes priority over token authentication if both `username` and `bearer_token` are set in the arguments or in msticpyconfig.yaml.


### Basically, SplunkUploader uses the same trick!

It is a dedicated class outside the Query Provider, as shown below, but SplunkUploader also uses `splunk_driver` for its connection.

In [4]:
# Splunk Uploader

from msticpy.data.uploaders.splunk_uploader import SplunkUploader
splunk_uploader = SplunkUploader() # from msticpyconfig.yaml
## splunk_uploader = SplunkUploader(host="HOST", port="PORT", username="USERNAME",  password="PASSWORD")
## splunk_uploader = SplunkUploader(host="localhost", bearer_token="JWT")


Connected.


### Expiry checking for JWT auth tokens has not been implemented in msticpy yet.

Once the auth token has expired, msticpy simply reports the query failure returned by the Splunk server.
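Until such a check exists, a notebook can inspect the token itself before connecting. The sketch below assumes the Splunk auth token is a JWT carrying the standard `exp` claim (RFC 7519); `jwt_expiry` and `is_token_expired` are hypothetical helpers, not msticpy APIs. The signature is deliberately not verified: this is only a client-side hint, and the Splunk server stays the authority on validity.

```python
import base64
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url JWT segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def jwt_expiry(token: str) -> Optional[int]:
    """Return the `exp` claim (seconds since the epoch), or None if absent.

    Assumption: the token is a standard header.payload.signature JWT.
    The signature is NOT checked; do not use this for authorization decisions.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("Not a JWT: expected header.payload.signature")
    payload = json.loads(_b64url_decode(parts[1]))
    exp = payload.get("exp")
    return int(exp) if exp is not None else None


def is_token_expired(token: str, now: Optional[float] = None, leeway: int = 60) -> bool:
    """True if the token has expired or will expire within `leeway` seconds."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim, so there is nothing to check locally
    current = time.time() if now is None else now
    return current + leeway >= exp
```

Calling `is_token_expired(token)` before `splunk_token_prov.connect(...)` turns a confusing query failure into an explicit "renew your token" message; the `leeway` avoids starting a long query with a token that is about to lapse.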

# 2. Splunk App for DSDL

[Splunk App for Data Science and Deep Learning](https://splunkbase.splunk.com/app/4607) (DSDL) is an official Splunk extension for deep learning.

- It usually requires external compute resources, such as Docker containers running outside the Splunk application.
- Data is exchanged between the containers and Splunk through endpoint URLs.
- Of course, Jupyter Notebook is supported in the containers.
- Thus we can install msticpy on the container side, although only manually.
- msticpy's Query Provider and Uploader are **not** necessary on the DSDL platform, because data transfer is handled by the DSDL architecture.



First, choose the container deployment that meets your needs:
- `single-instance` deployment with Docker or Kubernetes and the Splunk platform running on the same instance.
- `side-by-side` deployment where the Splunk platform instance communicates with another instance that serves as the Docker or Kubernetes host.
    

There are also two methods for data transfer in DSDL.
1. pull & push in container: `Almost the same as msticpy's Query Provider and Uploader`
    - The downloader uses the Splunk REST API with a valid Splunk auth token defined in the DSDL app.
    - The uploader uses Splunk's HTTP Event Collector (HEC).
2. listen & return container: `Different from msticpy!`
    - fit & apply model
    - **I explain this method in more detail below.**
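To make method 1 concrete, here is a minimal stdlib sketch of the two Splunk interfaces it relies on: a REST API search export (pull) and an HEC event post (push). It is not DSDL's own code. Assumptions: Splunk auth tokens are sent as `Authorization: Bearer <token>` to the management port (8089 by default), HEC tokens as `Authorization: Splunk <token>` to the HEC port (8088 by default); the host names and tokens are placeholders. The functions only build the requests, so they can be inspected before sending.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, Mapping, Optional


def build_export_request(base_url: str, auth_token: str, spl: str,
                         earliest: str = "-24h") -> urllib.request.Request:
    """Pull: stream search results from the REST API export endpoint."""
    # The REST API expects a leading "search" command unless the SPL starts with "|"
    query = spl if spl.lstrip().startswith("|") else f"search {spl}"
    body = urllib.parse.urlencode(
        {"search": query, "output_mode": "json", "earliest_time": earliest}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/services/search/jobs/export",
        data=body,
        headers={"Authorization": f"Bearer {auth_token}"},
        method="POST",
    )


def build_hec_request(hec_url: str, hec_token: str, events: Iterable[Mapping],
                      sourcetype: str, index: Optional[str] = None) -> urllib.request.Request:
    """Push: send events to the HTTP Event Collector in one batched body."""
    envelopes = []
    for event in events:
        envelope = {"event": dict(event), "sourcetype": sourcetype}
        if index:
            envelope["index"] = index
        envelopes.append(json.dumps(envelope))
    # HEC accepts several JSON event objects concatenated in a single body
    return urllib.request.Request(
        f"{hec_url}/services/collector/event",
        data="\n".join(envelopes).encode("utf-8"),
        headers={"Authorization": f"Splunk {hec_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending either request is then `urllib.request.urlopen(req, context=ssl_ctx)` with an SSL context that trusts your Splunk certificates. Note the two different token types: the pull side needs a Splunk user's auth token, while the push side needs an HEC token.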

In [5]:
HTML('<iframe width="950" height="550" src="http://localhost:8000/images/SplunkDSDL.jpg"></iframe>')
## DSDL Architecture https://docs.splunk.com/Documentation/DSDL/5.1.1/User/Architecture


### Inside DSDL's MLTKContainer

`MLTKContainer uses endpoint URLs with an API auth token for communication.`
- /mltk-container/bin/mltkc/MLTKContainer.py 

```python
    # ---------------------------------------------------------------------------
    # helper function to call container endpoint
    # communicate over basic urllib GET or POST with container endpoint
    # TODO #1: gzip compression of payload
    # Arguments : (self for convenience)
    # - url     : request endpoint url, usually self.endpoint_url = 'http://localhost:5000'
    # - data    : JSON structure send via POST to the container
    # - content_type (for future use in case of compression)
    def endpoint(self, url, data=None, content_type='application/json'):
        # we assume a GET call
        method = "GET"
        # if we have data let's switch to POST call
        if data:
            method = "POST"
        # check for logging enabled and log stuff
        if _MLTKContainer_logging:
            debug_message = method+' endpoint ['+url+'] called '
            if data:
                debug_message += 'with payload ('+str(len(data))+' bytes)'
            _MLTKContainer_logger.info(debug_message)
        # let's start pessimistic and override in case of success
        returns = 'ERROR on ' + method + ' request to endpoint ['+url+']'
        try:
            # get endpoint cert and create ssl context with option for incluster traffic
            settings = self._get_config(os.path.join(os.path.dirname(__file__),"..", "..", "local", "docker.conf"))['connection']
            api_token = settings.get('api_token')
            header = {
                'Authorization': api_token,
            }
            # do the POST
            if data:                
                #data_encoded = urllib_parse.urlencode(data).encode('utf-8')
                data_encoded = str.encode(data)
                header['Content-Type'] = content_type
                request = urllib_request.Request(
                    url, data_encoded, header)
            # or the GET
            else:
                request = urllib_request.Request(url, None, header)
        # ...
```
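The other half of this exchange is a container that listens for these calls and returns a result. The sketch below is a toy stand-in for that "listen & return" side, built on the standard library's `http.server`; it is not DSDL's actual container server. `API_TOKEN`, `ApplyHandler`, `start_server` and `call_endpoint` are hypothetical names. The token check mirrors the `endpoint()` excerpt above, which puts the raw `api_token` into the `Authorization` header.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared secret standing in for the api_token kept in docker.conf
API_TOKEN = "change-me"


class ApplyHandler(BaseHTTPRequestHandler):
    """Container side: accept POSTed JSON and return a JSON result."""

    def do_POST(self):
        # Read the body first so the connection is left in a clean state
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        # Same scheme as endpoint(): the raw token in the Authorization header
        if self.headers.get("Authorization") != API_TOKEN:
            self.send_response(401)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        payload = json.loads(raw or b"{}")
        # Placeholder for the real work, e.g. running msticpy over payload["data"]
        body = json.dumps({"status": "ok", "rows": len(payload.get("data", []))}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in the notebook


def start_server(port: int = 0) -> HTTPServer:
    """Serve ApplyHandler on localhost in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), ApplyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_endpoint(url: str, data: str, token: str) -> dict:
    """Caller side, mirroring endpoint(): POST a JSON string with the token header."""
    request = urllib.request.Request(
        url, data.encode("utf-8"),
        {"Authorization": token, "Content-Type": "application/json"},
    )
    # Bypass any environment proxy, as for in-cluster traffic
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())
```

With `server = start_server()`, calling `call_endpoint(f"http://127.0.0.1:{server.server_port}/apply", json.dumps({"data": [1, 2, 3]}), API_TOKEN)` returns `{"status": "ok", "rows": 3}`, while a wrong token raises an HTTP 401 error.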

### Case example and sample code 

This is sample code for PowerShell command-line analysis, developed in a Jupyter Notebook in DSDL's container.
https://github.com/Tatsuya-hasegawa/MSTICPy_utils/blob/main/splunk_dsdl/msticpy_powershell_ioc.ipynb
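A typical first step in PowerShell command-line analysis is decoding `-EncodedCommand` arguments, which PowerShell expects to be base64 of UTF-16LE text. The stdlib sketch below illustrates that step only; it is not the code from the linked notebook, which uses msticpy's own decoding and IoC extraction. `decode_encoded_commands` is a hypothetical helper, and its regex covers only a few common spellings of the parameter.

```python
import base64
import binascii
import re
from typing import List

# Common spellings of -EncodedCommand followed by a base64 blob.
# PowerShell accepts more abbreviations; this list is deliberately short.
_ENC_ARG = re.compile(
    r"-(?:encodedcommand|enc|ec|e)\s+([A-Za-z0-9+/=]{8,})", re.IGNORECASE
)


def decode_encoded_commands(command_line: str) -> List[str]:
    """Return the decoded text of every -EncodedCommand argument found."""
    decoded = []
    for blob in _ENC_ARG.findall(command_line):
        try:
            # PowerShell expects base64 of the UTF-16LE encoded script text
            raw = base64.b64decode(blob, validate=True)
            decoded.append(raw.decode("utf-16-le"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not a valid encoded command; skip it
    return decoded
```

Because the regex requires whitespace right after the parameter name, `-ExecutionPolicy Bypass` is not mistaken for `-e <blob>`.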



In [6]:
HTML('<iframe width="950" height="550" src="http://localhost:8000/images/DSDLex1.jpg"></iframe>')

### In other words, I injected the msticpy library into DSDL's "apply" mechanism.

`Splunk's fit command is needed only once; after that, we can apply the msticpy functions from Splunk without touching Jupyter.`

Of course, we can change the code in the apply function at any time; however, we need to run the fit command again to overwrite the .py file.
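The shape of such a notebook can be sketched as follows. This assumes the `init` / `fit` / `apply` hook signatures of DSDL's barebone notebook template (check the template shipped with your DSDL version); `COMMAND_COLUMN` and the heuristics are hypothetical placeholders where the real notebook calls msticpy. Since the analysis is stateless, `fit` only has to run once to register the code, and every later `apply` reuses it.

```python
import pandas as pd

# Hypothetical name of the command-line field sent from Splunk
COMMAND_COLUMN = "CommandLine"


def init(df: pd.DataFrame, param: dict) -> dict:
    """Runs on fit: create the model object (here just a small config dict)."""
    return {"column": COMMAND_COLUMN}


def fit(model: dict, df: pd.DataFrame, param: dict) -> dict:
    """Nothing to train: the analysis is stateless, so fit only registers the code."""
    return {"message": f"registered, {len(df)} rows seen"}


def apply(model: dict, df: pd.DataFrame, param: dict) -> pd.DataFrame:
    """Runs on every apply: return one result row per input event."""
    commands = df[model["column"]].fillna("").astype(str)
    result = pd.DataFrame(index=df.index)
    # Placeholder heuristics; the real notebook would call msticpy functions here
    result["has_encoded_command"] = commands.str.contains(
        r"\s-(?:encodedcommand|enc|ec|e)\s", case=False, regex=True
    )
    result["command_length"] = commands.str.len()
    return result
```

Keeping `apply` free of state means editing it only requires re-running fit to overwrite the .py file, which matches the workflow described above.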

In [6]:
HTML('<iframe width="950" height="550" src="http://localhost:8000/images/MLContainerPy.jpg"></iframe>')

In [8]:
HTML('<iframe width="1390" height="780" src="http://localhost:8000/images/DSDLex2.jpg"></iframe>')

# 3. Let's compare the advantages and disadvantages of both with a matrix!


| | msticpy's QueryProvider & Uploader | Splunk DSDL fit & apply |
|:---- |:----:|:----:|
| Direction | Jupyter -> Splunk | Splunk -> Jupyter |
| Action Trigger | MSTICpy code snippet | Splunk SPL |
| Secure Channel | By yourself | By DSDL |
| Credential Management | Manage Splunk credentials | Manage Jupyter credentials |
| Visualization | On Jupyter (on Splunk only when Uploader is used) | On Splunk |
| Operation | Individual | Team |
| Restrictions | None, except the Jupyter machine's resources | Yes, Splunk may be a shared resource |
| Advantage | Supports long-running processes; code is easy to modify & debug | Easy to automate repetitive analysis with Splunk scheduled searches |
| Disadvantage | Data security concerns on the Jupyter side | More exception-handling code is needed to report errors on Splunk |
| Suitable for data scientist | <font color=green>Y</font> | <font color=red>N</font> |
| Suitable for security analyst/operator | <font color=red>N</font> | <font color=green>Y</font> |
| Suitable for threat hunter | <font color=green>Y</font> | <font color=green>Y</font> |

# 4. Conclusion

My conclusion is very simple!!!

```
For operational purposes, DSDL makes it easy to build a platform for practical msticpy use.

For manual deep or rapid analysis, msticpy's QueryProvider & Uploader can be more useful.
```

That said, smart threat hunters combine manual batch analysis with continuous automated analysis.

That is why msticpy users who connect to Splunk should know both of these methods.

I'd be happy if this matrix helps them.

---
> Comment for Microsoft Sentinel

For Microsoft Sentinel, we probably don't need to worry about this choice, because it offers a highly compatible method: "Microsoft Sentinel ML Notebooks".

It uses msticpy's QueryProvider & Uploader (the left column of the matrix above), but it mitigates their disadvantage because all the components live inside the secure Azure architecture. Awesome!


### Thank you for giving this talk opportunity on my son's birthday today!🎉