add LocalOsquery driver based on LocalData one (#624)
* add LocalOsquery driver based on LocalData one

* split try/except, mypy json_out

* Simplified local_osquery_driver.py a little. Now loads query files into memory on connect/query/schema access.

Changed name/enum of provider to OSQueryLogs
Added template unit test case test_load_osquery_driver.py
Added entry in msticpyconfig-test.yaml for OSQueryLogs (in DataProviders section)

* Skipping tests in test_load_osquery_driver.py that currently fail because there is no test data

* Updating OSQuery driver.

Auto-generate query names from data
Rename columns from json-normalized form
Add documentation
Add unit tests and test data

* Missed the updated unit tests

* Updating the process tree schema for cleaned col names

* Suppress low-sev bandit issue

---------

Co-authored-by: Ian Hellen <ianhelle@microsoft.com>
juju4 and ianhelle committed Apr 18, 2023
1 parent 645db0e commit f78bd67
Showing 15 changed files with 2,385 additions and 1 deletion.
1,421 changes: 1,421 additions & 0 deletions docs/notebooks/LocalData-osquery.ipynb


1 change: 1 addition & 0 deletions docs/source/DataAcquisition.rst
@@ -26,6 +26,7 @@ Individual Data Environments
data_acquisition/DataProv-Sumologic
data_acquisition/DataProv-Kusto
data_acquisition/DataProv-Cybereason
data_acquisition/DataProv-OSQuery


Built-in Data Queries
7 changes: 7 additions & 0 deletions docs/source/api/msticpy.data.drivers.local_osquery_driver.rst
@@ -0,0 +1,7 @@
msticpy.data.drivers.local\_osquery\_driver module
==================================================

.. automodule:: msticpy.data.drivers.local_osquery_driver
    :members:
    :undoc-members:
    :show-inheritance:
1 change: 1 addition & 0 deletions docs/source/api/msticpy.data.drivers.rst
@@ -18,6 +18,7 @@ Submodules
msticpy.data.drivers.kql_driver
msticpy.data.drivers.kusto_driver
msticpy.data.drivers.local_data_driver
msticpy.data.drivers.local_osquery_driver
msticpy.data.drivers.mdatp_driver
msticpy.data.drivers.mordor_driver
msticpy.data.drivers.odata_driver
174 changes: 174 additions & 0 deletions docs/source/data_acquisition/DataProv-OSQuery.rst
@@ -0,0 +1,174 @@
The OSQuery provider
====================

:py:mod:`OSQuery driver documentation<msticpy.data.drivers.local_osquery_driver>`

The ``OSQuery`` data provider can read OSQuery log files
and provide convenient query functions for each OSQuery "table"
(or event type) contained in the logs.

The provider can read one or more log files, including log files
spread across multiple folders. The files are read, converted to pandas
DataFrames, and grouped by table (event type). In addition, date fields
within the data are converted to pandas Timestamp format.
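As a rough illustration of the loading steps just described, the sketch below parses JSON-lines text and groups the records by their ``name`` field using only the Python standard library. The helper ``group_osquery_records`` is hypothetical and is not the driver's implementation: the real provider produces pandas DataFrames, whereas this sketch returns plain dictionaries, assumes one JSON record per line, and converts only the ``unixTime`` field.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def group_osquery_records(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group JSON-lines OSQuery records by their ``name`` field.

    Hypothetical helper for illustration only; it is not part of MSTICPy.
    """
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Convert epoch seconds to a timezone-aware datetime, loosely
        # mirroring the provider's conversion of date fields.
        if "unixTime" in record:
            record["unixTime"] = datetime.fromtimestamp(
                record["unixTime"], tz=timezone.utc
            )
        grouped[record["name"]].append(record)
    return dict(grouped)
```

Passing an open log file object also works, because iterating over a text file yields its lines.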

.. code:: ipython3

    import msticpy as mp

    qry_prov = mp.QueryProvider("OSQueryLogs", data_paths=["~/my_logs"])
    qry_prov.connect()
    df_processes = qry_prov.osquery.processes()

The query functions ignore any parameters and do no further
filtering of the data. You can use pandas to filter and sort
the returned DataFrames, or pass them directly to other MSTICPy
functionality.

OSQuery Configuration
---------------------

You can store your provider settings, such as the data paths, in
*msticpyconfig.yaml* instead of supplying the ``data_paths`` parameter to
the ``QueryProvider`` class.

For more information on using and configuring *msticpyconfig.yaml* see
:doc:`msticpy Package Configuration <../getting_started/msticpyconfig>`
and :doc:`MSTICPy Settings Editor<../getting_started/SettingsEditor>`.

The OSQuery settings in the file should look like the following:

.. code:: yaml

    DataProviders:
      ...
      OSQuery:
        data_paths:
          - /home/user1/sample_data
          - /home/shared/sample_data
        cache_file: ~/.msticpy/os_query_cache.pkl

The ``cache_file`` entry is explained later.

Expected log file format
------------------------

Each log file must be a text file containing JSON records, one
record per line. An example is shown below:

.. parsed-literal::

    {"name":"pack_osquery-snapshots-pack_python_packages","hostIdentifier":"jumpvm","calendarTime":"Thu Mar 16 09:22:33 2023 UTC","unixTime":1678958553,"epoch":0,"counter":0,"numerics":false,"decorations":{"host_uuid":"40443dd9-5b21-a345-8f89-aadde84c3719","username":"LOGIN"},"columns":{"author":"Python Packaging Authority","directory":"/usr/lib/python3.9/site-packages/","license":"UNKNOWN","name":"setuptools","path":"/usr/lib/python3.9/site-packages/setuptools-50.3.2.dist-info/","summary":"Easily download, build, install, upgrade, and uninstall Python packages","version":"50.3.2"},"action":"snapshot"}
    {"name":"pack_osquery-snapshots-pack_dns_resolvers","hostIdentifier":"jumpvm","calendarTime":"Thu Mar 16 13:14:10 2023 UTC","unixTime":1678972450,"epoch":0,"counter":0,"numerics":false,"decorations":{"host_uuid":"40443dd9-5b21-a345-8f89-aadde84c3719","username":"LOGIN"},"columns":{"address":"168.63.129.16","id":"0","netmask":"32","options":"705","type":"nameserver"},"action":"snapshot"}

Each JSON record is expected to have a ``name`` field identifying
the event type, along with the child dictionaries ``columns`` and
``decorations``. A single record, pretty-printed, looks like this:

.. code:: json

    {
      "name": "pack_osquery-snapshots-pack_dns_resolvers",
      "hostIdentifier": "jumpvm",
      "calendarTime": "Thu Mar 16 13:14:10 2023 UTC",
      "unixTime": 1678972450,
      "epoch": 0,
      "counter": 0,
      "numerics": false,
      "decorations": {
        "host_uuid": "40443dd9-5b21-a345-8f89-aadde84c3719",
        "username": "LOGIN"
      },
      "columns": {
        "address": "u5r0qfkczeeejf3qb20cha0ihb.bx.internal.cloudapp.net",
        "id": "0",
        "netmask": "",
        "options": "705",
        "type": "search"
      },
      "action": "snapshot"
    }

Using the OSQuery provider
--------------------------

To use the OSQuery provider, create a QueryProvider instance,
passing the string "OSQueryLogs" as the ``data_environment``
parameter. If you have not configured ``data_paths`` in *msticpyconfig.yaml*,
you also need to supply the ``data_paths`` parameter to specify
the folders or files that you want to read.
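The ``data_paths`` entries can mix individual files and folders. A minimal sketch of expanding such entries into a list of files follows. ``collect_log_files`` is hypothetical, and its recursive ``pattern`` matching and handling of missing paths (silently skipped) are assumptions for illustration, not the provider's documented behavior.

```python
from pathlib import Path
from typing import Iterable, List


def collect_log_files(data_paths: Iterable[str], pattern: str = "*") -> List[Path]:
    """Expand ``data_paths`` entries into a sorted list of log files.

    Hypothetical helper: files are used as-is, folders are searched
    recursively for names matching ``pattern``, missing paths are skipped.
    """
    found = set()
    for entry in data_paths:
        path = Path(entry).expanduser()  # allow "~/my_logs"-style entries
        if path.is_file():
            found.add(path.resolve())
        elif path.is_dir():
            found.update(p.resolve() for p in path.rglob(pattern) if p.is_file())
    return sorted(found)
```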

.. code:: ipython3

    qry_prov = mp.QueryProvider("OSQueryLogs", data_paths=["~/my_logs"])

Calling the ``connect`` method triggers the provider to read the
log files.

.. code:: ipython3

    qry_prov.connect()

.. parsed-literal::

    100%|██████████| 2/2 [00:00<00:00, 25.01it/s]
    Data loaded.

Listing OSQuery tables
~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3

    qry_prov.list_queries()

.. parsed-literal::

    ['osquery.acpi_tables',
     'osquery.device_nodes',
     'osquery.dns_resolvers',
     'osquery.events',
     'osquery.fim',
     'osquery.last',
     'osquery.listening_ports',
     'osquery.logged_in_users',
     'osquery.mounts',
     'osquery.open_sockets',
     'osquery.osquery_info',
     'osquery.osquery_packs',
     'osquery.osquerydb_size',
     'osquery.platform_info',
     'osquery.process_memory',
     'osquery.processes',
     'osquery.python_packages',
     'osquery.schedule',
     'osquery.shell_history']

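The query names in this listing appear to be derived from the ``name`` field of each record (the commit message notes "Auto-generate query names from data"). The sketch below reproduces the mapping seen in the examples on this page, such as ``pack_osquery-custom-pack_processes`` to ``osquery.processes``. ``derive_query_name`` and its rule are inferred from these examples only and may not match the driver's exact logic.

```python
def derive_query_name(event_name: str, prefix: str = "osquery") -> str:
    """Turn an OSQuery log ``name`` into a dotted query name.

    Hypothetical rule inferred from the sample data: take the last
    "-"-separated segment and drop a leading "pack_".
    """
    last_segment = event_name.rsplit("-", 1)[-1]
    if last_segment.startswith("pack_"):
        last_segment = last_segment[len("pack_"):]
    return f"{prefix}.{last_segment}"
```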
Running an OSQuery query
~~~~~~~~~~~~~~~~~~~~~~~~

Each query returns a DataFrame containing the events of that
table (event type) read from the logs.

.. code:: ipython3

    qry_prov.osquery.processes()

================================== ================ ========================= ===== ========== ========= ====== ======== ======== ===== ==========
name                               hostIdentifier   unixTime                  ...   username   cmdline   euid   name_    parent   uid   username
================================== ================ ========================= ===== ========== ========= ====== ======== ======== ===== ==========
pack_osquery-custom-pack_processes jumpvm           2023-03-16 03:08:58+00:00 ...   LOGIN                0      kthreadd 2        0     root
pack_osquery-custom-pack_processes jumpvm           2023-03-16 03:08:58+00:00 ...   LOGIN                0      kthreadd 2        0     root
pack_osquery-custom-pack_processes jumpvm           2023-03-16 03:08:58+00:00 ...   LOGIN                0      kthreadd 2        0     root
pack_osquery-custom-pack_processes jumpvm           2023-03-16 03:08:58+00:00 ...   LOGIN                0      kthreadd 2        0     root
pack_osquery-custom-pack_processes jumpvm           2023-03-16 03:08:58+00:00 ...   LOGIN                0      kthreadd 2        0     root
================================== ================ ========================= ===== ========== ========= ====== ======== ======== ===== ==========

.. note:: Columns in the nested log data may be renamed
   if their name clashes with an existing column name. See
   ``name_`` in the previous table for an example.
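The renaming can be sketched as follows. ``flatten_record`` is a hypothetical helper that lifts the ``decorations`` and ``columns`` dictionaries to the top level and appends ``_`` to any clashing key, which matches the ``name_`` column shown above; the driver's actual column-cleaning rules may differ.

```python
from typing import Any, Dict, Tuple


def flatten_record(
    record: Dict[str, Any],
    nested_keys: Tuple[str, ...] = ("decorations", "columns"),
) -> Dict[str, Any]:
    """Lift nested dictionaries in an OSQuery record to the top level.

    Hypothetical sketch: a nested key that clashes with an existing
    key gets "_" appended until it is unique.
    """
    # Keep the top-level fields (name, hostIdentifier, unixTime, ...).
    flat = {key: value for key, value in record.items() if key not in nested_keys}
    for nested in nested_keys:
        for key, value in record.get(nested, {}).items():
            new_key = key
            while new_key in flat:
                new_key += "_"  # e.g. columns.name becomes "name_"
            flat[new_key] = value
    return flat
```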

Other OSQuery Provider Documentation
------------------------------------


Built-in :ref:`data_acquisition/DataQueries:Queries for Local Data`.

:py:mod:`OSQuery driver API documentation<msticpy.data.drivers.local_osquery_driver>`
4 changes: 3 additions & 1 deletion msticpy/data/core/data_providers.py
@@ -708,7 +708,9 @@ def _add_driver_queries(self, queries: Iterable[Dict[str, str]]):
self.query_store.add_query(
name=query["name"],
query=query["query"],
-query_paths=query["query_container"],
+query_paths=query.get(
+    "query_paths", query.get("query_container", "default")
+),
description=query["description"],
)
# For now, just add all of the functions again (with any connect-time acquired
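This change makes the query-path lookup tolerant of driver query definitions that omit ``query_container``: the old ``query["query_container"]`` raised ``KeyError`` when the key was missing. The helper below (hypothetical name ``resolve_query_paths``) isolates the new expression so the precedence is easy to see.

```python
from typing import Dict


def resolve_query_paths(query: Dict[str, str]) -> str:
    """Reproduce the lookup order used in the ``_add_driver_queries`` change.

    Prefer "query_paths", fall back to "query_container", then "default".
    """
    return query.get("query_paths", query.get("query_container", "default"))
```

Because the inner ``get`` is evaluated first, a missing ``query_container`` no longer raises, even when ``query_paths`` is present.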
1 change: 1 addition & 0 deletions msticpy/data/core/query_defns.py
@@ -105,6 +105,7 @@ class DataEnvironment(Enum):
M365D = 11
Cybereason = 12
Elastic = 14
OSQueryLogs = 15

@classmethod
def parse(cls, value: Union[str, int]) -> "DataEnvironment":
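The provider string ``"OSQueryLogs"`` used in the documentation matches the name of this new enum member. The sketch below uses a hypothetical stand-in enum, ``DemoEnvironment``, to show name-or-value lookup in the spirit of ``DataEnvironment.parse``. It is not MSTICPy's implementation; the real matching rules (for example, case handling or the fallback member) may differ.

```python
from enum import Enum
from typing import Union


class DemoEnvironment(Enum):
    """Hypothetical stand-in for ``DataEnvironment`` (subset of members)."""

    Unknown = 0
    Elastic = 14
    OSQueryLogs = 15

    @classmethod
    def parse(cls, value: Union[str, int]) -> "DemoEnvironment":
        """Look up a member by name or integer value; default to Unknown."""
        if isinstance(value, str) and value in cls.__members__:
            return cls[value]
        if isinstance(value, int):
            try:
                return cls(value)
            except ValueError:
                pass  # unknown integer value
        return cls.Unknown
```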
1 change: 1 addition & 0 deletions msticpy/data/drivers/__init__.py
@@ -23,6 +23,7 @@
DataEnvironment.MDATP: ("mdatp_driver", "MDATPDriver"),
DataEnvironment.MDE: ("mdatp_driver", "MDATPDriver"),
DataEnvironment.LocalData: ("local_data_driver", "LocalDataDriver"),
DataEnvironment.OSQueryLogs: ("local_osquery_driver", "OSQueryLogDriver"),
DataEnvironment.Splunk: ("splunk_driver", "SplunkDriver"),
DataEnvironment.Mordor: ("mordor_driver", "MordorDriver"),
DataEnvironment.Sumologic: ("sumologic_driver", "SumologicDriver"),
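Each mapping entry pairs a data environment with a ``(module name, class name)`` tuple, which suggests the driver module is imported only when it is needed. A minimal sketch of that lazy-import pattern with ``importlib`` follows. The registry ``DEMO_DRIVERS`` and the function ``load_driver_class`` are hypothetical, and they point at standard-library classes so the example runs without MSTICPy installed.

```python
import importlib
from typing import Dict, Tuple

# Hypothetical registry in the same (module, class) shape as the driver map.
DEMO_DRIVERS: Dict[str, Tuple[str, str]] = {
    "JsonDecoder": ("json", "JSONDecoder"),
    "OrderedDict": ("collections", "OrderedDict"),
}


def load_driver_class(env_name: str, package: str = "") -> type:
    """Import the module for ``env_name`` on first use and return the class.

    If ``package`` is given, the module name is resolved relative to it.
    """
    module_name, class_name = DEMO_DRIVERS[env_name]
    if package:
        module = importlib.import_module(f".{module_name}", package=package)
    else:
        module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Inside a package, passing ``package="msticpy.data.drivers"`` would resolve a relative name such as ``.local_osquery_driver``; whether MSTICPy consumes its mapping exactly this way is an assumption.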
