Added API to QueryProvider to add a custom query at runtime (#586)
* Added API to QueryProvider to add a custom query at runtime

* Added documentation and test case

* Adding section on query parameter naming

* Fixed inconsistent naming of create_param QueryProvider attribute

* Fixed typo in test_dataqueries.py

Adding pip caching to linting github action section

Co-authored-by: Pete Bryan <peter.bryan@microsoft.com>
ianhelle and petebryan committed Jan 20, 2023
1 parent 867104e commit 4dc58e6
Showing 6 changed files with 217 additions and 11 deletions.
17 changes: 14 additions & 3 deletions .github/workflows/python-package.yml
@@ -37,10 +37,10 @@ jobs:
# This path is specific to Ubuntu
path: ~/.cache/pip
# Look to see if there is a cache hit for the corresponding requirements file
key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
key: ${{ runner.os }}-pip-${{ hashFiles('requirements-all.txt') }}
restore-keys: |
${{ runner.os }}-pip-
${{ runner.os }}-
${{ runner.os }}-pip-${{ hashFiles('requirements-all.txt') }}
${{ runner.os }}-pip
- name: Install dependencies
run: |
python -m pip install --upgrade pip wheel setuptools
@@ -99,6 +99,17 @@ jobs:
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Cache pip
uses: actions/cache@v3
with:
# This path is specific to Ubuntu
path: ~/.cache/pip
# Look to see if there is a cache hit for the corresponding requirements file
key: ${{ runner.os }}-pip-lint-${{ hashFiles('requirements-all.txt') }}
restore-keys: |
${{ runner.os }}-pip-lint-${{ hashFiles('requirements-all.txt') }}
${{ runner.os }}-pip-lint
${{ runner.os }}-pip
- name: Install dependencies
run: |
python -m pip install --upgrade pip wheel setuptools
1 change: 0 additions & 1 deletion docs/source/api/msticpy.context.tiproviders.rst
@@ -14,7 +14,6 @@ Submodules

msticpy.context.tiproviders.alienvault_otx
msticpy.context.tiproviders.azure_sent_byoti
msticpy.context.tiproviders.dynamic_provider-old
msticpy.context.tiproviders.greynoise
msticpy.context.tiproviders.ibm_xforce
msticpy.context.tiproviders.intsights
81 changes: 75 additions & 6 deletions docs/source/data_acquisition/DataProviders.rst
@@ -506,12 +506,53 @@ for Timedelta in the
exactly on the time boundaries but some data sources may not use
granular enough time stamps to avoid this.

Dynamically adding new queries
------------------------------

You can use the :py:meth:`msticpy.data.core.data_providers.QueryProvider.add_custom_query`
method to add parameterized queries from a notebook or script. This
lets you use temporary parameterized queries without having to
add them to a YAML file (as described in `Creating new queries`_).

get_host_events

.. code:: python

    # initialize a query provider
    qry_prov = mp.QueryProvider("MSSentinel")

    # define a query
    query = """
    SecurityEvent
    | where EventID == {event_id}
    | where TimeGenerated between (datetime({start}) .. datetime({end}))
    | where Computer has "{host_name}"
    """

    # define the query parameters
    # (these can also be passed as a list of raw tuples)
    qp_host = qry_prov.create_param("host_name", "str", "Name of Host")
    qp_start = qry_prov.create_param("start", "datetime")
    qp_end = qry_prov.create_param("end", "datetime")
    qp_evt = qry_prov.create_param("event_id", "int", None, 4688)

    # add the query
    qry_prov.add_custom_query(
        name="get_host_events",
        query=query,
        family="Custom",
        parameters=[qp_host, qp_start, qp_end, qp_evt]
    )

    # query is now available as
    qry_prov.Custom.get_host_events(host_name="MyPC"....)

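Because every placeholder in a parameterized query needs a matching parameter definition, it can help to check this before registering the query. The following is a minimal, self-contained sketch that does not import msticpy: ``QueryParam`` is a local stand-in that mirrors the named tuple added in this commit, and ``placeholder_names`` and ``missing_params`` are hypothetical helpers, not part of the library.

```python
from string import Formatter
from typing import Iterable, NamedTuple, Optional, Set


class QueryParam(NamedTuple):
    """Local stand-in mirroring the commit's named tuple (assumption)."""

    name: str
    data_type: str
    description: Optional[str] = None
    default: Optional[str] = None


def placeholder_names(query: str) -> Set[str]:
    """Return the names of all {placeholders} found in the query text."""
    return {field for _, field, _, _ in Formatter().parse(query) if field}


def missing_params(query: str, parameters: Iterable[tuple]) -> Set[str]:
    """Hypothetical helper: placeholders that have no parameter definition."""
    # Works for QueryParam instances and raw tuples alike: name is item 0.
    defined = {param[0] for param in parameters}
    return placeholder_names(query) - defined


query = """
SecurityEvent
| where EventID == {event_id}
| where TimeGenerated between (datetime({start}) .. datetime({end}))
| where Computer has "{host_name}"
"""

# A mix of named tuples and raw tuples; event_id is deliberately left out.
params = [
    QueryParam("host_name", "str", "Name of Host"),
    ("start", "datetime"),
    ("end", "datetime"),
]

print(missing_params(query, params))
```

A named tuple compares equal to a plain tuple with the same values, which is why the two forms can be mixed in one list here.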
Creating new queries
--------------------

*msticpy* provides a number of
pre-defined queries that you can call using the data package. You can also
add in additional queries to be imported and used by your Query
add additional queries to be imported and used by your Query
Provider. These are defined in YAML format files; examples of these
files can be found at the msticpy GitHub site
https://github.com/microsoft/msticpy/tree/master/msticpy/data/queries.
@@ -580,7 +621,7 @@ Each query key has the following structure:
the query before being passed to the data provider. Each parameter
must have a unique name (for each query, not globally). All parameters
specified in the query text must have an entry here or in the file
defauls section. The parameter subsection has the following sub-keys:
defaults section. The parameter subsection has the following sub-keys:

- **description**: A description of what the parameter is (used for generating
documentation strings).
@@ -599,15 +640,43 @@ Some common parameters used in the queries are:

.. code:: yaml
table:
description: The table name
type: str
default: SecurityEvent | where EventID == 4624
parameters:
table:
description: The table name
type: str
default: SecurityEvent | where EventID == 4624
- **add_query_items**: This is a useful way of extending queries by adding
ad hoc statements to the end of the query (e.g. additional filtering or
summarization).
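To illustrate how a ``table`` default and ``add_query_items`` combine, here is a minimal sketch that uses plain ``str.format`` substitution. This is an assumption made for illustration only: msticpy's own substitution is type-aware and is not reproduced here, and the template text, ``DEFAULTS`` and ``render_query`` are hypothetical names.

```python
# Illustrative template only: the query text and values are assumptions.
QUERY_TEMPLATE = """
{table}
| where TimeGenerated >= datetime({start})
| where TimeGenerated <= datetime({end})
{add_query_items}
"""

# Defaults as they might appear in a query definition.
DEFAULTS = {
    "table": "SecurityEvent | where EventID == 4624",
    "add_query_items": "",
}


def render_query(template: str, defaults: dict, **kwargs) -> str:
    """Fill placeholders, letting caller-supplied values override defaults."""
    values = {**defaults, **kwargs}
    return template.format(**values).strip()


rendered = render_query(
    QUERY_TEMPLATE,
    DEFAULTS,
    start="2023-01-01",
    end="2023-01-02",
    add_query_items="| summarize count() by Computer",
)
print(rendered)
```

Giving ``add_query_items`` an empty-string default means the same template renders cleanly whether or not the caller supplies extra statements.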

Using known parameter names
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Try to use standard names for common entities and other parameter values.
This makes things easier for users of the queries and, in some cases,
enables functionality such as automatic insertion of times.

Always use these names for common parameters:

================= ================================= ============= ===============
Query Parameter   Description                       type          default
================= ================================= ============= ===============
start             The start datetime for the query  datetime      N/A
end               The end datetime for the query    datetime      N/A
table             The name of the main table (opt)  str           the table name
add_query_items   Placeholder for additional query  str           ""
================= ================================= ============= ===============

Entity names
For entities such as IP address, host name, account name, process, domain, etc.,
always use one of the standard names - these are used by pivot functions to
map queries to the correct entity.

For the current set of names see the following section in the Pivot Functions
documentation - :ref:`data_analysis/PivotFunctions:How are queries assigned to specific entities?`


Using yaml aliases and macros in your queries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1 change: 1 addition & 0 deletions docs/source/data_acquisition/SentinelTI.rst
@@ -15,6 +15,7 @@ with them via the Microsoft Sentinel APIs which are utilized in these functions.
See :py:meth:`get_all_indicators <msticpy.context.azure_sentinel_core.MicrosoftSentinel.get_all_indicators>`

.. code:: ipython3

    sentinel = MicrosoftSentinel()
    sentinel.get_all_indicators()
95 changes: 94 additions & 1 deletion msticpy/data/core/data_providers.py
@@ -10,7 +10,7 @@
from functools import partial
from itertools import tee
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional, Pattern, Union
from typing import Any, Dict, Iterable, List, NamedTuple, Optional, Pattern, Union

import pandas as pd
from tqdm.auto import tqdm
@@ -35,6 +35,21 @@
_COMPATIBLE_DRIVER_MAPPINGS = {"mssentinel": ["m365d"], "mde": ["m365d"]}


class QueryParam(NamedTuple):
    """
    Named tuple for custom query parameters.

    name and data_type are mandatory.
    description and default are optional.
    """

    name: str
    data_type: str
    description: Optional[str] = None
    default: Optional[str] = None


@export
class QueryProvider:
    """
@@ -45,6 +60,8 @@ class QueryProvider:
    """

    create_param = QueryParam

    def __init__(  # noqa: MC0001
        self,
        data_environment: Union[str, DataEnvironment],
@@ -478,6 +495,82 @@ def query_time(self):
        """Return the default QueryTime control for queries."""
        return self._query_time

    def add_custom_query(
        self,
        name: str,
        query: str,
        family: Union[str, Iterable[str]],
        description: Optional[str] = None,
        parameters: Optional[Iterable[QueryParam]] = None,
    ):
        """
        Add a custom query to the provider.

        Parameters
        ----------
        name : str
            The name of the query.
        query : str
            The query text (optionally parameterized).
        family : Union[str, Iterable[str]]
            The query group/family or list of families. The query will
            be added to attributes of the query provider with these
            names.
        description : Optional[str], optional
            Optional description (for query help), by default None
        parameters : Optional[Iterable[QueryParam]], optional
            Optional list of parameter definitions, by default None.
            If the query is parameterized you must supply definitions
            for the parameters here - at least name and type.
            Parameters can be the named tuple QueryParam (also
            exposed as QueryProvider.create_param) or a 4-value
            tuple.

        Examples
        --------
        >>> qp = QueryProvider("MSSentinel")
        >>> qp_host = qp.create_param("host_name", "str", "Name of Host")
        >>> qp_start = qp.create_param("start", "datetime")
        >>> qp_end = qp.create_param("end", "datetime")
        >>> qp_evt = qp.create_param("event_id", "int", None, 4688)
        >>>
        >>> query = '''
        >>> SecurityEvent
        >>> | where EventID == {event_id}
        >>> | where TimeGenerated between (datetime({start}) .. datetime({end}))
        >>> | where Computer has "{host_name}"
        >>> '''
        >>>
        >>> qp.add_custom_query(
        >>>     name="test_host_proc",
        >>>     query=query,
        >>>     family="Custom",
        >>>     parameters=[qp_host, qp_start, qp_end, qp_evt]
        >>> )

        """
        if parameters:
            # QueryParam field order is: name, data_type, description, default
            param_dict = {
                param[0]: {
                    "type": param[1],
                    "description": param[2],
                    "default": param[3],
                }
                for param in parameters
            }
        else:
            param_dict = {}
        source = {
            "args": {"query": query},
            "description": description,
            "parameters": param_dict,
        }
        metadata = {"data_families": [family] if isinstance(family, str) else family}
        query_source = QuerySource(
            name=name, source=source, defaults={}, metadata=metadata
        )
        self.query_store.add_data_source(query_source)
        self._add_query_functions()

    def _execute_query(self, *args, **kwargs) -> Union[pd.DataFrame, Any]:
        if not self._query_provider.loaded:
            raise ValueError("Provider is not loaded.")
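The parameter-to-dictionary conversion in `add_custom_query` indexes each parameter by position, so a raw tuple with fewer than four items would raise an `IndexError`. One possible way to accept short tuples is sketched below. This is not the commit's implementation: `QueryParam` is a local copy of the named tuple with a loosened `default` type, and `params_to_dict` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, NamedTuple, Optional


class QueryParam(NamedTuple):
    """Local copy of the commit's named tuple (default type loosened here)."""

    name: str
    data_type: str
    description: Optional[str] = None
    default: Optional[Any] = None


def params_to_dict(
    parameters: Optional[Iterable[tuple]],
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: build a parameter dictionary keyed by name."""
    param_dict: Dict[str, Dict[str, Any]] = {}
    for param in parameters or []:
        # Re-wrapping pads missing trailing fields with the tuple's defaults,
        # so ("start", "datetime") is accepted as well as a full QueryParam.
        q_param = QueryParam(*param)
        param_dict[q_param.name] = {
            "type": q_param.data_type,
            "description": q_param.description,
            "default": q_param.default,
        }
    return param_dict


result = params_to_dict(
    [
        QueryParam("event_id", "int", None, 4688),
        ("start", "datetime"),
    ]
)
print(result)
```

Accessing fields by name rather than by index also makes the mapping between tuple fields and dictionary keys easier to verify by eye. Whether a `None` default should be dropped from the dictionary instead of stored is left open here.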
33 changes: 33 additions & 0 deletions tests/data/test_dataqueries.py
@@ -468,6 +468,39 @@ def test_query_prov_properties():
    check.is_in("ResourceGraph", data_envs)


def test_add_query():
    """Test adding a query dynamically."""
    qry_prov = QueryProvider("MSSentinel")

    # define a query
    query = """
    SecurityEvent
    | where EventID == {event_id}
    | where TimeGenerated between (datetime({start}) .. datetime({end}))
    | where Computer has "{host_name}"
    """
    # define the query parameters
    # (these can also be passed as a list of raw tuples)
    qp_host = qry_prov.create_param("host_name", "str", "Name of Host")
    qp_start = qry_prov.create_param("start", "datetime")
    qp_end = qry_prov.create_param("end", "datetime")
    qp_evt = qry_prov.create_param("event_id", "int", None, 4688)

    # add the query
    qry_prov.add_custom_query(
        name="get_host_events",
        query=query,
        description="Get events of type from host",
        family="Custom",
        parameters=[qp_host, qp_start, qp_end, qp_evt],
    )

    check.is_true(hasattr(qry_prov, "Custom"))
    check.is_true(hasattr(qry_prov.Custom, "get_host_events"))
    check.is_true(callable(qry_prov.Custom.get_host_events))
    check.is_in("Get events of type", qry_prov.Custom.get_host_events.__doc__)


_SEARCH_TESTS = [
    ((None, None, None), 0),
    (("syslog", None, None), 15),