Ianhelle/settings mgmt 2021 02 02 (#136)
* Typo in opening sentence

* Adding hash_account as separate item type to data_obfus.py

Making hash_ip more flexible - ignoring things like localhost
Updating documentation, tests and mapping file.
Correcting typo in timeline.py.

* Adding missed documentation for hash_account

* Initial code for Mordor driver and browser

* Mordor data provider and browser.

unit tests and documentation

* Fixing some linting errors.

* Fixed a couple of broken tests because of data providers API change.

* Replacing custom json reader with pd.read_json()

Added ability to set query defaults (like cache directory) from provider.
Fixed a bug in path construction for download file.
Clarified the description of the search functionality and corrected Mitre Attack => ATT&CK
Add URL for Mitre
Updated notebook and doc to reflect these changes.

* Fixing lint/formatting errors in vtlookupv3.

Some other random black reformatting
Added test_mordor_browser.py for notebook test.

* Updated formatting for new black version

* Updating pre-commit version

* Bug fix and nasty workaround for old test setup removed in pkg_config.py

* Update MordorData.rst doc with better intro section

* Splitting entities into separate modules

* Moved entities to datamodel package and initial refactoring for pivoting

* Renaming files to lowercase phase 1

* Renaming entities phase 2

* Start of pivot main library

* Commit to re-merge with master

* Code complete - still docs to do.

* Added test case and fix for couple of misc methods in Pivot and Entity

* Phase 1 code complete with docs.

* Fixing the credscan suppression for test_splunk_uploader

* Adding pre-release version, removing old config file.

* Initial dependency separation

* Implemented extras for msticpy install.

Refactored a few classes to make it easier to import and use modules with only a partial msticpy install.
Main one is data_providers - dynamically loading drivers. Also eventcluster and auditdextract.
Moved latter two into analysis folder.
Remove unneeded code from keyvault_client.py since Pete's code eliminated the need for them.
Made AzureSentinel and MDE the preferred names for LogAnalytics and MDE drivers.
Fixed up several unit tests to handle partial installs and still produce results (most should be skipped now instead of erroring).
Fixed a few random bugs (like GeoIP Maxmind download)
Fixed pivot_register_reader to skip classes that cannot be instantiated (e.g. IPStack if user doesn't have API key)
Added documentation to Installing.rst
Fixed some problems and renamed module locations in notebooks and RST docs.

* Additions/corrections to Installing.rst

* Somehow these two data files were changed.

* Bandit exception to except: pass

* Correction to FoliumMap.ipynb

* Removing dropna from read_csv in FoliumMap.ipynb

* Adding requirements-all and pre-commit hook to generate this file

* Adding vt, vt_graph to Sphinx mock list

* Added pivot_browser UI - pivot_browser.py

Added ability to read pipeline definitions from yaml files - pivot_pipeline.py
Adding pivot.tee_exec pipeline function - in pivot_pd_accessor.py
Add ability to add arbitrary/ad hoc functions as pivots - in pivot.py
Exposing get_timespan function in Pivot class as public function - in pivot.py.
Added Dns entity to several pivot functions - mp_pivot_reg.yaml

* Fixing some queries for more consistency.

Pivot data query functions now prefixed with table name.
Added ability for pivot functions to return raw output.
Added pyperclip to pkg dependencies exceptions.

* Some corrections to documentation in AzureSentinel and DataAcquisition docs.

Added lru_cache for geoip lookups.

* User environment configuration for notebooks.

Added minimal output from nbinit to show imported modules (I'd noticed some examples of people importing things that had already been imported)

* Fixing mordor tests and updating azure-mgmt-monitor version in setup.py extras

* Some fixes and changes to the UserDefaults feature - esp the format of the config settings.

Also
- some fixes to tests for test_pkg_imports and import_analyzer.py
- fix to config2kv.py to correct some problems. Also added a function to retrieve and show current KV secrets
- fix for ipywidgets warning about deprecated on_submit() method
- multiple fixes for typos and duplicate section names in: DataProviders.rst, UploadData.rst, PivotFunctions.rst
- added SplunkProvider.rst doc for Splunk provider
- fixed issue in nbinit.py where extra_imports were being lost.
- fix for QueryTime in nbwidgets.py - exception if user types invalid value into date field.
- fixed several issues in test_mp_release.cmd with messed up folders/current folder.

* MSTICPY config settings management
Two main classes
- MpConfigFile (to manage settings file and do a few utility things)
- MpConfigEdit (to edit settings for mp config sections)
Still to add docs/notebook

* PR updates adding comments, some grammar fixes and obfuscation of names.

* Some fixes and changes to the UserDefaults feature - esp the format of the config settings.

Also
- some fixes to tests for test_pkg_imports and import_analyzer.py
- fix to config2kv.py to correct some problems. Also added a function to retrieve and show current KV secrets
- fix for ipywidgets warning about deprecated on_submit() method
- multiple fixes for typos and duplicate section names in: DataProviders.rst, UploadData.rst, PivotFunctions.rst
- added SplunkProvider.rst doc for Splunk provider
- fixed issue in nbinit.py where extra_imports were being lost.
- fix for QueryTime in nbwidgets.py - exception if user types invalid value into date field.
- fixed several issues in test_mp_release.cmd with messed up folders/current folder.

MSTICPY config settings management
Two main classes
- MpConfigFile (to manage settings file and do a few utility things)
- MpConfigEdit (to edit settings for mp config sections)
Still to add docs/notebook

Additional tests, start of validation checks

Added validation step (non-blocking) to forms

Fixing some settings validation issues
Fixed default values being overwritten for new items

Adding more tests and fixes.

Added check_version.py and added call to this from nbinit.py
Added Mordor and LocalData as configurable providers in settings.
mordor_driver, local_data_driver and azure_auth now check settings for defaults.
Add list() type to mpconfig_defaults.yaml

Settings documentation and notebook.

Also updating README.md and PackageSummary.rst with something more contemporary.

* Some tests failing after merge.

Fixed URL in README.md

* Merge tag 'v0.9.0' into ianhelle/MP-Pivot-Phase2-2021-01-04

Fixing some test and linting errors after merge.
Removing lru_cache from ip_lookup in geoip.py

* test_file_browsert test failing because it was trying to change into parent folder and parent folder doesn't exist in CI test environment

* Add joins for pivot data queries in pivot_data_queries.py

Add "print" query debug parameter in data_providers.py
Add find_entity function in entities __init__.py
Add alias "pivots" for get_pivot_list in entity.py
Add ability to set timespan more flexibly. Calling set_timespan no longer resets the timespan. Add PivotBrowser method to Pivot class - in pivot.py
Add missing entity list box in pivot_browser.py.
Switched engine to "Python" for pd.read_csv in pivot_magic_core.py to handle more formatting types.
Add positional params to pipeline step and cleaned up code in pivot_pipeline.py
Updated PivotFunctions.rst and PivotFunctions.ipynb for new functionality.
More tests for test_pivot.py (timespan)
New tests for PivotBrowser - test_pivot_browser.py
Enable and fix tests for pivot data query joins in test_pivot_data_queries_run.py
Add test for positional params in test_pivot_pipeline.py

* Suppressing expected user warnings in tests.

Fixing a bug with the "print_query" debug option being called from TIProviders/kql_base.py.
Cleaning up mordor data file cleanup in test_mordor_driver.py.
Adding an optimistic random delay to geoip.py to avoid instances in different processes trying to download the same file simultaneously. Really only an issue in multi-processing distributed tests.

* Fixing test error in test_user_config.py

McCabe complexity warning in config2kv.py

* Updating version

* Bandit warning on use of random.randint()

Updating version

* Removing fake secret from MPSettingsEditor.ipynb
Moving list definition for mypy in local_data_driver.py
Black reformatting test_user_config.py

* Failing test and linter warnings

* Adding notice and badge to Readme

* Adding documentation diagrams

* Updates from PR.

Also fixing a bug and merge conflict in mp_config_file.py where I was passing the whole URL as the secret name. Also put a catch for this in keyvault_client.py.
ianhelle committed Mar 10, 2021
1 parent 3e16715 commit f386ccd
Showing 77 changed files with 8,324 additions and 324 deletions.
128 changes: 77 additions & 51 deletions README.md
@@ -29,11 +29,13 @@ authoring for
[Azure Sentinel](https://azure.microsoft.com/en-us/services/azure-sentinel/).
While Azure Sentinel is still a big focus of our work, we are
extending the data query/acquisition components to pull log data from
other sources (currently Microsoft Defender and Microsoft Graph but we
other sources (currently Splunk, Microsoft Defender for Endpoint and
Microsoft Graph are supported but we
are actively working on support for data from other SIEM platforms).
Most of the components can also be used with data from any source. Pandas
DataFrames are used as the ubiquitous input and output format of almost
all components.
all components. There is also a data provider to make it easy to read and process
data from local CSV files and pickled DataFrames.

The package addresses three central needs for security investigators
and hunters:
@@ -77,19 +79,20 @@ functions in this interactive demo on mybinder.org.

## Log Data Acquisition

- QueryProvider - extensible query library targeting Azure Sentinel, OData
sources and other. Built-in parameterized queries allow complex queries to be run
from a single function call. Add your own queries using a simple YAML
schema.
- security_alert and security_event - encapsulation classes for alerts and events.
- entity_schema - definitions for multiple entities (Host, Account, File, IPAddress,
etc.)
QueryProvider is an extensible query library targeting Azure Sentinel/Log Analytics,
Splunk, OData
and other log data sources. It also has special support for
[Mordor](https://github.com/OTRF/mordor) data sets and using local data.

Built-in parameterized queries allow complex queries to be run
from a single function call. Add your own queries using a simple YAML
schema.

[Data Queries Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/Data_Queries.ipynb)

## Data Enrichment

### tiproviders
### Threat Intelligence providers

The TILookup class can look up IoCs across multiple TI providers. Built-in
providers include AlienVault OTX, IBM XForce, VirusTotal and Azure Sentinel.
@@ -119,13 +122,17 @@ using either:
and
[GeoIP Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/GeoIPLookups.ipynb)

### Azure Data
### Azure Resource Data, Storage and Azure Sentinel API

This package contains functionality for enriching data regarding Azure host
details with additional host details exposed via the Azure API.

[Azure Data](https://msticpy.readthedocs.io/en/latest/data_acquisition/AzureData.html)
The AzureData module contains functionality for enriching data regarding Azure host
details with additional host details exposed via the Azure API. The AzureSentinel
module allows you to query incidents and retrieve detector and hunting
queries. AzureBlobStorage lets you read and write data from blob storage.

[Azure Resource APIs](https://msticpy.readthedocs.io/en/latest/data_acquisition/AzureData.html),
[Azure Sentinel APIs](https://msticpy.readthedocs.io/en/latest/data_acquisition/AzureSentinel.html),
[Azure Storage](https://msticpy.readthedocs.io/en/latest/data_acquisition/AzureBlobStorage.html)
## Security Analysis

This subpackage contains several modules helpful for working on security investigations and hunting:
@@ -141,7 +148,7 @@ a mail forwarding rule on someone's mailbox.
and
[Anomalous Sequence Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/AnomalousSequence.ipynb)

### Time Series
### Time Series Analysis

Time series analysis allows you to identify unusual patterns in your log data
taking into account normal seasonal variations (e.g. the regular ebb and flow of
@@ -154,6 +161,53 @@ alt="Time Series anomalies" title="Time Series anomalies" height="300" />

[Time Series](https://msticpy.readthedocs.io/en/latest/visualization/TimeSeriesAnomalies.html)

## Visualization

### Event Timelines

Display any log events on an interactive timeline. Using the
[Bokeh Visualization Library](https://bokeh.org/) the timeline control enables
you to visualize one or more event streams, interactively zoom into specific time
slots and view event details for plotted events.

<img src="https://github.com/microsoft/msticpy/blob/master/docs/source/visualization/_static/TimeLine-01.png"
alt="Timeline" title="Msticpy Timeline Control" height="300" />

[Timeline](https://msticpy.readthedocs.io/en/latest/visualization/EventTimeline.html)
and
[Timeline Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/EventTimeline.ipynb)

### Process Trees

The process tree functionality has two main components:

- Process Tree creation - taking a process creation log from a host and building
the parent-child relationships between processes in the data set.
- Process Tree visualization - this takes the processed output and displays an interactive process tree using Bokeh plots.

There is a set of utility functions to extract individual and partial trees from the processed data set.
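
The tree creation step described above can be sketched in plain Python. This is a minimal illustration of linking process creation events by parent and child IDs, not the MSTICPy implementation. The function names (`build_process_tree`, `render_tree`) and the event fields (`pid`, `parent_pid`, `name`) are assumptions made for the example, and it ignores complications such as process ID reuse.

```python
from collections import defaultdict


def build_process_tree(events):
    """Return (roots, children) where children maps a PID to its child events."""
    children = defaultdict(list)
    known_pids = {evt["pid"] for evt in events}
    for evt in events:
        children[evt["parent_pid"]].append(evt)
    # Roots are processes whose parent never appears in the log.
    roots = [evt for evt in events if evt["parent_pid"] not in known_pids]
    return roots, children


def render_tree(procs, children, indent=0):
    """Return the tree as indented text lines, one process per line."""
    lines = []
    for proc in procs:
        lines.append("  " * indent + f'{proc["name"]} ({proc["pid"]})')
        lines.extend(render_tree(children.get(proc["pid"], []), children, indent + 1))
    return lines


# Hypothetical process creation events
sample_events = [
    {"pid": 100, "parent_pid": 1, "name": "explorer.exe"},
    {"pid": 200, "parent_pid": 100, "name": "cmd.exe"},
    {"pid": 300, "parent_pid": 200, "name": "powershell.exe"},
    {"pid": 400, "parent_pid": 100, "name": "notepad.exe"},
]

roots, children = build_process_tree(sample_events)
print("\n".join(render_tree(roots, children)))
```

A partial tree can be extracted by calling `render_tree` with any single event as the starting list instead of the roots.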

<img src="https://github.com/microsoft/msticpy/blob/master/docs/source/visualization/_static/process_tree3.png"
alt="Process Tree"
title="Interactive Process Tree" height="400" />

[Process Tree](https://msticpy.readthedocs.io/en/latest/visualization/ProcessTree.html)
and
[Process Tree Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/ProcessTree.ipynb)

## Data Manipulation and Utility functions

### Pivot Functions

Lets you use *MSTICPy* functionality in an "entity-centric" way.
All functions, queries and lookups that relate to a particular entity type
(e.g. Host, IpAddress, Url) are collected together as methods of that
entity class. So, if you want to do things with an IP address, just load
the IpAddress entity and browse its methods.
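
As a rough illustration of the "entity-centric" idea, the sketch below attaches ordinary functions to an entity class so that they can be browsed and called from the entity. It is not the MSTICPy implementation. The class and method names (`Entity`, `add_pivot`, `list_pivots`) and the two example functions are hypothetical and use only the Python standard library.

```python
import ipaddress


class Entity:
    """Base class for entities; functions are attached to subclasses at runtime."""

    @classmethod
    def add_pivot(cls, func, name=None):
        """Attach func to this entity class under its own or a given name."""
        setattr(cls, name or func.__name__, staticmethod(func))

    @classmethod
    def list_pivots(cls):
        """Return the names of functions attached directly to this class."""
        return sorted(
            attr for attr, value in vars(cls).items()
            if isinstance(value, staticmethod)
        )


class IpAddress(Entity):
    """Hypothetical IP address entity."""


def is_private(address):
    """Return True if the address is in a private range."""
    return ipaddress.ip_address(address).is_private


def ip_version(address):
    """Return 4 or 6 depending on the address family."""
    return ipaddress.ip_address(address).version


# Collect the IP-related functions together on the entity class.
IpAddress.add_pivot(is_private)
IpAddress.add_pivot(ip_version, name="version")

print(IpAddress.list_pivots())
print(IpAddress.is_private("10.0.0.1"))
```

Keeping the registration separate from the function definitions means the same function could be attached to more than one entity type.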

[Pivot Functions](https://msticpy.readthedocs.io/en/latest/data_analysis/PivotFunctions.html)
and
[Pivot Functions Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/PivotFunctions.ipynb)
### base64unpack

Base64 and archive (gz, zip, tar) extractor. It will try to identify any base64 encoded
@@ -162,6 +216,7 @@ will unpack the contents. The results of each decode/unpack are rechecked for fu
base64 content and up to a specified depth.
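
The recursive recheck described above can be sketched with the standard library alone. This is a simplified illustration, not the base64unpack module itself: it handles only text payloads (no gz, zip or tar unpacking), and the name `unpack_b64`, the 16-character minimum match length and the record layout are assumptions chosen for the example.

```python
import base64
import binascii
import re

# Candidate base64 runs: at least 16 base64 characters plus optional padding.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def unpack_b64(text, max_depth=3, _depth=1):
    """Return one record per base64 string in text that decodes to UTF-8 text."""
    results = []
    if _depth > max_depth:
        return results
    for match in B64_PATTERN.finditer(text):
        candidate = match.group(0)
        try:
            decoded = base64.b64decode(candidate, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded
        results.append({"depth": _depth, "encoded": candidate, "decoded": decoded})
        # Recheck the decoded output for further nested base64 content.
        results.extend(unpack_b64(decoded, max_depth, _depth + 1))
    return results


# Build a doubly encoded sample and unpack it.
inner = "echo hello from the inner layer"
layer1 = base64.b64encode(inner.encode()).decode()
layer2 = base64.b64encode(("run: " + layer1).encode()).decode()
for record in unpack_b64("powershell -enc " + layer2):
    print(record["depth"], record["decoded"])
```

The `max_depth` argument bounds the recursion so that deeply nested or adversarial input cannot cause unbounded work.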

[Base64 Decoding](https://msticpy.readthedocs.io/en/latest/data_analysis/Base64Unpack.html)
and
[Base64Unpack Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/Base64Unpack.ipynb)

### iocextract
@@ -171,6 +226,7 @@ DNS domains, Hashes, file paths.
Input can be a single string or a pandas dataframe.

[IoC Extraction](https://msticpy.readthedocs.io/en/latest/data_analysis/IoCExtract.html)
and
[IoCExtract Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/IoCExtract.ipynb)

### eventcluster (experimental)
@@ -186,51 +242,16 @@ events can often make it difficult to see unique and interesting items.
This is an unsupervised learning module implemented using SciKit Learn DBScan.

[Event Clustering](https://msticpy.readthedocs.io/en/latest/data_analysis/EventClustering.html)
and
[Event Clustering Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/EventClustering.ipynb)

## Visualization

### Timelines

Display any log events on an interactive timeline. Using the
[Bokeh Visualization Library](https://bokeh.org/) the timeline control enables
you to visualize one or more event streams, interactively zoom into specific time
slots and view event details for plotted events.

<img src="https://github.com/microsoft/msticpy/blob/master/docs/source/visualization/_static/TimeLine-01.png"
alt="Timeline" title="Msticpy Timeline Control" height="300" />

[Timeline](https://msticpy.readthedocs.io/en/latest/visualization/EventTimeline.html)
[Timeline Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/EventTimeline.ipynb)

### Process Trees

The process tree functionality has two main components:

- Process Tree creation - taking a process creation log from a host and building
the parent-child relationships between processes in the data set.
- Process Tree visualization - this takes the processed output and displays an interactive process tree using Bokeh plots.

There is a set of utility functions to extract individual and partial trees from the processed data set.

<img src="https://github.com/microsoft/msticpy/blob/master/docs/source/visualization/_static/process_tree3.png"
alt="Process Tree"
title="Interactive Process Tree" height="400" />

[Process Tree](https://msticpy.readthedocs.io/en/latest/visualization/ProcessTree.html)
[Process Tree Notebook](https://github.com/microsoft/msticpy/blob/master/docs/notebooks/ProcessTree.ipynb)

## Other Tools

### auditdextract

Module to load and decode Linux audit logs. It collapses messages sharing the same
message ID into single events, decodes hex-encoded data fields and performs some
event-specific formatting and normalization (e.g. for process start events it will
re-assemble the process command line arguments into a single string).
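
A minimal sketch of the collapsing and decoding steps follows. It is not the auditdextract module: the function names (`decode_arg`, `collapse_events`), the regular expressions and the sample log lines are assumptions for illustration, and only `EXECVE` arguments are hex-decoded here.

```python
import re
from collections import defaultdict

# Example line: type=EXECVE msg=audit(1611000000.123:42): argc=2 a0="echo" a1=68656C6C6F
LINE_RE = re.compile(
    r"type=(?P<type>\w+) msg=audit\((?P<ts>[\d.]+):(?P<id>\d+)\): (?P<body>.*)"
)
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
ARG_KEY_RE = re.compile(r"a\d+")


def decode_arg(value):
    """Return a quoted argument unquoted, or an unquoted one decoded from hex."""
    if value.startswith('"'):
        return value.strip('"')
    try:
        return bytes.fromhex(value).decode("utf-8")
    except ValueError:  # not hex, or not valid UTF-8 (UnicodeDecodeError)
        return value


def collapse_events(lines):
    """Merge lines sharing a message ID and rebuild EXECVE command lines."""
    events = defaultdict(dict)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not an audit record
        event = events[match["id"]]
        event["timestamp"] = float(match["ts"])
        fields = dict(FIELD_RE.findall(match["body"]))
        if match["type"] == "EXECVE":
            # Order a0, a1, ... numerically and join them into one string.
            arg_keys = sorted(
                (key for key in fields if ARG_KEY_RE.fullmatch(key)),
                key=lambda key: int(key[1:]),
            )
            event["cmdline"] = " ".join(decode_arg(fields[key]) for key in arg_keys)
        else:
            event.update({key: val.strip('"') for key, val in fields.items()})
    return dict(events)


# Hypothetical audit log lines
sample_lines = [
    'type=SYSCALL msg=audit(1611000000.123:42): syscall=59 pid=1234 comm="echo" exe="/bin/echo"',
    'type=EXECVE msg=audit(1611000000.123:42): argc=2 a0="echo" a1=68656C6C6F20776F726C64',
    'type=SYSCALL msg=audit(1611000001.500:43): syscall=2 pid=1300 comm="cat"',
]
for msg_id, event in collapse_events(sample_lines).items():
    print(msg_id, event)
```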

This is still a work-in-progress.

### syslog_utils

Module to support an investigation of a Linux host with only syslog logging enabled.
@@ -242,6 +263,11 @@ user sessions containing suspicious activity.
A module to support the detection of known malicious command line activity or suspicious
patterns of command line activity.

### domain_utils

A module to support investigation of domain names and URLs with functions to
validate a domain name and screenshot a URL.

### Notebook widgets

These are built from the [Jupyter ipywidgets](https://ipywidgets.readthedocs.io/) collection
Binary file added docs/diagrams/MPSettingsConfig.png