Microsoft Threat Intelligence Security Tools
ianhelle Pandas magic extensions (#46)
Latest commit 669f7ec Feb 14, 2020

README.md

MSTIC Jupyter and Python Security Tools

Microsoft Threat Intelligence Python Security Tools.

The msticpy package was initially developed to support Jupyter Notebook authoring for Azure Sentinel. Many of the included tools can be used in other security scenarios for threat hunting and threat investigation.

Timeline

There are three main sub-packages:

  • sectools - Python security tools to help with data enrichment, analysis or investigation.
  • nbtools - Jupyter-specific UI tools such as widgets, plotting and other data display.
  • data - data layer and pre-defined queries for Azure Sentinel, MDATP and other data sources.

We welcome feedback, bug reports, suggestions for new features and contributions.

Installing

pip install msticpy

or for the latest dev build

pip install git+https://github.com/microsoft/msticpy

Documentation

Full documentation is at ReadTheDocs

Sample notebooks for many of the modules are in the docs/notebooks folder.

You can also browse through the sample notebooks referenced at the end of this document (especially the Windows Alert Investigation notebook) to see some of the functionality used in context.


Security Tools Sub-package - sectools

This subpackage contains several modules helpful for working on security investigations and hunting:

base64unpack

Base64 and archive (gz, zip, tar) extractor. Input can be either a single string or a specified column of a pandas DataFrame. It tries to identify any base64-encoded strings and decode them. If the result looks like one of the supported archive types, it unpacks the contents. The results of each decode/unpack are rechecked for further base64 content, recursing up to 20 levels deep (the default can be overridden). Output is a decoded string (for single string input) or a DataFrame (for DataFrame input).
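
The decode-and-recurse behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the msticpy implementation: the `unpack` function, the `B64_PATTERN` regular expression and the 16-character minimum are assumptions made for this example, and only gzip payloads are handled (zip and tar are omitted).

```python
import base64
import binascii
import gzip
import re
import zlib

# Assumption for this sketch: a candidate is a run of 16+ base64 characters
# followed by optional padding. Shorter runs are ignored to reduce noise.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
GZIP_MAGIC = b"\x1f\x8b"


def unpack(text: str, depth: int = 0, max_depth: int = 20) -> str:
    """Replace decodable base64 runs in `text`, recursing into each result."""
    if depth >= max_depth:
        return text

    def _decode(match: "re.Match[str]") -> str:
        candidate = match.group(0)
        try:
            raw = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            return candidate  # not valid base64: leave the text untouched
        if raw.startswith(GZIP_MAGIC):
            try:
                raw = gzip.decompress(raw)
            except (OSError, EOFError, zlib.error):
                return candidate  # looked like gzip but could not be unpacked
        try:
            decoded = raw.decode("utf-8")
        except UnicodeDecodeError:
            return candidate  # binary payload: keep the original string
        # Re-check the decoded text for further base64 content.
        return unpack(decoded, depth + 1, max_depth)

    return B64_PATTERN.sub(_decode, text)


inner = base64.b64encode(b"echo hello world 12345").decode()
outer = base64.b64encode(("cmd " + inner).encode()).decode()
print(unpack("run " + outer))
```

Passing a smaller `max_depth` stops the recursion early, which leaves any deeper layers encoded.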

Base64Unpack Notebook

iocextract

Uses a set of built-in regular expressions to look for Indicator of Compromise (IoC) patterns. Input can be a single string or a pandas dataframe with one or more columns specified as input.

The following types are built-in:

  • IPv4 and IPv6
  • URL
  • DNS domain
  • Hashes (MD5, SHA1, SHA256)
  • Windows file paths
  • Linux file paths (this is kind of noisy because a legal Linux file path can have almost any character)

You can modify or add to the regular expressions used at runtime.

Output is a dictionary of matches (for single string input) or a DataFrame (for dataframe input).
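
As a rough illustration of the regular-expression approach (not the expressions that iocextract actually ships), the sketch below returns a dictionary of matches for a single string. The `IOC_PATTERNS` table, the `extract_iocs` function and the patterns themselves are simplified assumptions made for this example.

```python
import re
from collections import defaultdict
from typing import Dict, Optional, Set

# Deliberately simplified patterns (assumptions for this sketch). For example,
# the IPv4 pattern does not reject octets above 255.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256_hash": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(text: str, patterns: Optional[dict] = None) -> Dict[str, Set[str]]:
    """Return {ioc_type: set of matched strings} for every pattern that matches."""
    patterns = IOC_PATTERNS if patterns is None else patterns
    found = defaultdict(set)
    for ioc_type, pattern in patterns.items():
        for match in pattern.finditer(text):
            found[ioc_type].add(match.group(0))
    return dict(found)


sample = (
    "curl http://evil.example/payload.bin from 10.1.2.3 "
    "md5 d41d8cd98f00b204e9800998ecf8427e"
)
print(extract_iocs(sample))
```

Because the patterns live in an ordinary dictionary, adding or replacing an entry at runtime changes what the next call extracts.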

IoCExtract Notebook

tiproviders

The TILookup class can look up IoCs across multiple TI providers. Built-in providers include AlienVault OTX, IBM XForce, VirusTotal and Azure Sentinel.

The input can be a single IoC observable or a pandas DataFrame containing multiple observables. Depending on the provider, you may require an account and an API key. Some providers also enforce throttling (especially for free tiers), which might affect performing bulk lookups.

For more details see TIProviders and TILookup Usage Notebook

vtlookup

Wrapper class around the VirusTotal API. Input can be a single IoC observable or a pandas DataFrame containing multiple observables. Processing requires a VirusTotal account and API key, and throughput is limited to the number of requests per minute allowed for your account type. Supported IoC types:

  • Filehash
  • URL
  • DNS Domain
  • IPv4 Address

VTLookup Notebook

geoip

Geographic location lookup for IP addresses.

Folium map

This module has two classes, one for each of two services: Maxmind GeoLite and IPStack.

Both services offer a free tier for non-commercial use. However, a paid tier will normally get you more accuracy, more detail and a higher throughput rate. Maxmind GeoLite uses a downloadable database, while IPStack is an online lookup (API key required).

GeoIP Lookup Notebook

eventcluster

This module is intended to be used to summarize large numbers of events into clusters of different patterns. High volume repeating events can often make it difficult to see unique and interesting items.

Clustering

This is an unsupervised learning module implemented using scikit-learn DBSCAN.

The module contains functions to generate clusterable features from string data. For example, an administration command that does some maintenance on thousands of servers with a command line like the following:

install-update -hostname {host.fqdn} -tmp:/tmp/{GUID}/rollback

can be collapsed into a single cluster pattern by ignoring the character values of the hosts and GUIDs in the string and using delimiters or tokens to group the values. This allows you to more easily see distinct patterns of activity.
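
The idea of grouping on delimiters and tokens rather than on character values can be sketched as follows. This is an assumption-laden illustration: the `features` and `group_by_pattern` functions and the delimiter set are invented for this example, and exact matching on the feature tuple stands in for DBSCAN, which would instead group feature vectors that lie close together.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed delimiter set for this sketch: whitespace plus common punctuation.
DELIMITERS = re.compile(r"[\s\-/\\.,:;|&()]")


def features(cmdline: str) -> Tuple[int, int]:
    """Describe a command line by its shape: (delimiter count, token count)."""
    delimiter_count = len(DELIMITERS.findall(cmdline))
    token_count = len([tok for tok in DELIMITERS.split(cmdline) if tok])
    return delimiter_count, token_count


def group_by_pattern(cmdlines: Iterable[str]) -> Dict[Tuple[int, int], List[str]]:
    """Group command lines that share the same feature tuple."""
    groups = defaultdict(list)
    for cmdline in cmdlines:
        groups[features(cmdline)].append(cmdline)
    return dict(groups)


commands = [
    "install-update -hostname web01.contoso.com "
    "-tmp:/tmp/0a1b2c3d-1111-2222-3333-444455556666/rollback",
    "install-update -hostname db02.contoso.com "
    "-tmp:/tmp/9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff/rollback",
    "net user hacker /add",
]
for shape, members in group_by_pattern(commands).items():
    print(shape, len(members))
```

The two install-update commands differ only in host name and GUID, so they share a feature tuple and fall into one group, while the third command forms a group of its own.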

Event Clustering Notebook

outliers

Similar to the eventcluster module, but a little more experimental (read 'less tested'). It uses the scikit-learn Isolation Forest to identify outlier events, either within a single data set or by using one data set as training data and another on which to predict outliers.

auditdextract

Module to load and decode Linux audit logs. It collapses messages sharing the same message ID into single events, decodes hex-encoded data fields and performs some event-specific formatting and normalization (e.g. for process start events it will re-assemble the process command line arguments into a single string).
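
A minimal sketch of the collapsing and decoding steps is shown below. It is not the auditdextract code: the `collapse` and `decode_arg` functions are hypothetical, the log lines are hand-written samples, and the sketch assumes that an unquoted EXECVE argument is hex-encoded.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Record header: type, then the message ID (timestamp:serial), then key=value fields.
HEADER = re.compile(
    r"type=(?P<type>\w+) msg=audit\((?P<msg_id>[\d.]+:\d+)\): (?P<body>.*)"
)
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def decode_arg(value: str) -> str:
    """Quoted arguments are literal; unquoted ones are assumed to be hex-encoded."""
    if value.startswith('"'):
        return value.strip('"')
    try:
        return bytes.fromhex(value).decode("utf-8", errors="replace")
    except ValueError:
        return value  # not hex after all: keep the raw value


def collapse(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Merge all records that share a message ID into one event dictionary."""
    events = defaultdict(dict)
    for line in lines:
        header = HEADER.match(line)
        if not header:
            continue
        fields = dict(FIELD.findall(header["body"]))
        event = events[header["msg_id"]]
        if header["type"] == "EXECVE":
            # Re-assemble a0..aN into a single command-line string.
            argc = int(fields.get("argc", 0))
            args = [decode_arg(fields[f"a{i}"]) for i in range(argc) if f"a{i}" in fields]
            event["cmdline"] = " ".join(args)
        else:
            event.update({key: val.strip('"') for key, val in fields.items()})
    return dict(events)


log = [
    'type=SYSCALL msg=audit(1581700000.123:456): syscall=59 exe="/bin/echo" pid=4242',
    'type=EXECVE msg=audit(1581700000.123:456): argc=3 a0="echo" a1="hello" a2=776F726C64',
    'type=SYSCALL msg=audit(1581700001.000:457): syscall=2 exe="/bin/cat" pid=4243',
]
for msg_id, event in collapse(log).items():
    print(msg_id, event)
```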

This is still a work-in-progress.

syslog_utils

Module to support an investigation of a Linux host with only syslog logging enabled. This includes functions for collating host data, clustering logon events and detecting user sessions containing suspicious activity.

cmd_line

A module to support the detection of known malicious command line activity or suspicious patterns of command line activity.

Notebook tools sub-package - nbtools

This is a collection of display and utility modules designed to make working with security data in Jupyter notebooks quicker and easier.

  • nbwidgets - groups common functionality, such as list pickers, time boundary settings, and saving and retrieving environment variables, into single-line callable commands.
  • nbdisplay - functions that display common items such as alerts and events in a slightly more consumable way than print().
  • entityschema - implements entity classes (e.g. Host, Account, IPAddress) used in Log Analytics alerts and in many of these modules. Each entity encapsulates one or more properties related to the entity.

Notebook Tools Notebook and Event Timeline Visualization

Data sub-package - data

These components are currently still part of the nbtools sub-package but will be refactored to separate them into their own package.

  • QueryProvider - extensible query library targeting Log Analytics or OData endpoints. Built-in parameterized queries allow complex queries to be run from a single function call. Add your own queries using a simple YAML schema.
  • security_alert and security_event - encapsulation classes for alerts and events.
  • entity_schema - definitions for multiple entities (Host, Account, File, IPAddress, etc.)

Each alert or event object has a standard 'entities' property reflecting the entities found in the alert or event. These objects can also be used as meta-parameters for many of the queries. For example, the following query will extract the value for the hostname query parameter from the alert:

qry.list_host_logons(query_times, alert)

Data Queries Notebook


Clone the notebooks in this repo to Azure Notebooks

Requires sign-in to Azure Notebooks

More Notebooks

View directly on GitHub or copy and paste the link into nbviewer.org

Notebook examples with saved data

See the following notebooks for more examples of the use of this package in practice:

To-Do Items

  • Add additional notebooks to document use of the tools.
  • Expand list of supported TI provider classes.

Supported Platforms and Packages

  • msticpy is OS-independent
  • Requires Python 3.6 or later
  • Requires the following Python packages: pandas, bokeh, matplotlib, seaborn, setuptools, urllib3, ipywidgets, numpy, attrs, requests, networkx, ipython, scikit_learn, typing
  • The following packages are recommended and needed for some specific functionality: Kqlmagic, maxminddb_geolite2, folium, dnspython, ipwhois

See requirements.txt for more details and version requirements.


Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
