This describes the use of the :pyDataViewer<msticpy.nbtools.data_viewer.DataViewer>
control.
DataViewer uses the Bokeh DataTable control to provide some basic data manipulation features for viewing pandas DataFrames more easily:
- Scrollable data viewer taking a fixed amount of the output cell
- Sorting data by column
- Column selection
- Data filtering
There is also a notebook with the contents of this document
To view a DataFrame in the viewer just pass the DataFrame as the data parameter.
from msticpy.nbtools.data_viewer import DataViewer
import pandas as pd
# data is a pandas DataFrame
DataViewer(data)
You can start the viewer with a restricted set of columns by passing a list of column names as the selected_columns parameter
columns = [
"Account",
"EventID",
"TimeGenerated",
"Computer",
"NewProcessName",
"CommandLine",
"ParentProcessName",
]
DataViewer(data, selected_cols=columns)
Click on a column heading to sort the displayed data by that column. Click again on the same column header to sort in reverse order.
The right side list contains the available columns in the DataFrame, the left side is the list of columns to display.
Use the Add/Remove buttons (1 in the screen shot below) to add or remove columns from the selected set. You can select multiple columns using Ctrl+Click or Shift+Click (the former selects or deselects an item for each click, the latter selects a range of items between the last item selected and the currently-clicked item).
Click on Apply columns (2 in the screen shot) to update the data view.
viewer = DataViewer(data, selected_cols=columns)
You can apply multiple filters - each filter is additive, i.e. each is logically AND-ed with the others.
The Filter data drop down shows the following controls:
- Column selector drop-down - which column you want the filter to apply to
- Not checkbox - invert the logic of the filter (for this filter item only)
- Operator drop-down - the available operators are different for string and non-string (numeric and dates)
- Expression text box - type in the expression that you want to match
- Add filter button - adds the current filter items as a new filter expression to Current filters
- Update filter - overwrites the selected filter in Current filters with the current filter expression.
- Select the filter expression you want to operate on from the Filters list
- Delete filter deletes the selected item
- Clear all filters removes all filter expressions
- Apply filter - applies the filter items to the data and updates the display
Selecting the query operator from the filter expression operator drop down lets you type in a pandas query expression.
Note
the selected column is not relevant for this operator since you specify the column name within the query expression. You can select any column name.
See this documentation for the syntax of the pandas *query* method
Use the filtered_data
property of the DataViewer to retrieve a DataFrame corresponding to the current column and row filtering.
Note
column sorting is not captured in this data.
viewer.filtered_data
Account | CommandLine | Computer | EventID | NewProcessName | ParentProcessName | TimeGenerated |
---|---|---|---|---|---|---|
MSTICAlertsWin1MSTICAdmin | .rundll32.exe /C mshtml,RunHTMLApplication javascript:alert(tada!) | MSTICAlertsWin1 |
|
C:DiagnosticsUserTmprundll32.exe | C:WindowsSystem32cmd.exe | 2019-01-15 05:15:16.663000 |
MSTICAlertsWin1MSTICAdmin | cmd /c C:WindowsSystem32mshta.exe vbscript:CreateObject("Wscript.. | MSTICAlertsWin1 |
|
C:DiagnosticsUserTmpcmd.exe | C:WindowsSystem32cmd.exe | 2019-01-15 05:15:16.020000 |
MSTICAlertsWin1MSTICAdmin | .wuauclt.exe /C "c:windowssoftwaredistributioncscript.exe" | MSTICAlertsWin1 |
|
C:DiagnosticsUserTmpwuauclt.exe | C:WindowsSystem32cmd.exe | 2019-01-15 05:15:18.080000 |
MSTICAlertsWin1MSTICAdmin | .lsass.exe /C "c:windowssoftwaredistributioncscript.exe" | MSTICAlertsWin1 |
|
C:DiagnosticsUserTmplsass.exe | C:WindowsSystem32cmd.exe | 2019-01-15 05:15:18.287000 |
MSTICAlertsWin1MSTICAdmin | cmd /c "powershell wscript.shell used to download a .gif" | MSTICAlertsWin1 |
|
C:DiagnosticsUserTmpcmd.exe | C:WindowsSystem32cmd.exe | 2019-01-15 05:15:18.337000 |
MSTICAlertsWin1MSTICAdmin | cacls.exe c:windowssystem32wscript.exe /e /t /g everyone:f | MSTICAlertsWin1 |
|
C:DiagnosticsUserTmpcacls.exe | C:WindowsSystem32cmd.exe | 2019-01-15 05:15:18.403000 |
MSTICAlertsWin1MSTICAdmin | cmd /c echo /e:vbscript.encode /b | MSTICAlertsWin1 |
|
C:DiagnosticsUserTmpcmd.exe | C:WindowsSystem32cmd.exe | 2019-01-15 05:15:18.820000 |
You can export the current filter set as a dictionary:
viewer.filters
{"ParentProcessName contains 'cmd'": FilterExpr(column='ParentProcessName', inv=False, operator='contains', expr='cmd'),
"CommandLine contains 'script'": FilterExpr(column='CommandLine', inv=False, operator='contains', expr='script')}
You can import an existing filter set like this:
# manually add a filter
sample_filter = {
"ParentProcessName contains 'cmd'": ("ParentProcessName", False, "contains", "cmd"),
"CommandLine contains 'script'": ("CommandLine", False, "contains", "script"),
}
viewer.import_filters(sample_filter)
The format of the filter dictionary is:
{
"Filter name": Tuple({column_name}, {not}, {operator}, {expression}),
"Filter two": Tuple({column_name}, {not}, {operator}, {expression}),
...
}
You can also use the :pyFilterExpr<msticpy.nbtools.data_viewer.FilterExpr>
named tuple to specify each filter condition:
from msticpy.nbtools.data_viewer import FilterExpr
sample_filter = {
"ParentProcessName contains 'cmd'": FilterExpr(
column="ParentProcessName",
inv=False,
operator="contains",
expr="cmd"
),
...
}