This class allows you to extract IoC patterns from a string or a DataFrame. Several patterns are built in to the class and you can override these or supply new ones.
# Imports
import sys
MIN_REQ_PYTHON = (3,6)
if sys.version_info < MIN_REQ_PYTHON:
    print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)
from IPython.display import display, HTML
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 100)
# Load test data
process_tree = pd.read_csv('data/process_tree.csv')
process_tree[['CommandLine']].head()
 | CommandLine |
---|---|
0 | .\ftp -s:C:\RECYCLER\xxppyy.exe |
1 | .\reg not /domain:everything that /sid:shines is /krbtgt:golden ! |
2 | cmd /c "systeminfo && systeminfo" |
3 | .\rundll32 /C 42424.exe |
4 | .\rundll32 /C c:\users\MSTICAdmin\42424.exe |
Just pass the string as a parameter to the extract() method.
Get a commandline from our data set.
# get a commandline from our data set
cmdline = process_tree['CommandLine'].loc[78]
cmdline
'netsh start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\\Users\\user\\AppData\\Local\\Temp\\bzzzzzz.txt'
Instantiate an IoCExtract instance and pass the string to the extract() method.
# Instantiate an IoCExtract object
from msticpy.transform import IoCExtract
ioc_extractor = IoCExtract()
# any IoCs in the string?
iocs_found = ioc_extractor.extract(cmdline)
if iocs_found:
    print('\nPotential IoCs found in alert process:')
    display(iocs_found)
Potential IoCs found in alert process:
defaultdict(set,
            {'ipv4': {'1.2.3.4'},
             'windows_path': {'C:\\Users\\user\\AppData\\Local\\Temp\\bzzzzzz.txt'}})
The following IoC patterns are searched for:
- ipv4
- ipv6
- dns
- url
- windows_path
- linux_path
- md5_hash
- sha1_hash
- sha256_hash
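As a rough illustration of how pattern-based extraction produces a dictionary of sets like the one shown earlier, the sketch below applies simplified versions of two of these patterns to a sample string. The `PATTERNS` table and the `extract_iocs` helper are hypothetical names used only for this example; they are not part of msticpy, and the real class covers more types and edge cases.

```python
import re
from collections import defaultdict

# Hypothetical, simplified pattern table: (compiled regex, named group to keep).
PATTERNS = {
    "ipv4": (
        re.compile(r"(?P<ipaddress>(?:[0-9]{1,3}\.){3}[0-9]{1,3})"),
        "ipaddress",
    ),
    "md5_hash": (
        re.compile(r"(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])"),
        "hash",
    ),
}


def extract_iocs(text):
    """Return a dict mapping each IoC type to the set of observables found."""
    found = defaultdict(set)
    for ioc_type, (regex, group_name) in PATTERNS.items():
        for match in regex.finditer(text):
            # Keep only the named group, not the surrounding delimiter characters.
            found[ioc_type].add(match.group(group_name))
    return found


sample = "netsh start capture=yes IPv4.Address=1.2.3.4 hash 81ed03caf6901e444c72ac67d192fb9c"
result = extract_iocs(sample)
print(dict(result))
```

Using a set per type means that an observable repeated in the input is reported only once.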
You can use the data= parameter to IoCExtract.extract() to pass a DataFrame. Use the columns parameter to specify which column or columns you want to search.
Note
When searching a DataFrame, the windows_path and linux_path types are not included in the search by default because of the likely high volume of results and the number of false-positive matches. You can include them by specifying include_paths=True as a parameter to extract().
You can also use the ioc_types parameter to explicitly list the IoC types that you want to search for. This should be a list of strings of valid types. See ioc_types (msticpy.transform.ioc_extractor.IoCExtract.ioc_types).
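The selection rules described above can be sketched in plain Python. The `select_ioc_types` helper below is hypothetical and is not msticpy's implementation. It assumes that an explicit `ioc_types` list is used as given and that, with no list, the path types are dropped unless `include_paths=True`.

```python
# Built-in type names as listed in this document.
ALL_TYPES = [
    "ipv4", "ipv6", "dns", "url", "windows_path",
    "linux_path", "md5_hash", "sha1_hash", "sha256_hash",
]
PATH_TYPES = {"windows_path", "linux_path"}


def select_ioc_types(ioc_types=None, include_paths=False):
    """Hypothetical helper: decide which IoC types a DataFrame search uses."""
    if ioc_types:
        # Assumption: an explicit list takes precedence over include_paths.
        return list(ioc_types)
    return [t for t in ALL_TYPES if include_paths or t not in PATH_TYPES]


default_types = select_ioc_types()
with_paths = select_ioc_types(include_paths=True)
explicit = select_ioc_types(ioc_types=["ipv4", "url"])
print(default_types)
```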
ioc_extractor = IoCExtract()
ioc_df = ioc_extractor.extract(data=process_tree, columns=['CommandLine'])
if len(ioc_df):
    display(HTML("<h3>IoC patterns found in process tree.</h3>"))
    display(ioc_df)
 | IoCType | Observable | SourceIndex |
---|---|---|---|
48 | windows_path | .\powershell | 36 |
49 | url | http://somedomain/best-kitten-names-1.jpg' | 37 |
53 | windows_path | .\pOWErS^H^ElL^.eX^e^ | 37 |
58 | md5_hash | 81ed03caf6901e444c72ac67d192fb9c | 44 |
59 | url | http://badguyserver/pwnme" | 46 |
68 | windows_path | .\reg query add mscfile\\\\open | 59 |
72 | windows_path | \system\CurrentControlSet\Control\Terminal | 63 |
92 | ipv4 | 1.2.3.4 | 78 |
108 | ipv4 | 127.0.0.1 | 102 |
109 | url | http://127.0.0.1/ | 102 |
110 | windows_path | \SOFTWARE\Microsoft\Windows NT\CurrentVersion\Svchost\MyNastySvcHostConfig | 103 |
See IoCExtract (msticpy.transform.ioc_extractor.IoCExtract) and IoCExtract.extract (msticpy.transform.ioc_extractor.IoCExtract.extract).
from html import escape
extractor = IoCExtract()
for ioc_type, pattern in extractor.ioc_types.items():
    # Escape the regex so that characters such as < and > render correctly in HTML
    esc_pattern = escape(pattern.comp_regex.pattern)
    display(HTML(f'<b>{ioc_type}</b>'))
    display(HTML(f'<div style="margin-left:20px"><pre>{esc_pattern}</pre></div>'))
IoCType | Regex |
---|---|
ipv4 | (?P<ipaddress>(?:[0-9]{1,3}\\.){3}[0-9]{1,3}) |
ipv6 | (?<![:.\\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\\w]) |
dns | ((?=[a-z0-9-]{1,63}\\.)[a-z0-9]+(-[a-z0-9]+)*\\.){2,}[a-z]{2,63} |
url | (?P<protocol>(https?|ftp|telnet|ldap|file)://) (?P<userinfo>([a-z0-9-._~!$&\\'()*+,;=:]|%[0-9A-F]{2})*@)? (?P<host>([a-z0-9-._~!$&\\'()*+,;=]|%[0-9A-F]{2})*) |
windows_path | (?P<root>[a-z]:|\\\\\\\\[a-z0-9_.$-]+||[.]+) (?P<folder>\\\\(?:[^\\/:*?"\\\'<>|\\r\\n]+\\\\)*) (?P<file>[^\\\\/*?""<>|\\r\\n ]+) |
linux_path | (?P<root>/+||[.]+) (?P<folder>/(?:[^\\\\/:*?<>|\\r\\n]+/)*) (?P<file>[^/\\0<>|\\r\\n ]+) |
md5_hash | (?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9]) |
sha1_hash | (?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{40})(?:$|[^A-Fa-f0-9]) |
sha256_hash | (?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{64})(?:$|[^A-Fa-f0-9]) |
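The three hash patterns differ only in the required length, and each insists on a non-hex character (or the start or end of the string) on both sides. The sketch below, which uses only the standard `re` module, shows why a 64-character hash is not also reported as an MD5 or SHA-1 hash. The `classify_hash` helper and the `HASH_REGEXES` table are hypothetical names for this illustration.

```python
import re

# Same shape as the hash patterns in the table, parameterized by length.
HASH_TEMPLATE = r"(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{{{length}}})(?:$|[^A-Fa-f0-9])"
HASH_REGEXES = {
    name: re.compile(HASH_TEMPLATE.format(length=length))
    for name, length in (("md5_hash", 32), ("sha1_hash", 40), ("sha256_hash", 64))
}


def classify_hash(text):
    """Return the names of the hash patterns that match somewhere in text."""
    return [name for name, regex in HASH_REGEXES.items() if regex.search(text)]


md5_value = "81ed03caf6901e444c72ac67d192fb9c"  # 32 hex characters
sha256_value = md5_value * 2                    # 64 hex characters (synthetic)

print(classify_hash("hash=" + md5_value))
print(classify_hash(sha256_value))
```

Because the delimiters are outside the named group, `match.group("hash")` returns only the hash itself.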
See add_ioc_type (msticpy.transform.ioc_extractor.IoCExtract.add_ioc_type).
Add an IoC type and regular expression to use to the built-in set.
Warning
Adding an ioc_type that already exists in the internal set will overwrite that item.
Regular expressions are compiled with re.I | re.X | re.M (ignore case, verbose, and multiline).
add_ioc_type parameters:
- ioc_type {str} - a unique name for the IoC type
- ioc_regex {str} - a regular expression used to search for the type
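The compile flags matter when you write your own pattern. In verbose mode, unescaped whitespace in the pattern is ignored, so a literal space must be escaped. The sketch below uses only the standard `re` module with the same flag combination. The sample strings are made up for illustration, and the named-pipe pattern is the one used in the example that follows.

```python
import re

FLAGS = re.I | re.X | re.M

# Unescaped whitespace is dropped in verbose mode: this pattern is really "evilfile".
loose = re.compile(r"evil file", FLAGS)
# An escaped space is kept as a literal space.
strict = re.compile(r"evil\ file", FLAGS)

print(loose.search("run evil file"))    # no match, prints None
print(strict.search("run EVIL FILE"))   # matches, case is ignored

# Named-pipe pattern checked against a made-up command line.
pipe_regex = re.compile(r"(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)", FLAGS)
cmdline = r"cmd /c echo 1 > \\.\pipe\blahtest"
pipe_match = pipe_regex.search(cmdline)
print(pipe_match.group("pipe"))
```

Testing a pattern this way before registering it makes it easier to spot problems caused by verbose mode or case-insensitive matching.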
import re
rcomp = re.compile(r'(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)')
extractor.add_ioc_type(ioc_type='win_named_pipe', ioc_regex=r'(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)')
# Check that it added ok
print(extractor.ioc_types['win_named_pipe'])
# Use it in our data set
# Use the same instance that the new type was added to
extractor.extract(data=process_tree, columns=['CommandLine']).query("IoCType == 'win_named_pipe'")
IoCPattern(ioc_type='win_named_pipe', comp_regex=re.compile('(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)', re.IGNORECASE|re.VERBOSE), priority=0)
 | IoCType | Observable | SourceIndex |
---|---|---|---|
116 | win_named_pipe | \\.\pipe\blahtest" | 107 |
extract_df functions identically to extract with a data parameter. It may be more convenient to use this when you know that your input is a DataFrame.
ioc_extractor.extract_df(process_tree, columns=['NewProcessName', 'CommandLine']).head(10)
The SourceIndex column allows you to merge the results with the input DataFrame. Where an input row has multiple IoC matches, the output of this merge will contain duplicate rows from the input (one per IoC match). The previous index is preserved in the second column (and in the SourceIndex column).
Note: you will need to set the type of the SourceIndex column. In the example below, we are matching against the default numeric index, so we force the type to be numeric. If your index has a different dtype, you will need to convert the SourceIndex column (dtype=object) to match the type of your index.
input_df = process_tree.head(20)
output_df = ioc_extractor.extract(data=input_df, columns=['NewProcessName', 'CommandLine'])
# set the type of the SourceIndex column. In this case we are matching with the default numeric index.
output_df['SourceIndex'] = pd.to_numeric(output_df['SourceIndex'])
merged_df = pd.merge(left=input_df, right=output_df, how='outer', left_index=True, right_on='SourceIndex')
merged_df.head()
.. | TenantId | Account | EventID | TimeGenerated | Computer | SubjectUserSid | SubjectUserName | SubjectDomainName | SubjectLogonId | NewProcessId | NewProcessName | TokenElevationType | ProcessId | CommandLine | ParentProcessName | TargetLogonId | SourceComputerId | TimeCreatedUtc | NodeRole | Level | ProcessId1 | NewProcessId1 | IoCType | Observable | SourceIndex |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | 802d39e1-9d70-404d-832c-2de5e2478eda | MSTICAlertsWin1\MSTICAdmin | | 2019-01-15 05:15:15.677 | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | MSTICAdmin | MSTICAlertsWin1 | 0xfaac27 | 0x1580 | C:\Diagnostics\UserTmp\ftp.exe | %%1936 | 0xbc8 | .\ftp -s:C:\RECYCLER\xxppyy.exe | C:\Windows\System32\cmd.exe | 0x0 | 46fe7078-61bb-4bed-9430-7ac01d91c273 | 2019-01-15 05:15:15.677 | source | | | | | | |
 | 802d39e1-9d70-404d-832c-2de5e2478eda | MSTICAlertsWin1\MSTICAdmin | | 2019-01-15 05:15:16.167 | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | MSTICAdmin | MSTICAlertsWin1 | 0xfaac27 | 0x16fc | C:\Diagnostics\UserTmp\reg.exe | %%1936 | 0xbc8 | .\reg not /domain:everything that /sid:shines is /krbtgt:golden ! | C:\Windows\System32\cmd.exe | 0x0 | 46fe7078-61bb-4bed-9430-7ac01d91c273 | 2019-01-15 05:15:16.167 | sibling | | | | | | |
 | 802d39e1-9d70-404d-832c-2de5e2478eda | MSTICAlertsWin1\MSTICAdmin | | 2019-01-15 05:15:16.277 | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | MSTICAdmin | MSTICAlertsWin1 | 0xfaac27 | 0x1700 | C:\Diagnostics\UserTmp\cmd.exe | %%1936 | 0xbc8 | cmd /c "systeminfo && systeminfo" | C:\Windows\System32\cmd.exe | 0x0 | 46fe7078-61bb-4bed-9430-7ac01d91c273 | 2019-01-15 05:15:16.277 | sibling | | | | | | |
 | 802d39e1-9d70-404d-832c-2de5e2478eda | MSTICAlertsWin1\MSTICAdmin | | 2019-01-15 05:15:16.340 | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | MSTICAdmin | MSTICAlertsWin1 | 0xfaac27 | 0x1728 | C:\Diagnostics\UserTmp\rundll32.exe | %%1936 | 0xbc8 | .\rundll32 /C 42424.exe | C:\Windows\System32\cmd.exe | 0x0 | 46fe7078-61bb-4bed-9430-7ac01d91c273 | 2019-01-15 05:15:16.340 | sibling | | | | | | |
 | 802d39e1-9d70-404d-832c-2de5e2478eda | MSTICAlertsWin1\MSTICAdmin | | 2019-01-15 05:15:16.400 | MSTICAlertsWin1 | S-1-5-21-996632719-2361334927-4038480536-500 | MSTICAdmin | MSTICAlertsWin1 | 0xfaac27 | 0x175c | C:\Diagnostics\UserTmp\rundll32.exe | %%1936 | 0xbc8 | .\rundll32 /C c:\users\MSTICAdmin\42424.exe | C:\Windows\System32\cmd.exe | 0x0 | 46fe7078-61bb-4bed-9430-7ac01d91c273 | 2019-01-15 05:15:16.400 | sibling | | | | | | |
You can use the line magic %ioc or cell magic %%ioc to extract IoCs from text pasted directly into a cell.
The ioc magic supports the following options:
--out OUT, -o OUT
    The name of the variable (`OUT`) in which to return the results.
    Note: the output variable is a dictionary of IoCs grouped by IoC type.
--ioc_types IOC_TYPES, -i IOC_TYPES
    The types of IoC to search for (comma-separated string).
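The `--ioc_types` value is a single comma-separated string rather than a Python list. As a rough sketch of how such a string maps to the list form accepted by the ioc_types parameter of extract(), the hypothetical helper below splits and trims the value. This illustrates the format only; it is not the magic's actual parsing code.

```python
def parse_ioc_types(option_value):
    """Hypothetical helper: turn a comma-separated option string into a list."""
    # Drop surrounding whitespace and any enclosing quotes.
    cleaned = option_value.strip().strip("\"'")
    # Split on commas and discard empty items.
    return [item.strip() for item in cleaned.split(",") if item.strip()]


parsed = parse_ioc_types('"ipv4, ipv6, linux_path, md5_hash"')
print(parsed)
```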
%%ioc --out ioc_capture
netsh start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\Users\user\AppData\Local\Temp\bzzzzzz.txt
hostname customers-service.ddns.net Feb 5, 2020, 2:20:35 PM 7
URL \https://two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=password Feb 5, 2020, 2:20:35 PM 1
hostname mobile.phonechallenges-submit.site Feb 5, 2020, 2:20:35 PM 8
hostname youtube.service-activity-checkup.site Feb 5, 2020, 2:20:35 PM 8
hostname www.drive-accounts.com Feb 5, 2020, 2:20:35 PM 7
hostname google.drive-accounts.com Feb 5, 2020, 2:20:35 PM 7
domain niaconucil.org Feb 5, 2020, 2:20:35 PM 11
domain isis-online.net Feb 5, 2020, 2:20:35 PM 11
domain bahaius.info Feb 5, 2020, 2:20:35 PM 11
domain w3-schools.org Feb 5, 2020, 2:20:35 PM 12
domain system-services.site Feb 5, 2020, 2:20:35 PM 11
domain accounts-drive.com Feb 5, 2020, 2:20:35 PM 8
domain drive-accounts.com Feb 5, 2020, 2:20:35 PM 10
domain service-issues.site Feb 5, 2020, 2:20:35 PM 8
domain two-step-checkup.site Feb 5, 2020, 2:20:35 PM 8
domain customers-activities.site Feb 5, 2020, 2:20:35 PM 11
domain seisolarpros.org Feb 5, 2020, 2:20:35 PM 11
domain yah00.site Feb 5, 2020, 2:20:35 PM 4
domain skynevvs.com Feb 5, 2020, 2:20:35 PM 11
domain recovery-options.site Feb 5, 2020, 2:20:35 PM 4
domain malcolmrifkind.site Feb 5, 2020, 2:20:35 PM 8
domain instagram-com.site Feb 5, 2020, 2:20:35 PM 8
domain leslettrespersanes.net Feb 5, 2020, 2:20:35 PM 11
domain software-updating-managers.site Feb 5, 2020, 2:20:35 PM 8
domain cpanel-services.site Feb 5, 2020, 2:20:35 PM 8
domain service-activity-checkup.site Feb 5, 2020, 2:20:35 PM 7
domain inztaqram.ga Feb 5, 2020, 2:20:35 PM 8
domain unirsd.com Feb 5, 2020, 2:20:35 PM 8
domain phonechallenges-submit.site Feb 5, 2020, 2:20:35 PM 7
domain acconut-verify.com Feb 5, 2020, 2:20:35 PM 11
domain finance-usbnc.info Feb 5, 2020, 2:20:35 PM 8
FileHash-MD5 542128ab98bda5ea139b169200a50bce Feb 5, 2020, 2:20:35 PM 3
FileHash-MD5 3d67ce57aab4f7f917cf87c724ed7dab Feb 5, 2020, 2:20:35 PM 3
hostname x09live-ix3b.account-profile-users.info Feb 6, 2020, 2:56:07 PM 0
hostname www.phonechallenges-submit.site Feb 6, 2020, 2:56:07 PM
[('ipv4', ['1.2.3.4']),
 ('dns',
  ['malcolmrifkind.site',
'w3-schools.org', 'niaconucil.org', 'software-updating-managers.site', 'isis-online.net', 'accounts-drive.com', 'cpanel-services.site', 'service-activity-checkup.site', 'service-issues.site', 'recovery-options.site', 'instagram-com.site', 'mobile.phonechallenges-submit.site', 'youtube.service-activity-checkup.site', 'google.drive-accounts.com', 'phonechallenges-submit.site', 'drive-accounts.com', 'www.phonechallenges-submit.site', 'yah00.site', 'seisolarpros.org', 'customers-activities.site', 'bahaius.info', 'system-services.site', 'two-step-checkup.site', 'x09live-ix3b.account-profile-users.info', 'customers-service.ddns.net', 'leslettrespersanes.net', 'www.drive-accounts.com', 'acconut-verify.com', 'finance-usbnc.info', 'unirsd.com', 'skynevvs.com', 'inztaqram.ga']),
 ('url',
('windows_path', ['C:\Users\user\AppData\Local\Temp\bzzzzzz.txt']), ('linux_path', ['//two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=passwordttFeb']), ('md5_hash', ['3d67ce57aab4f7f917cf87c724ed7dab', '542128ab98bda5ea139b169200a50bce'])]
%%ioc --ioc_types "ipv4, ipv6, linux_path, md5_hash"
netsh start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\Users\user\AppData\Local\Temp\bzzzzzz.txt
tracefile2=/usr/localbzzzzzz.sh
hostname customers-service.ddns.net Feb 5, 2020, 2:20:35 PM 7
URL \https://two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=password Feb 5, 2020, 2:20:35 PM 1
hostname mobile.phonechallenges-submit.site Feb 5, 2020, 2:20:35 PM 8
hostname youtube.service-activity-checkup.site Feb 5, 2020, 2:20:35 PM 8
hostname www.drive-accounts.com Feb 5, 2020, 2:20:35 PM 7
hostname google.drive-accounts.com Feb 5, 2020, 2:20:35 PM 7
domain niaconucil.org Feb 5, 2020, 2:20:35 PM 11
domain isis-online.net Feb 5, 2020, 2:20:35 PM 11
domain bahaius.info Feb 5, 2020, 2:20:35 PM 11
domain w3-schools.org Feb 5, 2020, 2:20:35 PM 12
domain system-services.site Feb 5, 2020, 2:20:35 PM 11
domain accounts-drive.com Feb 5, 2020, 2:20:35 PM 8
domain drive-accounts.com Feb 5, 2020, 2:20:35 PM 10
domain service-issues.site Feb 5, 2020, 2:20:35 PM 8
domain two-step-checkup.site Feb 5, 2020, 2:20:35 PM 8
domain customers-activities.site Feb 5, 2020, 2:20:35 PM 11
domain seisolarpros.org Feb 5, 2020, 2:20:35 PM 11
domain yah00.site Feb 5, 2020, 2:20:35 PM 4
domain skynevvs.com Feb 5, 2020, 2:20:35 PM 11
domain recovery-options.site Feb 5, 2020, 2:20:35 PM 4
domain malcolmrifkind.site Feb 5, 2020, 2:20:35 PM 8
domain instagram-com.site Feb 5, 2020, 2:20:35 PM 8
domain leslettrespersanes.net Feb 5, 2020, 2:20:35 PM 11
domain software-updating-managers.site Feb 5, 2020, 2:20:35 PM 8
domain cpanel-services.site Feb 5, 2020, 2:20:35 PM 8
domain service-activity-checkup.site Feb 5, 2020, 2:20:35 PM 7
domain inztaqram.ga Feb 5, 2020, 2:20:35 PM 8
domain unirsd.com Feb 5, 2020, 2:20:35 PM 8
domain phonechallenges-submit.site Feb 5, 2020, 2:20:35 PM 7
domain acconut-verify.com Feb 5, 2020, 2:20:35 PM 11
domain finance-usbnc.info Feb 5, 2020, 2:20:35 PM 8
FileHash-MD5 542128ab98bda5ea139b169200a50bce Feb 5, 2020, 2:20:35 PM 3
FileHash-MD5 3d67ce57aab4f7f917cf87c724ed7dab Feb 5, 2020, 2:20:35 PM 3
hostname x09live-ix3b.account-profile-users.info Feb 6, 2020, 2:56:07 PM 0
hostname www.phonechallenges-submit.site Feb 6, 2020, 2:56:07 PM
[('ipv4', ['1.2.3.4']),
 ('linux_path',
  ['//two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=passwordttFeb',
   '/usr/localbzzzzzz.sh']),
 ('md5_hash',
['3d67ce57aab4f7f917cf87c724ed7dab', '542128ab98bda5ea139b169200a50bce'])]
The IoC extraction functionality is also available in a pandas extension, mp_ioc. This supports a single method, extract(), which uses the same syntax as extract (described earlier).
process_tree.mp_ioc.extract(columns=['CommandLine'])
 | IoCType | Observable | SourceIndex |
---|---|---|---|
0 | dns | microsoft.com | 24 |
1 | url | http://server/file.sct | 31 |
2 | dns | server | 31 |
3 | dns | evil.ps | 35 |
4 | url | http://somedomain/best-kitten-names-1.jpg' | 37 |
5 | dns | somedomain | 37 |
6 | dns | blah.ps | 40 |
7 | md5_hash | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | 40 |
8 | dns | blah.ps | 41 |
9 | md5_hash | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | 41 |
10 | md5_hash | 81ed03caf6901e444c72ac67d192fb9c | 44 |
11 | url | http://badguyserver/pwnme | 46 |
12 | dns | badguyserver | 46 |
13 | url | http://badguyserver/pwnme | 47 |
14 | dns | badguyserver | 47 |
15 | dns | Invoke-Shellcode.ps | 48 |
16 | dns | Invoke-ReverseDnsLookup.ps | 49 |
17 | dns | Wscript.Shell | 67 |
18 | url | http://system.management.automation.amsiutils').getfield('amsiinitfailed','nonpublic,static').s... | 77 |
19 | dns | system.management.automation.amsiutils').getfield('amsiinitfailed','nonpublic,static').setvalue(... | 77 |
20 | ipv4 | 1.2.3.4 | 78 |
21 | dns | wscript.shell | 81 |
22 | dns | abc.com | 90 |
23 | ipv4 | 127.0.0.1 | 102 |
24 | url | http://127.0.0.1/ | 102 |
25 | win_named_pipe | \\.\pipe\blahtest" | 107 |
Note
The URLs in the previous table have been altered to prevent inadvertent navigation to them.