IoC Extraction

The IoCExtract class lets you extract IoC (Indicator of Compromise) patterns from a string or a pandas DataFrame. Several patterns are built into the class, and you can override these or supply new ones.

# Imports
import sys
MIN_REQ_PYTHON = (3,6)
if sys.version_info < MIN_REQ_PYTHON:
    print('Check the Kernel->Change Kernel menu and ensure that Python 3.6')
    print('or later is selected as the active kernel.')
    sys.exit("Python %s.%s or later is required.\n" % MIN_REQ_PYTHON)

from IPython.display import display, HTML
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 100)
# Load test data
process_tree = pd.read_csv('data/process_tree.csv')
process_tree[['CommandLine']].head()
CommandLine
0 .\ftp -s:C:\RECYCLER\xxppyy.exe
1 .\reg not /domain:everything that /sid:shines is /krbtgt:golden !
2 cmd /c "systeminfo && systeminfo"
3 .\rundll32 /C 42424.exe
4 .\rundll32 /C c:\users\MSTICAdmin\42424.exe

Looking for IoC in a String

Just pass the string as a parameter to the extract() method.

Get a commandline from our data set.

# get a commandline from our data set
cmdline = process_tree['CommandLine'].loc[78]
cmdline

'netsh start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\\Users\\user\\AppData\\Local\\Temp\\bzzzzzz.txt'

Create an IoCExtract instance and pass the string to its extract() method.

# Instantiate an IoCExtract object
from msticpy.transform import IoCExtract
ioc_extractor = IoCExtract()

# any IoCs in the string?
iocs_found = ioc_extractor.extract(cmdline)

if iocs_found:
    print('\nPotential IoCs found in alert process:')
    display(iocs_found)

Potential IoCs found in alert process:

defaultdict(set,
            {'ipv4': {'1.2.3.4'},
             'windows_path': {'C:\\Users\\user\\AppData\\Local\\Temp\\bzzzzzz.txt'}})

The following IoC patterns are searched for:

  • ipv4
  • ipv6
  • dns
  • url
  • windows_path
  • linux_path
  • md5_hash
  • sha1_hash
  • sha256_hash
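Conceptually, string extraction runs each pattern over the input and collects the distinct matches per IoC type, mirroring the defaultdict(set) that extract() returned above. The following is a minimal, stdlib-only sketch of that idea, not msticpy's implementation: the helper name extract_iocs and the two-pattern subset are assumptions for illustration, with the ipv4 and md5_hash expressions adapted from the Predefined Regex Patterns table.

```python
import re
from collections import defaultdict

# Flags this documentation lists for IoCExtract regular expressions
FLAGS = re.I | re.X | re.M
# Illustrative subset, adapted from the predefined ipv4 and md5_hash patterns
PATTERNS = {
    "ipv4": re.compile(r"(?P<ipaddress>(?:[0-9]{1,3}\.){3}[0-9]{1,3})", FLAGS),
    "md5_hash": re.compile(
        r"(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])", FLAGS
    ),
}


def extract_iocs(text):
    """Hypothetical helper: map each IoC type to the set of distinct matches."""
    results = defaultdict(set)
    for ioc_type, regex in PATTERNS.items():
        for match in regex.finditer(text):
            # Keep the named group (the observable), not the boundary characters
            group_name = next(iter(regex.groupindex))
            results[ioc_type].add(match.group(group_name))
    return results


cmdline = "netsh start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\\temp\\x.txt"
print(dict(extract_iocs(cmdline)))  # {'ipv4': {'1.2.3.4'}}
```

Using a set per type means a value that appears several times in the input is reported once.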

Using a DataFrame as Input

Use the data parameter of IoCExtract.extract() to pass a DataFrame, and the columns parameter to specify which column or columns you want to search.

Note

When searching a DataFrame, the windows_path and linux_path types are not included in the search by default, because of the likely high volume of results and the number of false-positive matches. You can include them by specifying include_paths=True as a parameter to extract().

You can also use the ioc_types parameter to explicitly list the IoC types that you want to search for. This should be a list of strings naming valid types; see IoCExtract.ioc_types (msticpy.transform.ioc_extractor.IoCExtract.ioc_types).
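To make the DataFrame defaults concrete, here is a stdlib-only sketch (not the msticpy code) that searches (index, text) pairs standing in for DataFrame rows and returns one (IoCType, Observable, SourceIndex) tuple per match. Path types are skipped unless include_paths is set, and an optional ioc_types list restricts the search. The helper name extract_rows, the row format and the simplified regexes are assumptions, as is the choice to use an explicit ioc_types list exactly as given.

```python
import re

FLAGS = re.I | re.X | re.M
# Simplified, illustrative patterns; not the full msticpy expressions
PATTERNS = {
    "ipv4": re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}", FLAGS),
    "windows_path": re.compile(r"[a-z]:\\[^\s]+", FLAGS),
}
PATH_TYPES = {"windows_path", "linux_path"}


def extract_rows(rows, ioc_types=None, include_paths=False):
    """Hypothetical helper: search (index, text) pairs and return
    (IoCType, Observable, SourceIndex) tuples, one per match."""
    if ioc_types:
        # Assumption: an explicit ioc_types list is used exactly as given
        selected = set(ioc_types) & set(PATTERNS)
    else:
        selected = set(PATTERNS)
        if not include_paths:
            # Documented default for DataFrames: skip the path types
            selected -= PATH_TYPES
    results = []
    for index, text in rows:
        for ioc_type in sorted(selected):
            for match in PATTERNS[ioc_type].finditer(text):
                results.append((ioc_type, match.group(0), index))
    return results


rows = [(0, "ping 10.0.0.1"), (1, r"copy C:\temp\x.exe 192.168.1.5")]
print(extract_rows(rows))                      # paths skipped by default
print(extract_rows(rows, include_paths=True))  # windows_path included
```

Returning one tuple per match, rather than a set per type, is what makes a SourceIndex column possible: every observable stays linked to the row it came from.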

ioc_extractor = IoCExtract()
ioc_df = ioc_extractor.extract(data=process_tree, columns=['CommandLine'])
if len(ioc_df):
    display(HTML("<h3>IoC patterns found in process tree.</h3>"))
    display(ioc_df)

IoC patterns found in process tree.

IoCType Observable SourceIndex
48 windows_path .\powershell 36
49 url http://somedomain/best-kitten-names-1.jpg' 37
53 windows_path .\pOWErS^H^ElL^.eX^e^ 37
58 md5_hash 81ed03caf6901e444c72ac67d192fb9c 44
59 url http://badguyserver/pwnme" 46
68 windows_path .\reg query add mscfile\\\\open 59
72 windows_path \system\CurrentControlSet\Control\Terminal 63
92 ipv4 1.2.3.4 78
108 ipv4 127.0.0.1 102
109 url http://127.0.0.1/ 102
110 windows_path \SOFTWARE\Microsoft\Windows NT\CurrentVersion\Svchost\MyNastySvcHostConfig 103

IoCExtract API

See IoCExtract (msticpy.transform.ioc_extractor.IoCExtract) and IoCExtract.extract (msticpy.transform.ioc_extractor.IoCExtract.extract).

Predefined Regex Patterns

from html import escape
extractor = IoCExtract()

for ioc_type, pattern in extractor.ioc_types.items():
    esc_pattern = escape(pattern.comp_regex.pattern)
    display(HTML(f'<b>{ioc_type}</b>'))
    display(HTML(f'<div style="margin-left:20px"><pre>{esc_pattern}</pre></div>'))
IoCType Regex
ipv4
(?P<ipaddress>(?:[0-9]{1,3}\\.){3}[0-9]{1,3})
ipv6
(?<![:.\\w])(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}(?![:.\\w])
dns
((?=[a-z0-9-]{1,63}\\.)[a-z0-9]+(-[a-z0-9]+)*\\.){2,}[a-z]{2,63}
url
(?P<protocol>(https?|ftp|telnet|ldap|file)://)
(?P<userinfo>([a-z0-9-._~!$&\\'()*+,;=:]|%[0-9A-F]{2})*@)?
(?P<host>([a-z0-9-._~!$&\\'()*+,;=]|%[0-9A-F]{2})*)
windows_path
(?P<root>[a-z]:|\\\\\\\\[a-z0-9_.$-]+||[.]+)
(?P<folder>\\\\(?:[^\\/:*?"\\\'<>|\\r\\n]+\\\\)*)
(?P<file>[^\\\\/*?""<>|\\r\\n ]+)
linux_path
(?P<root>/+||[.]+)
(?P<folder>/(?:[^\\\\/:*?<>|\\r\\n]+/)*)
(?P<file>[^/\\0<>|\\r\\n ]+)
md5_hash
(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])
sha1_hash
(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{40})(?:$|[^A-Fa-f0-9])
sha256_hash
(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{64})(?:$|[^A-Fa-f0-9])
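The (?:^|[^A-Fa-f0-9]) and (?:$|[^A-Fa-f0-9]) guards around the hash patterns stop a shorter hash type from matching inside a longer run of hex characters. A quick check with Python's re module, using the md5_hash expression from the table and the compile flags documented for IoC patterns, illustrates this; the sample strings are made up.

```python
import re

FLAGS = re.I | re.X | re.M
# The md5_hash expression from the table above
MD5 = re.compile(
    r"(?:^|[^A-Fa-f0-9])(?P<hash>[A-Fa-f0-9]{32})(?:$|[^A-Fa-f0-9])", FLAGS
)


def md5_matches(text):
    """Return the values captured by the md5_hash pattern's 'hash' group."""
    return [m.group("hash") for m in MD5.finditer(text)]


md5_value = "81ed03caf6901e444c72ac67d192fb9c"  # 32 hex characters
sha1_like = md5_value + "00112233"              # 40 hex characters (made-up)

print(md5_matches(f"hash={md5_value};"))  # ['81ed03caf6901e444c72ac67d192fb9c']
print(md5_matches(f"hash={sha1_like};"))  # []
```

The guard characters are consumed as part of the match, which is why the pattern wraps the value in a named hash group: the observable is read from that group rather than from the whole match.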

Adding your own pattern(s)

See add_ioc_type (msticpy.transform.ioc_extractor.IoCExtract.add_ioc_type).

The add_ioc_type() method adds an IoC type, and the regular expression used to search for it, to the built-in set.

Warning

Adding an ioc_type whose name already exists in the internal set overwrites that item.

Regular expressions are compiled with re.I | re.X | re.M (IGNORECASE, VERBOSE and MULTILINE).

add_ioc_type parameters:

  • ioc_type (str): a unique name for the IoC type
  • ioc_regex (str): a regular expression used to search for the type
import re

# Optional: check that the pattern compiles with the flags IoCExtract uses
pipe_regex = r'(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)'
re.compile(pipe_regex, re.I | re.X | re.M)
extractor.add_ioc_type(ioc_type='win_named_pipe', ioc_regex=pipe_regex)

# Check that it was added
print(extractor.ioc_types['win_named_pipe'])

# Use it on our data set
extractor.extract(data=process_tree, columns=['CommandLine']).query("IoCType == 'win_named_pipe'")

IoCPattern(ioc_type='win_named_pipe', comp_regex=re.compile('(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)', re.IGNORECASE|re.VERBOSE), priority=0)

IoCType Observable SourceIndex
116 win_named_pipe \\.\pipe\blahtest" 107
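Because custom expressions are compiled with re.VERBOSE (re.X), unescaped whitespace in a pattern is ignored and an unescaped # starts a comment, so a pattern that needs a literal space should use \s, [ ] or an escaped space. The snippet below demonstrates this with Python's re module alone, and confirms that the win_named_pipe expression matches the pipe name found in the sample data; the other sample strings are illustrative.

```python
import re

FLAGS = re.I | re.X | re.M  # flags documented for add_ioc_type patterns

# Under re.X the literal space is ignored, so this pattern means "foobar"
naive = re.compile(r"foo bar", FLAGS)
# Escaping the space (or using [ ] or \s) keeps it significant
escaped = re.compile(r"foo\ bar", FLAGS)

print(bool(naive.search("foo bar")))    # False
print(bool(naive.search("foobar")))     # True
print(bool(escaped.search("foo bar")))  # True

# The win_named_pipe pattern from the example above
pipe = re.compile(r"(?P<pipe>\\\\\.\\pipe\\[^\s\\]+)", FLAGS)
match = pipe.search(r'echo hello > \\.\pipe\blahtest"')
print(match.group("pipe"))  # \\.\pipe\blahtest"
```

Note that [^\s\\]+ also accepts the trailing quote character, which is why the extracted observable above ends in ".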

extract_df()

extract_df() works identically to extract() called with the data parameter. It may be more convenient to use when you know that your input is a DataFrame.

ioc_extractor.extract_df(process_tree, columns=['NewProcessName', 'CommandLine']).head(10)

Merging output with source data

The SourceIndex column lets you merge the results with the input DataFrame. Where an input row has multiple IoC matches, the merge produces duplicate copies of that input row (one per IoC match). The previous index is preserved in the second column (and in the SourceIndex column).

Note: you need to set the type of the SourceIndex column. In the example below, we are matching with the default numeric index, so we force the type to be numeric. If your index has a different dtype, convert SourceIndex (dtype=object) to match the type of your index column.

input_df = process_tree.head(20)
output_df = ioc_extractor.extract(data=input_df, columns=['NewProcessName', 'CommandLine'])
# set the type of the SourceIndex column. In this case we are matching with the default numeric index.
output_df['SourceIndex'] = pd.to_numeric(output_df['SourceIndex'])
merged_df = pd.merge(left=input_df, right=output_df, how='outer', left_index=True, right_on='SourceIndex')
merged_df.head()
TenantId Account EventID TimeGenerated Computer SubjectUserSid SubjectUserName SubjectDomainName SubjectLogonId NewProcessId NewProcessName TokenElevationType ProcessId CommandLine ParentProcessName TargetLogonId SourceComputerId TimeCreatedUtc NodeRole Level ProcessId1 NewProcessId1 IoCType Observable SourceIndex
0 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:15.677 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x1580 C:\Diagnostics\UserTmp\ftp.exe %%1936 0xbc8 .\ftp -s:C:\RECYCLER\xxppyy.exe C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:15.677 source 0 nan nan nan nan 0
1 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:16.167 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x16fc C:\Diagnostics\UserTmp\reg.exe %%1936 0xbc8 .\reg not /domain:everything that /sid:shines is /krbtgt:golden ! C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:16.167 sibling 1 nan nan nan nan 1
2 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:16.277 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x1700 C:\Diagnostics\UserTmp\cmd.exe %%1936 0xbc8 cmd /c "systeminfo && systeminfo" C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:16.277 sibling 1 nan nan nan nan 2
3 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:16.340 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x1728 C:\Diagnostics\UserTmp\rundll32.exe %%1936 0xbc8 .\rundll32 /C 42424.exe C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:16.340 sibling 1 nan nan nan nan 3
4 802d39e1-9d70-404d-832c-2de5e2478eda MSTICAlertsWin1\MSTICAdmin 4688 2019-01-15 05:15:16.400 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0xfaac27 0x175c C:\Diagnostics\UserTmp\rundll32.exe %%1936 0xbc8 .\rundll32 /C c:\users\MSTICAdmin\42424.exe C:\Windows\System32\cmd.exe 0x0 46fe7078-61bb-4bed-9430-7ac01d91c273 2019-01-15 05:15:16.400 sibling 1 nan nan nan nan 4
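The dtype point in the SourceIndex note earlier in this section is easy to trip over: an integer index value 3 and a SourceIndex value '3' do not match when joining. The following stdlib-only sketch uses plain dicts as stand-ins for the two DataFrames (with made-up rows and the assumption that SourceIndex arrives as strings) to show the same outer join on SourceIndex, converting the key first and producing one output row per IoC match. It illustrates the join logic only; it is not how pandas implements merge.

```python
# Stand-ins for the input DataFrame (keyed by integer index) and the
# extract() output, whose SourceIndex values are assumed to be strings
input_rows = {
    0: {"CommandLine": "ping 10.0.0.1"},
    1: {"CommandLine": "dir"},
    2: {"CommandLine": "curl http://10.0.0.2/x"},
}
ioc_rows = [
    {"IoCType": "ipv4", "Observable": "10.0.0.1", "SourceIndex": "0"},
    {"IoCType": "ipv4", "Observable": "10.0.0.2", "SourceIndex": "2"},
    {"IoCType": "url", "Observable": "http://10.0.0.2/x", "SourceIndex": "2"},
]


def outer_merge(left, right, convert=int):
    """Hypothetical helper: outer-join right rows to left rows on SourceIndex."""
    matches = {}
    for row in right:
        key = convert(row["SourceIndex"])  # align the key type with the index
        matches.setdefault(key, []).append({**row, "SourceIndex": key})
    merged = []
    for index, row in left.items():
        # One output row per IoC match; unmatched input rows appear once
        for ioc in matches.pop(index, [{}]):
            merged.append({"index": index, **row, **ioc})
    # Outer join: keep IoC rows whose SourceIndex matched no input row
    for leftover in matches.values():
        merged.extend(leftover)
    return merged


for row in outer_merge(input_rows, ioc_rows):
    print(row)
```

With convert=str the string keys never equal the integer index, so every input row comes back unmatched and all three IoC rows are appended separately, which is the symptom the type conversion avoids.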

IPython magic

You can use the line magic %ioc or the cell magic %%ioc to extract IoCs from text pasted directly into a cell.

The ioc magic supports the following options:

--out OUT, -o OUT
    The name of the variable in which to return the results.
    Note: the output variable is a dictionary of IoCs grouped by IoC type.
--ioc_types IOC_TYPES, -i IOC_TYPES
    The types of IoC to search for (comma-separated string).
%%ioc --out ioc_capture
netsh  start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\Users\user\AppData\Local\Temp\bzzzzzz.txt
hostname    customers-service.ddns.net      Feb 5, 2020, 2:20:35 PM     7
URL \https://two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=password        Feb 5, 2020, 2:20:35 PM     1
hostname    mobile.phonechallenges-submit.site      Feb 5, 2020, 2:20:35 PM     8
hostname    youtube.service-activity-checkup.site       Feb 5, 2020, 2:20:35 PM     8
hostname    www.drive-accounts.com      Feb 5, 2020, 2:20:35 PM     7
hostname    google.drive-accounts.com       Feb 5, 2020, 2:20:35 PM     7
domain  niaconucil.org      Feb 5, 2020, 2:20:35 PM     11
domain  isis-online.net     Feb 5, 2020, 2:20:35 PM     11
domain  bahaius.info        Feb 5, 2020, 2:20:35 PM     11
domain  w3-schools.org      Feb 5, 2020, 2:20:35 PM     12
domain  system-services.site        Feb 5, 2020, 2:20:35 PM     11
domain  accounts-drive.com      Feb 5, 2020, 2:20:35 PM     8
domain  drive-accounts.com      Feb 5, 2020, 2:20:35 PM     10
domain  service-issues.site     Feb 5, 2020, 2:20:35 PM     8
domain  two-step-checkup.site       Feb 5, 2020, 2:20:35 PM     8
domain  customers-activities.site       Feb 5, 2020, 2:20:35 PM     11
domain  seisolarpros.org        Feb 5, 2020, 2:20:35 PM     11
domain  yah00.site      Feb 5, 2020, 2:20:35 PM     4
domain  skynevvs.com        Feb 5, 2020, 2:20:35 PM     11
domain  recovery-options.site       Feb 5, 2020, 2:20:35 PM     4
domain  malcolmrifkind.site     Feb 5, 2020, 2:20:35 PM     8
domain  instagram-com.site      Feb 5, 2020, 2:20:35 PM     8
domain  leslettrespersanes.net      Feb 5, 2020, 2:20:35 PM     11
domain  software-updating-managers.site     Feb 5, 2020, 2:20:35 PM     8
domain  cpanel-services.site        Feb 5, 2020, 2:20:35 PM     8
domain  service-activity-checkup.site       Feb 5, 2020, 2:20:35 PM     7
domain  inztaqram.ga        Feb 5, 2020, 2:20:35 PM     8
domain  unirsd.com      Feb 5, 2020, 2:20:35 PM     8
domain  phonechallenges-submit.site     Feb 5, 2020, 2:20:35 PM     7
domain  acconut-verify.com      Feb 5, 2020, 2:20:35 PM     11
domain  finance-usbnc.info      Feb 5, 2020, 2:20:35 PM     8
FileHash-MD5    542128ab98bda5ea139b169200a50bce        Feb 5, 2020, 2:20:35 PM     3
FileHash-MD5    3d67ce57aab4f7f917cf87c724ed7dab        Feb 5, 2020, 2:20:35 PM     3
hostname    x09live-ix3b.account-profile-users.info     Feb 6, 2020, 2:56:07 PM     0
hostname    www.phonechallenges-submit.site     Feb 6, 2020, 2:56:07 PM
[('ipv4', ['1.2.3.4']),
 ('dns',
  ['malcolmrifkind.site', 'w3-schools.org', 'niaconucil.org', 'software-updating-managers.site', 'isis-online.net', 'accounts-drive.com', 'cpanel-services.site', 'service-activity-checkup.site', 'service-issues.site', 'recovery-options.site', 'instagram-com.site', 'mobile.phonechallenges-submit.site', 'youtube.service-activity-checkup.site', 'google.drive-accounts.com', 'phonechallenges-submit.site', 'drive-accounts.com', 'www.phonechallenges-submit.site', 'yah00.site', 'seisolarpros.org', 'customers-activities.site', 'bahaius.info', 'system-services.site', 'two-step-checkup.site', 'x09live-ix3b.account-profile-users.info', 'customers-service.ddns.net', 'leslettrespersanes.net', 'www.drive-accounts.com', 'acconut-verify.com', 'finance-usbnc.info', 'unirsd.com', 'skynevvs.com', 'inztaqram.ga']),
 ('url',
  ['https://two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=password']),
 ('windows_path', ['C:\Users\user\AppData\Local\Temp\bzzzzzz.txt']),
 ('linux_path',
  ['//two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=passwordttFeb']),
 ('md5_hash', ['3d67ce57aab4f7f917cf87c724ed7dab', '542128ab98bda5ea139b169200a50bce'])]

%%ioc --ioc_types "ipv4, ipv6, linux_path, md5_hash"
netsh  start capture=yes IPv4.Address=1.2.3.4 tracefile=C:\Users\user\AppData\Local\Temp\bzzzzzz.txt
tracefile2=/usr/localbzzzzzz.sh
hostname    customers-service.ddns.net      Feb 5, 2020, 2:20:35 PM     7
URL \https://two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=password        Feb 5, 2020, 2:20:35 PM     1
hostname    mobile.phonechallenges-submit.site      Feb 5, 2020, 2:20:35 PM     8
hostname    youtube.service-activity-checkup.site       Feb 5, 2020, 2:20:35 PM     8
hostname    www.drive-accounts.com      Feb 5, 2020, 2:20:35 PM     7
hostname    google.drive-accounts.com       Feb 5, 2020, 2:20:35 PM     7
domain  niaconucil.org      Feb 5, 2020, 2:20:35 PM     11
domain  isis-online.net     Feb 5, 2020, 2:20:35 PM     11
domain  bahaius.info        Feb 5, 2020, 2:20:35 PM     11
domain  w3-schools.org      Feb 5, 2020, 2:20:35 PM     12
domain  system-services.site        Feb 5, 2020, 2:20:35 PM     11
domain  accounts-drive.com      Feb 5, 2020, 2:20:35 PM     8
domain  drive-accounts.com      Feb 5, 2020, 2:20:35 PM     10
domain  service-issues.site     Feb 5, 2020, 2:20:35 PM     8
domain  two-step-checkup.site       Feb 5, 2020, 2:20:35 PM     8
domain  customers-activities.site       Feb 5, 2020, 2:20:35 PM     11
domain  seisolarpros.org        Feb 5, 2020, 2:20:35 PM     11
domain  yah00.site      Feb 5, 2020, 2:20:35 PM     4
domain  skynevvs.com        Feb 5, 2020, 2:20:35 PM     11
domain  recovery-options.site       Feb 5, 2020, 2:20:35 PM     4
domain  malcolmrifkind.site     Feb 5, 2020, 2:20:35 PM     8
domain  instagram-com.site      Feb 5, 2020, 2:20:35 PM     8
domain  leslettrespersanes.net      Feb 5, 2020, 2:20:35 PM     11
domain  software-updating-managers.site     Feb 5, 2020, 2:20:35 PM     8
domain  cpanel-services.site        Feb 5, 2020, 2:20:35 PM     8
domain  service-activity-checkup.site       Feb 5, 2020, 2:20:35 PM     7
domain  inztaqram.ga        Feb 5, 2020, 2:20:35 PM     8
domain  unirsd.com      Feb 5, 2020, 2:20:35 PM     8
domain  phonechallenges-submit.site     Feb 5, 2020, 2:20:35 PM     7
domain  acconut-verify.com      Feb 5, 2020, 2:20:35 PM     11
domain  finance-usbnc.info      Feb 5, 2020, 2:20:35 PM     8
FileHash-MD5    542128ab98bda5ea139b169200a50bce        Feb 5, 2020, 2:20:35 PM     3
FileHash-MD5    3d67ce57aab4f7f917cf87c724ed7dab        Feb 5, 2020, 2:20:35 PM     3
hostname    x09live-ix3b.account-profile-users.info     Feb 6, 2020, 2:56:07 PM     0
hostname    www.phonechallenges-submit.site     Feb 6, 2020, 2:56:07 PM
[('ipv4', ['1.2.3.4']),
 ('linux_path',
  ['//two-step-checkup.site/securemail/secureLogin/challenge/url?ucode=d50a3eb1-9a6b-45a8-8389-d5203bbddaa1&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;service=mailservice&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;type=passwordttFeb',
   '/usr/localbzzzzzz.sh']),
 ('md5_hash',
  ['3d67ce57aab4f7f917cf87c724ed7dab', '542128ab98bda5ea139b169200a50bce'])]
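The --ioc_types value is a single comma-separated string and, as in the example above, it may contain spaces after the commas. The hypothetical helper below (not part of msticpy) sketches one way to normalize such a string into a list of type names and check them against the built-in types listed earlier, for example before passing them to the ioc_types parameter of extract(). Custom types such as win_named_pipe would need to be added to the known set.

```python
# Built-in type names as listed earlier in this document
KNOWN_IOC_TYPES = {
    "ipv4", "ipv6", "dns", "url", "windows_path",
    "linux_path", "md5_hash", "sha1_hash", "sha256_hash",
}


def parse_ioc_types(value, known=KNOWN_IOC_TYPES):
    """Hypothetical helper: split 'ipv4, ipv6' into ['ipv4', 'ipv6']."""
    # Strip whitespace around each name and ignore empty entries
    types = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [t for t in types if t not in known]
    if unknown:
        raise ValueError(f"Unknown IoC type(s): {', '.join(unknown)}")
    return types


print(parse_ioc_types("ipv4, ipv6, linux_path, md5_hash"))
# ['ipv4', 'ipv6', 'linux_path', 'md5_hash']
```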

Pandas Extension

The IoC extraction functionality is also available as a pandas extension, mp_ioc. This supports a single method, extract().

It accepts the same syntax as IoCExtract.extract(), described earlier.

process_tree.mp_ioc.extract(columns=['CommandLine'])
IoCType Observable SourceIndex
0 dns microsoft.com 24
1 url http://server/file.sct 31
2 dns server 31
3 dns evil.ps 35
4 url http://somedomain/best-kitten-names-1.jpg' 37
5 dns somedomain 37
6 dns blah.ps 40
7 md5_hash aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 40
8 dns blah.ps 41
9 md5_hash aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 41
10 md5_hash 81ed03caf6901e444c72ac67d192fb9c 44
11 url http://badguyserver/pwnme 46
12 dns badguyserver 46
13 url http://badguyserver/pwnme 47
14 dns badguyserver 47
15 dns Invoke-Shellcode.ps 48
16 dns Invoke-ReverseDnsLookup.ps 49
17 dns Wscript.Shell 67
18 url http://system.management.automation.amsiutils').getfield('amsiinitfailed','nonpublic,static').s... 77
19 dns system.management.automation.amsiutils').getfield('amsiinitfailed','nonpublic,static').setvalue(... 77
20 ipv4 1.2.3.4 78
21 dns wscript.shell 81
22 dns abc.com 90
23 ipv4 127.0.0.1 102
24 url http://127.0.0.1/ 102
25 win_named_pipe \\.\pipe\blahtest" 107

Note

The URLs in the previous table have been altered to prevent inadvertent navigation to them.