# Day 1 (Part 2): Acquiring Data

## I) Connecting to Databases

### **a. Elasticsearch**

- Importing libraries:

In [17]:
# Elasticsearch connector
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
# Data manipulation
import pandas as pd

ModuleNotFoundError: No module named 'elasticsearch'

- Initializing an Elasticsearch client:

    Initialize an Elasticsearch client with the URL of your Elasticsearch instance, then pass the client to a `Search` object, which represents the search request we will build next.

In [None]:
es = Elasticsearch(['http://<elasticsearch-ip>:9200'])
searchContext = Search(using=es, index='logs-*', doc_type='doc')

- Setting the query search context:

    Next, use the `query` method to pass an Elasticsearch `query_string` query. For example, to retrieve the events where `event_id` is 1:

In [None]:
s = searchContext.query('query_string', query='event_id:1')
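For reference, a `query_string` query like the one above corresponds to a plain JSON request body. The sketch below builds that body with the standard library only. `build_query_string` and `query_body` are hypothetical helpers (not part of `elasticsearch_dsl`), and joining several fields with `AND` is an assumption about how the single-field example might be extended.

```python
import json

def build_query_string(**field_values):
    """Join field:value pairs into a Lucene-style query string (hypothetical helper)."""
    return " AND ".join(f"{field}:{value}" for field, value in field_values.items())

def query_body(query):
    """Wrap a query string in the request body shape of a query_string query."""
    return {"query": {"query_string": {"query": query}}}

# Same filter as the cell above: events whose event_id is 1
body = query_body(build_query_string(event_id=1))
print(json.dumps(body, indent=2))
```

Printing the body this way can help when comparing a notebook query with one written directly in the Elasticsearch console.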

- Running query & Exploring response:

    Finally, you can run the query and get the results back as a DataFrame.

In [None]:
response = s.execute()

if response.success():
    df = pd.DataFrame((d.to_dict() for d in s.scan()))

df
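Documents returned by `to_dict()` can contain nested objects, which end up as dictionary-valued cells in the DataFrame. A minimal sketch of flattening them into dotted column names first is shown below. The `flatten` helper and the sample hits are made up for illustration and do not come from the workshop data.

```python
def flatten(doc, parent_key="", sep="."):
    """Recursively flatten nested dictionaries into a single-level dict."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Made-up documents standing in for d.to_dict() results
hits = [
    {"event_id": 1, "process": {"name": "cmd.exe", "pid": 4242}},
    {"event_id": 1, "process": {"name": "powershell.exe", "pid": 1337}},
]
rows = [flatten(hit) for hit in hits]
print(rows[0])
```

Passing `rows` to `pd.DataFrame` then yields one column per dotted key. `pd.json_normalize` offers similar behavior if you prefer to stay within pandas.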

### **b. Splunk**

- Importing libraries:

In [None]:
from huntlib.splunk import SplunkDF

- Running query & Exploring response

In [None]:
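# Connect to Splunk (placeholder host and credentials; replace with your own)
s = SplunkDF(host="<splunk-server>", username="<username>", password="<password>")

# Process-creation events (EventCode 4688) from the start of the day two days ago up to the start of today
df = s.search_df(
  spl="search index=win_events EventCode=4688",
  start_time="-2d@d",
  end_time="@d"
)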
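The `start_time` and `end_time` values use Splunk relative time modifiers: `-2d@d` goes back two days and snaps to the start of that day, while `@d` snaps to the start of the current day. The sketch below approximates that window in plain Python. `snap_to_day` and `splunk_window` are hypothetical helpers, and the sketch ignores time zones and daylight saving changes.

```python
from datetime import datetime, timedelta

def snap_to_day(moment):
    """Snap a datetime back to midnight of the same day (the '@d' part)."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)

def splunk_window(now):
    """Approximate the window for start_time='-2d@d', end_time='@d'."""
    start = snap_to_day(now - timedelta(days=2))
    end = snap_to_day(now)
    return start, end

# Example with a fixed 'now' so the result is reproducible
start, end = splunk_window(datetime(2021, 7, 15, 14, 30))
print(start, end)
```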

### **c. SQLite**

- Installing libraries:

In [None]:
!pip install ipython-sql

- Loading library

In [1]:
%%capture
%load_ext sql

- Connecting to database

In [2]:
%sql sqlite:///../data/browser2.db

- Executing queries

In [3]:
%%sql
SELECT
    name
FROM
    sqlite_master
WHERE
    type='table';

 * sqlite:///../data/browser2.db
Done.


name
android_metadata
bookmarks
sqlite_sequence
history
images
searches
settings
thumbnails
_sync_state
_sync_state_metadata
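The same `sqlite_master` query also works outside the `%sql` magic. Here is a minimal sketch using Python's built-in `sqlite3` module against an in-memory database. The two tables are created only for the demonstration and are an assumption, not a copy of `browser2.db`.

```python
import sqlite3

# In-memory database standing in for ../data/browser2.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (_id INTEGER PRIMARY KEY, title TEXT, url TEXT)")
conn.execute("CREATE TABLE bookmarks (_id INTEGER PRIMARY KEY, title TEXT, url TEXT)")

# List the tables the same way as the %%sql cell above
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
]
print(tables)
conn.close()
```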


In [4]:
%%sql
SELECT * FROM history;

 * sqlite:///../data/browser2.db
Done.


_id,title,url,created,date,visits,user_entered
1,Google,http://www.google.es/?gfe_rd=cr&dcr=0&ei=_OtlWu_eK5St8wespoTYBg,0,1516628986890,2,0
2,mobile congress - Buscar con Google,http://www.google.es/search?dcr=0&source=hp&ei=_OtlWqfYOIH-UPnHrqAH&sjs=16383&q=mobile+congress&oq=mobile+congress&gs_l=mobile-gws-hp.3..0l5.9699.17759..20649.......143.1677.2j13............mobile-gws-wiz-hp.....0..0i131.9Koqktw5naA%3D,0,1516629009457,1,0
3,Home | Mobile World Congress,https://www.mobileworldcongress.com/,0,1516629026677,3,0
4,Google,https://www.google.es/webhp?source=android-home&gws_rd=cr&dcr=0&ei=LexlWsqTBMXvUOOsg6gF,0,1516629037678,2,0
5,apk mirror - Buscar con Google,https://www.google.es/search?source=android-home&dcr=0&source=hp&ei=LexlWvvkIIzXUbDmn8gB&sjs=16383&q=apk+mirror&oq=apk+mirror&gs_l=mobile-gws-hp.3..0l5.4318.7753..7951.......136.1152.1j9............mobile-gws-wiz-hp.....0..0i131j0i10.GOGxmTJuhIg%3D,0,1516629047435,1,0
6,APKMirror - Free APK Downloads - Download Free Android APKs #APKPLZ,https://www.apkmirror.com/,0,1516629052435,2,0
7,"Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more",https://www.amazon.com/,0,1516629109159,2,0
8,http://192.168.74.128/i6ADxOqMEyyI,http://192.168.74.128/i6ADxOqMEyyI,0,1516629327266,1,0
9,http://192.168.74.128/i6ADxOqMEyyI/EeMVfx/,http://192.168.74.128/i6ADxOqMEyyI/EeMVfx/,0,1516629327667,1,0
10,apks - Google Search,http://www.google.es/search?hl=en&source=android-browser-type&q=apks&gws_rd=cr&dcr=0&ei=dO1lWoSxGMjSUYrwkpgG,0,1516629364685,2,0
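The `date` values above look like Unix epoch timestamps in milliseconds; that reading is an assumption based on their magnitude, not something the database states. Under that assumption, a small helper (hypothetical name `epoch_ms_to_utc`) converts them to readable UTC datetimes:

```python
from datetime import datetime, timezone

def epoch_ms_to_utc(epoch_ms):
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    seconds, millis = divmod(epoch_ms, 1000)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole.replace(microsecond=millis * 1000)

# First row of the history table
first_visit = epoch_ms_to_utc(1516628986890)
print(first_visit.isoformat())
```

Once the results are in a DataFrame, `pd.to_datetime(df["date"], unit="ms")` performs the same conversion for the whole column.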


- Save query results in a Pandas DataFrame

In [5]:
df = _.DataFrame()

In [6]:
df

Unnamed: 0,_id,title,url,created,date,visits,user_entered
0,1,Google,http://www.google.es/?gfe_rd=cr&dcr=0&ei=_OtlW...,0,1516628986890,2,0
1,2,mobile congress - Buscar con Google,http://www.google.es/search?dcr=0&source=hp&ei...,0,1516629009457,1,0
2,3,Home | Mobile World Congress,https://www.mobileworldcongress.com/,0,1516629026677,3,0
3,4,Google,https://www.google.es/webhp?source=android-hom...,0,1516629037678,2,0
4,5,apk mirror - Buscar con Google,https://www.google.es/search?source=android-ho...,0,1516629047435,1,0
5,6,APKMirror - Free APK Downloads - Download Free...,https://www.apkmirror.com/,0,1516629052435,2,0
6,7,"Amazon.com: Online Shopping for Electronics, A...",https://www.amazon.com/,0,1516629109159,2,0
7,8,http://192.168.74.128/i6ADxOqMEyyI,http://192.168.74.128/i6ADxOqMEyyI,0,1516629327266,1,0
8,9,http://192.168.74.128/i6ADxOqMEyyI/EeMVfx/,http://192.168.74.128/i6ADxOqMEyyI/EeMVfx/,0,1516629327667,1,0
9,10,apks - Google Search,http://www.google.es/search?hl=en&source=andro...,0,1516629364685,2,0


### **d. Log Analytics workspace**

- Importing libraries:

### **e. M365 advanced hunting APIs**

- Importing libraries:

### **f. MSTICPy**

MSTICPy isn't a data source itself; it wraps several data sources in a common API.

Currently supports:
- MS Sentinel (aka Azure Sentinel, Log Analytics), MS Defender, MS Graph, Azure Resource Graph
- Splunk
- Sumologic
- Local data (CSV or Pickle); other pandas-supported formats would be easy to add (any requests?)
- Experimental support for Kusto/Azure Data Explorer

Typical usage:
- Import the `QueryProvider` class (or run `import msticpy; msticpy.init_notebook(globals())`).
- Call the `connect()` method; its parameters vary by provider (e.g. a connection string).
- Pre-defined, parameterized queries then appear as methods of the `QueryProvider` instance.
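To make the idea of "queries as methods" concrete, here is a toy sketch of the pattern. It is not MSTICPy's implementation: `ToyQueryProvider`, the `list_logons` query name, and the query template are all invented for illustration.

```python
from functools import partial

class ToyQueryProvider:
    """Illustrative only: exposes each query template as a callable attribute."""

    def __init__(self, templates):
        for name, template in templates.items():
            # Bind the template so the attribute only needs the query parameters
            setattr(self, name, partial(self._render, template))

    @staticmethod
    def _render(template, **params):
        """Fill the template placeholders with the supplied parameters."""
        return template.format(**params)

provider = ToyQueryProvider(
    {"list_logons": "SecurityEvent | where EventID == 4624 | where Computer == '{host}'"}
)
query_text = provider.list_logons(host="MSTICAlertsWin1")
print(query_text)
```

A real provider would send the rendered query to the data source and return a DataFrame; the toy version only returns the query text.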

In [34]:
# To install
# %pip install msticpy

# Alternative import - init_notebook imports QueryProvider and a bunch of other stuff
# import msticpy
# msticpy.init_notebook(globals())

from msticpy.data import QueryProvider
sentinel_prov = QueryProvider("AzureSentinel")

local_prov = QueryProvider("LocalData")

Please wait. Loading Kqlmagic extension...done


#### Accessing queries as functions
(You usually need to connect before running one.)

In [None]:
# Type `sentinel_prov.` and press Tab to list the available queries

In [35]:
sentinel_prov.browse()



In [42]:
sentinel_prov.connect(
    "loganalytics://code().tenant('72f988bf-86f1-41af-91ab-2d7cd011db47').workspace('8ecf8077-cf51-4820-aadd-14040956f35d')"
)

Connecting... 

connected
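The connection string follows the pattern `loganalytics://code().tenant('<tenant-id>').workspace('<workspace-id>')`. A small sketch for assembling it from its two identifiers is shown below. `loganalytics_connection_string` is a hypothetical helper, and the GUIDs are placeholders rather than real tenant or workspace IDs.

```python
import uuid

def loganalytics_connection_string(tenant_id, workspace_id):
    """Build the connection string after checking that both IDs parse as GUIDs."""
    tenant = str(uuid.UUID(tenant_id))
    workspace = str(uuid.UUID(workspace_id))
    return f"loganalytics://code().tenant('{tenant}').workspace('{workspace}')"

# Placeholder GUIDs for illustration only
conn_str = loganalytics_connection_string(
    "00000000-0000-0000-0000-000000000001",
    "00000000-0000-0000-0000-000000000002",
)
print(conn_str)
```

Validating the IDs up front surfaces a typo as a `ValueError` before any connection attempt is made.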


## II) Collecting Security Datasets

### **a. Reading Local Files with Pandas**

#### **1. Reading Pickle files**

- Importing libraries for data manipulation.

In [8]:
import pandas as pd

- Reading Pickle file using [read_pickle](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html) Pandas method.

In [10]:
logons_full_df = pd.read_pickle("../data/host_logons.pkl")

- Exploring pickle file.

In [11]:
logons_full_df.head()

Unnamed: 0,TenantId,Account,EventID,TimeGenerated,SourceComputerId,Computer,SubjectUserName,SubjectDomainName,SubjectUserSid,TargetUserName,TargetDomainName,TargetUserSid,TargetLogonId,LogonProcessName,LogonType,AuthenticationPackageName,Status,IpAddress,WorkstationName,TimeCreatedUtc
0,52b1ab41-869e-4138-9e40-2a4457f09bf0,NT AUTHORITY\SYSTEM,4624,2019-02-12 04:56:34.307,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,MSTICAlertsWin1$,WORKGROUP,S-1-5-18,SYSTEM,NT AUTHORITY,S-1-5-18,0x3e7,Advapi,5,Negotiate,,-,-,2019-02-12 04:56:34.307
1,52b1ab41-869e-4138-9e40-2a4457f09bf0,MSTICAlertsWin1\MSTICAdmin,4624,2019-02-12 04:37:25.340,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,-,-,S-1-0-0,MSTICAdmin,MSTICAlertsWin1,S-1-5-21-996632719-2361334927-4038480536-500,0xc90e957,NtLmSsp,3,NTLM,,131.107.147.209,IANHELLE-DEV17,2019-02-12 04:37:25.340
2,52b1ab41-869e-4138-9e40-2a4457f09bf0,MSTICAlertsWin1\MSTICAdmin,4624,2019-02-12 04:37:27.997,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,-,-,S-1-0-0,MSTICAdmin,MSTICAlertsWin1,S-1-5-21-996632719-2361334927-4038480536-500,0xc90ea44,NtLmSsp,3,NTLM,,131.107.147.209,IANHELLE-DEV17,2019-02-12 04:37:27.997
3,52b1ab41-869e-4138-9e40-2a4457f09bf0,MSTICAlertsWin1\MSTICAdmin,4624,2019-02-12 04:38:16.550,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,-,-,S-1-0-0,MSTICAdmin,MSTICAlertsWin1,S-1-5-21-996632719-2361334927-4038480536-500,0xc912d62,NtLmSsp,3,NTLM,,131.107.147.209,IANHELLE-DEV17,2019-02-12 04:38:16.550
4,52b1ab41-869e-4138-9e40-2a4457f09bf0,MSTICAlertsWin1\MSTICAdmin,4624,2019-02-12 04:38:21.370,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,-,-,S-1-0-0,MSTICAdmin,MSTICAlertsWin1,S-1-5-21-996632719-2361334927-4038480536-500,0xc913737,NtLmSsp,3,NTLM,,131.107.147.209,IANHELLE-DEV17,2019-02-12 04:38:21.370


#### **2. Reading Comma Separated Values (CSV) files**

- Importing libraries for data manipulation.

In [2]:
import pandas as pd

- Reading CSV file using [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) Pandas method.

In [3]:
attck_event_mappings = pd.read_csv(filepath_or_buffer="../data/attack_events_mapping.csv")

- Exploring CSV file.

In [4]:
attck_event_mappings.head()

Unnamed: 0,Data Source,Component,Source,Relationship,Target,EventID,Event Name,Event Platform,Log Provider,Log Channel,Audit Category,Audit Sub-Category,Enable Commands,GPO Audit Policy
0,User Account,user account authentication,user,attempted to authenticate from,port,4624,An account was successfully logged on.,Windows,Microsoft-Windows-Security-Auditing,Security,Logon/Logoff,Logon,auditpol /set /subcategory:Logon /success:enab...,Computer Configuration -> Windows Settings -> ...
1,User Account,user account authentication,user,attempted to authenticate from,port,4625,An account failed to log on.,Windows,Microsoft-Windows-Security-Auditing,Security,Logon/Logoff,Account Lockout,auditpol /set /subcategory:Account Lockout /su...,Computer Configuration -> Windows Settings -> ...
2,User Account,user account authentication,user,attempted to authenticate from,port,4648,A logon was attempted using explicit credentials.,Windows,Microsoft-Windows-Security-Auditing,Security,Logon/Logoff,Logon,auditpol /set /subcategory:Logon /success:enab...,Computer Configuration -> Windows Settings -> ...
3,User Account,user account authentication,user,attempted to authenticate from,port,LogonSuccess,LogonSuccess,Windows,Microsoft Defender for Endpoint,DeviceLogonEvents,,,,
4,User Account,user account creation,user,created,user,4720,A user account was created.,Windows,Microsoft-Windows-Security-Auditing,Security,Account Management,User Account Management,auditpol /set /subcategory:User Account Manage...,Computer Configuration -> Windows Settings -> ...
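For a quick look at a CSV file without pandas, the standard `csv` module can do similar filtering. The sketch below uses a few columns and rows copied from the output above, embedded as a string so it runs without the data file.

```python
import csv
import io

# A reduced copy of the mapping file: four of its columns, three of its rows
sample_csv = io.StringIO(
    "Data Source,Component,EventID,Event Name\n"
    "User Account,user account authentication,4624,An account was successfully logged on.\n"
    "User Account,user account authentication,4625,An account failed to log on.\n"
    "User Account,user account creation,4720,A user account was created.\n"
)

rows = list(csv.DictReader(sample_csv))

# Event IDs mapped to the authentication component
auth_event_ids = [
    row["EventID"] for row in rows if row["Component"] == "user account authentication"
]
print(auth_event_ids)
```

`csv.DictReader` returns every value as a string. That suits the `EventID` column here, which mixes numeric IDs with names such as `LogonSuccess`.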


#### **3. Reading JavaScript Object Notation (JSON) files**

- Importing libraries for data manipulation.

In [5]:
from pandas.io import json

- Reading JSON file using [json.read_json](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.read_json.html) Pandas method.
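As an illustration of the file layout involved, the sketch below writes and reads a JSON Lines file (one JSON object per line) with the standard library. The two sample events and the temporary file name are assumptions made for the example.

```python
import json
import os
import tempfile

# Made-up events standing in for a security dataset
events = [
    {"EventID": 4624, "Channel": "Security"},
    {"EventID": 1, "Channel": "Microsoft-Windows-Sysmon/Operational"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "events.json")

    # Write one JSON object per line
    with open(path, "w", encoding="utf-8") as handle:
        for event in events:
            handle.write(json.dumps(event) + "\n")

    # Read the file back, skipping any blank lines
    with open(path, encoding="utf-8") as handle:
        loaded = [json.loads(line) for line in handle if line.strip()]

print(loaded)
```

With pandas, `pd.read_json(path, lines=True)` reads the same layout directly into a DataFrame.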

### **b. Reading OTR-SecurityDatasets**

- Importing libraries.

In [12]:
# Generate HTTP request
import requests
# Zip file object manipulation
from zipfile import ZipFile
# Byte data manipulations
from io import BytesIO
# Read JSON file
from pandas.io import json

- Making an HTTP request using [get](https://docs.python-requests.org/en/latest/user/quickstart/#make-a-request) method.

In [15]:
url = 'https://raw.githubusercontent.com/OTRF/Security-Datasets/master/datasets/atomic/windows/discovery/host/empire_shell_net_localgroup_administrators.zip'
zipFileRequest = requests.get(url)

- Opening the zip file after reading the request content (in [bytes](https://docs.python.org/3/library/io.html#binary-i-o) format) using the [ZipFile](https://docs.python.org/3/library/zipfile.html#zipfile-objects) class.

In [16]:
zipFile = ZipFile(BytesIO(zipFileRequest.content))

- Extracting the first file in the archive using the [extract](https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.extract) method. This method returns the normalized path to the extracted JSON file.

In [19]:
datasetJSONPath = zipFile.extract(zipFile.namelist()[0], path = '../data')

print(datasetJSONPath)

../data/empire_shell_net_localgroup_administrators_2020-09-21191843.json
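The download-and-extract steps can also be done entirely in memory. The sketch below avoids the network by building a tiny archive first; `sample_dataset.json` and its single event are made up, and `zip_bytes` plays the role of `zipFileRequest.content`.

```python
import json
from io import BytesIO
from zipfile import ZipFile

# Build a small archive in memory to stand in for the downloaded content
payload = BytesIO()
with ZipFile(payload, "w") as archive:
    archive.writestr("sample_dataset.json", json.dumps({"EventID": 4624}) + "\n")
zip_bytes = payload.getvalue()

# Open the bytes as a zip file and read the first member line by line
with ZipFile(BytesIO(zip_bytes)) as archive:
    first_name = archive.namelist()[0]
    with archive.open(first_name) as member:
        records = [json.loads(line) for line in member]

print(first_name, records)
```

Reading the member directly skips writing the extracted file to disk, which can be convenient for one-off exploration.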


- Reading JSON file using the [read_json](https://pandas.pydata.org/docs/reference/api/pandas.io.json.read_json.html) method.

In [21]:
dataset = json.read_json(path_or_buf = datasetJSONPath, lines=True)

- Exploring dataset using the [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) method.

In [22]:
dataset.head(n=1)

Unnamed: 0,Keywords,SeverityValue,TargetObject,EventTypeOrignal,EventID,ProviderGuid,ExecutionProcessID,host,Channel,UserID,...,SourceIsIpv6,DestinationPortName,DestinationHostname,Service,Details,ShareName,EnabledPrivilegeList,DisabledPrivilegeList,ShareLocalPath,RelativeTargetName
0,-9223372036854775808,2,HKU\S-1-5-21-4228717743-1032521047-1810997296-...,INFO,12,{5770385F-C22A-43E0-BF4C-06F5698FFBD9},3172,wec.internal.cloudapp.net,Microsoft-Windows-Sysmon/Operational,S-1-5-18,...,,,,,,,,,,


### **c. Using MSTICPy to access Security Datasets**


In [18]:
#%pip install msticpy

import pandas as pd
from msticpy.data import QueryProvider
from msticpy.vis import mp_pandas_plot


qry_prov = QueryProvider("Mordor")

In [2]:
qry_prov.connect()

Retrieving Mitre data...
Retrieving Mordor data...


Downloading Mordor metadata: 100%|██████████| 95/95 [00:26<00:00,  3.63 files/s]


In [25]:
qry_prov.list_queries()

[]

In [24]:
qry_prov.search_queries("empire + localgroup")

AttributeError: search_queries is not a valid attribute.

In [9]:
emp_df = qry_prov.atomic.windows.discovery.host.empire_shell_net_localgroup_administrators()
emp_df.head()

https://raw.githubusercontent.com/OTRF/Security-Datasets/master/datasets/atomic/windows/discovery/host/empire_shell_net_localgroup_administrators.zip
Extracting empire_shell_net_localgroup_administrators_2020-09-21191843.json


Unnamed: 0,Keywords,SeverityValue,TargetObject,EventTypeOrignal,EventID,ProviderGuid,ExecutionProcessID,host,Channel,UserID,...,SourceIsIpv6,DestinationPortName,DestinationHostname,Service,Details,ShareName,EnabledPrivilegeList,DisabledPrivilegeList,ShareLocalPath,RelativeTargetName
0,-9223372036854775808,2,HKU\S-1-5-21-4228717743-1032521047-1810997296-...,INFO,12,{5770385F-C22A-43E0-BF4C-06F5698FFBD9},3172,wec.internal.cloudapp.net,Microsoft-Windows-Sysmon/Operational,S-1-5-18,...,,,,,,,,,,
1,0,2,,,4103,{A0C1853B-5C40-4B15-8766-3CF1C58F985A},7456,wec.internal.cloudapp.net,Microsoft-Windows-PowerShell/Operational,S-1-5-21-4228717743-1032521047-1810997296-1104,...,,,,,,,,,,
2,0,2,,,4103,{A0C1853B-5C40-4B15-8766-3CF1C58F985A},7456,wec.internal.cloudapp.net,Microsoft-Windows-PowerShell/Operational,S-1-5-21-4228717743-1032521047-1810997296-1104,...,,,,,,,,,,
3,-9214364837600034816,2,,,5158,{54849625-5478-4994-A5BA-3E3B0328C30D},4,wec.internal.cloudapp.net,Security,,...,,,,,,,,,,
4,-9214364837600034816,2,,,5156,{54849625-5478-4994-A5BA-3E3B0328C30D},4,wec.internal.cloudapp.net,Security,,...,,,,,,,,,,
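In this example the query path mirrors the dataset's location in the repository: the folders after `datasets/` become dotted attributes and the file name (without `.zip`) becomes the method. The sketch below reproduces that naming pattern. It is inferred from this single example rather than from MSTICPy's source, and `dataset_query_path` is a hypothetical helper.

```python
from urllib.parse import urlparse

def dataset_query_path(url, root="datasets"):
    """Turn a dataset URL into a dotted path, dropping everything up to `root`."""
    parts = urlparse(url).path.strip("/").split("/")
    relative = parts[parts.index(root) + 1:]
    # Remove the file extension from the last component
    relative[-1] = relative[-1].rsplit(".", 1)[0]
    return ".".join(relative)

url = (
    "https://raw.githubusercontent.com/OTRF/Security-Datasets/master/"
    "datasets/atomic/windows/discovery/host/"
    "empire_shell_net_localgroup_administrators.zip"
)
query_path = dataset_query_path(url)
print(query_path)
```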


In [19]:
emp_df["EventTime"] = pd.to_datetime(emp_df["EventTime"])

In [21]:
emp_df.mp_plot.timeline(time_column="EventTime", group_by="EventID")
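Conceptually, the timeline parses each event's timestamp and groups the events by `EventID`, giving one series per ID. A rough standard-library sketch of that grouping step follows. The four sample events are made up, and the code does not reproduce the plot itself.

```python
from collections import defaultdict
from datetime import datetime

# Made-up events with string timestamps, as they might appear before conversion
events = [
    {"EventTime": "2020-09-21 13:09:44", "EventID": 5156},
    {"EventTime": "2020-09-21 13:09:44", "EventID": 800},
    {"EventTime": "2020-09-21 13:13:25", "EventID": 10},
    {"EventTime": "2020-09-21 13:13:25", "EventID": 10},
]

# Parse each timestamp and collect the datetimes per EventID
timeline = defaultdict(list)
for event in events:
    timestamp = datetime.strptime(event["EventTime"], "%Y-%m-%d %H:%M:%S")
    timeline[event["EventID"]].append(timestamp)

counts = {event_id: len(times) for event_id, times in sorted(timeline.items())}
print(counts)
```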

### Security Datasets Browser

- Browser properties
- Filter by MITRE Tactic/Technique
- Search across metadata, file names
- Download selected datasets
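The filtering and search features can be pictured as simple operations over dataset metadata. The sketch below is not the browser's code: the records, their titles, and the helper names are hypothetical, and treating `+` as "all terms must match" is an assumption made for this example.

```python
# Hypothetical metadata records; the real browser loads these from the repository
DATASETS = [
    {
        "title": "Empire Net Local Administrators Group",
        "tactics": ["discovery"],
        "file": "empire_shell_net_localgroup_administrators.zip",
    },
    {
        "title": "Empire PowerView LDAP NTSecurityDescriptor",
        "tactics": ["defense_evasion"],
        "file": "empire_powerview_ldap_ntsecuritydescriptor.zip",
    },
]

def filter_by_tactic(records, tactic):
    """Keep the records tagged with the given tactic."""
    return [record for record in records if tactic in record["tactics"]]

def search(records, expression):
    """Keep the records whose title or file name contains every '+'-separated term."""
    terms = [term.strip().lower() for term in expression.split("+") if term.strip()]

    def text(record):
        return f"{record['title']} {record['file']}".lower()

    return [record for record in records if all(term in text(record) for term in terms)]

discovery = filter_by_tactic(DATASETS, "discovery")
matches = search(DATASETS, "empire + localgroup")
print([record["file"] for record in matches])
```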

In [27]:
from msticpy.data.browsers.mordor_browser import MordorBrowser
m_browser = MordorBrowser()

Retrieving Mitre data...
Retrieving Mordor data...



Unnamed: 0,Keywords,SeverityValue,Application,EventID,ProviderGuid,ExecutionProcessID,Channel,host,EventReceivedTime,ProcessId,...,AttributeSyntaxOID,OpCorrelationID,AttributeLDAPDisplayName,DSName,ObjectDN,AppCorrelationID,ObjectClass,AttributeValue,DSType,ObjectGUID
0,-9214364837600034816,2,\device\harddiskvolume2\windows\system32\svcho...,5156,{54849625-5478-4994-A5BA-3E3B0328C30D},4,Security,wec.internal.cloudapp.net,2020-09-21 13:09:44,5908,...,,,,,,,,,,
1,-9214364837600034816,2,,4689,{54849625-5478-4994-A5BA-3E3B0328C30D},4,Security,wec.internal.cloudapp.net,2020-09-21 13:09:44,0x2210,...,,,,,,,,,,
2,36028797018963968,2,,800,,0,Windows PowerShell,wec.internal.cloudapp.net,2020-09-21 13:09:44,,...,,,,,,,,,,
3,36028797018963968,2,,800,,0,Windows PowerShell,wec.internal.cloudapp.net,2020-09-21 13:09:44,,...,,,,,,,,,,
4,36028797018963968,2,,800,,0,Windows PowerShell,wec.internal.cloudapp.net,2020-09-21 13:09:44,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8997,-9223372036854775808,2,,12,{5770385F-C22A-43E0-BF4C-06F5698FFBD9},9848,Microsoft-Windows-Sysmon/Operational,wec.internal.cloudapp.net,2020-09-21 13:13:25,,...,,,,,,,,,,
8998,-9223372036854775808,2,,10,{5770385F-C22A-43E0-BF4C-06F5698FFBD9},9848,Microsoft-Windows-Sysmon/Operational,wec.internal.cloudapp.net,2020-09-21 13:13:25,,...,,,,,,,,,,
8999,-9223372036854775808,2,,10,{5770385F-C22A-43E0-BF4C-06F5698FFBD9},9848,Microsoft-Windows-Sysmon/Operational,wec.internal.cloudapp.net,2020-09-21 13:13:25,,...,,,,,,,,,,
9000,-9223372036854775808,2,,10,{5770385F-C22A-43E0-BF4C-06F5698FFBD9},9848,Microsoft-Windows-Sysmon/Operational,wec.internal.cloudapp.net,2020-09-21 13:13:25,,...,,,,,,,,,,


### Downloaded data available in `browser.current_dataset`

In [31]:
m_browser.current_dataset.head(3)

Unnamed: 0,Keywords,SeverityValue,Application,EventID,ProviderGuid,ExecutionProcessID,Channel,host,EventReceivedTime,ProcessId,...,AttributeSyntaxOID,OpCorrelationID,AttributeLDAPDisplayName,DSName,ObjectDN,AppCorrelationID,ObjectClass,AttributeValue,DSType,ObjectGUID
0,-9214364837600034816,2,\device\harddiskvolume2\windows\system32\svcho...,5156,{54849625-5478-4994-A5BA-3E3B0328C30D},4,Security,wec.internal.cloudapp.net,2020-09-21 13:09:44,5908,...,,,,,,,,,,
1,-9214364837600034816,2,,4689,{54849625-5478-4994-A5BA-3E3B0328C30D},4,Security,wec.internal.cloudapp.net,2020-09-21 13:09:44,0x2210,...,,,,,,,,,,
2,36028797018963968,2,,800,,0,Windows PowerShell,wec.internal.cloudapp.net,2020-09-21 13:09:44,,...,,,,,,,,,,


### Cached datasets available in `browser.datasets`

In [29]:
m_browser.datasets

{'https://raw.githubusercontent.com/OTRF/Security-Datasets/master/datasets/atomic/windows/defense_evasion/host/empire_powerview_ldap_ntsecuritydescriptor.zip':                  Keywords  SeverityValue  \
 0    -9214364837600034816              2   
 1    -9214364837600034816              2   
 2       36028797018963968              2   
 3       36028797018963968              2   
 4       36028797018963968              2   
 ...                   ...            ...   
 8997 -9223372036854775808              2   
 8998 -9223372036854775808              2   
 8999 -9223372036854775808              2   
 9000 -9223372036854775808              2   
 9001 -9223372036854775808              2   
 
                                             Application  EventID  \
 0     \device\harddiskvolume2\windows\system32\svcho...     5156   
 1                                                   NaN     4689   
 2                                                   NaN      800   
 3                     