# Day 1 (Part 2): Acquiring Data

## I) Connecting to Databases

### **a. Elasticsearch**

- Importing libraries:

In [1]:
# Elasticsearch connector
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
# Data manipulation
import pandas as pd

- Initializing an Elasticsearch client:

    Initialize an Elasticsearch client with the URL of your Elasticsearch instance. Then pass the client to a `Search` object, which represents the search request built in the next steps.

In [None]:
es = Elasticsearch(['http://<elasticsearch-ip>:9200'])
searchContext = Search(using=es, index='logs-*', doc_type='doc')

- Setting the query search context:

    Next, use the `query` method of the `Search` object to pass an Elasticsearch `query_string` query. For example, the following selects events where `event_id` is 1.

In [None]:
s = searchContext.query('query_string', query='event_id:1')
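
To make the search context more concrete, the sketch below uses only the standard library to build the JSON request body that a `query_string` query corresponds to in the Elasticsearch Query DSL. The helper name `build_query_string_body` is hypothetical and not part of `elasticsearch_dsl`; on a real `Search` object, `s.to_dict()` should return an equivalent dictionary.

```python
import json


def build_query_string_body(query: str) -> dict:
    """Return a request body for an Elasticsearch query_string query.

    Hypothetical helper for illustration; not part of elasticsearch_dsl.
    """
    return {"query": {"query_string": {"query": query}}}


# Same query string as in the cell above
body = build_query_string_body("event_id:1")
print(json.dumps(body, indent=2))
```

Printing the body this way can help when comparing a DSL query against one written by hand in Kibana or sent with another HTTP client.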

- Running query & Exploring response:

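    Finally, run the query and load the results into a DataFrame. `execute()` returns only the first page of hits (10 by default), so the code below uses `scan()` to iterate over every matching document.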

In [None]:
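response = s.execute()

# Start with an empty DataFrame so `df` is defined even if the search fails
df = pd.DataFrame()
if response.success():
    # scan() iterates over all matching documents, not only the first page
    df = pd.DataFrame((d.to_dict() for d in s.scan()))

df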
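
Documents returned by Elasticsearch are often nested, so `d.to_dict()` can yield dictionaries inside dictionaries, which would end up as object-typed DataFrame columns. Below is a minimal sketch of flattening such a record into dotted column names before building the DataFrame. The `flatten_hit` helper and the sample record are hypothetical; `pd.json_normalize` provides comparable behavior within pandas.

```python
def flatten_hit(hit: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dictionaries into a single level with dotted keys.

    Hypothetical helper for illustration only.
    """
    flat = {}
    for key, value in hit.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix
            flat.update(flatten_hit(value, name, sep))
        else:
            flat[name] = value
    return flat


# Made-up record standing in for the output of d.to_dict()
sample_hit = {
    "event_id": 1,
    "host": {"name": "WORKSTATION5"},
    "process": {"pid": 4242, "name": "cmd.exe"},
}
flat_hit = flatten_hit(sample_hit)
print(flat_hit)
```

With real results, the same helper could be applied to each document inside the generator passed to `pd.DataFrame`.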

### **b. Splunk**

- Importing libraries:

### **c. SQLite**

- Importing libraries:
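
A minimal sketch of what this step could look like with Python's built-in `sqlite3` module. The in-memory database, the `events` table, and its rows are made up for illustration; with a real database file you would pass its path to `sqlite3.connect`, and `pd.read_sql_query(query, conn)` could load the result directly into a DataFrame.

```python
import sqlite3

# In-memory database keeps the sketch self-contained
# (assumption: a real workflow would pass a file path instead)
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be accessed by column name

conn.execute("CREATE TABLE events (event_id INTEGER, computer TEXT)")
conn.executemany(
    "INSERT INTO events (event_id, computer) VALUES (?, ?)",
    [(1, "HOST-A"), (4624, "HOST-B"), (1, "HOST-C")],
)
conn.commit()

# Parameterized query: the value is bound to the ? placeholder by the driver
rows = conn.execute(
    "SELECT event_id, computer FROM events WHERE event_id = ? ORDER BY computer",
    (1,),
).fetchall()

# Convert each row to a plain dictionary
records = [dict(row) for row in rows]
print(records)
conn.close()
```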

### **d. Log Analytics workspace**

- Importing libraries:

### **e. M365 advanced hunting APIs**

- Importing libraries:

### **f. MSTICPy**

- Importing libraries:

## II) Collecting Datasets

### **a. Reading Pickle files with Pandas**

- Importing libraries.

In [8]:
# Data manipulation
import pandas as pd

- Reading a pickle file with the pandas [read_pickle](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_pickle.html) function. Only load pickle files from sources you trust, since unpickling can execute arbitrary code.

In [10]:
logons_full_df = pd.read_pickle("../data/host_logons.pkl")

- Exploring pickle file.

In [11]:
logons_full_df.head()

Unnamed: 0,TenantId,Account,EventID,TimeGenerated,SourceComputerId,Computer,SubjectUserName,SubjectDomainName,SubjectUserSid,TargetUserName,TargetDomainName,TargetUserSid,TargetLogonId,LogonProcessName,LogonType,AuthenticationPackageName,Status,IpAddress,WorkstationName,TimeCreatedUtc
0,52b1ab41-869e-4138-9e40-2a4457f09bf0,NT AUTHORITY\SYSTEM,4624,2019-02-12 04:56:34.307,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,MSTICAlertsWin1$,WORKGROUP,S-1-5-18,SYSTEM,NT AUTHORITY,S-1-5-18,0x3e7,Advapi,5,Negotiate,,-,-,2019-02-12 04:56:34.307
1,52b1ab41-869e-4138-9e40-2a4457f09bf0,MSTICAlertsWin1\MSTICAdmin,4624,2019-02-12 04:37:25.340,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,-,-,S-1-0-0,MSTICAdmin,MSTICAlertsWin1,S-1-5-21-996632719-2361334927-4038480536-500,0xc90e957,NtLmSsp,3,NTLM,,131.107.147.209,IANHELLE-DEV17,2019-02-12 04:37:25.340
2,52b1ab41-869e-4138-9e40-2a4457f09bf0,MSTICAlertsWin1\MSTICAdmin,4624,2019-02-12 04:37:27.997,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,-,-,S-1-0-0,MSTICAdmin,MSTICAlertsWin1,S-1-5-21-996632719-2361334927-4038480536-500,0xc90ea44,NtLmSsp,3,NTLM,,131.107.147.209,IANHELLE-DEV17,2019-02-12 04:37:27.997
3,52b1ab41-869e-4138-9e40-2a4457f09bf0,MSTICAlertsWin1\MSTICAdmin,4624,2019-02-12 04:38:16.550,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,-,-,S-1-0-0,MSTICAdmin,MSTICAlertsWin1,S-1-5-21-996632719-2361334927-4038480536-500,0xc912d62,NtLmSsp,3,NTLM,,131.107.147.209,IANHELLE-DEV17,2019-02-12 04:38:16.550
4,52b1ab41-869e-4138-9e40-2a4457f09bf0,MSTICAlertsWin1\MSTICAdmin,4624,2019-02-12 04:38:21.370,263a788b-6526-4cdc-8ed9-d79402fe4aa0,MSTICAlertsWin1,-,-,S-1-0-0,MSTICAdmin,MSTICAlertsWin1,S-1-5-21-996632719-2361334927-4038480536-500,0xc913737,NtLmSsp,3,NTLM,,131.107.147.209,IANHELLE-DEV17,2019-02-12 04:38:21.370


### **b. Reading OTR-SecurityDatasets**

- Importing libraries.

In [12]:
# Generate HTTP request
import requests
# Zip file object manipulation
from zipfile import ZipFile
# Byte data manipulations
from io import BytesIO
# Read JSON file
from pandas.io import json

- Making an HTTP request with the requests [get](https://docs.python-requests.org/en/latest/user/quickstart/#make-a-request) function.

In [15]:
url = 'https://raw.githubusercontent.com/OTRF/Security-Datasets/master/datasets/atomic/windows/discovery/host/empire_shell_net_localgroup_administrators.zip'
zipFileRequest = requests.get(url)
zipFileRequest.raise_for_status()  # stop here on an HTTP error status

- Opening the zip file from the response content (in [bytes](https://docs.python.org/3/library/io.html#binary-i-o) format) using the [ZipFile](https://docs.python.org/3/library/zipfile.html#zipfile-objects) class.

In [16]:
zipFile = ZipFile(BytesIO(zipFileRequest.content))
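
The same pattern can be tried offline. The sketch below builds a small zip archive in memory, then opens it from raw bytes in the same way the cell above opens the response content. The archive contents and the member name `sample.json` are made up for illustration.

```python
import json
from io import BytesIO
from zipfile import ZipFile

# Build a small archive in memory; it stands in for zipFileRequest.content
buffer = BytesIO()
with ZipFile(buffer, mode="w") as archive:
    archive.writestr("sample.json", json.dumps({"EventID": 1}) + "\n")
zip_bytes = buffer.getvalue()

# Wrap the raw bytes in a file-like object so ZipFile can seek within them
with ZipFile(BytesIO(zip_bytes)) as zip_file:
    names = zip_file.namelist()
    # read() returns the member's bytes without writing anything to disk
    first_member = zip_file.read(names[0]).decode("utf-8")

print(names)
print(first_member)
```

Reading a member with `read` is an alternative to extracting it when the file does not need to be kept on disk.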

- Extracting the first file in the archive using the [extract](https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.extract) method. This method returns the normalized path to the extracted file, which here is a JSON file.

In [19]:
datasetJSONPath = zipFile.extract(zipFile.namelist()[0], path='../data')

print(datasetJSONPath)

../data/empire_shell_net_localgroup_administrators_2020-09-21191843.json


- Reading the JSON file using the [read_json](https://pandas.pydata.org/docs/reference/api/pandas.io.json.read_json.html) function.

In [21]:
dataset = json.read_json(path_or_buf = datasetJSONPath, lines=True)
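
The `lines=True` argument tells pandas to treat the file as one JSON object per line (JSON Lines) rather than as a single JSON document. As a rough illustration of what that parsing involves, here is a standard-library sketch that uses two made-up sample events in place of the dataset file.

```python
import json
from io import StringIO

# Two made-up events, one JSON object per line
raw = StringIO(
    '{"EventID": 12, "Channel": "Microsoft-Windows-Sysmon/Operational"}\n'
    '{"EventID": 4624, "Channel": "Security"}\n'
)

# Parse each non-empty line independently
records = [json.loads(line) for line in raw if line.strip()]

print(len(records))
print(records[0]["EventID"])
```

Each parsed object corresponds to one row of the resulting DataFrame, and the keys become its columns.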

- Exploring the dataset using the [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) method.

In [22]:
dataset.head(n=1)

Unnamed: 0,Keywords,SeverityValue,TargetObject,EventTypeOrignal,EventID,ProviderGuid,ExecutionProcessID,host,Channel,UserID,...,SourceIsIpv6,DestinationPortName,DestinationHostname,Service,Details,ShareName,EnabledPrivilegeList,DisabledPrivilegeList,ShareLocalPath,RelativeTargetName
0,-9223372036854775808,2,HKU\S-1-5-21-4228717743-1032521047-1810997296-...,INFO,12,{5770385F-C22A-43E0-BF4C-06F5698FFBD9},3172,wec.internal.cloudapp.net,Microsoft-Windows-Sysmon/Operational,S-1-5-18,...,,,,,,,,,,


### **c. MSTICPy Security Datasets**

- Importing libraries.