# OpenSearch Examples Notebook

This notebook contains examples of various ways to connect to and interact with OpenSearch from a Jupyter Notebook. 

Please be aware that this notebook is _NOT_ intended to be run from top to bottom. 

Please read the notes above each code block to see what the code does and how it should be executed. 

See [Low-level Python Client](https://opensearch.org/docs/latest/clients/python-low-level/) for further details.

## Install and Import
First, we need to install the OpenSearch pip packages. For best results, run this within a venv virtual environment. 

The import block loads everything needed to run the whole notebook. If you copy this code into your own notebook, check which imports you actually need (e.g. you don't need `scan` for SQL queries).

In [None]:
#%%capture
%pip install --upgrade opensearch-py 

In [None]:
#Perform the necessary imports.
from opensearchpy import OpenSearch
from opensearchpy.helpers import scan

#Date
import datetime

#Password 
from getpass import getpass

#JSON
import json

#PPrint to help with debugging
from pprint import pprint

## Variables

Set the variables used by the rest of the notebook: the OpenSearch host, the credentials, and the path to the CA certificate.

In [None]:
hosts = [{"host": "os01", "port": 9200}] #OpenSearch host and port
username = 'admin' #Username, there is always an admin account in DFIR2Go
password = getpass() #Prompt the user for the password 
auth = (username, password) #Create the authentication details

ca_certs_path = '/certs/ca/ca.crt' #Certs path

## Establish a connection to OpenSearch Server

The following block of code establishes the connection to the OpenSearch server and presents a client object ready to be used to interact with OpenSearch. 

**WARNING** For the code to work, you need to have run all of the blocks above so far (some edits might be necessary). After this block, run only the blocks relevant to how you want to interact with OpenSearch (e.g. just the SQL parts).

In [None]:
client = OpenSearch(
    hosts=hosts, 
    http_compress=True,
    http_auth=auth,
    use_ssl=True,
    timeout=300,
    verify_certs=True,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
    ca_certs=ca_certs_path        
)


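# Query DSL Searching
The following block performs a search using a query written in the OpenSearch Query DSL (a JSON query body). DQL, by contrast, is the text syntax typed into the Dashboards search bar. The Query DSL is very powerful, but building queries by hand can be complex. 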

In [None]:
index = 'artifact_windows_sysinternals_sysmonlogforward*' #The index you want to search.

#The query defines the search to perform. The following matches every document in the index. 
#To help build your own query, you can use the developer tools in the browser. 
#When performing a search in the web interface, inspect the submitted request; it contains a query value which can be used as a starting point.
query = {
    "size": "10000",  
    "timeout": "300s",
    "query": {
        "bool": {
            "must": [],
            "filter": [
              {
                "match_all": {}
              }
            ],
            "should": [],
            "must_not": []
          }
    }
}

#Use the scan helper to iterate over every matching document, printing the _source component of each hit as it is returned.
for results in scan(client, query=query, index=index):
    #Print the record
    print(results["_source"])
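
The empty `must`, `filter`, `should` and `must_not` lists in the query above are where search conditions go. As a rough sketch, the helper below fills the `filter` list with a `range` clause and optional `term` clauses. `build_filtered_query` is a hypothetical name, and the `TimeCreated` and `Number` fields are borrowed from the demo index created later in this notebook rather than from the Sysmon index, so replace them with fields that exist in your data.

```python
import json


def build_filtered_query(time_field, start, end, term_filters=None, size=1000):
    """Build a bool query limited to a time window plus optional exact matches."""
    filters = [
        # range clause: keep documents where start <= time_field < end
        {"range": {time_field: {"gte": start, "lt": end}}}
    ]
    # term clauses: exact-value matches, one per field
    for field, value in (term_filters or {}).items():
        filters.append({"term": {field: value}})

    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
    }


query = build_filtered_query(
    "TimeCreated",                # assumed field name
    "2024-01-01T00:00:00",        # assumed start of the window
    "2024-01-02T00:00:00",        # assumed end of the window
    term_filters={"Number": 42},  # assumed field and value
)
print(json.dumps(query, indent=2))
```

The resulting dictionary can be passed as the `query` argument to `scan` in the same way as the hand-written query above.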

# Creating an Index and adding data

The next set of blocks will create an index and then use a loop to add documents into that index.  

In [None]:
index_name = 'demo_index' #Name of the index to be created.
index_body = {
    'settings': {
        'index': {
            'number_of_shards': 1
        }
    }
}

#Create the index in OpenSearch
response = client.indices.create(index_name, body=index_body)

#Use a loop to put lots of data into the index
for number in range(0,100):
    #Create a document, add all the fields you want to add to the index here within the document dictionary
    document = {
        'TimeCreated': datetime.datetime.now(),
        'Number': number,
        'Text': 'Some Text'
    }

    #Add the data into OpenSearch
    response = client.index(
        index = index_name,
        body = document,
        id = number,
        refresh = True
    )
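
Indexing one document per request with `refresh = True` is easy to follow but can be slow for larger volumes. One alternative, sketched here under the assumption that the `bulk` helper in `opensearchpy.helpers` is used to send the documents, is to generate the documents as bulk actions first. `generate_actions` is a hypothetical name. The block only builds the actions so that it can run without a server, and the commented-out lines show where the client would come in.

```python
import datetime


def generate_actions(index_name, count):
    """Yield one bulk action per document, mirroring the loop above."""
    for number in range(count):
        yield {
            "_index": index_name,  # target index
            "_id": number,         # document ID, same scheme as the loop above
            "_source": {
                "TimeCreated": datetime.datetime.now(),
                "Number": number,
                "Text": "Some Text",
            },
        }


actions = list(generate_actions("demo_index", 100))
print(len(actions), actions[0]["_id"], actions[-1]["_id"])

# With a connected client, the actions could then be sent in batches
# (assumes opensearch-py is installed and `client` exists):
# from opensearchpy.helpers import bulk
# success_count, errors = bulk(client, generate_actions("demo_index", 100))
```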

# SQL Searching

The next block of code queries the data in OpenSearch using SQL, sent to the SQL plugin endpoint (`/_plugins/_sql`).

In [None]:
#Define the query
query = "SELECT * FROM artifact_windows_sysinternals_sysmonlogforward"
schema = []
results = []

#Perform the search on the OpenSearch Data using client.transport.perform_request and providing the query into the body
response = client.transport.perform_request(
    'POST', 
    '/_plugins/_sql?format=json',
    body={'query': query})

#To view the response as JSON, serialise it to an indented JSON string
print(json.dumps(response, indent=2))

#Print the command line for records with ID 1 and the loaded image for records with ID 6
for entry in response['hits']['hits']:
    if entry['_source']['ID'] == 1:
        print(entry['_source']['EventData']['CommandLine'])
    elif entry['_source']['ID'] == 6:
        print(entry['_source']['EventData']['ImageLoaded'])
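
The `format=json` parameter used above asks the SQL plugin for the raw search response, which is why the loop reads `response['hits']['hits']`. Without that parameter the plugin replies in its default JDBC format, where column definitions arrive under `schema` and values under `datarows`. The sketch below converts such a response into a list of dictionaries. `rows_to_dicts` is a hypothetical helper, and `sample_response` is made-up data shaped like a JDBC reply, not output from a real cluster.

```python
def rows_to_dicts(response):
    """Pair each row in 'datarows' with the column names from 'schema'."""
    columns = [column["name"] for column in response.get("schema", [])]
    return [dict(zip(columns, row)) for row in response.get("datarows", [])]


# Made-up data in the shape of a JDBC-format reply
sample_response = {
    "schema": [
        {"name": "Number", "type": "long"},
        {"name": "Text", "type": "text"},
    ],
    "datarows": [[0, "Some Text"], [1, "Some Text"]],
    "total": 2,
    "size": 2,
    "status": 200,
}

rows = rows_to_dicts(sample_response)
for row in rows:
    print(row)
```

Each row can then be accessed by column name, for example `rows[0]["Number"]`, instead of by position.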
    