# Guided Hunting - Domain Generation Algorithm (DGA) Detection
<details>
    <summary><u>Details...</u></summary>
**Python Version:** Python 3.8 (including Python 3.8 - AzureML)<br>
**Required Packages**:  msticpy, pandas, numpy, matplotlib, plotly, ipywidgets, ipython, scikit-learn, joblib <br>

**Data Sources Required**:
- Log Analytics - DeviceNetworkEvents

</details>

This notebook brings together a series of queries and a simple machine-learning model to help you investigate domains in your network that may have been produced by a domain generation algorithm (DGA). Guided hunting steps then let you investigate these occurrences in further depth. This notebook authenticates with environment variables and requires the following:
- msticpyconfig.yaml has been properly configured
- managed identity with appropriate RBAC

## Log in with Managed Identity
Replace [CLIENT_ID] with the client ID of your managed identity. You can find it in the Azure portal under Managed Identities -> Overview.

In [None]:
!az login --identity --username [CLIENT_ID]

## Import Libraries

In [None]:
import os
import msticpy
import msticpy as mp
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient
from azure.mgmt.resource import ResourceManagementClient


# Initialize ManagedIdentity
credential = ManagedIdentityCredential()


# Now you can use ManagedIdentity or other credential classes
print(credential)


## Setup msticpyconfig.yaml
Ensure your msticpyconfig.yaml has been set up and saved in the directory from which you are running this notebook, or that the MSTICPYCONFIG environment variable points to it.

In [None]:
import msticpy
from msticpy.config import MpConfigFile, MpConfigEdit
import os
import json
from pathlib import Path

mp_conf = "msticpyconfig.yaml"

# check if MSTICPYCONFIG is already an env variable
mp_env = os.environ.get("MSTICPYCONFIG")
mp_conf = mp_env if mp_env and Path(mp_env).is_file() else mp_conf

if not Path(mp_conf).is_file():
    print(
        "No msticpyconfig.yaml was found!",
        "Please check that there is a config.json file in your workspace folder.",
        "If this is not there, go back to the Microsoft Sentinel portal and launch",
        "this notebook from there.",
        sep="\n"
    )
else:
    mpedit = MpConfigEdit(mp_conf)
    mpconfig = MpConfigFile(mp_conf)
    
    # Convert SettingsDict to a regular dictionary
    settings_dict = {k: v for k, v in mpconfig.settings.items()}
    print(f"Configured Sentinel workspaces: {json.dumps(settings_dict, indent=4)}")

msticpy.settings.refresh_config()


## Setup QueryProvider

In [None]:
# Refresh any config items that might have been saved
# to the msticpyconfig in the previous steps.
msticpy.settings.refresh_config()

# Initialize a QueryProvider for Microsoft Sentinel
qry_prov = mp.QueryProvider("AzureSentinel")

## Connect to Sentinel
You should see "connected" in the output after running this code block. Once you are connected, you can continue with the notebook.

In [None]:
# Get the default Microsoft Sentinel workspace details from msticpyconfig.yaml

ws_config = mp.WorkspaceConfig()

# Connect to Microsoft Sentinel with our QueryProvider and config details
qry_prov.connect(ws_config, mp_az_auth=["msi"])

## DGA Model Creation
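Make sure "domain.csv" is available in your environment and contains a "Domain" column and a "Label" column, where DGA domains are labeled "DGA". Change the path passed to "pd.read_csv" and the "model_filename" variable to the appropriate paths in your environment.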
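
The training cell below reads the `Domain` and `Label` columns and treats the label `DGA` as the positive class. As a minimal sketch of that layout, the following snippet parses a small in-memory sample the same way using only the standard library. The sample rows, the `Legit` label, and the helper name `load_labeled_domains` are made up for illustration; any label other than `DGA` maps to 0.

```python
import csv
import io

# Hypothetical sample of domain.csv; a real training file will be much larger.
SAMPLE_CSV = """Domain,Label
example.com,Legit
xjkq3v9zpl.net,DGA
contoso.com,Legit
"""


def load_labeled_domains(text):
    """Return (domains, labels) with 1 for 'DGA' rows and 0 for all other rows."""
    domains, labels = [], []
    for row in csv.DictReader(io.StringIO(text)):
        domains.append(row["Domain"])
        labels.append(1 if row["Label"] == "DGA" else 0)
    return domains, labels


domains, labels = load_labeled_domains(SAMPLE_CSV)
print(list(zip(domains, labels)))
```

The label comparison is case-sensitive, so a label such as `dga` would be counted as non-DGA. It may be worth checking the label values in your file before training.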

In [None]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
import joblib
import os

# Load the CSV file containing the labeled domains
labeled_domains_df = pd.read_csv('/home/azureuser/cloudfiles/code/Users/jgraff1/domain.csv')

# Preprocess the data
X = labeled_domains_df['Domain']
y = labeled_domains_df['Label'].apply(lambda x: 1 if x == 'DGA' else 0)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# Create a pipeline that combines the CountVectorizer and the MultinomialNB classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())

# Train the model
model.fit(X_train, y_train)

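# Save the trained model to a file (create the target directory if it does not exist)
model_filename = '/home/azureuser/cloudfiles/code/Users/jgraff1/Models/dga_model.joblib'
os.makedirs(os.path.dirname(model_filename), exist_ok=True)
joblib.dump(model, model_filename)
print(f'Model saved to {model_filename}')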

# Evaluate the model (optional)
accuracy = model.score(X_test, y_test)
print(f'Model accuracy: {accuracy:.2f}')
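
With its default settings, `CountVectorizer` splits each domain into word-like tokens, so a previously unseen DGA label such as `xjkq3v9z` becomes a single token that is not in the training vocabulary. The sketch below uses only the standard library to approximate which features the model sees under the default word tokenizer, compared with character n-grams. The helper names are hypothetical, and the n-gram function only approximates scikit-learn's character analyzer. If the word-level model performs poorly on your data, one option to evaluate is `CountVectorizer(analyzer="char", ngram_range=(2, 4))`.

```python
import re
from typing import List

# Default token pattern of scikit-learn's CountVectorizer (word analyzer):
# runs of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def word_tokens(domain: str) -> List[str]:
    """Approximate the tokens the default CountVectorizer extracts."""
    return TOKEN_PATTERN.findall(domain.lower())


def char_ngrams(domain: str, n_min: int = 2, n_max: int = 3) -> List[str]:
    """Approximate character n-grams, sliding a window of each size over the text."""
    text = domain.lower()
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


print(word_tokens("xjkq3v9z.example.com"))
print(char_ngrams("ab.io", 2, 3))
```

Character n-grams let the classifier pick up on unusual letter combinations inside a label, at the cost of a larger feature space.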

## Apply dga_model.joblib to Sentinel Data

In [None]:
query = """
DeviceNetworkEvents
| where TimeGenerated > ago(30d)
| where ActionType == "DnsConnectionInspected"
| extend QueryField = tostring(parse_json(AdditionalFields).query)
| where isnotempty(QueryField)
| where QueryField matches regex @"[a-zA-Z0-9]{8,}"
| summarize Count = count() by QueryField
| where Count > 10
"""

# Set the maximum column width to None (no truncation)
pd.set_option('display.max_colwidth', None)
df = qry_prov.exec_query(query)

# Load the trained model from the file
model = joblib.load(model_filename)
print(f'Model loaded from {model_filename}')

# Define a function to check if a domain is associated with a DGA using the trained model
def is_dga(domain):
    return model.predict([domain])[0] == 1

# Apply the function to the "QueryField" column
df['IsDGA'] = df['QueryField'].apply(is_dga)

# Display the updated dataframe
df.head(20)
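
The regex in the query above acts as a coarse pre-filter before the model is applied. The following sketch reproduces it locally, assuming that KQL's `matches regex` behaves like an unanchored search, which `re.search` mirrors. The helper name `passes_prefilter` and the sample names are illustrative only.

```python
import re

# Same pattern as the KQL filter: a run of at least 8 letters or digits.
PREFILTER = re.compile(r"[a-zA-Z0-9]{8,}")


def passes_prefilter(name):
    """Return True if the name contains 8+ consecutive alphanumerics anywhere."""
    return PREFILTER.search(name) is not None


for name in ["login.microsoftonline.com", "xjkq3v9zpl.net", "a-b.c-d.io"]:
    print(name, passes_prefilter(name))
```

Because the pattern is unanchored, many ordinary domains with a long label also pass the filter, so the classifier and the `Count > 10` threshold do most of the narrowing.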

## Output All Results to CSV
Change the "output_path" variable to match your environment.

In [None]:
# Ensure the directory exists
output_path = '/home/azureuser/cloudfiles/code/Users/jgraff1/dgaresults.csv'
os.makedirs(os.path.dirname(output_path), exist_ok=True)

# Export the DataFrame to a CSV file in the specified file path
df.to_csv(output_path, index=False)

print(f"DataFrame has been exported to {output_path}")

## Filter DGA Results to CSV
Any results that the model classifies as DGA will be saved to a CSV file. Change the "output_path" variable to match your environment.


In [None]:
import os
import pandas as pd

# df is the DataFrame produced by the model-scoring cell above
# Filter the DataFrame to only include rows where IsDGA is True
filtered_df = df[df['IsDGA'] == True]

# Ensure the directory exists
output_path = '/home/azureuser/cloudfiles/code/Users/jgraff1/dgaresults2.csv'
os.makedirs(os.path.dirname(output_path), exist_ok=True)

# Export the filtered DataFrame to a CSV file in the specified file path
filtered_df.to_csv(output_path, index=False)

print(f"Filtered DataFrame has been exported to {output_path}")
