# 🔍 Log Analysis with Musashi LogAnalyzer
This notebook runs `LogAnalyzer` to process EVTX, CSV, and JSON logs, apply Sigma rules, and detect suspicious activity.


## 📘 Installation of Libraries and Repo

In [None]:
!git clone https://github.com/aj-tap/taptaptools

In [None]:
!pip install -e taptaptools/

In [None]:
# Optional: packages used by the AI agent cells
!pip install llama-index llama-index-experimental llama-index-llms-gemini llama-index-llms-openai #llama-index-llms-huggingface

In [3]:
from core.musashi import LogAnalyzer

## 📂 Log Analyzer

In [4]:
# Run the log analyzer

input_path = '/content/taptaptools/tests/evtx-attack-samples/Persistence'  # @param {type: "string"}
output_path = '/content/taptaptools/tests'  # @param {type: "string"}
sigmarule_path = '/content/taptaptools/sigma_all_rules/rules/windows' # @param {type: "string"}
logtype = "winevtx" # @param ["winevtx", "defender", "azure"]
# threads = 5 # @param {type: "slider", min: 2, max: 10}

log_analyzer = LogAnalyzer(input_path, output_path, logtype, sigmarule_path, num_threads=None)

Input Log: /content/taptaptools/tests/evtx-attack-samples/Persistence
Output path results: /content/taptaptools/tests
Using Log Format: winevtx
Using Sigma rules path: /content/taptaptools/sigma_all_rules/rules/windows/**/*.yml
Using 2 threads
Initializing the lake...
Removing existing datalake/ directory...
Lake initialized and SuperDB server started.

Loading logs from: /content/taptaptools/tests/evtx-attack-samples/Persistence
Found 22 log files to process.
Parsing EVTX: /content/taptaptools/tests/evtx-attack-samples/Persistence/Persistence_Shime_Microsoft-Windows-Application-Experience_Program-Telemetry_500.evtx
Parsing EVTX: /content/taptaptools/tests/evtx-attack-samples/Persistence/persistence_sysmon_11_13_1_shime_appfix.evtx
EVTX processed: /content/taptaptools/tests/evtx-attack-samples/Persistence/Persistence_Shime_Microsoft-Windows-Application-Experience_Program-Telemetry_500.evtx (7 records)
Parsing EVTX: /content/taptaptools/tests/evtx-attack-samples/Persistence/persist_turl

## 🚀 Perform Detection

In [10]:
log_analyzer.execute()

Sigma Rules are already up to date! (Version r2025-02-03)
Number of rules loaded: 2115

=== Running Detections ===
Sigma Rule Triggered: Exports Registry Key To an Alternate Data Stream (3 event matches)
Sigma Rule Triggered: WMI Event Consumer Created Named Pipe (1 event matches)
Sigma Rule Triggered: WMI Event Subscription (8 event matches)
Sigma Rule Triggered: DNS Query Request By Regsvr32.EXE (2 event matches)
Sigma Rule Triggered: Suspicious Msbuild Execution By Uncommon Parent Process (1 event matches)
Sigma Rule Triggered: Bad Opsec Defaults Sacrificial Processes With Improper Arguments (1 event matches)
Sigma Rule Triggered: System File Execution Location Anomaly (30 event matches)
Sigma Rule Triggered: PsExec Service Execution (2 event matches)
Sigma Rule Triggered: Potential Shim Database Persistence via Sdbinst.EXE (11 event matches)
Sigma Rule Triggered: Rundll32 Execution Without CommandLine Parameters (1 event matches)
Sigma Rule Triggered: Regsvr32 DLL Execution With Su

  merged_df = pd.concat(all_results, ignore_index=True)
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


All Sigma results merged and saved to /content/taptaptools/tests/all-sigma-results.csv
Detection process completed.
Epoch 1/10
6/6 ━━━━━━━━━━━━━━━━━━━━ 3s 88ms/step - loss: 0.0888 - val_loss: 0.0762
Epoch 2/10
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step - loss: 0.0762 - val_loss: 0.0719
Epoch 3/10
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - loss: 0.0709 - val_loss: 0.0699
Epoch 4/10
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - loss: 0.0665 - val_loss: 0.0691
Epoch 5/10
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 0.0650 - val_loss: 0.0686
Epoch 6/10
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step - loss: 0.0602 - val_loss: 0.0684
Epoch 7/10
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 32ms/step - loss: 0.0572 - val_loss: 0.0686
Epoch 8/10
6/6 ━━━━━━━━━━━━━━━━━━━━

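The detection log above prints one `Sigma Rule Triggered: ... (N event matches)` line per rule. As a rough sketch, those lines can be turned into a ranked list with a regular expression. The helper name `parse_sigma_hits` and the sample text are illustrative and not part of `LogAnalyzer`; the pattern assumes the line format stays exactly as printed above.

```python
import re

# Matches lines such as "Sigma Rule Triggered: WMI Event Subscription (8 event matches)".
RULE_LINE = re.compile(
    r"^Sigma Rule Triggered: (?P<rule>.+) \((?P<count>\d+) event matches\)$"
)

def parse_sigma_hits(log_text):
    """Return (rule title, match count) pairs, highest count first."""
    hits = []
    for line in log_text.splitlines():
        match = RULE_LINE.match(line.strip())
        if match:  # truncated or unrelated lines are skipped
            hits.append((match.group("rule"), int(match.group("count"))))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# A few lines copied from the output above, including the truncated last one.
sample_output = """\
Sigma Rule Triggered: WMI Event Subscription (8 event matches)
Sigma Rule Triggered: System File Execution Location Anomaly (30 event matches)
Sigma Rule Triggered: PsExec Service Execution (2 event matches)
Sigma Rule Triggered: Regsvr32 DLL Execution With Su
"""

for rule, count in parse_sigma_hits(sample_output):
    print(f"{count:>4}  {rule}")
```

Sorting by match count is only one way to triage; a rule with a single match can matter more than a noisy one.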
## 📊 Analysis of Results

### Display Sigma Results

In [11]:
log_analyzer.get_pool_data(log_type='sigma')

Unnamed: 0,#attributes_xmlns,System_Provider_#attributes_Name,System_Provider_#attributes_Guid,System_EventID,System_Version,System_Level,System_Task,System_Opcode,System_Keywords,System_TimeCreated_#attributes_SystemTime,...,EventData_DestinationIp,EventData_DestinationHostname,EventData_DestinationPort,EventData_DestinationPortName,EventData_TargetFilename,EventData_CreationUtcTime,EventData_ImageLoaded,EventData_Signed,EventData_Signature,EventData_SignatureStatus
0,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,3,5,4,3,0,0x8000000000000000,2019-05-13T18:03:21.212898Z,...,151.101.128.133,,443.0,https,,,,,,
1,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,11,2,4,11,0,0x8000000000000000,2019-03-19T20:41:14.563441Z,...,,,,,C:\Windows\Tasks\SA.DAT,2009-07-14 04:53:47.031,,,,
2,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,11,2,4,11,0,0x8000000000000000,2019-03-19T20:46:25.856126Z,...,,,,,C:\Windows\System32\Tasks\Microsoft\Windows De...,2009-07-14 04:42:30.134,,,,
3,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,11,2,4,11,0,0x8000000000000000,2019-03-19T23:18:53.501883Z,...,,,,,C:\Windows\Tasks\SA.DAT,2009-07-14 04:53:47.031,,,,
4,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,11,2,4,11,0,0x8000000000000000,2019-03-19T23:24:08.294533Z,...,,,,,C:\Windows\System32\Tasks\Microsoft\Windows De...,2009-07-14 04:42:30.134,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
56,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,1,5,4,1,0,0x8000000000000000,2019-03-19T21:22:28.886411Z,...,,,,,,,,,,
57,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,1,5,4,1,0,0x8000000000000000,2019-03-19T23:18:58.288766Z,...,,,,,,,,,,
58,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,1,5,4,1,0,0x8000000000000000,2019-04-03T18:12:00.016862Z,...,,,,,,,,,,
59,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,1,5,4,1,0,0x8000000000000000,2019-05-13T18:03:19.681478Z,...,,,,,,,,,,
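The Sigma results come back as a wide DataFrame, so a grouped count is often easier to read than the raw rows. The sketch below groups on two columns visible in the output above (`System_Provider_#attributes_Name` and `System_EventID`). The helper `summarize_hits` and the stand-in rows are hypothetical; in the notebook the input would presumably be the DataFrame returned by `log_analyzer.get_pool_data(log_type='sigma')`.

```python
import pandas as pd

def summarize_hits(df):
    """Count matched events per provider and event ID, most frequent first."""
    return (
        df.groupby(["System_Provider_#attributes_Name", "System_EventID"])
          .size()
          .reset_index(name="events")
          .sort_values("events", ascending=False, ignore_index=True)
    )

# Stand-in rows for illustration only; replace with the real Sigma results.
sigma_df = pd.DataFrame({
    "System_Provider_#attributes_Name": ["Microsoft-Windows-Sysmon"] * 5,
    "System_EventID": [3, 11, 11, 11, 1],
})

summary = summarize_hits(sigma_df)
print(summary)
```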




### Anomaly Detections

In [13]:
log_analyzer.get_pool_data(log_type='anomalies')

Unnamed: 0,#attributes_xmlns,System_Provider_#attributes_Name,System_Provider_#attributes_Guid,System_EventID,System_Version,System_Level,System_Task,System_Opcode,System_Keywords,System_TimeCreated_#attributes_SystemTime,...,UserData_CompatibilityFixEvent_FixID,UserData_CompatibilityFixEvent_Flags,UserData_CompatibilityFixEvent_ExePath,UserData_CompatibilityFixEvent_FixName,Anomaly_Autoencoder,Anomaly_IsolationForest,Anomaly_OCSVM,Anomaly_LOF,Anomaly_PCA,Anomaly_Ensemble
0,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,3,5,4,3,0,0x8000000000000000,2019-05-13T18:03:21.212898Z,...,,,,,1,0,1,1,1,1
1,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,10,3,4,10,0,0x8000000000000000,2019-05-21T00:35:07.474160Z,...,,,,,1,1,1,1,1,1
2,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Sysmon,5770385F-C22A-43E0-BF4C-06F5698FFBD9,10,3,4,10,0,0x8000000000000000,2019-05-21T00:35:07.474160Z,...,,,,,1,1,1,1,0,1
3,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Eventlog,{fc65ddd8-d6ef-4962-83d5-6e5cfe9ce148},1102,0,4,104,0,0x4020000000000000,2019-03-25T21:28:11.073626Z,...,,,,,0,1,0,1,1,1
4,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Bits-Client,EF1CC15B-46C1-414E-BB95-E76B077BD51E,60,1,4,0,2,0x4000000000000000,2019-05-12T18:04:50.137072Z,...,,,,,1,1,1,1,1,1
5,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Bits-Client,EF1CC15B-46C1-414E-BB95-E76B077BD51E,60,1,4,0,2,0x4000000000000000,2019-05-13T14:50:59.405301Z,...,,,,,1,1,1,1,1,1
6,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Bits-Client,EF1CC15B-46C1-414E-BB95-E76B077BD51E,59,1,4,0,1,0x4000000000000000,2019-05-12T18:04:50.121447Z,...,,,,,1,1,1,1,1,1
7,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Bits-Client,EF1CC15B-46C1-414E-BB95-E76B077BD51E,59,1,4,0,1,0x4000000000000000,2019-05-13T14:50:59.389676Z,...,,,,,1,1,0,1,1,1
8,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Winsock-WS2HELP,D5C25F9A-4D47-493E-9184-40DD397A004D,1,0,0,0,0,0x8000000000000000,2019-08-23T12:37:37.100800Z,...,,,,,1,1,1,1,1,1
9,http://schemas.microsoft.com/win/2004/08/event...,Microsoft-Windows-Winsock-WS2HELP,D5C25F9A-4D47-493E-9184-40DD397A004D,1,0,0,0,0,0x8000000000000000,2019-08-23T12:37:38.521158Z,...,,,,,1,1,1,1,1,1
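Each `Anomaly_*` column above is a 0/1 flag from one detector. The notebook does not show how `Anomaly_Ensemble` is derived, so the following sketch does not try to reproduce it; it simply counts how many of the five detectors flagged each row and keeps the rows with the most agreement. The helper `rank_by_votes` and the `Anomaly_Votes` column are illustrative names.

```python
import pandas as pd

# Detector columns as they appear in the output above.
DETECTORS = [
    "Anomaly_Autoencoder", "Anomaly_IsolationForest",
    "Anomaly_OCSVM", "Anomaly_LOF", "Anomaly_PCA",
]

def rank_by_votes(df, min_votes=3):
    """Add a per-row detector vote count and keep rows with at least min_votes."""
    ranked = df.assign(Anomaly_Votes=df[DETECTORS].sum(axis=1))
    ranked = ranked[ranked["Anomaly_Votes"] >= min_votes]
    return ranked.sort_values("Anomaly_Votes", ascending=False)

# Flags copied from rows 0-3 of the table above.
anomalies_df = pd.DataFrame({
    "System_EventID":          [3, 10, 10, 1102],
    "Anomaly_Autoencoder":     [1, 1, 1, 0],
    "Anomaly_IsolationForest": [0, 1, 1, 1],
    "Anomaly_OCSVM":           [1, 1, 1, 0],
    "Anomaly_LOF":             [1, 1, 1, 1],
    "Anomaly_PCA":             [1, 1, 0, 1],
})

print(rank_by_votes(anomalies_df, min_votes=4))
```

Raising `min_votes` trades recall for fewer rows to review.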




## 🤖 AI Agent: Natural language to Prompts

### Libraries & API Key Setup

In [14]:
# Store GOOGLE_API_KEY and OPENAI_API_KEY in Colab Secrets before running this cell
import logging
import sys

import pandas as pd
from google.colab import userdata
from IPython.display import Markdown, display
from llama_index.experimental.query_engine import PandasQueryEngine
from llama_index.llms.gemini import Gemini
from llama_index.llms.openai import OpenAI

# Send log messages to stdout; basicConfig already installs one stdout handler,
# so adding a second StreamHandler would print every message twice
logging.basicConfig(stream=sys.stdout, level=logging.INFO)

gemini_key = userdata.get('GOOGLE_API_KEY')
openai_key = userdata.get('OPENAI_API_KEY')

llm_gemini = Gemini(model="models/gemini-1.5-flash", api_key=gemini_key)
llm_chatgpt = OpenAI(model="gpt-4", api_key=openai_key)
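`google.colab.userdata` is only importable inside Colab, so the cell above fails elsewhere. One possible workaround, sketched here, is a small helper that falls back to environment variables. `get_secret` is a hypothetical name, and the broad `except Exception` is an assumption meant to cover a missing secret or denied notebook access.

```python
import os

def get_secret(name):
    """Return a secret from Colab's store, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Assumed failure modes: secret not defined, or access not granted
        return os.environ.get(name)

# Demonstration with a throwaway variable name.
os.environ["MUSASHI_DEMO_KEY"] = "abc123"
print(get_secret("MUSASHI_DEMO_KEY"))
```

With this in place, `gemini_key = get_secret('GOOGLE_API_KEY')` would behave the same in Colab and in a local Jupyter session.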

### Agent: GPT

In [7]:
prompt_text_chatgpt = 'long tail analysis of process'  # @param {type: "string"}
verbose = True  # @param {type: "boolean"}
logtype = "raw" # @param ["raw", "sigma", "anomalies"]

df_results = log_analyzer.get_pool_data(log_type=logtype)

query_engine = PandasQueryEngine(df=df_results, verbose=verbose, llm=llm_chatgpt)

response = query_engine.query(prompt_text_chatgpt)

> Pandas Instructions:
```
df['System_Provider_#attributes_Name'].value_counts().tail()
```
> Pandas Output: System_Provider_#attributes_Name
Microsoft-Windows-Security-Auditing         38
Microsoft-Windows-Application-Experience     7
Microsoft-Windows-Bits-Client                6
Microsoft-Windows-Eventlog                   2
Microsoft-Windows-Winsock-WS2HELP            2
Name: count, dtype: int64
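The generated instruction takes the last five entries of `value_counts()`, which returns five rows no matter how rare they are. A threshold-based variant is sketched below as an alternative; `long_tail` and `max_count` are illustrative names, and the sample series is rebuilt from the counts printed above.

```python
import pandas as pd

def long_tail(series, max_count=5):
    """Return values that occur at most max_count times, rarest first."""
    counts = series.value_counts()
    rare = counts[counts <= max_count]
    return rare.sort_values(kind="stable")

# Series rebuilt from the provider counts in the output above.
providers = pd.Series(
    ["Microsoft-Windows-Security-Auditing"] * 38
    + ["Microsoft-Windows-Application-Experience"] * 7
    + ["Microsoft-Windows-Bits-Client"] * 6
    + ["Microsoft-Windows-Eventlog"] * 2
    + ["Microsoft-Windows-Winsock-WS2HELP"] * 2
)

print(long_tail(providers, max_count=6))
```

The same helper can be pointed at a process column such as `EventData_Image`, assuming that column exists in the raw pool.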


### Agent: Gemini

In [15]:
prompt_text_gemini = 'what are the anomalies entry'  # @param {type: "string"}
verbose = True  # @param {type: "boolean"}
logtype = "anomalies" # @param ["raw", "sigma", "anomalies"]

df_results = log_analyzer.get_pool_data(log_type=logtype)

query_engine = PandasQueryEngine(df=df_results, verbose=verbose, llm=llm_gemini)

response = query_engine.query(prompt_text_gemini)

> Pandas Instructions:
```
df[['Anomaly_Autoencoder', 'Anomaly_IsolationForest', 'Anomaly_OCSVM', 'Anomaly_LOF', 'Anomaly_PCA', 'Anomaly_Ensemble']].head()

```
> Pandas Output:    Anomaly_Autoencoder  Anomaly_IsolationForest  Anomaly_OCSVM  Anomaly_LOF  \
0                    1                        0              1            1   
1                    1                        1              1            1   
2                    1                        1              1            1   
3                    0                        1              0            1   
4                    1                        1              1            1   

   Anomaly_PCA  Anomaly_Ensemble  
0            1                 1  
1            1                 1  
2            0                 1  
3            1                 1  
4            1                 1  


### Agent: Local LLM (Hugging Face)

In [None]:
!pip install llama-index-llms-huggingface

In [None]:
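from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.llms.huggingface import HuggingFaceLLM

model_name = "HuggingFaceH4/zephyr-7b-alpha"

# Load the tokenizer and model separately so their vocabulary sizes can be compared
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

print("Tokenizer vocab size:", tokenizer.vocab_size)
print("Model vocab size:", model.config.vocab_size)

# Wrap the loaded model for LlamaIndex; the query cell below expects `llm_zephyr`
llm_zephyr = HuggingFaceLLM(model=model, tokenizer=tokenizer)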

In [None]:
prompt_text_zephyr = 'What are the processes'  # @param {type: "string"}
verbose = True  # @param {type: "boolean"}
logtype = "raw" # @param ["raw", "sigma", "anomalies"]

df_results = log_analyzer.get_pool_data(log_type=logtype)

query_engine = PandasQueryEngine(df=df_results, verbose=verbose, llm=llm_zephyr)
response = query_engine.query(prompt_text_zephyr)