# Molt-Shield Demo

**Zero-Trust Engineering Gateway** - An educational demonstration of how Molt-Shield anonymizes proprietary XML data.

This notebook shows step-by-step how value masking, tag shadowing, and sibling shuffling work.

## 1. Setup

Install dependencies and set up the environment.

In [None]:
# Clone the repository (if not already present)
# !git clone https://github.com/your-repo/molt-shield.git /content/molt-shield

# Install dependencies
!pip install -q lxml pydantic pyyaml

In [None]:
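import sys
# The `src.*` imports below resolve against the repository root, so add it alongside src/
sys.path.insert(0, '/content/molt-shield/src')
sys.path.insert(0, '/content/molt-shield')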

import os
os.chdir('/content/molt-shield')

from src.config import load_config, MaskingConfig, ShufflingConfig
from src.gatekeeper import apply_gatekeeper, mask_values, shuffle_siblings, DEFAULT_TAG_MAP
from src.policy_engine import Policy, Rule
from src.vault import Vault
from pathlib import Path
from lxml import etree

## 2. Sample Data

Let's define a sample XML file representing proprietary simulation data.

In [None]:
# Sample proprietary XML data
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<simulation>
    <metadata>
        <id>sim-001</id>
        <type>thermal_analysis</type>
    </metadata>
    <element id="e1">
        <pressure>123.45</pressure>
        <temperature>500.0</temperature>
        <velocity>25.5</velocity>
    </element>
    <element id="e2">
        <pressure>678.90</pressure>
        <temperature>600.0</temperature>
        <velocity>30.2</velocity>
    </element>
    <node id="n1">
        <coordinates x="10.5" y="20.3" z="30.7"/>
    </node>
</simulation>"""

# Save to file (written as UTF-8 to match the XML declaration)
with open('sample_input.xml', 'w', encoding='utf-8') as f:
    f.write(SAMPLE_XML)

print("=== ORIGINAL DATA (PROPRIETARY) ===")
print(SAMPLE_XML)

## 3. Value Masking

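**Problem:** Numeric values such as pressures, temperatures, and coordinates can reveal proprietary design or test information.

**Solution:** Replace each numeric value with an opaque `VAL_…` placeholder and record the original in a vault so it can be restored later.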
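To make the idea concrete before calling the library, here is a minimal standalone sketch of value masking. It is not Molt-Shield's implementation: the helper names `is_number` and `mask_numeric_text` are hypothetical, a plain dict stands in for the vault, and the standard-library `xml.etree.ElementTree` is used instead of `lxml`.

```python
import uuid
import xml.etree.ElementTree as ET


def is_number(text):
    """Return True if text parses as a float (this also accepts forms like '1e5' or 'nan')."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def mask_numeric_text(root, vault):
    """Replace numeric element text with a placeholder and record the original in vault."""
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text and is_number(text):
            # Hypothetical placeholder format, chosen for illustration only
            placeholder = f"VAL_{uuid.uuid4().hex}"
            vault[placeholder] = text
            elem.text = placeholder
    return root


root = ET.fromstring(
    "<element><pressure>123.45</pressure><type>thermal</type></element>"
)
vault = {}  # placeholder -> original value
mask_numeric_text(root, vault)

print(ET.tostring(root, encoding="unicode"))
print(vault)
```

Only element text is handled in this sketch; numeric attributes such as the `x`, `y`, `z` coordinates in the sample would need the same treatment applied to `elem.attrib`.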

In [None]:
# Configure masking
masking_config = MaskingConfig()

# Create a temporary vault
vault = Vault('demo_vault.json')

# Apply masking
masked_xml = mask_values(SAMPLE_XML, masking_config, vault)

print("=== AFTER VALUE MASKING ===")
print(masked_xml)

# Save vault
vault.save()
print(f"\nVault saved with {len(vault)} entries")

## 4. Tag Shadowing

**Problem:** Tag names like `<pressure>`, `<temperature>` reveal what kind of data this is.

**Solution:** Map proprietary tag names to generic equivalents.
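As a rough illustration, tag shadowing can be thought of as a dictionary lookup applied to every element name. The sketch below is an assumption about the general technique, not the code behind `_apply_tag_shadowing`; `TAG_MAP` and `shadow_tags` are hypothetical names, and the real `DEFAULT_TAG_MAP` may contain different entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping for illustration; the real DEFAULT_TAG_MAP may differ
TAG_MAP = {"pressure": "metric_alpha", "temperature": "thermal_beta"}


def shadow_tags(root, tag_map):
    """Rename every element whose tag appears in tag_map; leave other tags unchanged."""
    for elem in root.iter():
        elem.tag = tag_map.get(elem.tag, elem.tag)
    return root


root = ET.fromstring(
    "<element id='e1'><pressure>1</pressure><velocity>2</velocity></element>"
)
shadow_tags(root, TAG_MAP)

print(ET.tostring(root, encoding="unicode"))
```

Tags missing from the map pass through untouched, so the map has to cover every sensitive name for the shadowing to be effective.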

In [None]:
# Apply tag shadowing manually (using internal function)
from src.gatekeeper import _apply_tag_shadowing

tree = etree.fromstring(masked_xml.encode())
_apply_tag_shadowing(tree, DEFAULT_TAG_MAP)
shadowed_xml = etree.tostring(tree, encoding='unicode')

print("=== AFTER TAG SHADOWING ===")
print(shadowed_xml)

print("\n=== TAG MAPPING APPLIED ===")
for original, shadowed in DEFAULT_TAG_MAP.items():
    print(f"  {original} → {shadowed}")

## 5. Sibling Shuffling

**Problem:** The order of sibling elements can reveal patterns or serve as a fingerprint of the source file.

**Solution:** Shuffle sibling elements with a seeded random generator, so the order is randomized but reproducible for a given seed.
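The seeded shuffle can be sketched in a few lines with the standard library. This is an illustrative assumption about how such a step might work, not the body of `shuffle_siblings`; the function name `shuffle_children` is hypothetical.

```python
import random
import xml.etree.ElementTree as ET


def shuffle_children(root, seed):
    """Shuffle the children of every element, using one seeded generator for repeatability."""
    rng = random.Random(seed)
    # Snapshot the elements first so reordering never interferes with traversal
    for parent in list(root.iter()):
        children = list(parent)
        if len(children) > 1:
            rng.shuffle(children)
            parent[:] = children  # replace the child list with the shuffled order
    return root


xml = "<sim><e id='1'/><e id='2'/><e id='3'/><e id='4'/></sim>"
first = [e.get("id") for e in shuffle_children(ET.fromstring(xml), seed=42)]
second = [e.get("id") for e in shuffle_children(ET.fromstring(xml), seed=42)]

print(first)
print("same order on both runs:", first == second)
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the result independent of any other random calls made in the notebook.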

In [None]:
# Configure shuffling
shuffling_config = ShufflingConfig(seed=42)  # Fixed seed for reproducibility

# Apply shuffling
shuffled_xml = shuffle_siblings(shadowed_xml, shuffling_config)

print("=== AFTER SIBLING SHUFFLING ===")
print(shuffled_xml)

## 6. Full Pipeline

Now let's run the complete gatekeeper pipeline using the Policy engine.

In [None]:
# Create a policy
policy = Policy(
    version="1.0",
    global_masking=True,
    rules=[
        Rule(tag_pattern="pressure", action="mask_value"),
        Rule(tag_pattern="temperature", action="mask_value"),
        Rule(tag_pattern="velocity", action="mask_value"),
        Rule(tag_pattern="coordinates", action="mask_value"),
        Rule(tag_pattern="element", action="shuffle_siblings"),
    ]
)

# Load config
config = load_config('config/default.yaml')

# Apply full pipeline
input_path = Path('sample_input.xml')
sanitized_path, vault_path = apply_gatekeeper(input_path, policy, config)

print("=== FULL PIPELINE RESULT ===\n")
print("--- SANITIZED OUTPUT (SAFE TO SHARE) ---")
print(sanitized_path.read_text())

## 7. Rehydration

After AI analysis, restore original values from the vault.
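Conceptually, rehydration is a lookup from placeholder back to original value. The sketch below assumes placeholders of the form `VAL_` followed by alphanumeric characters and uses a plain dict as the vault; `rehydrate` and `PLACEHOLDER` are hypothetical names, and `Vault.rehydrate_xml` may work differently.

```python
import re

# Assumed placeholder shape: "VAL_" followed by letters or digits
PLACEHOLDER = re.compile(r"VAL_[0-9A-Za-z]+")


def rehydrate(text, vault):
    """Swap each placeholder found in vault for its original; leave unknown ones as they are."""
    return PLACEHOLDER.sub(lambda m: vault.get(m.group(0), m.group(0)), text)


vault = {"VAL_abc123": "123.45", "VAL_def456": "500.0"}
ai_output = (
    "<metric_alpha>VAL_abc123</metric_alpha>"
    "<thermal_beta>VAL_def456</thermal_beta>"
    "<other>VAL_zzz</other>"
)
restored = rehydrate(ai_output, vault)

print(restored)
```

Leaving unknown placeholders in place, rather than raising an error, makes it easy to spot tokens the AI invented or altered.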

In [None]:
# Load the vault created by apply_gatekeeper
vault = Vault(vault_path)
vault.load()

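# Simulated AI output. VAL_abc123 and VAL_def456 are illustrative; substitute
# placeholders from your own sanitized output, since only keys present in the
# vault can be restored.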
ai_output_xml = """
<element id="e1">
    <metric_alpha>VAL_abc123</metric_alpha>
    <thermal_beta>VAL_def456</thermal_beta>
</element>"""

print("=== REHYDRATION DEMO ===\n")
print("AI returned (masked):")
print(ai_output_xml)

# Rehydrate the XML
rehydrated_xml = vault.rehydrate_xml(ai_output_xml)

print("\nAfter rehydration (original values restored):")
print(rehydrated_xml)

# Show vault contents
print(f"\nVault contains {len(vault)} entries:")
for masked, entry in list(vault.entries.items())[:5]:
    print(f"  {masked} → {entry.original_value}")

## Summary

Molt-Shield transforms proprietary data into anonymized data that AI can analyze:

| Step | What It Does | Example |
|------|--------------|--------|
| Value Masking | Replace numbers with opaque placeholders | `123.45` → `VAL_x7k2m` |
| Tag Shadowing | Rename proprietary tags | `<pressure>` → `<metric_alpha>` |
| Sibling Shuffle | Randomize element order | `[e1, e2]` → `[e2, e1]` |
| Vault Storage | Save originals for later | `VAL_x7k2m` → `123.45` |

The AI receives data that keeps the document's structure, while the proprietary values and tag names are replaced and the original element order is hidden.