# Molt-Shield Demo

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/batmanvane/molt-shield/blob/main/notebooks/molt_shield_demo.ipynb)

**Zero-Trust Engineering Gateway** - An educational demonstration of how Molt-Shield anonymizes proprietary XML data.

This notebook shows, step by step, how value masking, tag shadowing, and sibling shuffling work.

## 1. Setup

First, get the Molt-Shield source files into Colab (clone the repository from GitHub or upload the files), then install the dependencies.

### Option A: Clone from GitHub (recommended)
```python
!git clone https://github.com/batmanvane/molt-shield.git /content/molt-shield
```

### Option B: Upload files manually
Upload these files to Colab:
- `src/` folder
- `config/` folder

### Option C: Install dependencies only
```python
!pip install -q lxml pydantic pyyaml
```
Then manually copy the source files to `/content/molt-shield/src/`.

In [1]:
# ============================================================
# SETUP: Install dependencies (run this FIRST)
# ============================================================
# Install explicitly in case these packages are not pre-installed in Google Colab

!pip install -q lxml pydantic pyyaml

print("✓ Dependencies installed: lxml, pydantic, pyyaml")

# Clone repository
!git clone https://github.com/batmanvane/molt-shield.git /content/molt-shield

import sys
sys.path.insert(0, '/content/molt-shield/src')

import os
os.chdir('/content/molt-shield')

print("✓ Environment ready!")

✓ Dependencies installed: lxml, pydantic, pyyaml
Cloning into '/content/molt-shield'...
remote: Enumerating objects: 120, done.
remote: Counting objects: 100% (120/120), done.
remote: Compressing objects: 100% (84/84), done.
remote: Total 120 (delta 45), reused 109 (delta 34), pack-reused 0 (from 0)
Receiving objects: 100% (120/120), 73.73 KiB | 4.34 MiB/s, done.
Resolving deltas: 100% (45/45), done.
✓ Environment ready!


In [2]:
import sys
sys.path.insert(0, '/content/molt-shield/src')

import os
os.chdir('/content/molt-shield')

from src.config import load_config, MaskingConfig, ShufflingConfig
from src.gatekeeper import apply_gatekeeper, mask_values, shuffle_siblings, DEFAULT_TAG_MAP, _apply_tag_shadowing
from src.policy_engine import Policy, Rule
from src.vault import Vault
from pathlib import Path
from lxml import etree

## 2. Sample Data

Let's define a sample XML file representing proprietary simulation data.

In [3]:
# Sample proprietary XML data
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<simulation>
    <metadata>
        <id>sim-001</id>
        <type>thermal_analysis</type>
    </metadata>
    <element id="e1">
        <pressure>123.45</pressure>
        <temperature>500.0</temperature>
        <velocity>25.5</velocity>
    </element>
    <element id="e2">
        <pressure>678.90</pressure>
        <temperature>600.0</temperature>
        <velocity>30.2</velocity>
    </element>
    <node id="n1">
        <coordinates x="10.5" y="20.3" z="30.7"/>
    </node>
</simulation>"""

# Parse XML to tree (required by mask_values)
xml_tree = etree.fromstring(SAMPLE_XML.encode())

# Save to file (UTF-8, matching the encoding declared in the XML header)
with open('sample_input.xml', 'w', encoding='utf-8') as f:
    f.write(SAMPLE_XML)

print("=== ORIGINAL DATA (PROPRIETARY) ===")
print(SAMPLE_XML)

=== ORIGINAL DATA (PROPRIETARY) ===
<?xml version="1.0" encoding="UTF-8"?>
<simulation>
    <metadata>
        <id>sim-001</id>
        <type>thermal_analysis</type>
    </metadata>
    <element id="e1">
        <pressure>123.45</pressure>
        <temperature>500.0</temperature>
        <velocity>25.5</velocity>
    </element>
    <element id="e2">
        <pressure>678.90</pressure>
        <temperature>600.0</temperature>
        <velocity>30.2</velocity>
    </element>
    <node id="n1">
        <coordinates x="10.5" y="20.3" z="30.7"/>
    </node>
</simulation>


## 3. Value Masking

**Problem:** Numeric values in your data could reveal proprietary information.

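**Solution:** Replace numeric element text with opaque `VAL_` placeholders (a `VAL_` prefix followed by 12 hexadecimal characters) and store the originals in a vault. In the output below, attribute values such as `x`, `y`, and `z` on `<coordinates>` are left unchanged.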

In [4]:
# Configure masking
masking_config = MaskingConfig()

# Create a temporary vault
vault = Vault('demo_vault.json')

# Apply masking (pass the tree, not the string)
masked_tree = mask_values(xml_tree, masking_config, vault)
masked_xml = etree.tostring(masked_tree, encoding='unicode')

print("=== AFTER VALUE MASKING ===")
print(masked_xml)

# Save vault
vault.save()
print(f"\nVault saved with {len(vault)} entries")

=== AFTER VALUE MASKING ===
<simulation>
    <metadata>
        <id>sim-001</id>
        <type>thermal_analysis</type>
    </metadata>
    <element id="e1">
        <pressure>VAL_4d15112a74ec</pressure>
        <temperature>VAL_15cefbc946a6</temperature>
        <velocity>VAL_11bed83ed4d5</velocity>
    </element>
    <element id="e2">
        <pressure>VAL_31d618c62a81</pressure>
        <temperature>VAL_89e3926c3005</temperature>
        <velocity>VAL_cc9105793f93</velocity>
    </element>
    <node id="n1">
        <coordinates x="10.5" y="20.3" z="30.7"/>
    </node>
</simulation>

Vault saved with 6 entries
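The masking step above can be approximated with the standard library alone. This is a minimal sketch, not Molt-Shield's implementation: `mask_numeric_text`, the `NUMERIC` pattern, and the plain dict used as a vault are all hypothetical stand-ins, and the placeholder format is only inferred from the output shown above.

```python
import re
import uuid
import xml.etree.ElementTree as ET

# Assumption: "numeric" means an optionally signed integer or decimal.
NUMERIC = re.compile(r"^\s*[-+]?\d+(\.\d+)?\s*$")

def mask_numeric_text(root, vault):
    """Replace numeric element text with a placeholder and record the original.

    Hypothetical helper: attributes and non-numeric text are left untouched.
    """
    for elem in root.iter():
        if elem.text and NUMERIC.match(elem.text):
            placeholder = f"VAL_{uuid.uuid4().hex[:12]}"
            vault[placeholder] = elem.text.strip()
            elem.text = placeholder
    return root

doc = ET.fromstring(
    "<element id='e1'><pressure>123.45</pressure>"
    "<coordinates x='10.5'/><id>sim-001</id></element>"
)
demo_vault = {}  # placeholder -> original value
mask_numeric_text(doc, demo_vault)
print(ET.tostring(doc, encoding="unicode"))
print(demo_vault)
```

Because the placeholders are random, the mapping in the vault is the only way back to the original values, which is why the vault must stay on the trusted side.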


## 4. Tag Shadowing

**Problem:** Tag names like `<pressure>`, `<temperature>` reveal what kind of data this is.

**Solution:** Map proprietary tag names to generic equivalents.

In [5]:
# Apply tag shadowing directly to the masked tree
_apply_tag_shadowing(masked_tree, DEFAULT_TAG_MAP)
shadowed_xml = etree.tostring(masked_tree, encoding='unicode')

print("=== AFTER TAG SHADOWING ===")
print(shadowed_xml)

print("\n=== TAG MAPPING APPLIED ===")
for original, shadowed in DEFAULT_TAG_MAP.items():
    print(f"  {original} → {shadowed}")

=== AFTER TAG SHADOWING ===
<simulation>
    <metadata>
        <id>sim-001</id>
        <type>thermal_analysis</type>
    </metadata>
    <element id="e1">
        <metric_alpha>VAL_4d15112a74ec</metric_alpha>
        <thermal_beta>VAL_15cefbc946a6</thermal_beta>
        <kinematic_gamma>VAL_11bed83ed4d5</kinematic_gamma>
    </element>
    <element id="e2">
        <metric_alpha>VAL_31d618c62a81</metric_alpha>
        <thermal_beta>VAL_89e3926c3005</thermal_beta>
        <kinematic_gamma>VAL_cc9105793f93</kinematic_gamma>
    </element>
    <node id="n1">
        <spatial_delta x="10.5" y="20.3" z="30.7"/>
    </node>
</simulation>

=== TAG MAPPING APPLIED ===
  pressure → metric_alpha
  temperature → thermal_beta
  velocity → kinematic_gamma
  coordinates → spatial_delta
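Conceptually, tag shadowing is a dictionary lookup applied to every element name. The sketch below uses the standard library and a mapping copied from the output above; `shadow_tags` is a hypothetical helper and may differ from what `_apply_tag_shadowing` does internally.

```python
import xml.etree.ElementTree as ET

# Mapping copied from the "TAG MAPPING APPLIED" output above.
TAG_MAP = {
    "pressure": "metric_alpha",
    "temperature": "thermal_beta",
    "velocity": "kinematic_gamma",
    "coordinates": "spatial_delta",
}

def shadow_tags(root, tag_map):
    """Rename every element whose tag appears in tag_map; leave others as-is."""
    for elem in root.iter():
        elem.tag = tag_map.get(elem.tag, elem.tag)
    return root

doc = ET.fromstring(
    "<element id='e1'><pressure>VAL_4d15112a74ec</pressure>"
    "<coordinates x='10.5'/></element>"
)
shadow_tags(doc, TAG_MAP)
print(ET.tostring(doc, encoding="unicode"))
```

Tags that are not in the mapping, such as `<element>`, keep their names, which matches the output above.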


## 5. Sibling Shuffling

**Problem:** The order of sibling elements can itself carry information.

**Solution:** Randomize the order of sibling elements; a fixed seed keeps the result reproducible.

In [6]:
# Configure shuffling
shuffling_config = ShufflingConfig(seed=42)  # Fixed seed for reproducibility

# Apply shuffling
shuffled_xml = shuffle_siblings(shadowed_xml, shuffling_config)

print("=== AFTER SIBLING SHUFFLING ===")
print(shuffled_xml)

=== AFTER SIBLING SHUFFLING ===
<simulation>
    <metadata>
        <id>sim-001</id>
        <type>thermal_analysis</type>
    </metadata>
    <element id="e1">
        <thermal_beta>VAL_15cefbc946a6</thermal_beta>
        <metric_alpha>VAL_4d15112a74ec</metric_alpha>
        <kinematic_gamma>VAL_11bed83ed4d5</kinematic_gamma>
    </element>
    <element id="e2">
        <kinematic_gamma>VAL_cc9105793f93</kinematic_gamma>
    <thermal_beta>VAL_89e3926c3005</thermal_beta>
        <metric_alpha>VAL_31d618c62a81</metric_alpha>
        </element>
    <node id="n1">
        <spatial_delta x="10.5" y="20.3" z="30.7"/>
    </node>
</simulation>
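The idea behind seeded shuffling can be sketched with the standard library. `shuffle_children` is a hypothetical helper, not the library's `shuffle_siblings`, and it assumes that only the children of a named parent tag are reordered.

```python
import random
import xml.etree.ElementTree as ET

def shuffle_children(root, parent_tag, seed=None):
    """Shuffle the direct children of every <parent_tag> element in place."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    for parent in root.iter(parent_tag):
        children = list(parent)
        rng.shuffle(children)
        parent[:] = children  # reassign the children in their new order
    return root

SRC = "<simulation><element id='e1'><a>1</a><b>2</b><c>3</c></element></simulation>"
first = shuffle_children(ET.fromstring(SRC), "element", seed=42)
second = shuffle_children(ET.fromstring(SRC), "element", seed=42)
order_first = [child.tag for child in first.find("element")]
order_second = [child.tag for child in second.find("element")]
print(order_first, order_first == order_second)
```

In ElementTree-style APIs, each element carries its trailing whitespace (`tail`) with it when it moves. That is one plausible explanation for the uneven indentation in the shuffled output above; the XML content itself is unaffected.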


## 6. Full Pipeline

Now let's run the complete gatekeeper pipeline using the Policy engine.

In [7]:
# Create a policy
policy = Policy(
    version="1.0",
    global_masking=True,
    rules=[
        Rule(tag_pattern="pressure", action="mask_value"),
        Rule(tag_pattern="temperature", action="mask_value"),
        Rule(tag_pattern="velocity", action="mask_value"),
        Rule(tag_pattern="coordinates", action="mask_value"),
        Rule(tag_pattern="element", action="shuffle_siblings"),
    ]
)

# Load config
config = load_config('config/default.yaml')

# Apply full pipeline
input_path = Path('sample_input.xml')
sanitized_path, vault_path = apply_gatekeeper(input_path, policy, config)

print("=== FULL PIPELINE RESULT ===\n")
print("--- SANITIZED OUTPUT (SAFE TO SHARE) ---")
print(sanitized_path.read_text())

=== FULL PIPELINE RESULT ===

--- SANITIZED OUTPUT (SAFE TO SHARE) ---
<?xml version='1.0' encoding='UTF-8'?>
<simulation>
    <metadata>
        <id>sim-001</id>
        <type>thermal_analysis</type>
    </metadata>
    <element id="e1">
        <thermal_beta>VAL_f3072ae6b1e4</thermal_beta>
        <kinematic_gamma>VAL_c31c650f074e</kinematic_gamma>
    <metric_alpha>VAL_9eb85924bee5</metric_alpha>
        </element>
    <element id="e2">
        <metric_alpha>VAL_96e8cbbdea36</metric_alpha>
        <kinematic_gamma>VAL_d6fc6ec4d194</kinematic_gamma>
    <thermal_beta>VAL_f4c522ea4135</thermal_beta>
        </element>
    <node id="n1">
        <spatial_delta x="10.5" y="20.3" z="30.7"/>
    </node>
</simulation>
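The pipeline output above still contains numeric attribute values on `<spatial_delta>`. A quick check like the following can flag such leftovers before a file is shared. It is only a sketch: `find_numeric_leaks` and the `NUMERIC` pattern are hypothetical and not part of Molt-Shield.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: "numeric" means an optionally signed integer or decimal.
NUMERIC = re.compile(r"^\s*[-+]?\d+(\.\d+)?\s*$")

def find_numeric_leaks(xml_text):
    """Return (tag, location, value) for each numeric text or attribute value."""
    leaks = []
    for elem in ET.fromstring(xml_text).iter():
        if elem.text and NUMERIC.match(elem.text):
            leaks.append((elem.tag, "text", elem.text.strip()))
        for name, value in elem.attrib.items():
            if NUMERIC.match(value):
                leaks.append((elem.tag, name, value))
    return leaks

# Excerpt modeled on the sanitized output above.
sanitized = """<simulation>
  <element id="e1"><metric_alpha>VAL_9eb85924bee5</metric_alpha></element>
  <node id="n1"><spatial_delta x="10.5" y="20.3" z="30.7"/></node>
</simulation>"""

for leak in find_numeric_leaks(sanitized):
    print(leak)
```

On this excerpt the check reports the three coordinate attributes and nothing else, since masked text and identifiers such as `e1` are not numeric.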



## 7. Rehydration

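After AI analysis, restore original values from the vault.

Note that the placeholders in the simulated AI output below (`VAL_abc123`, `VAL_def456`) are made up and do not exist in the vault, so the rehydrated XML comes back unchanged. In a real run, the AI output would contain placeholders from the sanitized file, such as those listed in the vault contents at the end of the cell output.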

In [8]:
# Load the vault created by apply_gatekeeper
vault = Vault(vault_path)
vault.load()

# Simulated AI output (using masked values)
ai_output_xml = """
<element id="e1">
    <metric_alpha>VAL_abc123</metric_alpha>
    <thermal_beta>VAL_def456</thermal_beta>
</element>"""

print("=== REHYDRATION DEMO ===\n")
print("AI returned (masked):")
print(ai_output_xml)

# Rehydrate the XML
rehydrated_xml = vault.rehydrate_xml(ai_output_xml)

print("\nAfter rehydration (original values restored):")
print(rehydrated_xml)

# Show vault contents
print(f"\nVault contains {len(vault)} entries:")
for masked, entry in list(vault.entries.items())[:5]:
    print(f"  {masked} → {entry.original_value}")

=== REHYDRATION DEMO ===

AI returned (masked):

<element id="e1">
    <metric_alpha>VAL_abc123</metric_alpha>
    <thermal_beta>VAL_def456</thermal_beta>
</element>

After rehydration (original values restored):

<element id="e1">
    <metric_alpha>VAL_abc123</metric_alpha>
    <thermal_beta>VAL_def456</thermal_beta>
</element>

Vault contains 6 entries:
  VAL_9eb85924bee5 → 123.45
  VAL_f3072ae6b1e4 → 500.0
  VAL_c31c650f074e → 25.5
  VAL_96e8cbbdea36 → 678.90
  VAL_f4c522ea4135 → 600.0
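Rehydration amounts to a lookup-and-replace over placeholders. The sketch below uses a plain dict and a regular expression; `rehydrate` and `PLACEHOLDER` are hypothetical and only approximate what `vault.rehydrate_xml` appears to do. It also shows why the demo output above is unchanged: placeholders that are not in the mapping pass through as they are.

```python
import re

# Assumption: placeholders look like "VAL_" followed by hex characters.
PLACEHOLDER = re.compile(r"VAL_[0-9a-f]+")

def rehydrate(text, mapping):
    """Replace known placeholders with their originals; keep unknown ones."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)

# Two entries copied from the vault listing above.
mapping = {"VAL_9eb85924bee5": "123.45", "VAL_f3072ae6b1e4": "500.0"}

known = "<metric_alpha>VAL_9eb85924bee5</metric_alpha>"
unknown = "<metric_alpha>VAL_abc123</metric_alpha>"

print(rehydrate(known, mapping))    # <metric_alpha>123.45</metric_alpha>
print(rehydrate(unknown, mapping))  # <metric_alpha>VAL_abc123</metric_alpha>
```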


## Summary

Molt-Shield transforms proprietary data into anonymized data that AI can analyze:

| Step | What It Does | Example |
|------|--------------|---------|
| Value Masking | Replace numeric element text with random placeholders | `123.45` → `VAL_4d15112a74ec` |
| Tag Shadowing | Rename proprietary tags | `<pressure>` → `<metric_alpha>` |
| Sibling Shuffle | Randomize the order of sibling elements | `[pressure, temperature, velocity]` → `[temperature, pressure, velocity]` |
| Vault Storage | Save originals for later rehydration | `VAL_4d15112a74ec` → `123.45` |

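The AI receives data that keeps its structure while numeric element values and proprietary tag names are hidden. In this demo, the `<metadata>` block and the attribute values on `<coordinates>` pass through unchanged.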