Skip to content

KeyaAggarwal/biohackathon

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

7 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

SynthShield: DNA Synthesis Security System

A Comprehensive Biosecurity Platform that screens DNA synthesis orders for dangerous pathogens using 8-layer detection architecture with AI, cryptography, and blockchain integration.

Executive Summary

SynthShield is a production-ready DNA synthesis security system that protects against:

  • Dangerous pathogens (AI screening with ESM-2 protein language model)
  • Semantic evasion attacks (5-method ensemble: reverse complement, frame shifts, junk interleaving, codon optimization, synthetic patterns)
  • Split-order reassembly attacks (temporal Edison Guard with rolling buffer)
  • Tampering & fraud (HMAC-chained cryptographic logging with Merkle trees)
  • Unauthorized synthesis (hardware-enforced token verification with TPM)
  • Regulatory compliance (immutable blockchain audit trail on Ethereum L2)

Detection Rate: 85%+ across all attack types
Processing Time: 300-400ms per order
Entry Point: Single function call with complete results


Table of Contents


Quick Start

Installation

# Clone repository
git clone https://github.com/your-org/synthshield.git
cd synthshield

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r pyproject.toml

# Verify installation
python -c "from synthshield.pipeline import SynthShieldPipeline; print('βœ“ Ready')"

Basic Usage (30 seconds)

from synthshield.pipeline import SynthShieldPipeline, SynthesisDecision

# Initialize pipeline (one-time setup)
pipeline = SynthShieldPipeline(
    hardware_id="SYNTH-LAB-001",
    toxin_references=["ATCGATCGATCGATCG", "GCTAGCTAGCTAGCTA"],  # Your dangerous sequences
    use_blockchain=True,
    enable_edison_guard=True
)

# Process a DNA synthesis order
result = pipeline.process_synthesis_order(
    dna_sequence="ATCGATCGATCGATCGATCG",
    metadata={'customer': 'Research Lab', 'order_id': 'ORD-001'}
)

# Get results
print(f"Decision: {result.decision}")                      # APPROVED/BLOCKED/REVIEW
print(f"Risk Score: {result.risk_scores.combined:.1%}")   # 0-100%
print(f"Hardware Auth: {result.hardware_authorized}")     # True/False
print(f"Processing Time: {result.processing_time_ms:.0f}ms")

# Access detailed information
if result.decision == SynthesisDecision.BLOCKED:
    print(f"Reason: {result.decision_reasoning}")
    print(f"Recommendations:")
    for rec in result.recommendations:
        print(f"  - {rec}")

πŸ—οΈ Architecture

8-Layer Detection Pipeline

DNA Input
  ↓
[Layer 1] Embedding Generation (ESM-2) β†’ 1280-dimensional vector
  ↓
[Layer 2] Evasion Detection (5 methods) β†’ Semantic attack scoring
  ↓
[Layer 3] Neural Screening (Sentinel Head) β†’ Risk score + token
  ↓
[Layer 4] Ensemble Decision β†’ APPROVED/BLOCKED/REVIEW
  ↓ (if APPROVED)
[Layer 5] Fragment Management (Edison Guard) β†’ Split-order detection
  ↓
[Layer 6] Cryptographic Logging (Black Box) β†’ HMAC-chained audit
  ↓
[Layer 7] L2 Blockchain Anchoring β†’ Immutable record
  ↓
[Layer 8] Hardware Interlock β†’ Valve authorization
  ↓
Complete Result (decision, risks, audit trail, blockchain, hardware status)

Key Components

Component File Purpose Input Output
Embedding core/embeddings.py ESM-2 protein representation DNA sequence 1280D vector
Evasion Detector core/evasion_detection.py 5-method attack ensemble DNA, reference toxins Risk score, attacks found
Sentinel Head core/sentinel_head.py Neural risk scorer Embedding, sequence hash Risk score, HMAC token
Screener core/screening.py Threshold-based decision Risk scores APPROVED/BLOCKED/REVIEW
Edison Guard hardware/edison_window.py Split-order detector DNA fragments, timestamp Buffer status, threat flag
Black Box hardware/blackbox.py Cryptographic logging All results Chain hash, Merkle root
L2 Anchor blockchain/ethereum_anchor.py Blockchain submission Merkle root Transaction hash
Interlock hardware/interlock.py Hardware valve control HMAC token Valve state

Code Structure

synthshield/
β”œβ”€β”€ pipeline.py                     [MAIN: Unified orchestrator - START HERE]
β”‚
β”œβ”€β”€ core/                           [AI Screening & Orchestration]
β”‚   β”œβ”€β”€ embeddings.py                  β€’ ESM-2 embedding wrapper
β”‚   β”œβ”€β”€ sentinel_head.py               β€’ Residual MLP risk scorer (generates tokens)
β”‚   β”œβ”€β”€ screening.py                   β€’ Threshold-based decision logic
β”‚   β”œβ”€β”€ evasion_detection.py           β€’ 5-method semantic attack ensemble
β”‚   β”œβ”€β”€ enhanced_edison.py             β€’ Edison Guard with evasion integration
β”‚   β”œβ”€β”€ trained_classifier.py          β€’ Research-based ML classifier
β”‚   β”œβ”€β”€ forensic_orchestrator.py       β€’ 6-stage orchestration (legacy)
β”‚   β”œβ”€β”€ notebook_integration.py        β€’ Bridge notebook researchβ†’production
β”‚   └── demo_l2_integration.py         β€’ L2 blockchain demo
β”‚
β”œβ”€β”€ hardware/                       [Hardware & Cryptography]
β”‚   β”œβ”€β”€ blackbox.py                    β€’ HMAC-SHA256 event chaining
β”‚   β”œβ”€β”€ edison_window.py               β€’ Rolling buffer (50kb) + reassembly detection
β”‚   β”œβ”€β”€ interlock.py                   β€’ Solenoid valve control with token verification
β”‚   └── demo_edison.py                 β€’ Edison Guard demo
β”‚
β”œβ”€β”€ blockchain/                     [L2 Blockchain Integration]
β”‚   └── ethereum_anchor.py             β€’ Optimism/Arbitrum/Base L2 submission
β”‚
β”œβ”€β”€ audit/                          [Forensic & Compliance]
β”‚   └── verify_chain.py                β€’ Chain integrity verification
β”‚
β”œβ”€β”€ data/                           [Datasets]
β”‚   └── datasets.py                    β€’ Dataset utilities
β”‚
β”œβ”€β”€ detection/                      [Legacy: Split-order Detection]
β”‚   └── (files moved to hardware/)
β”‚
└── frontend/                       [Web Dashboard]
    └── App.jsx                        β€’ React frontend (optional)

esm_biosecurity_screening.ipynb contains the training and evaluation for our ESM classifier.

How to Implement

Step 1: Prepare Toxin References

# Load your lab's known dangerous sequences
toxin_references = [
    "ATCGATCGATCGATCGATCGATCG",  # Botulinum toxin
    "GCTAGCTAGCTAGCTAGCTAGCTA",  # Ricin toxin
    "TTAATTAATTAATTAATTAATTAA",  # Anthrax-like
    # ... load from database or file
]

Step 2: Initialize Pipeline

from synthshield.pipeline import SynthShieldPipeline

pipeline = SynthShieldPipeline(
    hardware_id="YOUR_HARDWARE_ID",
    toxin_references=toxin_references,
    use_blockchain=True,              # Enable L2 anchoring
    use_trained_classifier=True,      # Use ML ensemble
    enable_edison_guard=True,         # Enable split-order detection
    enable_logging=True,              # Enable Black Box logging
    mock_blockchain=False             # Use real L2 (or True for testing)
)

Step 3: Process Orders

# For each DNA synthesis request
result = pipeline.process_synthesis_order(
    dna_sequence=customer_dna,
    metadata={
        'customer': 'Customer Name',
        'lab': 'Lab ID',
        'order_id': 'ORD-123',
        'timestamp': time.time()
    }
)

Step 4: Handle Results

from synthshield.pipeline import SynthesisDecision

if result.decision == SynthesisDecision.APPROVED:
    # Synthesis approved
    # - Hardware token sent automatically
    # - Solenoid valve opens
    # - Synthesis proceeds
    log_to_lims(result)
    
elif result.decision == SynthesisDecision.BLOCKED:
    # Synthesis blocked
    # - Token not generated
    # - Hardware stays closed
    # - Alert customer
    notify_customer(result.decision_reasoning)
    
else:  # REVIEW
    # Manual review needed
    # - Moderate risk detected
    # - Contact security team
    escalate_to_security_team(result)

Step 5: Access Audit Trail

# Complete audit trail available
for stage in result.audit_trail:
    print(f"Stage {stage['stage']}: {stage['name']} - {stage['status']}")
    print(f"  Timestamp: {stage['timestamp']}")

# Export for compliance
audit_json = result.to_json()
save_audit_log(result.sequence_hash, audit_json)

# Blockchain record
if result.blockchain_record:
    tx_hash = result.blockchain_record.get('tx_hash')
    print(f"Immutable record on L2: {tx_hash}")

API Reference

SynthShieldPipeline

Constructor:

SynthShieldPipeline(
    hardware_id: str,                    # Hardware identifier
    toxin_references: List[str] = [],    # Known dangerous sequences
    use_blockchain: bool = False,        # Enable L2 anchoring
    use_trained_classifier: bool = True, # Use ML classifier
    trained_classifier_path: Optional[str] = None,
    enable_edison_guard: bool = True,    # Enable split-order detection
    enable_logging: bool = True,         # Enable Black Box
    mock_blockchain: bool = False,       # Use mock L2 for testing
    tpm_secret: bytes = b"..."          # TPM secret key
)

Main Method:

def process_synthesis_order(
    dna_sequence: str,                   # DNA to screen
    metadata: Optional[Dict] = None,     # Order metadata
    is_fragment: bool = False,           # Fragment for Edison Guard
    fragment_id: Optional[str] = None    # Fragment identifier
) -> SynthesisResult

Result Object:

SynthesisResult:
  .decision                 β†’ SynthesisDecision (APPROVED/BLOCKED/REVIEW)
  .risk_scores              β†’ RiskScores (evasion, neural, combined)
  .risk_level               β†’ RiskLevel (LOW/MEDIUM/HIGH/CRITICAL)
  .evasion_details          β†’ EvasionDetails (attack types found)
  .neural_screening_result  β†’ Dict (risk details)
  .hardware_authorized      β†’ bool (valve opened?)
  .valve_state              β†’ str ('OPEN'|'CLOSED'|'ERROR')
  .block_hash               β†’ str (Black Box chain hash)
  .blockchain_record        β†’ Dict (L2 submission details)
  .audit_trail              β†’ List[Dict] (all 8 stages)
  .processing_time_ms       β†’ float (total time)
  .to_dict()                β†’ Dict (JSON-serializable)
  .to_json()                β†’ str (JSON string)

About

A Comprehensive Biosecurity Platform that screens DNA synthesis orders for dangerous pathogens using 8-layer detection architecture with AI, cryptography, and blockchain integration.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors