# ISO/IEC 27002:2022 CONTROL ASSESSMENT - REVISED

**REVISION NOTES:**
- Added ALL 13 ISO controls from thesis Appendix E.4
- Aligned assessment functions with indicator intent from thesis
- Maintained A.8.23 and A.8.26 as they provide relevant assessment data
- Total: 15 controls assessed

Analysis 3: Comprehensive assessment of ISO 27002 controls for identified CPSS devices.

Input:
- cpss_all_services_enhanced.csv (from analyses_2)

Output:
- cpss_iso27002_assessment.csv
- cpss_iso27002_heatmap.png
- cpss_iso27002_report.txt
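
Since every assessment function reads specific columns from the input file, a quick column check before running the assessment can surface a mismatched export early. The following is a minimal sketch only: the helper name `load_services` and the fail-fast behaviour are assumptions for illustration, and the column list is a small subset of the fields documented later in this notebook.

```python
import pandas as pd

# Subset of the fields listed in the assessment cell; extend as needed.
# The helper name and the fail-fast behaviour are illustrative assumptions.
REQUIRED_COLUMNS = ['service.port', 'service.banner', 'service.http.title']


def load_services(path_or_buffer, required=REQUIRED_COLUMNS):
    """Load the services CSV and raise if an expected column is missing."""
    df = pd.read_csv(path_or_buffer, low_memory=False)
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f"Input file is missing expected columns: {missing}")
    return df
```

In the notebook this would be called as `load_services(INPUT_FILE)`, assuming the configuration cell has already been run.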

ISO 27002:2022 Controls Assessed (aligned with Thesis Appendix E.4):

**Access Control Domain:**
- A.5.15: Access control
- A.8.2: Privileged access rights
- A.8.3: Information access restriction
- A.8.5: Secure authentication

**Configuration Domain:**
- A.8.9: Configuration management

**Network Domain:**
- A.8.20: Network security
- A.8.21: Security of network services
- A.8.22: Segregation of networks
- A.8.23: Web filtering (supplementary)

**Cryptography Domain:**
- A.8.24: Use of cryptography

**Patch & Lifecycle Domain:**
- A.8.8: Management of technical vulnerabilities

**Logging Domain:**
- A.8.16: Monitoring activities

**Cloud Domain:**
- A.5.23: Information security for use of cloud services

**Supply Chain Domain:**
- A.5.19: Information security in supplier relationships

**Application Domain:**
- A.8.26: Application security requirements (supplementary)


## Configuration

In [47]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from datetime import datetime, timedelta
import re

# ============================================================================
# CONFIGURATION - ENHANCED
# ============================================================================

# Input/Output paths
INPUT_FILE = Path('./output/2_cpss_identification/cpss_all_services_enhanced.csv')
OUTPUT_DIR = Path('./output/3_iso27002_assessment')
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# ISO 27002:2022 Control Definitions (Complete mapping from Thesis)
ISO_CONTROLS = {
    # Access Control Domain
    'A.5.15': {
        'name': 'Access control',
        'description': 'Rules to control physical and logical access',
        'domain': 'Access Control',
        'interpretation': 'Access control mechanisms ensure only authorized entities can access CPSS resources. Non-compliance indicates inadequate authentication, authorization, or access management, enabling unauthorized physical or logical access to critical infrastructure.'
    },
    'A.8.2': {
        'name': 'Privileged access rights',
        'description': 'Allocation and use of privileged access rights',
        'domain': 'Access Control',
        'interpretation': 'Privileged access management prevents misuse of administrative credentials in CPSS. Weaknesses allow attackers to escalate privileges, manipulate system configurations, or disable security controls entirely.'
    },
    'A.8.3': {
        'name': 'Information access restriction',
        'description': 'Access to information and other associated assets',
        'domain': 'Access Control',
        'interpretation': 'Information access restrictions protect sensitive CPSS data and configurations. Non-compliance exposes operational data, credentials, and system architecture to unauthorized parties.'
    },
    'A.8.5': {
        'name': 'Secure authentication',
        'description': 'Secure authentication technologies and procedures',
        'domain': 'Access Control',
        'interpretation': 'Secure authentication prevents unauthorized CPSS access through robust credential verification. Weak or default credentials represent the most common attack vector against deployed CPSS infrastructure.'
    },
    # Configuration Domain
    'A.8.9': {
        'name': 'Configuration management',
        'description': 'Configurations including security configurations',
        'domain': 'Configuration',
        'interpretation': 'Configuration management ensures CPSS devices maintain secure baseline settings. Insecure configurations create persistent vulnerabilities that attackers can exploit without requiring sophisticated techniques.'
    },
    # Network Domain
    'A.8.20': {
        'name': 'Network security',
        'description': 'Security of networks and network services',
        'domain': 'Network',
        'interpretation': 'Network security controls protect CPSS communication channels from interception and manipulation. Inadequate network security enables man-in-the-middle attacks and unauthorized network access.'
    },
    'A.8.21': {
        'name': 'Security of network services',
        'description': 'Security mechanisms, service levels, requirements',
        'domain': 'Network',
        'interpretation': 'Network service security ensures CPSS services operate within defined security parameters. Non-compliance indicates services running without adequate protection or monitoring.'
    },
    'A.8.22': {
        'name': 'Segregation of networks',
        'description': 'Groups of information services, users and systems',
        'domain': 'Network',
        'interpretation': 'Network segregation isolates CPSS from general IT networks, limiting lateral movement in security breaches. Lack of segregation allows compromise of CPSS through adjacent systems.'
    },
    'A.8.23': {
        'name': 'Web filtering',
        'description': 'Access to external websites managed',
        'domain': 'Network',
        'interpretation': 'Web filtering prevents CPSS devices from accessing malicious external resources. Unfiltered access enables command-and-control communication and malware download.'
    },
    # Cryptography Domain
    'A.8.24': {
        'name': 'Use of cryptography',
        'description': 'Rules for effective use of cryptography',
        'domain': 'Cryptography',
        'interpretation': 'Cryptographic controls protect CPSS data confidentiality and integrity. Weak or absent encryption exposes sensitive operational data and enables undetected manipulation.'
    },
    # Patch & Lifecycle Domain
    'A.8.8': {
        'name': 'Management of technical vulnerabilities',
        'description': 'Information about technical vulnerabilities',
        'domain': 'Patch & Lifecycle',
        'interpretation': 'Vulnerability management ensures CPSS software remains patched against known exploits. Unpatched systems are prime targets for automated attacks and ransomware deployment.'
    },
    # Logging Domain
    'A.8.16': {
        'name': 'Monitoring activities',
        'description': 'Networks, systems and applications monitored',
        'domain': 'Logging',
        'interpretation': 'Activity monitoring enables detection of unauthorized CPSS access and anomalous behavior. Absent monitoring prevents incident detection and forensic investigation.'
    },
    # Cloud Domain
    'A.5.23': {
        'name': 'Information security for use of cloud services',
        'description': 'Processes for acquisition, use, management and exit',
        'domain': 'Cloud',
        'interpretation': 'Cloud security controls govern CPSS services deployed in cloud environments. Non-compliance creates shared responsibility gaps and data sovereignty issues.'
    },
    # Supply Chain Domain
    'A.5.19': {
        'name': 'Information security in supplier relationships',
        'description': 'Security in supplier relationships',
        'domain': 'Supply Chain',
        'interpretation': 'Supply chain security addresses risks from CPSS vendors and service providers. Weak vendor controls enable supply chain attacks and introduce systemic vulnerabilities.'
    },
    # Application Domain
    'A.8.26': {
        'name': 'Application security requirements',
        'description': 'Information security requirements for applications',
        'domain': 'Application',
        'interpretation': 'Application security ensures CPSS software is developed and deployed securely. Insecure applications introduce code-level vulnerabilities exploitable through multiple attack vectors.'
    }
}

# CPSS Type-Specific Risk Scenario Mapping (Impact Indicator)
SCENARIO_CONTROLS = {
    'EACS': ['A.5.15', 'A.8.2', 'A.8.5', 'A.8.9', 'A.8.22', 'A.8.20'],
    'VSS': ['A.8.24', 'A.8.21', 'A.8.8', 'A.8.5', 'A.8.9'],
    'IHAS': ['A.5.15', 'A.8.5', 'A.8.8', 'A.8.22', 'A.8.20', 'A.8.24', 'A.8.21']
}

# Risk scoring thresholds
RISK_THRESHOLDS = {
    'critical': 8.0,
    'high': 6.0,
    'medium': 4.0,
    'low': 0.0
}

# Risk level determination criteria
RISK_LEVELS = {
    'Low': {'noncompliance_pct': (0, 20), 'kev_required': False, 'description': '<20% non-compliance, no KEV CVEs'},
    'Medium': {'noncompliance_pct': (20, 50), 'kev_required': False, 'description': '20-50% non-compliance or isolated KEV CVEs'},
    'High': {'noncompliance_pct': (50, 100), 'kev_required': True, 'description': '>50% non-compliance + KEV CVEs + scenario relevance'}
}

# Color scheme - UPDATED
COLORS = {
    'primary': '#A02B93',      # Primary color for charts
    'critical': '#DC2626',      # Red
    'high': '#EA580C',          # Orange  
    'medium': '#F59E0B',        # Amber
    'low': '#10B981',           # Green
    'compliant': '#059669',     # Dark green
    'background': '#FFFFFF',    # White background
}

# Known default credential vendors (from thesis research)
DEFAULT_CRED_VENDORS = ['hikvision', 'dahua', 'axis', 'mobotix', 'hanwha', 'geovision',
                        'nedap', 'paxton', 'genetec', 'salto', 'assa', 'ajax', 
                        'vanderbilt', 'honeywell', 'bosch']

# Cloud providers (for A.5.23)
CLOUD_PROVIDERS = ['aws', 'amazon', 'azure', 'microsoft', 'google', 'cloudflare', 
                   'digitalocean', 'linode', 'ovh']

# Management ports (for A.8.20)
MANAGEMENT_PORTS = [22, 23, 3389, 5900, 5800]  # SSH, Telnet, RDP, VNC, VNC over HTTP

# CPSS specific ports
RTSP_PORTS = [554, 8554]  # VSS
ACCESS_CONTROL_PORTS = [80, 443, 502, 4840]  # EACS (HTTP, HTTPS, Modbus, OPC-UA)
IAS_PORTS = [80, 443, 8080, 9000]  # IHAS

print("Enhanced configuration loaded")
print(f"Controls defined: {len(ISO_CONTROLS)}")
print(f"CPSS types with scenario mapping: {list(SCENARIO_CONTROLS.keys())}")


Enhanced configuration loaded
Controls defined: 15
CPSS types with scenario mapping: ['EACS', 'VSS', 'IHAS']
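
The `RISK_THRESHOLDS` values above can be read as inclusive lower bounds on a 0-10 risk score. That reading is an assumption, as the scoring code itself is not shown in this part of the notebook. Under that assumption, a score could be mapped to a band roughly as follows; `classify_score` is a hypothetical helper, not a function defined elsewhere in the notebook.

```python
# Values copied from the configuration cell above.
RISK_THRESHOLDS = {'critical': 8.0, 'high': 6.0, 'medium': 4.0, 'low': 0.0}


def classify_score(score, thresholds=RISK_THRESHOLDS):
    """Return the label of the highest threshold that the score reaches."""
    # Walk the bands from the highest minimum to the lowest.
    ordered = sorted(thresholds.items(), key=lambda item: item[1], reverse=True)
    for label, minimum in ordered:
        if score >= minimum:
            return label
    return 'low'  # Scores below every threshold fall back to the lowest band


for score in (9.1, 6.0, 4.5, 1.2):
    print(score, classify_score(score))
```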


## Assessment Functions

All ISO 27002:2022 control assessment functions with detailed findings and risk indicators.
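
Each assessment function follows the same contract: it receives one row of the services table and returns a `pd.Series` with a boolean `<control>_noncompliant` flag and a `<control>_details` string. The toy example below illustrates that pattern on synthetic data. `assess_demo` and the `DEMO_` column prefix are made up for illustration and are not among the assessed controls.

```python
import pandas as pd


def assess_demo(row):
    """Toy check that follows the notebook's pattern (not an ISO control)."""
    findings = []
    if row.get('service.port', 0) == 23:
        findings.append("Telnet exposed")
    return pd.Series({
        'DEMO_noncompliant': len(findings) > 0,
        'DEMO_details': '; '.join(findings),
    })


# Two synthetic services: one Telnet, one HTTPS.
df = pd.DataFrame({'service.port': [23, 443]})

# Row-wise apply yields one result column per Series key;
# concat appends those columns to the original table.
result = pd.concat([df, df.apply(assess_demo, axis=1)], axis=1)
print(result)
```

Because every function returns the same two-column shape, the orchestration step can loop over the functions and aggregate all `*_noncompliant` columns without special cases.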

In [48]:

# ============================================================================
# ASSESSMENT FUNCTIONS - ACCURATE WITH VERIFIED FIELD NAMES
# ============================================================================
# Based on actual available fields in cpss_all_services_enhanced.csv:
# ✓ service.http.body, service.http.title, service.http.status_code
# ✓ service.banner, service.port, service.protocol
# ✓ service.cves, service.cves_count, is_kev
# ✓ service.tls.* (supported_versions, is_self_signed, is_valid, etc.)
# ✓ service.fingerprints.* (vendor, product, version, tags, etc.)
# Note: service.http.headers missing - using banner + body as fallback
# ============================================================================

def assess_A_5_15(row):
    """
    A.5.15: Access control
    Indicators: Default credentials, authentication bypass
    """
    findings = []
    
    vendor = str(row.get('service.fingerprints.vendor', '')).lower()
    product = str(row.get('service.fingerprints.service.product', '')).lower()
    port = row.get('service.port', 0)
    banner = str(row.get('service.banner', '')).lower()
    http_title = str(row.get('service.http.title', '')).lower()
    http_body = str(row.get('service.http.body', '')).lower()
    http_status = row.get('service.http.status_code', 0)
    
    # 1. Default credential vendors (list defined once in the configuration cell)
    default_cred_vendors = DEFAULT_CRED_VENDORS
    
    if any(v in vendor or v in product for v in default_cred_vendors):
        findings.append(f"Vendor/product ({vendor or product}) known for default credentials (CWE-798)")
    
    # 2. Authentication bypass - CPSS interfaces without 401/403
    cpss_indicators = ['dvr', 'camera', 'login', 'nvr', 'surveillance', 'video', 'recorder']
    if any(ind in http_title for ind in cpss_indicators):
        if http_status == 200 and '401' not in banner and '403' not in banner:
            findings.append(f"CPSS interface accessible without authentication: '{http_title[:40]}' (CWE-306)")
    
    # 3. Telnet
    if port == 23:
        findings.append("Telnet protocol - no secure access control (CWE-287)")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.5.15_noncompliant': noncompliant, 'A.5.15_details': details})


def assess_A_8_2(row):
    """
    A.8.2: Privileged access rights
    Indicators: Admin interfaces without MFA, exposed management ports
    """
    findings = []
    
    port = row.get('service.port', 0)
    http_title = str(row.get('service.http.title', '')).lower()
    http_body = str(row.get('service.http.body', '')).lower()
    banner = str(row.get('service.banner', '')).lower()
    
    # 1. Admin interface detection
    admin_indicators = ['admin', 'configuration', 'settings', 'management', 'control panel', 
                       'administrator', 'setup', 'config']
    is_admin = any(term in http_title for term in admin_indicators)
    
    if is_admin:
        # Check for MFA in body/banner
        mfa_indicators = ['otp', 'totp', '2fa', 'mfa', 'two-factor', 'multi-factor',
                         'authenticator', 'duo', 'authy']
        has_mfa = any(ind in http_body or ind in banner for ind in mfa_indicators)
        
        if not has_mfa:
            findings.append(f"Admin interface without MFA: '{http_title[:50]}' (CWE-308)")
    
    # 2. Management ports
    mgmt_ports = {22: 'SSH', 23: 'Telnet', 3389: 'RDP', 5900: 'VNC', 8080: 'HTTP-Alt'}
    if port in mgmt_ports:
        findings.append(f"{mgmt_ports[port]} management protocol on port {port} publicly accessible")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.2_noncompliant': noncompliant, 'A.8.2_details': details})


def assess_A_8_3(row):
    """
    A.8.3: Information access restriction
    Indicators: Directory exposure, information disclosure
    """
    findings = []
    
    banner = str(row.get('service.banner', '')).lower()
    http_body = str(row.get('service.http.body', '')).lower()
    http_title = str(row.get('service.http.title', '')).lower()
    
    # 1. Exposed directories/paths
    exposed_paths = ['/cgi-bin/', '/admin/', '/config/', '/api/', '/setup/', '/system/', 
                    '/backup/', '/upload/', '/logs/']
    found_paths = [path for path in exposed_paths if path in http_body]
    if found_paths:
        findings.append(f"Exposed paths: {', '.join(found_paths[:5])} (CWE-548)")
    
    # 2. Directory listing
    if 'index of' in http_title or 'index of' in http_body or 'parent directory' in http_body:
        findings.append("Directory listing enabled (CWE-548)")
    
    # 3. Server/version disclosure
    disclosure_patterns = ['server:', 'x-powered-by:', 'apache/', 'nginx/', 'iis/']
    if any(pattern in banner for pattern in disclosure_patterns):
        findings.append("Software disclosed in banner (CWE-200)")
    
    # 4. Sensitive files
    sensitive_files = ['.bak', '.backup', 'config.', '.conf', '.env', 'web.config']
    found_files = [f for f in sensitive_files if f in http_body]
    if found_files:
        findings.append(f"Sensitive file references: {', '.join(found_files[:3])} (CWE-552)")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.3_noncompliant': noncompliant, 'A.8.3_details': details})


def assess_A_8_5(row):
    """
    A.8.5: Secure authentication
    Indicators: Weak auth mechanisms, default credentials
    """
    findings = []
    
    vendor = str(row.get('service.fingerprints.vendor', '')).lower()
    product = str(row.get('service.fingerprints.service.product', '')).lower()
    port = row.get('service.port', 0)
    banner = str(row.get('service.banner', '')).lower()
    http_body = str(row.get('service.http.body', '')).lower()
    
    # 1. Default credentials (list defined once in the configuration cell)
    default_cred_vendors = DEFAULT_CRED_VENDORS
    if any(v in vendor or v in product for v in default_cred_vendors):
        findings.append(f"Vendor ({vendor or product}) ships with default credentials (CWE-798)")
    
    # 2. Weak authentication mechanisms
    if 'www-authenticate: basic' in banner or 'basic realm' in http_body.lower():
        findings.append("HTTP Basic Authentication (cleartext credentials) (CWE-522)")
    if 'www-authenticate: digest' in banner or 'digest realm' in http_body.lower():
        findings.append("HTTP Digest Authentication (weak hashing) (CWE-522)")
    
    # 3. Cleartext protocols
    if port == 23:
        findings.append("Telnet uses cleartext authentication (CWE-319)")
    if port == 21:
        findings.append("FTP uses cleartext authentication (CWE-319)")
    
    # 4. Weak password policy (banner is already lowercased; `re` is imported at module level)
    policy_match = re.search(r'password.*?(\d+).*?character', banner)
    if policy_match:
        min_length = int(policy_match.group(1))
        if min_length < 12:
            findings.append(f"Weak password policy: minimum {min_length} characters (CWE-521)")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.5_noncompliant': noncompliant, 'A.8.5_details': details})


def assess_A_8_9(row):
    """
    A.8.9: Configuration management
    Indicators: Insecure defaults, debug features
    """
    findings = []
    
    banner = str(row.get('service.banner', '')).lower()
    http_title = str(row.get('service.http.title', '')).lower()
    http_body = str(row.get('service.http.body', '')).lower()
    
    # 1. Default configuration indicators
    default_indicators = ['default', 'welcome', 'dvr login', 'network camera', 'web service']
    if any(ind in http_title for ind in default_indicators):
        findings.append(f"Default configuration detected: '{http_title[:40]}' (CWE-1188)")
    
    # 2. Debug features
    if any(term in banner or term in http_body for term in ['debug', 'trace', 'x-debug']):
        findings.append("Debug features enabled (CWE-489, CWE-215)")
    
    # 3. Unnecessary discovery services
    tags = str(row.get('service.fingerprints.tags', '')).lower()
    if any(svc in tags for svc in ['upnp', 'mdns', 'ssdp']):
        findings.append("Unnecessary discovery service enabled (CWE-453)")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.9_noncompliant': noncompliant, 'A.8.9_details': details})


def assess_A_8_20(row):
    """
    A.8.20: Network security
    Indicators: Insecure protocols, excessive exposure
    """
    findings = []
    
    port = row.get('service.port', 0)
    
    # 1. Insecure protocols
    insecure_ports = {23: 'Telnet', 21: 'FTP', 69: 'TFTP', 161: 'SNMP', 5900: 'VNC'}
    if port in insecure_ports:
        findings.append(f"{insecure_ports[port]} protocol is inherently insecure (CWE-1288)")
    
    # 2. Unencrypted HTTP
    if port in [80, 8080]:
        findings.append("Unencrypted HTTP on accessible interface (CWE-319)")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.20_noncompliant': noncompliant, 'A.8.20_details': details})


def assess_A_8_21(row):
    """
    A.8.21: Security of network services
    Indicators: Unsecured network services
    """
    findings = []
    
    port = row.get('service.port', 0)
    tags = str(row.get('service.fingerprints.tags', '')).lower()
    
    # 1. RTSP without encryption
    if port in [554, 8554]:
        findings.append("RTSP streaming service exposed without access control (CWE-284)")
    
    # 2. ONVIF exposure
    if 'onvif' in tags:
        findings.append("ONVIF device management exposed (CWE-284)")
    
    # 3. Security through obscurity
    if port > 10000:
        findings.append(f"Service on non-standard high port {port} (security through obscurity)")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.21_noncompliant': noncompliant, 'A.8.21_details': details})


def assess_A_8_22(row):
    """
    A.8.22: Segregation of networks
    Indicators: Poor network segmentation
    """
    findings = []
    
    port = row.get('service.port', 0)
    
    # Management services on external network suggest poor segregation
    mgmt_ports = [22, 23, 3389, 5900]
    if port in mgmt_ports:
        findings.append("Management service accessible from external network - suggests inadequate segregation (CWE-923)")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.22_noncompliant': noncompliant, 'A.8.22_details': details})


# REMOVED (kept for reference, not executed):
# def assess_A_8_23(row):
#     """
#     A.8.23: Web filtering
#     Note: Cannot assess from external scans - marked compliant (conservative)
#     """
#     noncompliant = False
#     details = ""
#     return pd.Series({'A.8.23_noncompliant': noncompliant, 'A.8.23_details': details})


def assess_A_8_24(row):
    """
    A.8.24: Use of cryptography
    Indicators: Cleartext protocols, weak TLS, certificate issues
    """
    findings = []
    
    port = row.get('service.port', 0)
    tls_versions = str(row.get('service.tls.supported_versions', '')).lower()
    is_self_signed = row.get('service.tls.is_self_signed', None)
    is_valid = row.get('service.tls.is_valid', None)
    is_trusted = row.get('service.tls.is_trusted', None)
    expires_at = row.get('service.tls.expires_at', '')
    
    # 1. Cleartext protocols
    cleartext_ports = {21: 'FTP', 23: 'Telnet', 80: 'HTTP', 8080: 'HTTP', 554: 'RTSP'}
    if port in cleartext_ports:
        findings.append(f"{cleartext_ports[port]} on port {port} - no encryption (CWE-319)")
    
    # 2. Weak TLS versions
    if tls_versions:
        if any(weak in tls_versions for weak in ['tlsv1.0', 'tlsv1.1', 'sslv2', 'sslv3']):
            findings.append(f"Weak TLS/SSL version: {tls_versions[:50]} (require TLS 1.2+) (CWE-326)")
    
    # 3. Certificate validation
    if is_valid is False:
        findings.append("Invalid TLS certificate (CWE-297)")
    if is_self_signed is True:
        findings.append("Self-signed certificate (CWE-295)")
    if is_trusted is False:
        findings.append("Certificate not from trusted CA (CWE-295)")
    
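    # 4. Certificate expiration (compare in UTC so timezone-aware timestamps work)
    if expires_at:
        try:
            exp_date = pd.to_datetime(expires_at, utc=True)
            if exp_date < pd.Timestamp.now(tz='UTC'):
                findings.append("Expired certificate (CWE-297)")
        except (ValueError, TypeError, OverflowError):
            pass  # Unparseable expiry value: skip this check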
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.24_noncompliant': noncompliant, 'A.8.24_details': details})


def assess_A_8_8(row):
    """
    A.8.8: Management of technical vulnerabilities
    Indicators: Known CVEs, KEV presence, outdated software
    """
    findings = []
    
    # 1. ACTUAL CVEs
    cves = str(row.get('service.cves', ''))
    cve_count = row.get('service.cves_count', 0)
    if cve_count and cve_count > 0:
        cve_display = cves[:80] + '...' if len(cves) > 80 else cves
        findings.append(f"{cve_count} CVE(s) identified: {cve_display} (CWE-1104)")
    
    # 2. KEV STATUS - CRITICAL INDICATOR
    is_kev = row.get('is_kev', False)
    if pd.notna(is_kev) and bool(is_kev):  # NaN is truthy, so guard against missing values
        findings.append("⚠️ CRITICAL: Contains CVE(s) in CISA KEV catalog (actively exploited!) (CWE-1395)")
    
    # 3. Software version disclosure
    product = str(row.get('service.fingerprints.service.product', ''))
    version = str(row.get('service.fingerprints.service.version', ''))
    if product and version and str(product) != 'nan' and str(version) != 'nan':
        findings.append(f"Software version disclosed: {product} {version}")
    
    # 4. End-of-life OS
    os_product = str(row.get('service.fingerprints.os.product', '')).lower()
    os_version = str(row.get('service.fingerprints.os.version', '')).lower()
    eol_indicators = ['windows xp', 'windows vista', 'windows 2003', 'windows 2008 ', 'windows 7']
    if os_product:
        os_full = f"{os_product} {os_version}".lower()
        for eol in eol_indicators:
            if eol in os_full:
                findings.append(f"End-of-life OS: {os_product} {os_version} (CWE-1104)")
                break
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.8_noncompliant': noncompliant, 'A.8.8_details': details})


# REMOVED (kept for reference, not executed):
# def assess_A_8_16(row):
#     """
#     A.8.16: Monitoring activities
#     Indicators: No logging/monitoring services
#     """
#     findings = []
#
#     port = row.get('service.port', 0)
#     banner = str(row.get('service.banner', '')).lower()
#     tags = str(row.get('service.fingerprints.tags', '')).lower()
#
#     # Check for monitoring services
#     has_syslog = 'syslog' in tags or port == 514
#     has_snmp = 'snmp' in tags or port == 161
#
#     if not (has_syslog or has_snmp):
#         findings.append("No evidence of logging/monitoring services (CWE-778)")
#
#     # SNMP without v3
#     if has_snmp and 'v3' not in banner:
#         findings.append("SNMP monitoring without v3 authentication (CWE-287)")
#
#     noncompliant = len(findings) > 0
#     details = "" if not noncompliant else '; '.join(findings)
#     return pd.Series({'A.8.16_noncompliant': noncompliant, 'A.8.16_details': details})


def assess_A_5_23(row):
    """
    A.5.23: Information security for use of cloud services
    Indicators: Cloud hosting without proper controls
    """
    findings = []
    
    org = str(row.get('asn.org', '')).lower()
    cloud_providers = CLOUD_PROVIDERS  # List defined once in the configuration cell
    
    is_cloud = any(provider in org for provider in cloud_providers)
    
    if is_cloud:
        findings.append(f"CPSS hosted on cloud provider ({org}) - cloud security controls required")
        
        # Cloud services should use HTTPS
        port = row.get('service.port', 0)
        if port in [80, 8080]:
            findings.append("Cloud-hosted CPSS without HTTPS (CWE-319)")
    
    noncompliant = is_cloud and len(findings) > 1
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.5.23_noncompliant': noncompliant, 'A.5.23_details': details})


def assess_A_5_19(row):
    """
    A.5.19: Information security in supplier relationships
    Indicators: Third-party services, unclear supply chain
    """
    findings = []
    
    vendor = str(row.get('service.fingerprints.vendor', '')).lower()
    org = str(row.get('asn.org', '')).lower()
    # Missing values arrive as NaN and become the string 'nan'; treat them as empty
    vendor = '' if vendor == 'nan' else vendor
    org = '' if org == 'nan' else org
    
    # Third-party hosting
    if org and vendor and org != vendor:
        findings.append(f"Third-party hosting detected (vendor: {vendor}, host: {org})")
    
    # Unknown/problematic vendors
    if vendor in ['', 'unknown', 'generic']:
        findings.append("Vendor unclear - supply chain security cannot be verified (CWE-1395)")
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.5.19_noncompliant': noncompliant, 'A.5.19_details': details})


def assess_A_8_26(row):
    """
    A.8.26: Application security requirements
    Indicators: Application-level vulnerabilities
    """
    findings = []
    
    http_body = str(row.get('service.http.body', '')).lower()
    banner = str(row.get('service.banner', '')).lower()
    has_http_content = http_body not in ('', 'nan')  # NaN becomes the string 'nan'
    
    # Security headers check (in banner or body, since service.http.headers is unavailable)
    security_headers = ['content-security-policy', 'x-frame-options',
                        'strict-transport-security', 'csp', 'x-frame', 'hsts']
    has_security_headers = any(header in banner or header in http_body for header in security_headers)
    
    if not has_security_headers and has_http_content:  # Only check if there's HTTP content
        findings.append("Web application missing security headers (CWE-693)")
    
    # Framework disclosure
    frameworks = ['php', 'asp', 'jsp', 'node', 'django', 'rails']
    for fw in frameworks:
        if fw in banner or fw in http_body:
            findings.append(f"Application framework disclosed: {fw} (CWE-200)")
            break
    
    noncompliant = len(findings) > 0
    details = "" if not noncompliant else '; '.join(findings)
    return pd.Series({'A.8.26_noncompliant': noncompliant, 'A.8.26_details': details})


print("13 assessment functions loaded with real field names (A.8.23 and A.8.16 removed)")



# ============================================================================
# MAIN ASSESSMENT ORCHESTRATION FUNCTION
# ============================================================================

def run_assessment(df):
    print("\n" + "="*70)
    print("RUNNING ISO 27002:2022 ASSESSMENT")
    print("="*70)
    print(f"Total devices: {len(df)}")
    
    # Initialize results dataframe
    results = df.copy()
    
    # Run the 13 active assessment functions (A.8.23 and A.8.16 are removed)
    assessment_functions = [
        ('A.5.15', assess_A_5_15, 'Access control'),
        ('A.8.2', assess_A_8_2, 'Privileged access rights'),
        ('A.8.3', assess_A_8_3, 'Information access restriction'),
        ('A.8.5', assess_A_8_5, 'Secure authentication'),
        ('A.8.9', assess_A_8_9, 'Configuration management'),
        ('A.8.20', assess_A_8_20, 'Network security'),
        ('A.8.21', assess_A_8_21, 'Security of network services'),
        ('A.8.22', assess_A_8_22, 'Segregation of networks'),
        # REMOVED: ('A.8.23', assess_A_8_23, 'Web filtering'),
        ('A.8.24', assess_A_8_24, 'Use of cryptography'),
        ('A.8.8', assess_A_8_8, 'Management of technical vulnerabilities'),
        # REMOVED: ('A.8.16', assess_A_8_16, 'Monitoring activities'),
        ('A.5.23', assess_A_5_23, 'Cloud services security'),
        ('A.5.19', assess_A_5_19, 'Supplier relationships'),
        ('A.8.26', assess_A_8_26, 'Application security'),
    ]
    
    print(f"\nRunning {len(assessment_functions)} control assessments\t")
    
    for control_id, func, description in assessment_functions:
        print(f"  {control_id}: {description}\t\t", end=" ")
        try:
            # Run assessment
            assessment_result = results.apply(func, axis=1)
            
            # Merge results
            results = pd.concat([results, assessment_result], axis=1)
            
            # Count compliance
            noncompliant_count = results[f'{control_id}_noncompliant'].sum()
            total = len(results)
            noncompliance_pct = (noncompliant_count / total * 100) if total > 0 else 0
            
            print(f" ({noncompliant_count}/{total} noncompliant - {noncompliance_pct:.1f}%)")
        except Exception as e:
            print(f"✗ ERROR: {e}")
    
    # Calculate compliance metrics
    print("\nCalculating compliance metrics")
    
    # Get all compliance columns
    noncompliance_cols = [col for col in results.columns if col.endswith('_noncompliant')]
    
    results['total_noncompliant'] = results[noncompliance_cols].sum(axis=1)
    results['total_controls'] = len(noncompliance_cols)
    results['noncompliance_rate'] = (results['total_noncompliant'] / results['total_controls'] * 100).round(2)
    results['noncompliance_pct'] = results['noncompliance_rate']  # Alias for compatibility
    
    print(f"   Compliance metrics calculated")
    
    # Calculate risk indicators
    print("\nCalculating risk indicators...")
    
    # 1. KEV presence (likelihood indicator)
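    # Missing values must not count as KEV hits: NaN is truthy in Python,
    # so coerce the flag to a strict boolean before it is used in scoring.
    if 'is_kev' in results.columns:
        results['kev_present'] = results['is_kev'].eq(True)
    else:
        results['kev_present'] = False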
    kev_count = results['kev_present'].sum()
    print(f"   KEV indicator: {kev_count} devices with Known Exploited Vulnerabilities")
    
    # 2. Scenario impact indicator (per CPSS type)
    SCENARIO_CONTROLS = {
        'EACS': ['A.5.15', 'A.8.2', 'A.8.5', 'A.8.9', 'A.8.22', 'A.8.20'],
        'VSS': ['A.8.24', 'A.8.21', 'A.8.8', 'A.8.5', 'A.8.9'],
        'IHAS': ['A.5.15', 'A.8.5', 'A.8.8', 'A.8.22', 'A.8.20', 'A.8.24', 'A.8.21']
    }
    
    def calculate_scenario_impact(row):
        """Check if device is non-compliant with scenario-relevant controls"""
        cpss_type = row.get('cpss_primary_category', 'Unknown')
        
        if cpss_type not in SCENARIO_CONTROLS:
            return False
        
        relevant_controls = SCENARIO_CONTROLS[cpss_type]
        
        # Check if ANY relevant control is non-compliant
        for control in relevant_controls:
            if f'{control}_noncompliant' in row.index:
                if row[f'{control}_noncompliant']:
                    return True  # Non-compliant with scenario control
        
        return False
    
    results['scenario_impact'] = results.apply(calculate_scenario_impact, axis=1)
    scenario_count = results['scenario_impact'].sum()
    print(f"   Scenario impact: {scenario_count} devices non-compliant with risk scenario controls")
    
    # 3. Calculate overall risk score (0-10 scale)
    print("\nCalculating risk scores...")
    
    def calculate_risk_score(row):
        """
        Calculate risk score using three-component model:
        - Base: Noncompliance percentage (0-6 points)
        - Likelihood: KEV presence (+2 points)
        - Impact: Scenario control non-compliance (+2 points)
        """
        # Base score from noncompliance (60% of total)
        base_score = (row['noncompliance_pct'] / 100) * 6
        
        # KEV bonus (likelihood)
        kev_bonus = 2 if row.get('kev_present', False) else 0
        
        # Scenario bonus (impact)
        scenario_bonus = 2 if row.get('scenario_impact', False) else 0
        
        # Total score (capped at 10)
        total_score = min(10, base_score + kev_bonus + scenario_bonus)
        
        return round(total_score, 2)
    
    results['overall_risk_score'] = results.apply(calculate_risk_score, axis=1)
    
    # 4. Determine risk level per record
    def determine_risk_level(row):
        """Determine risk level based on multiple factors"""
        noncompliance = row['noncompliance_pct']
        has_kev = row.get('kev_present', False)
        has_scenario = row.get('scenario_impact', False)
        
        # High risk: >50% non-compliant + KEV + scenario impact
        if noncompliance > 50 and has_kev and has_scenario:
            return 'High'
        
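        # Medium risk: >=20% non-compliant, OR any KEV or scenario-impact flag
        # (this also covers >50% non-compliant records that lack KEV or scenario impact)
        if noncompliance >= 20 or has_kev or has_scenario:
            return 'Medium'
        
        # Low risk: <20% non-compliant with neither KEV nor scenario impact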
        return 'Low'
    
    results['risk_level_record'] = results.apply(determine_risk_level, axis=1)
    
    # Summary statistics
    risk_counts = results['risk_level_record'].value_counts()
    print(f"   Risk levels assigned:")
    for level in ['High', 'Medium', 'Low']:
        count = risk_counts.get(level, 0)
        pct = (count / len(results) * 100) if len(results) > 0 else 0
        print(f"    {level}: {count} devices ({pct:.1f}%)")
    
    # print("\n" + "="*70)
    # print("ASSESSMENT COMPLETE")
    # print("="*70)
    # print(f"\nResults summary:")
    # print(f"  Total devices assessed: {len(results)}")
    # print(f"  Average noncompliance rate: {results['noncompliance_rate'].mean():.1f}%")
    # print(f"  Devices with KEV CVEs: {results['kev_present'].sum()}")
    # print(f"  Average risk score: {results['overall_risk_score'].mean():.2f}/10")
    
    return results

print("run_assessment function loaded")


All 15 accurate assessment functions loaded with real field names
run_assessment function loaded
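
The three-component risk score used in `run_assessment` can be checked by hand. The sketch below restates the rule as a standalone function; the name `risk_score` and the sample inputs are illustrative assumptions, not part of the notebook.

```python
def risk_score(noncompliance_pct, kev_present, scenario_impact):
    """Standalone restatement of the scoring rule (illustrative helper)."""
    base = (noncompliance_pct / 100) * 6       # 0-6 points from noncompliance
    kev_bonus = 2 if kev_present else 0        # likelihood component
    scenario_bonus = 2 if scenario_impact else 0  # impact component
    return round(min(10, base + kev_bonus + scenario_bonus), 2)

# Hypothetical devices: (noncompliance %, KEV present, scenario impact)
examples = [
    (0.0, False, False),
    (25.0, False, True),
    (50.0, True, False),
    (100.0, True, True),
]
for pct, kev, scenario in examples:
    print(f"{pct:5.1f}%  kev={kev!s:5}  scenario={scenario!s:5}  "
          f"score={risk_score(pct, kev, scenario)}")
```

Because the components add up to at most 6 + 2 + 2 = 10, the `min(10, ...)` cap only acts as a safeguard if the weights are changed later.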


## Analysis & Reporting - Enhanced

Comprehensive visualization and reporting with separate PNGs per CPSS type.
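
The domain-level figures in this section are shares of (device, control) pairs whose `<control>_noncompliant` flag is `True`. A minimal sketch of that aggregation on toy data follows; the DataFrame contents and the helper name `domain_noncompliance_pct` are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical toy data: True means the device FAILS the control.
toy = pd.DataFrame({
    'cpss_primary_category': ['EACS', 'EACS', 'VSS', 'VSS'],
    'A.8.20_noncompliant': [True, False, True, True],
    'A.8.21_noncompliant': [False, False, True, False],
})

network_cols = ['A.8.20_noncompliant', 'A.8.21_noncompliant']

def domain_noncompliance_pct(df, cols):
    """Percentage of (device, control) pairs flagged noncompliant."""
    present = [c for c in cols if c in df.columns]  # skip controls not assessed
    total = len(df) * len(present)
    if total == 0:
        return 0.0
    flagged = int(df[present].sum().sum())  # summing booleans counts True values
    return flagged / total * 100

overall = domain_noncompliance_pct(toy, network_cols)
by_type = {
    name: domain_noncompliance_pct(group, network_cols)
    for name, group in toy.groupby('cpss_primary_category')
}
print(overall, by_type)
```

Summing the flag column directly counts noncompliant records; negating it first would count compliant ones instead.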

In [49]:

# ============================================================================
# ANALYSIS & REPORTING FUNCTIONS - ENHANCED
# ============================================================================

def generate_summary_statistics(results_df):
    """Generate comprehensive summary statistics by CPSS type"""

    print("\n" + "="*70)
    print("SUMMARY STATISTICS BY CPSS TYPE")
    print("="*70)

    # Overall statistics
    print(f"\nOVERALL:")
    print(f"  Total devices assessed: {len(results_df):,}")
    print(f"  Average noncompliance: {results_df['noncompliance_pct'].mean():.1f}%")
    print(f"  Devices with KEV CVEs: {results_df['kev_present'].sum():,}")
    print(f"  Devices with scenario impact: {results_df['scenario_impact'].sum():,}")
    print(f"  Average risk score: {results_df['overall_risk_score'].mean():.2f}/10")

    # Risk level distribution
    print(f"\n  Risk Level Distribution:")
    for level in ['Low', 'Medium', 'High']:
        count = (results_df['risk_level_record'] == level).sum()
        pct = count / len(results_df) * 100
        print(f"    {level:8s}: {count:4,} devices ({pct:5.1f}%)")

    # By CPSS category
    if 'cpss_primary_category' in results_df.columns:
        print("\nBY CPSS TYPE:")
        cpss_types = ['EACS', 'VSS', 'IHAS', 'Unknown']
        
        for cpss_type in cpss_types:
            cat_df = results_df[results_df['cpss_primary_category'] == cpss_type]
            if len(cat_df) == 0:
                continue
                
            print(f"\n  {cpss_type}:")
            print(f"    Devices: {len(cat_df):,}")
            print(f"    Avg noncompliance: {cat_df['noncompliance_pct'].mean():.1f}%")
            print(f"    Avg risk score: {cat_df['overall_risk_score'].mean():.2f}/10")
            print(f"    KEV present: {cat_df['kev_present'].sum():,} devices")
            print(f"    Scenario impact: {cat_df['scenario_impact'].sum():,} devices")
            
            # Risk levels for this type
            print(f"    Risk levels:")
            for level in ['Low', 'Medium', 'High']:
                count = (cat_df['risk_level_record'] == level).sum()
                pct = count / len(cat_df) * 100 if len(cat_df) > 0 else 0
                print(f"      {level:8s}: {count:3,} ({pct:5.1f}%)")
            
            # Calculate aggregate risk level for CPSS type
            avg_noncomp = cat_df['noncompliance_pct'].mean()
            has_kev = cat_df['kev_present'].any()
            has_scenario_impact = cat_df['scenario_impact'].any()
            
            if avg_noncomp > 50 and has_kev and has_scenario_impact:
                cpss_risk = 'High'
            elif avg_noncomp >= 20:
                cpss_risk = 'Medium'
            else:
                cpss_risk = 'Low'
            
            print(f"    AGGREGATE RISK LEVEL: {cpss_risk}")
    
    # By domain
    print("\nBY RESILIENCE DOMAIN:")
    domains = sorted(set(ISO_CONTROLS[c]['domain'] for c in ISO_CONTROLS.keys()))
    
    for domain in domains:
        domain_controls = [c for c in ISO_CONTROLS.keys() if ISO_CONTROLS[c]['domain'] == domain]
        
        # Calculate noncompliance for domain
        domain_noncomp_counts = []
        for control in domain_controls:
            col = f'{control}_noncompliant'
            if col in results_df.columns:
                # The flag is True for noncompliant records, so sum it directly
                noncomp_count = results_df[col].sum()
                domain_noncomp_counts.append(noncomp_count)
        
        if domain_noncomp_counts:
            # Count only controls that were actually assessed (column present)
            total_assessments = len(results_df) * len(domain_noncomp_counts)
            total_noncompliant = sum(domain_noncomp_counts)
            noncomp_pct = (total_noncompliant / total_assessments) * 100
            
            print(f"  {domain:20s}: {total_noncompliant:4,}/{total_assessments:4,} noncompliant ({noncomp_pct:5.1f}%)")

def generate_domain_table(results_df):
    """Generate Table 5.4-1: Non-compliance rates per ISO 27002 domain"""

    print("\n" + "="*70)
    print("TABLE 5.4-1: NON-COMPLIANCE RATES PER ISO 27002 DOMAIN")
    print("="*70)

    # Domain mapping
    domain_mapping = {
        'A.5.15': 'Access Control',
        'A.8.2': 'Access Control',
        'A.8.3': 'Access Control',
        'A.8.5': 'Access Control',
        'A.8.9': 'Configuration',
        'A.8.20': 'Network',
        'A.8.21': 'Network',
        'A.8.22': 'Network',
        'A.8.24': 'Cryptography',
        'A.8.8': 'Patch & Lifecycle',
        'A.5.23': 'Cloud',
        'A.5.19': 'Supply Chain',
        'A.8.26': 'Application'
    }

    domains = sorted(set(domain_mapping.values()))
    cpss_types = ['EACS', 'VSS', 'IHAS']
    table_data = []

    for domain in domains:
        row = {'Domain': domain}
        domain_controls = [ctrl for ctrl, dom in domain_mapping.items() if dom == domain]

        for cpss_type in cpss_types:
            cpss_df = results_df[results_df['cpss_primary_category'] == cpss_type]

            if len(cpss_df) == 0:
                row[cpss_type] = 'N/A'
                continue

            domain_noncomp_counts = []
            for control in domain_controls:
                col = f'{control}_noncompliant'
                if col in cpss_df.columns:
                    noncomp_count = cpss_df[col].sum()
                    domain_noncomp_counts.append(noncomp_count)

            if domain_noncomp_counts:
                total_noncomp = sum(domain_noncomp_counts)
                # Count only controls that were actually assessed (column present)
                total_possible = len(cpss_df) * len(domain_noncomp_counts)
                pct = (total_noncomp / total_possible * 100) if total_possible > 0 else 0
                row[cpss_type] = f"{pct:.1f}%"
            else:
                row[cpss_type] = 'N/A'

        table_data.append(row)

    table_df = pd.DataFrame(table_data)
    print("\n" + table_df.to_string(index=False))

    output_file = OUTPUT_DIR / 'table_5_4_1_domain_noncompliance.csv'
    table_df.to_csv(output_file, index=False)
    print(f"\n✓ Saved table to: {output_file.name}")

    md_output = OUTPUT_DIR / 'table_5_4_1_domain_noncompliance.md'
    with open(md_output, 'w', encoding='utf-8') as f:
        f.write("# Table 5.4-1: Non-compliance Rates per ISO 27002 Domain\n\n")
        f.write("*(% of records with noncompliance indicator in domain)*\n\n")
        f.write("| Domain | EACS | VSS | IHAS |\n")
        f.write("|--------|------|-----|------|\n")
        for _, row in table_df.iterrows():
            f.write(f"| {row['Domain']} | {row['EACS']} | {row['VSS']} | {row['IHAS']} |\n")
        f.write("\n")
    print(f"✓ Saved markdown table to: {md_output.name}")

    return table_df


def create_visualizations(results_df):
    """Create comprehensive visualizations as separate PNG files"""

    print("\n" + "="*70)
    print("CREATING VISUALIZATIONS")
    print("="*70)
    
    # Set style
    plt.style.use('default')
    sns.set_palette([COLORS['primary']])
    
    cpss_types = ['EACS', 'VSS', 'IHAS', 'all']
    
    # ========================================================================
    # VISUALIZATION 1: Noncompliance by Control (per CPSS type)
    # ========================================================================
    
    for cpss_type in cpss_types:
        if cpss_type == 'all':
            plot_df = results_df.copy()
            title_suffix = "All CPSS Types"
        else:
            plot_df = results_df[results_df['cpss_primary_category'] == cpss_type]
            title_suffix = cpss_type
        
        if len(plot_df) == 0:
            continue
        
        fig, ax = plt.subplots(figsize=(12, 8), facecolor=COLORS['background'])
        ax.set_facecolor(COLORS['background'])
        
        # Calculate noncompliance per control
        controls = []
        noncomp_counts = []
        noncomp_pcts = []
        
        for control in sorted(ISO_CONTROLS.keys()):
            col = f'{control}_noncompliant'
            if col in plot_df.columns:
                noncomp_count = plot_df[col].sum()
                noncomp_pct = (noncomp_count / len(plot_df)) * 100
                controls.append(f"{control}\n{ISO_CONTROLS[control]['name']}")
                noncomp_counts.append(noncomp_count)
                noncomp_pcts.append(noncomp_pct)
        
        # Create bars
        y_pos = np.arange(len(controls))
        bars = ax.barh(y_pos, noncomp_pcts, color=COLORS['primary'], alpha=0.8)
        
        # Add count labels
        for i, (count, pct) in enumerate(zip(noncomp_counts, noncomp_pcts)):
            ax.text(pct + 2, i, f'{count} ({pct:.1f}%)', 
                   va='center', fontsize=9, fontweight='bold')
        
        ax.set_yticks(y_pos)
        ax.set_yticklabels(controls, fontsize=8)
        ax.set_xlabel('Noncompliance (%)', fontsize=11, fontweight='bold')
        ax.set_title(f'ISO 27002:2022 Control Noncompliance - {title_suffix}\n(n={len(plot_df)} devices)',
                    fontsize=13, fontweight='bold', pad=15)
        # Fall back to a 0-100 axis when there are no bars or all bars are zero
        ax.set_xlim([0, max(noncomp_pcts) * 1.15 if noncomp_pcts and max(noncomp_pcts) > 0 else 100])
        ax.grid(axis='x', alpha=0.3, linestyle='--')
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        
        plt.tight_layout()
        output_file = OUTPUT_DIR / f'cpss_iso_{cpss_type.lower()}_noncompliance_by_control.png'
        plt.savefig(output_file, dpi=300, bbox_inches='tight', facecolor=COLORS['background'])
        print(f"   Saved: {output_file.name}")
        plt.close()
    
    # ========================================================================
    # VISUALIZATION 2: Noncompliance by Domain (per CPSS type)
    # ========================================================================
    
    for cpss_type in cpss_types:
        if cpss_type == 'all':
            plot_df = results_df.copy()
            title_suffix = "All CPSS Types"
        else:
            plot_df = results_df[results_df['cpss_primary_category'] == cpss_type]
            title_suffix = cpss_type
        
        if len(plot_df) == 0:
            continue
        
        fig, ax = plt.subplots(figsize=(10, 7), facecolor=COLORS['background'])
        ax.set_facecolor(COLORS['background'])
        
        domains = sorted(set(ISO_CONTROLS[c]['domain'] for c in ISO_CONTROLS.keys()))
        domain_noncomp = []
        domain_labels = []
        
        for domain in domains:
            domain_controls = [c for c in ISO_CONTROLS.keys() if ISO_CONTROLS[c]['domain'] == domain]
            total_noncompliant = 0
            total_assessments = 0
            
            for control in domain_controls:
                col = f'{control}_noncompliant'
                if col in plot_df.columns:
                    # The flag is True for noncompliant records, so sum it directly
                    total_noncompliant += plot_df[col].sum()
                    total_assessments += len(plot_df)
            
            if total_assessments > 0:
                noncomp_pct = (total_noncompliant / total_assessments) * 100
                domain_noncomp.append(noncomp_pct)
                domain_labels.append(f"{domain}\n({len(domain_controls)} controls)")
        
        y_pos = np.arange(len(domain_labels))
        bars = ax.barh(y_pos, domain_noncomp, color=COLORS['primary'], alpha=0.8)
        
        # Add percentage labels
        for i, pct in enumerate(domain_noncomp):
            ax.text(pct + 2, i, f'{pct:.1f}%', va='center', fontsize=10, fontweight='bold')
        
        ax.set_yticks(y_pos)
        ax.set_yticklabels(domain_labels, fontsize=9)
        ax.set_xlabel('Noncompliance (%)', fontsize=11, fontweight='bold')
        ax.set_title(f'Noncompliance by Resilience Domain - {title_suffix}\n(n={len(plot_df)} devices)',
                    fontsize=13, fontweight='bold', pad=15)
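        # Fall back to a 0-100 axis when there are no bars or all bars are zero
        ax.set_xlim([0, max(domain_noncomp) * 1.15 if domain_noncomp and max(domain_noncomp) > 0 else 100])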
        ax.grid(axis='x', alpha=0.3, linestyle='--')
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        
        plt.tight_layout()
        output_file = OUTPUT_DIR / f'cpss_iso_{cpss_type.lower()}_noncompliance_by_domain.png'
        plt.savefig(output_file, dpi=300, bbox_inches='tight', facecolor=COLORS['background'])
        print(f"   Saved: {output_file.name}")
        plt.close()
    
    # ========================================================================
    # VISUALIZATION 3: Risk Level Distribution (per CPSS type)
    # ========================================================================
    
    for cpss_type in cpss_types:
        if cpss_type == 'all':
            plot_df = results_df.copy()
            title_suffix = "All CPSS Types"
        else:
            plot_df = results_df[results_df['cpss_primary_category'] == cpss_type]
            title_suffix = cpss_type
        
        if len(plot_df) == 0:
            continue
        
        fig, ax = plt.subplots(figsize=(8, 6), facecolor=COLORS['background'])
        ax.set_facecolor(COLORS['background'])
        
        risk_counts = plot_df['risk_level_record'].value_counts()
        risk_order = ['Low', 'Medium', 'High']
        risk_colors = [COLORS['low'], COLORS['medium'], COLORS['high']]
        
        counts = [risk_counts.get(level, 0) for level in risk_order]
        colors = [risk_colors[i] for i in range(len(risk_order))]
        
        bars = ax.bar(risk_order, counts, color=colors, alpha=0.8, edgecolor='black', linewidth=1.5)
        
        # Add count labels
        for bar, count in zip(bars, counts):
            height = bar.get_height()
            pct = (count / len(plot_df)) * 100 if len(plot_df) > 0 else 0
            ax.text(bar.get_x() + bar.get_width()/2., height + 0.5,
                   f'{count}\n({pct:.1f}%)',
                   ha='center', va='bottom', fontsize=11, fontweight='bold')
        
        ax.set_ylabel('Number of Devices', fontsize=11, fontweight='bold')
        ax.set_title(f'Risk Level Distribution - {title_suffix}\n(n={len(plot_df)} devices)',
                    fontsize=13, fontweight='bold', pad=15)
        ax.set_ylim([0, max(counts) * 1.2 if counts and max(counts) > 0 else 10])
        ax.grid(axis='y', alpha=0.3, linestyle='--')
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        
        plt.tight_layout()
        output_file = OUTPUT_DIR / f'cpss_iso_{cpss_type.lower()}_risk_distribution.png'
        plt.savefig(output_file, dpi=300, bbox_inches='tight', facecolor=COLORS['background'])
        print(f"   Saved: {output_file.name}")
        plt.close()
    
    # ========================================================================
    # VISUALIZATION 4: KEV and Scenario Impact (per CPSS type)
    # ========================================================================
    
    for cpss_type in cpss_types:
        if cpss_type == 'all':
            plot_df = results_df.copy()
            title_suffix = "All CPSS Types"
        else:
            plot_df = results_df[results_df['cpss_primary_category'] == cpss_type]
            title_suffix = cpss_type
        
        if len(plot_df) == 0:
            continue
        
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), facecolor=COLORS['background'])
        
        # KEV presence
        ax1.set_facecolor(COLORS['background'])
        kev_counts = plot_df['kev_present'].value_counts()
        kev_labels = ['No KEV', 'KEV Present']
        kev_values = [kev_counts.get(False, 0), kev_counts.get(True, 0)]
        kev_colors = [COLORS['low'], COLORS['high']]
        
        wedges, texts, autotexts = ax1.pie(kev_values, labels=kev_labels, autopct='%1.1f%%',
                                            colors=kev_colors, startangle=90,
                                            textprops={'fontsize': 11, 'fontweight': 'bold'})
        ax1.set_title(f'KEV CVE Presence\n(Likelihood Indicator)',
                     fontsize=12, fontweight='bold', pad=10)
        
        # Scenario impact
        ax2.set_facecolor(COLORS['background'])
        impact_counts = plot_df['scenario_impact'].value_counts()
        impact_labels = ['No Scenario Impact', 'Scenario Impact']
        impact_values = [impact_counts.get(False, 0), impact_counts.get(True, 0)]
        impact_colors = [COLORS['low'], COLORS['high']]
        
        wedges, texts, autotexts = ax2.pie(impact_values, labels=impact_labels, autopct='%1.1f%%',
                                            colors=impact_colors, startangle=90,
                                            textprops={'fontsize': 11, 'fontweight': 'bold'})
        ax2.set_title(f'Risk Scenario Relevance\n(Impact Indicator)',
                     fontsize=12, fontweight='bold', pad=10)
        
        fig.suptitle(f'Risk Indicators - {title_suffix} (n={len(plot_df)} devices)',
                    fontsize=14, fontweight='bold', y=0.98)
        
        plt.tight_layout(rect=[0, 0, 1, 0.96])
        output_file = OUTPUT_DIR / f'cpss_iso_{cpss_type.lower()}_risk_indicators.png'
        plt.savefig(output_file, dpi=300, bbox_inches='tight', facecolor=COLORS['background'])
        print(f"   Saved: {output_file.name}")
        plt.close()
    
    print("\n All visualizations created successfully")

print("Enhanced visualization functions loaded")



def generate_report(results_df):
    """Generate comprehensive text report with interpretations"""

    report_file = OUTPUT_DIR / 'cpss_iso27002_comprehensive_report.txt'
    
    with open(report_file, 'w', encoding='utf-8') as f:
        # Header
        f.write("="*80 + "\n")
        f.write("ISO/IEC 27002:2022 CPSS COMPLIANCE ASSESSMENT\n")
        f.write("COMPREHENSIVE ANALYSIS REPORT\n")
        f.write("="*80 + "\n\n")
        f.write(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
        f.write(f"Total Devices Assessed: {len(results_df):,}\n\n")
        
        # Executive Summary
        f.write("="*80 + "\n")
        f.write("EXECUTIVE SUMMARY\n")
        f.write("="*80 + "\n\n")
        
        # Compliance is the complement of the noncompliance rate
        avg_compliance = 100 - results_df['noncompliance_rate'].mean()
        avg_noncomp = results_df['noncompliance_pct'].mean()
        avg_risk = results_df['overall_risk_score'].mean()
        kev_count = results_df['kev_present'].sum()
        scenario_count = results_df['scenario_impact'].sum()
        
        f.write(f"Average Compliance Rate: {avg_compliance:.1f}%\n")
        f.write(f"Average Noncompliance: {avg_noncomp:.1f}%\n")
        f.write(f"Average Risk Score: {avg_risk:.2f}/10\n")
        f.write(f"Devices with KEV CVEs: {kev_count:,} ({kev_count/len(results_df)*100:.1f}%)\n")
        f.write(f"Devices with Scenario Impact: {scenario_count:,} ({scenario_count/len(results_df)*100:.1f}%)\n\n")
        
        # Risk level distribution
        f.write("Risk Level Distribution:\n")
        for level in ['Low', 'Medium', 'High']:
            count = (results_df['risk_level_record'] == level).sum()
            pct = count / len(results_df) * 100
            f.write(f"  {level:8s}: {count:4,} devices ({pct:5.1f}%)\n")
        f.write("\n")
        
        # CPSS Type Analysis
        f.write("="*80 + "\n")
        f.write("ANALYSIS BY CPSS TYPE\n")
        f.write("="*80 + "\n\n")
        
        cpss_types = ['EACS', 'VSS', 'IHAS']
        
        for cpss_type in cpss_types:
            cat_df = results_df[results_df['cpss_primary_category'] == cpss_type]
            if len(cat_df) == 0:
                f.write(f"{cpss_type}:\n  No devices of this type in dataset\n\n")
                continue
            
            f.write(f"{cpss_type} ({len(cat_df):,} devices):\n")
            f.write("-" * 70 + "\n")
            
            # Compliance is the complement of the noncompliance rate
            cat_avg_comp = 100 - cat_df['noncompliance_rate'].mean()
            cat_avg_noncomp = cat_df['noncompliance_pct'].mean()
            cat_avg_risk = cat_df['overall_risk_score'].mean()
            cat_kev = cat_df['kev_present'].sum()
            cat_scenario = cat_df['scenario_impact'].sum()
            
            f.write(f"  Average Compliance: {cat_avg_comp:.1f}%\n")
            f.write(f"  Average Noncompliance: {cat_avg_noncomp:.1f}%\n")
            f.write(f"  Average Risk Score: {cat_avg_risk:.2f}/10\n")
            f.write(f"  KEV CVEs Present: {cat_kev:,} devices ({cat_kev/len(cat_df)*100:.1f}%)\n")
            f.write(f"  Scenario Impact: {cat_scenario:,} devices ({cat_scenario/len(cat_df)*100:.1f}%)\n\n")
            
            # Risk levels
            f.write("  Risk Level Distribution:\n")
            for level in ['Low', 'Medium', 'High']:
                count = (cat_df['risk_level_record'] == level).sum()
                pct = count / len(cat_df) * 100 if len(cat_df) > 0 else 0
                f.write(f"    {level:8s}: {count:3,} ({pct:5.1f}%)\n")
            
            # Aggregate risk level
            has_kev = cat_df['kev_present'].any()
            has_scenario_impact = cat_df['scenario_impact'].any()
            
            if cat_avg_noncomp > 50 and has_kev and has_scenario_impact:
                aggregate_risk = 'HIGH'
            elif cat_avg_noncomp >= 20:
                aggregate_risk = 'MEDIUM'
            else:
                aggregate_risk = 'LOW'
            
            f.write(f"\n  AGGREGATE RISK LEVEL: {aggregate_risk}\n")
            
            # Top noncompliant controls for this type
            f.write(f"\n  Top Noncompliant Controls:\n")
            control_noncomp = []
            for control in ISO_CONTROLS.keys():
                col = f'{control}_noncompliant'
                if col in cat_df.columns:
                    # The flag is True for noncompliant records, so sum it directly
                    noncomp_count = cat_df[col].sum()
                    noncomp_pct = (noncomp_count / len(cat_df)) * 100
                    control_noncomp.append((control, noncomp_count, noncomp_pct))
            
            control_noncomp.sort(key=lambda x: x[2], reverse=True)
            for control, count, pct in control_noncomp[:5]:
                f.write(f"    {control} - {ISO_CONTROLS[control]['name']:40s}: {count:3,} ({pct:5.1f}%)\n")
            
            f.write("\n\n")
        
        # Domain Analysis with Interpretations
        f.write("="*80 + "\n")
        f.write("ANALYSIS BY RESILIENCE DOMAIN\n")
        f.write("="*80 + "\n\n")
        
        domains = sorted(set(ISO_CONTROLS[c]['domain'] for c in ISO_CONTROLS.keys()))
        
        for domain in domains:
            f.write(f"{domain.upper()} DOMAIN\n")
            f.write("-" * 70 + "\n")
            
            domain_controls = [c for c in ISO_CONTROLS.keys() if ISO_CONTROLS[c]['domain'] == domain]
            
            # Calculate domain-level noncompliance
            total_noncompliant = 0
            total_assessments = 0
            
            for control in domain_controls:
                col = f'{control}_noncompliant'
                if col in results_df.columns:
                    # The flag is True for noncompliant records, so sum it directly
                    total_noncompliant += results_df[col].sum()
                    total_assessments += len(results_df)
            
            domain_noncomp_pct = (total_noncompliant / total_assessments) * 100 if total_assessments > 0 else 0
            
            f.write(f"Noncompliance Rate: {domain_noncomp_pct:.1f}% ")
            f.write(f"({total_noncompliant:,}/{total_assessments:,} assessments)\n\n")
            
            # Domain interpretation (get from first control in domain)
            first_control = domain_controls[0]
            if 'interpretation' in ISO_CONTROLS[first_control]:
                # Use domain-specific interpretation
                domain_interp = f"Domain Overview: {ISO_CONTROLS[first_control]['interpretation']}"
            else:
                domain_interp = f"This domain covers {len(domain_controls)} ISO 27002:2022 controls related to {domain.lower()}."
            
            f.write(f"INTERPRETATION:\n{domain_interp}\n\n")
            
            # Controls in this domain
            f.write("Controls and Findings:\n")
            for control in sorted(domain_controls):
                col = f'{control}_noncompliant'
                if col in results_df.columns:
                    # The flag is True for noncompliant records, so sum it directly
                    noncomp_count = results_df[col].sum()
                    noncomp_pct = (noncomp_count / len(results_df)) * 100
                    
                    f.write(f"\n  {control} - {ISO_CONTROLS[control]['name']}\n")
                    f.write(f"  Noncompliance: {noncomp_count:,}/{len(results_df):,} ({noncomp_pct:.1f}%)\n")
                    
                    # Control-specific interpretation
                    if 'interpretation' in ISO_CONTROLS[control]:
                        f.write(f"  Interpretation: {ISO_CONTROLS[control]['interpretation']}\n")
                    
                    # Sample findings (up to 3 examples)
                    details_col = f'{control}_details'
                    if details_col in results_df.columns:
                        noncomp_examples = results_df[~results_df[col]][details_col].head(3)
                        if len(noncomp_examples) > 0:
                            f.write("  Sample Findings:\n")
                            for idx, example in enumerate(noncomp_examples, 1):
                                f.write(f"    {idx}. {example}\n")
            
            f.write("\n\n")
        
        # Individual Control Analysis
        f.write("="*80 + "\n")
        f.write("DETAILED CONTROL ANALYSIS\n")
        f.write("="*80 + "\n\n")
        
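        # Sort numerically so that A.8.8 is listed before A.8.20
        for control in sorted(ISO_CONTROLS, key=lambda c: [int(n) for n in re.findall(r'\d+', c)]):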
            col = f'{control}_noncompliant'
            if col not in results_df.columns:
                continue
            
            noncomp_count = (~results_df[col]).sum()
            comp_count = results_df[col].sum()
            noncomp_pct = (noncomp_count / len(results_df)) * 100
            comp_pct = (comp_count / len(results_df)) * 100
            
            f.write(f"{control} - {ISO_CONTROLS[control]['name']}\n")
            f.write("-" * 70 + "\n")
            f.write(f"Domain: {ISO_CONTROLS[control]['domain']}\n")
            f.write(f"Description: {ISO_CONTROLS[control]['description']}\n\n")
            
            f.write("COMPLIANCE SUMMARY:\n")
            f.write(f"  Compliant: {comp_count:,} devices ({comp_pct:.1f}%)\n")
            f.write(f"  Noncompliant: {noncomp_count:,} devices ({noncomp_pct:.1f}%)\n\n")
            
            # Interpretation
            if 'interpretation' in ISO_CONTROLS[control]:
                f.write(f"INTERPRETATION:\n{ISO_CONTROLS[control]['interpretation']}\n\n")
            
            # Breakdown by CPSS type
            f.write("NONCOMPLIANCE BY CPSS TYPE:\n")
            for cpss_type in ['EACS', 'VSS', 'IHAS']:
                cat_df = results_df[results_df['cpss_primary_category'] == cpss_type]
                if len(cat_df) > 0:
                    cat_noncomp = (~cat_df[col]).sum()
                    cat_noncomp_pct = (cat_noncomp / len(cat_df)) * 100
                    
                    # Check if in scenario mapping
                    in_scenario = cpss_type in SCENARIO_CONTROLS and control in SCENARIO_CONTROLS[cpss_type]
                    scenario_mark = " [SCENARIO CONTROL]" if in_scenario else ""
                    
                    f.write(f"  {cpss_type:6s}: {cat_noncomp:3,}/{len(cat_df):3,} ({cat_noncomp_pct:5.1f}%){scenario_mark}\n")
            
            f.write("\n\n")
        
        # Risk Scoring Methodology
        f.write("="*80 + "\n")
        f.write("RISK SCORING METHODOLOGY\n")
        f.write("="*80 + "\n\n")
        
        f.write("The risk scoring model comprises three components:\n\n")
        
        f.write("1. BASE INDICATOR - ISO 27002:2022 Compliance:\n")
        f.write("   - Measures scan-observable noncompliance per control\n")
        f.write("   - Aggregated by resilience domain and CPSS category\n")
        f.write("   - Score contribution: 0-6 points (60% of total)\n\n")
        
        f.write("2. LIKELIHOOD INDICATOR - KEV CVE Presence:\n")
        f.write("   - Flags CPSS records with CVEs in CISA KEV catalog\n")
        f.write("   - Indicates actively exploited vulnerabilities\n")
        f.write("   - Score contribution: +2 points if present\n\n")
        
        f.write("3. IMPACT INDICATOR - Risk Scenario Mapping:\n")
        f.write("   - Maps noncompliance to CPSS-specific risk scenarios\n")
        f.write("   - EACS scenarios: A.5.15, A.8.2, A.8.5, A.8.9, A.8.22, A.8.20\n")
        f.write("   - VSS scenarios: A.8.24, A.8.21, A.8.8, A.8.5, A.8.9\n")
        f.write("   - IHAS scenarios: A.5.15, A.8.5, A.8.8, A.8.16, A.8.22, A.8.20, A.8.24, A.8.21\n")
        f.write("   - Score contribution: +2 points if in scenario\n\n")
        
        f.write("RISK LEVEL DETERMINATION:\n")
        f.write("  - Low: <20% noncompliance, no KEV CVEs\n")
        f.write("  - Medium: 20-50% noncompliance OR isolated KEV CVEs\n")
        f.write("  - High: >50% noncompliance + KEV CVEs + scenario relevance\n\n")
        
        # Footer
        f.write("="*80 + "\n")
        f.write("END OF REPORT\n")
        f.write("="*80 + "\n")

    print(f"   Saved: {report_file.name}")

print("Enhanced report generation functions loaded")


Enhanced visualization functions loaded
Enhanced report generation functions loaded
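
The RISK SCORING METHODOLOGY section written by the report describes a three-part score, but the report text does not state how the 0-6 base points are derived from the noncompliance rate, nor what level applies to combinations it does not list (for example more than 50% noncompliance without a KEV CVE). The sketch below is one possible reading, not the notebook's implementation: it assumes a linear mapping from noncompliance rate to base points and lets unlisted combinations fall back to Medium. `risk_score`, `risk_level`, and `EXAMPLE_SCENARIO_CONTROLS` are hypothetical names; the scenario lists are copied from the report text and may differ from the notebook's own `SCENARIO_CONTROLS`.

```python
# Scenario lists as printed in the report text (assumed to mirror SCENARIO_CONTROLS).
EXAMPLE_SCENARIO_CONTROLS = {
    "EACS": ["A.5.15", "A.8.2", "A.8.5", "A.8.9", "A.8.22", "A.8.20"],
    "VSS": ["A.8.24", "A.8.21", "A.8.8", "A.8.5", "A.8.9"],
    "IHAS": ["A.5.15", "A.8.5", "A.8.8", "A.8.16", "A.8.22", "A.8.20", "A.8.24", "A.8.21"],
}


def in_scenario(cpss_type: str, control: str) -> bool:
    """True if the control appears in the scenario list for this CPSS type."""
    return control in EXAMPLE_SCENARIO_CONTROLS.get(cpss_type, [])


def risk_score(noncompliance_rate: float, has_kev: bool, scenario: bool) -> float:
    """Score on a 0-10 scale: base 0-6, +2 for KEV CVEs, +2 for scenario relevance.

    ASSUMPTION: the base component scales linearly with the noncompliance
    rate (given as a fraction between 0.0 and 1.0).
    """
    if not 0.0 <= noncompliance_rate <= 1.0:
        raise ValueError("noncompliance_rate must be a fraction between 0 and 1")
    base = 6.0 * noncompliance_rate
    return base + (2.0 if has_kev else 0.0) + (2.0 if scenario else 0.0)


def risk_level(noncompliance_rate: float, has_kev: bool, scenario: bool) -> str:
    """Map the three indicators to Low / Medium / High.

    ASSUMPTION: combinations the report does not list fall back to Medium.
    """
    if noncompliance_rate > 0.5 and has_kev and scenario:
        return "High"
    if noncompliance_rate >= 0.2 or has_kev:
        return "Medium"
    return "Low"


# Example: A.8.8 for VSS devices at 50% noncompliance with a KEV CVE present.
scenario_hit = in_scenario("VSS", "A.8.8")
example_score = risk_score(0.5, has_kev=True, scenario=scenario_hit)
example_level = risk_level(0.5, has_kev=True, scenario=scenario_hit)
print(example_score, example_level)
```

Keeping the score and the level as separate functions mirrors the report, which lists point contributions and level thresholds as two independent rules.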


## Execute Assessment

In [50]:
# ============================================================================
# MAIN EXECUTION
# ============================================================================

if __name__ == '__main__':
    print("\n" + "="*80)
    print("ANALYSES 3: ISO 27002:2022 ASSESSMENT")
    print("="*80)
    # print(f"Total controls assessed: {len(ISO_CONTROLS)}")

    # Load data
    print(f"\nLoading: {INPUT_FILE}")
    try:
        df = pd.read_csv(INPUT_FILE)
        print(f"Loaded {len(df):,} devices")
    except FileNotFoundError:
        print(f"ERROR: Input file not found: {INPUT_FILE}")
        print("Please ensure cpss_all_services_enhanced.csv exists in the expected location.")
        raise SystemExit(1)  # exit() is a site/REPL helper; SystemExit works everywhere

    # Run assessment
    results_df = run_assessment(df)

    # Save results
    output_file = OUTPUT_DIR / 'cpss_iso27002_assessment_complete.csv'
    results_df.to_csv(output_file, index=False)
    print(f"\nSaved assessment results: {output_file.name}")

    # Generate analytics
    generate_summary_statistics(results_df)
    generate_domain_table(results_df)
    create_visualizations(results_df)
    generate_report(results_df)

    print("\n" + "="*80)
    print("ASSESSMENT COMPLETE")
    print("="*80)
    print(f"\nAll outputs saved to: {OUTPUT_DIR}")
    print("\nFiles created:")
    print("  - cpss_iso27002_assessment_complete.csv (detailed results)")
    print("  - cpss_iso27002_assessment_complete.png (visualizations)")
    print("  - cpss_iso27002_report_complete.txt (summary report)")
    print("="*80 + "\n")
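
The per-control percentages printed by the run can be cross-checked against the saved results. The sketch below is a minimal example on toy data, assuming that each `<control>_noncompliant` column has boolean dtype and that `True` marks a record that passed the check, which is what the `~` negation in the report code implies. `noncompliance_rates` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def noncompliance_rates(results: pd.DataFrame, suffix: str = "_noncompliant") -> pd.Series:
    """Return the noncompliance percentage per control.

    ASSUMPTION: flag columns are boolean and True means the record passed,
    so noncompliant records are the negated flags (as in the report code).
    """
    cols = [c for c in results.columns if c.endswith(suffix)]
    rates = (~results[cols]).mean() * 100       # share of False values per column
    rates.index = [c[: -len(suffix)] for c in cols]  # strip suffix -> control ID
    return rates.round(1)


# Toy data standing in for the saved assessment results.
toy = pd.DataFrame({
    "A.8.9_noncompliant": [True, True, False, True],
    "A.8.20_noncompliant": [False, False, True, True],
})
rates = noncompliance_rates(toy)
print(rates)
```

When applying this to a CSV read back from disk, it is worth checking that the flag columns were parsed as booleans rather than strings before negating them.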


ANALYSES 3: ISO 27002:2022 ASSESSMENT

Loading: output\2_cpss_identification\cpss_all_services_enhanced.csv
Loaded 22 devices

RUNNING ISO 27002:2022 ASSESSMENT
Total devices: 22

Running 13 control assessments
  A.5.15: Access control                          (0/22 noncompliant - 0.0%)
  A.8.2: Privileged access rights                 (0/22 noncompliant - 0.0%)
  A.8.3: Information access restriction           (21/22 noncompliant - 95.5%)
  A.8.5: Secure authentication                    (0/22 noncompliant - 0.0%)
  A.8.9: Configuration management                 (3/22 noncompliant - 13.6%)
  A.8.20: Networks security                       (9/22 noncompliant - 40.9%)
  A.8.21: Security of network services            (0/22 noncompliant - 0.0%)
  A.8.22: Segregation of networks                 (0/22 noncompliant - 0.0%)
  A.8.24: Use of cryptography                     (9/22 noncompliant - 40.9%)
  A.8.8: Management of technical vulnerabilities  (16/22 noncompliant - 72.7%)
  A.5.23: Cloud services security                 (9/22 noncompliant - 40.9%)
  A.5.19: Supplier relationships                  (22/22 noncompliant - 100.0%)
  A.8