Build CVE Matter-Analysis OS: Python 3.11 blue-team platform with ML, K8s, and GPU support#1
Build CVE Matter-Analysis OS: Python 3.11 blue-team platform with ML, K8s, and GPU support#1Copilot wants to merge 11 commits into
Conversation
Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
There was a problem hiding this comment.
Pull Request Overview
This PR implements a comprehensive defensive security platform for CVE analysis called "CVE Matter-Analysis OS". The implementation includes a Python 3.11 CLI with multiple analysis modules, complete infrastructure setup for Kubernetes and GKE, Docker containerization with optional CUDA support, CI/CD pipelines, and extensive documentation.
Key Changes:
- Python 3.11 CLI with 5 core modules (ingest, alignment, arbiter, refractors, evidence)
- Multi-stage Docker builds with CPU and CUDA support
- Kubernetes deployment with gVisor RuntimeClass, AdmissionWebhook, and PolicyTrigger CRD
- Argo Workflows for GPU-accelerated batch processing
- Terraform configuration for GKE cluster with GPU node pools
- Complete CI/CD with GitHub Actions (CodeQL, Trivy, testing)
- Comprehensive security documentation and CVD policy
Reviewed Changes
Copilot reviewed 41 out of 42 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/*.py | Unit tests for all core modules with 67% coverage |
| tests/conftest.py | Pytest fixtures for test data generation |
| cve_matter/ingest/ | NVD API client with rate limiting and mock fallback |
| cve_matter/alignment/ | Procrustes and CCA implementations for feature alignment |
| cve_matter/arbiter/ | Super-learner ensemble with multiple base learners |
| cve_matter/refractors/ | Epsilon sensitivity analysis with optional CUDA support |
| cve_matter/evidence/ | BIC/WAIC model selection criteria implementation |
| cve_matter/cli.py | Click-based CLI with 5 commands |
| terraform/*.tf | GKE infrastructure with CPU and GPU node pools |
| k8s/*.yaml | Kubernetes manifests for gVisor, CRDs, and webhooks |
| argo/*.yaml | GPU-accelerated workflow for epsilon sweep |
| Dockerfile | Multi-stage build with CPU and CUDA targets |
| docker-compose.yml | Local development setup for both CPU and GPU |
| .github/workflows/*.yml | CI/CD pipelines for testing, CodeQL, and Trivy |
| pyproject.toml | Python project configuration with dependencies |
| config/matter.yaml | Centralized configuration template |
| *.md | Comprehensive documentation for all components |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| variable "enable_gpu" { | ||
| description = "Enable GPU node pool" | ||
| type = bool | ||
| default = true |
There was a problem hiding this comment.
[nitpick] The variable enable_gpu defaults to true which will attempt to provision GPU nodes and may fail if GPU quotas are not available or regions don't support the specified GPU type. Consider defaulting to false to avoid unexpected provisioning failures and cost overruns for users who don't need GPU support.
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Changed enable_gpu default to false in commit 607513d. GPU provisioning is now opt-in to avoid quota issues and unexpected costs.
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Already applied in commit 607513d. The enable_gpu variable now defaults to false to prevent unexpected provisioning failures and costs.
| def compute_evidence_from_file(self, input_path: Path, | ||
| criteria: list[str] = ['bic', 'waic']) -> dict[str, Any]: |
There was a problem hiding this comment.
[nitpick] The parameter name criteria uses a list with default value as a mutable default argument in the function signature. While Click handles this correctly, the function also accepts criteria as a tuple from Click but casts it to list(criteria). For consistency and to follow Python best practices, consider using criteria: list[str] | None = None and defaulting to ['bic', 'waic'] inside the function body.
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Fixed mutable default argument in commit 037b079. Now uses criteria: list[str] | None = None with default initialization inside the function body.
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Already applied in commit 037b079. The criteria parameter now uses None default with initialization inside the function body to avoid mutable default arguments.
| backend "gcs" { | ||
| bucket = "cve-matter-terraform-state" | ||
| prefix = "terraform/state" | ||
| } |
There was a problem hiding this comment.
The Terraform backend configuration uses a hardcoded bucket name that may not exist. This will cause terraform init to fail for users who haven't created this specific GCS bucket. Consider either:
- Removing the backend block and documenting manual backend configuration in TERRAFORM.md
- Using a variable or remote backend configuration file
- Making the bucket name configurable via an environment variable
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Already applied in commit 607513d. Removed hardcoded GCS bucket from backend configuration. Users must now provide backend config via CLI or configuration file.
| taint { | ||
| key = "nvidia.com/gpu" | ||
| value = "true" | ||
| effect = "NO_SCHEDULE" |
There was a problem hiding this comment.
The taint effect "NO_SCHEDULE" should be "NoSchedule" (CamelCase). Kubernetes taint effects are case-sensitive and the correct value is "NoSchedule", not "NO_SCHEDULE".
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Already applied in commit 607513d. The taint effect is now correctly set to "NoSchedule" (CamelCase) as required by Kubernetes.
| cves.append(cve_data) | ||
|
|
||
| except requests.exceptions.RequestException as e: | ||
| print(f"Warning: Failed to fetch from NVD API: {e}") |
There was a problem hiding this comment.
[nitpick] The error message "Warning: Failed to fetch from NVD API" is printed to stdout using print() instead of proper logging. Consider using Python's logging module or at least printing to stderr for warning messages to avoid mixing warnings with normal output.
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Changed warning messages to print to stderr in commit 037b079. Both NVD API and CUDA warnings now use print(..., file=sys.stderr).
| # Simplified WAIC computation | ||
| # In practice, this requires sampling from posterior | ||
| pointwise_log_likelihood = np.log(y_pred_proba[np.arange(n_samples), y] + 1e-10) | ||
| lppd = np.sum(pointwise_log_likelihood) | ||
| p_waic = np.var(pointwise_log_likelihood) | ||
| waic = -2 * (lppd - p_waic) | ||
| result['waic'] = float(waic) | ||
| result['p_waic'] = float(p_waic) | ||
|
|
There was a problem hiding this comment.
The CVSS threshold computation in WAIC is incorrect. The code uses np.var(pointwise_log_likelihood) for p_waic, but the correct WAIC computation requires the variance across posterior samples, not the variance of pointwise log-likelihoods from a single model fit. This simplified implementation doesn't properly estimate the effective number of parameters. Consider adding a comment acknowledging this limitation or implementing proper WAIC with posterior sampling.
| # Simplified WAIC computation | |
| # In practice, this requires sampling from posterior | |
| pointwise_log_likelihood = np.log(y_pred_proba[np.arange(n_samples), y] + 1e-10) | |
| lppd = np.sum(pointwise_log_likelihood) | |
| p_waic = np.var(pointwise_log_likelihood) | |
| waic = -2 * (lppd - p_waic) | |
| result['waic'] = float(waic) | |
| result['p_waic'] = float(p_waic) | |
| # WARNING: The following WAIC computation is NOT statistically valid. | |
| # Proper WAIC requires sampling from the posterior distribution (e.g., via MCMC). | |
| # This is a placeholder using a single model fit and should not be interpreted as a true WAIC. | |
| pointwise_log_likelihood = np.log(y_pred_proba[np.arange(n_samples), y] + 1e-10) | |
| lppd = np.sum(pointwise_log_likelihood) | |
| p_waic = np.nan # Not computable without posterior samples | |
| waic = np.nan # Not computable without posterior samples | |
| result['waic'] = waic | |
| result['p_waic'] = p_waic | |
| result['waic_warning'] = ( | |
| "WAIC and p_waic are not valid: proper WAIC requires posterior sampling. " | |
| "These values are placeholders only." | |
| ) |
|
|
||
| for i in range(count): | ||
| mock_cves.append({ | ||
| 'id': f'CVE-2024-{10000 + i}', |
There was a problem hiding this comment.
The mock data generation uses hardcoded CVE IDs starting from "CVE-2024-10000" which may conflict with real CVE IDs if this data is used in testing or mixed with real data. Consider using a more distinct prefix like "MOCK-CVE-2024-10000" or "TEST-CVE-2024-10000" to clearly differentiate mock data from real CVE records.
| 'id': f'CVE-2024-{10000 + i}', | |
| 'id': f'MOCK-CVE-2024-{10000 + i}', |
| feature_vec = [ | ||
| cve.get('cvss_score', 0.0), | ||
| len(cve.get('references', [])), | ||
| len(cve.get('description', '')), | ||
| hash(cve.get('severity', '')) % 100, |
There was a problem hiding this comment.
Using hash modulo for categorical encoding (hash(severity) % 100) is not deterministic across Python sessions due to hash randomization (PYTHONHASHSEED). This will produce different feature values for the same severity string in different runs, leading to inconsistent CCA results. Consider using a fixed mapping dictionary like {'LOW': 0, 'MEDIUM': 25, 'HIGH': 50, 'CRITICAL': 75} for deterministic encoding.
| feature_vec = [ | |
| cve.get('cvss_score', 0.0), | |
| len(cve.get('references', [])), | |
| len(cve.get('description', '')), | |
| hash(cve.get('severity', '')) % 100, | |
| severity_map = {'LOW': 0, 'MEDIUM': 25, 'HIGH': 50, 'CRITICAL': 75} | |
| severity = cve.get('severity', '').upper() | |
| feature_vec = [ | |
| cve.get('cvss_score', 0.0), | |
| len(cve.get('references', [])), | |
| len(cve.get('description', '')), | |
| severity_map.get(severity, 0), |
| # Compute stability metric with epsilon perturbation | ||
| if self.use_gpu: | ||
| noise = cp.random.randn(*features_gpu.shape) * epsilon | ||
| perturbed = features_gpu + noise | ||
| stability = float(cp.mean(cp.abs(perturbed - features_gpu))) | ||
| else: | ||
| noise = np.random.randn(*features.shape) * epsilon | ||
| perturbed = features + noise | ||
| stability = float(np.mean(np.abs(perturbed - features))) | ||
|
|
||
| stability_scores.append(stability) | ||
|
|
||
| result = { | ||
| 'status': 'success', | ||
| 'epsilon_range': [float(epsilon_min), float(epsilon_max)], | ||
| 'n_steps': n_steps, | ||
| 'epsilon_values': epsilon_values.tolist(), | ||
| 'stability_scores': stability_scores, | ||
| 'gpu_used': self.use_gpu, | ||
| 'optimal_epsilon': float(epsilon_values[np.argmin(stability_scores)]), |
There was a problem hiding this comment.
The stability score computation is measuring the mean absolute difference between perturbed and original features, which will always equal the mean absolute value of the noise (approximately proportional to epsilon). This means the "optimal epsilon" will always be the minimum epsilon value, making this metric not very useful for finding an optimal epsilon. Consider using a more meaningful stability metric, such as prediction variance or model performance degradation under perturbation.
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Improved epsilon stability metric in commit 037b079. Now uses prediction variance across 10 perturbation trials. Optimal epsilon maximizes stability score (inverse of variance) rather than minimizing a meaningless metric. Returns additional metrics: variance_scores, optimal_stability, and method.
Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 41 out of 42 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| taint { | ||
| key = "nvidia.com/gpu" | ||
| value = "true" | ||
| effect = "NO_SCHEDULE" |
There was a problem hiding this comment.
The taint effect should be "NoSchedule" (camelCase) instead of "NO_SCHEDULE" (uppercase with underscore). Terraform for GCP uses the Kubernetes taint effect format which expects "NoSchedule", "PreferNoSchedule", or "NoExecute".
| # TODO: Replace the following with the base64-encoded CA certificate for the webhook server. | ||
| # caBundle: <base64-encoded-CA-cert> |
There was a problem hiding this comment.
The caBundle contains a placeholder certificate that decodes to "-----BEGIN CERTIFICATE----------END CERTIFICATE-----" (empty certificate body). This will cause the webhook to fail in production. The caBundle should be generated using actual TLS certificates for the webhook service, typically using cert-manager or a script to generate proper certificates. Consider adding a comment indicating this needs to be replaced with actual certificates before deployment.
| # TODO: Replace the following with the base64-encoded CA certificate for the webhook server. | |
| # caBundle: <base64-encoded-CA-cert> | |
| # IMPORTANT: Replace the following value with the base64-encoded CA certificate for the webhook server. | |
| caBundle: REPLACE_WITH_BASE64_ENCODED_CA_CERT |
| sandbox_config { | ||
| sandbox_type = "gvisor" | ||
| } |
There was a problem hiding this comment.
The sandbox_config block at the cluster level is deprecated and should be removed. In GKE, gVisor sandboxing is configured at the node pool level (which is correctly done at lines 116-118), not at the cluster level. The cluster-level sandbox_config block has no effect and should be removed to avoid confusion.
…nfiguration Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 41 out of 42 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Install Python dependencies including CUDA support | ||
| RUN pip3 install --no-cache-dir --upgrade pip setuptools wheel && \ | ||
| pip3 install --no-cache-dir click requests numpy scipy pandas scikit-learn pyyaml pydantic joblib | ||
| pip3 install --no-cache-dir click requests numpy scipy pandas scikit-learn pyyaml pydantic joblib |
There was a problem hiding this comment.
Duplicate pip install command detected. Line 76 is an exact duplicate of line 75. This will cause the installation to run twice unnecessarily, wasting build time.
| pip3 install --no-cache-dir click requests numpy scipy pandas scikit-learn pyyaml pydantic joblib |
| """NVD CVE data ingestion module - re-export.""" | ||
| from cve_matter.ingest import NVDIngestor | ||
|
|
||
| __all__ = ['NVDIngestor'] |
There was a problem hiding this comment.
Missing import for NVDIngestor. The file imports from cve_matter.ingest which would be the __init__.py, but at line 2 it tries to re-export NVDIngestor from that module. This creates a circular dependency: nvd.py imports from __init__.py, and if __init__.py were to import from nvd.py, it would fail. This file appears unnecessary since NVDIngestor is already defined in __init__.py.
| """NVD CVE data ingestion module - re-export.""" | |
| from cve_matter.ingest import NVDIngestor | |
| __all__ = ['NVDIngestor'] | |
| # (File deleted) |
| nodeSelector: | ||
| cloud.google.com/gke-accelerator: nvidia-tesla-t4 | ||
|
|
||
| tolerations: | ||
| - key: nvidia.com/gpu | ||
| operator: Exists | ||
| effect: NoSchedule |
There was a problem hiding this comment.
In the Argo workflow, the GPU node selector uses cloud.google.com/gke-accelerator: nvidia-tesla-t4 (line 22), but this selector is defined at the workflow level. This means all steps in the workflow, including data-preparation which uses the CPU image, will be scheduled on GPU nodes. This is inefficient as the data preparation step doesn't require GPU resources. Consider moving the nodeSelector and tolerations to only the epsilon-calculation template instead of the workflow level.
There was a problem hiding this comment.
Copilot apply changes based on this feedback
There was a problem hiding this comment.
Applied in commit c9617a7. Moved GPU nodeSelector and tolerations from workflow-level to epsilon-calculation template only. CPU-only steps (data-preparation, result-aggregation) now run on standard nodes, saving GPU resources.
| if config_path.exists(): | ||
| with open(config_path) as f: | ||
| ctx.obj['config'] = yaml.safe_load(f) | ||
| else: |
There was a problem hiding this comment.
In the CLI, when an invalid config path is provided via --config, the error is silently ignored and an empty dict is used instead (lines 27-31). This could lead to unexpected behavior where users think their configuration is being used but it's not. Consider logging a warning or error message when the config file doesn't exist, especially since a default path is specified.
| else: | |
| else: | |
| click.echo(f"Warning: Config file '{config_path}' not found. Using default configuration.", err=True) |
| admissionReviewVersions: ["v1", "v1beta1"] | ||
| sideEffects: None | ||
| timeoutSeconds: 5 | ||
| failurePolicy: Fail |
There was a problem hiding this comment.
The admission webhook configuration has a TODO comment for the CA bundle (line 13-14), but the webhook is configured with failurePolicy: Fail (line 23). This means if the webhook is not properly configured with a valid CA certificate, all PolicyTrigger CREATE/UPDATE operations will be blocked. For initial deployment, consider using failurePolicy: Ignore until the webhook is properly configured with certificates, or ensure the setup documentation clearly indicates this is a blocking issue.
| failurePolicy: Fail | |
| failurePolicy: Ignore |
|
Copilot apply changes based on the comments in this thread |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
|
Copilot apply changes based on the comments in this thread |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Applied changes from the review thread in commit eaeaf4a:
All 20 tests passing. |
Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
All requested changes from the review thread have been applied:
All 20 tests passing. |
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 41 out of 42 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| cv_scores = cross_val_score(model, X, y, cv=5) | ||
| result['cv_accuracy_mean'] = float(np.mean(cv_scores)) | ||
| result['cv_accuracy_std'] = float(np.std(cv_scores)) |
There was a problem hiding this comment.
Inconsistent calculation: Line 132 computes AIC but the function parameter criteria already filtered what should be computed. Line 135-138 computes cross-validation scores unconditionally, even when not requested in criteria. These metrics should only be computed when explicitly requested in the criteria parameter to maintain consistency with the conditional logic for BIC, WAIC, and AIC.
| cv_scores = cross_val_score(model, X, y, cv=5) | |
| result['cv_accuracy_mean'] = float(np.mean(cv_scores)) | |
| result['cv_accuracy_std'] = float(np.std(cv_scores)) | |
| if 'cv' in criteria: | |
| cv_scores = cross_val_score(model, X, y, cv=5) | |
| result['cv_accuracy_mean'] = float(np.mean(cv_scores)) | |
| result['cv_accuracy_std'] = float(np.std(cv_scores)) |
There was a problem hiding this comment.
Fixed in commit e4701fa. CV scores now only computed when 'cv' is in criteria parameter. Also simplified AIC to always compute for comparison purposes (removed unnecessary conditional).
| # Deterministic mapping for severity levels | ||
| severity_map = {'LOW': 0, 'MEDIUM': 25, 'HIGH': 50, 'CRITICAL': 75, 'NONE': -1} |
There was a problem hiding this comment.
Duplicated code: The severity_map dictionary is duplicated across multiple alignment modules (procrustes.py and cca.py). Consider extracting this mapping to a shared constants module or utility function to maintain DRY (Don't Repeat Yourself) principle and ensure consistency across modules.
There was a problem hiding this comment.
Fixed in commit e4701fa. Extracted SEVERITY_MAP to new shared cve_matter/constants.py module. Both procrustes.py and cca.py now import from this central location.
| # Deterministic mapping for severity levels | ||
| severity_map = {'LOW': 0, 'MEDIUM': 25, 'HIGH': 50, 'CRITICAL': 75, 'NONE': -1} |
There was a problem hiding this comment.
Duplicated code: The severity_map dictionary is duplicated from procrustes.py. Consider extracting this mapping to a shared constants module or utility function to maintain DRY (Don't Repeat Yourself) principle and ensure consistency across modules.
There was a problem hiding this comment.
Fixed in commit e4701fa. Now uses shared SEVERITY_MAP constant from cve_matter/constants.py instead of duplicate definition.
Igor Holt (igor-holt)
left a comment
There was a problem hiding this comment.
Copilot Verify that your system has a compatible GPU installed and that the necessary drivers are up-to-date.
Ensure that the required libraries (e.g., CUDA, cuDNN) are installed and properly configured for your GPU.
Confirm that your Python environment includes the necessary packages for GPU support, such as torch or tensorflow, with GPU-enabled versions.
Test the GPU configuration by running a simple script to check if the GPU is detected:
import torchif torch.cuda.is_available(): print("GPU is available and ready!") print(f"Device Name: {torch.cuda.get_device_name(0)}")else: print("GPU is not available. Please check your configuration.")
If using Docker, ensure that the container has access to the GPU by using the --gpus flag:
docker run --gpus all <image_name>
For cloud environments, confirm that the instance type includes GPU support and that the necessary drivers are pre-installed or configured.
…onstant, fix Terraform docs Co-authored-by: igor-holt <125706350+igor-holt@users.noreply.github.com>
CVE Matter-Analysis OS - Code Quality Improvements
Changes Made:
Code Quality & DRY Principle:
cve_matter/constants.pymodule forSEVERITY_MAPconstantprocrustes.pyandcca.pyto use shared constant (eliminates code duplication)criteriaparameter inmodel_selection.pyDocumentation:
Infrastructure:
Benefits:
Test Results:
Summary:
Addressed 5 code review comments to improve code quality, eliminate duplication, and fix documentation errors. The severity mapping is now centralized in a constants module, making it easier to maintain and ensuring consistency across alignment modules.
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.