This repository was archived by the owner on Feb 8, 2026. It is now read-only.

🚀 Release v1.1.0: Major Performance Optimizations & Security Fixes #3

Merged: abevz merged 57 commits into main from feature/improvements on Sep 2, 2025

Conversation

@abevz (Owner) commented Sep 2, 2025

🚀 Release v1.1.0: Major Performance Optimizations & Security Fixes

📋 Summary

This PR significantly speeds up the CPC cluster management tool: the cluster-info command drops from 22+ seconds to under 0.5 seconds on cached runs (about 7 seconds on a first, uncached run). It also includes critical security fixes that pin Kubernetes component versions.

✨ New Features

  • cluster-info --quick mode: Ultra-fast cluster status (0.1s execution time)
  • Two-tier terraform caching: Short-term (30s) and long-term (5min) cache layers
  • Smart workspace detection: Avoids unnecessary terraform workspace switches
  • Context-aware cache management: Separate cache files per workspace
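
The two-tier, per-workspace cache described above can be sketched roughly as follows. This is a minimal illustration in Python, not the tool's actual bash implementation. Only the 30-second and 5-minute lifetimes and the per-workspace separation come from the description; the class name `WorkspaceCache`, the file naming scheme, and the JSON entry layout are assumptions made for the example.

```python
import json
import tempfile
import time
from pathlib import Path

# Tier lifetimes taken from the PR description; everything else is assumed.
TIER_TTL_SECONDS = {"short": 30, "long": 5 * 60}


class WorkspaceCache:
    """File-backed cache with a separate file per workspace and tier."""

    def __init__(self, workspace, cache_dir=None, clock=time.time):
        self.workspace = workspace
        self.cache_dir = Path(cache_dir or tempfile.gettempdir())
        self.clock = clock  # injectable so expiry can be exercised without sleeping

    def _path(self, tier):
        # Hypothetical naming scheme: the workspace name keeps contexts apart.
        return self.cache_dir / f"cpc_{tier}_cache_{self.workspace}.json"

    def get(self, tier, fetch):
        """Return the cached value for `tier`, calling `fetch()` on a miss."""
        ttl = TIER_TTL_SECONDS[tier]
        path = self._path(tier)
        try:
            entry = json.loads(path.read_text())
            if self.clock() - entry["stored_at"] <= ttl:
                return entry["value"]  # still fresh
        except (OSError, ValueError, KeyError, TypeError):
            pass  # a missing or corrupt file is treated as a miss
        value = fetch()
        path.write_text(json.dumps({"stored_at": self.clock(), "value": value}))
        return value
```

In a design like this, cheap but volatile data would go through the `"short"` tier and expensive remote-state reads through the `"long"` tier; which data the real tool assigns to each tier is not stated in the PR.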

🔒 Security Fixes

  • Pinned Kubernetes versions: Fixed high-severity issue where kubelet, kubeadm, kubectl versions weren't pinned
  • Version consistency: Prevents automatic patch updates that could cause cluster instability
  • Role defaults: Changed from 'latest' to specific pinned versions for production safety
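
The commit messages in this PR describe the pin format as `KUBERNETES_LONG_VERSION-1.1` on Debian/Ubuntu and plain `KUBERNETES_LONG_VERSION` on RedHat/Rocky. A small Python sketch of how such pinned package specifiers could be built is shown below. The function name `pinned_packages` and the use of Ansible-style OS family strings are assumptions for illustration; the actual playbook is YAML and may be structured differently. The `name=version` (apt) and `name-version` (dnf/yum) forms are the standard package manager syntaxes.

```python
KUBERNETES_PACKAGES = ("kubelet", "kubeadm", "kubectl")


def pinned_packages(os_family, long_version):
    """Build version-pinned package specifiers for the given OS family.

    `os_family` uses Ansible-style names ("Debian", "RedHat"); this mapping
    is an assumption about how a playbook might branch, not the PR's code.
    """
    if os_family == "Debian":
        # apt pins with name=version; the "-1.1" revision suffix follows the PR.
        return [f"{name}={long_version}-1.1" for name in KUBERNETES_PACKAGES]
    if os_family == "RedHat":
        # dnf/yum pins with name-version.
        return [f"{name}-{long_version}" for name in KUBERNETES_PACKAGES]
    raise ValueError(f"unsupported OS family: {os_family!r}")


print(pinned_packages("Debian", "1.29.0"))
print(pinned_packages("RedHat", "1.29.0"))
```

Raising on an unknown family, rather than falling back to an unpinned install, keeps the "no implicit latest" guarantee that motivates the fix.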

⚡ Performance Improvements

| Command | Before | After | Improvement |
|---|---|---|---|
| cluster-info (first run) | 22s | 7.2s | 3x faster |
| cluster-info (cached) | 22s | 0.44s | 50x faster |
| cluster-info --quick | N/A | 0.1s | 220x faster |
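
The improvement factors above are each measured against the 22-second baseline, including the `--quick` row, whose "Before" column is N/A because the flag did not exist previously. The arithmetic can be checked with a few lines of Python (the variable names are illustrative only):

```python
BASELINE_SECONDS = 22.0  # "Before" timing for cluster-info

AFTER_SECONDS = {
    "cluster-info (first run)": 7.2,
    "cluster-info (cached)": 0.44,
    "cluster-info --quick": 0.1,
}


def speedup(before, after):
    """Ratio of old to new execution time."""
    return before / after


for command, after in AFTER_SECONDS.items():
    print(f"{command}: {speedup(BASELINE_SECONDS, after):.1f}x faster")
```

This gives roughly 3.1x, 50x, and 220x, matching the rounded figures reported in the PR.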

🧪 Testing

  • ✅ All tests passing (100% success rate)
  • ✅ Comprehensive test suite with 59 tests
  • ✅ Performance benchmarking validated
  • ✅ No breaking changes - fully backward compatible

🔧 Technical Changes

  • Optimized terraform operations: Smart workspace state management
  • Enhanced caching strategy: Multi-level cache with intelligent invalidation
  • Reduced I/O operations: Better cache file handling
  • Network efficiency: Fewer remote state API calls
  • Security hardening: Kubernetes component version pinning

🔧 Code Quality Improvements

  • Magic number elimination: Replaced hardcoded values with named constants in terraform
  • Hostname collision prevention: Added mandatory RELEASE_LETTER to all environments
  • Code consistency: Enhanced error handling and validation in scripts

📚 Documentation Updates

  • Updated CHANGELOG.md with detailed performance metrics
  • Enhanced RELEASE_NOTES.md with v1.1.0 changes
  • Updated help text to include --quick option
  • Added performance benchmarks

🔄 Migration

  • No migration needed - all existing commands work as before
  • New --quick flag available for ultra-fast cluster information
  • Kubernetes versions now properly pinned for consistency

🎯 Ready for Release

  • Version bumped to 1.1.0
  • All tests passing
  • Documentation updated
  • Performance benchmarks validated
  • Security fixes applied
  • Code review feedback addressed
  • Russian comments translated to English
  • Magic numbers replaced with constants
  • No breaking changes

abevz added 30 commits August 16, 2025 15:19
- Implemented Pi-hole API library for DNS record management (lib/pihole_api.sh)
- Developed SSH utilities for managing known hosts and connections (lib/ssh_utils.sh)
- Created core module for context management, secrets loading, and workspace handling (modules/00_core.sh)
- Added Terraform/OpenTofu management module for infrastructure deployment (modules/60_tofu.sh)
- Introduced test script to validate modular architecture and functionality (test_modules.sh)
- Added placeholder for Pi-hole DNS script (scripts/add_pihole_dns.py.new)
…arization

- Extracted Terraform/OpenTofu functionality into modules/60_tofu.sh
  - Implemented commands for deployment, VM management, hostname generation, kubeconfig retrieval, and environment loading.
  - Integrated modular functions into the main cpc script with zero breaking changes.
  - Enhanced code organization and documentation.

- Created modules/10_proxmox.sh for Proxmox VM management
  - Implemented interactive commands for adding and removing VMs with safety confirmations.
  - Updated environment management and Terraform integration.
  - Maintained backward compatibility and improved code quality.
…dule and update main script for modular architecture
- Created modules/30_k8s_cluster.sh for managing Kubernetes cluster lifecycle.
- Added functions for bootstrap, kubeconfig retrieval, upgrade, and node reset.
- Enhanced modules/60_tofu.sh with cluster-info command for infrastructure details.
- Updated test_modules.sh to include tests for new K8s cluster module functions.
- Documented the new module and its functionalities in k8s_cluster_module_30_completion_report.md.
- Ensured proper separation of concerns between Tofu and K8s modules.
- Introduced cpc_core() as the main entry point for core commands.
- Migrated existing commands to dedicated functions:
  - setup-cpc → core_setup_cpc()
  - ctx → core_ctx()
  - clone-workspace → core_clone_workspace()
  - delete-workspace → core_delete_workspace()
  - load_secrets → core_load_secrets_command()
- Updated load_secrets() to fully integrate SOPS for secrets management.
- Improved error handling and validation across all core functions.
- Enhanced documentation and help text for better user guidance.
- Clean separation of core functionality from operational commands.
- Removed empty static IP implementation summary file.
- Added commands to Proxmox module for creating VM templates and controlling VMs.
- Introduced advanced inventory cache update in Ansible module to fetch comprehensive cluster information.
- Added functionality to reset all Kubernetes nodes in the K8s nodes module.
- Updated Terraform locals to remove fixed VM ID ranges for specific workspaces.
- Implemented SSH management module for clearing known_hosts and managing SSH connections.
Translate Russian comments and documentation to English across the project to improve code readability and maintainability for a wider audience.

- Updated all shell scripts in `modules/`.

- Updated Ansible playbooks in `ansible/playbooks/`.

- Translated documentation file in `docs/`.

- Included user-requested changes to `cpc` and `install_kubernetes_cluster.yml`.
abevz added 23 commits August 21, 2025 20:28
- Translated Russian comments to English in all module files (00_core.sh through 80_ssh.sh)
- Translated Russian comments in all Ansible playbooks
- Translated Russian comments in Terraform configuration files
- Translated complete Russian documentation to English
- Improved project accessibility for international contributors

This completes the internationalization of the codebase as requested in Step 1.
…tation

- Redesigned README.md with modern formatting and visual appeal
- Added comprehensive table of contents with navigation
- Enhanced overview section with clear value propositions
- Expanded key features with detailed descriptions and icons
- Improved quick start guide with 5-minute setup process
- Added detailed installation and configuration sections
- Included extensive usage examples for common operations
- Enhanced architecture section with visual diagrams
- Added workspace system documentation with status indicators
- Included testing and validation procedures
- Added contribution guidelines and support information
- Improved project structure documentation
- Added badges for technology stack and status indicators

This creates a professional, user-friendly README that serves as both documentation and marketing material for the CPC project.
…s/60_tofu.sh

- Translated the last remaining Russian comment '# Добавить переменную в массив команды tofu' to '# Add variable to tofu command array'
- Verified complete removal of all Russian characters from codebase
- Phase 1 documentation improvements are now fully complete

All Russian comments have been successfully translated to English across:
- 7 core modules (00_core.sh through 80_ssh.sh)
- 5 Ansible playbooks
- 2 Terraform configuration files
- 1 documentation file (static_ip_configuration.md)
- Enhanced README.md with comprehensive project documentation
- Added 'if [[ "${BASH_SOURCE[0]}" == "${0}" ]];' checks to modules:
  - modules/00_core.sh - Core functionality module
  - modules/40_k8s_nodes.sh - Kubernetes node management module
  - modules/50_cluster_ops.sh - Cluster operations module

This ensures all modules follow the same pattern and cannot be executed directly,
preventing potential issues and maintaining consistency across the codebase.

All modules now have proper direct execution protection as required by the test suite.
- Added 'if [[ "${BASH_SOURCE[0]}" == "${0}" ]];' checks to all modules:
  - modules/20_ansible.sh - Ansible playbook management module
  - modules/30_k8s_cluster.sh - Kubernetes cluster lifecycle module
  - modules/70_dns_ssl.sh - DNS and SSL certificate management module
  - modules/80_ssh.sh - SSH management module

This ensures complete consistency across all modules and prevents direct execution,
which could cause issues with missing dependencies or incorrect environment setup.

All modules in the CPC project now follow the same security pattern.
- Removed empty modules/10_workspace.sh file that was causing test failures
- This file contained no code and was not being used by the system

All remaining modules now have proper structure and direct execution checks.
- ✅ All 42 unit tests passing
- ✅ All 8 integration tests passing
- ✅ Comprehensive test framework established
- ✅ Fixed module structure issues (added direct execution guards)
- ✅ Removed empty module file causing test failures
- ✅ Test automation with Makefile and run_tests.sh
- ✅ Virtual environment setup for dependency isolation
- ✅ Linting tools integrated (shellcheck, bashate, ansible-lint)

Phase 3 (Testing) is now complete and ready to support Phase 2 (Error Handling) development with robust regression detection capabilities.
- ✅ Fixed all critical ShellCheck errors (SC2145, SC2162, SC2181)
- ✅ Added -r flag to all read commands for better security
- ✅ Replaced $? checks with direct command execution checks
- ✅ Fixed array handling in log_debug statements
- ✅ Added error checking for pushd/popd operations
- ✅ Improved error handling patterns throughout codebase

Code quality improvements:
- Better error handling with if ! command patterns
- More secure read operations with -r flag
- Proper array expansion in logging statements
- Enhanced pushd/popd error checking

All tests still pass ✅, but code is now more robust and follows best practices.
…generation

- Add automatic secrets loading on CPC startup
- Fix RELEASE_LETTER handling for proper hostname generation
- Improve tofu command retry logic for plan/apply operations
- Add support for informational commands (--help, --version)
- Skip hostname generation for non-deployment commands
- Fix exit code handling for tofu plan command
- Add verbose/quiet modes for secrets loading
- Improve error handling and logging throughout the system
- Remove retry logic for tofu commands (max_retries=0) to prevent multiple runs
- Remove 'Retrying tofu command' messages for cleaner output
- Add proper interactive input handling for apply and destroy commands
- Fix hostname generation script variable name (SSH_USERNAME -> VM_USERNAME)
- Improve error handling and timeout management
- Commands now work correctly in both interactive and non-interactive modes
This commit includes all fixes for tofu command execution and testing:

🔧 Tofu Commands Fixes:
- Fixed hanging issues with apply/destroy commands
- Added proper interactive input handling for apply and destroy
- Removed retry logic to prevent multiple command executions
- Removed 'Retrying tofu command' messages for cleaner output
- Fixed hostname generation script variable name (SSH_USERNAME -> VM_USERNAME)

🧪 Testing Infrastructure:
- Installed pytest and bashate dependencies
- Fixed test framework to handle timeouts properly
- Added pipx PATH to test runner script
- Fixed workspace operations test
- All unit and integration tests now pass

📝 Code Quality:
- Improved error handling and timeout management
- Better interactive vs non-interactive command handling
- Cleaner output without unnecessary retry messages

✅ All systems working:
- tofu apply/destroy/plan commands work correctly
- Interactive input works in terminal, auto-approve in scripts
- Full test suite passes (42 unit + 8 integration tests)
- Linting tools working properly
✨ Features:
- Add cluster-info --quick mode (0.1s execution)
- Implement two-tier terraform caching system
- Smart workspace detection and switching
- Context-aware cache management

⚡ Performance:
- cluster-info: 22s → 0.44s (50x faster)
- cluster-info --quick: 0.1s (220x faster)
- First run: 22s → 7.2s (3x faster)

🧪 Testing:
- All tests passing (100% success rate)
- Comprehensive test suite with 59 tests
- Performance benchmarking included

🔧 Technical:
- Optimized terraform workspace operations
- Enhanced multi-level caching strategy
- Reduced I/O and network operations
- Better resource utilization

📚 Documentation:
- Updated CHANGELOG.md and RELEASE_NOTES.md
- Added performance benchmarks
- Enhanced help text with --quick option
Fix high-severity issue: Pin kubelet, kubeadm, and kubectl versions
to prevent inconsistencies from automatic patch updates.

Changes:
- Pin Debian/Ubuntu packages to KUBERNETES_LONG_VERSION-1.1 format
- Pin RedHat/Rocky packages to KUBERNETES_LONG_VERSION format
- Ensures cluster consistency across future playbook runs

Security Impact: Prevents unintended version drift that could
cause cluster instability or incompatibilities.

Refs: install_kubernetes_cluster.yml
Change role default from 'latest' to pinned version '1.29.0-1.1'
to prevent automatic patch updates that could cause cluster
inconsistencies.

This complements the previous fix in install_kubernetes_cluster.yml
and ensures both playbooks and roles use pinned versions by default.

Impact: Improves cluster stability and prevents version drift.
…TTER, remove magic numbers

- Translate Russian comments in containerd-config.toml.j2 to English
- Add RELEASE_LETTER to all environment files for hostname collision prevention
- Replace magic numbers in terraform/locals.tf with named constants
- Enforce mandatory RELEASE_LETTER in hostname generation script
- Add comprehensive static-ip-example.env template
@abevz abevz requested a review from Copilot September 2, 2025 18:56

@gemini-code-assist gemini-code-assist Bot left a comment

Summary of Changes

Hello @abevz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This release focuses on significantly improving the performance and stability of the CPC cluster management tool. It introduces a multi-layered caching system for faster operations and implements critical security measures by pinning Kubernetes versions. The underlying architecture has been modularized for better maintainability, complemented by a robust error handling framework. These changes aim to provide a more reliable and efficient experience for deploying and managing Kubernetes clusters.

Highlights

  • Major Performance Optimizations: The cluster-info command has been significantly optimized, achieving up to a 220x speed improvement, reducing execution time from 22+ seconds to under 0.5 seconds, including a new ultra-fast --quick mode.
  • Critical Security Fixes: Kubernetes component versions (kubelet, kubeadm, kubectl) are now explicitly pinned, preventing automatic patch updates that could lead to instability and enhancing production safety.
  • Intelligent Caching System: A new two-tier caching mechanism for Terraform operations has been implemented, featuring short-term (30s) and long-term (5min) cache layers, smart workspace detection, and context-aware cache management for improved efficiency.
  • Extensive Code Modularization: The codebase has undergone a significant modularization effort, extracting core functionalities into dedicated shell modules (e.g., Proxmox, Ansible, K8s Cluster, DNS/SSL), greatly enhancing organization, maintainability, and testability.
  • Robust Error Handling & Recovery: A new, comprehensive error handling, retry, timeout, and recovery system has been integrated across the project, improving stability and resilience during operations.
  • Enhanced Ansible Playbooks: Ansible playbooks have been extensively updated for improved stability, idempotency, and new features, including streamlined Kubernetes initialization with Harbor integration and Helm-based addon installations.
  • Comprehensive Testing Framework: New bash and Python-based test suites have been added, covering unit, integration, and linting tests, ensuring high code quality and validating functionality across the project.
  • Dynamic IP Allocation System: The static IP allocation has been refactored to a dynamic, workspace-block-based system, providing more flexible and automated IP management for clusters.
  • Streamlined Documentation & Contribution: New GitHub issue and pull request templates have been added, contributing guidelines updated, and outdated documentation reports removed, simplifying community engagement and project understanding.

@abevz abevz merged commit 30d9013 into main Sep 2, 2025
@abevz abevz deleted the feature/improvements branch September 2, 2025 18:59

Copilot AI left a comment

Pull Request Overview

This PR introduces significant performance optimizations to the CPC cluster management tool, improving cluster-info command execution time from 22+ seconds to under 0.5 seconds. Additionally, it includes critical security fixes for Kubernetes version pinning and comprehensive code quality improvements with enhanced error handling.

Key changes include:

  • Performance optimization with two-tier caching system (30s/5min cache layers)
  • Security hardening through Kubernetes version pinning
  • Enhanced error handling and recovery mechanisms across scripts

Reviewed Changes

Copilot reviewed 103 out of 127 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| tests/integration/test_cpc_workflows.py | New comprehensive integration test suite with workflow validation |
| tests/__init__.py | Test framework infrastructure for command execution and validation |
| test_*.sh | Test scripts for modular architecture, error handling, and deep integration |
| terraform/variables.tf | Added static IP configuration variables for improved network management |
| terraform/providers.tf | Added AWS/MinIO provider configuration for S3 backend |
| terraform/outputs.tf | Simplified outputs to cluster summary and Ansible inventory format |
| terraform/nodes.tf | Enhanced IP configuration with dynamic assignment logic |
| terraform/locals.tf | Major refactoring with named constants and improved workspace IP mapping |
| scripts/vm_template/*.sh | Removed debug output messages for cleaner production logs |
| scripts/*.sh | Enhanced error handling, recovery mechanisms, and validation functions |
| requirements-test.txt | Test dependencies for Python testing framework |
| pytest.ini | Python test configuration |
| run_tests.sh | Unified test runner script |
| prepare_release.sh | Release preparation and cleanup automation |

Comments suppressed due to low confidence (1)

tests/integration/test_cpc_workflows.py:1

  • The 'os' module is imported and used at line 323 in the test_environment_isolation method but not declared at the top of the file with other imports. Move this import to line 10 with the other module imports.
#!/usr/bin/env python3

Comment thread terraform/locals.tf
# 2. Otherwise, fall back to the map with the workspace name
# 3. If that fails too, use "x" as a fallback
- release_letter = var.release_letter != "" ? var.release_letter : lookup(local.release_letters_map, local.effective_os_type, "x")
+ release_letter = var.release_letter

Copilot AI Sep 2, 2025

The release_letter assignment lacks validation and fallback logic. According to the PR description mentioning 'mandatory RELEASE_LETTER to all environments' and the requirement shown in generate_node_hostnames.sh lines 44-58, this should validate that release_letter is not empty and provide appropriate error handling.

Suggested change
- release_letter = var.release_letter
+ release_letter_map = {
+   "test-auto-release" = "a"
+   "k8s-test" = "b"
+   "k8s133" = "c"
+   "debian" = "d"
+   "ubuntu" = "u"
+   "rocky" = "r"
+   "suse" = "s"
+   # Add other workspace names and their release letters as needed
+ }
+ release_letter = coalesce(
+   trimspace(var.release_letter),
+   lookup(local.release_letter_map, local.effective_os_type, ""),
+   "x"
+ )
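
For readers less familiar with Terraform's `coalesce`, which returns the first argument that is neither null nor an empty string, the fallback order in this suggestion can be modelled as below. This is a rough Python sketch, not code from the repository: `resolve_release_letter` is a hypothetical name, and the map entries are copied from the suggestion.

```python
RELEASE_LETTER_MAP = {
    "test-auto-release": "a",
    "k8s-test": "b",
    "k8s133": "c",
    "debian": "d",
    "ubuntu": "u",
    "rocky": "r",
    "suse": "s",
}


def resolve_release_letter(explicit, os_type, letter_map=RELEASE_LETTER_MAP, fallback="x"):
    """Pick the first non-empty candidate: explicit value, map lookup, fallback."""
    candidates = (explicit.strip(), letter_map.get(os_type, ""))
    for candidate in candidates:
        if candidate:
            return candidate
    return fallback


print(resolve_release_letter("", "ubuntu"))
print(resolve_release_letter(" b ", "ubuntu"))
print(resolve_release_letter("", "unknown-os"))
```

A silent fallback of this kind trades strictness for convenience. The commit messages in this PR instead make RELEASE_LETTER mandatory in the hostname generation script, so rejecting an empty value outright is the alternative design.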

Comment on lines +222 to +223
local node_ips
local node_names

Copilot AI Sep 2, 2025

Variables are declared as local but used outside their scope. The 'local' keyword should be removed since these variables are used in the while loop at line 303 and throughout the rest of the function.

Suggested change
- local node_ips
- local node_names
+ node_ips=
+ node_names=

Comment thread scripts/template.sh
Comment on lines +156 to 158
TEMPLATE_COMMAND="
export PROXMOX_HOST='$PROXMOX_HOST'
export PROXMOX_USERNAME='$PROXMOX_USERNAME'

Copilot AI Sep 2, 2025

Environment variables containing sensitive credentials are being passed through SSH command strings, which could be exposed in process lists or logs. Consider using SSH environment forwarding or secure file transfer instead.
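
One way to act on this concern is to keep secrets out of the command line entirely and feed them to the remote shell over standard input, since command-line arguments are what show up in process listings. The sketch below is in Python for illustration only (the script under review is bash); `build_remote_call` and the variable `EXAMPLE_TOKEN` are hypothetical names, and the approach assumes the remote side runs `bash -s`, which reads its script from stdin.

```python
import shlex


def build_remote_call(host, env, remote_command):
    """Return (argv, stdin_script) so that secret values never appear in argv."""
    for name in env:
        if not name.isidentifier():
            raise ValueError(f"invalid environment variable name: {name!r}")
    # `bash -s` makes the remote shell read the script from standard input.
    argv = ["ssh", host, "bash", "-s"]
    lines = [f"export {name}={shlex.quote(value)}" for name, value in env.items()]
    lines.append(remote_command)
    return argv, "\n".join(lines) + "\n"


argv, script = build_remote_call(
    "proxmox.example",
    {"EXAMPLE_TOKEN": "s3cr3t'value"},
    "./create_template.sh",
)
print(argv)  # contains the host and "bash -s", but no secret values
```

The pair can then be passed to `subprocess.run(argv, input=script, text=True, check=True)`. The values are still visible to the remote shell itself, so this narrows exposure in local process lists and command logs rather than removing it.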

Comment on lines +267 to +270
if ! pushd "$terraform_dir" >/dev/null; then
  error_handle "$ERROR_EXECUTION" "Failed to change to terraform directory: $terraform_dir" "$SEVERITY_HIGH" "abort"
  return 1
fi

Copilot AI Sep 2, 2025

The error handling calls 'abort' action but then uses 'return 1' which is redundant since 'abort' should exit the script. Either remove the return statement or change the action to 'continue'.

Comment thread scripts/add_pihole_dns.py
Comment on lines +524 to +525
# Error message is already printed inside the function
sys.exit(1)

Copilot AI Sep 2, 2025

The comment on line 524 is misleading and unhelpful. Replace with a descriptive error message explaining what failed, such as 'Failed to retrieve VM information from Terraform outputs'.

Suggested change
- # Error message is already printed inside the function
- sys.exit(1)
+ print("Error: Failed to retrieve VM information from Terraform outputs.", file=sys.stderr)
+ sys.exit(1)

Comment thread terraform/outputs.tf
Comment on lines +7 to +10
for key, node_data in local.final_nodes_map : key => {
VM_ID = node_data.vm_id
hostname = node_data.hostname
IP = node_data.static_ip_address # This field is now available

Copilot AI Sep 2, 2025

The inline comment '# This field is now available' is vague and doesn't explain what changed or why it's significant. Either remove it or provide more context about what was changed.

Suggested change
  for key, node_data in local.final_nodes_map : key => {
    VM_ID = node_data.vm_id
    hostname = node_data.hostname
-   IP = node_data.static_ip_address # This field is now available
+   IP = node_data.static_ip_address


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This is an excellent and substantial pull request that significantly improves the project's performance, security, code quality, and documentation. The modularization of shell scripts into libraries, the addition of a comprehensive test suite, and the major refactoring of Ansible playbooks are all fantastic enhancements. My review identifies a few potential issues to further improve the safety and maintainability of the new code.

# ansible/playbooks/pb_delete_node.yml

- name: Delete Node from Kubernetes Cluster
  hosts: all

high

Using hosts: all in this playbook is potentially dangerous, as it could lead to accidental operations on all nodes if the playbook is run without a --limit flag. To make it safer, it's better to make the host target explicit by using a variable.

  hosts: "{{ node_to_delete }}"

Comment on lines 7 to +52
vars:
addon_name: "{{ addon_name | default('all') }}" # calico, coredns, metallb, metrics-server, cert-manager, kubelet-serving-cert-approver, argocd, ingress-nginx, or 'all'
requested_version: "{{ addon_version | default('') }}" # Specific version or empty for defaults
addon_name: "{{ addon_name | default('all') }}"
requested_version: "{{ addon_version | default('') }}"
control_plane_node: "{{ groups['control_plane'][0] }}"

# Default addon versions from environment

# Cloudflare and Let's Encrypt settings
cloudflare_api_token: "{{ lookup('env', 'CLOUDFLARE_DNS_API_TOKEN') }}"
cloudflare_email: "{{ lookup('env', 'CLOUDFLARE_EMAIL') }}"
letsencrypt_server: "https://acme-v02.api.letsencrypt.org/directory"

# Default addon versions
default_calico_version: "{{ calico_version | default('v3.28.0') }}"
default_metallb_version: "{{ metallb_version | default('v0.14.8') }}"
default_metrics_server_version: "{{ metrics_server_version | default('v0.7.2') }}"
default_coredns_version: "{{ coredns_version | default('v1.11.3') }}"
default_cert_manager_version: "{{ cert_manager_version | default('v1.16.2') }}"
default_kubelet_serving_cert_approver_version: "{{ kubelet_serving_cert_approver_version | default('v0.9.2') }}"
default_kubelet_serving_cert_approver_version: >-
{{ kubelet_serving_cert_approver_version | default('v0.9.2') }}
default_argocd_version: "{{ argocd_version | default('v2.13.2') }}"
default_ingress_nginx_version: "{{ ingress_nginx_version | default('v1.12.0') }}"

default_traefik_gateway_version: "{{ traefik_gateway_version | default('37.0.0') }}"
default_gateway_api_version: "{{ gateway_api_version | default('v1.1.0') }}"

# Resolved versions
calico_target_version: "{{ requested_version if requested_version != '' else default_calico_version }}"
metallb_target_version: "{{ requested_version if requested_version != '' else default_metallb_version }}"
metrics_server_target_version: "{{ requested_version if requested_version != '' else default_metrics_server_version }}"
coredns_target_version: "{{ requested_version if requested_version != '' else default_coredns_version }}"
cert_manager_target_version: "{{ requested_version if requested_version != '' else default_cert_manager_version }}"
kubelet_serving_cert_approver_target_version: "{{ requested_version if requested_version != '' else default_kubelet_serving_cert_approver_version }}"
argocd_target_version: "{{ requested_version if requested_version != '' else default_argocd_version }}"
ingress_nginx_target_version: "{{ requested_version if requested_version != '' else default_ingress_nginx_version }}"
calico_target_version: >-
{{ requested_version if requested_version != '' else default_calico_version }}
metallb_target_version: >-
{{ requested_version if requested_version != '' else default_metallb_version }}
metrics_server_target_version: >-
{{ requested_version if requested_version != '' else default_metrics_server_version }}
coredns_target_version: >-
{{ requested_version if requested_version != '' else default_coredns_version }}
cert_manager_target_version: >-
{{ requested_version if requested_version != '' else default_cert_manager_version }}
kubelet_serving_cert_approver_target_version: >-
{{ requested_version if requested_version != ''
else default_kubelet_serving_cert_approver_version }}
argocd_target_version: >-
{{ requested_version if requested_version != '' else default_argocd_version }}
ingress_nginx_target_version: >-
{{ requested_version if requested_version != '' else default_ingress_nginx_version }}
traefik_gateway_target_version: >-
{{ requested_version if requested_version != '' else default_traefik_gateway_version }}
gateway_api_target_version: >-
{{ requested_version if requested_version != '' else default_gateway_api_version }}

high

This playbook has been significantly refactored, which is great. However, it seems that support for several addons has been removed compared to the previous version. The old version handled calico, coredns, metallb, metrics-server, cert-manager, kubelet-serving-cert-approver, argocd, and ingress-nginx. The new version only seems to handle metrics-server, calico, metallb, cert-manager, and argocd.

Could you please clarify if the removal of coredns, kubelet-serving-cert-approver, and ingress-nginx was intentional?

Additionally, the variables section defines default_traefik_gateway_version, but there are no tasks to install or manage Traefik. This seems inconsistent.

Comment thread Makefile

push:
	@echo "Pushing to remote..."
	git push origin feature/improvements

medium

The push target is hardcoded to the feature/improvements branch. This makes the target inflexible and will fail if used on any other branch. It's better to dynamically get the current branch name.

git push origin $(shell git rev-parse --abbrev-ref HEAD)

Comment on lines +89 to 94
echo "Warning: Could not get node information from terraform output. Falling back to default node definitions."
# Fallback logic for new workspaces
HOSTNAMES=() # Ensure it's empty
ROLES=("c" "w" "w")
INDICES=("1" "2" "3") # Note: Terraform logic uses original_index 1, 1, 2. Let's stick to simple logic here for fallback.
fi

medium

The fallback logic when tofu output fails defaults to a static configuration of one control plane and two worker nodes. This might not reflect the user's intended configuration, especially in a new workspace where tofu apply hasn't been run yet. This could lead to generating incorrect or incomplete hostname snippets.

Consider failing the script with a clear error message if the Terraform output is unavailable, instructing the user to run tofu apply first. This would be safer than generating a potentially incorrect configuration.

Suggested change
- echo "Warning: Could not get node information from terraform output. Falling back to default node definitions."
- # Fallback logic for new workspaces
- HOSTNAMES=() # Ensure it's empty
- ROLES=("c" "w" "w")
- INDICES=("1" "2" "3") # Note: Terraform logic uses original_index 1, 1, 2. Let's stick to simple logic here for fallback.
- fi
+ else
+   echo "Error: Could not get node information from terraform output. Please run 'cpc deploy apply' first."
+   exit 1
+ fi

abevz added a commit that referenced this pull request Sep 8, 2025
🚀 Release v1.1.0: Major Performance Optimizations & Security Fixes