🚀 Release v1.1.0: Major Performance Optimizations & Security Fixes by abevz · Pull Request #3 · abevz/CreatePersonalCluster

abevz · 2025-09-02T18:56:36Z

🚀 Release v1.1.0: Major Performance Optimizations & Security Fixes

📋 Summary

This PR introduces significant performance improvements to the CPC cluster management tool: the cluster-info command drops from 22+ seconds to under 0.5 seconds on cached runs (7.2 seconds on a first run). It also includes critical security fixes that pin Kubernetes component versions.

✨ New Features

cluster-info --quick mode: Ultra-fast cluster status (0.1s execution time)
Two-tier terraform caching: Short-term (30s) and long-term (5min) cache layers
Smart workspace detection: Avoids unnecessary terraform workspace switches
Context-aware cache management: Separate cache files per workspace
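
The caching items above can be sketched roughly as follows. This is a minimal illustration, not the CPC implementation: the function names, the cache directory, and the way the two tiers are selected are assumptions; only the 30s and 5min lifetimes and the per-workspace cache files come from the description.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; names, paths, and tier semantics are assumptions.

CACHE_DIR="${CACHE_DIR:-${TMPDIR:-/tmp}/cpc_cache_demo}"
SHORT_TTL=30     # seconds: short-term tier
LONG_TTL=300     # seconds: long-term tier (5 minutes)

# Print a file's age in seconds.
file_age() {
  local now mtime
  now=$(date +%s)
  # GNU stat takes -c %Y; BSD/macOS stat takes -f %m.
  mtime=$(stat -c %Y "$1" 2>/dev/null || stat -f %m "$1")
  echo $(( now - mtime ))
}

# cached_run <short|long> <workspace> <key> <command...>
# Serves the command's output from a per-workspace cache file while the file
# is younger than the tier's TTL; otherwise reruns the command and refreshes it.
cached_run() {
  local tier="$1" workspace="$2" key="$3" ttl file tmp
  shift 3
  if [ "$tier" = "short" ]; then ttl=$SHORT_TTL; else ttl=$LONG_TTL; fi
  file="${CACHE_DIR}/${workspace}.${key}.${tier}"

  if [ -f "$file" ] && [ "$(file_age "$file")" -lt "$ttl" ]; then
    cat "$file"
    return 0
  fi

  mkdir -p "$CACHE_DIR"
  tmp="${file}.tmp.$$"
  if "$@" >"$tmp"; then
    mv "$tmp" "$file"   # rename so readers never see a half-written file
    cat "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

A caller might wrap an expensive lookup as `cached_run long "$workspace" outputs tofu output -json`. Because the workspace name is part of the file name, switching context never serves another workspace's data.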

🔒 Security Fixes

Pinned Kubernetes versions: Fixed high-severity issue where kubelet, kubeadm, kubectl versions weren't pinned
Version consistency: Prevents automatic patch updates that could cause cluster instability
Role defaults: Changed from 'latest' to specific pinned versions for production safety

⚡ Performance Improvements

Command	Before	After	Improvement
`cluster-info` (first run)	22s	7.2s	3x faster
`cluster-info` (cached)	22s	0.44s	50x faster
`cluster-info --quick`	N/A (new)	0.1s	220x faster than the 22s uncached run
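
The improvement column appears to use the 22s uncached run as the baseline in every row, including the new `--quick` mode. A quick check of the arithmetic (the `speedup` helper is hypothetical and only illustrates the calculation):

```bash
#!/usr/bin/env bash
# speedup <before_seconds> <after_seconds>: print the ratio with one decimal.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

speedup 22 7.2    # first run
speedup 22 0.44   # cached run
speedup 22 0.1    # --quick mode
```

The three ratios round to roughly 3x, 50x, and 220x, which matches the table.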

🧪 Testing

✅ All tests passing (100% success rate)
✅ Comprehensive test suite with 59 tests
✅ Performance benchmarking validated
✅ No breaking changes - fully backward compatible

🔧 Technical Changes

Optimized terraform operations: Smart workspace state management
Enhanced caching strategy: Multi-level cache with intelligent invalidation
Reduced I/O operations: Better cache file handling
Network efficiency: Fewer remote state API calls
Security hardening: Kubernetes component version pinning

🔧 Code Quality Improvements

Magic number elimination: Replaced hardcoded values with named constants in terraform
Hostname collision prevention: Added mandatory RELEASE_LETTER to all environments
Code consistency: Enhanced error handling and validation in scripts

📚 Documentation Updates

Updated CHANGELOG.md with detailed performance metrics
Enhanced RELEASE_NOTES.md with v1.1.0 changes
Updated help text to include --quick option
Added performance benchmarks

🔄 Migration

No migration needed - all existing commands work as before
New --quick flag available for ultra-fast cluster information
Kubernetes versions now properly pinned for consistency

🎯 Ready for Release

- Implemented Pi-hole API library for DNS record management (lib/pihole_api.sh)
- Developed SSH utilities for managing known hosts and connections (lib/ssh_utils.sh)
- Created core module for context management, secrets loading, and workspace handling (modules/00_core.sh)
- Added Terraform/OpenTofu management module for infrastructure deployment (modules/60_tofu.sh)
- Introduced test script to validate modular architecture and functionality (test_modules.sh)
- Added placeholder for Pi-hole DNS script (scripts/add_pihole_dns.py.new)

…arization
- Extracted Terraform/OpenTofu functionality into modules/60_tofu.sh
- Implemented commands for deployment, VM management, hostname generation, kubeconfig retrieval, and environment loading.
- Integrated modular functions into the main cpc script with zero breaking changes.
- Enhanced code organization and documentation.
- Created modules/10_proxmox.sh for Proxmox VM management
- Implemented interactive commands for adding and removing VMs with safety confirmations.
- Updated environment management and Terraform integration.
- Maintained backward compatibility and improved code quality.

…dule and update main script for modular architecture

- Created modules/30_k8s_cluster.sh for managing Kubernetes cluster lifecycle.
- Added functions for bootstrap, kubeconfig retrieval, upgrade, and node reset.
- Enhanced modules/60_tofu.sh with cluster-info command for infrastructure details.
- Updated test_modules.sh to include tests for new K8s cluster module functions.
- Documented the new module and its functionalities in k8s_cluster_module_30_completion_report.md.
- Ensured proper separation of concerns between Tofu and K8s modules.

…sive operations

… related commands for enhanced architecture

- Introduced cpc_core() as the main entry point for core commands.
- Migrated existing commands to dedicated functions:
  - setup-cpc → core_setup_cpc()
  - ctx → core_ctx()
  - clone-workspace → core_clone_workspace()
  - delete-workspace → core_delete_workspace()
  - load_secrets → core_load_secrets_command()
- Updated load_secrets() to fully integrate SOPS for secrets management.
- Improved error handling and validation across all core functions.
- Enhanced documentation and help text for better user guidance.
- Clean separation of core functionality from operational commands.
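
A dispatcher of the kind this commit describes could look roughly like the sketch below. The command-to-function mapping is taken from the commit message; the function bodies are placeholder stubs and the error message is an assumption.

```bash
#!/usr/bin/env bash
# Placeholder stubs standing in for the real core functions.
core_setup_cpc()            { echo "setup-cpc"; }
core_ctx()                  { echo "ctx: $*"; }
core_clone_workspace()      { echo "clone-workspace: $*"; }
core_delete_workspace()     { echo "delete-workspace: $*"; }
core_load_secrets_command() { echo "load_secrets"; }

# Route a core command name to its dedicated function.
cpc_core() {
  local cmd="${1:-}"
  [ $# -gt 0 ] && shift
  case "$cmd" in
    setup-cpc)        core_setup_cpc "$@" ;;
    ctx)              core_ctx "$@" ;;
    clone-workspace)  core_clone_workspace "$@" ;;
    delete-workspace) core_delete_workspace "$@" ;;
    load_secrets)     core_load_secrets_command "$@" ;;
    *)
      echo "Unknown core command: ${cmd}" >&2
      return 1
      ;;
  esac
}
```

Keeping the routing in one `case` statement means the main script only needs to know about `cpc_core`, while each command stays testable on its own.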

…nd DNS operations

…vironments

- Removed empty static IP implementation summary file.
- Added commands to Proxmox module for creating VM templates and controlling VMs.
- Introduced advanced inventory cache update in Ansible module to fetch comprehensive cluster information.
- Added functionality to reset all Kubernetes nodes in the K8s nodes module.
- Updated Terraform locals to remove fixed VM ID ranges for specific workspaces.
- Implemented SSH management module for clearing known_hosts and managing SSH connections.

Translate Russian comments and documentation to English across the project to improve code readability and maintainability for a wider audience.
- Updated all shell scripts in `modules/`.
- Updated Ansible playbooks in `ansible/playbooks/`.
- Translated documentation file in `docs/`.
- Included user-requested changes to `cpc` and `install_kubernetes_cluster.yml`.

- Translated Russian comments to English in all module files (00_core.sh through 80_ssh.sh)
- Translated Russian comments in all Ansible playbooks
- Translated Russian comments in Terraform configuration files
- Translated complete Russian documentation to English
- Improved project accessibility for international contributors

This completes the internationalization of the codebase as requested in Step 1.

…tation
- Redesigned README.md with modern formatting and visual appeal
- Added comprehensive table of contents with navigation
- Enhanced overview section with clear value propositions
- Expanded key features with detailed descriptions and icons
- Improved quick start guide with 5-minute setup process
- Added detailed installation and configuration sections
- Included extensive usage examples for common operations
- Enhanced architecture section with visual diagrams
- Added workspace system documentation with status indicators
- Included testing and validation procedures
- Added contribution guidelines and support information
- Improved project structure documentation
- Added badges for technology stack and status indicators

This creates a professional, user-friendly README that serves as both documentation and marketing material for the CPC project.

…s/60_tofu.sh
- Translated the last remaining Russian comment '# Добавить переменную в массив команды tofu' to '# Add variable to tofu command array'
- Verified complete removal of all Russian characters from codebase
- Phase 1 documentation improvements are now fully complete

All Russian comments have been successfully translated to English across:
- 7 core modules (00_core.sh through 80_ssh.sh)
- 5 Ansible playbooks
- 2 Terraform configuration files
- 1 documentation file (static_ip_configuration.md)
- Enhanced README.md with comprehensive project documentation

- Added 'if [[ "${BASH_SOURCE[0]}" == "${0}" ]];' checks to modules:
  - modules/00_core.sh - Core functionality module
  - modules/40_k8s_nodes.sh - Kubernetes node management module
  - modules/50_cluster_ops.sh - Cluster operations module

This ensures all modules follow the same pattern and cannot be executed directly, preventing potential issues and maintaining consistency across the codebase. All modules now have proper direct execution protection as required by the test suite.

- Added 'if [[ "${BASH_SOURCE[0]}" == "${0}" ]];' checks to all modules:
  - modules/20_ansible.sh - Ansible playbook management module
  - modules/30_k8s_cluster.sh - Kubernetes cluster lifecycle module
  - modules/70_dns_ssl.sh - DNS and SSL certificate management module
  - modules/80_ssh.sh - SSH management module

This ensures complete consistency across all modules and prevents direct execution, which could cause issues with missing dependencies or incorrect environment setup. All modules in the CPC project now follow the same security pattern.
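
To show what the guard does in practice, the sketch below writes a throwaway module to a temporary file, then runs it once directly and once via `source`. Only the guard condition comes from the commits; the error message, the exit code, and the `demo_module_loaded` function are assumptions made for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical demo; only the guard condition itself is taken from the PR.

module="$(mktemp)"
cat >"$module" <<'EOF'
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  echo "Error: this module must be sourced, not executed directly." >&2
  exit 1
fi

demo_module_loaded() { echo "module loaded"; }
EOF

# Executed directly: $0 equals the module path, so the guard fires.
direct_status=0
bash "$module" 2>/dev/null || direct_status=$?

# Sourced: $0 is the caller ("_" here), so the guard is skipped.
sourced_output="$(bash -c 'source "$1" && demo_module_loaded' _ "$module")"

echo "direct execution exit status: ${direct_status}"
echo "sourced output: ${sourced_output}"
rm -f "$module"
```

The comparison works because `BASH_SOURCE[0]` always names the file being read, while `$0` names the script that was launched; the two match only when the module itself is the entry point.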

- Removed empty modules/10_workspace.sh file that was causing test failures
- This file contained no code and was not being used by the system

All remaining modules now have proper structure and direct execution checks.

- ✅ All 42 unit tests passing
- ✅ All 8 integration tests passing
- ✅ Comprehensive test framework established
- ✅ Fixed module structure issues (added direct execution guards)
- ✅ Removed empty module file causing test failures
- ✅ Test automation with Makefile and run_tests.sh
- ✅ Virtual environment setup for dependency isolation
- ✅ Linting tools integrated (shellcheck, bashate, ansible-lint)

Phase 3 (Testing) is now complete and ready to support Phase 2 (Error Handling) development with robust regression detection capabilities.

- ✅ Fixed all critical ShellCheck errors (SC2145, SC2162, SC2181)
- ✅ Added -r flag to all read commands for better security
- ✅ Replaced $? checks with direct command execution checks
- ✅ Fixed array handling in log_debug statements
- ✅ Added error checking for pushd/popd operations
- ✅ Improved error handling patterns throughout codebase

Code quality improvements:
- Better error handling with if ! command patterns
- More secure read operations with -r flag
- Proper array expansion in logging statements
- Enhanced pushd/popd error checking

All tests still pass ✅, but code is now more robust and follows best practices.
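
The three ShellCheck findings named above correspond to patterns like the ones sketched here. The helper names are hypothetical, and `cd` stands in for `pushd`/`popd`, to which the same direct check applies.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the fixed patterns.

log_debug() { echo "[DEBUG] $*" >&2; }

# SC2162: read -r keeps backslashes literal instead of treating them as escapes.
read_line() {
  local line
  IFS= read -r line || return 1
  printf '%s\n' "$line"
}

# SC2181: test the command itself rather than inspecting $? afterwards.
enter_dir() {
  local dir="$1"
  if ! cd "$dir" 2>/dev/null; then
    echo "Failed to change to directory: $dir" >&2
    return 1
  fi
}

# SC2145: inside a message string use "$*" (one word);
# keep "$@" for passing the arguments on unchanged.
run_logged() {
  log_debug "Running: $*"
  "$@"
}
```

For example, `run_logged tofu plan` logs the whole command as a single message and then runs it with its arguments intact.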

…generation
- Add automatic secrets loading on CPC startup
- Fix RELEASE_LETTER handling for proper hostname generation
- Improve tofu command retry logic for plan/apply operations
- Add support for informational commands (--help, --version)
- Skip hostname generation for non-deployment commands
- Fix exit code handling for tofu plan command
- Add verbose/quiet modes for secrets loading
- Improve error handling and logging throughout the system

- Remove retry logic for tofu commands (max_retries=0) to prevent multiple runs
- Remove 'Retrying tofu command' messages for cleaner output
- Add proper interactive input handling for apply and destroy commands
- Fix hostname generation script variable name (SSH_USERNAME -> VM_USERNAME)
- Improve error handling and timeout management
- Commands now work correctly in both interactive and non-interactive modes

This commit includes all fixes for tofu command execution and testing:

🔧 Tofu Commands Fixes:
- Fixed hanging issues with apply/destroy commands
- Added proper interactive input handling for apply and destroy
- Removed retry logic to prevent multiple command executions
- Removed 'Retrying tofu command' messages for cleaner output
- Fixed hostname generation script variable name (SSH_USERNAME -> VM_USERNAME)

🧪 Testing Infrastructure:
- Installed pytest and bashate dependencies
- Fixed test framework to handle timeouts properly
- Added pipx PATH to test runner script
- Fixed workspace operations test
- All unit and integration tests now pass

📝 Code Quality:
- Improved error handling and timeout management
- Better interactive vs non-interactive command handling
- Cleaner output without unnecessary retry messages

✅ All systems working:
- tofu apply/destroy/plan commands work correctly
- Interactive input works in terminal, auto-approve in scripts
- Full test suite passes (42 unit + 8 integration tests)
- Linting tools working properly
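
One common way to get "interactive input in a terminal, auto-approve in scripts" is to test whether standard input is a terminal. The sketch below assumes that approach; the function name is hypothetical, and the commit does not say how CPC actually decides.

```bash
#!/usr/bin/env bash
# Print the extra flag to pass to "tofu apply" or "tofu destroy".
tofu_approval_flag() {
  if [ -t 0 ]; then
    # A terminal is attached: print nothing so tofu prompts for confirmation.
    echo ""
  else
    # No terminal (CI, pipes, cron): skip the prompt so the run cannot hang.
    echo "-auto-approve"
  fi
}
```

A wrapper could then run `tofu apply $(tofu_approval_flag)`, leaving the expansion unquoted on purpose so that the empty interactive case adds no argument at all.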

✨ Features:
- Add cluster-info --quick mode (0.1s execution)
- Implement two-tier terraform caching system
- Smart workspace detection and switching
- Context-aware cache management

⚡ Performance:
- cluster-info: 22s → 0.44s (50x faster)
- cluster-info --quick: 0.1s (220x faster)
- First run: 22s → 7.2s (3x faster)

🧪 Testing:
- All tests passing (100% success rate)
- Comprehensive test suite with 59 tests
- Performance benchmarking included

🔧 Technical:
- Optimized terraform workspace operations
- Enhanced multi-level caching strategy
- Reduced I/O and network operations
- Better resource utilization

📚 Documentation:
- Updated CHANGELOG.md and RELEASE_NOTES.md
- Added performance benchmarks
- Enhanced help text with --quick option

Fix high-severity issue: Pin kubelet, kubeadm, and kubectl versions to prevent inconsistencies from automatic patch updates.

Changes:
- Pin Debian/Ubuntu packages to KUBERNETES_LONG_VERSION-1.1 format
- Pin RedHat/Rocky packages to KUBERNETES_LONG_VERSION format
- Ensures cluster consistency across future playbook runs

Security Impact: Prevents unintended version drift that could cause cluster instability or incompatibilities.

Refs: install_kubernetes_cluster.yml

Change role default from 'latest' to pinned version '1.29.0-1.1' to prevent automatic patch updates that could cause cluster inconsistencies. This complements the previous fix in install_kubernetes_cluster.yml and ensures both playbooks and roles use pinned versions by default. Impact: Improves cluster stability and prevents version drift.
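
The two pinning formats mentioned in these commits can be illustrated with a small helper that builds the package specifications. The function name and the `debian`/`redhat` labels are hypothetical, and the actual change lives in Ansible playbooks rather than in shell.

```bash
#!/usr/bin/env bash
# k8s_pinned_packages <debian|redhat> <kubernetes_long_version>
# Prints one pinned package spec per line for kubelet, kubeadm, and kubectl.
k8s_pinned_packages() {
  local os_family="$1" version="$2" pkg
  for pkg in kubelet kubeadm kubectl; do
    case "$os_family" in
      debian) echo "${pkg}=${version}-1.1" ;;   # apt syntax: name=version
      redhat) echo "${pkg}-${version}" ;;       # dnf/yum syntax: name-version
      *)
        echo "Unsupported OS family: ${os_family}" >&2
        return 1
        ;;
    esac
  done
}

k8s_pinned_packages debian 1.29.0
```

With the role default of `1.29.0-1.1`, the Debian form expands to specs such as `kubelet=1.29.0-1.1`, so repeated playbook runs install the same build instead of whatever is newest.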

…TTER, remove magic numbers
- Translate Russian comments in containerd-config.toml.j2 to English
- Add RELEASE_LETTER to all environment files for hostname collision prevention
- Replace magic numbers in terraform/locals.tf with named constants
- Enforce mandatory RELEASE_LETTER in hostname generation script
- Add comprehensive static-ip-example.env template
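
Enforcing a mandatory RELEASE_LETTER could be as simple as the check sketched below. The single-lowercase-letter rule is an assumption inferred from the example values elsewhere in this PR ("a", "b", "u", and so on), and the function name and messages are hypothetical.

```bash
#!/usr/bin/env bash
# Return 0 only when RELEASE_LETTER is set to exactly one lowercase letter.
require_release_letter() {
  case "${RELEASE_LETTER:-}" in
    [a-z])
      return 0
      ;;
    "")
      echo "Error: RELEASE_LETTER is not set; define it in the environment file." >&2
      return 1
      ;;
    *)
      echo "Error: RELEASE_LETTER must be a single lowercase letter, got '${RELEASE_LETTER}'." >&2
      return 1
      ;;
  esac
}
```

Calling `require_release_letter || exit 1` at the top of a hostname generation script stops the run before any colliding hostname can be produced.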

gemini-code-assist

Summary of Changes


This release focuses on significantly improving the performance and stability of the CPC cluster management tool. It introduces a multi-layered caching system for faster operations and implements critical security measures by pinning Kubernetes versions. The underlying architecture has been modularized for better maintainability, complemented by a robust error handling framework. These changes aim to provide a more reliable and efficient experience for deploying and managing Kubernetes clusters.

Highlights

Major Performance Optimizations: The cluster-info command has been significantly optimized, achieving up to a 220x speed improvement, reducing execution time from 22+ seconds to under 0.5 seconds, including a new ultra-fast --quick mode.
Critical Security Fixes: Kubernetes component versions (kubelet, kubeadm, kubectl) are now explicitly pinned, preventing automatic patch updates that could lead to instability and enhancing production safety.
Intelligent Caching System: A new two-tier caching mechanism for Terraform operations has been implemented, featuring short-term (30s) and long-term (5min) cache layers, smart workspace detection, and context-aware cache management for improved efficiency.
Extensive Code Modularization: The codebase has undergone a significant modularization effort, extracting core functionalities into dedicated shell modules (e.g., Proxmox, Ansible, K8s Cluster, DNS/SSL), greatly enhancing organization, maintainability, and testability.
Robust Error Handling & Recovery: A new, comprehensive error handling, retry, timeout, and recovery system has been integrated across the project, improving stability and resilience during operations.
Enhanced Ansible Playbooks: Ansible playbooks have been extensively updated for improved stability, idempotency, and new features, including streamlined Kubernetes initialization with Harbor integration and Helm-based addon installations.
Comprehensive Testing Framework: New bash and Python-based test suites have been added, covering unit, integration, and linting tests, ensuring high code quality and validating functionality across the project.
Dynamic IP Allocation System: The static IP allocation has been refactored to a dynamic, workspace-block-based system, providing more flexible and automated IP management for clusters.
Streamlined Documentation & Contribution: New GitHub issue and pull request templates have been added, contributing guidelines updated, and outdated documentation reports removed, simplifying community engagement and project understanding.


Copilot

Pull Request Overview

This PR introduces significant performance optimizations to the CPC cluster management tool, improving cluster-info command execution time from 22+ seconds to under 0.5 seconds. Additionally, it includes critical security fixes for Kubernetes version pinning and comprehensive code quality improvements with enhanced error handling.

Key changes include:

Performance optimization with two-tier caching system (30s/5min cache layers)
Security hardening through Kubernetes version pinning
Enhanced error handling and recovery mechanisms across scripts

Reviewed Changes

Copilot reviewed 103 out of 127 changed files in this pull request and generated 6 comments.


File	Description
`tests/integration/test_cpc_workflows.py`	New comprehensive integration test suite with workflow validation
`tests/__init__.py`	Test framework infrastructure for command execution and validation
`test_*.sh`	Test scripts for modular architecture, error handling, and deep integration
`terraform/variables.tf`	Added static IP configuration variables for improved network management
`terraform/providers.tf`	Added AWS/MinIO provider configuration for S3 backend
`terraform/outputs.tf`	Simplified outputs to cluster summary and Ansible inventory format
`terraform/nodes.tf`	Enhanced IP configuration with dynamic assignment logic
`terraform/locals.tf`	Major refactoring with named constants and improved workspace IP mapping
`scripts/vm_template/*.sh`	Removed debug output messages for cleaner production logs
`scripts/*.sh`	Enhanced error handling, recovery mechanisms, and validation functions
`requirements-test.txt`	Test dependencies for Python testing framework
`pytest.ini`	Python test configuration
`run_tests.sh`	Unified test runner script
`prepare_release.sh`	Release preparation and cleanup automation

Comments suppressed due to low confidence (1)

tests/integration/test_cpc_workflows.py:1

The 'os' module is imported and used at line 323 in the test_environment_isolation method but not declared at the top of the file with other imports. Move this import to line 10 with the other module imports.

#!/usr/bin/env python3


Copilot · 2025-09-02T18:59:10Z

  # 2. Otherwise, fall back to the map with the workspace name
  # 3. If that fails too, use "x" as a fallback
-  release_letter = var.release_letter != "" ? var.release_letter : lookup(local.release_letters_map, local.effective_os_type, "x")
+  release_letter = var.release_letter


The release_letter assignment lacks validation and fallback logic. According to the PR description mentioning 'mandatory RELEASE_LETTER to all environments' and the requirement shown in generate_node_hostnames.sh lines 44-58, this should validate that release_letter is not empty and provide appropriate error handling.

Suggested change

-  release_letter = var.release_letter
+  release_letter_map = {
+    "test-auto-release" = "a"
+    "k8s-test" = "b"
+    "k8s133" = "c"
+    "debian" = "d"
+    "ubuntu" = "u"
+    "rocky" = "r"
+    "suse" = "s"
+    # Add other workspace names and their release letters as needed
+  }
+  release_letter = coalesce(
+    trimspace(var.release_letter),
+    lookup(local.release_letter_map, local.effective_os_type, ""),
+    "x"
+  )

Copilot · 2025-09-02T18:59:11Z

+local node_ips
+local node_names


Variables are declared as local but used outside their scope. The 'local' keyword should be removed since these variables are used in the while loop at line 303 and throughout the rest of the function.

Suggested change

-local node_ips
-local node_names
+node_ips=
+node_names=
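
For context on the scoping question: in Bash a `local` variable stays visible for the entire function body, including loops inside that function. Values typically go missing only when the loop sits on the right side of a pipeline, because that part runs in a subshell. Whether the reviewed function uses such a pipeline is not visible in this excerpt, so the sketch below is a generic illustration with a hypothetical function name.

```bash
#!/usr/bin/env bash
collect_nodes() {
  local node_names=""
  local line

  # The loop reads via redirection, so it runs in the current shell and
  # assignments to the local variable persist after the loop.
  while IFS= read -r line; do
    node_names="${node_names}${node_names:+ }${line}"
  done <<EOF
cp1
worker1
EOF
  echo "redirected loop: ${node_names}"

  local piped=""
  # A loop on the right side of a pipeline runs in a subshell by default,
  # so this assignment is lost when the loop ends.
  printf 'cp1\nworker1\n' | while IFS= read -r line; do
    piped="${piped}${piped:+ }${line}"
  done
  echo "piped loop: ${piped:-<empty>}"
}

collect_nodes
# prints "redirected loop: cp1 worker1"
# prints "piped loop: <empty>"
```

If a loop does need pipeline input, feeding it through process substitution or a here-document keeps the variables `local` without making them global.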

Copilot · 2025-09-02T18:59:11Z

+TEMPLATE_COMMAND="
 export PROXMOX_HOST='$PROXMOX_HOST'
 export PROXMOX_USERNAME='$PROXMOX_USERNAME'


Environment variables containing sensitive credentials are being passed through SSH command strings, which could be exposed in process lists or logs. Consider using SSH environment forwarding or secure file transfer instead.
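
One alternative, different from the two options the reviewer names, is to stream the `export` lines and the script body over standard input so that the values never appear in the argument list of the `ssh` process. The sketch below assumes that approach; both function names are hypothetical, and only `PROXMOX_HOST` and `PROXMOX_USERNAME` come from the diff.

```bash
#!/usr/bin/env bash
# Quote a value so it can be reused safely as a single-quoted shell word.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# run_with_env_on_stdin <script_body> <runner...>
# Sends export lines plus the script body to the runner's stdin, so the
# values are not part of the runner's command line.
run_with_env_on_stdin() {
  local body="$1"
  shift
  {
    printf 'export PROXMOX_HOST=%s\n' "$(shell_quote "$PROXMOX_HOST")"
    printf 'export PROXMOX_USERNAME=%s\n' "$(shell_quote "$PROXMOX_USERNAME")"
    printf '%s\n' "$body"
  } | "$@"
}

# Example with a remote shell (host name is a placeholder):
#   run_with_env_on_stdin './create_template.sh' ssh root@pve.example 'bash -s'
```

The remote side still sees the variables in its own environment, so this narrows exposure in local process listings and command logs rather than eliminating every risk.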

Copilot · 2025-09-02T18:59:12Z

+  if ! pushd "$terraform_dir" >/dev/null; then
+    error_handle "$ERROR_EXECUTION" "Failed to change to terraform directory: $terraform_dir" "$SEVERITY_HIGH" "abort"
+    return 1
+  fi


The error handling calls 'abort' action but then uses 'return 1' which is redundant since 'abort' should exit the script. Either remove the return statement or change the action to 'continue'.

Copilot · 2025-09-02T18:59:12Z

+       # Error message is already printed inside the function
+       sys.exit(1)


The comment on line 524 is misleading and unhelpful. Replace with a descriptive error message explaining what failed, such as 'Failed to retrieve VM information from Terraform outputs'.

Suggested change

-       # Error message is already printed inside the function
-       sys.exit(1)
+       print("Error: Failed to retrieve VM information from Terraform outputs.", file=sys.stderr)
+       sys.exit(1)

Copilot · 2025-09-02T18:59:12Z

+    for key, node_data in local.final_nodes_map : key => {
+      VM_ID    = node_data.vm_id
+      hostname = node_data.hostname
+      IP       = node_data.static_ip_address # This field is now available


The inline comment '# This field is now available' is vague and doesn't explain what changed or why it's significant. Either remove it or provide more context about what was changed.

Suggested change

     for key, node_data in local.final_nodes_map : key => {
       VM_ID    = node_data.vm_id
       hostname = node_data.hostname
-      IP       = node_data.static_ip_address # This field is now available
+      IP       = node_data.static_ip_address

gemini-code-assist

Code Review

This is an excellent and substantial pull request that significantly improves the project's performance, security, code quality, and documentation. The modularization of shell scripts into libraries, the addition of a comprehensive test suite, and the major refactoring of Ansible playbooks are all fantastic enhancements. My review identifies a few potential issues to further improve the safety and maintainability of the new code.

gemini-code-assist · 2025-09-02T19:00:07Z

+# ansible/playbooks/pb_delete_node.yml

+- name: Delete Node from Kubernetes Cluster
+  hosts: all


Using hosts: all in this playbook is potentially dangerous, as it could lead to accidental operations on all nodes if the playbook is run without a --limit flag. To make it safer, it's better to make the host target explicit by using a variable.

hosts: "{{ node_to_delete }}"

gemini-code-assist · 2025-09-02T19:00:08Z

  vars:
-    addon_name: "{{ addon_name | default('all') }}" # calico, coredns, metallb, metrics-server, cert-manager, kubelet-serving-cert-approver, argocd, ingress-nginx, or 'all'
-    requested_version: "{{ addon_version | default('') }}" # Specific version or empty for defaults
+    addon_name: "{{ addon_name | default('all') }}"
+    requested_version: "{{ addon_version | default('') }}"
    control_plane_node: "{{ groups['control_plane'][0] }}"
-
-    # Default addon versions from environment
+
+    # Cloudflare and Let's Encrypt settings
+    cloudflare_api_token: "{{ lookup('env', 'CLOUDFLARE_DNS_API_TOKEN') }}"
+    cloudflare_email: "{{ lookup('env', 'CLOUDFLARE_EMAIL') }}"
+    letsencrypt_server: "https://acme-v02.api.letsencrypt.org/directory"
+
+    # Default addon versions
    default_calico_version: "{{ calico_version | default('v3.28.0') }}"
    default_metallb_version: "{{ metallb_version | default('v0.14.8') }}"
    default_metrics_server_version: "{{ metrics_server_version | default('v0.7.2') }}"
    default_coredns_version: "{{ coredns_version | default('v1.11.3') }}"
    default_cert_manager_version: "{{ cert_manager_version | default('v1.16.2') }}"
-    default_kubelet_serving_cert_approver_version: "{{ kubelet_serving_cert_approver_version | default('v0.9.2') }}"
+    default_kubelet_serving_cert_approver_version: >-
+      {{ kubelet_serving_cert_approver_version | default('v0.9.2') }}
    default_argocd_version: "{{ argocd_version | default('v2.13.2') }}"
    default_ingress_nginx_version: "{{ ingress_nginx_version | default('v1.12.0') }}"
-
+    default_traefik_gateway_version: "{{ traefik_gateway_version | default('37.0.0') }}"
+    default_gateway_api_version: "{{ gateway_api_version | default('v1.1.0') }}"
+
    # Resolved versions
-    calico_target_version: "{{ requested_version if requested_version != '' else default_calico_version }}"
-    metallb_target_version: "{{ requested_version if requested_version != '' else default_metallb_version }}"
-    metrics_server_target_version: "{{ requested_version if requested_version != '' else default_metrics_server_version }}"
-    coredns_target_version: "{{ requested_version if requested_version != '' else default_coredns_version }}"
-    cert_manager_target_version: "{{ requested_version if requested_version != '' else default_cert_manager_version }}"
-    kubelet_serving_cert_approver_target_version: "{{ requested_version if requested_version != '' else default_kubelet_serving_cert_approver_version }}"
-    argocd_target_version: "{{ requested_version if requested_version != '' else default_argocd_version }}"
-    ingress_nginx_target_version: "{{ requested_version if requested_version != '' else default_ingress_nginx_version }}"
+    calico_target_version: >-
+      {{ requested_version if requested_version != '' else default_calico_version }}
+    metallb_target_version: >-
+      {{ requested_version if requested_version != '' else default_metallb_version }}
+    metrics_server_target_version: >-
+      {{ requested_version if requested_version != '' else default_metrics_server_version }}
+    coredns_target_version: >-
+      {{ requested_version if requested_version != '' else default_coredns_version }}
+    cert_manager_target_version: >-
+      {{ requested_version if requested_version != '' else default_cert_manager_version }}
+    kubelet_serving_cert_approver_target_version: >-
+      {{ requested_version if requested_version != ''
+        else default_kubelet_serving_cert_approver_version }}
+    argocd_target_version: >-
+      {{ requested_version if requested_version != '' else default_argocd_version }}
+    ingress_nginx_target_version: >-
+      {{ requested_version if requested_version != '' else default_ingress_nginx_version }}
+    traefik_gateway_target_version: >-
+      {{ requested_version if requested_version != '' else default_traefik_gateway_version }}
+    gateway_api_target_version: >-
+      {{ requested_version if requested_version != '' else default_gateway_api_version }}
+


This playbook has been significantly refactored, which is great. However, it seems that support for several addons has been removed compared to the previous version. The old version handled calico, coredns, metallb, metrics-server, cert-manager, kubelet-serving-cert-approver, argocd, and ingress-nginx. The new version only seems to handle metrics-server, calico, metallb, cert-manager, and argocd.

Could you please clarify if the removal of coredns, kubelet-serving-cert-approver, and ingress-nginx was intentional?

Additionally, the variables section defines default_traefik_gateway_version, but there are no tasks to install or manage Traefik. This seems inconsistent.

gemini-code-assist · 2025-09-02T19:00:08Z

+
+push:
+	@echo "Pushing to remote..."
+	git push origin feature/improvements


The push target is hardcoded to the feature/improvements branch. This makes the target inflexible and will fail if used on any other branch. It's better to dynamically get the current branch name.

git push origin $(shell git rev-parse --abbrev-ref HEAD)

gemini-code-assist · 2025-09-02T19:00:08Z

+  echo "Warning: Could not get node information from terraform output. Falling back to default node definitions."
+  # Fallback logic for new workspaces
+  HOSTNAMES=() # Ensure it's empty
+  ROLES=("c" "w" "w")
+  INDICES=("1" "2" "3") # Note: Terraform logic uses original_index 1, 1, 2. Let's stick to simple logic here for fallback.
 fi


The fallback logic when tofu output fails defaults to a static configuration of one control plane and two worker nodes. This might not reflect the user's intended configuration, especially in a new workspace where tofu apply hasn't been run yet. This could lead to generating incorrect or incomplete hostname snippets.

Consider failing the script with a clear error message if the Terraform output is unavailable, instructing the user to run tofu apply first. This would be safer than generating a potentially incorrect configuration.

Suggested change

-  echo "Warning: Could not get node information from terraform output. Falling back to default node definitions."
-  # Fallback logic for new workspaces
-  HOSTNAMES=() # Ensure it's empty
-  ROLES=("c" "w" "w")
-  INDICES=("1" "2" "3") # Note: Terraform logic uses original_index 1, 1, 2. Let's stick to simple logic here for fallback.
-fi
+else
+  echo "Error: Could not get node information from terraform output. Please run 'cpc deploy apply' first."
+  exit 1
+fi
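
The fail-fast behaviour suggested here can also be wrapped in a small helper that treats both a failing command and empty output as an error. This is a sketch under assumptions: the function name is hypothetical, and the command passed to it is whatever the script already uses to read node data.

```bash
#!/usr/bin/env bash
# load_node_info <command...>
# Runs the command that prints node data; fails with an instruction when the
# command errors out or prints nothing.
load_node_info() {
  local output
  if ! output="$("$@" 2>/dev/null)" || [ -z "$output" ]; then
    echo "Error: Could not get node information from terraform output. Please run 'cpc deploy apply' first." >&2
    return 1
  fi
  printf '%s\n' "$output"
}
```

A script might call `nodes_json="$(load_node_info tofu output -json)" || exit 1`, which stops hostname generation instead of silently producing a default three-node layout.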


abevz added 30 commits August 16, 2025 15:19

feat(modularization): Extract Ansible functionality into dedicated module and update main script for modular architecture

45cf0af

feat(k8s): Implement Kubernetes node management module with comprehensive operations

0c691a0

feat(modularization): Implement cluster operations module and migrate related commands for enhanced architecture

6030f0a

feat(dns_ssl): Implement DNS/SSL management module with certificate and DNS operations

794c978

refactor(locals): Remove unused workspace IP mappings for k8s test environments

dfcf878

feat(k8s): Add cluster status check command to monitor Kubernetes health

37bea1f

refactor(core): Change env CLUSTER_CONTEXT_FILE on CPC_CONTEXT_FILE

c344775

refactor(core): Change final_node_map and node naming

f49d619

refactor(core): output.tf and locals.tf

8efc71e

refactor(core): some fixes

6833365

refactor(core): some fixes

3756b0c

refactor(core): some fixes

51b0034

refactor(core): some fixes

bf0e76c

refactor(core): some fixes 18.08

04088bb

fixe bootstrap

0ab999e

fixe dns-pihole functionality

0f3d176

fixe get-config

371f0dc

some fixes in ansible_run_playbook

c93986a

some fixes in functionality for work with nodes

2957b91

fixed add_nodes functionality

31420c5

fixed delete_workspace

a9a1ec0

fixed deploy apply and bootstrap

2e5edf4

fixed ./cpc clone-workspace and ./cpc ctx

b37047f

fixed modules/80_ssh.sh -> ssh_clear_hosts()

67a68e0

abevz added 23 commits August 21, 2025 20:28

fixed playbook validate_cluster

41d9afe

add installation traefik as addon

01c1ba2

didnt fix proxy protocol

7b17ee9

didnt fix proxy protocol

0ffacbd

fixed ansible-lint recommendations in pb_upgrade_addons_extended.yml

e1544da

Fix: Corrected generate_node_hostname and kubeconfig merge

40be38a

Fix: upgrade-addons in part Metrics Server

42beafb

fix: Remove empty module file

8833266

- Removed empty modules/10_workspace.sh file that was causing test failures
- This file contained no code and was not being used by the system

All remaining modules now have proper structure and direct execution checks.

Update PR description with code review fixes

8880d5a

abevz requested a review from Copilot September 2, 2025 18:56

gemini-code-assist Bot reviewed Sep 2, 2025

abevz merged commit 30d9013 into main Sep 2, 2025

abevz deleted the feature/improvements branch September 2, 2025 18:59

Copilot AI reviewed Sep 2, 2025

gemini-code-assist Bot reviewed Sep 2, 2025

abevz added a commit that referenced this pull request Sep 8, 2025

Merge pull request #3 from abevz/feature/improvements

0eee3c6

🚀 Release v1.1.0: Major Performance Optimizations & Security Fixes

-  release_letter = var.release_letter
+  release_letter_map = {
+    "test-auto-release" = "a"
+    "k8s-test" = "b"
+    "k8s133" = "c"
+    "debian" = "d"
+    "ubuntu" = "u"
+    "rocky" = "r"
+    "suse" = "s"
+    # Add other workspace names and their release letters as needed
+  }
+  release_letter = coalesce(
+    trimspace(var.release_letter),
+    lookup(local.release_letter_map, local.effective_os_type, ""),
+    "x"
+  )
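
For readers more familiar with the shell side of the tool, the fallback order in this hunk can be mirrored as a short sketch. Terraform's `coalesce` returns the first argument that is neither null nor an empty string, so the chain is: explicit `release_letter`, then the map lookup, then `"x"`. The function names below are hypothetical and exist only for illustration; the map entries are copied from the diff.

```shell
#!/bin/sh
# Hypothetical shell mirror of the HCL fallback chain; names are illustrative.

# Map a key to its release letter; prints nothing for unknown keys.
lookup_release_letter() {
  case "$1" in
    test-auto-release) echo a ;;
    k8s-test)          echo b ;;
    k8s133)            echo c ;;
    debian)            echo d ;;
    ubuntu)            echo u ;;
    rocky)             echo r ;;
    suse)              echo s ;;
    *)                 : ;;
  esac
}

# $1: explicitly configured release letter (may be empty or whitespace)
# $2: key used for the map lookup
resolve_release_letter() {
  # Equivalent of trimspace(): strip leading and trailing whitespace.
  explicit="$(printf '%s' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
    return 0
  fi
  # Equivalent of lookup(..., "") followed by the final "x" default.
  mapped="$(lookup_release_letter "$2")"
  printf '%s\n' "${mapped:-x}"
}
```

Calling `resolve_release_letter "" ubuntu` takes the map branch, while a whitespace-only letter combined with an unknown key falls through to the final default.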

		# Error message is already printed inside the function
		sys.exit(1)

