feat(skills): add iac-ansible-roles-dev skill #813

@aRustyDev

Summary

Create a new skill iac-ansible-roles-dev for Ansible role development patterns, variable scoping, and common pitfalls.

Context & Motivation

This skill gap was identified during a homelab Kubernetes cluster deployment using Ansible. While implementing a storage configuration role with LUKS encryption and etcd encryption at rest, several critical bugs surfaced, all stemming from Ansible-specific behavior that is neither intuitive nor well documented in common resources.

The Triggering Incident

A bug caused kube-apiserver to fail repeatedly with the error:

secret is not of the expected length, got 512, expected one of [16 24 32]

The root cause was variable pollution between include_role calls in Ansible. A role that retrieves secrets from 1Password was called twice:

  1. First to retrieve a LUKS keyfile (512 bytes)
  2. Then to retrieve an etcd encryption key (32 bytes, base64 encoded to 44 chars)

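The role stored the retrieved value with set_fact, and facts set this way persist on the host across role invocations. The second invocation therefore found the variable already populated, its conditional skipped retrieval ("value already set"), and the LUKS keyfile was incorrectly used as the etcd encryption key.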

This class of bug:

  • Is not obvious from reading the code
  • Doesn't produce clear error messages pointing to the cause
  • Requires understanding of Ansible's fact persistence across role invocations
  • Is easily introduced when refactoring roles for reuse
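
The failure mode above can be modelled outside Ansible. The following is a minimal sketch in plain Python, not Ansible code: `host_facts` stands in for the per-host fact store, and `fetch_secret_role`, `VAULT`, and the fact name `retrieved_secret` are hypothetical names chosen for illustration. It shows how an "already set" guard reuses the first secret, and how clearing the output at the role entry point avoids that.

```python
"""Toy model of fact persistence across two role invocations (not Ansible)."""

# Hypothetical stand-in for the 1Password lookup; values are dummy data.
VAULT = {
    "luks-keyfile": "L" * 512,  # 512-byte keyfile
    "etcd-key": "E" * 44,       # 32-byte key, base64-encoded to 44 chars
}


def fetch_secret_role(host_facts, item, clear_first=False):
    """Mimic a role that stores its result in a shared fact."""
    if clear_first:
        # The fix: reset the output fact at the role entry point.
        host_facts.pop("retrieved_secret", None)
    # The bug: retrieval is skipped whenever the fact is "already set".
    if "retrieved_secret" not in host_facts:
        host_facts["retrieved_secret"] = VAULT[item]
    return host_facts["retrieved_secret"]


def run_play(clear_first):
    """Call the role twice against one fact store; return both lengths."""
    host_facts = {}  # shared by both invocations, like set_fact results
    luks = fetch_secret_role(host_facts, "luks-keyfile", clear_first)
    etcd = fetch_secret_role(host_facts, "etcd-key", clear_first)
    return len(luks), len(etcd)


if __name__ == "__main__":
    print(run_play(clear_first=False))  # (512, 512): stale value reused
    print(run_play(clear_first=True))   # (512, 44)
```

Without clearing, the second call returns the 512-byte value, which matches the "got 512" in the kube-apiserver error.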

Additional Bug: Double Base64 Encoding

The same debugging session uncovered a second pattern: Ansible's slurp module returns file content base64-encoded. The content was not decoded before use, so it was encoded a second time when it was later base64-encoded for a Kubernetes secret.
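
The double-encoding effect can be reproduced with the standard library alone. This is a rough sketch in Python rather than a playbook: `raw_key` is dummy data, and `secret_payload` is a hypothetical helper that models a consumer decoding the secret's data field once. In a playbook, the usual counterpart of the fix is to apply the `b64decode` filter to the slurp result before re-encoding.

```python
import base64

raw_key = bytes(range(32))  # dummy stand-in for a 32-byte key file on disk

# slurp hands back the file content already base64-encoded.
slurped = base64.b64encode(raw_key).decode("ascii")

# Bug: encoding the slurped string again for the secret's data field.
double_encoded = base64.b64encode(slurped.encode("ascii")).decode("ascii")

# Fix: decode the slurp output first, then encode exactly once.
single_encoded = base64.b64encode(base64.b64decode(slurped)).decode("ascii")


def secret_payload(data_value):
    """Model a consumer that base64-decodes the data field once."""
    return base64.b64decode(data_value)


print(len(secret_payload(double_encoded)))  # 44: still base64 text
print(len(secret_payload(single_encoded)))  # 32: the raw key
```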

Use Cases

This skill should support:

  1. Role Development

    • Designing reusable roles with clean interfaces
    • Understanding variable scope (play vars, role defaults, role vars, set_fact)
    • Implementing idempotent role behavior
    • Testing roles in isolation and in combination
  2. Debugging Role Interactions

    • Identifying variable pollution between role invocations
    • Understanding fact persistence and clearing
    • Debugging conditional execution logic
    • Tracing variable values through complex plays
  3. Galaxy Collection Limitations

    • Identifying when a collection has limitations (e.g., kubernetes.core.helm limited to Helm 3 when Helm 4 is current)
    • Checking for known issues in collection repositories
    • Determining if fixes are underway (PRs, roadmap)
    • Evaluating whether to:
      • Wait for upstream fix
      • Fork and patch locally
      • Contribute upstream fix
      • Work around the limitation
    • Understanding collection versioning and compatibility
  4. Common Pitfalls & Patterns

    • include_role vs import_role behavior differences
    • Variable precedence surprises
    • Handler notification across roles
    • Loop variable scoping
    • Async task patterns
    • Error handling and rescue blocks

Key Requirements

Must Include

  1. Variable Scope Reference

    • Complete precedence hierarchy with examples
    • set_fact persistence behavior
    • How to clear facts between role invocations
    • When to use vars: vs role defaults vs play vars
  2. Role Design Patterns

    • State clearing at role entry points
    • Defensive programming for shared variables
    • Role interface design (inputs, outputs, side effects)
    • Testing strategies for reusable roles
  3. Galaxy Collection Workflow

    • How to check collection issue trackers
    • Evaluating PR/fix status
    • Forking and local patching workflow
    • Contributing back upstream
    • Pinning collection versions
  4. Debugging Techniques

    • Using -vvv effectively
    • Debug module patterns
    • Fact inspection between tasks
    • Tracing variable sources
  5. Anti-Patterns to Avoid

    • Relying on fact persistence without explicit clearing
    • Conditional execution based on "already set" without validation
    • Mixing include_role and import_role without understanding differences
    • Not accounting for slurp base64 encoding
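
As one illustration of validating a value instead of trusting that it is "already set", the check below decodes a candidate etcd encryption key and rejects anything whose length is not in the set reported by the kube-apiserver error (16, 24, or 32 bytes). It is a sketch in Python under that assumption. `validate_encryption_key` is a hypothetical helper, not part of Ansible or Kubernetes.

```python
import base64
import binascii

# Accepted lengths, taken from the kube-apiserver error message in this issue.
VALID_KEY_LENGTHS = (16, 24, 32)


def validate_encryption_key(b64_value):
    """Return the decoded key, or raise ValueError if it cannot be valid."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        key = base64.b64decode(b64_value, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError(f"not valid base64: {exc}") from exc
    if len(key) not in VALID_KEY_LENGTHS:
        raise ValueError(
            f"decoded key is {len(key)} bytes, expected one of {VALID_KEY_LENGTHS}"
        )
    return key
```

Run against the LUKS keyfile from the incident, a check like this would have failed at retrieval time with a message naming the 512-byte length, rather than surfacing later as a kube-apiserver crash.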

References to Include

  • Ansible documentation on variable precedence
  • Galaxy collection development guide
  • kubernetes.core collection repository
  • Common collection issue patterns

Templates/Examples

  • Role skeleton with proper state management
  • Reusable secret-retrieval role pattern
  • Galaxy collection limitation workaround template
  • Role testing playbook template
