An intelligent n8n workflow that automatically maps security controls between compliance frameworks using AI-powered semantic similarity. Uses Google Gemini embeddings to find the best matches and generates mapping reports with similarity scores.

JoshDoesIT/AI-Control-Mapper

AI Control Mapping with Embeddings

An intelligent n8n workflow that automatically maps security controls between any two compliance frameworks or standards using AI-powered semantic similarity. The system uses vector embeddings to find the best matches between source and target framework controls, generating a comprehensive mapping report with similarity scores.

[Figure: Framework Mapping Visualization]

🌟 Overview

This workflow uses Google Gemini embeddings to create vector representations of security controls, enabling semantic similarity search. Instead of simple keyword matching, the system understands the meaning and context of controls, allowing it to find related controls even when the wording differs.

✨ Features

  • Semantic Similarity Matching: Uses AI embeddings to find controls based on meaning, not just keywords
  • Flexible CSV Support: Handles various CSV formats with configurable column mappings
  • Similarity Scoring: Each match receives a score from 0.0 to 1.0 indicating how similar controls are
  • Ranked Results: Automatically ranks matches by similarity score (best match first)
  • Threshold Filtering: Only includes matches above your configured similarity threshold
  • Dynamic Output: CSV output with customizable framework names and column headers

📋 Prerequisites

  • n8n instance (self-hosted using Docker)
  • Google Gemini API key for embeddings
  • CSV files containing your source and target framework controls

🚀 Setup

1. Set Up n8n

  1. Ensure Docker and Docker Compose are installed on your system

  2. Create a directory for your n8n installation:

    mkdir n8n
    cd n8n

  3. Create a docker-compose.yml file:

    version: '3.8'
    services:
      n8n:
        image: docker.n8n.io/n8nio/n8n
        container_name: n8n
        ports:
          - "5678:5678"
        environment:
          - GENERIC_TIMEZONE=America/New_York
          - TZ=America/New_York
          - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
          - N8N_RUNNERS_ENABLED=true
        volumes:
          - n8n_data:/home/node/.n8n
          - "/Users/joshdoesit/Library/CloudStorage/Dev/AI Control Mapper:/data/csv"
        restart: unless-stopped
    
    volumes:
      n8n_data:

    Note: Update the CSV volume path to match your local directory where CSV files are stored.

  4. Start n8n:

    docker-compose up -d

  5. Access n8n at http://localhost:5678

  6. Set up your n8n account on first launch

Note: n8n can only read and write files that are mounted into its container. Mount the directory that holds your CSV files as shown in the docker-compose.yml example above.

2. Configure Google Gemini Credentials

  1. Visit Google Cloud Console
  2. Enable the Generative Language API
  3. Create an API key for the Generative Language API
  4. In n8n, go to Settings > Credentials
  5. Click Add Credential and select Google Gemini (PaLM) API
  6. Enter your API key and save

3. Prepare Your CSV Files

Your CSV files should have the following structure:

  • First row must contain column headers
  • One column containing control IDs (e.g., "ID", "Control ID", "SCF #")
  • One column containing control descriptions (e.g., "Description", "Control Description", "Requirement")

Place your CSV files in the host directory mounted in your docker-compose.yml file (e.g., /Users/joshdoesit/Library/CloudStorage/Dev/AI Control Mapper). Inside the container, that directory appears as /data/csv. Ensure n8n has read access to input CSV files and write access to the output directory.
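
Before running the workflow, it can help to confirm that a file has the structure described above. The following is a minimal Python sketch, not part of the workflow itself; the function name `load_controls`, the sample rows, and the control ID `CTRL-1` are made up for illustration.

```python
import csv
import io


def load_controls(csv_text, id_column="ID", description_column="Description"):
    """Parse CSV text and return a list of {"id", "description"} dicts.

    Hypothetical helper: raises ValueError when a required column is missing.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    missing = [c for c in (id_column, description_column) if c not in headers]
    if missing:
        raise ValueError(f"Missing column(s) {missing}; found headers {headers}")

    controls = []
    for row in reader:
        control_id = (row[id_column] or "").strip()
        description = (row[description_column] or "").strip()
        if control_id and description:  # skip blank or incomplete rows
            controls.append({"id": control_id, "description": description})
    return controls


# Made-up sample using the "SCF #" style of header mentioned above.
sample = (
    "SCF #,Control Description\n"
    'CTRL-1,"Enforce multi-factor authentication, where feasible."\n'
)
controls = load_controls(
    sample, id_column="SCF #", description_column="Control Description"
)
print(controls)
```

This sketch matches header names exactly; the workflow's own matching is described under Troubleshooting.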

4. Import the Workflow

  1. In your n8n instance, click Workflows > Start from Scratch > Import from File
  2. Select the workflow JSON file (AI Control Mapping With Embeddings.json)
  3. The workflow will be imported with all nodes and connections
  4. Configure the "Workflow Configuration" node with your settings (see Configuration section)

⚙️ Configuration

The workflow is configured through the "Workflow Configuration" node. Update the following parameters:

Framework Information

  • sourceFrameworkName (string): Name of your source framework (e.g., "PCI DSS v4.0.1")
  • targetFrameworkName (string): Name of your target framework (e.g., "SCF 2025.3.1")

File Paths

  • sourceCsvPath (string): Full path to your source framework CSV file, as seen from inside the n8n container
    • Default: /data/csv/source.csv
  • targetCsvPath (string): Full path to your target framework CSV file, as seen from inside the n8n container
    • Default: /data/csv/target.csv
  • outputCsvPath (string): Full path, inside the container, where the mapping results CSV will be saved
    • Default: /data/csv/mappings.csv

Column Mappings

If your CSV files use column names other than "ID" and "Description", specify them here:

  • sourceIdColumn (string): Column name containing source control IDs
    • Default: "ID"
  • sourceDescriptionColumn (string): Column name containing source control descriptions
    • Default: "Description"
  • targetIdColumn (string): Column name containing target control IDs
    • Default: "ID"
  • targetDescriptionColumn (string): Column name containing target control descriptions
    • Default: "Description"

Matching Settings

  • similarityThreshold (number): Minimum similarity score to include in results (0.0 to 1.0)
    • Default: 0.7
    • Lower values = more matches (may include less relevant results)
    • Higher values = fewer matches (only very similar controls)
  • topMatchesCount (number): Maximum number of top matches to return per source control
    • Default: 3
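
To make the interaction between similarityThreshold and topMatchesCount concrete, here is a small Python sketch of how the two settings could combine for a single source control. It assumes the threshold is inclusive and is applied before the top-N cut; the workflow's actual order of operations may differ. The function name and the target IDs are hypothetical.

```python
def select_matches(scored_targets, similarity_threshold=0.7, top_matches_count=3):
    """Keep (target_id, score) pairs at or above the threshold, best first.

    Assumption: the threshold is inclusive and applied before the top-N cut.
    """
    kept = [(tid, score) for tid, score in scored_targets
            if score >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_matches_count]


# Made-up scores for one source control.
scored = [("T-1", 0.91), ("T-2", 0.65), ("T-3", 0.78), ("T-4", 0.74), ("T-5", 0.72)]

print(select_matches(scored))
# [('T-1', 0.91), ('T-3', 0.78), ('T-4', 0.74)]

print(select_matches(scored, similarity_threshold=0.6, top_matches_count=5))
# [('T-1', 0.91), ('T-3', 0.78), ('T-4', 0.74), ('T-5', 0.72), ('T-2', 0.65)]
```

With the defaults, T-5 clears the threshold but is dropped by the top-3 limit, while T-2 is dropped by the threshold itself.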

🔄 How It Works

Workflow Process

  1. Configuration: The workflow reads configuration settings from the "Workflow Configuration" node

  2. Load CSV Files:

    • Reads source and target framework CSV files
    • Parses and extracts control IDs and descriptions based on configured column mappings
  3. Build Vector Store:

    • Prepares target framework controls as searchable documents
    • Generates vector embeddings using the Google Gemini text-embedding-004 model for each target control
    • Stores all target controls in an in-memory vector database
  4. Find Matches:

    • Loops through each source framework control and generates vector embeddings
    • Queries the vector store to find the most similar target controls
    • Calculates similarity scores for each match
    • Filters results by similarity threshold
  5. Format and Rank:

    • Formats mapping results with source and target control information
    • Groups matches by source control ID
    • Ranks matches by similarity score (highest first)
    • Assigns rank numbers (1 = best match, 2 = second best, etc.)
  6. Export Results:

    • Prepares CSV data with dynamic headers based on framework names
    • Converts to CSV format
    • Writes the mapping results to the configured output file
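
The grouping and ranking in step 5 can be sketched as follows. This is an illustrative Python version under assumed field names (`source_id`, `target_id`, `score`), not the code inside the workflow's nodes.

```python
from collections import defaultdict


def rank_matches(matches):
    """Group matches by source control and assign ranks (1 = best match).

    Each match is a dict with 'source_id', 'target_id', and 'score'.
    Returns new dicts with an added 'rank' key; the inputs are not modified.
    """
    by_source = defaultdict(list)
    for match in matches:
        by_source[match["source_id"]].append(match)

    ranked = []
    for group in by_source.values():
        group.sort(key=lambda m: m["score"], reverse=True)  # highest first
        for rank, match in enumerate(group, start=1):
            ranked.append({**match, "rank": rank})
    return ranked


# Made-up matches for two source controls.
matches = [
    {"source_id": "S-1", "target_id": "T-2", "score": 0.81},
    {"source_id": "S-2", "target_id": "T-9", "score": 0.77},
    {"source_id": "S-1", "target_id": "T-5", "score": 0.93},
]
for row in rank_matches(matches):
    print(row["source_id"], row["target_id"], row["score"], row["rank"])
```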

Output Format

The output CSV contains the following columns:

  • [Source Framework Name] Control ID: The source control identifier
  • [Source Framework Name] Description: The source control description
  • [Target Framework Name] Control ID: The matched target control identifier
  • [Target Framework Name] Description: The matched target control description
  • Similarity Score: The similarity score (0.0 to 1.0, rounded to 4 decimal places)
  • Rank: The rank of this match for the source control (1 = best match)
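
As an illustration of the dynamic headers, the Python sketch below builds the same six columns from the two framework names. The helper name `build_output_csv`, the row keys, and the sample row are assumptions made for the example.

```python
import csv
import io


def build_output_csv(rows, source_name, target_name):
    """Return CSV text whose headers are derived from the framework names."""
    headers = [
        f"{source_name} Control ID",
        f"{source_name} Description",
        f"{target_name} Control ID",
        f"{target_name} Description",
        "Similarity Score",
        "Rank",
    ]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    for row in rows:
        writer.writerow([
            row["source_id"],
            row["source_description"],
            row["target_id"],
            row["target_description"],
            f"{row['score']:.4f}",  # four decimal places, as in the output format
            row["rank"],
        ])
    return buffer.getvalue()


# Made-up row; the framework names are the examples from the Configuration section.
rows = [{
    "source_id": "S-1",
    "source_description": "Restrict access to system components",
    "target_id": "T-5",
    "target_description": "Limit system access to authorized users",
    "score": 0.87654,
    "rank": 1,
}]
print(build_output_csv(rows, "PCI DSS v4.0.1", "SCF 2025.3.1"))
```

The `csv` module quotes any description that contains commas or quotes, so free-text descriptions stay in a single column.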

💻 Usage

  1. Configure the workflow: Update the "Workflow Configuration" node with your framework details, file paths, and column mappings

  2. Ensure CSV files are accessible: Place your CSV files in the configured paths or update the paths in the configuration

  3. Verify credentials: Ensure your Google Gemini API credentials are properly configured

  4. Execute the workflow: Click "Execute Workflow" or trigger it manually

  5. Review results: Check the output CSV file at the configured output path

🔧 Troubleshooting

No Matches Found

  • Check similarity threshold: Lower the similarityThreshold value if you're getting no results
  • Verify CSV format: Ensure your CSV files have the correct column headers
  • Check column mappings: Verify that sourceIdColumn, sourceDescriptionColumn, targetIdColumn, and targetDescriptionColumn match the spelling of your CSV headers

Incorrect Column Detection

  • Case sensitivity: Column matching is case-insensitive, but the spelling must still match the CSV header
  • Special characters: The workflow handles special characters in column names (e.g., "#", spaces)
  • BOM handling: The workflow automatically handles UTF-8 BOM characters
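
One way to picture the tolerant header matching described above is the Python sketch below. It is an assumption about the general approach (strip a leading BOM, trim whitespace, compare case-insensitively), not a copy of the workflow's logic, and `resolve_column` is a hypothetical name.

```python
def resolve_column(headers, wanted):
    """Return the actual header that matches `wanted`, ignoring case,
    surrounding whitespace, and a leading UTF-8 byte order mark (BOM)."""

    def normalize(name):
        return name.lstrip("\ufeff").strip().casefold()

    target = normalize(wanted)
    for header in headers:
        if normalize(header) == target:
            return header
    raise KeyError(f"No header matching {wanted!r} in {headers}")


# The first header carries a BOM, as files exported from spreadsheets often do.
headers = ["\ufeffscf #", "Control Description "]
print(repr(resolve_column(headers, "SCF #")))
print(repr(resolve_column(headers, "control description")))
```

A misspelled name such as "Desription" would still raise an error, which is why spelling must match even though case does not.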

API Errors

  • Verify API key: Ensure your Google Gemini API key is valid and has the Generative Language API enabled
  • Check API quotas: Verify you haven't exceeded your API quota limits
  • Network connectivity: Ensure your n8n instance can reach Google's API endpoints
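
To test the API key outside n8n, a direct call to the embedding endpoint can help separate credential problems from workflow problems. The Python sketch below assumes the Generative Language REST endpoint for text-embedding-004 and a hypothetical GEMINI_API_KEY environment variable; model availability and API versions may change, so treat the URL as an assumption to verify against Google's current documentation.

```python
import json
import os
import urllib.request

# Assumed endpoint for the model named in this README; verify before relying on it.
EMBED_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/text-embedding-004:embedContent"
)


def build_embed_request(api_key, text):
    """Build (but do not send) a POST request that embeds one piece of text."""
    body = json.dumps({
        "model": "models/text-embedding-004",
        "content": {"parts": [{"text": text}]},
    }).encode("utf-8")
    return urllib.request.Request(
        EMBED_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )


api_key = os.environ.get("GEMINI_API_KEY")  # hypothetical variable name
if api_key:
    request = build_embed_request(api_key, "test control description")
    with urllib.request.urlopen(request, timeout=30) as response:
        values = json.load(response)["embedding"]["values"]
    print(f"OK: received an embedding with {len(values)} dimensions")
else:
    print("Set GEMINI_API_KEY to run the live check")
```

An HTTP 400 or 403 response usually points to the key or the API not being enabled, while a 429 points to quota limits.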

File Path Issues

  • File permissions: Ensure n8n has read access to input CSV files and write access to the output directory
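  • Path format: Use absolute paths or paths relative to your n8n data directory. With Docker, these are paths inside the container (e.g., /data/csv/source.csv), not paths on the host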
  • File existence: Verify that CSV files exist at the configured paths
  • Docker volumes: Ensure CSV directories are properly mounted in your docker-compose.yml file
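
The checks above can be scripted. The following Python sketch is a standalone diagnostic, not part of the workflow; run it wherever the same paths are visible (for example, on the host with host-side paths). The function name `check_paths` is hypothetical, and the paths in the demo are the defaults from the Configuration section.

```python
import os
from pathlib import Path


def check_paths(source_csv, target_csv, output_csv):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for label, path in (("source", source_csv), ("target", target_csv)):
        path = Path(path)
        if not path.is_file():
            problems.append(f"{label} CSV not found: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"{label} CSV is not readable: {path}")

    output_dir = Path(output_csv).parent
    if not output_dir.is_dir():
        problems.append(f"output directory does not exist: {output_dir}")
    elif not os.access(output_dir, os.W_OK):
        problems.append(f"output directory is not writable: {output_dir}")
    return problems


for problem in check_paths(
    "/data/csv/source.csv", "/data/csv/target.csv", "/data/csv/mappings.csv"
):
    print(problem)
```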

Low Quality Matches

  • Increase threshold: Raise the similarityThreshold to get more precise matches
  • Adjust top matches: Increase topMatchesCount if you want to see more potential matches per control
  • Review descriptions: Ensure control descriptions are detailed and meaningful

🎨 Customization

Modifying Similarity Threshold

The similarity threshold can be adjusted based on your needs:

  • Stricter matching (0.8-0.9): Only very similar controls
  • Balanced matching (0.6-0.7): Good balance of relevance and coverage
  • Broader matching (0.4-0.5): More matches, may include less relevant results

Adding Custom Filtering

You can add custom filtering logic in the "Format Mapping Results" node to filter matches based on additional criteria (e.g., control categories, specific keywords, etc.).
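
As a sketch of what such a filter might look like, the Python function below keeps a match only if its target description contains every required keyword. It illustrates the logic only: the field names are assumptions, and the code in the actual node may be written in a different language.

```python
def keep_match(match, required_keywords=(), min_score=0.0):
    """Return True if the match passes the extra keyword and score checks.

    `match` is assumed to be a dict with 'target_description' and 'score'.
    """
    text = match["target_description"].casefold()
    has_keywords = all(keyword.casefold() in text for keyword in required_keywords)
    return has_keywords and match["score"] >= min_score


# Made-up matches for illustration.
candidates = [
    {"target_description": "Encrypt data at rest using approved algorithms", "score": 0.82},
    {"target_description": "Review firewall rules every six months", "score": 0.79},
]
filtered = [m for m in candidates if keep_match(m, required_keywords=["encrypt"])]
print(filtered)
```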

🔬 Technical Details

Vector Embeddings

The workflow uses Google Gemini's text-embedding-004 model to convert control descriptions into high-dimensional vectors. These vectors capture semantic meaning, allowing the system to find similar controls even when exact wording differs.

Similarity Scoring

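Similarity scores are calculated using cosine similarity between embedding vectors. Cosine similarity is mathematically bounded between -1.0 and 1.0; the workflow reports scores on a 0.0 to 1.0 scale, where values near 0.0 indicate unrelated controls and 1.0 indicates identical meaning.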
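
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The Python sketch below computes it for tiny hand-written vectors; real embeddings have hundreds of dimensions, and the vector store performs this calculation internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction: -1.0
```

Because the measure depends only on direction, a long description and a short one can still score highly if they express the same idea.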

Performance Considerations

  • Vector store is built in-memory for fast querying
  • Embeddings are generated in batches (batch size: 1500)
  • Processing is done individually for each source control to ensure the most accurate matching
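
Batching of this kind can be sketched in a few lines of Python. The helper below is illustrative only; the name `batched` is hypothetical, and the default of 1500 simply mirrors the batch size stated above.

```python
def batched(items, batch_size=1500):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 3200 made-up control descriptions split into batches of up to 1500.
descriptions = [f"control {i}" for i in range(3200)]
print([len(batch) for batch in batched(descriptions)])
# [1500, 1500, 200]
```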

📄 License

This workflow is provided as-is for use in your n8n instance.
