An intelligent n8n workflow that automatically maps security controls between any two compliance frameworks or standards using AI-powered semantic similarity. The system uses vector embeddings to find the best matches between source and target framework controls, generating a comprehensive mapping report with similarity scores.
This workflow uses Google Gemini embeddings to create vector representations of security controls, enabling semantic similarity search. Instead of simple keyword matching, the system understands the meaning and context of controls, allowing it to find related controls even when the wording differs.
- Semantic Similarity Matching: Uses AI embeddings to find controls based on meaning, not just keywords
- Flexible CSV Support: Handles various CSV formats with configurable column mappings
- Similarity Scoring: Each match receives a score from 0.0 to 1.0 indicating how similar controls are
- Ranked Results: Automatically ranks matches by similarity score (best match first)
- Threshold Filtering: Only includes matches above your configured similarity threshold
- Dynamic Output: CSV output with customizable framework names and column headers
- n8n instance (self-hosted using Docker)
- Google Gemini API key for embeddings
- CSV files containing your source and target framework controls
- Ensure Docker and Docker Compose are installed on your system
- Create a directory for your n8n installation:

      mkdir n8n
      cd n8n

- Create a docker-compose.yml file:

      version: '3.8'
      services:
        n8n:
          image: docker.n8n.io/n8nio/n8n
          container_name: n8n
          ports:
            - "5678:5678"
          environment:
            - GENERIC_TIMEZONE=America/New_York
            - TZ=America/New_York
            - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
            - N8N_RUNNERS_ENABLED=true
          volumes:
            - n8n_data:/home/node/.n8n
            - /Users/joshdoesit/Library/CloudStorage/Dev/AI Control Mapper:/data/csv
          restart: unless-stopped
      volumes:
        n8n_data:

  Note: Update the CSV volume path to match your local directory where CSV files are stored.
- Start n8n:

      docker-compose up -d

- Access n8n at http://localhost:5678
- Set up your n8n account on first launch

Note: Ensure your n8n instance has access to the file system where your CSV files are stored. Mount the CSV directory as shown in the docker-compose.yml example above.
- Visit Google Cloud Console
- Enable the Generative Language API
- Create an API key for the Generative Language API
- In n8n, go to Settings > Credentials
- Click Add Credential and select Google Gemini (PaLM) API
- Enter your API key and save
Your CSV files should have the following structure:
- First row must contain column headers
- One column containing control IDs (e.g., "ID", "Control ID", "SCF #")
- One column containing control descriptions (e.g., "Description", "Control Description", "Requirement")
Place your CSV files in the mounted volume directory specified in your docker-compose.yml file (e.g., /Users/joshdoesit/Library/CloudStorage/Dev/AI Control Mapper). Ensure n8n has read access to input CSV files and write access to the output directory.
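To check that a file follows the structure above before running the workflow, the parsing step can be approximated outside n8n. This is a minimal sketch, not the workflow's actual code: `load_controls` and the sample rows are hypothetical, and skipping rows with an empty ID or description is an assumption.

```python
import csv
import io


def load_controls(csv_text, id_column="ID", description_column="Description"):
    """Parse CSV text into a list of {'id', 'description'} dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first row is the header
    controls = []
    for row in reader:
        control_id = (row.get(id_column) or "").strip()
        description = (row.get(description_column) or "").strip()
        if control_id and description:  # assumption: skip incomplete rows
            controls.append({"id": control_id, "description": description})
    return controls


# Hypothetical file that uses the "SCF #" / "Requirement" header variants.
sample = 'SCF #,Requirement\nX-01,"Encrypt data at rest, including backups."\nX-02,\n'
controls = load_controls(sample, id_column="SCF #", description_column="Requirement")
print(controls)
```

If the returned list is empty for a real file, the column names passed in probably do not match the header row.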
- In your n8n instance, click Workflows > Start from Scratch > Import from File
- Select the workflow JSON file (AI Control Mapping With Embeddings.json)
- The workflow will be imported with all nodes and connections
- Configure the "Workflow Configuration" node with your settings (see Configuration section)
The workflow is configured through the "Workflow Configuration" node. Update the following parameters:
- sourceFrameworkName (string): Name of your source framework (e.g., "PCI DSS v4.0.1")
- targetFrameworkName (string): Name of your target framework (e.g., "SCF 2025.3.1")
- sourceCsvPath (string): Full path to your source framework CSV file
  - Default: /data/csv/source.csv
- targetCsvPath (string): Full path to your target framework CSV file
  - Default: /data/csv/target.csv
- outputCsvPath (string): Path where the mapping results CSV will be saved
  - Default: /data/csv/mappings.csv
If your CSV files use column names other than "ID" and "Description", specify them here:
- sourceIdColumn (string): Column name containing source control IDs
  - Default: "ID"
- sourceDescriptionColumn (string): Column name containing source control descriptions
  - Default: "Description"
- targetIdColumn (string): Column name containing target control IDs
  - Default: "ID"
- targetDescriptionColumn (string): Column name containing target control descriptions
  - Default: "Description"
- similarityThreshold (number): Minimum similarity score to include in results (0.0 to 1.0)
  - Default: 0.7
  - Lower values = more matches (may include less relevant results)
  - Higher values = fewer matches (only very similar controls)
- topMatchesCount (number): Maximum number of top matches to return per source control
  - Default: 3
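For reference, the parameters and defaults listed above can be collected in one place. The sketch below is a hypothetical Python mirror of the "Workflow Configuration" node, not something the workflow ships with; the class name, the snake_case field names, and the two sanity checks are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MappingConfig:
    """Hypothetical mirror of the "Workflow Configuration" node values."""
    source_framework_name: str
    target_framework_name: str
    source_csv_path: str = "/data/csv/source.csv"
    target_csv_path: str = "/data/csv/target.csv"
    output_csv_path: str = "/data/csv/mappings.csv"
    source_id_column: str = "ID"
    source_description_column: str = "Description"
    target_id_column: str = "ID"
    target_description_column: str = "Description"
    similarity_threshold: float = 0.7
    top_matches_count: int = 3

    def problems(self):
        """Return a list of issues; an empty list means the values look sane."""
        found = []
        if not 0.0 <= self.similarity_threshold <= 1.0:
            found.append("similarityThreshold must be between 0.0 and 1.0")
        if self.top_matches_count < 1:
            found.append("topMatchesCount must be at least 1")
        return found


config = MappingConfig("PCI DSS v4.0.1", "SCF 2025.3.1", similarity_threshold=1.2)
print(config.problems())
```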
- Configuration: The workflow reads configuration settings from the "Workflow Configuration" node
- Load CSV Files:
  - Reads source and target framework CSV files
  - Parses and extracts control IDs and descriptions based on configured column mappings
- Build Vector Store:
  - Prepares target framework controls as searchable documents
  - Generates vector embeddings using the Google Gemini text-embedding-004 model for each target control
  - Stores all target controls in an in-memory vector database
- Find Matches:
  - Loops through each source framework control and generates vector embeddings
  - Queries the vector store to find the most similar target controls
  - Calculates similarity scores for each match
  - Filters results by similarity threshold
- Format and Rank:
  - Formats mapping results with source and target control information
  - Groups matches by source control ID
  - Ranks matches by similarity score (highest first)
  - Assigns rank numbers (1 = best match, 2 = second best, etc.)
- Export Results:
  - Prepares CSV data with dynamic headers based on framework names
  - Converts to CSV format
  - Writes the mapping results to the configured output file
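The filter-and-rank logic in the steps above can be sketched for a single source control as follows. This is an illustration under stated assumptions rather than the code in the workflow's nodes: `rank_matches` is a hypothetical name, the scores are made up, and treating the threshold as inclusive (`>=`) is a guess.

```python
def rank_matches(scored_targets, threshold=0.7, top_n=3):
    """Filter, sort, and rank (target_id, score) pairs for one source control."""
    # Assumption: a score exactly equal to the threshold is kept.
    kept = [(target_id, score) for target_id, score in scored_targets if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return [
        {"target_id": target_id, "score": round(score, 4), "rank": rank}
        for rank, (target_id, score) in enumerate(kept[:top_n], start=1)
    ]


# Made-up scores for five hypothetical target controls.
scored = [("T-1", 0.91234), ("T-2", 0.65), ("T-3", 0.78), ("T-4", 0.83), ("T-5", 0.72)]
ranked = rank_matches(scored, threshold=0.7, top_n=3)
for match in ranked:
    print(match)
```

With these inputs, T-2 falls below the threshold and T-5 is cut by the top-3 limit, which is the combined effect of similarityThreshold and topMatchesCount.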
The output CSV contains the following columns:
- [Source Framework Name] Control ID: The source control identifier
- [Source Framework Name] Description: The source control description
- [Target Framework Name] Control ID: The matched target control identifier
- [Target Framework Name] Description: The matched target control description
- Similarity Score: The similarity score (0.0 to 1.0, rounded to 4 decimal places)
- Rank: The rank of this match for the source control (1 = best match)
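As an illustration of how the framework names end up in the header row, here is a small sketch that builds the six columns described above. The function name, the dictionary keys, and the sample row are hypothetical; the workflow's own export node may format values differently.

```python
import csv
import io


def build_output_csv(rows, source_name, target_name):
    """Return CSV text whose headers embed the two framework names."""
    headers = [
        f"{source_name} Control ID",
        f"{source_name} Description",
        f"{target_name} Control ID",
        f"{target_name} Description",
        "Similarity Score",
        "Rank",
    ]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(headers)
    for row in rows:
        writer.writerow([
            row["source_id"],
            row["source_description"],
            row["target_id"],
            row["target_description"],
            f"{row['score']:.4f}",  # four decimal places, as in the column description
            row["rank"],
        ])
    return buffer.getvalue()


# One made-up mapping row, for illustration only.
rows = [{
    "source_id": "S-1", "source_description": "Encrypt stored data",
    "target_id": "T-9", "target_description": "Protect data at rest",
    "score": 0.81234, "rank": 1,
}]
output = build_output_csv(rows, "PCI DSS v4.0.1", "SCF 2025.3.1")
print(output)
```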
- Configure the workflow: Update the "Workflow Configuration" node with your framework details, file paths, and column mappings
- Ensure CSV files are accessible: Place your CSV files in the configured paths or update the paths in the configuration
- Verify credentials: Ensure your Google Gemini API credentials are properly configured
- Execute the workflow: Click "Execute Workflow" or trigger it manually
- Review results: Check the output CSV file at the configured output path
- Check similarity threshold: Lower the similarityThreshold value if you're getting no results
- Verify CSV format: Ensure your CSV files have the correct column headers
- Check column mappings: Verify that sourceIdColumn, sourceDescriptionColumn, targetIdColumn, and targetDescriptionColumn match the spelling of your CSV headers
- Case sensitivity: Column matching is case-insensitive, but the spelling must otherwise match the header
- Special characters: The workflow handles special characters in column names (e.g., "#", spaces)
- BOM handling: The workflow automatically handles UTF-8 BOM characters
- Verify API key: Ensure your Google Gemini API key is valid and has the Generative Language API enabled
- Check API quotas: Verify you haven't exceeded your API quota limits
- Network connectivity: Ensure your n8n instance can reach Google's API endpoints
- File permissions: Ensure n8n has read access to input CSV files and write access to the output directory
- Path format: Use absolute paths or paths relative to your n8n data directory
- File existence: Verify that CSV files exist at the configured paths
- Docker volumes: Ensure CSV directories are properly mounted in your docker-compose.yml file
- Increase threshold: Raise the similarityThreshold to get more precise matches
- Adjust top matches: Increase topMatchesCount if you want to see more potential matches per control
- Review descriptions: Ensure control descriptions are detailed and meaningful
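When a column is reported as missing, it can help to reproduce the header lookup by hand. The sketch below shows one way to do the case-insensitive, BOM-tolerant matching mentioned in the column troubleshooting notes above; `normalize_header` and `resolve_column` are hypothetical helpers, and the workflow's real normalization rules may differ.

```python
def normalize_header(name):
    """Drop a UTF-8 BOM, trim whitespace, and lowercase a header name."""
    return name.replace("\ufeff", "").strip().lower()


def resolve_column(headers, wanted):
    """Return the header that matches `wanted`, or None if nothing matches."""
    wanted_key = normalize_header(wanted)
    for header in headers:
        if normalize_header(header) == wanted_key:
            return header  # keep the original spelling for row lookups
    return None


# The first header carries a BOM, as some spreadsheet exports produce.
headers = ["\ufeffSCF #", "Control Description"]
print(repr(resolve_column(headers, "scf #")))
print(repr(resolve_column(headers, "Description")))
```

The second lookup returns None because "Description" and "Control Description" differ in spelling, which is the situation the column mapping parameters are meant to resolve.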
The similarity threshold can be adjusted based on your needs:
- Stricter matching (0.8-0.9): Only very similar controls
- Balanced matching (0.6-0.7): Good balance of relevance and coverage
- Broader matching (0.4-0.5): More matches, may include less relevant results
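One way to pick a threshold is to count how many candidate matches survive at several cut-offs before committing to one. The sketch below assumes you already have a list of similarity scores, for example from a run with a low threshold; the function name and the scores are hypothetical.

```python
def matches_per_threshold(scores, thresholds=(0.5, 0.7, 0.9)):
    """Count how many scores are at or above each candidate threshold."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}


# Made-up scores, for illustration only.
scores = [0.95, 0.82, 0.71, 0.64, 0.48]
counts = matches_per_threshold(scores)
print(counts)
```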
You can add custom filtering logic in the "Format Mapping Results" node to filter matches based on additional criteria (e.g., control categories, specific keywords, etc.).
The workflow uses Google Gemini's text-embedding-004 model to convert control descriptions into high-dimensional vectors. These vectors capture semantic meaning, allowing the system to find similar controls even when exact wording differs.
Similarity scores are calculated using cosine similarity between embedding vectors. Scores range from 0.0 (completely different) to 1.0 (identical meaning).
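For readers unfamiliar with the metric, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is a plain-Python version for illustration; the workflow relies on its vector store for this calculation, and the short vectors here stand in for real embeddings. Mathematically the value can fall anywhere between -1 and 1, so the 0.0 to 1.0 range describes the scores expected for control descriptions rather than a hard bound.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine_similarity([1.0, 1.0], [1.0, 0.0]))  # 45 degrees apart
```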
- Vector store is built in-memory for fast querying
- Embeddings are generated in batches (batch size: 1500)
- Processing is done individually for each source control to ensure the most accurate matching
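The batching mentioned above amounts to splitting the list of target controls into fixed-size chunks before they are sent for embedding. A minimal sketch, assuming a hypothetical `batched` helper and placeholder items in place of real control descriptions:

```python
def batched(items, batch_size=1500):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 3,200 placeholder items split into batches of 1,500.
placeholder_controls = [f"control-{i}" for i in range(3200)]
sizes = [len(batch) for batch in batched(placeholder_controls)]
print(sizes)
```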
This workflow is provided as-is for use in your n8n instance.
