AI Control Mapping with Embeddings

An intelligent n8n workflow that automatically maps security controls between any two compliance frameworks or standards using AI-powered semantic similarity. The system uses vector embeddings to find the best matches between source and target framework controls, generating a comprehensive mapping report with similarity scores.

🌟 Overview

This workflow uses Google Gemini embeddings to create vector representations of security controls, enabling semantic similarity search. Instead of simple keyword matching, the system understands the meaning and context of controls, allowing it to find related controls even when the wording differs.

✨ Features

Semantic Similarity Matching: Uses AI embeddings to find controls based on meaning, not just keywords
Flexible CSV Support: Handles various CSV formats with configurable column mappings
Similarity Scoring: Each match receives a score from 0.0 to 1.0 indicating how similar controls are
Ranked Results: Automatically ranks matches by similarity score (best match first)
Threshold Filtering: Only includes matches above your configured similarity threshold
Dynamic Output: CSV output with customizable framework names and column headers

📋 Prerequisites

n8n instance (self-hosted using Docker)
Google Gemini API key for embeddings
CSV files containing your source and target framework controls

🚀 Setup

1. Set Up n8n

Ensure Docker and Docker Compose are installed on your system
Create a directory for your n8n installation:
```
mkdir n8n
cd n8n
```

Create a docker-compose.yml file:

```yaml
version: '3.8'
services:
  n8n:
    image: docker.n8n.io/n8nio/n8n
    container_name: n8n
    ports:
      - "5678:5678"
    environment:
      - GENERIC_TIMEZONE=America/New_York
      - TZ=America/New_York
      - N8N_ENFORCE_SETTINGS_FILE_PERMISSIONS=true
      - N8N_RUNNERS_ENABLED=true
    volumes:
      - n8n_data:/home/node/.n8n
      - /Users/joshdoesit/Library/CloudStorage/Dev/AI Control Mapper:/data/csv
    restart: unless-stopped

volumes:
  n8n_data:
```

Note: Replace the host path on the left of :/data/csv with the local directory where your CSV files are stored.

Start n8n:
```
docker-compose up -d
```
Access n8n at http://localhost:5678
Set up your n8n account on first launch

Note: Ensure your n8n instance has access to the file system where your CSV files are stored. Mount the CSV directory as shown in the docker-compose.yml example above.

2. Configure Google Gemini Credentials

Visit Google Cloud Console
Enable the Generative Language API
Create an API key for the Generative Language API
In n8n, go to Settings > Credentials
Click Add Credential and select Google Gemini (PaLM) API
Enter your API key and save

3. Prepare Your CSV Files

Your CSV files should have the following structure:

First row must contain column headers
One column containing control IDs (e.g., "ID", "Control ID", "SCF #")
One column containing control descriptions (e.g., "Description", "Control Description", "Requirement")

Place your CSV files in the host directory mounted in your docker-compose.yml file (e.g., /Users/joshdoesit/Library/CloudStorage/Dev/AI Control Mapper). Inside the container, the same files appear under /data/csv, which is the path the workflow configuration uses. Ensure n8n has read access to the input CSV files and write access to the output directory.
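Before running the workflow, you can sanity-check a CSV file outside n8n. The following is a minimal Python sketch, not part of the workflow itself: the function names check_control_rows and check_control_csv are hypothetical, and unlike the workflow's own column matching, this check compares header names exactly (case included).

```python
import csv


def check_control_rows(lines, id_column="ID", description_column="Description"):
    """Return a list of problems in CSV text lines (an empty list means none were found)."""
    reader = csv.DictReader(lines)  # the first row is treated as the header row
    headers = reader.fieldnames or []
    missing = [c for c in (id_column, description_column) if c not in headers]
    if missing:
        return [f"missing column(s) {missing}; headers found: {headers}"]
    problems = []
    for row_no, row in enumerate(reader, start=1):  # row 1 = first data row
        for column in (id_column, description_column):
            # A short row yields None for missing fields, so treat that as empty too.
            if not (row.get(column) or "").strip():
                problems.append(f"data row {row_no}: empty {column!r}")
    return problems


def check_control_csv(path, id_column="ID", description_column="Description"):
    # "utf-8-sig" strips a leading UTF-8 BOM if present, so the first header is read cleanly.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return check_control_rows(f, id_column, description_column)
```

Run it on the host against the host-side file path, for example check_control_csv("source.csv", "SCF #", "Control Description"), and fix anything it reports before importing.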

4. Import the Workflow

In your n8n instance, click Workflows > Start from Scratch > Import from File
Select the workflow JSON file (AI Control Mapping With Embeddings.json)
The workflow will be imported with all nodes and connections
Configure the "Workflow Configuration" node with your settings (see Configuration section)

⚙️ Configuration

The workflow is configured through the "Workflow Configuration" node. Update the following parameters:

Framework Information

sourceFrameworkName (string): Name of your source framework (e.g., "PCI DSS v4.0.1")
targetFrameworkName (string): Name of your target framework (e.g., "SCF 2025.3.1")

File Paths

sourceCsvPath (string): Full path to your source framework CSV file
- Default: /data/csv/source.csv
targetCsvPath (string): Full path to your target framework CSV file
- Default: /data/csv/target.csv
outputCsvPath (string): Path where the mapping results CSV will be saved
- Default: /data/csv/mappings.csv

Column Mappings

If your CSV files use column names other than "ID" and "Description", specify them here:

sourceIdColumn (string): Column name containing source control IDs
- Default: "ID"
sourceDescriptionColumn (string): Column name containing source control descriptions
- Default: "Description"
targetIdColumn (string): Column name containing target control IDs
- Default: "ID"
targetDescriptionColumn (string): Column name containing target control descriptions
- Default: "Description"

Matching Settings

similarityThreshold (number): Minimum similarity score to include in results (0.0 to 1.0)
- Default: 0.7
- Lower values = more matches (may include less relevant results)
- Higher values = fewer matches (only very similar controls)
topMatchesCount (number): Maximum number of top matches to return per source control
- Default: 3
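For reference, the parameters above can be summarized as a Python dataclass carrying the documented defaults. This is a hypothetical sketch for checking settings, not how n8n stores them: the field names deliberately keep the node's camelCase spelling so they line up with the list above, and the range checks are assumptions based on the documented 0.0 to 1.0 threshold scale.

```python
from dataclasses import dataclass


@dataclass
class WorkflowConfig:
    """Mirror of the "Workflow Configuration" parameters, with the documented defaults."""
    sourceFrameworkName: str  # no documented default, so it must be supplied
    targetFrameworkName: str  # no documented default, so it must be supplied
    sourceCsvPath: str = "/data/csv/source.csv"
    targetCsvPath: str = "/data/csv/target.csv"
    outputCsvPath: str = "/data/csv/mappings.csv"
    sourceIdColumn: str = "ID"
    sourceDescriptionColumn: str = "Description"
    targetIdColumn: str = "ID"
    targetDescriptionColumn: str = "Description"
    similarityThreshold: float = 0.7
    topMatchesCount: int = 3

    def __post_init__(self):
        # Reject values the matching step could not use sensibly.
        if not 0.0 <= self.similarityThreshold <= 1.0:
            raise ValueError("similarityThreshold must be between 0.0 and 1.0")
        if self.topMatchesCount < 1:
            raise ValueError("topMatchesCount must be at least 1")
```

For example, WorkflowConfig("PCI DSS v4.0.1", "SCF 2025.3.1", sourceIdColumn="Requirement ID") keeps every other default.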

🔄 How It Works

Workflow Process

Configuration: The workflow reads configuration settings from the "Workflow Configuration" node
Load CSV Files:
- Reads source and target framework CSV files
- Parses and extracts control IDs and descriptions based on configured column mappings
Build Vector Store:
- Prepares target framework controls as searchable documents
- Generates vector embeddings using the Google Gemini text-embedding-004 model for each target control
- Stores all target controls in an in-memory vector database
Find Matches:
- Loops through each source framework control and generates an embedding for its description
- Queries the vector store to find the most similar target controls
- Calculates similarity scores for each match
- Filters results by similarity threshold
Format and Rank:
- Formats mapping results with source and target control information
- Groups matches by source control ID
- Ranks matches by similarity score (highest first)
- Assigns rank numbers (1 = best match, 2 = second best, etc.)
Export Results:
- Prepares CSV data with dynamic headers based on framework names
- Converts to CSV format
- Writes the mapping results to the configured output file
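The Find Matches and Format and Rank steps can be approximated in plain Python. This sketch assumes the embeddings have already been computed (in the workflow they come from text-embedding-004) and replaces the in-memory vector store with a brute-force cosine-similarity search; map_controls is a hypothetical name. Following the order described above, it takes the top matches first, then drops any below the threshold (a score equal to the threshold is kept), then assigns ranks.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


def map_controls(source, target, threshold=0.7, top_k=3):
    """source/target: dicts mapping control ID -> embedding vector.

    Returns (source_id, target_id, score, rank) tuples, best match first per source control.
    """
    rows = []
    for source_id, source_vec in source.items():  # one query per source control
        scored = [
            (target_id, cosine_similarity(source_vec, target_vec))
            for target_id, target_vec in target.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
        kept = [(tid, s) for tid, s in scored[:top_k] if s >= threshold]
        for rank, (target_id, score) in enumerate(kept, start=1):
            rows.append((source_id, target_id, round(score, 4), rank))
    return rows
```

A real vector store avoids scoring every target for every query, but for a few thousand controls the brute-force version is a convenient way to reason about what the workflow returns.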

Output Format

The output CSV contains the following columns:

[Source Framework Name] Control ID: The source control identifier
[Source Framework Name] Description: The source control description
[Target Framework Name] Control ID: The matched target control identifier
[Target Framework Name] Description: The matched target control description
Similarity Score: The similarity score (0.0 to 1.0, rounded to 4 decimal places)
Rank: The rank of this match for the source control (1 = best match)
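A sketch of how a file with these columns could be produced using Python's csv module. It assumes each mapping is a dict with the hypothetical keys sourceId, sourceDescription, targetId, targetDescription, score, and rank. The framework names are interpolated into the first four headers, and the csv module automatically quotes any value that contains a comma.

```python
import csv
import io


def mapping_csv(rows, source_name, target_name):
    """Return CSV text with framework-specific headers; `rows` use hypothetical dict keys."""
    headers = [
        f"{source_name} Control ID",
        f"{source_name} Description",
        f"{target_name} Control ID",
        f"{target_name} Description",
        "Similarity Score",
        "Rank",
    ]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(headers)
    for r in rows:
        writer.writerow([
            r["sourceId"], r["sourceDescription"],
            r["targetId"], r["targetDescription"],
            f"{r['score']:.4f}",  # rounded to 4 decimal places
            r["rank"],
        ])
    return buffer.getvalue()
```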

💻 Usage

Configure the workflow: Update the "Workflow Configuration" node with your framework details, file paths, and column mappings
Ensure CSV files are accessible: Place your CSV files in the configured paths or update the paths in the configuration
Verify credentials: Ensure your Google Gemini API credentials are properly configured
Execute the workflow: Click "Execute Workflow" in the n8n editor to run it
Review results: Check the output CSV file at the configured output path

🔧 Troubleshooting

No Matches Found

Check similarity threshold: Lower the similarityThreshold value if you're getting no results
Verify CSV format: Ensure your CSV files have the correct column headers
Check column mappings: Verify that sourceIdColumn, sourceDescriptionColumn, targetIdColumn, and targetDescriptionColumn match your CSV headers (matching ignores case, but the spelling must be identical)

Incorrect Column Detection

Case sensitivity: Column matching is case-insensitive, but ensure spelling matches
Special characters: The workflow handles special characters in column names (e.g., "#", spaces)
BOM handling: The workflow automatically handles UTF-8 BOM characters
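The workflow's own header-detection code is not reproduced here. The following hypothetical Python helper illustrates one way to get the behavior described above: it removes a UTF-8 BOM and surrounding whitespace and compares names case-insensitively, while characters such as "#" are compared literally.

```python
def resolve_column(headers, wanted):
    """Return the actual header matching `wanted`, ignoring case, outer whitespace, and a BOM."""
    def normalize(name):
        # U+FEFF is not treated as whitespace by str.strip(), so remove it explicitly.
        return name.replace("\ufeff", "").strip().casefold()

    target = normalize(wanted)
    for header in headers:
        if normalize(header) == target:
            return header  # return the header as spelled in the file
    raise KeyError(f"no column matching {wanted!r} in {headers}")
```

Returning the original spelling lets the caller index rows by the exact key the CSV parser produced.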

API Errors

Verify API key: Ensure your Google Gemini API key is valid and has the Generative Language API enabled
Check API quotas: Verify you haven't exceeded your API quota limits
Network connectivity: Ensure your n8n instance can reach Google's API endpoints

File Path Issues

File permissions: Ensure n8n has read access to input CSV files and write access to the output directory
Path format: Use absolute paths as seen from inside the n8n container (e.g., /data/csv/source.csv), not paths on the host machine
File existence: Verify that CSV files exist at the configured paths
Docker volumes: Ensure CSV directories are properly mounted in your docker-compose.yml file

Low Quality Matches

Increase threshold: Raise the similarityThreshold to get more precise matches
Adjust top matches: Lower topMatchesCount to keep only the strongest candidates per control, or raise it if you want to review more alternatives
Review descriptions: Ensure control descriptions are detailed and meaningful

🎨 Customization

Modifying Similarity Threshold

The similarity threshold can be adjusted based on your needs:

Stricter matching (0.8-0.9): Only very similar controls
Balanced matching (0.6-0.7): Good balance of relevance and coverage
Broader matching (0.4-0.5): More matches, may include less relevant results
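To choose a threshold from your own data, one approach (an assumption, not a built-in feature) is to run once with similarityThreshold set to 0.0, collect the scores from the output, and count how many matches would survive each candidate threshold. The function name below is hypothetical.

```python
def matches_at_thresholds(scores, thresholds=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Count how many scores would be kept at each threshold (score >= threshold)."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}
```

A sharp drop between two neighboring thresholds suggests where relevant matches give way to weaker ones.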

Adding Custom Filtering

You can add custom filtering logic in the "Format Mapping Results" node to filter matches based on additional criteria (e.g., control categories, specific keywords, etc.).
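As an illustration of such logic, here is a Python sketch that keeps only matches whose target description mentions at least one keyword, then renumbers the ranks so each source control's matches still start at 1. The function name and row keys (sourceId, targetDescription, score, rank) are hypothetical, and adapting the logic to the node is up to you.

```python
from itertools import groupby


def filter_and_rerank(rows, keywords, field="targetDescription"):
    """Keep rows whose `field` contains any keyword (case-insensitive) and re-rank them."""
    lowered = [k.casefold() for k in keywords]
    kept = [r for r in rows if any(k in r[field].casefold() for k in lowered)]
    # Group by source control, best score first, so ranks are 1, 2, 3... with no gaps.
    kept.sort(key=lambda r: (r["sourceId"], -r["score"]))
    result = []
    for _, group in groupby(kept, key=lambda r: r["sourceId"]):
        for rank, r in enumerate(group, start=1):
            result.append({**r, "rank": rank})
    return result
```

Re-ranking matters because filtering can remove a rank 1 match, which would otherwise leave a source control whose best remaining match is labeled rank 2.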

🔬 Technical Details

Vector Embeddings

The workflow uses Google Gemini's text-embedding-004 model to convert control descriptions into high-dimensional vectors. These vectors capture semantic meaning, allowing the system to find similar controls even when exact wording differs.

Similarity Scoring

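Similarity scores are calculated using cosine similarity between embedding vectors. Higher scores mean closer meaning, and 1.0 means the two vectors point in the same direction. Cosine similarity is mathematically bounded by -1.0 and 1.0, but because results are filtered by a threshold between 0.0 and 1.0, every reported score falls in the 0.0 to 1.0 range.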
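For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal Python version, for illustration only (the vector store computes this internally):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Because only direction matters, [1, 2, 3] and [2, 4, 6] score approximately 1.0, perpendicular vectors such as [1, 0] and [0, 1] score 0.0, and opposite vectors such as [1, 0] and [-1, 0] score -1.0.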

Performance Considerations

Vector store is built in-memory for fast querying
Embeddings are generated in batches (batch size: 1500)
Source controls are processed one at a time: each is embedded and queried against the vector store separately, producing its own ranked list of matches
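Batching means splitting the list of target descriptions into chunks and sending one embedding request per chunk. A hypothetical Python helper showing the idea with the workflow's batch size of 1500 (on Python 3.12 and later, itertools.batched does the same job but yields tuples):

```python
def batched(items, batch_size=1500):
    """Yield consecutive slices of `items` containing at most `batch_size` elements each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For 3,200 target controls this produces three requests of 1500, 1500, and 200 descriptions.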

📄 License

This workflow is provided as-is for use in your n8n instance.

JoshDoesIT/AI-Control-Mapper