This repo shows how to use all variants of MongoDB's client-side encryption (CSFLE Deterministic, CSFLE Random, QE) and their effect on the encrypted data.

bartpolot/mongodb_clientsideencryption_demo

Telco MongoDB Client-Side Encryption Demo

A demonstration project for MongoDB Client-Side Field Level Encryption (CSFLE) and Queryable Encryption with telecommunications data. The demo generates a number of very simplified customers and a number of calls among those customers. It stores copies of the calls in three different collections:

  • One encrypted with CSFLE-Deterministic
  • One encrypted with CSFLE-Random
  • One encrypted with Queryable Encryption

Important: This code is for demonstration purposes only and includes simplifications that should never be used in production. In a real deployment, always store the Customer Master Key (CMK) in a secure Key Management System (KMS), use a strong random number generator for key creation, enforce strict access controls (RBAC) on key vault access, and implement robust error handling.

Project Structure

  • generate_data.py - Generates customer and call data with multiple encryption strategies
  • secure_query.py - Demonstrates querying encrypted data with proper decryption
  • unsecure_query.py - Shows what encrypted data looks like without decryption keys
  • benchmark.py - Performance benchmarking for encryption operations
  • schemas.py - Centralized encryption schema definitions
  • test.py - Test utilities
  • group.py - Performs a frequency analysis grouping the (encrypted) calls by caller.

Setup

1. Install Dependencies

pip install -r req.txt

2. Configure Environment

Copy the example environment file and configure it:

cp env.example .env

Edit .env with your values:

  • MongoDB connection string
  • Database and collection names
  • Path to MongoDB crypt shared library
  • Customer Master Key path

3. Generate Customer Master Key

🚨 BIG FAT DISCLAIMER 🚨
This is for demo purposes only.
Never generate production keys on a random CLI.
Never store production keys in a file.
Never trust a random person on the internet with the security of your application.
Oh, wait...

That being said, create a local demo encryption key:

openssl rand 96 | base64 > cmk.txt

Or if you don't have the openssl CLI:

python -c "import os, base64; print(base64.b64encode(os.urandom(96)).decode())" > cmk.txt
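
Either command leaves 96 random bytes, base64-encoded, in cmk.txt. As a minimal sketch of how a script could turn that file into the `kms_providers` structure PyMongo expects for its `local` provider: the function name `load_local_kms_providers` is hypothetical, and the repo's own `get_kms_providers()` may be implemented differently.

```python
import base64
from pathlib import Path


def load_local_kms_providers(cmk_path: str = "./cmk.txt") -> dict:
    """Read a base64-encoded CMK and wrap it for PyMongo's "local" KMS provider.

    Hypothetical helper for illustration; not taken from the repo.
    """
    encoded = Path(cmk_path).read_text()
    # b64decode skips the line breaks that the base64 CLI may insert when wrapping.
    key = base64.b64decode(encoded)
    if len(key) != 96:
        raise ValueError(f"local CMK must be 96 bytes, got {len(key)}")
    return {"local": {"key": key}}
```

The length check is there because the `local` provider requires exactly 96 bytes, so a truncated or re-encoded file fails early instead of at the first encryption call.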

Configuration

All configuration is managed through environment variables in .env:

MongoDB Connection

  • MONGODB_URL - Your MongoDB connection string

Data Generation

  • NUM_CUSTOMERS - Number of customers to generate (default: 4)
  • NUM_CALLS - Number of call records to generate (default: 100)

Database Configuration

  • DB_NAME - Database name (default: telco_encryption)
  • KEY_VAULT_COLLECTION_NAME - Key vault collection (default: __keyVault)

Collection Names

  • CUSTOMERS_COLLECTION - Plaintext customers collection
  • CUSTOMERS_ENC_COLLECTION - Encrypted customers collection
  • CALLS_ENC_DET_COLLECTION - Deterministic encrypted calls
  • CALLS_ENC_RND_COLLECTION - Random encrypted calls
  • CALLS_ENC_QE_COLLECTION - Queryable encrypted calls
  • TEST_COLLECTION - Test collection for benchmarking
  • CALLS_COLLECTION - Call records collection

Encryption Configuration

  • CMK_PATH - Path to Customer Master Key file (default: ./cmk.txt)
  • CRYPT_SHARED_LIB_PATH - Path to MongoDB crypt shared library
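
A rough sketch of reading these settings with the documented defaults follows. It assumes the variables are already present in the process environment (for example, loaded from .env by a tool such as python-dotenv). `DemoConfig` and `load_config` are hypothetical names, and placing the key vault in the same database as `DB_NAME` is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoConfig:
    mongodb_url: str
    db_name: str
    key_vault_collection: str
    num_customers: int
    num_calls: int
    cmk_path: str

    @property
    def key_vault_namespace(self) -> str:
        # PyMongo expects the key vault as "<database>.<collection>".
        # Assumption: the key vault lives in the same database as the demo data.
        return f"{self.db_name}.{self.key_vault_collection}"


def load_config(env=None) -> DemoConfig:
    """Build the config from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return DemoConfig(
        mongodb_url=env["MONGODB_URL"],  # no default is documented, so it is required here
        db_name=env.get("DB_NAME", "telco_encryption"),
        key_vault_collection=env.get("KEY_VAULT_COLLECTION_NAME", "__keyVault"),
        num_customers=int(env.get("NUM_CUSTOMERS", "4")),
        num_calls=int(env.get("NUM_CALLS", "100")),
        cmk_path=env.get("CMK_PATH", "./cmk.txt"),
    )
```

Accepting a plain mapping makes the loader easy to exercise without touching the real environment, e.g. `load_config({"MONGODB_URL": "mongodb://localhost:27017"})`.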

Encryption Strategies

The project demonstrates three MongoDB encryption approaches:

  1. Deterministic Encryption (calls_enc_det)

    • Allows equality queries on encrypted data
    • Same plaintext always produces same ciphertext
    • Used when you need to query encrypted fields
  2. Random Encryption (calls_enc_rnd)

    • More secure but not queryable
    • Same plaintext produces different ciphertext each time
    • Used for sensitive data that doesn't need querying
  3. Queryable Encryption (calls_enc_qe)

    • MongoDB's newer encryption feature
    • Allows equality queries with enhanced security
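
The difference between "same ciphertext every time" and "different ciphertext every time" can be illustrated with a toy model. The sketch below uses keyed hashes from the standard library as stand-ins, so it is not real encryption (nothing here can be decrypted) and it is not how MongoDB implements either algorithm. It only mimics the equality behaviour described above; both function names are made up for this example.

```python
import hashlib
import hmac
import os

KEY = os.urandom(32)  # stand-in for a data encryption key


def deterministic_token(plaintext: str) -> bytes:
    """Same key and same plaintext always give the same output."""
    return hmac.new(KEY, plaintext.encode(), hashlib.sha256).digest()


def randomized_token(plaintext: str) -> bytes:
    """A fresh random nonce is mixed in, so repeated calls give different outputs."""
    nonce = os.urandom(16)
    return nonce + hmac.new(KEY, nonce + plaintext.encode(), hashlib.sha256).digest()


phone = "001-627-491-5972"
print(deterministic_token(phone) == deterministic_token(phone))  # True
print(randomized_token(phone) == randomized_token(phone))        # False (new nonce per call)
```

An equality query only needs to compare stored values against one computed value, which is why the deterministic variant is queryable and the randomized one is not.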

Usage

Generate Data

python generate_data.py

Query Encrypted Data (with decryption)

python secure_query.py [collection_name]

View Raw Encrypted Data (without decryption)

python unsecure_query.py [collection_name]

Run Benchmarks

python benchmark.py

Demo Walkthrough

Once you've generated data, follow this walkthrough to understand the security implications of each encryption strategy:

1. Deterministic Encryption: Data is Encrypted

First, view encrypted data without decryption keys:

python unsecure_query.py

What you'll see: Phone numbers appear as binary blobs (shown with colored hex IDs), demonstrating that data is encrypted at rest:

'phone': Binary(b'\x01\xaeqY1\xf9\x8eF\x07...', 6)
Loaded 16 calls from 01ae...654bf0daaf
  01ae...654bf0daaf called: 01ae...11752fdb29, Duration: 115

Note that we can already identify which calls this encrypted number has made. This will be very relevant soon...

2. Deterministic Encryption: Data is Queryable

Now query the same data WITH decryption keys:

python secure_query.py

What you'll see: Phone numbers are decrypted and queries work:

'phone': '001-627-491-5972x25896'
Loaded 16 calls from 001-627-491-5972x25896
  001-627-491-5972x25896 called: (770)234-4074x05097, Duration: 115

Benefit: Encrypted data remains queryable
⚠️ Security Risk: But is it really secure? Let's find out...

3. Deterministic Encryption: Frequency Analysis Reveals Patterns

Run aggregation queries to group calls by caller:

python group.py

What you'll see: Even without decryption, you can identify the most frequent callers:

Top 25 callers (calls_enc_det):
  Caller: 01ae...c2032c7073, Calls: 32, Duration: 9067
  Caller: 01ae...11752fdb29, Calls: 29, Duration: 9247
  Caller: 01ae...9a643f9bd6, Calls: 23, Duration: 7194

These ciphertexts can then be linked to specific customers:

Customers (det):
  Phone: 01ae...c2032c7073, Name: Michelle Goodman, Email: lunamarc@example.com
  Phone: 01ae...9a643f9bd6, Name: Jennifer Lin, Email: qsutton@example.org
  Phone: 01ae...11752fdb29, Name: Joshua Cook, Email: albertking@example.com

⚠️ Security Vulnerability: Deterministic encryption leaks frequency information! Same plaintext always produces the same ciphertext, so attackers can:

  • Identify most active users
  • Build social graphs
  • Perform statistical attacks
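
The attack needs nothing more than counting equal byte strings. A minimal sketch, using made-up stand-in ciphertexts rather than real CSFLE output: the field names `from` and `phone` follow the examples in this README, while `name` and both function names are assumptions, and group.py itself may work differently (for instance, with a server-side aggregation).

```python
from collections import Counter


def top_callers(calls, limit=25):
    """Rank opaque caller values by call count; no decryption key is needed."""
    return Counter(call["from"] for call in calls).most_common(limit)


def link_to_customers(ranking, customers):
    """Join the ranking against customer records that hold the same ciphertext."""
    names = {customer["phone"]: customer["name"] for customer in customers}
    return [(names.get(value, "unknown"), count) for value, count in ranking]


# Stand-in ciphertexts: deterministic encryption maps each number to one fixed blob.
A, B = b"\x01\xae-caller-a", b"\x01\xae-caller-b"
calls = [{"from": A}, {"from": B}, {"from": A}, {"from": A}]
customers = [{"phone": A, "name": "Customer A"}, {"phone": B, "name": "Customer B"}]

print(link_to_customers(top_callers(calls), customers))
# [('Customer A', 3), ('Customer B', 1)]
```

With randomized ciphertexts every `from` value would be distinct, so the same counting would yield one call per "caller" and the join against customers would find no matches.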

4. Random Encryption: Data is Encrypted

Check the randomly encrypted collection:

python unsecure_query.py calls_enc_rnd

What you'll see: Data is encrypted, and no calls are found because the same plaintext encrypts to a different ciphertext each time:

Loaded 0 calls from 01ae...654bf0daaf

5. Random Encryption: No Frequency Leakage

Run aggregation on random encrypted data:

python group.py calls_enc_rnd

What you'll see: Every encrypted value is unique - no frequency patterns:

Top 25 callers (calls_enc_rnd):
  Caller: 02ae...433d276ff1, Calls: 1, Duration: 50
  Caller: 02ae...2a9963562b, Calls: 1, Duration: 573
  Caller: 02ae...e43f6c20a1, Calls: 1, Duration: 320
  ...each caller appears only once

Security Benefit: Random encryption prevents frequency analysis
⚠️ Limitation: But can we query it?

6. Random Encryption: Not Queryable

Try querying randomly encrypted data:

python secure_query.py calls_enc_rnd

What you'll see: Query fails with error:

pymongo.errors.EncryptionError: Cannot query on fields encrypted 
with the randomized encryption algorithm

Trade-off: Random encryption is secure but not queryable - you must choose between security and functionality.

7. Queryable Encryption: The Best of Both Worlds

MongoDB's Queryable Encryption provides both security AND query functionality.

Data is Encrypted

python unsecure_query.py calls_enc_qe

What you'll see: Data is encrypted (no calls found):

Loaded 0 calls from 01ae...654bf0daaf

No Frequency Leakage

python group.py calls_enc_qe

What you'll see: Each encrypted value is unique, preventing frequency analysis:

Top 25 callers (calls_enc_qe):
  Caller: 0e30...90c9e7bf91, Calls: 1, Duration: 169
  Caller: 0e30...7ea3bc25f1, Calls: 1, Duration: 52
  ...each caller appears only once

But Still Queryable!

python secure_query.py calls_enc_qe

What you'll see: Queries work with proper decryption keys:

Loaded 16 calls from 001-627-491-5972x25896
  001-627-491-5972x25896 called: (690)220-9920x09693, Duration: 514
  001-627-491-5972x25896 called: (770)234-4074x05097, Duration: 55
  ...

Perfect Balance: Queryable Encryption provides:

  • Strong encryption (data encrypted at rest)
  • Query capability (equality searches work)
  • No frequency leakage (same plaintext → different ciphertext each time)

Encryption Strategy Comparison

Feature                     | Deterministic               | Random                              | Queryable Encryption
Encrypted end-to-end        | Yes                         | Yes                                 | Yes
Queryable                   | Yes                         | No                                  | Yes
Prevents frequency analysis | No                          | Yes                                 | Yes
Use case                    | Legacy apps needing queries | Maximum security, no queries needed | Modern apps: secure AND queryable

Implementation Details

⚠️ Never ever use this approach of storing the master key in a file, or this code, in production. ⚠️
This setup is for demonstration and educational purposes only.
This demo doesn't use a cloud KMS and stores the master key in a file, accessible to anyone. NEVER do this in production.
In production, always use a secure KMS (such as AWS KMS, Azure Key Vault, GCP KMS, or any KMIP-compliant one) and properly secure your master key with strict access controls.

Encrypted MongoDB Client

The encrypted MongoDB client is the core component that handles automatic encryption and decryption transparently. Here's how it works:

Client Setup Process

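  1. Regular Client: First, create a standard MongoDB client to retrieve the data key:
import pymongo
from bson.binary import STANDARD
from bson.codec_options import CodecOptions
from pymongo.encryption import ClientEncryption

mongo_client = pymongo.MongoClient(connection_string)
client_encryption = ClientEncryption(
    get_kms_providers(),      # KMS provider configuration
    key_vault_namespace,      # "<database>.<collection>" of the key vault
    mongo_client,
    CodecOptions(uuid_representation=STANDARD),
)
# Look up the data key by its alternate name and keep its _id
data_key_id = client_encryption.get_key_by_alt_name("telco_encryption")["_id"]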
  2. Encrypted Client: Create a client with AutoEncryptionOpts for automatic encryption/decryption:
import pymongo
from pymongo.encryption import AutoEncryptionOpts
from schemas import Schemas

schemas = Schemas(data_key_id)
fle_opts = AutoEncryptionOpts(
    get_kms_providers(),           # KMS provider configuration
    key_vault_namespace,            # Where encryption keys are stored
    schema_map=schemas.get_encryption_schema(...)  # Encryption schema
)

encrypted_mongo_client = pymongo.MongoClient(
    connection_string,
    auto_encryption_opts=fle_opts
)

How It Works

Automatic Encryption: When you write data using the encrypted client, fields specified in the schema are automatically encrypted before being sent to MongoDB:

# Your code
customers_collection.insert_one({"phone": "001-627-491-5972"})

# What MongoDB stores
{"phone": Binary(b'\x01\xaeqY1\xf9\x8eF\x07\x8bk~...')}

Automatic Decryption: When you read data, encrypted fields are automatically decrypted:

# MongoDB has
{"phone": Binary(b'\x01\xaeqY1\xf9\x8eF\x07\x8bk~...')}

# Your code receives
{"phone": "001-627-491-5972"}

Query Encryption: For deterministic encryption, queries are also automatically encrypted:

# Your code
calls_collection.find({"from": "001-627-491-5972"})

# Query sent to MongoDB
{"from": Binary(b'\x01\xaeqY1\xf9\x8eF\x07\x8bk~...')}

Key Components

  • KMS Providers: Manages the Customer Master Key (CMK) used to encrypt/decrypt Data Encryption Keys
  • Key Vault: MongoDB collection storing encrypted Data Encryption Keys
  • Schema Map: Defines which fields to encrypt and with which algorithm
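  • Crypt Shared Library: MongoDB library that reads the encryption schema to determine which fields in a command must be encrypted or decrypted; the cryptographic operations themselves are carried out by the driver's libmongocrypt component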

Usage in Code

Once configured, use the encrypted client like a normal MongoDB client:

db = encrypted_mongo_client[db_name]
collection = db[collection_name]

# All operations work transparently
customer = collection.find_one()
print(customer['phone'])  # Automatically decrypted!

calls = collection.find({'from': phone})  # Query automatically encrypted!

Note: The encrypted client handles CSFLE (deterministic and random) but NOT Queryable Encryption. QE collections use a different approach with create_encrypted_collection().

Schema Definitions

All encryption schemas are encapsulated in the Schemas class in schemas.py:

Class Overview

from schemas import Schemas

# Initialize with data key ID
schemas = Schemas(data_key_id)

# Get schemas for different use cases
encryption_schema = schemas.get_encryption_schema(db_name, ...)
query_schema = schemas.get_query_encryption_schema(db_name, ...)
benchmark_schema = schemas.get_benchmark_schema(db_name, ...)

# Static method for Queryable Encryption
qe_fields = Schemas.get_encrypted_fields_map()

Instance Methods

  • get_call_schema_deterministic() - Call records with deterministic encryption
  • get_call_schema_random() - Call records with random encryption
  • get_customer_schema_deterministic() - Customer records with deterministic encryption
  • get_test_schema_deterministic(n_test_fields) - Test schema with configurable number of encrypted fields
  • get_encryption_schema(db_name, calls_det, calls_rnd, customers=None) - Complete schema map (used by both generate_data and secure_query)
  • get_benchmark_schema(db_name, test_coll, n_fields) - Schema map for benchmarking
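
For orientation, a CSFLE schema map is a plain dictionary keyed by `"<database>.<collection>"`, with a JSON-schema-style document per collection. The sketch below shows roughly what such a map could look like for the two call collections. It is not the repo's `Schemas` implementation: the function names are hypothetical, and the encrypted `to` field is an assumption (only `from` appears in this README's examples).

```python
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"


def call_schema(data_key_id, algorithm: str) -> dict:
    """Schema for one calls collection, encrypting both phone-number fields."""
    encrypted_string = {"encrypt": {"bsonType": "string", "algorithm": algorithm}}
    return {
        "bsonType": "object",
        # One data key shared by every encrypted field in the collection.
        "encryptMetadata": {"keyId": [data_key_id]},
        "properties": {
            "from": encrypted_string,
            "to": encrypted_string,  # assumed field name
        },
    }


def encryption_schema_map(db_name, calls_det, calls_rnd, data_key_id) -> dict:
    """Map each namespace to its schema: one deterministic, one random."""
    return {
        f"{db_name}.{calls_det}": call_schema(data_key_id, DETERMINISTIC),
        f"{db_name}.{calls_rnd}": call_schema(data_key_id, RANDOM),
    }
```

The only difference between the two collections is the `algorithm` string, which is what makes the side-by-side comparison in the walkthrough possible.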

Static Methods

  • get_encrypted_fields_map() - Queryable Encryption field definitions (doesn't require data_key_id)
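
Queryable Encryption describes its fields differently from CSFLE: instead of a JSON schema, each collection gets a list of field entries. A sketch of what such a definition could look like, assuming `from` is the field made queryable; `encrypted_fields_map` is a hypothetical function and the repo's `get_encrypted_fields_map()` may return something different.

```python
def encrypted_fields_map(db_name: str, collection: str) -> dict:
    """Queryable Encryption field definitions, keyed by namespace."""
    return {
        f"{db_name}.{collection}": {
            "fields": [
                {
                    "path": "from",
                    "bsonType": "string",
                    # None asks create_encrypted_collection() to generate a
                    # data key for this field, so no data_key_id is needed here.
                    "keyId": None,
                    "queries": [{"queryType": "equality"}],
                }
            ]
        }
    }
```

Leaving `keyId` as `None` is consistent with the note above that the static method doesn't require a `data_key_id`.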

Security Notes

🚨 Super-duper mega important 🚨
Again, NEVER, ever ever, use this code or CMK storage for production.

⚠️ Important: Never commit these files to version control:

  • .env - Contains your MongoDB credentials
  • cmk.txt - Your encryption master key

These are already excluded in .gitignore.

License

This project is licensed under the WTFPL.
For details, see: http://www.wtfpl.net/about/
