A demonstration project for MongoDB Client-Side Field Level Encryption (CSFLE) and Queryable Encryption with telecommunications data. The demo generates a small number of heavily simplified customers and a number of calls among those customers. It stores copies of the calls in three different collections:
- One encrypted with CSFLE-Deterministic
- One encrypted with CSFLE-Random
- One encrypted with Queryable Encryption
Important: This code is for demonstration purposes only and includes simplifications that should never be used in production. In a real deployment, always store the Customer Master Key (CMK) in a secure Key Management System (KMS), use a strong random number generator for key creation, enforce strict access controls (RBAC) on the key vault, implement robust error handling, and so on.
- Project Structure
- Setup
- Configuration
- Encryption Strategies
- Usage
- Demo Walkthrough
- 1. Deterministic Encryption: Data is Encrypted
- 2. Deterministic Encryption: Data is Queryable
- 3. Deterministic Encryption: Frequency Analysis Reveals Patterns
- 4. Random Encryption: Data is Encrypted
- 5. Random Encryption: No Frequency Leakage
- 6. Random Encryption: Not Queryable
- 7. Queryable Encryption: The Best of Both Worlds
- Encryption Strategy Comparison
- Implementation Details
- Security Notes
- License
- generate_data.py - Generates customer and call data with multiple encryption strategies
- secure_query.py - Demonstrates querying encrypted data with proper decryption
- unsecure_query.py - Shows what encrypted data looks like without decryption keys
- benchmark.py - Performance benchmarking for encryption operations
- schemas.py - Centralized encryption schema definitions
- test.py - Test utilities
- group.py - Performs a frequency analysis grouping the (encrypted) calls by caller
pip install -r req.txt
Copy the example environment file and configure it:
cp env.example .env
Edit .env with your values:
- MongoDB connection string
- Database and collection names
- Path to MongoDB crypt shared library
- Customer Master Key path
🚨 BIG FAT DISCLAIMER 🚨
This is for demo purposes only.
Never generate production keys on a random CLI.
Never store production keys in a file.
Never trust a random person on the internet with the security of your application.
Oh, wait...
That being said, create a local demo encryption key:
openssl rand 96 | base64 > cmk.txt
Or if you don't have the openssl CLI:
python -c "import os, base64; print(base64.b64encode(os.urandom(96)).decode())" > cmk.txt
All configuration is managed through environment variables in .env:
- MONGODB_URL - Your MongoDB connection string
- NUM_CUSTOMERS - Number of customers to generate (default: 4)
- NUM_CALLS - Number of call records to generate (default: 100)
- DB_NAME - Database name (default: telco_encryption)
- KEY_VAULT_COLLECTION_NAME - Key vault collection (default: __keyVault)
- CUSTOMERS_COLLECTION - Plaintext customers collection
- CUSTOMERS_ENC_COLLECTION - Encrypted customers collection
- CALLS_ENC_DET_COLLECTION - Deterministically encrypted calls
- CALLS_ENC_RND_COLLECTION - Randomly encrypted calls
- CALLS_ENC_QE_COLLECTION - Queryable Encryption calls
- TEST_COLLECTION - Test collection for benchmarking
- CALLS_COLLECTION - Call records collection
- CMK_PATH - Path to the Customer Master Key file (default: ./cmk.txt)
- CRYPT_SHARED_LIB_PATH - Path to the MongoDB crypt_shared library
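The scripts' own loading code isn't shown here. As a minimal sketch, assuming the variable names and defaults listed above and that cmk.txt holds the base64-encoded 96-byte key produced by the commands above, the configuration and a `local` KMS provider (the dictionary shape PyMongo expects for that provider) could be assembled like this. `load_config` is a hypothetical helper; `get_kms_providers` reuses the name that appears later in this README, but its body here is an assumption.

```python
import base64
import os


def load_config() -> dict:
    """Read demo settings from the environment, falling back to the documented defaults.

    Hypothetical helper: the real scripts may load .env differently (e.g. via python-dotenv).
    """
    return {
        "mongodb_url": os.environ.get("MONGODB_URL"),
        "num_customers": int(os.environ.get("NUM_CUSTOMERS", "4")),
        "num_calls": int(os.environ.get("NUM_CALLS", "100")),
        "db_name": os.environ.get("DB_NAME", "telco_encryption"),
        "key_vault_collection": os.environ.get("KEY_VAULT_COLLECTION_NAME", "__keyVault"),
        "cmk_path": os.environ.get("CMK_PATH", "./cmk.txt"),
        "crypt_shared_lib_path": os.environ.get("CRYPT_SHARED_LIB_PATH"),
    }


def get_kms_providers(cmk_path: str = "./cmk.txt") -> dict:
    """Build a 'local' KMS provider from the demo CMK file. Demo only: never keep a CMK in a file."""
    with open(cmk_path, "r", encoding="ascii") as f:
        # The default (non-validating) b64decode drops newlines that base64 CLIs may insert.
        key = base64.b64decode(f.read())
    if len(key) != 96:
        raise ValueError(f"local CMK must be 96 bytes, got {len(key)}")
    return {"local": {"key": key}}
```

Checking the length up front turns a malformed cmk.txt into a clear error instead of a failure deep inside the encryption library.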
The project demonstrates three MongoDB encryption approaches:
- Deterministic Encryption (calls_enc_det)
  - Allows equality queries on encrypted data
  - The same plaintext always produces the same ciphertext
  - Used when you need to query encrypted fields
- Random Encryption (calls_enc_rnd)
  - More secure, but not queryable
  - The same plaintext produces a different ciphertext each time
  - Used for sensitive data that doesn't need to be queried
- Queryable Encryption (calls_enc_qe)
  - MongoDB's newer encryption feature
  - Allows equality queries while still producing a different ciphertext for each stored value
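For the two CSFLE collections, the difference comes down to the `algorithm` in each field's `encrypt` rule of the JSON schema. A minimal sketch of such a schema map, assuming the call documents carry `from` and `to` phone fields (field names are hypothetical; the real definitions live in schemas.py). The two algorithm strings are the ones MongoDB defines for CSFLE.

```python
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"
RANDOM = "AEAD_AES_256_CBC_HMAC_SHA_512-Random"


def call_schema(data_key_id, algorithm: str) -> dict:
    """Return a $jsonSchema that encrypts the (hypothetical) 'from' and 'to' fields."""
    rule = {
        "encrypt": {
            "keyId": [data_key_id],  # the data encryption key (DEK) id, wrapped in a list
            "bsonType": "string",    # BSON type of the plaintext value
            "algorithm": algorithm,
        }
    }
    return {"bsonType": "object", "properties": {"from": rule, "to": rule}}


def schema_map(db_name: str, det_coll: str, rnd_coll: str, data_key_id) -> dict:
    """Map each 'db.collection' namespace to the schema that auto-encryption should apply."""
    return {
        f"{db_name}.{det_coll}": call_schema(data_key_id, DETERMINISTIC),
        f"{db_name}.{rnd_coll}": call_schema(data_key_id, RANDOM),
    }
```

Passed as `schema_map=` to `AutoEncryptionOpts`, a map like this tells the driver which fields to encrypt, and with which algorithm, before a command leaves the client.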
python generate_data.py
python secure_query.py [collection_name]
python unsecure_query.py [collection_name]
python benchmark.py
Once you've generated data, follow this walkthrough to understand the security implications of each encryption strategy:
First, view encrypted data without decryption keys:
python unsecure_query.py
What you'll see: Phone numbers appear as binary blobs (shown with colored hex IDs), demonstrating that data is encrypted at rest:
'phone': Binary(b'\x01\xaeqY1\xf9\x8eF\x07...', 6)
Loaded 16 calls from 01ae...654bf0daaf
01ae...654bf0daaf called: 01ae...11752fdb29, Duration: 115
We can also see which calls this encrypted number has made. This will become very relevant soon...
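The hex IDs are just a compact rendering of the ciphertext bytes. In the CSFLE ciphertext format, the first byte is a subtype (1 for deterministic, 2 for random) and the next 16 bytes are the UUID of the data key that encrypted the value. That is consistent with the deterministic and random IDs in this walkthrough starting with `01ae` and `02ae`: same key, different algorithm. A minimal sketch of such a formatter, assuming the 4 leading and 10 trailing hex digits seen in the output above (`short_id` and `describe` are hypothetical helpers, not taken from the scripts):

```python
def short_id(ciphertext: bytes) -> str:
    """Render ciphertext like the demo output: first 2 bytes, '...', last 5 bytes, as hex."""
    h = ciphertext.hex()
    return f"{h[:4]}...{h[-10:]}"


def describe(ciphertext: bytes) -> dict:
    """Split a CSFLE ciphertext header into its subtype byte and data key UUID."""
    subtypes = {1: "CSFLE deterministic", 2: "CSFLE random"}
    return {
        "subtype": subtypes.get(ciphertext[0], f"other ({ciphertext[0]})"),
        "key_id": ciphertext[1:17].hex(),  # 16-byte UUID of the data encryption key
    }
```

Because `bson.Binary` is a subclass of `bytes`, both helpers also accept the `Binary` values returned by a client without decryption keys.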
Now query the same data WITH decryption keys:
python secure_query.py
What you'll see: Phone numbers are decrypted and queries work:
'phone': '001-627-491-5972x25896'
Loaded 16 calls from 001-627-491-5972x25896
001-627-491-5972x25896 called: (770)234-4074x05097, Duration: 115
✅ Benefit: Encrypted data remains queryable
Run aggregation queries to group calls by caller:
python group.py
What you'll see: Even without decryption, you can identify the most frequent callers:
Top 25 callers (calls_enc_det):
Caller: 01ae...c2032c7073, Calls: 32, Duration: 9067
Caller: 01ae...11752fdb29, Calls: 29, Duration: 9247
Caller: 01ae...9a643f9bd6, Calls: 23, Duration: 7194
Which can be linked to specific customers:
Customers (det):
Phone: 01ae...c2032c7073, Name: Michelle Goodman, Email: lunamarc@example.com
Phone: 01ae...9a643f9bd6, Name: Jennifer Lin, Email: qsutton@example.org
Phone: 01ae...11752fdb29, Name: Joshua Cook, Email: albertking@example.com
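This frequency leakage lets anyone with read access to the database, even without decryption keys:
- Identify the most active users
- Build social graphs
- Perform statistical attacks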
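The grouping needs nothing more than a standard aggregation run through a client without any keys. A minimal sketch of a pipeline that could produce the top-25 list above, assuming calls store the caller in `from` and the call length in `duration` (field names are hypothetical; group.py may differ):

```python
def top_callers_pipeline(limit: int = 25) -> list:
    """Group calls by the (still encrypted) caller value and rank callers by call count."""
    return [
        # Deterministic ciphertexts of the same number are byte-identical, so they group together.
        {"$group": {"_id": "$from", "calls": {"$sum": 1}, "duration": {"$sum": "$duration"}}},
        {"$sort": {"calls": -1}},
        {"$limit": limit},
    ]
```

With a plain `pymongo.MongoClient`, `client[db_name]["calls_enc_det"].aggregate(top_callers_pipeline())` would return one document per distinct ciphertext, which is exactly the leak shown above.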
Check the randomly encrypted collection:
python unsecure_query.py calls_enc_rnd
What you'll see: The data is encrypted, and no calls are found, because the same plaintext produces a different ciphertext each time:
Loaded 0 calls from 01ae...654bf0daaf
Run aggregation on random encrypted data:
python group.py calls_enc_rnd
What you'll see: Every encrypted value is unique, so there are no frequency patterns:
Top 25 callers (calls_enc_rnd):
Caller: 02ae...433d276ff1, Calls: 1, Duration: 50
Caller: 02ae...2a9963562b, Calls: 1, Duration: 573
Caller: 02ae...e43f6c20a1, Calls: 1, Duration: 320
...each caller appears only once
✅ Security Benefit: Random encryption prevents frequency analysis
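The contrast can be reproduced without MongoDB. The sketch below swaps in HMAC-SHA256 as a stand-in for deterministic encryption, and HMAC over a fresh random nonce as a stand-in for randomized encryption. Neither is MongoDB's actual algorithm (both CSFLE modes use AEAD_AES_256_CBC_HMAC_SHA_512) and neither is decryptable; they only model the equality behavior. The caller names and counts are made up to mirror the numbers above.

```python
import hashlib
import hmac
import os
from collections import Counter

KEY = os.urandom(32)  # demo key for the stand-in functions


def det_token(plaintext: str) -> bytes:
    """Deterministic stand-in: the same plaintext always yields the same token."""
    return hmac.new(KEY, plaintext.encode(), hashlib.sha256).digest()


def rnd_token(plaintext: str) -> bytes:
    """Randomized stand-in: a fresh nonce makes every token different."""
    nonce = os.urandom(16)
    return nonce + hmac.new(KEY, nonce + plaintext.encode(), hashlib.sha256).digest()


def caller_frequencies(callers: list, token_fn) -> list:
    """Count how often each 'encrypted' caller value occurs, most frequent first."""
    return Counter(token_fn(c) for c in callers).most_common()


callers = ["alice"] * 32 + ["bob"] * 29 + ["carol"] * 23
det_counts = [n for _, n in caller_frequencies(callers, det_token)]
rnd_counts = {n for _, n in caller_frequencies(callers, rnd_token)}
print(det_counts)  # [32, 29, 23]: call volume per customer leaks
print(rnd_counts)  # {1}: every token is unique
```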
Try querying randomly encrypted data:
python secure_query.py calls_enc_rnd
What you'll see: Query fails with error:
pymongo.errors.EncryptionError: Cannot query on fields encrypted
with the randomized encryption algorithm
❌ Trade-off: Random encryption is secure but not queryable. With CSFLE, you must choose between security and functionality.
MongoDB's Queryable Encryption provides both security AND query functionality.
python unsecure_query.py calls_enc_qe
What you'll see: Data is encrypted (no calls found):
Loaded 0 calls from 01ae...654bf0daaf
python group.py calls_enc_qe
What you'll see: Each encrypted value is unique, preventing frequency analysis:
Top 25 callers (calls_enc_qe):
Caller: 0e30...90c9e7bf91, Calls: 1, Duration: 169
Caller: 0e30...7ea3bc25f1, Calls: 1, Duration: 52
...each caller appears only once
python secure_query.py calls_enc_qe
What you'll see: Queries work with proper decryption keys:
Loaded 16 calls from 001-627-491-5972x25896
001-627-491-5972x25896 called: (690)220-9920x09693, Duration: 514
001-627-491-5972x25896 called: (770)234-4074x05097, Duration: 55
...
✅ Perfect Balance: Queryable Encryption provides:
- Strong encryption (data encrypted at rest)
- Query capability (equality searches work)
- No frequency leakage (same plaintext → different ciphertext each time)
| Feature | Deterministic | Random | Queryable Encryption |
|---|---|---|---|
| Encrypted end-to-end | ✅ | ✅ | ✅ |
| Queryable | ✅ | ❌ | ✅ |
| Prevents frequency analysis | ❌ | ✅ | ✅ |
| Use case | Legacy apps needing queries | Maximum security, no queries needed | Modern apps: secure AND queryable |
⚠️ Never, ever use this approach of storing the master key in a file, or this code, in production. ⚠️
This setup is for demonstration and educational purposes only.
This demo doesn't use a cloud KMS and stores the master key in a file, accessible to anyone. NEVER do this in production.
In production, always use a secure KMS (such as AWS KMS, Azure Key Vault, GCP KMS, or any KMIP-compliant KMS) and secure your master key with strict access controls.
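Switching to a real KMS mostly changes the KMS configuration handed to `ClientEncryption` and `AutoEncryptionOpts`. A minimal sketch for AWS KMS, assuming credentials come from the standard AWS environment variables and the key ARN from a hypothetical `AWS_CMK_ARN` variable. The dictionary shapes follow PyMongo's documented `aws` provider format; verify them against your driver version.

```python
import os


def aws_kms_config() -> tuple:
    """Return (kms_providers, master_key) for AWS KMS instead of a local key file."""
    kms_providers = {
        "aws": {
            "accessKeyId": os.environ["AWS_ACCESS_KEY_ID"],
            "secretAccessKey": os.environ["AWS_SECRET_ACCESS_KEY"],
        }
    }
    master_key = {
        "region": os.environ["AWS_REGION"],
        "key": os.environ["AWS_CMK_ARN"],  # hypothetical variable holding the CMK's ARN
    }
    return kms_providers, master_key
```

The master key never leaves the KMS: a data key would then be created with `client_encryption.create_data_key("aws", master_key=master_key, key_alt_names=["telco_encryption"])`, and only its encrypted form is stored in the key vault.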
The encrypted MongoDB client is the core component that handles automatic encryption and decryption transparently. Here's how it works:
- Regular Client: First, create a standard MongoDB client to retrieve the data key:
import pymongo
from bson.binary import STANDARD
from bson.codec_options import CodecOptions
from pymongo.encryption import ClientEncryption
mongo_client = pymongo.MongoClient(connection_string)
client_encryption = ClientEncryption(
    get_kms_providers(),
    key_vault_namespace,
    mongo_client,
    CodecOptions(uuid_representation=STANDARD)
)
# Look up the data encryption key stored under this alternate name
data_key_id = client_encryption.get_key_by_alt_name("telco_encryption")["_id"]
- Encrypted Client: Create a client with AutoEncryptionOpts for automatic encryption/decryption:
from pymongo.encryption import AutoEncryptionOpts
from schemas import Schemas
schemas = Schemas(data_key_id)
fle_opts = AutoEncryptionOpts(
get_kms_providers(), # KMS provider configuration
key_vault_namespace, # Where encryption keys are stored
schema_map=schemas.get_encryption_schema(...) # Encryption schema
)
encrypted_mongo_client = pymongo.MongoClient(
connection_string,
auto_encryption_opts=fle_opts
)
Automatic Encryption: When you write data using the encrypted client, fields specified in the schema are automatically encrypted before being sent to MongoDB:
# Your code
customers_collection.insert_one({"phone": "001-627-491-5972"})
# What MongoDB stores
{"phone": Binary(b'\x01\xaeqY1\xf9\x8eF\x07\x8bk~...')}
Automatic Decryption: When you read data, encrypted fields are automatically decrypted:
# MongoDB has
{"phone": Binary(b'\x01\xaeqY1\xf9\x8eF\x07\x8bk~...')}
# Your code receives
{"phone": "001-627-491-5972"}
Query Encryption: For deterministic encryption, queries are also automatically encrypted:
# Your code
calls_collection.find({"from": "001-627-491-5972"})
# Query sent to MongoDB
{"from": Binary(b'\x01\xaeqY1\xf9\x8eF\x07\x8bk~...')}
Key components:
- KMS Providers: Manage the Customer Master Key (CMK) used to encrypt/decrypt Data Encryption Keys (DEKs)
- Key Vault: MongoDB collection storing the encrypted Data Encryption Keys
- Schema Map: Defines which fields to encrypt and with which algorithm
- Crypt Shared Library: MongoDB's automatic encryption shared library, which analyzes each command and marks the fields that must be encrypted (the cryptographic operations themselves are performed by libmongocrypt)
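The `get_key_by_alt_name("telco_encryption")` lookup shown earlier assumes the data key already exists in the key vault. A minimal get-or-create sketch, assuming the `local` provider and that alternate name. `ensure_data_key` is a hypothetical helper; `create_index`, `get_key_by_alt_name` and `create_data_key` are existing PyMongo methods. The unique partial index on `keyAltNames` is what MongoDB's documentation recommends for key vault collections, so that two keys can't share an alternate name.

```python
def ensure_data_key(key_vault_coll, client_encryption, alt_name: str = "telco_encryption"):
    """Return the id of the data key named alt_name, creating the key (and index) if needed."""
    # Alternate key names must be unique; the partial filter ignores keys without any alt name.
    key_vault_coll.create_index(
        "keyAltNames",
        unique=True,
        partialFilterExpression={"keyAltNames": {"$exists": True}},
    )
    existing = client_encryption.get_key_by_alt_name(alt_name)
    if existing is not None:
        return existing["_id"]
    # The 'local' provider needs no master_key document.
    return client_encryption.create_data_key("local", key_alt_names=[alt_name])
```

Usage would look like `ensure_data_key(mongo_client[db_name]["__keyVault"], client_encryption)`, run once by the data generator before any encrypted writes.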
Once configured, use the encrypted client like a normal MongoDB client:
db = encrypted_mongo_client[db_name]
collection = db[collection_name]
# All operations work transparently
customer = collection.find_one()
print(customer['phone']) # Automatically decrypted!
calls = collection.find({'from': phone})  # Query automatically encrypted!
Note: The schema map configured for this encrypted client covers CSFLE (deterministic and random) but NOT Queryable Encryption. QE collections are set up differently, with create_encrypted_collection().
All encryption schemas are encapsulated in the Schemas class in schemas.py:
from schemas import Schemas
# Initialize with data key ID
schemas = Schemas(data_key_id)
# Get schemas for different use cases
encryption_schema = schemas.get_encryption_schema(db_name, ...)
query_schema = schemas.get_query_encryption_schema(db_name, ...)
benchmark_schema = schemas.get_benchmark_schema(db_name, ...)
# Static method for Queryable Encryption
qe_fields = Schemas.get_encrypted_fields_map()
Available methods:
- get_call_schema_deterministic() - Call records with deterministic encryption
- get_call_schema_random() - Call records with random encryption
- get_customer_schema_deterministic() - Customer records with deterministic encryption
- get_test_schema_deterministic(n_test_fields) - Test schema with a configurable number of encrypted fields
- get_encryption_schema(db_name, calls_det, calls_rnd, customers=None) - Complete schema map (used by both generate_data and secure_query)
- get_benchmark_schema(db_name, test_coll, n_fields) - Schema map for benchmarking
- get_encrypted_fields_map() - Queryable Encryption field definitions (doesn't require data_key_id)
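For the QE collection, the encrypted fields map lists which paths are encrypted and which query types they support. A minimal sketch, assuming the QE calls are queried on a `from` field (hypothetical; the real map comes from `Schemas.get_encrypted_fields_map()`). Setting `keyId` to `None` lets `ClientEncryption.create_encrypted_collection()` (PyMongo 4.4+) create a data key per field, which is consistent with this helper not needing a data_key_id. `create_qe_calls_collection` is a hypothetical wrapper.

```python
def encrypted_fields_map() -> dict:
    """QE field definitions: 'from' is encrypted and supports equality queries."""
    return {
        "fields": [
            {
                "path": "from",
                "bsonType": "string",
                "keyId": None,  # let create_encrypted_collection() create the data key
                "queries": {"queryType": "equality"},
            }
        ]
    }


def create_qe_calls_collection(client_encryption, database, name: str = "calls_enc_qe"):
    """Create the QE collection; returns (collection, encrypted_fields with the new key ids)."""
    return client_encryption.create_encrypted_collection(
        database, name, encrypted_fields_map(), "local"
    )
```

The second element of the returned tuple carries the generated `keyId` values, which is what a client would need if it pins the map locally instead of relying on the collection's server-side metadata.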
🚨 Super-duper mega important 🚨
Again, NEVER, ever ever, use this code or CMK storage for production.
Sensitive files:
- .env - Contains your MongoDB credentials
- cmk.txt - Your encryption master key
These are already excluded in .gitignore.
This project is licensed under the WTFPL.
For details, see: http://www.wtfpl.net/about/