# Metadata

**L1 Taxonomy** - Security

**L2 Taxonomy** - Authorization

**Subtopic** - Multi-tenant data partitioning: ensure queries always include tenant id from user context (to separate customer data)

**Use Case** - Develop a Python module that simulates multi-tenant data partitioning. This module should generate mock data for multiple tenants and ensure that each query to retrieve data includes the tenant id, effectively separating the data for each customer. Utilize Python's built-in data structures and the 'random' module from the Python standard library for data generation and simulation.

**Programming Language** - Python

**Target Model** - o1

# Setup

```requirements.txt
```


# Prompt
## Problem Description

You are tasked with implementing a secure multi-tenant data layer for a Python SaaS application. This module must enforce strict data partitioning between tenants by ensuring that all queries include proper tenant identification from a user context object. The system should generate realistic mock data for multiple tenants and prevent any possibility of cross-tenant data access, simulating the critical security boundaries required in real multi-tenant architectures. The deliverable is a complete, importable Python module that other application components can use to safely query tenant-specific data.

## Input Format

All interactions occur through Python function calls with no external I/O. The module exposes two primary functions:

- `bootstrap(tenants: list[str], rng_seed: int | None = None) -> None` - Initializes the data store with mock data for the specified tenants
- `query(user_ctx: dict, *, value_min: int | None = None, value_max: int | None = None) -> list[dict]` - Retrieves tenant-specific records with optional numeric range filtering

The user context is a dictionary containing at minimum a `tenant_id` key. Range filtering on the numeric `value` field is requested through the keyword-only arguments `value_min` and `value_max`.

## Output Format

Successful queries return lists of dictionaries, each representing a data record:

```python
{"tenant_id": "acme", "obj_id": "a1b2c3d4e5f6...", "value": 342}
```

All validation failures are communicated by raising appropriate built-in Python exceptions (KeyError, ValueError, etc.). The module produces no console output or logging.

## Examples

### Basic Multi-Tenant Usage
```python
from main import bootstrap, query

# Initialize with multiple tenants
bootstrap(["acme", "globex", "initech"], rng_seed=42)

# Each user context represents a different tenant
alice_ctx = {"tenant_id": "acme"}
bob_ctx = {"tenant_id": "globex"}

# Queries are automatically partitioned by tenant
acme_records = query(alice_ctx)
globex_records = query(bob_ctx, value_min=100, value_max=500)

# Verify strict partitioning
assert all(r["tenant_id"] == "acme" for r in acme_records)
assert all(r["tenant_id"] == "globex" for r in globex_records)
```

### Security Boundary Testing
```python
# Attempting to access non-existent tenant fails
try:
    query({"tenant_id": "evil_corp"})
except KeyError:
    print("Correctly blocked unauthorized tenant access")

# Missing tenant_id in context fails
try:
    query({"user_id": "alice"})  # Wrong key
except KeyError:
    print("Correctly required tenant_id in context")
```

### Range Query Edge Cases
```python
# Valid range filtering
filtered = query(alice_ctx, value_min=200, value_max=800)
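assert all(200 <= r["value"] <= 800 for r in filtered)

# Invalid range: value_min > value_max
try:
    query(alice_ctx, value_min=800, value_max=200)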
except ValueError:
    print("Correctly rejected invalid range")
```


# Requirements
## Explicit Requirements

- Data Generation: Generate at least 50 records per tenant during bootstrap, each containing exactly three fields: `tenant_id` (string), `obj_id` (32-character hexadecimal string), and `value` (integer from 0 to 999 inclusive)
- Tenant Isolation: Query function must derive tenant access solely from user_ctx["tenant_id"], never from global state or other sources
- Reproducible Randomness: Same rng_seed value must produce identical datasets across different bootstrap calls
- State Management: Support multiple bootstrap calls in the same process, with each call completely resetting the data store
- Error Handling: Raise KeyError for missing or unknown tenant_id, ValueError for invalid query parameters or duplicate tenants in bootstrap
- API Constraints: Expose only the two specified functions; all helper classes/functions must use private naming (underscore prefix)

## Implicit Requirements

- Memory Safety: Return copies of data to prevent external mutation of internal state
- Parameter Validation: Validate range parameters before applying filters (value_min ≤ value_max)
- Context Isolation: Never cache or persist user context information between query calls
- Deterministic Behavior: Given identical inputs, produce identical outputs across program runs
- Clean State Transitions: Ensure bootstrap completely replaces previous data, not appending to it

## Edge Cases

- Bootstrap Reset: Calling bootstrap multiple times must completely replace the previous dataset
- Duplicate Tenants: Passing duplicate tenant IDs to bootstrap must raise ValueError
- Missing Context Keys: Query with user_ctx lacking "tenant_id" key must raise KeyError
- Invalid Ranges: Query where value_min > value_max must raise ValueError before data access
- Empty Results: Range filters that match no records should return empty list, not raise exceptions
- Uninitialized Access: Calling query before any bootstrap call must raise a clear error (the reference solution and its tests use RuntimeError)

## Constraints

- Standard Library Only: No external packages beyond Python's built-in modules
- Python Version: Must work on CPython 3.8+
- No I/O Operations: No file system access, network calls, or console output within the module
- Single File Solution: Deliver as a single main.py file that can be imported
- Memory Only: All data must exist purely in memory with no persistence mechanisms

## Function Signatures

```python
def bootstrap(tenants: list[str], rng_seed: int | None = None) -> None:
    """Initialize the multi-tenant data store."""
    ...

def query(user_ctx: dict, *, value_min: int | None = None, value_max: int | None = None) -> list[dict]:
    """Query tenant-specific data with optional range filtering."""
    ...
```
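
Note that `list[str]` (PEP 585) and `int | None` (PEP 604) are evaluated when the `def` statement runs, so on CPython 3.8 these signatures raise `TypeError` unless the module begins with `from __future__ import annotations`. One way to keep the same interface on 3.8 is to use the `typing` aliases. The sketch below shows signatures only, and narrowing `dict` to `Dict[str, Any]` is an assumption.

```python
from typing import Any, Dict, List, Optional


def bootstrap(tenants: List[str], rng_seed: Optional[int] = None) -> None:
    """Initialize the multi-tenant data store (body omitted in this sketch)."""
    ...


def query(
    user_ctx: Dict[str, Any],
    *,
    value_min: Optional[int] = None,
    value_max: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Query tenant-specific data (body omitted in this sketch)."""
    ...
```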

## Important Notes

The solution must be a single importable Python file with no external dependencies. Testing will include attempts to forge user contexts and verify that tenant boundaries are completely impermeable. The random module usage will be monitored to ensure proper seed handling, so avoid caching random values outside of bootstrap calls. The module should raise appropriate exceptions for all error conditions rather than returning error indicators or printing messages.


In [None]:
# code

from typing import List, Dict, Any, Optional
import random

__all__ = ["bootstrap", "query"]

_store: Optional[Dict[str, List[Dict[str, Any]]]] = None


def bootstrap(tenants: List[str], rng_seed: Optional[int] = None) -> None:
    """
    Initialize (or reinitialize) the multi-tenant data store.

    Generates exactly 50 mock records per tenant, each record containing:
      - tenant_id: str
      - obj_id:   str (32-character hexadecimal)
      - value:    int (0–999)

    Args:
        tenants:  List of unique tenant identifiers.
        rng_seed: Optional seed for reproducible randomness.

    Raises:
        ValueError: If `tenants` contains duplicates.
    """
    global _store

    # Detect duplicate tenant IDs
    if len(tenants) != len(set(tenants)):
        raise ValueError("Duplicate tenant IDs are not allowed")

    # Build a fresh store with a private RNG
    rng = random.Random(rng_seed)
    new_store: Dict[str, List[Dict[str, Any]]] = {}

    for tenant_id in tenants:
        new_store[tenant_id] = _generate_records(tenant_id, rng)

    _store = new_store


def query(
    user_ctx: Dict[str, Any],
    *,
    value_min: Optional[int] = None,
    value_max: Optional[int] = None
) -> List[Dict[str, Any]]:
    """
    Retrieve records for the tenant specified in user_ctx["tenant_id"],
    optionally filtered by a numeric range on the `value` field.

    Args:
        user_ctx:   Dict containing at least the key "tenant_id".
        value_min:  Inclusive lower bound on record["value"], if provided.
        value_max:  Inclusive upper bound on record["value"], if provided.

    Returns:
        List of record dicts matching the tenant and value filters.

    Raises:
        RuntimeError: If bootstrap() has not yet been called.
        KeyError:     If "tenant_id" is missing or not recognized.
        ValueError:   If value_min > value_max.
    """

    # Must initialize first
    if _store is None:
        raise RuntimeError("bootstrap() must be called before query()")

    # Validate and extract tenant_id
    if "tenant_id" not in user_ctx:
        raise KeyError("tenant_id")
    tenant_id = user_ctx["tenant_id"]
    if tenant_id not in _store:
        raise KeyError(tenant_id)

    # Validate range parameters
    if value_min is not None and value_max is not None and value_min > value_max:
        raise ValueError("value_min cannot be greater than value_max")

    # Filter and return shallow copies
    results: List[Dict[str, Any]] = []
    for record in _store[tenant_id]:
        val = record["value"]
        if value_min is not None and val < value_min:
            continue
        if value_max is not None and val > value_max:
            continue
        results.append(record.copy())

    return results


def _generate_records(
    tenant_id: str,
    rng: random.Random
) -> List[Dict[str, Any]]:
    """
    Helper to generate 50 mock records for a single tenant.

    Args:
        tenant_id: The tenant identifier to embed in each record.
        rng:       A private Random instance for reproducibility.

    Returns:
        A list of 50 record dicts.
    """
    records: List[Dict[str, Any]] = []
    for _ in range(50):
        obj_id = "".join(rng.choice("0123456789abcdef") for _ in range(32))
        value = rng.randint(0, 999)
        records.append({
            "tenant_id": tenant_id,
            "obj_id":     obj_id,
            "value":      value
        })
    return records


In [None]:
# tests

"""
Unit tests for secure multi-tenant data layer.

Verifies bootstrap and query behavior against the P/R specification.
"""

import unittest
import main
from main import bootstrap, query


class TestMultiTenant(unittest.TestCase):
    """Tests for secure multi-tenant data layer."""

    def setUp(self):
        """Clear internal store before each test."""
        main._store = None

    def test_uninitialized_query(self):
        """Query before bootstrap must raise RuntimeError."""
        with self.assertRaises(RuntimeError):
            query({'tenant_id': 'any'})

    def test_duplicate_tenants(self):
        """Bootstrap with duplicate IDs must raise ValueError."""
        with self.assertRaises(ValueError):
            bootstrap(['a', 'a'])

    def test_basic_partitioning(self):
        """Each tenant sees exactly 50 of its own records."""
        bootstrap(['alpha', 'beta'], rng_seed=123)
        alpha_ctx = {'tenant_id': 'alpha'}
        beta_ctx = {'tenant_id': 'beta'}

        a_records = query(alpha_ctx)
        b_records = query(beta_ctx)

        self.assertEqual(len(a_records), 50)
        self.assertEqual(len(b_records), 50)
        for rec in a_records:
            self.assertEqual(rec['tenant_id'], 'alpha')
        for rec in b_records:
            self.assertEqual(rec['tenant_id'], 'beta')

    def test_invalid_range(self):
        """value_min > value_max must raise ValueError."""
        bootstrap(['tenant'], rng_seed=0)
        with self.assertRaises(ValueError):
            query({'tenant_id': 'tenant'},
                  value_min=500, value_max=100)

    def test_missing_tenant_key(self):
        """Missing tenant_id in context must raise KeyError."""
        bootstrap(['x'], rng_seed=1)
        with self.assertRaises(KeyError):
            query({'user': 'x'})

    def test_unknown_tenant(self):
        """Unknown tenant_id must raise KeyError."""
        bootstrap(['x'], rng_seed=1)
        with self.assertRaises(KeyError):
            query({'tenant_id': 'y'})

    def test_empty_results(self):
        """Non-overlapping range yields empty list."""
        bootstrap(['e'], rng_seed=5)
        results = query({'tenant_id': 'e'},
                        value_min=1000, value_max=2000)
        self.assertEqual(results, [])

    def test_min_only_filter(self):
        """Only lower bound filters records correctly."""
        bootstrap(['t'], rng_seed=7)
        results = query({'tenant_id': 't'}, value_min=500)
        for rec in results:
            self.assertGreaterEqual(rec['value'], 500)

    def test_max_only_filter(self):
        """Only upper bound filters records correctly."""
        bootstrap(['t'], rng_seed=7)
        results = query({'tenant_id': 't'}, value_max=500)
        for rec in results:
            self.assertLessEqual(rec['value'], 500)

    def test_reproducible_randomness(self):
        """Same seed must yield identical datasets."""
        bootstrap(['r'], rng_seed=9)
        first = query({'tenant_id': 'r'})
        bootstrap(['r'], rng_seed=9)
        second = query({'tenant_id': 'r'})
        self.assertEqual(first, second)

    def test_bootstrap_reset(self):
        """Bootstrap must replace old data completely."""
        bootstrap(['a'], rng_seed=42)
        _ = query({'tenant_id': 'a'})
        bootstrap(['b'], rng_seed=42)
        with self.assertRaises(KeyError):
            query({'tenant_id': 'a'})


if __name__ == '__main__':
    unittest.main()




# Model Breaking Proof

#### Model Breaking Task URL: https://rlhf-v3.turing.com/prompt/9aaf928e-beff-400f-8c23-418520e93be2

#### Model code:

```python
import random

# Private global data store (None indicates uninitialized)
_data_store = None

def bootstrap(tenants: list[str], rng_seed: int | None = None) -> None: # <Issue> Uses `list[str]` (PEP 585) and `int | None` (PEP 604) — invalid prior to Python 3.9/3.10, breaks CPython 3.8 compatibility.
    """
    Initialize the multi-tenant data store, generating 50 records per tenant.
    
    Raises:
        ValueError: If any duplicate tenant IDs are found.
    """
    global _data_store
    
    # Detect duplicate tenants
    if len(set(tenants)) < len(tenants):
        raise ValueError("Duplicate tenant IDs are not allowed in bootstrap.")
    
    # Re-initialize data store
    _data_store = {}

    # Use a dedicated random generator to avoid affecting global state
    _rng = random.Random(rng_seed)
    
    for tenant_id in tenants:
        _data_store[tenant_id] = _generate_tenant_data(tenant_id, _rng)

def query(user_ctx: dict, *, value_min: int | None = None, value_max: int | None = None) -> list[dict]:
    """
    Query tenant-specific data with optional range filtering on the 'value' field.
    
    Raises:
        KeyError: If tenant_id is missing in user_ctx or tenant_id is not recognized,
                  or if the data store is uninitialized (None).
        ValueError: If value_min > value_max or any other invalid parameters.
    """
    global _data_store
    
    # Ensure we have been bootstrapped
    if _data_store is None:
        raise KeyError("No data store found. Call bootstrap before querying.") # <Issue> Should raise RuntimeError on uninitialized access, not KeyError.
    
    # Validate tenant_id in user context
    if "tenant_id" not in user_ctx:
        raise KeyError("Missing 'tenant_id' in user context.")
    
    tenant_id = user_ctx["tenant_id"]
    if tenant_id not in _data_store:
        raise KeyError(f"Unknown tenant_id: {tenant_id}")
    
    # Validate range parameters
    if value_min is not None and value_max is not None:
        if value_min > value_max:
            raise ValueError("value_min must be <= value_max.")
    
    records = _data_store[tenant_id]
    
    # Apply numeric filters
    filtered = []
    for record in records:
        val = record["value"]
        if value_min is not None and val < value_min:
            continue
        if value_max is not None and val > value_max:
            continue
        filtered.append(dict(record))  # Return a copy
    
    return filtered

def _generate_tenant_data(tenant_id: str, rng: random.Random) -> list[dict]:
    """
    Generate 50 mock records for a given tenant.
    """
    records = []
    for _ in range(50):
        obj_id = "".join(rng.choice("0123456789abcdef") for _ in range(32))
        value = rng.randint(0, 999)
        records.append({
            "tenant_id": tenant_id,
            "obj_id": obj_id,
            "value": value
        })
    return records
```