## 1. High-Level Architecture

### 1.1 System Overview

Croupier is a multi-tenant organization management service built using a layered architecture pattern. The system separates concerns across multiple layers to ensure maintainability, testability, and scalability.

### 1.2 Architecture Layers

- **Client Layer:** Web/Mobile applications consuming REST APIs
- **API Layer (FastAPI):** HTTP request handling, validation, response formatting
- **Service Layer:** Business logic orchestration and transaction management
- **Repository Layer:** Data access abstraction and CRUD operations
- **Data Layer (MongoDB):** Master database + dynamic per-tenant collections

### 1.3 Data Flow

**Request Flow:** Client → Router → Service → Repository → Database

**Response Flow:** Database → Repository → Service → Router → Client
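
The request and response flow above can be sketched with plain Python objects. This is a minimal illustration only: the in-memory repository, the `create_organization_route` function, and the `DuplicateOrganizationError` class are hypothetical stand-ins for the FastAPI router and MongoDB-backed repository, which are not shown here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Organization:
    organization_name: str
    email: str


class DuplicateOrganizationError(Exception):
    """Raised by the service when the name is already taken."""


class InMemoryOrganizationRepository:
    """Repository layer: the only code that touches storage (a dict here)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Organization] = {}

    def find_by_name(self, name: str) -> Optional[Organization]:
        return self._docs.get(name)

    def create(self, org: Organization) -> Organization:
        self._docs[org.organization_name] = org
        return org


class OrganizationService:
    """Service layer: business rules, no HTTP or storage details."""

    def __init__(self, repo: InMemoryOrganizationRepository) -> None:
        self._repo = repo

    def create_organization(self, name: str, email: str) -> Organization:
        if self._repo.find_by_name(name) is not None:
            raise DuplicateOrganizationError(name)
        return self._repo.create(Organization(name, email))


def create_organization_route(service: OrganizationService, payload: dict) -> dict:
    """Router layer: request payload -> service call -> response body."""
    try:
        org = service.create_organization(
            payload["organization_name"], payload["email"]
        )
    except DuplicateOrganizationError:
        return {"status": 400, "detail": "Organization exists"}
    return {"status": 201, "organization_name": org.organization_name}


service = OrganizationService(InMemoryOrganizationRepository())
first = create_organization_route(
    service, {"organization_name": "acme", "email": "a@acme.test"}
)
second = create_organization_route(
    service, {"organization_name": "acme", "email": "b@acme.test"}
)
```

Because each layer only talks to the one below it, the service can be unit-tested with an in-memory repository like this one, without a running database.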

## 2. Directory Structure

The project follows a modular, production-ready structure with clear separation of concerns:

```
app/
  models/schemas.py       - Pydantic models
  repositories/           - Data access layer
    organization_repository.py
    admin_repository.py
  routers/               - API endpoints
    organization.py
    admin.py
  security/              - JWT & password hashing
    jwt_handler.py
    password_handler.py
    dependencies.py
  services/              - Business logic
    organization_service.py
    auth_service.py
  config.py              - Configuration
  db.py                  - Database manager
main.py                  - Application entry
```

### 2.1 Layer Responsibilities

- **Routers:** HTTP request handling, input validation, response formatting
- **Services:** Business logic, orchestration, transaction management
- **Repositories:** Database operations, data access abstraction
- **Models:** Data validation, serialization/deserialization
- **Security:** Authentication, authorization, encryption

## 3. Design Choices & Rationale

### 3.1 Technology Stack

**FastAPI Framework**
- High performance for a Python framework; the FastAPI documentation describes it as on par with Node.js and Go
- Automatic API documentation (Swagger UI & ReDoc)
- Built-in validation with Pydantic
- Native async support

**MongoDB Database**
- Flexible schema supports dynamic collection creation
- Easy horizontal scaling with sharding
- Fast collection operations (create, drop)

**JWT Authentication**
- Stateless - no server-side session storage
- Scalable across multiple servers
- Self-contained tokens with embedded user context

### 3.2 Multi-Tenancy Approach: Collection per Tenant

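Each organization gets its own MongoDB collection (`org_<organization_name>`), which isolates tenant data at the collection level. Global metadata (organizations and admin users) lives in fixed collections in the master database, and the dynamic per-tenant collections, which hold tenant-specific data, are created in that same database.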
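
Because the collection name is derived from user-supplied input, it helps to normalize and validate the organization name before building `org_<organization_name>`. The sketch below is one possible policy, not the project's actual rule: the helper name `org_collection_name`, the allowed character set, and the 64-character cap are all assumptions.

```python
import re

# Assumed policy: lowercase letters, digits and underscores, 1 to 64 characters.
_ORG_NAME_RE = re.compile(r"[a-z0-9_]{1,64}")


def org_collection_name(organization_name: str) -> str:
    """Return the per-tenant collection name, or raise ValueError."""
    normalized = (
        organization_name.strip().lower().replace(" ", "_").replace("-", "_")
    )
    if not _ORG_NAME_RE.fullmatch(normalized):
        raise ValueError(f"invalid organization name: {organization_name!r}")
    return f"org_{normalized}"


name = org_collection_name("Acme Corp")
```

Normalization means that "Acme Corp" and "acme_corp" map to the same collection, so under this policy the uniqueness check would need to run against the normalized name.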

## 4. Key Code Implementations

### 4.1 Dynamic Collection Creation (db.py)

```python
class DatabaseManager:
    def get_org_collection(self, organization_name: str):
        """Get or create organization-specific collection"""
        # MongoDB creates the collection lazily, on the first write or index build
        collection_name = f"org_{organization_name}"
        return self.master_db[collection_name]

    def drop_org_collection(self, organization_name: str):
        """Delete organization collection and all its data"""
        collection_name = f"org_{organization_name}"
        self.master_db.drop_collection(collection_name)
```

### 4.2 Repository Pattern (organization_repository.py)

```python
from datetime import datetime, timezone
from typing import Any, Dict


class OrganizationRepository:
    def create(self, organization_data: Dict[str, Any]):
        """Create new organization with metadata"""
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
        organization_data["created_at"] = datetime.now(timezone.utc)
        # insert_one adds the generated _id to organization_data in place
        self.collection.insert_one(organization_data)
        return self._serialize_document(organization_data)

    def find_by_name(self, organization_name: str):
        """Find organization by name"""
        doc = self.collection.find_one(
            {"organization_name": organization_name})
        return self._serialize_document(doc) if doc else None
```

### 4.3 Service Layer (organization_service.py)

```python
from fastapi import HTTPException


class OrganizationService:
    def create_organization(self, org_data):
        """Complete organization creation workflow"""
        # 1. Validate uniqueness
        if self.org_repo.exists(org_data.organization_name):
            raise HTTPException(status_code=400, detail="Organization exists")

        # 2. Create organization metadata
        created_org = self.org_repo.create({...})

        # 3. Create admin user with hashed password
        self.admin_repo.create({...})

        # 4. Create dynamic collection with indexes
        org_collection = self.db_manager.get_org_collection(...)
        org_collection.create_index("created_at")

        return created_org
```
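
The workflow above writes to several collections in sequence, so a failure in a later step can leave earlier writes behind. One common way to handle this without database transactions is to pair each step with a compensating undo action. The sketch below illustrates that idea with in-memory sets; `run_with_compensation` and the step definitions are hypothetical and not part of the Croupier codebase.

```python
from typing import Callable, Iterable, List, Tuple

Step = Tuple[Callable[[], object], Callable[[], object]]


def run_with_compensation(steps: Iterable[Step]) -> None:
    """Run (action, undo) pairs in order; on failure, undo completed steps in reverse."""
    undo_stack: List[Callable[[], object]] = []
    try:
        for action, undo in steps:
            action()
            undo_stack.append(undo)
    except Exception:
        for undo in reversed(undo_stack):
            undo()
        raise


# In-memory stand-ins for organization metadata, admin users and tenant collections.
organizations, admins, tenant_collections = set(), set(), set()


def fail_admin_creation() -> None:
    raise RuntimeError("admin insert failed")


steps = [
    (lambda: organizations.add("acme"), lambda: organizations.discard("acme")),
    (fail_admin_creation, lambda: admins.discard("admin@acme.test")),
    (lambda: tenant_collections.add("org_acme"),
     lambda: tenant_collections.discard("org_acme")),
]

try:
    run_with_compensation(steps)
except RuntimeError:
    pass  # step 1 has been undone; step 3 never ran
```

Compensation is best-effort: an undo action can itself fail, so a real implementation would likely log such cases for manual cleanup.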

### 4.4 JWT Token Creation (security/jwt_handler.py)

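```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# `jwt` and SECRET_KEY come from the module's own imports and config (not shown)


class JWTHandler:
    @staticmethod
    def create_access_token(data: Dict[str, Any]):
        """Create JWT token with expiration"""
        to_encode = data.copy()
        # Read the clock once so that iat and exp are exactly 60 minutes apart
        now = datetime.now(timezone.utc)
        to_encode.update({
            "exp": now + timedelta(minutes=60),
            "iat": now,
        })
        return jwt.encode(to_encode, SECRET_KEY, algorithm="HS256")
```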

### 4.5 Password Hashing (security/password_handler.py)

```python
import bcrypt


class PasswordHandler:
    @staticmethod
    def hash_password(password: str) -> str:
        """Hash password using bcrypt with 12 rounds"""
        # gensalt embeds the cost factor and a random salt in the resulting hash
        salt = bcrypt.gensalt(rounds=12)
        hashed = bcrypt.hashpw(password.encode(), salt)
        return hashed.decode('utf-8')

    @staticmethod
    def verify_password(plain: str, hashed: str) -> bool:
        """Verify password against hash"""
        # checkpw re-hashes `plain` with the salt stored inside `hashed`
        return bcrypt.checkpw(plain.encode(), hashed.encode())
```

## 5. Architectural Analysis

### Question 1: Is this architecture scalable?

**Answer:** Yes, to a significant extent, but with important caveats and limitations.

**Scalability Strengths**
- **Horizontal Scaling:** Stateless JWT allows adding multiple API servers without session synchronization
- **Data Isolation:** Each organization has its own collection enabling targeted optimization and sharding
- **Independent Growth:** Organizations can grow independently without affecting others
- **FastAPI Performance:** Async capabilities handle high concurrency efficiently

**Scalability Challenges**
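- **Collection Explosion:** MongoDB has no hard cap on collection count, but each collection and index is backed by its own WiredTiger file, so practical limits appear as the count approaches 100,000+ and metadata management becomes challenging
- **Namespace Limits:** The full namespace (`database.collection`) has a length limit (255 bytes for unsharded collections in MongoDB 4.4 and later), which caps how long an organization name can be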
- **Schema Migrations:** Changes must be applied to N collections, increasing complexity
- **Resource Overhead:** Each collection has index metadata and file descriptors

**Mitigation Strategies**
- Collection Pooling: Group small/inactive organizations into shared collections
- Archival: Move inactive organizations to cold storage
- Monitoring: Track collection count and implement alerts at thresholds
- Hybrid Approach: Use per-collection for large tenants, shared for small ones

### Question 2: Trade-offs of the Chosen Stack and Architecture

**FastAPI Trade-offs**
- *Pros:* Excellent performance, automatic documentation, type safety with Pydantic, modern async/await support, strong ecosystem
- *Cons:* Relatively newer framework, smaller talent pool compared to Django, async paradigm complexity for simple CRUD

**MongoDB Trade-offs**
- *Pros:* Schema flexibility, easy horizontal scaling, fast write operations, rich query language, document-oriented data
- *Cons:* Multi-document ACID transactions require a replica set or sharded cluster (unavailable on standalone deployments), higher memory footprint, collection limits at scale, referential integrity must be enforced in application code

**Collection-per-Tenant Trade-offs**
- *Pros:* Strong data isolation, no tenant_id filtering needed, fast deletion (drop collection), per-tenant optimization possible, straightforward billing/usage tracking
- *Cons:* Collection explosion at scale, schema changes require N migrations, cross-tenant analytics complexity, higher operational complexity, resource overhead per collection

**JWT Trade-offs**
- *Pros:* Stateless architecture, horizontal scaling without sticky sessions, self-contained tokens, cross-domain authentication
- *Cons:* Cannot revoke tokens before expiration (without additional infrastructure), token size increases with payload, requires careful secret key management

### Question 3: Alternative Scalable Designs

There are three main approaches to multi-tenancy in database architecture, each with distinct trade-offs depending on scale, isolation requirements, and operational capacity.

**Option 1: Single Shared Collection with Tenant ID**

All tenant data is stored in one collection with a `tenant_id` field. Every query must include a `tenant_id` filter.

*Pros:* Simple management, easy schema migrations, no collection count limits, straightforward cross-tenant analytics, lower operational overhead

*Cons:* Weak isolation (data leakage risk), noisy neighbor problem, all queries require tenant_id filtering (performance overhead), index bloat, difficult to move specific tenants

*When to Use:* Small to medium tenants (<1000), similar-sized tenants with predictable workloads, lower security/isolation requirements, frequent cross-tenant analytics
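
The data leakage risk in the shared-collection option comes mostly from queries that forget the `tenant_id` filter. A common mitigation is a wrapper that injects the filter on every operation. The sketch below shows the idea against a plain list of dictionaries; the `TenantScopedCollection` class is hypothetical and only mimics the shape of a MongoDB collection.

```python
from typing import Any, Dict, List, Optional


class TenantScopedCollection:
    """Wraps shared storage and forces a tenant_id on every read and write."""

    def __init__(self, documents: List[Dict[str, Any]], tenant_id: str) -> None:
        self._documents = documents
        self._tenant_id = tenant_id

    def _scoped(self, query: Dict[str, Any]) -> Dict[str, Any]:
        # Reject attempts to address another tenant explicitly
        if query.get("tenant_id", self._tenant_id) != self._tenant_id:
            raise PermissionError("query targets a different tenant")
        return {**query, "tenant_id": self._tenant_id}

    def insert_one(self, document: Dict[str, Any]) -> Dict[str, Any]:
        stored = self._scoped(document)
        self._documents.append(stored)
        return stored

    def find(self, query: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
        scoped = self._scoped(query or {})
        return [
            doc for doc in self._documents
            if all(doc.get(key) == value for key, value in scoped.items())
        ]


shared: List[Dict[str, Any]] = []  # stands in for the single shared collection
acme = TenantScopedCollection(shared, "acme")
globex = TenantScopedCollection(shared, "globex")
acme.insert_one({"name": "invoice-1"})
globex.insert_one({"name": "invoice-2"})
acme_names = [doc["name"] for doc in acme.find()]
```

Handing request handlers only the scoped wrapper, never the raw collection, turns a forgotten filter from a silent leak into something that cannot be expressed.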

**Option 2: Database per Tenant**

Each tenant gets their own database with complete physical separation.

*Pros:* Maximum isolation, independent backups/restore, per-tenant performance tuning, easy premium tiers with dedicated infrastructure, database-level security

*Cons:* Very high operational complexity, connection pool management challenges, MongoDB database limits, schema migrations across N databases, cross-tenant queries extremely difficult, higher infrastructure costs

*When to Use:* Enterprise customers requiring strict compliance (HIPAA, SOC2), small number of large high-value tenants, geographic data residency requirements, vastly different resource needs

**Option 3: Collection per Tenant (Current Implementation)**

A middle ground between a shared collection and a database per tenant.

*Pros:* Good isolation without database overhead, faster than shared collection (no tenant_id filtering), fast tenant deletion, can shard individual collections

*Cons:* Collection count limits at scale, schema migrations across N collections, higher complexity than shared collection

*When to Use:* Medium number of tenants (100-10,000), moderate isolation requirements, balance between security and operational simplicity, need for per-tenant optimization

**Hybrid Recommendation: Tiered Multi-Tenancy**

For production at scale, a hybrid approach combining multiple strategies based on tenant characteristics is recommended:

- **Tier 1 (Enterprise):** Database per tenant, dedicated infrastructure, premium pricing
- **Tier 2 (Mid-Market):** Collection per tenant, shared infrastructure, standard pricing
- **Tier 3 (Small/Free):** Shared collection with tenant_id, shared infrastructure, free/low pricing

This provides:
- Flexibility to scale different customer segments appropriately
- Cost optimization for small tenants
- Revenue optimization for enterprise
- Operational efficiency at scale
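
The tiered approach can be expressed as a small routing function that decides where a tenant's data lives. This is a sketch under assumed names: the `Tier` enum, the `StorageTarget` type, and the database and collection names (`tenant_<name>`, `data`, `master`, `shared_orgs`) are illustrative and do not come from the current implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class Tier(Enum):
    ENTERPRISE = "enterprise"
    MID_MARKET = "mid_market"
    SMALL = "small"


@dataclass(frozen=True)
class StorageTarget:
    database: str
    collection: str
    base_filter: Dict[str, Any]  # merged into every query for this tenant


def resolve_storage(tenant: str, tier: Tier) -> StorageTarget:
    """Map a tenant and its tier to a database, collection and mandatory filter."""
    if tier is Tier.ENTERPRISE:
        # Tier 1: database per tenant
        return StorageTarget(f"tenant_{tenant}", "data", {})
    if tier is Tier.MID_MARKET:
        # Tier 2: collection per tenant in the shared database
        return StorageTarget("master", f"org_{tenant}", {})
    # Tier 3: shared collection, isolated only by the tenant_id filter
    return StorageTarget("master", "shared_orgs", {"tenant_id": tenant})


small = resolve_storage("acme", Tier.SMALL)
mid = resolve_storage("acme", Tier.MID_MARKET)
enterprise = resolve_storage("acme", Tier.ENTERPRISE)
```

Keeping this decision in one function means that promoting a tenant to a higher tier changes a lookup result plus a data migration, rather than touching every repository method.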

## 6. Performance & Security Considerations

### 6.1 Current Index Strategy

- **organizations collection:** organization_name (unique index)
- **admin_users collection:** email (unique index), organization_id (index)
- **Per-organization collections:** created_at (index) + additional indexes as needed

### 6.2 Performance Optimization Recommendations

- Use compound indexes for common query patterns
- Implement caching layer (Redis) for frequently accessed organization metadata
- Use MongoDB aggregation pipeline for complex queries
- Monitor slow queries and add indexes accordingly
- Implement connection pooling with appropriate pool sizes
- Use projection to return only required fields
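
The caching recommendation can be illustrated with a small read-through cache with a time-to-live. The sketch below keeps entries in process memory as a stand-in for Redis; the `TTLCache` class, the `load_org` loader, and the 60-second TTL are assumptions chosen for the example.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple


class TTLCache:
    """Read-through cache: return a fresh cached value or call the loader."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Any], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after an update or delete so readers do not see stale metadata."""
        self._entries.pop(key, None)


# Demo with a controllable clock and a loader that records database hits.
fake_now = [0.0]
db_hits: List[str] = []


def load_org(name: str) -> Dict[str, str]:
    db_hits.append(name)
    return {"organization_name": name}


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get_or_load("acme", load_org)  # miss: loads from the "database"
cache.get_or_load("acme", load_org)  # hit: served from cache
fake_now[0] = 61.0
cache.get_or_load("acme", load_org)  # expired: loads again
```

With several API servers, an in-process cache like this one is per-server, which is the main reason a shared store such as Redis is usually preferred.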

### 6.3 Implemented Security Measures

- bcrypt password hashing with configurable rounds (default: 12)
- JWT token-based authentication with expiration
- Token verification on protected endpoints
- Password complexity validation (uppercase, lowercase, digit, minimum 8 characters)
- Organization name validation to prevent injection attacks
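
The password complexity rule listed above (uppercase, lowercase, digit, minimum 8 characters) could be checked along these lines. The function name `password_problems` and the message wording are assumptions; the actual validator in the Pydantic models is not shown in this document.

```python
import re
from typing import List


def password_problems(password: str) -> List[str]:
    """Return a list of unmet rules; an empty list means the password passes."""
    problems = []
    if len(password) < 8:
        problems.append("must be at least 8 characters")
    if not re.search(r"[A-Z]", password):
        problems.append("must contain an uppercase letter")
    if not re.search(r"[a-z]", password):
        problems.append("must contain a lowercase letter")
    if not re.search(r"[0-9]", password):
        problems.append("must contain a digit")
    return problems


ok = password_problems("Str0ngPass")
weak = password_problems("short")
```

Returning every unmet rule at once lets the API report all problems in a single response. Since bcrypt only considers the first 72 bytes of its input, an upper bound on length may also be worth adding.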

### 6.4 Additional Security Recommendations

- Implement rate limiting to prevent brute force attacks
- Add refresh token mechanism for long-lived sessions
- Implement token blacklist for logout functionality
- Use HTTPS only in production
- Implement audit logging for sensitive operations
- Store JWT secret in secure vault (AWS Secrets Manager, HashiCorp Vault)
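
As an illustration of the rate-limiting recommendation, the sketch below implements a sliding-window limiter keyed by an arbitrary string such as an email or client IP. It keeps state in process memory, so it only works for a single server; the `SlidingWindowLimiter` class and the limits used are assumptions for the example.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_attempts per key within the last window_seconds."""

    def __init__(self, max_attempts: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        attempts = self._attempts[key]
        # Drop timestamps that have fallen out of the window
        while attempts and attempts[0] <= now - self._window:
            attempts.popleft()
        if len(attempts) >= self._max:
            return False
        attempts.append(now)
        return True


# Demo with a controllable clock: 3 login attempts per 60 seconds.
clock_value = [0.0]
limiter = SlidingWindowLimiter(3, 60, clock=lambda: clock_value[0])
results = [limiter.allow("admin@acme.test") for _ in range(4)]
clock_value[0] = 60.0
after_window = limiter.allow("admin@acme.test")
```

Rejected attempts are not recorded here, so a blocked client regains access once its earlier attempts age out. With multiple API servers the counters would need to live in a shared store instead.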

## 7. Conclusion & Summary

The Croupier architecture represents a well-balanced approach to multi-tenant organization management. The collection-per-tenant strategy provides good isolation and performance for the target scale of hundreds to thousands of organizations.

**Key Strengths**
- Clean layered architecture with clear separation of concerns
- Type-safe implementation with Pydantic models throughout
- Modern async-capable technology stack
- Good balance between isolation and operational complexity
- Scalable within reasonable tenant count limits

**Areas for Future Enhancement**
- Implement hybrid multi-tenancy strategy for extreme scale
- Add comprehensive monitoring and observability
- Implement caching layer for improved performance
- Add asynchronous collection migration for large datasets
- Implement advanced security features (2FA, SSO)

This architecture is production-ready for initial launch and can scale to thousands of organizations. With the recommended enhancements and hybrid approach, it can support much larger scale while maintaining security, performance, and operational efficiency.