# Module 09 - Access Control & Security: RBAC, SAS, Azure Key Vault

## Overview

Security is critical in data engineering. This module covers access control mechanisms, authentication, and secure storage of credentials in Azure.

## Learning Objectives

By the end of this module, you will understand:
- Types of access control in Azure
- Role-Based Access Control (RBAC)
- Shared Access Signatures (SAS)
- Azure Key Vault for secret management
- Security best practices
- Azure Storage vs Data Lake security


## Types of Access Control

Azure provides multiple mechanisms for controlling access to resources and data.

### 1. Authentication vs Authorization

**Authentication**: Verifying who you are (identity)
- Username/password
- Azure AD authentication (Azure AD has since been renamed Microsoft Entra ID)
- Managed identities
- Service principals

**Authorization**: What you're allowed to do (permissions)
- RBAC roles
- Access policies
- ACLs (Access Control Lists)

### 2. Access Control Methods

#### Azure RBAC (Role-Based Access Control)
- **Purpose**: Control access to Azure resources
- **Scope**: Management group, subscription, resource group, or individual resource
- **Granularity**: Resource-level permissions

#### Shared Access Signatures (SAS)
- **Purpose**: Delegated access to storage resources
- **Scope**: Storage account, container, or individual blob (also file shares, queues, and tables)
- **Granularity**: Fine-grained, time-limited access

#### Access Control Lists (ACLs)
- **Purpose**: File and directory-level permissions (ADLS Gen2)
- **Scope**: Files and directories
- **Granularity**: POSIX-style permissions

#### Azure Key Vault
- **Purpose**: Secure storage of secrets, keys, certificates
- **Scope**: Secrets, keys, certificates
- **Granularity**: Vault-wide access policies, or Azure RBAC roles that can be scoped down to an individual secret, key, or certificate


## Role-Based Access Control (RBAC)

**RBAC** is Azure's authorization system for managing access to Azure resources.

### Key Concepts

#### Roles
- **Definition**: Collection of permissions
- **Built-in Roles**: Predefined roles (Reader, Contributor, Owner)
- **Custom Roles**: Create your own roles

#### Principals
- **Users**: Individual users
- **Groups**: Azure AD groups
- **Service Principals**: Applications/services
- **Managed Identities**: Azure-managed identities

#### Scope
- **Management Group**: Multiple subscriptions
- **Subscription**: All resources in subscription
- **Resource Group**: Resources in group
- **Resource**: Individual resource
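
Role assignments are inherited downward: a role granted at a subscription also applies to every resource group and resource inside it. A minimal sketch of that rule, assuming scopes are compared as path prefixes of the resource ID (the IDs below are hypothetical, and management groups are left out because their IDs are not a prefix of subscription IDs):

```python
def _segments(resource_id: str) -> list:
    """Split an Azure resource ID into lowercase path segments."""
    return [part for part in resource_id.lower().split("/") if part]


def scope_covers(assignment_scope: str, resource_id: str) -> bool:
    """Return True if a role assigned at assignment_scope is inherited by resource_id."""
    scope = _segments(assignment_scope)
    target = _segments(resource_id)
    # Compare whole segments so that "rg-dat" does not match "rg-data".
    return target[: len(scope)] == scope


# Hypothetical IDs, for illustration only.
SUBSCRIPTION = "/subscriptions/00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = SUBSCRIPTION + "/resourceGroups/rg-data"
STORAGE = RESOURCE_GROUP + "/providers/Microsoft.Storage/storageAccounts/stdatalake"

for scope in (SUBSCRIPTION, RESOURCE_GROUP, STORAGE, SUBSCRIPTION + "/resourceGroups/rg-other"):
    print(scope_covers(scope, STORAGE), scope)
```

The first three scopes cover the storage account; the unrelated resource group does not. This is why a broad assignment at subscription level is hard to reason about later.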

### Common Built-in Roles

#### Storage Account Roles
- **Storage Blob Data Reader**: Read blob data
- **Storage Blob Data Contributor**: Read/write blob data
- **Storage Blob Data Owner**: Full access to blob data, including setting ownership and POSIX ACLs

#### Data Factory Roles
- **Data Factory Contributor**: Manage Data Factory
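- **Pipeline operator (custom role)**: Run and monitor pipelines without editing them; Azure does not appear to ship a built-in role with this name, so it is typically defined as a custom role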

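#### Synapse Roles

These are Synapse RBAC roles, assigned inside the workspace (for example in Synapse Studio) rather than through Azure RBAC.

- **Synapse Contributor**: Manage Synapse workspace
- **Synapse SQL Administrator**: Administer SQL pools
- **Synapse Apache Spark Administrator**: Administer Spark pools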

### RBAC Example

```json
{
  "roleDefinitionId": "/subscriptions/.../roleDefinitions/...",
  "principalId": "user-or-service-principal-id",
  "scope": "/subscriptions/.../resourceGroups/.../storageAccounts/..."
}
```
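The same three pieces (role, principal, scope) appear when a role assignment is created through the Azure Resource Manager REST API. The sketch below only builds the request, it does not send it; it assumes the `Microsoft.Authorization/roleAssignments` endpoint with API version `2022-04-01`, and every ID is a placeholder to fill in:

```python
import json
import uuid

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2022-04-01"  # assumption: a currently supported API version


def build_role_assignment_request(scope, role_definition_id, principal_id,
                                  principal_type="ServicePrincipal"):
    """Return (url, body) for a PUT that assigns a role to a principal at a scope."""
    assignment_name = str(uuid.uuid4())  # each assignment is named by a new GUID
    url = (f"{ARM_ENDPOINT}{scope}/providers/Microsoft.Authorization/"
           f"roleAssignments/{assignment_name}?api-version={API_VERSION}")
    body = {
        "properties": {
            "roleDefinitionId": role_definition_id,
            "principalId": principal_id,
            "principalType": principal_type,
        }
    }
    return url, body


# Placeholder values; replace them with real IDs from your tenant.
subscription = "/subscriptions/<subscription-id>"
url, body = build_role_assignment_request(
    scope=subscription + "/resourceGroups/<resource-group>",
    role_definition_id=subscription
    + "/providers/Microsoft.Authorization/roleDefinitions/<role-definition-guid>",
    principal_id="<principal-object-id>",
)
print("PUT", url)
print(json.dumps(body, indent=2))
```

Note that the scope is carried in the URL path rather than in the body. In practice the Azure portal, CLI, or an infrastructure-as-code template usually issues this call for you.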

### Best Practices

- ✅ **Least Privilege**: Grant minimum required permissions
- ✅ **Use Groups**: Assign roles to groups, not individuals
- ✅ **Regular Reviews**: Review and audit access regularly
- ✅ **Custom Roles**: Create custom roles only when no built-in role fits
- ✅ **Narrow Scope**: Assign roles at the narrowest scope that works (a single resource rather than a whole subscription)


## Shared Access Signatures (SAS)

A **SAS** is a signed token appended to a storage URL. It grants delegated, time-limited access to Azure Storage resources without sharing the account key.
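
The reason a SAS can be handed out safely is that its `sig` parameter is an HMAC-SHA256 over the granted parameters, computed with a key the client never sees. The sketch below illustrates only that idea: the string-to-sign here is a simplified, made-up layout (the real one has a service-defined field list and order that depends on the SAS type and version), and the key is a dummy value.

```python
import base64
import hashlib
import hmac


def sign(string_to_sign: str, account_key_b64: str) -> str:
    """Return the base64 HMAC-SHA256 of string_to_sign under a base64-encoded key."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(string_to_sign: str, signature: str, account_key_b64: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    return hmac.compare_digest(sign(string_to_sign, account_key_b64), signature)


# Dummy key and a simplified string-to-sign (NOT the real Azure field layout).
demo_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")
granted = "\n".join([
    "r",                                    # permissions
    "2024-01-01T00:00:00Z",                 # start
    "2024-01-01T01:00:00Z",                 # expiry
    "/blob/storageaccount/container/blob",  # resource
])
signature = sign(granted, demo_key)

# A client that edits the permissions cannot produce a matching signature.
tampered = granted.replace("r\n", "rwd\n", 1)
print(verify(granted, signature, demo_key))
print(verify(tampered, signature, demo_key))
```

Because the signature covers permissions, time window, and resource, changing any of them invalidates the token. It also means anyone holding the signing key can mint arbitrary tokens, which is why the key itself must stay private.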

### Types of SAS

#### 1. Account SAS
- **Scope**: Storage account level
- **Access**: Multiple services (Blob, File, Queue, Table)
- **Use Case**: Access that spans several services or needs service-level operations

#### 2. Service SAS
- **Scope**: Single service (Blob, File, etc.)
- **Access**: Specific service resources
- **Use Case**: Service-specific access

#### 3. User Delegation SAS
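- **Scope**: Blob Storage (including Data Lake Storage Gen2)
- **Access**: Signed with a user delegation key obtained through Azure AD credentials instead of the account key
- **Use Case**: Preferred option where supported, because no account key is involved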

### SAS Parameters

- **Permissions**: Read, Write, Delete, List, etc.
- **Start Time**: When access begins
- **Expiry Time**: When access expires
- **IP Address**: Restrict to specific IPs
- **Protocol**: HTTPS only, or both HTTPS and HTTP

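### SAS Example

The URL below carries an account SAS (note the `ss` and `srt` parameters). It is split across lines for readability; a real SAS URL is a single line. The year-long validity window and broad permission set are illustrative only and should not be copied into production.
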
```
https://storageaccount.blob.core.windows.net/container/blob?
  sv=2021-06-08&
  ss=b&
  srt=sco&
  sp=rwdlacx&
  se=2024-12-31T23:59:59Z&
  st=2024-01-01T00:00:00Z&
  spr=https&
  sig=...
```
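To read a token like the one above, it helps to expand the single-letter codes. The sketch below parses the example URL with the standard library; the lookup tables cover only the codes used in this example and are not a complete list of what the service accepts.

```python
from urllib.parse import parse_qs, urlsplit

# Partial lookup tables: only the codes that appear in the example above.
SERVICES = {"b": "Blob", "q": "Queue", "t": "Table", "f": "File"}
RESOURCE_TYPES = {"s": "Service", "c": "Container", "o": "Object"}
PERMISSIONS = {"r": "Read", "w": "Write", "d": "Delete", "l": "List",
               "a": "Add", "c": "Create", "x": "Delete version"}


def describe_sas(url: str) -> dict:
    """Expand the query parameters of an account SAS URL into readable fields."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

    def expand(name, table):
        return [table.get(code, f"unknown({code})") for code in params.get(name, "")]

    return {
        "version": params.get("sv"),
        "services": expand("ss", SERVICES),
        "resource_types": expand("srt", RESOURCE_TYPES),
        "permissions": expand("sp", PERMISSIONS),
        "start": params.get("st"),
        "expiry": params.get("se"),
        "protocol": params.get("spr"),
    }


EXAMPLE_URL = (
    "https://storageaccount.blob.core.windows.net/container/blob"
    "?sv=2021-06-08&ss=b&srt=sco&sp=rwdlacx"
    "&se=2024-12-31T23:59:59Z&st=2024-01-01T00:00:00Z&spr=https&sig=..."
)
for field, value in describe_sas(EXAMPLE_URL).items():
    print(f"{field}: {value}")
```

Read this way, the example grants seven permissions on the Blob service at service, container, and object level for a full year, which is far broader than most pipelines need.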

### Best Practices

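- ✅ **Short Expiry**: Use short expiration times
- ✅ **HTTPS Only**: Require HTTPS (`spr=https`)
- ✅ **Least Privilege**: Grant minimum required permissions
- ✅ **IP Restrictions**: Restrict to specific IPs when possible
- ✅ **Monitor Usage**: Track SAS usage through storage logs
- ✅ **Revoke When Needed**: Store SAS tokens securely. An ad hoc SAS cannot be revoked on its own; to invalidate one, change or delete the stored access policy it references, rotate the account key that signed it, or (for a user delegation SAS) revoke the user delegation keys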
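
Several of these practices can be checked mechanically before a token is handed out. The following is a rough sketch, not an official validator: the one-hour limit and the "read/list only" rule are assumptions chosen for illustration, and only the full `YYYY-MM-DDThh:mm:ssZ` time format is handled.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional
from urllib.parse import parse_qs

READ_ONLY = set("rl")  # assumption: anything beyond read/list deserves a second look


def _parse_time(value: str) -> datetime:
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def audit_sas(query: str, max_lifetime: timedelta = timedelta(hours=1),
              now: Optional[datetime] = None) -> List[str]:
    """Return a list of findings for a SAS query string; an empty list means no findings."""
    params = {k: v[0] for k, v in parse_qs(query.lstrip("?")).items()}
    findings = []
    if params.get("spr") != "https":
        findings.append("allows HTTP; set spr=https")
    if "sip" not in params:
        findings.append("no IP restriction (sip)")
    extra = set(params.get("sp", "")) - READ_ONLY
    if extra:
        findings.append("grants more than read/list: " + "".join(sorted(extra)))
    if "se" in params:
        # Without an explicit start time the token is valid from "now".
        start = _parse_time(params["st"]) if "st" in params else (now or datetime.now(timezone.utc))
        if _parse_time(params["se"]) - start > max_lifetime:
            findings.append(f"valid longer than {max_lifetime}")
    else:
        findings.append("no expiry in token (se); check the stored access policy")
    return findings


broad = ("sv=2021-06-08&ss=b&srt=sco&sp=rwdlacx&se=2024-12-31T23:59:59Z"
         "&st=2024-01-01T00:00:00Z&spr=https&sig=...")
tight = ("sv=2021-06-08&sp=r&st=2024-01-01T00:00:00Z&se=2024-01-01T00:30:00Z"
         "&spr=https&sip=203.0.113.10&sig=...")
print(audit_sas(broad))
print(audit_sas(tight))
```

A check like this fits naturally in the code path that issues tokens, so an overly broad SAS is rejected before it leaves the system.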


## Azure Key Vault

**Azure Key Vault** is a cloud service for securely storing and accessing secrets, keys, and certificates.

### What Can Be Stored?

#### Secrets
- **Passwords**: Database passwords, API keys
- **Connection Strings**: Storage connection strings
- **Tokens**: Access tokens, SAS tokens

#### Keys
- **Encryption Keys**: Data encryption keys
- **Signing Keys**: Digital signature keys

#### Certificates
- **SSL/TLS Certificates**: Web server certificates
- **Code Signing Certificates**: Application signing

### Key Features

- **Secure Storage**: Encrypted at rest
- **Access Control**: RBAC and access policies
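- **Audit Logging**: Log access through diagnostic settings (Azure Monitor)
- **Versioning**: Each update to a secret, key, or certificate creates a new version
- **Rotation**: Keys can rotate automatically through a rotation policy and certificates can auto-renew; secrets generally need rotation automation built around Key Vault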

### Use Cases

- ✅ **Store Credentials**: Database passwords, API keys
- ✅ **Connection Strings**: Storage account connection strings
- ✅ **Application Secrets**: Application configuration secrets
- ✅ **Certificate Management**: SSL/TLS certificates

### Accessing Key Vault

#### From Applications
```python
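# Requires: pip install azure-identity azure-keyvault-secrets
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# DefaultAzureCredential tries several sources in turn (environment variables,
# managed identity, Azure CLI login, and others), so the same code can run
# locally and in Azure.
credential = DefaultAzureCredential()

# Replace <your-vault-name> with the name of your Key Vault.
client = SecretClient(vault_url="https://<your-vault-name>.vault.azure.net/", credential=credential)

# Retrieve the latest version of the secret; avoid printing or logging its value.
secret = client.get_secret("storage-connection-string")
connection_string = secret.value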
```
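Calling Key Vault on every request adds latency and can run into service throttling, so applications often cache secrets for a short time. A minimal sketch of such a cache follows; `CachedSecrets` is a hypothetical helper, not part of the Azure SDK, and the five-minute TTL is an arbitrary assumption. It takes any function that maps a secret name to its value, so it can be exercised here with a stand-in.

```python
import time
from typing import Callable, Dict, Tuple


class CachedSecrets:
    """Cache secret values in memory and refetch them after ttl_seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # still fresh: no call to the vault
        value = self._fetch(name)
        self._cache[name] = (value, now)
        return value


# Stand-in for a real lookup; with the SDK this could be
# lambda name: client.get_secret(name).value
def fake_fetch(name: str) -> str:
    print(f"fetching {name} from the vault")
    return f"value-of-{name}"


secrets = CachedSecrets(fake_fetch, ttl_seconds=300)
secrets.get("storage-connection-string")  # fetches
secrets.get("storage-connection-string")  # served from the cache
```

The TTL is a trade-off: a longer value means fewer calls, but also a longer window in which the application keeps using a secret that has already been rotated.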

#### From Azure Data Factory
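- Create a Key Vault linked service
- Reference secrets by name in other linked services instead of embedding credentials
- Data Factory retrieves the secret at run time using its managed identity, which needs permission to read (Get) secrets in the vault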
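
To make the reference mechanism concrete, the sketch below builds the two linked service definitions as Python dictionaries and prints the second as JSON. It follows the `AzureKeyVault` and `AzureKeyVaultSecret` shapes as I understand them from the Data Factory documentation; the names `KeyVaultLS`, `SqlDbLS`, and `sql-password` are hypothetical, and the angle-bracket values are placeholders.

```python
import json

# Linked service that points Data Factory at the vault.
key_vault_ls = {
    "name": "KeyVaultLS",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {"baseUrl": "https://<your-vault-name>.vault.azure.net"},
    },
}


def secret_ref(secret_name: str, store: str = "KeyVaultLS") -> dict:
    """Build a reference to a Key Vault secret instead of an inline credential."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {"referenceName": store, "type": "LinkedServiceReference"},
        "secretName": secret_name,
    }


# Linked service whose password is resolved from Key Vault at run time.
sql_ls = {
    "name": "SqlDbLS",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.windows.net,1433;"
                                "Database=<database>;User ID=<user>",
            "password": secret_ref("sql-password"),
        },
    },
}

print(json.dumps(sql_ls, indent=2))
```

The definition that ends up in source control contains only the secret's name, never its value, so rotating the password in Key Vault requires no change to the pipeline.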

### Best Practices

- ✅ **Separate Vaults**: Use different vaults for different environments
- ✅ **Restrict Access**: Limit access to the users and apps that need it, through Azure RBAC or access policies
- ✅ **Managed Identities**: Use managed identities for access
- ✅ **Rotation**: Regularly rotate secrets
- ✅ **Monitoring**: Monitor access and usage


## Azure Storage vs Data Lake Security

### Azure Blob Storage Security

#### Access Methods
- **Storage Account Keys**: Full access to account
- **SAS Tokens**: Delegated, time-limited access
- **Azure AD**: Azure AD authentication, combined with RBAC data roles for authorization
- **RBAC**: Role-based access control

#### Limitations
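- **Container-Level RBAC**: Role assignments can be scoped no lower than a container
- **No Per-Blob ACLs**: Identity-based permissions cannot be set on individual blobs (a SAS can still target a single blob)
- **Flat Namespace**: No real directories, so permissions cannot be inherited down a folder hierarchy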

### Azure Data Lake Storage Gen2 Security

#### Access Methods
- **RBAC**: Role-based access control
- **ACLs**: Access Control Lists (file/directory level)
- **Azure AD**: Native Azure AD integration
- **SAS Tokens**: Also supported

#### Advantages
- **File-Level ACLs**: Permissions on individual files
- **Directory-Level ACLs**: Permissions on directories
- **POSIX-Style**: Familiar POSIX permissions
- **Fine-Grained**: More granular control
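
ADLS Gen2 ACLs are written as comma-separated entries such as `user::rwx,group::r-x,other::---`. The sketch below parses such a string and runs a simplified access check for one file or directory. It is a teaching model, not the service's exact algorithm: it ignores super-users, the sticky bit, the execute permission required on every parent directory, and the fact that Azure RBAC role assignments are evaluated before ACLs. The object IDs are made up.

```python
from typing import List, NamedTuple, Set


class AclEntry(NamedTuple):
    kind: str         # "user", "group", "mask", or "other"
    principal: str    # "" means the owning user or owning group
    perms: Set[str]   # subset of {"r", "w", "x"}


def parse_acl(acl: str) -> List[AclEntry]:
    """Parse an access ACL string; default entries only seed new children, so skip them."""
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        if parts[0] == "default":
            continue
        kind, principal, perms = parts
        entries.append(AclEntry(kind, principal, {c for c in perms if c != "-"}))
    return entries


def is_allowed(acl: str, wanted: str, user: str, groups: Set[str],
               owner: str, owning_group: str) -> bool:
    """Simplified POSIX-style check: owner, named user, groups, then other."""
    entries = parse_acl(acl)
    want = set(wanted)
    mask = next((e.perms for e in entries if e.kind == "mask"), set("rwx"))
    if user == owner:  # owning user: the mask is not applied
        own = next((e.perms for e in entries if e.kind == "user" and not e.principal), set())
        return want <= own
    for e in entries:  # named user: limited by the mask
        if e.kind == "user" and e.principal == user:
            return want <= (e.perms & mask)
    for e in entries:  # owning group and named groups (simplified)
        if e.kind == "group" and (e.principal or owning_group) in groups:
            if want <= (e.perms & mask):
                return True
    other = next((e.perms for e in entries if e.kind == "other"), set())
    return want <= (other & mask)


ACL = ("user::rwx,user:alice-oid:r-x,group::r-x,group:analysts-oid:r--,"
       "mask::r-x,other::---,default:user::rwx")
ctx = dict(owner="owner-oid", owning_group="owners-group-oid")
print(is_allowed(ACL, "r", "alice-oid", set(), **ctx))
print(is_allowed(ACL, "w", "alice-oid", set(), **ctx))
print(is_allowed(ACL, "r", "bob-oid", {"analysts-oid"}, **ctx))
print(is_allowed(ACL, "r", "stranger-oid", set(), **ctx))
```

Here `alice-oid` can read but not write, `bob-oid` can read through group membership, and an unrelated identity is denied. Assigning ACL entries to groups rather than individual users keeps these lists short and easier to audit.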

### Comparison

| Feature | Blob Storage | Data Lake Gen2 |
|---------|-------------|----------------|
| **RBAC** | ✅ Yes | ✅ Yes |
| **SAS** | ✅ Yes | ✅ Yes |
| **File-Level ACLs** | ❌ No | ✅ Yes |
| **Directory ACLs** | ❌ No | ✅ Yes |
| **POSIX Permissions** | ❌ No | ✅ Yes |
| **Azure AD Integration** | ✅ Yes (via RBAC) | ✅ Yes (via RBAC and ACLs) |

### When to Use Which?

**Use Blob Storage when:**
- You need simple object storage
- Container-level access control is sufficient
- Cost optimization is the priority

**Use Data Lake Gen2 when:**
- You need file- or directory-level permissions
- You run big data analytics workloads
- You require fine-grained access control


## Security Best Practices

### 1. Authentication

- ✅ **Use Managed Identities**: Prefer managed identities over keys
- ✅ **Azure AD Integration**: Use Azure AD for authentication
- ✅ **Multi-Factor Authentication**: Enable MFA for users
- ✅ **Service Principals**: Use service principals for applications that cannot use a managed identity (for example, workloads running outside Azure)

### 2. Authorization

- ✅ **Least Privilege**: Grant minimum required permissions
- ✅ **RBAC**: Use RBAC for resource access
- ✅ **Regular Audits**: Review access regularly
- ✅ **Remove Unused Access**: Remove access when no longer needed

### 3. Secrets Management

- ✅ **Key Vault**: Store all secrets in Key Vault
- ✅ **No Hardcoding**: Never hardcode secrets in code
- ✅ **Rotation**: Regularly rotate secrets
- ✅ **Versioning**: Use secret versioning
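
The "no hardcoding" rule is easier to keep when something checks for it. Below is a small sketch of a scanner that flags lines which look like embedded credentials. The three patterns are illustrative assumptions and will miss many cases; a dedicated secret-scanning tool is the better choice for real repositories.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; not an exhaustive list.
PATTERNS = {
    "storage account key": re.compile(r"AccountKey=[A-Za-z0-9+/]{20,}={0,2}"),
    "SAS signature": re.compile(r"[?&]sig=[A-Za-z0-9%+/=]{20,}"),
    "password assignment": re.compile(r"(?i)password\s*=\s*['\"][^'\"]+['\"]"),
}


def find_hardcoded_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern label) for every line that looks like a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


# Made-up source text; the key below is dummy data, not a real credential.
sample = '''
conn = "DefaultEndpointsProtocol=https;AccountName=demo;AccountKey=QUJDREVGR0hJSktMTU5PUFFSU1RVVldYWVo=;EndpointSuffix=core.windows.net"
password = "hunter2"
secret = client.get_secret("sql-password").value
'''

for lineno, label in find_hardcoded_secrets(sample):
    print(f"line {lineno}: possible {label}")
```

The last line of the sample, which reads the value from Key Vault at run time, is not flagged. Running a check like this in CI catches a hardcoded secret before it reaches the main branch.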

### 4. Network Security

- ✅ **Private Endpoints**: Use private endpoints when possible
- ✅ **Firewall Rules**: Restrict access by IP/network
- ✅ **VNet Integration**: Use virtual network integration
- ✅ **HTTPS Only**: Require HTTPS for all connections
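
An IP firewall rule boils down to "allow the request only if the caller's address falls inside an approved range". The sketch below shows that logic with the standard `ipaddress` module. It illustrates the rule itself, not how Azure evaluates it internally, and the ranges are documentation-only example addresses.

```python
import ipaddress
from typing import Iterable


def is_ip_allowed(client_ip: str, allowed_ranges: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any allowed address or CIDR range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(rule, strict=False) for rule in allowed_ranges)


# Example ranges reserved for documentation; replace with your own networks.
ALLOWED = ["203.0.113.0/24", "198.51.100.7"]

for candidate in ("203.0.113.42", "198.51.100.7", "192.0.2.1"):
    print(candidate, is_ip_allowed(candidate, ALLOWED))
```

Keeping the allowlist as narrow CIDR ranges, rather than individual addresses scattered across rules, makes it easier to review during access audits.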

### 5. Encryption

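- ✅ **Encryption at Rest**: Azure Storage encrypts data at rest by default; confirm that it is enabled on other services
- ✅ **Encryption in Transit**: Use HTTPS/TLS
- ✅ **Customer-Managed Keys**: Use customer-managed keys when you must control key rotation and revocation yourself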

### 6. Monitoring

- ✅ **Audit Logs**: Enable audit logging
- ✅ **Alerts**: Set up security alerts
- ✅ **Access Monitoring**: Monitor who accesses what
- ✅ **Anomaly Detection**: Detect unusual access patterns


## Summary

In this module, we've covered:

- ✅ Types of access control in Azure
- ✅ Role-Based Access Control (RBAC)
- ✅ Shared Access Signatures (SAS)
- ✅ Azure Key Vault for secret management
- ✅ Azure Storage vs Data Lake security
- ✅ Security best practices

### Key Takeaways

1. **RBAC** provides role-based access to Azure resources
2. **SAS** provides delegated, time-limited access to storage
3. **Key Vault** securely stores secrets, keys, and certificates
4. **Data Lake Gen2** provides finer-grained access control than Blob Storage
5. **Security** requires multiple layers: authentication, authorization, secrets management, network controls, encryption, and monitoring
6. **Best Practices** include least privilege, secret rotation, and monitoring

### Next Steps

Proceed to **Module 10: Monitoring & Optimization** to learn about:
- DMVs (Dynamic Management Views)
- Portal monitoring
- Data skew and process skew
- Partitioning and data distribution
- Performance optimization
