# Chapter 43: Logging in CI/CD

In distributed systems spanning hundreds of microservices across multiple Kubernetes clusters, traditional file-based logging becomes untenable. Containers are ephemeral, nodes are cattle, not pets, and logs scattered across thousands of files provide no systemic visibility. Logging in CI/CD extends beyond application debugging to encompass audit trails of who deployed what and when, build pipeline diagnostics, and security forensics. Structured logging (machine-parseable, queryable, and correlated across service boundaries) turns log data from unstructured text into operational intelligence.

This chapter establishes comprehensive logging strategies for cloud-native environments, from application instrumentation through centralized aggregation, covering retention policies, sensitive data handling, and integration with continuous delivery pipelines.

## 43.1 Structured Logging

Structured logging replaces human-readable text with machine-parseable formats (JSON), enabling efficient filtering, aggregation, and correlation across distributed systems.

### JSON Logging Format

**Traditional Unstructured Log:**
```
2024-01-15 10:23:45 INFO Payment processed successfully for user 12345 amount $99.99 transaction abc-123
```

**Structured Log:**
```json
{
  "timestamp": "2024-01-15T10:23:45.123Z",
  "level": "INFO",
  "service": "payment-service",
  "trace_id": "abc123def456",
  "span_id": "span789",
  "message": "Payment processed",
  "attributes": {
    "user_id": "12345",
    "amount": 99.99,
    "currency": "USD",
    "transaction_id": "abc-123",
    "payment_method": "credit_card",
    "duration_ms": 145
  },
  "source": {
    "file": "PaymentProcessor.java",
    "line": 45,
    "function": "processPayment"
  }
}
```

**Explanation:**
The structured format separates concerns: timestamps are ISO-8601 (sortable), levels are standardized (DEBUG, INFO, WARN, ERROR), and contextual data resides in typed fields rather than embedded in strings. This enables queries like `attributes.amount > 100 AND attributes.currency = "EUR"`, which are impractical with `grep` on unstructured text because the amount would have to be compared numerically inside a free-form string.
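To make the typed-field query concrete, the following is a minimal sketch in plain Java. It assumes the log entries have already been parsed into maps; the class name `StructuredQueryDemo` and the helper `entry` are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class StructuredQueryDemo {

    // Builds a map standing in for the "attributes" object of one parsed log entry.
    static Map<String, Object> entry(String transactionId, double amount, String currency) {
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("transaction_id", transactionId);
        attributes.put("amount", amount);
        attributes.put("currency", currency);
        return attributes;
    }

    // Equivalent of: attributes.amount > 100 AND attributes.currency = "EUR"
    static List<Map<String, Object>> largeEuroPayments(List<Map<String, Object>> entries) {
        return entries.stream()
            .filter(e -> "EUR".equals(e.get("currency")))
            .filter(e -> e.get("amount") instanceof Number
                && ((Number) e.get("amount")).doubleValue() > 100)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> entries = List.of(
            entry("abc-123", 99.99, "USD"),
            entry("abc-124", 250.00, "EUR"),
            entry("abc-125", 40.00, "EUR"));
        // Only abc-124 satisfies both conditions
        largeEuroPayments(entries)
            .forEach(e -> System.out.println(e.get("transaction_id")));
    }
}
```

Because `amount` is a number rather than text, the comparison is numeric; the same filter on the unstructured line would need a regular expression plus string-to-number conversion.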

### Application Implementation (Java/Logback)

```java
// PaymentService.java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import net.logstash.logback.argument.StructuredArguments;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    private static final Logger logger = LoggerFactory.getLogger(PaymentService.class);
    
    public PaymentResult processPayment(PaymentRequest request) {
        // Add correlation context to all logs in this request
        MDC.put("trace_id", request.getTraceId());
        MDC.put("span_id", generateSpanId());
        MDC.put("user_id", request.getUserId());
        
        long startTime = System.currentTimeMillis();
        
        try {
            PaymentResult result = executePayment(request);
            
            long duration = System.currentTimeMillis() - startTime;
            
            // Structured logging with Logstash encoder
            logger.info("Payment processed successfully",
                StructuredArguments.keyValue("transaction_id", result.getTransactionId()),
                StructuredArguments.keyValue("amount", request.getAmount()),
                StructuredArguments.keyValue("currency", request.getCurrency()),
                StructuredArguments.keyValue("duration_ms", duration),
                StructuredArguments.keyValue("payment_method", request.getPaymentMethod())
            );
            
            return result;
        } catch (PaymentException e) {
            logger.error("Payment processing failed",
                StructuredArguments.keyValue("error_code", e.getErrorCode()),
                StructuredArguments.keyValue("error_type", e.getClass().getSimpleName()),
                StructuredArguments.keyValue("retryable", e.isRetryable()),
                e // Include stack trace
            );
            throw e;
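        } finally {
            // Remove only the keys set in this method; MDC.clear() would also
            // wipe context added by callers further up the stack
            MDC.remove("trace_id");
            MDC.remove("span_id");
            MDC.remove("user_id");
        }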
    }
}
```

**Logback Configuration (logback-spring.xml):**
```xml
<configuration>
    <!-- Console appender with JSON formatting -->
    <appender name="JSON_CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <includeContext>true</includeContext>
            <includeMdc>true</includeMdc>
            <fieldNames>
                <timestamp>timestamp</timestamp>
                <message>message</message>
                <logger>logger</logger>
                <thread>thread</thread>
                <level>level</level>
                <levelValue>[ignore]</levelValue>
            </fieldNames>
            <!-- Static fields added to every event; LogstashEncoder expects a JSON object here -->
            <customFields>
                {
                    "service": "payment-service",
                    "environment": "${ENVIRONMENT:-development}",
                    "version": "${VERSION:-unknown}"
                }
            </customFields>
        </encoder>
    </appender>
    
    <!-- Root logger -->
    <root level="INFO">
        <appender-ref ref="JSON_CONSOLE" />
    </root>
    
    <!-- Specific package levels -->
    <logger name="com.company.payment" level="DEBUG" />
    <logger name="org.springframework.web" level="WARN" />
</configuration>
```

**Explanation:**
- **MDC (Mapped Diagnostic Context)**: Thread-local map that adds fields to every log entry written by the current thread. `trace_id` injected here appears in all subsequent logs on that thread without explicit passing. Because the map is thread-local, work handed to another thread (executors, `@Async`, reactive pipelines) does not see it unless the context is copied across.
- **LogstashEncoder**: Outputs JSON format compatible with ELK/EFK stacks. The `includeMdc` setting automatically includes MDC fields in the JSON output.
- **StructuredArguments**: Key-value pairs that are serialized as their own JSON fields with their types preserved (numbers remain numbers, not strings).
- **customFields**: Adds static fields (service name, environment, version) to every log entry for filtering.
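The thread-local nature of MDC is easy to overlook, so here is a minimal sketch of the behaviour and of one way to carry context onto a worker thread. It deliberately uses a plain `ThreadLocal` map as a stand-in for MDC so that it runs without SLF4J; the class `ContextPropagationDemo` and the helper `withCallerContext` are hypothetical names, not part of any logging library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ContextPropagationDemo {

    // Stand-in for MDC: one mutable map per thread.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CONTEXT.get().put(key, value); }
    static String get(String key) { return CONTEXT.get().get(key); }
    static void clear() { CONTEXT.get().clear(); }

    // Snapshot the caller's context now and install it on whichever thread runs the task.
    static <T> Callable<T> withCallerContext(Callable<T> task) {
        Map<String, String> snapshot = new HashMap<>(CONTEXT.get());
        return () -> {
            Map<String, String> previous = new HashMap<>(CONTEXT.get());
            CONTEXT.get().clear();
            CONTEXT.get().putAll(snapshot);
            try {
                return task.call();
            } finally {
                // Restore the worker's own context so nothing leaks into the next task
                CONTEXT.get().clear();
                CONTEXT.get().putAll(previous);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            put("trace_id", "abc123def456");
            Callable<String> readTraceId = () -> get("trace_id");
            String plain = pool.submit(readTraceId).get();
            String wrapped = pool.submit(withCallerContext(readTraceId)).get();
            // prints: plain=null wrapped=abc123def456
            System.out.println("plain=" + plain + " wrapped=" + wrapped);
        } finally {
            clear();
            pool.shutdown();
        }
    }
}
```

With real MDC the same idea is usually expressed with `MDC.getCopyOfContextMap()` on the calling thread and `MDC.setContextMap(...)` inside the task.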

### Node.js Implementation (Winston)

```javascript
// logger.js
const winston = require('winston');

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(), // default output is ISO 8601 (Date.prototype.toISOString)
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'order-service',
    environment: process.env.NODE_ENV || 'development',
    version: process.env.VERSION || 'unknown',
    host: require('os').hostname()
  },
  transports: [
    new winston.transports.Console({
      format: winston.format.json()
    })
  ]
});

// Middleware to add request context
function requestLogger(req, res, next) {
  // Capture the start time here; Express does not set req.startTime
  const startTime = Date.now();

  const childLogger = logger.child({
    trace_id: req.headers['x-trace-id'] || generateTraceId(),
    span_id: generateSpanId(),
    user_id: req.user?.id,
    request_method: req.method,
    request_path: req.path,
    client_ip: req.ip
  });
  
  req.logger = childLogger;
  
  // Log request start
  childLogger.info('Request started');
  
  // Log response on finish
  res.on('finish', () => {
    childLogger.info('Request completed', {
      status_code: res.statusCode,
      duration_ms: Date.now() - startTime,
      response_size: res.get('Content-Length')
    });
  });
  
  next();
}

// Usage in routes
app.post('/orders', (req, res) => {
  req.logger.info('Processing order', { 
    order_id: req.body.orderId,
    amount: req.body.amount 
  });
  
  // Business logic...
});
```

**Explanation:**
Winston's `format.combine` chains multiple formatters. `winston.format.json()` outputs JSON. The `logger.child()` method creates a contextual logger that inherits configuration but adds request-specific metadata. This avoids passing logger instances through every function call.

## 43.2 Logging Architecture

### EFK Stack (Elasticsearch, Fluentd/Fluent Bit, Kibana)

**Architecture Flow:**
```
Application → Stdout → Fluent Bit (DaemonSet) → Elasticsearch → Kibana
```

**Fluent Bit Configuration (Kubernetes DaemonSet):**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
        
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        Keep_Log            Off
        Labels              On
        Annotations         Off
        
    [FILTER]
        Name                nest
        Match               kube.*
        Operation           lift
        Nested_under        kubernetes
        Add_prefix          k8s_
        
    [FILTER]
        Name                modify
        Match               kube.*
        Rename              message log
        Rename              k8s_container_name container_name
        Rename              k8s_namespace_name namespace
        
    [OUTPUT]
        Name            es
        Match           kube.*
        Host            elasticsearch-master
        Port            9200
        Logstash_Format On
        Logstash_Prefix k8s-logs
        Retry_Limit     False
        Suppress_Type_Name On
        Trace_Error     On
        
  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        Decode_Field_As escaped log do_next
        Decode_Field_As json log
```

**Explanation:**
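- **INPUT**: `tail` plugin reads container logs from `/var/log/containers/*.log` (symlinks to the container runtime's log files). The `docker` parser shown here expects Docker's JSON log format; nodes running containerd or CRI-O write the CRI text format and need a `cri` parser instead.
- **FILTER kubernetes**: Enriches logs with Pod metadata (labels, namespace, container name) by querying the Kubernetes API. `Merge_Log On` parses JSON application logs into top-level fields.
- **FILTER nest / modify**: Lifts the nested `kubernetes` metadata to the top level with a `k8s_` prefix, then renames fields to a consistent schema (e.g., `k8s_namespace_name` → `namespace`).
- **OUTPUT**: Sends to Elasticsearch with Logstash format (daily indices like `k8s-logs-2024.01.15`).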

### PLG Stack (Promtail, Loki, Grafana)

Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus.

**Promtail Configuration:**
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: logging
data:
  promtail.yaml: |
    server:
      log_level: info
      http_listen_port: 3101
      
    clients:
      - url: http://loki:3100/loki/api/v1/push
        tenant_id: production
        
    positions:
      filename: /run/promtail/positions.yaml
      
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - production
                - staging
                
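        pipeline_stages:
          - cri: {}  # Parse containerd/cri-o format
          
          - json:
              expressions:
                level: level
                timestamp: timestamp
                service: service
                
          # Only low-cardinality fields become labels; trace_id stays in the log line
          - labels:
              level:
              service:
              
          - timestamp:
              source: timestamp
              format: RFC3339
              
        relabel_configs:
          - source_labels: ['__meta_kubernetes_pod_node_name']
            target_label: 'node'
          - source_labels: ['__meta_kubernetes_namespace']
            target_label: 'namespace'
          - source_labels: ['__meta_kubernetes_pod_name']
            target_label: 'pod'
          - source_labels: ['__meta_kubernetes_pod_container_name']
            target_label: 'container'
          # Tell Promtail which files on the node belong to each discovered container
          - source_labels: ['__meta_kubernetes_pod_uid', '__meta_kubernetes_pod_container_name']
            separator: '/'
            target_label: '__path__'
            replacement: '/var/log/pods/*$1/*.log'
```

**Explanation:**
Promtail runs on each node, tails log files, and pushes to Loki. The `pipeline_stages` parse JSON logs and promote low-cardinality fields such as `service` and `level` to labels (indexed fields). Unlike Elasticsearch, Loki only indexes labels, not full log content, making it cheaper for high-volume logs. High-cardinality values such as `trace_id` are deliberately left in the log line: every distinct label combination creates a separate stream, so a per-request label would bloat the index. The full JSON line is kept as the log body, so `trace_id` remains searchable at query time. The `relabel_configs` add Kubernetes metadata as labels for filtering and set `__path__` so Promtail knows which files to tail.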
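Since Loki creates one stream per distinct combination of label values, label choice directly drives index size. The sketch below is a rough back-of-the-envelope estimate, not a Loki API; the class `StreamCardinality` and the example counts (20 services, 4 levels, 3 namespaces, one million request identifiers) are assumptions chosen for illustration.

```java
final class StreamCardinality {

    // Upper bound on the number of streams: the product of distinct values per label.
    static long maxStreams(long... distinctValuesPerLabel) {
        long total = 1;
        for (long n : distinctValuesPerLabel) {
            total = Math.multiplyExact(total, n); // throws ArithmeticException on overflow
        }
        return total;
    }

    public static void main(String[] args) {
        // service x level x namespace
        System.out.println(maxStreams(20, 4, 3));            // prints 240
        // the same labels plus a per-request identifier with 1,000,000 values
        System.out.println(maxStreams(20, 4, 3, 1_000_000)); // prints 240000000
    }
}
```

The real stream count is lower than this bound when some combinations never occur, but the order of magnitude shows why per-request identifiers belong in the log line rather than in labels.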

## 43.3 Kubernetes Logging Patterns

### Container Logging Standard

Containers should log to stdout/stderr, not files:

```dockerfile
# Good - logs to stdout
CMD ["java", "-jar", "app.jar"]

# Bad - logs to file inside container (shell form, so the redirection is actually applied)
CMD java -jar app.jar > /var/log/app.log 2>&1
```

**Log Rotation (container runtime level, `/etc/docker/daemon.json`):**
```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3",
    "labels": "production_status,environment",
    "env": "OS_VERSION"
  }
}
```

**Explanation:**
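Docker's `json-file` driver captures stdout/stderr, and `max-size` and `max-file` cap the disk space each container's logs can consume. On nodes that use containerd or CRI-O, the kubelet rotates container logs instead, controlled by `containerLogMaxSize` and `containerLogMaxFiles` in the kubelet configuration. Logging agents (Fluent Bit, Promtail) tail these files on the host, not inside containers.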

### Sidecar Logging Pattern

For legacy applications that log to files:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
    - name: legacy-app
      image: company/legacy-app:1.0
      volumeMounts:
        - name: log-volume
          mountPath: /var/log/legacy
          
    - name: log-shipper
      image: fluent/fluent-bit:2.1
      volumeMounts:
        - name: log-volume
          mountPath: /var/log/legacy
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      command: ["/fluent-bit/bin/fluent-bit"]
      args: ["-c", "/fluent-bit/etc/fluent-bit.conf"]
      
  volumes:
    - name: log-volume
      emptyDir: {}
    - name: fluent-bit-config
      configMap:
        name: fluent-bit-sidecar
```

**Explanation:**
The `emptyDir` volume is shared between containers. The legacy app writes logs to files; the sidecar (Fluent Bit) tails those files and forwards to the aggregation system. This pattern avoids modifying legacy code while achieving centralized logging.

## 43.4 Correlation and Tracing Context

### Trace ID Propagation

Trace IDs correlate logs across distributed services:

```java
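// TraceIdFilter.java
// jakarta.servlet.* applies to Spring Boot 3; use javax.servlet.* on Spring Boot 2
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.util.UUID;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;

@Component
public class TraceIdFilter implements Filter {
    
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, 
                        FilterChain chain) throws IOException, ServletException {
        
        HttpServletRequest httpRequest = (HttpServletRequest) request;
        
        // Extract trace ID from incoming request or generate new
        String traceId = httpRequest.getHeader("X-Trace-ID");
        if (traceId == null || traceId.isEmpty()) {
            traceId = UUID.randomUUID().toString().replace("-", "");
        }
        
        // Extract span ID
        String spanId = httpRequest.getHeader("X-Span-ID");
        if (spanId == null) {
            spanId = generateSpanId();
        }
        
        // Add to MDC (appears in all logs)
        MDC.put("trace_id", traceId);
        MDC.put("span_id", spanId);
        // The parent span header is absent on the first hop; skip it rather than store null
        String parentSpanId = httpRequest.getHeader("X-Parent-Span-ID");
        if (parentSpanId != null) {
            MDC.put("parent_span_id", parentSpanId);
        }
        
        // Echo the trace ID back to the caller; propagation to downstream
        // services happens in the outbound HTTP client
        HttpServletResponse httpResponse = (HttpServletResponse) response;
        httpResponse.setHeader("X-Trace-ID", traceId);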
        
        try {
            chain.doFilter(request, response);
        } finally {
            MDC.clear(); // Prevent leakage to next request
        }
    }
}
```

**Downstream HTTP Client:**
```java
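public class PaymentClient {
    
    private final RestTemplate restTemplate;
    
    public PaymentClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }
    
    public PaymentResult charge(PaymentRequest request) {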
        // Current trace context
        String traceId = MDC.get("trace_id");
        String spanId = generateSpanId();
        String currentSpanId = MDC.get("span_id");
        
        HttpHeaders headers = new HttpHeaders();
        headers.set("X-Trace-ID", traceId);
        headers.set("X-Span-ID", spanId);
        headers.set("X-Parent-Span-ID", currentSpanId);
        
        ResponseEntity<PaymentResult> response = restTemplate.exchange(
            "http://payment-service/charge",
            HttpMethod.POST,
            new HttpEntity<>(request, headers),
            PaymentResult.class
        );
        
        return response.getBody();
    }
}
```

**Explanation:**
The filter extracts or generates trace IDs on entry, places them in MDC (thread-local storage), and ensures all logs for that request include the trace ID. HTTP clients propagate the trace ID to downstream services via headers. This creates a complete request chain across services.
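The examples call `generateSpanId()` without showing it. One possible implementation is sketched below; the class name `TraceIds` is hypothetical, and the ID sizes (16 random bytes for a trace ID, 8 for a span ID, both lowercase hex) are an assumption borrowed from the W3C Trace Context format rather than something the `X-Trace-ID` headers require.

```java
import java.security.SecureRandom;

final class TraceIds {

    private static final SecureRandom RANDOM = new SecureRandom();
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // 16 random bytes -> 32 lowercase hex characters
    static String generateTraceId() {
        return randomHex(16);
    }

    // 8 random bytes -> 16 lowercase hex characters
    static String generateSpanId() {
        return randomHex(8);
    }

    static String randomHex(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        char[] out = new char[numBytes * 2];
        for (int i = 0; i < numBytes; i++) {
            int v = bytes[i] & 0xFF;          // treat the byte as unsigned
            out[2 * i] = HEX[v >>> 4];        // high nibble
            out[2 * i + 1] = HEX[v & 0x0F];   // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println("trace_id=" + generateTraceId());
        System.out.println("span_id=" + generateSpanId());
    }
}
```

Fixed-length hex IDs keep log fields uniform and make it straightforward to move to the standard `traceparent` header later if the system adopts OpenTelemetry.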

### Querying Correlated Logs

**Elasticsearch Query DSL (e.g., from Kibana Dev Tools):**
```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "trace_id": "abc123def456" }},
        { "range": { "timestamp": { 
          "gte": "now-1h", 
          "lte": "now" 
        }}}
      ]
    }
  },
  "sort": [
    { "timestamp": { "order": "asc" }}
  ]
}
```

**Loki Query (LogQL):**
```logql
{service=~"payment-service|order-service"} 
  |= "abc123def456"
  | json
  | trace_id="abc123def456"
  | line_format "{{.timestamp}} [{{.service}}] {{.message}}"
```

**Explanation:**
The Loki query selects streams from either service (`payment-service` or `order-service`), uses a cheap line filter to keep only lines containing the trace ID, parses the JSON to extract fields, and then matches `trace_id` exactly before formatting each line. `line_format` only changes how lines are rendered; viewed in time order (oldest first) in Grafana, the results reconstruct the entire request flow across microservices.
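What both queries compute can be sketched in a few lines of plain Java: filter entries by trace ID, then order them by timestamp. The `TraceTimeline` class and its nested `LogEntry` type are hypothetical and only illustrate the logic; a real system would leave this work to the log backend.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class TraceTimeline {

    // Minimal view of one structured log entry.
    static final class LogEntry {
        final String timestamp;
        final String service;
        final String traceId;
        final String message;

        LogEntry(String timestamp, String service, String traceId, String message) {
            this.timestamp = timestamp;
            this.service = service;
            this.traceId = traceId;
            this.message = message;
        }
    }

    // Keep entries for one trace and order them oldest first.
    static List<LogEntry> forTrace(List<LogEntry> entries, String traceId) {
        return entries.stream()
            .filter(e -> traceId.equals(e.traceId))
            .sorted(Comparator.comparing((LogEntry e) -> Instant.parse(e.timestamp)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<LogEntry> all = List.of(
            new LogEntry("2024-01-15T10:23:45.300Z", "payment-service", "abc123def456", "Payment processed"),
            new LogEntry("2024-01-15T10:23:45.100Z", "order-service", "abc123def456", "Processing order"),
            new LogEntry("2024-01-15T10:23:45.200Z", "order-service", "ffff00001111", "Processing order"));
        for (LogEntry e : forTrace(all, "abc123def456")) {
            System.out.println(e.timestamp + " [" + e.service + "] " + e.message);
        }
    }
}
```

The sketch parses timestamps with `Instant.parse` rather than comparing strings, so entries still sort correctly if services emit different fractional-second precision.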

## 43.5 Security and Sensitive Data

### PII Redaction

Automatic removal of sensitive data:

```java
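import java.util.regex.Pattern;
import org.springframework.stereotype.Component;

@Component
public class PiiMaskingFilter {
    
    private static final Pattern CREDIT_CARD_PATTERN = 
        Pattern.compile("\\b(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\b");
    // [A-Za-z] for the TLD: a '|' inside a character class would match a literal pipe
    private static final Pattern EMAIL_PATTERN = 
        Pattern.compile("\\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}\\b");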
    
    public String mask(String message) {
        String masked = CREDIT_CARD_PATTERN.matcher(message)
            .replaceAll(match -> {
                String card = match.group();
                return "****-****-****-" + card.substring(card.length() - 4);
            });
            
        masked = EMAIL_PATTERN.matcher(masked)
            .replaceAll(match -> {
                String email = match.group();
                int atIndex = email.indexOf('@');
                return email.charAt(0) + "***@" + email.substring(atIndex + 1);
            });
            
        return masked;
    }
}
```

**Logback Masking Encoder:**
```xml
<encoder class="com.company.logging.MaskingEncoder">
    <maskPatterns>
        <pattern>ssn=\d{3}-\d{2}-\d{4}</pattern>
        <pattern>password=[^&amp;\s]+</pattern>
        <pattern>token=[a-zA-Z0-9]{32}</pattern>
    </maskPatterns>
    <layout class="net.logstash.logback.layout.LogstashLayout"/>
</encoder>
```

**Explanation:**
Masking happens at the logging framework level before output. Credit cards show only last 4 digits, emails mask the local part. This prevents accidental credential exposure in logs while retaining debugging usefulness.
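Regex masking works on free text, but structured logs also allow masking by field name, which is less likely to miss a value that does not match a pattern. The sketch below assumes log attributes are available as a map before serialization; the class `FieldMasker`, the key list, and the `[REDACTED]` placeholder are all hypothetical choices for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class FieldMasker {

    // Field names whose values must never reach the log output (assumed list).
    private static final Set<String> SENSITIVE_KEYS =
        Set.of("password", "token", "ssn", "card_number");

    // Returns a copy of the attributes with sensitive values replaced.
    static Map<String, Object> mask(Map<String, Object> attributes) {
        Map<String, Object> masked = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : attributes.entrySet()) {
            boolean sensitive = SENSITIVE_KEYS.contains(e.getKey().toLowerCase(Locale.ROOT));
            masked.put(e.getKey(), sensitive ? "[REDACTED]" : e.getValue());
        }
        return masked;
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("user_id", "12345");
        attributes.put("password", "hunter2");
        attributes.put("amount", 99.99);
        // prints: {user_id=12345, password=[REDACTED], amount=99.99}
        System.out.println(mask(attributes));
    }
}
```

The input map is left untouched, so business code can keep using the original values while only the logged copy is sanitized.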

### Audit Logging

Separate audit trail for compliance:

```java
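import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.stereotype.Service;

@Service
public class AuditService {
    
    private static final Logger auditLogger = 
        LoggerFactory.getLogger("AUDIT");
    // ObjectMapper is thread-safe once configured, so one shared instance is enough
    private static final ObjectMapper MAPPER = new ObjectMapper();
    
    public void logSecurityEvent(SecurityEvent event) {
        try {
            // LinkedHashMap keeps field order and accepts null values;
            // Map.of would throw NullPointerException when, for example, no trace_id is set
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("timestamp", Instant.now().toString());
            entry.put("event_type", event.getType());
            entry.put("user_id", event.getUserId());
            entry.put("action", event.getAction());
            entry.put("resource", event.getResource());
            entry.put("outcome", event.getOutcome());
            entry.put("source_ip", event.getSourceIp());
            entry.put("user_agent", event.getUserAgent());
            entry.put("correlation_id", MDC.get("trace_id"));
            entry.put("integrity_hash", calculateHash(event));
            
            auditLogger.info(MAPPER.writeValueAsString(entry));
        } catch (JsonProcessingException e) {
            // Fail safe - log error but don't throw
            auditLogger.error("Failed to serialize audit event", e);
        }
    }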
    
    private String calculateHash(SecurityEvent event) {
        // HMAC for tamper detection
        // Implementation...
    }
}
```

**Explanation:**
Audit logs are immutable, tamper-evident records of security-relevant events (logins, data access, configuration changes). They use a separate logger category (`AUDIT`) that routes to separate storage with stricter retention and access controls. Integrity hashes detect tampering.
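The body of `calculateHash` is left open above. One common way to produce a tamper-evident hash is an HMAC over a canonical string form of the event, sketched here with the JDK's `javax.crypto` classes. The class name `AuditHasher`, the pipe-separated canonical form, and the way the secret is supplied are assumptions for illustration; a production system would load the key from a secret store.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class AuditHasher {

    private final SecretKeySpec key;

    AuditHasher(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Returns the HMAC-SHA256 of the event's canonical form as lowercase hex.
    String hash(String canonicalEvent) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] digest = mac.doFinal(canonicalEvent.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // Recomputes the hash and compares in constant time.
    boolean verify(String canonicalEvent, String expectedHex) {
        return MessageDigest.isEqual(
            hash(canonicalEvent).getBytes(StandardCharsets.UTF_8),
            expectedHex.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        AuditHasher hasher = new AuditHasher("demo-secret".getBytes(StandardCharsets.UTF_8));
        String event = "LOGIN|user-42|/admin|SUCCESS";
        String hash = hasher.hash(event);
        System.out.println(hasher.verify(event, hash));                          // prints true
        System.out.println(hasher.verify("LOGIN|user-42|/admin|FAILURE", hash)); // prints false
    }
}
```

Anyone who alters a stored audit entry without the key cannot produce a matching hash, which is what makes the record tamper-evident rather than merely checksummed.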

## 43.6 CI/CD Pipeline Logging

### Build Log Aggregation

GitHub Actions workflow with structured logging:

```yaml
name: Build and Deploy
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Logging
        run: |
          echo "::group::Environment Setup"
          echo "Setting up build environment..."
          echo "::endgroup::"
          
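      - name: Build Application
        id: build
        run: |
          # Without pipefail the step would take the exit status of the loop, hiding mvn failures
          set -o pipefail
          echo "::notice::Starting build process"
          start=$(date +%s)
          
          mvn clean package | tee build.log | while IFS= read -r line; do
            if [[ $line == *"ERROR"* ]]; then
              echo "::error::$line"
            elif [[ $line == *"WARNING"* ]]; then
              echo "::warning::$line"
            else
              echo "$line"
            fi
          done
          
          # The ::set-output command is deprecated; write step outputs to $GITHUB_OUTPUT
          echo "status=success" >> "$GITHUB_OUTPUT"
          echo "duration=$(( $(date +%s) - start ))" >> "$GITHUB_OUTPUT"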
          
      - name: Upload Logs
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: build-logs-${{ github.run_id }}
          path: |
            build.log
            target/surefire-reports/
            
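      - name: Notify on Failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        env:
          # Secret name is an assumption; store the incoming webhook URL as a repository secret
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
        with: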
          payload: |
            {
              "text": "Build failed",
              "attachments": [{
                "color": "danger",
                "fields": [
                  {"title": "Repository", "value": "${{ github.repository }}", "short": true},
                  {"title": "Commit", "value": "${{ github.sha }}", "short": true},
                  {"title": "Author", "value": "${{ github.actor }}", "short": true},
                  {"title": "Logs", "value": "https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}", "short": false}
                ]
              }]
            }
```

**Explanation:**
GitHub Actions commands like `::error::` and `::group::` create collapsible, annotated logs. The `tee` command saves logs to file while displaying them. Artifacts preserve logs post-build. Slack notifications include direct links to run logs.

### GitLab CI Logging

```yaml
build:
  stage: build
  script:
    - echo "CI_JOB_ID=${CI_JOB_ID}" > build.env
    - echo "CI_COMMIT_SHA=${CI_COMMIT_SHA}" >> build.env
    
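    - |
      # Requires bash. Disable errexit so the build's own exit code can be inspected;
      # PIPESTATUS must be read immediately after the pipeline
      set +e
      docker build --progress=plain -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" . 2>&1 |
      tee build.log |
      grep -E "(^Step|ERROR|Successfully built)"
      build_status=${PIPESTATUS[0]}
      set -e
      if [ "$build_status" -ne 0 ]; then
        echo "Build failed with errors:"
        tail -n 100 build.log
        exit 1
      fi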
  
  artifacts:
    when: always
    expire_in: 1 week
    paths:
      - build.log
      - build.env
    reports:
      junit: target/surefire-reports/TEST-*.xml
  
  after_script:
    - |
      curl -X POST $LOG_AGGREGATOR_URL \
        -H "Content-Type: application/json" \
        -d "{
          \"job_id\": \"$CI_JOB_ID\",
          \"pipeline_id\": \"$CI_PIPELINE_ID\",
          \"status\": \"$CI_JOB_STATUS\",
          \"logs\": \"$(base64 -w 0 build.log)\",
          \"timestamp\": \"$(date -Iseconds)\"
        }"
```

**Explanation:**
GitLab CI captures logs as artifacts with `when: always` (preserved even on failure). The `after_script` pushes logs to a central aggregator (Elasticsearch, Splunk) for long-term retention and cross-pipeline analysis. JUnit reports integrate with GitLab's test visualization.

## 43.7 Log Retention and Cost

### Tiered Storage Strategy

```yaml
# Elasticsearch ILM (Index Lifecycle Management) Policy
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "2d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "cold": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "require": {
              "data": "cold"
            }
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

**Explanation:**
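- **Hot**: Indices currently being written, on fast SSD storage and searchable in real time. An index rolls over when a primary shard reaches 50 GB or after one day.
- **Warm**: From 2 days after rollover, indices are read-only in practice and are optimized (shrunk to one shard, force-merged to one segment).
- **Cold**: From 7 days after rollover, indices move to nodes tagged `data: cold`, typically slower and cheaper hardware. Moving data to object storage such as S3 requires searchable snapshots, which this policy does not configure.
- **Delete**: Indices are deleted 90 days after rollover. Derive this value from your own obligations: GDPR favours keeping personal data no longer than necessary, while PCI DSS requires at least 12 months of audit log history, so audit indices usually need a longer-lived policy than application logs.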
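The phase boundaries in the policy can be read as a simple function of index age. The following sketch mirrors the `min_age` thresholds above (2, 7, and 90 days, measured from rollover); the class `RetentionPhases` is hypothetical and is not part of any Elasticsearch client.

```java
import java.time.Duration;

final class RetentionPhases {

    // Maps time since rollover to the ILM phase an index would be in under the policy above.
    static String phaseFor(Duration ageSinceRollover) {
        long days = ageSinceRollover.toDays();
        if (days >= 90) {
            return "delete";
        }
        if (days >= 7) {
            return "cold";
        }
        if (days >= 2) {
            return "warm";
        }
        return "hot";
    }

    public static void main(String[] args) {
        System.out.println(phaseFor(Duration.ofHours(12))); // prints hot
        System.out.println(phaseFor(Duration.ofDays(3)));   // prints warm
        System.out.println(phaseFor(Duration.ofDays(30)));  // prints cold
        System.out.println(phaseFor(Duration.ofDays(90)));  // prints delete
    }
}
```

A helper like this can be handy in capacity planning scripts, for example to estimate how much data sits in each tier on a given day.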

### Sampling High-Volume Logs

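```yaml
# Fluent Bit sampling configuration
# Give every record a level first, so the sampler can rely on the field
[FILTER]
    Name          modify
    Match         kube.*
    Condition     Key_does_not_exist level
    Add           level INFO

# Sample inside a Lua filter so that ERROR/WARN records bypass sampling
[FILTER]
    Name          lua
    Match         kube.*
    script        /fluent-bit/scripts/sample.lua
    call          sample

# /fluent-bit/scripts/sample.lua (illustrative path and function name):
#   function sample(tag, timestamp, record)
#     local level = record["level"]
#     if level == "ERROR" or level == "WARN" then return 0, timestamp, record end
#     if math.random(10) == 1 then return 0, timestamp, record end  -- keep about 1 in 10
#     return -1, timestamp, record                                   -- drop the record
#   end
```

**Explanation:**
For high-traffic services (10k logs/second), sampling reduces volume. Fluent Bit applies filters in order, so a `grep` placed after a sampler can only drop more records; it cannot restore errors that were already discarded. The sampling decision therefore has to be level-aware: the Lua filter returns every ERROR/WARN record unchanged (return code `0`) and keeps roughly 10% of everything else, dropping the rest (return code `-1`). The preceding `modify` filter assigns `INFO` to records without a level so they are sampled rather than treated as a special case. This balances cost with observability.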
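The same level-aware decision can also be made inside the application before a record is ever written. Below is a minimal sketch that keeps every ERROR/WARN entry and one in every N others; it uses a counter instead of a random number so the result is deterministic. The class `LevelAwareSampler` is a hypothetical helper, not a Logback or Fluent Bit component.

```java
import java.util.concurrent.atomic.AtomicLong;

final class LevelAwareSampler {

    private final int keepOneIn;
    private final AtomicLong counter = new AtomicLong();

    LevelAwareSampler(int keepOneIn) {
        if (keepOneIn < 1) {
            throw new IllegalArgumentException("keepOneIn must be at least 1");
        }
        this.keepOneIn = keepOneIn;
    }

    // ERROR and WARN are always kept; other levels are kept once every keepOneIn calls.
    boolean shouldKeep(String level) {
        if ("ERROR".equals(level) || "WARN".equals(level)) {
            return true;
        }
        return counter.getAndIncrement() % keepOneIn == 0;
    }

    public static void main(String[] args) {
        LevelAwareSampler sampler = new LevelAwareSampler(10);
        int keptInfo = 0;
        int keptError = 0;
        for (int i = 0; i < 100; i++) {
            if (sampler.shouldKeep("INFO")) keptInfo++;
            if (sampler.shouldKeep("ERROR")) keptError++;
        }
        // prints: info=10 error=100
        System.out.println("info=" + keptInfo + " error=" + keptError);
    }
}
```

Sampling at the source saves CPU and network on the node as well as storage, at the cost of making the sampling rate a per-service deployment setting.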

## 43.8 Integration with Observability

### Unified Observability (Logs, Metrics, Traces)

**OpenTelemetry Collector Configuration:**
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        
  filelog:
    include: [/var/log/containers/*.log]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.timestamp
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
          
processors:
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert
        
  batch:
    timeout: 1s
    send_batch_size: 1024
    
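exporters:
  # Loki 3.x ingests OTLP natively under /otlp
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
    
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
    
  # Jaeger accepts OTLP over gRPC on port 4317
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true
      
service:
  pipelines:
    logs:
      receivers: [otlp, filelog]
      processors: [resource, batch]
      exporters: [otlphttp/loki]
      
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [prometheusremotewrite]
      
    traces:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [otlp/jaeger]
```

**Explanation:**
The OpenTelemetry Collector receives logs, metrics, and traces via OTLP (OpenTelemetry Protocol), processes them (batching, enrichment), and exports to backend systems (Loki for logs, Prometheus for metrics, Jaeger for traces). Loki (3.0 and later) and Jaeger both accept OTLP directly, so the generic `otlphttp` and `otlp` exporters are used for them. The Prometheus endpoint only accepts writes when Prometheus runs with `--web.enable-remote-write-receiver`. The `filelog` receiver ships with the Collector's contrib distribution. This unifies observability data under a single pipeline.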

### Correlating Logs with Metrics

```yaml
# Recording rule for high error rate
groups:
  - name: payment_alerts
    rules:
      - record: payment:error_rate_5m
        expr: |
          sum(rate(payment_failures_total[5m])) 
          / 
          sum(rate(payment_total[5m]))
          
      - alert: HighPaymentFailureRate
        expr: payment:error_rate_5m > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High payment failure rate detected"
          dashboard: "https://grafana.company.com/d/payment-dashboard"
          logs: "https://grafana.company.com/explore?orgId=1&left=%7B%22datasource%22:%22Loki%22,%22queries%22:%5B%7B%22expr%22:%22%7Bservice%3D%5C%22payment-service%5C%22,%20level%3D%5C%22ERROR%5C%22%7D%22%7D%5D%7D"
```

**Explanation:**
The aggregated `payment:error_rate_5m` series carries no per-request labels, and a `trace_id` label on a metric would create unbounded cardinality, so the annotations link to the service dashboard and to a Loki query for `payment-service` error logs. When the alert fires, an engineer opens the logs for the alert's time window, picks a failing request, and follows its `trace_id` across services from there. This bridges metrics (the alert) to logs (the context) quickly, reducing MTTR (Mean Time To Resolution).

---

## Chapter Summary and Preview

This chapter established logging as a critical observability pillar in CI/CD ecosystems, extending beyond simple debugging to encompass audit trails, security forensics, and operational intelligence. We examined structured logging formats (JSON) that replace human-readable text with machine-parseable data, enabling efficient filtering and correlation across distributed microservices. The implementation patterns in Java (Logback with Logstash encoder) and Node.js (Winston) demonstrated how Mapped Diagnostic Context (MDC) and child loggers attach trace IDs to every log line in a request without explicit parameter passing, while HTTP headers carry those IDs across service calls.

The logging architecture section compared the EFK stack (Elasticsearch, Fluent Bit, Kibana) suitable for full-text search and complex analytics, against the PLG stack (Promtail, Loki, Grafana) optimized for cost-effective, label-based log aggregation inspired by Prometheus. Kubernetes-specific patterns including DaemonSet log collectors, sidecar patterns for legacy applications, and stdout/stderr streaming ensure comprehensive log capture from ephemeral containers without persistent volumes.

Security considerations mandated PII redaction at the logging framework level, automatic masking of credit cards and credentials before output, and separate audit logging streams with integrity hashing for compliance requirements. CI/CD pipeline logging strategies captured build artifacts, test results, and deployment events with structured metadata linking commits, authors, and outcomes to enable post-failure analysis and compliance auditing.

Retention policies using Elasticsearch Index Lifecycle Management (ILM) tier data across hot, warm, cold, and deletion phases, balancing query performance against storage costs. Sampling strategies for high-volume services ensure error logs are always retained while reducing info-level verbosity. OpenTelemetry integration unified logs, metrics, and traces under a single pipeline, and alert annotations link firing metrics directly to the relevant logs, where trace IDs tie individual requests together.

**Key Takeaways:**
- Always use structured JSON logging with standardized field names (timestamp, level, service, trace_id, message) to enable machine parsing and efficient querying across distributed systems.
- Implement trace ID propagation using MDC (Java) or request-scoped child loggers (Node.js) to correlate logs across microservices; ensure HTTP clients forward trace context via headers to maintain request chains.
- Deploy log collectors (Fluent Bit or Promtail) as Kubernetes DaemonSets to capture container stdout/stderr without requiring applications to manage file logging or log rotation.
- Separate security audit logs from application logs, routing them to immutable storage with integrity verification and stricter access controls than operational logs.
- Implement PII redaction at the logging framework level using pattern matching or sanitization serializers to prevent accidental credential or personal data exposure in centralized log aggregators.
- Use tiered retention policies (hot/warm/cold/delete) to manage costs, keeping recent logs on fast storage for debugging while archiving older logs to object storage for compliance.

**Next Chapter Preview:**
Chapter 44: Metrics and Monitoring establishes quantitative observability through time-series data collection, aggregation, and alerting. We will examine Prometheus architecture and metric types (counters, gauges, histograms, summaries), service discovery for dynamic Kubernetes targets, PromQL query language for operational analysis, and Grafana dashboard design for visualization. The chapter covers SLO (Service Level Objective) definition and SLI (Service Level Indicator) implementation, error budget burn rate alerting, and integration with CI/CD pipelines for canary analysis and progressive delivery decision-making, completing the observability triad (logs, metrics, traces) with quantitative reliability engineering practices.