Log Processing Pipeline Documentation (Loki + Kafka + Fluent Bit)

1. Overview

This document describes how to deploy a full log processing pipeline using:

  • Kafka – message broker used as the ingestion buffer
  • Fluent Bit – consumer of Kafka messages and forwarder to Loki
  • Loki – log storage backend
  • Grafana – visualization UI (with GitLab OAuth)

Promtail is not used in this setup. Instead, Fluent Bit consumes directly from Kafka and pushes logs to Loki.


2. Architecture

Kubernetes → Fluent Bit → Kafka (buffer 3 days) → Fluent Bit → Loki (storage 7 days) → Grafana
                                  ↓                     ↓
                            Multi-Topic          Lua Processing
                            (per environment)    + Environment Labels

Storage Efficiency: ~8:1 compression ratio (Kafka 1.6 GB → Loki 210 MB)


3. Multi-Environment Setup

Kafka Topics

Six separate topics provide environment isolation (a creation sketch follows this list):

  • logs – Production (main cluster)
  • logs-staging – Staging environment
  • logs-prod-fr – Production Frankfurt cluster
  • logs-sport – Sport production cluster
  • logs-sport-stage – Sport staging cluster
  • logs-sport-iframes – Sport iframes cluster
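
The topics can be created up front (for example from the kafka-init container referenced later in this document) before any producer starts sending. A minimal sketch using the Kafka CLI; the partition and replication-factor values are assumptions, not taken from the repository:

# Create one topic per environment (single-broker values assumed)
for topic in logs logs-staging logs-prod-fr logs-sport logs-sport-stage logs-sport-iframes; do
  docker exec kafka kafka-topics.sh --create --if-not-exists \
    --bootstrap-server localhost:9092 \
    --topic "$topic" \
    --partitions 1 \
    --replication-factor 1
done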

Producer Configuration (Kubernetes)

Each cluster sends to its dedicated topic:

# Production
[OUTPUT]
    Name            kafka
    Match           kube.*
    Brokers         217.154.234.140:19092
    Topics          logs
    Format          json
    Retry_Limit     no_limits

# Sport Staging
[OUTPUT]
    Name            kafka
    Match           kube.*
    Brokers         217.154.234.140:19092
    Topics          logs-sport-stage
    Format          json
    Retry_Limit     no_limits

Consumer Configuration (Log Server)

Fluent Bit on the log server consumes from each topic with its own consumer group and environment label (sport-stage shown); the matching Loki output is sketched after the filter:

[INPUT]
    Name        kafka
    Topics      logs-sport-stage
    Brokers     kafka:9092
    Group_Id    fluentbit-consumer-sport-stage
    Format      json
    Tag         kafka.logs-sport-stage

[FILTER]
    Name         modify
    Match        kafka.logs-sport-stage
    Add          environment sport-stage
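
Each consumed and labeled record is then pushed to Loki. A minimal sketch of the matching output, assuming Fluent Bit's built-in loki output plugin and the job/environment labels used by the Grafana queries later in this document (host and port are assumptions):

[OUTPUT]
    Name         loki
    Match        kafka.*
    Host         loki
    Port         3100
    Labels       job=kafka_consumer
    Label_keys   $environment
    Line_format  json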

4. Initial Deployment

Directory Structure

mkdir -p /opt/log-stack/{data/{kafka,loki,grafana},lua}
cd /opt/log-stack

# Set proper permissions
sudo chown -R 1001:1001 /opt/log-stack/data/kafka
sudo chmod -R 777 /opt/log-stack/data/loki
sudo chown -R 472:472 /opt/log-stack/data/grafana

Configuration Files

All configuration files are in this repository.

Important: Edit docker-compose.yml and replace 217.154.234.140 with your actual server IP.
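
For illustration, a docker-compose.yml excerpt that would match the values referenced in this document (3-day retention, external listener on port 19092). This is a sketch assuming the Bitnami Kafka image (which matches the 1001:1001 ownership above), not a copy of the repository's file; KRaft/ZooKeeper settings are omitted:

services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "19092:19092"                       # external listener for cluster producers
    environment:
      - KAFKA_CFG_LOG_RETENTION_HOURS=72    # 3-day buffer
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,EXTERNAL://:19092
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka:9092,EXTERNAL://217.154.234.140:19092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - ./data/kafka:/bitnami/kafka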

Lua Script Processing

The set_level.lua script processes logs before storage (a sketch of the script follows the example below):

Functions:

  • Log Level Detection: Automatically sets level field based on log content (error/warning/info)
  • Timestamp Normalization: Moves @timestamp to root level for Loki compatibility
  • Field Cleanup: Removes unnecessary Kubernetes metadata fields to reduce storage

Fields Removed:

  • kubernetes.docker_id
  • kubernetes.pod_ip
  • kubernetes.container_hash
  • stream, partition, offset, _p, topic

Example:

-- Input log
{
  "log": "ERROR: Database connection failed",
  "@timestamp": "2025-01-15T12:00:00Z",
  "kubernetes": {
    "docker_id": "abc123",
    "pod_name": "app-pod"
  }
}

-- Output after Lua processing
{
  "log": "ERROR: Database connection failed",
  "level": "error",
  "kubernetes": {
    "pod_name": "app-pod"
  }
}
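
A minimal sketch of what set_level.lua might contain to produce the behavior above; the actual script in the repository may differ, and timestamp normalization is omitted here:

-- set_level.lua (sketch, not the repository's exact script)
function set_level(tag, timestamp, record)
    -- Log level detection: derive "level" from the message content
    local msg = string.lower(record["log"] or "")
    if string.find(msg, "error", 1, true) then
        record["level"] = "error"
    elseif string.find(msg, "warn", 1, true) then
        record["level"] = "warning"
    else
        record["level"] = "info"
    end

    -- Field cleanup: drop noisy metadata to reduce storage
    if type(record["kubernetes"]) == "table" then
        record["kubernetes"]["docker_id"] = nil
        record["kubernetes"]["pod_ip"] = nil
        record["kubernetes"]["container_hash"] = nil
    end
    record["stream"], record["partition"], record["offset"] = nil, nil, nil
    record["_p"], record["topic"] = nil, nil

    -- Return code 1 tells Fluent Bit the record was modified
    return 1, timestamp, record
end

The script would be wired in with a Fluent Bit [FILTER] of Name lua, a script path pointing at set_level.lua, and call set_level.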

Grafana OAuth (GitLab)

Grafana is configured with GitLab OAuth authentication:

environment:
  - GF_AUTH_GITLAB_ENABLED=true
  - GF_AUTH_GITLAB_ALLOW_SIGN_UP=true
  - GF_AUTH_GITLAB_ALLOWED_DOMAINS=oddstech.net
  - GF_AUTH_GITLAB_ROLE_ATTRIBUTE_PATH=contains(groups[*], 'devops') && 'Admin' || 'Editor'
  - GF_SERVER_ROOT_URL=https://loki.oddstech.net

Access: https://loki.oddstech.net (GitLab OAuth)
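
The environment block above is abridged; a working GitLab OAuth setup also needs the application credentials and endpoints, roughly as follows (values are placeholders, and gitlab.com should be replaced with a self-managed GitLab URL if applicable):

  - GF_AUTH_GITLAB_CLIENT_ID=<gitlab-application-id>
  - GF_AUTH_GITLAB_CLIENT_SECRET=<gitlab-application-secret>
  - GF_AUTH_GITLAB_SCOPES=read_api
  - GF_AUTH_GITLAB_AUTH_URL=https://gitlab.com/oauth/authorize
  - GF_AUTH_GITLAB_TOKEN_URL=https://gitlab.com/oauth/token
  - GF_AUTH_GITLAB_API_URL=https://gitlab.com/api/v4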

Deploy

cd /opt/log-stack
docker compose up -d
docker compose ps

5. Verification

Check All Topics

# List topics
docker exec kafka kafka-topics.sh --list --bootstrap-server localhost:9092

# Check all consumer groups
docker exec kafka kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --list

# Check lag for specific environment
docker exec kafka kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe --group fluentbit-consumer-sport-stage

Grafana Queries

# All environments
{job="kafka_consumer"}

# Specific environment
{job="kafka_consumer", environment="sport-stage"}

# All sport environments
{job="kafka_consumer", environment=~"sport.*"}

# Errors across all environments
{job="kafka_consumer", level="error"}

# Count by environment (last 5 min)
sum by (environment) (rate({job="kafka_consumer"}[5m]))

6. CI/CD and Configuration Management

Why CI/CD?

  • Zero Data Loss: Only changed services restart
  • Validation: Configs tested before deployment (YAML, Lua syntax)
  • Auditability: All changes tracked in Git
  • Rollback: Easy revert if needed

Repository Structure

log-stack/
├── docker-compose.yml        # Multi-topic Kafka + services
├── fluent-bit.conf           # 6 Kafka inputs with environment labels
├── loki-config.yml
├── lua/
│   └── set_level.lua
├── ansible/
│   ├── playbook.yml          # Smart restart (down → up)
│   └── inventory.yml
├── .gitlab-ci.yml
└── .gitattributes            # Line ending normalization

Setup

  1. Add SSH key to GitLab:

    • Settings → CI/CD → Variables
    • Key: SSH_PRIVATE_KEY
    • Value: Your SSH private key (PEM format, base64 encoded; see the encoding command after this list)
    • Flags: ✅ Protect variable
  2. Edit ansible/inventory.yml:

   all:
     children:
       log_stack:
         hosts:
           log-server:
             ansible_host: YOUR_IP
             ansible_user: root

Workflow

# 1. Create feature branch
git checkout -b feature/add-new-environment

# 2. Edit config (e.g., fluent-bit.conf)
# Add new environment topic

# 3. Push and create MR
git add fluent-bit.conf
git commit -m "feat: Add logs-new-env topic"
git push origin feature/add-new-environment

# 4. Create MR → Pipeline validates YAML syntax
# 5. Merge to main → Ansible deploys → Only Fluent Bit restarts
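
A sketch of what the validate/deploy pipeline in .gitlab-ci.yml could look like; the image, lint tooling, and job layout are assumptions rather than the repository's actual pipeline:

stages:
  - validate
  - deploy

validate-configs:
  stage: validate
  image: python:3.12-slim
  script:
    - pip install --quiet yamllint
    - yamllint -d relaxed docker-compose.yml loki-config.yml ansible/
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

deploy:
  stage: deploy
  image: python:3.12-slim
  variables:
    ANSIBLE_HOST_KEY_CHECKING: "False"
  before_script:
    - apt-get update -qq && apt-get install -y -qq openssh-client
    - pip install --quiet ansible
    - mkdir -p ~/.ssh
    - echo "$SSH_PRIVATE_KEY" | base64 -d > ~/.ssh/id_ed25519 && chmod 600 ~/.ssh/id_ed25519
  script:
    - ansible-playbook -i ansible/inventory.yml ansible/playbook.yml
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'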

Smart Restart Matrix

Change                                 | Services Restarted       | Data Loss
Kafka config in docker-compose.yml     | ✅ Kafka only            | ❌ No
Loki config in loki-config.yml         | ✅ Loki only             | ❌ No
Fluent Bit config in fluent-bit.conf   | ✅ Fluent Bit only       | ❌ No
Lua script in lua/set_level.lua        | ✅ Fluent Bit only       | ❌ No
Grafana config in docker-compose.yml   | ✅ Grafana only          | ❌ No
Kafka-init topics                      | ✅ Kafka-init recreated  | ❌ No

Why No Data Loss?

  • Volumes persist in /opt/log-stack/data/
  • Smart restart uses down → up instead of restart (see the playbook sketch after this list)
  • Kafka retains messages during Fluent Bit restart
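
A sketch of how the smart restart could be expressed in ansible/playbook.yml, using handlers so only the service whose file changed is cycled; the module names are real Ansible builtins, but the structure is an assumption rather than the repository's playbook:

- hosts: log_stack
  vars:
    stack_dir: /opt/log-stack
  tasks:
    - name: Copy Fluent Bit config
      ansible.builtin.copy:
        src: ../fluent-bit.conf
        dest: "{{ stack_dir }}/fluent-bit.conf"
      notify: Restart fluent-bit

    - name: Copy Lua script
      ansible.builtin.copy:
        src: ../lua/set_level.lua
        dest: "{{ stack_dir }}/lua/set_level.lua"
      notify: Restart fluent-bit

  handlers:
    - name: Restart fluent-bit
      ansible.builtin.shell: |
        cd {{ stack_dir }}
        docker compose down fluent-bit    # down/up of a single service requires Compose v2
        docker compose up -d fluent-bit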

7. Monitoring

# Data sizes
du -sh /opt/log-stack/data/{kafka,loki,grafana}

# Loki metrics
curl http://localhost:3100/metrics | grep loki_distributor_lines_received_total

# Check all consumer lag
for group in fluentbit-consumer fluentbit-consumer-staging fluentbit-consumer-prod-fr fluentbit-consumer-sport fluentbit-consumer-sport-stage fluentbit-consumer-sport-iframes; do
  docker exec kafka kafka-consumer-groups.sh \
    --bootstrap-server localhost:9092 \
    --describe --group $group
done

8. Troubleshooting

Kafka

docker logs kafka --tail=50
docker exec kafka kafka-topics.sh --list --bootstrap-server localhost:9092

# Check consumer groups
docker exec kafka kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --list

Fluent Bit

docker logs fluent-bit --tail=50

# Check Lua script execution
docker logs fluent-bit | grep -i lua

# Check environment label processing
docker logs fluent-bit | grep -i environment

Loki

curl http://localhost:3100/ready
docker logs loki --tail=50

Network

# Test external Kafka connectivity
telnet YOUR_PUBLIC_IP 19092

9. Summary

Pipeline: Kubernetes → Kafka (multi-topic) → Fluent Bit (+ Lua + environment labels) → Loki → Grafana

Key Features:

  • Multi-Environment: 6 separate Kafka topics for environment isolation
  • Environment Labels: Automatic labeling by environment in Loki
  • Kafka: 3 days retention, port 19092 for external producers
  • Loki: 7 days retention, ~8:1 compression ratio
  • Lua Processing: Automatic log level detection and field cleanup
  • Grafana: GitLab OAuth authentication
  • Persistent Storage: /opt/log-stack/data/
  • CI/CD: GitLab + Ansible with smart restart logic (down → up)
  • Zero Downtime: Only changed services restart

Active Topics:

  • logs (prod), logs-staging, logs-prod-fr
  • logs-sport, logs-sport-stage, logs-sport-iframes

Access: https://loki.oddstech.net (GitLab OAuth)
