Log Processing Pipeline Documentation (Loki + Kafka + Fluent Bit)

1. Overview

This document describes how to deploy a full log processing pipeline using:

  • Kafka – message broker used as the ingestion buffer
  • Fluent Bit – consumer of Kafka messages and forwarder to Loki
  • Loki – log storage backend
  • Grafana – visualization UI (with GitLab OAuth)

Promtail is not used in this setup. Instead, Fluent Bit consumes directly from Kafka and pushes logs to Loki.


2. Architecture

Kubernetes → Fluent Bit → Kafka (buffer 3 days) → Fluent Bit → Loki (storage 7 days) → Grafana
                                  ↓                     ↓
                            Multi-Topic          Lua Processing
                            (per environment)    + Environment Labels

Storage Efficiency: ~8:1 compression ratio (Kafka 1.6 GB → Loki 210 MB)


3. Multi-Environment Setup

Kafka Topics

Six separate topics provide environment isolation (a creation sketch follows this list):

  • logs – Production (main cluster)
  • logs-staging – Staging environment
  • logs-prod-fr – Production Frankfurt cluster
  • logs-sport – Sport production cluster
  • logs-sport-stage – Sport staging cluster
  • logs-sport-iframes – Sport iframes cluster
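
The topics can be created up front (for example from the kafka-init container referenced later in this document) before any producer starts sending. A minimal sketch using the Kafka CLI; the partition and replication-factor values are assumptions, not taken from the repository:

# Create one topic per environment (single-broker values assumed)
for topic in logs logs-staging logs-prod-fr logs-sport logs-sport-stage logs-sport-iframes; do
  docker exec kafka kafka-topics.sh --create --if-not-exists \
    --bootstrap-server localhost:9092 \
    --topic "$topic" \
    --partitions 1 \
    --replication-factor 1
done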

Producer Configuration (Kubernetes)

Each cluster sends to its dedicated topic:

# Production
[OUTPUT]
    Name            kafka
    Match           kube.*
    Brokers         217.154.234.140:19092
    Topics          logs
    Format          json
    Retry_Limit     no_limits

# Sport Staging
[OUTPUT]
    Name            kafka
    Match           kube.*
    Brokers         217.154.234.140:19092
    Topics          logs-sport-stage
    Format          json
    Retry_Limit     no_limits

Consumer Configuration (Log Server)

Fluent Bit on the log server consumes from each topic with its own consumer group and environment label (sport-stage shown); the matching Loki output is sketched after the filter:

[INPUT]
    Name        kafka
    Topics      logs-sport-stage
    Brokers     kafka:9092
    Group_Id    fluentbit-consumer-sport-stage
    Format      json
    Tag         kafka.logs-sport-stage

[FILTER]
    Name         modify
    Match        kafka.logs-sport-stage
    Add          environment sport-stage
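
Each consumed and labeled record is then pushed to Loki. A minimal sketch of the matching output, assuming Fluent Bit's built-in loki output plugin and the job/environment labels used by the Grafana queries later in this document (host and port are assumptions):

[OUTPUT]
    Name         loki
    Match        kafka.*
    Host         loki
    Port         3100
    Labels       job=kafka_consumer
    Label_keys   $environment
    Line_format  json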

4. Initial Deployment

Directory Structure

mkdir -p /opt/log-stack/{data/{kafka,loki,grafana},lua}
cd /opt/log-stack

# Set proper permissions
sudo chown -R 1001:1001 /opt/log-stack/data/kafka
sudo chmod -R 777 /opt/log-stack/data/loki
sudo chown -R 472:472 /opt/log-stack/data/grafana

Configuration Files

All configuration files are in this repository.

Important: Edit docker-compose.yml and replace 217.154.234.140 with your actual server IP.
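
For illustration, a docker-compose.yml excerpt that would match the values referenced in this document (3-day retention, external listener on port 19092). This is a sketch assuming the Bitnami Kafka image (which matches the 1001:1001 ownership above), not a copy of the repository's file; KRaft/ZooKeeper settings are omitted:

services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "19092:19092"                       # external listener for cluster producers
    environment:
      - KAFKA_CFG_LOG_RETENTION_HOURS=72    # 3-day buffer
      - KAFKA_CFG_LISTENERS=INTERNAL://:9092,EXTERNAL://:19092
      - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka:9092,EXTERNAL://217.154.234.140:19092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      - KAFKA_CFG_INTER_BROKER_LISTENER_NAME=INTERNAL
    volumes:
      - ./data/kafka:/bitnami/kafka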

Lua Script Processing

The set_level.lua script processes logs before storage (a sketch of the script follows the example below):

Functions:

  • Log Level Detection: Automatically sets level field based on log content (error/warning/info)
  • Timestamp Normalization: Moves @timestamp to root level for Loki compatibility
  • Field Cleanup: Removes unnecessary Kubernetes metadata fields to reduce storage

Fields Removed:

  • kubernetes.docker_id
  • kubernetes.pod_ip
  • kubernetes.container_hash
  • stream, partition, offset, _p, topic

Example:

-- Input log
{
  "log": "ERROR: Database connection failed",
  "@timestamp": "2025-01-15T12:00:00Z",
  "kubernetes": {
    "docker_id": "abc123",
    "pod_name": "app-pod"
  }
}

-- Output after Lua processing
{
  "log": "ERROR: Database connection failed",
  "level": "error",
  "kubernetes": {
    "pod_name": "app-pod"
  }
}
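
A minimal sketch of what set_level.lua might contain to produce the behavior above; the actual script in the repository may differ, and timestamp normalization is omitted here:

-- set_level.lua (sketch, not the repository's exact script)
function set_level(tag, timestamp, record)
    -- Log level detection: derive "level" from the message content
    local msg = string.lower(record["log"] or "")
    if string.find(msg, "error", 1, true) then
        record["level"] = "error"
    elseif string.find(msg, "warn", 1, true) then
        record["level"] = "warning"
    else
        record["level"] = "info"
    end

    -- Field cleanup: drop noisy metadata to reduce storage
    if type(record["kubernetes"]) == "table" then
        record["kubernetes"]["docker_id"] = nil
        record["kubernetes"]["pod_ip"] = nil
        record["kubernetes"]["container_hash"] = nil
    end
    record["stream"], record["partition"], record["offset"] = nil, nil, nil
    record["_p"], record["topic"] = nil, nil

    -- Return code 1 tells Fluent Bit the record was modified
    return 1, timestamp, record
end

The script would be wired in with a Fluent Bit [FILTER] of Name lua, a script path pointing at set_level.lua, and call set_level.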

Grafana OAuth (GitLab)

Grafana is configured with GitLab OAuth authentication:

environment:
  - GF_AUTH_GITLAB_ENABLED=true
  - GF_AUTH_GITLAB_ALLOW_SIGN_UP=true
  - GF_AUTH_GITLAB_ALLOWED_DOMAINS=oddstech.net
  - GF_AUTH_GITLAB_ROLE_ATTRIBUTE_PATH=contains(groups[*], 'devops') && 'Admin' || 'Editor'
  - GF_SERVER_ROOT_URL=https://loki.oddstech.net

Access: https://loki.oddstech.net (GitLab OAuth)
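
The environment block above is abridged; a working GitLab OAuth setup also needs the application credentials and endpoints, roughly as follows (values are placeholders, and gitlab.com should be replaced with a self-managed GitLab URL if applicable):

  - GF_AUTH_GITLAB_CLIENT_ID=<gitlab-application-id>
  - GF_AUTH_GITLAB_CLIENT_SECRET=<gitlab-application-secret>
  - GF_AUTH_GITLAB_SCOPES=read_api
  - GF_AUTH_GITLAB_AUTH_URL=https://gitlab.com/oauth/authorize
  - GF_AUTH_GITLAB_TOKEN_URL=https://gitlab.com/oauth/token
  - GF_AUTH_GITLAB_API_URL=https://gitlab.com/api/v4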

Deploy

cd /opt/log-stack
docker compose up -d
docker compose ps

5. Verification

Check All Topics

# List topics
docker exec kafka kafka-topics.sh --list --bootstrap-server localhost:9092

# Check all consumer groups
docker exec kafka kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --list

# Check lag for specific environment
docker exec kafka kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --describe --group fluentbit-consumer-sport-stage

Grafana Queries

# All environments
{job="kafka_consumer"}

# Specific environment
{job="kafka_consumer", environment="sport-stage"}

# All sport environments
{job="kafka_consumer", environment=~"sport.*"}

# Errors across all environments
{job="kafka_consumer", level="error"}

# Count by environment (last 5 min)
sum by (environment) (rate({job="kafka_consumer"}[5m]))

6. CI/CD and Configuration Management

Why CI/CD?

  • Zero Data Loss: Only changed services restart
  • Validation: Configs tested before deployment (YAML, Lua syntax)
  • Auditability: All changes tracked in Git
  • Rollback: Easy revert if needed

Repository Structure

log-stack/
├── docker-compose.yml        # Multi-topic Kafka + services
├── fluent-bit.conf           # 6 Kafka inputs with environment labels
├── loki-config.yml
├── lua/
│   └── set_level.lua
├── ansible/
│   ├── playbook.yml          # Smart restart (down → up)
│   └── inventory.yml
├── .gitlab-ci.yml
└── .gitattributes            # Line ending normalization

Setup

  1. Add SSH key to GitLab:

    • Settings → CI/CD → Variables
    • Key: SSH_PRIVATE_KEY
    • Value: Your SSH private key (PEM format, base64 encoded; see the encoding command after this list)
    • Flags: ✅ Protect variable
  2. Edit ansible/inventory.yml:

   all:
     children:
       log_stack:
         hosts:
           log-server:
             ansible_host: YOUR_IP
             ansible_user: root

Workflow

# 1. Create feature branch
git checkout -b feature/add-new-environment

# 2. Edit config (e.g., fluent-bit.conf)
# Add new environment topic

# 3. Push and create MR
git add fluent-bit.conf
git commit -m "feat: Add logs-new-env topic"
git push origin feature/add-new-environment

# 4. Create MR → Pipeline validates YAML syntax
# 5. Merge to main → Ansible deploys → Only Fluent Bit restarts
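
A sketch of what the validate/deploy pipeline in .gitlab-ci.yml could look like; the image, lint tooling, and job layout are assumptions rather than the repository's actual pipeline:

stages:
  - validate
  - deploy

validate-configs:
  stage: validate
  image: python:3.12-slim
  script:
    - pip install --quiet yamllint
    - yamllint -d relaxed docker-compose.yml loki-config.yml ansible/
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'

deploy:
  stage: deploy
  image: python:3.12-slim
  variables:
    ANSIBLE_HOST_KEY_CHECKING: "False"
  before_script:
    - apt-get update -qq && apt-get install -y -qq openssh-client
    - pip install --quiet ansible
    - mkdir -p ~/.ssh
    - echo "$SSH_PRIVATE_KEY" | base64 -d > ~/.ssh/id_ed25519 && chmod 600 ~/.ssh/id_ed25519
  script:
    - ansible-playbook -i ansible/inventory.yml ansible/playbook.yml
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'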

Smart Restart Matrix

Change                                 | Services Restarted       | Data Loss
Kafka config in docker-compose.yml     | ✅ Kafka only            | ❌ No
Loki config in loki-config.yml         | ✅ Loki only             | ❌ No
Fluent Bit config in fluent-bit.conf   | ✅ Fluent Bit only       | ❌ No
Lua script in lua/set_level.lua        | ✅ Fluent Bit only       | ❌ No
Grafana config in docker-compose.yml   | ✅ Grafana only          | ❌ No
Kafka-init topics                      | ✅ Kafka-init recreated  | ❌ No

Why No Data Loss?

  • Volumes persist in /opt/log-stack/data/
  • Smart restart uses down → up instead of restart (see the playbook sketch after this list)
  • Kafka retains messages during Fluent Bit restart
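
A sketch of how the smart restart could be expressed in ansible/playbook.yml, using handlers so only the service whose file changed is cycled; the module names are real Ansible builtins, but the structure is an assumption rather than the repository's playbook:

- hosts: log_stack
  vars:
    stack_dir: /opt/log-stack
  tasks:
    - name: Copy Fluent Bit config
      ansible.builtin.copy:
        src: ../fluent-bit.conf
        dest: "{{ stack_dir }}/fluent-bit.conf"
      notify: Restart fluent-bit

    - name: Copy Lua script
      ansible.builtin.copy:
        src: ../lua/set_level.lua
        dest: "{{ stack_dir }}/lua/set_level.lua"
      notify: Restart fluent-bit

  handlers:
    - name: Restart fluent-bit
      ansible.builtin.shell: |
        cd {{ stack_dir }}
        docker compose down fluent-bit    # down/up of a single service requires Compose v2
        docker compose up -d fluent-bit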

7. Monitoring

# Data sizes
du -sh /opt/log-stack/data/{kafka,loki,grafana}

# Loki metrics
curl http://localhost:3100/metrics | grep loki_distributor_lines_received_total

# Check all consumer lag
for group in fluentbit-consumer fluentbit-consumer-staging fluentbit-consumer-prod-fr fluentbit-consumer-sport fluentbit-consumer-sport-stage fluentbit-consumer-sport-iframes; do
  docker exec kafka kafka-consumer-groups.sh \
    --bootstrap-server localhost:9092 \
    --describe --group $group
done

8. Troubleshooting

Kafka

docker logs kafka --tail=50
docker exec kafka kafka-topics.sh --list --bootstrap-server localhost:9092

# Check consumer groups
docker exec kafka kafka-consumer-groups.sh \
  --bootstrap-server localhost:9092 \
  --list

Fluent Bit

docker logs fluent-bit --tail=50

# Check Lua script execution
docker logs fluent-bit | grep -i lua

# Check environment label processing
docker logs fluent-bit | grep -i environment

Loki

curl http://localhost:3100/ready
docker logs loki --tail=50

Network

# Test external Kafka connectivity
telnet YOUR_PUBLIC_IP 19092

9. Summary

Pipeline: Kubernetes → Kafka (multi-topic) → Fluent Bit (+ Lua + environment labels) → Loki → Grafana

Key Features:

  • Multi-Environment: 6 separate Kafka topics for environment isolation
  • Environment Labels: Automatic labeling by environment in Loki
  • Kafka: 3 days retention, port 19092 for external producers
  • Loki: 7 days retention, ~8:1 compression ratio
  • Lua Processing: Automatic log level detection and field cleanup
  • Grafana: GitLab OAuth authentication
  • Persistent Storage: /opt/log-stack/data/
  • CI/CD: GitLab + Ansible with smart restart logic (down → up)
  • Zero Downtime: Only changed services restart

Active Topics:

  • logs (prod), logs-staging, logs-prod-fr
  • logs-sport, logs-sport-stage, logs-sport-iframes

Access: https://loki.oddstech.net (GitLab OAuth)
