This document describes how to deploy a full log processing pipeline using:
- Kafka – message broker used as the ingestion buffer
- Fluent Bit – consumer of Kafka messages and forwarder to Loki
- Loki – log storage backend
- Grafana – visualization UI (with GitLab OAuth)
Promtail is not used in this setup. Instead, Fluent Bit consumes directly from Kafka and pushes logs to Loki.
Kubernetes → Fluent Bit → Kafka (buffer 3 days) → Fluent Bit → Loki (storage 7 days) → Grafana
                               ↓                        ↓
                          Multi-Topic             Lua Processing
                       (per environment)      + Environment Labels
Storage Efficiency: ~12:1 compression ratio (Kafka 1.6GB → Loki 210MB)
Six separate topics for environment isolation:
- `logs` – Production (main cluster)
- `logs-staging` – Staging environment
- `logs-prod-fr` – Production Frankfurt cluster
- `logs-sport` – Sport production cluster
- `logs-sport-stage` – Sport staging cluster
- `logs-sport-iframes` – Sport iframes cluster
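Topics can be pre-created with a retention matching the 3-day Kafka buffer window. This is a sketch: the partition count and replication factor below are illustrative assumptions, not values taken from this setup.

```shell
# Create a topic with 3-day retention (3 × 24 × 60 × 60 × 1000 = 259200000 ms).
# Partition count and replication factor are illustrative assumptions.
docker exec kafka kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic logs-sport-stage \
  --partitions 3 \
  --replication-factor 1 \
  --config retention.ms=259200000
```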
Each cluster sends to its dedicated topic:
# Production
[OUTPUT]
Name kafka
Match kube.*
Brokers 217.154.234.140:19092
Topics logs
Format json
Retry_Limit no_limits
# Sport Staging
[OUTPUT]
Name kafka
Match kube.*
Brokers 217.154.234.140:19092
Topics logs-sport-stage
Format json
Retry_Limit no_limits

Fluent Bit consumes from all topics with environment labels:
[INPUT]
Name kafka
Topics logs-sport-stage
Brokers kafka:9092
Group_Id fluentbit-consumer-sport-stage
Format json
Tag kafka.logs-sport-stage
[FILTER]
Name modify
Match kafka.logs-sport-stage
Add environment sport-stage

Create the directory structure on the server:

mkdir -p /opt/log-stack/{data/{kafka,loki,grafana},lua}
cd /opt/log-stack
# Set proper permissions
sudo chown -R 1001:1001 /opt/log-stack/data/kafka
sudo chmod -R 777 /opt/log-stack/data/loki
sudo chown -R 472:472 /opt/log-stack/data/grafana

All configuration files are in this repository:
- `docker-compose.yml` – Main stack definition with Kafka, Loki, Grafana, Fluent Bit
- `fluent-bit.conf` – Multi-topic Kafka input with environment labels
- `loki-config.yml` – Loki storage and retention settings
- `lua/set_level.lua` – Lua script for log level detection and field cleanup
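The 7-day retention mentioned above is typically expressed in `loki-config.yml` with the following keys. This is a sketch of the relevant settings only, assuming the compactor-based retention of recent Loki versions; the complete file is in the repository.

```yaml
# Illustrative excerpt — retention-related keys only
limits_config:
  retention_period: 168h   # 7 days

compactor:
  retention_enabled: true
```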
Important: Edit docker-compose.yml and replace 217.154.234.140 with your actual server IP.
The set_level.lua script processes logs before storage:
Functions:
- Log Level Detection: Automatically sets the `level` field based on log content (error/warning/info)
- Timestamp Normalization: Moves `@timestamp` to the root level for Loki compatibility
- Field Cleanup: Removes unnecessary Kubernetes metadata fields to reduce storage
Fields Removed:
`kubernetes.docker_id`, `kubernetes.pod_ip`, `kubernetes.container_hash`, `stream`, `partition`, `offset`, `_p`, `topic`
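The actual script ships as `lua/set_level.lua` and is not reproduced here; a minimal sketch of its shape could look like the following. The callback signature (`tag, timestamp, record` in, `status, timestamp, record` out) is Fluent Bit's Lua filter contract; the function name and keyword matching are assumptions.

```lua
-- Fluent Bit Lua filter callback; returning status 1 marks the record modified.
function set_level(tag, timestamp, record)
    local line = string.lower(record["log"] or "")
    -- Level detection (assumed keyword matching; check errors before warnings)
    if string.find(line, "error") then
        record["level"] = "error"
    elseif string.find(line, "warn") then
        record["level"] = "warning"
    else
        record["level"] = "info"
    end
    -- Field cleanup: drop metadata that only inflates storage
    if record["kubernetes"] then
        record["kubernetes"]["docker_id"] = nil
        record["kubernetes"]["pod_ip"] = nil
        record["kubernetes"]["container_hash"] = nil
    end
    for _, k in ipairs({"stream", "partition", "offset", "_p", "topic"}) do
        record[k] = nil
    end
    return 1, timestamp, record
end
```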
Example:
-- Input log
{
  "log": "ERROR: Database connection failed",
  "@timestamp": "2025-01-15T12:00:00Z",
  "kubernetes": {
    "docker_id": "abc123",
    "pod_name": "app-pod"
  }
}

-- Output after Lua processing
{
  "log": "ERROR: Database connection failed",
  "level": "error",
  "kubernetes": {
    "pod_name": "app-pod"
  }
}

Grafana is configured with GitLab authentication:
environment:
  - GF_AUTH_GITLAB_ENABLED=true
  - GF_AUTH_GITLAB_ALLOW_SIGN_UP=true
  - GF_AUTH_GITLAB_ALLOWED_DOMAINS=oddstech.net
  - GF_AUTH_GITLAB_ROLE_ATTRIBUTE_PATH=contains(groups[*], 'devops') && 'Admin' || 'Editor'
  - GF_SERVER_ROOT_URL=https://loki.oddstech.net

Access: https://loki.oddstech.net (GitLab OAuth)
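The settings above assume an OAuth application has already been registered in GitLab; the credential variables complete the configuration. Values here are placeholders, not from this setup.

```yaml
# Illustrative — replace placeholders with your GitLab application's values
- GF_AUTH_GITLAB_CLIENT_ID=YOUR_GITLAB_APP_ID
- GF_AUTH_GITLAB_CLIENT_SECRET=YOUR_GITLAB_APP_SECRET
- GF_AUTH_GITLAB_SCOPES=read_api
```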
cd /opt/log-stack
docker compose up -d
docker compose ps

# List topics
docker exec kafka kafka-topics.sh --list --bootstrap-server localhost:9092
# Check all consumer groups
docker exec kafka kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--list
# Check lag for specific environment
docker exec kafka kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--describe --group fluentbit-consumer-sport-stage

Example LogQL queries for Grafana:

# All environments
{job="kafka_consumer"}
# Specific environment
{job="kafka_consumer", environment="sport-stage"}
# All sport environments
{job="kafka_consumer", environment=~"sport.*"}
# Errors across all environments
{job="kafka_consumer", level="error"}
# Count by environment (last 5 min)
sum by (environment) (rate({job="kafka_consumer"}[5m]))
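Combining the patterns above, error volume can also be broken down per environment. This is standard LogQL, not a query specific to this setup:

```logql
# Error rate by environment (last 5 min)
sum by (environment) (rate({job="kafka_consumer", level="error"}[5m]))
```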
The GitLab CI/CD + Ansible deployment provides:
- ✅ Zero Data Loss: Only changed services restart
- ✅ Validation: Configs tested before deployment (YAML, Lua syntax)
- ✅ Auditability: All changes tracked in Git
- ✅ Rollback: Easy revert if needed
log-stack/
├── docker-compose.yml # Multi-topic Kafka + services
├── fluent-bit.conf # 6 Kafka inputs with environment labels
├── loki-config.yml
├── lua/
│ └── set_level.lua
├── ansible/
│ ├── playbook.yml # Smart restart (down → up)
│ └── inventory.yml
├── .gitlab-ci.yml
└── .gitattributes # Line ending normalization
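The smart-restart logic lives in `ansible/playbook.yml`, which is not reproduced here; a simplified sketch of the down → up pattern might look like the following. Task names and the use of `docker compose rm -sf` plus `up -d` for a single service are illustrative assumptions.

```yaml
# Illustrative sketch: recreate only the changed service (down → up, not restart)
- name: Recreate Fluent Bit after a config change
  hosts: log_stack
  tasks:
    - name: Stop and remove the container (volumes in /opt/log-stack/data/ persist)
      ansible.builtin.command:
        cmd: docker compose rm -sf fluent-bit
        chdir: /opt/log-stack

    - name: Bring it back up with the new configuration
      ansible.builtin.command:
        cmd: docker compose up -d fluent-bit
        chdir: /opt/log-stack
```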
1. Add SSH key to GitLab:
   - Settings → CI/CD → Variables
   - Key: `SSH_PRIVATE_KEY`
   - Value: Your SSH private key (PEM format, base64 encoded)
   - Flags: ✅ Protect variable
2. Edit `ansible/inventory.yml`:
all:
  children:
    log_stack:
      hosts:
        log-server:
          ansible_host: YOUR_IP
          ansible_user: root

Typical change workflow:

# 1. Create feature branch
git checkout -b feature/add-new-environment
# 2. Edit config (e.g., fluent-bit.conf)
# Add new environment topic
# 3. Push and create MR
git add fluent-bit.conf
git commit -m "feat: Add logs-new-env topic"
git push origin feature/add-new-environment
# 4. Create MR → Pipeline validates YAML syntax
# 5. Merge to main → Ansible deploys → Only Fluent Bit restarts

| Change | Services Restarted | Data Loss |
|---|---|---|
| Kafka config in docker-compose.yml | ✅ Kafka only | ❌ No |
| Loki config in loki-config.yml | ✅ Loki only | ❌ No |
| Fluent Bit config in fluent-bit.conf | ✅ Fluent Bit only | ❌ No |
| Lua script in lua/set_level.lua | ✅ Fluent Bit only | ❌ No |
| Grafana config in docker-compose.yml | ✅ Grafana only | ❌ No |
| Kafka-init topics | ✅ Kafka-init recreate | ❌ No |
Why No Data Loss?
- Volumes persist in `/opt/log-stack/data/`
- Smart restart uses `down → up` instead of `restart`
- Kafka retains messages during Fluent Bit restart
# Data sizes
du -sh /opt/log-stack/data/{kafka,loki,grafana}
# Loki metrics
curl http://localhost:3100/metrics | grep loki_distributor_lines_received_total
# Check all consumer lag
for group in fluentbit-consumer fluentbit-consumer-staging fluentbit-consumer-prod-fr fluentbit-consumer-sport fluentbit-consumer-sport-stage fluentbit-consumer-sport-iframes; do
docker exec kafka kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--describe --group $group
done

# Kafka troubleshooting
docker logs kafka --tail=50
docker exec kafka kafka-topics.sh --list --bootstrap-server localhost:9092
# Check consumer groups
docker exec kafka kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--list

# Fluent Bit troubleshooting
docker logs fluent-bit --tail=50
# Check Lua script execution
docker logs fluent-bit | grep -i lua
# Check environment label processing
docker logs fluent-bit | grep -i environment

# Loki troubleshooting
curl http://localhost:3100/ready
docker logs loki --tail=50

# Test external Kafka connectivity
telnet YOUR_PUBLIC_IP 19092

Pipeline: Kubernetes → Fluent Bit → Kafka (multi-topic) → Fluent Bit (+ Lua + environment labels) → Loki → Grafana
Key Features:
- Multi-Environment: 6 separate Kafka topics for environment isolation
- Environment Labels: Automatic labeling by environment in Loki
- Kafka: 3 days retention, port 19092 for external producers
- Loki: 7 days retention, 12x compression ratio
- Lua Processing: Automatic log level detection and field cleanup
- Grafana: GitLab OAuth authentication
- Persistent Storage: `/opt/log-stack/data/`
- CI/CD: GitLab + Ansible with smart restart logic (down → up)
- Zero Downtime: Only changed services restart
Active Topics:
`logs` (prod), `logs-staging`, `logs-prod-fr`, `logs-sport`, `logs-sport-stage`, `logs-sport-iframes`
Access:
- Grafana UI: https://loki.oddstech.net (GitLab OAuth)
- All configuration tracked in Git for easy rollbacks