# docker-compose - reading gz logs + normal logs? #2333
Workaround:

---

Awesome, thanks a lot for sharing those findings.

---
Hey, this does not work as I expected it to. It keeps accumulating logs and adding them together, instead of skipping the ones already added. E.g., when this container's entrypoint.sh runs again in 24 hours: 30 requests. Not sure if I'm explaining this correctly, but it seems I'm somehow just compounding everything instead of ignoring logs already in the db.

---
And something seems very odd: since I'm doing it like this, it just keeps incrementing the logs every time it runs (every 30 seconds, for about +2500 requests), which makes no sense since the logs are test files which don't change in content.

docker-compose.yml:

```yaml
  goaccess:
    image: allinurl/goaccess:latest
    restart: unless-stopped
    entrypoint: /entrypoint.sh
    container_name: goaccess-test
    hostname: goaccess-test
    volumes:
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
      - ./entrypoint.sh:/entrypoint.sh:ro
      - ./read.sh:/read.sh:ro
      - ./conf:/srv/conf:ro
      - /mnt/traefik-logs/:/srv/traefik-logs/:ro
      - data-test:/srv/data
      - report-test:/srv/report
```

global.conf:

read.sh:

```sh
#!/usr/bin/env sh
cat /srv/traefik-logs/test.log
zcat /srv/traefik-logs/test.log.1.gz
```

entrypoint.sh:

```sh
#!/bin/sh
for service in global; do
    if [ ! -d /srv/data/$service ]; then
        mkdir /srv/data/$service
    fi
    if [ ! -d /srv/report/$service ]; then
        mkdir /srv/report/$service
    fi
    case $service in
        global)
            sh read.sh | goaccess -p /srv/conf/default.conf --db-path /srv/data/global -o /srv/report/global/index.html -
            ;;
    esac
done
sleep 30s
```

---
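A likely cause of the compounding (my reading of the symptom, not confirmed at this point in the thread): every run pipes the complete log files into a persisted db, so each pass re-adds every line that was already counted. Feeding only lines appended since startup, e.g. via `tail -n0`, avoids that. A minimal sketch of the difference, using a plain text file and `wc -l` as an illustrative stand-in for the parser:

```shell
#!/bin/sh
# Demonstrates why re-reading a whole log compounds counts, with
# wc -l standing in for "parse into a persistent goaccess db".
set -eu
log=$(mktemp)
printf 'req1\nreq2\nreq3\n' > "$log"

# Two passes over the full file count every line twice: 3 + 3 = 6,
# even though nothing new was logged in between.
total=$(( $(wc -l < "$log") + $(wc -l < "$log") ))
echo "full re-reads count: $total"      # 6

# tail -n0 emits none of the existing lines, so a second pass adds 0.
incremental=$(( $(wc -l < "$log") + $(tail -n0 "$log" | wc -l) ))
echo "incremental count: $incremental"  # 3
rm -f "$log"
```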
So a simplified version of what I'm looking for is:

Is this doable?

---

I can't seem to figure out a way to juggle gz + regular files without missing something in the report.

- If I first read all gz files and persist them, ...
- If I first read all gz files + access.log and persist them, ...

Ah, so while writing this a solution came to my mind:

---
Ugh, I busted my balls, but this might be a decent solution:

docker-compose.yml:

```yaml
version: "3"

volumes:
  data:
  report:
  dangling-volume:

services:
  goaccess:
    image: allinurl/goaccess:latest
    restart: unless-stopped
    entrypoint: /entrypoint.sh
    container_name: goaccess
    hostname: goaccess
    ports:
      - "20000:20000"
      - "20001:20001"
      # etc...
    volumes:
      - ./read.sh:/read.sh:ro
      - ./conf:/srv/conf:ro
      - /traefik-logs/:/srv/traefik-logs/:ro
      - data:/srv/data
      - report:/srv/report
      - dangling-volume:/var/www/goaccess
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
      - ./entrypoint.sh:/entrypoint.sh:ro
```

read.sh:

```sh
#!/usr/bin/env sh
#for file in $(find "$log_dir" -type f -name "access.log.*.gz" -printf "%T+\t%p\n" | sort | awk '{print $2}'); do # no -printf on busybox wtf
for file in $(find "$log_dir" -type f -name "access.log*.gz" -exec stat -c "%Y %n" {} \; | sort | awk '{print $2}'); do
    zcat "$file"
done
cat "$log_dir"/access.log
```

entrypoint.sh:

```sh
#!/bin/sh
## find out how to persist gz logs and continue reading new ones
# first read gz, persist
# then restore+persist with tail -n0 regular log file
export db_dir="/srv/data"
export report_dir="/srv/report"
export conf_dir="/srv/conf"
export log_dir="/srv/traefik-logs"
initialize=0
#sleep 1h

query_logs () {
    local service=$1
    local config=$2
    local query=$3
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Initializing $service: $config config: query - \"$query\""
    if [ -z "$query" ]; then
        sh /read.sh | goaccess \
            --config-file=$conf_dir/$config.conf \
            --db-path=$db_dir/$service \
            --output=$report_dir/$service/index.html - > /dev/null 2>&1
    else
        sh /read.sh | egrep "$query" | goaccess \
            --config-file=$conf_dir/$config.conf \
            --db-path=$db_dir/$service \
            --output=$report_dir/$service/index.html - > /dev/null 2>&1
    fi
    initialize=0
}

start_live_query() {
    local service=$1
    local config=$2
    local query=$3
    local port=$4
    localdomain=your.domain
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Starting WSS for $service, on port $port"
    echo
    if [ -z "$query" ]; then
        tail -f -n0 $log_dir/access.log | goaccess \
            --ws-url=wss://$service.$localdomain:443 --port=$port --real-time-html --restore \
            --config-file=$conf_dir/$config.conf \
            --db-path=$db_dir/$service \
            --output=$report_dir/$service/index.html - > /dev/null 2>&1 &
    else
        tail -f -n0 $log_dir/access.log | egrep "$query" | goaccess \
            --ws-url=wss://$service.$localdomain:443 --port=$port --real-time-html --restore \
            --config-file=$conf_dir/$config.conf \
            --db-path=$db_dir/$service \
            --output=$report_dir/$service/index.html - > /dev/null 2>&1 &
    fi
}

for service in global service1; do # service2 etc....
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [+] Configuring $service"
    if [ ! -d "$db_dir"/$service ]; then
        echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Didn't find "$db_dir"/$service. Creating..."
        mkdir "$db_dir"/$service
        initialize=1
    fi
    if [ ! -d "$report_dir"/$service ]; then
        echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Didn't find "$report_dir"/$service. Creating..."
        mkdir "$report_dir"/$service
        initialize=1
    fi
    # if the container was down, there will be missed logs since the dirs exist
    # and the container missed live parsing the actual logs; that is okay because
    # it won't change the statistics much. a clean run will fix those issues --
    # maybe once a month backup/delete the volumes and restart the stack?
    case "$service" in
        global)
            if [ "$initialize" -eq 1 ]; then
                query_logs $service global ""
            fi
            start_live_query $service global "" 20000
            ;;
        service1)
            if [ "$initialize" -eq 1 ]; then
                query_logs $service default "traefik-router-query-for-service-1"
            fi
            start_live_query $service default "traefik-router-query-for-service-1" 20001
            ;;
        #service2)
        #;; # etc....
    esac
done

echo "$(date +"[%Y-%m-%d %H:%M:%S]") [+] All done, now sleeping till the end of time.."
sleep infinity
```

---
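The read.sh above orders rotated archives by modification time because busybox `find` lacks `-printf`; `stat -c "%Y %n"` prints the epoch mtime followed by the file name, so a plain `sort` yields oldest-first. A self-contained check of that ordering (the temporary directory and file names are illustrative):

```shell
#!/bin/sh
# Verify that `find -exec stat -c "%Y %n" | sort` yields oldest-first
# ordering -- the same trick read.sh uses instead of GNU find's -printf.
set -eu
dir=$(mktemp -d)
touch -d '2020-01-03' "$dir/access.log.1.gz"
touch -d '2020-01-01' "$dir/access.log.3.gz"
touch -d '2020-01-02' "$dir/access.log.2.gz"

# epoch timestamps sort correctly as text here (all same width)
ordered=$(find "$dir" -type f -name "access.log*.gz" \
    -exec stat -c "%Y %n" {} \; | sort | awk '{print $2}')
echo "$ordered"   # oldest archive (.3.gz) first, newest (.1.gz) last
rm -rf "$dir"
```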
Like you said, I'd first read the gz files and persist them, then I'd simply run the uncompressed access log directly with goaccess using

---
Hey @allinurl, so that's exactly what I did, I just forgot to paste my config file. Basically, goaccess should persist in both cases, whether reading old gz logs (the query_logs function) or running in live mode and reading the current day's log (the start_live_query function); it also restores data when running in live mode, which is configured with the CLI flag.

Except there's one issue with this approach: goaccess doesn't know when logrotate rotates the current log, so it doesn't know to start reading the new file. Basically from what I posted above

And since my docker machine is not the machine running traefik, I'll have to figure out a way to get around this. My initial thought was to simply restart the goaccess stack at a certain time, though I'm not sure if logrotate runs at exactly the same time, every time...

---
Do you know if you are truncating the log upon rotation? Unless

---
Here's my logrotate.d/traefik file:

The thing is, the log files are not within FS boundaries, because the entire log directory is mounted via NFS.

---
That could be an issue. I'd run a test by rotating the log manually and see if the inode changes, e.g.:

---

```sh
$ /bin/ls -lathi access.log
392838 -rw-r----- 1 321 321 13M Dec 6 11:17 access.log
$ logrotate -f /etc/logrotate.d/traefik
$ /bin/ls -lathi access.log
391327 -rw-r----- 1 321 321 2.9K Dec 6 11:21 access.log
```

Uhm, Houston, we have a problem. The inode changed on the traefik machine, where the log is located and rotated (meaning the inode also changed on the goaccess machine; they match). I don't think sharing the log directory with the goaccess machine via NFS presents an issue here. This seems to be something logrotate related, or perhaps traefik doing something weird with that USR1 signal, which according to their documentation should be used: https://doc.traefik.io/traefik/observability/access-logs/#log-rotation

---
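A rotation that moves the file aside and creates a fresh one necessarily gives the new log a new inode, which is exactly what breaks a long-running reader holding the old file open. Truncating the file in place keeps the inode. A self-contained demonstration of both styles (all paths are temporary and illustrative):

```shell
#!/bin/sh
# Show that rename-style rotation changes a log's inode while
# truncate-in-place keeps it.
set -eu
dir=$(mktemp -d)
log="$dir/access.log"
echo "line" > "$log"
before=$(stat -c %i "$log")

# rename-style rotation: move the old file aside, create a new one
mv "$log" "$log.1"
echo "line" > "$log"
after_rename=$(stat -c %i "$log")

# truncate-in-place: the same file object survives, only its size resets
: > "$log"
after_trunc=$(stat -c %i "$log")

[ "$before" != "$after_rename" ] && echo "rename changed the inode"
[ "$after_rename" = "$after_trunc" ] && echo "truncate kept the inode"
rm -rf "$dir"
```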
It seems I didn't quite understand your previous comment. Adding

---

Yep, you want to preserve the inode number so goaccess knows where to start again. Glad that did the job. Closing this. Feel free to reopen it if needed.

---
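For context, the logrotate directive that preserves the inode is `copytruncate`: it copies the log to the rotated name and then truncates the original in place, so the file a long-running reader has open survives rotation. A hedged sketch of such a stanza (the path and retention values are illustrative, not taken from the thread):

```
/var/log/traefik/access.log {
    daily
    rotate 7
    compress
    # copy the log aside, then truncate the original in place,
    # keeping the inode stable for long-running readers
    copytruncate
}
```

One caveat worth knowing: lines written between the copy and the truncate can be lost, which logrotate's own documentation notes.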
Here's the latest version of my entrypoint:

```sh
#!/bin/sh
# first read gz, persist
# then restore+persist and tail regular access log
export db_dir="/srv/data"
export report_dir="/srv/report"
export conf_dir="/srv/conf"
export log_dir="/srv/traefik-logs"
initialize=0

query_logs () {
    local service=$1
    local config=$2
    local query=$3
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Querying existing logs for: $service"
    if [ -z "$query" ]; then
        sh /read.sh | goaccess \
            --config-file=$conf_dir/$config.conf \
            --db-path=$db_dir/$service \
            --output=$report_dir/$service/index.html - > /dev/null 2>&1
    else
        sh /read.sh | egrep "$query" | goaccess \
            --config-file=$conf_dir/$config.conf \
            --db-path=$db_dir/$service \
            --output=$report_dir/$service/index.html - > /dev/null 2>&1
    fi
    initialize=0
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [+] Query complete"
}

start_live_query() {
    local service=$1
    local config=$2
    local query=$3
    local port=$4
    localdomain=your.domain
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Starting WSS for $service, on port $port"
    if [ -z "$query" ]; then
        tail -f -n0 $log_dir/access.log | goaccess \
            --ws-url=wss://$service.$localdomain:443 --port=$port --real-time-html --restore \
            --config-file=$conf_dir/$config.conf \
            --db-path=$db_dir/$service \
            --output=$report_dir/$service/index.html - > /dev/null 2>&1 &
    else
        tail -f -n0 $log_dir/access.log | egrep "${query}" | goaccess \
            --ws-url=wss://$service.$localdomain:443 --port=$port --real-time-html --restore \
            --config-file=$conf_dir/$config.conf \
            --db-path=$db_dir/$service \
            --output=$report_dir/$service/index.html - > /dev/null 2>&1 &
    fi
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") --------------------------------------------"
}

initialize_router() {
    service=$1
    config=$2
    query=$3
    port=$4
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [+] Initializing $service statistics"
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] GoAccess config: \"$config\""
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Regex Query: \"$query\""
    if [ "$initialize" -eq 1 ]; then
        query_logs ${service} ${config} ${query}
    fi
    start_live_query ${service} ${config} ${query} ${port}
}

find "$report_dir" -maxdepth 1 -type f -name "*.html" -exec rm {} +

for service in global your-service-1 your-service-2; do
    echo "$(date +"[%Y-%m-%d %H:%M:%S]") [+] Configuring $service"
    if [ ! -d "$db_dir"/$service ]; then
        echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Creating "$db_dir"/$service."
        mkdir "$db_dir"/$service
        initialize=1
    fi
    if [ ! -d "$report_dir"/$service ]; then
        echo "$(date +"[%Y-%m-%d %H:%M:%S]") [*] Creating "$report_dir"/$service."
        mkdir "$report_dir"/$service
        initialize=1
    fi
    case "$service" in
        # internal services, api/synapse-admin/mailcow/
        global)
            router='(int|ext)-router' # supports basic "or" regex, but not ".*" wildcards
            config=global
            port=20000
            ;;
        your-service-1)
            router='service-1-traefik-router-name'
            config=default
            port=20001
            ;;
        your-service-2)
            router='service-2-(traefik-)?-router-name'
            config=default
            port=20002
            ;;
    esac
    initialize_router ${service} ${config} ${router} ${port}
done

echo "$(date +"[%Y-%m-%d %H:%M:%S]") [+] All done, now sleeping till the end of time.."
sleep infinity
```

---
Awesome, thanks for sharing that!

---
Hello,

I'm trying to read `.gz` and regular traefik logs while running goaccess inside docker-compose, but I can't seem to figure out a way to accomplish this.

docker-compose.yml:

So the idea is -

goaccess.conf:

Doing the following commands on my debian desktop gives me 2M total requests. The first command yields 1.1M, the second one adds another 0.9M. This doesn't make sense for the following reason: all logs together should be around 5.69M requests. Unfortunately, when goaccess parses only the `.gz` logs (first command), the number of requests shown on the output report is 1.1M. Strangely, when I run the second command, which should only parse `access.log`, it adds another 0.9M instead of 0.57M, which is the actual amount of requests in `access.log`.

Commenting out `#real-time-html true` parses the gz logs correctly for some reason (5.7M). Yeah, and same for the regular `access.log` (0.57M). Man, fuck this flag, I've legit been busting my balls for the past 6 hours testing and documenting all this just for it to work now. Fuck.

![image](https://user-images.githubusercontent.com/59068073/171732893-b3a34447-898c-466d-b5ff-d46179187d19.png)

Running `docker-compose up -d` with the first command (which should yield the same output as on my host) only parses 79k requests, which makes absolutely 0 sense. You'd think it perhaps parsed a single log, but that doesn't seem to be the case.

All in all, I'm simply trying to read week-old logs which are in gz format, just to save them to the goaccess db, and after that keep goaccess running as a container and keep parsing logs from `access.log` regularly. Though I also don't understand how often it automatically re-parses the current file for updates?

And also, how would this best be implemented with `logrotate`, since `access.log` rotates daily? Perhaps force an `access.log` parse before rotating?