parsing stops at the first empty log file #1683

Open
gerhard-tinned opened this issue Feb 26, 2020 · 4 comments

@gerhard-tinned commented Feb 26, 2020

I am using goaccess version 1.2, which I know is not the latest version. Still, I have not found an issue about this, not even a closed one, so I was wondering whether I am doing something wrong.

I use goaccess to generate a summary of the current logs of all my webhosts, like this:

goaccess -f /home/webhost1/log/*access*.log /home/webhost2/log/*access*.log -o /path/to/report.html --date-spec=hr --hour-spec=min

I noticed that after logrotate was triggered (some logs stay empty for a while), processing stopped at the first empty log file. The output shows the name of the empty log file and nothing more; there is no error.

When I use find with the -not -empty parameter to filter out the empty files, it works as expected. If no empty log files are passed via the -f argument, it also works as expected.

I know version 1.2 is not the latest. I was wondering if this still exists in version 1.3.

@allinurl (Owner) commented Feb 28, 2020

Interesting. I can't tell you whether this is an issue in v1.3 until I can run some sort of test. Are you able to reproduce this without the automatic rotation? Thanks

@gerhard-tinned (Author) commented Feb 28, 2020

Short answer: I can trigger that behaviour.

Long answer: my setup looks like this. I have a very simple shell script that generates the goaccess HTML reports. The procedure is as follows:

1.) I loop over all the webhosts and define the path to the log files like this:
"/path/to/$WEBHOST/logs/.access.log-"
That pattern with "log-" at the end only matches logs that have already been rotated.

2.) I use that pattern to start goaccess with "-o /.../hist.html", --db-path, --keep-db-files and --load-from-disk.
As I understand it, this loads the existing database into goaccess, parses the log files, and keeps the database files on disk. At this point I found that you have done a GREAT job of recognizing files that were already loaded. That reduces the complexity of my simple shell script.

3.) I get the path to the current log files (the not-yet-rotated ones) like this:
"/path/to/$WEBHOST/logs/*.access.*log"
This matches only the current, unrotated log. Since this log file changes with every request, I cannot load it into the same database, because the data would be loaded multiple times.

4.) So I generate a separate report for them using "-o /.../index.html", --date-spec=hr and --hour-spec=min.
I do not keep or load the database, as it would only show garbage. I use --date-spec=hr and --hour-spec=min to get more detail in the report for the current log file.

I noticed the problem when creating the report for the current log file using:

goaccess -f /../*access.*.log /../*access.*.log -o /../index.html --log-format=COMBINED --date-spec=hr --hour-spec=min

One of the files matched by the -f parameter was empty (file size 0 bytes). goaccess starts, prints the log file name to the console, and exits without a real error message. What I noticed, though, was that the report file (/../index.html) was not even created. I investigated and found that the empty file caused processing to terminate.

Current workaround:

When I use find to expand the path/file pattern and filter out empty files, the report file is created and works as expected.

find /../*access*.log -not -empty
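The same workaround can also live directly in a report script without relying on find's -empty test. Below is a minimal POSIX sh sketch, assuming the logs match a placeholder glob /path/to/logs/*access*.log and that log file names contain no newlines; the helper name nonempty_logs and both paths are hypothetical, and only goaccess flags already used in this issue are passed.

```shell
#!/bin/sh
# Minimal POSIX sh sketch of the workaround: drop zero-byte logs before
# goaccess sees them. The glob and the output path are placeholders.

# Hypothetical helper: print each argument that is a regular file larger
# than 0 bytes, one per line (comparable to find's -type f -not -empty).
nonempty_logs() {
  for f in "$@"; do
    if [ -f "$f" ] && [ -s "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Rebuild the positional parameters from the non-empty logs only, so file
# names with spaces survive (names containing newlines are not supported).
# A glob that matches nothing stays literal, fails the -f test and is dropped.
set --
while IFS= read -r log; do
  if [ -n "$log" ]; then
    set -- "$@" "$log"
  fi
done <<EOF
$(nonempty_logs /path/to/logs/*access*.log)
EOF

# Same argument shape as the commands in this issue (first log follows -f),
# and goaccess is started only if at least one non-empty log is left.
if [ "$#" -gt 0 ]; then
  goaccess -f "$@" -o /path/to/index.html --log-format=COMBINED \
    --date-spec=hr --hour-spec=min
fi
```

The "$#" guard avoids starting goaccess with no log file at all on runs where every current log is still empty after rotation.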

I was able to test this quite extensively, as it happened on every run (I have a webhost with almost no traffic at all; the reason does not matter, I guess).

I guess with these details it should be easy to check whether the issue still exists in version 1.3, right?

@gerhard-tinned (Author) commented Mar 2, 2020

Reproducible with the following test procedure using goaccess 1.2:

  1. Place two log files into a directory. Both contain some Apache access log lines.
  2. Run goaccess to verify the correct behaviour -> OK
    goaccess -f *.log -o working.html --log-format=COMBINED --date-spec=hr --hour-spec=min

  3. Create an empty file matching the input file pattern:
    touch some.webserver.log

  4. Run the goaccess command again with a different output filename:
    goaccess -f *.log -o failed.html --log-format=COMBINED --date-spec=hr --hour-spec=min
    

The result is one generated file called "working.html"; no file named "failed.html" is created. The second run prints "some.webserver.log]" before goaccess terminates.

This test used version 1.2. Since CentOS 7 still uses 1.2, I am still on version 1.2 of GoAccess.

@allinurl (Owner) commented Mar 2, 2020

Thanks for sharing those details. Let me look into this and I'll post back.
