Combined log file import fails if user-agent is empty #681

Closed
ValdikSS opened this issue May 17, 2023 · 4 comments

ValdikSS commented May 17, 2023

goatcounter-v2.4.1 from the releases page does not correctly process a combined-format access log if the user-agent field is empty, i.e. written as just "" rather than "-".

$ echo '1.1.1.1 - - [15/May/2023:00:00:54 +0000] "GET /proxy.pac HTTP/1.1" 200 133 "-" ""' > fail.txt

$ ./goatcounter-v2.4.1-linux-amd64 import -format combined -site http://127.0.0.1:80 fail.txt 

   19:22:31 ERROR: error processing line 1 {error="path: must be set, must be longer than 1 characters." line="1.1.1.1 - - [15/May/2023:00:00:54 +0000] \"GET /proxy.pac HTTP/1.1\" 200 133 \"-\" \"\"" lineno=1}
http://127.0.0.1:80: 400 Bad Request: {
  "errors": {
    "0": "path: must be set, must be longer than 1 characters.\n"
  }
}
arp242 (Owner) commented May 17, 2023

The problem seems to be that the field is "" rather than "-", which is what normally indicates an empty value. The parser scans that field with .+? (that is, at least one character), so an empty "" doesn't match.

As far as I know, a - should always be used to indicate missing data, but none of this is a "real" standard.
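To illustrate why .+? rejects the line, here is a small Go sketch (GoatCounter is written in Go). The two patterns are hypothetical, simplified stand-ins for the end of a combined-format line, not GoatCounter's actual format definitions; they only contrast .+? (one or more characters) with .*? (zero or more), and the helper maps both the conventional "-" and an empty "" to "no user agent".

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical, simplified patterns for the tail of a combined-format line:
//
//	"METHOD PATH PROTO" status size "referrer" "user-agent"
//
// They are NOT GoatCounter's real patterns; they only contrast .+? with .*?.
var (
	// strict requires at least one character inside each quoted field.
	strict = regexp.MustCompile(`"(\S+) (\S+) (\S+)" (\d+) (\d+|-) "(.+?)" "(.+?)"$`)
	// lenient also accepts an empty quoted field such as "".
	lenient = regexp.MustCompile(`"(\S+) (\S+) (\S+)" (\d+) (\d+|-) "(.*?)" "(.*?)"$`)
)

// userAgent extracts the user-agent field (submatch 7) and treats both the
// "-" placeholder and an empty "" as "no user agent".
func userAgent(re *regexp.Regexp, line string) (ua string, ok bool) {
	m := re.FindStringSubmatch(line)
	if m == nil {
		return "", false // the whole line failed to match
	}
	ua = m[7]
	if ua == "-" {
		ua = ""
	}
	return ua, true
}

func main() {
	line := `1.1.1.1 - - [15/May/2023:00:00:54 +0000] "GET /proxy.pac HTTP/1.1" 200 133 "-" ""`
	patterns := []struct {
		name string
		re   *regexp.Regexp
	}{
		{"strict (.+?)", strict},
		{"lenient (.*?)", lenient},
	}
	for _, p := range patterns {
		ua, ok := userAgent(p.re, line)
		fmt.Printf("%s: matched=%v ua=%q\n", p.name, ok, ua)
	}
}
```

Run against the line from the report, the strict pattern does not match at all, while the lenient one matches and yields an empty user agent.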

Was this generated by some common(-ish) software?

ValdikSS (Author) commented May 17, 2023

> Was this generated by some common(-ish) software?

That's a real nginx access log from a real website. Such records are written when the User-Agent header is present but empty (User-Agent: ).

arp242 (Owner) commented May 17, 2023

I could reproduce it like this:

$ nc localhost 8090
GET / HTTP/1.1
User-Agent:

HTTP/1.1 400 Bad Request
...

Gave me:

127.0.0.1 - - [17/May/2023:22:04:40 +0200] "GET / HTTP/1.1" 400 157 "-" ""

The trick is to send a User-Agent header with no value, rather than omitting the header entirely (e.g. curl -H 'User-Agent: ' will simply not send the header at all).

arp242 (Owner) commented May 17, 2023

Ah, it seems you edited your message. Well, we came to the same conclusion 😅

Mostly just wanted to verify it's not an issue with a custom script or the like. I'll get it fixed.

arp242 closed this as completed in 312482b on May 17, 2023