Improve UA parsing #66
Comments
uasurfer also doesn't seem that great; on a few test runs I got a lot of wrong data; see: 4143a04
Another possible project: https://github.com/ua-parser/uap-go
I would seriously recommend still storing the full UA header/string, but maybe store it normalized: for example, have each request reference a row in a UA table to save space. The UA table will probably end up fully cached.
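The normalized layout suggested here can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database; the table names (`user_agents`, `requests`) and helper functions are hypothetical and are not GoatCounter's actual schema.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical normalized schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        create table user_agents (
            id integer primary key,
            ua text not null unique          -- each distinct UA string stored once
        );
        create table requests (
            id integer primary key,
            path text not null,
            user_agent_id integer not null references user_agents(id)
        );
    """)
    return db


def ua_id(db, ua):
    """Return the id for a UA string, inserting it if it is not stored yet."""
    db.execute("insert or ignore into user_agents (ua) values (?)", (ua,))
    row = db.execute("select id from user_agents where ua = ?", (ua,)).fetchone()
    return row[0]


def record_request(db, path, ua):
    """Store a request that references the UA row instead of the full string."""
    db.execute(
        "insert into requests (path, user_agent_id) values (?, ?)",
        (path, ua_id(db, ua)),
    )
```

With this shape, many requests from the same browser share a single row in the UA table, so the table stays small enough to be cached. It addresses the space concern only; the full string is still stored.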
The problem here is a legal/ethical one @ptman, not a performance/space one. Storing the full User-Agent header makes it easier to identify persons based on the statistical data, and I'd like to make that harder when possible. Just "Firefox 72" is both useful and quite anonymous, but
It's ridiculous that they're sending this in the first place, but that's not something in my power to fix. I considered normalizing as well; for example we can probably get away with removing the data between parent ( So in short, I'm not 100% sure yet what the best solution is here.
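Assuming the truncated sentence above refers to dropping the parenthesised segments of the header (where the OS and device details usually sit), that kind of normalization could look roughly like this. The function name `strip_parens` and the sample strings are illustrative only.

```python
import re

# Matches a parenthesised segment plus any whitespace before it.
# Nested parentheses are not handled; this is a sketch, not a full parser.
_PARENS = re.compile(r"\s*\([^()]*\)")


def strip_parens(ua: str) -> str:
    """Remove parenthesised segments and collapse leftover whitespace."""
    without = _PARENS.sub("", ua)
    return re.sub(r"\s+", " ", without).strip()
```

For a Firefox header such as `Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0` this leaves `Mozilla/5.0 Gecko/20100101 Firefox/72.0`. The trade-off is that the operating system is lost along with the device details.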
UA strings are useful for debugging and also for grouping different clients that ignore cookies. It's data sent willingly from the browser, not something you have to go digging around to extract. Operating systems can make a huge difference in browser behaviour. And it's something that by default ends up in httpd logs. I understand the desire for privacy, but I would just store the whole UA string. Especially since they have been tricky to parse in the past and can be tricky to parse in the future.
Yeah, I appreciate there are advantages to storing it as well, which is why that is what GoatCounter is doing now. It's a bit of a tricky balancing act. Aside from what "the right thing" to do is here, there is also the legal aspect to consider; the GDPR specifically mentions:
Does this cover these kinds of User-Agent strings? Possibly.
I don't think most users know that the full device info and language are being sent.
I'm not a GDPR lawyer, but UA strings are OK in logs, AFAIK. GDPR allows processing information for different purposes, one being consent. But logs aren't processed based on consent; it's probably "for legitimate interests of data controller", i.e. technical maintenance, troubleshooting, debugging etc. One could argue that UA strings are an old technical debugging device that helps with maintenance, e.g. identifying scrapers.
Yeah, maybe. I think that with the lack of case law and the inconsistent interpretations right now, no one can really tell for sure how it applies here.
https://groups.google.com/a/chromium.org/forum/m/#!msg/blink-dev/-2JIRNMWJ7s/yHe4tQNLCgAJ I was aware of client hints, but no idea things were going to move this fast...
UA strings were never reliable and will not be very relevant in the near future.
Just because it's not 100% reliable doesn't mean it's not useful. It's mostly accurate and gives a good indication of which browsers people are using, which is useful in making decisions about browser support and the like. I don't know what the future will hold. I know about Google's recently announced plans (linked above) but older browsers won't implement that, and it's especially useful to see if people are using older browsers. I suspect it will still be useful for several years to come.
github.com/mssola/user_agent isn't always reliable. I took a look into fixing it, but it's not so easy.
I just noticed there's also https://github.com/avct/uasurfer, which may give better results.
This would also allow storing just the calculated result ("Firefox 70.0") instead of the full UA string, which sometimes contains quite a lot of information.
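As a rough illustration of storing only the calculated result, the sketch below reduces a header to a "name major.minor" label. It is a deliberately naive regex approach written for this example, not how mssola/user_agent, uasurfer, or uap-go work internally, and the token table covers only a handful of browsers.

```python
import re

# Order matters: Chrome-based browsers also advertise "Safari", and
# Edge and Opera also advertise "Chrome", so more specific tokens come first.
_TOKENS = [
    ("Firefox", re.compile(r"Firefox/(\d+)(?:\.(\d+))?")),
    ("Edge", re.compile(r"Edge?/(\d+)(?:\.(\d+))?")),
    ("Opera", re.compile(r"OPR/(\d+)(?:\.(\d+))?")),
    ("Chrome", re.compile(r"Chrome/(\d+)(?:\.(\d+))?")),
    ("Safari", re.compile(r"Version/(\d+)(?:\.(\d+))?.*Safari/")),
]


def summarize(ua: str) -> str:
    """Return a short label such as 'Firefox 70.0', or 'Unknown'."""
    for name, pattern in _TOKENS:
        m = pattern.search(ua)
        if m:
            major, minor = m.group(1), m.group(2) or "0"
            return f"{name} {major}.{minor}"
    return "Unknown"
```

Only the short label would then need to be stored with each request, which keeps the browser statistics while discarding the device-specific parts of the header.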