Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added uuidv4 and now() in RFC3339 UTC Format #1645

Merged
merged 44 commits into from
Oct 20, 2017
Merged

Conversation

spagno
Copy link
Contributor

@spagno spagno commented Oct 17, 2017

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    feature

  • What is the related issue number (starting with #)

  • What is the current behavior? (You can also link to an open issue here)
    There isn't the possibility to configure acces_log_format with uuidv4 and timestamp in rfc3339 format

  • What is the new behavior (if this is a feature change)?
    You can use "%(i)s" to have a unique request_id in the log
    You can use "%(z)s" to have a UTC rfc3339 timestamp in the log

  • Other information:
    first PR ever, if something is wrong please don't be mean 🗡️

@spagno spagno closed this Oct 17, 2017
@webknjaz
Copy link
Member

Hi @spagno,

What did you try to do here? What went wrong? Do you need any assistance with this?

@webknjaz
Copy link
Member

webknjaz commented Oct 17, 2017

If you're concerned about broken linters, you just need to fix this:
https://travis-ci.org/cherrypy/cherrypy/jobs/289121270#L715

And then pushing new commit to the branch should fix that. You'll need to reopen this PR in such case.

@spagno
Copy link
Contributor Author

spagno commented Oct 17, 2017

thank you very much, I'm trying to understand why I broke that

@spagno
Copy link
Contributor Author

spagno commented Oct 17, 2017

Understood, I'll reopen the PR

@spagno spagno reopened this Oct 17, 2017
@codecov
Copy link

codecov bot commented Oct 17, 2017

Codecov Report

Merging #1645 into master will increase coverage by 0.57%.
The diff coverage is 89.28%.

@@            Coverage Diff             @@
##           master    #1645      +/-   ##
==========================================
+ Coverage   76.99%   77.57%   +0.57%     
==========================================
  Files         106      106              
  Lines       14313    14365      +52     
==========================================
+ Hits        11021    11143     +122     
+ Misses       3292     3222      -70

@webknjaz
Copy link
Member

Oh, just follow the link and you'll see:

Fix double quoted strings................................................Failed
hookid: double-quote-string-fixer
Files were modified by this hook. Additional output:
Fixing strings in cherrypy/_cplogging.py

Which means that you used double quotes for strings, while the convention is to use single quotes.

You also can pip install pre-commit locally + do pre-commit run --all-files manually if needed :)

@webknjaz
Copy link
Member

@spagno may I expect any tests for this additionally?

@spagno
Copy link
Contributor Author

spagno commented Oct 17, 2017

@webknjaz yes, I'm working on it. Just don't know if test all parameters (creating a new testing def) or just the new ones

@webknjaz
Copy link
Member

@spagno both cases are fine.
I'd be thankful for testing just your changes, but if you're able to cover more — that would be fantastic :)

@spagno
Copy link
Contributor Author

spagno commented Oct 17, 2017

@webknjaz well I'll try!

@spagno
Copy link
Contributor Author

spagno commented Oct 18, 2017

seems like python in windows manages log differently from linux; can't run appropriate tests on windows, I have to delete the tests (for now) and reimplement them

@cherrypy cherrypy deleted a comment Oct 18, 2017
@cherrypy cherrypy deleted a comment Oct 18, 2017
@cherrypy cherrypy deleted a comment Oct 18, 2017
@cherrypy cherrypy deleted a comment Oct 18, 2017
@cherrypy cherrypy deleted a comment Oct 18, 2017
@cherrypy cherrypy deleted a comment Oct 18, 2017

import cherrypy
import six
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't move it here, as it's a third-party. It must remain between stdlib imports and the project imports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, it was the visualstudio code import sorter, I'll not use it anymore

@@ -216,7 +215,8 @@ def error(self, msg='', context='', severity=logging.INFO,
if traceback:
exc_info = _cperror._exc_info()

self.error_log.log(severity, ' '.join((self.time(), context, msg)), exc_info=exc_info)
self.error_log.log(severity, ' '.join(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it's nicer to split lines between arguments of log() method, not inside argument expressions.
Also please not that we follow 120-chars line restrictions (there's no 80-char terminals nowadays), so you don't really have to split it.

@@ -262,6 +262,8 @@ def access(self):
'f': dict.get(inheaders, 'Referer', ''),
'a': dict.get(inheaders, 'User-Agent', ''),
'o': dict.get(inheaders, 'Host', '-'),
'i': str(uuid.uuid4()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly do you need a new uuid for each access call shouldn't it be per-request?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because if you want to set different logformat with cherrypy._cplogging.LogManager.access_log_format(format)
if you want to have unique uuid for each request I thougth that could be the right place to set the uuid.
Use case:
I'm developing a simple semaphore dashboard for our CI; I want that the dashboard logs would be compliant with our best practices: in my case the configuration would be:
cherrypy._cplogging.LogManager.access_log_format('{"request_id":"%(i)s", "user-agent":"%(a)s", "timestamp": "%(z)s", "referrer":"%(f)s", "status":%(s)s, "remote_ip":"%(h)s"}')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that it would call uuid.uuid4() every time access() is called if I remember correctly, meaning that there can be several different UUIDs generated per request if you'd call access() more then once. Right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've explored prior art and found out that @openstack uses %(uuid)s format string for this, so I'd like to follow the same convention.
Do you know any project using %(i)s for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, uuid is generally custom (for example nginx) and in some cases it isn't supported and you have to do workaround for it (in haproxy you have to use a lua script for that). I decided to use "i" because there other atoms are single char, to mantain a sort of "convention"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(applies to both %(i)s and %(z)s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, for that reason I added only the atoms without touching the default access_log_format which is in standard Apache format

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they are still not lazy, meaning we waste CPU time calculating them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, I'll fix it, thanks for the tips and sorry for the poor start quality: like I said it's my first public PR and I'm a sysadmin not a developer :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries @spagno, I'm not judging. I'm here to help and walk you through the process :) You're doing great!

@@ -262,6 +262,8 @@ def access(self):
'f': dict.get(inheaders, 'Referer', ''),
'a': dict.get(inheaders, 'User-Agent', ''),
'o': dict.get(inheaders, 'Host', '-'),
'i': str(uuid.uuid4()),
'z': self.time_z(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about exposing these atoms to error log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, I'll check how to do it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the code and I'm in a little in trouble: in the error_log the atoms aren't used at all, is that right? If it's right I think that adding this support could be out of scope of this PR and I can work on that with a different PR if there's the interest about that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, supporting these atoms we allow users use them if they want to change that error_log format template. Okay, you can work on this separately.

@@ -4,7 +4,6 @@
import time

import six

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, there must be separation between stdlib and project imports

@webknjaz
Copy link
Member

webknjaz commented Oct 18, 2017

@spagno please don't remove tests w/o review as it's harder for me to find them in history and nearly impossible to comment in a nice way.

I've looked at the failed test and it turns out that your assertValidUUIDv4 helper has this data var:

data = [b'127.0.0.1 - - [18/Oct/2017:01:46:50] "GET /as_yield HTTP/1.1" 200 7 "" ""\r\n', b"b'test suite marker: '1508291210....RER" "USERAGENT" HOST\r\n', b"b'test suite marker: '1508291210.6715672\n", b'21afb35b-518c-4c20-a745-c6c071ba82c8\r\n']

And then at the first iteration of the loop you throw an error, while the UUID is the 3rd element.

If what I see in log is correct, you could improve it via smth like this:

...
data = [chunk.decode('utf-8').rstrip('\n').rstrip('\r') for chunk in data]
for log_chunk in data:
    try:
        uuid_log = data[-1]
        uuid_obj = UUID(uuid_log, version=4)
    except (TypeError, ValueError):
        msg = '%r not a valid UUIDv4' % uuid_log
        self._handleLogError(msg, data, marker, lines)
    else:
        if str(uuid_obj) != uuid_log:
            msg = '%r not a valid UUIDv4' % uuid_log
            self._handleLogError(msg, data, marker, lines)

@@ -312,6 +314,11 @@ def time(self):
return ('[%02d/%s/%04d:%02d:%02d:%02d]' %
(now.day, month, now.year, now.hour, now.minute, now.second))

def time_z(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spagno plz turn this into @staticmethod

@spagno spagno force-pushed the master branch 3 times, most recently from 1cc3046 to d60b61f Compare October 18, 2017 14:50

import cherrypy

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line

from cherrypy import _cperror
from cherrypy._cpcompat import ntob


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore this line


import cherrypy
from cherrypy import _cperror
from cherrypy._cpcompat import ntob


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore this line

@webknjaz
Copy link
Member

@spagno I've pushed improvements and everything is likely to work well for now. Still, I'd like to better test time_z (maybe use spy object for that).

@webknjaz webknjaz merged commit 122a484 into cherrypy:master Oct 20, 2017
webknjaz added a commit that referenced this pull request Oct 20, 2017
PR #1645 by @spagno:

* ``i`` holds a per-request UUID4
* ``z`` outputs UTC time in format of RFC 3339
* ``cherrypy._cprequest.Request.unique_id.uuid4`` now has lazily
  invocable UUID4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants