-
Notifications
You must be signed in to change notification settings - Fork 0
Config class and S3 clients #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: * The application needs a class to manage config values and a stream variable to store the application logs for an email sent at the end of each run. How this addresses that need: * Add Config class with corresponding unit tests and fixtures for the methods * Update cli.py with calls to Config methods and stream variable * Update README.md * Update pyproject.toml Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1108
Why these changes are being introduced: * An S3 client is needed for submission metadata and content files How this addresses that need: * Add S3Client class with corresponding unit tests and fixtures for the methods * Update dependencies Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1099
| message = f"Missing required environment variables: {', '.join(missing_vars)}" | ||
| raise OSError(message) | ||
|
|
||
| def configure_logger( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a combination of configure_logger from wiley-deposits and our template config code, there's some repetition and I have a gut feeling there's a better way to do this but didn't land on anything specific, curious about your thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a possible refactor, removing some redundancies:
def configure_logger(
self, logger: logging.Logger, stream: StringIO, *, verbose: bool
) -> str:
logging_format_base = "%(asctime)s %(levelname)s %(name)s.%(funcName)s()"
logger.addHandler(logging.StreamHandler(stream))
if verbose:
log_method, log_level = logger.debug, logging.DEBUG
template = logging_format_base + " line %(lineno)d: %(message)s"
for handler in logging.root.handlers:
handler.addFilter(logging.Filter("dsc"))
else:
log_method, log_level = logger.info, logging.INFO
template = logging_format_base + ": %(message)s"
logger.setLevel(log_level)
logging.basicConfig(format=template)
logger.addHandler(logging.StreamHandler(stream))
log_method(f"{logging.getLevelName(logger.getEffectiveLevel())}")
return (
f"Logger '{logger.name}' configured with level="
f"{logging.getLevelName(logger.getEffectiveLevel())}"
)At a glance, what makes this hard to refactor are the tests:
test_configure_logger_not_verbosetest_configure_logger_verbose
which attempt to read from the stream handler that is setup. That's what the log_method, log_level = do above, is get the appropriate logging method to use for log_method(f"{logging.getLevelName(logger.getEffectiveLevel())}") which satisfies these tests.
My feeling: good enough for now, but globally I think we could improve our logging setups. The presence of logging.basicConfig() is kind of a red flag unto itself. But, I think this is mostly consistent with other projects, so value in that.
For example: I'm not sure that returning a string is ideal. Perhaps this method could just log what it has done, and return None. But this would require updating the CLI, tests, etc. that are expecting a string response. And as mentioned, this is consistent with other apps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need review this at a deeper level but figured I'd pass along a discussion @ghukill and I had on a previous PR I worked on re: excluding configure_sentry and configure_logger from the Config class:
From @ghukill :
- I like having a Config class that I can put complex behavior on for static-esque values I want to access through the application
- I'm indifferent to configure_logger and configure_sentry on there / they feel somewhat disconnected from the essence of the application, which I attribute to the Config class. Especially sentry. Logging is a little more difficult for me... but then again logging is always kind of difficult. \ Of course, if those methods (
configure_sentryandconfigure_logger) are on it, they don't need to be called.... but if I'm not supposed to call configure_sentry from anywhere in the app, then maybe it doesn't belong on that global Config object?
This is 5:00pm talking, no much reasoning, but just my knee-jerk thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great refactor, thanks! I was hoping this would prompt a larger conversation about the template/config class because I think it needs work as well, maybe a future DataEng meeting topic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a future DataEng topic could be good, for sure. But I think it could/should extend longer than 30 minute meeting w/ CB.
I have two somewhat conflicting feelings at the moment re: config and logging:
- it's okay for projects to tweak our established approach as needed
- it's okay if slightly awkward or sub-optimal insofar as it's consistent with other repos
I think it'd be easy for any one project to grind to a halt trying to get it perfect, when we'll always have tension with, "but others don't do it that way..."
Until we do take a long and thoughtful look at our config + logging approach for python projects, I'm pretty okay with allowing projects to try and strike a balance between feelings #1 and #2 above.
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of comments and a suggestion for a docstring update, but none blocking! Looking like good foundation until the more interesting bits come.
| message = f"Missing required environment variables: {', '.join(missing_vars)}" | ||
| raise OSError(message) | ||
|
|
||
| def configure_logger( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a possible refactor, removing some redundancies:
def configure_logger(
self, logger: logging.Logger, stream: StringIO, *, verbose: bool
) -> str:
logging_format_base = "%(asctime)s %(levelname)s %(name)s.%(funcName)s()"
logger.addHandler(logging.StreamHandler(stream))
if verbose:
log_method, log_level = logger.debug, logging.DEBUG
template = logging_format_base + " line %(lineno)d: %(message)s"
for handler in logging.root.handlers:
handler.addFilter(logging.Filter("dsc"))
else:
log_method, log_level = logger.info, logging.INFO
template = logging_format_base + ": %(message)s"
logger.setLevel(log_level)
logging.basicConfig(format=template)
logger.addHandler(logging.StreamHandler(stream))
log_method(f"{logging.getLevelName(logger.getEffectiveLevel())}")
return (
f"Logger '{logger.name}' configured with level="
f"{logging.getLevelName(logger.getEffectiveLevel())}"
)At a glance, what makes this hard to refactor are the tests:
test_configure_logger_not_verbosetest_configure_logger_verbose
which attempt to read from the stream handler that is setup. That's what the log_method, log_level = do above, is get the appropriate logging method to use for log_method(f"{logging.getLevelName(logger.getEffectiveLevel())}") which satisfies these tests.
My feeling: good enough for now, but globally I think we could improve our logging setups. The presence of logging.basicConfig() is kind of a red flag unto itself. But, I think this is mostly consistent with other projects, so value in that.
For example: I'm not sure that returning a string is ideal. Perhaps this method could just log what it has done, and return None. But this would require updating the CLI, tests, etc. that are expecting a string response. And as mentioned, this is consistent with other apps.
jonavellecuerdo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple small change requests, but it's looking good! Will inspect at a deeper level via second pass.
| message = f"Missing required environment variables: {', '.join(missing_vars)}" | ||
| raise OSError(message) | ||
|
|
||
| def configure_logger( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need review this at a deeper level but figured I'd pass along a discussion @ghukill and I had on a previous PR I worked on re: excluding configure_sentry and configure_logger from the Config class:
From @ghukill :
- I like having a Config class that I can put complex behavior on for static-esque values I want to access through the application
- I'm indifferent to configure_logger and configure_sentry on there / they feel somewhat disconnected from the essence of the application, which I attribute to the Config class. Especially sentry. Logging is a little more difficult for me... but then again logging is always kind of difficult. \ Of course, if those methods (
configure_sentryandconfigure_logger) are on it, they don't need to be called.... but if I'm not supposed to call configure_sentry from anywhere in the app, then maybe it doesn't belong on that global Config object?
This is 5:00pm talking, no much reasoning, but just my knee-jerk thinking.
| def __init__(self) -> None: | ||
| self.client = client("s3") | ||
|
|
||
| def archive_file_with_new_key( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious whether archiving the metadata file will be required for workflows other than Wiley deposits? 🤔 If an archive/ folder will exist for every workflow bucket, I think the function name is fine as-is, otherwise, it might be good to rename the function so it is more universal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe so, we probably want an archived/processed path in all of the workflows to prevent re-processing of files
* Refactor Config.configure_logger method * Rename retrieve_file_type_from_bucket > get_files_iter and refactor method
Pull Request Test Coverage Report for Build 12202770589Details
💛 - Coveralls |
* Add Config class Why these changes are being introduced: * The application needs a class to manage config values and a stream variable to store the application logs for an email sent at the end of each run. How this addresses that need: * Add Config class with corresponding unit tests and fixtures for the methods * Update cli.py with calls to Config methods and stream variable * Update README.md * Update pyproject.toml Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1108 * Add S3 client Why these changes are being introduced: * An S3 client is needed for submission metadata and content files How this addresses that need: * Add S3Client class with corresponding unit tests and fixtures for the methods * Update dependencies Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1099 * Updates based on discussion in PR #6 * Refactor Config.configure_logger method * Rename retrieve_file_type_from_bucket > get_files_iter and refactor method
Purpose and background context
This code is largely copied over from
wiley-depositswith some minor changes to align it with our current coding practices.Note: Most of the line changes are from
Pipfile.lockHow can a reviewer manually see the effects of these changes?
Not possible yet
Includes new or updated dependencies?
YES
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)