Skip to content

Conversation

@ehanson8
Copy link
Contributor

@ehanson8 ehanson8 commented Dec 5, 2024

Purpose and background context

This code is largely copied over from wiley-deposits with some minor changes to align it with our current coding practices.

Note: Most of the line changes are from Pipfile.lock

How can a reviewer manually see the effects of these changes?

Not possible yet

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* The application needs a class to manage config values and a stream variable to store the application logs for an email sent at the end of each run.

How this addresses that need:
* Add Config class with corresponding unit tests and fixtures for the methods
* Update cli.py with calls to Config methods and stream variable
* Update README.md
* Update pyproject.toml

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1108
Why these changes are being introduced:
* An S3 client is needed for submission metadata and content files

How this addresses that need:
* Add S3Client class with corresponding unit tests and fixtures for the methods
* Update dependencies

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1099
message = f"Missing required environment variables: {', '.join(missing_vars)}"
raise OSError(message)

def configure_logger(
Copy link
Contributor Author

@ehanson8 ehanson8 Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a combination of configure_logger from wiley-deposits and our template config code, there's some repetition and I have a gut feeling there's a better way to do this but didn't land on anything specific, curious about your thoughts

Copy link

@ghukill ghukill Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a possible refactor, removing some redundancies:

def configure_logger(
    self, logger: logging.Logger, stream: StringIO, *, verbose: bool
) -> str:
    logging_format_base = "%(asctime)s %(levelname)s %(name)s.%(funcName)s()"
    logger.addHandler(logging.StreamHandler(stream))

    if verbose:
        log_method, log_level = logger.debug, logging.DEBUG
        template = logging_format_base + " line %(lineno)d: %(message)s"
        for handler in logging.root.handlers:
            handler.addFilter(logging.Filter("dsc"))
    else:
        log_method, log_level = logger.info, logging.INFO
        template = logging_format_base + ": %(message)s"

    logger.setLevel(log_level)
    logging.basicConfig(format=template)
    logger.addHandler(logging.StreamHandler(stream))
    log_method(f"{logging.getLevelName(logger.getEffectiveLevel())}")

    return (
        f"Logger '{logger.name}' configured with level="
        f"{logging.getLevelName(logger.getEffectiveLevel())}"
    )

At a glance, what makes this hard to refactor are the tests:

  • test_configure_logger_not_verbose
  • test_configure_logger_verbose

which attempt to read from the stream handler that is setup. That's what the log_method, log_level = do above, is get the appropriate logging method to use for log_method(f"{logging.getLevelName(logger.getEffectiveLevel())}") which satisfies these tests.

My feeling: good enough for now, but globally I think we could improve our logging setups. The presence of logging.basicConfig() is kind of a red flag unto itself. But, I think this is mostly consistent with other projects, so value in that.

For example: I'm not sure that returning a string is ideal. Perhaps this method could just log what it has done, and return None. But this would require updating the CLI, tests, etc. that are expecting a string response. And as mentioned, this is consistent with other apps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need review this at a deeper level but figured I'd pass along a discussion @ghukill and I had on a previous PR I worked on re: excluding configure_sentry and configure_logger from the Config class:

From @ghukill :

  • I like having a Config class that I can put complex behavior on for static-esque values I want to access through the application
  • I'm indifferent to configure_logger and configure_sentry on there / they feel somewhat disconnected from the essence of the application, which I attribute to the Config class. Especially sentry. Logging is a little more difficult for me... but then again logging is always kind of difficult. \ Of course, if those methods (configure_sentry and configure_logger) are on it, they don't need to be called.... but if I'm not supposed to call configure_sentry from anywhere in the app, then maybe it doesn't belong on that global Config object?
    This is 5:00pm talking, no much reasoning, but just my knee-jerk thinking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactor, thanks! I was hoping this would prompt a larger conversation about the template/config class because I think it needs work as well, maybe a future DataEng meeting topic?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a future DataEng topic could be good, for sure. But I think it could/should extend longer than 30 minute meeting w/ CB.

I have two somewhat conflicting feelings at the moment re: config and logging:

  1. it's okay for projects to tweak our established approach as needed
  2. it's okay if slightly awkward or sub-optimal insofar as it's consistent with other repos

I think it'd be easy for any one project to grind to a halt trying to get it perfect, when we'll always have tension with, "but others don't do it that way..."

Until we do take a long and thoughtful look at our config + logging approach for python projects, I'm pretty okay with allowing projects to try and strike a balance between feelings #1 and #2 above.

@ehanson8 ehanson8 marked this pull request as ready for review December 5, 2024 20:43
Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments and a suggestion for a docstring update, but none blocking! Looking like good foundation until the more interesting bits come.

message = f"Missing required environment variables: {', '.join(missing_vars)}"
raise OSError(message)

def configure_logger(
Copy link

@ghukill ghukill Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a possible refactor, removing some redundancies:

def configure_logger(
    self, logger: logging.Logger, stream: StringIO, *, verbose: bool
) -> str:
    logging_format_base = "%(asctime)s %(levelname)s %(name)s.%(funcName)s()"
    logger.addHandler(logging.StreamHandler(stream))

    if verbose:
        log_method, log_level = logger.debug, logging.DEBUG
        template = logging_format_base + " line %(lineno)d: %(message)s"
        for handler in logging.root.handlers:
            handler.addFilter(logging.Filter("dsc"))
    else:
        log_method, log_level = logger.info, logging.INFO
        template = logging_format_base + ": %(message)s"

    logger.setLevel(log_level)
    logging.basicConfig(format=template)
    logger.addHandler(logging.StreamHandler(stream))
    log_method(f"{logging.getLevelName(logger.getEffectiveLevel())}")

    return (
        f"Logger '{logger.name}' configured with level="
        f"{logging.getLevelName(logger.getEffectiveLevel())}"
    )

At a glance, what makes this hard to refactor are the tests:

  • test_configure_logger_not_verbose
  • test_configure_logger_verbose

which attempt to read from the stream handler that is setup. That's what the log_method, log_level = do above, is get the appropriate logging method to use for log_method(f"{logging.getLevelName(logger.getEffectiveLevel())}") which satisfies these tests.

My feeling: good enough for now, but globally I think we could improve our logging setups. The presence of logging.basicConfig() is kind of a red flag unto itself. But, I think this is mostly consistent with other projects, so value in that.

For example: I'm not sure that returning a string is ideal. Perhaps this method could just log what it has done, and return None. But this would require updating the CLI, tests, etc. that are expecting a string response. And as mentioned, this is consistent with other apps.

Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple small change requests, but it's looking good! Will inspect at a deeper level via second pass.

message = f"Missing required environment variables: {', '.join(missing_vars)}"
raise OSError(message)

def configure_logger(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need review this at a deeper level but figured I'd pass along a discussion @ghukill and I had on a previous PR I worked on re: excluding configure_sentry and configure_logger from the Config class:

From @ghukill :

  • I like having a Config class that I can put complex behavior on for static-esque values I want to access through the application
  • I'm indifferent to configure_logger and configure_sentry on there / they feel somewhat disconnected from the essence of the application, which I attribute to the Config class. Especially sentry. Logging is a little more difficult for me... but then again logging is always kind of difficult. \ Of course, if those methods (configure_sentry and configure_logger) are on it, they don't need to be called.... but if I'm not supposed to call configure_sentry from anywhere in the app, then maybe it doesn't belong on that global Config object?
    This is 5:00pm talking, no much reasoning, but just my knee-jerk thinking.

def __init__(self) -> None:
self.client = client("s3")

def archive_file_with_new_key(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious whether archiving the metadata file will be required for workflows other than Wiley deposits? 🤔 If an archive/ folder will exist for every workflow bucket, I think the function name is fine as-is, otherwise, it might be good to rename the function so it is more universal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, we probably want an archived/processed path in all of the workflows to prevent re-processing of files

* Refactor Config.configure_logger method
* Rename retrieve_file_type_from_bucket > get_files_iter and refactor method
@coveralls
Copy link

Pull Request Test Coverage Report for Build 12202770589

Details

  • 64 of 68 (94.12%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-4.8%) to 95.238%

Changes Missing Coverage Covered Lines Changed/Added Lines %
dsc/config.py 35 37 94.59%
dsc/s3.py 22 24 91.67%
Totals Coverage Status
Change from base Build 12183247713: -4.8%
Covered Lines: 80
Relevant Lines: 84

💛 - Coveralls

@ehanson8 ehanson8 merged commit 8eb29e2 into main Dec 6, 2024
2 checks passed
@ehanson8 ehanson8 deleted the IN-1099-aws-clients branch December 6, 2024 17:55
ehanson8 added a commit that referenced this pull request Dec 6, 2024
* Add Config class

Why these changes are being introduced:
* The application needs a class to manage config values and a stream variable to store the application logs for an email sent at the end of each run.

How this addresses that need:
* Add Config class with corresponding unit tests and fixtures for the methods
* Update cli.py with calls to Config methods and stream variable
* Update README.md
* Update pyproject.toml

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1108

* Add S3 client

Why these changes are being introduced:
* An S3 client is needed for submission metadata and content files

How this addresses that need:
* Add S3Client class with corresponding unit tests and fixtures for the methods
* Update dependencies

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1099

* Updates based on discussion in PR #6

* Refactor Config.configure_logger method
* Rename retrieve_file_type_from_bucket > get_files_iter and refactor method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants