Skip to content

Commit

Permalink
Update sentry and logging configuration
Browse files Browse the repository at this point in the history
Why these changes are being introduced:

Our terraform template requires that each env variable be present in
all envs, even if we don't need it in a particular env (e.g. SENTRY_DSN
is not needed in dev). Env variables are now given a string value of
"None" in env(s) where they are not needed.

In addition, while working on this a logging bug was addressed in which
logging was configured on the cli module logger instead of the root
logger and thus was not propagating to all modules.

In resolving both issues it made sense to refactor these configuration
steps into their own functions in a separate config module.

How this addresses that need:
* Creates a new config module with functions to configure logging and
  sentry, and moves that logic from the cli module to these functions.
* Updates sentry configuration to check for a string value of "None" in
  addition to a None object value for the SENTRY_DSN env variable.
* Adds unit tests for new config module and update cli tests to match
  new expected outputs.

Additional changes:
* Minor update to conftest.py configuration for all tests.
* Two minor logging additions:
  * One to add a timer to the transform command and log the total
    transform time at the end of the process, useful as a benchmark for
    large/long transform jobs.
  * One to log a status update after every 1000 records transformed,
    useful to monitor progress during large/long transform jobs.
* Update Makefile to match standard section formatting across repos.
* Update Github Actions to use our shared CI workflow.
* Update coverage reporting to use coverage library instead of
  pytest-cov to match our shared CI workflow.
* Code/documentation cleanup across a few files.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/RDI-160
* https://mitlibraries.atlassian.net/browse/RDI-148
  • Loading branch information
hakbailey committed Jun 17, 2022
1 parent 2b1dbc9 commit b1a18aa
Show file tree
Hide file tree
Showing 15 changed files with 328 additions and 285 deletions.
7 changes: 7 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
name: CI
on: push
jobs:
test:
uses: mitlibraries/.github/.github/workflows/python-shared-test.yml@main
lint:
uses: mitlibraries/.github/.github/workflows/python-shared-lint.yml@main
37 changes: 0 additions & 37 deletions .github/workflows/test.yml

This file was deleted.

36 changes: 24 additions & 12 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -5,31 +5,43 @@ ECR_NAME_DEV:=transmogrifier-dev
ECR_URL_DEV:=222053980223.dkr.ecr.us-east-1.amazonaws.com/transmogrifier-dev
### End of Terraform-generated header ###

lint: bandit black flake8 isort mypy

### Dependency commands ###
install: ## Install script and dependencies
pipenv install --dev

update: install ## Update all Python dependencies
pipenv clean
pipenv update --dev


### Testing commands ###
test: ## Run tests and print a coverage report
pipenv run coverage run --source=transmogrifier -m pytest
pipenv run coverage report -m

coveralls: test
pipenv run coverage lcov -o ./coverage/lcov.info


### Linting commands ###
lint: bandit black flake8 isort mypy ## Lint the repo

bandit:
pipenv run bandit -r transmogrifier

black:
pipenv run black --check --diff transmogrifier tests

coveralls: test
pipenv run coveralls
pipenv run black --check --diff .

flake8:
pipenv run flake8 transmogrifier tests

install:
pipenv install --dev
pipenv run flake8 .

isort:
pipenv run isort transmogrifier tests --diff
pipenv run isort . --diff

mypy:
pipenv run mypy transmogrifier

test:
pipenv run pytest --cov-report term-missing --cov=transmogrifier

### Terraform-generated Developer Deploy Commands for Dev environment ###
dist-dev: ## Build docker container (intended for developer-based manual build)
Expand Down
3 changes: 1 addition & 2 deletions Pipfile
Original file line number Diff line number Diff line change
Expand Up @@ -11,16 +11,15 @@ lxml = "*"
sentry-sdk = "*"
smart-open = {version = "*", extras = ["s3"]}


[dev-packages]
bandit = "*"
black = "*"
coverage = "*"
coveralls = "*"
flake8 = "*"
isort = "*"
mypy = "*"
pytest = "*"
pytest-cov = "*"

[requires]
python_version = "3.10"
Expand Down
369 changes: 185 additions & 184 deletions Pipfile.lock

Large diffs are not rendered by default.

3 changes: 1 addition & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,14 +16,13 @@ To lint the repo:
```
make lint
```

To run the app:
```
pipenv run transform <command>
```

## Required ENV
`LOGGING_LEVEL` = Set to the desired logging level for the app.

`SENTRY_DSN` = If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.

Expand Down
Empty file removed tests/__init__.py
Empty file.
7 changes: 7 additions & 0 deletions tests/conftest.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
import os
from functools import partial

import pytest
Expand All @@ -24,6 +25,12 @@
from transmogrifier.sources.dspace_dim import DSpaceDim


@pytest.fixture(autouse=True)
def test_env():
os.environ = {"WORKSPACE": "test"}
yield


@pytest.fixture()
def runner():
return CliRunner()
Expand Down
27 changes: 14 additions & 13 deletions tests/test_cli.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,8 @@
from transmogrifier.cli import main


def test_cli_with_env(caplog, monkeypatch, runner, tmp_path):
monkeypatch.setenv("SENTRY_DSN", "https://1234567890@00000.ingest.sentry.io/123456")
monkeypatch.setenv("WORKSPACE", "test")
def test_cli_jpal_no_sentry_not_verbose(caplog, monkeypatch, runner, tmp_path):
monkeypatch.delenv("SENTRY_DSN", raising=False)
outfile = tmp_path / "timdex_jpal_records.json"
result = runner.invoke(
main,
Expand All @@ -17,18 +16,15 @@ def test_cli_with_env(caplog, monkeypatch, runner, tmp_path):
],
)
assert result.exit_code == 0
assert "Running transmogrifier with env=test and log level=INFO" in caplog.text
assert (
"Sentry DSN found, exceptions will be sent to Sentry with env=test"
in caplog.text
)
assert "Logger 'root' configured with level=INFO" in caplog.text
assert "No Sentry DSN found, exceptions will not be sent to Sentry" in caplog.text
assert "Running transform for source jpal" in caplog.text
assert "Completed transform, total record count: 38" in caplog.text
assert "Total time to complete transform" in caplog.text


def test_cli_without_env(caplog, monkeypatch, runner, tmp_path):
monkeypatch.delenv("SENTRY_DSN", raising=False)
monkeypatch.delenv("WORKSPACE", raising=False)
def test_cli_jpal_with_sentry_and_verbose(caplog, monkeypatch, runner, tmp_path):
monkeypatch.setenv("SENTRY_DSN", "https://1234567890@00000.ingest.sentry.io/123456")
outfile = tmp_path / "timdex_jpal_records.json"
result = runner.invoke(
main,
Expand All @@ -43,8 +39,13 @@ def test_cli_without_env(caplog, monkeypatch, runner, tmp_path):
],
)
assert result.exit_code == 0
assert "Running transmogrifier with env=None and log level=DEBUG" in caplog.text
assert "Sentry DSN found" not in caplog.text
assert "Logger 'root' configured with level=DEBUG" in caplog.text
assert (
"Sentry DSN found, exceptions will be sent to Sentry with env=test"
in caplog.text
)
assert "Running transform for source jpal" in caplog.text
assert "Completed transform, total record count: 38" in caplog.text


def test_cli_dspace_mets(caplog, runner, tmp_path):
Expand Down
35 changes: 35 additions & 0 deletions tests/test_config.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
import logging

from transmogrifier.config import configure_logger, configure_sentry


def test_configure_logger_not_verbose():
logger = logging.getLogger(__name__)
result = configure_logger(logger, verbose=False)
assert logger.getEffectiveLevel() == 20
assert result == "Logger 'test_config' configured with level=INFO"


def test_configure_logger_verbose(caplog):
logger = logging.getLogger(__name__)
result = configure_logger(logger, verbose=True)
assert logger.getEffectiveLevel() == 10
assert result == "Logger 'test_config' configured with level=DEBUG"


def test_configure_sentry_no_env_variable(monkeypatch):
monkeypatch.delenv("SENTRY_DSN", raising=False)
result = configure_sentry()
assert result == "No Sentry DSN found, exceptions will not be sent to Sentry"


def test_configure_sentry_env_variable_is_none(monkeypatch):
monkeypatch.setenv("SENTRY_DSN", "None")
result = configure_sentry()
assert result == "No Sentry DSN found, exceptions will not be sent to Sentry"


def test_configure_sentry_env_variable_is_dsn(monkeypatch):
monkeypatch.setenv("SENTRY_DSN", "https://1234567890@00000.ingest.sentry.io/123456")
result = configure_sentry()
assert result == "Sentry DSN found, exceptions will be sent to Sentry with env=test"
5 changes: 0 additions & 5 deletions tests/test_helpers.py
Original file line number Diff line number Diff line change
Expand Up @@ -192,8 +192,3 @@ def test_parse_xml_records_raises_error_if_no_records_found():
records = parse_xml_records("tests/fixtures/no_records.xml")
with raises(ValueError):
next(records)


# TODO: add read from S3 with moto

# TODO: add tests for write function, including to S3 with moto
8 changes: 0 additions & 8 deletions transmogrifier/__init__.py
Original file line number Diff line number Diff line change
@@ -1,8 +0,0 @@
import logging

logging.basicConfig(
format=(
"%(asctime)s %(levelname)s %(name)s.%(funcName)s() line "
"%(lineno)d: %(message)s"
)
)
32 changes: 11 additions & 21 deletions transmogrifier/cli.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
import logging
import os
from datetime import timedelta
from time import perf_counter

import click
import sentry_sdk

from transmogrifier.config import SOURCES
from transmogrifier.config import SOURCES, configure_logger, configure_sentry
from transmogrifier.helpers import parse_xml_records, write_timdex_records_to_json
from transmogrifier.sources.datacite import Datacite
from transmogrifier.sources.dspace_dim import DSpaceDim
Expand Down Expand Up @@ -38,30 +38,17 @@
"-v", "--verbose", is_flag=True, help="Pass to log at debug level instead of info"
)
def main(source, input_file, output_file, verbose):
env = os.getenv("WORKSPACE")
START_TIME = perf_counter()
root_logger = logging.getLogger()
if verbose:
root_logger.setLevel(logging.DEBUG)
else:
root_logger.setLevel(logging.INFO)
logger.info(
"Running transmogrifier with env=%s and log level=%s,",
env,
logging.getLevelName(logger.getEffectiveLevel()),
)
if sentry_dsn := os.getenv("SENTRY_DSN"):
sentry_sdk.init(sentry_dsn, environment=env)
logger.info(
"Sentry DSN found, exceptions will be sent to Sentry with env=%s", env
)
logger.info(configure_logger(root_logger, verbose))
logger.info(configure_sentry())
logger.info("Running transform for source %s", source)
input_records = parse_xml_records(input_file)
if source == "dspace":
input_records = parse_xml_records(input_file)
output_records = DspaceMets(
source, SOURCES[source]["base_url"], SOURCES[source]["name"], input_records
)
elif source == "jpal":
input_records = parse_xml_records(input_file)
output_records = Datacite(
source,
SOURCES[source]["base_url"],
Expand All @@ -77,7 +64,6 @@ def main(source, input_file, output_file, verbose):
input_records,
)
elif source == "zenodo":
input_records = parse_xml_records(input_file)
output_records = Zenodo(
source,
SOURCES[source]["base_url"],
Expand All @@ -86,3 +72,7 @@ def main(source, input_file, output_file, verbose):
)
count = write_timdex_records_to_json(output_records, output_file)
logger.info("Completed transform, total record count: %d", count)
elapsed_time = perf_counter() - START_TIME
logger.info(
"Total time to complete transform: %s", str(timedelta(seconds=elapsed_time))
)
35 changes: 35 additions & 0 deletions transmogrifier/config.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,9 @@
"""transmogrifier.config module."""
import logging
import os

import sentry_sdk

SOURCES = {
"dspace": {"name": "DSpace@MIT", "base_url": "https://dspace.mit.edu/"},
"jpal": {
Expand All @@ -13,3 +19,32 @@
"base_url": "https://zenodo.org/record/",
},
}


def configure_logger(logger: logging.Logger, verbose: bool) -> str:
if verbose:
logging.basicConfig(
format="%(asctime)s %(levelname)s %(name)s.%(funcName)s() line %(lineno)d: "
"%(message)s",
)
logger.setLevel(logging.DEBUG)
for handler in logging.root.handlers:
handler.addFilter(logging.Filter("transmogrifier"))
else:
logging.basicConfig(
format="%(asctime)s %(levelname)s %(name)s.%(funcName)s(): %(message)s"
)
logger.setLevel(logging.INFO)
return (
f"Logger '{logger.name}' configured with level="
f"{logging.getLevelName(logger.getEffectiveLevel())}"
)


def configure_sentry() -> str:
env = os.getenv("WORKSPACE")
sentry_dsn = os.getenv("SENTRY_DSN")
if sentry_dsn and sentry_dsn.lower() != "none":
sentry_sdk.init(sentry_dsn, environment=env)
return f"Sentry DSN found, exceptions will be sent to Sentry with env={env}"
return "No Sentry DSN found, exceptions will not be sent to Sentry"
Loading

0 comments on commit b1a18aa

Please sign in to comment.