In 884 refactor config loading (#88)
* IN-884 Refactor method for loading configurations

Why these changes are being introduced:
* Simplify how Carbon ECS tasks are run by providing environment
variables through the task definition's 'secrets' and 'environment'
blocks, and simplify the function for loading configs.

How this addresses that need:
* Update references to environment variables to use the revised names
* Replace the 'Config' class with a function for loading configs
* Update 'test_env' to include new set of environment variables
* Simplify cli command and update tests accordingly
* Configure a logger that sends logs to CloudWatch
* Deprecate options in cli command
* Revise test_file_is_ftped to simply check that the file exists
on the test FTP server
* Revise test_people_returns_people and test_articles_return_articles
to use test FTP server
* Add test_config


Side effects of this change:
* Revised names for environment variables (most notably, adding the
'SYMPLECTIC' prefix to the FTP variables)
* Click options are deprecated, so ECS task definitions must also be updated

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-884
jonavellecuerdo committed Aug 23, 2023
1 parent 3e7f074 commit 32d47f4
Showing 12 changed files with 720 additions and 372 deletions.
2 changes: 2 additions & 0 deletions Pipfile
@@ -7,7 +7,9 @@ name = "pypi"
black = "*"
boto3-stubs = {extras = ["essential"], version = "*"}
coveralls = "*"
freezegun = "*"
lxml-stubs = "*"
moto = {extras = ["sns"], version = "*"}
mypy = "*"
pre-commit = "*"
pytest = "*"
491 changes: 329 additions & 162 deletions Pipfile.lock

Large diffs are not rendered by default.

40 changes: 14 additions & 26 deletions README.md
@@ -44,29 +44,6 @@ When a PR is merged onto the `main` branch, Github Actions will build a new cont

Tagging a release on the `main` branch will promote a copy of the `latest` container from Stage-Workloads to Prod.

## Configuration

The Fargate task needs the following arguments passed in at runtime.

| Argument | Description |
|----------|-------------|
| --ftp | Send output to the FTP server. |
| --sns-topic | The ARN for the SNS topic. This is used to send an email notification. |
| \<feed_type\> | The type of feed to run. This should be either `people` or `articles`. |

The ECS Fargate task also makes use of the following environment variables.

| Variable | Description |
|----------|-------------|
| `CARBON_DB` | An SQLAlchemy database connection string of the form `oracle://<username>:<password>@<server>:1521/<sid>`. |
| `FTP_HOST` | Hostname of FTP server |
| `FTP_PORT` | FTP server port |
| `FTP_USER` | FTP username |
| `FTP_PASS` | FTP password |
| `FTP_PATH` | Full path to file on FTP server |

These values are all set in the ECS Task Definition by the Terraform code in [mitlib-tf-workloads-carbon](https://github.com/mitlibraries/mitlib-tf-workloads-carbon).
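This commit renames several of the variables in the table above: `CARBON_DB` becomes `CONNECTION_STRING`, and the FTP variables gain a `SYMPLECTIC_` prefix. When updating an existing environment or task definition, a mapping like the following may help. It is a minimal sketch; the helper `rename_legacy_env` and the `LEGACY_TO_CURRENT` table are hypothetical and not part of Carbon.

```python
# Legacy variable names (from the removed Configuration table) mapped to the
# names used after this commit. Hypothetical migration helper, not Carbon code.
LEGACY_TO_CURRENT = {
    "CARBON_DB": "CONNECTION_STRING",
    "FTP_HOST": "SYMPLECTIC_FTP_HOST",
    "FTP_PORT": "SYMPLECTIC_FTP_PORT",
    "FTP_USER": "SYMPLECTIC_FTP_USER",
    "FTP_PASS": "SYMPLECTIC_FTP_PASS",
    "FTP_PATH": "SYMPLECTIC_FTP_PATH",
}


def rename_legacy_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with legacy names replaced by their new names.

    Variables not in the mapping are copied unchanged. If both the legacy
    and the new name are present, the value under the new name wins.
    """
    renamed: dict[str, str] = {}
    for name, value in env.items():
        if name in LEGACY_TO_CURRENT:
            # Only fill the new name if it has not been set explicitly.
            renamed.setdefault(LEGACY_TO_CURRENT[name], value)
        else:
            # Explicit new-style names (and unrelated variables) always win.
            renamed[name] = value
    return renamed
```

For example, `rename_legacy_env({"FTP_HOST": "ftp.example.org", "WORKSPACE": "dev"})` yields `{"SYMPLECTIC_FTP_HOST": "ftp.example.org", "WORKSPACE": "dev"}` (the hostname is a placeholder).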

## Usage

The CLI works the same whether run locally or as a container. When running as a container, however, remember that an output file (rather than stdout) is written to the container's filesystem, not to your host system.
@@ -83,10 +60,21 @@ Carbon will generate an XML feed that can be uploaded to Symplectic. The command
(carbon)$ env CARBON_DB=sqlite:///people.db carbon people
```

## Required ENV
* `FEED_TYPE` = The type of feed to generate; set to either `people` or `articles`.
* `CONNECTION_STRING` = The connection string of the form `oracle://<username>:<password>@<server>:1521/<sid>` for the Data Warehouse.
* `SNS_TOPIC` = The ARN for the SNS topic used for sending email notifications.
* `SYMPLECTIC_FTP_HOST` = The hostname of the Symplectic FTP server.
* `SYMPLECTIC_FTP_PORT` = The port of the Symplectic FTP server.
* `SYMPLECTIC_FTP_USER` = The username for accessing the Symplectic FTP server.
* `SYMPLECTIC_FTP_PASS` = The password for accessing the Symplectic FTP server.
* `SYMPLECTIC_FTP_PATH` = The full file path to the XML file (including the file name) that is uploaded to the Symplectic FTP server.
* `WORKSPACE` = Set to `dev` for local development. This will be set to `stage` and `prod` in those environments by Terraform.

## Optional ENV

* `LOG_LEVEL` = The log level for the `carbon` application. Defaults to `INFO` if not set.
* `ORACLE_LIB_DIR` = The directory containing the Oracle Instant Client library.
* `SENTRY_DSN` = If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
114 changes: 46 additions & 68 deletions carbon/app.py
@@ -1,17 +1,16 @@
from __future__ import annotations

import logging
import os
import re
import threading
from contextlib import closing, contextmanager
from datetime import UTC, datetime
from ftplib import FTP, FTP_TLS # nosec
from functools import partial, update_wrapper
from functools import partial
from typing import IO, TYPE_CHECKING, Any

import boto3
import click
from click import Context
from lxml import etree as ET # nosec
from sqlalchemy import func, select

@@ -22,6 +21,7 @@
from socket import socket
from ssl import SSLContext

logger = logging.getLogger(__name__)

AREAS = (
"ARCHITECTURE & PLANNING AREA",
@@ -455,15 +455,6 @@ def _add_person(xf: IO, person: dict[str, Any]) -> None:
xf.write(record)


class Config(dict):
    @classmethod
    def from_env(cls) -> Config:
        cfg = cls()
        for var in ENV_VARS:
            cfg[var] = os.environ.get(var)
        return cfg


class FTPFeeder:
    def __init__(
        self,
@@ -481,64 +472,51 @@ def run(self) -> None:
        with open(r, "rb") as fp_r, open(w, "wb") as fp_w:
            ftp_rdr = FTPReader(
                fp_r,
                self.config["FTP_USER"],
                self.config["FTP_PASS"],
                self.config["FTP_PATH"],
                self.config["FTP_HOST"],
                int(self.config["FTP_PORT"]),
                self.config["SYMPLECTIC_FTP_USER"],
                self.config["SYMPLECTIC_FTP_PASS"],
                self.config["SYMPLECTIC_FTP_PATH"],
                self.config["SYMPLECTIC_FTP_HOST"],
                int(self.config["SYMPLECTIC_FTP_PORT"]),
                self.ssl_ctx,
            )
            PipeWriter(out=fp_w).pipe(ftp_rdr).write(feed_type)


def sns_log(f: Callable) -> Callable:
    """AWS SNS log decorator for wrapping a click command.
    This can be used as a decorator for a click command. It will wrap
    execution of the click command in a try/except so that any exception
    can be logged to the SNS topic before being re-raised.
    """
    msg_start = "[{}] Starting carbon run for the {} feed in the {} environment."
    msg_success = "[{}] Finished carbon run for the {} feed in the {} environment."
    msg_fail = (
        "[{}] The following problem was encountered during the "
        "carbon run for the {} feed in the {} environment:\n\n"
        "{}"
    )

    @click.pass_context
    def wrapped(ctx: Context, *args: str, **kwargs: str) -> Callable:
        sns_id = ctx.params.get("sns_topic")
        if sns_id:
            client = boto3.client("sns")
            stage = ctx.params.get("ftp_path", "").lstrip("/").split("/")[0]
            feed = ctx.params.get("feed_type", "")
            client.publish(
                TopicArn=sns_id,
                Subject="Carbon run",
                Message=msg_start.format(datetime.now(tz=UTC).isoformat(), feed, stage),
            )
            try:
                res = ctx.invoke(f, *args, **kwargs)
            except Exception as e:
                client.publish(
                    TopicArn=sns_id,
                    Subject="Carbon run",
                    Message=msg_fail.format(
                        datetime.now(tz=UTC).isoformat(), feed, stage, e
                    ),
                )
                raise
            else:
                client.publish(
                    TopicArn=sns_id,
                    Subject="Carbon run",
                    Message=msg_success.format(
                        datetime.now(tz=UTC).isoformat(), feed, stage
                    ),
                )
        else:
            res = ctx.invoke(f, *args, **kwargs)
        return res

    return update_wrapper(wrapped, f)
def sns_log(
    config_values: dict[str, Any], status: str, error: Exception | None = None
) -> None:
    sns_client = boto3.client("sns")
    sns_id = config_values.get("SNS_TOPIC")
    stage = config_values.get("SYMPLECTIC_FTP_PATH", "").lstrip("/").split("/")[0]
    feed = config_values.get("FEED_TYPE", "")

    if status == "start":
        sns_client.publish(
            TopicArn=sns_id,
            Subject="Carbon run",
            Message=(
                f"[{datetime.now(tz=UTC).isoformat()}] Starting carbon run for the "
                f"{feed} feed in the {stage} environment."
            ),
        )
    elif status == "success":
        sns_client.publish(
            TopicArn=sns_id,
            Subject="Carbon run",
            Message=(
                f"[{datetime.now(tz=UTC).isoformat()}] Finished carbon run for the "
                f"{feed} feed in the {stage} environment."
            ),
        )
        logger.info("Carbon run has successfully completed.")
    elif status == "fail":
        sns_client.publish(
            TopicArn=sns_id,
            Subject="Carbon run",
            Message=(
                f"[{datetime.now(tz=UTC).isoformat()}] The following problem was "
                f"encountered during the carbon run for the {feed} feed "
                f"in the {stage} environment: {error}."
            ),
        )
        logger.info("Carbon run has failed.")
107 changes: 27 additions & 80 deletions carbon/cli.py
@@ -1,69 +1,18 @@
import json
from typing import IO
import logging
import os

import boto3
import click

from carbon.app import Config, FTPFeeder, Writer, sns_log
from carbon.config import configure_sentry
from carbon.app import FTPFeeder, sns_log
from carbon.config import configure_logger, configure_sentry, load_config_values
from carbon.db import engine

logger = logging.getLogger(__name__)


@click.command()
@click.version_option()
@click.argument("feed_type", type=click.Choice(["people", "articles"]))
@click.option("--db", envvar="CARBON_DB", help="Database connection string")
@click.option("-o", "--out", help="Output file", type=click.File("wb"))
@click.option(
    "--ftp",
    is_flag=True,
    help="Send output to FTP server; do not use this with the -o/--out option",
)
@click.option(
    "--ftp-host",
    envvar="FTP_HOST",
    help="Hostname of FTP server",
    default="localhost",
    show_default=True,
)
@click.option(
    "--ftp-port",
    envvar="FTP_PORT",
    help="FTP server port",
    default=21,
    show_default=True,
)
@click.option("--ftp-user", envvar="FTP_USER", help="FTP username")
@click.option("--ftp-pass", envvar="FTP_PASS", help="FTP password")
@click.option("--ftp-path", envvar="FTP_PATH", help="Full path to file on FTP server")
@click.option(
    "--secret-id",
    help="AWS Secrets id containing DB connection "
    "string and FTP password. If given, will "
    "override other command line options.",
)
@click.option(
    "--sns-topic",
    help="AWS SNS Topic ARN. If given, a message "
    "will be sent when the load begins and "
    "then another message will be sent with "
    "the outcome of the load.",
)
@sns_log
def main(
    feed_type: str,
    db: str,
    out: IO,
    ftp_host: str,
    ftp_port: int,
    ftp_user: str,
    ftp_pass: str,
    ftp_path: str,
    secret_id: str,
    sns_topic: str,  # noqa: ARG001
    *,
    ftp: bool,
) -> None:
def main() -> None:
    """Generate feeds for Symplectic Elements.
    Specify which FEED_TYPE should be generated. This should be either
@@ -83,26 +32,24 @@ def main(
    server. The server should support FTP over TLS. Only one of -o/--out or
    --ftp should be used.
    """
    cfg = Config(
        CARBON_DB=db,
        FTP_USER=ftp_user,
        FTP_PASS=ftp_pass,
        FTP_PATH=ftp_path,
        FTP_HOST=ftp_host,
        FTP_PORT=ftp_port,
    )
    configure_sentry()

    if secret_id is not None:
        client = boto3.client("secretsmanager")
        secret = client.get_secret_value(SecretId=secret_id)
        secret_env = json.loads(secret["SecretString"])
        cfg.update(secret_env)

    engine.configure(cfg["CARBON_DB"])
    if ftp:
        click.echo(f"Starting carbon run for {feed_type}")
        FTPFeeder({"feed_type": feed_type}, cfg).run()
        click.echo(f"Finished carbon run for {feed_type}")
    config_values = load_config_values()
    sns_log(config_values=config_values, status="start")

    try:
        root_logger = logging.getLogger()
        logger.info(configure_logger(root_logger, os.getenv("LOG_LEVEL", "INFO")))
        configure_sentry()
        logger.info(
            "Carbon config settings loaded for environment: %s",
            config_values["WORKSPACE"],
        )

        engine.configure(config_values["CONNECTION_STRING"])

        click.echo(f"Starting carbon run for {config_values['FEED_TYPE']}")
        FTPFeeder({"feed_type": config_values["FEED_TYPE"]}, config_values).run()
        click.echo(f"Finished carbon run for {config_values['FEED_TYPE']}")
    except RuntimeError:
        sns_log(config_values=config_values, status="fail")
    else:
        Writer(out=out).write(feed_type)
        sns_log(config_values=config_values, status="success")
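In the new `main`, the `except RuntimeError` branch calls `sns_log` without the `error` argument, so the failure notification would end with `None.`; because the exception is not re-raised, the command also returns normally, and exceptions that are not `RuntimeError` bypass the fail notification entirely. If that is not intended, one option is the pattern below, which passes the exception through and re-raises it. This is a minimal sketch: `run_with_notifications` and its `job`/`notify` parameters are hypothetical names, where `notify` stands in for `sns_log` and `job` for the body of the `try` block.

```python
from collections.abc import Callable
from typing import Any

# Matches the keyword interface of the new sns_log(config_values, status, error=None).
Notify = Callable[..., None]


def run_with_notifications(
    config_values: dict[str, Any], job: Callable[[], None], notify: Notify
) -> None:
    """Send start/success/fail notifications around job (hypothetical sketch)."""
    notify(config_values=config_values, status="start")
    try:
        job()
    except Exception as error:
        # Pass the exception along so the message names the actual problem,
        # then re-raise so the process exits with a non-zero status.
        notify(config_values=config_values, status="fail", error=error)
        raise
    notify(config_values=config_values, status="success")
```

Catching `Exception` rather than only `RuntimeError` is a deliberate choice here: any failure during the run produces a "fail" notification, while the re-raise still lets Sentry and the ECS task status see the error.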