IN-908 Build dev testing framework (#107)
* IN-908 Build framework for dev testing

Why these changes are being introduced:
* Provide a clear, understandable workflow for developers to easily
  test the application and verify functionality.

How this addresses that need:
* Update README to provide more info on how to run the application
   * Reorganize code block in section 'Required Env'
* Restore option to write output to a file (without using FTP server)
* Create an option to toggle SNS logging ("--use_sns_logging/--ignore_sns_logging")
* Create unit tests to verify conditional SNS logging
* Create CLI test for failed run

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-908
jonavellecuerdo committed Sep 18, 2023
1 parent 22a70a7 commit 49e9155
Showing 9 changed files with 362 additions and 133 deletions.
10 changes: 8 additions & 2 deletions Makefile
@@ -27,7 +27,7 @@ update: install # update all python dependencies
dependencies: # download Oracle instant client zip
aws s3 cp s3://$(S3_BUCKET)/files/$(ORACLE_ZIP) vendor/$(ORACLE_ZIP)

## ---- Test commands ---- ##
## ---- Unit test commands ---- ##

test: # run tests and print coverage report
pipenv run coverage run --source=carbon -m pytest -vv
@@ -97,5 +97,11 @@ dist-stage:
docker push $(ECR_URL_STAGE):latest
docker push $(ECR_URL_STAGE):`git describe --always`

run-connection-tests-stage: # use after the Data Warehouse password is changed every year to confirm that the new password works.

## ---- Carbon run commands ---- ##

run-connection-tests-with-docker: # run connection tests from local docker instance, driven by Oracle DB and Symplectic FTP configs from env vars
docker run -v ./.env:/.env carbon-dev --run_connection_tests

run-connection-tests-with-ecs-stage: # use after the Data Warehouse password is changed every year to confirm that the new password works
aws ecs run-task --cluster carbon-ecs-stage --task-definition carbon-ecs-stage-people --launch-type="FARGATE" --region us-east-1 --network-configuration '{"awsvpcConfiguration": {"subnets": ["subnet-05df31ac28dd1a4b0","subnet-04cfa272d4f41dc8a"], "securityGroups": ["sg-0f11e2619db7da196"],"assignPublicIp": "DISABLED"}}' --overrides '{"containerOverrides": [ {"name": "carbon-ecs-stage", "command": ["--run_connection_tests"]}]}'
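# Illustrative follow-up (not part of this diff): the resulting ECS task logs
# can be tailed with the AWS CLI v2, assuming the CloudWatch log group is named
# "carbon-ecs-stage" as described in the README:
#   aws logs tail carbon-ecs-stage --follow --region us-east-1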
113 changes: 72 additions & 41 deletions README.md
@@ -1,86 +1,117 @@
# Carbon

Carbon is a tool for generating a feed of people that can be loaded into Symplectic Elements. It is designed to be run as a container. This document contains general application information. Please refer to the [mitlib-tf-workloads-carbon](https://github.com/mitlibraries/mitlib-tf-workloads-carbon) for the deployment configuration.
Carbon is a tool for loading data into [Symplectic Elements](https://support.symplectic.co.uk/support/solutions/articles/6000049890-symplectic-elements-quick-start-guide). Carbon retrieves records from the Data Warehouse, normalizes the data and writes it to XML files, and uploads the XML files to the Elements FTP server. It is used to create and run the following feed types:

* `people`: Provides data for the HR Feed.
* `articles`: Provides data for the Publications Feed.

Please refer to the [mitlib-tf-workloads-carbon](https://github.com/mitlibraries/mitlib-tf-workloads-carbon) repository for the deployment configuration.

## Development

* To install with dev dependencies: `make install`
* To update dependencies: `make update`
* To run unit tests: `make test`
* To lint the repo: `make lint`
* To run the app: `pipenv run carbon --help`

The Data Warehouse runs on an older version of Oracle that necessitates the `thick` mode of python-oracledb, which requires the Oracle Instant Client Library (this app was developed with version 21.9.0.0.0). The test suite uses SQLite, so you can develop and test without connecting to the Data Warehouse.
The Data Warehouse runs on an older version of Oracle that necessitates the `thick` mode of `python-oracledb`, which requires the Oracle Instant Client Library (this app was developed with version 21.9.0.0.0).
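For illustration (not part of the README diff), thick mode must be enabled before any connection is made; a minimal sketch, assuming the Instant Client has been downloaded and `ORACLE_LIB_DIR` points at it:

```
# Minimal sketch: enable python-oracledb "thick" mode.
# Assumes ORACLE_LIB_DIR points at the extracted Oracle Instant Client Library.
import os

import oracledb

oracledb.init_oracle_client(lib_dir=os.environ["ORACLE_LIB_DIR"])
print(oracledb.is_thin_mode())  # False once thick mode is active
```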

### With Docker
### Running the test suite

**Note:** As of this writing, the Apple M1 Macs cannot run Oracle Instant Client, so Docker is the only option for development on those machines.
The test suite uses SQLite, so you can develop and test without connecting to the Data Warehouse.

From the project folder:
1. Run `make test` to run unit tests.
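To sketch why this works (the `people` table below is a hypothetical stand-in for the real Data Warehouse schema, not the app's actual table definition): SQLAlchemy accepts an in-memory SQLite URL anywhere it would take the Oracle connection string, so feed queries can run against fixture data.

```
# Hedged sketch of the SQLite-backed testing approach; the table is a
# hypothetical stand-in for the real Data Warehouse schema.
from sqlalchemy import Column, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite://")  # in-memory database; no Oracle required
metadata = MetaData()
people = Table("people", metadata, Column("mit_id", String), Column("email", String))
metadata.create_all(engine)

with engine.begin() as connection:
    connection.execute(people.insert(), [{"mit_id": "123456789", "email": "tim@mit.edu"}])
    rows = connection.execute(select(people)).all()

assert rows == [("123456789", "tim@mit.edu")]
```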

1. Export AWS credentials for the `dev1` environment.
### Running the application on your local machine

2. Run `make dependencies` to download the Oracle Instant Client from S3.
1. Export AWS credentials for the `Dev1` environment. For local runs, the `AWS_DEFAULT_REGION` environment variable must also be set.

3. Run `make dist-dev` to build the Docker container image.
2. Create a `.env` file at the root folder of the Carbon repo, and set the required environment variables described in [Required Env](#required-env).

4. Run `make publish-dev` to push the Docker container image to ECR for the `dev1` environment.
**Note**: The host for the Data Warehouse is different when connecting from outside of AWS (which uses Cloudconnector). For assistance, reach out to the [Data Warehouse team](https://ist.mit.edu/warehouse).

5. Run any `make` commands for testing the application.
3. Connect to the MIT VPN using an approved [VPN client](https://ist.mit.edu/vpn).

Any tests that involve connecting to the Data Warehouse will need to be run as an ECS task in the `stage` environment, which requires building and publishing the Docker container image to ECR for `stage`. As noted in step 1, the appropriate AWS credentials for `stage` must be set to run the commands for building and publishing the Docker container image. The `ECR_NAME_STAGE` and `ECR_URL_STAGE` environment variables must also be set; the values correspond to the 'Repository name' and 'URI' indicated on ECR for the container image, respectively.
4. Follow the steps relevant to the machine you are running on:
* If you are on a machine that cannot run Oracle Instant Client, follow the steps outlined in [With Docker](#with-docker). When running the application locally, skip the step to run `make publish-dev` as it is not necessary to publish the container to ECR.

**Note**: As of this writing, Apple M1 Macs cannot run Oracle Instant Client.
* If you are on a machine that can run Oracle Instant Client, follow the steps outlined in [Without Docker](#without-docker).
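With the environment configured, a typical local run that writes the feed to a local file and skips SNS notification emails looks like `pipenv run carbon --output_file people.xml --ignore_sns_logging` (both options are introduced in this commit); passing `--run_connection_tests` instead checks connectivity without generating a feed.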

### Without Docker
#### With Docker

1. Download Oracle Instant Client (basiclite is sufficient) and set the `ORACLE_LIB_DIR` env variable.
1. Run `make dependencies` to download the Oracle Instant Client from S3.

2. Run `pipenv run carbon`.
2. Run `make dist-dev` to build the Docker container image.

## Connecting to the Data Warehouse
3. Run `make publish-dev` to push the Docker container image to ECR for the `Dev1` environment.

The password for the Data Warehouse is updated each year. To verify that the updated password works, the app must be run as an ECS task in `stage` because Cloudconnector is not enabled in `dev1`. The app can run a database connection test when called with the flag, `--run_connection_tests`.
4. Run any `make` commands for testing the application. In the Makefile, the names of relevant make commands will contain the suffix '-with-docker'.

1. Export stage credentials and set `ECR_NAME_STAGE` and `ECR_URL_STAGE` env variables.
2. Run `make install`.
3. Run `make run-connection-tests-stage`.
4. View the logs from the ECS task run on CloudWatch.
* On CloudWatch, select the `carbon-ecs-stage` log group.
* Select the most recent log stream.
* Verify that the following log is included:
> Successfully connected to the Data Warehouse: \<VERSION NUMBER\>
#### Without Docker

1. Download [Oracle Instant Client](https://www.oracle.com/database/technologies/instant-client/downloads.html) (basiclite is sufficient) and set the `ORACLE_LIB_DIR` env variable.

2. Run any `make` commands for testing the application.

### Running the application as an ECS task

The application can be run as an ECS task. Any runs that require a connection to the Data Warehouse must be executed as a task in the `Stage-Workloads` environment because Cloudconnector is not enabled in `Dev1`. This requires building and publishing the Docker container image to ECR for `Stage-Workloads`.

1. Export AWS credentials for the `stage` environment. The `ECR_NAME_STAGE` and `ECR_URL_STAGE` environment variables must also be set. The values correspond to the 'Repository name' and 'URI' indicated on ECR for the container image, respectively.

2. Run `make dist-stage` to build the Docker container image.

3. Run `make publish-stage` to push the Docker container image to ECR for the `stage` environment.

4. Run any `make` commands for testing the application. In the Makefile, the names of relevant make commands will contain the suffix '-with-ecs-stage' (e.g. `run-connection-tests-with-ecs-stage`).

For an example, see [Connecting to the Data Warehouse](#connecting-to-the-data-warehouse).

## Deploying

In the AWS Organization, we have a automated pipeline from Dev --> Stage --> Prod, handled by GitHub Actions.
In the AWS Organization, we have an automated pipeline from `Dev1` --> `Stage-Workloads` --> `Prod-Workloads`, handled by GitHub Actions.

### Staging

When a PR is merged onto the `main` branch, Github Actions will build a new container image, tag it both with `latest`, the git short hash, and the PR number, and then push the container with all the tags to the ECR repository in Stage. An EventBridge scheduled event will periodically trigger the Fargate task to run. This task will use the latest image from the ECR registry.
When a PR is merged onto the `main` branch, GitHub Actions will build a new container image. The container image will be tagged with "latest" and the shortened commit hash (the commit that merges the PR to `main`). The tagged image is then uploaded to the ECR repository in `Stage-Workloads`. An EventBridge scheduled event will periodically trigger the Fargate task to run. This task will use the latest image from the ECR registry.

### Production

Tagging a release on the `main` branch will promote a copy of the `latest` container from Stage-Worklods to Prod.
Tagging a release on the `main` branch will promote a copy of the `latest` container from `Stage-Workloads` to `Prod-Workloads`.

## Usage
## Connecting to the Data Warehouse

The Carbon application retrieves 'people' records from the Data Warehouse and generates an XML file that is uploaded to the Symplectic Elements FTP server. On Symplectic Elements, a job is scheduled to ingest the data from the XML file to create user accounts.
The password for the Data Warehouse is updated each year. To verify that the updated password works, run the connection tests for Carbon. Carbon will run connection tests for the Data Warehouse and the Elements FTP server when executed with the flag `--run_connection_tests`.

**Note:** The Carbon application can also retrieve 'articles' records, but as of this writing, it is not known whether this feature is still actively used.
1. Export AWS credentials for the `stage` environment. The `ECR_NAME_STAGE` and `ECR_URL_STAGE` environment variables must also be set. The values correspond to the 'Repository name' and 'URI' indicated on ECR for the container image, respectively.
2. Run `make install`.
3. Run `make run-connection-tests-with-ecs-stage`.
4. View the logs from the ECS task run on CloudWatch.
* On CloudWatch, select the `carbon-ecs-stage` log group.
* Select the most recent log stream.
* Verify that the following log is included:
> Successfully connected to the Data Warehouse: \<VERSION NUMBER\>
## Required ENV

* `FEED_TYPE` = The type of feed and is set to either "people" or "articles".
* `CONNECTION_STRING` = The connection string of the form `oracle://<username>:<password>@<server>:1521/<sid>` for the Data Warehouse.
* `SNS_TOPIC` = The ARN for the SNS topic used for sending email notifications.
* `SYMPLECTIC_FTP_HOST` = The hostname of the Symplectic FTP server.
* `SYMPLECTIC_FTP_PORT` = The port of the Symplectic FTP server.
* `SYMPLECTIC_FTP_USER` = The username for accessing the Symplectic FTP server.
* `SYMPLECTIC_FTP_PASS` = The password for accessing the Symplectic FTP server.
* `SYMPLECTIC_FTP_PATH` = The full file path to the XML file (including the file name) that is uploaded to the Symplectic FTP server.
* `WORKSPACE` = Set to `dev` for local development. This will be set to `stage` and `prod` in those environments by Terraform.
```
WORKSPACE="dev"
# type of feed, either "people" or "articles"
FEED_TYPE="people"
# JSON formatted string of key/value pairs for the MIT Data Warehouse connection
DATAWAREHOUSE_CLOUDCONNECTOR_JSON='{"USER": "<VALID_DATAWAREHOUSE_USERNAME>", "PASSWORD": "<VALID_DATAWAREHOUSE_PASSWORD>", "HOST": "<VALID_DATAWAREHOUSE_HOST>", "PORT": "<VALID_DATAWAREHOUSE_PORT>", "PATH": "<VALID_DATAWAREHOUSE_ORACLE_SID>", "CONNECTION_STRING": "<VALID_DATAWAREHOUSE_CONNECTION_STRING>"}'
# A JSON formatted string of key/value pairs for connecting to the Symplectic Elements FTP server
SYMPLECTIC_FTP_JSON='{"SYMPLECTIC_FTP_HOST": "<VALID_ELEMENTS_FTP_HOST>", "SYMPLECTIC_FTP_PORT": "<VALID_ELEMENTS_FTP_PORT>", "SYMPLECTIC_FTP_USER": "<VALID_ELEMENTS_FTP_USER>", "SYMPLECTIC_FTP_PASS": "<VALID_ELEMENTS_FTP_PASSWORD>"}'
# full XML file path that is uploaded to the Symplectic Elements FTP server
SYMPLECTIC_FTP_PATH="<FTP_FILE_DIRECTORY>/people.xml"
# SNS topic ARN used for sending email notifications.
SNS_TOPIC="<VALID_SNS_TOPIC_ARN>"
```

## Optional ENV

20 changes: 17 additions & 3 deletions carbon/app.py
@@ -118,11 +118,11 @@ def write(self, feed_type: str) -> None:
This method will block until both the reader and writer are finished.
"""
pipe = threading.Thread(target=self.ftp_output_file)
pipe.start()
thread = threading.Thread(target=self.ftp_output_file)
thread.start()
super().write(feed_type)
self.output_file.close()
pipe.join()
thread.join()


class FtpFile:
@@ -168,6 +168,20 @@ def __call__(self) -> None:
ftps.quit()


class DatabaseToFilePipe:
"""A pipe feeding data from the Data Warehouse to a local file."""

def __init__(self, config: dict, engine: DatabaseEngine, output_file: IO):
self.config = config
self.engine = engine
self.output_file = output_file

def run(self) -> None:
FileWriter(engine=self.engine, output_file=self.output_file).write(
feed_type=self.config["FEED_TYPE"]
)
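
# Illustrative usage of the new pipe (not part of this diff), mirroring what
# carbon/cli.py does when "--output_file" is passed:
#
#     engine = DatabaseEngine()
#     engine.configure(config_values["CONNECTION_STRING"], thick_mode=True)
#     with open("people.xml", "wb") as output_file:
#         DatabaseToFilePipe(
#             config=config_values, engine=engine, output_file=output_file
#         ).run()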


class DatabaseToFtpPipe:
"""A pipe feeding data from the Data Warehouse to the Symplectic Elements FTP server.
57 changes: 47 additions & 10 deletions carbon/cli.py
@@ -1,9 +1,10 @@
import logging
import os
from typing import IO

import click

from carbon.app import DatabaseToFtpPipe
from carbon.app import DatabaseToFilePipe, DatabaseToFtpPipe
from carbon.config import configure_logger, configure_sentry, load_config_values
from carbon.database import DatabaseEngine
from carbon.helpers import sns_log
@@ -13,8 +14,32 @@

@click.command()
@click.version_option()
@click.option("--run_connection_tests", is_flag=True)
def main(*, run_connection_tests: bool) -> None:
@click.option(
"-o",
"--output_file",
help=(
"Name of file (including the extension) into which Carbon writes the output. "
"Defaults to None, which will write the output to an XML file on the "
"Symplectic Elements FTP server."
),
type=click.File("wb"),
default=None,
)
@click.option(
"--run_connection_tests",
help="Test connection to the Data Warehouse and the Symplectic Elements FTP server",
is_flag=True,
)
@click.option(
"--use_sns_logging/--ignore_sns_logging",
help=(
"Turn on SNS logging. If SNS logging is used, notification emails "
"indicating the start and result of a Carbon run will be sent to subscribers "
"for the Carbon topic. Defaults to True."
),
default=True,
)
def main(*, output_file: IO, run_connection_tests: bool, use_sns_logging: bool) -> None:
"""Generate a data feed that uploads XML files to the Symplectic Elements FTP server.
The feed uses a SQLAlchemy engine to connect to the Data Warehouse. A query is
@@ -44,20 +69,32 @@ def main(*, run_connection_tests: bool) -> None:
)

engine = DatabaseEngine()
engine.configure(config_values["CONNECTION_STRING"], thick_mode=True)

# test connection to the Data Warehouse
engine.configure(config_values["CONNECTION_STRING"], thick_mode=True)
engine.run_connection_test()

# test connection to the Symplectic Elements FTP server
pipe = DatabaseToFtpPipe(config=config_values, engine=engine)
pipe.run_connection_test()
pipe: DatabaseToFtpPipe | DatabaseToFilePipe
if output_file:
pipe = DatabaseToFilePipe(
config=config_values, engine=engine, output_file=output_file
)
else:
pipe = DatabaseToFtpPipe(config=config_values, engine=engine)
# test connection to the Symplectic Elements FTP server
pipe.run_connection_test()

if not run_connection_tests:
sns_log(config_values=config_values, status="start")
logger.info("Carbon run has started.")
if use_sns_logging:
sns_log(config_values=config_values, status="start")
try:
pipe.run()
except Exception as error: # noqa: BLE001
sns_log(config_values=config_values, status="fail", error=error)
logger.info("Carbon run has failed.")
if use_sns_logging:
sns_log(config_values=config_values, status="fail", error=error)
else:
sns_log(config_values=config_values, status="success")
logger.info("Carbon run has successfully completed.")
if use_sns_logging:
sns_log(config_values=config_values, status="success")
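
# Resulting control flow at a glance (illustrative summary, not part of this
# diff):
#
#     --run_connection_tests   -> connection tests only; pipe.run() and all
#                                 sns_log() calls are skipped
#     --output_file FILE       -> DatabaseToFilePipe writes FILE; the FTP
#                                 connection test is skipped
#     --ignore_sns_logging     -> start/fail/success notifications suppressed;
#                                 logger.info() messages are still emitted
#     (defaults)               -> DatabaseToFtpPipe uploads to the Elements FTP
#                                 server with SNS logging on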
2 changes: 0 additions & 2 deletions carbon/config.py
@@ -6,8 +6,6 @@

ENV_VARS = [
"FEED_TYPE",
"LOG_LEVEL",
"SENTRY_DSN",
"SNS_TOPIC",
"SYMPLECTIC_FTP_PATH",
"WORKSPACE",
15 changes: 7 additions & 8 deletions carbon/helpers.py
@@ -89,7 +89,7 @@ def get_initials(*args: str) -> str:

def sns_log(
config_values: dict[str, Any], status: str, error: Exception | None = None
) -> None:
) -> dict | None:
"""Send a message to an Amazon SNS topic about the status of the Carbon run.
When Carbon is run in the 'stage' environment, subscribers to the 'carbon-ecs-stage'
@@ -115,26 +115,25 @@
feed = config_values.get("FEED_TYPE", "")

if status == "start":
sns_client.publish(
return sns_client.publish(
TopicArn=sns_id,
Subject="Carbon run",
Message=(
f"[{datetime.now(tz=UTC).isoformat()}] Starting carbon run for the "
f"{feed} feed in the {stage} environment."
),
)
elif status == "success":
sns_client.publish(
if status == "success":
return sns_client.publish(
TopicArn=sns_id,
Subject="Carbon run",
Message=(
f"[{datetime.now(tz=UTC).isoformat()}] Finished carbon run for the "
f"{feed} feed in the {stage} environment."
),
)
logger.info("Carbon run has successfully completed.")
elif status == "fail":
sns_client.publish(
if status == "fail":
return sns_client.publish(
TopicArn=sns_id,
Subject="Carbon run",
Message=(
@@ -143,4 +142,4 @@
f"in the {stage} environment: {error}."
),
)
logger.info("Carbon run has failed.")
return None
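
# Illustrative note (not part of this diff): returning the boto3 publish
# response lets tests assert that a notification was actually sent, e.g.:
#
#     response = sns_log(config_values=config_values, status="start")
#     assert response is not None and "MessageId" in response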
