IN-908 build dev testing framework #107
Conversation
Why these changes are being introduced:
* Provide a clear, understandable workflow for developers to easily test the application and verify functionality.

How this addresses that need:
* Update README to provide more info on how to run the application
* Restore option to write output to a file (without using FTP server)

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-908
Force-pushed 486ad79 to ddc79eb
A few thoughts
README.md (outdated):
```
3. Run `make publish-stage` to push the Docker container image to ECR for the `stage` environment.
4. Run any `make` commands with calls to `aws ecs run-task --cluster carbon-ecs-stage ...` for testing the application.
```
I would use the language you used with step 4 in With Docker instead of referencing the ECS command
Applied the change.
Makefile (outdated):
```
## ---- Carbon run commands ---- ##

run-connection-tests-dev:
```
I'm not sure about using dev here if this is in fact a local run
Agreed, had the same thought.
Hmm, I changed the name of the Make command to `run-connection-tests-locally` and updated the instructions to read:
> Run any `make` commands for testing the application. In the Makefile, the names of relevant make commands will contain the suffix ['-locally' | '-stage'].
Let me know what you think!
README.md (outdated):
```
2. Run `make dependencies` to download the Oracle Instant Client from S3.
3. If the run requires a connection to the Data Warehouse, connect to an approved VPN client. Otherwise, skip this step.
4. Follow the steps relevant to the machine you are running:
   * If you are on a machine that cannot run Oracle Instant Client, follow the steps outlined in [With Docker](#with-docker).
```
I think this should include a note to skip `make publish-dev`, since we just need to run `make dist-dev` to build the container locally, right?
Also agreed. Will save for a more general comment, but I think this is a good example where some of these instructions deviate for purely local, docker testing vs testing the deployed container in Dev1.
Added a sentence about skipping the step to publish the Docker container.
Instead of leaving inline comments, and in anticipation of discussing later, going to break it out into some higher level comments.
As we've batted around a bit, I think there is some blurring between "tests" and "docker development conveniences" (for lack of a better term).
Some of the big value I'm seeing here is the ability to run a docker container of the app locally, and have it perform aspects of the full, normal application logic that help a developer as they work. Specifically, the ability to redirect the generated XML to a local file instead of writing to FTP.
This is particularly important given the Oracle DB library dependencies, which don't work on Apple Silicon. The docker container is critical for seeing actual output, from actual data in the data warehouse.
When I see a CLI option like `--run_connection_tests`, it does what I expect: it's running some connection tests, printing those results, and then bailing.
But while the option `--output_file=/output/people.xml` is very helpful, and I did see the XML that was written to the container and mounted to my host machine, it also sent some SNS logs and did more work than I needed. As I think you suggested @jonavellecuerdo, maybe another flag to prevent that from happening could be helpful.
More granular, noticed some things in the README that may need updating:
- env vars
  - think `LOG_LEVEL` and `SENTRY_DSN` are both required for docker runs
  - `SNS_TOPIC` is an ARN; consider renaming to `SNS_TOPIC_ARN`?
  - good example of "full" vs "limited" functionality: maybe not require `SYMPLECTIC_FTP_JSON` if I'm only writing XML to a local file?
  - `AWS_DEFAULT_REGION` was also required, but not explicitly stated, and not always a default AWS env var
- am I misunderstanding, or wouldn't a VPN connection always be needed for data warehouse queries?
All that said, I think it's a great step forward in the ability to see / test the output without writing to FTP. Looking forward to discussing more, but feel like it might just come down to the README explaining what "convenience" features are available for development, while being clear they aren't really "tests" in the traditional sense.
@ghukill A few responses to your comments:

Addressed:

To-dos:
Force-pushed e10794c to 845f45c
Why these changes are being introduced:
* SNS logging is not necessary for testing Carbon runs

How this addresses that need:
* Introduce CLI option "--use_sns_logging/--ignore_sns_logging"
* Update tests for SNS logging

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-908
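For illustration, a paired on/off flag like `--use_sns_logging/--ignore_sns_logging` can be sketched with stdlib `argparse` (a hypothetical stand-in for the project's actual CLI wiring, not its real code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # hypothetical parser mirroring the PR's paired flag; defaults to SNS logging on
    parser = argparse.ArgumentParser(description="carbon run (illustrative sketch)")
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--use_sns_logging",
        dest="use_sns_logging",
        action="store_true",
        default=True,
        help="Send start/result notification emails via SNS (default).",
    )
    group.add_argument(
        "--ignore_sns_logging",
        dest="use_sns_logging",
        action="store_false",
        help="Skip SNS notifications, e.g. for local test runs.",
    )
    return parser
```

With this shape, `use_sns_logging` is `True` by default and `--ignore_sns_logging` flips it to `False`, which the app can check before calling `sns_log`.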
Force-pushed 845f45c to 1d4dab3
Hi @ghukill and @ehanson8! Please review the changes from the last two commits. Namely, these commits involve (as discussed):

Thank you again and let me know if you have additional thoughts/questions! :)
I think that's fine to hold off on a template until further discussion.
I think it's fine, but this could be addressed by changing the fixture's scope.
Looks good but I'll defer to @ghukill for final approval since this involves first passes on some future best practices
README.md:
```
* `SYMPLECTIC_FTP_PASS` = The password for accessing the Symplectic FTP server.
* `SYMPLECTIC_FTP_PATH` = The full file path to the XML file (including the file name) that is uploaded to the Symplectic FTP server.
* `WORKSPACE` = Set to `dev` for local development. This will be set to `stage` and `prod` in those environments by Terraform.
```
@ghukill Your thoughts on this formatting for comments in copy/paste .env section? I think you did it differently for timdex-simulator and I think it's worth standardizing early on but I don't have strong feelings on either approach, either works for me!
I'm good with this format. Personally, I'm in support of comments above env vars, as that would support multi-line comments if we ever need them. Yes, it's busy, and takes up space, but that's okay by me.
A couple of minor nitpicks:
- I'd suggest we try and float common env vars to the top, like `WORKSPACE`
- probably don't need a comment on `WORKSPACE`, as it is common to most repos, and I worry those comments would drift / conflict / confuse
- FWIW, and maybe not applicable here, but I'd value logical grouping over alphanumeric

At the end of the day, I'd really just imagine you're new to the codebase: what do you want communicated to you when you first look at this? I'd hope for really important and/or really obvious ones at the top, likely without comments. Then little sections of interrelated ones, with comments as needed.

Just as an example, for this repo my knee-jerk expectation might be:

```
WORKSPACE="dev"
# type of feed, either "people" or "articles"
FEED_TYPE="people"
# JSON formatted string of key/value pairs for the MIT Data Warehouse connection
DATAWAREHOUSE_CLOUDCONNECTOR_JSON='{"USER": "<VALID_DATAWAREHOUSE_USERNAME>", "PASSWORD": "<VALID_DATAWAREHOUSE_PASSWORD>", "HOST": "<VALID_DATAWAREHOUSE_HOST>", "PORT": "<VALID_DATAWAREHOUSE_PORT>", "PATH": "<VALID_DATAWAREHOUSE_ORACLE_SID>", "CONNECTION_STRING": "<VALID_DATAWAREHOUSE_CONNECTION_STRING>"}'
# JSON formatted string of key/value pairs for Symplectic FTP server connection
SYMPLECTIC_FTP_JSON='{"SYMPLECTIC_FTP_HOST": "<VALID_ELEMENTS_FTP_HOST>", "SYMPLECTIC_FTP_PORT": "<VALID_ELEMENTS_FTP_PORT>", "SYMPLECTIC_FTP_USER": "<VALID_ELEMENTS_FTP_USER>", "SYMPLECTIC_FTP_PASS": "<VALID_ELEMENTS_FTP_PASSWORD>"}'
# full XML filepath that is uploaded to the Symplectic FTP server
SYMPLECTIC_FTP_PATH="<FTP_FILE_DIRECTORY>/<FEED_TYPE>.xml"
# SNS topic ARN used for sending email notifications
SNS_TOPIC="arn:..."
```

- just stylistic and subjective, but try and keep them short and snappy, easily scannable
- avoiding capital letters at the beginning makes the comments stand out from the UPPERCASE env vars
- floated the Symplectic FTP connection under database, as they are related
Fixed this!
```
if not run_connection_tests:
    sns_log(config_values=config_values, status="start")
    logger.info("Carbon run has started.")
```
I like this refactored block
tests/test_helpers.py (outdated):
```diff
 @freeze_time("2023-08-18")
-def test_sns_log(caplog, stubbed_sns_client):
+def test_sns_log_returns_start_message(stubbed_sns_client):
```
Isn't this test doing more than this now? I'd make it more generic
Fixed the name! Thank you for catching this.
I agree, and think that's well summarized. Without digging in too deeply, it seems okay to me to think of this test as testing how the

Theoretically -- as @ehanson8 pointed out -- this fixture could be scoped to the test level and could even accept the expected request/response payloads as arguments to re-initialize it each time. But... unsure if worth the trouble here.
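The per-test, payload-driven fixture idea mentioned above could be sketched as a small factory (a hypothetical `unittest.mock` stand-in for the project's botocore-stubber helper; `make_stubbed_sns_client` is not a name from the codebase):

```python
from unittest.mock import MagicMock

def make_stubbed_sns_client(canned_receipts):
    """Return a fake SNS client whose publish() yields canned receipts in order.

    canned_receipts: a list of dicts like {"MessageId": "StartMessageId"}; the
    real fixture would instead queue equivalent responses on a botocore Stubber.
    """
    client = MagicMock()
    client.publish.side_effect = list(canned_receipts)
    return client
```

Each test could then build exactly the stub it needs, e.g. `make_stubbed_sns_client([{"MessageId": "StartMessageId"}])`, re-initializing per test rather than sharing one fixture.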
Looks good to me!
Agreed with @ehanson8 on almost all fronts. My comments are pretty stylistic and subjective, please feel free to consider or disregard. Overall, I think it's looking great.
```
sns_start_response = sns_log(config_values, status="start")
assert sns_start_response["MessageId"] == "StartMessageId"
```
Nice! So the returned receipt is enough to assert on, it looks like?
Makefile (outdated):
```
## ---- Carbon run commands ---- ##

run-connection-tests-local:
```
I find myself still tripping on this Makefile command. When I see "local" here, that makes me think it's attempting to connect to a local Oracle instance.

What about just `run-connection-tests`? I would assume, then, that it's the `.env` file that drives what context is getting tested.

Perhaps the "local" -- and I was present for the conversation and agreed at the time -- was us trying to suggest to the developer that it's using docker? If so, what about `run-connection-tests-via-docker`? It's long... but perhaps it indicates what is happening?
Hmm, we've been using the phrase "run-connection-tests-stage" for "running the connection tests via an ECS task in Stage-Workloads".

I updated the Makefile commands to `run-connection-tests-with-docker` and `run-connection-tests-with-ecs-stage`.

Let me know what you think!
Ah, good point. Sorry for all the back and forth on this. I guess it's understandable, as the naming kind of needles at what the real goal is.
I like this one, `run-connection-tests-with-ecs-stage`:
- running connection tests
- hitting ECS directly
- stage environment is hardcoded into make command, so makes sense

Now the other, `run-connection-tests-with-docker`, if I'm summarizing my understanding:
- runs connection tests
- which Oracle DB, and which Symplectic FTP endpoint, are driven by `.env` file
- looking at command, can see quite clearly it's running docker container on local machine

So... I think I like it? If I, as a developer, look at the command in the Makefile I can see the `.env` file getting mounted.

My follow-up question: am I missing it, or is this command documented in the README? If not, what about a small comment in the Makefile, something along the lines of:

```
# Run connection tests from local docker instance, driven by Oracle DB and Symplectic FTP configs from env vars
```
Added a comment! :)
```
"Turn on SNS logging. If SNS logging is used, notification emails "
"indicating the start and result of a Carbon run will be sent to subscribers "
"for the Carbon topic. Defaults to True."
```
I like this description. While "SNS Logging" makes me immediately think it might be cranking out lots of logging to SNS, this makes it clear it's really just emails, occasionally.
Force-pushed b9b5a1e to 0692bd1
* Fix name for SNS log unit test
* Reorganize code block in 'Required Env' of README
* Rename Makefile commands for running connection tests
* Add tests to confirm options for enabling SNS logging are functional
Force-pushed 0692bd1 to fd92724
For the record, continued to dig in, and may have misspoken a bit. But, discussing in a call vs writing here.
@ghukill Applied the suggestions you shared with me and they work well!
Given that work to make the SNS fixtures more flexible, what about breaking them out into individual tests now?
Related: now that you have the ability to simulate 1) start, and 2) fail SNS messages, what about an unhappy path test that triggers the failed SNS message when something goes wrong during normal app flow? Unless I'm missing that test and it already exists.
Otherwise, looking good!
tests/test_helpers.py (outdated):
```diff
-def test_sns_log(caplog, stubbed_sns_client):
+def test_sns_log_publishes_status_messages_fail(stubbed_sns_client_start_fail):
     config_values = load_config_values()
     with patch("boto3.client") as mocked_boto_client:
-        mocked_boto_client.return_value = stubbed_sns_client
-        sns_log(config_values, status="start")
+        mocked_boto_client.return_value = stubbed_sns_client_start_fail

-        sns_log(config_values, status="success")
-        assert "Carbon run has successfully completed." in caplog.text
+        sns_start_response = sns_log(config_values, status="start")
+        assert sns_start_response["MessageId"] == "StartMessageId"

-        sns_log(config_values, status="fail")
-        assert "Carbon run has failed." in caplog.text
+        sns_fail_response = sns_log(config_values, status="fail")
+        assert sns_fail_response["MessageId"] == "FailMessageId"


+@freeze_time("2023-08-18")
+def test_sns_log_publishes_status_messages_success(stubbed_sns_client_start_success):
+    config_values = load_config_values()
+    with patch("boto3.client") as mocked_boto_client:
+        mocked_boto_client.return_value = stubbed_sns_client_start_success
+
+        sns_start_response = sns_log(config_values, status="start")
+        assert sns_start_response["MessageId"] == "StartMessageId"
+
+        sns_fail_response = sns_log(config_values, status="success")
+        assert sns_fail_response["MessageId"] == "SuccessMessageId"
```
I think it's nice these are broken into [start, success] and [start, fail]. But wonder if it's worth going a step further and just having each tested? Something along the lines of:

- `stubbed_sns_client_start` (fixture) --> `test_sns_log_publishes_status_message_start`
- `stubbed_sns_client_success` (fixture) --> `test_sns_log_publishes_status_message_success`
- `stubbed_sns_client_fail` (fixture) --> `test_sns_log_publishes_status_message_fail`

In that case, I think it's still worth keeping the `stubbed_sns_client_start_success` and `stubbed_sns_client_start_fail` fixtures, as they are helpful for testing the actual app flow and logic.
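A per-status split might look roughly like this (simplified stand-ins: the real tests would go through the project's `sns_log` helper and botocore-stubbed clients; the `sns_log` below is a hypothetical reduction just to show the shape):

```python
from unittest.mock import MagicMock

def stubbed_sns_client(message_id):
    # one canned receipt per stub, one stub per status under test
    client = MagicMock()
    client.publish.return_value = {"MessageId": message_id}
    return client

def sns_log(sns_client, status):
    # reduced stand-in for the helper: publish a status message, return the receipt
    return sns_client.publish(Message=f"Carbon run status: {status}")

def test_sns_log_publishes_status_message_start():
    response = sns_log(stubbed_sns_client("StartMessageId"), status="start")
    assert response["MessageId"] == "StartMessageId"

def test_sns_log_publishes_status_message_fail():
    response = sns_log(stubbed_sns_client("FailMessageId"), status="fail")
    assert response["MessageId"] == "FailMessageId"
```

Each test then owns exactly one fixture and one expectation, which keeps failures easy to localize.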
@ghukill Thank you for all your help and suggestions to improve testing! Please see the latest commit for updates to tests related to SNS logging and failed runs.
Force-pushed 585b245 to 4dd4659
Looks good to me! Nice work again on all this testing exploration and extending. I think the ability to test a start --> fail for the app flow is good to have.
```
functional_engine,
monkeypatch,
runner,
stubbed_sns_client_start_fail,
```
Did a little test here and have a suggestion. As we discussed, it feels odd to stub this with start / fail messages, when really you don't anticipate the stubber to get called at all!
Creating an "empty" stubber appears to work given the new pattern:
```python
@pytest.fixture
def stubbed_sns_client_empty():
    sns_client, stubber = setup_stubbed_sns_client([])
    with stubber:
        yield sns_client
```

Then you can replace `stubbed_sns_client_start_fail` with `stubbed_sns_client_empty`.
I'm understanding now how my comment above was incorrect. Looks like you've got a test for a more accurate representation of what happens:
- app throws exception
- `sns_log()` invoked to notify about failure
- stubber is ready for this failure scenario

So it would not bypass the SNS logging entirely. Nice work!
```
stubbed_sns_client_start_fail,
):
    with patch("boto3.client") as mocked_sns_client, patch.object(
        DatabaseToFtpPipe, "run", side_effect=Exception(None)
```
Nice find with the `side_effect`!
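The `side_effect` pattern above generalizes to the unhappy-path test discussed earlier; here's a self-contained sketch (with toy stand-ins for `DatabaseToFtpPipe` and the app flow, not the project's real classes):

```python
from unittest.mock import MagicMock, patch

class DatabaseToFtpPipe:
    # toy stand-in for the real pipeline class
    def run(self):
        return "ok"

def run_app(pipe, sns_client):
    # reduced app flow: send a failure notification if the pipe raises
    try:
        pipe.run()
    except Exception:
        sns_client.publish(Message="Carbon run has failed.")
        return "fail"
    return "success"

def test_failed_run_sends_fail_notification():
    sns_client = MagicMock()
    # side_effect forces the failure branch, as in the PR's test
    with patch.object(DatabaseToFtpPipe, "run", side_effect=Exception(None)):
        assert run_app(DatabaseToFtpPipe(), sns_client) == "fail"
    sns_client.publish.assert_called_once_with(Message="Carbon run has failed.")
```

Because `patch.object` is a context manager, `run` is restored after the `with` block, so other tests still exercise the normal success path.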
* Create individual fixtures for different SNS log statuses
* Create CLI tests to verify conditional SNS logging
Force-pushed 4dd4659 to 786bdc0
What does this PR do?
Scope out a workflow that allows a developer to easily test the application.

Helpful background context
When testing, the main quirk is that the host name for the Data Warehouse is different when you connect to the Data Warehouse outside of AWS. Please reach out to me if you need help setting up your `.env` file.

How can a reviewer manually see the effects of these changes?
Follow the instructions in the README to run the Data Warehouse connection test by running the application in a Docker container on your local machine: `make run-connection-tests-dev`.

Includes new or updated dependencies?
NO