-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Docker stack notifications further and fix ES indexing when DB is not refreshed #6045
Conversation
This simplifies navigating to the Stack URL, which is helpful for accessing logs for both failed and successful deploys Also removes unhelpful variation in main vs PR notifications, leading to a more consistent message for both failed and successful builds
… deploys 1. Fixes a few errors in previous commit for notifications 2. Ensures ElasticSearch indexes are updated on subsequent re-deploys The latter fix is necessitated by the recent change to how ElasticSearch is containerized, building our own image instead of using one from hub. I observed that when re-deploying a stack, because the database isn't refreshed, the ES indexing would not run either, and this is why the post-deploy tests have been failing on the Search steps
fi | ||
|
||
./cfgov/manage.py update_index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcesher is this a deliberate change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chosak Yes. I added an explanation to the commit but haven't added it to the PR yet. I discovered the bug while testing the notification changes. I will update the PR description with the explanation momentarily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chosak If you'd prefer I pull that into a separate PR, I'm happy to do so. Whatever the team norms are for a case like this, I'll gladly abide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer to others more familiar with the Elasticsearch changes -- I don't know enough about those tests to understand why this would be necessary again. Is it because the ES content gets wiped out with each redeploy, unlike the Postgres content (if REFRESH_DB
is unset)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chosak That's exactly it. My understanding of this problem is that prior to #6014 , the ES image would not be redeployed, and thus its indexes would persist over multiple builds of a given PR. Thus, the logic in docker-entrypoint.sh
made sense. But with #6014, that changes, but I failed to account for that in docker-entrypoint.sh
and that's why post-deploy functional tests have been consistently failing on the steps that involve searching. It also explains why those tests pass on Build 1, but fail on subsequent builds.
I did talk with @Scotchester this morning about an alternative approach, which we use for Collab, which would build the ES image in a separate repo, on a separate cadence (daily? weekly?) and then the docker-stack.yml
file would change to pull that image (from internal hub) similar to how it used to pull from external hub. I haven't thought hard enough though about how well the Collab model would work in cf.gov's case. We'd need to discuss it as a team.
Perhaps we merge this change for now, to alleviate all the functional test failures, and then take up the alternate approach separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The good news is that updating the index will not take long if there are no changes to it (if I remember correctly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait 🤦 that's not true at all if the image is rebuilt and there is no index. I do think we need to take a look at that alternate approach sooner, rather than later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roger that, Scott. Just this morning, we prioritized solving the Jenkins job-dsl rebuild problem that we discussed elsewhere and which I anticipated would be a tough nut to crack, and fortunately we solved that unexpectedly quickly. So that serendipitously leaves an opening for us to prioritize this in our current sprint, as well. I'll talk to the team about it. 🙌
Would you be OK merging this PR now, knowing that we'll prioritize the alternate approach this sprint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for sure! Merge when ready :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some wonderful improvements here; thank you, Marc!
This simplifies navigating to the Stack URL, which is helpful for accessing logs for both failed and successful deploys
Also removes unhelpful variation in main vs PR notifications, leading to a more consistent message for both failed and successful builds
This will result in messages similar to these: