operations: improve QA deployment #376

Closed
GraemeWatt opened this issue Jul 5, 2021 · 3 comments


@GraemeWatt
Member

GraemeWatt commented Jul 5, 2021

We currently deploy qa and prod instances via Kubernetes. The qa deployment has some limitations, mainly that it shares the same Elasticsearch cluster as the prod instance. This means that search on qa returns records that do not exist on the qa instance, resulting in broken links, and care is needed to avoid actions on qa that would change the Elasticsearch index, such as finalising records. Finalising records should also be avoided because the qa deployment would mint DOIs with DataCite and send tweets to the @HEPData Twitter account. It would be better to use a separate Elasticsearch cluster for qa so that the full functionality can be tested. The DataCite and Twitter test accounts can be used instead of the production ones. The Celery Beat deployment, which is currently switched off on qa, could then also be turned on there.

A method of easily restoring prod backups of the CephFS data directory and PostgreSQL database to qa should be developed, and perhaps run automatically at regular intervals (daily/weekly). The qa Elasticsearch indices could be recreated instead if this is easier than copying the corresponding prod indices to qa.
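
To make the restore idea more concrete, here is a minimal sketch of what such a step might look like. It only builds the commands and does not run them. Everything in it is an assumption for illustration: the `build_restore_commands` helper, the paths, the database name, the use of a custom-format `pg_dump` file, and copying the data directory with `rsync` are not taken from the actual HEPData setup.

```python
import shlex


def build_restore_commands(dump_file, qa_database, prod_data_dir, qa_data_dir):
    """Return the commands (as argument lists) for a prod-to-qa restore.

    Hypothetical helper: nothing is executed here, so the caller decides
    how and where to run the commands (e.g. from a scheduled job).
    """
    prod_dir = prod_data_dir.rstrip("/")
    qa_dir = qa_data_dir.rstrip("/")
    if prod_dir == qa_dir:
        raise ValueError("qa and prod data directories must differ")
    return [
        # Replace the qa database contents with the prod dump
        # (assumes the dump was made in pg_dump's custom format).
        ["pg_restore", "--clean", "--if-exists", "--no-owner",
         f"--dbname={qa_database}", dump_file],
        # Mirror the prod data directory into qa; the trailing slashes
        # make rsync copy the directory contents rather than the directory.
        ["rsync", "-a", "--delete", prod_dir + "/", qa_dir + "/"],
    ]


if __name__ == "__main__":
    # Placeholder values, not the real deployment paths.
    for command in build_restore_commands(
        "/backups/prod.dump", "hepdata_qa", "/data/prod", "/data/qa"
    ):
        print(shlex.join(command))
```

Returning the commands instead of running them keeps the sketch easy to review and test before it is wired into a daily or weekly job.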

This issue mostly requires changes to the Kubernetes configuration after requesting a new Elasticsearch cluster from CERN IT, but it should be checked if any changes are needed to this HEPData/hepdata repository.

If we had the ability to add a banner (#322), it could contain a message warning users that the qa deployment is a test instance, similar to the message on inspirebeta.net.

@GraemeWatt
Member Author

Today we deployed new datacite workers to allow a dedicated datacite queue to support rate-limiting on Celery tasks (see #404 and inspirehep/kubernetes#449), but only on prod and not on qa. If this issue is resolved, the datacite workers could also be enabled on qa with an Invenio-PIDStore config option PIDSTORE_DATACITE_TESTMODE = True to use the DataCite test account.
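
As a rough illustration of how the qa settings might differ from prod, here is a minimal sketch. Only `PIDSTORE_DATACITE_TESTMODE` comes from the discussion above. The `build_overrides` helper and the Elasticsearch host names are hypothetical, and `SEARCH_ELASTIC_HOSTS` is assumed to be the Invenio-Search setting that selects the cluster.

```python
def build_overrides(environment):
    """Return config overrides for a hypothetical 'qa' or 'prod' deployment."""
    if environment not in ("qa", "prod"):
        raise ValueError(f"unknown environment: {environment!r}")
    is_qa = environment == "qa"
    return {
        # From the issue: use the DataCite test account on qa only.
        "PIDSTORE_DATACITE_TESTMODE": is_qa,
        # Assumption: one Elasticsearch cluster per environment.
        # The host names are placeholders, not the real CERN hosts.
        "SEARCH_ELASTIC_HOSTS": [f"es-{environment}.example.org:9200"],
    }
```

In a real deployment these values would more likely be set through the Kubernetes configuration than computed in code; the function form is used here only to make the qa/prod difference explicit.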

@alisonrclarke
Contributor

Have made some changes to the Kubernetes config on the hepdata-qa-es branch, to add the datacite workers and the updated ES host. Is that all we need for now?

@GraemeWatt
Member Author

> Have made some changes to the Kubernetes config on the hepdata-qa-es branch, to add the datacite workers and the updated ES host. Is that all we need for now?

I made some further changes and opened cern-sis/kubernetes#536. The part about automating the restoration of backups from prod to qa has not been addressed yet, but I moved it to a separate lower-priority issue #494.
