operations: improve QA deployment #376

GraemeWatt · 2021-07-05T10:37:03Z

We currently deploy qa and prod instances via Kubernetes. The qa deployment has some limitations, mainly that it shares an Elasticsearch cluster with the prod instance. As a result, search on qa returns records that do not exist on the qa instance, which produces broken links, and care is needed to avoid actions on qa that would change the Elasticsearch index, such as finalising records. Finalising records should also be avoided because the qa deployment would mint DOIs with DataCite and send tweets to the @HEPData Twitter account. It would be better to use a separate Elasticsearch cluster for qa so that the full functionality can be tested, with the DataCite and Twitter test accounts used instead of the production ones. The Celery Beat deployment, which is currently switched off on qa, could then be turned on for qa as well as prod.

We should develop an easy way to restore prod backups of the CephFS data directory and PostgreSQL database to qa, and perhaps run it automatically at regular intervals (daily/weekly). The qa Elasticsearch indices could be recreated instead, if that is easier than copying the corresponding prod indices to qa.

This issue mostly requires changes to the Kubernetes configuration after requesting a new Elasticsearch cluster from CERN IT, but we should check whether any changes are also needed in this HEPData/hepdata repository.

If we had the ability to add a banner (#322), it could contain a message warning users that the qa deployment is a test instance, similar to the message on inspirebeta.net.


GraemeWatt · 2021-11-23T16:45:05Z

Today we deployed new datacite workers to allow a dedicated datacite queue to support rate-limiting on Celery tasks (see #404 and inspirehep/kubernetes#449), but only on prod and not on qa. If this issue is resolved, the datacite workers could also be enabled on qa with an Invenio-PIDStore config option PIDSTORE_DATACITE_TESTMODE = True to use the DataCite test account.
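As a rough illustration of the config option mentioned above, a qa-only override might look like the sketch below. The `HEPDATA_DEPLOYMENT` environment variable and the `qa_overrides` helper are hypothetical names chosen for this example and do not come from the repository; only `PIDSTORE_DATACITE_TESTMODE` is taken from the comment.

```python
import os


def qa_overrides(environ=None):
    """Return config overrides for a qa deployment (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # HEPDATA_DEPLOYMENT is an assumed variable name, not one from the repository.
    is_qa = environ.get("HEPDATA_DEPLOYMENT", "prod") == "qa"
    overrides = {}
    if is_qa:
        # Option named in the comment: point Invenio-PIDStore at the DataCite test account.
        overrides["PIDSTORE_DATACITE_TESTMODE"] = True
    return overrides


if __name__ == "__main__":
    # Show the overrides that would be applied on qa.
    print(qa_overrides({"HEPDATA_DEPLOYMENT": "qa"}))
```

Leaving the overrides empty unless qa is explicitly requested means that a missing variable falls back to the production behaviour, so it cannot silently switch prod to the test account.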

alisonrclarke · 2022-03-03T14:49:24Z

I have made some changes to the Kubernetes config on the hepdata-qa-es branch, adding the datacite workers and the updated ES host. Is that all we need for now?

GraemeWatt · 2022-04-13T16:06:05Z

> Have made some changes to the Kubernetes config on the hepdata-qa-es branch, to add the datacite workers and the updated ES host. Is that all we need for now?

I made some further changes and opened cern-sis/kubernetes#536. The part about automating the restoration of backups from prod to qa has not been addressed yet, but I moved it to a separate lower-priority issue #494.

GraemeWatt added the type: enhancement, priority: medium and complexity: medium labels Jul 5, 2021

GraemeWatt added priority: high and removed priority: medium labels Feb 18, 2022

GraemeWatt mentioned this issue Apr 13, 2022

operations: automate restoration of backups from prod to qa #494

Open

benjamin-bergia closed this as completed Apr 20, 2022
