Deployment

This document covers deploying the resources required to run Atlas, but not necessarily fine-tuning the configuration.

As much of the Atlas deployment as possible has been scripted, using a combination of Terraform (with the Azure Resource Manager provider) and Azure DevOps .yml scripts. Atlas is supported in an Azure environment and is built and deployed using Azure DevOps; changing either would require custom changes to the codebase.

What code to deploy?

See Versioning ADR for details on what constitutes a "stable" copy of the Atlas codebase.

Manual Steps

The following steps must be performed manually when deploying Atlas to a new environment.

Azure Configuration

  • An Azure subscription must exist into which the Atlas system will be deployed.
  • An Azure storage account must be available for Terraform to use as a backend.
  • An App Registration should be created within Azure Active Directory, to be used by Terraform for authentication.
  • A second App Registration should be created within Azure Active Directory, to be used by the code itself for managing azure resources at runtime.
    • e.g. The matching algorithm Data Refresh needs to be able to scale azure databases at runtime.
    • It also allows Terraform to fetch function keys.
    • This registration must have the Contributor and App Configuration Data Owner roles on the Atlas resource group (these can be assigned at the Azure subscription level before the resources are deployed).
    • App registration info must be provided via the following terraform release vars:
      • AZURE_CLIENT_ID
      • AZURE_CLIENT_SECRET
  • An Azure Active Directory (AD) group should be created to control admin access to the Atlas SQL Server (one AD group per release environment).
    • AD group info must be provided via the following terraform release vars:
      • DATABASE_SERVER_AZUREAD_ADMINISTRATOR_LOGIN_USERNAME - Name of the AD group used to control admin access to the Atlas SQL server, e.g., DEV SQL Server Administrators
      • DATABASE_SERVER_AZUREAD_ADMINISTRATOR_OBJECTID - Object ID of AD group
      • DATABASE_SERVER_AZUREAD_ADMINISTRATOR_TENANTID - ID of Tenant where AD group resides

Azure Devops Configuration

1. Create Variable Group

A variable group named "Terraform" should be created, with the following variables.

Azure Authentication

These variables are used by Terraform for authentication so it can manage resources. They should be retrieved from the app registration created for Terraform:

  • ARM_CLIENT_ID
  • ARM_CLIENT_SECRET
  • ARM_TENANT_ID

Terraform Backend

These variables are used to access the storage container that hosts the Terraform backend. Note that these are currently duplicated as release variables; build and release must use the same values so that both validate and apply use the same backend!

  • ARM_ACCESS_KEY
    • Access key of the storage account
  • BACKEND_RESOURCE_GROUP_NAME
    • Resource group to which storage account belongs
  • BACKEND_STORAGE_ACCOUNT_NAME
    • Storage account name
  • BACKEND_STORAGE_CONTAINER_NAME
    • Name of storage container, e.g., terraform-state
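
As a rough illustration of how the backend variables above could be passed to Terraform, the sketch below builds a `terraform init` command line from them. It assumes the standard azurerm backend settings (`resource_group_name`, `storage_account_name`, `container_name`) and that `ARM_ACCESS_KEY` is left in the environment for the backend to read. The helper name `terraform_init_args` is hypothetical, and the checked-in pipeline may wire these values differently.

```python
# Maps the variable-group names from this document to azurerm backend settings.
BACKEND_SETTINGS = {
    "BACKEND_RESOURCE_GROUP_NAME": "resource_group_name",
    "BACKEND_STORAGE_ACCOUNT_NAME": "storage_account_name",
    "BACKEND_STORAGE_CONTAINER_NAME": "container_name",
}


def terraform_init_args(variables):
    """Build the argument list for `terraform init` from the variable group.

    `variables` is a mapping of variable name to value. ARM_ACCESS_KEY is
    only checked for presence: the azurerm backend reads it from the
    environment, so it is not placed on the command line.
    """
    required = list(BACKEND_SETTINGS) + ["ARM_ACCESS_KEY"]
    missing = sorted(name for name in required if not variables.get(name))
    if missing:
        raise ValueError("Missing backend variables: " + ", ".join(missing))

    args = ["terraform", "init"]
    for name, setting in BACKEND_SETTINGS.items():
        args.append(f"-backend-config={setting}={variables[name]}")
    return args
```

Running the same helper with the build and the release variables and comparing the results is one way to confirm that validate and apply point at the same backend.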

2. Remaining Steps

  • New Devops build pipelines should be created, using the checked in <pipeline>.yml files.
  • An Azure service connection should be set up to the target Azure subscription, scoped to the new resource group for this Atlas installation
    • Due to the restriction by resource group, terraform must be run before this can be set up - either via a partial release or manually
  • A Devops release should be manually created.
    • The following steps should be defined:
      • Apply terraform
      • Run database migrations
        • Terraform will need to be run for the first time before these can be run, to set up the database server + databases.
        • The server connection details will need to be set manually once terraform has been run.
        • For login details, the server admin details required as release variables can be used.
      • Release azure function apps
    • Release variables should be set up for each target environment. Expected variables are defined in variables.tf. Those without default values are required.
      • A default host key is generated automatically by Azure, and is sent in an X-Functions-Key header to authenticate HTTP-triggered functions. New keys can also be generated. The key specified in the app settings is exported as a Terraform output variable; consumers of this output via remote state will need to know how to use this API.
        • IMPORTANT - on the first release of Atlas, the release var, WEBSITE_RUN_FROM_PACKAGE, must be set to 0 (default value is 1) to permit the generation of function keys (see this link for more background). It is safe to use the default value of 1 on subsequent releases.

Function App IP Whitelisting

Via the terraform variable IP_RESTRICTION_SETTINGS, connection to ATLAS functions apps can be restricted by IP address.

This has a significant drawback that should be carefully considered before using this feature! The set-up of any webhooks during the terraform/webhooks script means that the build agent on Azure Devops itself will need access to the functions app, meaning that it must be added to the whitelist.

(Note that this webhooks job has since been removed. This warning is left in place in case any webhooks are re-introduced in future.)

Azure Devops agent IPs are not static!

The following page has instructions on how to find the relevant IP addresses for your region.

https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml#agent-ip-ranges

Note that they change weekly, so enabling IP whitelisting of functions would require a manual weekly update of the IP_RESTRICTION_SETTINGS variable to ensure Azure DevOps remains whitelisted.
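
To give a sense of what the weekly update involves, the sketch below pulls the IPv4 ranges for one region out of an already-downloaded service tags JSON document. It assumes the file has a top-level `values` list whose entries carry a `name` and a `properties.addressPrefixes` list; the tag name `AzureCloud.westeurope` is only an example, and `address_prefixes` is a hypothetical helper. How the resulting ranges are written into IP_RESTRICTION_SETTINGS is not covered here; see variables.tf for the expected format.

```python
def address_prefixes(service_tags, tag_name):
    """Return the sorted IPv4 prefixes listed for one service tag.

    `service_tags` is the parsed JSON document (a dict). IPv6 prefixes
    are skipped by filtering out any entry containing a colon.
    """
    for entry in service_tags.get("values", []):
        if entry.get("name") == tag_name:
            prefixes = entry.get("properties", {}).get("addressPrefixes", [])
            return sorted(p for p in prefixes if ":" not in p)
    raise KeyError(tag_name)


# Tiny stand-in for the real file, using documentation-only IP ranges.
SAMPLE = {
    "values": [
        {
            "name": "AzureCloud.westeurope",
            "properties": {
                "addressPrefixes": ["203.0.113.0/24", "2001:db8::/32"]
            },
        }
    ]
}

print(address_prefixes(SAMPLE, "AzureCloud.westeurope"))
```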

Terraform Configuration

  • You will also need to register any relevant resource providers. See notes in .\terraform\main.tf.
  • All Atlas infrastructure is controlled via terraform scripts. If any specific naming or configuration changes are required for your installation, such changes should be made to the terraform scripts in a fork of the repository - changing them manually in Azure will lead to the changes being reverted on the next deployment to that environment.

Manual Azure Configuration (Post-terraform)

Once terraform has created ATLAS resources for the first time, certain actions must be performed manually on these resources, as they are either not available or not recommended as part of the terraform scripting.

  • Elastic Service Plan - minimum instance count. (optional)
    • TODO: ATLAS-667: Remove this once terraformed
    • This is currently not supported by terraform, but to allow a pre-warmed instance count, we must have a minimum instance count.
    • Manually set the minimum instance count on all elastic app service plans to be at least as high as the pre-warmed instance count
  • Azure SQL Permissions
    • Service Accounts

      • Each service (e.g. matching) within ATLAS should have a service account created on the appropriate databases. The username and password for such accounts should then be set as a variable in the release pipeline.

      • Passwords should be created by a password generator, such as https://passwordsgenerator.net/. Sensible generation settings might be:

        • 16+ characters
        • Upper, Lower, Numbers.
        • Special characters should not be used.
        • Exclude ambiguous letters.
        • Exclude ambiguous Symbols (if using).
      • By default, db_datareader and db_datawriter will be necessary for a given component to access its corresponding database(s)

      • Note that the user for the matching component to access the transient matching databases (a and b) will need to be granted db_owner permission, as a truncate table command is used in the full data refresh, which requires elevated permissions

      • To ensure this happens, add a PowerShell task to your Azure release pipeline. This should run /terraform-atlas-core/scripts/migrate_users.ps1, passing in the relevant variables as environment variables.

      • This script will add the appropriate roles to all accounts listed in the table below. Note that it will not remove roles that should later be revoked, so revocation must be done manually.

      Access Requirements:

      Database                 User               Permissions
      Matching - Transient     Matching           db-owner/db-writer/db-reader
      Matching - Persistent    Matching           db-owner/db-writer/db-reader
      Match Prediction         Match Prediction   db-writer/db-reader
      Donor Import             Donor Import       db-writer/db-reader
      Donor Import             Matching           db-writer/db-reader

    • Active Directory (Optional) - If you would like to be able to access the database server using Active Directory authentication, this should be manually configured

    • IP Whitelisting (Optional)

      • By default, only other azure services will be allowed to access the database server through the firewall. For development access, any known IP addresses should be manually added to the IP whitelist in Azure.
    • Serverless database auto-pause

      • The default configuration will use the Standard SKU pricing model for Atlas databases. In some cases, when very varied load is expected, a "serverless" database tier may be appropriate, which auto-scales with load. In such cases, terraform will automatically create a database with an "auto-pause" time of one hour, i.e. after one hour of inactivity, the database will shut off, saving on provisioned CPU cost. This comes with the trade-off that cold starts become much slower, and the first few requests to the database will fail as it "wakes up". Terraform does not currently support increasing the auto-pause delay or disabling it entirely, so this must be done manually. Any manual configuration of this setting will not be overridden by later terraform releases.
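
For the service account passwords described under Azure SQL Permissions, a local generator is an alternative to a website. The sketch below follows the suggested settings (16+ characters; upper, lower and numbers; no special characters; ambiguous letters excluded). The exact set of "ambiguous" characters and the function name `generate_password` are assumptions made for illustration.

```python
import secrets
import string

# Assumed set of look-alike characters to leave out; adjust as needed.
AMBIGUOUS = set("Il1O0")

ALPHABET = "".join(
    c for c in string.ascii_letters + string.digits if c not in AMBIGUOUS
)


def generate_password(length=20):
    """Return a random password of upper, lower and numeric characters."""
    if length < 16:
        raise ValueError("Use at least 16 characters")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every required character class is present.
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
        ):
            return candidate
```

The `secrets` module is used rather than `random` because it draws from the operating system's cryptographically secure source.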

System Tests

Resources

The system tests require some Azure resources of their own: SQL databases, Azure Storage, and App Configuration. These resources are controlled by terraform scripts, which will be run by the test pipeline (see "test-pipeline.yml").

DevOps

Create a new DevOps variable group named "TestTerraform", with the following variables:

  • TerraformWorkspace - e.g., atlas-system-tests
  • TF_VAR_AZURE_SUBSCRIPTION_ID - ID of the Azure subscription into which the test resources will be deployed.
  • SQL server info:
    • TF_VAR_DATABASE_SERVER_ADMIN_LOGIN_PASSWORD
    • TF_VAR_DATABASE_SERVER_AZUREAD_ADMINISTRATOR_LOGIN_USERNAME - Name of the AD group used to control admin access to the system-tests SQL server
    • TF_VAR_DATABASE_SERVER_AZUREAD_ADMINISTRATOR_OBJECTID - Object ID of AD group
    • TF_VAR_DATABASE_SERVER_AZUREAD_ADMINISTRATOR_TENANTID - ID of Tenant where AD group resides
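
Terraform treats any environment variable prefixed with `TF_VAR_` as an input variable of the same name without the prefix, which is why most of the names above carry it. The sketch below checks a mapping of variable names to values for the entries listed in this section and shows which Terraform inputs they would produce. The helper names are hypothetical, and the mapping is assumed to have been exported from the variable group by some other means.

```python
REQUIRED_VARIABLES = [
    "TerraformWorkspace",
    "TF_VAR_AZURE_SUBSCRIPTION_ID",
    "TF_VAR_DATABASE_SERVER_ADMIN_LOGIN_PASSWORD",
    "TF_VAR_DATABASE_SERVER_AZUREAD_ADMINISTRATOR_LOGIN_USERNAME",
    "TF_VAR_DATABASE_SERVER_AZUREAD_ADMINISTRATOR_OBJECTID",
    "TF_VAR_DATABASE_SERVER_AZUREAD_ADMINISTRATOR_TENANTID",
]

PREFIX = "TF_VAR_"


def missing_variables(variables):
    """Return the required names that are absent or empty, in listed order."""
    return [name for name in REQUIRED_VARIABLES if not variables.get(name)]


def terraform_inputs(variables):
    """Return the Terraform input variables implied by TF_VAR_ entries."""
    return {
        name[len(PREFIX):]: value
        for name, value in variables.items()
        if name.startswith(PREFIX)
    }
```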

Azure Configuration

The second App Registration created during the previous Azure Configuration step must have the role of Contributor and App Configuration Data Owner on the ATLAS-SYSTEM-TEST-RESOURCE-GROUP to allow terraform to create and access the required resources. This can be set at the Azure subscription level.

One-time Set-up

Before the full suite of system tests can be run, some one-time set-up must be performed.

HLA Metadata Dictionary

The HLA Metadata Dictionary must be refreshed to version 3330 for the matching validation test suite to run.

This can be achieved by configuring an ATLAS installation to use the test storage account (locally is recommended, but you can use a deployed environment if local running is not an option), then triggering the refresh job.

This is triggered via an HTTP endpoint in the Matching Algorithm Functions App.

The HLA Metadata Dictionary needs to be set to version 3330, as that was the version in use when the validation tests were written. To do so, send a POST request to http://localhost:7071/api/RefreshHlaMetadataDictionaryToSpecificVersion with the body:

{
	"Version": "3330"
}
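
A minimal sketch of that request using only the Python standard library is shown below. The URL and body come from this section; the optional `function_key` parameter is an assumption for the case where a deployed environment is used instead of a local one, in which case the key would be sent in the X-Functions-Key header. The function name `build_refresh_request` is hypothetical.

```python
import json
import urllib.request


def build_refresh_request(version, base_url="http://localhost:7071", function_key=None):
    """Build (but do not send) the POST request that triggers the refresh."""
    body = json.dumps({"Version": version}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/RefreshHlaMetadataDictionaryToSpecificVersion",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    if function_key:
        # Only needed when calling a deployed functions app.
        request.add_header("X-Functions-Key", function_key)
    return request
```

To send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_refresh_request("3330"))`, with the functions app running.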

MAC Dictionary

A full import of the latest MACs should be performed (this can be a one-off, as tests should not use as-yet unpublished MAC values).

This can be achieved by configuring an ATLAS installation to use the test storage account (locally is recommended, but you can use a deployed environment if local running is not an option), then triggering the refresh job.

This is triggered via an HTTP endpoint in the Atlas Functions App.

Releasing to multiple environments

The expected use case for multiple environments is a development -> uat -> live route. This section details how to set up for such a case.

  • (Optional) You may prefer each environment to exist on a different Azure subscription. If so, create this new subscription and use this subscription ID when running terraform
  • A new terraform workspace should be created for each environment
  • New release stages can be created within one Azure Devops release pipeline. This can be useful for ensuring the same build artifacts that were tested in a test environment are deployed to live
  • New service connections will need to be set up in Azure Devops for each resource group
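
The one-workspace-per-environment step can be sketched as below. The example assumes the workspace is simply named after the environment, which is an illustrative choice rather than something this document prescribes, and the helper name `workspace_command` is hypothetical. It relies on `terraform workspace new` both creating and switching to the workspace.

```python
def workspace_command(environment, existing_workspaces):
    """Return the terraform command that makes `environment` the active workspace.

    `existing_workspaces` is the collection of workspace names already
    present in the backend (for example, parsed from `terraform workspace list`).
    """
    action = "select" if environment in existing_workspaces else "new"
    return ["terraform", "workspace", action, environment]


# Example: development already exists, uat and live do not yet.
for env in ["development", "uat", "live"]:
    print(" ".join(workspace_command(env, {"default", "development"})))
```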