# CI/CD with GitHub Actions


## 1. Introduction

### 1.1 Overview
- Automate the Terraform-based IaC presented so far by building a complete CI/CD pipeline with GitHub Actions
- The goal of our CI/CD pipeline is, for every new commit to the GitHub repository: 
    - automatically test a pull request (CI)
    - build and push a container image to a registry (CD)
    - deploy the updated Lambda service to production (CD)
    
![title](images/ci_cd.png)

### 1.2 About workflows

A workflow is a configurable automated process that will run one or more jobs. Workflows are defined by a YAML file checked in to your repository and will run when triggered by an event in your repository, or they can be triggered manually, or at a defined schedule.

Workflows are defined in the ```.github/workflows``` directory in a repository, and a repository can have multiple workflows, each of which can perform a different set of tasks. For example, you can have one workflow to build and test pull requests and another workflow to deploy your application every time a release is created.

GitHub Actions provides us with standard virtual machines to execute our jobs (such as tests) without setting up any extra infrastructure.

### 1.3 Summary CI/CD workflows

The **CI** workflow consists of two jobs and is triggered when we open a pull request from a feature branch. For example, given the branches in the image below, we can open a pull request from the branch ```feature/week6b-iac``` to the branch ```develop```:

![title](images/github-branches.png)

The two jobs are as follows:

1. Automatically test our inference service locally and in the cloud
    - Unit tests
    - Integration tests
2. Run the Terraform plan against our specified Terraform state file to compile and validate any infrastructure changes
    - ```$ terraform init```
    - ```$ terraform plan```

The **CD** workflow consists of one job with a series of steps. It is triggered only when your pull request is approved and merged into the ```develop``` branch.
- Step 1: Define the infrastructure using Terraform. This happens automatically if the ```$ terraform plan``` from CI detected any changes
    - ```$ terraform init```
    - ```$ terraform apply```
- Step 2: Build the Docker image and push it to the ECR repo
- Step 3: Once a new version of the Lambda function is published, update the Lambda config (the infra is kept intact, only the Lambda application logic is updated). That's why Steps 2 and 3 are kept separate from the Terraform step: if the image build were integrated into it, the infra would change every time the Docker image is built.

## 2. CI workflow

*Note:*

- The stage environment is a pre-production environment that acts as an intermediary step between development and production.

- The production environment is the live and operational environment where the machine learning system serves real users or applications.

Let's create the folder ```.github/workflows``` in the project repo. Within this folder we will define two files:
- For CI: ```ci-tests.yaml```
- For CD: ```cd-deploy.yaml```

### 2.1 ```ci-tests.yaml```

Let's go through the CI workflow defined in ```ci-tests.yaml``` step by step:

1. The workflow is named "CI-Tests" and is triggered on pull requests that target the 'develop' branch and change files under the ```06-best-practices/code/``` directory
    ```yaml
    name: CI-Tests
    on:
      pull_request:
        branches:
          - 'develop'
        paths:
          - '06-best-practices/code/**'
    ```

2. The workflow defines three environment variables: ```AWS_DEFAULT_REGION```, ```AWS_ACCESS_KEY_ID```, and ```AWS_SECRET_ACCESS_KEY```. These variables will be used to configure AWS credentials for later steps.
```yaml
    env:
      AWS_DEFAULT_REGION: 'eu-west-1'
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```
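If a secret has not been defined in the repository, the expression that references it evaluates to an empty string, and the problem only surfaces later when an AWS call fails. As a minimal sketch, a script step could check the variables up front. The helper name `require_env` is hypothetical and not part of the course code.

```shell
# Hypothetical helper (not part of the course code): fail fast when a
# required environment variable is unset or empty.
require_env() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

# Report instead of aborting, so the sketch also runs outside of CI.
if require_env AWS_DEFAULT_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; then
  echo "AWS credentials are configured"
else
  echo "AWS credentials are not configured"
fi
```

In a workflow, the same function could be called at the start of a `run:` block so that a missing secret stops the job with a clear message.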

3. The CI workflow defines two jobs: "test" and "tf-plan". Each job runs on its own fresh runner, so we need to set up a new environment for each of them. Jobs also run in parallel by default, so we should avoid interdependencies between them. If one job does depend on another, we need to declare it explicitly with the ```needs``` keyword.

4. Job "test":
    - It runs on an 'ubuntu-latest' runner, which means it will execute on a virtual machine (VM) with the latest version of Ubuntu.
    - The steps for this job include:
        - Cloning the code repository to the VM using ```actions/checkout@v2```. Docker does not need to be installed separately, as it comes preinstalled on the GitHub-hosted Ubuntu runner.
        - Setting up Python 3.9 in the VM using ```actions/setup-python@v2```.
        - Installing dependencies in VM using pipenv in the ```06-best-practices/code``` directory.
        - Running unit tests in VM with ```pipenv run pytest tests/``` in the ```06-best-practices/code``` directory.
        - Running linting in VM using ```pipenv run pylint --recursive=y .``` in the ```06-best-practices/code``` directory.
        - Configuring AWS credentials in VM using ```aws-actions/configure-aws-credentials@v1```.
        - Running an integration test in VM by executing the ```run.sh``` script in the ```06-best-practices/code/integraton-test``` directory. We have modified the original script to check the ```GITHUB_ACTIONS``` environment variable, which tells the script whether it is running in a GitHub CI/CD environment or in a local environment.

```yaml
    jobs:
      test:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - name: Set up Python 3.9
            uses: actions/setup-python@v2
            with:
              python-version: 3.9

          - name: Install dependencies
            working-directory: "06-best-practices/code"
            run: pip install pipenv && pipenv install --dev

          - name: Run Unit tests
            working-directory: "06-best-practices/code"
            run: pipenv run pytest tests/

          - name: Lint
            working-directory: "06-best-practices/code"
            run: pipenv run pylint --recursive=y .

          - name: Configure AWS Credentials
            uses: aws-actions/configure-aws-credentials@v1
            with:
              aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
              aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
              aws-region: ${{ env.AWS_DEFAULT_REGION }}

          - name: Integration Test
            working-directory: '06-best-practices/code/integraton-test'
            run: |
              . run.sh
```
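GitHub Actions sets the environment variable `GITHUB_ACTIONS` to `true` on its runners, which is what makes the check in the integration test script possible. The following is a minimal sketch of such a check, not the course's actual `run.sh`; the function name `detect_environment` is hypothetical.

```shell
# Hypothetical helper (not the course's actual run.sh): report where the
# script is running.
detect_environment() {
  # GitHub Actions exports GITHUB_ACTIONS=true on its runners.
  if [ "${GITHUB_ACTIONS:-}" = "true" ]; then
    echo "github-actions"
  else
    echo "local"
  fi
}

echo "Running integration test in: $(detect_environment)"
```

A script can branch on the result, for example to skip steps that only make sense on a developer machine.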

5. Job "tf-plan":

    - It runs on an 'ubuntu-latest' runner as well.
    - The steps for this job include:
        - Cloning the code repository using ```actions/checkout@v2```.
        - Configuring AWS credentials using ```aws-actions/configure-aws-credentials@v1```.
        - Setting up Terraform using ```hashicorp/setup-terraform@v2```.
        - Running ```terraform init``` and ```terraform plan``` in the ```06-best-practices/code/infrastructure``` directory for the production environment (when we ran the Terraform plan in the earlier notebook, we passed the variable file for the staging environment). Additionally, we override the backend key defined in ```code/infrastructure/main.tf``` (originally set for the stage environment) so that the production state file is used.
      
```yaml
      tf-plan:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - name: Configure AWS Credentials
            uses: aws-actions/configure-aws-credentials@v1
            with:
              aws-access-key-id: ${{ env.AWS_ACCESS_KEY_ID }}
              aws-secret-access-key: ${{ env.AWS_SECRET_ACCESS_KEY }}
              aws-region: ${{ env.AWS_DEFAULT_REGION }}

          - uses: hashicorp/setup-terraform@v2

          - name: TF plan
            id: plan
            working-directory: '06-best-practices/code/infrastructure'
            run: |
              terraform init -backend-config="key=mlops-zoomcamp-prod.tfstate" --reconfigure && terraform plan --var-file vars/prod.tfvars
```

Note: The steps involving AWS actions use the environment variables defined at the beginning of the workflow to configure AWS credentials. These environment variables are populated from secrets stored in the GitHub repository (```Settings/Security/Secrets/Actions```), so the credentials stay secure and are not exposed in the workflow file.

![title](images/github-secrets.png)

### 2.2 Execute CI workflow

1. Go to the root of your repository
2. Make sure you are in a feature branch (not the main branch). In our case, we call it ```feature/week6b-iac```
    - Otherwise, run ```$ git checkout -b feature/week6b-iac``` to create a new branch called ```feature/week6b-iac``` that carries over all of your commits
    - You must also have a branch called ```develop```
3. Commit your changes: ```$ git commit -m "ci-tests"```
4. Push to the feature branch, ```$ git push origin feature/week6b-iac```

5. You can now open a pull request ```feature/week6b-iac``` -> ```develop```

![title](images/github-pull-request1.png)

![title](images/github-pull-request2.png)

6. We can go to ```Actions``` and check the progress of the workflow (it will take a while to complete)

![title](images/github-ci1.png)

![title](images/github-ci2.png)

And... it's successful!

![title](images/github-ci3.png)

## 3. CD workflow

Once we have created the CI workflow, we move to our CD pipeline.

### 3.1 ```cd-deploy.yaml```

Let's go through the CD workflow defined in ```cd-deploy.yaml``` step by step:

1. The workflow is named "CD-Deploy" and is triggered when code changes are pushed to the 'develop' branch (we want this workflow to be triggered when the pull request from CI is merged into the 'develop' branch). Optionally, you can add a ```paths``` filter so that only commits touching specific directories trigger the workflow.
    ```yaml
    name: CD-Deploy
    on:
      push:
        branches:
          - 'develop'
    #    paths:
    #      - '06-best-practices/code/**'
    ```

2. The workflow defines a **single job** named "build-push-deploy" that runs on an 'ubuntu-latest' VM. We use only one job because the steps within a job run sequentially (not in parallel), and each step here depends on the previous ones.
    ```yaml
    jobs:
      build-push-deploy:
        runs-on: ubuntu-latest
    ```

3. The steps for the "build-push-deploy" job are as follows:
    - *Step 1:* Clone the repository to VM using ```actions/checkout@v3```.
        ```yaml
        steps:
          - name: Check out repo
            uses: actions/checkout@v3        
        ```
    - *Step 2:* Configure AWS credentials using ```aws-actions/configure-aws-credentials@v1``` to allow access to AWS services in the subsequent steps.
```yaml
          - name: Configure AWS Credentials
            uses: aws-actions/configure-aws-credentials@v1
            with:
              aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
              aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
              aws-region: "eu-west-1"
```
        
    - *Step 3:* Set up Terraform in the VM using ```hashicorp/setup-terraform@v2```. By default this action installs a wrapper script around the ```terraform``` binary. Setting ```terraform_wrapper: false``` skips it, so the workflow calls Terraform directly and commands such as ```terraform output``` return plain values.
```yaml
          - uses: hashicorp/setup-terraform@v2
            with:
              terraform_wrapper: false
```
    - *Step 4:* Perform ```terraform plan``` in the ```06-best-practices/code/infrastructure``` directory of the VM to identify changes to be made to the infrastructure. The step is given the id ```tf-plan```, so later steps can refer to its outcome.
```yaml
          - name: TF plan
            id: tf-plan
            working-directory: '06-best-practices/code/infrastructure'
            run: |
              terraform init -backend-config="key=mlops-zoomcamp-prod.tfstate" -reconfigure && terraform plan -var-file=vars/prod.tfvars
```

  
   - *Step 5:* If the ```terraform plan``` is successful, perform ```terraform apply``` in the same directory of the VM to apply the changes and create/update the infrastructure. The apply is conditional on the success of the previous plan step. Note that the whole comparison must sit inside ```${{ }}```, otherwise the condition is always true. The flag ```-auto-approve``` skips the interactive approval, as we can't manually enter 'yes' in a CI/CD workflow. Additionally, we print some output values of the resources created via Terraform and expose them as step outputs. For Terraform to be able to do this, the output variables must be defined in ```infrastructure/main.tf```.
```yaml
          - name: TF Apply
            id: tf-apply
            working-directory: '06-best-practices/code/infrastructure'
            # The comparison must be inside the expression braces to be evaluated
            if: ${{ steps.tf-plan.outcome == 'success' }}
            run: |
              terraform apply -auto-approve -var-file=vars/prod.tfvars
              echo "::set-output name=ecr_repo::$(terraform output ecr_repo | xargs)"
              echo "::set-output name=predictions_stream_name::$(terraform output predictions_stream_name | xargs)"
              echo "::set-output name=model_bucket::$(terraform output model_bucket | xargs)"
              echo "::set-output name=lambda_function::$(terraform output lambda_function | xargs)"
```
   
   - *Step 6:* After applying the Terraform changes, this step builds a Docker image from the code in the ```06-best-practices/code``` directory of the VM and pushes it to Amazon Elastic Container Registry (ECR). The image URI is saved to the step output ```image_uri``` using the GitHub Actions workflow command ```echo "::set-output name=...::..."```. The ```image_uri``` output can then be accessed in subsequent steps.
```yaml
          - name: Login to Amazon ECR
            id: login-ecr
            uses: aws-actions/amazon-ecr-login@v1

          - name: Build, tag, and push image to Amazon ECR
            id: build-image-step
            working-directory: "06-best-practices/code"
            env:
              ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
              ECR_REPOSITORY: ${{ steps.tf-apply.outputs.ecr_repo }}
              IMAGE_TAG: "latest"   # ${{ github.sha }}
            run: |
              docker build -t ${ECR_REGISTRY}/${ECR_REPOSITORY}:${IMAGE_TAG} .
              docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
              echo "::set-output name=image_uri::$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG"
```
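GitHub has deprecated the `::set-output` workflow command used in the steps above in favour of appending `name=value` lines to the file referenced by the `GITHUB_OUTPUT` environment variable. The sketch below shows that mechanism in isolation. The helper name `set_step_output` and the placeholder value are assumptions for illustration, and the temp-file fallback exists only so the snippet also runs outside of GitHub Actions.

```shell
# Hypothetical helper: record a step output as a "name=value" line.
set_step_output() {
  echo "$1=$2" >> "$GITHUB_OUTPUT"
}

# GitHub Actions provides GITHUB_OUTPUT; fall back to a temp file locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# In the workflow the value would come from a command such as
# $(terraform output -raw ecr_repo); a placeholder keeps the sketch runnable.
set_step_output ecr_repo "example-ecr-repo"
cat "$GITHUB_OUTPUT"
```

Later steps read the value the same way as before, for example with `${{ steps.tf-apply.outputs.ecr_repo }}`, so only the writing side would change.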

   - *Step 7:* Retrieve model artifacts from a development S3 bucket and copy them to the production S3 bucket. The latest model version is identified and saved to the step output ```run_id```. The commands after ```run:``` reproduce the lines in ```deploy_manual.sh```.
```yaml
          - name: Get model artifacts
          # The steps here are not suited for production.
          # In practice, retrieving the latest model version or RUN_ID from a service
          # like MLflow or DVC can also be integrated into a CI/CD pipeline.
          # But due to the limited scope of this workshop, we would be keeping 
          # things simple.
          # In practice, you would also have a separate training pipeline to write 
          # new model artifacts to your Model Bucket in Prod.

            id: get-model-artifacts
            working-directory: "06-best-practices/code"
            env:
              MODEL_BUCKET_DEV: "mlflow-models-alexey"
              MODEL_BUCKET_PROD: ${{ steps.tf-apply.outputs.model_bucket }}
            run: |
              export RUN_ID=$(aws s3api list-objects-v2 --bucket ${MODEL_BUCKET_DEV} \
              --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text | cut -f2 -d/)
              aws s3 sync s3://${MODEL_BUCKET_DEV} s3://${MODEL_BUCKET_PROD}
              echo "::set-output name=run_id::${RUN_ID}"
```   
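The `cut -f2 -d/` call in the step above depends on how object keys are laid out in the bucket. As a small worked example, assume the most recently modified key follows an `<experiment_id>/<run_id>/artifacts/...` layout. The key below is made up for illustration and is not taken from the actual bucket.

```shell
# Assumed example key, following an <experiment_id>/<run_id>/artifacts/... layout.
latest_key="1/0123456789abcdef/artifacts/model/MLmodel"

# Split on "/" and keep the second field, as the workflow step does.
RUN_ID=$(echo "$latest_key" | cut -f2 -d/)
echo "$RUN_ID"   # prints 0123456789abcdef
```

If the bucket used a different key layout, the field index passed to `cut` would need to change accordingly.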

   - *Step 8:* Update the AWS Lambda function with the new model artifacts and other environment variables. The Lambda function is updated using the AWS CLI.
       - The ```aws lambda get-function``` command is used to check the status of the last Lambda function update.
       - Once the update status is no longer "InProgress", the script updates the Lambda function's configuration using the ```aws lambda update-function-configuration``` command.

   It is generally good practice to wait before updating the Lambda. In this case it is not strictly necessary, because the Terraform infra is created earlier in the same workflow (sequentially). In a more realistic scenario, infra and deployment would live in two different workflows that can run in parallel, so waiting may be needed until the infra is ready.
```yaml
          - name: Update Lambda
            env:
              LAMBDA_FUNCTION: ${{ steps.tf-apply.outputs.lambda_function }}
              PREDICTIONS_STREAM_NAME: ${{ steps.tf-apply.outputs.predictions_stream_name }}
              MODEL_BUCKET: ${{ steps.tf-apply.outputs.model_bucket }}
              RUN_ID: ${{ steps.get-model-artifacts.outputs.run_id }}
            run: |
              variables="{ \
                        PREDICTIONS_STREAM_NAME=$PREDICTIONS_STREAM_NAME, MODEL_BUCKET=$MODEL_BUCKET, RUN_ID=$RUN_ID \
                        }"

              # Poll until any previous update of the function has finished
              STATE=$(aws lambda get-function --function-name $LAMBDA_FUNCTION --region "eu-west-1" --query 'Configuration.LastUpdateStatus' --output text)
              while [[ "$STATE" == "InProgress" ]]
              do
                  echo "sleep 5sec ...."
                  sleep 5s
                  STATE=$(aws lambda get-function --function-name $LAMBDA_FUNCTION --region "eu-west-1" --query 'Configuration.LastUpdateStatus' --output text)
                  echo $STATE
              done

              aws lambda update-function-configuration --function-name $LAMBDA_FUNCTION \
                        --environment "Variables=${variables}"
```
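The polling loop above runs for as long as the status stays "InProgress". A possible variation, sketched here under the assumption that an upper bound on waiting is wanted, is to cap the number of attempts. The function name `wait_for_update` and the stub `fake_status` are hypothetical, and the stub stands in for the real `aws lambda get-function` query so the snippet runs without AWS access.

```shell
# Hypothetical helper: wait_for_update <max_attempts> <delay_seconds> <command...>
# Runs <command> repeatedly until it prints something other than "InProgress".
wait_for_update() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$("$@")
    if [ "$state" != "InProgress" ]; then
      echo "$state"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "timed out after $max_attempts attempts" >&2
  return 1
}

# Stand-in for the real status query so the sketch runs without AWS access.
fake_status() { echo "Successful"; }
wait_for_update 3 0 fake_status

# In the workflow, the command could be the query used in the step above:
# wait_for_update 60 5 aws lambda get-function --function-name "$LAMBDA_FUNCTION" \
#   --query 'Configuration.LastUpdateStatus' --output text
```

The AWS CLI also ships a built-in waiter, `aws lambda wait function-updated`, which may be a simpler option than a hand-written loop.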

### 3.2 Execute CD workflow

1. Go to the root of your repository
2. Make sure you are in the feature branch ```feature/week6b-iac```
3. Commit changes ```$ git commit -m "cd-deploy"```
4. Push to the feature branch, ```$ git push origin feature/week6b-iac```
5. Run the CI workflow (steps 5 & 6 from section 2.2)
6. Merge the pull request into the ```develop``` branch

![title](images/cd-workflow.png)

7. This triggers our CD workflow now:

![title](images/cd-workflow2.png)

And in more detail:

![title](images/cd-workflow2_detail.png)

8. Now you have both the CI and CD workflows deployed!