# Container Basics - with CI/CD

This notebook runs on the "conda_python3" kernel.

In the "Container Basics" notebook, we used the command line to create, build, and run a few containers on the same instance that runs the notebook.
In this notebook, we will use AWS CodePipeline and AWS CodeBuild to demonstrate how a CI/CD pipeline can build container images, and then run those containers on AWS managed services: Amazon EKS and AWS Batch.

**Note**: After you create a service role, it usually takes effect immediately. However, it sometimes takes a few minutes to propagate. If you see an IAM error, wait a few minutes and try again.

In [None]:
# You only need to run this once per kernel. bioinfokit is used to analyze the FASTQ data in the last step; skip this if you don't plan to run that step.
!pip install bioinfokit 

In [None]:
import boto3
import botocore
import json
import time
import os
import project_path # path to helper methods
import importlib
from lib import workshop
from botocore.exceptions import ClientError
from IPython.display import display, clear_output

# create a bucket for the workshop to store output files. 
session = boto3.session.Session()

region_name = session.region_name
account_id = boto3.client('sts').get_caller_identity().get('Account')

proj_name = 'mysra-cicd-pipeline'
image_tag = 'mySRATools'

# we will use this bucket for some artifacts and the output of sratools. 
bucket = workshop.create_bucket(region_name, session, f"container-ws-{account_id}", False)
print(bucket)

# Automate the build process of SRA-Tools container

We will reuse the customized container from the "Container Basics" notebook.

We are going to use the NCBI SRA (Sequence Read Archive) SRA Toolkit (https://github.com/ncbi/sra-tools) command fasterq-dump (https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump) to extract FASTQ files from SRA accessions.

The command takes an SRA accession (the package name) as an argument:
```
$ fasterq-dump SRR000001
```

We will use the base Ubuntu image and install sra-tools from https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.10.0/sratoolkit.2.10.0-ubuntu64.tar.gz

The workflow of the program in the container:
1. On start, the container runs the script "sratest.sh".
2. sratest.sh "prefetches" the data package whose name is passed via an environment variable (PACKAGE_NAME).
3. sratest.sh then runs fasterq-dump on the data package.
4. sratest.sh then uploads the result to s3://{bucket}; the output location is passed via an environment variable (SRA_OUTPUT).

The output of fasterq-dump will be stored in s3://{bucket}/data/sra-toolkit/fasterq/{PACKAGE_NAME}
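The output location is just the `SRA_OUTPUT` prefix with the package name appended, because sratest.sh runs `aws s3 sync . $SRA_OUTPUT/$PACKAGE_NAME`. A minimal sketch of that composition, assuming the bucket and prefix used in this notebook; `sra_output_uri` is a hypothetical helper, not part of the workshop library.

```python
def sra_output_uri(bucket: str, package_name: str,
                   prefix: str = "data/sra-toolkit/fasterq") -> str:
    """Return the S3 URI that sratest.sh syncs the results of one package to."""
    # SRA_OUTPUT = s3://{bucket}/{prefix}; sratest.sh appends /$PACKAGE_NAME
    sra_output = f"s3://{bucket}/{prefix.strip('/')}"
    return f"{sra_output}/{package_name}"


# 123456789012 is a placeholder account ID
print(sra_output_uri("container-ws-123456789012", "SRR000002"))
```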

## Use AWS CodePipeline to automate the build process 

**AWS CodePipeline**

Unlike what we did in the "Container Basics" notebook, where we built and ran the container on the same instance that runs the notebook, we will use AWS CodePipeline to automate the CI/CD process.

The AWS CodePipeline consists of the following components:

1. CodeCommit - code repo, which will contain a buildspec.yml, Dockerfile, and all files needed for the container.
2. CodeBuild - this will spawn an instance to run the docker build and push the image to Amazon ECR
3. CodeDeploy - we will not use CodeDeploy in this notebook

Each time we check in code to CodeCommit, it triggers the entire CodePipeline. When the pipeline finishes, we will have a new version of the Docker image in the container registry (Amazon ECR).


In [None]:
PACKAGE_NAME='SRR000002'

# this is where the output will be stored
sra_prefix = 'data/sra-toolkit/fasterq'
sra_output = f"s3://{bucket}/{sra_prefix}"

print(sra_output)

### Step 1. Create the run script
This script will be the run command in the container. It will:
1. fetch the SRA package by package name
2. run fasterq-dump on the package data
3. copy the output to S3
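The three steps can be mirrored in Python as a dry run that only builds the command lines, which is a cheap way to check the environment variables before they are baked into a container run. This is a sketch; `plan_commands` is a hypothetical name and nothing is executed.

```python
import shlex
from typing import List


def plan_commands(package_name: str, sra_output: str, threads: int = 18) -> List[str]:
    """Build (but do not run) the commands that sratest.sh executes."""
    if not package_name or not sra_output:
        raise ValueError("PACKAGE_NAME and SRA_OUTPUT must both be set")
    pkg = shlex.quote(package_name)
    return [
        # 1. download the SRA package into /tmp
        f"prefetch {pkg} --output-directory /tmp",
        # 2. convert it to FASTQ using `threads` worker threads (-e)
        f"fasterq-dump {pkg} -e {threads}",
        # 3. upload the working directory to S3 under the package name
        f"aws s3 sync . {shlex.quote(sra_output + '/' + package_name)}",
    ]


for cmd in plan_commands("SRR000002", "s3://my-bucket/data/sra-toolkit/fasterq"):
    print(cmd)
```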

In [None]:
sratest_content = """#!/bin/bash
set -x

prefetch $PACKAGE_NAME --output-directory /tmp
fasterq-dump $PACKAGE_NAME -e 18
aws s3 sync . $SRA_OUTPUT/$PACKAGE_NAME
"""

### Step 2. Create our own docker image file

Let's build our own image using an Ubuntu base image.

1. Install tzdata. It is a dependency of some of the other packages we need, so normally we would not install it explicitly. However, the tzdata installer asks interactively for a timezone, which would halt the Docker build, so we install it separately with `-y` and `DEBIAN_FRONTEND="noninteractive"`.
2. Install wget and awscli.
3. Download the sratoolkit Ubuntu binary and extract it into /opt.
4. Set PATH to include the sratoolkit bin directory.
5. `USER nobody` is needed to set the permissions for the sratoolkit configuration.
6. Add the same sratest.sh script as the entrypoint, plus a placeholder filelist.txt that we will use later.

Note: in this example, we use the base Ubuntu image from the Amazon ECR Public registry (public.ecr.aws). As of November 2020, Docker Hub limits pull requests to its public repositories, so you might get throttled if you use Docker Hub's base image.

In [None]:
dockerfile_content="""FROM public.ecr.aws/ubuntu/ubuntu:latest

RUN apt-get update 
RUN DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata 
RUN apt-get install -y wget libxml-libxml-perl awscli
RUN wget -q https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.10.0/sratoolkit.2.10.0-ubuntu64.tar.gz -O /tmp/sratoolkit.tar.gz \
        && tar zxf /tmp/sratoolkit.tar.gz -C /opt/ && rm /tmp/sratoolkit.tar.gz
ENV PATH="/opt/sratoolkit.2.10.0-ubuntu64/bin/:${PATH}"
ADD sratest.sh /usr/local/bin/sratest.sh
ADD filelist.txt /tmp/filelist.txt
RUN chmod +x /usr/local/bin/sratest.sh
WORKDIR /tmp
USER nobody
ENTRYPOINT ["/usr/local/bin/sratest.sh"]
"""

### Step 3. Create the build spec file
We will use AWS CodeBuild to build the Docker image and push it to Amazon ECR (a private Docker image registry).

In [None]:

buildspec_content ="""version: 0.2

phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...          
      - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG -f Dockerfile.cicd . 
      - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG      
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker image...
      - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG
"""

# placeholder for later use: filelist.txt is added to the image now so we don't have to change the Dockerfile later
file_list_content = """
"""
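The buildspec tags and pushes the image as `$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG`, so the build only works if all four variables are present in the CodeBuild environment. A small sketch that composes the same URI from a mapping of environment variables and fails loudly when one is missing; `ecr_image_uri` is a hypothetical helper for illustration.

```python
import os
from typing import Mapping, Optional

REQUIRED = ["AWS_ACCOUNT_ID", "AWS_DEFAULT_REGION", "IMAGE_REPO_NAME", "IMAGE_TAG"]


def ecr_image_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Compose the ECR image URI the same way buildspec.yml does."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    registry = f"{env['AWS_ACCOUNT_ID']}.dkr.ecr.{env['AWS_DEFAULT_REGION']}.amazonaws.com"
    return f"{registry}/{env['IMAGE_REPO_NAME']}:{env['IMAGE_TAG']}"
```

For example, `ecr_image_uri({"AWS_ACCOUNT_ID": "123456789012", "AWS_DEFAULT_REGION": "us-east-1", "IMAGE_REPO_NAME": "mysra-cicd-pipeline", "IMAGE_TAG": "mySRATools"})` yields the same string that the notebook later stores in `image_uri`.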

### Step 4. Create an ECR repo
This repo is used in Step 6 as part of the build: after the container image is built, CodeBuild pushes it to this repo.

In [None]:
ecr_client = boto3.client('ecr')
try:
    resp = ecr_client.create_repository(repositoryName=proj_name)
except ClientError as e:
    if e.response['Error']['Code'] == 'RepositoryAlreadyExistsException':
        print(f"ECR Repo {proj_name} already exists, skip")
    else:
        raise
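The notebook repeats one pattern for ECR, CodeCommit, CodeBuild, and CodePipeline: try to create the resource, skip if it already exists, and surface any other error. A minimal sketch of that pattern as a reusable helper, assuming the exception carries a botocore-style `response` dict (as `ClientError` does); `create_if_absent` is a hypothetical name.

```python
from typing import Any, Callable, Iterable


def create_if_absent(create: Callable[[], Any], exists_codes: Iterable[str]) -> Any:
    """Call `create`; return None if the resource already exists, re-raise anything else."""
    codes = set(exists_codes)
    try:
        return create()
    except Exception as err:
        # botocore's ClientError exposes the service error code in err.response
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code in codes:
            print(f"{code}: resource already exists, skipping")
            return None
        raise
```

Usage in this cell would look like `create_if_absent(lambda: ecr_client.create_repository(repositoryName=proj_name), ["RepositoryAlreadyExistsException"])`. Re-raising unknown errors matters: swallowing them silently tends to surface later as a confusing failure in a different step.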

### Step 5. Create an AWS CodeCommit repo and check in the files


In [None]:
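# create the service roles used by CodePipeline and CodeBuild (their ARNs are passed as roleArn/serviceRole later)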
iam_client = session.client('iam')

codepipeline_service_role_name = f"{proj_name}-codepipeline-service-role"
codepipeline_policies = ['arn:aws:iam::aws:policy/AWSCodePipelineFullAccess', 
                         'arn:aws:iam::aws:policy/AWSCodeCommitFullAccess',
                         'arn:aws:iam::aws:policy/CloudWatchFullAccess',
                         'arn:aws:iam::aws:policy/AmazonS3FullAccess',
                         'arn:aws:iam::aws:policy/AWSCodeBuildAdminAccess'
                        ]
codepipeline_role_arn = workshop.create_service_role_with_policies(codepipeline_service_role_name, 'codepipeline.amazonaws.com', codepipeline_policies )
print(codepipeline_role_arn)
              
codebuild_service_role_name = f"{proj_name}-codebuild-service-role"
codebuild_policies = ['arn:aws:iam::aws:policy/AWSCodeBuildAdminAccess',
                      'arn:aws:iam::aws:policy/CloudWatchFullAccess',
                      'arn:aws:iam::aws:policy/AmazonS3FullAccess',
                      'arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess']
codebuild_role_arn = workshop.create_service_role_with_policies(codebuild_service_role_name, 'codebuild.amazonaws.com', codebuild_policies )
print(codebuild_role_arn)
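A service role needs a trust policy that lets the service principal (for example `codebuild.amazonaws.com`) assume it. We have not inspected `workshop.create_service_role_with_policies`, so treat this as an assumption about what it does: the sketch below builds the standard trust-policy document, which would be passed to `iam_client.create_role(RoleName=..., AssumeRolePolicyDocument=...)` before attaching each managed policy with `iam_client.attach_role_policy(RoleName=..., PolicyArn=...)`.

```python
import json


def service_trust_policy(service_principal: str) -> str:
    """Trust policy allowing an AWS service (e.g. codebuild.amazonaws.com) to assume a role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service_principal},
            "Action": "sts:AssumeRole",
        }],
    }
    return json.dumps(policy)


print(service_trust_policy("codebuild.amazonaws.com"))
```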


In [None]:
# prepare the files for the checkin
put_files=[{
               'filePath': 'Dockerfile.cicd',
               'fileContent': dockerfile_content
            },
            {
               'filePath': 'sratest.sh',
               'fileContent': sratest_content
            },
            {
               'filePath': 'filelist.txt',
               'fileContent': file_list_content
            },
            {
               'filePath': 'buildspec.yml',
               'fileContent': buildspec_content
            }]

    
codecommit_client = boto3.client('codecommit')
try:
    resp = codecommit_client.create_repository(repositoryName=proj_name)
except ClientError as e:
    if e.response['Error']['Code'] == 'RepositoryNameExistsException':
        print(f"Repo {proj_name} exists, use that one")
    else:
        raise

try:
    resp = codecommit_client.get_branch(repositoryName=proj_name, branchName='main')
    parent_commit_id = resp['branch']['commitId']
except ClientError as e:
    if e.response['Error']['Code'] == 'BranchDoesNotExistException':
        # the repo is new: create the first commit (no parent commit)
        workshop.commit_files(proj_name, "main", put_files, None)
    else:
        raise
else:
    # the branch exists: commit on top of its latest commit
    try:
        resp = workshop.commit_files(proj_name, "main", put_files, parent_commit_id)
    except ClientError as ee:
        if ee.response['Error']['Code'] == 'NoChangeException':
            print('No change detected. skip commit')
        else:
            raise


### Step 6. Create a CodeBuild project

The second step in the CI/CD pipeline is the build. AWS CodeBuild builds the container image on an instance managed by AWS; the pipeline triggers the build whenever code is checked in to CodeCommit.


**Pause for a minute**: the CodeBuild service role takes a little longer to propagate. If you see a permission error, wait a minute and try again.




In [None]:
codebuild_client = boto3.client('codebuild')
codebuild_name = f"Build-{proj_name}" 
codecommit_name = f"Source-{proj_name}"
try: 
    resp = codebuild_client.create_project(name=codebuild_name, 
                                       description="CICD workshop build demo",
                                       source= {
                                           'type': "CODEPIPELINE"
                                       },
                                       artifacts= {
                                            "type": "CODEPIPELINE",
                                            "name": proj_name
                                       },
                                       environment= {
                                            "type": "LINUX_CONTAINER",
                                            "image": "aws/codebuild/amazonlinux2-x86_64-standard:3.0",
                                            "computeType": "BUILD_GENERAL1_SMALL",
                                            "environmentVariables": [
                                                {
                                                    "name": "AWS_DEFAULT_REGION",
                                                    "value": region_name,
                                                    "type": "PLAINTEXT"
                                                },
                                                {
                                                    "name": "AWS_ACCOUNT_ID",
                                                    "value": account_id,
                                                    "type": "PLAINTEXT"
                                                },
                                                {
                                                    "name": "IMAGE_REPO_NAME",
                                                    "value": proj_name,
                                                    "type": "PLAINTEXT"
                                                },
                                                {
                                                    "name": "IMAGE_TAG",
                                                    "value": image_tag,
                                                    "type": "PLAINTEXT"
                                                }
                                            ],
                                            "privilegedMode": True,
                                            "imagePullCredentialsType": "CODEBUILD"               
                                       },
                                       logsConfig= {
                                                "cloudWatchLogs": {
                                                    "status": "ENABLED",
                                                    "groupName": proj_name
                                                },
                                                "s3Logs": {
                                                    "status": "DISABLED"
                                                }
                                        },
                                        serviceRole= codebuild_role_arn
                                      )
except ClientError as e:
    if e.response['Error']['Code'] == 'ResourceAlreadyExistsException':
        print(f"CodeBuild project {codebuild_name} exists, skip...")
    else:
        raise e


print(f"CodeBuild project name {codebuild_name}")
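Instead of re-running the cell by hand when the freshly created role has not propagated yet, the create call can be wrapped in a retry with exponential backoff. A generic sketch; `retry` is a hypothetical helper, and in practice you would narrow `retry_on` to the specific error you actually observe rather than retrying on every exception.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T], attempts: int = 5, base_delay: float = 2.0,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call fn, retrying with exponential backoff (base_delay, 2*base_delay, 4*base_delay, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on as err:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            print(f"attempt {attempt + 1} failed ({err}); retrying in {delay:.0f}s")
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

For example, `retry(lambda: codebuild_client.create_project(...), retry_on=(ClientError,))` would wait 2, 4, 8, and 16 seconds between attempts before giving up.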

### Step 7. Build the AWS CodePipeline 

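Now we put what we built in Steps 5 and 6 together into a pipeline: a source stage that reads from CodeCommit and a build stage that runs the CodeBuild project.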

In [None]:
codepipeline_client = boto3.client('codepipeline')

stage1 = {
    "name":f"{codecommit_name}",
    "actions": [
        {
            "name": "Source",
            "actionTypeId": {
                "category": "Source",
                "owner": "AWS",
                "provider": "CodeCommit",
                "version": "1"
            },
            "runOrder": 1,
            "configuration": {
                "BranchName": "main",
                "OutputArtifactFormat": "CODE_ZIP",
                "PollForSourceChanges": "true",
                "RepositoryName": proj_name
            },
            "outputArtifacts": [
                {
                    "name": "SourceArtifact"
                }
            ],
            "inputArtifacts": [],
            "region": region_name,
            "namespace": "SourceVariables"
        }
    ]
}

stage2 = {
    "name": f"{codebuild_name}",
    "actions": [
        {
            "name": "Build",
            "actionTypeId": {
                "category": "Build",
                "owner": "AWS",
                "provider": "CodeBuild",
                "version": "1"
            },
            "runOrder": 1,
            "configuration": {
                "ProjectName": codebuild_name
            },
            "outputArtifacts": [
                {
                    "name": "BuildArtifact"
                }
            ],
            "inputArtifacts": [
                {
                    "name": "SourceArtifact"
                }
            ],
            "region": region_name,
            "namespace": "BuildVariables"
        }
    ]    
}


stages = [ stage1, stage2]


pipeline = {
    'name': proj_name,
    'roleArn': codepipeline_role_arn,
    'artifactStore': {
        'type': 'S3',
        'location': bucket
    }, 
    'stages': stages
}

try:
    resp = codepipeline_client.create_pipeline(pipeline=pipeline)
except ClientError as e:
    if e.response['Error']['Code'] == 'PipelineNameInUseException':
        print(f"CodePipeline {proj_name} already exists")
    else:
        raise
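The stages are wired together only through artifact names: the build action consumes `SourceArtifact`, which the source action produces. A typo there is only caught when the pipeline runs, so a quick local check can help. This is a simplified sketch (`check_artifact_wiring` is hypothetical): it treats each stage as a unit and ignores `runOrder` within a stage, where an action may also consume an artifact produced by an earlier run order.

```python
from typing import Dict, List


def check_artifact_wiring(stages: List[Dict]) -> List[str]:
    """Report inputs that no earlier stage produces, for stage dicts shaped like stage1/stage2."""
    produced = set()
    problems = []
    for stage in stages:
        stage_outputs = set()
        for action in stage["actions"]:
            for artifact in action.get("inputArtifacts", []):
                if artifact["name"] not in produced:
                    problems.append(f"{stage['name']}/{action['name']}: input "
                                    f"'{artifact['name']}' is not produced by an earlier stage")
            for artifact in action.get("outputArtifacts", []):
                stage_outputs.add(artifact["name"])
        produced |= stage_outputs  # outputs become available to later stages
    return problems
```

Running `check_artifact_wiring(stages)` on the two stages above should return an empty list.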
    

### Step 8. Check the container image in the repo

The initial CodePipeline run will take a few minutes. It will pull the assets from CodeCommit, build the Docker image on a managed instance, and push the resulting image to ECR.

In [None]:
# We should see a container image with the image tag "mySRATools", which is defined as an environment variable in CodeBuild.
# This cell polls every 20 seconds until an image shows up.
while True:
    resp = ecr_client.describe_images(repositoryName=proj_name)
    if resp['imageDetails']:
        for image in resp['imageDetails']:
            # untagged images have no 'imageTags' key
            print("image tags: " + ", ".join(image.get('imageTags', ['<untagged>'])))
            print("image pushed at: " + str(image['imagePushedAt']))
        break
    else:
        clear_output(wait=True)
        display("Build not done yet, waiting... Please do not proceed until you see the 'image pushed' message")
        time.sleep(20)
# this is used later in job_definition for AWS Batch
image_uri= f"{account_id}.dkr.ecr.{region_name}.amazonaws.com/{proj_name}:{image_tag}"
print(image_uri)
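The loop above waits forever and stops at the first image of any tag. A variant with a timeout that waits for the specific tag is sketched below; `wait_for_image` is a hypothetical helper that takes a zero-argument function returning a `describe_images`-shaped response, for example `lambda: ecr_client.describe_images(repositoryName=proj_name)`. It ignores pagination (`nextToken`), which is fine for a repo holding a handful of images.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_image(describe: Callable[[], Dict], tag: str,
                   timeout: float = 900, interval: float = 20) -> Optional[Dict]:
    """Poll `describe` until an image carrying `tag` appears; return None after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        for image in describe().get("imageDetails", []):
            # untagged images have no 'imageTags' key at all
            if tag in image.get("imageTags", []):
                return image
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```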


# Run the container on AWS

Now that we have built our container image with CodePipeline and registered it in the container registry (ECR), we will run the container on AWS managed services: Amazon EKS and AWS Batch.

AWS Batch enables you to run large-scale batch computing jobs without the need to install and manage software or clusters.

Amazon Elastic Kubernetes Service (EKS) makes it easy for you to run Kubernetes applications. It provides highly available and secure clusters and automates key tasks such as patching, node provisioning, and updates.

In the following two sections, we will run our sratool container as jobs on both Amazon EKS and AWS Batch. 

## Run the container in Amazon EKS

We will use the **eksctl** CLI and the Kubernetes tool **kubectl** to create an EKS cluster and interact with it.

### Step 1. Install eksctl, kubectl and aws CLIs (Command line tools) 

In [None]:
!curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
!sudo mv -v /tmp/eksctl /usr/local/bin
!eksctl version

!sudo curl --silent --location -o /usr/local/bin/kubectl https://amazon-eks.s3.us-west-2.amazonaws.com/1.19.6/2021-01-05/bin/linux/amd64/kubectl
!sudo chmod +x /usr/local/bin/kubectl

!sudo pip install --upgrade awscli && hash -r




### Step 2. Create the EKS cluster configuration file
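We use simple default settings for the cluster and add a managed node group with 2 instances. The configuration assumes the us-east-1 Region; change `region` and `availabilityZones` if you run elsewhere.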

In [None]:
from IPython.core.magic import register_line_cell_magic

@register_line_cell_magic
def writetemplate(line, cell):
    with open(line, 'w+') as f:
        f.write(cell.format(**globals()))
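Because `%%writetemplate` passes the whole cell through `str.format(**globals())`, every `{name}` is replaced by a notebook variable, and any literal brace (for example a YAML flow mapping) must be doubled as `{{` and `}}`; a single unmatched `{` makes `format` raise `ValueError`. A small illustration of that behavior; `render` is a hypothetical stand-in for the magic's body.

```python
def render(template: str, **values) -> str:
    """Fill {name} placeholders the same way %%writetemplate does (str.format)."""
    return template.format(**values)


template = "image: {image_uri}\nlabels: {{app: my-sratool}}\n"
print(render(template, image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/repo:tag"))
```

The second line of the output is `labels: {app: my-sratool}`: the doubled braces come out as literal single braces.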

In [None]:
%%writetemplate eksworkshop.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: eksworkshop-eksctl
  region: us-east-1
  version: "1.19"

availabilityZones: ["us-east-1a", "us-east-1b"]

managedNodeGroups:
- name: nodegroup
  desiredCapacity: 2
  instanceType: t3.small


### Step 3. Create the EKS cluster

*Note*: this step can take up to 15 minutes

In [None]:
!eksctl create cluster -f eksworkshop.yaml

### Step 4. Add S3 write permission to the node instance role

With the default settings, EKS nodes have an IAM role with the following AWS managed policies:
1. AmazonEKSWorkerNodePolicy
1. AmazonEC2ContainerRegistryReadOnly
1. AmazonEKS_CNI_Policy

Our container needs to write its output to the workshop bucket, so we add an inline S3 policy for that bucket to the node role.

In [None]:
# check the node status
!kubectl get nodes # if we see our 2 nodes, we know we have authenticated correctly

# EKS clusters are deployed using CloudFormation stacks in the backend. 
# get the node group stack name and use that to get the nodegroup instance role name
STACK_NAME = !eksctl get nodegroup --cluster eksworkshop-eksctl -o json | jq -r '.[].StackName'
STACK_NAME=STACK_NAME[0]

ROLE_NAME= !aws cloudformation describe-stack-resources --stack-name $STACK_NAME | jq -r '.StackResources[] | select(.ResourceType=="AWS::IAM::Role") | .PhysicalResourceId'
ROLE_NAME=ROLE_NAME[0]

resp = iam_client.put_role_policy(RoleName=ROLE_NAME, 
                                  PolicyName='S3AccessPolicy', 
                                  PolicyDocument='{"Version": "2012-10-17","Statement": [{"Sid": "wsbucket","Effect": "Allow","Action": ["s3:PutObject","s3:GetObject","s3:ListBucket"],"Resource": ["arn:aws:s3:::'+bucket+'/*","arn:aws:s3:::'+bucket+'"]}]}')
                                  
print(resp)

# output location for the EKS job (a separate prefix from the AWS Batch output)
sra_eks_output = f"s3://{bucket}/data/sra-toolkit-eks/fasterq"
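Building the inline policy by concatenating JSON strings is easy to get wrong. The sketch below produces the same policy document as the cell above with `json.dumps`, so quoting and commas are handled for us; `bucket_rw_policy` is a hypothetical helper. It could be passed as `PolicyDocument=bucket_rw_policy(bucket)` to `iam_client.put_role_policy`.

```python
import json


def bucket_rw_policy(bucket: str, sid: str = "wsbucket") -> str:
    """Inline policy granting object read/write and ListBucket on a single bucket."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": sid,
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            # object-level actions need bucket/*, ListBucket needs the bucket ARN itself
            "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
        }],
    })
```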


### Step 5. Create a job and deploy it to the EKS cluster

In [None]:
%%writetemplate sratool-deploy.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-pod
  namespace: default
  labels:
    app: my-sratool
spec:
  template:
    metadata:
      labels:
        app: my-sratool
    spec:
      containers:
      - name: sratool
        image: {image_uri}
        env:
        - name: PACKAGE_NAME
          value: {PACKAGE_NAME}
        - name: SRA_OUTPUT
          value: {sra_eks_output}
      restartPolicy: Never
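`kubectl apply -f` also accepts JSON manifests, so the same Job can be generated from a Python dict, which avoids the brace-doubling rules of `%%writetemplate`. A sketch that mirrors sratool-deploy.yaml field by field; `sra_job_manifest` is a hypothetical helper.

```python
import json


def sra_job_manifest(image_uri: str, package_name: str, sra_output: str,
                     job_name: str = "my-pod") -> dict:
    """Kubernetes batch/v1 Job equivalent to sratool-deploy.yaml."""
    labels = {"app": "my-sratool"}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name, "namespace": "default", "labels": labels},
        "spec": {
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "sratool",
                        "image": image_uri,
                        "env": [
                            {"name": "PACKAGE_NAME", "value": package_name},
                            {"name": "SRA_OUTPUT", "value": sra_output},
                        ],
                    }],
                    "restartPolicy": "Never",
                },
            },
        },
    }


print(json.dumps(sra_job_manifest("repo:tag", "SRR000002", "s3://my-bucket/out"), indent=2))
```

In the notebook you could write `json.dumps(sra_job_manifest(image_uri, PACKAGE_NAME, sra_eks_output))` to a file such as sratool-deploy.json and apply that file with kubectl instead.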

In [None]:

# Optional: log Docker in to your own ECR registry. The EKS nodes pull the image with their
# node role (AmazonEC2ContainerRegistryReadOnly), so kubectl itself does not need this login.
!aws ecr get-login-password --region {region_name} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region_name}.amazonaws.com

# In case you are running this multiple times, we delete the job first.
# If it's the first time you are running this cell, you will see "Error from server (NotFound)"; you can ignore it.
!kubectl delete -f sratool-deploy.yaml

!kubectl apply -f sratool-deploy.yaml    

### Step 6. Monitor the job execution.

In the previous step, we ran the container as a job on the EKS cluster. kubectl provides commands to monitor the job status.


In [None]:
!kubectl describe jobs

In [None]:
!kubectl logs job/my-pod
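For scripting, `kubectl get job my-pod -o json` returns the Job object, whose `status` has `active`, `succeeded`, `failed`, and a `conditions` list whose entries carry `type` (`Complete` or `Failed`) and `status`. A sketch that classifies that JSON; `job_state` is a hypothetical helper, and the "Pending" fallback is our own label, not a Kubernetes state.

```python
def job_state(job: dict) -> str:
    """Classify a Kubernetes Job from the JSON returned by `kubectl get job <name> -o json`."""
    status = job.get("status", {})
    for cond in status.get("conditions", []):
        if cond.get("status") == "True" and cond.get("type") in ("Complete", "Failed"):
            return cond["type"]
    if status.get("active", 0) > 0:
        return "Running"
    return "Pending"
```

In the notebook, `out = !kubectl get job my-pod -o json` followed by `job_state(json.loads("\n".join(out)))` would return the current state.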

## Run the container in AWS Batch


If you want a deeper dive on AWS Batch, please refer to notebooks/hpc/batch-fastqc.ipynb, in this workshop repo. 

In this notebook, we will create an AWS Batch environment and run the container job using the image we created.

### Step 1. Create a compute environment
To create a compute environment, you need the following:
1. EC2 instance role
2. EC2 instance profile
3. Batch service role
4. VPC and subnets 
5. Security group
6. Compute resource definition

**Note**: This step will take up to 10 minutes. 

In [None]:

try:
    ce_response = workshop.create_simple_compute_environment(proj_name)
    ce_name = ce_response['computeEnvironmentName']
except ClientError as ee:
    if ee.response['Error']['Message'] == 'Object already exists':
        print('Compute environment already exists. skip')
    else:
        raise



### Step 2. Create a job queue
To create a job queue, you need the compute environment from Step 1.



In [None]:
queue_name, queue_arn  = workshop.create_job_queue(ce_name, 1)


### Step 3. Create job definition



In [None]:
importlib.reload(workshop)

batch_task_role_name = f"batch_task_role_{proj_name}"
batch_task_policies = ["arn:aws:iam::aws:policy/AmazonS3FullAccess"]
taskRole = workshop.create_service_role_with_policies(batch_task_role_name, "ecs-tasks.amazonaws.com", batch_task_policies)
print(taskRole)


jd = workshop.create_job_definition(proj_name, image_uri, taskRole )
jd_name = jd['jobDefinitionName']
print(jd_name)

### Step 4. We are now ready to submit the job

In [None]:
PACKAGE_NAME='SRR000004'

batch_client = boto3.client('batch')
response = batch_client.submit_job(
    jobName=f"job-{proj_name}",
    jobQueue=queue_name,
    jobDefinition=jd_name,
    containerOverrides={
        'environment': [
            {
                'name': 'PACKAGE_NAME',
                'value': PACKAGE_NAME
            },
            {
                'name': 'SRA_OUTPUT',
                'value': sra_output
            }            
        ]
    })

job_id = response['jobId']
print(f"Job submitted: {job_id}")

# Check the output

Go to the AWS Batch console and check the job queue, or run the following code. If you do not see the SUCCEEDED status, retry after a few seconds.



In [None]:
#jobs = batch_client.list_jobs(jobQueue=queue_name)
jobs = batch_client.describe_jobs(jobs=[job_id])

for j in jobs['jobs']:
    print(f"JobName: {j['jobName']} Status: {j['status']}")
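Rather than re-running the status cell by hand, you can poll until the job reaches one of the two terminal AWS Batch states, SUCCEEDED or FAILED. A sketch with a timeout; `wait_for_batch_job` is a hypothetical helper that takes a zero-argument function such as `lambda: batch_client.describe_jobs(jobs=[job_id])`.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_batch_job(describe: Callable[[], Dict], timeout: float = 1800,
                       interval: float = 15) -> str:
    """Poll until the job is SUCCEEDED or FAILED and return that status."""
    deadline = time.monotonic() + timeout
    while True:
        status = describe()["jobs"][0]["status"]
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} s")
        time.sleep(interval)
```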

### Please wait until the previous step returns a SUCCEEDED status

In [None]:
# check out the output files on S3 and download them, keeping the same folder layout locally
s3_client = session.client('s3')
objs = s3_client.list_objects(Bucket=bucket, Prefix=sra_prefix)
for obj in objs.get('Contents', []):  # 'Contents' is absent when nothing matches the prefix
    fn = obj['Key']
    p = os.path.dirname(fn)
    if p:
        os.makedirs(p, exist_ok=True)
    s3_client.download_file(bucket, fn, fn)
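`list_objects` returns at most 1,000 keys per call, and zero-byte keys ending in `/` (folder markers created by some tools) cannot be downloaded as files. A sketch that pairs each key with a local path and skips such markers; `download_plan` is a hypothetical helper. With more outputs, you could feed it the keys from every page of `s3_client.get_paginator('list_objects_v2').paginate(Bucket=bucket, Prefix=sra_prefix)`.

```python
import os
from typing import Iterable, List, Tuple


def download_plan(keys: Iterable[str], dest: str = ".") -> List[Tuple[str, str]]:
    """Pair each S3 key with a local path under `dest`, skipping 'folder' marker keys."""
    plan = []
    for key in keys:
        if key.endswith("/"):
            continue  # zero-byte "directory" markers have nothing to download
        plan.append((key, os.path.join(dest, *key.split("/"))))
    return plan
```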



In [None]:

# read the downloaded FASTQ file with bioinfokit
from bioinfokit.analys import fastq
fastq_iter = fastq.fastq_reader(file=f"{sra_prefix}/{PACKAGE_NAME}/{PACKAGE_NAME}.fastq")
# iterate over all records, printing the first 10 and counting the total
i = 0
for record in fastq_iter:
    # get sequence headers, sequence, and quality values
    header_1, sequence, header_2, qual = record
    # get sequence length
    sequence_len = len(sequence)
    # count A bases
    a_base = sequence.count('A')
    if i < 10:
        print(sequence, qual, a_base, sequence_len)
    i +=1

print(f"Total number of records for package {PACKAGE_NAME} : {i}")
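If you would rather not install bioinfokit, the same record loop can be written with the standard library. A minimal sketch, assuming the FASTQ file uses unwrapped 4-line records (header, sequence, `+` separator, quality); `read_fastq` is a hypothetical helper, not bioinfokit's API.

```python
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not sep.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, sep, qual
```

Counting records then becomes `sum(1 for _ in read_fastq(fh))` on an open file handle, and `seq.count('A')` gives the A-base count per record as in the cell above.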

# Let's process multiple files at the same time 

Now that we have learned how to run a single job in AWS Batch, we can run multiple jobs with the same container and different input parameters.

You can use AWS Batch array jobs (https://docs.aws.amazon.com/batch/latest/userguide/array_index_example.html) to process multiple jobs at the same time. AWS Batch sets the environment variable AWS_BATCH_JOB_ARRAY_INDEX in each child job's container, and you can use it to select the input or parameters for that job. We will use the array index to select the package name.

Since our previous container was designed to run a single job, with PACKAGE_NAME passed in as an environment variable, we will modify the run script and the file list so that each child job reads its package name from a list. We take this opportunity to show how the container CI/CD pipeline helps with automation: once you submit the file changes to CodeCommit, the pipeline kicks off the build and the image is updated.

This approach uses a file that lists all the packages to be processed, so the container image must be rebuilt whenever the list changes. You could also pass the package names as an environment variable or a parameter to the container, but that only works if the list is relatively small.


In [None]:
package_list=['SRR000011', 'SRR000012', 'SRR000013', 'SRR000014', 'SRR000016']

file_list_content = ''
for p in package_list:
    file_list_content +=p+'\n'
    
sratest_content = """#!/bin/bash

set -x
LINE=$((AWS_BATCH_JOB_ARRAY_INDEX+1))
FILE_NAME=$(sed -n ${LINE}p /tmp/filelist.txt)
prefetch $FILE_NAME --output-directory /tmp
fasterq-dump $FILE_NAME -e 18
aws s3 sync . $SRA_OUTPUT/$FILE_NAME
"""

put_files=[{
               'filePath': 'filelist.txt',
               'fileContent': file_list_content
            },
            {
               'filePath': 'sratest.sh',
               'fileContent': sratest_content
            }]

resp = codecommit_client.get_branch(repositoryName=proj_name, branchName='main')
parent_commit_id = resp['branch']['commitId']
try:
    resp = workshop.commit_files(proj_name, "main", put_files, parent_commit_id)
except ClientError as ee:
    if ee.response['Error']['Code'] == 'NoChangeException':
        print('No change detected. skip commit')
    else:
        raise
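The new sratest.sh maps the zero-based array index to a one-based line number and prints that line with `sed -n ${LINE}p`. The Python equivalent below makes the mapping easy to check locally: child job 0 gets the first package, and an index past the end yields an empty name, which is why the array size must equal `len(package_list)`. `package_for_index` is a hypothetical helper for illustration.

```python
def package_for_index(file_list: str, array_index: int) -> str:
    """Python equivalent of: LINE=$((AWS_BATCH_JOB_ARRAY_INDEX+1)); sed -n ${LINE}p filelist.txt"""
    lines = file_list.splitlines()
    line_no = array_index + 1  # sed line numbers start at 1
    if array_index < 0 or line_no > len(lines):
        return ""  # sed prints nothing for a line that does not exist
    return lines[line_no - 1]


file_list = "SRR000011\nSRR000012\nSRR000013\nSRR000014\nSRR000016\n"
for i in range(6):
    print(i, repr(package_for_index(file_list, i)))
```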




**Note**: After the two files change, the CodePipeline kicks off. Go to the [CodePipeline console](https://console.aws.amazon.com/codesuite/codepipeline/pipelines/mysra-cicd-pipeline/view?region=us-east-1) to check the status of the pipeline (change `region` in the URL if you are not in us-east-1). Wait until the build completes before submitting the array job below.

In [None]:
response = batch_client.submit_job(
    jobName=f"job-{proj_name}",
    jobQueue=queue_name,
    arrayProperties={
        'size': len(package_list)
    },
    jobDefinition=jd_name,
    containerOverrides={
        'environment': [
            {
                'name': 'SRA_OUTPUT',
                'value': sra_output
            }            
        ]
    })

job_id = response['jobId']
print(f"Job submitted: {job_id}")

In [None]:
jobs = batch_client.describe_jobs(jobs=[job_id])

for j in jobs['jobs']:
    print(f"JobName: {j['jobName']} Status: {j['status']}")
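For an array job, the parent job's `status` says little about the children. The `describe_jobs` entry for the parent includes `arrayProperties` with `size` and a `statusSummary` map of child status to count. A sketch that turns that into a one-line progress report; `summarize_array_job` is a hypothetical helper, used as `summarize_array_job(jobs['jobs'][0])`.

```python
from typing import Dict

BATCH_STATES = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING", "RUNNING", "SUCCEEDED", "FAILED"]


def summarize_array_job(job: Dict) -> str:
    """One-line progress summary for an AWS Batch array parent job (a describe_jobs entry)."""
    props = job.get("arrayProperties", {})
    summary = props.get("statusSummary", {})
    size = props.get("size", 0)
    parts = [f"{state}={summary[state]}" for state in BATCH_STATES if summary.get(state)]
    done = summary.get("SUCCEEDED", 0) + summary.get("FAILED", 0)
    return f"{job.get('jobName', '?')}: {done}/{size} finished ({', '.join(parts) or 'no children yet'})"
```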

# Don't forget to clean up 

In [None]:

workshop.delete_job_queue(queue_name)
workshop.delete_job_definition(jd_name)
workshop.delete_simple_compute_environment(proj_name)
workshop.delete_service_role_with_policies(batch_task_role_name,  batch_task_policies)


In [None]:

# delete the pipeline
codepipeline_client.delete_pipeline(name=proj_name)
codebuild_client.delete_project(name=codebuild_name)
codecommit_client.delete_repository(repositoryName=proj_name)
workshop.delete_codecommit_repo(proj_name)
workshop.delete_ecr_repo(proj_name)
workshop.delete_service_role_with_policies(codepipeline_service_role_name, codepipeline_policies )
workshop.delete_service_role_with_policies(codebuild_service_role_name, codebuild_policies )



In [None]:
workshop.delete_bucket_with_version(bucket)

In [None]:
!kubectl delete -f sratool-deploy.yaml 

In [None]:
try:
    resp = iam_client.delete_role_policy(RoleName=ROLE_NAME, PolicyName='S3AccessPolicy')
except ClientError:
    print("Policy might have been deleted already. Ignore")
!eksctl delete cluster -f eksworkshop.yaml