
4. Using Dockerised Knetminer with Amazon AWS and ElasticBeanStalk

Marco Brandizi edited this page Mar 27, 2023 · 5 revisions

**** WARNING! Needs updates! ****

This section needs re-testing and review.

Using pre-defined scripts

See here

Currently, KnetMiner logs which searches are made (including QTL regions and gene IDs) and which networks are opened.

If this information is useful to you and you have an S3 bucket on AWS to store it in, you can use the docker-run-aws.sh script instead of docker-run.sh.

More details are below.

Prerequisites for pre-defined scripts.

First, create a folder containing the analytics-s3-sync.sh script and a secret AWS folder containing your credentials, structured as follows:

├── .aws
│   ├── credentials
├── analytics-s3-sync.sh

N.b. the folder can be named anything, but it MUST contain a .aws folder holding the credentials file, and an analytics-s3-sync.sh script.

The credentials file contains your AWS access key ID and secret access key, and will look like the following:

[default]
aws_access_key_id = AKIA-REPLACE-ACCESS-KEYXE2BI2Z
aws_secret_access_key = uuUMg/tRIes-REPLACE-SECRET-ACCESS-KEY/pg5XYG

N.b. replace the placeholder values with your own access key ID and secret access key.
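The layout above can also be created from the command line. The following is a minimal sketch, assuming the folder lives at $HOME/AWS; AWS_DIR and SYNC_SCRIPT are hypothetical variable names used only in this example, and the key values are placeholders. It does not overwrite an existing credentials file.

```shell
# Hypothetical location; the folder can be named anything.
AWS_DIR="${AWS_DIR:-$HOME/AWS}"
# Assumed path to your copy of the sync script; adjust as needed.
SYNC_SCRIPT="${SYNC_SCRIPT:-./analytics-s3-sync.sh}"

mkdir -p "$AWS_DIR/.aws"

# Write a placeholder credentials file only if none exists yet.
if [ ! -e "$AWS_DIR/.aws/credentials" ]; then
  cat > "$AWS_DIR/.aws/credentials" <<'EOF'
[default]
aws_access_key_id = REPLACE-WITH-YOUR-ACCESS-KEY-ID
aws_secret_access_key = REPLACE-WITH-YOUR-SECRET-ACCESS-KEY
EOF
fi

# Keep the credentials readable by your user only.
chmod 600 "$AWS_DIR/.aws/credentials"

# Copy the sync script next to the .aws folder, if it is available here.
if [ -f "$SYNC_SCRIPT" ]; then
  cp "$SYNC_SCRIPT" "$AWS_DIR/analytics-s3-sync.sh"
else
  echo "Copy analytics-s3-sync.sh into $AWS_DIR before running docker-run-aws.sh" >&2
fi
```

Edit the generated credentials file afterwards to put in your real keys.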

You can change the target directory used by the analytics-s3-sync.sh script by editing the S3 bucket path in it, which by default is s3://knetminer-logs-test/logs.

A simple awk command can also be used to do this, i.e. awk 'NR==3{$5="$NEW_S3_DIR"}1' analytics-s3-sync.sh > tmp && mv tmp analytics-s3-sync.sh (replace the $NEW_S3_DIR placeholder with your S3 bucket directory in this case).
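An alternative that does not depend on the line and field position of the path is to substitute the default bucket path directly with sed. This is a sketch rather than the project's own method: the stand-in file below only imitates analytics-s3-sync.sh for demonstration (the real script's contents will differ), and the new bucket name is hypothetical.

```shell
# Stand-in for analytics-s3-sync.sh, for demonstration only.
workdir="$(mktemp -d)"
printf '%s\n' \
  '#!/bin/sh' \
  '# hypothetical sync line; the real script differs' \
  'aws s3 sync /logs s3://knetminer-logs-test/logs' \
  > "$workdir/analytics-s3-sync.sh"

# Hypothetical target; replace with your own S3 bucket directory.
NEW_S3_DIR="s3://my-knetminer-bucket/logs"

# Replace every occurrence of the default path, keeping a .bak copy of the original.
sed -i.bak "s|s3://knetminer-logs-test/logs|$NEW_S3_DIR|g" "$workdir/analytics-s3-sync.sh"

# Show the result.
grep -n 's3://' "$workdir/analytics-s3-sync.sh"
```

On your own copy of the script, only the sed line is needed.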

As with the docker-run.sh script, pass all of your other Docker options as normal. You MUST use the --aws flag first. For example, to deploy only the aratiny dataset, run the following (in the knetminer/common/quickstart directory):

./docker-run-aws.sh \
    --aws "$HOME/AWS" \
    --container-name 'aratiny-docker-aws' \
    --container-memory 2G

N.B. this assumes you placed your AWS folder in your user home directory. It can be placed anywhere. Ensure you do not end the folder path given to --aws with a trailing forward slash (e.g. $HOME/AWS/).
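The folder requirements described so far can be checked before starting the container. The following is a sketch only; check_aws_dir is a hypothetical helper, not part of the KnetMiner scripts. It strips one trailing slash and confirms that both expected files are present.

```shell
# Hypothetical helper: validate the folder and print its path without a trailing slash.
check_aws_dir() {
  dir="${1%/}"   # removes a single trailing slash, if any
  if [ ! -f "$dir/.aws/credentials" ]; then
    echo "Missing credentials file: $dir/.aws/credentials" >&2
    return 1
  fi
  if [ ! -f "$dir/analytics-s3-sync.sh" ]; then
    echo "Missing script: $dir/analytics-s3-sync.sh" >&2
    return 1
  fi
  printf '%s\n' "$dir"
}

# Example use (run from knetminer/common/quickstart):
#   aws_dir="$(check_aws_dir "$HOME/AWS/")" && \
#     ./docker-run-aws.sh --aws "$aws_dir" --container-name 'aratiny-docker-aws'
```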

It may be necessary to modify your cron job if it fails to work even though your credentials are correct and have been mounted correctly.

Potential cron-issues

Cron starts with a minimal environment; a common issue is that your usual environment and bash profile settings are not imported into it. If you don't see analytics logs being uploaded to your S3 bucket, this may be the cause.

To resolve this, first create a file containing your environment settings (i.e. env >> ~/file_with_env.sh), then import your profile and environment settings into the cronjob, i.e.

* * * * * . /etc/profile; env - `cat ~/file_with_env.sh` /bin/sh /root/knetminer-build/common/quickstart/analytics-s3-sync.sh

Ensure you use the correct shell for your environment (in our case it's sh), and ensure there's a newline after your cronjob line. It's also important to use the full path to any command you call, e.g. /usr/bin/aws. You can find where a program is installed by executing which $program, e.g. which aws for aws.
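To see what a job would find under cron before editing the crontab, you can run a command in a deliberately stripped-down environment. This is a sketch under assumptions: run_like_cron is a hypothetical helper, and the PATH value /usr/bin:/bin is a typical cron default that may differ on your system.

```shell
# Run a command under an (assumed) cron-like environment: almost no variables, short PATH.
run_like_cron() {
  env -i HOME="$HOME" PATH=/usr/bin:/bin /bin/sh -c "$1"
}

# Can a cron-like shell find the aws executable? Prints its path, or a hint if not.
run_like_cron 'command -v aws || echo "aws is not on the minimal PATH; use its full path in the crontab"'

# The full path as seen from your interactive shell (prints nothing if aws is not installed).
command -v aws || true
```

If the two results differ, that points to the environment problem described above.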

If you want to set the cronjob frequency at one run per minute, it would be good practice to ensure that there is no chance that the jobs overlap (i.e. the analytics upload shouldn't take longer than a minute). This can be achieved using flock.

For example, you could add the following to your crontab job:

* * * * * /usr/bin/flock -w 0 /path/to/cron.lock /bin/sh /path/to/analytics-s3-sync.sh

N.b. to edit crontab jobs, execute crontab -e
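The effect of the lock can be tried out by hand. The sketch below is a demonstration only: it uses a temporary file in place of /path/to/cron.lock and sleep in place of the upload script, and it uses -n (flock's non-blocking option) where the crontab line above uses a zero-second timeout, with the same intent.

```shell
# Demonstration only: show that a second run is skipped while the first holds the lock.
lock_file="$(mktemp)"   # stand-in for /path/to/cron.lock

if command -v flock >/dev/null 2>&1; then
  # First "job": holds the lock for 3 seconds in the background.
  flock -n "$lock_file" sleep 3 &
  sleep 1
  # Second "job": -n makes flock fail at once instead of waiting for the lock.
  if flock -n "$lock_file" true; then
    result="ran"
  else
    result="skipped"
  fi
  wait
else
  result="unavailable"   # flock (from util-linux) is not installed on this machine
fi
echo "second run: $result"
```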

Using Dockerised Knetminer with Amazon AWS and ElasticBeanStalk

In this section, we give details on deploying Knetminer via AWS Beanstalk, using the command line interface or the Beanstalk section on the AWS Console.

Prerequisites

Installing AWS Beanstalk CLI - ONLY FOR CLI APPROACH

To kick start a Knetminer deployment on AWS Beanstalk at command line, you'll need to install the AWS Beanstalk CLI.

  • Use the setup scripts (provided by AWS) to install Beanstalk CLI.
  • Setup and configure the CLI on your machine. During the configuration step, an existing AWS Beanstalk application can be used or a new application can be created.

Dataset and Permissions

To use a dataset of your choice for your AWS KnetMiner deployment, perform the following:

  • Upload your dataset directory to an S3 bucket. The dataset directory should follow the convention explained in this section.
  • Make the bucket public, or use an appropriate IAM policy (to enable listing and reading of the S3 bucket), which needs to be attached to the Beanstalk IAM role that's being used.
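For the IAM route, a policy that allows listing and reading can be generated as follows. This is a minimal sketch, not a policy supplied by KnetMiner: the bucket name, file name, policy name and role name are all assumptions to replace with your own, and the AWS commands are left as comments because they need the AWS CLI and valid credentials.

```shell
# Hypothetical bucket name; replace with your own.
BUCKET="my-knetminer-bucket"
POLICY_FILE="knetminer-s3-read-policy.json"

# s3:ListBucket applies to the bucket itself, s3:GetObject to the objects in it.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF

# To upload the dataset and attach the policy (the role name is an assumption;
# check which instance role your Beanstalk environment actually uses):
#   aws s3 sync ./my-dataset "s3://${BUCKET}/my-dataset/"
#   aws iam put-role-policy --role-name aws-elasticbeanstalk-ec2-role \
#       --policy-name knetminer-s3-read --policy-document "file://${POLICY_FILE}"
```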

Deployment

Assuming that you've downloaded the KnetMiner GitHub repository, you'll need to perform the following steps to customise and deploy your KnetMiner.

Beanstalk S3 configuration

You'll need to edit the Beanstalk S3 configuration file to use the right S3 bucket for your dataset. Open the configuration file in an editor of your choice (we use vi here):

cd knetminer/docker/aws
vi .ebextensions/01_s3.config

Then, change the S3 bucket path from

aws s3 cp s3://knetminer-testing-bucket/arabadopsis/ /home/ec2-user/knetminer-dataset --recursive

to

aws s3 cp s3://<MY-KNETMINER-BUCKET>/<DATASET-FOLDER>/ /home/ec2-user/knetminer-dataset --recursive

Beanstalk EC2 instance configuration

Depending on the dataset size, you'll need to pick an appropriate AWS instance type for Beanstalk to use to deploy KnetMiner. AWS instance types, their specifications, and pricing for different AWS regions can be found at https://aws.amazon.com/ec2/pricing/on-demand/ . Below are some sample instance types to pick with different CPU and MEMORY configurations.

INSTANCE-TYPE   vCPUs   MEMORY
t2.medium       2       4 GiB
m4.large        2       8 GiB
m4.xlarge       4       16 GiB
m4.2xlarge      8       32 GiB
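As an illustration of choosing from these samples, the sketch below returns the smallest listed type whose memory is at least a given number of GiB. pick_instance_type is a hypothetical helper; how much memory your dataset actually needs (including headroom for the operating system and Docker) is an assumption you have to supply yourself.

```shell
# Hypothetical helper: smallest sample instance type with at least the given memory (whole GiB).
pick_instance_type() {
  need_gib="$1"
  if   [ "$need_gib" -le 4 ];  then echo "t2.medium"
  elif [ "$need_gib" -le 8 ];  then echo "m4.large"
  elif [ "$need_gib" -le 16 ]; then echo "m4.xlarge"
  elif [ "$need_gib" -le 32 ]; then echo "m4.2xlarge"
  else
    echo "No sample type has ${need_gib} GiB; see the AWS pricing page" >&2
    return 1
  fi
}

pick_instance_type 12   # prints m4.xlarge
```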

Edit the Beanstalk instance configuration file (.ebextensions/00_instance.config) to use the correct instance type, as shown below:

cd knetminer/docker/aws
vi .ebextensions/00_instance.config

Add the appropriate instance type value:

InstanceType: <INSTANCE-TYPE>

Example:

InstanceType: m4.xlarge

Add or delete the EC2KeyName entry. It is OPTIONAL and only required to log in (via SSH) to the Beanstalk EC2 instance for troubleshooting. Delete this line if SSH login to the instance isn't required.

EC2KeyName: <SSH-KEY-NAME>

Example:

EC2KeyName: mysshkeyname

Edit Docker run file - ONLY FOR PREDEFINED DATASET

When using a predefined dataset from the KnetMiner GitHub repository, add a Command entry to the Dockerrun JSON file.

vi Dockerrun.aws.json

Change

  "Entrypoint": "./runtime-helper.sh"
}

to

  "Entrypoint": "./runtime-helper.sh",
  "Command": "arabidopsis /root/knetminer-dataset"
}

Create a new AWS Beanstalk environment - via AWS Console

Prepare the code zip file

To deploy via the AWS Console, you'll need to prepare a zip file containing the relevant Docker files, along with the customisation/configuration files described above.

cd knetminer/docker/aws
zip -r code.zip .
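Before zipping, it can help to confirm that the files edited in the earlier steps are actually in the directory. The following is a sketch only; check_bundle_dir is a hypothetical helper, and the list contains just the files named on this page (your copy of the directory may hold more).

```shell
# Hypothetical helper: report any of the expected deployment files that are missing.
check_bundle_dir() {
  dir="${1:-.}"
  missing=0
  for f in Dockerrun.aws.json .ebextensions/00_instance.config .ebextensions/01_s3.config; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use:
#   cd knetminer/docker/aws && check_bundle_dir . && zip -r code.zip .
```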

Then, log on to the Beanstalk section of the AWS Console. You may proceed by either creating a new application or selecting an existing one. In the selected application, start a new environment by clicking the 'Actions' button on the right-hand side of the page and selecting 'Create environment'. Use the following values in the New Environment wizard.

  • Environment: 'Web server environment'
  • Environment name: a user-friendly name (e.g. knetminer-test)
  • Domain: a user-friendly DNS prefix (e.g. knetminer-test)
  • Platform: Preconfigured platform -> Docker
  • Application code: Select 'Upload your code' and select the code.zip file created above.

Create a new AWS Beanstalk environment - via CLI

You can now create a new environment in the selected AWS Beanstalk application. This step provisions the AWS resources (instance, load balancer) and adds the KnetMiner Docker container to the launched AWS instance.

eb create

This will prompt for:

  • unique environment name - provide a user-friendly name (e.g. knetminer-test)
  • DNS CNAME prefix - can be left with default value
  • load balancer type - can be left with default value

Browsing Knetminer UI

The deployment process will provision any AWS resources (instance, load balancer) and the KnetMiner Docker container will be added to the AWS instance launched. This normally takes ~15 minutes, depending on the dataset size.

Log in to the Beanstalk section of the AWS Web Console, browse to the application and the newly launched environment, and find its URL (e.g. knetminer-test.eu-west-2.elasticbeanstalk.com). Append /client to that URL (e.g. knetminer-test.eu-west-2.elasticbeanstalk.com/client) to browse the KnetMiner UI.
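The address of the UI can also be put together on the command line. This is a small sketch: client_url is a hypothetical helper, and the http:// scheme is an assumption (use https:// if your load balancer is configured for it). The reachability check is left as a comment because it needs network access and curl.

```shell
# Hypothetical helper: build the KnetMiner UI address from the environment's host name.
client_url() {
  host="${1#http://}"      # drop a scheme if one was pasted in
  host="${host#https://}"
  host="${host%/}"         # drop one trailing slash
  printf 'http://%s/client\n' "$host"
}

client_url "knetminer-test.eu-west-2.elasticbeanstalk.com"

# Optional reachability check:
#   curl -sS -o /dev/null -w '%{http_code}\n' "$(client_url "knetminer-test.eu-west-2.elasticbeanstalk.com")"
```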

Delete KnetMiner environment

The deployed KnetMiner AWS Beanstalk environment can be terminated using either the AWS Console or the Beanstalk CLI, as follows:

AWS Console

Within the Beanstalk section of AWS Console, browse to the environment page, click on the 'Actions' button (on the right hand side of the page) and select 'Terminate environment' from the drop-down list.

AWS Beanstalk CLI

eb terminate <environment-name> # e.g: eb terminate knetminer-test

For further help, please refer to the following link
