# Cloud Computing Assignment 2022-2023
Implementation of an application that processes large data sets in parallel on a distributed cloud environment (i.e., AWS).

© Copyright 2022, All rights reserved to Hans Haller, CSTE-CIDA Student at Cranfield Uni. SATM, Cranfield, UK.

### Solution setup - Prerequisites:
1. Make sure the AWS credentials taken from the Learner Lab are up to date in the ~/.aws/credentials file (test the connection locally using aws sts get-caller-identity).
2. Specify the path to the "labsuser.pem" PEM key (taken from the Learner Lab), which paramiko needs to connect to the EC2 instances and execute SSH commands.
3. Create EC2, S3 and SQS resources and clients using boto3.

### Solution setup steps (Using Boto3):
1. Create a cluster of EC2 instances on AWS, using the Amazon Linux 2 image.
2. Create an S3 bucket to store the data.
3. Create the SQS queues that hold the job and result messages.
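
The queue names used later in this notebook end in `.fifo`, and SQS only accepts such names when the queue is created with the `FifoQueue` attribute. The sketch below shows one way the arguments for boto3's `create_queue` call could be built. The helper name `build_create_queue_kwargs` is hypothetical, and the actual `create_sqs_queues` controller may do this differently.

```python
def build_create_queue_kwargs(queue_name):
    """Build keyword arguments for boto3's SQS create_queue call.

    Hypothetical helper, not part of the project's controllers.
    SQS requires FIFO queue names to end in '.fifo' and the 'FifoQueue'
    attribute to be set to the string 'true' at creation time.
    """
    kwargs = {'QueueName': queue_name}
    if queue_name.endswith('.fifo'):
        kwargs['Attributes'] = {'FifoQueue': 'true'}
    return kwargs


# Possible usage (needs valid AWS credentials, so it is left commented out):
# import boto3
# sqs = boto3.resource('sqs')
# queue = sqs.create_queue(**build_create_queue_kwargs('main-protected-jobs.fifo'))
```

Keeping the argument construction separate from the API call makes it easy to check without touching AWS. Note that messages sent to a FIFO queue also need a `MessageGroupId`.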

### IMPORTS:

The following controllers define functions that wrap boto3 calls to the AWS API. Importing the controllers automatically creates a Boto3 resource for each AWS service the solution needs (EC2, SQS, SSM, S3, etc.) so that these functions work.

The Boto3 resources use the AWS credentials located in the .aws folder of the user running this software.

The credentials must therefore be up to date before running the cells below. If they have expired (i.e., the Learner Lab session ended), update them, then restart the kernel and re-execute the imports.
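
Before running the imports, a quick local check of the credentials file can catch an incomplete copy-paste from the Learner Lab. The sketch below is a minimal example, assuming the credentials live under a `[default]` profile and include a session token, as temporary credentials do. The function name `missing_credential_keys` is hypothetical. It only checks that the keys are present, not that they are still valid. Running aws sts get-caller-identity remains the way to confirm that.

```python
import configparser
import os

# Keys expected for temporary (session-based) credentials.
REQUIRED_KEYS = ('aws_access_key_id', 'aws_secret_access_key', 'aws_session_token')


def missing_credential_keys(path=None, profile='default'):
    """Return the required keys that are absent or empty in the given profile."""
    path = path or os.path.expanduser('~/.aws/credentials')
    # Disable interpolation so a '%' inside a secret cannot raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.get(profile, key, fallback='').strip()]


if __name__ == '__main__':
    missing = missing_credential_keys()
    print('Missing keys:', missing if missing else 'none')
```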

In [None]:
# CONTROLLERS=
from backend.controllers.boto3_controller import *
from backend.controllers.ec2_controller import *
from backend.controllers.matrix_controller import *
from backend.controllers.s3_controller import *
from backend.controllers.spark_controller import *
from backend.controllers.sqs_controller import *
from backend.controllers.ssm_controller import *
from backend.controllers.app_controller import *

# SERVICES=
from backend.work_service import *

In [None]:
# TODO: Replace every exec_shell() pre-step with the following procedure:
# Create a setup.sh script that installs all the packages needed for the application to run, and then execute it on each instance.

# TODO (BONUS): BULK VERIFICATIONS

## AWS - SOLUTION SETUP AND TASKS EXECUTION:

In [None]:
# SETTINGS=
worker_amount = 2
backend_path = os.path.join(os.getcwd(), 'backend')     # !! IMPORTANT: Make sure to update this path to the backend folder of the project !!

# NAMES=
instances_names = ['master'] + ['worker' + str(i) for i in range(1, worker_amount + 1)]
queues_names = ['main-protected-jobs.fifo', 'main-protected-results.fifo']
bucket_name = 'main-protected-bucket'

### ENVIRONMENT SETUP:

In [None]:
# START TIMER:
print('Beginning AWS environment setup. Starting timer...')
envsetup_timer = time.time()

# SQS QUEUES:
create_sqs_queues(queues_names)
# S3 BUCKET:
create_s3_bucket(bucket_name)
# EC2 INSTANCES:
create_instances_and_wait_for_running(instances_names)

# SEND BACKEND FOLDER -> S3 BUCKET -> EC2 INSTANCES:
upload_dir_to_s3(backend_path, bucket_name, 'backend')

# INSTALL PACKAGES ON EC2 INSTANCES (COMMANDS ARE SENT AND RUN IN THE BACKGROUND):
instances_ids = get_instance_ids_by_names(instances_names)
commands = [
    'aws s3 sync s3://' + bucket_name + '/backend /home/ec2-user/backend',
    # -p creates parent directories as needed and does not fail if they already exist:
    'sudo mkdir -p /home/ec2-user/backend/data/input /home/ec2-user/backend/data/output/mx /home/ec2-user/backend/data/output/add',
    'sudo yum install tree -y',
    'pip3 install boto3',
    'pip3 install numpy',
    'pip3 install pandas'
]
responses = exec_SSHs_on_instances_using_SSM(instances_ids, commands)

# UPDATE AWS CREDENTIALS ON EC2 INSTANCES:
update_instances_credentials_using_boto3_session_credentials(instances_ids)

# STOP TIMER:
print('Environment setup took: ' + str(np.round(time.time() - envsetup_timer, 2)) + ' seconds.')
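
For readers unfamiliar with SSM Run Command, the sketch below shows how a list of shell commands like the one above might be sent to several instances with boto3's `send_command`. It is not the project's `exec_SSHs_on_instances_using_SSM` implementation, which is not shown here. The function name `send_shell_commands` is hypothetical, and the client is passed in so the function can be exercised without AWS access.

```python
def send_shell_commands(ssm_client, instance_ids, commands):
    """Send shell commands to EC2 instances through SSM Run Command.

    Hypothetical helper: ssm_client is expected to behave like
    boto3.client('ssm'). Returns the command ID, which can be used
    later to look up the result of the command on each instance.
    """
    response = ssm_client.send_command(
        InstanceIds=list(instance_ids),
        DocumentName='AWS-RunShellScript',  # AWS-managed document for Linux shell commands
        Parameters={'commands': list(commands)},
    )
    return response['Command']['CommandId']


# Possible usage (needs valid AWS credentials, so it is left commented out):
# import boto3
# command_id = send_shell_commands(boto3.client('ssm'), instances_ids, commands)
```

`send_command` returns as soon as the request is accepted, which is why the commands run in the background. The outcome on a given instance can then be fetched with `get_command_invocation`, using the command ID and the instance ID.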

In [None]:
# SEND BACKEND FOLDER -> S3 BUCKET -> EC2 INSTANCES:
instances_ids = get_instance_ids_by_names(instances_names)
upload_dir_to_s3(backend_path, bucket_name, 'backend')
exec_SSHs_on_instances_using_SSM(instances_ids, [
    'rm -rf /home/ec2-user/backend',
    'aws s3 sync s3://' + bucket_name + '/backend /home/ec2-user/backend',
    # -p creates parent directories as needed:
    'sudo mkdir -p /home/ec2-user/backend/data/input /home/ec2-user/backend/data/output/mx /home/ec2-user/backend/data/output/add'
])

In [None]:
view_all_instances(False)

In [None]:
# create_matrix_then_split_and_send_jobs(matrix_shape, queues_names[0], bucket_name)
# gather_jobs_then_compute_and_send_results(queues_names[0], queues_names[1])
# gather_results_and_reconstruct_matrix(queues_names[1], bucket_name)
# verify_result_matrix(bucket_name, '707566791', 'mx')

### SOLUTION EXECUTION:

In [None]:
# PREFERENCES=
matrix_shape = 500
used_workers = 1

sol_timer = time.time()

# JOB: CREATE A MATRIX, SPLIT IT INTO SLICES, AND SEND THE SLICES TO THE JOBS QUEUE:
command = 'sudo python3 backend/work_service.py create ' + str(matrix_shape) + ' ' + queues_names[0] + ' ' + bucket_name
stdout, stderr = exec_SSH_on_instance(instances_names[0], command)
outprint(stdout, stderr)

# JOB: GATHER JOBS FROM THE JOBS QUEUE, COMPUTE THE RESULTS, AND SEND THE RESULTS TO THE RESULTS QUEUE:
for instance_name in instances_names[1:used_workers+1]:
    command = 'sudo python3 backend/work_service.py getjobs ' + queues_names[0] + ' ' + queues_names[1]
    stdout, stderr = exec_SSH_on_instance(instance_name, command)
    outprint(stdout, stderr)

# JOB: GATHER RESULTS FROM THE RESULTS QUEUE, AND RECONSTRUCT THE RESULT MATRIX:
command = 'sudo python3 backend/work_service.py getresults ' + queues_names[1] + ' ' + bucket_name
stdout, stderr = exec_SSH_on_instance(instances_names[0], command)
outprint(stdout, stderr)

print('The computation took: ' + str(np.round(time.time() - sol_timer, 2)) + ' seconds.')
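
The three jobs above follow a split, compute and reconstruct pattern. The sketch below illustrates that pattern locally for matrix multiplication, using plain Python lists and no AWS services. It assumes the matrix is cut into contiguous row blocks tagged with their starting row index. The actual slicing scheme in work_service.py is not shown in this notebook and may differ, and all function names here are hypothetical.

```python
def split_into_row_slices(matrix, slice_count):
    """Split a non-empty matrix (list of rows) into at most slice_count row blocks.

    Each job is a (start_row_index, rows) pair so results can be reordered later.
    """
    rows = len(matrix)
    size = -(-rows // slice_count)  # ceiling division
    return [(start, matrix[start:start + size]) for start in range(0, rows, size)]


def multiply_slice(row_block, right):
    """Multiply a block of rows by the full right-hand matrix."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns]
            for row in row_block]


def reconstruct(results, total_rows):
    """Reassemble (start_row_index, rows) results, which may arrive in any order."""
    output = [None] * total_rows
    for start, block in results:
        output[start:start + len(block)] = block
    return output


if __name__ == '__main__':
    left = [[1, 2], [3, 4], [5, 6]]
    right = [[0, 1], [1, 0]]
    jobs = split_into_row_slices(left, 2)
    # Workers may finish in any order; reversing the jobs simulates that.
    results = [(start, multiply_slice(block, right)) for start, block in reversed(jobs)]
    print(reconstruct(results, len(left)))
```

Tagging each slice with its starting row is one simple way to make reconstruction independent of the order in which workers return their results.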

In [None]:
target_id = '59993422'   # renamed from `id` to avoid shadowing Python's built-in id()
op = 'mx'
# JOB: VERIFY THE RESULTS COMPUTED ON THE AWS DISTRIBUTED CLOUD ENVIRONMENT, USING NUMPY'S MATRIX/ADD FUNCTIONS:
command = 'sudo python3 backend/work_service.py verify ' + bucket_name + ' ' + target_id + ' ' + op
stdout, stderr = exec_SSH_on_instance(instances_names[0], command)
outprint(stdout, stderr)

### CLEAN UP:

In [None]:
for queue_name in queues_names:
    purge_queue(queue_name)
    delete_sqs_queue(queue_name)
for instance_name in instances_names:
    stop_instance_by_name(instance_name)
    terminate_instance_by_name(instance_name)
delete_s3_bucket(bucket_name)