# Cloud Computing Assignment 2022-2023
Implementation of an application that processes large data sets in parallel on a distributed cloud environment (i.e. AWS).

© Copyright 2022, All rights reserved to Hans Haller, CSTE-CIDA Student at Cranfield Uni. SATM, Cranfield, UK.

https://www.github.com/Hnshlr

### Solution setup - Pre-requisites:
1. Make sure the AWS credentials taken from the Learner Lab are up to date in the `~/.aws/credentials` file (test the connection locally with `aws sts get-caller-identity`).
2. Specify the path to the `labsuser.pem` key file (taken from the Learner Lab). paramiko needs it to connect to the EC2 instances and execute commands over SSH.
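
Before running anything, the credentials file can be sanity-checked locally. The sketch below is a hypothetical helper (`missing_credential_keys` is not part of this project) that uses only the standard library's `configparser` to confirm that the `[default]` profile contains the three keys a Learner Lab session provides: access key, secret key and session token. It does not contact AWS, so `aws sts get-caller-identity` remains the real connectivity test.

```python
import configparser
from pathlib import Path

# Keys that temporary (STS) credentials, such as the Learner Lab's, consist of.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def missing_credential_keys(path, profile="default"):
    """Hypothetical helper: return the required keys that are absent or empty for `profile`."""
    parser = configparser.ConfigParser()
    if not parser.read(Path(path).expanduser()):
        return list(REQUIRED_KEYS)  # file missing or unreadable
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]
```

For example, `missing_credential_keys("~/.aws/credentials")` returns an empty list when all three keys are present.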
### Solution setup steps (Using the environment setup function):
1. Create a cluster of EC2 instances on AWS, using Amazon Linux 2 images.
2. Create the S3 buckets that store the data and the SSM command outputs.
3. Create the SQS FIFO queues that hold the job and result messages.
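
The controller functions themselves are not shown in this notebook. As an illustration of the queue step only, the sketch below shows how FIFO queues such as `main-protected-jobs.fifo` could be created with the SQS `create_queue` call of a boto3 client. `create_fifo_queues` is a hypothetical name, and the client is passed in so the logic can be exercised without AWS.

```python
from typing import Dict, List, Protocol


class SQSClient(Protocol):
    # Structural type matching the subset of boto3's SQS client used here.
    def create_queue(self, QueueName: str, Attributes: Dict[str, str]) -> Dict[str, str]: ...


def create_fifo_queues(sqs: SQSClient, names: List[str]) -> Dict[str, str]:
    """Hypothetical helper: create one FIFO queue per name and return {name: queue_url}."""
    urls = {}
    for name in names:
        # SQS requires FIFO queue names to end with ".fifo".
        if not name.endswith(".fifo"):
            raise ValueError(f"FIFO queue name must end with '.fifo': {name}")
        response = sqs.create_queue(
            QueueName=name,
            Attributes={
                "FifoQueue": "true",
                # Deduplicate on a SHA-256 hash of the body when no explicit ID is sent.
                "ContentBasedDeduplication": "true",
            },
        )
        urls[name] = response["QueueUrl"]
    return urls
```

With a real client this would be called as `create_fifo_queues(boto3.client("sqs"), queues_names)`. When a queue with the same name and attributes already exists, `create_queue` returns its existing URL, so re-running the setup is harmless.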

### IMPORTS:

The following controllers define functions that use boto3 to make AWS API calls. Importing the controllers automatically creates a boto3 resource for each AWS service the solution needs (EC2, SQS, SSM, S3, etc.), which these functions rely on.

The boto3 resources use the AWS credentials stored in the `.aws` folder of the user who runs this notebook.

These credentials must therefore be up to date before running the cells below. If they expire (i.e. the Learner Lab session ends), update them, restart the kernel and re-run the imports.
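
Because the resources are created once, at import time, they keep the credentials that were valid at that moment; hence the kernel restart. A hypothetical alternative (not how the controllers currently work) is to create resources lazily through a cache that can be cleared after the credentials file is refreshed. `make_resource_getter` is an illustrative name, and the factory is injected so the sketch runs without boto3.

```python
from functools import lru_cache
from typing import Any, Callable


def make_resource_getter(factory: Callable[[str], Any]) -> Callable[[str], Any]:
    """Hypothetical helper: wrap `factory` (service name -> resource) in a per-service cache."""

    @lru_cache(maxsize=None)
    def get_resource(service: str) -> Any:
        # The factory runs once per service name; later calls reuse the cached object.
        return factory(service)

    return get_resource
```

In the notebook the factory could be `lambda name: boto3.Session().resource(name)`. After pasting new Learner Lab credentials, calling `get_resource.cache_clear()` makes the next call build a fresh session, which re-reads `~/.aws/credentials`.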

In [None]:
# STANDARD LIBRARIES= (used directly by the cells below)
import os
import time
import numpy as np

# CONTROLLERS=
from backend.controllers.boto3_controller import *
from backend.controllers.ec2_controller import *
from backend.controllers.matrix_controller import *
from backend.controllers.s3_controller import *
from backend.controllers.spark_controller import *
from backend.controllers.sqs_controller import *
from backend.controllers.ssm_controller import *
from backend.controllers.app_controller import *

# SERVICES=
from backend.work_service import *
from backend.app_service import *

## AWS - SOLUTION SETUP AND TASK EXECUTION:

In [None]:
# SETTINGS=             [IMPORTANT: Update the following settings before running the solution]
worker_amount = 8
backend_path = os.path.join(os.getcwd(), 'backend')
username = 'hans'
aws_credentials_path = os.path.join('C:', os.sep, 'Users', username, '.aws', 'credentials')

# NAMES=
instances_names = ['master'] + ['worker' + str(i) for i in range(1, worker_amount + 1)]
queues_names = ['main-protected-jobs.fifo', 'main-protected-results.fifo']
bucket_names = ['main-protected-bucket', 'main-protected-ssm-outputs']
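
The names above are later resolved to instance IDs (`get_instance_ids_by_names`), and the order matters: the solution cell treats index 0 as the master and slices the rest as workers. The sketch below shows one way such a lookup could work on an EC2 `describe_instances` response, assuming each instance carries a `Name` tag. `ids_by_name` is a hypothetical helper, not the controller's actual implementation.

```python
from typing import Any, Dict, List


def ids_by_name(describe_response: Dict[str, Any], names: List[str]) -> List[str]:
    """Hypothetical helper: map 'Name' tags to instance IDs, in the order of `names`."""
    found: Dict[str, str] = {}
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            # Terminated instances stay visible for a while and may keep their tags: skip them.
            if instance.get("State", {}).get("Name") in ("terminated", "shutting-down"):
                continue
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if "Name" in tags:
                found[tags["Name"]] = instance["InstanceId"]
    missing = [n for n in names if n not in found]
    if missing:
        raise KeyError(f"no live instance tagged with: {missing}")
    return [found[n] for n in names]
```

With boto3 this could be fed `boto3.client("ec2").describe_instances()`, which is sufficient for a cluster of a few instances without pagination.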

### ENVIRONMENT SETUP:

In [None]:
# START TIMER:
print('Beginning AWS environment setup. Starting timer...')
envsetup_timer = time.time()

# SQS QUEUES:
create_sqs_queues(queues_names)
# S3 BUCKET:
create_s3_buckets(bucket_names)
# EC2 INSTANCES:
create_instances_and_wait_for_running(instances_names)

# SEND BACKEND FOLDER -> S3 BUCKET -> EC2 INSTANCES:
upload_dir_to_s3(backend_path, bucket_names[0], 'backend')

# SYNC THE BACKEND, CREATE THE DATA FOLDERS AND INSTALL PACKAGES ON THE EC2 INSTANCES (COMMANDS ARE SENT VIA SSM AND RUN IN THE BACKGROUND):
instances_ids = get_instance_ids_by_names(instances_names)
commands = [
    'aws s3 sync s3://' + bucket_names[0] + '/backend /home/ec2-user/backend',
    'sudo mkdir -p /home/ec2-user/backend/data/input /home/ec2-user/backend/data/output/mx /home/ec2-user/backend/data/output/add',
    'sudo yum install tree -y',
    'pip3 install boto3',
    'pip3 install numpy',
    'pip3 install pandas'
]
responses = exec_SSHs_on_instances_using_SSM(instances_ids, commands)

# STOP TIMER:
print('Environment setup took: ' + str(np.round(time.time() - envsetup_timer, 2)) + ' seconds.')
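
The cell above relies on AWS Systems Manager rather than SSH to run the setup commands. The sketch below shows how a helper like `exec_SSHs_on_instances_using_SSM` could be built on the SSM `send_command` API with the AWS-managed `AWS-RunShellScript` document; `run_shell_commands` and `cancel_shell_commands` are hypothetical names. `send_command` is asynchronous: it returns as soon as the command is queued, which is consistent with the packages being installed in the background. Its response contains `['Command']['CommandId']`, the value the solution cell later uses to stop the worker loops.

```python
from typing import Any, Dict, List, Optional


def run_shell_commands(ssm: Any, instance_ids: List[str], commands: List[str],
                       output_bucket: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: send one AWS-RunShellScript command to several instances."""
    kwargs: Dict[str, Any] = {
        "InstanceIds": instance_ids,
        "DocumentName": "AWS-RunShellScript",  # AWS-managed document that runs shell lines on Linux
        "Parameters": {"commands": commands},
    }
    if output_bucket:
        # Store the full stdout/stderr in S3; the API itself only returns a truncated copy.
        kwargs["OutputS3BucketName"] = output_bucket
    return ssm.send_command(**kwargs)


def cancel_shell_commands(ssm: Any, command_id: str, instance_ids: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: cancel a running command, e.g. a worker's polling loop."""
    return ssm.cancel_command(CommandId=command_id, InstanceIds=instance_ids)
```

With a real client, `run_shell_commands(boto3.client("ssm"), instances_ids, commands)` would mirror the setup call above.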

In [None]:
# UPDATE BACKEND FOLDER ON S3, AND EC2 INSTANCES:
instances_ids = get_instance_ids_by_names(instances_names)
purge_s3_bucket(bucket_names[0])
upload_dir_to_s3(backend_path, bucket_names[0], 'backend')
exec_SSHs_on_instances_using_SSM(instances_ids, [
    'rm -rf /home/ec2-user/backend',
    'aws s3 sync s3://' + bucket_names[0] + '/backend /home/ec2-user/backend',
    'sudo mkdir -p /home/ec2-user/backend/data/input /home/ec2-user/backend/data/output/mx /home/ec2-user/backend/data/output/add'])
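
The instances pull the code with `aws s3 sync s3://<bucket>/backend`, so every local file must land under the `backend/` prefix with `/`-separated keys, even though the settings point to a Windows machine. The sketch below shows how an upload like `upload_dir_to_s3` could compute those keys; `s3_upload_plan` is a hypothetical helper and only plans the upload.

```python
import os
from typing import List, Tuple


def s3_upload_plan(local_dir: str, prefix: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: list (local_path, s3_key) pairs for every file under local_dir."""
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for filename in files:
            local_path = os.path.join(root, filename)
            relative = os.path.relpath(local_path, local_dir)
            # S3 keys always use '/', even when os.sep is '\\' on Windows.
            key = prefix.rstrip("/") + "/" + relative.replace(os.sep, "/")
            plan.append((local_path, key))
    return sorted(plan, key=lambda pair: pair[1])
```

Each pair could then be uploaded with `boto3.client("s3").upload_file(local_path, bucket_names[0], key)`.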

### SOLUTION EXECUTION:

In [None]:
# RECOMMENDED PRE-REQUISITES:
stop_all_instances_and_wait_for_stopped()
start_all_instances_and_wait_for_running()
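
A stop/start cycle like the one above can rely on the waiters that boto3's EC2 client provides, which poll the instance state until it matches. The sketch below is a hypothetical `restart_instances` helper, not the controllers' code. Note that `instance_running` only means the instance state is `running`; the `instance_status_ok` waiter additionally waits for the status checks to pass.

```python
from typing import Any, List


def restart_instances(ec2: Any, instance_ids: List[str]) -> None:
    """Hypothetical helper: stop, wait until stopped, start, wait until running."""
    ec2.stop_instances(InstanceIds=instance_ids)
    # Named waiters poll describe_instances until every instance reaches the target state.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=instance_ids)
    ec2.start_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)
```

With a real client: `restart_instances(boto3.client("ec2"), get_instance_ids_by_names(instances_names))`.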

In [None]:
# PREFERENCES=
matrix_shape = 1000
used_workers = 8

# START JOB TIMER:
sol_timer = time.time()

# WORKERS' LOOP: START THE WORKERS AND HAVE THEM WAIT FOR JOBS:
command = 'cd /home/ec2-user && sudo python3 backend/work_service.py worker ' + queues_names[0] + ' ' + queues_names[1]
work_resp = exec_SSHs_on_instances_using_SSM(get_instance_ids_by_names(instances_names)[1:min(used_workers, worker_amount)+1], [command], bucket_names[1])

# MASTER'S LOOP: CREATE A MATRIX, SPLIT IT INTO SLICES, AND SEND THE SLICES TO THE JOBS QUEUE:
command = 'cd /home/ec2-user && sudo python3 backend/work_service.py master ' + str(matrix_shape) + ' ' + queues_names[0] + ' ' + queues_names[1] + ' ' + bucket_names[0]
stdout, stderr = exec_SSH_on_instance(instances_names[0], command)
outprint(stdout, stderr)
# print(command)

# STOP THE WORKERS' LOOPS:
stop_SSHs_on_instances_using_SSM(get_instance_ids_by_names(instances_names)[1:min(used_workers, worker_amount)+1], work_resp['Command']['CommandId'])

# STOP THE TIMER:
print('The computation took: ' + str(np.round(time.time() - sol_timer, 2)) + ' seconds.')
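
The actual work happens in `backend/work_service.py`, which is not shown here. As an AWS-free sketch of the master/worker pattern described in the cell comments (the master splits a matrix into slices and queues them; the workers wait for jobs and return results), the code below splits the rows of an n×n matrix into near-equal slices, builds FIFO `send_message` arguments, and runs one worker iteration against injected callables. The row-wise split, the JSON message schema and all function names are assumptions for illustration.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


def row_slices(n_rows: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split range(n_rows) into n_parts contiguous (start, stop) slices differing in size by at most one."""
    base, extra = divmod(n_rows, n_parts)
    bounds, start = [], 0
    for i in range(n_parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def job_messages(job_id: str, n_rows: int, n_parts: int) -> List[Dict[str, str]]:
    """Hypothetical master side: one FIFO send_message kwargs dict per slice."""
    messages = []
    for index, (start, stop) in enumerate(row_slices(n_rows, n_parts)):
        body = json.dumps({"job": job_id, "slice": index, "rows": [start, stop]})
        messages.append({
            "MessageBody": body,
            # Distinct group IDs let several workers receive slices concurrently;
            # messages sharing one group ID would be delivered one at a time, in order.
            "MessageGroupId": f"{job_id}-{index}",
            "MessageDeduplicationId": f"{job_id}-{index}",
        })
    return messages


def worker_step(receive: Callable[[], List[Dict[str, Any]]],
                compute: Callable[[Dict[str, Any]], Any],
                send_result: Callable[[str], None],
                delete: Callable[[str], None]) -> int:
    """Hypothetical worker side: process received jobs, publish results, then delete the jobs."""
    messages = receive()
    for message in messages:
        job = json.loads(message["Body"])
        result = compute(job)
        send_result(json.dumps({"job": job["job"], "slice": job["slice"], "result": result}))
        # Delete only after the result is sent, so a crash leaves the job visible for a retry.
        delete(message["ReceiptHandle"])
    return len(messages)
```

With boto3, the master would call `sqs.send_message(QueueUrl=jobs_url, **message)` for each message; `receive` could wrap `sqs.receive_message(QueueUrl=jobs_url, MaxNumberOfMessages=10, WaitTimeSeconds=20).get("Messages", [])`, `send_result` could wrap `sqs.send_message` on the results queue (FIFO queues require a `MessageGroupId`), and `delete` could wrap `sqs.delete_message(QueueUrl=jobs_url, ReceiptHandle=...)`.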

### VERIFY:

In [None]:
# SETTINGS=
op_ids = view_all_s3_buckets_filenames(bucket_names[0], 'backend/data/output/mx/')
op_type = 'mx'

# VERIFY THE COMPUTATIONS (THE COMPARISON RUNS REMOTELY ON THE MASTER INSTANCE):
verify_multiple_jobs(bucket_names[0], instances_names[0], op_ids, op_type)
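
`verify_multiple_jobs` runs the comparison on the master instance; its internals are not shown. Distributed floating-point results may differ slightly from a local reference (for example if values are serialized as text or summed in a different order), so a tolerance-based comparison is safer than exact equality. The sketch assumes that `op_type = 'mx'` denotes matrix multiplication, an assumption based only on the folder name; `matmul` and `matrices_close` are hypothetical helpers.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Hypothetical reference: naive product, adequate for spot-checking small slices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matrices_close(got: Sequence[Sequence[float]], expected: Sequence[Sequence[float]],
                   rel_tol: float = 1e-9, abs_tol: float = 1e-9) -> bool:
    """True when the shapes match and every entry agrees within tolerance."""
    if len(got) != len(expected):
        return False
    for row_got, row_exp in zip(got, expected):
        if len(row_got) != len(row_exp):
            return False
        if not all(math.isclose(g, e, rel_tol=rel_tol, abs_tol=abs_tol)
                   for g, e in zip(row_got, row_exp)):
            return False
    return True
```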

### CLEAN UP:

In [None]:
kill_all()