# Cloud Computing Assignment 2022-2023
Implementation of an application that processes large data sets in parallel in a distributed cloud environment (i.e. AWS).

© Copyright 2022, All rights reserved to Hans Haller, CSTE-CIDA Student at Cranfield Uni. SATM, Cranfield, UK.

### Solution setup - Pre-requisites:
1. Make sure the AWS credentials taken from the Learner Lab are up to date in the `~/.aws/credentials` file (test the connection locally with `aws sts get-caller-identity`).
2. Specify the path to the "labsuser.pem" PEM key (taken from the Learner Lab), which paramiko needs to connect to the EC2 instances and execute SSH commands.
3. Create the EC2, S3 and SQS resources and clients using boto3.
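
Before starting, it can help to check that the credentials file is complete. The following is a minimal sketch, not part of the project: `missing_credential_keys` and `REQUIRED_KEYS` are hypothetical names, and it assumes the Learner Lab credentials are session-based (access key, secret key and session token) and stored under the `default` profile. It only inspects the file; it does not contact AWS, so `aws sts get-caller-identity` remains the real connectivity test.

```python
import configparser
from pathlib import Path

# Assumption: temporary (session-based) credentials carry these three keys.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def missing_credential_keys(path, profile="default"):
    """Return the required keys that are absent or empty for `profile`."""
    # Interpolation is disabled so that '%' characters in a value are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


if __name__ == "__main__":
    credentials_path = Path.home() / ".aws" / "credentials"
    missing = missing_credential_keys(credentials_path)
    if missing:
        print("Missing or empty keys:", missing)
    else:
        print("Credentials file looks complete.")
```

An empty result only means the file has the expected shape; expired credentials still need to be refreshed from the Learner Lab.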

### Solution setup steps (using Boto3):
1. Create a cluster of EC2 instances on AWS, using the Amazon Linux 2 image.
2. Create an S3 bucket to store the data.
3. Create the SQS queues that hold the job and result messages.

### IMPORTS:

The following controllers define functions that wrap boto3 calls to the AWS API. Importing the controllers automatically creates a Boto3 resource for each element of the solution (EC2, SQS, SSM, S3, etc.) so that these functions can work. The Boto3 resources use the AWS credentials located in the local `.aws` folder of the user who runs this software, so the credentials must be up to date before running the cells below. If the credentials have expired (i.e. the Learner Lab session ended), update them, restart the kernel and re-execute the imports.

In [None]:
from backend.controllers.boto3_controller import *
from backend.controllers.ec2_controller import *
from backend.controllers.matrix_controller import *
from backend.controllers.s3_controller import *
from backend.controllers.spark_controller import *
from backend.controllers.sqs_controller import *
from backend.controllers.ssm_controller import *
from backend.controllers.app_controller import *

In [None]:
from backend.work_service import *

## AWS - SOLUTION SETUP AND TASKS EXECUTION:

In [None]:
# SETTINGS=
worker_amount = 2
backend_path = os.path.join(os.getcwd(), 'backend')     # !! IMPORTANT: Make sure to update this path to the backend folder of the project !!

# NAMES=
instances_names = ['master'] + ['worker' + str(i) for i in range(1, worker_amount + 1)]
queues_names = ['main-protected-jobs.fifo', 'main-protected-results.fifo']
bucket_name = 'main-protected-bucket'

In [None]:
# EC2 INSTANCES:
create_instances_and_wait_for_running(instances_names)
# SQS QUEUES:
create_sqs_queues(queues_names)
# S3 BUCKET:
create_s3_bucket(bucket_name)

In [None]:
print(get_instance_public_dns_by_name(instances_names[0]))

In [None]:
# UPLOAD BACKEND FOLDER -> S3 BUCKET:
upload_dir_to_s3(backend_path, bucket_name, 'backend')
for instance_name in instances_names:
    # DOWNLOAD BACKEND FOLDER : S3 BUCKET -> EC2 INSTANCES:
    download_directory_on_instance_from_s3_bucket(instance_name, bucket_name, 'backend', 'backend')

In [None]:
# EC2 INSTANCES - SETUP:
for instance_name in instances_names:
    exec_SSH_on_instance(instance_name, 'pip3 install boto3')
    exec_SSH_on_instance(instance_name, 'pip3 install numpy')
    update_instance_credentials_using_boto3_session_credentials(instance_name)

In [None]:
create_matrix_then_split_and_send_jobs(500, queues_names[0])

In [None]:
# CREATE A MATRIX - GIVING THE SIDE SIZE:
matrix_shape = 500
matrix = create_random_square_matrix(matrix_shape)

# SPLIT MATRIX INTO BLOCKS - USING OPTIMAL BLOCKS AMOUNT:
max_SQS_msg_size = get_max_message_size_from_sqs_queue(queues_names[0])
blocks = split_matrix_in_blocks(matrix, find_optimal_blocks_amount(matrix, max_SQS_msg_size))

# STORE IN AN ARRAY THE SLICES REQUIRED TO COMPUTE EACH BLOCK OF THE RESULT MATRIX:
slices = np.empty((blocks.shape[0], blocks.shape[1], 2), dtype=np.ndarray)
for i in range(blocks.shape[0]):
    for j in range(blocks.shape[1]):
        # LEFT SLICE: BLOCK-ROW i (BLOCKS JOINED SIDE BY SIDE)
        slices[i][j][0] = np.concatenate([blocks[i][k] for k in range(blocks.shape[1])], axis=1)
        # RIGHT SLICE: BLOCK-COLUMN j (BLOCKS STACKED VERTICALLY)
        slices[i][j][1] = np.concatenate([blocks[k][j] for k in range(blocks.shape[0])], axis=0)

# CREATE A LIST OF MESSAGES TO SEND TO SQS QUEUE:
messages = []
for i in range(0, blocks.shape[0]):
    for j in range(0, blocks.shape[1]):
        message_body = {
            'i': i,
            'j': j,
            'left-slice': slices[i][j][0].tolist(),
            'right-slice': slices[i][j][1].tolist()
        }
        json_message_body = json.dumps(message_body)
        messages.append(json_message_body)

# BULK SEND MESSAGES TO THE "JOBS" QUEUE:
send_bulk_messages_to_sqs_queue(queues_names[0], messages, 'work')
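
Each message built above carries the block-row and block-column needed to compute one block of the product. As a minimal sketch of what a consumer could do with such a message, the example below multiplies the two slices and stitches the resulting blocks back together. The real `backend/work_service.py` is not shown here, so `matmul`, `compute_block` and `assemble` are hypothetical helpers, written with plain Python lists rather than NumPy so the example stays self-contained.

```python
import json


def matmul(a, b):
    """Plain-Python product of two matrices given as nested lists."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def compute_block(json_message_body):
    """Hypothetical worker step: multiply the two slices of one job message."""
    job = json.loads(json_message_body)
    block = matmul(job["left-slice"], job["right-slice"])
    return {"i": job["i"], "j": job["j"], "block": block}


def assemble(results, blocks_per_side):
    """Hypothetical master step: stitch result blocks back into one matrix."""
    grid = {(r["i"], r["j"]): r["block"] for r in results}
    matrix = []
    for i in range(blocks_per_side):
        row_blocks = [grid[(i, j)] for j in range(blocks_per_side)]
        # zip(*row_blocks) groups the rows that sit at the same height in each block.
        for rows in zip(*row_blocks):
            matrix.append([value for part in rows for value in part])
    return matrix


# A 4x4 matrix split into a 2x2 grid of 2x2 blocks.
m = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
size = 2
messages = [
    json.dumps({
        "i": i,
        "j": j,
        "left-slice": m[i * size:(i + 1) * size],                     # block-row i
        "right-slice": [row[j * size:(j + 1) * size] for row in m],   # block-column j
    })
    for i in range(2)
    for j in range(2)
]

result = assemble([compute_block(message) for message in messages], 2)
assert result == matmul(m, m)
```

Because each job is independent of the others, the messages can be processed by any worker in any order; only the `i` and `j` indices are needed to place a block in the result.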

In [None]:
# EXECUTE THE WORK SERVICE - JOB N°1:
for instance_name in instances_names[1:2]:
    command = 'python3 backend/work_service.py job1 ' + queues_names[0] + ' ' + queues_names[1]
    stdout, stderr = exec_SSH_on_instance(instance_name, command)
    print('STDOUT:\n', stdout.decode('utf-8'))
    print('STDERR:\n', stderr.decode('utf-8'))

In [None]:
# EXECUTE THE WORK SERVICE - JOB N°2:
command = 'python3 backend/work_service.py job2 ' + queues_names[1] + ' ' + str(blocks.shape[0])
stdout, stderr = exec_SSH_on_instance(instances_names[0], command)
print('STDOUT:\n', stdout.decode('utf-8'))
print('STDERR:\n', stderr.decode('utf-8'))

In [None]:
print('The result matrix is equal to the self computed product: ' + str(np.array_equal(result_matrix, np.dot(matrix, matrix))))
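
If the matrix holds floating-point values, an exact comparison can report a mismatch even when the distributed result is correct, because the blocks are summed in a different order than in a single product. The sketch below illustrates a tolerance-based check in plain Python; `matrices_close` is a hypothetical helper and the tolerances are arbitrary example values.

```python
import math


def matrices_close(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Return True when two nested-list matrices match within a tolerance."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        if not all(
            math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(row_a, row_b)
        ):
            return False
    return True


a = [[0.1 + 0.2, 1.0]]
b = [[0.3, 1.0]]
print(a == b)                # False: exact comparison trips on rounding
print(matrices_close(a, b))  # True: equal within the tolerance
```

With NumPy arrays, `np.allclose` plays the same role as `matrices_close`.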

In [None]:
# CLEAN-UP:
for queue_name in queues_names:
    purge_queue(queue_name)
    delete_sqs_queue(queue_name)
for instance_name in instances_names:
    stop_instance_by_name(instance_name)
    terminate_instance_by_name(instance_name)
delete_s3_bucket(bucket_name)