This notebook is intended to be run from the host machine to submit jobs to Kubernetes.

In [89]:
PROJECT_NAME = 'hello_world'

In [90]:
%load_ext autoreload
%autoreload 2

import sys
sys.path.extend([
    "../../",
    "../../execution",
    "../../orchestration",
])

import os
from orchestration.hello_world import orchestrate as orch
from orchestration.submit import submit_job

os.environ['PROJECT_NAME'] = PROJECT_NAME

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [None]:
# Build Local Execution Image
os.environ['K8S_ENV'] = 'minikube'
# DATA_DIR is the "data" folder two levels above this notebook's working directory
os.environ['DATA_DIR'] = os.path.abspath(os.path.join(os.getcwd(), '..', '..', 'data'))
! ../../build_scripts/build_local.sh

Let's run the hello_world data transformation.

The `hello_world_orchestration` function takes an input directory and an output directory, both relative to our `DATA_DIR`. It will:
- List the data files in the input directory.
- Assign 1 task to each data file.
- Have each task request that the `hello_world_execution` function be run on its input file.
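
The steps above can be sketched roughly as follows. This is not the project's actual implementation: `sketch_orchestration` is a hypothetical name, the explicit `data_dir` parameter stands in for the `DATA_DIR` environment variable, and the random 8-character job-name suffix is an assumption based on the shape of the output shown below. It only handles a local directory, not S3.

```python
import os
import uuid


def sketch_orchestration(input_directory, output_directory, data_dir):
    """Build a job config with one task per file in input_directory.

    Hypothetical stand-in for hello_world_orchestration. Both directory
    arguments are relative to data_dir (a local folder in this sketch).
    """
    input_root = os.path.join(data_dir, input_directory)

    # Step 1: list the data files in the input directory.
    file_names = sorted(
        name for name in os.listdir(input_root)
        if os.path.isfile(os.path.join(input_root, name))
    )

    # Steps 2 and 3: one task per file, each naming the function to run.
    tasks = [
        {
            "function": "hello_world_execution",
            "args": {
                "input_path": f"{input_directory}/{name}",
                "output_path": f"{output_directory}/{name}",
            },
        }
        for name in file_names
    ]

    return {
        "tasks": tasks,
        "memory": "250Mi",
        "cpu": "200m",
        # Assumption: a short random suffix keeps job names unique.
        "job_name": f"hello-world-execution-{uuid.uuid4().hex[:8]}",
    }
```

The returned dictionary has the same keys as the `job_config` printed in the next cell, so it can help when reading that output.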

In [100]:
job_config = orch.hello_world_orchestration(
    input_directory="hello_world/raw",
    output_directory="hello_world/processed",
)
job_config

{'tasks': [{'function': 'hello_world_execution',
   'args': {'input_path': 'hello_world/raw/part_2.txt',
    'output_path': 'hello_world/processed/part_2.txt'}},
  {'function': 'hello_world_execution',
   'args': {'input_path': 'hello_world/raw/part_1.txt',
    'output_path': 'hello_world/processed/part_1.txt'}}],
 'memory': '250Mi',
 'cpu': '200m',
 'job_name': 'hello-world-execution-6ad67aec'}

Now, let's submit the job to the cluster.

Since we have 2 data files, it should spin up 2 tasks.

Each task will create a new file that is the same as its input file, but with "Hello World: " added to the start of each line.
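
The per-file transformation might look something like the sketch below. `sketch_execution` is a hypothetical name, not the project's `hello_world_execution`; it works on plain local file paths, and the `"Hello World: "` prefix is inferred from the sample output shown later in this notebook.

```python
import os


def sketch_execution(input_path, output_path, prefix="Hello World: "):
    """Copy input_path to output_path, prepending prefix to every line."""
    # Create the output folder if it does not exist yet.
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            # Each line keeps its own trailing newline, if it has one.
            dst.write(prefix + line)
```

Because each task only reads one file and writes one file, the tasks are independent and can run in separate pods.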

In [101]:
submit_job(job_config)

Task Config uploaded to S3: hello-world-execution-6ad67aec/tasks.json
Job hello-world-execution-6ad67aec created!
Job hello-world-execution-6ad67aec completed!


The above code should finish in a few seconds. If you run `kubectl get pods` in a terminal, you should see 2 completed pods, one per task.

In [94]:
! kubectl get pods

NAME                                     READY   STATUS      RESTARTS   AGE
hello-world-execution-a857ec8c-0-swpwf   0/1     Completed   0          11s
hello-world-execution-a857ec8c-1-kbbpp   0/1     Completed   0          11s
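
If you would rather check this from Python than read the table by eye, a small parser over the text printed by `kubectl get pods` is one option. This is only a sketch: `completed_pods` is a hypothetical helper, and it assumes the default column layout shown above (`NAME` first, `STATUS` third).

```python
def completed_pods(kubectl_output, job_name):
    """Return the names of job_name's pods whose STATUS is 'Completed'."""
    lines = kubectl_output.strip().splitlines()
    if not lines:
        return []

    # Locate the columns we need from the header row.
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")

    done = []
    for row in lines[1:]:
        cols = row.split()
        if len(cols) <= max(name_idx, status_idx):
            continue  # skip malformed or empty rows
        name, status = cols[name_idx], cols[status_idx]
        # Pod names start with the job name followed by a hyphen.
        if name.startswith(job_name + "-") and status == "Completed":
            done.append(name)
    return done
```

The text can be captured with `subprocess.run(["kubectl", "get", "pods"], capture_output=True, text=True).stdout`. Comparing `len(completed_pods(text, job_config["job_name"]))` with `len(job_config["tasks"])` then tells you whether every task has finished.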


And you should now have 2 processed data files:

In [95]:
! ls ../../data/hello_world/processed/

part_1.txt part_2.txt


If you open a data file, you'll see the results:

In [96]:
! echo '\n--- Input ---'
! cat ../../data/hello_world/raw/part_1.txt
! echo '\n--- Output ---'
! cat ../../data/hello_world/processed/part_1.txt


--- Input ---
A
B
--- Output ---
Hello World: A
Hello World: B

# Running in EKS

Now, go back to the README and follow the setup instructions for running transformations in EKS.

Then, come back to this point.


In [None]:
# Build the image, push it to ECR, and point your local Kubernetes configuration at the remote cluster

import boto3
os.environ['AWS_ACCOUNT_ID'] = boto3.client("sts").get_caller_identity()["Account"]
os.environ['K8S_ENV'] = 'eks'
os.environ['DATA_DIR'] = 's3://kube-transform-data-bucket'
! ../../build_scripts/build_eks.sh
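
Only `DATA_DIR` and `K8S_ENV` change between the local and EKS runs; the relative paths passed to the orchestration function stay the same. As an illustration of why that can work, here is a sketch of joining a relative path onto either kind of `DATA_DIR`. `resolve_data_path` is a hypothetical helper and is not taken from this project's code.

```python
def resolve_data_path(relative_path, data_dir):
    """Join a DATA_DIR-relative path onto a local folder or an s3:// prefix."""
    # Plain string joining works for both cases because S3 keys
    # and POSIX paths both use "/" as the separator.
    return data_dir.rstrip("/") + "/" + relative_path.lstrip("/")
```

With this helper, `resolve_data_path("hello_world/raw", "s3://kube-transform-data-bucket")` gives `s3://kube-transform-data-bucket/hello_world/raw`, while the same relative path joined onto a local `DATA_DIR` gives a path inside the local `data` folder.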

Now let's upload the Hello World raw data to S3:

In [98]:
! aws s3 cp ../../data/hello_world/raw s3://kube-transform-data-bucket/hello_world/raw --recursive


upload: ../../data/hello_world/raw/part_1.txt to s3://kube-transform-data-bucket/hello_world/raw/part_1.txt
upload: ../../data/hello_world/raw/part_2.txt to s3://kube-transform-data-bucket/hello_world/raw/part_2.txt


Now, go back up to the top of this notebook, just after the local build.

Run the cells with:
`job_config = ...`
and
`submit_job(job_config)`

You can run the exact same code, but now it should all run in your EKS cluster.

Once that's done, return here.

In [102]:
# Check your status with the following command.
# Note that there may be some additional overhead time to spin up an EKS node.
! kubectl get pods

NAME                                     READY   STATUS      RESTARTS   AGE
hello-world-execution-6ad67aec-0-b56rw   0/1     Completed   0          57s
hello-world-execution-6ad67aec-1-mbg2d   0/1     Completed   0          57s


In [103]:
# Once complete, check the results in S3
! echo '\n--- Input ---'
! aws s3 cp s3://kube-transform-data-bucket/hello_world/raw/part_1.txt - 
! echo '\n--- Output ---'
! aws s3 cp s3://kube-transform-data-bucket/hello_world/processed/part_1.txt - 



--- Input ---
A
B
--- Output ---
Hello World: A
Hello World: B