# CROCUS Flows using Globus

The Globus Compute SDK simplifies remote task execution on distributed computing resources, such as cloud environments, high-performance computing (HPC) systems, or clusters. It allows users to seamlessly execute Python functions on remote machines without needing to manage the underlying infrastructure manually.


This tutorial will guide you through the steps of:
- Setting up a Python environment to use the Globus Compute SDK.
- Writing and registering a function that can be executed remotely.
- Configuring and starting a Globus Compute Endpoint.
- Submitting tasks and retrieving results.

### 0. Create a new Conda environment
`conda create -n globus python=3.12`

`conda activate globus`

`pip install globus-compute-sdk`

`pip install python-dotenv`
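
Before moving on, it can help to confirm that both packages are present in the active environment. The snippet below is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name chosen for this tutorial, not part of the Globus Compute SDK.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report on the two packages installed in step 0.
for name in ("globus-compute-sdk", "python-dotenv"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line reports `not installed`, check that the `globus` environment is the one currently activated.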

### 1a. Processing function

This function handles all of your processing. It plays the role of your program's `main()`, and naming it `main()` should not cause any issues. In this example it reads a file, converts its content to uppercase, and writes the result to an output file. Nothing in the code is specific to Globus, but two constraints apply: all of the processing code must live in one single function, and every import must be placed inside that function, because the function is serialized and run on the endpoint, where module-level imports from your script are not available. Also make sure the function returns something useful, such as the names of the files it processed.

In [6]:
def gc_test_func(file_path):
    from dataclasses import dataclass
    import os
    
    # Stand-in for an argparse namespace: hold the arguments in a dataclass
    @dataclass
    class ArgsClass:
        file_path: str
    args = ArgsClass(file_path=file_path)

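    # Build the output path by replacing "input" with "output"
    # (note: this replaces every occurrence in the path, directory names included)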
    output_file_path = args.file_path.replace("input", "output")

    try:
        # Read file
        with open(args.file_path, 'r') as input_file:
            content = input_file.read()

        # Convert to uppercase
        converted_content = content.upper()

        # Write the output file
        with open(output_file_path, 'w') as output_file:
            output_file.write(converted_content)

        return output_file_path

    except Exception as e:
        return f"Error processing file {args.file_path}: {e}"
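
The `replace("input", "output")` call in the function above rewrites every occurrence of `input` in the path, and if the path contains no `input` at all, the output path equals the input path and the source file is overwritten. The following is a minimal sketch of a more defensive alternative; `derive_output_path` and the `_output` suffix are hypothetical choices for illustration, not something the tutorial's function uses.

```python
import os


def derive_output_path(file_path):
    """Swap 'input' for 'output' in the file name only, leaving directories untouched."""
    directory, name = os.path.split(file_path)
    new_name = name.replace("input", "output")
    if new_name == name:
        # No 'input' in the file name: add a suffix so the source is not overwritten.
        stem, ext = os.path.splitext(name)
        new_name = f"{stem}_output{ext}"
    return os.path.join(directory, new_name)


print(derive_output_path(os.path.join("data", "input_files", "input.txt")))
print(derive_output_path(os.path.join("data", "notes.txt")))
```

To use a helper like this with Globus Compute, it would have to be defined inside the registered function, for the same reason the imports are.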


### 1b. Register function

Next, register the function you wrote above with Globus Compute, using the `Client` from the Globus Compute SDK. This code lives in the same file as the function, but outside the function body. Registration returns a UUID that you will need later to execute the function. Every time you register the function you get a new UUID, even if the function has not changed.

In [7]:
import globus_compute_sdk
# make Globus Compute client
gcc = globus_compute_sdk.Client()

# Register the function
COMPUTE_FUNCTION_ID = gcc.register_function(gc_test_func)

# Write the function UUID to a file (the file name is our own choice)
uuid_file_name = "gc_test_func_uuid.txt"
with open(uuid_file_name, "w") as file:
    file.write(COMPUTE_FUNCTION_ID)
    file.write("\n")

# End of script
print("Function registered with UUID -", COMPUTE_FUNCTION_ID)
print("The UUID is stored in " + uuid_file_name + ".")


Function registered with UUID - a83e1606-38e6-4282-a52c-5ba160c0e5d2
The UUID is stored in gc_test_func_uuid.txt.
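
Because each registration produces a new UUID, it can be convenient to reuse the UUID already stored in the file rather than registering again on every run. The sketch below shows one way to do that; `load_or_register` is a hypothetical helper, and the `register` argument is any zero-argument callable that returns a UUID string. A fake callable stands in for the real registration so the example runs without logging in to Globus.

```python
import os
import tempfile


def load_or_register(register, uuid_file="gc_test_func_uuid.txt"):
    """Return the UUID stored in uuid_file; call register() only if none is stored."""
    if os.path.exists(uuid_file):
        with open(uuid_file) as handle:
            stored = handle.read().strip()
        if stored:
            return stored
    function_id = register()
    with open(uuid_file, "w") as handle:
        handle.write(function_id + "\n")
    return function_id


# Demonstration with a stand-in for the real registration call.
def fake_register():
    return "a83e1606-38e6-4282-a52c-5ba160c0e5d2"


with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "gc_test_func_uuid.txt")
    print(load_or_register(fake_register, demo_file))  # registers and stores
    print(load_or_register(fake_register, demo_file))  # reads the stored UUID
```

In the registration script, the callable could be `lambda: gcc.register_function(gc_test_func)`. If the function body changes, delete the UUID file so that the new version is registered.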


### 2a. Configure the Globus Compute Endpoint
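Next, create a Globus Compute endpoint, which will execute the registered function. The `globus-compute-endpoint` command is provided by a separate package; if it is not already installed, run `pip install globus-compute-endpoint` first.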

`globus-compute-endpoint configure gc_test`

or, with a config file:

`globus-compute-endpoint configure --endpoint-config gc_config.yaml gc_test`

> Created profile for endpoint named `<gc_test>`
>
> Configuration file: `/Users/bhupendra/.globus_compute/gc_test/config.yaml`
>
> Use the `start` subcommand to run it:
>
> `$ globus-compute-endpoint start gc_test`


### 2b. Start the endpoint

`globus-compute-endpoint start gc_test`
 
> Starting endpoint; registered ID: 2e8888cf-e462-4993-bd0f-fe0cf0c8fb9b


Note the endpoint's registered ID; you will need it when running the function.

### 3. Set Up the .env File
Create a `.env` file in the project directory that holds the test function, with the following structure:

> ENDPOINT_UUID="2e8888cf-e462-4993-bd0f-fe0cf0c8fb9b"

> FUNCTION_UUID="a83e1606-38e6-4282-a52c-5ba160c0e5d2"
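
A missing or mistyped UUID in `.env` only shows up later, when the task is submitted. The following is a minimal sketch of an early check; `validate_uuids` and `REQUIRED_KEYS` are hypothetical names, and the function accepts any mapping, so it could be called with `os.environ` after `load_dotenv()` has run.

```python
import uuid

REQUIRED_KEYS = ("ENDPOINT_UUID", "FUNCTION_UUID")


def validate_uuids(env):
    """Return the required UUIDs from a mapping, raising ValueError on problems."""
    values = {}
    for key in REQUIRED_KEYS:
        raw = env.get(key)
        if not raw:
            raise ValueError(f"{key} is missing or empty")
        try:
            # uuid.UUID raises ValueError for text that is not a well-formed UUID.
            values[key] = str(uuid.UUID(raw))
        except ValueError:
            raise ValueError(f"{key} is not a valid UUID: {raw!r}") from None
    return values


# Demonstration with the values used in this tutorial.
example_env = {
    "ENDPOINT_UUID": "2e8888cf-e462-4993-bd0f-fe0cf0c8fb9b",
    "FUNCTION_UUID": "a83e1606-38e6-4282-a52c-5ba160c0e5d2",
}
print(validate_uuids(example_env))
```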

### 4. Submit a Job to Globus Compute
Now that the function is registered and the endpoint is running, submit a task to the endpoint and retrieve its result.

In [2]:
from globus_compute_sdk import Client, Executor
from dotenv import load_dotenv
import os

# Load variables from the .env file
load_dotenv()
endpoint_uuid = os.getenv("ENDPOINT_UUID")
function_uuid = os.getenv("FUNCTION_UUID")

print(f"endpoint uuid {endpoint_uuid}")
print(f"function uuid {function_uuid}")

# Create Globus Compute SDK Client and Executor
gcc = Client()
gce = Executor(endpoint_id=endpoint_uuid, client=gcc, amqp_port=443)


data = {
    "file_path": "/Users/bhupendra/learning/globus/data/input.txt",
}

# Submit the function to the Globus Compute service
future = gce.submit_to_registered_function(function_uuid, kwargs=data)

# Retrieve the result
result = future.result()
print(result)

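# Check that the result is a path to an existing file. os.path.exists() inspects the
# local filesystem, so this assumes the endpoint runs on the same machine as this script.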
if result and isinstance(result, str) and os.path.exists(result):
    print(f"Check the Output file located at: {result}")
else:
    print(f"Failed. {result}")


endpoint uuid 2e8888cf-e462-4993-bd0f-fe0cf0c8fb9b
function uuid a83e1606-38e6-4282-a52c-5ba160c0e5d2
/Users/bhupendra/learning/globus/data/output.txt
Check the Output file located at: /Users/bhupendra/learning/globus/data/output.txt
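
In the script above, `future.result()` blocks until the task finishes, which can take a long time if the endpoint is stopped or busy. The futures returned by the `Executor` follow the standard `concurrent.futures.Future` interface, so a timeout can be passed to `result()`. The sketch below illustrates the pattern; `wait_for_result` is a hypothetical helper, and a local thread pool stands in for the remote endpoint so the example runs on its own.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError


def wait_for_result(future, timeout_seconds):
    """Return (True, value) if the future finishes in time, else (False, None)."""
    try:
        return True, future.result(timeout=timeout_seconds)
    except FutureTimeoutError:
        return False, None


# Local stand-ins for remote tasks, so the sketch runs without an endpoint.
with ThreadPoolExecutor(max_workers=2) as pool:
    quick = pool.submit(str.upper, "hello")
    slow = pool.submit(time.sleep, 0.2)
    print(wait_for_result(quick, timeout_seconds=5))
    print(wait_for_result(slow, timeout_seconds=0.01))
```

With Globus Compute, the same helper could be applied to the `future` returned by `gce.submit_to_registered_function(...)`. A timeout does not cancel the task; it only stops the script from waiting for it.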


The example function is a simple test case to demonstrate how to register and run tasks using the Globus Compute SDK.