# Running SageMaker Studio Terminal Commands Remotely via API

This notebook demonstrates use of the [JupyterServer API](https://github.com/jupyter/jupyter/wiki/Jupyter-Notebook-Server-API#Kernel-API) and the [Jupyter Client (websocket) API](https://jupyter-client.readthedocs.io/en/latest/messaging.html) to remotely run commands on the *system terminal* in SageMaker Studio.

It's presented as a notebook to give more space for commentary, and because I used a SageMaker Notebook Instance in the same AWS region to test it out 😁 ...But you could re-purpose the same code in some other environment (like a Lambda function) to run whatever automations you need.

The main constraint is that your execution environment **needs IAM permission** `sagemaker:CreatePresignedDomainUrl` on the target `DomainId` and `UserProfileName` - which lets this script **log in as the SageMaker Studio user** to run the commands.

In [1]:
# Python Built-Ins:
import asyncio
from datetime import datetime
import json
import re
import time
import uuid

# External Dependencies:
import boto3
import requests
import websocket

smclient = boto3.client("sagemaker")

## Log in

For access to the APIs, we'll need to:

- Generate the initial presigned login URL via SageMaker API
- Open a `requests.session` to persist the headers/cookies/etc that get set when we first open the URL and then make requests
- Remember to set the required **cross-site request forgery protection token** from cookies, on update request types like `POST`, `DELETE`, etc (if you're not familiar with this CSRF/XSRF protection mechanism, you can read more [here](https://en.wikipedia.org/wiki/Cross-site_request_forgery#Cookie-to-header_token))

In [2]:
# Generate the presigned URL which facilitates login:
presigned_resp = smclient.create_presigned_domain_url(
    DomainId="d-ngfhxewhrmqe",
    UserProfileName="baseuser",
)

# Login like https://d-....studio.{AWSRegion}.sagemaker.aws/auth?token=...
login_url = presigned_resp["AuthorizedUrl"]
# API relative to https://d-....studio.{AWSRegion}.sagemaker.aws/jupyter/default
api_base_url = login_url.partition("?")[0].rpartition("/")[0] + "/jupyter/default"
print(api_base_url)

https://d-ngfhxewhrmqe.studio.ap-southeast-1.sagemaker.aws/jupyter/default


In [3]:
# Create an HTTP session (for cookie/header memory) and use it to log in:
reqsess = requests.Session()
login_resp = reqsess.get(presigned_resp["AuthorizedUrl"])
print(login_resp)

# (See login_resp.headers and login_resp.text (the loading page HTML) for more details)

<Response [200]>


In [4]:
# TODO: Need to wait here if the JupyterServer 'default' app is not ready?

## Initialise terminal session

Although it's possible to see open terminals...

In [5]:
terminals = reqsess.get(f"{api_base_url}/api/terminals").json()
terminals

[{'name': '1'}]

...automation applications will probably want to create their own terminals most of the time:

In [6]:
print(f"Creating terminal...\n")
terminal_resp = reqsess.post(
    f"{api_base_url}/api/terminals",
    params={ "_xsrf": reqsess.cookies["_xsrf"] },  # Seems like this can be put in either header or query
)
print(terminal_resp)
terminal = terminal_resp.json()
print(json.dumps(terminal, indent=2))

Creating terminal...

<Response [200]>
{
  "name": "2"
}


Note that unlike kernels, we don't need to separately initialise a 'session' on top of this - the terminal itself is sufficient.

## Run the code

Actual interaction with a terminal is via WebSocket APIs, rather than REST: So we'll need to create a websocket connection, carrying over the required cookies from our REST session.

The wire protocol for Jupyter terminals is simple to the point of being limiting. Messages are a 2-element list with `[stream, content]`, where stream may be e.g. `stdout`, `stderr` for output messages - or conversely `stdin` when we want to send in commands.

The disadvantage of this is that we have no formal visibility of whether each command has "finished executing" or what its exit code was.

Below, we use a regex for stdout terminal prompts (e.g. `bash-4.2$ `) to guess when the terminal is ready for the next command.

In [7]:
# Execution request/reply is done on websockets channels
ws_base_url = "wss://" + api_base_url.partition("://")[2] + "/terminals/websocket"
cookies = reqsess.cookies.get_dict()

print(f"Connecting to:\n{ws_base_url}/{terminal['name']}")
ws = websocket.create_connection(
    f"{ws_base_url}/{terminal['name']}",
    cookie="; ".join(["%s=%s" %(i, j) for i, j in cookies.items()]),
)
print("Connected\n")

try:
    # Wait for setup:
    setup = None
    while setup is not None:
        res = json.loads(ws.recv())
        print(res)
        if res[0] == "setup":
            print(f"Got setup")
            setup = res[1]

    # Send commands one by one, waiting for each to complete and re-show prompt:
    code = ["echo 'Hi, world!'", "pwd"]
    prompt_exp = re.compile(r"\n.*\$ $", re.MULTILINE)
    for ix, c in enumerate(code):
        ws.send(json.dumps(["stdin", c + "\n"]))
        # Assuming echo is on, stdin messages will be echoed to stdout anyway so no need to print

        while True:
            res = json.loads(ws.recv())
            # res[0] is the stream so will be e.g. 'stdout', 'stderr'
            # res[1] is the content
            print(res[1], end="")
            if res[0] == "stdout" and prompt_exp.search(res[1]):
                break

    print("\n\nDone")
finally:
    ws.close()

Connecting to:
wss://d-ngfhxewhrmqe.studio.ap-southeast-1.sagemaker.aws/jupyter/default/terminals/websocket/2
Connected

[?1034hbash-4.2$ {}echo 'Hi, world!'
Hi, world!
bash-4.2$ pwd
/home/sagemaker-user
bash-4.2$ 

Done


## Clean-up

Just close the terminal once we're done

In [8]:
terminal_del_resp = reqsess.delete(
    f"{api_base_url}/api/terminals/{terminal['name']}",
    params={ "_xsrf": reqsess.cookies["_xsrf"] },
)
print(terminal_del_resp)
terminal = None

<Response [204]>


## All done!

Remotely executing system terminal commands in SageMaker Studio (as permitted by the runtime environment's IAM `sagemaker:CreatePresignedDomainUrl` access) could be used to automate a range of tasks, such as:

- Copying, git cloning, or updating content into the user's Studio home folder
- Installing compatible JupyterLab extensions
- Standardising environments for consistency or policy compliance
- Detective security controls to detect unwanted customizations