Before you begin, open this experiment on Trovi:

-   Use this link: [Large-scale model training on Chameleon](https://chameleoncloud.org/experiment/share/39a536c6-6070-4ccf-9e91-bc47be9a94af) on Trovi
-   Then, click “Launch on Chameleon”. This will start a new Jupyter server for you, with the experiment materials already in it.

You will see several notebooks inside the `llm-chi` directory - look for the one titled `1_create_server.ipynb`. Open this notebook and continue there.

## Bring up a GPU server

At the beginning of the lease time, we will bring up our GPU server. We will use the `python-chi` Python API to Chameleon to provision our server.

We will execute the cells in this notebook inside the Chameleon Jupyter environment.

Run the following cell, and make sure the correct project is selected:

In [1]:
from chi import server, context, lease
import os

context.version = "1.0" 
context.choose_project()
context.choose_site(default="CHI@TACC")


Change the string in the following cell to reflect the name of *your* lease (**with your own net ID**), then run it to get your lease:

In [3]:
l = lease.get_lease("project17_liq2")  # or llm_single_netID, or llm_multi_netID
l.show()


Lease Details:
Name: project17_liq2
ID: 790bb75d-f20e-4241-8221-f8516f086c4e
Status: PENDING
Start Date: 2025-05-11 21:00:00
End Date: 2025-05-12 02:30:00
User ID: 80faa74e719af4d9b94f9792fcb80236a036e83f75b2d23667c917eda74a7179
Project ID: d3c6e101843a4ba79e665ebf59b521a2

Node Reservations:
ID: 2e8b8d05-4784-4849-ae79-dd4131417816, Status: pending, Min: 1, Max: 1

Floating IP Reservations:

Network Reservations:

Events:


Once the lease start time has passed, the status should show as “ACTIVE”. If it still shows “PENDING”, as in the output above, wait until the start date before continuing.
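
The lease output above prints a start and end date, and the server table further down records `Created At 2025-05-11T20:07:50Z`, which is before the 21:00 start, consistent with the lease still being PENDING. A minimal sketch of that comparison, assuming the printed lease times are UTC (the output does not state a timezone); `lease_state` is a hypothetical helper, not part of `python-chi`:

```python
from datetime import datetime, timezone
from typing import Optional

FMT = "%Y-%m-%d %H:%M:%S"  # format used in the "Lease Details" output


def lease_state(start: str, end: str, now: Optional[datetime] = None) -> str:
    """Classify a lease window as "pending", "active", or "expired".

    Timestamps are parsed with FMT and assumed to be UTC.
    """
    start_dt = datetime.strptime(start, FMT).replace(tzinfo=timezone.utc)
    end_dt = datetime.strptime(end, FMT).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    if now < start_dt:
        return "pending"
    if now < end_dt:
        return "active"
    return "expired"


if __name__ == "__main__":
    # The server's "Created At" time from the table in this notebook.
    created = datetime(2025, 5, 11, 20, 7, 50, tzinfo=timezone.utc)
    print(lease_state("2025-05-11 21:00:00", "2025-05-12 02:30:00", created))  # pending
```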

The rest of this notebook can be executed without any interactions from you, so at this point, you can save time by clicking on this cell, then selecting Run \> Run Selected Cell and All Below from the Jupyter menu.

As the notebook executes, monitor its progress to make sure it does not get stuck on any execution error, and also to see what it is doing!

We will use the lease to bring up a server with the `CC-Ubuntu24.04-CUDA` disk image. (Note that the reservation information is passed when we create the instance!) This typically takes about 10 minutes, but can take up to 20 minutes.

In [7]:
username = os.getenv('USER') # all exp resources will have this prefix
s = server.Server(
    f"node-llm-{username}", 
    reservation_id=l.node_reservations[0]["id"],
    image_name="CC-Ubuntu24.04-CUDA"
)
s.submit(idempotent=True)

Waiting for server node-llm-rh3884_nyu_edu's status to become ACTIVE. This typically takes 10 minutes, but can take up to 20 minutes.



Server has moved to status ERROR


| Attribute | node-llm-rh3884_nyu_edu |
|---|---|
| Id | 722c9391-3ece-47a4-9544-612bec113abe |
| Status | ERROR |
| Image Name | CC-Ubuntu24.04-CUDA |
| Flavor Name | baremetal |
| Addresses | |
| Network Name | sharednet1 |
| Created At | 2025-05-11T20:07:50Z |
| Keypair | trovi-b104fa3 |
| Reservation Id | 2e8b8d05-4784-4849-ae79-dd4131417816 |
| Host Id | |


Note: security groups are not used at Chameleon bare metal sites, so we do not have to configure any security groups on this instance.

Then, we’ll associate a floating IP with the instance, so that we can access it over SSH.

In [5]:
s.associate_floating_ip()

ResourceError: None of the ports can route to floating ip 129.114.108.211 on server c58b81fe-d5fd-4ba6-9fea-f8690b670326

In [None]:
s.refresh()
s.check_connectivity()
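
`s.check_connectivity()` waits until the server can be reached. For intuition, here is a rough standard-library sketch of that kind of check, assuming SSH listens on TCP port 22 of the floating IP; it is not how `python-chi` implements the call, and `wait_for_port` with its defaults is a hypothetical helper:

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 600.0,
                  interval: float = 5.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True as soon as a connection is accepted, False once the
    deadline passes without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable,
            # timed out) while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling with a deadline keeps the cell from hanging forever when, as in the `ResourceError` above, the floating IP never gets attached.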

In [None]:
s.refresh()
s.show(type="widget")

## Set up Docker with NVIDIA container toolkit

To use common deep learning frameworks like TensorFlow or PyTorch, we can run containers that already include all the libraries these frameworks require. Here, we will set up the container framework.

In [6]:
s.execute("curl -sSL https://get.docker.com/ | sudo sh")
s.execute("sudo groupadd -f docker; sudo usermod -aG docker $USER")
s.execute("docker run hello-world")

AttributeError: 'NoneType' object has no attribute 'rsplit'
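
Each `s.execute` call returns a result whose repr shows the command and its exit status (for example `<Result cmd='…' exited=0>`). Note that a line like `sudo groupadd -f docker; sudo usermod -aG docker $USER` reports only the last command's exit status. A minimal sketch of running setup commands one at a time and stopping at the first failure, assuming only that the returned object has an integer `exited` attribute; `run_steps` is a hypothetical helper, not part of `python-chi`:

```python
from typing import Any, Callable, Iterable, List


def run_steps(execute: Callable[[str], Any], commands: Iterable[str]) -> List[Any]:
    """Run shell commands in order, raising on the first non-zero exit.

    `execute` should behave like `s.execute`: accept a command string and
    return an object with an integer `exited` attribute. Any exception
    raised by `execute` itself propagates unchanged.
    """
    results = []
    for cmd in commands:
        result = execute(cmd)
        results.append(result)
        if result.exited != 0:
            raise RuntimeError(f"command failed (exit {result.exited}): {cmd}")
    return results
```

In this notebook it would be called as `run_steps(s.execute, [...])` with the three Docker setup commands from the cell above.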

We will also install the NVIDIA container toolkit, with which we can access GPUs from inside our containers.

In [8]:
# get NVIDIA container toolkit 
s.execute("curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list")
s.execute("sudo apt update")
s.execute("sudo apt-get install -y nvidia-container-toolkit")
s.execute("sudo nvidia-ctk runtime configure --runtime=docker")
s.execute("sudo systemctl restart docker") 

deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/experimental/deb/$(ARCH) /






Hit:1 https://download.docker.com/linux/ubuntu noble InRelease
Get:2 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease [1477 B]
Get:3 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  Packages [18.6 kB]
Get:4 http://nova.clouds.archive.ubuntu.com/ubuntu noble InRelease [256 kB]
Hit:5 http://security.ubuntu.com/ubuntu noble-security InRelease
Hit:6 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64  InRelease
Get:7 http://nova.clouds.archive.ubuntu.com/ubuntu noble-updates InRelease [126 kB]
Get:8 http://nova.clouds.archive.ubuntu.com/ubuntu noble-backports InRelease [126 kB]
Fetched 528 kB in 1s (477 kB/s)
Reading package lists...
Building dependency tree...
Reading state information...
4 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  libnvidia-container-tools libnvidia-co

debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 


Fetched 5849 kB in 2s (2860 kB/s)
Selecting previously unselected package libnvidia-container1:amd64.
(Reading database ... 113813 files and directories currently installed.)
Preparing to unpack .../libnvidia-container1_1.17.6-1_amd64.deb ...
Unpacking libnvidia-container1:amd64 (1.17.6-1) ...
Selecting previously unselected package libnvidia-container-tools.
Preparing to unpack .../libnvidia-container-tools_1.17.6-1_amd64.deb ...
Unpacking libnvidia-container-tools (1.17.6-1) ...
Selecting previously unselected package nvidia-container-toolkit-base.
Preparing to unpack .../nvidia-container-toolkit-base_1.17.6-1_amd64.deb ...
Unpacking nvidia-container-toolkit-base (1.17.6-1) ...
Selecting previously unselected package nvidia-container-toolkit.
Preparing to unpack .../nvidia-container-toolkit_1.17.6-1_amd64.deb ...
Unpacking nvidia-container-toolkit (1.17.6-1) ...
Setting up nvidia-container-toolkit-base (1.17.6-1) ...
Setting up libnvidia-container1:amd64 (1.17.6-1) ...
Setting up lib

debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype

Running kernel seems to be up-to-date.

The processor microcode seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.
time="2025-05-10T13:19:48Z" level=info msg="Config file does not exist; using empty config"
time="2025-05-10T13:19:48Z" level=info msg="Wrote updated config to /etc/docker/daemon.json"
time="2025-05-10T13:19:48Z" level=info msg="It is recommended that docker daemon be restarted."


<Result cmd='sudo systemctl restart docker' exited=0>
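
The log above shows `nvidia-ctk` writing `/etc/docker/daemon.json`. A small sketch for confirming that the file registers an `nvidia` runtime, assuming the usual layout (a top-level `runtimes` object with an `nvidia` entry whose `path` names `nvidia-container-runtime`); check your own file, since `has_nvidia_runtime` is a hypothetical helper:

```python
import json


def has_nvidia_runtime(daemon_json: str) -> bool:
    """Return True if a Docker daemon.json document registers an "nvidia" runtime.

    Assumes the layout written by `nvidia-ctk runtime configure --runtime=docker`.
    """
    # An empty file means no runtimes have been configured yet.
    if not daemon_json.strip():
        return False
    config = json.loads(daemon_json)
    runtime = config.get("runtimes", {}).get("nvidia", {})
    # Accept either a bare binary name or an absolute path to it.
    return str(runtime.get("path", "")).endswith("nvidia-container-runtime")
```

One way to feed it would be `has_nvidia_runtime(s.execute("cat /etc/docker/daemon.json").stdout)`, assuming the result exposes captured output as `.stdout`, as Fabric results do.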

In the following cell, we will verify that we can see our NVIDIA GPUs from inside a container by passing `--gpus all`. (The `--rm` flag tells Docker to clean up the container and remove its filesystem when it finishes running.)

In [9]:
s.execute("docker run --rm --gpus all ubuntu nvidia-smi")

Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
0622fac788ed: Pulling fs layer
0622fac788ed: Verifying Checksum
0622fac788ed: Download complete
0622fac788ed: Pull complete
Digest: sha256:6015f66923d7afbc53558d7ccffd325d43b4e249f41a6e93eef074c9505d2233
Status: Downloaded newer image for ubuntu:latest


Sat May 10 13:19:58 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:97:00.0 Off |                    0 |
| N/A   24C    P0             33W /  250W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off |   00

<Result cmd='docker run --rm --gpus all ubuntu nvidia-smi' exited=0>
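
Instead of reading the table by eye, `nvidia-smi` can print selected fields as CSV via `--query-gpu=...` with `--format=csv,noheader,nounits`. A small parser sketch; the field list `index,name,memory.total` is chosen here for illustration, and `parse_gpu_query` is a hypothetical helper:

```python
from typing import List, Tuple


def parse_gpu_query(csv_text: str) -> List[Tuple[int, str, int]]:
    """Parse the output of
    nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits
    into (index, name, memory_mib) tuples, one per GPU.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, mem = (field.strip() for field in line.split(","))
        gpus.append((int(index), name, int(mem)))
    return gpus
```

The CSV would come from the same container command as above with the query flags appended, e.g. `docker run --rm --gpus all ubuntu nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits`.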

Let’s pull the actual container images that we are going to use:

-   For the “Single GPU” section: a Jupyter notebook server with PyTorch and CUDA libraries
-   For the “Multiple GPU” section: a PyTorch image with NVIDIA developer tools, which we’ll need in order to install DeepSpeed
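
The PyTorch image tag used below, `2.5.1-cuda12.4-cudnn9-devel`, encodes the PyTorch, CUDA, and cuDNN versions plus a variant; the `devel` variant carries the CUDA developer toolchain that building DeepSpeed needs, while `runtime` images omit it. A sketch that splits such a tag, assuming the `<torch>-cuda<ver>-cudnn<ver>-<variant>` pattern seen here; `parse_pytorch_tag` is hypothetical:

```python
import re
from typing import Dict

# Pattern assumed from tags like "2.5.1-cuda12.4-cudnn9-devel".
TAG_RE = re.compile(
    r"^(?P<torch>[\d.]+)-cuda(?P<cuda>[\d.]+)-cudnn(?P<cudnn>\d+)-(?P<variant>\w+)$"
)


def parse_pytorch_tag(tag: str) -> Dict[str, str]:
    """Split a pytorch/pytorch image tag into its version components."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag}")
    return match.groupdict()
```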

## Pull container for “Multiple GPU” section

In [10]:
s.execute("docker pull pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel")

2.5.1-cuda12.4-cudnn9-devel: Pulling from pytorch/pytorch
7021d1b70935: Pulling fs layer
0d6448aff889: Pulling fs layer
0a7674e3e8fe: Pulling fs layer
b71b637b97c5: Pulling fs layer
56dc85502937: Pulling fs layer
ec6d5f6c9ed9: Pulling fs layer
47b8539d532f: Pulling fs layer
fd9cc1ad8dee: Pulling fs layer
83525caeeb35: Pulling fs layer
8e79813a7b9d: Pulling fs layer
312a542960e3: Pulling fs layer
0acb777129a5: Pulling fs layer
e725174e3835: Pulling fs layer
4f4fb700ef54: Pulling fs layer
3093b7e1cc2f: Pulling fs layer
47b8539d532f: Waiting
fd9cc1ad8dee: Waiting
312a542960e3: Waiting
83525caeeb35: Waiting
4f4fb700ef54: Waiting
56dc85502937: Waiting
ec6d5f6c9ed9: Waiting
e725174e3835: Waiting
0acb777129a5: Waiting
8e79813a7b9d: Waiting
b71b637b97c5: Waiting
0d6448aff889: Verifying Checksum
0d6448aff889: Download complete
b71b637b97c5: Verifying Checksum
b71b637b97c5: Download complete
0a7674e3e8fe: Verifying Checksum
0a7674e3e8fe: Download complete
7021d1b70935: Verifying Checksum
7021d1b

<Result cmd='docker pull pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel' exited=0>

Let’s also install some software on the host that we’ll use in the “Multiple GPU” section:

In [11]:
s.execute("sudo apt update; sudo apt -y install nvtop")





Hit:1 https://download.docker.com/linux/ubuntu noble InRelease
Hit:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64  InRelease
Hit:3 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease
Get:4 http://nova.clouds.archive.ubuntu.com/ubuntu noble InRelease [256 kB]
Hit:5 http://security.ubuntu.com/ubuntu noble-security InRelease
Hit:6 http://nova.clouds.archive.ubuntu.com/ubuntu noble-updates InRelease
Hit:7 http://nova.clouds.archive.ubuntu.com/ubuntu noble-backports InRelease
Fetched 256 kB in 1s (412 kB/s)
Reading package lists...
Building dependency tree...
Reading state information...
4 packages can be upgraded. Run 'apt list --upgradable' to see them.






Reading package lists...
Building dependency tree...
Reading state information...
The following NEW packages will be installed:
  nvtop
0 upgraded, 1 newly installed, 0 to remove and 4 not upgraded.
Need to get 62.8 kB of archives.
After this operation, 180 kB of additional disk space will be used.
Get:1 http://nova.clouds.archive.ubuntu.com/ubuntu noble/multiverse amd64 nvtop amd64 3.0.2-1 [62.8 kB]


debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 


Fetched 62.8 kB in 0s (126 kB/s)
Selecting previously unselected package nvtop.
(Reading database ... 113837 files and directories currently installed.)
Preparing to unpack .../nvtop_3.0.2-1_amd64.deb ...
Unpacking nvtop (3.0.2-1) ...
Setting up nvtop (3.0.2-1) ...
Processing triggers for man-db (2.12.0-4build2) ...


debconf: unable to initialize frontend: Dialog
debconf: (Dialog frontend will not work on a dumb terminal, an emacs shell buffer, or without a controlling terminal.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype

Running kernel seems to be up-to-date.

The processor microcode seems to be up-to-date.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

No VM guests are running outdated hypervisor (qemu) binaries on this host.


<Result cmd='sudo apt update; sudo apt -y install nvtop' exited=0>

In [12]:
s.execute("git clone https://github.com/RaghuHemadri/mlops_test_raghu.git")

Cloning into 'mlops_test_raghu'...


<Result cmd='git clone https://github.com/RaghuHemadri/mlops_test_raghu.git' exited=0>

In [29]:
s.execute("export HOST_IP=$(curl --silent http://169.254.169.254/latest/meta-data/public-ipv4); cd mlops_test_raghu; docker compose -f docker-compose-ray.yaml up --build -d")
s.execute("docker ps")

Compose can now delegate builds to bake for better performance.
 To do so, set COMPOSE_BAKE=true.
#0 building with "default" instance using docker driver

#1 [ray-worker-2 internal] load build definition from Dockerfile.ray-worker
#1 transferring dockerfile: 191B done
#1 DONE 0.0s

#2 [ray-worker-1 internal] load build definition from Dockerfile.ray-worker
#2 transferring dockerfile: 191B done
#2 DONE 0.0s

#3 [jupyter internal] load build definition from Dockerfile.jupyter-ray
#3 transferring dockerfile: 340B done
#3 DONE 0.0s

#4 [ray-worker-1 internal] load metadata for docker.io/rayproject/ray:2.42.1
#4 DONE 0.0s

#5 [ray-worker-2 internal] load .dockerignore
#5 transferring context: 2B done
#5 DONE 0.0s

#6 [ray-worker-1 internal] load .dockerignore
#6 transferring context: 2B done
#6 DONE 0.0s

#7 [ray-worker-1 1/2] FROM docker.io/rayproject/ray:2.42.1
#7 DONE 0.0s

#8 [ray-worker-1 2/2] RUN pip install --no-cache-dir     torch     "lightning<2.5.0.post0"     "litgpt[all]==0.5.7"

 jupyter  Built
 ray-worker-1  Built
 ray-worker-2  Built
 Container ray-head  Created
 Container postgres  Created
 Container minio  Created
 Container jupyter-ray  Created
 Container ray_cluster-minio-create-bucket-1  Created
 Container grafana  Created
 Container mlflow  Created
 Container ray-worker-1  Created
 Container ray-worker-2  Created
 Container postgres  Starting
 Container ray-head  Starting
 Container minio  Starting
 Container minio  Started
 Container minio  Waiting
 Container postgres  Started
 Container ray-head  Started
 Container jupyter-ray  Starting
 Container grafana  Starting
 Container ray-worker-1  Starting
 Container ray-worker-2  Starting
 Container jupyter-ray  Started
 Container grafana  Started
 Container ray-worker-1  Started
 Container ray-worker-2  Started
 Container minio  Healthy
 Container ray_cluster-minio-create-bucket-1  Starting
 Container ray_cluster-minio-create-bucket-1  Started
 Container mlflow  Starting
 Container mlflow  Started


CONTAINER ID   IMAGE                      COMMAND                  CREATED       STATUS                            PORTS                                         NAMES
f99625091163   ray_cluster-ray-worker-2   "ray start --address…"   3 hours ago   Up 4 seconds                                                                    ray-worker-2
ab3594408a98   ray_cluster-jupyter        "tini -g -- start.sh…"   3 hours ago   Up 4 seconds (health: starting)   0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp

<Result cmd='docker ps' exited=0>
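
The `docker ps` listing above is truncated, so it is hard to confirm every service came up. A sketch that checks expected container names against `docker ps --format '{{.Names}}\t{{.Status}}'` output; the expected names are taken from the compose log, excluding `ray_cluster-minio-create-bucket-1`, which is assumed to be a one-shot job that exits after creating the bucket. `missing_containers` is a hypothetical helper:

```python
from typing import Iterable, Set

# Long-running services named in the `docker compose up` log above.
EXPECTED = [
    "ray-head", "ray-worker-1", "ray-worker-2", "jupyter-ray",
    "postgres", "minio", "mlflow", "grafana",
]


def missing_containers(ps_output: str, expected: Iterable[str] = EXPECTED) -> Set[str]:
    """Return expected container names not reported as "Up".

    ps_output is assumed to be tab-separated "name<TAB>status" lines.
    """
    running = set()
    for line in ps_output.strip().splitlines():
        name, _, status = line.partition("\t")
        if status.startswith("Up"):
            running.add(name.strip())
    return set(expected) - running
```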

In [30]:
s.execute("cd mlops_test_raghu; sudo chmod -R 777 workspace")

<Result cmd='cd mlops_test_raghu; sudo chmod -R 777 workspace' exited=0>
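
The Jupyter container runs as uid 1000 (its log below reports "as uid: 1000 gid: 100"), while files placed in `workspace` with `sudo` belong to root; mode `777` (`rwxrwxrwx`) lets any user read, write, and execute them. For reference, a Python sketch of what `chmod -R 777` does, mirroring GNU `chmod`'s behavior of skipping symlinks met during recursion; `chmod_recursive` is hypothetical:

```python
import os
import stat

# Read, write, and execute for owner, group, and others: 0o777.
MODE_777 = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO


def chmod_recursive(root: str, mode: int = MODE_777) -> None:
    """Apply `mode` to root and everything beneath it, like `chmod -R`."""
    os.chmod(root, mode)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # Leave symlinks alone instead of changing their targets.
            if not os.path.islink(path):
                os.chmod(path, mode)
```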

In [None]:
s.execute("cd mlops_test_raghu; sudo cp train.py retrain.py raytune.py ray_requirements.txt ray_runtime.json workspace/")

<Result cmd='cd mlops_test_raghu; sudo cp train_ray.py workspace/train_ray.py; sudo cp ray_requirements.txt workspace/ray_requirements.txt; sudo cp ray_runtime.json workspace/ray_runtime.json' exited=0>
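
The cell copies `train.py`, `retrain.py`, and `raytune.py`, but its recorded output shows an earlier command that copied `train_ray.py`, so the file list changed between runs. A sketch of a copy step that reports missing files instead of failing silently partway; `copy_into` is a hypothetical helper and the paths are placeholders:

```python
import os
import shutil
from typing import Iterable, List


def copy_into(src_dir: str, dest_dir: str, names: Iterable[str]) -> List[str]:
    """Copy each named file from src_dir into dest_dir.

    Returns the names missing from src_dir, so a stale file list is
    easy to spot; files that do exist are still copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    missing = []
    for name in names:
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(dest_dir, name))
        else:
            missing.append(name)
    return missing
```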

In [32]:
s.execute("docker logs jupyter-ray")

Entered start.sh with args: start-notebook.py
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d


Executing: jupyter lab


Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: start-notebook.py
[I 2025-05-10 14:02:35.726 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2025-05-10 14:02:35.729 ServerApp] jupyter_server_mathjax | extension was successfully linked.
[I 2025-05-10 14:02:35.731 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2025-05-10 14:02:35.735 ServerApp] jupyterlab | extension was successfully linked.
[I 2025-05-10 14:02:35.735 ServerApp] jupyterlab_git | extension was successfully linked.
[I 2025-05-10 14:02:35.737 ServerApp] nbclassic | extension was successfully linked.
[I 2025-05-10 14:02:35.737 ServerApp] nbdime | extension was successfully linked.
[I 2025-05-10 14:02:35.740 ServerApp] notebook | extension was successfully linked.
[I 2025-05-10 14:02:35.742 ServerApp] 


  _   _          _      _


 
    
    To access the server, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/jpserver-7-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/lab?token=243112237f14ceae7ebc03ac78a5808bdec40d85b00c05c6
        http://127.0.0.1:8888/lab?token=243112237f14ceae7ebc03ac78a5808bdec40d85b00c05c6
[I 2025-05-10 14:02:36.162 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[W 2025-05-10 14:02:45.851 ServerApp] Clearing invalid/expired login cookie username-129-114-109-173-8888
[W 2025-05-10 14:02:45.865 TerminalsExtensionApp] 403 GET /t

 | | | |_ __  __| |__ _| |_ ___
 | |_| | '_ \/ _` / _` |  _/ -_)
  \___/| .__/\__,_\__,_|\__\___|
       |_|

Read the migration plan to Notebook 7 to learn about the new features and the actions to take if you are using extensions.

https://jupyter-notebook.readthedocs.io/en/latest/migrate_to_notebook7.html

Please note that updating to Notebook 7 might break some of your extensions.

Executing: jupyter lab


cabbe1703584415a46046e155fc6c58@216.165.95.86) 1.04ms
[W 2025-05-10 14:03:04.315 LabApp] Could not determine jupyterlab build status without nodejs
[I 2025-05-10 14:03:07.819 ServerApp] New terminal with automatic name: 1
[I 2025-05-10 14:08:06.429 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 14:08:32.950 ServerApp] Saving file at /work/ray_runtime.json
[I 2025-05-10 14:08:33.268 ServerApp] Saving file at /work/ray_runtime.json
[I 2025-05-10 14:08:33.502 ServerApp] Saving file at /work/ray_runtime.json
[C 2025-05-10 14:12:34.236 ServerApp] received signal 15, stopping
[I 2025-05-10 14:12:34.237 ServerApp] Shutting down 9 extensions
Entered start.sh with args: start-notebook.py
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Done running hooks i


  _   _          _      _


le at /work/train_ray.py
[I 2025-05-10 15:50:39.744 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 15:50:48.832 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 15:51:04.991 ServerApp] Saving file at /work/ray_runtime.json
[I 2025-05-10 15:51:05.231 ServerApp] Saving file at /work/ray_runtime.json
[I 2025-05-10 15:51:17.236 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 15:52:39.976 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 15:52:56.017 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 15:52:57.246 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 15:53:04.201 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:06:08.940 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:06:09.233 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:06:09.457 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:06:11.116 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:06:11.416 Serv

 | | | |_ __  __| |__ _| |_ ___
 | |_| | '_ \/ _` / _` |  _/ -_)
  \___/| .__/\__,_\__,_|\__\___|
       |_|

Read the migration plan to Notebook 7 to learn about the new features and the actions to take if you are using extensions.

https://jupyter-notebook.readthedocs.io/en/latest/migrate_to_notebook7.html

Please note that updating to Notebook 7 might break some of your extensions.

Executing: jupyter lab


erApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:06:11.668 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:06:11.910 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:06:12.152 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:08:49.662 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:08:49.889 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:09:37.746 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:18:43.706 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:18:46.158 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:22:01.810 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:36:51.548 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:36:51.776 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:47:16.152 ServerApp] Saving file at /work/train_ray.py
[I 2025-05-10 16:48:01.186 ServerApp] Saving file at /work/train_ray.py
[C 2025-05-10 16:53:54.

<Result cmd='docker logs jupyter-ray' exited=0>
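
The logs print the Jupyter URL with a token for `127.0.0.1`, while `docker ps` shows port 8888 published on `0.0.0.0`, so the same token should work against the server's public address. A sketch that pulls the most recent token (the container restarted, and each start prints a new one) and rebuilds the URL; `public_lab_url` is hypothetical, and `host_ip` stands for the server's floating IP:

```python
import re
from typing import Optional

TOKEN_RE = re.compile(r"http://127\.0\.0\.1:8888/lab\?token=([0-9a-f]+)")


def public_lab_url(logs: str, host_ip: str) -> Optional[str]:
    """Return the JupyterLab URL for host_ip using the last token in logs."""
    tokens = TOKEN_RE.findall(logs)
    if not tokens:
        return None
    return f"http://{host_ip}:8888/lab?token={tokens[-1]}"
```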