In [1]:
pip install novae

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install 'novae[multimodal,conch]'

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install openpyxl

Note: you may need to restart the kernel to use updated packages.


In [4]:
import novae
import scanpy as sc
import os
import time
import pandas as pd
import openpyxl



In [5]:
# Input directory containing the Visium .h5ad files
input_dir = "/workspace/Projects/FM/Final data/H5AD/Visium novae"

# Output directory for embedded files
output_dir = "/workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune"
os.makedirs(output_dir, exist_ok=True)

# Collect all .h5ad files
h5ad_files = [f for f in os.listdir(input_dir) if f.endswith(".h5ad")]
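
The listing above relies on `os.listdir`, which returns entries in arbitrary order (the run below processes 53430.h5ad before 26933.h5ad). A minimal sketch of a deterministic alternative, assuming alphabetical order is acceptable; `list_h5ad_files` is a hypothetical helper, not part of the original notebook:

```python
from pathlib import Path


def list_h5ad_files(directory):
    """Return the names of all .h5ad files in `directory`, sorted alphabetically.

    Hypothetical helper: sorting makes the processing order reproducible
    across machines and runs.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == ".h5ad"
    )
```

Usage would be `h5ad_files = list_h5ad_files(input_dir)`; the rest of the notebook could stay unchanged because the helper returns plain file names.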

In [6]:
summary_records = []

for f in h5ad_files:
    file_path = os.path.join(input_dir, f)
    print(f"\nProcessing {file_path}")

    adata = sc.read_h5ad(file_path)
    # Convert the spatial coordinates (stored here as a DataFrame) to a float32 NumPy array
    adata.obsm["spatial"] = adata.obsm["spatial"].to_numpy().astype("float32")

    start_time = time.time()

    # Build the spatial neighbor graph.
    # If you have multiple samples in the same adata object, specify `slide_key`.
    novae.spatial_neighbors(adata, radius=200)

    # Fine-tune the pretrained model on this sample (recommended for better performance),
    # then compute the embeddings, which are stored in adata.obsm["novae_latent"]
    model = novae.Novae.from_pretrained("MICS-Lab/novae-human-0")
    model.fine_tune(adata, accelerator="gpu", num_workers=4)
    model.compute_representations(adata, accelerator="gpu", num_workers=4)

    elapsed = time.time() - start_time

    shape = adata.obsm["novae_latent"].shape
    print(f"Embedding shape: {shape}")
    print(f"Time taken: {elapsed:.2f} seconds")

    out_path = os.path.join(output_dir, f.replace(".h5ad", "_novae.h5ad"))
    adata.write(out_path)
    print(f" Saved embeddings → {out_path}")

    summary_records.append({
        "sample": f,
        "n_cells": shape[0],
        "embedding_dim": shape[1],
        "runtime_seconds": round(elapsed, 2),
        "status": "success"
    })

summary_df = pd.DataFrame(summary_records)
excel_path = os.path.join(output_dir, "embedding_runtime_summary.xlsx")
summary_df.to_excel(excel_path, index=False)

print(f"\n Summary saved to {excel_path}")
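
The loop above records `"status": "success"` for every file, so a single failing sample would raise and abort the batch before the summary is written. A minimal sketch of one way to record failures instead, assuming the per-file work is factored into a callable; `run_with_status` and `process` are hypothetical names, not part of the original notebook or of novae:

```python
import time


def run_with_status(sample, process):
    """Run `process(sample)` and return a summary record, catching failures.

    `process` is a hypothetical callable that embeds one sample and returns
    the embedding shape as (n_cells, embedding_dim).
    """
    start = time.time()
    try:
        n_cells, embedding_dim = process(sample)
        status = "success"
    except Exception as exc:  # record the failure instead of aborting the batch
        n_cells, embedding_dim = None, None
        status = f"failed: {exc}"
    return {
        "sample": sample,
        "n_cells": n_cells,
        "embedding_dim": embedding_dim,
        "runtime_seconds": round(time.time() - start, 2),
        "status": status,
    }
```

With this shape, the loop body would reduce to `summary_records.append(run_with_status(f, process))`, and the Excel summary would still be written when one sample fails.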


Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/53430.h5ad


[INFO] (novae.utils.build) Computing graph on 4,271 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
Trainer will use only 1 of 2 GPUs because it is running inside an interactive / notebook environment. You may try to set `Trainer(devices=2)` but please note that multi-GPU inside interactive / notebook environments is considered experimental and unstable. Your mileage may vary.
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 9/9 [00:02<00:00,  4.40it/s]
GPU available: True (cuda), used: True
TPU available: Fals

`Trainer.fit` stopped: `max_epochs=20` reached.

Computing representations: 100%|██████████| 9/9 [00:01<00:00,  5.35it/s]


Embedding shape: (4271, 64)
Time taken: 122.72 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/53430_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/26933.h5ad


[INFO] (novae.utils.build) Computing graph on 2,031 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 4/4 [00:01<00:00,  3.49it/s]
GPU available: True (cuda), used: True
TPU available: Fals

Computing representations: 100%|██████████| 4/4 [00:00<00:00,  4.67it/s]


Embedding shape: (2031, 64)
Time taken: 64.33 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/26933_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/26934.h5ad


[INFO] (novae.utils.build) Computing graph on 2,505 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 5/5 [00:01<00:00,  3.28it/s]
GPU available: True (cuda), used: True
TPU available: Fals

Computing representations: 100%|██████████| 5/5 [00:01<00:00,  4.71it/s]


Embedding shape: (2505, 64)
Time taken: 78.01 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/26934_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/53433.h5ad


[INFO] (novae.utils.build) Computing graph on 3,511 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 7/7 [00:01<00:00,  4.60it/s]
GPU available: True (cuda), used: True
TPU available: Fals

`Trainer.fit` stopped: `max_epochs=20` reached.

Computing representations: 100%|██████████| 7/7 [00:01<00:00,  6.62it/s]


Embedding shape: (3511, 64)
Time taken: 102.53 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/53433_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/26935.h5ad


[INFO] (novae.utils.build) Computing graph on 2,154 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 5/5 [00:02<00:00,  1.87it/s]
GPU available: True (cuda), used: True
TPU available: Fals

Computing representations: 100%|██████████| 5/5 [00:01<00:00,  3.96it/s]


Embedding shape: (2154, 64)
Time taken: 72.16 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/26935_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/26932.h5ad


[INFO] (novae.utils.build) Computing graph on 2,182 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 5/5 [00:01<00:00,  3.68it/s]
GPU available: True (cuda), used: True
TPU available: Fals

Computing representations: 100%|██████████| 5/5 [00:00<00:00,  5.51it/s]


Embedding shape: (2182, 64)
Time taken: 70.00 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/26932_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/53435.h5ad


[INFO] (novae.utils.build) Computing graph on 1,910 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 4/4 [00:01<00:00,  2.73it/s]
GPU available: True (cuda), used: True
TPU available: Fals

Computing representations: 100%|██████████| 4/4 [00:01<00:00,  3.87it/s]


Embedding shape: (1910, 64)
Time taken: 73.61 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/53435_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/53431.h5ad


[INFO] (novae.utils.build) Computing graph on 2,718 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 6/6 [00:01<00:00,  4.09it/s]
GPU available: True (cuda), used: True
TPU available: Fals

Computing representations: 100%|██████████| 6/6 [00:01<00:00,  4.24it/s]


Embedding shape: (2718, 64)
Time taken: 68.92 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/53431_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/53434.h5ad


[INFO] (novae.utils.build) Computing graph on 3,176 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 7/7 [00:01<00:00,  3.68it/s]
GPU available: True (cuda), used: True
TPU available: Fals

Computing representations: 100%|██████████| 7/7 [00:01<00:00,  5.14it/s]


Embedding shape: (3176, 64)
Time taken: 103.83 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/53434_novae.h5ad

Processing /workspace/Projects/FM/Final data/H5AD/Visium novae/53432.h5ad


[INFO] (novae.utils.build) Computing graph on 3,376 cells (coord_type=generic, delaunay=True, radius=[0.0, 200.0], n_neighs=None)
[INFO] (novae.utils._validate) Preprocessed 1 adata object(s) with sc.pp.normalize_total and sc.pp.log1p (raw counts were saved in adata.layers['counts'])
Computing representations: 100%|██████████| 7/7 [00:01<00:00,  3.84it/s]
GPU available: True (cuda), used: True
TPU available: Fals

Computing representations: 100%|██████████| 7/7 [00:01<00:00,  6.00it/s]


Embedding shape: (3376, 64)
Time taken: 85.32 seconds
 Saved embeddings → /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/53432_novae.h5ad

 Summary saved to /workspace/Projects/FM/Final data/Visium_embeddings_novae_finetune/embedding_runtime_summary.xlsx
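
The per-sample timings printed above can be normalized by sample size to compare runs. A minimal sketch, with the cell counts and runtimes copied by hand from the log above rather than read back from the Excel summary; `seconds_per_1k_cells` is a hypothetical helper introduced only for illustration:

```python
# (sample, n_cells, runtime_seconds), copied from the log output above
runs = [
    ("53430.h5ad", 4271, 122.72),
    ("26933.h5ad", 2031, 64.33),
    ("26934.h5ad", 2505, 78.01),
    ("53433.h5ad", 3511, 102.53),
    ("26935.h5ad", 2154, 72.16),
    ("26932.h5ad", 2182, 70.00),
    ("53435.h5ad", 1910, 73.61),
    ("53431.h5ad", 2718, 68.92),
    ("53434.h5ad", 3176, 103.83),
    ("53432.h5ad", 3376, 85.32),
]


def seconds_per_1k_cells(n_cells, runtime_seconds):
    """Runtime normalized to 1,000 cells (hypothetical helper)."""
    return runtime_seconds / n_cells * 1000


total_cells = sum(n for _, n, _ in runs)
total_seconds = sum(t for _, _, t in runs)

for sample, n, t in runs:
    print(f"{sample}: {seconds_per_1k_cells(n, t):.1f} s per 1,000 cells")
print(f"Overall: {total_cells} cells in {total_seconds:.2f} s")
```

Each runtime covers graph construction, model loading, fine-tuning, and inference for one sample, so the normalized figure includes fixed per-sample overhead and is not a pure per-cell cost.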
