<a href="https://colab.research.google.com/github/ImagingDataCommons/CloudSegmentator/blob/v1.3.0/workflows/TotalSegmentator/Notebooks/inferenceTotalSegmentatorNotebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **This notebook runs TotalSegmentator (v1.5.6) inference on CT NIfTI files and produces multilabel segmentation maps in NIfTI format**

Please cite:

Jakob Wasserthal, Manfred Meyer, Hanns-Christian Breit, Joshy Cyriac, Shan Yang, & Martin Segeroth. (2022). TotalSegmentator: robust segmentation of 104 anatomical structures in CT images. https://doi.org/10.48550/arXiv.2208.05868

Isensee, F., Jaeger, P.F., Kohl, S.A.A. et al. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 18, 203–211 (2021). https://doi.org/10.1038/s41592-020-01008-z

<img src="https://raw.githubusercontent.com/ImagingDataCommons/CloudSegmentator/v1.3.0/workflows/TotalSegmentator/Docs/images/inference.png">

Expected file directory
```
dcm2niix
 └─── $series_id_1
       ├─── $CT_NIfTI.nii.gz
       │
 └───  $series_id_2
       ├─── $CT_NIfTI.nii.gz
       ├───  ...
       │
 └───  $series_id_n
       └─── $CT_NIfTI.nii.gz

```
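
Before running inference, it can help to confirm that the extracted archive matches this layout. The sketch below is an illustration and not part of the original workflow: `find_layout_problems` is a hypothetical helper, and it assumes that each series folder under `dcm2niix` should hold exactly one `.nii.gz` file.

```python
from pathlib import Path


def find_layout_problems(root="dcm2niix"):
    """Return {series_id: message} for series folders that break the expected layout.

    Hypothetical helper: assumes each series folder holds exactly one .nii.gz file.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return {str(root): "root directory not found"}
    problems = {}
    for series_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        nifti_files = sorted(series_dir.glob("*.nii.gz"))
        if len(nifti_files) != 1:
            problems[series_dir.name] = f"expected 1 NIfTI file, found {len(nifti_files)}"
    return problems
```

An empty dictionary from `find_layout_problems("dcm2niix")` means every series folder has the single file the notebook expects.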

### **Ways to utilize this notebook**


*   **Colab**
*   **DockerContainer/Terra/SB-CGC**


#### **Colab**
*  This notebook was initially developed and tested on Colab, and a working version is saved on GitHub
*  To run this notebook in Colab, click the 'Open In Colab' badge at the top left
*  A sample lz4 file is provided for convenience and can be downloaded when running the notebook in interactive mode
*  Run each cell in order to install the packages and run inference with TotalSegmentator; the segmentation maps are saved as NIfTI files compressed with lz4


#### **Docker**
*  This notebook is primarily developed for use on the Terra/SB-CGC platforms with Docker
*  Running this notebook in a Docker container ensures reproducibility, because the run environment is locked from the base Docker image down to the pip packages installed in it
*  Docker images can be found at https://hub.docker.com/repository/docker/imagingdatacommons/inference_totalseg/tags
*  The link to the Dockerfile, along with the git commit hash used to build the Docker image, can be found in the image layer named 'LABEL'

    <img src="https://raw.githubusercontent.com/ImagingDataCommons/CloudSegmentator/v1.3.0/workflows/TotalSegmentator/Docs/images/inference_docker.png">

* We use a Python package called Papermill, which can run the notebook without converting it to a Python script. This lets us maintain one copy of the code instead of two.
* A sample papermill command is
    <pre>
    papermill -p niftiFilePath path_to_ct_nifti_files.lz4 inferenceTotalSegmentatorNotebook.ipynb  output_inferenceTotalSegmentatorNotebook.ipynb
    </pre>
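
The papermill call can also be assembled in Python, for example from a wrapper script. This is a sketch under assumptions: `build_papermill_command` is a hypothetical helper, and it only builds the argument list (`-p name value`, input notebook, output notebook) without running it.

```python
import subprocess


def build_papermill_command(nifti_archive, input_notebook, output_notebook):
    """Assemble the papermill CLI call as an argument list (hypothetical helper)."""
    return [
        "papermill",
        "-p", "niftiFilePath", str(nifti_archive),  # parameter name and value passed to the notebook
        str(input_notebook),
        str(output_notebook),
    ]


command = build_papermill_command(
    "path_to_ct_nifti_files.lz4",
    "inferenceTotalSegmentatorNotebook.ipynb",
    "output_inferenceTotalSegmentatorNotebook.ipynb",
)
print(" ".join(command))
# To execute it (requires papermill and the notebook to be present):
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.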

### **Installing Packages**

In [None]:
%%capture
import sys
if 'google.colab' in sys.modules:
    !sudo apt-get update && apt-get install -y --no-install-recommends \
        build-essential\
        ffmpeg\
        lz4\
        python3-dev\
        python3-pip\
        wget\
        unzip\
        xvfb\
    && rm -rf /var/lib/apt/lists/*

In [None]:
%%capture
if 'google.colab' in sys.modules:
    !pip install --no-cache-dir \
        ipykernel==6.22.0\
        ipython==8.12.0\
        ipywidgets==8.0.6\
        jupyter==1.0.0\
        papermill==2.4.0\
        nvidia-ml-py3==7.352.0\
        requests==2.27.1\
        TotalSegmentator==1.5.6\
    && pip install --no-cache-dir \
        pyradiomics==3.0.1

### **Importing Packages**

In [None]:
import glob
import os
import sys
import shutil
from pathlib import Path
import time
import subprocess
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from datetime import datetime
import psutil
import pandas as pd
import matplotlib.pyplot as plt
import nvidia_smi

### **Current Environment**

In [None]:
curr_dir   = Path().absolute()

print(time.asctime(time.localtime()))
print("\nCurrent directory :{}".format( curr_dir))
print("Python version    :", sys.version.split('\n')[0])

### **For local testing**

By default, a sample archive of CT NIfTI files is used here. You can replace it with your own data.

The cell below is tagged `parameters`, so when this notebook runs in non-interactive mode on Terra or the Seven Bridges Cancer Genomics Cloud (SB-CGC), papermill injects a cell that passes the path of the CT NIfTI files

In [None]:
%%capture
if 'google.colab' in sys.modules:
    !wget -q https://github.com/ImagingDataCommons/CloudSegmentator/releases/download/v1.0.0/downloadDicomAndConvertNiftiFiles.tar.lz4
    niftiFilePath=glob.glob('*.lz4')[0]

### **Decompressing NIfTI files from the first step**

In [None]:
!lz4 -d --rm {niftiFilePath} -c | tar  --strip-components=0  -xvf -

### **Defining Functions**

In [None]:
#create directory for TotalSegmentator Output files
try:
  shutil.rmtree('Inference')

except OSError:
  pass
os.mkdir('Inference')

In [None]:
class MemoryMonitor:
    def __init__(self):
        self.keep_measuring = True
        self.working_disk_path = self.get_working_disk_path()

    def get_working_disk_path(self):
        partitions = psutil.disk_partitions()
        for partition in partitions:
            if partition.mountpoint == '/':
                return '/'
            elif '/cromwell_root' in partition.mountpoint:
                return '/cromwell_root'
        return '/'  # Default to root directory if no specific path is found

    def measure_usage(self):
        cpu_usage = []
        ram_usage_mb=[]
        gpu_usage_mb=[]
        disk_usage_all=[]
        time_stamps = []
        start_time = time.time()
        while self.keep_measuring:
            cpu = psutil.cpu_percent()
            ram = psutil.virtual_memory()
            disk_usage = psutil.disk_usage(self.working_disk_path)
            disk_used = disk_usage.used / 1000 / 1000 / 1000
            disk_total = disk_usage.total / 1000 / 1000 / 1000
            ram_total_mb = psutil.virtual_memory().total / 1000 / 1000
            ram_mb = (ram.total - ram.available) / 1000 / 1000

            try:
                nvidia_smi.nvmlInit()
                handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
                info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
                gpu_type = nvidia_smi.nvmlDeviceGetName(handle)
                gpu_total_mb = info.total/1000/1000
                gpu_mb = info.used/1000/1000
                nvidia_smi.nvmlShutdown()
            except Exception:  # No NVIDIA GPU or driver available; record zero GPU usage
                gpu_type = ''
                gpu_total_mb = 0
                gpu_mb = 0

            cpu_usage.append(cpu)
            ram_usage_mb.append(ram_mb)
            disk_usage_all.append(disk_used)
            gpu_usage_mb.append(gpu_mb)
            time_stamps.append(time.time()- start_time)
            sleep(1)

        return cpu_usage, ram_usage_mb, time_stamps, ram_total_mb, gpu_usage_mb, gpu_total_mb, gpu_type, disk_usage_all, disk_total
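
The class above is meant to run in a background thread while the workload runs in the foreground, and to stop once `keep_measuring` is set to `False`. The sketch below illustrates that start/stop pattern with a stand-in sampler that uses only the standard library. `IntervalSampler` and its `read_value` callback are hypothetical names, and a `threading.Event` replaces the boolean flag so the loop wakes up as soon as it is asked to stop.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class IntervalSampler:
    """Call `read_value` every `interval` seconds until stop() is called (illustrative)."""

    def __init__(self, read_value, interval=0.05):
        self.read_value = read_value
        self.interval = interval
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def run(self):
        samples, time_stamps = [], []
        start = time.time()
        while True:
            # Sample first so at least one measurement exists even for short workloads.
            samples.append(self.read_value())
            time_stamps.append(time.time() - start)
            # wait() returns True as soon as stop() is called, ending the loop promptly.
            if self._stop.wait(self.interval):
                break
        return samples, time_stamps


with ThreadPoolExecutor(max_workers=1) as executor:
    sampler = IntervalSampler(read_value=time.process_time)
    future = executor.submit(sampler.run)
    try:
        sum(i * i for i in range(200_000))  # stand-in for the inference call
    finally:
        sampler.stop()
    samples, time_stamps = future.result()

print(f"collected {len(samples)} samples")
```

Sampling before checking the stop signal guarantees that the returned lists are never empty, which keeps downstream plotting code simple.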

In [None]:
def check_total_segmentator_errors(series_id: str):
  """
  This function checks if the output files from TotalSegmentator exist.

  Args:
  series_id (str): The DICOM Tag SeriesInstanceUID of the DICOM series to be checked.

  Returns:
  bool: True if any of the output files do not exist, False otherwise.
  """

  # Define the output files from TotalSegmentator
  output_files = [f"Inference/{series_id}/segmentations.nii"]

  # Check if all output files exist
  if not all(os.path.exists(file) for file in output_files):
      # If any of the output files do not exist, log an error
      with open('totalsegmentator_errors.txt', 'a') as f:
          f.write(f"Error: TotalSegmentator failed for series {series_id}\n")
      return True

  return False
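
Because failures are appended to `totalsegmentator_errors.txt` one line per series, the log can be read back after a batch run to list the series that need attention. A minimal sketch, assuming the line format written by the function above; `read_failed_series` is a hypothetical helper.

```python
from pathlib import Path

ERROR_PREFIX = "Error: TotalSegmentator failed for series "


def read_failed_series(log_path="totalsegmentator_errors.txt"):
    """Return the series IDs recorded in the error log, in order, without duplicates."""
    path = Path(log_path)
    if not path.exists():
        return []  # no log file means no failure was recorded
    failed = []
    for line in path.read_text().splitlines():
        if line.startswith(ERROR_PREFIX):
            series_id = line[len(ERROR_PREFIX):].strip()
            if series_id and series_id not in failed:
                failed.append(series_id)
    return failed
```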


In [None]:
def inferenceTotalSegmentator(series_id: str, runtime_stats: pd.DataFrame) -> pd.DataFrame:
    """
    This function performs inference using TotalSegmentator on a given series.

    Args:
    series_id (str): The DICOM Tag SeriesInstanceUID of the DICOM series to be processed.
    runtime_stats: DataFrame to store runtime statistics.

    Returns:
    Updated DataFrame with runtime statistics.
    """

    # Remove existing directories and files if they exist
    shutil.rmtree(f"Inference/{series_id}", ignore_errors=True)
    shutil.rmtree(f"metadata/{series_id}", ignore_errors=True)
    for file in ["segmentations.nii", "segmentations.nii.gz"]:
        try:
            os.remove(file)
        except OSError:
            pass

    # Create a new directory for the series
    os.makedirs(f"Inference/{series_id}", exist_ok=True)

    print(f"Processing series: {series_id}")

    log = pd.DataFrame({"SeriesInstanceUID": [series_id]})
    series_id_folder_path = os.path.join("dcm2niix", series_id)

    # Get the first (and only) file in the list
    nifti_filename = os.listdir(series_id_folder_path)[0]
    nifti_filename_path = os.path.join(series_id_folder_path, nifti_filename)

    start_time = time.time()
    result = subprocess.run(
        ["TotalSegmentator", "-i", nifti_filename_path, "-o", "segmentations", "--ml"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    print(result.stdout)
    total_segmentator_time = time.time() - start_time

    # Move the output files to the appropriate directory
    try:
        shutil.move("segmentations.nii", f"Inference/{series_id}/")
        print("Files moved successfully using the first command")
    except FileNotFoundError:
        try:
            shutil.move("segmentations/segmentations.nii", f"Inference/{series_id}/")
            print("Files moved successfully using the second command")
        except FileNotFoundError:
            print("Error: Failed to move files using both commands")

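    if check_total_segmentator_errors(series_id):
        # No output was produced and the failure is already logged. Record the
        # series without timings and skip the rename and compression steps,
        # which would otherwise raise FileNotFoundError.
        print(result.stderr)
        return pd.concat([runtime_stats, log], ignore_index=True, axis=0)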

    shutil.move(
        f"Inference/{series_id}/segmentations.nii",
        f"Inference/{series_id}/{series_id}.nii",
    )

    start_time = time.time()
    subprocess.run(
        [
            "lz4",
            "--rm",
            f"Inference/{series_id}/{series_id}.nii",
            f"Inference/{series_id}/{series_id}.nii.lz4",
        ],
        check=True,
    )

    archiving_time = time.time() - start_time

    log["total_segmentator_time"] = total_segmentator_time
    log["archiving_time"] = archiving_time

    shutil.rmtree(f"dcm2niix/{series_id}", ignore_errors=True)


    runtime_stats = pd.concat([runtime_stats, log], ignore_index=True, axis=0)

    return runtime_stats
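
The move step in the function tries two locations because the output may be written either as `segmentations.nii` or inside a `segmentations` folder. The same fallback can be expressed as a lookup over candidate paths. This is only a sketch: `first_existing` is a hypothetical helper, and the candidate list mirrors the two paths used above.

```python
import os


def first_existing(candidates):
    """Return the first path in `candidates` that exists, or None if none do."""
    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate
    return None


output_path = first_existing(
    ["segmentations.nii", os.path.join("segmentations", "segmentations.nii")]
)
if output_path is None:
    print("No TotalSegmentator output found")
else:
    print(f"Found output at {output_path}")
```

Keeping the candidates in one list makes it easy to add another location without nesting further `try` blocks.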

### **Total Segmentator**

In [None]:
runtime_stats = pd.DataFrame(columns=['SeriesInstanceUID','total_segmentator_time',
                                      'archiving_time', 'cpu_usage','ram_usage_mb', 'ram_total_mb',
                                      'gpu_usage_mb', 'gpu_total_mb', 'gpu_type', 'disk_usage_all', 'disk_total'
                                      ])
if __name__ == "__main__":
    for series_id in os.listdir('dcm2niix'):
        with ThreadPoolExecutor() as executor:
            monitor = MemoryMonitor()
            mem_thread = executor.submit(monitor.measure_usage)
            try:
                proc_thread = executor.submit(inferenceTotalSegmentator, series_id, runtime_stats)
                runtime_stats = proc_thread.result()
            finally:
                monitor.keep_measuring = False
                cpu_usage, ram_usage_mb, time_stamps, ram_total_mb, gpu_usage_mb, gpu_total_mb, gpu_type, disk_usage_all, disk_total= mem_thread.result()

                # Locate the row appended for this series once, then fill in each metric.
                row_idx = runtime_stats.index[runtime_stats['SeriesInstanceUID'] == series_id][0]
                per_series_metrics = {
                    'cpu_usage': cpu_usage,
                    'ram_usage_mb': ram_usage_mb,
                    'ram_total_mb': ram_total_mb,
                    'gpu_total_mb': gpu_total_mb,
                    'gpu_usage_mb': gpu_usage_mb,
                    'disk_usage_all': disk_usage_all,
                }
                for column, values in per_series_metrics.items():
                    runtime_stats.iloc[row_idx, runtime_stats.columns.get_loc(column)] = [[values]]

                runtime_stats['gpu_type']=gpu_type
                runtime_stats['disk_total']=disk_total

                fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2,2, figsize=(8, 6))

                ax1.plot(time_stamps, cpu_usage)
                ax1.set_ylim(0, 100)
                ax1.set_xlabel('Time (s)')
                ax1.set_ylabel('CPU usage (%)')

                ax2.plot(time_stamps, ram_usage_mb)
                ax2.set_ylim(0, ram_total_mb)
                ax2.set_xlabel('Time (s)')
                ax2.set_ylabel('Memory usage (MB)')

                ax3.plot(time_stamps, gpu_usage_mb)
                ax3.set_ylim(0, gpu_total_mb)
                ax3.set_xlabel('Time (s)')
                ax3.set_ylabel('GPU Memory usage (MB)')

                ax4.plot(time_stamps, disk_usage_all)
                ax4.set_ylim(0, disk_total)
                ax4.set_xlabel('Time (s)')
                ax4.set_ylabel('Disk usage (GB)')
                plt.show()
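
The per-second samples plotted above can also be reduced to a few summary numbers per series, which are easier to compare across runs. A minimal sketch with made-up sample values; `summarize_usage` is a hypothetical helper and is not part of the notebook.

```python
def summarize_usage(samples):
    """Return the peak and mean of a list of usage samples (empty list -> None values)."""
    if not samples:
        return {"peak": None, "mean": None}
    return {"peak": max(samples), "mean": sum(samples) / len(samples)}


# Illustrative values only, not measurements from a real run.
example_ram_usage_mb = [1200.0, 3400.0, 2900.0]
print(summarize_usage(example_ram_usage_mb))
# {'peak': 3400.0, 'mean': 2500.0}
```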

### **Compressing Output Files**

In [None]:
start_time = time.time()
try:
  os.remove('inferenceNiftiFiles.tar.lz4')
  #os.remove('metadata.tar.lz4')
except OSError:
  pass
!tar cvf - -C {curr_dir} Inference | lz4 > inferenceNiftiFiles.tar.lz4
archiving_time = time.time() - start_time


### **Utilization Metrics**

In [None]:
runtime_stats.to_csv('runtime.csv')
runtime_stats['archiving_time']=archiving_time
try:
  os.remove('inferenceUsageMetrics.lz4')
except OSError:
  pass
!lz4 runtime.csv inferenceUsageMetrics.lz4
runtime_stats