https://www.upwork.com/jobs/Google-Colab-Developer-Needed-for-Rib-Segmentation-Algorithm-Implementation_~021871373632866913324/?referrer_url_path=find_work_home

Create a Google Colab environment for implementing the Rib Segmentation Algorithm (RibSeg V2). The ideal candidate will have experience in setting up Python environments and working with machine learning models in Colab. You should be familiar with algorithm integration and have a strong understanding of image segmentation techniques. If you're technical, detail-oriented, and passionate about building effective solutions, we want to hear from you! Here is the link to the repository: https://github.com/M3DV/RibSeg/tree/ribsegv2


Access to Datasets:

RibSeg V2 Dataset: https://drive.google.com/file/d/1ZZGGrhd0y1fLyOZGo_Y-wlVUP4lkHVgm/view?usp=sharing

RibSeg V2 Description Document: https://docs.google.com/spreadsheets/d/1lz9liWPy8yHybKCdO3BCA9K76QH8a54XduiZS_9fK70/edit?usp=sharing

RibSeg V2 Annotations as Mesh: https://drive.google.com/file/d/1b_qcg99efU8cF2pXshl2ZFxi4LCQOmpw/view?usp=sharing

RibFrac Dataset: Accessible via the MICCAI 2020 RibFrac Challenge. Note: You need to join the challenge or request access to download the dataset.




**MICCAI 2020 RibFrac Challenge:**

https://ribfrac.grand-challenge.org/dataset/

Rib Fracture Detection and Classification
Download

The RibFrac dataset consists of Training Set, Validation Set and Test Set.  We provide our dataset on zenodo.org. Due to size limit of zenodo.org, we split the whole RibFrac Training Set into 2 parts (Training Set Part 1 and Training Set Part 2).

The challenge releases both images and annotations of the Training Set and Validation Set; we recommend that all participants use our official data split. Images of the Test Set will be released in a short period (refer to Overview: Important Dates), and the prediction results submitted by participants will be evaluated online.

Links of all subsets on zenodo.org:

RibFrac Dataset: A Benchmark for Rib Fracture Detection, Segmentation and Classification (Training Set Part 1): 300 chest-abdomen CTs and annotations in NII format (nii.gz).
RibFrac Dataset: A Benchmark for Rib Fracture Detection, Segmentation and Classification (Training Set Part 2): 120 chest-abdomen CTs and annotations in NII format (nii.gz).
RibFrac Dataset: A Benchmark for Rib Fracture Detection, Segmentation and Classification (Tuning/Validation Set): 80 chest-abdomen CTs and annotations in NII format (nii.gz).
RibFrac Dataset: A Benchmark for Rib Fracture Detection, Segmentation and Classification (Test Set): 160 chest-abdomen CTs in NII format (nii.gz) without annotations.
Ranking will be based on evaluation performed on RibFrac Test Set.

Description for info files (ribfrac-train-info-1.csv, ribfrac-train-info-2.csv, ribfrac-val-info.csv)

public_id: anonymous patient ID to match images and annotations.
label_id: discrete label value in the NII annotations.
label_code: 0, 1, 2, 3, 4, -1
- 0: it is background
- 1: it is a displaced rib fracture
- 2: it is a non-displaced rib fracture
- 3: it is a buckle rib fracture
- 4: it is a segmental rib fracture
- -1: it is a rib fracture,  but we could not define its type due to
  ambiguity, diagnosis difficulty, etc. Ignore it in the
  classification task.
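
The label codes above can be captured in a small lookup table. The following is a minimal sketch, assuming the info CSV files have a header row with the three column names listed above. The sample rows are made up for illustration and are not taken from the dataset, and `classification_rows` is a hypothetical helper.

```python
import csv
import io

# Meaning of each label_code, as listed in the dataset description
LABEL_CODES = {
    0: "background",
    1: "displaced rib fracture",
    2: "non-displaced rib fracture",
    3: "buckle rib fracture",
    4: "segmental rib fracture",
    -1: "rib fracture of undefined type",
}

def classification_rows(csv_text):
    """Return rows usable for the classification task (label_code 1 to 4)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row["label_code"]) in (1, 2, 3, 4)]

# Made-up sample in the assumed layout (not real dataset content)
sample = """public_id,label_id,label_code
ExampleA,0,0
ExampleA,1,2
ExampleA,2,-1
ExampleB,1,4
"""

rows = classification_rows(sample)
print([(r["public_id"], LABEL_CODES[int(r["label_code"])]) for r in rows])
```

Background (0) and undefined-type (-1) entries are dropped, which matches the instruction to ignore -1 in the classification task.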
For participants from China mainland, we provide a mirror link of CT images. Note that you still need to download annotations from zenodo.org.

RibFrac Training Set images (including Part1 and Part2). (extraction code: 1wxl)
RibFrac Validation Set images.  (extraction code: whs1)
RibFrac Test Set images. (extraction code: rvr8)
License
The RibFrac dataset is a research effort of thousands of hours by experienced radiologists, computer scientists and engineers. We kindly ask you to respect our effort by appropriate citation and keeping data license.

Please note that this dataset is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which means:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use
NonCommercial — You may not use the material for commercial purposes.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

**Clone RibSeg Repo**

Clone the RibSeg repository and check out the ribsegv2 branch.



In [1]:
# Install Git if not already installed
!apt-get install git -y

# Clone the RibSeg repository
!git clone https://github.com/M3DV/RibSeg.git

# Navigate to the RibSeg directory
%cd RibSeg

# Fetch all branches
!git fetch --all

# Checkout the ribsegv2 branch
!git checkout ribsegv2

# Verify the current branch
!git branch


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git is already the newest version (1:2.34.1-1ubuntu1.11).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.
Cloning into 'RibSeg'...
remote: Enumerating objects: 182, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 182 (delta 14), reused 10 (delta 6), pack-reused 153 (from 1)
Receiving objects: 100% (182/182), 18.86 MiB | 18.53 MiB/s, done.
Resolving deltas: 100% (84/84), done.
/content/RibSeg
Fetching origin
Branch 'ribsegv2' set up to track remote branch 'ribsegv2' from 'origin'.
Switched to a new branch 'ribsegv2'
  main
* ribsegv2


**Copying Necessary Scripts from Main Branch to ribsegv2**

The ribsegv2 branch may not contain all the necessary scripts required for data preparation, training, and inference. We'll copy these scripts from the main branch into ribsegv2.



In [2]:
# Ensure you're on the ribsegv2 branch
!git checkout ribsegv2

# List of necessary scripts
# (test_ribseg.py does not exist on main, so its checkout fails with a pathspec error)
scripts = ['data_prepare.py', 'inference.py', 'post_proc.py', 'train_ribseg.py', 'test_ribseg.py']

# Copy each script from main branch into the ribsegv2 working tree
for script in scripts:
    !git checkout main -- {script}


Already on 'ribsegv2'
Your branch is up to date with 'origin/ribsegv2'.
error: pathspec 'test_ribseg.py' did not match any file(s) known to git
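
The `pathspec` error above comes from requesting a file that the `main` branch does not contain. One way to avoid it is to compare the requested names against the branch's file listing before checking anything out. The following is a minimal sketch: `missing_on_branch` is a hypothetical helper, and the listing string is a made-up stand-in for the output of `git ls-tree --name-only main`.

```python
def missing_on_branch(requested, ls_tree_output):
    """Return the requested file names absent from a `git ls-tree --name-only` listing."""
    available = set(ls_tree_output.splitlines())
    return [name for name in requested if name not in available]

scripts = ['data_prepare.py', 'inference.py', 'post_proc.py',
           'train_ribseg.py', 'test_ribseg.py']

# Stand-in listing for illustration; in Colab it could be captured with
#   listing = subprocess.run(['git', 'ls-tree', '--name-only', 'main'],
#                            capture_output=True, text=True).stdout
listing = "data_prepare.py\ninference.py\npost_proc.py\ntrain_ribseg.py\n"

missing = missing_on_branch(scripts, listing)
print(missing)  # ['test_ribseg.py']
```

Filtering the list this way keeps the checkout loop free of errors and makes the missing script explicit.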


**Installing Dependencies**

RibSeg V2 requires several Python packages, installed here with pip. A requirements.txt is preferable when the repository provides one; here the packages are installed manually, based on the description in the GitHub repository.

In [3]:
# Upgrade pip
!pip install --upgrade pip

# Install core dependencies
!pip install numpy nibabel torch torchvision torchaudio

# Install additional dependencies
!pip install scikit-learn matplotlib trimesh open3d pyntcloud h5py tqdm

# Install gdown for downloading files from Google Drive
!pip install gdown

# Install TensorBoard for monitoring training
!pip install tensorboard


Collecting pip
  Downloading pip-24.3.1-py3-none-any.whl.metadata (3.7 kB)
Downloading pip-24.3.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 30.6 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-24.3.1
Collecting trimesh
  Downloading trimesh-4.5.3-py3-none-any.whl.metadata (18 kB)
Collecting open3d
  Downloading open3d-0.18.0-cp310-cp310-manylinux_2_27_x86_64.whl.metadata (4.2 kB)
Collecting pyntcloud
  Downloading pyntcloud-0.3.1-py2.py3-none-any.whl.metadata (4.6 kB)
Collecting dash>=2.6.0 (from open3d)
  Downloading dash-2.18.2-py3-none-any.whl.metadata (10 kB)
Collecting configargparse (from open3d)
  Downloading ConfigArgParse-1.7-py3-none-any.whl.metadata (23 kB)
Collecting ipywidgets>=8.0.4 (from open3d)
  Downloading ipywidg
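
Because the pip output is long, a failed install is easy to miss. The following is a minimal sketch of a follow-up check that reports which modules cannot be located. `find_missing` is a hypothetical helper, and the list uses import names (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the pip packages installed above
# (scikit-learn is imported as sklearn)
REQUIRED_MODULES = [
    "numpy", "nibabel", "torch", "torchvision", "torchaudio",
    "sklearn", "matplotlib", "trimesh", "open3d", "pyntcloud",
    "h5py", "tqdm", "gdown", "tensorboard",
]

def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

missing_modules = find_missing(REQUIRED_MODULES)
print("Missing modules:", missing_modules or "none")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as torch.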

**Mount Google Drive**

If your datasets are large or you prefer to store them in Google Drive, you can mount your drive to Colab.

In [4]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


**Downloading the RibSeg V2 Dataset**

Download the RibSeg V2 dataset from the provided Google Drive link and place it in the appropriate directory.

In [5]:
import os
import zipfile
import gdown

# Define the URL and output path for the RibSeg V2 dataset
ribsegv2_dataset_url = 'https://drive.google.com/uc?id=1ZZGGrhd0y1fLyOZGo_Y-wlVUP4lkHVgm'
ribsegv2_zip_path = '/content/RibSeg/data/ribsegv2.zip'

# Create the data directory if it doesn't exist
os.makedirs('/content/RibSeg/data', exist_ok=True)

# Download the dataset using gdown (uses the URL defined above)
gdown.download(ribsegv2_dataset_url, ribsegv2_zip_path, quiet=False)

# Extract the dataset
with zipfile.ZipFile(ribsegv2_zip_path, 'r') as zip_ref:
    zip_ref.extractall('/content/RibSeg/data/ribsegv2')

# Remove the zip file to save space
os.remove(ribsegv2_zip_path)


Downloading...
From (original): https://drive.google.com/uc?id=1ZZGGrhd0y1fLyOZGo_Y-wlVUP4lkHVgm
From (redirected): https://drive.google.com/uc?id=1ZZGGrhd0y1fLyOZGo_Y-wlVUP4lkHVgm&confirm=t&uuid=eca1b207-48e4-4e84-be35-831d6ded2fed
To: /content/RibSeg/data/ribsegv2.zip
100% 608M/608M [00:11<00:00, 50.8MB/s]
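
Since the cell above deletes the archive right after extraction, it can help to verify the archive before extracting it. The following is a minimal sketch using only the standard library. `summarize_zip` is a hypothetical helper, and the demonstration runs on a throwaway archive with made-up file names rather than on the real dataset.

```python
import os
import tempfile
import zipfile

def summarize_zip(zip_path):
    """Check archive integrity and return (entry_count, sorted top-level names)."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        bad_member = zf.testzip()  # returns the first corrupt member, or None
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member: {bad_member}")
        names = zf.namelist()
    top_level = sorted({name.split("/", 1)[0] for name in names})
    return len(names), top_level

# Demonstration on a tiny throwaway archive (file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("seg/case_a.nii.gz", b"placeholder")
        zf.writestr("seg/case_b.nii.gz", b"placeholder")
    demo_summary = summarize_zip(demo_zip)

print(demo_summary)  # (2, ['seg'])
```

`testzip` reads every member to verify its checksum, so on a multi-gigabyte archive it takes a while. The top-level names also show whether the archive unpacks into a single wrapper folder.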


**Downloading the RibSeg V2 Description Document**

Access the RibSeg V2 description document from Google Sheets. The cell below exports the spreadsheet as an Excel file so that a copy sits next to the data for reference.

In [6]:
import os

# Define the export URL and output path for the Google Sheet
google_sheet_url = 'https://docs.google.com/spreadsheets/d/1lz9liWPy8yHybKCdO3BCA9K76QH8a54XduiZS_9fK70/export?format=xlsx'
output_path = '/content/RibSeg/data/ribsegv2_description.xlsx'

# Download the Google Sheet and save it as an Excel file
# (quoted so the shell does not interpret the "?" in the URL)
!wget -O "$output_path" "$google_sheet_url"

# Confirm the download (wget -O can leave an empty file behind when the request fails)
if os.path.exists(output_path) and os.path.getsize(output_path) > 0:
    print(f"Downloaded file is saved to: {output_path}")
else:
    print("Download failed.")


--2024-12-24 08:01:49--  https://docs.google.com/spreadsheets/d/1lz9liWPy8yHybKCdO3BCA9K76QH8a54XduiZS_9fK70/export?format=xlsx
Resolving docs.google.com (docs.google.com)... 142.251.2.102, 142.251.2.113, 142.251.2.101, ...
Connecting to docs.google.com (docs.google.com)|142.251.2.102|:443... connected.
HTTP request sent, awaiting response... 307 Temporary Redirect
Location: https://doc-0g-b0-sheets.googleusercontent.com/export/54bogvaave6cua4cdnls17ksc4/j3028mtvvjajgkk9rq96360ltk/1735027310000/110401750412225248239/*/1lz9liWPy8yHybKCdO3BCA9K76QH8a54XduiZS_9fK70?format=xlsx [following]
--2024-12-24 08:01:50--  https://doc-0g-b0-sheets.googleusercontent.com/export/54bogvaave6cua4cdnls17ksc4/j3028mtvvjajgkk9rq96360ltk/1735027310000/110401750412225248239/*/1lz9liWPy8yHybKCdO3BCA9K76QH8a54XduiZS_9fK70?format=xlsx
Resolving doc-0g-b0-sheets.googleusercontent.com (doc-0g-b0-sheets.googleusercontent.com)... 142.251.2.132, 2607:f8b0:4023:c0d::84
Connecting to doc-0g-b0-sheets.googleusercontent
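
The download above relies on rewriting the sheet's sharing link (ending in `/edit?usp=sharing`) into an export link (ending in `/export?format=xlsx`). The following is a minimal sketch of that rewrite. `sheet_export_url` is a hypothetical helper, and it assumes the sheet is shared so that it can be fetched without signing in.

```python
import re

def sheet_export_url(sheet_url, fmt="xlsx"):
    """Turn a Google Sheets sharing link into an export link for the given format."""
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", sheet_url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in: {sheet_url}")
    return f"https://docs.google.com/spreadsheets/d/{match.group(1)}/export?format={fmt}"

edit_url = ("https://docs.google.com/spreadsheets/d/"
            "1lz9liWPy8yHybKCdO3BCA9K76QH8a54XduiZS_9fK70/edit?usp=sharing")
export_url = sheet_export_url(edit_url)
print(export_url)
```

Passing `fmt="csv"` instead would request a CSV export, which is easier to read with the standard library.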

**Downloading the RibSeg V2 Annotations as Mesh**

Download the RibSeg V2 annotations provided as mesh files.

In [7]:
# Define the URL and output path for the RibSeg V2 annotations as mesh
# (os, zipfile and gdown were imported in the dataset download cell)
annotations_mesh_url = 'https://drive.google.com/uc?id=1b_qcg99efU8cF2pXshl2ZFxi4LCQOmpw'
annotations_zip_path = '/content/RibSeg/data/ribsegv2_annotations_mesh.zip'

# Download the annotations using gdown (uses the URL defined above)
gdown.download(annotations_mesh_url, annotations_zip_path, quiet=False)

# Create the destination directory
os.makedirs('/content/RibSeg/data/ribsegv2_annotations_mesh', exist_ok=True)

# Extract the annotations
with zipfile.ZipFile(annotations_zip_path, 'r') as zip_ref:
    zip_ref.extractall('/content/RibSeg/data/ribsegv2_annotations_mesh')

# Remove the zip file to save space
os.remove(annotations_zip_path)


Downloading...
From (original): https://drive.google.com/uc?id=1b_qcg99efU8cF2pXshl2ZFxi4LCQOmpw
From (redirected): https://drive.google.com/uc?id=1b_qcg99efU8cF2pXshl2ZFxi4LCQOmpw&confirm=t&uuid=6c82bcc5-247c-4b6f-952f-9ca0d6f1ed6d
To: /content/RibSeg/data/ribsegv2_annotations_mesh.zip
100% 5.64G/5.64G [01:15<00:00, 74.6MB/s]


**Make a directory in MyDrive**

Create the directory /content/drive/MyDrive/upwork/RibSeg/RibFrac in Google Drive to hold the RibFrac archives.



In [8]:
import os

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Define the directory path
directory_path = '/content/drive/MyDrive/upwork/RibSeg/RibFrac'

# Create the directory if it doesn't exist
os.makedirs(directory_path, exist_ok=True)

# Confirm the directory was created
if os.path.exists(directory_path):
    print(f"Directory created successfully at: {directory_path}")
else:
    print("Failed to create directory.")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Directory created successfully at: /content/drive/MyDrive/upwork/RibSeg/RibFrac


**Downloading the RibFrac Dataset**

The RibFrac Dataset is essential for training the RibSeg V2 model.

It consists of multiple parts: Training Set Part 1, Training Set Part 2, Validation Set, and Test Set.


Steps:

1. Visit the RibFrac Challenge Page.
2. Click on Join Challenge and follow the instructions to gain access.
3. Download the following subsets:

*   Training Set Part 1: https://doi.org/10.5281/zenodo.3893507 [300 chest-abdomen CTs and annotations in NII format (nii.gz).]
    Link: https://zenodo.org/records/3893508/files/ribfrac-train-images-1.zip?download=1

*   Training Set Part 2: https://doi.org/10.5281/zenodo.3893497 [120 chest-abdomen CTs and annotations in NII format (nii.gz).]
    Link: https://zenodo.org/records/3893498/files/ribfrac-train-images-2.zip?download=1

*   Validation Set: https://doi.org/10.5281/zenodo.3893495 [80 chest-abdomen CTs and annotations in NII format (nii.gz).]
    Link: https://zenodo.org/records/3893496/files/ribfrac-val-images.zip?download=1

*   Test Set: https://zenodo.org/record/3993380 [160 chest-abdomen CTs in NII format (nii.gz) without annotations.]
    Link: https://zenodo.org/records/3993380/files/ribfrac-test-images.zip?download=1


In [None]:
import os
import zipfile

# Define the Google Drive path where the RibFrac dataset will be saved
ribfrac_drive_path = '/content/drive/MyDrive/upwork/RibSeg/RibFrac'

# Create the RibFrac directory in Google Drive if it doesn't exist
os.makedirs(ribfrac_drive_path, exist_ok=True)

# List of RibFrac dataset parts with their corresponding download links and filenames
ribfrac_datasets = {
    'ribfrac-train-images-1.zip': 'https://zenodo.org/records/3893508/files/ribfrac-train-images-1.zip?download=1',
    'ribfrac-train-images-2.zip': 'https://zenodo.org/records/3893498/files/ribfrac-train-images-2.zip?download=1',
    'ribfrac-val-images.zip': 'https://zenodo.org/records/3893496/files/ribfrac-val-images.zip?download=1',
    'ribfrac-test-images.zip': 'https://zenodo.org/records/3993380/files/ribfrac-test-images.zip?download=1'
}

# Function to download files using wget
def download_ribfrac(file_name, url, destination_folder):
    output_path = os.path.join(destination_folder, file_name)
    # An interrupted download leaves a partial file that is not a valid zip,
    # so only skip when the existing file is a complete archive
    if os.path.exists(output_path) and zipfile.is_zipfile(output_path):
        print(f'{file_name} already exists. Skipping download.')
    else:
        print(f'Downloading {file_name}...')
        # -c resumes a partial download instead of starting over
        !wget -c "{url}" -O "{output_path}"

# Download all RibFrac dataset parts
for file_name, url in ribfrac_datasets.items():
    download_ribfrac(file_name, url, ribfrac_drive_path)

# Define extraction paths
extraction_paths = {
    'ribfrac-train-images-1.zip': '/content/RibSeg/data/ribfrac/train_part1',
    'ribfrac-train-images-2.zip': '/content/RibSeg/data/ribfrac/train_part2',
    'ribfrac-val-images.zip': '/content/RibSeg/data/ribfrac/val',
    'ribfrac-test-images.zip': '/content/RibSeg/data/ribfrac/test'
}

# Extract the datasets
def extract_zip(file_name, extract_to):
    zip_path = os.path.join(ribfrac_drive_path, file_name)
    if os.path.exists(extract_to):
        print(f'{extract_to} already exists. Skipping extraction.')
        return
    # Do not create the target directory for a missing or truncated archive
    if not zipfile.is_zipfile(zip_path):
        print(f'{file_name} is missing or incomplete. Skipping extraction.')
        return
    os.makedirs(extract_to, exist_ok=True)
    print(f'Extracting {file_name}...')
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_to)

for file_name, extract_to in extraction_paths.items():
    extract_zip(file_name, extract_to)

# Optionally, remove the zip files from Google Drive to save space
# (only once the corresponding extraction directory exists)
for file_name, extract_to in extraction_paths.items():
    zip_path = os.path.join(ribfrac_drive_path, file_name)
    if os.path.exists(zip_path) and os.path.isdir(extract_to):
        print(f'Removing {file_name} from Google Drive...')
        os.remove(zip_path)
    else:
        print(f'{file_name} not removed (missing or not yet extracted).')


Downloading ribfrac-train-images-1.zip...
--2024-12-24 08:14:07--  https://zenodo.org/records/3893508/files/ribfrac-train-images-1.zip?download=1
Resolving zenodo.org (zenodo.org)... 188.185.43.25, 188.185.48.194, 188.185.45.92, ...
Connecting to zenodo.org (zenodo.org)|188.185.43.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36605667168 (34G) [application/octet-stream]
Saving to: ‘/content/drive/MyDrive/upwork/RibSeg/RibFrac/ribfrac-train-images-1.zip’

         /content/d   0%[                    ]  64.81M  1017KB/s    eta 12h 41m^C
Downloading ribfrac-train-images-2.zip...
--2024-12-24 08:15:33--  https://zenodo.org/records/3893498/files/ribfrac-train-images-2.zip?download=1
Resolving zenodo.org (zenodo.org)... 188.185.43.25, 188.185.48.194, 188.185.45.92, ...
Connecting to zenodo.org (zenodo.org)|188.185.43.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14562971473 (14G) [application/octet-stream]
Saving to: ‘/content/dri

**Explanation of above code**

*   Define the Destination Path:

All RibFrac dataset parts will be saved to /content/drive/MyDrive/upwork/RibSeg/RibFrac.

*   Download Each Dataset Part:

The download_ribfrac function uses wget to download each zip file from the provided URLs.
It checks if the file already exists to avoid redundant downloads.

*   Extract Each Dataset Part:

The extract_zip function extracts each zip file into its respective directory within Colab's filesystem (/content/RibSeg/data/ribfrac/...).
It checks if the extraction directory already exists to prevent re-extraction.

*   Clean Up:

After extraction, the zip files are removed from Google Drive to save space. This step is optional and can be skipped if you prefer to keep the zip files.
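
A file that exists is not necessarily complete: the `^C` in the output above interrupted the first download after about 65 MB of a 34 GB archive. The following is a minimal sketch that classifies a local archive by comparing its size with the size wget reported in its `Length:` line. `archive_status` is a hypothetical helper, and only the two sizes visible in the output above are included.

```python
import os
import tempfile

# Sizes in bytes reported by wget's "Length:" lines in the output above
EXPECTED_BYTES = {
    "ribfrac-train-images-1.zip": 36605667168,
    "ribfrac-train-images-2.zip": 14562971473,
}

def archive_status(path, expected_bytes):
    """Classify a local archive as 'missing', 'partial', or 'complete' by size."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) < expected_bytes:
        return "partial"
    return "complete"

# Demonstration with a small temporary file standing in for an archive
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.zip")
    status_before = archive_status(demo, 10)
    with open(demo, "wb") as fh:
        fh.write(b"12345")
    status_partial = archive_status(demo, 10)
    with open(demo, "ab") as fh:
        fh.write(b"67890")
    status_done = archive_status(demo, 10)

print(status_before, status_partial, status_done)  # missing partial complete
```

A "partial" result is the case where resuming with `wget -c` is worthwhile; extraction should wait until the status is "complete".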


*Important: Ensure that your Google Drive has sufficient storage space to accommodate the RibFrac dataset. According to the wget output above, Training Set Part 1 alone is about 34 GB and Part 2 about 14 GB. If you encounter storage issues, consider upgrading your Google Drive storage or using alternative storage solutions.*
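
The storage requirement can be checked before starting a long download. The following is a minimal sketch using `shutil.disk_usage`. `has_free_space` is a hypothetical helper, and the byte counts are the two training archive sizes from the wget output above. Whether the figure reported for a mounted Drive path reflects the account's quota is an assumption to verify.

```python
import shutil

def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes

# Training Set Part 1 and Part 2 sizes from the wget output above, in bytes
required = 36605667168 + 14562971473
print(f"Need about {required / 1e9:.1f} GB for the two training archives")
print("Enough space in current directory:", has_free_space(".", required))
```

The same check applies to the Colab disk under /content, which must also hold the extracted files.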

**Data Preparation**

Prepare the data for training by running the data_prepare.py script. This script binarizes the CT scans and annotations for rib segmentation.



**Merging the Training Sets**

Since the RibFrac Training Set is split into two parts (train_part1 and train_part2), we'll merge them into a single training directory to simplify the data preparation process.

In [None]:
import os
import shutil

# Define source directories
source_train_part1 = '/content/RibSeg/data/ribfrac/train_part1'
source_train_part2 = '/content/RibSeg/data/ribfrac/train_part2'

# Define destination directory
destination_train = '/content/RibSeg/data/ribfrac/train'

# Create the destination directory if it doesn't exist
os.makedirs(destination_train, exist_ok=True)

# Move all files from train_part1 and train_part2 to train
for source_dir in (source_train_part1, source_train_part2):
    for filename in os.listdir(source_dir):
        src_file = os.path.join(source_dir, filename)
        dst_file = os.path.join(destination_train, filename)
        shutil.move(src_file, dst_file)

# Optionally, remove the now-empty train_part1 and train_part2 directories
shutil.rmtree(source_train_part1)
shutil.rmtree(source_train_part2)

print("Merged Training Set Part 1 and Part 2 into a single 'train' directory.")
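
Merging assumes that the two parts do not contain entries with the same name. If they do (for example, if each archive unpacks into an identically named subfolder), the second move may overwrite or nest the first. The following is a minimal sketch of a check to run before the merge. `find_name_collisions` is a hypothetical helper, and the demonstration uses throwaway directories with made-up file names; whether the RibFrac archives actually collide is not known here.

```python
import os
import tempfile

def find_name_collisions(dir_a, dir_b):
    """Return sorted entry names that appear in both directories."""
    return sorted(set(os.listdir(dir_a)) & set(os.listdir(dir_b)))

# Demonstration with throwaway directories (file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    part1 = os.path.join(tmp, "train_part1")
    part2 = os.path.join(tmp, "train_part2")
    os.makedirs(part1)
    os.makedirs(part2)
    for folder, name in [(part1, "a.nii.gz"), (part1, "shared.nii.gz"),
                         (part2, "b.nii.gz"), (part2, "shared.nii.gz")]:
        open(os.path.join(folder, name), "w").close()
    collisions = find_name_collisions(part1, part2)

print(collisions)  # ['shared.nii.gz']
```

An empty list means the merge can proceed without losing files.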


**Running the Data Preparation Script**

Now, execute the data_prepare.py script to preprocess the data.

In [None]:
# Ensure you're in the RibSeg directory
%cd /content/RibSeg

# Run the data preparation script
!python data_prepare.py


**Training the Model**

Train the RibSeg V2 model using the provided training script. Training on a GPU significantly speeds up this process.

In [None]:
# Define the directory to save model checkpoints
model_directory = '/content/RibSeg/models/ribsegv2_model'

# Create the model directory
!mkdir -p {model_directory}

# Run the training script
!python train_ribseg.py --model pointnet2_part_seg_msg --log_dir {model_directory}


**Monitoring Training**

Monitor the training progress through the console logs. Additionally, you can utilize TensorBoard for a more detailed visualization.

In [None]:
# Load TensorBoard extension
%load_ext tensorboard

# Launch TensorBoard pointing to the log directory
%tensorboard --logdir /content/RibSeg/models/ribsegv2_model


**Customizing Training Parameters**

You can modify training hyperparameters such as learning rate, batch size, and number of epochs by editing the train_ribseg.py script or by adding command-line arguments.

In [None]:
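# Example: Running training with a custom learning rate and number of epochs
# (flag names are illustrative; check the argument definitions in train_ribseg.py for the exact names)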
# !python train_ribseg.py --model pointnet2_part_seg_msg --log_dir {model_directory} --learning_rate 0.001 --epochs 100
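
When several hyperparameter combinations are tried, building the command from keyword arguments keeps the runs consistent. The following is a minimal sketch. `build_train_command` is a hypothetical helper, and the option names mirror the commented example above, which has not been verified against train_ribseg.py.

```python
import shlex

def build_train_command(script="train_ribseg.py", **options):
    """Build an argument list such as ['python', script, '--key', 'value', ...]."""
    command = ["python", script]
    for key, value in options.items():
        command += [f"--{key}", str(value)]
    return command

# Option names mirror the commented example above and are not verified
cmd = build_train_command(model="pointnet2_part_seg_msg",
                          log_dir="/content/RibSeg/models/ribsegv2_model",
                          learning_rate=0.001, epochs=100)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the RibSeg directory, which raises an error if the script exits with a non-zero status.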


**Testing and Inference**

After training, evaluate the model's performance on the test dataset and perform inference on new CT scans.

Testing the Model

Evaluate the trained model's performance on the test dataset.

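**NOTE: test_ribseg.py was not found in the GitHub repo (see the pathspec error when copying scripts from main), so the testing cell below is left commented out.**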

In [None]:
# Run the testing script
# !python test_ribseg.py --log_dir /content/RibSeg/models/ribsegv2_model


**Running Inference**

Perform inference on new CT scans using the trained model.

In [None]:
# Define the directory containing new CT scans for inference
inference_input_dir = '/content/RibSeg/data/new_ct_scans'  # Update as necessary

# Define the output directory for inference results
inference_output_dir = '/content/RibSeg/inference_results'

# Create the inference output directory
!mkdir -p {inference_output_dir}

# Run the inference script
!python inference.py --log_dir /content/RibSeg/models/ribsegv2_model --input_dir {inference_input_dir} --output_dir {inference_output_dir}
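
Before launching inference, it is worth confirming that the input directory actually contains scans. The following is a minimal sketch, assuming new scans are stored as NIfTI files (.nii or .nii.gz) like the RibFrac CTs. `list_nifti` is a hypothetical helper, and the demonstration uses a throwaway directory with made-up file names.

```python
import os
import tempfile

def list_nifti(directory):
    """Return sorted names of .nii and .nii.gz files directly inside `directory`."""
    return sorted(name for name in os.listdir(directory)
                  if name.endswith((".nii", ".nii.gz")))

# Demonstration with a throwaway directory (file names are made up)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("scan_b.nii.gz", "scan_a.nii", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = list_nifti(tmp)

print(found)  # ['scan_a.nii', 'scan_b.nii.gz']
```

An empty result for the inference input directory means the path needs updating before inference.py is run.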


**Post-processing**

Obtain volume-wise test results by running the post-processing script.

In [None]:
# Define input and output directories for post-processing
post_proc_input_dir = '/content/RibSeg/inference_results'
post_proc_output_dir = '/content/RibSeg/final_results'

# Create the post-processing output directory
!mkdir -p {post_proc_output_dir}

# Run the post-processing script
!python post_proc.py --input_dir {post_proc_input_dir} --output_dir {post_proc_output_dir}
