This notebook runs the basic training code for Cellpose.

Throughout the competition we tried to stay up to date with the approaches shared in the Kaggle discussion forum. When we saw the strong results people were getting with Cellpose, we wanted to try this approach as well.

This notebook was run in Google Colab, which is why it contains some Colab-specific calls. The code that converts the Kaggle dataset into the data format Cellpose requires can be found in "preprocess-dataset.py".
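"preprocess-dataset.py" is not reproduced in this notebook. As a minimal sketch of one step such a conversion needs, the function below decodes a run-length-encoded cell annotation into a binary mask. It assumes the competition's train.csv stores each cell's `annotation` as space-separated `start length` pairs, with 1-indexed starts and pixels numbered row by row; the name `rle_decode` is hypothetical and not taken from our script.

```python
import numpy as np


def rle_decode(rle: str, height: int, width: int) -> np.ndarray:
    """Decode 'start length start length ...' into an (height, width) uint8 mask.

    Assumption: starts are 1-indexed and pixels are numbered row-major.
    """
    values = np.asarray(rle.split(), dtype=np.int64)
    starts = values[0::2] - 1  # convert 1-indexed starts to 0-indexed offsets
    lengths = values[1::2]
    flat = np.zeros(height * width, dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1  # mark one run of foreground pixels
    return flat.reshape(height, width)  # row-major reshape back to 2D


# Two runs: pixels 1-3 and pixels 10-11 of a 3x4 image.
mask = rle_decode("1 3 10 2", height=3, width=4)
```

If the annotations turned out to be column-major instead, the reshape would have to be `flat.reshape(width, height).T`.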



In [1]:
# Install dependencies that are not preinstalled on google colab
!pip install --upgrade pip
!pip uninstall kaggle --yes
!pip install cellpose kaggle tiffile
!pip install --upgrade opencv-python

Collecting pip
  Downloading pip-21.3.1-py3-none-any.whl (1.7 MB)
     |████████████████████████████████| 1.7 MB 5.2 MB/s
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
Successfully installed pip-21.3.1
Found existing installation: kaggle 1.5.12
Uninstalling kaggle-1.5.12:
  Successfully uninstalled kaggle-1.5.12
Collecting cellpose
  Downloading cellpose-0.7.2-py3-none-any.whl (365 kB)
     |████████████████████████████████| 365 kB 5.2 MB/s            
Collecting kaggle
  Downloading kaggle-1.5.12.tar.gz (58 kB)
     |████████████████████████████████| 58 kB 4.8 MB/s             
  Preparing metadata (setup.py) ... done
Collecting tiffile
  Downloading tiffile-2018.10.18-py2.py3-none-any.whl (2.7 kB)
Collecting edt
  Downloading edt-2.1.1-cp37-cp37m-manylinux2014_x86_64.whl (1.9 MB)
     |████████████████████████████████| 1
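Because the install log above is truncated, it can help to confirm which versions actually ended up installed. The sketch below is not part of the original notebook; it uses the standard-library `importlib.metadata` (Python 3.8+; on Colab's Python 3.7 of that time the `importlib_metadata` backport would be needed instead) and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


print(installed_versions(["cellpose", "kaggle", "tiffile", "opencv-python"]))
```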

In [2]:
from google.colab import files

# Upload the Kaggle API token (kaggle.json) so the Kaggle CLI can download the datasets
uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  
# Then move kaggle.json into the folder where the API expects to find it.
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json

Saving kaggle.json to kaggle.json
User uploaded file "kaggle.json" with length 73 bytes
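The shell one-liner in the cell above (`mkdir -p`, `mv`, `chmod 600`) places the token where the Kaggle CLI looks for it by default and makes it readable only by its owner. For readers who prefer plain Python, a sketch of the same three steps follows; the function name `install_kaggle_token` and its `home` parameter (useful for testing) are our own additions.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src="kaggle.json", home=None):
    """Move the Kaggle API token to <home>/.kaggle/kaggle.json with mode 600."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # same as `mkdir -p ~/.kaggle/`
    target = target_dir / "kaggle.json"
    shutil.move(str(src), str(target))             # same as `mv kaggle.json ~/.kaggle/`
    os.chmod(target, 0o600)                        # same as `chmod 600`: owner read/write only
    return target
```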


In [3]:
# Clone our repository so we can run our data preparation scripts
!git clone https://github.com/BigKingXXL/Sartorius.git #Removed git key
%cd Sartorius
!git checkout pytorch-lightning

Cloning into 'Sartorius'...
remote: Enumerating objects: 1170, done.[K
remote: Counting objects: 100% (1170/1170), done.[K
remote: Compressing objects: 100% (876/876), done.[K
remote: Total 1170 (delta 321), reused 1122 (delta 273), pack-reused 0[K
Receiving objects: 100% (1170/1170), 4.71 MiB | 19.46 MiB/s, done.
Resolving deltas: 100% (321/321), done.
/content/Sartorius
Branch 'pytorch-lightning' set up to track remote branch 'pytorch-lightning' from 'origin'.
Switched to a new branch 'pytorch-lightning'


In [5]:
# Download all the required data for the competition.
!bash get-data.sh

Streaming output truncated to the last 5000 lines.
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh7/Huh7_Phase_A11_2_02d12h00m_4.tif  
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh7/Huh7_Phase_A11_2_02d16h00m_1.tif  
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh7/Huh7_Phase_A11_2_02d16h00m_2.tif  
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh7/Huh7_Phase_A11_2_02d16h00m_3.tif  
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh7/Huh7_Phase_A11_2_02d16h00m_4.tif  
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh7/Huh7_Phase_A11_2_02d20h00m_1.tif  
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh7/Huh7_Phase_A11_2_02d20h00m_2.tif  
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh7/Huh7_Phase_A11_2_02d20h00m_3.tif  
  inflating: LIVECell_dataset_2021/images/livecell_train_val_images/Huh
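The contents of "get-data.sh" are not shown here, but the extraction log indicates that the LIVECell images are unpacked into one subfolder per cell line (for example `Huh7`) under `LIVECell_dataset_2021/images/livecell_train_val_images`. As a quick sanity check after the download, the hypothetical helper below counts the .tif files in each subfolder; it is a sketch, not part of our repository.

```python
from collections import Counter
from pathlib import Path


def count_tifs_per_folder(root):
    """Count .tif files in each immediate subfolder of `root` (one per cell line)."""
    counts = Counter()
    for path in Path(root).glob("*/*.tif"):
        counts[path.parent.name] += 1
    return dict(counts)


# Path taken from the unzip log above (relative to /content/Sartorius):
# print(count_tifs_per_folder("LIVECell_dataset_2021/images/livecell_train_val_images"))
```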

In [6]:
# We now convert the images and annotations from kaggle into .tif files.
!python3 preprocess-dataset.py

INFO:root:converting masks to seperated layers
INFO:root:converting masks to one layers
INFO:root:converting input images to tif
INFO:root:done
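The log distinguishes masks stored as separate layers (one binary mask per cell) from masks stored as one layer. Cellpose trains on single-layer label images in which 0 is background and every cell has its own integer ID, and by default its CLI pairs each image with a mask file whose name ends in `_masks`. A minimal sketch of merging the layers, assuming they arrive as an (N, H, W) stack; where two cells overlap, this version simply lets the later cell win, which may differ from what "preprocess-dataset.py" does.

```python
import numpy as np


def layers_to_label_image(layers: np.ndarray) -> np.ndarray:
    """Merge an (N, H, W) stack of binary cell masks into one (H, W) label image.

    Background stays 0; cell i (0-based) gets label i + 1.
    Overlapping pixels keep the label of the later cell in the stack.
    """
    n_cells, height, width = layers.shape
    labels = np.zeros((height, width), dtype=np.uint16)  # up to 65535 cells per image
    for i in range(n_cells):
        labels[layers[i].astype(bool)] = i + 1
    return labels
```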


In [None]:
# The training process is already implemented by the cellpose library; we only needed to set the parameters.
%cd /content/Sartorius
!python3 -m cellpose --use_gpu --train --dir ./dataset/tif/train --pretrained_model cyto --save_each --chan 0 --chan2 0 --batch_size 32 --learning_rate 0.0002 --n_epochs 1000 --diameter 19

/content/Sartorius
2021-12-30 12:48:22,290 [INFO] WRITING LOG OUTPUT TO /root/.cellpose/run.log
2021-12-30 12:48:32,185 [INFO] ** TORCH CUDA version installed and working. **
2021-12-30 12:48:32,185 [INFO] >>>> using GPU
2021-12-30 12:48:32,186 [INFO] Downloading: "https://www.cellpose.org/models/cytotorch_0" to /root/.cellpose/models/cytotorch_0

100% 25.3M/25.3M [00:00<00:00, 71.6MB/s]
Not all flows are present. Run flow generation again.
2021-12-30 12:48:33,708 [INFO] >>>> pretrained model /root/.cellpose/models/cytotorch_0 is being used
2021-12-30 12:48:33,708 [INFO] >>>> during training rescaling images to fixed diameter of 30.0 pixels
2021-12-30 12:48:33,918 [INFO] Training with rescale = 1.00
2021-12-30 12:48:41,271 [INFO] train channels = 2
2021-12-30 12:48:41,271 [INFO] NOTE: test data not provided OR labels incorrect OR not same number of channels as train data
2021-12-30 12:48:41,271 [INFO] NOTE: computing flows for labels (could be done before to save time)
100% 606/606 [01
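The command passes `--diameter 19`, while the log reports that training rescales images to a fixed diameter of 30 pixels. To illustrate what a cell "diameter" means here, the sketch below estimates the median cell diameter of a label image as an equivalent-circle diameter, d = 2·sqrt(A/π), taken over the median of sqrt(A). To our understanding cellpose measures diameters in a similar way, but this is an assumption, and we do not claim this is how the value 19 was chosen. The function names are hypothetical.

```python
import numpy as np


def median_cell_diameter(labels: np.ndarray) -> float:
    """Median equivalent-circle diameter (pixels) of the cells in a label image."""
    _, areas = np.unique(labels[labels > 0], return_counts=True)  # pixel count per cell ID
    if areas.size == 0:
        return 0.0  # no cells in this image
    # d = 2 * sqrt(A / pi) = sqrt(A) / (sqrt(pi) / 2)
    return float(np.median(np.sqrt(areas)) / (np.sqrt(np.pi) / 2))


def rescale_factor(diameter: float, reference: float = 30.0) -> float:
    """Factor that resizes cells of `diameter` pixels to the `reference` diameter."""
    return reference / diameter
```

For example, cells with a median diameter of 19 pixels would have to be enlarged by about 30 / 19 ≈ 1.58 to match a 30-pixel reference.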

In [None]:
# After finishing the training we downloaded the resulting model.
from google.colab import files
files.download('/content/Sartorius/dataset/tif/train/models/cellpose_residual_on_style_on_concatenation_off_train_2021_12_21_16_30_40.512659_epoch_999') 

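The download cell hard-codes the checkpoint file name, which contains a timestamp and an epoch number, so it has to be edited after every run. A small hypothetical helper, not part of our repository, could instead pick the most recently modified file in the models folder, assuming checkpoints are written to `dataset/tif/train/models/` as in the path above.

```python
from pathlib import Path


def latest_model(models_dir):
    """Return the most recently modified file in `models_dir`, or None if there is none."""
    files = [p for p in Path(models_dir).iterdir() if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


# In Colab, the download could then become:
# from google.colab import files
# files.download(str(latest_model('/content/Sartorius/dataset/tif/train/models')))
```

Picking by modification time avoids parsing the file name; parsing the `_epoch_` suffix would be an alternative if older runs share the folder.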