# IceFinder Tool Workflow

Welcome to the IceFinder tool tutorial.

![workflow diagram](../imgs/workflow.svg)




**Important:** Before proceeding, download the [pre-trained model weights](https://drive.google.com/file/d/1wrDRUb9blkyHka5sBio05gGvnxhCtq9D/view?usp=sharing). Place the downloaded weights in the `logs/` folder within your project directory.


## Step 0: Set Working Directory

Make sure you are working from the correct directory. If you are running this tutorial in a Jupyter Notebook, the default working directory is `notebooks/`.

Move to the project root with:

```python
%cd ..
```

```
/ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det
```


## Step 1: Data Preparation

The workflow accepts tilt series as `.st` or `.mrc` files. Make sure your frames are motion-corrected and stacked before you start.

### Directory Structure

Create the following structure in your project directory:
- `data/`
  - `selected/`: A subset for initial preprocessing and annotation.
  - `full/`: The complete dataset for final analysis and inference.

Execute the command below to set up the directories:

```python
mkdir -p data/selected data/full
```

Make sure to place your tilt series files (`.st` or `.mrc`) in the appropriate folders! A single tilt series in the `selected/` folder should be enough for initial preprocessing.
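
Before moving on, you can confirm that the tilt series are where the workflow expects them. The following is a minimal sketch using only the standard library; the helper name `find_tilt_series` is hypothetical (it is not part of the repository), and it only looks at files directly inside each folder.

```python
from pathlib import Path

# Extensions accepted by the workflow (see Step 1).
TILT_SERIES_EXTENSIONS = {".st", ".mrc"}


def find_tilt_series(folder: str) -> list[Path]:
    """Hypothetical helper: return the .st/.mrc files directly inside `folder`, sorted."""
    root = Path(folder)
    if not root.is_dir():
        return []  # a missing folder simply yields no tilt series
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in TILT_SERIES_EXTENSIONS
    )


# Usage:
# for folder in ("data/selected", "data/full"):
#     print(folder, len(find_tilt_series(folder)), "tilt series")
```

The extension check is case-insensitive, so files such as `TS_01.MRC` are also picked up.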


## Step 2: Preprocess

The preprocessing step prepares your images for annotation. Micrographs will be binned, squared, and normalized. At this stage, only the files in the `data/selected/` directory will be processed.

**Note**: If your images are not square, consider enabling the `--pad` option to preserve the original proportions during processing. This helps prevent distortion that might affect later analysis.
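The exact operations in `preprocess_dir.py` are not spelled out here, so the following is only a conceptual NumPy sketch of what binning, squaring, and normalization typically involve. Everything in it is an assumption: the bin factor of 4, center-cropping when not padding, padding with the image mean when `pad=True` (mirroring the idea behind `--pad`), and min-max scaling to [0, 1].

```python
import numpy as np

# Conceptual sketch only; NOT the implementation used by preprocess_dir.py.


def bin_image(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool by `factor` along both axes, dropping edge pixels that don't fit."""
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    trimmed = img[: h2 * factor, : w2 * factor]
    return trimmed.reshape(h2, factor, w2, factor).mean(axis=(1, 3))


def make_square(img: np.ndarray, pad: bool = False) -> np.ndarray:
    """Center-crop to the shorter side, or pad to the longer side if `pad` is True."""
    h, w = img.shape
    if pad:
        size = max(h, w)
        # Pad with the mean so the padding stays inside the image's min-max range.
        out = np.full((size, size), img.mean(), dtype=img.dtype)
        top, left = (size - h) // 2, (size - w) // 2
        out[top : top + h, left : left + w] = img
        return out
    size = min(h, w)
    top, left = (h - size) // 2, (w - size) // 2
    return img[top : top + size, left : left + size]


def normalize(img: np.ndarray) -> np.ndarray:
    """Min-max scale to [0, 1] as float32; a constant image maps to all zeros."""
    img = img.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)


def preprocess(img: np.ndarray, factor: int = 4, pad: bool = False) -> np.ndarray:
    """Bin, square, then normalize a single 2D micrograph."""
    return normalize(make_square(bin_image(img, factor), pad=pad))
```

Cropping discards the edges of a rectangular micrograph, while padding keeps all of it at the cost of a uniform border; this is the trade-off the `--pad` note above refers to.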

Execute the command to preprocess your data:

```python
!python src/cryo_et_ice_det/tools/preprocess_dir.py
```

```
100%|█████████████████████████████████████████████| 1/1 [00:01<00:00,  1.57s/it]
```


The preprocessed images are written to `data/selected/preprocessed/` as `.tiff` files.

**Note**: If the images appear blank upon inspection, do not panic. This is a typical result of the normalization process for CVAT compatibility. The images are correctly prepared and usable for subsequent steps.
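If you still want to check an image by eye, you can rescale its intensities for display only. A minimal sketch using percentile-based contrast stretching; the function name is hypothetical, the 1st/99th percentiles are arbitrary defaults, and loading the file with the third-party `tifffile` package is an assumption (any TIFF reader works). The file on disk is left unchanged.

```python
import numpy as np


def stretch_for_display(img: np.ndarray, low_pct: float = 1.0, high_pct: float = 99.0) -> np.ndarray:
    """Hypothetical helper: map the [low_pct, high_pct] percentile range to 0..255 (uint8)."""
    img = img.astype(np.float64)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros(img.shape, dtype=np.uint8)  # flat image: nothing to stretch
    scaled = np.clip((img - lo) / (hi - lo), 0.0, 1.0)
    return (scaled * 255).round().astype(np.uint8)


# Usage (assumes `pip install tifffile`):
# import tifffile
# view = stretch_for_display(tifffile.imread("data/selected/preprocessed/<name>.tiff"))
```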


## Step 3: Annotate

Proceed with annotation by following the steps provided in the [CVAT labeling guide](../docs/annotation-guide.pdf). This guide will assist you through the CVAT platform to annotate your data.

After downloading the annotations as a ZIP file from CVAT, unzip them directly into the designated folder with the following command:

```python
!unzip ./data/selected/labels.zip -d ./data/selected/
```

```
Archive:  ./data/selected/labels.zip
 extracting: ./data/selected/annotations/instances_default.json
```


This places the annotations in `./data/selected/annotations/instances_default.json`. The command assumes the ZIP file is saved as `./data/selected/labels.zip`; if it is elsewhere, adjust the path accordingly.

To reformat the labels into a more user-friendly CSV format, run the following command:


```python
!python src/cryo_et_ice_det/tools/coco_json_to_csv.py
```

```
Data saved to ./data/selected/annotations/data.csv
```
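
CVAT's COCO export stores images, categories, and annotations in separate lists linked by IDs, which is why a flat CSV is easier to work with. The following minimal sketch flattens that structure into one row per annotation with the standard library. The function names and the column names (`file_name`, `category`, `x`, `y`, `width`, `height`) are hypothetical and need not match what `coco_json_to_csv.py` writes to `data.csv`.

```python
import csv
import json
from pathlib import Path

# Hypothetical column layout; the repository's data.csv may differ.
FIELDS = ["file_name", "category", "x", "y", "width", "height"]


def coco_to_rows(coco: dict) -> list[dict]:
    """Flatten COCO images/categories/annotations into one dict per annotation."""
    images = {img["id"]: img["file_name"] for img in coco.get("images", [])}
    categories = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    rows = []
    for ann in coco.get("annotations", []):
        x, y, w, h = ann["bbox"]  # COCO boxes are [x_min, y_min, width, height]
        rows.append({
            "file_name": images[ann["image_id"]],
            "category": categories.get(ann["category_id"], str(ann["category_id"])),
            "x": x, "y": y, "width": w, "height": h,
        })
    return rows


def coco_json_to_csv(json_path: str, csv_path: str) -> int:
    """Write the flattened rows to `csv_path` and return how many were written."""
    coco = json.loads(Path(json_path).read_text())
    rows = coco_to_rows(coco)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Note that in this sketch an image without any annotation produces no row at all; a real converter may need to keep such images so that ice-free micrographs are still represented.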


## Step 4: Fine Tune

Before fine-tuning, split the annotated data into training and test sets. You can adjust the number of training samples with the `--num_train_samples` flag (default: 5). Run the following command to generate the split:


```python
!python src/cryo_et_ice_det/utils/split_data.py
```

```
Split files saved to:
  Train: /ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det/data/selected/annotations/train.csv
  Test: /ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det/data/selected/annotations/test.csv
```
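
Because the CSV can hold several annotation rows for the same micrograph, a sensible split works on whole images rather than on rows. A minimal sketch of a seeded, image-level split with the standard library; the function name, the `file_name` column, and the seed value are assumptions, and only the default of 5 training samples comes from `--num_train_samples`.

```python
import random


def split_by_image(rows: list[dict], num_train_samples: int = 5, seed: int = 10,
                   key: str = "file_name") -> tuple[list[dict], list[dict]]:
    """Hypothetical helper: put every image entirely in train or entirely in test."""
    images = sorted({row[key] for row in rows})  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(images)
    train_images = set(images[:num_train_samples])
    train = [row for row in rows if row[key] in train_images]
    test = [row for row in rows if row[key] not in train_images]
    return train, test
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching Python's global random state.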


After splitting, you may want to adjust `train.csv` and `test.csv` by hand. A random split often gives robust results, but you might select training samples carefully to improve fine-tuning, or remove less relevant images from the test set for a cleaner evaluation. The order in which samples are listed also matters. If you edit the files, make sure the training and test sets remain mutually exclusive, i.e., their intersection is empty: $\text{Train} \cap \text{Test} = \emptyset$.
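
After editing the files by hand, a quick check confirms that no image ended up in both sets. A minimal sketch with the standard library, assuming each CSV has a `file_name` column (a hypothetical name; use whatever column identifies the image in your files):

```python
import csv


def image_names(csv_path: str, key: str = "file_name") -> list[str]:
    """Image names in file order (the order can matter for k-shot training)."""
    with open(csv_path, newline="") as f:
        return [row[key] for row in csv.DictReader(f)]


def check_disjoint(train_csv: str, test_csv: str, key: str = "file_name") -> set[str]:
    """Images present in both files; an empty set means Train ∩ Test = ∅."""
    return set(image_names(train_csv, key)) & set(image_names(test_csv, key))


# Usage:
# overlap = check_disjoint("data/selected/annotations/train.csv",
#                          "data/selected/annotations/test.csv")
# assert not overlap, f"Images in both sets: {sorted(overlap)}"
```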

Finally, we are ready to fine-tune the model through k-shot testing. Run the command below. To use the CPU instead of a GPU, omit the `--device` flag. You can also choose which shot counts to test with the `--shots_to_test` argument:


```python
!python src/cryo_et_ice_det/fine_tune_k_shots.py --device 1 --shots_to_test 0 1 2 3 4
```

```
Global seed set to 10
Fine tunning model for 0 shot.
Fine tunning model for 1 shot.
Fine-tuning Progress: 100%|██████| 30/30 [00:01<00:00, 21.72it/s, Loss=0.000201]
Fine tunning model for 2 shot.
Fine-tuning Progress: 100%|██████| 30/30 [00:02<00:00, 12.66it/s, Loss=0.055170]
Fine tunning model for 3 shot.
Fine-tuning Progress: 100%|██████| 30/30 [00:03<00:00,  8.96it/s, Loss=0.042682]
Fine tunning model for 4 shot.
Fine-tuning Progress: 100%|██████| 30/30 [00:04<00:00,  6.70it/s, Loss=0.039794]
Best model performance saved to: /ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det/k-shot-results/23-12-2024/best_f1_performance.txt
```
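
The best model is selected by F1 score, the harmonic mean of precision $P$ and recall $R$: $F_1 = \frac{2PR}{P+R} = \frac{2\,TP}{2\,TP + FP + FN}$. As an illustration of that selection, the following minimal sketch computes F1 from per-shot true-positive, false-positive, and false-negative counts and picks the best shot. The function names and the input layout are hypothetical, and how `fine_tune_k_shots.py` actually matches predictions to labels (e.g., any overlap threshold) is not covered.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_shot(counts: dict[int, tuple[int, int, int]]) -> tuple[int, float]:
    """Hypothetical helper: (k, F1) for the highest F1; ties go to the smaller k."""
    scores = {k: f1_score(*c) for k, c in counts.items()}
    k = min(scores, key=lambda s: (-scores[s], s))
    return k, scores[k]
```

Breaking ties toward the smaller $k$ prefers the model that needed fewer labeled examples.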


We will plot performance metrics to aid in selecting the best model. Run the command below to generate the performance plot:

```python
!python src/cryo_et_ice_det/tools/plot_performance.py
```

```
Plot saved to:
/ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det/k-shot-results/23-12-2024/k_shot_performance_plot_23-12-2024.png
```


After the evaluation, you may want to rerun the script on the best-performing shot count only and save the output images to get a better intuition for the predictions. Use the `--save_images` flag to visualize predictions and set `--shots_to_test` to the best shot (e.g., `--shots_to_test 1`, as in the command below):

```python
!python src/cryo_et_ice_det/fine_tune_k_shots.py --device 1 --shots_to_test 1 --save_images
```

```
Global seed set to 10
Fine tunning model for 1 shot.
Fine-tuning Progress: 100%|██████| 30/30 [00:01<00:00, 17.98it/s, Loss=0.000202]
Best model performance saved to: /ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det/k-shot-results/23-12-2024/best_f1_performance.txt
```


After inspecting the images, you might decide to hand-pick some samples to boost your metrics. If you are satisfied, congratulations! 🎉 Your baseline fine-tuning process is complete.

## Step 5: Inference

Place the rest of your dataset in the `data/full/` folder and run the preprocessing step on it:

```python
!python src/cryo_et_ice_det/tools/preprocess_dir.py data/full
```

```
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.46s/it]
```


To run your model in inference mode, use the following command. If you have a GPU available, select it with the `--device` flag (e.g., `--device 1`). Use the `--date` argument to specify the date of the experiment results (default: the current day):

```python
!python src/cryo_et_ice_det/run_inference_and_quantify.py --device 0
```

```
Global seed set to 10
Processing images: 100%|████████████████████████| 82/82 [00:01<00:00, 48.98it/s]
Total inference time: 0.6843 s
Average inference time per image: 8.3 ms
Quantification results saved to: /ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det/inference-results/23-12-2024/quantification.csv
Inference complete. Results are saved in: 23-12-2024
```
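
Results are grouped in folders named after the run date in the `DD-MM-YYYY` form seen in the output paths (e.g., `23-12-2024`). Assuming that naming holds, a small helper can build a date string or find the most recent results folder. Whether `--date` expects exactly this string is an assumption based on those paths, and both helper names are hypothetical.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Optional

DATE_FMT = "%d-%m-%Y"  # matches folder names such as 23-12-2024


def date_folder_name(day: Optional[date] = None) -> str:
    """Hypothetical helper: folder name for `day` (today by default)."""
    return (day or date.today()).strftime(DATE_FMT)


def latest_results_folder(root: str) -> Optional[Path]:
    """Newest date-named subfolder of `root`, ordered by date rather than by name."""
    dated = []
    for p in Path(root).glob("*"):
        if not p.is_dir():
            continue
        try:
            dated.append((datetime.strptime(p.name, DATE_FMT), p))
        except ValueError:
            continue  # skip folders whose names are not dates
    return max(dated)[1] if dated else None


# Usage:
# print(latest_results_folder("inference-results"))
```

Parsing the names matters because `DD-MM-YYYY` strings do not sort chronologically: `"23-12-2024"` sorts after `"01-01-2025"` alphabetically.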


Depending on the size of your dataset, you might want to export the annotations (`--export_annotations`) or save the visualized output images (`--save_images`).

By default, even without these flags, the script will provide per-micrograph quantification, helping you evaluate the ice distribution and overall quality of the dataset.
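
The per-micrograph values are reported as percentages. Assuming each value is the share of the image area predicted as crystalline ice (the exact definition used by the script is not stated here), it can be sketched from a binary prediction mask, or from bounding boxes rasterized into a mask so that overlapping boxes are not counted twice. Both function names and the `(x, y, width, height)` box layout are hypothetical.

```python
import numpy as np


def ice_percentage(mask: np.ndarray) -> float:
    """Percentage of pixels flagged as ice in a boolean (or 0/1) mask."""
    mask = np.asarray(mask, dtype=bool)
    return 100.0 * mask.sum() / mask.size


def boxes_to_mask(shape: tuple[int, int], boxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Rasterize (x, y, width, height) boxes into a boolean mask; overlaps count once."""
    mask = np.zeros(shape, dtype=bool)
    for x, y, w, h in boxes:
        mask[y : y + h, x : x + w] = True
    return mask
```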

## Step 6: Analyze / Quantify


The final step of the workflow assesses vitrification quality by quantifying crystalline areas across your dataset. It provides global and per-tilt-series statistics, helping you identify problematic tilt series and, through filtering, isolate the micrographs that meet a quality threshold.

```python
!python src/cryo_et_ice_det/utils/analyze_quantifications.py --filter_threshold 5
```

```
Global Quantification Statistics
----------------------------------------
Mean: 1.23%, 95% CI: ±0.45%
Min: 0.00%
Max: 9.77%

Per Tilt Series Statistics
----------------------------------------
Tilt Series 0:
  Mean: 1.10%, 95% CI: ±0.54%
  Min: 0.00%
  Max: 9.77%

Tilt Series 1:
  Mean: 1.36%, 95% CI: ±0.71%
  Min: 0.00%
  Max: 9.03%

Summary saved to: /ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det/inference-results/23-12-2024/quantification_summary.csv
Filtering Threshold: 5.0%
Filtered Micrographs Saved To:
/ssd/homes/alma/Documents/cryo-et-ice-det-test/cryo-et-ice-det/inference-results/23-12-2024/filtered_micrographs.csv
----------------------------------------
Micrographs Passing Filter: 6 / 82
```


Key outputs include:
- Global statistics (mean, min, max, and confidence intervals)
- Per-tilt series analysis
- Filtered micrographs list

Results are saved as `quantification_summary.csv` and `filtered_micrographs.csv` for further inspection. 
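
If you want to reproduce or extend these summaries yourself, the statistics can be sketched with the standard library. This sketch assumes the 95% CI is the normal approximation $1.96 \cdot s / \sqrt{n}$ (the script may use a t-distribution instead) and takes per-micrograph percentages as input. Whether "passing" means at most or above the threshold is not stated in the text, so the filter direction is an explicit argument here; check it against `filtered_micrographs.csv`. All function names are hypothetical.

```python
import math
import statistics


def summarize(values: list[float], z: float = 1.96) -> dict[str, float]:
    """Mean, normal-approximation CI half-width (z * s / sqrt(n)), min and max."""
    n = len(values)
    half_width = z * statistics.stdev(values) / math.sqrt(n) if n > 1 else 0.0
    return {"mean": statistics.fmean(values), "ci95": half_width,
            "min": min(values), "max": max(values)}


def per_tilt_series(per_series: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Apply `summarize` to each tilt series separately."""
    return {name: summarize(vals) for name, vals in per_series.items()}


def filter_micrographs(values: dict[str, float], threshold: float, keep_below: bool) -> list[str]:
    """Micrographs at or below (keep_below=True) or strictly above the threshold."""
    if keep_below:
        return [name for name, v in values.items() if v <= threshold]
    return [name for name, v in values.items() if v > threshold]
```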

We hope this step empowers you to efficiently evaluate vitrification quality and gain valuable insights into your dataset.