# Run whole image QC pipeline in CellProfiler

To determine whether any images are of poor quality, we run a CellProfiler pipeline dedicated to extracting image quality metrics.
We extract blur and saturation metrics, which we can then use to set thresholds that separate good-quality images from poor-quality ones.
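
As a rough illustration of how such thresholds could be applied once the metrics are extracted, here is a minimal sketch. The record layout, the metric values, and the cutoffs are all hypothetical. It assumes a blur score where lower means blurrier (as with CellProfiler's `PowerLogLogSlope`) and a saturation score where higher means more saturated pixels (as with `PercentMaximal`); which features this pipeline actually exports is not stated here.

```python
# Hypothetical per-image QC measurements (illustrative values only).
images = [
    {"name": "img_a", "blur": -1.8, "saturation": 0.02},
    {"name": "img_b", "blur": -3.4, "saturation": 0.01},  # blurry
    {"name": "img_c", "blur": -1.9, "saturation": 5.50},  # over-saturated
]


def flag_poor_quality(records, blur_min, saturation_max):
    """Return names of images failing either the blur or the saturation cutoff."""
    return [
        record["name"]
        for record in records
        if record["blur"] < blur_min or record["saturation"] > saturation_max
    ]


# Assumed cutoffs; in practice these would be chosen from the metric distributions.
flagged = flag_poor_quality(images, blur_min=-2.5, saturation_max=1.0)
print(flagged)
```

In practice the cutoffs would be picked by inspecting the distribution of each metric per channel rather than fixed up front.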


## Import libraries

In [1]:
import pathlib
import pprint
import sys

# make the shared utilities importable before importing cp_parallel
sys.path.append("../utils")
import cp_parallel

## Set paths and variables

In [2]:
# set the run type for the parallelization
run_name = "quality_control"

# set path for pipeline for whole image QC
path_to_pipeline = pathlib.Path("./pipeline/whole_image_qc.cppipe").resolve(strict=True)

# create the main output directory for all plates if it doesn't exist
output_dir = pathlib.Path("./qc_results")
output_dir.mkdir(exist_ok=True)

# directory where images are located within folders (parent folder is the plate and the child folders are wells containing images)
images_dir = pathlib.Path("../data/raw_images").resolve(strict=True)

# list for plate names based on folders to use to create dictionary
plate_names = []
# iterate through the raw images directory and append the plate name parsed from each plate folder name
for file_path in images_dir.iterdir():
    plate_names.append(str(file_path.stem.split("_")[0]))

print("There are a total of", len(plate_names), "plates. The names of the plates are:")
for plate in plate_names:
    print(plate)

There are a total of 3 plates. The names of the plates are:
NF0014
NF0018
NF0016
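
The loop above takes a plate name from every entry in the images directory, so a stray file would be picked up as a plate. A more defensive variant might keep only sub-folders and sort the result for a stable order. This is a sketch, not the notebook's method: `get_plate_names` is a hypothetical helper, and the demo folders are created in a temporary directory purely for illustration.

```python
import pathlib
import tempfile


def get_plate_names(images_dir):
    """Collect sorted plate names from sub-folders named like '<plate>_raw_images'."""
    return sorted(
        entry.name.split("_")[0]
        for entry in pathlib.Path(images_dir).iterdir()
        if entry.is_dir()  # skip stray files such as notes or checksums
    )


# Demo on a throwaway directory that mimics the assumed folder layout.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for folder in ("NF0016_raw_images", "NF0014_raw_images"):
        (root / folder).mkdir()
    (root / "README.txt").write_text("not a plate")
    demo_names = get_plate_names(root)

print(demo_names)
```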


## Generate dictionary with plate info to run CellProfiler

In [3]:
# create plate info dictionary with all parts of the CellProfiler CLI command
plate_info_dictionary = {
    name: {
        "path_to_images": pathlib.Path(list(images_dir.rglob(f"{name}_raw_images"))[0]).resolve(
            strict=True
        ),
        "path_to_output": pathlib.Path(f"{output_dir}/{name}_qc_results"),
        "path_to_pipeline": path_to_pipeline,
    }
    for name in plate_names
}

# view the dictionary to assess that all info is added correctly
pprint.pprint(plate_info_dictionary, indent=4)

{   'NF0014': {   'path_to_images': PosixPath('/media/18tbdrive/1.Github_Repositories/GFF_3D_organoid_profiling_pipeline/data/raw_images/NF0014_raw_images'),
                  'path_to_output': PosixPath('qc_results/NF0014_qc_results'),
                  'path_to_pipeline': PosixPath('/media/18tbdrive/1.Github_Repositories/GFF_3D_organoid_profiling_pipeline/1.image_quality_control/pipeline/whole_image_qc.cppipe')},
    'NF0016': {   'path_to_images': PosixPath('/media/18tbdrive/1.Github_Repositories/GFF_3D_organoid_profiling_pipeline/data/raw_images/NF0016_raw_images'),
                  'path_to_output': PosixPath('qc_results/NF0016_qc_results'),
                  'path_to_pipeline': PosixPath('/media/18tbdrive/1.Github_Repositories/GFF_3D_organoid_profiling_pipeline/1.image_quality_control/pipeline/whole_image_qc.cppipe')},
    'NF0018': {   'path_to_images': PosixPath('/media/18tbdrive/1.Github_Repositories/GFF_3D_organoid_profiling_pipeline/data/raw_images/NF0018_raw_images'),
    

## Run QC pipeline in CellProfiler

In [None]:
cp_parallel.run_cellprofiler_parallel(
    plate_info_dictionary=plate_info_dictionary, run_name=run_name
)
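
The internals of `cp_parallel.run_cellprofiler_parallel` are not shown in this notebook. As a hedged sketch of how one plate's dictionary entry could map onto a headless CellProfiler command, the hypothetical helper below assembles the argument list using the standard CLI flags `-c` (run without the GUI), `-r` (run the pipeline), `-p` (pipeline file), `-o` (output folder), and `-i` (input folder). The example paths are placeholders, and nothing is executed.

```python
import pathlib


def build_cellprofiler_command(plate_info):
    """Assemble a headless CellProfiler command for one plate (hypothetical helper)."""
    return [
        "cellprofiler",
        "-c",  # headless mode, no GUI
        "-r",  # run the pipeline immediately
        "-p", str(plate_info["path_to_pipeline"]),
        "-o", str(plate_info["path_to_output"]),
        "-i", str(plate_info["path_to_images"]),
    ]


# Placeholder entry shaped like one value of plate_info_dictionary.
example_plate = {
    "path_to_images": pathlib.Path("data/raw_images/NF0014_raw_images"),
    "path_to_output": pathlib.Path("qc_results/NF0014_qc_results"),
    "path_to_pipeline": pathlib.Path("pipeline/whole_image_qc.cppipe"),
}

command = build_cellprofiler_command(example_plate)
print(" ".join(command))
```

A list like this could be handed to `subprocess.run`, one call per plate, which is one plausible way for a parallel runner to dispatch the work.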