# Preprocessing SMK Test

Use this notebook to test that SBS and phenotype (PH) image preprocessing is configured correctly.
Cells marked with `SET PARAMETERS` contain crucial variables that need to be set according to your specific experimental setup and data organization.
Please review and modify these variables as needed before proceeding with the analysis.

## Imports

In [1]:
import glob
from ops.preprocessing_smk import *

In [5]:
SBS_INPUT_PATTERN_METADATA = "input/sbs/*C{cycle}_Wells-{well}_Points-*__Channel*.nd2"
files_list = glob.glob(
    SBS_INPUT_PATTERN_METADATA.format(
        cycle=1, well="A1"
    )
)
files_list

['input/sbs/P001_SBS_10x_C1_Wells-A1_Points-001__Channel_Cy7,Cy5,AF594,Cy3_SBS,DAPI_SBS.nd2',
 'input/sbs/P001_SBS_10x_C1_Wells-A1_Points-100__Channel_Cy7,Cy5,AF594,Cy3_SBS,DAPI_SBS.nd2']

In [7]:
PARSE_FUNCTION_HOME = "input"
PARSE_FUNCTION_DATASET = "example_dataset"
Snake_preprocessing._extract_metadata_tile(
    files_list, 
    parse_function_home=PARSE_FUNCTION_HOME,
    parse_function_dataset=PARSE_FUNCTION_DATASET,
    parse_function_tiles=True,
)

Unnamed: 0,x_data,y_data,z_data,pfs_offset,field_of_view,filename
0,33049.0,-35283.0,3139.66,8063,1,input/sbs/P001_SBS_10x_C1_Wells-A1_Points-001_...
1,34602.4,-26403.1,3125.04,8063,100,input/sbs/P001_SBS_10x_C1_Wells-A1_Points-100_...


## Helper Functions

In [2]:
def find_and_parse_file(parse_function_home, parse_function_dataset, pattern, well, cycle, tiles=None):
    """Glob for files matching `pattern` and print the parsed description of the first match."""
    if tiles is None:
        # No tile placeholder to fill: the pattern matches every tile for this well/cycle
        filled_pattern = pattern.format(cycle=cycle, well=well)
        # Find files matching the pattern
        matching_files = glob.glob(filled_pattern)
        
        if matching_files:
            print(f"Found files to parse: {matching_files}")
            
            # Parse the first matching file
            file_to_parse = matching_files[0]
            try:
                file_description = parse_file(file_to_parse, home=parse_function_home, dataset=parse_function_dataset)
                print(f"File description for first file: {file_description}")
                print("-" * 50)
            except Exception as e:
                print(f"Error parsing file {file_to_parse}: {e}")
        else:
            print(f"No files found matching pattern: {filled_pattern}")

        return
    
    for tile in tiles:
        # Replace placeholders in the pattern
        filled_pattern = pattern.format(cycle=cycle, well=well, tile=f"{tile:03d}")
        
        # Find files matching the pattern
        matching_files = glob.glob(filled_pattern)
        
        if matching_files:
            # Select the first matching file
            print(f"Found files to parse: {matching_files}")
            
            # Parse the file
            try:
                file_to_parse = matching_files[0]
                file_description = parse_file(file_to_parse, home=parse_function_home, dataset=parse_function_dataset)
                print(f"File description for tile {tile}: {file_description}")
                print("-" * 50)
            except Exception as e:
                print(f"Error parsing file for tile {tile}: {e}")
        else:
            print(f"No files found matching pattern for tile {tile}: {filled_pattern}")

## SET PARAMETERS

### Check that file patterns match your data

- `PARSE_FUNCTION_HOME` and `PARSE_FUNCTION_DATASET`: The base directory and dataset name for the parsing function.
- `SBS_INPUT_PATTERN_METADATA` and `PH_INPUT_PATTERN_METADATA`: The file naming conventions and directory structures for SBS and PH images, with the tile replaced by a wildcard. The files matched across all tiles are used together to compile metadata.
- `SBS_INPUT_PATTERN` and `PH_INPUT_PATTERN`: The file naming conventions and directory structures for SBS and PH images of a single tile, selected through the `{tile}` placeholder.

Ensure these variables accurately reflect your experimental setup to guarantee correct data processing and analysis.
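
The placeholders in these patterns are ordinary `str.format` fields, so a pattern can be filled and inspected before any file is touched. The sketch below copies the pattern strings used in this notebook; `fill_pattern` is a hypothetical helper written for illustration and is not part of `ops`.

```python
# Patterns copied from this notebook; `fill_pattern` is a hypothetical helper.
SBS_INPUT_PATTERN_METADATA = "input/sbs/*C{cycle}_Wells-{well}_Points-*__Channel*.nd2"
SBS_INPUT_PATTERN = "input/sbs/*C{cycle}_Wells-{well}_Points-{tile:0>3}__Channel*.nd2"
PH_INPUT_PATTERN = "input/ph/*Wells-{well}_Points-{tile:0>3}__Channel*.nd2"


def fill_pattern(pattern, well, cycle=None, tile=None):
    """Fill the placeholders of a glob pattern without touching the filesystem."""
    fields = {"well": well, "cycle": cycle}
    if tile is not None:
        # Pass the tile as a string: the `0>3` spec right-aligns it and pads with zeros.
        fields["tile"] = str(tile)
    # str.format ignores keyword arguments the pattern does not use (e.g. cycle for PH).
    return pattern.format(**fields)


print(fill_pattern(SBS_INPUT_PATTERN_METADATA, well="A1", cycle=1))
print(fill_pattern(SBS_INPUT_PATTERN, well="A1", cycle=1, tile=1))
print(fill_pattern(PH_INPUT_PATTERN, well="A1", tile=100))
```

Because `{tile:0>3}` pads to three characters, tile 1 becomes `Points-001`. Files that number tiles with four digits (for example `Points-0001`) would not match this pattern and would need a wider pad.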

Acceptable ND2 file format:

The parsing functions expect ND2 files to follow these naming conventions:

1. Cycle information (for SBS only) should be in a subdirectory named `/c{number}/` in the file path.
2. Well information should be present as `Wells-XX_` or `WellXX_` in the filename.
3. For multi-tile experiments, tile information should be present as `Points-####` in the filename.
4. Channel information should be present as `Channel{name}_` in the filename.
5. Phenotype images should have `input_ph` in the file path.
6. SBS images should have `input_sbs` in the file path.

Example acceptable filenames:

- SBS: `/lab/example/screens/dataset/input_sbs/c1/acquisition_date_folder/Wells-A1_Points-0001_ChannelDAPI_Seq0000.nd2`
- PH: `/lab/example/screens/dataset/input_ph/acquisition_date_folder/Wells-A1_Points-0001_ChannelDAPI_Seq0000.nd2`
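
The conventions listed above can be checked against a path with a few regular expressions. This is a minimal sketch and not the parser used by `ops`, whose exact rules may differ. The function name `describe_nd2_path` and the assumption that a well is one uppercase letter followed by digits are illustrative.

```python
import re

# Hypothetical checker mirroring the listed conventions; not the `ops` parser.
CYCLE_RE = re.compile(r"/c(\d+)/")                       # cycle subdirectory, SBS only
WELL_RE = re.compile(r"(?:Wells-|Well)([A-Z]\d+)_")      # assumes wells look like A1, B12
TILE_RE = re.compile(r"Points-(\d+)")                    # tile number, digits kept as text
CHANNEL_RE = re.compile(r"Channel(.+?)_")                # channel name up to the next "_"


def describe_nd2_path(path):
    """Return the fields the listed conventions would expose for one path."""
    if "input_sbs" in path:
        modality = "sbs"
    elif "input_ph" in path:
        modality = "ph"
    else:
        modality = None  # neither marker is present in the path

    info = {"modality": modality}
    for key, regex in (
        ("cycle", CYCLE_RE),
        ("well", WELL_RE),
        ("tile", TILE_RE),
        ("channel", CHANNEL_RE),
    ):
        match = regex.search(path)
        info[key] = match.group(1) if match else None
    return info


sbs_path = (
    "/lab/example/screens/dataset/input_sbs/c1/acquisition_date_folder/"
    "Wells-A1_Points-0001_ChannelDAPI_Seq0000.nd2"
)
ph_path = (
    "/lab/example/screens/dataset/input_ph/acquisition_date_folder/"
    "Wells-A1_Points-0001_ChannelDAPI_Seq0000.nd2"
)
print(describe_nd2_path(sbs_path))
print(describe_nd2_path(ph_path))
```

A field that comes back as `None` points to the convention the path does not satisfy. For the PH example, `cycle` is `None` because phenotype paths carry no cycle subdirectory.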

In [6]:
# Parse function parameters
PARSE_FUNCTION_HOME = "input"
PARSE_FUNCTION_DATASET = "example_dataset"

# File patterns for SBS and PH images with placeholders (find all tiles to compile metadata)
SBS_INPUT_PATTERN_METADATA = 'input/sbs/*C{cycle}_Wells-{well}_Points-*__Channel*.nd2'
PH_INPUT_PATTERN_METADATA = 'input/ph/*Wells-{well}_Points-*__Channel*.nd2'

# File patterns for SBS and PH images
SBS_INPUT_PATTERN = 'input/sbs/*C{cycle}_Wells-{well}_Points-{tile:0>3}__Channel*.nd2'
PH_INPUT_PATTERN = 'input/ph/*Wells-{well}_Points-{tile:0>3}__Channel*.nd2'
# Phenotype example files are too large to be included on GitHub

# Test SBS_INPUT_PATTERN_METADATA
print("Testing SBS_INPUT_PATTERN_METADATA:")
sbs_parsed = find_and_parse_file(PARSE_FUNCTION_HOME, PARSE_FUNCTION_DATASET, SBS_INPUT_PATTERN_METADATA, well='A1', cycle=1)

# Test PH_INPUT_PATTERN_METADATA
print("\nTesting PH_INPUT_PATTERN_METADATA:")
ph_parsed = find_and_parse_file(PARSE_FUNCTION_HOME, PARSE_FUNCTION_DATASET, PH_INPUT_PATTERN_METADATA, well='A1', cycle=None)

# Test SBS_INPUT_PATTERN
print("\nTesting SBS_INPUT_PATTERN:")
sbs_parsed = find_and_parse_file(PARSE_FUNCTION_HOME, PARSE_FUNCTION_DATASET, SBS_INPUT_PATTERN, well='A1', cycle=1, tiles=[1, 100])

# Test PH_INPUT_PATTERN
print("\nTesting PH_INPUT_PATTERN:")
ph_parsed = find_and_parse_file(PARSE_FUNCTION_HOME, PARSE_FUNCTION_DATASET, PH_INPUT_PATTERN, well='A1', cycle=None, tiles=[1, 100])

Testing SBS_INPUT_PATTERN_METADATA:
Found files to parse: ['input/sbs/P001_SBS_10x_C1_Wells-A1_Points-001__Channel_Cy7,Cy5,AF594,Cy3_SBS,DAPI_SBS.nd2', 'input/sbs/P001_SBS_10x_C1_Wells-A1_Points-100__Channel_Cy7,Cy5,AF594,Cy3_SBS,DAPI_SBS.nd2']
File description for first file: {'home': 'input', 'dataset': 'example_dataset', 'ext': 'tif', 'well': 'A1'}
--------------------------------------------------

Testing PH_INPUT_PATTERN_METADATA:
Found files to parse: ['input/ph/P001_Pheno_20x_Wells-A1_Points-100__Channel_AF750,Cy3,GFP,DAPI.nd2', 'input/ph/P001_Pheno_20x_Wells-A1_Points-001__Channel_AF750,Cy3,GFP,DAPI.nd2']
File description for first file: {'home': 'input', 'dataset': 'example_dataset', 'ext': 'tif', 'well': 'A1'}
--------------------------------------------------

Testing SBS_INPUT_PATTERN:
Found files to parse: ['input/sbs/P001_SBS_10x_C1_Wells-A1_Points-001__Channel_Cy7,Cy5,AF594,Cy3_SBS,DAPI_SBS.nd2']
File description for tile 1: {'home': 'input', 'dataset': 'example_dataset