# Batch processing of large data sets

For large data sets, we recommend processing the files in parallel batches to speed up analysis. Below are scripts for structural and motion analysis using Python's `multiprocessing` module.

Further down is an example of pooling data from multiple tif-files (e.g., all files of one experiment) into one DataFrame for comparative analyses.

## Structural analysis

In [None]:
# Copy this code into a .py file and run that file.

import glob
from multiprocessing import Pool

from sarcasm import *

# select folder with tif files
folder = 'D:/2023_SarcAsM_drugs_chronic/'

# find all tif files in folder
tif_files = glob.glob(folder + '*/*.tif')
print(f'{len(tif_files)} tif-files found')

# function for analysis of single tif-file
def analyze_tif(file):
    print(file)
    # initialize SarcAsM object
    sarc_obj = SarcAsM(file)

    # predict sarcomere z-bands and cell area
    sarc_obj.predict_z_bands(size=(2048, 2048))
    sarc_obj.predict_cell_area(size=(2048, 2048))

    # analyze cell area and sarcomere area
    sarc_obj.analyze_cell_area(timepoints='all')
    sarc_obj.analyze_sarcomere_area(timepoints='all')

    # analyze sarcomere structures
    sarc_obj.full_analysis_structure(timepoints='all')

    print(f'{file} successfully analyzed!')


# set number of parallel worker processes
n_pools = 3

if __name__ == '__main__':
    with Pool(n_pools) as p:
        p.map(analyze_tif, tif_files)
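
With `p.map`, an exception raised for a single file is re-raised in the parent process and aborts the `with` block. The following is a minimal sketch of one way to isolate such failures; it is not part of SarcAsM. The body of `analyze_tif` is a stand-in placeholder, and `analyze_tif_safely`, `split_results`, and the file names are hypothetical and introduced only for illustration.

```python
from multiprocessing import Pool


def analyze_tif(file):
    """Stand-in for the real per-file analysis (hypothetical placeholder)."""
    if 'corrupt' in file:
        raise ValueError('could not read image data')
    return len(file)


def analyze_tif_safely(file):
    """Run analyze_tif and return (file, error) instead of raising."""
    try:
        analyze_tif(file)
        return file, None
    except Exception as e:
        return file, repr(e)


def split_results(results):
    """Separate (file, error) pairs into succeeded files and failed files."""
    succeeded = [f for f, err in results if err is None]
    failed = {f: err for f, err in results if err is not None}
    return succeeded, failed


if __name__ == '__main__':
    # hypothetical file names, for illustration only
    tif_files = ['a.tif', 'corrupt.tif', 'b.tif']
    with Pool(2) as p:
        # imap_unordered yields each result as soon as a worker finishes
        results = list(p.imap_unordered(analyze_tif_safely, tif_files))
    succeeded, failed = split_results(results)
    print(f'{len(succeeded)} files analyzed, {len(failed)} failed')
    for file, err in failed.items():
        print(f'  {file}: {err}')
```

In a real script, the body of `analyze_tif` would be the analysis function defined above, and the failed files could be inspected and re-run separately.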

## Motion analysis

In [None]:
# Copy this code into a .py file and run that file.

import glob
from multiprocessing import Pool

from sarcasm import *

folder = 'D:/SarcAsM_drugs/'

# find all tif files in the subfolders (folder already ends with '/'), in reverse order
files = glob.glob(folder + '*/*.tif')[::-1]
print(f'{len(files)} tif-files found')


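def find_rois(file):
    # predict Z-bands, analyze sarcomere length and orientation at the first
    # time point, and detect ROIs for the subsequent motion analysis
    sarc_obj = SarcAsM(file)
    sarc_obj.predict_z_bands(siam_unet=True)
    sarc_obj.analyze_sarcomere_length_orient(timepoints=0)
    sarc_obj.detect_rois(timepoint=0)


def analyze_motion(file):
    # analyze every ROI of the file; report failures without stopping the batch
    rois = get_rois_of_cell(file)
    for roi_file, roi in rois:
        try:
            mot_obj = Motion(roi_file, roi)
            mot_obj.full_analysis_roi()
        except Exception as e:
            print(roi_file, roi)
            print(repr(e))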


if __name__ == '__main__':
    # find ROIs
    with Pool(4) as p:
        p.map(find_rois, files)
    
    # analyze ROIs
    with Pool(12) as p:
        p.map(analyze_motion, files)

## Pool analyses of multiple tif-files in one DataFrame

After a set of tif-files has been analyzed, SarcAsM can pool the results into a single pandas DataFrame for comparative analyses.

The `MultiStructureAnalysis` class is designed for comparing structure across files. It iterates through a list of tif-files, adds metadata using regex functions, extracts structure data, and stores everything in one pandas DataFrame. The DataFrame can be saved to and loaded from a specified data folder. For details, see the [API reference](../_autosummary/sarcasm.export.MultiStructureAnalysis.html).

The `MultiROIAnalysis` class is the counterpart for comparing ROIs. It iterates through a list of tif-files and ROI names, adds metadata using regex functions, extracts motion data, and stores everything in one pandas DataFrame. As with `MultiStructureAnalysis`, the DataFrame can be saved to and loaded from a specified data folder. For details, see the [API reference](../_autosummary/sarcasm.export.MultiROIAnalysis.html).
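
The regex functions used for metadata map a filename to a value, as the `date` function in the usage example of this section does. A minimal sketch of how such functions might be written so that a non-matching filename yields `None` rather than an error is given here. `make_extractor` is a hypothetical helper, not part of SarcAsM, and the patterns assume filenames of the form `20230502_wt_isoprenaline_10uM.tif`.

```python
import re


def make_extractor(pattern, default=None):
    """Return a function mapping a filename to the first regex group, or default."""
    compiled = re.compile(pattern)

    def extract(filename):
        match = compiled.search(filename)
        return match.group(1) if match else default

    return extract


# patterns assume names like '20230502_wt_isoprenaline_10uM.tif' (assumption)
date = make_extractor(r'(\d{8})')
genotype = make_extractor(r'\d{8}_([A-Za-z0-9]+)_')
concentration = make_extractor(r'_(\d+(?:\.\d+)?[munp]M)')

filename = '20230502_wt_isoprenaline_10uM.tif'
print(date(filename), genotype(filename), concentration(filename))
# prints: 20230502 wt 10uM
```

Assuming the classes accept further keyword arguments in the same way as `date`, such functions could be passed alongside it; check the API reference before relying on this.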

Below is an example of how to use `MultiStructureAnalysis`; `MultiROIAnalysis` is used analogously.


In [None]:
import glob
import re

from sarcasm import *

# select folder with tif files and folder for the pooled data
files_folder = 'D:/2023_SarcAsM_drugs_chronic/'
data_folder = 'D:/2023_SarcAsM_drugs_chronic/data/'

# find all tif files in folder
tif_files = glob.glob(files_folder + '*/*.tif')
print(f'{len(tif_files)} tif-files found')

# example regex function to extract date from filename (e.g. '20230502_wt_isoprenaline_10uM.tif')
date = lambda filename: re.search(r'(\d{4})(\d{2})(\d{2})', filename).group(0)

# initialize MultiStructureAnalysis object
multi_structure = MultiStructureAnalysis(list_files=tif_files, folder=data_folder, experiment='test', date=date)

# specify structure and metadata keys
structure_keys = ['z_length_mean', 'cell_area']  # for more keys, see print(structure_keys_default)
meta_keys = ['tif_name', 'file_id']  # for more keys, see print(meta_keys_default)

# get structure data of tif-files
multi_structure.get_data(structure_keys=structure_keys, meta_keys=meta_keys)

The resulting pandas DataFrame can be loaded and accessed as follows:

In [None]:
import pandas as pd

# read data frame
df_experiment = pd.read_pickle('/path/to/dataframe.pd')

# access a column, e.g. the mean Z-band lengths
z_length = df_experiment['z_length_mean']
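
Once loaded, the pooled DataFrame can be compared across conditions with standard pandas operations. The sketch below groups by a metadata column and summarizes two structure keys. It uses a small synthetic DataFrame with made-up values in place of real results, and it assumes that metadata such as `date` is stored as a column.

```python
import pandas as pd

# synthetic stand-in for the pooled DataFrame; values are made up for illustration
df_experiment = pd.DataFrame({
    'date': ['20230502', '20230502', '20230503', '20230503'],
    'z_length_mean': [1.0, 1.2, 1.4, 1.6],
    'cell_area': [100.0, 120.0, 200.0, 220.0],
})

# mean and number of files per date for the selected structure keys
summary = (
    df_experiment
    .groupby('date')[['z_length_mean', 'cell_area']]
    .agg(['mean', 'count'])
)
print(summary)
```

The same pattern applies to any other metadata column, for example a drug or concentration extracted from the filename.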