# <font color="red"> CellProfiler: profiling of P-Bodies (DCP1A, LSM14A) dNLS across all batches </font>
 
JIRA task: NN-83

Why Linear Mixed-Effects Models (LMMs)?

Your experimental structure involves:

- Two groups: dNLS_Untreated vs dNLS_DOX

- 5 batches per group

- One measurement per site image (the dependent variable), with 50–250 site images per batch (i.e., image-level measurements)

- Random variation across batches, modeled as a random intercept per batch

This design includes both fixed effects (groups) and random effects (batches).

Goal: estimate how dNLS_DOX affects each CellProfiler feature relative to dNLS_Untreated, accounting for batch effects (inter-batch variation).
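
Written out, the random-intercept design described above can be sketched as the following model. The symbols are notation introduced here for illustration and do not appear in the analysis code.

$$
y_{ij} = \beta_0 + \beta_1\,\mathrm{DOX}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here $y_{ij}$ is the feature value of site image $i$ in batch $j$, $\mathrm{DOX}_{ij}$ is 1 for dNLS_DOX and 0 for dNLS_Untreated, $\beta_1$ is the fixed effect of interest, $u_j$ is the random intercept of batch $j$, and $\varepsilon_{ij}$ is the image-level residual.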

In [1]:
from pathlib import Path
import glob
import sys
import os


os.environ['NOVA_HOME'] = '/home/projects/hornsteinlab/Collaboration/NOVA/'
os.environ['NOVA_DATA_HOME'] = f"{os.environ['NOVA_HOME']}/input"
print('NOVA_HOME is at', os.getenv('NOVA_HOME'))
sys.path.insert(1, os.getenv('NOVA_HOME'))


import numpy as np
import pandas as pd
import seaborn as sns
from markdown import markdown
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

from cell_profiler.code.cp_effect_size_utils import CP_OUTPUTS_FOLDER, validate_cp_files, extract_path_parts, merge_on_group, collect_cp_results_by_cell_line, load_cp_results, get_features_per_image, get_aggregated_features_per_image, collect_all_features, run_analysis_generate_report, print_mixedlm_conclusions, measures_to_plot
from manuscript.plot_config import PlotConfig

%load_ext autoreload    
%autoreload 2

NOVA_HOME is at /home/projects/hornsteinlab/Collaboration/NOVA/


# New dNLS dataset - DCP1A

In [2]:
ANALYSIS_TYPE = 'PB_profiling/dNLS_DCP1A'
BATCHES = ['batch1', 'batch2', 'batch4', 'batch5', 'batch6']

# Save figures here
save_path = '/home/projects/hornsteinlab/Collaboration/NOVA/outputs/vit_models/finetunedModel_MLPHead_acrossBatches_B56789_80pct_frozen/figures/dNLS/cell_profiler/PB_profiling/dNLS_DCP1A'

# Font
FONT_PATH = '/home/projects/hornsteinlab/sagyk/anaconda3/envs/nova/fonts/arial.ttf'
from matplotlib import font_manager as fm
import matplotlib
fm.fontManager.addfont(FONT_PATH)
matplotlib.rcParams['font.family'] = 'Arial'

plt.rcParams.update({
    'font.family': 'Arial',
    'font.size': 6
})

In [3]:
group_by_columns = ['ImageNumber', 'batch', 'rep', 'cell_line', 'condition']
REQUIRED_FILES = ['Image.csv', 'Pbodies.csv', 'Cytoplasm.csv']


In [4]:

# Test CP outputs (number of images)
if False:
    pattern = os.path.join(CP_OUTPUTS_FOLDER, ANALYSIS_TYPE, '*', '*', '*', '*', '*', '*')
    # store marker folders by cell line
    for marker_path in glob.glob(pattern):
        if os.path.isdir(marker_path):
            try:
                image_df = pd.read_csv(marker_path +'/Image.csv')
                #print(marker_path, image_df.shape)
                #print(image_df[['Count_Pbodies', 'Count_nucleus']].head(10))
                
                # DEBUG CODE: to recognise problems in CP writing to the wrong folder
                # parts_df = image_df['PathName_nucleus'].apply(extract_path_parts)
                parts_df = image_df['PathName_DAPI'].apply(extract_path_parts)
                
                print(marker_path, parts_df['batch'].unique(), parts_df['cell_line'].unique(), parts_df['condition'].unique(), parts_df['rep'].unique(), )
                # DEBUG CODE

                marker = os.path.basename(marker_path)    
                cell_line = Path(marker_path).resolve().parents[3].name
            except FileNotFoundError as e:
                print("!!!!")
                print(e)
        else:
            print(f"Not a marker folder directory:{marker_path}")
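
The debug cell above relies on the folder layout of the CP outputs. As a rough illustration, a marker folder path can be split into its components as below. This assumes the layout `.../<batch>/<cell_line>/<panel>/<condition>/<rep>/<marker>`; `parse_marker_path` is a hypothetical helper written for this sketch (it is not the repository's `extract_path_parts`), and the path prefix in the example is made up.

```python
from pathlib import PurePosixPath

# Hypothetical helper for illustration only; not the repository's extract_path_parts.
# Assumed layout: .../<batch>/<cell_line>/<panel>/<condition>/<rep>/<marker>
FIELDS = ("batch", "cell_line", "panel", "condition", "rep", "marker")


def parse_marker_path(marker_path):
    """Split the last six path components into named fields."""
    parts = PurePosixPath(marker_path).parts
    if len(parts) < len(FIELDS):
        raise ValueError(f"Path too short to be a marker folder: {marker_path}")
    return dict(zip(FIELDS, parts[-len(FIELDS):]))


# The prefix "/data/cp_outputs" is a placeholder
example = "/data/cp_outputs/PB_profiling/dNLS_LSM14A/batch5/WT/panelH/Untreated/rep1/LSM14A"
info = parse_marker_path(example)
print(info)

# Same cell line as the parents[3] lookup used in the debug cell
cell_line_from_parents = PurePosixPath(example).parents[3].name
print(cell_line_from_parents)
```

Under this layout, `parents[3]` of a marker folder is the cell-line folder, which matches the lookup used in the debug cell.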


## Collect CP files by "cell_line+condition" and Load CP data

In [5]:
# Collect paths of CP output files
paths_by_cell_line = collect_cp_results_by_cell_line(ANALYSIS_TYPE, include_condition=True)

In [6]:
# Load CP data
cp_data = load_cp_results(paths_by_cell_line, REQUIRED_FILES)

number of subjects from cell line WT_Untreated: 10
number of subjects from cell line dNLS_Untreated: 15
number of subjects from cell line dNLS_DOX: 15


In [7]:
# Get the calculated features from all CP output files
cp_measurements = collect_all_features(cp_data, group_by_columns)

⚠️ WT_Untreated: Removed 4 of 1441 site images with 0 nuclei.
WT_Untreated (1437, 6) (1441, 32) (1437, 18)
(1441, 33)
(1441, 46)
⚠️ dNLS_Untreated: Removed 2 of 2149 site images with 0 nuclei.
dNLS_Untreated (2147, 6) (2149, 32) (2147, 18)
(2149, 33)
(2149, 46)
⚠️ dNLS_DOX: Removed 4 of 1828 site images with 0 nuclei.
dNLS_DOX (1824, 6) (1828, 32) (1824, 18)
(1828, 33)
(1828, 46)
Shape after merging is: (5418, 46)
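
The warnings above report site images dropped because they contain no nuclei. The internals of `collect_all_features` are not shown here, so the following is only a minimal sketch of such a filter. It assumes a per-image table with a `Count_nucleus` column (as in CellProfiler's `Image.csv`); `drop_images_without_nuclei` is a hypothetical helper and the values are made up.

```python
import pandas as pd

# Toy stand-in for an Image.csv table; all values are made up
image_df = pd.DataFrame({
    "ImageNumber": [1, 2, 3, 4],
    "Count_nucleus": [12, 0, 7, 0],
    "Count_Pbodies": [30, 0, 18, 2],
})


def drop_images_without_nuclei(df, nuclei_col="Count_nucleus"):
    """Hypothetical helper: return (filtered copy, number of removed rows)."""
    keep = df[nuclei_col] > 0
    return df.loc[keep].copy(), int((~keep).sum())


filtered_df, n_removed = drop_images_without_nuclei(image_df)
print(f"Removed {n_removed} of {len(image_df)} site images with 0 nuclei.")
```

Returning a copy keeps later column assignments on the filtered table from triggering pandas' SettingWithCopyWarning.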


# Remove batch 3, remove WT, add new variable "group"

In [8]:
# Keep only the batches listed in BATCHES (excludes batch 3)
cp_measurements = cp_measurements[cp_measurements['batch'].isin(BATCHES)].copy()

# Add group
cp_measurements['group'] = cp_measurements['cell_line']+"_"+cp_measurements['condition']

# Filter by lines
lines_to_include = ["dNLS_Untreated", "dNLS_DOX"]
cp_measurements = cp_measurements[cp_measurements['group'].isin(lines_to_include)].copy()
print(cp_measurements.shape)

# The reference group must be listed first so that mixedlm() uses it as the baseline; the column has to be Categorical!
cp_measurements["group"] = pd.Categorical(
    cp_measurements["group"],
    categories=lines_to_include,
    ordered=True
)


(3977, 47)
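
To see why the explicit category order in the cell above matters, the short sketch below compares pandas' default category order with an explicit one. It uses toy labels only. With formula-based models in statsmodels, the first category is, by default, the baseline that other levels are compared against.

```python
import pandas as pd

groups = pd.Series(["dNLS_DOX", "dNLS_Untreated", "dNLS_DOX", "dNLS_Untreated"])

# Default: categories are sorted, so dNLS_DOX would come first
default_cat = pd.Categorical(groups)

# Explicit: the reference group is listed first
explicit_cat = pd.Categorical(groups, categories=["dNLS_Untreated", "dNLS_DOX"])

print(list(default_cat.categories))   # ['dNLS_DOX', 'dNLS_Untreated']
print(list(explicit_cat.categories))  # ['dNLS_Untreated', 'dNLS_DOX']
```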


In [9]:
#cp_measurements[['rep', 'group']].value_counts()


In [10]:
#cp_measurements[['batch']].value_counts()

In [11]:
#cp_measurements[['group']].value_counts()

# Effect size modeling

The terms in the formula:
- feature_value: The CellProfiler feature being modeled (e.g., mean number of P-bodies).
- group: A fixed effect testing dNLS_DOX against the reference group, dNLS_Untreated.
- batch: Random intercept for each batch, accounting for correlation among site images from the same batch.

What This Model Gives You:
- Estimate of the group difference: dNLS_DOX vs dNLS_Untreated, with significance testing.

- Within-batch variability: Captures how consistent measurements are across site images within a batch (residual_var in the results table).

- Between-batch variability: Captures how much the feature level shifts from batch to batch (group_var in the results table).

- P-values and confidence intervals: For the significance of the group effect.
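
As a minimal sketch of a random-intercept fit for this notebook's design (fixed effect of `group`, random intercept per `batch`), the example below uses statsmodels' `mixedlm` on simulated data. The data, the effect size of -0.05 and the noise levels are all made up for illustration. The exact call made inside `run_analysis_generate_report` is not shown in this notebook, so this is not necessarily how the report is produced.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
groups = ["dNLS_Untreated", "dNLS_DOX"]  # reference group first
batches = ["batch1", "batch2", "batch4", "batch5", "batch6"]
true_effect = -0.05  # made-up DOX effect used only for the simulation

rows = []
for batch in batches:
    batch_shift = rng.normal(0, 0.05)  # simulated random intercept of this batch
    for group in groups:
        mean = 1.5 + batch_shift + (true_effect if group == "dNLS_DOX" else 0.0)
        for value in rng.normal(mean, 0.05, size=60):  # 60 simulated site images
            rows.append({"batch": batch, "group": group, "feature_value": value})

df = pd.DataFrame(rows)
df["group"] = pd.Categorical(df["group"], categories=groups)

# Fixed effect: group; random intercept: batch
model = smf.mixedlm("feature_value ~ group", data=df, groups=df["batch"])
result = model.fit()

term = "group[T.dNLS_DOX]"
effect = result.params[term]
ci_lower, ci_upper = result.conf_int().loc[term]
print(f"effect={effect:.4f}, 95% CI=[{ci_lower:.4f}, {ci_upper:.4f}]")
print("p-value:", result.pvalues[term])
print("between-batch variance:", float(result.cov_re.iloc[0, 0]))
print("residual variance:", result.scale)
```

The fixed-effect estimate, its confidence interval, and the two variance components correspond to the quantities listed above.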



In [12]:
#cp_measurements[['group', 'batch', 'num_pb']].groupby(['group', 'batch'], observed=False).describe()
#cp_measurements[['gene_group', 'num_pb']].groupby('gene_group', observed=False).describe()

In [13]:
# Select the CellProfiler features to estimate effect sizes for
# cp_features_columns = [col for col in cp_measurements.columns if col not in group_by_columns + ['batch', 'group']]
cp_features_columns = ['mean_AreaShape_MeanRadius']
results_df_DCP1A = run_analysis_generate_report(
                                df=cp_measurements,
                                feature_columns=cp_features_columns,
                                group_col="group",
                                batch_col="batch",
                                output_dir=os.path.join(CP_OUTPUTS_FOLDER, ANALYSIS_TYPE, 'mixed_effect_report')
)

results_df_DCP1A





Analysing CP feature: mean_AreaShape_MeanRadius
❌ Random effect variance is near zero. — Unable to fit random intercept (e.g., low variance or convergence issue)
⚠️ Fallback to fixed-effects model for feature: mean_AreaShape_MeanRadius
                            OLS Regression Results                            
Dep. Variable:          feature_value   R-squared:                       0.469
Model:                            OLS   Adj. R-squared:                  0.469
Method:                 Least Squares   F-statistic:                     702.7
Date:                Mon, 06 Oct 2025   Prob (F-statistic):               0.00
Time:                        18:41:16   Log-Likelihood:                 5916.5
No. Observations:                3977   AIC:                        -1.182e+04
Df Residuals:                    3971   BIC:                        -1.178e+04
Df Model:                           5                                         
Covariance Type:            nonrobust             

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  _, results_df_wo_intercept["fdr_pval_global"], _, _ = multipletests(results_df_wo_intercept["pval"], method="fdr_bh")


Unnamed: 0,feature,comparison,effect_size,pval,ci_lower,ci_upper,group_var,residual_var,significance,fit_status,used_fixed_model,aic,bic,loglik,r_squared,fdr_pval_global
0,mean_AreaShape_MeanRadius,Intercept,1.517047,0.0,1.51187,1.522225,,0.002992,****,fallback_ols_fixed_batch,True,-11820.905108,-11783.17541,5916.452554,0.469429,
1,mean_AreaShape_MeanRadius,dNLS_DOX,-0.055598,3.287573e-197,-0.059032,-0.052163,,0.002992,****,fallback_ols_fixed_batch,True,-11820.905108,-11783.17541,5916.452554,0.469429,3.287573e-197
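
The log above reports a fallback to OLS with `fit_status` set to `fallback_ols_fixed_batch` and `Df Model: 5`, which is consistent with one group contrast plus four batch dummy variables. The fallback code itself is not shown in this notebook, so the formula below is an assumption; the sketch fits such a fixed-batch OLS model on simulated data with made-up values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
groups = ["dNLS_Untreated", "dNLS_DOX"]
batches = ["batch1", "batch2", "batch4", "batch5", "batch6"]

# Simulated image-level data: 50 site images per batch and group (values are made up)
df = pd.DataFrame(
    [(b, g) for b in batches for g in groups for _ in range(50)],
    columns=["batch", "group"],
)
df["group"] = pd.Categorical(df["group"], categories=groups)
is_dox = (df["group"] == "dNLS_DOX").to_numpy()
df["feature_value"] = 1.5 - 0.05 * is_dox + rng.normal(0, 0.05, size=len(df))

# Assumed fallback: batch enters as a fixed categorical covariate instead of a random intercept
ols_result = smf.ols("feature_value ~ group + C(batch)", data=df).fit()

n_model_terms = int(ols_result.df_model)  # 1 group contrast + 4 batch dummies
dox_effect = ols_result.params["group[T.dNLS_DOX]"]
print("Df Model:", n_model_terms)
print(f"dNLS_DOX effect: {dox_effect:.4f}")
```

With batch as a fixed covariate, the model still adjusts for batch-level shifts, but it no longer estimates a between-batch variance, which is why `group_var` is empty in the results table.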


# Plotting

In [None]:

def plot_cp_feature_grouped_by_gene(cp_measurements, cp_feature_col, group_col="gene_group", patient_col="patient_id", color_mapping=None, model_results_df=None, pdf_file=None):
    
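    df = cp_measurements.copy()
    # Treat a missing color mapping as empty so the lookups below do not fail on None
    color_mapping = color_mapping or {}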

    # Define fixed color and label mapping for both group_col and patient_col
    _palette = {}
    groups = df[group_col].astype(str).unique().tolist() + df[patient_col].astype(str).unique().tolist()
    for g in groups:
        if g in color_mapping: _palette[color_mapping[g]['alias']] = color_mapping[g]['color']
    if patient_col=='batch':
        for b in df[patient_col].astype(str).unique().tolist():
            _palette[b]='gray'

    
    # Rename groups to aliases
    label_mapping = {k: v["alias"] for k, v in color_mapping.items() if k in groups}
    df[group_col] = df[group_col].cat.rename_categories(label_mapping)
    df[patient_col] = df[patient_col].astype("category").cat.rename_categories(label_mapping)
    
    # Determine group order and x-axis positions for each group (reversed for visual preference)
    groups_order = sorted(df[group_col].unique(), reverse=True)
    x_spacing = 0.4
    x_pos_map = {label: i * x_spacing for i, label in enumerate(groups_order)}

    
    # Setup plot
    sns.set(style="white", font_scale=1.0)
    fig, ax = plt.subplots(figsize=(3, 4))
    line_width = 1

    
    # ============================
    # Plot each group manually by numeric x
    # ============================
    for i, group in enumerate(groups_order):
        
        group_data = df[df[group_col] == group]
        xpos = x_pos_map[group]
        
        # Boxplot for group
        sns.boxplot(
            data=group_data,
            y=cp_feature_col,
            ax=ax,
            width=0.3,
            linewidth=line_width,
            showfliers=False,
            showmeans=True,
            meanline=True,
            meanprops={"linestyle": "-", "color": "black", "linewidth": line_width},
            boxprops=dict(facecolor='none', edgecolor='black', linewidth=line_width),
            whiskerprops=dict(linewidth=line_width-0.3, color='black'),
            capprops=dict(linewidth=line_width, color='black'),
            medianprops=dict(visible=False),
            positions=[x_pos_map[group]]
        )

        # ============================
        # Full distribution: raw cell-level/image-level points (light gray)
        # ============================ 
        ax.scatter(
            x=np.random.normal(loc=x_pos_map[group], scale=0.05, size=len(group_data)),  # jitter
            y=group_data[cp_feature_col],
            color='lightgray',
            s=2.5,
            alpha=0.4,
            zorder=1
        )

    # ============================
    # Overlay per-batch means (means as colored points)
    # ============================
    batch_means = df.groupby([group_col, patient_col], observed=True)[cp_feature_col].mean().reset_index()

    for _, row in batch_means.iterrows():
        group = row[group_col]
        batch = row[patient_col]

        # Skip if value is missing
        if pd.isna(group) or pd.isna(batch) or pd.isna(row[cp_feature_col]):
            continue

        xpos = x_pos_map.get(group)
        y = row[cp_feature_col]

        if pd.isna(xpos) or pd.isna(y):
            continue

        jittered_x = np.random.normal(loc=xpos, scale=0.05)

        ax.scatter(
            x=jittered_x,
            y=y,
            color=_palette.get(group, 'black'),
            edgecolor=None,
            s=5,
            zorder=3,
            label=batch
        )

    # Deduplicate legend
    handles, labels = ax.get_legend_handles_labels()
    unique = dict(zip(labels, handles))
    ax.legend(
        unique.values(),
        unique.keys(),
        title=patient_col,
        bbox_to_anchor=(1.02, 1),
        loc="upper left"
    )

    # Set axis formatting
    ax.set_xlim(-x_spacing+0.1, max(x_pos_map.values()) + x_spacing - 0.1)
    
    ax.set_xticks(list(x_pos_map.values()))
    ax.set_xticklabels(groups_order, rotation=90)
    ax.set_ylabel(cp_feature_col)
    ax.margins(x=0)
    
    # Ensure tick marks are shown on both axes
    ax.tick_params(axis='both', which='both', direction='out',
                   length=4, width=1, bottom=True, top=False, left=True, right=False)
    

    # ============================
    # P-value annotation LMM
    # ============================
    stat = model_results_df.loc[(model_results_df['comparison'] == 'dNLS_DOX') & (model_results_df['feature'] == cp_feature_col)]
    # Fail with a clear message if this feature was not modeled
    if stat.empty:
        raise ValueError(f"No model results found for feature '{cp_feature_col}'")
    p = float(stat['pval'].iloc[0])
    effect_size = float(stat['effect_size'].iloc[0])
    ci_low = float(stat['ci_lower'].iloc[0])
    ci_high = float(stat['ci_upper'].iloc[0])
    txt = f"Effect size: {effect_size:.4f} \n(p = {p:.3g}, 95% CI: \n[{ci_low:.4f}, {ci_high:.4f}])"

    # Format p display
    if p < 0.0001:
        p_text = "****"
    elif p < 0.001:
        p_text = "***"
    elif p < 0.01:
        p_text = "**"
    elif p < 0.05:
        p_text = "*"
    else:
        p_text = f"n.s. (p = {p:.2f})"
        
    # Plot annotation - use actual plot limits to place annotation
    ymin, ymax = ax.get_ylim()
    y_range = ymax - ymin
    
    line_y = ymax - 0.1 * y_range
    text_y = line_y - 0.02 * y_range

    # Significance - Bridge line between two groups (line_y and text_y were computed above)
    x_keys = list(x_pos_map.values())
    if len(x_keys) >= 2:
        x1, x2 = x_keys[0], x_keys[1]

        ax.plot([x1, x1, x2, x2],
                [line_y, line_y + 0.01*y_range, 
                 line_y + 0.01*y_range, line_y],
                lw=1.5, c='black')

        # Annotation text
        ax.text((x1 + x2) / 2, text_y, p_text, ha='center', va='bottom')
    
    # Remove extra space around plot
    plt.tight_layout()
    # Add top space if needed
    plt.subplots_adjust(top=0.8)
    plt.suptitle(txt, fontsize=8)
    
    # Save the plot
    if pdf_file is not None:
        pdf_file.savefig(fig)
        plt.close(fig)
    else:
        plt.show()
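
The p-value thresholds used for the star annotation in the function above can also be expressed as a small standalone helper, which makes them easy to check in isolation. `p_to_stars` is a hypothetical name introduced for this sketch; the thresholds are copied from the plotting function.

```python
def p_to_stars(p):
    """Map a p-value to the significance label used in the plot annotation."""
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cutoff, stars in thresholds:
        if p < cutoff:
            return stars
    return f"n.s. (p = {p:.2f})"


for p in (3.287573e-197, 0.004, 0.03, 0.2):
    print(p, "->", p_to_stars(p))
```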
    


In [15]:


with PdfPages(f"{save_path}/cell_profiler_dNLS_p_bodies_DCP1A.pdf") as pdf:
    for cp_feature_col in measures_to_plot:
        if cp_feature_col in cp_measurements.columns:
            
            plot_cp_feature_grouped_by_gene(
                cp_measurements,
                cp_feature_col=cp_feature_col,
                group_col='group',
                patient_col="batch",
                color_mapping=PlotConfig().COLOR_MAPPINGS_DOX,
                model_results_df=results_df_DCP1A,
                pdf_file=pdf  
            )

# New dNLS dataset - LSM14A

In [14]:

ANALYSIS_TYPE = 'PB_profiling/dNLS_LSM14A'
BATCHES = ['batch1', 'batch2',  'batch4', 'batch5', 'batch6']

# Save figures here
save_path = '/home/projects/hornsteinlab/Collaboration/NOVA/outputs/vit_models/finetunedModel_MLPHead_acrossBatches_B56789_80pct_frozen/figures/dNLS/cell_profiler/PB_profiling/dNLS_LSM14A'


In [15]:

# Test CP outputs (number of images)
if False:
    pattern = os.path.join(CP_OUTPUTS_FOLDER, ANALYSIS_TYPE, '*', '*', '*', '*', '*', '*')
    # store marker folders by cell line
    for marker_path in glob.glob(pattern):
        if os.path.isdir(marker_path):
            try:
                image_df = pd.read_csv(marker_path +'/Image.csv')
                #print(marker_path, image_df.shape)
                #print(image_df[['Count_Pbodies', 'Count_nucleus']].head(10))
                
                # DEBUG CODE: to recognise problems in CP writing to the wrong folder
                # parts_df = image_df['PathName_nucleus'].apply(extract_path_parts)
                parts_df = image_df['PathName_DAPI'].apply(extract_path_parts)
                
                print(marker_path, parts_df['batch'].unique(), parts_df['cell_line'].unique(), parts_df['condition'].unique(), parts_df['rep'].unique(), )
                # DEBUG CODE

                marker = os.path.basename(marker_path)    
                cell_line = Path(marker_path).resolve().parents[3].name
            except FileNotFoundError as e:
                print("!!!!")
                print(e)
        else:
            print(f"Not a marker folder directory:{marker_path}")


## Collect CP files by "cell_line+condition" and Load CP data

In [16]:
# NOTE: when I ran this, batch5/WT/.../rep1 was empty, so validate=False is passed to overcome this

# Collect paths of CP output files
paths_by_cell_line = collect_cp_results_by_cell_line(ANALYSIS_TYPE, include_condition=True, validate=False)

In [17]:
# NOTE: batch5/WT/.../rep1 was empty when I ran this, so its path is removed before loading
paths_by_cell_line['WT_Untreated'].remove('/home/projects/hornsteinlab/Collaboration/NOVA/cell_profiler/outputs/cell_profiler_RUNS/Final_cp_analysis/PB_profiling/dNLS_LSM14A/batch5/WT/panelH/Untreated/rep1/LSM14A')
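
Instead of removing one hard-coded path, incomplete output folders could be filtered out programmatically. The sketch below is one possible approach, not the repository's validation logic: `keep_complete_marker_dirs` is a hypothetical helper, and it assumes a folder is usable only if it contains every file named in `REQUIRED_FILES`. The example builds temporary folders so that it runs on its own.

```python
import os
import tempfile

REQUIRED_FILES = ["Image.csv", "Pbodies.csv", "Cytoplasm.csv"]


def keep_complete_marker_dirs(paths, required_files):
    """Hypothetical helper: split paths into folders with all required files and the rest."""
    complete, skipped = [], []
    for path in paths:
        missing = [name for name in required_files
                   if not os.path.isfile(os.path.join(path, name))]
        if missing:
            skipped.append(path)
        else:
            complete.append(path)
    return complete, skipped


with tempfile.TemporaryDirectory() as root:
    full_dir = os.path.join(root, "rep2", "LSM14A")
    empty_dir = os.path.join(root, "rep1", "LSM14A")  # mimics the empty rep1 folder
    os.makedirs(full_dir)
    os.makedirs(empty_dir)
    for name in REQUIRED_FILES:
        with open(os.path.join(full_dir, name), "w") as handle:
            handle.write("")

    complete, skipped = keep_complete_marker_dirs([full_dir, empty_dir], REQUIRED_FILES)
    kept_is_full = complete == [full_dir]
    skipped_is_empty = skipped == [empty_dir]

print(f"kept {len(complete)} folder(s), skipped {len(skipped)} folder(s)")
```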


In [18]:
# Load CP data
cp_data = load_cp_results(paths_by_cell_line, REQUIRED_FILES)


number of subjects from cell line WT_Untreated: 9
number of subjects from cell line dNLS_Untreated: 15
number of subjects from cell line dNLS_DOX: 15


In [19]:
# Get the calculated features from all CP output files

LSM14A_PB_in_cyto_measures = [
    "Math_LSM14A_PB_over_cyto", 
    "Math_Texture_Contrast_LSM14A_pb_only_15", 
    "Math_Texture_Contrast_LSM14A_pb_only_3", 
    "Math_Texture_Contrast_LSM14A_pb_only_5",
    "Math_Texture_Contrast_LSM14A_pb_only_9",
    "Math_Texture_Entropy_LSM14A_pb_only_15",
    "Math_Texture_Entropy_LSM14A_pb_only_3",
    "Math_Texture_Entropy_LSM14A_pb_only_5",
    "Math_Texture_Entropy_LSM14A_pb_only_9",
    "Math_Texture_Homogeneity_LSM14A_pb_only_15",
    "Math_Texture_Homogeneity_LSM14A_pb_only_3",
    "Math_Texture_Homogeneity_LSM14A_pb_only_5",
    "Math_Texture_Homogeneity_LSM14A_pb_only_9",

    
]

cp_measurements = collect_all_features(cp_data, group_by_columns, PB_in_cyto_measures=LSM14A_PB_in_cyto_measures)

WT_Untreated (1351, 6) (1351, 32) (1351, 18)
(1351, 33)
(1351, 46)
⚠️ dNLS_Untreated: Removed 1 of 2474 site images with 0 nuclei.
dNLS_Untreated (2473, 6) (2474, 32) (2473, 18)
(2474, 33)
(2474, 46)
⚠️ dNLS_DOX: Removed 2 of 2365 site images with 0 nuclei.
dNLS_DOX (2363, 6) (2365, 32) (2363, 18)
(2365, 33)
(2365, 46)
Shape after merging is: (6190, 46)


# Remove batch 3, remove WT, add new variable "group"

In [20]:
# Keep only the batches listed in BATCHES (excludes batch 3)
cp_measurements = cp_measurements[cp_measurements['batch'].isin(BATCHES)].copy()

# Add group
cp_measurements['group'] = cp_measurements['cell_line']+"_"+cp_measurements['condition']

# Filter by lines
lines_to_include = ["dNLS_Untreated", "dNLS_DOX"]
cp_measurements = cp_measurements[cp_measurements['group'].isin(lines_to_include)].copy()
print(cp_measurements.shape)

# The reference group must be listed first so that mixedlm() uses it as the baseline; the column has to be Categorical!
cp_measurements["group"] = pd.Categorical(
    cp_measurements["group"],
    categories=lines_to_include,
    ordered=True
)


(4839, 47)


In [21]:
#cp_measurements[['rep', 'group']].value_counts()


In [22]:
#cp_measurements[['batch']].value_counts()

In [23]:
#cp_measurements[['group']].value_counts()

# Effect size modeling

The terms in the formula:
- feature_value: The CellProfiler feature being modeled (e.g., mean number of P-bodies).
- group: A fixed effect testing dNLS_DOX against the reference group, dNLS_Untreated.
- batch: Random intercept for each batch, accounting for correlation among site images from the same batch.

What This Model Gives You:
- Estimate of the group difference: dNLS_DOX vs dNLS_Untreated, with significance testing.

- Within-batch variability: Captures how consistent measurements are across site images within a batch (residual_var in the results table).

- Between-batch variability: Captures how much the feature level shifts from batch to batch (group_var in the results table).

- P-values and confidence intervals: For the significance of the group effect.



In [24]:
#cp_measurements[['group', 'batch', 'num_pb']].groupby(['group', 'batch'], observed=False).describe()
#cp_measurements[['gene_group', 'num_pb']].groupby('gene_group', observed=False).describe()

In [25]:
# Select the CellProfiler features to estimate effect sizes for
# cp_features_columns = [col for col in cp_measurements.columns if col not in group_by_columns + ['batch', 'group']]
cp_features_columns = ['mean_AreaShape_MeanRadius']

results_df_LSM14 = run_analysis_generate_report(
                                df=cp_measurements,
                                feature_columns=cp_features_columns,
                                group_col="group",
                                batch_col="batch",
                                output_dir=os.path.join(CP_OUTPUTS_FOLDER, ANALYSIS_TYPE, 'mixed_effect_report')
)

results_df_LSM14





Analysing CP feature: mean_AreaShape_MeanRadius
❌ Random effect variance is near zero. — Unable to fit random intercept (e.g., low variance or convergence issue)
⚠️ Fallback to fixed-effects model for feature: mean_AreaShape_MeanRadius
                            OLS Regression Results                            
Dep. Variable:          feature_value   R-squared:                       0.199
Model:                            OLS   Adj. R-squared:                  0.198
Method:                 Least Squares   F-statistic:                     239.7
Date:                Mon, 06 Oct 2025   Prob (F-statistic):          2.40e-229
Time:                        18:53:47   Log-Likelihood:                 4716.2
No. Observations:                4839   AIC:                            -9420.
Df Residuals:                    4833   BIC:                            -9381.
Df Model:                           5                                         
Covariance Type:            nonrobust             

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  _, results_df_wo_intercept["fdr_pval_global"], _, _ = multipletests(results_df_wo_intercept["pval"], method="fdr_bh")


Unnamed: 0,feature,comparison,effect_size,pval,ci_lower,ci_upper,group_var,residual_var,significance,fit_status,used_fixed_model,aic,bic,loglik,r_squared,fdr_pval_global
0,mean_AreaShape_MeanRadius,Intercept,1.706725,0.0,1.700012,1.713438,,0.008347,****,fallback_ols_fixed_batch,True,-9420.380633,-9381.473853,4716.190317,0.198731,
1,mean_AreaShape_MeanRadius,dNLS_DOX,-0.024686,9.557192e-21,-0.029844,-0.019529,,0.008347,****,fallback_ols_fixed_batch,True,-9420.380633,-9381.473853,4716.190317,0.198731,9.557192e-21


In [28]:

with PdfPages(f"{save_path}/cell_profiler_dNLS_p_bodies_LSM14A.pdf") as pdf:
    for cp_feature_col in measures_to_plot:
        if cp_feature_col in cp_measurements.columns:
            
            plot_cp_feature_grouped_by_gene(
                cp_measurements,
                cp_feature_col=cp_feature_col,
                group_col='group',
                patient_col="batch",
                color_mapping=PlotConfig().COLOR_MAPPINGS_DOX,
                model_results_df=results_df_LSM14,
                pdf_file=pdf  
            )


In [29]:
print("Done!")

Done!
