# Join CSV files from CellProfiler and add prefix

Usage:

1) CellProfiler's ExportToSpreadsheet module generates several CSV files, one per exported compartment, each containing the features measured for that compartment.

2) Those files need to be joined into a single table, with the compartment name added as a prefix to each feature (`Nuclei_Intensity...`, `Cells_Intensity...`, and so on).


Output:

- A single CSV file with single cells as rows and features and metadata as columns.
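The join described above can be sketched with plain pandas. This is only an illustration, not the implementation of `generate_profiles.add_prefix_compartments`: the helper name `prefix_and_join`, the toy column names, and the assumption that every compartment table lists the same cells in the same row order are all hypothetical.

```python
import pandas as pd


def prefix_and_join(tables, columns_to_pop):
    """Prefix feature columns with the compartment name and join side by side.

    `tables` maps a compartment name to its DataFrame (hypothetical helper).
    Columns starting with any prefix in `columns_to_pop` are removed from
    every table; the ones from the first table are kept once, unprefixed.
    Assumes all tables have the same rows in the same order.
    """
    shared = None
    parts = []
    for name, table in tables.items():
        popped = [c for c in table.columns if c.startswith(tuple(columns_to_pop))]
        if shared is None:
            shared = table[popped]
        parts.append(table.drop(columns=popped).add_prefix(f"{name}_"))
    return pd.concat([shared, *parts], axis="columns")


# Toy stand-ins for Nuclei.csv and Cells.csv
nuclei = pd.DataFrame(
    {"Metadata_Plate": ["p1", "p1"], "ObjectNumber": [1, 2], "AreaShape_Area": [10.0, 12.0]}
)
cells = pd.DataFrame(
    {"Metadata_Plate": ["p1", "p1"], "ObjectNumber": [1, 2], "AreaShape_Area": [50.0, 61.0]}
)

joined = prefix_and_join({"Nuclei": nuclei, "Cells": cells}, ["Metadata"])
print(list(joined.columns))
```

In this sketch the metadata columns appear once without a prefix, and every remaining column carries its compartment name.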


In [1]:
import pandas as pd
import easygui as eg
import sys

path_scripts = r"C:\Users\Fer\Documents\GitHub"
sys.path.append(path_scripts)

from scripts_notebooks_fossa.profiles import generate_profiles

%load_ext autoreload
%autoreload 2

# 0. Inputs

- `plate` is the list of folder names (one per plate, or day of experiment) where your CSV files are located;

- `compartments` is the list of compartment names, which match the CSV file names. Each `plate` folder contains one CSV file per compartment:
    - Nuclei.csv
    - Cytoplasm.csv
    - Cells.csv

- `list_columns_to_pop` lists the prefixes of the columns we want to drop before adding the compartment name.
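Before running the join, it can help to confirm that every expected file is present. The sketch below assumes the layout `<path>/<plate>/<compartment>.csv`, which is inferred from the description above rather than confirmed; the helper names `expected_csv_files` and `missing_csv_files` and the relative root folder are hypothetical.

```python
from pathlib import Path


def expected_csv_files(root, plates, compartments):
    """Return the CSV path expected for every plate/compartment pair."""
    return [Path(root) / pl / f"{comp}.csv" for pl in plates for comp in compartments]


def missing_csv_files(root, plates, compartments):
    """Return the expected CSV paths that do not exist on disk."""
    return [p for p in expected_csv_files(root, plates, compartments) if not p.is_file()]


# Hypothetical root folder; replace with your own `path`
files = expected_csv_files(
    "analysis/EMT", ["20190808", "20190815"], ["Nuclei", "Cytoplasm", "Cells"]
)
print(len(files))
```

An empty list from `missing_csv_files` would mean that all plate folders are complete under the assumed layout.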



In [2]:
path = r'G:\My Drive\Fernanda Mestrado\Paper Mestrado\Redo_Analysis_Paper\analysis\EMT'
compartments = ['Nuclei', 'Cytoplasm', 'Cells']
list_columns_to_pop = ['Metadata', 'FileName', 'PathName']
plate = ['20190808', '20190815']

# 1. Create one df for each plate and save it to a list

In [3]:
all_dfs = []
for pl in plate:
    df_temp = generate_profiles.add_prefix_compartments(compartments, path, pl, list_columns_to_pop)
    all_dfs.append(df_temp)

In [16]:
all_dfs[0]

Unnamed: 0,Metadata_Plate,Metadata_Cell,Metadata_Protein,Metadata_Time,Metadata_Treatment,Nuclei_ImageNumber,Nuclei_ObjectNumber,Nuclei_Children_Cells_Count,Nuclei_Children_Cytoplasm_Count,Nuclei_Intensity_IntegratedIntensityEdge_Nuclei,...,Cells_Texture_SumEntropy_Protein_5_02,Cells_Texture_SumEntropy_Protein_5_03,Cells_Texture_SumVariance_Protein_5_00,Cells_Texture_SumVariance_Protein_5_01,Cells_Texture_SumVariance_Protein_5_02,Cells_Texture_SumVariance_Protein_5_03,Cells_Texture_Variance_Protein_5_00,Cells_Texture_Variance_Protein_5_01,Cells_Texture_Variance_Protein_5_02,Cells_Texture_Variance_Protein_5_03
0,20190808,PC3,Snail,4h,Control,1,1,1,1,9.383932,...,5.456967,5.425059,191.948071,184.907008,193.773311,190.992653,54.236262,57.252439,55.646816,55.234558
1,20190808,PC3,Snail,4h,Control,1,2,1,1,9.886381,...,5.685881,5.595374,206.481644,210.672992,212.047876,177.353652,66.992664,64.325949,64.570203,68.316726
2,20190808,PC3,Snail,4h,Control,1,3,1,1,10.600504,...,6.191680,6.276585,699.369921,574.892043,609.263147,607.001914,188.876571,228.606661,224.089327,211.093882
3,20190808,PC3,Snail,4h,Control,1,4,1,1,6.309667,...,7.196163,7.091717,3163.062301,2715.307388,3174.295046,2258.462393,1172.011048,1171.499796,1158.763365,1176.284750
4,20190808,PC3,Snail,4h,Control,1,5,1,1,8.120485,...,5.798726,5.719633,274.100819,256.785017,295.645530,250.872001,90.022875,89.616504,88.714423,96.230384
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70946,20190808,PC3,Zeb1,24h,TGFB,565,227,1,1,9.798734,...,7.257031,7.199247,2815.747000,1965.719881,3050.380301,3009.828413,1058.997270,1143.503895,1061.394403,997.975600
70947,20190808,PC3,Zeb1,24h,TGFB,565,228,1,1,11.393729,...,6.681632,6.586075,1001.192577,839.096011,1074.796814,896.275075,385.828305,387.501275,373.091782,385.182157
70948,20190808,PC3,Zeb1,24h,TGFB,565,229,1,1,9.153460,...,4.751839,4.459987,43.815322,39.613620,50.735802,37.634574,19.110248,18.220444,18.222840,18.804477
70949,20190808,PC3,Zeb1,24h,TGFB,565,230,1,1,9.635401,...,7.129989,6.791806,2873.797994,2606.589985,3122.932393,2301.302055,1241.665078,1294.778155,1192.754576,1266.990407


# 2. Join all dfs into a single df

In [9]:
df = pd.concat(all_dfs, axis='index').reset_index(drop=True)
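The effect of `reset_index(drop=True)` after the concatenation can be seen on a toy example. The two small frames below are made up for illustration; only the `pd.concat` and `reset_index` calls mirror the cell above.

```python
import pandas as pd

# Two toy "plates" with the same columns
plate_a = pd.DataFrame({"Metadata_Plate": ["A", "A"], "Nuclei_AreaShape_Area": [10.0, 12.0]})
plate_b = pd.DataFrame({"Metadata_Plate": ["B"], "Nuclei_AreaShape_Area": [9.0]})

# Stacking rows keeps each frame's original index labels, so they repeat
stacked = pd.concat([plate_a, plate_b], axis="index")
print(stacked.index.tolist())   # [0, 1, 0]

# Dropping the old index gives one continuous index for the combined table
combined = stacked.reset_index(drop=True)
print(combined.index.tolist())  # [0, 1, 2]
```

Without the reset, label-based selection such as `df.loc[0]` would return one row per plate instead of a single cell.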

# 3. Export

In [14]:
output_name = 'EMT' + '_SingleCells'

In [15]:
output_path = eg.diropenbox(msg="Choose an output folder", default=r"G:")
df.to_csv(fr"{output_path}/{output_name}.csv", index=False)
print('Successfully exported to:', fr"{output_path}/{output_name}.csv")

Successfully exported to: C:\Users\Fer\Desktop/EMT_SingleCells.csv
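`eg.diropenbox` returns `None` when the dialog is cancelled, in which case the f-string above would build a path that starts with `None`. A small guard, sketched here with a hypothetical helper `build_output_file` and a made-up folder name, makes that case explicit before writing anything.

```python
from pathlib import Path


def build_output_file(output_dir, output_name):
    """Return `<output_dir>/<output_name>.csv`, or raise if no folder was chosen."""
    if output_dir is None:
        raise ValueError("No output folder selected; export cancelled.")
    return Path(output_dir) / f"{output_name}.csv"


# "results" is a placeholder for the folder returned by the dialog
target = build_output_file("results", "EMT_SingleCells")
print(target.name)
```

The returned path can be passed directly to `df.to_csv(target, index=False)`.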


# APPENDIX 

## Split df by something

If you need to analyze only part of your df, for example because it contains two completely different proteins, you can filter it with a query and export the df in parts.

In [35]:
proteinX = df.query("Metadata_Protein == 'proteinX'").reset_index(drop=True)

In [37]:
output_name = 'proteinX'
# index=False matches the main export and avoids writing an extra index column
proteinX.to_csv(fr"{output_path}/{output_name}.csv", index=False)
print('Successfully exported to:', fr"{output_path}/{output_name}.csv")

Successfully exported to: G:\My Drive\Fernanda Mestrado\Paper Mestrado\Redo_Analysis_Paper\profiles\EMT/EMT_Snail.csv
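When the df has to be split by every value of a column rather than a single one, `groupby` avoids writing one query per protein. This is a sketch under assumptions: the helper `split_by_column` and the toy frame are hypothetical, and the export line is left as a comment because it depends on your `output_path`.

```python
import pandas as pd


def split_by_column(df, column):
    """Return a dict mapping each value of `column` to its rows, re-indexed."""
    return {
        value: group.reset_index(drop=True)
        for value, group in df.groupby(column)
    }


# Toy data standing in for the joined single-cell table
toy = pd.DataFrame(
    {
        "Metadata_Protein": ["Snail", "Zeb1", "Snail"],
        "Cells_AreaShape_Area": [50.0, 61.0, 47.0],
    }
)

parts = split_by_column(toy, "Metadata_Protein")
for protein, part in parts.items():
    # part.to_csv(f"{output_path}/EMT_{protein}.csv", index=False) would export each piece
    print(protein, len(part))
```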
