# Select cells from npz files: output from DeepProfiler

This notebook retrieves single cells from npz files and joins them into one dataframe.

The npz files are expected to follow the DeepProfiler output layout:

```
    |- outputs
    |   |- results
    |   |   |- features
    |   |   |   |- <plate_name>
    |   |   |   |   |- <well_name>
    |   |   |   |   |   |- <site_number>.npz
    |   |   |   |- <plate_name>
    |   |   |   |   |- <well_name>
    |   |   |   |   |   |- <site_number>.npz
    ...
```
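
Before loading, it can help to confirm that the features folder actually follows this layout. The sketch below is a minimal check that uses only the standard library. The function name `list_site_files` is hypothetical, and it assumes exactly the `<plate_name>/<well_name>/<site_number>.npz` nesting shown above.

```python
from pathlib import Path


def list_site_files(features_dir):
    """Return (plate, well, site) tuples for every <plate>/<well>/<site>.npz file.

    Hypothetical helper: assumes the three-level layout shown above.
    """
    found = []
    for npz_path in sorted(Path(features_dir).glob("*/*/*.npz")):
        plate = npz_path.parent.parent.name
        well = npz_path.parent.name
        site = npz_path.stem  # file name without the .npz extension
        found.append((plate, well, site))
    return found
```

Calling `list_site_files(PROFILE_DIR)` once the paths are defined should return one tuple per site. An empty list suggests the folder path or the nesting differs from what is assumed here.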

Requirements:
- DeepProfiler output after feature extraction/profiling
- index.csv file (the same one used for profiling with DeepProfiler)

Environment:
- Python 3.9
- latest version of pycytominer
    - `pip install git+https://github.com/cytomining/pycytominer`

In [1]:
import pandas as pd
import numpy as np
import os
import easygui as eg

from pycytominer.cyto_utils import DeepProfiler_processing

# Inputs

In [2]:
PROJECT_ROOT = eg.diropenbox(msg="Choose an output folder", default=r"D:")
print('Path to save the single cell file', PROJECT_ROOT)

Path to save the single cell file D:\2022_10_04_AgNPCellRecovery_fossa_Cimini\workspace\deepprofiler\2023_04_25_CNN_CellPainting_GFPRNA\profiles


In [None]:
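# Template: replace "experiment_name" with the name of your experiment folder
EXPERIMENT = r"\experiment_name"
PROFILE_DIR = PROJECT_ROOT + EXPERIMENT + r"\outputs\results\features"  # folder with the npz files
OUTPUT_ROOT = PROJECT_ROOT + EXPERIMENT + r"\profiles"  # folder where the CSV is saved
META_FILE = PROJECT_ROOT + EXPERIMENT + r"\inputs\metadata\index.csv"  # index used for profiling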

In [3]:
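# Paths used for this run (this overrides the folder chosen in the dialog above)
PROJECT_ROOT = r"D:\2022_10_04_AgNPCellRecovery_fossa_Cimini\workspace\deepprofiler"
EXPERIMENT = r"\2023_04_25_CNN_CellPainting_GFPRNA"
PROFILE_DIR = PROJECT_ROOT + EXPERIMENT + r"\outputs\results\features"  # folder with the npz files
OUTPUT_ROOT = PROJECT_ROOT + EXPERIMENT + r"\profiles"  # folder where the CSV is saved
META_FILE = PROJECT_ROOT + EXPERIMENT + r"\inputs\metadata\index.csv"  # index used for profiling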
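
The string concatenation above depends on `EXPERIMENT` starting with a backslash and on Windows-style separators. As an alternative, here is a minimal sketch using `pathlib`. The helper name `build_paths` and the root `D:\workspace\deepprofiler` are placeholders for illustration; the folder names are copied from the cell above. `PureWindowsPath` is used only so the example gives the same result on any operating system; in the notebook itself, `Path` would be the natural choice.

```python
from pathlib import PureWindowsPath


def build_paths(project_root, experiment):
    """Return the three paths used in this notebook as a dict.

    Hypothetical helper; `experiment` is given without a leading separator.
    """
    base = PureWindowsPath(project_root) / experiment
    return {
        "PROFILE_DIR": base / "outputs" / "results" / "features",
        "OUTPUT_ROOT": base / "profiles",
        "META_FILE": base / "inputs" / "metadata" / "index.csv",
    }


# Placeholder root and experiment name, for illustration only
paths = build_paths(r"D:\workspace\deepprofiler", "experiment_name")
print(paths["META_FILE"])
```

Each value can be converted with `str(...)` if a plain string is needed downstream.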

# Load functions

`DeepProfilerData`: This class holds all functions needed to load and annotate the DeepProfiler (DP) run.

`SingleCellDeepProfiler`: This class holds functions needed to analyze single cells from the DeepProfiler (DP) run.

In [4]:
deep_data = DeepProfiler_processing.DeepProfilerData(META_FILE, PROFILE_DIR, filename_delimiter="/", file_extension=".npz")
deep_single_cell = DeepProfiler_processing.SingleCellDeepProfiler(deep_data)

# Generate single_cells dataframe

In [5]:
df = deep_single_cell.get_single_cells(output=True)

This program will continue, but be aware that this might induce errors!
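
The message above appears repeatedly in the output. One plausible cause, stated here as an assumption rather than a diagnosis, is that `index.csv` lists sites for which no `.npz` file exists in the features folder. The sketch below lists such sites using only the standard library. The function name `find_missing_sites` is hypothetical. It assumes that `index.csv` contains the `Metadata_Plate`, `Metadata_Well`, and `Metadata_Site` columns and that files follow the `<plate_name>/<well_name>/<site_number>.npz` layout described at the top of this notebook.

```python
import csv
from pathlib import Path


def find_missing_sites(index_csv, features_dir):
    """Return (plate, well, site) tuples listed in index.csv without an .npz file.

    Hypothetical helper: column names and file layout are assumptions.
    """
    missing = []
    with open(index_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            key = (row["Metadata_Plate"], row["Metadata_Well"], row["Metadata_Site"])
            npz_path = Path(features_dir, key[0], key[1], key[2] + ".npz")
            if not npz_path.is_file():
                missing.append(key)
    return missing
```

Running `find_missing_sites(META_FILE, PROFILE_DIR)` and comparing the length of the result with the number of warnings is a quick way to test this assumption.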

In [6]:
df

Unnamed: 0,Location_Center_X,Location_Center_Y,Metadata_pert_name_replicate,Metadata_Well,Metadata_Site,Metadata_Plate,Metadata_Mito,Metadata_DNA,Metadata_ER,Metadata_RNA,...,efficientnet_662,efficientnet_663,efficientnet_664,efficientnet_665,efficientnet_666,efficientnet_667,efficientnet_668,efficientnet_669,efficientnet_670,efficientnet_671
0,360,105,1,B2,10,220528_102915_Plate_1,220528_102915_Plate_1/B2_01_4_10_CY5_001.tif,220528_102915_Plate_1/B2_01_1_10_DAPI_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,...,0.844353,0.868463,-0.241533,1.480911,1.636794,0.325552,3.284743,1.408531,0.463014,1.110101
1,522,204,1,B2,10,220528_102915_Plate_1,220528_102915_Plate_1/B2_01_4_10_CY5_001.tif,220528_102915_Plate_1/B2_01_1_10_DAPI_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,...,0.848369,0.728093,-0.204476,1.660685,1.567816,0.573118,2.896473,1.803001,0.722755,1.360648
2,920,226,1,B2,10,220528_102915_Plate_1,220528_102915_Plate_1/B2_01_4_10_CY5_001.tif,220528_102915_Plate_1/B2_01_1_10_DAPI_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,...,0.920560,0.753859,-0.110862,1.728471,1.557163,0.529493,3.099245,1.187923,0.643885,0.889264
3,315,244,1,B2,10,220528_102915_Plate_1,220528_102915_Plate_1/B2_01_4_10_CY5_001.tif,220528_102915_Plate_1/B2_01_1_10_DAPI_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,...,0.976197,0.735847,-0.187575,1.629080,1.661231,0.583495,3.063155,1.544220,0.611376,0.776780
4,462,290,1,B2,10,220528_102915_Plate_1,220528_102915_Plate_1/B2_01_4_10_CY5_001.tif,220528_102915_Plate_1/B2_01_1_10_DAPI_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,220528_102915_Plate_1/B2_01_2_10_GFP_001.tif,...,0.775841,1.022027,-0.152649,1.639837,1.662376,0.404519,2.915474,1.580802,0.666315,2.227555
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64038,1043,807,10,G11,8,220609_145227_Plate_1,220609_145227_Plate_1/G11_01_4_8_CY5_001.tif,220609_145227_Plate_1/G11_01_1_8_DAPI_001.tif,220609_145227_Plate_1/G11_01_2_8_GFP_001.tif,220609_145227_Plate_1/G11_01_2_8_GFP_001.tif,...,0.920263,1.002181,-0.158819,2.230305,1.588596,0.485931,2.257285,2.406876,0.516575,1.595753
64039,201,224,10,G11,9,220609_145227_Plate_1,220609_145227_Plate_1/G11_01_4_9_CY5_001.tif,220609_145227_Plate_1/G11_01_1_9_DAPI_001.tif,220609_145227_Plate_1/G11_01_2_9_GFP_001.tif,220609_145227_Plate_1/G11_01_2_9_GFP_001.tif,...,0.828373,0.866036,-0.145758,1.499509,1.470202,0.799537,2.608975,0.997253,0.682500,1.411311
64040,728,335,10,G11,9,220609_145227_Plate_1,220609_145227_Plate_1/G11_01_4_9_CY5_001.tif,220609_145227_Plate_1/G11_01_1_9_DAPI_001.tif,220609_145227_Plate_1/G11_01_2_9_GFP_001.tif,220609_145227_Plate_1/G11_01_2_9_GFP_001.tif,...,0.920129,0.711448,-0.029279,1.378243,1.337997,0.275439,3.137587,0.915618,0.548644,1.507946
64041,399,773,10,G11,9,220609_145227_Plate_1,220609_145227_Plate_1/G11_01_4_9_CY5_001.tif,220609_145227_Plate_1/G11_01_1_9_DAPI_001.tif,220609_145227_Plate_1/G11_01_2_9_GFP_001.tif,220609_145227_Plate_1/G11_01_2_9_GFP_001.tif,...,0.721044,1.117528,-0.080933,1.769346,1.331954,0.288646,2.946376,1.538728,0.405498,1.602361


# Export

In [14]:
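# Build the output path once so the saved file and the printed message always match
output_file = OUTPUT_ROOT + r'/' + EXPERIMENT + r'single_cells.csv'
df.to_csv(output_file, index=False)  # index=False: do not write the row index
print('Successfully exported to:', output_file)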

Successfully exported to: D:\2022_10_04_AgNPCellRecovery_fossa_Cimini\workspace\deepprofiler\2023_04_25_CNN_CellPainting_GFPRNA\profiles/\2023_04_25_CNN_CellPainting_GFPRNAsingle_cells.csv
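
As the printed path shows, the file name is built by plain concatenation, which leaves a doubled separator (`profiles/\`) and no separator before `single_cells.csv`. The sketch below is a tidier alternative. The helper name `single_cell_csv_path` and the `<experiment>_single_cells.csv` naming are assumptions, not the notebook's original behavior. It also creates the output folder if it does not exist yet.

```python
from pathlib import Path


def single_cell_csv_path(output_root, experiment):
    """Return <output_root>/<experiment>_single_cells.csv, creating the folder.

    Hypothetical helper; strips any leading or trailing slash or backslash
    from `experiment` before building the file name.
    """
    out_dir = Path(output_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    name = experiment.strip("\\/") + "_single_cells.csv"
    return out_dir / name
```

It could then be used as `df.to_csv(single_cell_csv_path(OUTPUT_ROOT, EXPERIMENT), index=False)`.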
