deepprofiler project processing features #23

Merged 3 commits on Jan 18, 2023
7 changes: 5 additions & 2 deletions 4_processing_features/4.extract_sc_features.sh
@@ -1,3 +1,6 @@
 #!/bin/bash
-jupyter nbconvert --to python extract_single_cell_features.ipynb
-python extract_single_cell_features.py
+jupyter nbconvert --to python *.ipynb
+
+python extract_sc_features_cp.py
+
+python extract_sc_features_dp.py
2 changes: 1 addition & 1 deletion 4_processing_features/4.processing_features.yml
@@ -14,4 +14,4 @@ dependencies:
 - conda-forge::numpy=1.22
 - conda-forge::scikit-learn
 - pip:
-- git+https://github.com/cytomining/pycytominer@e43be528d3ca6d77d2b1a347fe8e4131dfc65bf0
+- git+https://github.com/cytomining/pycytominer@afac3ea16818ad25f37318ecd5c5090c0eff5806
17 changes: 10 additions & 7 deletions 4_processing_features/README.md
@@ -1,17 +1,18 @@
 # 4. Processing Extracted Single Cell Features

-In this module, we present our pipeline for processing outputted `.sqlite` file with single cell features from CellProfiler.
-The processed features are saved into compressed `.csv.gz` for use during statistical analysis.
+In this module, we present our pipeline for processing outputted `.sqlite` file with single cell features from CellProfiler (CP) and DeepProfiler (DP).
+
+The processed CP features are saved into compressed `.csv.gz` and DP features are saved as `.npz` files for use during statistical analysis.

 ## Pycytominer

-We use [Pycytominer](https://github.com/cytomining/pycytominer) to perform the aggregation, merging, and normalization of the NF1 single cell features.
+We use [Pycytominer](https://github.com/cytomining/pycytominer) to perform the merging, normalization, and feature selection of the NF1 single cell features.

 For more information regarding the functions that we used, please see [the documentation](https://pycytominer.readthedocs.io/en/latest/pycytominer.cyto_utils.html#pycytominer.cyto_utils.cells.SingleCells.merge_single_cells) from the Pycytominer team.

 ### Normalization

-CellProfiler features can display a variety of distributions across cells.
+CellProfiler and DeepProfiler features can display a variety of distributions across cells.
 To facilitate analysis, we standardize all features (z-score) to the same scale.

 ---
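For readers unfamiliar with the z-score standardization that the README's Normalization section mentions, the following is a minimal sketch of the idea using only the Python standard library. It is not the pipeline's code (the README states that Pycytominer performs the normalization); the function name `zscore_columns`, the sample values, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def zscore_columns(rows):
    """Standardize each column of a row-major table to mean 0 and unit variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Assumption: population standard deviation (divide by N, not N - 1).
    stds = [pstdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical values: three cells, two features on very different scales.
cells = [[10.0, 0.001], [20.0, 0.002], [30.0, 0.003]]
normalized = zscore_columns(cells)
```

After this transformation both columns have mean 0 and standard deviation 1, so features measured on different scales become directly comparable.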
@@ -27,15 +28,17 @@ Make sure you are in the `4_processing_features` directory before performing the
 conda env create -f 4.processing_features.yml
 ```

-## Step 2: Normalize Single Cell Features
+## Step 2: Normalize and Feature Select Single Cell Features

 ### Step 2a: Set Up Paths

-Within the [extract_single_cell_features.ipynb](4_processing_features/extract_single_cell_features.ipynb) notebook, you can chnage the paths to reflect the local paths or names for your machine (***IF* you changed anything from the original pipeline**) for the various parameters (e.g. CellProfiler directory, output directory, path to sqlite file, etc.)
+Within the [extract_sc_features_cp.ipynb](4_processing_features/extract_sc_features_cp.ipynb) notebook, you can change the paths to reflect the local paths or names for your machine (***IF* you changed anything from the original pipeline**) for the various parameters (e.g. CellProfiler directory, output directory, path to sqlite file, etc.)
+
+As well, you can update the paths with the [extract_sc_features_dp.ipynb](4_processing_features/extract_sc_features_dp.ipynb) notebook if the paths to the project are different on your local machine.

 ### Step 2b: Run Extract Single Cell Features

-Using the code below, run the notebook to extract and normalize single cell features from CellProfiler.
+Using the code below, run the notebook to extract and normalize single cell features from CellProfiler and DeepProfiler.

 ```bash
 # Run this script in terminal
Binary file not shown.
Binary file not shown.
257 changes: 257 additions & 0 deletions 4_processing_features/data/nf1_sc_norm_deepprofiler_cyto.csv.gz

Large diffs are not rendered by default.

258 changes: 258 additions & 0 deletions 4_processing_features/data/nf1_sc_norm_deepprofiler_nuc.csv.gz

Large diffs are not rendered by default.
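
Because the `.csv.gz` outputs are not rendered in the diff, a reviewer may want to inspect them locally. The sketch below shows one way to read such a file with the Python standard library, assuming it is an ordinary gzip-compressed CSV as the README describes. The helper `read_csv_gz`, the demo file, and its column names are hypothetical stand-ins, not the pipeline's actual columns.

```python
import csv
import gzip
import os
import tempfile


def read_csv_gz(path):
    """Read a gzip-compressed CSV into a list of dicts, one per row."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle))


# Write a tiny stand-in file so the sketch runs without the real data.
# The column names here are hypothetical.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_features.csv.gz")
with gzip.open(demo_path, mode="wt", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["cell_id", "feature_a"])
    writer.writerow(["1", "0.5"])
    writer.writerow(["2", "-0.5"])

rows = read_csv_gz(demo_path)
```

To look at a file from this PR instead, pass its path (for example `data/nf1_sc_norm_deepprofiler_nuc.csv.gz`, relative to `4_processing_features`) to `read_csv_gz`.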

Binary file not shown.
Binary file not shown.