WayScience · roshankern · Jun 23, 2022 · Jun 22, 2022 · Jun 22, 2022 · Jun 22, 2022
diff --git a/.gitignore b/.gitignore
@@ -14,3 +14,13 @@ __pycache__/
 1.preprocess_data/labeled_frames_preprocessed
 #segmentation data
 2.segment_nuclei/segmented
+#DeepProfiler repo
+3.extract_features/DeepProfiler
+#DeepProfiler project images
+3.extract_features/inputs/images
+#DeepProfiler project locations
+3.extract_features/inputs/locations
+#DeepProfiler outputs
+3.extract_features/outputs/efn_pretrained/features
+3.extract_features/outputs/efn_pretrained/logs
+3.extract_features/outputs/efn_pretrained/summaries
diff --git a/2.segment_nuclei/README.md b/2.segment_nuclei/README.md
@@ -5,7 +5,7 @@ In this module, we present our pipeline for segmenting nuclei from the mitosis m
 
 ### Segmentation
 
-We use the CellPose nucleus model to segment the nuclei from each mitosis movie. 
+We use the CellPose segmentation algorithm to segment the nuclei from each mitosis movie.
 CellPose was first introduced in [Stringer, C., Wang, T., Michaelos, M. et al., 2020](https://doi.org/10.1038/s41592-020-01018-x) and we use the [python implementation](https://github.com/mouseland/cellpose).
 
 Stringer et al. trained the CellPose segmentation models on a diverse set of cell images, so CellPose is a good selection for our use case.

diff --git a/3.extract_features/3.compile_deepprof__training_proj.sh b/3.extract_features/3.compile_deepprof__training_proj.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+jupyter nbconvert --to python compile_deepprof_training_proj.ipynb
+python compile_deepprof_training_proj.py
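The notebook that this script converts and runs writes one nuclei locations file per frame. As an illustration only, the sketch below shows that conversion for a single frame, assuming a segmentation `.tsv` with `Location_Center_X` and `Location_Center_Y` columns as in the notebook. The helper name `write_locations_csv`, the example plate, well, and frame names, and the extra `Area` column are hypothetical and not part of this PR.

```python
import pathlib
import tempfile

import pandas as pd


def write_locations_csv(segments_tsv: pathlib.Path, locations_csv: pathlib.Path) -> int:
    """Convert one segmentation .tsv into a locations .csv and return the nuclei count.

    Hypothetical helper: it mirrors the column selection and renaming done in the
    notebook's compile_training_locations, but for a single file.
    """
    segments = pd.read_csv(segments_tsv, sep="\t")
    # Keep only the nucleus centers and rename them to the Nuclei_ prefixed columns
    locations = segments[["Location_Center_X", "Location_Center_Y"]].rename(
        columns={
            "Location_Center_X": "Nuclei_Location_Center_X",
            "Location_Center_Y": "Nuclei_Location_Center_Y",
        }
    )
    locations_csv.parent.mkdir(parents=True, exist_ok=True)
    locations.to_csv(locations_csv, index=False)
    return len(locations)


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    # Made-up segmentation output for one frame (plate LT0001_01, well 1, frame 1)
    tsv_path = tmp_path / "LT0001_01_1_1.tsv"
    pd.DataFrame(
        {
            "Location_Center_X": [10.5, 20.0],
            "Location_Center_Y": [3.0, 4.5],
            "Area": [100, 120],  # hypothetical extra column that gets dropped
        }
    ).to_csv(tsv_path, sep="\t", index=False)

    # Same naming pattern as the notebook: plate/well_frame-site-Nuclei.csv
    csv_path = tmp_path / "locations" / "LT0001_01" / "1_1-1-Nuclei.csv"
    nuclei_count = write_locations_csv(tsv_path, csv_path)
    result_columns = list(pd.read_csv(csv_path).columns)

print(nuclei_count, result_columns)
```

Returning the row count makes it easy to log frames where segmentation found no nuclei, which the notebook currently does not report.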
diff --git a/3.extract_features/3.feature_extraction_env.yml b/3.extract_features/3.feature_extraction_env.yml
@@ -0,0 +1,7 @@
+name: 3.feature_extraction_mitocheck
+channels:
+  - conda-forge
+dependencies:
+  - conda-forge::python=3.8.13
+  - conda-forge::jupyter=1.0.0
+  - conda-forge::pandas=1.4.2
diff --git a/3.extract_features/README.md b/3.extract_features/README.md
@@ -0,0 +1,67 @@
+# 3. Extract Features
+
+In this module, we present our pipeline for extracting features from the mitosis movies.
+### Feature Extraction
+
+We use [DeepProfiler](https://github.com/cytomining/DeepProfiler), commit [`2fb3ed3`](https://github.com/cytomining/DeepProfiler/commit/2fb3ed3027cded6676b7e409687322ef67491ec7), to extract features from the mitosis movies. 
+
+We use a [pretrained model](https://github.com/broadinstitute/luad-cell-painting/tree/main/outputs/efn_pretrained/checkpoint) from the [LUAD Cell Painting repository](https://github.com/broadinstitute/luad-cell-painting) with DeepProfiler.
+[Caicedo et al., 2022](https://www.molbiolcell.org/doi/10.1091/mbc.E21-11-0538) trained this model to extract features from Cell Painting data.
+This model extracts features from the DNA (nuclei) channel and is thus a good selection for our use case.
+
+## Step 1: Setup Feature Extraction Environment
+
+### Step 1a: Create Feature Extraction Environment
+
+```sh
+# Run this command to create the conda environment for feature extraction
+conda env create -f 3.feature_extraction_env.yml
+```
+
+### Step 1b: Activate Feature Extraction Environment
+
+```sh
+# Run this command to activate the conda environment for DeepProfiler feature extraction
+
+conda activate 3.feature_extraction_mitocheck
+```
+
+## Step 2: Install DeepProfiler
+
+### Step 2a: Clone Repository
+
+Clone the DeepProfiler repository into 3.extract_features/ with:
+
+```console
+# Move into 3.extract_features/ (skip this step if you are already there)
+cd 3.extract_features/
+git clone https://github.com/cytomining/DeepProfiler.git
+```
+
+### Step 2b: Install Repository
+
+Install the DeepProfiler repository with:
+
+```console
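+cd DeepProfiler/
+# Optional: check out the commit referenced above for reproducibility
+git checkout 2fb3ed3027cded6676b7e409687322ef67491ec7
+pip install -e .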
+```
+
+### Step 2c (Optional): Complete TensorFlow GPU Setup
+
+If you would like to use TensorFlow with a GPU when running DeepProfiler, follow [these instructions](https://www.tensorflow.org/install/pip#3_gpu_setup) to complete the TensorFlow GPU setup.
+We use TensorFlow with a GPU when processing the MitoCheck data.
+
+## Step 3: Compile DeepProfiler Project
+
+```bash
+# Run this script from 3.extract_features/ to compile the DeepProfiler project
+bash 3.compile_deepprof__training_proj.sh
+```
+
+## Step 4: Extract Features with DeepProfiler
+
+```sh
+# Run this command to extract features with DeepProfiler
+python3 -m deepprofiler --gpu 0 --exp efn_pretrained --root ./ --config mitocheck_profiling_config.json profile
+```
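Before running Step 4, it can help to confirm that Step 3 produced the inputs DeepProfiler will read. The sketch below is a minimal pre-flight check, assuming only the three locations that the compile notebook writes to (`inputs/metadata/index.csv`, `inputs/images/`, `inputs/locations/`). The helper name `missing_project_inputs` is hypothetical, and the check does not cover the profiling config or the model checkpoint.

```python
import pathlib
import tempfile

# Paths written by the compile notebook, relative to the project root
EXPECTED_INPUTS = (
    "inputs/metadata/index.csv",
    "inputs/images",
    "inputs/locations",
)


def missing_project_inputs(root: pathlib.Path) -> list:
    """Return the expected project inputs that do not exist under root."""
    return [rel for rel in EXPECTED_INPUTS if not (root / rel).exists()]


# Demonstration on a throwaway directory where the locations folder is absent
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "inputs/metadata").mkdir(parents=True)
    (root / "inputs/metadata/index.csv").touch()
    (root / "inputs/images").mkdir()
    missing = missing_project_inputs(root)

print(missing)  # ['inputs/locations']
```

In the real project, calling `missing_project_inputs(pathlib.Path("."))` from 3.extract_features/ should return an empty list once Step 3 has completed.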
diff --git a/3.extract_features/compile_deepprof_training_proj.ipynb b/3.extract_features/compile_deepprof_training_proj.ipynb
@@ -0,0 +1,236 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# DeepProfiler Project Compiler\n",
+    "### Compile a [DeepProfiler Project](https://cytomining.github.io/DeepProfiler-handbook/docs/2.%20Project%20structure.html) from training data\n",
+    "\n",
+    "\n",
+    "#### Import libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import pathlib\n",
+    "\n",
+    "from PIL import Image\n",
+    "import shutil"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Define Functions for Compiling Project"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def get_gene(plate: str, well: str, annotations: pd.DataFrame) -> str:\n",
+    "    \"\"\"get the gene targeted in a particular well of a particular plate\n",
+    "\n",
+    "    Args:\n",
+    "        plate (string): plate name\n",
+    "        well (string): well name\n",
+    "        annotations (pandas.DataFrame): annotations loaded from screen metadata annotations.csv.gz file\n",
+    "\n",
+    "    Returns:\n",
+    "        string: gene targeted in this well, or \"failed_QC\" if no gene target is annotated\n",
+    "    \"\"\"\n",
+    "    # match on the plate name and the well number without leading zeros\n",
+    "    well_rows = annotations[(annotations[\"Plate\"] == plate) & (annotations[\"Well Number\"] == str(int(well)))]\n",
+    "    # .item() raises ValueError unless exactly one row matches\n",
+    "    target_gene = well_rows[\"Original Gene Target\"].item()\n",
+    "    if pd.isna(target_gene):\n",
+    "        target_gene = \"failed_QC\"\n",
+    "    return target_gene\n",
+    "\n",
+    "def compile_index_csv(preproc_training_path: pathlib.Path, annotations_path: pathlib.Path, save_path: pathlib.Path):\n",
+    "    \"\"\"compile index.csv from training data used by DeepProfiler, save index.csv to save_path\n",
+    "\n",
+    "    Args:\n",
+    "        preproc_training_path (pathlib.Path): path to preprocessed images folder\n",
+    "        annotations_path (pathlib.Path): path to screen annotations.csv.gz file\n",
+    "        save_path (pathlib.Path): path to save folder for index.csv file\n",
+    "    \"\"\"\n",
+    "    index_csv_data = []\n",
+    "    annotations = pd.read_csv(annotations_path, compression='gzip', dtype=object)\n",
+    "    for plate_path in preproc_training_path.iterdir():\n",
+    "        for well_path in plate_path.iterdir():\n",
+    "            for frame_path in well_path.iterdir():\n",
+    "                for file_path in frame_path.iterdir():\n",
+    "                    index_csv_line = {\n",
+    "                        \"Metadata_Plate\": plate_path.name,\n",
+    "                        \"Metadata_Well\": f\"{well_path.name}_{frame_path.name}\",\n",
+    "                        \"Metadata_Site\": 1,\n",
+    "                        \"Plate_Map_Name\": f\"{plate_path.name}_{well_path.name}_{frame_path.name}\",\n",
+    "                        \"DNA\": f\"{plate_path.name}/{well_path.name}/{frame_path.name}/{file_path.name}\",\n",
+    "                        \"Gene\": get_gene(plate_path.name, well_path.name, annotations),\n",
+    "                        \"Gene_Replicate\": 1\n",
+    "                        }\n",
+    "                    index_csv_data.append(index_csv_line)\n",
+    "    index_csv_data = pd.DataFrame(index_csv_data)\n",
+    "    save_path.parents[0].mkdir(parents=True, exist_ok=True)\n",
+    "    index_csv_data.to_csv(save_path, index=False)\n",
+    "    \n",
+    "def compile_training_locations(index_csv_path: pathlib.Path, segmentations_path: pathlib.Path, save_path: pathlib.Path):\n",
+    "    \"\"\"compile well_frame-site-Nuclei.csv files with cell locations, save them to the save_path/plate/ folder\n",
+    "\n",
+    "    Args:\n",
+    "        index_csv_path (pathlib.Path): path to index.csv file for DeepProfiler project\n",
+    "        segmentations_path (pathlib.Path): path to segmentations folder with .tsv locations files\n",
+    "        save_path (pathlib.Path): path to save location files\n",
+    "    \"\"\"\n",
+    "    index_csv = pd.read_csv(index_csv_path)\n",
+    "    for index, row in index_csv.iterrows():\n",
+    "        plate = row[\"Metadata_Plate\"]\n",
+    "        well_frame = row[\"Metadata_Well\"]\n",
+    "        well = well_frame.split(\"_\")[0]\n",
+    "        frame = well_frame.split(\"_\")[1]\n",
+    "        site = row[\"Metadata_Site\"]\n",
+    "        \n",
+    "        frame_segments_path = pathlib.Path(f\"{segmentations_path}/{plate}/{well}/{frame}/{plate}_{well}_{frame}.tsv\")\n",
+    "        try:\n",
+    "            frame_segments = pd.read_csv(frame_segments_path, delimiter=\"\\t\")\n",
+    "            frame_segments = frame_segments[['Location_Center_X', 'Location_Center_Y']]\n",
+    "            frame_segments = frame_segments.rename(columns={'Location_Center_X': 'Nuclei_Location_Center_X', 'Location_Center_Y': 'Nuclei_Location_Center_Y'})\n",
+    "            \n",
+    "            locations_save_path = pathlib.Path(f\"{save_path}/{plate}/{well_frame}-{site}-Nuclei.csv\")\n",
+    "            locations_save_path.parents[0].mkdir(parents=True, exist_ok=True)\n",
+    "            frame_segments.to_csv(locations_save_path, index=False)\n",
+    "        except FileNotFoundError:\n",
+    "            print(f\"No tsv for {frame_segments_path}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Compile index.csv file\n",
+    "\n",
+    "DeepProfiler expects to find an index.csv file with metadata for the images that need to be processed.\n",
+    "In this step we compile that index.csv file and save it to inputs/metadata/index.csv."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Done compiling index.csv!\n"
+     ]
+    }
+   ],
+   "source": [
+    "preproc_training_path = pathlib.Path(\"../1.preprocess_data/labeled_frames_preprocessed/\")\n",
+    "annotations_path = pathlib.Path(\"idr0013-screenA-annotation.csv.gz\")\n",
+    "save_path = pathlib.Path(\"inputs/metadata/index.csv\")\n",
+    "compile_index_csv(preproc_training_path, annotations_path, save_path)\n",
+    "print(\"Done compiling index.csv!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Copy images to DeepProfiler Project\n",
+    "\n",
+    "DeepProfiler expects to find the images that need to be processed in inputs/images/.\n",
+    "In this step we copy the preprocessed frames from the 1.preprocess_data module to inputs/images/."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Done copying images!\n"
+     ]
+    }
+   ],
+   "source": [
+    "preproc_training_path = pathlib.Path(\"../1.preprocess_data/labeled_frames_preprocessed/\")\n",
+    "deepprof_images_path = pathlib.Path(\"inputs/images/\")\n",
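+    "# dirs_exist_ok=True (Python 3.8+) lets this cell be re-run after inputs/images/ already exists\n",
+    "shutil.copytree(preproc_training_path, deepprof_images_path, dirs_exist_ok=True)\n",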
+    "print(\"Done copying images!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Compile Training Locations Data\n",
+    "\n",
+    "DeepProfiler expects to find nuclei location data in inputs/locations/Plate/WellName-Site-Nuclei.csv.\n",
+    "In this step we compile the location data files and save them to their respective locations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "No tsv for ../2.segment_nuclei/segmented/LT0144_01/166/68/LT0144_01_166_68.tsv\n",
+      "No tsv for ../2.segment_nuclei/segmented/LT0109_38/349/25/LT0109_38_349_25.tsv\n",
+      "No tsv for ../2.segment_nuclei/segmented/LT0013_42/107/39/LT0013_42_107_39.tsv\n",
+      "Done compiling locations!\n"
+     ]
+    }
+   ],
+   "source": [
+    "index_csv_path = pathlib.Path(\"inputs/metadata/index.csv\")\n",
+    "segmentations_path = pathlib.Path(\"../2.segment_nuclei/segmented/\")\n",
+    "save_path = pathlib.Path(\"inputs/locations/\")\n",
+    "compile_training_locations(index_csv_path, segmentations_path, save_path)\n",
+    "print(\"Done compiling locations!\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3.8.13 ('3.feature_extraction_mitocheck')",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.13"
+  },
+  "orig_nbformat": 4,
+  "vscode": {
+   "interpreter": {
+    "hash": "aff5294438fc2797d595e1ff21d50e9f93b16a791927dfa0e016f7db3c3fedca"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
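The notebook's `get_gene` relies on `.item()`, which raises a bare `ValueError` when a plate and well pair matches zero rows or more than one row in the annotations. The sketch below shows one possible way to make that failure explicit while keeping the same `failed_QC` fallback for missing gene targets. The function name `lookup_gene` and the example plate, well, and gene values are hypothetical; the column names are the ones used in the notebook.

```python
import pandas as pd


def lookup_gene(plate: str, well: str, annotations: pd.DataFrame) -> str:
    """Return the gene target for one well, or "failed_QC" if none is annotated."""
    # Same normalization as the notebook: "049" becomes "49"
    well_number = str(int(well))
    matches = annotations.loc[
        (annotations["Plate"] == plate) & (annotations["Well Number"] == well_number),
        "Original Gene Target",
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one annotation row for plate {plate}, well {well}; "
            f"found {len(matches)}"
        )
    target_gene = matches.iloc[0]
    # With dtype=object, a missing gene target is read as NaN
    if pd.isna(target_gene):
        return "failed_QC"
    return str(target_gene)


# Made-up annotations with the same columns the notebook reads
example_annotations = pd.DataFrame(
    {
        "Plate": ["LT0001_01", "LT0001_01"],
        "Well Number": ["49", "50"],
        "Original Gene Target": ["GENE_A", float("nan")],
    },
    dtype=object,
)

print(lookup_gene("LT0001_01", "049", example_annotations))
print(lookup_gene("LT0001_01", "050", example_annotations))
```

Raising with the plate and well in the message points directly at the annotation row that needs attention, rather than at a generic pandas error.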