# Transfer CellProfiler and Cytomining inputs

For this demonstration, we will use the image data, metadata, and [CellProfiler](https://cellprofiler.org/) pipelines from:

> [Three million images and morphological profiles of cells treated with matched chemical and genetic perturbations](https://www.biorxiv.org/content/10.1101/2022.01.05.475090v1), Chandrasekaran et al., 2022

In this notebook, we transfer several CellProfiler and Cytomining inputs to Google Cloud Storage.

<div class="alert alert-block alert-info">
<b>Note: you don't need to run this notebook.</b> The workflows that you run in your clone of the featured workspace can read the input files directly from the featured workspace bucket, so you only need to make your own copy of the input files if you want one.
</div>


# Setup

In [None]:
import os

## Define constants

In [None]:
#---[ Inputs ]---
SOURCE_WORKSPACE_BUCKET = 'gs://fc-e1e6b6ac-3d52-4041-964d-43ce9beb3352'
IMAGES = f'{SOURCE_WORKSPACE_BUCKET}/source_4_images/images/2020_11_04_CPJUMP1/images/'

#---[ Outputs ]---
# Use this folder in the workspace bucket for pe2loaddata configuration.
PE2LOADDATA_CONFIG_DESTINATION = os.path.join(os.getenv('WORKSPACE_BUCKET'), 'pe2loaddata_config')
# Use this folder in the workspace bucket for CellProfiler pipeline definition files.
CPPIPE_DESTINATION = os.path.join(os.getenv('WORKSPACE_BUCKET'), 'cellprofiler_pipelines')
# Use this folder in the workspace bucket for the plate maps.
PLATE_MAP_DESTINATION = os.path.join(os.getenv('WORKSPACE_BUCKET'), 'plate_maps')
# Use this folder in the workspace bucket for the images.
IMAGE_DESTINATION = IMAGES.replace(SOURCE_WORKSPACE_BUCKET, os.getenv('WORKSPACE_BUCKET'))
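
The destination constants above depend on the `WORKSPACE_BUCKET` environment variable. If it is unset, `os.getenv` returns `None` and the `os.path.join` and `str.replace` calls fail with a `TypeError` that does not name the cause. The following is a minimal sketch of the same source-to-destination mapping with an explicit check. The helper name `mirror_path` and the bucket `gs://my-workspace-bucket` are hypothetical and used for illustration only.

```python
def mirror_path(source_url, source_bucket, dest_bucket):
    """Map a gs:// URL under source_bucket to the same path under dest_bucket.

    Hypothetical helper, not part of the original notebook.
    """
    if not dest_bucket:
        # Covers os.getenv('WORKSPACE_BUCKET') returning None.
        raise ValueError('destination bucket is empty; is WORKSPACE_BUCKET set?')
    source_bucket = source_bucket.rstrip('/')
    dest_bucket = dest_bucket.rstrip('/')
    if not source_url.startswith(source_bucket + '/'):
        raise ValueError(f'{source_url} is not under {source_bucket}')
    # Keep everything after the bucket name, including any trailing slash.
    return dest_bucket + source_url[len(source_bucket):]


example = mirror_path(
    'gs://fc-e1e6b6ac-3d52-4041-964d-43ce9beb3352/source_4_images/images/2020_11_04_CPJUMP1/images/',
    'gs://fc-e1e6b6ac-3d52-4041-964d-43ce9beb3352',
    'gs://my-workspace-bucket',  # hypothetical destination bucket
)
print(example)
```

In the notebook itself, the third argument would be `os.getenv('WORKSPACE_BUCKET')`.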

# Create pe2loaddata config file

In [None]:
%%writefile chandrasekaran_config.yml

channels:
    Alexa 647: OrigMito
    Alexa 568: OrigAGP
    488 long: OrigRNA
    Alexa 488: OrigER
    HOECHST 33342: OrigDNA
    Brightfield H: OrigHighZBF
    Brightfield L: OrigLowZBF
    Brightfield: OrigBrightfield
metadata:
    Row: Row
    Col: Col
    FieldID: FieldID
    PlaneID: PlaneID
    ChannelID: ChannelID
    ChannelName: ChannelName
    ImageResolutionX: ImageResolutionX
    ImageResolutionY: ImageResolutionY
    ImageSizeX: ImageSizeX
    ImageSizeY: ImageSizeY
    BinningX: BinningX
    BinningY: BinningY
    MaxIntensity: MaxIntensity
    PositionX: PositionX
    PositionY: PositionY
    PositionZ: PositionZ
    AbsPositionZ: AbsPositionZ
    AbsTime: AbsTime
    MainExcitationWavelength: MainExcitationWavelength
    MainEmissionWavelength: MainEmissionWavelength
    ObjectiveMagnification: ObjectiveMagnification
    ObjectiveNA: ObjectiveNA
    ExposureTime: ExposureTime

In [None]:
!gsutil cp chandrasekaran_config.yml {PE2LOADDATA_CONFIG_DESTINATION}/
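
Before uploading, it can be useful to confirm that no two microscope channels map to the same CellProfiler channel name. The sketch below is one way to do that, assuming the config keeps the flat two-level layout shown above. The function `parse_two_level` is a hypothetical helper written for this layout only; it is not a general YAML parser and is not part of pe2loaddata.

```python
def parse_two_level(text):
    """Parse unindented 'section:' headers followed by indented 'key: value' lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if not raw.startswith(' '):
            # An unindented line starts a new section, e.g. 'channels:'.
            current = raw.strip().rstrip(':')
            sections[current] = {}
            continue
        if current is None:
            raise ValueError(f'indented line before any section: {raw!r}')
        # Split on the last ': ' so keys such as 'Alexa 647' stay intact.
        key, sep, value = raw.strip().rpartition(': ')
        if not sep:
            raise ValueError(f'expected "key: value", got: {raw!r}')
        sections[current][key] = value
    return sections


# A shortened copy of the config above, used here so the sketch is self-contained.
sample = """
channels:
    Alexa 647: OrigMito
    HOECHST 33342: OrigDNA
metadata:
    Row: Row
"""
config = parse_two_level(sample)
names = list(config['channels'].values())
duplicates = sorted({name for name in names if names.count(name) > 1})
print(config['channels'], duplicates)
```

To check the real file, pass `open('chandrasekaran_config.yml').read()` instead of `sample`.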

# Transfer CellProfiler pipeline definitions

In [None]:
%%bash

mkdir -p ~/miscGitHub
cd ~/miscGitHub
git clone https://github.com/jump-cellpainting/2021_Chandrasekaran_submitted.git || true

In [None]:
!gsutil -m cp ~/miscGitHub/2021_Chandrasekaran_submitted/pipelines/2020_11_04_CPJUMP1/*.cppipe {CPPIPE_DESTINATION}/
!gsutil -m setmeta -h "Content-Type:text/plain" {CPPIPE_DESTINATION}/*.cppipe

# Transfer plate maps

In [None]:
!gsutil cp ~/miscGitHub/2021_Chandrasekaran_submitted/benchmark/output/experiment-metadata.tsv \
    {PLATE_MAP_DESTINATION}/experiment-metadata.tsv
!gsutil setmeta -h "Content-Type:text/plain" {PLATE_MAP_DESTINATION}/experiment-metadata.tsv

In [None]:
%%bash

cd ~/miscGitHub
git clone https://github.com/jump-cellpainting/JUMP-Target.git || true

In [None]:
!gsutil -m cp ~/miscGitHub/JUMP-Target/*.tsv {PLATE_MAP_DESTINATION}/
!gsutil -m setmeta -h "Content-Type:text/plain" {PLATE_MAP_DESTINATION}/*.tsv

# Transfer plate images

These images were previously copied to GCS, so we don't need to pull them directly from the source S3 bucket `s3://cellpainting-gallery/cpg0000-jump-pilot/source_4/images/2020_11_04_CPJUMP1/images/`. We put a copy directly into this workspace so that the permissions are correct for anyone using this workspace.

In [None]:
!gsutil -m cp -R -n {IMAGES}* {IMAGE_DESTINATION}

In [None]:
!gsutil ls {IMAGE_DESTINATION}** > all_files.txt

In [None]:
!grep -c tiff all_files.txt

In [None]:
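# Expected TIFF count to compare with the grep result above
# (presumably 27648 images per plate across 4 plates).
27648 * 4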

In [None]:
!grep -v tiff all_files.txt

In [None]:
!grep -v tiff all_files.txt | cut -d '/' -f '8-'
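
The `grep` checks above give one total for the whole listing. A per-folder breakdown can make it easier to see where files are missing. The sketch below groups `.tiff` URLs by their eighth `/`-separated field, the same field the `cut` command starts from. It assumes that field is the plate folder. The function name `count_by_field`, the bucket `gs://example-bucket`, and the folder and file names in `sample` are all hypothetical. Unlike `grep tiff`, it matches only URLs that end in `.tiff`.

```python
from collections import Counter


def count_by_field(urls, suffix='.tiff', field=8):
    """Count URLs ending in suffix, grouped by a 1-indexed '/'-separated field."""
    counts = Counter()
    for url in urls:
        url = url.strip()  # drop the trailing newline when reading from a file
        parts = url.split('/')
        if url.endswith(suffix) and len(parts) >= field:
            counts[parts[field - 1]] += 1
    return counts


# Hypothetical listing lines, shaped like the output of `gsutil ls`.
prefix = 'gs://example-bucket/source_4_images/images/2020_11_04_CPJUMP1/images'
sample = [
    f'{prefix}/plate_A/Images/a.tiff',
    f'{prefix}/plate_A/Images/b.tiff',
    f'{prefix}/plate_B/Images/a.tiff',
    f'{prefix}/plate_A/Images/Index.idx.xml',
]
counts = count_by_field(sample)
print(dict(counts))
```

To run it on the real listing, pass the open file: `with open('all_files.txt') as f: counts = count_by_field(f)`.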

# Provenance

In [None]:
%%bash

date

In [None]:
%%bash

pip3 freeze

Copyright 2022 The Broad Institute, Inc. and Verily Life Sciences LLC.

Use of this source code is governed by a BSD-style license that can be found in the LICENSE file or at https://developers.google.com/open-source/licenses/bsd