# Load_data/metadata generator
Written by Fernanda Fossa @fefossa

Python 3.9.12

**Description**: This notebook generates the CSV files used by the CellProfiler LoadData module, based on the list of images to be analyzed stored in a local folder.

**1. Inputs**: the notebook requires the following:

- Path to the input folder (where the images are located) = make sure this folder contains only the images that will be analyzed.

- project_name = the name of your overall project.

- subproject = the name of each subproject inside the project_name folder. One project can contain several subprojects.

- Channels dictionary = a dictionary with the channel as the key and the name you want to give it as the value. Example: 'DAPI':'OrigDNA', where DAPI is the channel (as written in the image name) and OrigDNA is the name this image will have in CellProfiler. Three pre-made options are available (Cell Painting, Live Cell Painting, ToxPath panels); if your panel is different, please provide a new dictionary.
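Because the folder should hold only the images to be analyzed, it can help to check it before generating the files. The sketch below is a hypothetical helper (`unexpected_files` is not part of load_data_utils). It assumes Cytation 5-style names such as B10_02_1_10_GFP_001.tif, with the channel as the fifth underscore-separated field, and it skips subfolders such as load_data_csv.

```python
import os


def unexpected_files(input_folder, ch_dic):
    """Return files in input_folder that do not look like images of a channel in ch_dic.

    Hypothetical check, assuming Cytation 5-style names such as
    B10_02_1_10_GFP_001.tif, where the channel is the fifth "_"-separated field.
    """
    problems = []
    for name in sorted(os.listdir(input_folder)):
        if os.path.isdir(os.path.join(input_folder, name)):
            continue  # skip subfolders, e.g. the load_data_csv output folder
        stem, ext = os.path.splitext(name)
        fields = stem.split("_")
        is_image = ext.lower() in (".tif", ".tiff")
        # Flag non-TIFF files, names with an unexpected layout, and unknown channels
        if not is_image or len(fields) != 6 or fields[4] not in ch_dic:
            problems.append(name)
    return problems
```

An empty list from `unexpected_files(input_folder, ch_dic)` means every file matched a channel key in the dictionary; anything listed should be moved out of the folder or added to the dictionary.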

**2. Outputs**:

1. **load_data.csv**, used with the Illumination Correction pipelines.

    To extract this information, we use a regex adapted to files from the Cytation 5 microscope (e.g. B10_02_1_10_GFP_001.tif), where the positions of the well, site and channel in the filename are known. If your images come from a different microscope with a different naming pattern, you need to **change the regex**.

    - FileName_CHANNEL = the image filenames taken from the input folder, one column per channel, where CHANNEL is the value you provided in the dictionary (e.g. FileName_OrigDNA).

    - PathName_CHANNEL = a specific path on AWS (used later by the virtual machines). To change it, modify the images_dir variable.

    - Metadata_Well and Metadata_Site = both are extracted from the image filename.

    - Metadata_Plate = usually the name of the FOLDER that contains the images. Notice that we also apply a regex to the plate name. Any spaces are replaced with "_" because AWS does not handle spaces well.

2. **load_data_with_illum.csv**, used with the analysis pipelines after illumination correction has been performed.

    The columns are the same as above, with two additional columns per channel:

    - FileName_IllumCHANNEL = the name of the Illum Correction file (the name pattern is "PlateName_IllumCHANNEL.npy")

    - PathName_IllumCHANNEL = the AWS path where these Illum files will be located. Again, this pattern is meant for users running Distributed-CellProfiler on AWS.
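A minimal sketch of how the columns described above could be assembled from the filenames. It is not the code in load_data_utils: the regex, the `build_rows` helper, the `images_dir`/`illum_dir` prefixes and the `{prefix}/{plate}` path layout are hypothetical, the well/site/channel field positions in the Cytation 5 name are assumptions, and the plate-name regex is not reproduced (only spaces are replaced). It also assumes the Illum columns use the dictionary value as CHANNEL, as the FileName_CHANNEL columns do.

```python
import re
from pathlib import PureWindowsPath

# Hypothetical pattern for Cytation 5-style names such as B10_02_1_10_GFP_001.tif.
# Assumption: field 1 = well, field 4 = site, field 5 = channel; fields 2, 3 and 6 are ignored.
FILENAME_RE = re.compile(
    r"^(?P<Well>[A-P]\d{1,2})_\d+_\d+_(?P<Site>\d+)_(?P<Channel>[A-Za-z0-9]+)_\d+\.tiff?$"
)


def build_rows(filenames, input_folder, ch_dic, images_dir, illum_dir, with_illum=False):
    """Group image filenames into one LoadData row per (well, site)."""
    # Plate name = last folder name with spaces replaced (plate regex not reproduced here)
    plate = PureWindowsPath(input_folder).name.replace(" ", "_")
    rows = {}
    for name in sorted(filenames):
        match = FILENAME_RE.match(name)
        if match is None or match["Channel"] not in ch_dic:
            continue  # not an image of a channel listed in the dictionary
        well, site = match["Well"], match["Site"]
        row = rows.setdefault(
            (well, site),
            {"Metadata_Plate": plate, "Metadata_Well": well, "Metadata_Site": site},
        )
        ch = ch_dic[match["Channel"]]  # e.g. 'GFP' -> 'AOGFP'
        row[f"FileName_{ch}"] = name
        row[f"PathName_{ch}"] = f"{images_dir}/{plate}"
        if with_illum:
            # Illum files follow the "PlateName_IllumCHANNEL.npy" pattern
            row[f"FileName_Illum{ch}"] = f"{plate}_Illum{ch}.npy"
            row[f"PathName_Illum{ch}"] = f"{illum_dir}/{plate}"
    return list(rows.values())


rows = build_rows(["B10_02_1_10_GFP_001.tif"], r"F:\images\Plate 1", {"GFP": "AOGFP"},
                  images_dir="/hypothetical/images", illum_dir="/hypothetical/illum",
                  with_illum=True)
print(rows[0]["FileName_IllumAOGFP"])  # Plate_1_IllumAOGFP.npy
```

Grouping by (well, site) gives one row per field of view, with one FileName/PathName pair per channel, which is the layout LoadData reads. Files from another microscope would only need a different `FILENAME_RE`.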

    





## Import libraries

In [50]:
# Folder that contains the LoadDataGenerator package (change this to your local path)
path_to_scripts = r"C:\Users\Fer\Documents\GitHub\fefossa"

In [51]:
import pandas as pd
import easygui as eg
import sys

sys.path.append(path_to_scripts)

from LoadDataGenerator.notebook import load_data_utils

%load_ext autoreload
%autoreload 2

# 1. Inputs

In [68]:
input_folder = eg.diropenbox('Paste path to input folder and press OK', 'Paste Path')
print(input_folder)

F:\2022_10_04_AgNPCellRecovery_fossa_Cimini\2022_05_25_LiveCellPainting\images\220526_084043_Plate_1


## 1.1 Dictionary with channel as a key and new name as a value

### 1.1.1 Create your own dictionary

- Enter the NUMBER of channels, then the name of each channel as written in the filename, and then the new name for each channel. Spaces are removed from all entries.

In [77]:
# Ask how many channels the assay has, then one filename channel and one new name per channel
n_channels = int(eg.enterbox("Enter the NUMBER of channels in your assay:"))
field_labels = [str(i) for i in range(1, n_channels + 1)]
channel_inputs = eg.multenterbox("Write the name of each channel as it appears in the filename (DAPI, GFP, etc.):", "Channels", field_labels)
names_inputs = eg.multenterbox("Write a name to represent each channel:", "Names", channel_inputs)
# Remove spaces so the names are safe to use in CellProfiler column names
list_channels = [s.replace(" ", "") for s in channel_inputs]
list_names = [s.replace(" ", "") for s in names_inputs]
ch_dic = dict(zip(list_channels, list_names))
print(ch_dic)

{'GFP': 'AOGFP'}
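If you prefer to skip the dialog boxes (for example, when rerunning the notebook), the same dictionary can be built from two plain lists. `make_channel_dict` is a hypothetical helper that mirrors the cell above (spaces removed) and adds two sanity checks.

```python
def make_channel_dict(channels, names):
    """Build {channel_in_filename: CellProfiler_name}, removing spaces like the cell above."""
    channels = [s.replace(" ", "") for s in channels]
    names = [s.replace(" ", "") for s in names]
    if len(channels) != len(names):
        raise ValueError("Provide exactly one new name per channel")
    if len(set(channels)) != len(channels):
        raise ValueError("Each channel may appear only once")
    return dict(zip(channels, names))


ch_dic = make_channel_dict(["GFP"], ["AO GFP"])
print(ch_dic)  # {'GFP': 'AOGFP'}
```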


### 1.1.2 OR Run one of the cells below to use our pre-made dictionaries

- We have dictionaries for the Cell Painting, Live Cell Painting, and ToxPath image panels.

- Run **ONLY ONE OF THE CELLS BELOW**

#### CELL PAINTING

In [75]:
ch_dic = {'DAPI':'OrigDNA', 'GFP':'OrigER', 'PropidiumIodide':'OrigAGP', 'CY5':'OrigMito'}
print(ch_dic)

{'DAPI': 'OrigDNA', 'GFP': 'OrigER', 'PropidiumIodide': 'OrigAGP', 'CY5': 'OrigMito'}


#### LIVE CELL PAINTING

In [6]:
ch_dic = {'GFP':'AOGFP', 'PropidiumIodide':'AOPI'}
print(ch_dic)

{'GFP': 'AOGFP', 'PropidiumIodide': 'AOPI'}


## Load data generator

- Run the next cell to generate the LoadData files from the image filenames (parsed with the regex) and the folder name. Both files are saved in a load_data_csv folder inside the input folder.

- IMPORTANT: this generates both load_data.csv and load_data_with_illum.csv because illum_list contains both True and False. To generate **load_data.csv only**, keep only False; to generate **load_data_with_illum.csv only**, keep only True.
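The illum_list behaviour can be sketched as one write per flag: False writes load_data.csv and True writes load_data_with_illum.csv, both inside a load_data_csv folder in the input folder. `write_load_data` is a hypothetical helper built on the standard csv module, not the implementation in load_data_utils.

```python
import csv
import os


def write_load_data(rows, input_folder, illum):
    """Write rows (a list of dicts) to the CSV that matches the illum flag."""
    # False -> load_data.csv, True -> load_data_with_illum.csv
    name = "load_data_with_illum.csv" if illum else "load_data.csv"
    out_dir = os.path.join(input_folder, "load_data_csv")
    os.makedirs(out_dir, exist_ok=True)
    # Union of all keys in first-seen order, so a row missing a channel still fits
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    path = os.path.join(out_dir, name)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing values become ""
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Calling it once with illum=False (rows without the Illum columns) and once with illum=True (rows with them) reproduces the two files; calling it with only one flag writes only that file.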

In [80]:
load_data_utils.generate_load_data(input_folder, ch_dic)

