<h1>Snakemake Tutorial</h1>

Snakemake is a workflow engine that provides a readable Python-based workflow definition language and a powerful execution environment that scales from single-core workstations to compute clusters without modifying the workflow. <br>
Setup instructions: https://carpentries-incubator.github.io/workflows-snakemake/setup.html

<h3>Install Python / Anaconda</h3>

1. Visit the __[Anaconda download page](https://www.anaconda.com/download)__.
2. Select your operating system (Windows).
3. Download the Python 3 64-bit graphical installer.
4. After the download completes, run the installer to install Anaconda.

<h3>Updating Anaconda</h3>

Once Anaconda is installed, it is a good idea to update it. <br>
To do so, open an Anaconda Terminal and run:

`conda update --all`

<h3>Install snakemake-minimal on Windows</h3>

`conda install -c bioconda -c conda-forge snakemake-minimal`

<h3>Manual Data Processing Workflow</h3>

In Anaconda Terminal: <br>
Use `cd` to change to the folder that contains your Python file. <br>
For example: `cd c:\Jade\thesis\data\pipline`

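In a Jupyter notebook, you can check that you are in the right folder:

```python
# IPython magic (Jupyter only): print the current working directory
pwd
```

```
'c:\\Jade\\thesis\\data\\pipline'
```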

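In Anaconda Terminal: <br>
`python your_python_file.py your_input_file your_output_file` <br>
This runs your Python script on one input file. However, we have many input files, and typing the command by hand for each of them is tedious and easy to get wrong. <br>
Is there a better way?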
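Without a workflow tool, the usual workaround is a hand-written loop that calls the script once per file. The sketch below shows such a loop; `build_commands` and `run_all` are hypothetical helper names, and the output naming (`<input name>.pkl` inside `output_dir`) is an assumption. Its weaknesses are what Snakemake addresses: it re-runs every file every time, runs the files one after another, and has no idea which outputs are already up to date.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(script, input_files, output_dir, suffix=".pkl"):
    """Build one `python script input output` command per input file.

    Output naming is an assumption: <input file stem><suffix> in output_dir.
    """
    commands = []
    for infile in input_files:
        outfile = Path(output_dir) / (Path(infile).stem + suffix)
        # sys.executable is the Python interpreter running this code
        commands.append([sys.executable, script, str(infile), str(outfile)])
    return commands


def run_all(commands):
    """Run the commands one after another; stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_commands('your_python_file.py', glob.glob('01image/*_DAPI.tif'), '02output'))` would process every DAPI image in turn (`'02output'` is a made-up folder name).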

<h3>Use Snakemake to write a simple workflow</h3>

1. Learn the components of a Snakefile: rules, inputs, outputs, and actions.
2. Run Snakemake from the Anaconda Terminal.
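To make these components concrete, here is a toy model in plain Python. It is a simplified illustration under our own assumptions, not Snakemake's implementation or API: a rule has inputs, outputs and an action, and it needs to run when an output is missing or older than one of its inputs. This timestamp check is the classic trigger; recent Snakemake versions can also re-run jobs when, for example, the rule's code or parameters change.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class Rule:
    """Toy model of a Snakemake rule: inputs, outputs and an action."""
    name: str
    input: List[str]
    output: List[str]
    action: str  # the command the rule would run


def needs_run(rule: Rule) -> bool:
    """True if an output is missing or older than the newest input.

    Inputs are assumed to exist (Snakemake itself reports missing inputs).
    """
    outputs = [Path(p) for p in rule.output]
    if not outputs or not all(p.exists() for p in outputs):
        return True
    newest_input = max((Path(p).stat().st_mtime for p in rule.input), default=0.0)
    oldest_output = min(p.stat().st_mtime for p in outputs)
    return newest_input > oldest_output


# Hypothetical rule wrapping the manual `python your_python_file.py ...` command
extract = Rule(
    name="extract_feature",
    input=["01image/example_DAPI.tif"],
    output=["02feature/example_nucleoid_feature.pkl"],
    action="python your_python_file.py {input} {output}",
)
```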

<h4>I. Extract features</h4>

Create a file called `Snakefile` (with no file extension) in the same folder, with the following content: <br>
![image-4.png](attachment:image-4.png)
    

In the 'nucleoid_feature_extraction.py' file: <br>
![image-2.png](attachment:image-2.png)
![image.png](attachment:image.png)
![image-3.png](attachment:image-3.png)

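`sys.argv` is a Python list that contains the command-line arguments passed to the script. <br>
The first element, `sys.argv[0]`, is the script name itself; the following elements are the arguments given when running the script. <br>
For example, with `python your_python_file.py your_input_file your_output_file`, `sys.argv[1]` is `your_input_file` and `sys.argv[2]` is `your_output_file`.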
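The extraction script itself is only shown as screenshots. As a minimal sketch, assuming the script takes exactly one input path and one output path, this is how such a script can read them from `sys.argv`. `parse_args`, `process` and `main` are hypothetical names, and `process` merely copies the file as a placeholder for the real feature extraction.

```python
import sys
from pathlib import Path


def parse_args(argv):
    """Split a sys.argv-style list into (input_path, output_path).

    argv[0] is the script name; argv[1] and argv[2] are the two
    arguments given on the command line (or by Snakemake).
    """
    if len(argv) != 3:
        raise SystemExit("usage: python script.py INPUT OUTPUT")
    return argv[1], argv[2]


def process(input_path, output_path):
    """Hypothetical placeholder for the real feature extraction."""
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    out.write_bytes(Path(input_path).read_bytes())


def main(argv=None):
    argv = sys.argv if argv is None else argv
    input_path, output_path = parse_args(argv)
    process(input_path, output_path)


# At the bottom of the real script:
# if __name__ == "__main__":
#     main()
```

Passing `argv` explicitly to `main` makes the script easy to try without a terminal, e.g. `main(['script.py', 'in.tif', 'out.pkl'])`.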

In Anaconda Terminal: <br>
`snakemake --cores` <br>
Given without a number, `--cores` lets Snakemake use all available CPU cores. <br>
You should now see a new folder, '02nucleid_feature', containing '230516-MG1655-M9glu-DAPI_nucleoid_feature.pkl'.
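To inspect what the rule produced, you can load the .pkl file back into Python. This sketch assumes the file was written with `pickle.dump` or with pandas' `DataFrame.to_pickle` (uncompressed); in the latter case pandas must be installed to load it. `load_pickle` and `list_outputs` are hypothetical helper names.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a single object from a pickle file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def list_outputs(folder, pattern="*.pkl"):
    """Return the sorted names of the files in `folder` that match `pattern`."""
    return sorted(p.name for p in Path(folder).glob(pattern))
```

For example, `list_outputs('02nucleid_feature')` followed by `load_pickle('02nucleid_feature/230516-MG1655-M9glu-DAPI_nucleoid_feature.pkl')`. Only unpickle files you trust, since loading a pickle can execute arbitrary code.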

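```python
# Import the helper functions that are also available inside a Snakefile
from snakemake.io import expand, glob_wildcards
```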

Get wildcard values with `glob_wildcards()`

```python
glob_wildcards('01image/{image}.tif')
```

```
Wildcards(image=['230516-MG1655-M9glu-DAPI_XY01_DAPI', '230516-MG1655-M9glu-DAPI_XY01_phase', '230516-MG1655-M9glu-DAPI_XY02_DAPI', '230516-MG1655-M9glu-DAPI_XY02_phase', '230516-MG1655-M9glu-DAPI_XY03_DAPI', '230516-MG1655-M9glu-DAPI_XY03_phase', '230516-MG1655-M9glu-DAPI_XY04_DAPI', '230516-MG1655-M9glu-DAPI_XY04_phase', '230516-MG1655-M9glu-DAPI_XY05_DAPI', '230516-MG1655-M9glu-DAPI_XY05_phase', '230516-MG1655-M9glu-DAPI_XY06_DAPI', '230516-MG1655-M9glu-DAPI_XY06_phase', '230516-MG1655-M9glu-DAPI_XY07_DAPI', '230516-MG1655-M9glu-DAPI_XY07_phase', '230516-MG1655-M9glu-DAPI_XY08_DAPI', '230516-MG1655-M9glu-DAPI_XY08_phase', '230516-MG1655-M9glu-DAPI_XY09_DAPI', '230516-MG1655-M9glu-DAPI_XY09_phase', '230516-MG1655-M9glu-DAPI_XY10_DAPI', '230516-MG1655-M9glu-DAPI_XY10_phase', '230516-MG1655-M9glu-DAPI_XY11_DAPI', '230516-MG1655-M9glu-DAPI_XY11_phase', '230516-MG1655-M9glu-DAPI_XY12_DAPI', '230516-MG1655-M9glu-DAPI_XY12_phase', '230516-MG1655-M9glu-DAPI_XY13_DAPI', '230516-MG1655-M9glu-
```

```python
glob_wildcards('01image/{image}.tif').image
```

```
['230516-MG1655-M9glu-DAPI_XY01_DAPI',
 '230516-MG1655-M9glu-DAPI_XY01_phase',
 '230516-MG1655-M9glu-DAPI_XY02_DAPI',
 '230516-MG1655-M9glu-DAPI_XY02_phase',
 '230516-MG1655-M9glu-DAPI_XY03_DAPI',
 '230516-MG1655-M9glu-DAPI_XY03_phase',
 '230516-MG1655-M9glu-DAPI_XY04_DAPI',
 '230516-MG1655-M9glu-DAPI_XY04_phase',
 '230516-MG1655-M9glu-DAPI_XY05_DAPI',
 '230516-MG1655-M9glu-DAPI_XY05_phase',
 '230516-MG1655-M9glu-DAPI_XY06_DAPI',
 '230516-MG1655-M9glu-DAPI_XY06_phase',
 '230516-MG1655-M9glu-DAPI_XY07_DAPI',
 '230516-MG1655-M9glu-DAPI_XY07_phase',
 '230516-MG1655-M9glu-DAPI_XY08_DAPI',
 '230516-MG1655-M9glu-DAPI_XY08_phase',
 '230516-MG1655-M9glu-DAPI_XY09_DAPI',
 '230516-MG1655-M9glu-DAPI_XY09_phase',
 '230516-MG1655-M9glu-DAPI_XY10_DAPI',
 '230516-MG1655-M9glu-DAPI_XY10_phase',
 '230516-MG1655-M9glu-DAPI_XY11_DAPI',
 '230516-MG1655-M9glu-DAPI_XY11_phase',
 '230516-MG1655-M9glu-DAPI_XY12_DAPI',
 '230516-MG1655-M9glu-DAPI_XY12_phase',
 '230516-MG1655-M9glu-DAPI_XY13_DAPI',
 '230516-MG16
```
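Roughly speaking, `glob_wildcards()` turns each `{name}` in the pattern into a regular-expression group, matches that against existing file paths, and collects the matched text per wildcard into a named tuple (the `Wildcards(image=[...])` shown above). The function below is a simplified re-implementation written only to illustrate the idea; `simple_glob_wildcards` is a hypothetical name, it is not Snakemake's code, it assumes each wildcard name appears only once in the pattern, and it ignores wildcard constraints such as `{image,[A-Z]+}`. In a real workflow, use `snakemake.io.glob_wildcards`.

```python
import os
import re
from collections import namedtuple

WILDCARD = re.compile(r"\{(\w+)\}")


def simple_glob_wildcards(pattern):
    """Simplified stand-in for glob_wildcards (illustration only)."""
    names = WILDCARD.findall(pattern)

    # Build a regex: literal parts are escaped, each {name} becomes a named group
    parts, last = [], 0
    for m in WILDCARD.finditer(pattern):
        parts.append(re.escape(pattern[last:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.+)")
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    regex = re.compile("".join(parts) + r"\Z")

    # Only scan below the fixed folder that precedes the first wildcard
    first = WILDCARD.search(pattern)
    prefix = pattern[: first.start()] if first else pattern
    base = os.path.dirname(prefix) or "."

    values = {name: [] for name in names}
    for dirpath, dirnames, filenames in os.walk(base):
        dirnames.sort()  # visit sub-folders in a deterministic order
        for filename in sorted(filenames):
            path = os.path.normpath(os.path.join(dirpath, filename))
            match = regex.match(path.replace(os.sep, "/"))
            if match:
                for name in names:
                    values[name].append(match.group(name))
    return namedtuple("Wildcards", names)(**values)
```

Run from the project folder, `simple_glob_wildcards('01image/{image}.tif').image` should list the same names as above, although the order may differ from Snakemake's.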

Generating file names with `expand()`

```python
IMG_FILE = '01image/{img}_DAPI.tif'

# Build the list of image names (the part matched by {img})
IMG_NAME = glob_wildcards(IMG_FILE).img

# Turn the names back into full file paths
expand(IMG_FILE, img=IMG_NAME)
```

```
['01image/230516-MG1655-M9glu-DAPI_XY01_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY02_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY03_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY04_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY05_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY06_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY07_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY08_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY09_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY10_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY11_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY12_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY13_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY14_DAPI.tif',
 '01image/230516-MG1655-M9glu-DAPI_XY15_DAPI.tif']
```
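`expand()` fills a pattern with every combination of the values given for its wildcards. By default the combinations form a Cartesian product; a different combinator such as `zip` can be passed as the second argument. The sketch below reproduces only the default product behaviour with `itertools.product`; `simple_expand` is a hypothetical name for illustration, and it does not handle wildcard constraints or the other options of the real function.

```python
from itertools import product


def simple_expand(pattern, **wildcards):
    """Simplified stand-in for expand(): Cartesian product of wildcard values."""
    names = list(wildcards)
    value_lists = [
        [v] if isinstance(v, str) else list(v)  # a single string counts as one value
        for v in wildcards.values()
    ]
    # The first keyword varies slowest, the last one fastest
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]
```

With the real function, `expand('01image/{img}_{channel}.tif', img=IMG_NAME, channel=['DAPI', 'phase'])` would list both the DAPI and the phase image for every position.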

```python
import numpy as np
from snakemake.io import expand, glob_wildcards

IMG_FILE = '01image/{img}_DAPI.tif'

# Build the list of image names
IMG_NAME = glob_wildcards(IMG_FILE).img

# Full paths of all DAPI image files
IMGS = expand(IMG_FILE, img=IMG_NAME)

# Extract the common experiment name (the part before the first '_')
file_name = np.unique([filename.split('_')[0] for filename in IMG_NAME])

# Print the result
print(file_name)
```

```
['230516-MG1655-M9glu-DAPI']
```
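The same experiment name can be obtained without numpy, using a set comprehension. In this hedged variant (`experiment_names` is a hypothetical helper), `rsplit('_', 1)` removes only the last `_`-separated field, such as `XY01`, so the result stays correct even if an experiment name itself contains underscores, whereas `split('_')[0]` would cut it at the first underscore. This assumes every image name ends with a position field like `_XY01`.

```python
def experiment_names(image_names):
    """Return the sorted, de-duplicated experiment prefixes.

    Each name is split once from the right, dropping the trailing
    position field (e.g. '_XY01').
    """
    return sorted({name.rsplit("_", 1)[0] for name in image_names})
```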


<h4>II. Generate a plot</h4>

Before generating the plot, we delete all existing output files so that the whole workflow is re-run from scratch. <br>
Add `rule clean` and `rule make_plot` to the Snakefile:
![image.png](attachment:image.png)
![image-3.png](attachment:image-3.png)
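The exact `clean` rule is only shown in the screenshot. As a hedged, cross-platform sketch, the deletion can be written in Python with `shutil.rmtree`, which avoids depending on `rm -rf` (not available in the standard Windows command prompt); inside a Snakefile, Python code like this can go into a rule's `run:` block. `clean_outputs` is a hypothetical helper, and `03plot` is a made-up name standing in for the 03 output folder. Snakemake also provides a built-in `--delete-all-output` option that removes all files generated by the workflow; combine it with `--dry-run` to see what would be deleted first.

```python
import shutil
from pathlib import Path


def clean_outputs(folders):
    """Delete each output folder (and everything in it) if it exists.

    Returns the folders that were actually removed.
    """
    removed = []
    for folder in folders:
        path = Path(folder)
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(str(path))
    return removed


# Hypothetical folder names; match them to the outputs in your Snakefile
OUTPUT_FOLDERS = ["02nucleid_feature", "03plot"]
```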

In the 'nucleoid_plot.py' file:
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
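The plotting script is only shown as screenshots. Judging by the output name 'nucleoid_PCA.tif', it projects the nucleoid features onto principal components. The numpy sketch below illustrates only that PCA step under this assumption; `pca_2d` is a hypothetical helper, not code from 'nucleoid_plot.py', and loading the .pkl file and drawing the figure are left out.

```python
import numpy as np


def pca_2d(features):
    """Project rows of `features` (n_samples x n_features, n_features >= 2)
    onto the first two principal components.

    Returns (scores, explained_ratio): scores has shape (n_samples, 2),
    explained_ratio holds the fraction of variance of PC1 and PC2.
    """
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)                         # centre each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                          # coordinates along PC1 and PC2
    explained = S**2 / np.sum(S**2)                 # variance fraction per component
    return scores, explained[:2]
```

If the features are measured in very different units (for example, areas versus intensities), it is common to standardize each column (subtract the mean, divide by the standard deviation) before PCA so that no single feature dominates.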

In Anaconda Terminal: <br>
`snakemake clean --cores` <br>
This runs the `clean` rule, which deletes all the output files.

Re-run Snakemake: <br>
`snakemake --cores` <br>
The 'nucleoid_feature.pkl' file reappears in the 02 folder, and 'nucleoid_PCA.tif' appears in the 03 folder.