This repo provide several demos of image analysis and simulation used in SEER-FISH. These functions can be used in images analysis for bacterial/microbiome community, FISH probe design, SEER-FISH simulation and other appropriate situations.
For image analysis, a submodule nd2reader
is used for loading .nd2
files captured by microscopy from Nikon. It is easy to modify the code for analyzing the images in other format such as .tif
.
For FISH probe design, we proved a prokka based script to extract the 16S and 23S sequence from WGS. We also proved a auto-submit script to calculated the Gibbs Free energy change during probe binding to corresponding rRNA on mathFISH.
For simulation used in SEER-FISH, we used a MATLAB
script to evaluate the F1 score when applying SEER-FISH with a specific set of rRNA probes based on the hybridization efficiency given by mathFISH.
For analysis of the sequencing data, we used botiwe2 to align the sequencing data with the reference sequence and extract the OTU table.
Thanks to @Junyu for his help on probe design. Thanks to @Yuxiang for his help on amplicon sequencing data analysis.
- Cao, Z., Zuo, W., Wang, L. et al. Spatial profiling of microbial communities by sequential FISH with error-robust encoding. Nat Commun 14, 1477 (2023). https://doi.org/10.1038/s41467-023-37188-3
Using the following commander to align multiple images in image stack Image
.
[ImageAligned] = Alignment(Image)
The image stack will be aligned to the position with maximal cross correlation. The aligned image stack will be return as ImageAligned
.
To align other image stacks with the same position as Image
, you can use,
[ImageAligned,Image2Aligned,Image3Aligned...] = Alignment(Image,Image2,Image3,..)
To segment each bacterial cell in the image, you can use ImageSegment
function. This function segment image based on auto threshold and watershed method.
[SegmentedBWImage] = ImageSegment(Image)
The Image
input should be dark phase contrast image, i.e. bacteria cell is darker than the background. To segment fluorescent image, you can use the reversed image instead.
You can adjust the segmentation with Parameter as below. These parameters are defined in the table below.
[SegmentedBWImage] = ImageSegment(Image, 'Parameter', value)
Parameters | Value |
---|---|
sePattern |
The pattern used in imerode and imdilate . For example, strel('disk',1) . |
AreaRange |
The range of the area of the bacteria cells. The patterns with area out of range in the final segmented image will be dropped. For example [10,1000] . |
Threshold |
The threshold for generating the binary image. When Threshold is set as 'Auto' , the threshold will be generated according to the image. For example, 'Auto' , 0.5 . |
Effectiveness |
The effectiveness for auto threshold. If the auto threshold gives a effectiveness less than the given value, the image will be cut into several pieces to find a better threshold (See parameter N below). For example, 0.65 . |
N |
The image will be cut into N*N regions and the threshold for each region will be calculated. For example, 10 . |
First, the color of each bacteria cell will be identified according to the fluorescent image stacks of each channel.
[CodexRes] = ColorIdentify(SegmentedBWImage,FITCImage,TRITCImage,CY5Image,Index)
You can also use ColorIdentify_Bright
if you want to identify the color only by brightness of each channel.
[CodexRes] = ColorIdentify_Bright(SegmentedBWImage,FITCImage,TRITCImage,CY5Image,Index)
Then the obtained code will be identified according to the codebook and bits of correction.
[StrainLikehood,Decode] = StrainIndentify(StrainCode,CodexRes,CorrBit)
The following code could do the process faster.
[StrainLikehood,Decode] = StrainIndentifySim(StrainCode,CodexRes,CorrBit)
The segmented images can be labeled according to the decode result.
[StrainImageAll, StrainImage] = LabelImage(BW_Image_Segment, Decode)
Where StrainImageAll
is the labeled image and StrainImage
is the binary image of each strain. The unidentified strain is labeled as strainnum+1
.
PairCorrelation
calculates the correlation of the bacteria in the images based on the the Linear Dipole Algorithm (Daims, H. & Wagner, M., 2011).
[GR] = PairCorrelation(StrainImage1, StrainImage2, ScaleX, Range, varargin)
A mask can be used for the analysis by,
[GR] = PairCorrelation(StrainImage1, StrainImage2, ScaleX, Range, 'Mask',Maskimage)
You can calculate the pair association with the folloing code.
[...] = PairAssociation(CellInfoAll)
Here CellInfoAll
should be a cell
contains all cell informations on all roots (1:10 for control, 11:20 for perturbation 1 and 21:30 for perturbation 3). The cell info included the strain type and cell location on root.
ProcessStep1
, ProcessStep2
and ProcessStep3
give an example code about process multiple region images of root.
The Demo
of image anaysis is given in Demo_ImageAnalysis.m
Demo data is provided at Zenodo
This python script is used for submit probe and rRNA sequence to mathFISH to calculate ΔG.
You need to list the sequences of probes and rRNAs in separated txt
files.
For example, ProbeSequence.txt
1, ~sequence of the probe~
and rRNASeq.txt
1, ~sequence of 16S rRNA~, ~sequence of 23S rRNA~
Then run the script below to calculate the ΔG between each probe in ProbeSequence.txt
and each rRNA in rRNASeq.txt
. The result will be given in DeltaGoverall_*_List.txt
in the output folder.
python ./MAthFISH/MathFISH.py -p ./MathFISH/ProbeSequence.txt -r ./MathFISH/rRNASeq.txt -o ./MathFISH/Output
You can use the following script to generate the code book with correspounding colors, rounds and minimal Hamming Distance (HD).
[CodexMatrix] = CodeGenerator(Color,Round,HD);
[F1ScoreHarmMean,F1ScoreMean,Detection] = F1ScoreCal(Specify,SpecifySTD,StrainCode,CellNum,CorrBit)
The following packages are required for sequnecing data analysis.
conda install -c bioconda fastqc # fastqc
conda install -c bioconda samtools # samtools
conda install -c bioconda bowtie2 # bowtie2
conda install -c bioconda bedtools # bedtools
qiime2
is required for merge and parse the double end sequencing data according to the barcodes.
The following script requires qiime2
environment.
vsearch --fastq_mergepairs Path_to_the_forward.fastq.gz \
--reverse Path_to_the_forward.fastq.gz \
--fastqout ./Seuencing/merged.fq \
--fastqout_notmerged_fwd ./Seuencing/unmapped_1.fq \
--fastqout_notmerged_rev ./Seuencing/unmapped_2.fq \
1>vserach_log.txt \
2>>vserach_log.txt
python ./Seuencing/parse_sample_perfectmatch.py \
-i /Seuencing/merged.fq \
-t ./Seuencing/reference/sample_indexes.txt \
-f 20 -r 18
The barcodes for each set of data should be listed in sample_indexes.txt
in .scv
format.
We use Bowtie2
to align the sequencing result to reference sequence and extract the OTU table.
conda deactivate
bash /home/LDlab/BioSoft/Scripts/Bowtie2align_and_feature_extract.sh \
./Seuencing/parse \
./Seuencing/reference/strains_v5_v7.fa \
./Seuencing1/reference/strains_bed.bed
The reference sequence of each strain is listed in the .fa
file. And the sequence info is listed in the .bed
file.
To generate multiple colors in MATLAB
for labeling multiple species at once, you can use following function with desired number of colors, ColorNum
.
[ColorRGB] = ColorGenerator(ColorNum)
There is a processing bar in [##>---]
style displayed in Matlab command window when running for
loops during tracking.
The function DisplayBar
is used to display the processing bar. The number of #
on the screen indicates the percentage of the progress. The process bar is 80 characters in width. You can also use the process bar in all for
loops as below,
for Index = 1:Length
......
[Percentage, Barlength] = DisplayBar(Index, Length);
end
The bar would be shown in the command windows as below.