OligoMiner

Overview

This repository contains the code for the OligoMiner tool.

If you are looking to use probe sequences that we have already generated for various genome assemblies (hg19, hg38, mm9, mm10, dm3, dm6, ce6, ce11, danRer10, tair10), you can download those on our website. If you would like to run the OligoMiner tool yourself, please see below for instructions.

Installing OligoMiner dependencies

Make sure you have conda installed.
Clone this repo, then create and activate the provided environment:

$ git clone https://github.com/beliveau-lab/OligoMiner.git
$ cd OligoMiner
$ conda env create -f environment.yml
$ conda activate probeMining

This will install the packages listed in environment.yml and their dependencies.

Note about operating systems

OligoMiner is a set of command-line scripts developed on Python 2.7 that can easily be executed from a Bash shell. If you are using a standard Linux or Mac OS X system, we expect these instructions to work for you.

If you are using Windows 10, we recommend enabling Ubuntu on Windows 10, a full Linux distribution, and then running OligoMiner in the Ubuntu terminal.

Running OligoMiner locally

To make sure all of your dependencies are set up properly, below we will run you through the pipeline using some small example datasets.

Running scripts on the example files

To run the blockParse.py script on a .fa file, you can run the following command:
```
 python blockParse.py -f 3.fa
```
This produces a .fastq file (3.fastq) containing all identified probe sequences matching your provided criteria. To see additional command line arguments available for this script, you can run the python file with the -h argument (i.e. `python blockParse.py -h`).
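To sanity-check this step, you can count the candidate probes in the .fastq file. The following is a minimal sketch, assuming the standard four-line FASTQ record layout. `read_fastq` is a hypothetical helper, not part of OligoMiner, and the record in the example is made up.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence) pairs from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("malformed FASTQ header: %r" % header)
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (ignored here)
        yield header[1:].strip(), seq


# Tiny in-memory example; the record name and sequence are invented.
example = io.StringIO(
    u"@chr3:100-135\n" + u"ACGT" * 9 + u"\n+\n" + u"~" * 36 + u"\n"
)
records = list(read_fastq(example))
```

With a real file you would open `3.fastq` and pass the file handle instead, for example `sum(1 for _ in read_fastq(fh))` to count the candidates.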
Next, align the candidate probes to the genome of interest using NGS alignment software. For example, you can use Bowtie2 to align the newly generated set of candidate probes by running:
```
 bowtie2 -x /path_to_hg38_index/hg38 -U 3.fastq --no-hd -t -k 100 --very-sensitive-local -S 3_u.sam
```
or
```
 bowtie2 -x /path_to_hg38_index/hg38 -U 3.fastq --no-hd -t -k 2 --local -D 20 -R 3 -N 1 -L 20 -i C,4 --score-min G,1,4 -S 3.sam
```
... where "path_to_hg38_index" is replaced with the path to the bowtie2 indices for your genome of interest. These commands produce .sam files (3_u.sam and 3.sam) containing sequence alignment information, but they require a bowtie2 index built for the genome (see the notes on running OligoMiner on new genomes). If you are only testing that the scripts work properly, the output files 3_u.sam and 3.sam are already provided in the example files directory, so you can use them to test the subsequent scripts.
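If you prefer to drive the alignment from Python, the two commands above can be assembled as argument lists and passed to `subprocess`. This is a sketch only: the function names and the `unique_mode` switch are hypothetical, and the flags are copied verbatim from the commands above.

```python
import subprocess


def bowtie2_command(index_prefix, fastq, sam_out, unique_mode=True):
    """Return the bowtie2 argument list for one of the two settings shown above."""
    cmd = ["bowtie2", "-x", index_prefix, "-U", fastq, "--no-hd", "-t"]
    if unique_mode:
        # First command: report up to 100 alignments per probe.
        cmd += ["-k", "100", "--very-sensitive-local"]
    else:
        # Second command: settings paired with the LDA model step.
        cmd += ["-k", "2", "--local", "-D", "20", "-R", "3", "-N", "1",
                "-L", "20", "-i", "C,4", "--score-min", "G,1,4"]
    return cmd + ["-S", sam_out]


def run_bowtie2(index_prefix, fastq, sam_out, unique_mode=True):
    """Run bowtie2; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.check_call(
        bowtie2_command(index_prefix, fastq, sam_out, unique_mode)
    )


# Build (but do not run) the first command from the text.
cmd = bowtie2_command("/path_to_hg38_index/hg38", "3.fastq", "3_u.sam")
```

Building the list separately from running it makes it easy to print and inspect the exact command before launching a long alignment.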
To process the .sam file produced by sequence alignment, use the outputClean.py script:
```
 python outputClean.py -u -f 3_u.sam
```
or, optionally (requires sklearn for the LDA model, see above)
```
 python outputClean.py -T 42 -f 3.sam
```
All 13 candidate probes should pass the first command, and 12 of 13 candidate probes should pass the specificity filtering with the 42C LDA model in the second command. To see additional command line arguments available for this script, you can run the python file with the -h argument (i.e. `python outputClean.py -h`).
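To illustrate the general idea of filtering candidates on their alignment results, here is a simplified sketch that keeps only reads with exactly one aligned record in a SAM file. It is not the logic of outputClean.py, which may apply different or additional criteria; `uniquely_aligned` is a hypothetical helper and the SAM records are invented.

```python
import io
from collections import Counter


def uniquely_aligned(sam_handle):
    """Return the sorted names of reads that have exactly one aligned record."""
    hits = Counter()
    for line in sam_handle:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and any header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 4:
            continue  # SAM flag bit 0x4 marks an unaligned read
        hits[name] += 1
    return sorted(name for name, n in hits.items() if n == 1)


# Invented records: probeA aligns once, probeB twice, probeC not at all.
seq, qual = "ACGT" * 9, "~" * 36
rows = [
    ("probeA", "0", "chr3", "101", "44", "36M", "*", "0", "0", seq, qual),
    ("probeB", "0", "chr3", "501", "1", "36M", "*", "0", "0", seq, qual),
    ("probeB", "256", "chr7", "901", "1", "36M", "*", "0", "0", seq, qual),
    ("probeC", "4", "*", "0", "0", "*", "*", "0", "0", seq, qual),
]
sam = io.StringIO(u"".join(u"\t".join(r) + u"\n" for r in rows))
unique = uniquely_aligned(sam)
```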
[Optional] Now, you can use kmerFilter.py to screen your probes against high abundance kmers (requires Jellyfish to be installed and in your path, and a Jellyfish dictionary, see instructions above).
```
 python kmerFilter.py -f 3_probes.bed -m 18 -j sp.jf -k 4
```
This command uses a Jellyfish dictionary containing information about high abundance kmers in the genome of interest to screen probes. (We have provided sp.jf as an example for you to test the python script, which should pass all 12 probes into the file 3_probes_18_4.bed. However, you will need to generate your own Jellyfish dictionary for your desired genome in the real case!) To see additional command line arguments available for this script, you can run the python file with the -h argument (i.e. `python kmerFilter.py -h`).
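The screening idea can be sketched in a few lines: split each probe into overlapping kmers, look up each kmer's abundance, and reject the probe if any kmer is too common. In this sketch a plain dictionary stands in for the Jellyfish dictionary, the function names are hypothetical, and treating the threshold as inclusive (a count equal to `k` still passes) is an assumption rather than the documented behavior of kmerFilter.py.

```python
def max_kmer_count(seq, kmer_counts, m):
    """Return the highest abundance among all m-mers in seq (0 if none are listed)."""
    counts = [kmer_counts.get(seq[i:i + m], 0)
              for i in range(len(seq) - m + 1)]
    return max(counts) if counts else 0


def passes_kmer_filter(seq, kmer_counts, m=18, k=4):
    """True if no m-mer in seq occurs more than k times (inclusive threshold assumed)."""
    return max_kmer_count(seq, kmer_counts, m) <= k


# Toy example with 3-mers so the numbers are easy to follow.
toy_counts = {"ACG": 1, "CGT": 5}
highest = max_kmer_count("ACGTAC", toy_counts, m=3)
rejected = not passes_kmer_filter("ACGTAC", toy_counts, m=3, k=4)
```

The defaults `m=18` and `k=4` mirror the `-m 18 -k 4` arguments in the command above.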
To convert your probe set to their reverse complements, you can use the probeRC.py script:
```
 python probeRC.py -f 3_probes.bed
```
This creates a file, 3_probes_RC.bed, containing the reverse complements of all sequences in the original .bed file. To see additional command line arguments available for this script, you can run the python file with the -h argument (i.e. `python probeRC.py -h`).
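For reference, reverse complementing a probe sequence looks like the sketch below. The helper names are hypothetical, and the assumption that the sequence sits in the fourth tab-delimited column of the .bed file is ours; check your own files before relying on it.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def rc_bed_line(line, seq_column=3):
    """Reverse complement the sequence field of one tab-delimited line.

    seq_column is 0-based; 3 (the fourth column) is an assumption.
    """
    fields = line.rstrip("\n").split("\t")
    fields[seq_column] = reverse_complement(fields[seq_column])
    return "\t".join(fields)


# Invented example line: chrom, start, stop, sequence, Tm.
converted = rc_bed_line("chr3\t100\t104\tAACG\t42.0\n")
```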
[Optional] You can check for secondary structures of probes by calling NUPACK using the structureCheck.py script:
```
 python structureCheck.py -f 3_probes.bed -t 0.4
```
This command should pass 6 of 12 example candidate probes. Additional information can be seen in the produced 3_probes_sC.bed file. To see additional command line arguments available for this script, you can run the python file with the -h argument (i.e. `python structureCheck.py -h`).
[Optional] To generate a list of melting temperatures for a given probe set, you can use the probeTm.py script:
```
 python probeTm.py
```
or
```
 python probeTm.py -f 3.txt
```
The first command will allow you to enter a sequence interactively to retrieve its computed melting temperature. The second command takes a two column .txt file with the sequence in column 2 (tab delimited) and outputs a new file (3_tm.txt) with a 3rd column of Tms.
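The file-based mode described above (a two-column, tab-delimited input and an output with a third column of Tms) can be sketched as follows. The Wallace rule used here is only a crude placeholder so the example runs without extra dependencies; it is not the model that probeTm.py uses, and both function names are hypothetical.

```python
def wallace_tm(seq):
    """Placeholder Tm estimate (Wallace rule): 2 C per A/T plus 4 C per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def add_tm_column(lines, tm_func=wallace_tm):
    """Append a Tm column to lines whose second tab-delimited column is a sequence."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, seq = line.split("\t")[:2]
        out.append("%s\t%s\t%.2f" % (name, seq, tm_func(seq)))
    return out


# Invented input line: identifier in column 1, sequence in column 2.
with_tm = add_tm_column(["p1\tACGT\n"])
```

Passing a different `tm_func` lets you swap in whichever Tm model you actually trust while keeping the file handling unchanged.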

That's all! If you made it through all of these steps without any errors about missing dependencies or modules, you are all set to run OligoMiner on your own computer. Happy FISHing!

Notes on running OligoMiner on new genomes

You'll need to download your genome of interest in FASTA format and prepare index/dictionary files for your NGS aligner and, optionally, Jellyfish. We recommend using unmasked files for dictionary file construction and repeat-masked files as the input files for blockParse.py.
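As a rough sketch of that preparation step, the helpers below assemble the command lines for `bowtie2-build` and `jellyfish count` as argument lists. The function names, file names, and the `100M` hash size are hypothetical placeholders; pick a hash size appropriate for your genome, and treat this as a starting point rather than a recommended configuration.

```python
def bowtie2_build_command(fasta, index_prefix):
    """Argument list for building a bowtie2 index from a FASTA file."""
    return ["bowtie2-build", fasta, index_prefix]


def jellyfish_count_command(fasta, out_jf, mer_len=18, hash_size="100M"):
    """Argument list for building a Jellyfish kmer dictionary.

    mer_len should match the -m value later passed to kmerFilter.py;
    hash_size is a placeholder value, not a recommendation.
    """
    return ["jellyfish", "count", "-m", str(mer_len), "-s", hash_size,
            "-o", out_jf, fasta]


# Hypothetical file names for an unmasked genome FASTA.
index_cmd = bowtie2_build_command("genome_unmasked.fa", "genome")
dict_cmd = jellyfish_count_command("genome_unmasked.fa", "genome_18.jf")
```

Each list can be passed to `subprocess.check_call` once the tools are installed and on your path.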

Citation

Please cite according to the enclosed citation.bib:

@article{Beliveau2018,
        doi = {10.1073/pnas.1714530115},
        url = {https://doi.org/10.1073%2Fpnas.1714530115},
        year = 2018,
        month = {feb},
        publisher = {Proceedings of the National Academy of Sciences},
        volume = {115},
        number = {10},
        pages = {E2183--E2192},
        author = {Brian J. Beliveau and Jocelyn Y. Kishi and Guy Nir and Hiroshi M. Sasaki and Sinem K. Saka and Son C. Nguyen and Chao-ting Wu and Peng Yin},
        title = {{OligoMiner} provides a rapid, flexible environment for the design of genome-scale oligonucleotide in situ hybridization probes},
        journal = {Proceedings of the National Academy of Sciences}
}

Questions

Please reach out to Brian with any questions about installing and running the scripts, or open an issue on GitHub.

License

We provide this open source software without any warranty under the MIT license.

Contributing

We welcome commits from researchers who wish to improve our software. Please follow the git flow branching model. Make all changes to a topic branch off the branch dev. Merge the topic branch into dev first (preferably using --no-ff) and ensure everything works. Code will only be merged into master for release builds. Hotfixes should be developed and tested in a separate branch off master, and a new release should be generated immediately after the hotfix is merged.
