Marleen1/OGRE
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
master
Could not load branches
Nothing to show
Could not load tags
Nothing to show
{{ refName }}
default
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code
-
Clone
Use Git or checkout with SVN using the web URL.
Work fast with our official CLI. Learn more about the CLI.
- Open with GitHub Desktop
- Download ZIP
Sign In Required
Please sign in to use Codespaces.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching Xcode
If nothing happens, download Xcode and try again.
Launching Visual Studio Code
Your codespace will open once ready.
There was a problem preparing your codespace, please try again.
Latest commit
Git stats
Files
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
OGRE is a read clustering tool that clusters short reads from a metagenomic dataset using an overlap graph. Note that OGRE is not fully developed and optimized software. However the code is available and usable for read clustering. --------------- Installation --------------- 1. Download OGRE: https://github.com/Marleen1/OGRE 2. Install the following dependencies: - Python 3 with packages numpy, scikit-learn and scipy - Snakemake (https://snakemake.readthedocs.io/en/stable/) - Minimap2 (https://github.com/lh3/minimap2) -------------------- Setting the inputs -------------------- The header of the file "Snakefile" contains the required input information. The following need to be specified in the file header: - folder: the location of the read data. - fqfile: the filename of the fastq file containing the reads that need to be clustered. - limits: the desired maximum cluster size, given as a string. For example a maximum cluster size of 5000 should be written as '5000'. A list of cluster sizes can be provided as well, OGRE will then output multiple clusterings for various maximum cluster sizes. For example: limits = ['5000','10000','inf']. - numsplitlines: OGRE starts by splitting up the original fastq file into smaller files. This is the number of lines in each small fastq file. If you have a large amount of RAM, you can set this value high. On a system with 125 GB RAM we used numsplitlines = 8000000. - train_data: the training dataset. A training dataset is provided with OGRE. In order to use this dataset, choose train_data = 'traindata_CAMI23toy23_80000.txt' - lines_per_file: OGRE creates an overlap file, which represents the overlap graph. This file is later split into smaller files and processed. If you have a large amount of RAM, you can choose a high value here. we chose lines_per_file = 2500000 on a computer with 125 GB RAM. -------------- Running OGRE -------------- OGRE is run by typing snakemake -j xx in the command line, where xx is the number of threads it can use. -------------- Running Mash -------------- Please type sh mash.sh -h to see the parameters needed to run Mash on the data.
About
Overlap graph-based read assembly
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published