Showing 450 changed files with 63,355 additions and 9,365 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,34 +1,38 @@ | ||
Expansion Hunter: a tool for estimating repeat sizes | ||
---------------------------------------------------- | ||
# Expansion Hunter: a tool for estimating repeat sizes | ||
|
||
There are a number of regions in the human genome consisting of repetitions of short unit sequence (commonly a trimer). | ||
Such repeat regions can expand to a size much larger than the read length and thereby cause a disease. | ||
There are a number of regions in the human genome consisting of repetitions of a | ||
short unit sequence (commonly a trimer). Such repeat regions can expand to a | ||
size much larger than the read length and thereby cause a disease. | ||
[Fragile X Syndrome](https://en.wikipedia.org/wiki/Fragile_X_syndrome), | ||
[ALS](https://en.wikipedia.org/wiki/Amyotrophic_lateral_sclerosis), and | ||
[Huntington's Disease](https://en.wikipedia.org/wiki/Huntington%27s_disease) are well known examples. | ||
[Huntington's Disease](https://en.wikipedia.org/wiki/Huntington%27s_disease) | ||
are well-known examples. | ||
|
||
Expansion Hunter aims to estimate sizes of such repeats by performing a targeted search through a BAM/CRAM file for | ||
reads that span, flank, and are fully contained in each repeat. | ||
Expansion Hunter aims to estimate sizes of such repeats by performing a targeted | ||
search through a BAM/CRAM file for reads that span, flank, and are fully | ||
contained in each repeat. | ||
|
||
Linux and macOS operating systems are currently supported. | ||
|
||
License | ||
------- | ||
|
||
Expansion Hunter is provided under the terms and conditions of the [GPLv3 license](LICENSE.txt). It relies on several | ||
third party packages provided under other open source licenses, please see [COPYRIGHT.txt](COPYRIGHT.txt) for additional | ||
details. | ||
## License | ||
|
||
Documentation | ||
------------- | ||
Expansion Hunter is provided under the terms and conditions of the | ||
[GPLv3 license](LICENSE.txt). It relies on several third party packages provided | ||
under other open source licenses; please see [COPYRIGHT.txt](COPYRIGHT.txt) for | ||
additional details. | ||
|
||
Installation instructions, usage guide, and description of file formats are contained in the [docs folder](docs/01_Introduction.md). | ||
|
||
## Documentation | ||
|
||
Method | ||
------ | ||
Installation instructions, usage guide, and description of file formats are | ||
contained in the [docs folder](docs/01_Introduction.md). | ||
|
||
|
||
## Method | ||
|
||
The detailed description of the method can be found here: | ||
|
||
Dolzhenko et al., [Detection of long repeat expansions from PCR-free whole-genome sequence | ||
data](http://genome.cshlp.org/content/27/11/1895), Genome Research 2017 | ||
Dolzhenko and others, [Detection of long repeat expansions from PCR-free | ||
whole-genome sequence data](http://genome.cshlp.org/content/27/11/1895), Genome | ||
Research 2017 |
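
The README text above distinguishes reads that span, flank, or are fully contained in a repeat. As a rough illustration only, the sketch below classifies a read by comparing half-open coordinates on a linear reference. The `ReadType` enum, the `classifyRead` function, and the coordinate convention are assumptions made for this example; the tool itself works with graph alignments rather than linear intervals.

```cpp
#include <cassert>

// Hypothetical categories, named after the README wording.
enum class ReadType
{
    Spanning, // covers the whole repeat plus sequence on both sides
    Flanking, // covers one boundary of the repeat
    InRepeat, // lies entirely within the repeat
    Other     // does not overlap the repeat at all
};

// Classifies a read occupying [readStart, readEnd) against a repeat occupying
// [repeatStart, repeatEnd). All intervals are half-open (an assumption).
ReadType classifyRead(int readStart, int readEnd, int repeatStart, int repeatEnd)
{
    assert(readStart < readEnd && repeatStart < repeatEnd);

    const bool overlapsRepeat = readStart < repeatEnd && readEnd > repeatStart;
    if (!overlapsRepeat)
    {
        return ReadType::Other;
    }

    const bool extendsLeft = readStart < repeatStart;
    const bool extendsRight = readEnd > repeatEnd;

    if (extendsLeft && extendsRight)
    {
        return ReadType::Spanning;
    }
    if (extendsLeft || extendsRight)
    {
        return ReadType::Flanking;
    }
    return ReadType::InRepeat;
}
```

Under these assumptions, a read at [90, 160) against a repeat at [100, 130) is spanning, while a read at [105, 125) is in-repeat.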
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
// | ||
// Expansion Hunter | ||
// Copyright (c) 2018 Illumina, Inc. | ||
// | ||
// Author: Egor Dolzhenko <edolzhenko@illumina.com> | ||
// | ||
// This program is free software: you can redistribute it and/or modify | ||
// it under the terms of the GNU General Public License as published by | ||
// the Free Software Foundation, either version 3 of the License, or | ||
// (at your option) any later version. | ||
// | ||
// This program is distributed in the hope that it will be useful, | ||
// but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
// GNU General Public License for more details. | ||
// | ||
// You should have received a copy of the GNU General Public License | ||
// along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
// | ||
|
||
#include "alignment/AlignmentFilters.hh" | ||
|
||
#include <list> | ||
#include <vector> | ||
|
||
#include "graphalign/GaplessAligner.hh" | ||
#include "graphalign/GraphAlignmentOperations.hh" | ||
#include "graphalign/LinearAlignmentOperations.hh" | ||
#include "graphcore/PathOperations.hh" | ||
|
||
#include "alignment/GraphAlignmentOperations.hh" | ||
|
||
using graphtools::GraphAlignment; | ||
using graphtools::NodeId; | ||
using graphtools::Path; | ||
using std::list; | ||
using std::string; | ||
using std::vector; | ||
|
||
namespace ehunter | ||
{ | ||
|
||
bool checkIfLocallyPlacedReadPair( | ||
boost::optional<GraphAlignment> readAlignment, boost::optional<GraphAlignment> mateAlignment, | ||
int kMinNonRepeatAlignmentScore) | ||
{ | ||
int nonRepeatAlignmentScore = 0; | ||
|
||
if (readAlignment) | ||
{ | ||
nonRepeatAlignmentScore += scoreAlignmentToNonloopNodes(*readAlignment); | ||
} | ||
|
||
if (mateAlignment) | ||
{ | ||
nonRepeatAlignmentScore += scoreAlignmentToNonloopNodes(*mateAlignment); | ||
} | ||
|
||
return nonRepeatAlignmentScore >= kMinNonRepeatAlignmentScore; | ||
} | ||
|
||
bool checkIfUpstreamAlignmentIsGood(NodeId nodeId, GraphAlignment alignment) | ||
{ | ||
const list<int> repeatNodeIndexes = alignment.getIndexesOfNode(nodeId); | ||
|
||
if (repeatNodeIndexes.empty()) | ||
{ | ||
return false; | ||
} | ||
|
||
const int firstRepeatNodeIndex = repeatNodeIndexes.front(); | ||
int score = 0; | ||
LinearAlignmentParameters parameters; | ||
for (int nodeIndex = 0; nodeIndex != firstRepeatNodeIndex; ++nodeIndex) | ||
{ | ||
score += scoreAlignment( | ||
alignment[nodeIndex], parameters.matchScore, parameters.mismatchScore, parameters.gapOpenScore); | ||
} | ||
|
||
const int kScoreCutoff = parameters.matchScore * 8; | ||
|
||
return score >= kScoreCutoff; | ||
} | ||
|
||
bool checkIfDownstreamAlignmentIsGood(NodeId nodeId, GraphAlignment alignment) | ||
{ | ||
const list<int> repeatNodeIndexes = alignment.getIndexesOfNode(nodeId); | ||
|
||
if (repeatNodeIndexes.empty()) | ||
{ | ||
return false; | ||
} | ||
|
||
const int lastRepeatNodeIndex = repeatNodeIndexes.back(); | ||
int score = 0; | ||
LinearAlignmentParameters parameters; | ||
for (int nodeIndex = lastRepeatNodeIndex + 1; nodeIndex != static_cast<int>(alignment.size()); ++nodeIndex) | ||
{ | ||
score += scoreAlignment( | ||
alignment[nodeIndex], parameters.matchScore, parameters.mismatchScore, parameters.gapOpenScore); | ||
} | ||
|
||
const int kScoreCutoff = parameters.matchScore * 8; | ||
|
||
return score >= kScoreCutoff; | ||
} | ||
|
||
} |
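
Both flank checks above follow the same pattern: sum the alignment scores of the nodes on one side of the repeat node and require at least eight matches' worth of score. The sketch below isolates that pattern for the upstream case. It is a simplified stand-in, not the project's code: per-node scores are passed as plain integers, `isUpstreamFlankGood` is a hypothetical name, and the default `matchScore` of 5 is an assumed value that does not appear in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// nodeScores[i] stands in for the score of the linear alignment to the i-th node
// of the path; firstRepeatNodeIndex is the position of the first repeat node.
// matchScore defaults to 5, which is an assumption made for this sketch.
bool isUpstreamFlankGood(
    const std::vector<int>& nodeScores, std::size_t firstRepeatNodeIndex, int matchScore = 5)
{
    assert(firstRepeatNodeIndex <= nodeScores.size());

    // Accumulate the score over the nodes that precede the repeat node.
    int score = 0;
    for (std::size_t nodeIndex = 0; nodeIndex != firstRepeatNodeIndex; ++nodeIndex)
    {
        score += nodeScores[nodeIndex];
    }

    // The flank must score at least as much as eight matching bases.
    const int kScoreCutoff = matchScore * 8;
    return score >= kScoreCutoff;
}
```

Note that when the repeat node is the first node of the path, no flank nodes are summed, the score stays at zero, and the check fails for any positive match score.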
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
// | ||
// Expansion Hunter | ||
// Copyright (c) 2018 Illumina, Inc. | ||
// | ||
// Author: Egor Dolzhenko <edolzhenko@illumina.com> | ||
// | ||
// This program is free software: you can redistribute it and/or modify | ||
// it under the terms of the GNU General Public License as published by | ||
// the Free Software Foundation, either version 3 of the License, or | ||
// (at your option) any later version. | ||
// | ||
// This program is distributed in the hope that it will be useful, | ||
// but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
// GNU General Public License for more details. | ||
// | ||
// You should have received a copy of the GNU General Public License | ||
// along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
// | ||
|
||
#pragma once | ||
|
||
#include <string> | ||
|
||
#include <boost/optional.hpp> | ||
|
||
#include "graphalign/GraphAlignment.hh" | ||
|
||
namespace ehunter | ||
{ | ||
|
||
/** | ||
* Checks if a read pair is likely to have originated in the alignment region | ||
* | ||
* The check is performed by verifying that the alignment score to non-repeat nodes (combined for both mates) is | ||
* sufficiently high. | ||
* | ||
* @param readAlignment: Alignment of a read | ||
* @param mateAlignment: Alignment of read's mate | ||
* @param kMinNonRepeatAlignmentScore: Score threshold | ||
* @return true if the combined alignment score to non-repeat nodes is at least the threshold | ||
*/ | ||
bool checkIfLocallyPlacedReadPair( | ||
boost::optional<graphtools::GraphAlignment> readAlignment, | ||
boost::optional<graphtools::GraphAlignment> mateAlignment, int kMinNonRepeatAlignmentScore); | ||
|
||
// Checks if alignment upstream of a given node is high quality | ||
bool checkIfUpstreamAlignmentIsGood(graphtools::NodeId nodeId, graphtools::GraphAlignment alignment); | ||
|
||
// Checks if alignment downstream of a given node is high quality | ||
bool checkIfDownstreamAlignmentIsGood(graphtools::NodeId nodeId, graphtools::GraphAlignment alignment); | ||
|
||
} |
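
The read-pair check declared above accepts either alignment as optional and compares the combined non-repeat score against the threshold inclusively. The sketch below mirrors that behavior with simplified types so it can be compiled on its own. It is an illustration under stated assumptions: a null pointer stands in for an empty `boost::optional`, the pointed-to integer stands in for the result of `scoreAlignmentToNonloopNodes`, and `isLocallyPlacedPair` is a hypothetical name.

```cpp
#include <cassert> // for the assert-based checks described in the usage note

// A null pointer means "no alignment available" for that mate (an assumption
// replacing boost::optional<GraphAlignment> in this sketch).
bool isLocallyPlacedPair(const int* readNonRepeatScore, const int* mateNonRepeatScore, int minNonRepeatScore)
{
    int combinedScore = 0;

    if (readNonRepeatScore)
    {
        combinedScore += *readNonRepeatScore;
    }

    if (mateNonRepeatScore)
    {
        combinedScore += *mateNonRepeatScore;
    }

    // A pair whose combined score equals the threshold passes the check.
    return combinedScore >= minNonRepeatScore;
}
```

With a read score of 30 and no mate alignment, `isLocallyPlacedPair(&readScore, nullptr, 30)` returns true because the comparison is inclusive. A pair with neither alignment scores zero and fails any positive threshold.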