Merge branch 'main' into integration-test-development
only-lou committed Sep 19, 2022
2 parents 00c2f8e + fd12044 commit 0c0031b
Showing 36 changed files with 2,796 additions and 2,147 deletions.
1 change: 1 addition & 0 deletions cfg/cabal.project.domain
@@ -30,6 +30,7 @@ with-compiler: ghc-9.2.4
-- datadir: ./data
-- docdir: ./doc
-- htmldir: ./doc/html
builddir: ./doc
symlink-bindir: ./bin
installdir: ./bin
logs-dir: ./log
25 changes: 25 additions & 0 deletions doc/Notes/ToDo.txt
@@ -704,3 +704,28 @@ preOrderTreeTraversal
397) Add options to bias in favor of data, graph, or trajectory parallelism

399) Reverse/randomize order of move when repeating moves so not revisiting same edge move

400) Make sure Sim Anneal stuff is in new swap

401) In swap--check that net penalty factor is correct for softwired and in different cost optimality criteria

402) Check wagner v POY seems longer PhyG

403) Clean up max distance thing --may set in types.h because of 2x distance

406) Add parallelism for tree median calculations

407) Add pre-order parallelism for tree and softwired/hardwired

408) Add parallelized bitpacking-- with option to turn off for memory saving

409) redo fuse

410) redo parallel add/delete/move

411) figure out issue with net move

412) Internal setting of dynamic epsilon
(also only for dynamic/sequence characters)


17 changes: 7 additions & 10 deletions doc/PhyGraphUserManual/PhyG_Allcommands.tex
100755 → 100644
@@ -316,7 +316,6 @@ \subsection{Read}
\item [tcm:] This refers to a file containing a custom-alphabet matrix that specifies varying
costs among alphabet elements in a sequence. The elements in the alphabet can be letters,
digits, or both. \\

The \texttt{tcm} contains two parts: the first line of the file contains the alphabet elements
separated by a space and the transformation cost matrix, which follows below. The dash
character representing an insertion/deletion or indel character is not specified on the first
@@ -328,13 +327,11 @@ \subsection{Read}
must be symmetrical, but not necessarily metric. Non-metric tcm's can yield unexpected
results. Transformation costs must be integers. If real values are desired, a character can
be weighted with a floating point value factor. \\

For a sequence with four elements alpha, beta, gamma and delta and an indel cost of 5
for all insertion/deletion transformations, a valid custom alphabet file is provided below:

\\
\begin{equation}
\nolabel
\begin{equation*}
%\nolabel
\begin{array}{lllll}
alpha & beta & gamma & delta & \\
0 & 2 & 1 & 2 & 5 \\
@@ -343,8 +340,8 @@ \subsection{Read}
2 & 1 & 2 & 0 & 5 \\
5 & 5 & 5 & 5 & 0
\end{array}
\end{equation}

\end{equation*}
\\
In this example, the cost of transformation of \texttt{alpha} into \texttt{beta} is \texttt{2},
and the cost of a deletion or insertion of any of the four elements is \texttt{5}.
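
The manual text above requires a tcm to be symmetrical but not necessarily metric. A minimal Haskell sketch of those two checks follows. The `CostMatrix` type, the function names, and the 3x3 example matrix are hypothetical illustrations, not PhyG's internal representation and not the matrix shown in the manual.

```haskell
import Data.List (transpose)

-- | A transformation cost matrix as rows of integer costs
-- (hypothetical type for illustration only).
type CostMatrix = [[Int]]

-- | True when the matrix is square and equal to its transpose.
isSymmetric :: CostMatrix -> Bool
isSymmetric m = all ((== length m) . length) m && m == transpose m

-- | True when every triple of indices satisfies the triangle inequality
-- cost(i,k) <= cost(i,j) + cost(j,k). Assumes a square matrix.
satisfiesTriangle :: CostMatrix -> Bool
satisfiesTriangle m =
  and [ m !! i !! k <= m !! i !! j + m !! j !! k
      | i <- idx, j <- idx, k <- idx ]
  where
    idx = [0 .. length m - 1]

-- | Made-up matrix: two alphabet elements plus the indel column.
-- It is symmetric, but going 0 -> 1 -> 2 is cheaper than 0 -> 2 directly.
symmetricNonMetric :: CostMatrix
symmetricNonMetric = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]
```

A matrix such as `symmetricNonMetric` would pass the symmetry requirement while failing the triangle inequality, which is the situation the manual warns can yield unexpected results.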

@@ -622,7 +619,7 @@ \subsection{Report}

\item[reconcile] Outputs a single ``reconciled'' graph from all graphs in memory. The
methods include consensus, supertree, and other supergraph methods as described in
\cite{Wheeler2012, Wheeler2021a}. When \texttt{reconcile} is specified as a command
\cite{Wheeler2012, Wheeler2022}. When \texttt{reconcile} is specified as a command
option a series of other options may be specified to tailor the desired outputs:
\begin{description}
\item {Method:eun$\mid$cun$\mid$majority$\mid$strict$\mid$Adams\\Default:eun\\
@@ -818,7 +815,7 @@ \subsection{Set}
rooting output trees, to specifying graph type, final assignment, and optimality
criterion. All \texttt{set} commands are executed at the start of a run, irrespective of
where they appear in the script. The command \texttt{transform} is used to modify
global settings during a run (see \texttt{transform} Section \ref{subsec:Transform}).
global settings during a run (see \texttt{transform} Section \ref{subsec:Transform}).}
\end{phygdescription}

\subsubsection{Arguments}
@@ -936,7 +933,7 @@ \subsection{Swap}
group of algorithms referred to as branch swapping, that proceed by clipping
parts of the given tree and attaching them in different positions. These algorithms
include ``NNI'' \citep{CaminandSokal1965, Robinson1971}, ``SPR'' \citep{Dayhoff1969},
and ``TBR'' \citep{Farris1988, swofford1990a} refinement.
and ``TBR'' \citep{Farris1988, swofford1990a} refinement.}
\end{phygdescription}

\subsubsection{Arguments}
12 changes: 6 additions & 6 deletions doc/PhyGraphUserManual/PhyGraphUserManual.tex
@@ -308,7 +308,7 @@ \chapter{What is PhyG?}
Graphs can be input in the graphviz \href{https://graphviz.org/}{``dot''} format, Newick (as
interpreted by Gary Olsen, linked \href{https://evolution.genetics.washington.edu/phylip/newick_doc.html}
{here}), Enhanced Newick \cite{Cardonaetal2008}, and Forest Enhanced Newick (defined by
\citealp{WheelerPhyloSuperGraphs}) formats.
\citealp{Wheeler2022}) formats.
Forest Enhanced Newick (FEN) is a format based on Enhanced Newick (ENewick) for
forests of components, each of which is represented by an ENewick string. The ENewick
components are surrounded by `$<$' and `$>$', as in $<$(A, (B,C)); (D,(E,F));$>$.
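
As a rough illustration of the FEN description above, the following Haskell sketch wraps a list of ENewick component strings in angle brackets. The function name `toFEN` is hypothetical, and separating components with a single space is an assumption based only on the example in the text.

```haskell
-- | Join ENewick component strings into one Forest Enhanced Newick string.
-- Hypothetical helper; the space separator is assumed from the manual's example.
toFEN :: [String] -> String
toFEN components = "<" ++ unwords components ++ ">"
```

Under these assumptions, `toFEN ["(A, (B,C));", "(D,(E,F));"]` gives `<(A, (B,C)); (D,(E,F));>`, matching the example in the manual text.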
@@ -393,7 +393,7 @@ \section{\phyg Command Structure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Command Reference}
\input{PhyG_allcommands.tex}
\input{PhyG_Allcommands.tex}

\section{Example Script Files}
The following file (titled ``Example Script 1'') reads two input sequence files (net-I.fas and net-II.fas),
@@ -427,9 +427,9 @@ \section*{Acknowledgments}
Kleberg foundation grant ``Mechanistic Analyses of Pancreatic Cancer Evolution'', and the American Museum
of Natural History for financial support.

\newpage
%\newpage
%\bibliography{big-refs-3.bib}
\bibliography{/Users/louise/DropboxAMNH/big-refs-3.bib}
%\bibliography{/home/ward/Dropbox/Work_stuff/manus/big-refs-3.bib}
%\bibliography{/Users/ward/Dropbox/Work_stuff/manus/big-refs-3.bib}
%\bibliography{/Users/louise/DropboxAMNH/big-refs-3.bib}
\bibliography{/home/ward/Dropbox/Work_stuff/manus/big-refs-3.bib}
%\bibliography{/users/ward/Dropbox/work_stuff/manus/big-refs-3.bib}
\end{document}
14 changes: 9 additions & 5 deletions pkg/PhyGraph/Commands/CommandExecution.hs
@@ -534,10 +534,11 @@ setCommand argList globalSettings processedData inSeedList =
let localMethod
| (head optionList == "nopenalty") = NoNetworkPenalty
| (head optionList == "w15") = Wheeler2015Network
| (head optionList == "w23") = Wheeler2023Network
| (head optionList == "pmdl") = PMDLGraph
| otherwise = errorWithoutStackTrace ("Error in 'set' command. GraphFactor '" ++ (head optionList) ++ "' is not 'NoPenalty', 'W15', or 'PMDL'")
| otherwise = errorWithoutStackTrace ("Error in 'set' command. GraphFactor '" ++ (head optionList) ++ "' is not 'NoPenalty', 'W15', 'W23', or 'PMDL'")
in
trace ("GraphFactor set to " ++ head optionList)
trace ("GraphFactor set to " ++ (show localMethod))
(globalSettings {graphFactor = localMethod}, processedData, inSeedList)

else if head commandList == "graphtype" then
@@ -660,6 +661,8 @@ reportCommand globalSettings argList numInputFiles crossReferenceString processe
-- need to specify -O option for multiple graphs
let inputDisplayVVList = fmap fth6 curGraphs
costList = fmap snd6 curGraphs
displayCostListList = fmap GO.getDisplayTreeCostList curGraphs
displayInfoString = ("DisplayTree costs : " ++ (show (fmap sum $ fmap fst displayCostListList, displayCostListList)))
treeIndexStringList = fmap ((++ "\n") . ("Canonical Tree " ++)) (fmap show [0..(length inputDisplayVVList - 1)])
canonicalGraphPairList = zip treeIndexStringList inputDisplayVVList
blockStringList = concatMap (++ "\n") (fmap (outputBlockTrees commandList costList (outgroupIndex globalSettings)) canonicalGraphPairList)
@@ -669,7 +672,8 @@ reportCommand globalSettings argList numInputFiles crossReferenceString processe
trace ("No soft-wired graphs to report display trees")
("No soft-wired graphs to report display trees", outfileName, writeMode)
else
(blockStringList, outfileName, writeMode)
(displayInfoString ++ "\n" ++ blockStringList, outfileName, writeMode)


else if "graphs" `elem` commandList then
--else if (not .null) (L.intersect ["graphs", "newick", "dot", "dotpdf"] commandList) then
@@ -872,15 +876,15 @@ outputGraphStringSimple commandList lOutgroupIndex graphList costList
-- need to specify -O option for multiple graph(outgroupIndex globalSettings)s
makeDotList :: [VertexCost] -> Int -> [SimpleGraph] -> String
makeDotList costList rootIndex graphList =
let graphStringList = fmap fgl2DotString $ fmap (GO.rerootTree rootIndex) graphList
let graphStringList = fmap fgl2DotString $ fmap (LG.rerootTree rootIndex) graphList
costStringList = fmap ("\n//" ++) $ fmap show costList
in
L.intercalate "\n" (zipWith (++) graphStringList costStringList)

-- | makeAsciiList takes a list of fgl trees and outputs a single String containing the graphs in ascii format
makeAsciiList :: Int -> [SimpleGraph] -> String
makeAsciiList rootIndex graphList =
concatMap LG.prettify (fmap (GO.rerootTree rootIndex) graphList)
concatMap LG.prettify (fmap (LG.rerootTree rootIndex) graphList)

{- Older version with more data dependency
-- | getDataListList returns a list of lists of Strings for data output as csv
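
The `graphFactor` hunk above maps an option string to a constructor with guards and calls `errorWithoutStackTrace` on unknown input. A minimal standalone sketch of the same mapping is given below. It is a variation, not the project's code: the local `GraphFactor` type stands in for the project's own definition, `parseGraphFactor` is a hypothetical name, and returning `Either` instead of raising an error is a swapped-in design choice.

```haskell
import Data.Char (toLower)

-- | Network penalty options named in the diff. This local type is a
-- stand-in for the project's own definition.
data GraphFactor
  = NoNetworkPenalty
  | Wheeler2015Network
  | Wheeler2023Network
  | PMDLGraph
  deriving (Eq, Show)

-- | Parse a user-supplied option case-insensitively, reporting bad input
-- as a Left value rather than raising an error (hypothetical helper).
parseGraphFactor :: String -> Either String GraphFactor
parseGraphFactor raw =
  case map toLower raw of
    "nopenalty" -> Right NoNetworkPenalty
    "w15"       -> Right Wheeler2015Network
    "w23"       -> Right Wheeler2023Network
    "pmdl"      -> Right PMDLGraph
    _           -> Left ("GraphFactor '" ++ raw ++ "' is not 'NoPenalty', 'W15', 'W23', or 'PMDL'")
```

Keeping the failure in the return type lets a caller decide whether to abort or to fall back to a default, at the cost of threading the `Either` through the command handler.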
64 changes: 52 additions & 12 deletions pkg/PhyGraph/Commands/Transform.hs
@@ -98,7 +98,9 @@ transform inArgs inGS origData inData rSeed inGraphList =
atRandom = any ((=="atrandom").fst) lcArgList
chooseFirst = any ((=="first").fst) lcArgList
reWeight = any ((=="weight").fst) lcArgList

changeEpsilon = any ((=="dynamicepsilon").fst) lcArgList
reRoot = any ((=="outgroup").fst) lcArgList

reweightBlock = filter ((=="weight").fst) lcArgList
weightValue
| length reweightBlock > 1 =
@@ -107,6 +109,24 @@ transform inArgs inGS origData inData rSeed inGraphList =
| null (snd $ head reweightBlock) = Just 1
| otherwise = readMaybe (snd $ head reweightBlock) :: Maybe Double


changeEpsilonBlock = filter ((=="dynamicepsilon").fst) lcArgList
epsilonValue
| length changeEpsilonBlock > 1 =
errorWithoutStackTrace ("Multiple dynamicEpsilon specifications in transform--can have only one: " ++ show inArgs)
| null changeEpsilonBlock = Just $ dynamicEpsilon inGS
| null (snd $ head changeEpsilonBlock) = Just $ dynamicEpsilon inGS
| otherwise = readMaybe (snd $ head changeEpsilonBlock) :: Maybe Double

reRootBlock = filter ((=="outgroup").fst) lcArgList
outgroupValue
| length reRootBlock > 1 =
errorWithoutStackTrace ("Multiple outgroup specifications in transform--can have only one: " ++ show inArgs)
| null reRootBlock = Just $ outGroupName inGS
| null (snd $ head reRootBlock) = Just $ outGroupName inGS
| otherwise = readMaybe (snd $ head reRootBlock) :: Maybe TL.Text


nameList = fmap TL.pack $ fmap (filter (/= '"')) $ fmap snd $ filter ((=="name").fst) lcArgList
charTypeList = fmap snd $ filter ((=="type").fst) lcArgList

@@ -132,14 +152,14 @@ transform inArgs inGS origData inData rSeed inGraphList =
let newGS = inGS {graphType = Tree}

-- generate and return display trees-- displayTreNUm / graph
displayGraphList = if chooseFirst then fmap (take (fromJust numDisplayTrees) . GO.generateDisplayTrees) (fmap fst6 inGraphList)
else fmap (GO.generateDisplayTreesRandom rSeed (fromJust numDisplayTrees)) (fmap fst6 inGraphList)
displayGraphList = if chooseFirst then fmap (take (fromJust numDisplayTrees) . LG.generateDisplayTrees) (fmap fst6 inGraphList)
else fmap (LG.generateDisplayTreesRandom rSeed (fromJust numDisplayTrees)) (fmap fst6 inGraphList)

-- prob not required
displayGraphs = fmap GO.ladderizeGraph $ fmap GO.renameSimpleGraphNodes (concat displayGraphList)

-- reoptimize as Trees
newPhylogeneticGraphList = fmap (T.multiTraverseFullyLabelGraph newGS inData pruneEdges warnPruneEdges startVertex) displayGraphs `using` PU.myParListChunkRDS
newPhylogeneticGraphList = PU.seqParMap rdeepseq (T.multiTraverseFullyLabelGraph newGS inData pruneEdges warnPruneEdges startVertex) displayGraphs -- `using` PU.myParListChunkRDS
in
(newGS, origData, inData, newPhylogeneticGraphList)

@@ -148,7 +168,7 @@ transform inArgs inGS origData inData rSeed inGraphList =
if (graphType inGS == SoftWired) then (inGS, origData, inData, inGraphList)
else
let newGS = inGS {graphType = SoftWired}
newPhylogeneticGraphList = fmap (T.multiTraverseFullyLabelGraph newGS inData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) `using` PU.myParListChunkRDS
newPhylogeneticGraphList = PU.seqParMap rdeepseq (T.multiTraverseFullyLabelGraph newGS inData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) -- `using` PU.myParListChunkRDS
in
(newGS, origData, inData, newPhylogeneticGraphList)

@@ -158,21 +178,21 @@ transform inArgs inGS origData inData rSeed inGraphList =
else
let newGS = inGS {graphType = HardWired}

newPhylogeneticGraphList = fmap (T.multiTraverseFullyLabelGraph newGS inData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) `using` PU.myParListChunkRDS
newPhylogeneticGraphList = PU.seqParMap rdeepseq (T.multiTraverseFullyLabelGraph newGS inData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) -- `using` PU.myParListChunkRDS
in
(newGS, origData, inData, newPhylogeneticGraphList)

-- roll back to dynamic data from static approx
else if toDynamic then
let newPhylogeneticGraphList = fmap (T.multiTraverseFullyLabelGraph inGS origData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) `using` PU.myParListChunkRDS
let newPhylogeneticGraphList = PU.seqParMap rdeepseq (T.multiTraverseFullyLabelGraph inGS origData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) -- `using` PU.myParListChunkRDS
in
trace ("Transforming data to dynamic: " ++ (show $ minimum $ fmap snd6 inGraphList) ++ " -> " ++ (show $ minimum $ fmap snd6 newPhylogeneticGraphList))
(inGS, origData, origData, newPhylogeneticGraphList)

-- transform to static approx--using first Tree
else if toStaticApprox then
let newData = makeStaticApprox inGS inData (head $ L.sortOn snd6 inGraphList)
newPhylogeneticGraphList = fmap (T.multiTraverseFullyLabelGraph inGS newData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) `using` PU.myParListChunkRDS
newPhylogeneticGraphList = PU.seqParMap rdeepseq (T.multiTraverseFullyLabelGraph inGS newData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) -- `using` PU.myParListChunkRDS

in
trace ("Transforming data to staticApprox: " ++ (show $ minimum $ fmap snd6 inGraphList) ++ " -> " ++ (show $ minimum $ fmap snd6 newPhylogeneticGraphList))
@@ -183,12 +203,32 @@ transform inArgs inGS origData inData rSeed inGraphList =
else if reWeight then
let newOrigData = reWeightData (fromJust weightValue) charTypeList nameList origData
newData = reWeightData (fromJust weightValue) charTypeList nameList inData
newPhylogeneticGraphList = fmap (T.multiTraverseFullyLabelGraph inGS newData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) `using` PU.myParListChunkRDS
newPhylogeneticGraphList = PU.seqParMap rdeepseq (T.multiTraverseFullyLabelGraph inGS newData pruneEdges warnPruneEdges startVertex) (fmap fst6 inGraphList) -- `using` PU.myParListChunkRDS
in
trace ("Reweighting types " ++ (show charTypeList) ++ " and/or characters " ++ (L.intercalate ", " $ fmap TL.unpack nameList) ++ " to " ++ (show $ fromJust weightValue)
if isNothing weightValue then errorWithoutStackTrace ("Reweight value is not specified correctly. Must be a double (e.g. 1.2): " ++ (show (snd $ head reweightBlock)))
else
trace ("Reweighting types " ++ (show charTypeList) ++ " and/or characters " ++ (L.intercalate ", " $ fmap TL.unpack nameList) ++ " to " ++ (show $ fromJust weightValue)
++ "\n\tReoptimizing graphs")
(inGS, newOrigData, newData, newPhylogeneticGraphList)
(inGS, newOrigData, newData, newPhylogeneticGraphList)

-- changes dynamicEpsilon error check factor
else if changeEpsilon then
if isNothing epsilonValue then errorWithoutStackTrace ("DynamicEpsilon value is not specified correctly. Must be a double (e.g. 0.02): " ++ (show (snd $ head changeEpsilonBlock)))
else
trace ("Changing dynamicEpsilon factor to " ++ (show $ fromJust epsilonValue))
(inGS {dynamicEpsilon = fromJust epsilonValue}, origData, inData, inGraphList)

else if reRoot then
if isNothing outgroupValue then errorWithoutStackTrace ("Outgroup is not specified correctly. Must be a string (e.g. \"Name\"): " ++ (snd $ head reRootBlock))
else
let newOutgroupName = TL.filter (/= '"') $ fromJust outgroupValue
newOutgroupIndex = V.elemIndex newOutgroupName (fst3 origData)
newPhylogeneticGraphList = PU.seqParMap rdeepseq (T.multiTraverseFullyLabelGraph inGS origData pruneEdges warnPruneEdges startVertex) (fmap (LG.rerootTree (fromJust newOutgroupIndex)) $ fmap fst6 inGraphList)
in
if isNothing newOutgroupIndex then errorWithoutStackTrace ("Outgroup name not found: " ++ (snd $ head reRootBlock))
else
trace ("Changing outgroup to " ++ (TL.unpack newOutgroupName))
(inGS {outgroupIndex = fromJust newOutgroupIndex, outGroupName = newOutgroupName}, origData, inData, newPhylogeneticGraphList)


else error ("Transform type not implemented/recognized" ++ (show inArgs))
@@ -263,7 +303,7 @@ makeStaticApprox inGS inData inGraph =
(nameV, nameBVV, blockDataV) = inData

-- do each block in turn pulling and transforming data from inGraph
newBlockDataV = fmap (pullGraphBlockDataAndTransform decGraph inData) [0..(length blockDataV - 1)] `using` PU.myParListChunkRDS
newBlockDataV = PU.seqParMap rdeepseq (pullGraphBlockDataAndTransform decGraph inData) [0..(length blockDataV - 1)] -- `using` PU.myParListChunkRDS

-- convert prealigned to non-additive if all 1's tcm

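
The new `dynamicepsilon` and `outgroup` options in `transform` follow one pattern: filter the argument list by key, reject duplicates, fall back to the current setting when the key or its value is absent, and otherwise parse with `readMaybe`. A minimal sketch of that pattern in isolation follows. `OptionValue` and `lookupOption` are hypothetical names introduced here, and `String` is used in place of the project's lazy `Text` so the example depends only on `base`.

```haskell
import Text.Read (readMaybe)

-- | Outcome of looking up one optional, valued argument (hypothetical type).
data OptionValue a
  = Duplicated        -- ^ the key appears more than once
  | Unparsable String -- ^ the key appears once but its value did not parse
  | Resolved a        -- ^ the parsed value, or the default when absent or empty
  deriving (Eq, Show)

-- | Look up @key@ in a list of (key, value) argument pairs, using @def@
-- when the key is missing or has an empty value.
lookupOption :: Read a => String -> a -> [(String, String)] -> OptionValue a
lookupOption key def args =
  case filter ((== key) . fst) args of
    []         -> Resolved def
    [(_, "")]  -> Resolved def
    [(_, raw)] -> maybe (Unparsable raw) Resolved (readMaybe raw)
    _          -> Duplicated
```

One consequence of relying on `readMaybe` for a string-typed option: the `Read` instance for strings expects Haskell string-literal syntax, so a quoted value such as `"\"B\""` parses while a bare `B` does not. That is consistent with the error text in the diff, which asks for a quoted name.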