Skip to content

Latest commit

 

History

History
157 lines (135 loc) · 7.93 KB

TODO

File metadata and controls

157 lines (135 loc) · 7.93 KB

##-*- mode: org -*- Emacs org: use [Tab] “all the time” Priority: Higher-order bullet point “inherits” max. priority of all lower-order bullet points. 1 (top): Do before substantial writing starts 2: Do before submission 3: Do if there is enough time 4 = DONE

(2) NAMESPACE and “internal” functions:

Look at ./NAMESPACE and replace ALL the Imports() by ImportsFrom(.)

(2) What do we export / what not? <–> man/pcalg-internal.Rd

The concept of such “internal” functions really predates the use of NAMESPACEs and is now obsolete.

Many of these should be removed for the .Rd page and from NAMESPACE.

(2) The others will be kept, but “well” documented, i.e., in a different .Rd file :

(2,MK) rfci.vStruc renamed; TODO (MM)

allDags mentioned in other help pages !!

(2,MK) amat2dag() in ../tests/test_amat2dag.R

(3) Get rid of man/pcalg-internal.Rd entirely.

(2)Remarks on specific functions and issues:

(2) skeleton(): keep back compatibility update= c(“stable”,”original”) <<—MM

in code

(2, MM), evt (3) but MM’s tests show NOT QUITE back compatibility

pag2mag() return adj.matrix instead of object -> named pag2magAM()

whereas dag2pag() does return more than just an adj.matrix. Change BEFORE release

(2,MM) pc(), probably fci() etc: loses variable labels and works with “1”, “2”,…???

–> file:/u/maechler/R/MM/Pkg-ex/pcalg/Borsboom_mail.R is an example where he explicitly calls an other package just to get sensible labeled plots.

(2,MK) Graph Node Labels <==> rownames(A): Still a mess, sometimes using as.numeric(rownames(.)),

e.g. in my.SpecialDag() in R/pcalg.R

(3) fci() et al : sparse matrices [Matrix] ?

  • returns the adjacency matrix “@ amat” (among other things), a simple matrix with entries in {0,1,2,3}. It would be nice to allow sparse matrices here, e.g., by using the “Matrix” class from the Matrix package.

    This makes sense mostly if it’s realistic to have quite sparse and relatively large sets of variables.

(3) gAlgo-class: consider using setMethod(., “gAlgo”)

instead of all methods (plot, summary) for both pcAlgo and fciAlgo

myCItest() in Vignette vignettes/pcalgDoc.Rnw

instead of lm() twice, use lm.fit once (multivariate regression). This will probably be much faster.

(2,MK) ida(): argument ‘all.dags’ is never used, i.e., never tested.

dsepTest(): gibt 0 / 1 zurück; wieso nicht FALSE/TRUE wie dsep()? A:”P-value”

(3) I’ve introduced ‘max.chordal = 10’ to ‘backdoor()’

which was hidden in the code previously. Have you ever tried larger/smaller?

gSquareBin(), gSquareDis():

  • returns a P-value but not the test statistic. Should really return an object of (standard S3 class) “htest” (which contains P-value, test statistic, …). But they have been documented to do what they do, and so we keep them.

(2,MK) pc(*, verbose=TRUE) for a “large” example with 18 vars: much output;

and the 10 rules at the very end. Better: verbose in { 0,1,2 (, 3) } and verbose=1 should give much less than TRUE now

(1, MM) Package ‘ggm’ has topOrder() for topological ordering. Should allow to use it,

e.g., optionally (‘topOrder=FALSE’) in unifDAG(.) and randDAG() to fixup a non-top.ordered input.

(3) MM: ggm::topOrder() can be made faster

(1, MM): as(<fitted-object>, “amat”) gives a “amat”-object with ‘type’ and print() method.

see also ‘Adjacency Matrices’ in the “Internal Programming” part below.

(3) Parallelize option

(3) Allow ‘tiers’ (as in Tetrad), and ‘background knowledge’ (about orientations etc).

(2,MM) Robustness examples that pcAlgo() had explicitly: add examples using robust mcor()

2 of pc(), fci(), rfci(), fciPlus()


find.unsh.triple(): remove arg ‘p’

MK[2013-12-17]: added and documented function pdag2allDags; deprecated function allDags

(2) MK[2013-12-17]: Letztest Bsp in fci(): p-1 statt p im Argument, weil eine (latente) Var gelöscht wurde

Boost C++ library needed for Alain’s GIES

does need a correct ./configure, somewhat analogous to Martin’s Rmpfr.

Data: Mit Markus gesprochen (3.Sept.2010):

  • Die data/test_*.rda Daten welche nur in tests/ gebraucht werden, sollen nicht “exponiert” werden.
  • Die andern sind momentan “falsch” dokumentiert, sowohl formal (-> R CMD check), als auch inhaltlich {die Namen strings sind “dokumentiert”; sonst kein Inhalt}.
  • Auch hat es z.T. mehrere Objekte pro *.rda die inhaltlich zusammengehören; MM denkt, die sollten wohl zusammen in eine Liste (mit kurzem Namen!).

Wo gebraucht:

pcalg$ grep-r binaryData ./man/pc.Rd:data(binaryData) ./man/skeleton.Rd:data(binaryData) pcalg$ grep-r discreteData ./inst/doc/pcalgDoc.Snw:data set \code{discreteData} (which consists of $p=5$ discrete ./inst/doc/pcalgDoc.Snw:data(discreteData) ./man/pc.Rd:data(discreteData) ./man/skeleton.Rd:data(discreteData) pcalg$ grep-r gaussianData ./inst/doc/pcalgDoc.Snw: data(gaussianData) ./inst/doc/pcalgDoc.Snw:\code{gaussianData} (which consists of $p=8$ variables and $n=5000$ ./inst/doc/pcalgDoc.Snw:data(gaussianData) ./inst/doc/pcalgDoc.Snw:data(gaussianData) ./inst/doc/pcalgDoc.Snw:data(gaussianData) ./inst/doc/pcalgDoc.Snw:data(gaussianData) ## contains data matrix datG pcalg$ grep-r idaData ./inst/doc/pcalgDoc.Snw:data(idaData) pcalg$ grep-r latentVarGraph ./inst/doc/pcalgDoc.Snw:data(latentVarGraph)

## Here, you get their contents:

for(f in list.files(“~/R/Pkgs/pcalg/data/”, full.names=TRUE)) { bf <- basename(f) nam <- sub(“\.rda$”, ”, bf) cat(“\n”, bf, “:\n—————–\n”, sep=”“) attach(f) print(ls.str(pos=2)) detach() }

DESCRIPTION:

Depends: too many; I do not believe they are all “dependently” needed.

Remove more ‘Depends:’: vcd, RBGL.

  • vcd: I’ve eliminated it, and all the ‘R CMD check’ still run
  • RBGL: .. ok, only a few needed
  • graph: import quite a few; now found all examples – is this (not attaching ‘graph’) acceptable to the current pcalg users ?

: ‘abind’ and ‘corpcor’, as all packages now DO have a namespace!

  • abind: abind() needed in gSquareBin() & *Dis() – once only, each
  • corpcor: pseudoinverse() [ -> fast.svd() .. non-optimal “A %% diag(.)” ]

(3) MK[2014-07-08]: Change convention: If something strange happens with a test, KEEP edge

(3) MK[2014-07-08]: Make up some clever way to deal with NAs in the continuous case (phps also in other cases) and prepare test functions for users

(2,MM) Write a “doc” on Internal Programming - conventions and style guide

(2) am[which(am != 0)] is faster than am[am != 0]

for a sparse am of dim 50 x 50 ==> Using the more ugly notation is ok! — But this will most probably change in R >= 3.3.0

(3) Adjacency Matrices:

(3) We should use integer {internal} ==> A[..,] <- 1 is wrong, should be A[..] <- 1L, etc

(3) Our code should make sure that ad-matrices are integer and hence use 1L, 0L, etc !

Document the different kinds of ‘amat’ (adjacency matrices) that we use:

file:man/amatType.Rd (symmetric ? / lower- or upper-triangular ?) E.f. file:man/backdoor.Rd has “Coding of adjacency matrix” (0,1,2,3)

(2,MK) Graph Node Labels <==> rownames(A): Still a mess, sometimes using as.numeric(rownames(.)),

e.g. in my.SpecialDag() in R/pcalg.R

In these cases graph labels are lost and replaced by “1”, “2”, .. this is typically entirely wrong.

(2,MK) Help pages:

For function \arguments{.}, in \item{<name>}{<description>}, use lower case descr.

as in all practically all “base R” help pages.

(2,MK) Add test for lingam() (exits only for LINGAM() )