Find file
Fetching contributors…
Cannot retrieve contributors at this time
920 lines (696 sloc) 30.3 KB

Time-stamp: <2013-01-22 19:33:50 tony> Creation: <2008-09-08 08:06:30 tony>

Intro and Metadata

File: TODO.lisp Author: AJ Rossini <> Copyright: (c) 2007-2012, AJ Rossini <>. MIT license. Purpose: Stuff that needs to be made working sits inside the Task sections.

This file contains the current challenges to solve, including a description of the setup and the work to solve. Solutions welcome.

What is this talk of ‘release’? Klingons do not make software ‘releases’. Our software ‘escapes’, leaving a bloody trail of designers and quality assurance people in its wake.

Approach and Design, Strategy and Tactics

This chapter describes the general philosophy along with the thinking behind some of the selections which might appear a bit on the strange side.


Please develop possible high level API code in examples subdirectory – when we get it “reasonable”, migrate into appropriate core code directories in the src subdirectory.


Collection of packages. Include as appropriate, so that we end up with a coherent collection as needed.

(Internal) Package and (External) System Hierarchy

This section provides some of the details regarding infrastructure of the system.

current (possibly incomplete) set of lisp dependencies <2012-10-04 Thu>



Singletons (primary building blocks)

These are packages as well as

asdfcommon system loader
xarraycommon access structure to array-like
(matrix, vector) structures.
cls-configinitialization of Lisp state, variables, etc,
localization to the particular lisp.
cffiforiegn function library

as of now, all are in QL heirarchy

Dependency structure

lisp-matrixgeneral purpose matrix package, linking to lapack
for numerics. Depends on:
cls-dataframein the same spirit as lisp-matrix, a means to
create tables. Perhaps better called datatables?
cls-probabilitydepends on gsll, cl-variates, cl-? initially,

Need to integrate

Random number streams and probability calculus

Something for random numbers, that has a settable seed. However, we could pass on the “right thing” in favor of something that will “work for now”. cl-randist

import of data in text files

CSV and similar specialized imports



?? cl-2d : cl-cairo2 : cffi

?? cl-plplot : cffi

Tasks to Do [3/12]

Usually, we need to load it everything before going on.

(ql:quickload :cls)

and sometimes we might want to recompile fully:

(asdf:oos 'asdf:compile-op :cls :force T)

Currently <2012-10-10 Wed> QuickLisp support doesn’t provide a recompilation facility. And QL is built over and partially extends ASDF, so we should be fine for now.

Another activity, to check current state of the unit tests, is to run the tests. This can be done by:

(in-package :lisp-stat-unittests)
(run-tests :suite 'lisp-stat-ut)

[#A] Agree on a releasable license (MIT, LLGPL, BSD, GPL, ?): MIT

  • CLOSING NOTE [2012-11-13 Tue 13:34]
    MIT for now, unless someone really insists on Boost license.

Decision: MIT

[#B] Get rid of rsm-string-cls

  • CLOSING NOTE [2012-11-24 Sat 16:51]
    replaced with fare-csv.

[#A] ASDF updating for both v1 and v2 of ASDF.

Need to update as appropriate, to use modern features.

[#B] Example of Custom Data analysis set up

  • State “DONE” from “CURR” [2010-10-12 Tue 13:48]
    setup is mostly complete
  • State “CURR” from “TODO” [2010-10-12 Tue 13:47]
  • State “TODO” from “” [2010-10-12 Tue 13:47]

This is an example of a custom setup, not really interesting at this point (it will hopefully be obsolete by the first release) except to remind Tony how to program. Pointy-headed managers need any support they can find in order to regress to their hacker-childhood.

The only point of this section is to illustrate that we could want to load additional modules that are not a central part of the core files.

<2012-10-29 Mon> Perhaps another point is to demonstrate that we might not want certain functionality, or would like to replace with different functionality.

;; always ensure we are in the right package to leave droppings and access functionality
(in-package :cl-user) 
  (defun init-CLS (&key (compile 'nil))
    (let ((packagesToLoad (list ;; core system
                                :lift :lisp-matrix :cls
                                ;; visualization
                                ;; :cl-cairo2-x11 :iterate
                                ;; doc reporting
                                :cl-pdf :cl-typesetting
                                :asdf-system-connections :xarray
                                :metatilities-base :anaphora :tinaa
                                :cl-ppcre :cl-markdown :docudown
                                ;; version and validate CLOS objects
                                ;; :versioned-objects :validations
                                ;; :cl-opengl
                                ;; :cl-glu :cl-glut :cl-glut-examples

                                ;; :cells :cells-gtk
      (mapcar #'(lambda (x)
                  (if compile
                      (asdf:oos 'asdf:compile-op x :force T)
                      (asdf:oos 'asdf:load-op x)))
  ;; (init-CLS :compile T) vs:

[#A] Integrate with quicklist support.

important to merge with quicklisp system loader support. We currently have some of this work integrated, but I think there are a few systems which are not auto-installable.

[#B] Determine which packages still need to be in QuickLisp

Currently, probably need my versions of files, or will need to preface them as needed. As we can afford at most 2 more renames, probably have something like cls-cl-XXXX for packages which have API conflicts, and then if we rename the system, something like NAME-RANDOM, NAME-CORE, NAME-MATRIX, etc… as needed.

[#A] Testing: unit, regression, examples. [0/3]

  • State “CURR” from “TODO” [2010-10-12 Tue 13:51]
  • State “TODO” from “” [2010-10-12 Tue 13:51]

Testing consists of unit tests, which internally verify subsets of code, regression tests, and functional tests (in increasing order of scale).

[#B] Unit tests

  • State “CURR” from “TODO” [2010-11-04 Thu 18:33]
  • State “CURR” from “TODO” [2010-10-12 Tue 13:48]
  • State “TODO” from “” [2010-10-12 Tue 13:48]

Unit tests have been started using LIFT. Need to consider some of the other systems that provide testing, when people add them to the mix of libraries that we need, along with examples of how to use.

(in-package :lisp-stat-unittests)
(run-tests :suite 'lisp-stat-ut)

Before we removed the internal legacy lispstat probability code, we had:

;; => tests = 78, failures = 7, errors = 20

The following needs to be solved in order to have a decent installation qualification (IQ) and performance qualification (PQ). It currently fails on approach.

(in-package :lisp-stat-unittests)
(asdf:oos 'asdf:test-op 'cls) ; (describe (run-tests :suite 'lisp-stat-ut))

and check documentation to see if it is useful.

(in-package :lisp-stat-unittests)
(describe 'lisp-stat-ut)
(documentation 'lisp-stat-ut 'type)

;; FIXME: Example: currently not relevant, yet
;;   (describe (lift::run-test :test-case  'lisp-stat-unittests::create-proto
;;                             :suite 'lisp-stat-unittests::lisp-stat-ut-proto))

(describe (lift::run-tests :suite 'lisp-stat-ut-dataframe))
;; => Test Report for LISP-STAT-UT-DATAFRAME: 11 tests run, 5 Errors.

(lift::run-tests :suite 'lisp-stat-ut-dataframe)

;;; The following barfs, doesn't like test-case keyword
;; (describe (lift::run-test
;;             :test-case  'lisp-stat-unittests::create-proto
;;             :suite 'lisp-stat-unittests::lisp-stat-ut-proto))

[#B] Regression Tests

  • State “TODO” from “” [2010-10-12 Tue 13:54]

By regression tests, we refer to tests which focus on probing a range of high level interactions. The test skeleton should focus on managing complex interactions which are reasonable.

[#B] Functional Tests

  • State “TODO” from “” [2010-10-12 Tue 13:54]

[#B] Functional Examples that need to work [1/3]

  • State “CURR” from “TODO” [2010-11-30 Tue 17:57]
  • State “TODO” from “” [2010-10-12 Tue 13:55]

These examples should be functional forms within CLS, describing working functionality which is needed for work.

[#A] Dataframe creation

Illustration via a file, that we need to get working so that we can get data in-and-out of CLS structures.

;;; -*- mode: lisp -*-
;;; Copyright (c) 2006-2012, by A.J. Rossini <>
;;; See COPYRIGHT file for any additional restrictions (BSD license).
;;; Since 1991, ANSI was finally finished.  Edited for ANSI Common Lisp. 

;;; Time-stamp: <2012-10-04 02:16:45 tony>
;;; Creation:   <2012-07-01 11:29:42 tony>
;;; File:       example.lisp
;;; Author:     AJ Rossini <>
;;; Copyright:  (c) 2012, AJ Rossini.  BSD.
;;; Purpose:    example of possible usage.

;;; What is this talk of 'release'? Klingons do not make software
;;; 'releases'.  Our software 'escapes', leaving a bloody trail of
;;; designers and quality assurance people in its wake.

;; Load system
(ql:quickload "cls")

;; use the example package...
(in-package :cls-user)

;; or better yet, create a package/namespace for the particular problem being attacked.
(defpackage :my-package-user
  (:documentation "demo of how to put serious work should be placed in
    a similar package elsewhere for reproducibility.  This hints as to
    what needs to be done for a user- or analysis-package.")
  (:nicknames :my-clswork-user)
  (:use :common-lisp ; always needed for user playgrounds!
        :lisp-matrix ; we only need the packages that we need...
        :lisp-stat-data-examples) ;; this ensures access to a data package
  (:export summarize-data summarize-results this-data this-report)
  (:shadowing-import-from :lisp-stat call-method call-next-method

      expt + - * / ** mod rem abs 1+ 1- log exp sqrt sin cos tan
      asin acos atan sinh cosh tanh asinh acosh atanh float random
      truncate floor ceiling round minusp zerop plusp evenp oddp 
      < <= = /= >= > > ;; complex
      conjugate realpart imagpart phase
      min max logand logior logxor lognot ffloor fceiling
      ftruncate fround signum cis

      <= float imagpart)) 

(in-package :my-clswork-user)

;; create some data by hand using arrays, and demonstrate access. 

(let ((myArray #2A((1 2 3)(4 5 6)))
      (myDF    (make-dataframe #2A((1 2 3)(4 5 6))))
      (myLOL   (list (list 1 2 3) (list 4 5 6)))
      ;; FIXME: listoflist conversion does not work.
      ;; (myDFlol (make-dataframe  '(list ((1 2 3)(4 5 6)))))

  (= (xref myArray 1 1)
     (xref myDF    1 1)
     (xref myLOL   1 1)))

[#B] Scoping with datasets

  • State “TODO” from “” [2010-11-04 Thu 18:46]

The following needs to work, and a related syntax for resampling and similar synthetic data approaches (bootstrapping, imputation) ought to use similar syntax as well.

(in-package :ls-user)
  ;; Syntax examples using lexical scope, closures, and bindings to
  ;; ensure a clean communication of results
  ;; This is actually a bit tricky, since we need to clarify whether
  ;; it is line-at-a-time that we are considering or if there is
  ;; another mapping strategy.  In particular, one could imagine a
  ;; looping-over-observations function, or a
  ;; looping-over-independent-observations function which leverages a
  ;; grouping variable which provides guidance for what is considered
  ;; independent from the sampling frame being considered. The frame
  ;; itself (definable via some form of metadata to clarify scope?)
  ;; could clearly provide a bit of relativity for clarifying what
  ;; statistical independence means.
  (with-data dataset ((dsvarname1 [usevarname1])
                      (dsvarname2 [usevarname2]))

  ;; SAS-centric approach to spec'ing work 
     dataset ((dsvarname1 [usevarname1])
              (dsvarname2 [usevarname2]))

  ;; SAS plus "statistical sensibility"... for example, if an
  ;; independent observation actually consists of many observations so
  ;; that a dataframe of independence results -- for example,
  ;; longitudinal data or spatial data or local-truncated network data
  ;; are clean examples of such happening -- then we get the data
  ;; frame or row representing the independent result.
     dataset independence-defining-variable
       ((dsvarname1 [usevarname1])
        (dsvarname2 [usevarname2]))

[#B] Dataframe variable typing

  • State “DONE” from “CURR” [2010-11-30 Tue 17:56]
    check-type approach works, we would just have to throw a catchable error if we want to use it in a reliable fashion.
  • State “CURR” from “TODO” [2010-11-30 Tue 17:56]
  • State “TODO” from “” [2010-11-04 Thu 18:48]

Seems to generally work, need to ensure that we use this for appropriate typing.

(in-package :ls-user)
(defparameter *df-test*
  (make-instance 'dataframe-array
                 :storage #2A (('a "test0" 0 0d0)
                               ('b "test1" 1 1d0)
                               ('c "test2" 2 2d0)
                               ('d "test3" 3 3d0)
                               ('e "test4" 4 4d0))
                 :doc "test reality"
                 :case-labels (list "0" "1" 2 "3" "4")
                 :var-labels (list "symbol" "string" "integer" "double-float")
                 :var-types (list 'symbol 'string 'integer 'double-float)))

;; with SBCL, ints become floats?  Need to adjust output
;; representation appropriately..

(defun check-var (df colnum)
  (let ((nobs (xdim (dataset df) 0)))
    (dotimes (i nobs)
      (check-type (xref df i colnum) (elt (var-types df) i)))))

(xdim (dataset *df-test*) 1)
(xdim (dataset *df-test*) 0)

(check-var *df-test* 0)

  (xref *df-test* 1 1))

(check-type (xref *df-test* 1 1)
            string) ;; => nil, so good.
(check-type (xref *df-test* 1 1)
            vector) ;; => nil, so good.
(check-type (xref *df-test* 1 1)
            real) ;; => simple-error type thrown, so good.

;; How to nest errors within errors?
(check-type (check-type (xref *df-test* 1 1) real) ;; => error thrown, so good.
(xref *df-test* 1 2)

(check-type *df-test*
            dataframe-array) ; nil is good.

(integerp (xref *df-test* 1 2))
(floatp (xref *df-test* 1 2))
(integerp (xref *df-test* 1 3))
(type-of (xref *df-test* 1 3))
(floatp (xref *df-test* 1 3))

(type-of (vector 1 1d0))
(type-of *df-test*)

(xref *df-test* 2 1)
(xref *df-test* 0 0)
(xref *df-test* 1 0)
(xref *df-test* 1 '*)

[#A] Random Numbers [2/6]

  • State “CURR” from “TODO” [2010-11-05 Fri 15:41]
  • State “TODO” from “” [2010-10-14 Thu 00:12]

Need to select and choose a probability system (probability functions, random numbers). Goal is to have a general framework for representing probability functions, functionals on probabilities, and reproducible random streams based on such numbers.

[#B] CL-VARIATES system evaluation [2/3]

  • State “CURR” from “TODO” [2010-11-05 Fri 15:40]
  • State “TODO” from “” [2010-10-12 Tue 14:16]

CL-VARIATES is a system developed by Gary W King. It uses streams with seeds, and is hence reproducible. (Random comment: why do CL programmers as a class ignore computational reproducibility?)

The main problem with this system is licensing. It has a weird licensing schema which prevents

(in-package :cl-user)
(ql:quickload :cl-variates)
;;(ql:quickload :cl-variates-test)
(in-package :cl-variates-test)
;; check tests
(run-tests :suite 'cl-variates-test)
(describe (run-tests :suite 'cl-variates-test))

basic example of reproducible draws from the uniform and normal random number streams.

(in-package :cl-variates-user)

(defparameter state (make-random-number-generator))
(setf (random-seed state) 44)

(random-seed state)
(loop for i from 1 to 10 collect
                  (random-range state 0 10))
;; => (1 5 1 0 7 1 2 2 8 10)
(setf (random-seed state) 44)
(loop for i from 1 to 10 collect
                  (random-range state 0 10))
;; => (1 5 1 0 7 1 2 2 8 10)

(setf (random-seed state) 44)
(random-seed state)
(loop for i from 1 to 10 collect
                  (normal-random state 0 1))
;; => 
;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
;;  1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
;;  0.20750134211656893 -0.14501914108452274)

(setf (random-seed state) 44)
(loop for i from 1 to 10 collect
                  (normal-random state 0 1))
;; => 
;; (-1.2968656102820426 0.40746363934173213 -0.8594712469518473 0.8795681301148328
;;  1.0731526250004264 -0.8161629082481728 0.7001813608754809 0.1078045427044097
;;  0.20750134211656893 -0.14501914108452274)

[#B] Full example of general usage

  • State “CURR” from “TODO” [2010-11-05 Fri 15:40]
  • State “TODO” from “” [2010-11-05 Fri 15:40]

What we want to do here is describe the basic available API that is present. So while the previous work describes what the basic reproducibility approach would be in terms of generating lists of reproducible pRNG streams, we need the full range of possible probability laws that are present.

One of the good things about cl-variates is that it provides for reproducibility. One of the bad things is that it has a mixed bag for an API.

[#B] CL-RANDOM system evaluation

  • State “TODO” from “” [2010-11-05 Fri 15:40]


  1. no seed setting for random numbers
  2. contamination of a probability support with optimization and linear algebra.


  1. good code
  2. nice design for generics.

[#B] Native CLS (from XLS)

  • State “TODO” from “” [2010-11-05 Fri 15:40]

[#B] Numerical Linear Algebra [0/6]

  • State “TODO” from “” [2010-10-14 Thu 00:12]

[#B] LLA evaluation

  • State “TODO” from “” [2010-10-12 Tue 14:13]

LLA is an SBCL targetted linear algebra library from Tamas Papp

#+NAME LLA-experiments

(in-package :cl-user)
(asdf:oos 'asdf:load-op 'lla)
(in-package :lla-user)
;;; experiment here

[#B] Lisp-Matrix system evaluation

  • State “CURR” from “TODO” [2010-10-12 Tue 14:13]
  • State “TODO” from “” [2010-10-12 Tue 14:13]

    in progress

[#A] Review GSLL and Antik as per Mirko’s suggestion [0/2]

<2012-10-12 Fri>: Mirko suggested this approach. Might look into using GSLL and Antik to augment lisp-matrix, pRNG selection (GSLL) and replace xarray (Antik).

[#A] Review Antik

evaluation should go here

[#A] Review GSLL

evaluation (updated to <2012-10-12 Fri>, removing Tony’s obsolete opinions) should go here.

[#B] LispLab system evaluation

  • State “TODO” from “” [2010-10-12 Tue 14:13]

LL is an SBCL targetted linear algebra library from —

[#B] Numerical Statistical Procedures to implement [0/4]

By this, I mean procedures which provide numerical quantitative or precise categorical qualitative results (for example, excluding visualizations, which tend to produce very useful but relatively imprecise actionable insights).

[#A] Basic Descriptives


(in-package :cls-user)
;;;; PFIM notes

;; PFIM 3.2 

;; population design eval and opt
- # individuals
- # sampling times
- sampling times?

number of samples/cost of lab analysis and collection
expt constraints

(defun pfim (&key model ( constraints ( summary-function )

  (list num-subjects num-times list-times))))

N individuals i
Each individal has a deisgn psi_i
   nubmer of samples n_i and sampling times t_{i{1}} t_{i{n_1}}
   individuals can differ


individual-level model 

(=model y_i (+ (f \theta_i \psi_i) epsilion_i ))
(=var \epsilion_i \sigma_between \sigma_within  )

;; Information Matrix for pop deisgn 

(defparameter IM (sum  (i 1 N) (MF \psi_i \phi_i)))

For nonlinear structureal models, expand around RE=0

Cramer-Rao : MF^{-1} is lower bound for estimation variance.

Design comparisons: 

- smallest SE, but is a matrix, so
- criteria for matrix comparison
-- D-opt, (power (determinant MF) (/ 1 P))

find design maxing D opt, (power (determinant MF) (/ 1 P))
Design varialables 
 -- contin vars for smapling times within interval or set -- number of groups for cat vars

Stat in Med 2009, expansion around post-hoc RE est, not necessarily zero.

Example binary covariate C

(if (= i reference-class) 
    (setf (aref C i) 0)
    (setf (aref C i) 1))

;; Exponential RE,
(=model (log \theta) (  ))

;; extensions

;; outputs

PFIM provides for a given design and values of \beta: 
 compute extended FIM
 SE/RSE for \beta of each class of each covar
 eval influence of design on SE(\beta)

inter-occassion variability (IOV)
- patients sampled more than once, H occassions
- RE for IOV
- additional vars to estimate


;;; comparison criteria

functional of conc/time curve which is used for comparison, i.e. 
(AUC conc/time-curve)
(Cmax conc/time-curve)
(Tmax conc/time-curve)


(defun conc/time-curve (t) 
  ;; computation
  (let ((conc (exp (* t \beta1))))

(url-get "")

;;; Thinking of generics...
(information-matrix model parameters)
(information-matrix variance-matrix)
(information-matrix model data)
(information-matrix list-of-individual-IMs)

(defun IM (loglikelihood parameters times)
  "Does double work.  Sum up the resulting IMs to form a full IM."
  (let ((IM (make-matrix (length parameters)
			 (length parameters)
			 :initial-value 0.0d0)))
    (dolist (parameterI parameters)
      (dolist (parameterJ parameters)
	(setf (aref IM I J)
	      (differentiate (differentiate loglikelihood parameterI) parameterJ))))))

[#C] difference between empirical, fisherian, and …? information.

[#C] Example of Integration with CL-GENOMIC

  • State “TODO” from “” [2010-10-12 Tue 14:03]

CL-GENOMIC is a very interesting data-structure strategy for manipulating sequence data.

(in-package :cl-user)
(asdf:oos 'asdf:compile-op :ironclad)
(asdf:oos 'asdf:load-op :cl-genomic)

(in-package :bio-sequence)
(make-dna "agccg") ;; fine
(make-aa "agccg")  ;; fine
(make-aa "agc9zz") ;; error expected

[#A] Visual data analytic methods [0/10]

[#B] Evaluate Graphics toolkits [0/3]

[#B] QT and similar tools

Pros: Insight from Deepyan Saarkar and Mike – super fast plot routines for dynamic interactive graphics. Crossplatform.

Common-QT, or ??

[#B] Cairo-based

Pros: actually have example lattice/trellis plotting system with Tamas Papp’s cl-2d based on cl-cairo2.

Con: cross-platform? setup on a mac?

[#C] Others?

increase priority if someone cares enough to code

[#A] Evaluate APIs, methods, designs, back-end into framework [0/2]

By this, I mean that we need a good proposal, and it should be based on history. I need to email Paul Murrell and Deepyan and Hadley for a “lessons learned in statistical graphics systems”.

[#B] Paul Murrell’s core R system (grid?)

[#B] Peter Siebel’s Grammer of Graphics javascript implementation

Thanks Peter Schmiedeskamp for pointing this out.

[#B] Implement Visualization routines [0/2]

This should happen one-two times. Remember, with the package approach, we can try out new packages and continually build newer ones, as long as we appropriately version the interface for user selection purposes.

[#A] actual statistical graphics

we need functions to x-y plots, bar charts, and need the API to describe in terms of statistical quantities, scatter plots, etc.

Also, will be important to get prototypes working ASAP to get testing and feedback. But remember, not all users want what is good for them, just like not all people “honestly prefer” completely healthy approaches to life.

See and the Philosophy for background for the above.

[#C] Statistical toolkit and pipeline, ala ORCA

Orca (sutherland, cook, lumley, rossini, etal) was a java based toolkit for pipelined DAG representations of interactive dynamic graphics.

[#B] Documentation and Examples [0/3]

  • State “TODO” from “” [2010-10-14 Thu 00:12]

I’ve started putting examples of use in function documentation. If you are a lisp’er, you’ll find this pendantic and insulting. Many of the uses are trivial. However, this has been tested out on a number of research statisticians (the primary user audience) and found useful.

Still need to write the

(evaluate-documentation-example 'function-name)

function, which would print out the example and run it live. Hopefully with the same results. Need to setup the infrastructure, but basically, we’d like something like:


and have this within the doc-string. Then the doc-string would be parsed for the appropriate code and we’d get the results, evaluated in a special name space derived from the object (function, class) name, possibly with the corresponding functions and environment set up that would be required. OR, it could just work in cl-user (which is the default starting location.

Here are some possible common lisp systems that could be evaluated:

[#B] Docudown

  • State “TODO” from “” [2010-11-05 Fri 15:34]


  • State “TODO” from “” [2010-11-05 Fri 15:34]

[#B] CLPDF, and literate data analysis

  • State “TODO” from “” [2010-11-05 Fri 15:34]


Place proposals for features, work, etc here

<2011-12-29 Thu> new stuff

First new proposal is to track proposals.


This project is dedicated to all the lisp hackers out there who provided the basic infrastructure to get so far so fast with minimal effort on my part.

And to all the people trying to help to get this off the ground.