First attempt at a readme.
Bradford Cross committed Jul 21, 2010
Commit 03a4b39 (parent dbc5509)
Infer is a library for machine learning and statistical inference.

Infer is written in Clojure and leverages many JVM packages.

You can read about some of the performance testing that went into Infer's foundation here:
http://measuringmeasures.com/blog/2010/4/17/numerics-benchmarking-fast-statistics-on-the-jvm.html
http://measuringmeasures.com/blog/2010/3/28/matrix-benchmarks-fast-linear-algebra-on-the-jvm.html
http://measuringmeasures.com/blog/2010/3/27/fast-clojure-vectors-and-multidimensional-arrays.html

Infer does not aim to replace R, be an R for the JVM, or be a NumPy written in Clojure.

Infer seeks to be a new kind of data tool.

1) Infer was initially extracted from production use. Its philosophy is that you should be able to deploy your research code, not be forced to rewrite it in C++ or Java to deploy it to production.

2) Infer was initially extracted from use with Hadoop. Its philosophy is that you should be able to scale up your research computations without doing much differently, and even be able to deploy clusters from your local interactive environment and run your code on a cluster almost as easily as you run it on your local machine.

3) Infer was initially extracted from real-life research and deployment of machine learning systems. Its philosophy is that abstractions should be composable, so that you can try out different algorithms and different combinations of measures, models, and learning algorithms. Most systems, such as R, Weka, LingPipe, or MATLAB, have their methods coded up in silos that are not meant to be used composably. Want to learn tree models using different loss functions, different measures of statistical divergence, or a different penalized learning algorithm? Traditional libraries say too bad; Infer says rock it out.
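To make the composability point concrete, here is a minimal sketch of a tree-split score that takes the impurity measure as a parameter. It is written in Java because Infer targets the JVM, and every name in it (`ComposableSplits`, `gini`, `entropy`, `gain`) is hypothetical, not Infer's API. It only illustrates how swapping Gini impurity for Shannon entropy requires no change to the split-scoring code.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical sketch: none of these names come from Infer itself.
final class ComposableSplits {

    // Gini impurity of a class-count histogram: 1 - sum_k p_k^2.
    static double gini(int[] counts) {
        double total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double sumSq = 0;
        for (int c : counts) {
            double p = c / total;
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    // Shannon entropy in bits: -sum_k p_k log2 p_k, treating 0 log 0 as 0.
    static double entropy(int[] counts) {
        double total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Impurity reduction of a binary split, parameterized by ANY impurity measure.
    // Assumes at least one example across both sides of the split.
    static double gain(int[] left, int[] right, ToDoubleFunction<int[]> impurity) {
        int k = left.length;
        int[] parent = new int[k];
        double nLeft = 0, nRight = 0;
        for (int i = 0; i < k; i++) {
            parent[i] = left[i] + right[i];
            nLeft += left[i];
            nRight += right[i];
        }
        double n = nLeft + nRight;
        return impurity.applyAsDouble(parent)
                - (nLeft / n) * impurity.applyAsDouble(left)
                - (nRight / n) * impurity.applyAsDouble(right);
    }

    public static void main(String[] args) {
        int[] left = {4, 0};   // class counts on the left side of a candidate split
        int[] right = {1, 5};  // class counts on the right side
        System.out.println("gini gain:    " + gain(left, right, ComposableSplits::gini));
        System.out.println("entropy gain: " + gain(left, right, ComposableSplits::entropy));
    }
}
```

Any other `ToDoubleFunction<int[]>`, such as a misclassification-rate measure, would plug into `gain` unchanged. That kind of mix-and-match is what silo-style libraries make hard.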

What is in Infer?

Infer is broad and covers, or intends to cover, most of what you'll find in statistics and machine learning.

-matrices: wraps the UJMP matrix package. (examples)

-probabilities: a language for dealing with probabilities, composable into things like graphical models, trees, classifiers, etc.

-measures: lots of measure functions, stats, etc. Everything you want for measuring things in your learners. We also have lots of information-theory-specific measures in information-theory.

-linear models: OLS, GLS, GLMs, and penalized / regularized regression methods: L1, L2, and combinations thereof. (examples)

-neighbor methods: LSH methods, nearest neighbor queries, kernel methods

-learning: convex optimization, regularized learning, subset selection

-cross validation: all the tools you want for k-fold, etc.

-generalized classification & regression (stuff that currently lives in classification)

-features: easy handling of feature representations as a matrix, a Clojure vector of vectors, or nested maps, and transforming between these options. Easy handling of continuous or discrete variables, including discretizing continuous variables (currently in classification). Merging equivalence classes.
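For the cross validation entry in the list above, here is a minimal sketch of k-fold index splitting. It is in Java, and its names (`KFold`, `folds`, `trainIndices`) are hypothetical rather than Infer's actual functions. It shuffles the example indices with a fixed seed, deals them into k folds round-robin, and treats each fold in turn as the test set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Infer's API: split example indices into k folds.
final class KFold {

    // Returns k disjoint test-index lists covering 0..n-1; fold sizes differ by at most one.
    static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) throw new IllegalArgumentException("need 2 <= k <= n");
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // seeded so runs are reproducible
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(idx.get(i)); // deal round-robin
        return folds;
    }

    // Training indices for fold f: every index not in that fold's test set.
    static List<Integer> trainIndices(List<List<Integer>> folds, int f) {
        List<Integer> train = new ArrayList<>();
        for (int g = 0; g < folds.size(); g++) {
            if (g != f) train.addAll(folds.get(g));
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(10, 3, 42L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f + " test=" + folds.get(f)
                    + " train=" + trainIndices(folds, f));
        }
    }
}
```

A learner would be fit on `trainIndices(folds, f)` and scored on `folds.get(f)` for each f, and the k scores averaged.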

[TODO: what currently lives in classification is really general to classification and regression. The shit for discretizing is for turning regression into classification. Stuff in probability and features allows you to build classifiers. Figure out a better way to structure and explain this.]
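As a sketch of the discretization step the note mentions, equal-width binning turns a continuous target into class labels, so a regression problem can be handed to a classifier. Equal-width binning is one simple scheme chosen here for illustration, and Infer may use others. The code is in Java with a hypothetical `Discretize.equalWidth` helper.

```java
import java.util.Arrays;

// Hypothetical sketch, not Infer's API: equal-width binning of a continuous variable.
final class Discretize {

    // Maps each value to a bin in 0..bins-1 using equal-width intervals over [min, max].
    static int[] equalWidth(double[] xs, int bins) {
        if (bins < 1) throw new IllegalArgumentException("bins must be >= 1");
        double min = Arrays.stream(xs).min().orElse(0.0);
        double max = Arrays.stream(xs).max().orElse(0.0);
        double width = (max - min) / bins;
        int[] labels = new int[xs.length];
        for (int i = 0; i < xs.length; i++) {
            // Constant input gives zero width: put everything in bin 0.
            int b = width == 0 ? 0 : (int) ((xs[i] - min) / width);
            labels[i] = Math.min(b, bins - 1); // the maximum lands exactly on the top edge
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] prices = {1.0, 2.5, 4.0, 7.5, 10.0};
        System.out.println(Arrays.toString(equalWidth(prices, 3))); // prints [0, 0, 1, 2, 2]
    }
}
```

With three bins over [1, 10], each bin is 3 units wide, so 1.0 and 2.5 fall in bin 0, 4.0 in bin 1, and 7.5 and 10.0 in bin 2.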
