Commit
Bradford Cross committed Jul 21, 2010
1 parent dbc5509, commit 03a4b39
Showing 1 changed file with 42 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
Infer is a library for machine learning and statistical inference. | ||
|
||
Infer is written in Clojure and leverages many JVM packages. | ||
|
||
You can read about some of the performance testing that went into the foundation here: | ||
http://measuringmeasures.com/blog/2010/4/17/numerics-benchmarking-fast-statistics-on-the-jvm.html | ||
http://measuringmeasures.com/blog/2010/3/28/matrix-benchmarks-fast-linear-algebra-on-the-jvm.html | ||
http://measuringmeasures.com/blog/2010/3/27/fast-clojure-vectors-and-multidimensional-arrays.html | ||
|
||
Infer does not aim to replace R, to be an R for the JVM, or to be a NumPy written in Clojure. | ||
|
||
Infer seeks to be a new kind of data tool. | ||
|
||
1) Infer was initially extracted from production use. It has the philosophy that you should be able to deploy your research code, not be forced to rewrite it in C++ or Java to deploy to production. | ||
|
||
2) Infer was initially extracted from use with Hadoop. It has the philosophy that you should be able to scale up your research computations without doing much differently, and even be able to deploy clusters from your local interactive environment and run your code on a cluster almost as easily as you run it on your local machine. | ||
|
||
3) Infer was initially extracted from real-life research and deployment of machine learning systems. It has the philosophy that abstractions should be composable, so that you can try out different algorithms and different combinations of measures, models, and learning algorithms. Most systems, such as R, Weka, LingPipe, or Matlab, have all their methods coded up in silos that are not meant to be composed. Want to learn tree models using different loss functions, different measures of statistical divergence, or a different penalized learning algorithm? Traditional libraries say too bad; Infer says rock it out. | ||
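To make the composability point concrete, here is a minimal sketch of a tree-split chooser that takes the impurity measure as an argument. Every name in it (`proportions`, `entropy`, `gini`, `split-score`, `best-split`) is hypothetical and uses only clojure.core; it is not Infer's API, just one way the idea could look.

```clojure
;; Hypothetical sketch: none of these names come from Infer's API.

(defn proportions
  "Relative frequency of each distinct label in labels."
  [labels]
  (let [n (count labels)]
    (map #(/ (double %) n) (vals (frequencies labels)))))

(defn entropy
  "Shannon entropy, in bits, of a collection of labels."
  [labels]
  (- (reduce + (map #(* % (/ (Math/log %) (Math/log 2.0)))
                    (proportions labels)))))

(defn gini
  "Gini impurity of a collection of labels."
  [labels]
  (- 1.0 (reduce + (map #(* % %) (proportions labels)))))

(defn split-score
  "Weighted impurity after splitting rows on (pred row); lower is better.
  rows are maps with a :label key; impurity is any function of labels."
  [impurity pred rows]
  (let [n      (count rows)
        groups (vals (group-by (comp boolean pred) rows))]
    (reduce + (map (fn [g]
                     (* (/ (count g) (double n))
                        (impurity (map :label g))))
                   groups))))

(defn best-split
  "Pick the candidate predicate with the lowest weighted impurity."
  [impurity preds rows]
  (apply min-key #(split-score impurity % rows) preds))

;; Usage: the measure is just another argument, so swapping it is one word.
(def rows [{:x 1 :label :a} {:x 2 :label :a}
           {:x 3 :label :b} {:x 4 :label :b}])
(def candidates [#(< (:x %) 2.5) #(< (:x %) 1.5)])

(best-split entropy candidates rows) ; picks the first candidate, which separates the labels cleanly
(best-split gini candidates rows)    ; same search, different measure
```

Because the learner only assumes "a function from labels to a number", a new divergence or loss can be tried without touching the split search.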
|
||
What is in Infer? | ||
|
||
Infer is broad and covers, or intends to cover, most of what you'll find in statistical and machine learning. | ||
|
||
-matrices: wraps the UJMP matrix package. (examples) | ||
|
||
-probabilities: a language for dealing with probabilities, composable into things like graphical models, trees, classifiers, etc. | ||
|
||
-measures: lots of measure functions, stats, etc. Everything you want for measuring things in your learners. We also have lots of information-theory-specific measures in information-theory. | ||
|
||
-linear models: OLS, GLS, GLMs, and penalized / regularized regression methods - L1, L2 and combinations thereof. (examples) | ||
|
||
-neighbor methods: LSH methods, nearest neighbor queries, kernel methods | ||
|
||
-learning: convex optimization, regularized learning, subset selection | ||
|
||
-cross validation: all the tools you want for k-fold, etc. | ||
|
||
-generalized classification & regression (stuff that currently lives in classification) | ||
|
||
-features: easy to deal with feature representations as a matrix, a Clojure vector of vectors, or nested maps, and to transform between the options. Easy dealing with continuous or discrete variables, or discretizing continuous variables (currently in classification). Merging equivalence classes. | ||
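As an illustration of the feature handling described in the last item, the sketch below converts between a seq of feature maps and a vector of vectors, and bins a continuous value. The helper names (`maps->rows`, `rows->maps`, `discretize`) are assumptions made up for this example and are not taken from Infer.

```clojure
;; Hypothetical helpers, not Infer's API; clojure.core only.

(defn maps->rows
  "Turn a seq of feature maps into a vector of vectors,
  with one column per key in ks, in that order."
  [ks ms]
  (mapv (fn [m] (mapv #(get m %) ks)) ms))

(defn rows->maps
  "Inverse of maps->rows: label each column of each row with its key."
  [ks rows]
  (mapv #(zipmap ks %) rows))

(defn discretize
  "Index of the bin that x falls in, given ascending cut points.
  Values below the first cut map to 0; a value equal to a cut
  goes into the bin above it."
  [cuts x]
  (count (take-while #(<= % x) cuts)))

;; Usage
(def people [{:height 1.8 :weight 70.0}
             {:height 1.6 :weight 55.0}])

(maps->rows [:height :weight] people)
;; => [[1.8 70.0] [1.6 55.0]]

(map #(discretize [60.0 80.0] (:weight %)) people)
;; => (1 0)
```

Binning a continuous target this way is what turns a regression problem into a classification one.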
|
||
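For the cross validation item in the list above, a k-fold split can be sketched in a few lines. This is a generic illustration under assumed names (`k-folds`, `cross-validate`) and an assumed fit/score calling convention, not a description of how Infer implements it.

```clojure
;; Hypothetical sketch of k-fold cross validation; not Infer's API.

(defn k-folds
  "Split xs into k folds. Returns a seq of {:train [...] :test [...]} maps.
  Item i is held out in fold (mod i k), so fold sizes differ by at most one."
  [k xs]
  (let [indexed (map-indexed vector xs)]
    (for [fold (range k)]
      {:test  (vec (for [[i x] indexed :when (= fold (mod i k))] x))
       :train (vec (for [[i x] indexed :when (not= fold (mod i k))] x))})))

(defn cross-validate
  "Average score over k folds.
  fit   : training data -> model
  score : model, test data -> number"
  [k fit score xs]
  (let [scores (for [{:keys [train test]} (k-folds k xs)]
                 (score (fit train) test))]
    (/ (reduce + scores) (double k))))

;; Usage
(first (k-folds 3 (range 6)))
;; => {:test [0 3], :train [1 2 4 5]}
```

Since `fit` and `score` are plain functions, any learner and any measure can be plugged in.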
[TODO: what currently lives in classification is really general to classification and regression. The stuff for discretizing is for turning regression into classification. Stuff in probability and features allows you to build classifiers. Figure out a better way to structure and explain this.] |