
[MAHOUT-1856][WIP] create a framework for new Mahout Clustering, Classification, and Optimization Algorithms #246

Closed
wants to merge 21 commits

Conversation

@rawkintrevo (Contributor) commented Aug 4, 2016:

Relevant JIRA: https://issues.apache.org/jira/browse/MAHOUT-1856

Readme.md provides a more comprehensive (yet still incomplete) overview.

Key Points:
Top-level class:
Model has one method, fit, and exposes its coefficients as coefs.

Transformers map a vector input to a vector output (of the same or a different length).
Regressors map a vector input to a single output (e.g. a Double).
Classifiers extend Transformers: they take the probability vector a Transformer produces, 'select' the class, and return its label (instead of the entire p-vector).

Pipelines and Ensembles are models as well, except they are composed of the other models listed above, or of other pipelines and ensembles.
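Roughly, the hierarchy being described might look like the following sketch (names and signatures here are illustrative only, not the code in this PR):

import org.apache.mahout.math.{Vector => MahoutVector}
import org.apache.mahout.math.drm.DrmLike

// Hypothetical sketch of the hierarchy described above.
abstract class Model extends Serializable {
  // fitted parameters (e.g. coefficients), keyed by name
  val fitParams = collection.mutable.Map[String, MahoutVector]()
}

// Transformers: vector in, vector out (same or different length).
abstract class Transformer[K] extends Model {
  def fit(input: DrmLike[K]): Unit
  def transform(input: DrmLike[K]): DrmLike[K]
}

// Regressors: vector in, single value out per row.
abstract class Regressor[K] extends Model {
  def fit(drmX: DrmLike[K], drmY: DrmLike[K]): Unit
  def predict(drmX: DrmLike[K]): DrmLike[K]
}

// Classifiers 'select' the class from the probability vector and return its label.
abstract class Classifier[K] extends Transformer[K]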

ToDo:

  • All models need a uniform way to expose their tuning parameters -> this will be required for an auto-tuning algorithm.
  • Pipelines / Ensembles must be able to account for and report the tunable parameters of their sub-models.
  • Need fitness functions.
  • Native method wrappers: the underlying engines and third-party packages already implement many ML models, so let's not reinvent the wheel by exposing YET ANOTHER SGD algorithm. Instead, we should be able to convert a matrix to the expected format of the 'other' library, run its model, package the results back into a matrix, and pass them on in a pipeline or ensemble (this is especially useful for DeepLearning4J integration). Also, native implementations of some algorithms on a given engine are probably more efficient than implementations we would make, since they can leverage engine-specific tricks (think Flink delta iterators). A rough wrapper shape is sketched just after this list.
  • Lots more, open for discussion.
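The native-wrapper idea above might take a shape like this sketch (entirely hypothetical; externalFit stands in for whatever third-party routine is being wrapped, and none of these names are in the PR):

import org.apache.mahout.math.Matrix
import org.apache.mahout.math.drm._

// Hypothetical wrapper: bring the DRMs in-core, hand them to the external
// implementation, and return its in-core result for re-wrapping downstream.
class ExternalRegressorWrapper[K](externalFit: (Matrix, Matrix) => Matrix) {
  def fit(drmX: DrmLike[K], drmY: DrmLike[K]): Matrix = {
    val inCoreX = drmX.collect    // convert to the format the external library expects
    val inCoreY = drmY.collect
    externalFit(inCoreX, inCoreY) // run the external model, hand back its output
  }
}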

This is merely a conversation starter on what to do.

I've included OLS as an example regressor and a normalizer as an example transformer, only for illustrative purposes. I really don't want to pack too many algos into this initial commit, just an example / proof of concept so we can say "yes, this framework makes sense for this kind of model" or "ooh, we probably want to have these features too."

@andrewpalumbo (Member):

Thanks @rawkintrevo.. This is a great start. I'd originally thought that we'd put the algos in the math-scala module, but looking at it, I think this makes sense.

@andrewpalumbo (Member) commented Aug 5, 2016:

Nice work on the slides also! 💯

Another thought I had is that we may want to allow in-core matrices as parameters. Just throwing it out there for discussion. I can't think of a particular use case off the top of my head, but it seems that there should be one.
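For what it's worth, a minimal sketch of what that could look like (hypothetical trait, just to make the idea concrete):

import org.apache.mahout.math.Matrix
import org.apache.mahout.math.drm.DrmLike

// Admit in-core matrices as parameters via overloads next to the DRM signatures.
trait InCoreFittable[K] {
  def fit(drmX: DrmLike[K], drmY: DrmLike[K]): Unit
  def fit(inCoreX: Matrix, inCoreY: Matrix): Unit
}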

@rawkintrevo changed the title from "[MAHOUT-1856][WIP] reate a framework for new Mahout Clustering, Classification, and Optimization Algorithms" to "[MAHOUT-1856][WIP] create a framework for new Mahout Clustering, Classification, and Optimization Algorithms" on Aug 8, 2016
@rawkintrevo (Contributor, Author):

This is WIP so it doesn't really matter that it's failing at the moment, but this isn't good: the build couldn't find Maven via the wget command.

Building framework

add placeholders for ensemble pipeline and fitness test

added readme

fixed readme image

fixed readme image

fixed readme image

removed autogen comments from files


updated pom.xml for 0.12.3-SNAP

Added MeanSubtract

[MAHOUT-1856] Added Subtract Mean

[MAHOUT-1856] Added Durbin Watson Test

[MAHOUT-1856] Refactored to use fitParams

[MAHOUT-1856] Added Factorize refactored others

[MAHOUT-1856] Added Factorize Test

[MAHOUT-1856] Refactor to scala-math

[MAHOUT-1856] Rename Normalizer to StandardScaler

[MAHOUT-1856] Don't collect for ols

[MAHOUT-1856] Add Several Fittness Tests

[MAHOUT-1856] Clean up files

[MAHOUT-1856] Remove old algos
@andrewpalumbo (Member):

@dlyubimov, @mahout-team could you review/provide feedback on this? Originally Trevor had a separate module for this, and I asked him to move it into math-scala.

@andrewpalumbo (Member) commented Jan 16, 2017:

I should note that my above comment was meant as a question to all concerned: does this (the algorithms base-traits package) belong in math-scala or in its own module? Trevor initially had it in its own module; I asked that he move it to math-scala, due mostly to size concerns for the entire binary distribution. However, I am now questioning whether that is the correct call. I can think of several good reasons that it should have its own module (and others that it should be in math-scala). For now, I think that math-scala is an OK place, and we can just move it to another module later if we want. @smarthi @pferrel @rawkintrevo @dlyubimov please weigh in here. Not a big deal right now, but thinking ahead it may be good to get the location right.


abstract class Model extends Serializable {

var fitParams = collection.mutable.Map[String, MahoutVector]()
Contributor:

So Model requires all parameters to be named and to be vectors? Shouldn't this be an artifact of more specialized models, like, say, GLMs? There are plenty of ML models that would probably not fit that fairly rigid definition, at least not easily or pragmatically.

regressor.fit(drmY, drmX)
fitParams("beta0") = regressor.fitParams("beta")

val Y = drmY(1 until drmY.nrow.toInt, 0 until 1).checkpoint()
Contributor:

Consider giving the caller an option to specify the cache hint here, since this algorithm clearly relies on plenty of things being put into memory. Right now this implies memory only, so if the data doesn't fit, the algorithm slows to a crawl. (In all fairness, in Spark it would crawl with a memory-and-disk spec too, but to put things in perspective, we are probably talking about the difference between a crawling snail and a snail skeleton 20 years after its death.)
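One possible shape for that, as a sketch that assumes Mahout's CacheHint enumeration (the trait name is made up):

import org.apache.mahout.math.drm.{CacheHint, DrmLike}

// Expose the cache hint as a fit parameter instead of hard-coding memory-only.
trait CacheAwareRegressor[K] {
  def fit(drmX: DrmLike[K],
          drmY: DrmLike[K],
          cacheHint: CacheHint.CacheHint = CacheHint.MEMORY_ONLY): Unit
}

// A caller whose data won't fit in memory could then pass, e.g.:
//   regressor.fit(drmX, drmY, CacheHint.MEMORY_AND_DISK)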

*/
abstract class Regressor extends Model {

def fit[Int](drmY: DrmLike[Int], drmX: DrmLike[Int]): Unit
Contributor:

Perhaps making the first parameter the predictors and the second the target would be a more intuitive signature for most.

@dlyubimov (Contributor) commented Jan 17, 2017:

Also, maybe drmT for the target?

Also: why [Int] only? In any kind of fitter, it should be fit[K](...).

There's no reason why the correspondence between predictor and target cannot be established on any type of key in the general case.

Contributor:

Also: I am not sure the fitter should extend a model. Rather, the fitter should return a model, i.e. fit[K](...): Model, right? I think that's how this pattern goes in most kits. The fitter is just a strategy.

And I'd abstain from using abstract classes in Scala unless a trait absolutely cannot do the job (and it can in this case). An abstract class points to a specific, single, necessary base implementation in the hierarchy, which is too constraining without need for the actual implementations.

Contributor (Author):

+1 on swapping Y and X; I've had to catch myself more than once on that already. I think the original motivation was a tip of the hat to R's Y ~ x, but I agree with you.

Re [Int]: I realized that later, but haven't gone through and swapped them all back to K. It is (or was) K in some places.

Contributor:

I guess if this is abstract enough, we also need to be able to admit hyperparameters, which are of course specific to every fitter. In R this is trivial (any call can be made a bag of whatever named arguments), but in Scala this may need a bit of thought (if this abstraction needs to be that high). Otherwise, I guess most Scala kits just create a concrete fit signature per implementation.

If the Regressor trait is meant to be common to all possible regression-class algorithms, we either need a way to universally pass in the hyperparameters, or just not have a fit abstraction in the regressor trait at all (then what, I guess :) ).

@dlyubimov (Contributor) commented Jan 19, 2017:

Like perhaps fit[K](X: DrmLike[K], T: DrmLike[K], (Symbol, Any)*): Model, where the optional varargs list is the hyperparameter list. Then hyperparameterized calls could be something like:

fit(X, Y, 'alpha -> 10.0, 'lambda -> 1e-15)

etc.
This loses the strong typing of the signature, but the call is not that ugly, and it might be OK if specific implementations are cleanly documented.

@rawkintrevo (Contributor, Author):

Thanks for the quick review @dlyubimov, will incorporate your suggestions.


var residuals: DrmLike[K] = _

def fit(drmFeatures: DrmLike[K], drmTarget: DrmLike[K]): Unit
Member:

I would think that fit(..) should be able to return a list of errors per sample, so possibly def fit(drmFeatures: DrmLike[K], drmTarget: DrmLike[K]): Any would be a better signature for the trait.

Member:

As well, drmFeatures is somewhat confusing: the Drm is a matrix of samples x features, i.e. the features are columns. I think something like drmSamples, drmObservations, or even drmX may be more straightforward.

Contributor (Author):

By errors, do you mean code errors, or the residuals?

Member:

Yes, e.g. residuals, a Hessian matrix, etc.: anything that a user developing their own algorithm might like to see returned from their own fit(..) method. My point was that there's no reason to limit the return value to Unit; rather, I'd think Any would be a more appropriate return value in the base trait. Though maybe this would not work with the pipeline structure that you're setting up?

Contributor (Author):

@andrewpalumbo (re drmFeatures vs drmX) and @dlyubimov:
We're mixing conventions. I feel like, for consistency, we should either use drmFeatures and drmTarget (similar to Spark ML) or drmX and drmY (similar to sklearn and R). Leaving it for now, but open to debate; I have a slight bias towards drmX, drmY.

var addIntercept: Boolean = hyperparameters.asInstanceOf[Map[String, Boolean]].getOrElse("addIntercept", true)

var summary = ""
def fit(drmFeatures: DrmLike[K], drmTarget: DrmLike[K]): Unit = {
Member:

Continuing on from our discussion on Slack, I would think that fit may be a more appropriate place for hyperparameters, e.g.:

fit(observed_independent: Drm[K], observed_targets: Drm[K], hyperparameters: Option[List[Double]]): List[Double]

I think this may be a matter of convention, so if you're following a convention that I am not familiar with, this may be fine. However, I feel that this may be more robust.

d = 2 : no auto-correlation
d > 2 : negative auto-correlation
*/
def DurbinWaton[K](model: Regressor[K]): Regressor[K] = {
Member:

misspelling- "DurbinWatson"


def transform[K: ClassTag](input: DrmLike[K]): DrmLike[K] ={
if (!isFit) {
//throw an error
Member:

Is this complete? I.e., do you want to throw an error here for this release?

def invTransform[K: ClassTag](input: DrmLike[K]): DrmLike[K] = {

if (!isFit) {
//throw an error
@andrewpalumbo (Member) commented Jan 22, 2017:

Same question here as before: do you want to throw an error here for 0.13.0?

def transform[K: ClassTag](input: DrmLike[K]): DrmLike[K] = {

if (!isFit) {
//throw an error
@andrewpalumbo (Member) commented Jan 22, 2017:

I find this a bit confusing (as well as the other mentions of if (!isFit) { // throw }). So if I want to standardize a Drm to (mean = 0, std_dev = 1), I would need to do something like:

val drmStandardized = StandardScaler(unscaledDrm).fit().transform()

E.g., does the Transformer trait need to extend Model? If so, we do need to throw/catch here if fit() is not called by the user. Or possibly just move fit() into the constructor in this case.

Member:

As well, I think that this could be another argument for moving hyperparameters into fit(...). E.g., if for some reason we wanted to standardize to N(mean = 0, stdDev = 2), we could still call StandardScaler and fit(Map("mu" -> 0, "sigma" -> 2)):

val drmStandardized = StandardScaler(unscaledDrm).fit(Map("mu" -> 0, "sigma" -> 2)).transform()

Contributor (Author):

Correct, and agreed: the sklearn convention is to also have a fitTransform method to do it in one step.

Will add that.
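A sketch of that convenience method (the trait name and signatures are hypothetical, not the PR's actual Transformer):

import org.apache.mahout.math.drm.DrmLike

trait TransformerSketch[K] {
  def fit(input: DrmLike[K]): Unit
  def transform(input: DrmLike[K]): DrmLike[K]

  // fit and transform in a single step, so callers can't forget to fit first
  def fitTransform(input: DrmLike[K]): DrmLike[K] = {
    fit(input)
    transform(input)
  }
}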

(estimate - Ranswers).sum should be < epsilon

}

Member:

It would be good to have a couple more tests here; at least one for transform(...).

@andrewpalumbo (Member):

Trevor, this looks really good to me. I've left some comments, mainly about hyperparameters being moved from the model to fit(...); I think that makes sense in many ways. E.g., when doing a highly iterative hyperparameter search, it would eliminate a good amount of overhead to call:

aModel.fit(....,HyperParameters: Map["hParameter1" -> value , "hParameter2" -> ...])

rather than re-constructing the entire class each time. As I noted inline, I also think that the fit(...) method should have the ability to return at least a List[Double] of errors per row if needed, so I would suggest that it return Any rather than Unit in the base traits (unless the convention that you're following is to rely on predict for this).
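To illustrate the overhead point, here is a sketch of a search loop that reuses one instance (ParamMapRegressor and the "lambda"/"beta" names are hypothetical stand-ins, not anything in the PR):

import org.apache.mahout.math.{Vector => MahoutVector}
import org.apache.mahout.math.drm.DrmLike

trait ParamMapRegressor[K] {
  def fit(drmX: DrmLike[K], drmY: DrmLike[K], params: Map[String, Any]): Unit
  def fitParams: collection.mutable.Map[String, MahoutVector]
}

object GridSearchSketch {
  // Same model instance, new parameters on every pass.
  def search[K](model: ParamMapRegressor[K],
                drmX: DrmLike[K],
                drmY: DrmLike[K],
                lambdas: Seq[Double]): Seq[(Double, MahoutVector)] =
    lambdas.map { lambda =>
      model.fit(drmX, drmY, Map("lambda" -> lambda))
      lambda -> model.fitParams("beta") // keep the fitted coefficients
    }
}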

@andrewpalumbo (Member):

I want to emphasize, though, that if you're building this up to follow a convention (sklearn-like, MLlib, etc.) which I am not familiar with, that may be better to follow than my suggestions, to make this framework as easy on new users as possible.

@rawkintrevo (Contributor, Author) commented Jan 22, 2017:

Thanks for the review @andrewpalumbo !

sklearn parameters are set when the Estimator is instantiated.

MLlib, on the other hand, passes parameter maps in fit as you suggest.

Both, however, allow hyperparameters to be updated. And in the case you refer to, the model would not be re-instantiated; it would be something like this:

model.param1 = 1
model.fit(X, y)
model.param1 = 2
model.fit(X, y)

To your point, I also want to make this as easy as possible for new users, so I think it would be best to leave the option to pass a parameter map at instantiation, and also expose it as an optional parameter of the fit method.

It also seems it would be logical (to avoid duplicating code) to have a setHyperparameters(paramMap: Map[String, Any]) method:

val paramSet1 = Map( "param1" -> 1 )
model.setHyperparameters(paramSet1)

This would be called on the map at instantiation, again if a Map is passed in fit or whenever the user wants, and someday by a CV method.
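A sketch of how the two entry points could share one code path (hypothetical names; fitImpl is just a placeholder for the concrete algorithm):

import org.apache.mahout.math.drm.DrmLike

abstract class HyperparameterizedModel[K](initialParams: Map[String, Any] = Map.empty) {

  protected var hyperparameters: Map[String, Any] = Map.empty
  setHyperparameters(initialParams)

  // one place where parameters are set: at construction, in fit, or by a CV method
  def setHyperparameters(paramMap: Map[String, Any]): Unit = {
    hyperparameters ++= paramMap
  }

  def fit(drmX: DrmLike[K], drmY: DrmLike[K],
          paramMap: Map[String, Any] = Map.empty): Unit = {
    setHyperparameters(paramMap) // fit-time overrides win
    fitImpl(drmX, drmY)
  }

  // concrete models supply the actual fitting logic
  protected def fitImpl(drmX: DrmLike[K], drmY: DrmLike[K]): Unit
}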

@andrewpalumbo (Member):

@rawkintrevo Great- best of both worlds then!

drmTarget: DrmLike[K],
hyperparameters: Map[String, Any] = Map("" -> None)): Unit = {

var hyperparameters: Option[Map[String,Any]] = None
Contributor (Author):

there should be a setHyperparameters right about here...

@dlyubimov (Contributor) commented Jan 24, 2017:

If not too much hassle, please consider using the unicode characters →, ⇒, ← (for new code).
In IntelliJ this is easily facilitated by adding substitution 'live' templates:
(screenshot of IntelliJ live-template settings)


def fit(drmFeatures: DrmLike[K],
drmTarget: DrmLike[K],
hyperparameters: Map[String,Any] = Map("" -> None)): Unit
Contributor:

Just in case it was not quite clear, my suggestion was to use hyperparameters: (Symbol, Any)*. First, symbols are faster for something that is meant to be an id, and second, they have a somewhat more palatable notation, e.g.:

val model = fit(X, Y, 'k -> 10, 'alpha -> 1e-5)

and the signature overall:

def fit(drmFeatures: DrmLike[K],
        drmTarget: DrmLike[K],
        hyperparameters: (Symbol, Any)*): Model

Of course the implementation can easily get a map, should it need to:

  val hmap = hyperparameters.toMap

That's actually a Scala pattern I developed and used in a similar situation.
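For completeness, a sketch of how an implementation might consume that form (the trait name, defaults, and parameter symbols are made up for illustration):

import org.apache.mahout.math.drm.DrmLike

trait VarargsFitSketch[K] {

  def fit(drmX: DrmLike[K], drmY: DrmLike[K], hyperparameters: (Symbol, Any)*): Unit = {
    val hmap   = hyperparameters.toMap
    val alpha  = hmap.getOrElse('alpha, 1.0).asInstanceOf[Double]   // made-up default
    val lambda = hmap.getOrElse('lambda, 0.0).asInstanceOf[Double]  // made-up default
    fitImpl(drmX, drmY, alpha, lambda)
  }

  // concrete models supply the actual fitting logic
  protected def fitImpl(drmX: DrmLike[K], drmY: DrmLike[K],
                        alpha: Double, lambda: Double): Unit
}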

/**
* Abstract of Regressors
*/
trait Regressor[K] extends Model {
Contributor:

Regressor still extends Model? Regressor's fit IMO should be a factory method w.r.t. the model instead.

Contributor (Author):

I agree that Spark and Flink follow the paradigm you are suggesting; however, sklearn doesn't. If we're just going off of what others do, then yes, we should probably follow the conventions of the other larger, Scala-based "Big Data" packages. However, I can't understand WHY they do it that way. It makes the code hard to read/follow, and I assume it is an artifact of all the serialization and the way they execute models (having to ship objects around for map/reduce phases); that is to say, they do it because they are forced to, at the expense of readability.

In Mahout, most of that is taken care of at the distributed-engine level.

If we start going down the rabbit hole of "do as Spark and Flink do" we may find ourselves with an entire class just for the summary of a linear model. I, for one, want to stay as far away from that as possible. I'd like to see algorithm code (these and future) be as succinct and tractable as possible so that:

  1. New contributors aren't intimidated (that is to say, they are encouraged to contribute algorithms).
  2. Those algorithms can be easily reviewed and maintained with minimal Scala knowledge (requiring deep Scala knowledge limits the pool of willing and able contributors who understand the actual math at play).

That isn't to say that, at the end of the day, your proposal is incorrect; you usually are correct, and I value and appreciate you taking the time to review. I am saying that "I think that's how this pattern goes in most kits" is neither necessary nor sufficient, IMO, as in some respects I'm explicitly trying to avoid the approach of other packages; in this case, refactoring something to be more complex with no clear understanding of the benefit.

Contributor:

In most kits I know (including two I am working on right now myself) the pattern is that fit returns a model object. scikit seems to be the outlier on this.

Also, I think the approach "package A does it like this and we do/don't like it, therefore it is good/bad" is a dogma fallacy. IMO we just need to do what makes sense.

And it makes sense to me to serialize or persist the model, not the (fitting algorithm + model). The latter will cause problems on both the user and implementation ends IMO.

@andrewpalumbo (Member) commented Jan 29, 2017:

+1 to model persistence, though we need some thinking here. We could persist/read the model as a set of Drms, since we have native-to-Mahout support for this, i.e. drm.dfsWrite(..), dfsRead(...). This can be quite clunky though (see the NaiveBayes model.persist(...), e.g.).

d > 2 : negative auto-correlation
*/
def DurbinWatson[K](model: Regressor[K]): Regressor[K] = {
val e: DrmLike[K] = model.residuals(1 until model.residuals.nrow.toInt, 0 until 1)
Contributor:

nrow.toInt is generally dangerous, as it does not catch the algorithm's limitation. The problem with this (and I have actually run into it before) is that the algorithm in this case obviously has a limitation of 2 billion rows, and it should explicitly fall apart once that limit is reached, instead of silently producing nonsense. I think there's a method specifically for this purpose in our math-scala drm package, safeToNonNegInt, which throws IllegalArgument if the conversion loses significant bits.

Of course the best approach is to avoid such limitations in the first place, but if they are unavoidable, please use safeToNonNegInt.

val Y = drmTarget(1 until drmTarget.nrow.toInt, 0 until 1).checkpoint(cacheHint)
val Y_lag = drmTarget(0 until drmTarget.nrow.toInt - 1, 0 until 1).checkpoint(cacheHint)
val X = drmFeatures(1 until drmFeatures.nrow.toInt, 0 until 1).checkpoint(cacheHint)
val X_lag = drmFeatures(0 until drmFeatures.nrow.toInt - 1, 0 until 1).checkpoint(cacheHint)
Contributor (Author):

Missed all of these, but have since updated with safeToNonNegInt(...).
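As a sketch of that change (assuming the safeToNonNegInt helper is visible from algorithm code, and borrowing the slicing pattern quoted above):

import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

object SafeSlicingSketch {
  // Bound-check the row count instead of silently truncating a Long to an Int.
  def lagSlices(drmTarget: DrmLike[Int], cacheHint: CacheHint.CacheHint) = {
    val n    = safeToNonNegInt(drmTarget.nrow)
    val y    = drmTarget(1 until n, 0 until 1).checkpoint(cacheHint)
    val yLag = drmTarget(0 until n - 1, 0 until 1).checkpoint(cacheHint)
    (y, yLag)
  }
}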


def invTransform[K: ClassTag](input: DrmLike[K]): DrmLike[K] = {

if (!isFit){
Member:

Style point: if (!isFit){ needs a space after the closing paren, before the brace, i.e. if (!isFit) {.
Here and in other places.

Contributor (Author):

isFit has been more or less abstracted away by the Model/ModelFactory paradigm. Need to clean this up.

@@ -0,0 +1,8 @@
package org.apache.mahout.math.algorithms
Member:

File needs License

@@ -0,0 +1,22 @@
package org.apache.mahout.math.algorithms.regression
Member:

License

@@ -0,0 +1,8 @@
package org.apache.mahout.math.algorithms.regression.tests
Member:

license

@@ -0,0 +1,8 @@
package org.apache.mahout.math.algorithms.regression.tests

/**
Member:

Need to remove all references to your name (auto-generated by IntelliJ, etc.).

@@ -0,0 +1,135 @@
package org.apache.mahout.math.algorithms
Member:

license

val drmY = drmData(::, 4 until 5)

val model = new OrdinaryLeastSquares[Int]().fit(drmX, drmY, 'calcStandardErrors → false)

@andrewpalumbo (Member) commented Jan 29, 2017:

If I understand the new factory method correctly, could this just be something like:

val olsModel = OrdinaryLeastSquares[Int].fit(drmX, drmY, 'calcStandardErrors → false)

?

Member:

Looking closer at the code, I see that I was wrong in my last comment. Do we need helper objects for this?

.gitignore Outdated
@@ -20,3 +20,6 @@ mr/temp
temp
foo
math-tests/
derby.log
bin/*
@andrewpalumbo (Member) commented Jan 29, 2017:

Is this meant to be added? I don't think we want bin/* in .gitignore.

Contributor (Author):

Nope, IntelliJ run amok.


import scala.reflect.ClassTag

class AsFactor extends PreprocessorModelFactory{
Member:

Space needed before the opening brace, i.e. PreprocessorModelFactory {.

}

override def invTransform[K](input: DrmLike[K]) = {
// not yet implemented
Member:

throw NotImplementedException here?

var colCentersV: MahoutVector = means

def setCenters(centers: MahoutVector): Unit ={
if (means.length != centers.length){
Member:

space before opening brace here and line above

if (addIntercept) {
X = X cbind 1
}
val drmXtXinv = solve(X.t %*% X)
@andrewpalumbo (Member) commented Jan 29, 2017:

Do we have any way around pulling X.t %*% X into memory here? I thought that recently you'd said that this was being done differently.

Contributor (Author):

Short answer: you can solve for beta without doing this, but not the var/covar matrix. I can refactor so that if 'calcStandardErrors -> false it does it the distributed way.

Also, so long as XtX isn't too fat here, this isn't going to be that brutal. Even 50k columns dense results in a fairly manageable matrix (50k x 50k).

Member:

Hmm, that might be useful: the refactor for the case where the covariance matrix isn't needed. You could also calculate the covariance matrix in a distributed manner on the out-of-core Drms. My concern is that people love to hammer on new features, although the point is moot if the answer is to use thinQR(..) to solve, which is also bottlenecked by an in-memory X.t %*% X.

Maybe a question for after the platform is finished...

@@ -28,10 +28,11 @@ import org.apache.mahout.math.drm.{drmBroadcast, _}
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.math.scalabindings.{dvec, _}
import org.apache.mahout.math.scalabindings.RLikeOps._
import sun.reflect.generics.reflectiveObjects.NotImplementedException
Member:

My mistake in my previous comment: this should be UnsupportedOperationException.

def apply(f: Vector): Double = f.max
}))
})(0, ::)
new AsFactorModel(factorMap.sum.toInt, factorMap)
Contributor:

Style nitpicking: since this really does not overload a trait-signed method, it would be most readable if the return type were explicitly stipulated in the declaration:

def fit[K](input: DrmLike[K]): MyModelType = {

@asfgit asfgit closed this in 9a31923 Feb 1, 2017