[MAHOUT-1856][WIP] create a framework for new Mahout Clustering, Classification, and Optimization Algorithms #246
Conversation
Thanks @rawkintrevo. This is a great start. I'd originally thought that we'd put the algos in the math-scala module, but looking at it, I think this makes sense.
Nice work on the slides also! 💯 Another thought that I had is that we may want to allow in-core matrices as parameters. Just throwing it out there for discussion. I can't think of a particular use case off the top of my head, but it seems that there should be one.
This is WIP so it doesn't really matter that it's failing atm, but this isn't good: Couldn't find maven on the …
Building framework
add placeholders for ensemble pipeline and fitness test
added readme
fixed readme image
fixed readme image
fixed readme image
removed autogen comments from files
added readme
fixed readme image
fixed readme image
add placeholders for ensemble pipeline and fitness test
added readme
fixed readme image
fixed readme image
fixed readme image
removed autogen comments from files
add placeholders for ensemble pipeline and fitness test
added readme
fixed readme image
fixed readme image
fixed readme image
removed autogen comments from files
updated pom.xml for 0.12.3-SNAP
Added MeanSubtract
[MAHOUT-1856] Added Subtract Mean
[MAHOUT-1856] Added Durbin Watson Test
[MAHOUT-1856] Refactored to use fitParams
[MAHOUT-1856] Added Factorize refactored others
[MAHOUT-1856] Added Factorize Test
[MAHOUT-1856] Refactor to scala-math
[MAHOUT-1856] Rename Normalizer to StandardScaler
[MAHOUT-1856] Don't collect for ols
[MAHOUT-1856] Add Several Fittness Tests
[MAHOUT-1856] Clean up files
[MAHOUT-1856] Remove old algos
0bf7e64 to ef544c8
@dlyubimov, @mahout-team could you review/provide feedback on this? Originally Trevor had a separate module for this, and I asked him to move it into math-scala.
I should note that my above comment was meant as a question to all concerned: does this (the algorithms base traits package) belong in math-scala or in its own module? Trevor initially had it in its own module, and I asked that he move it to math-scala, due mostly to size concerns for the entire binary distribution. However, I am now questioning whether that is the correct call. I can think of several good reasons that it should have its own module (and others that it should be in math-scala). For now, I think that math-scala is an OK place, and we can just move it to another module if we want later. @smarthi @pferrel @rawkintrevo @dlyubimov please weigh in here. Not a big deal right now, but thinking ahead it may be good to get the location right.
abstract class Model extends Serializable {

  var fitParams = collection.mutable.Map[String, MahoutVector]()
So Model requires all parameters to be named and to be vectors? Shouldn't this be an artifact of more specialized models, like, say, GLMs? There are plenty of ML models that would probably not fit that fairly rigid definition, not easily or pragmatically at least.
regressor.fit(drmY, drmX)
fitParams("beta0") = regressor.fitParams("beta")

val Y = drmY(1 until drmY.nrow.toInt, 0 until 1).checkpoint()
Consider giving the caller an option to specify the cache hint here, since it seems essential that this algorithm relies on plenty of things being put into memory. Right now this implies memory-only, so if it doesn't fit, the algorithm is going to slow to a crawl. (In all fairness, in Spark it would crawl with a memory-and-disk spec too, but to put things in perspective, we are probably talking about the difference between a crawling snail and a snail skeleton 20 years after its death.)
*/
abstract class Regressor extends Model {

  def fit[Int](drmY: DrmLike[Int], drmX: DrmLike[Int]): Unit
Perhaps the first parameter being the predictors and the second being the target is a more intuitive signature for most.
Also, maybe drmT for target?
Also: why [Int] only? In any kind of fitter, it should be fit[K](...); there's no reason why the correspondence between predictor and target cannot be established on any type of key in the general case.
Also: I am not sure the fitter should extend a model. Rather, the fitter should return a model, i.e.,
fit[K](...): Model
right? I think that's how this pattern goes in most kits. The fitter is just a strategy.
And I'd abstain from doing abstract classes in Scala, unless a trait absolutely cannot do it (and it can in this case). An abstract class points to a specific, single, and necessary base implementation in the hierarchy, which is too constraining without need for the actual implementations.
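The pattern suggested here can be sketched as a minimal, self-contained example (Fitter, MeanFitter, and the Seq-based signatures are illustrative stand-ins, not Mahout API):

```scala
// Illustrative only: the fitter is a strategy that *returns* a Model,
// rather than extending one. Traits keep the hierarchy unconstrained.
trait Model {
  def predict(x: Seq[Double]): Double
}

trait Fitter {
  def fit(xs: Seq[Seq[Double]], ys: Seq[Double]): Model
}

// A trivial fitter: the fitted "model" always predicts the target mean.
object MeanFitter extends Fitter {
  def fit(xs: Seq[Seq[Double]], ys: Seq[Double]): Model = {
    val mean = ys.sum / ys.size // state captured at fit time
    new Model { def predict(x: Seq[Double]): Double = mean }
  }
}
```

A call site then looks like val model = MeanFitter.fit(xs, ys), and only the returned model needs to be serialized or persisted.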
+1 on swapping Y and X; I've had to catch myself more than once on that already. I think the original motivation was a tip of the hat to R's Y ~ x, but I agree with you.
Re [Int]: I realized that later, but haven't gone through to swap them all back to K. It is (or was) K in some places.
I guess if this is abstract enough, we also need to be able to admit hyperparameters, which are of course specific to every fitter. In R this is trivial (any call can be made a bag of whatever named arguments), but in Scala this may need a bit of thought (if this abstraction needs to be that high). Otherwise, I guess most Scala kits just create a concrete fit signature per implementation.
If the Regressor trait is meant to be common to all possible regression-class algorithms, we either need a way to universally pass in the hyperparameters, or just not have the fit abstraction in the Regressor trait at all (then what, I guess :))
Like perhaps fit[K](X: DrmLike[K], T: DrmLike[K], (Symbol, Any)*): Model, where the optional list is the hyperparameter list. Then hyperparameterized calls could be something like:
fit(X, Y, 'alpha -> 10.0, 'lambda -> 1e-15)
etc.
This loses the strong typing of the signature, but the call is not that ugly, and it might be OK if specific implementations are cleanly documented.
Thanks for the quick review @dlyubimov, will incorporate your suggestions.
var residuals: DrmLike[K] = _

def fit(drmFeatures: DrmLike[K], drmTarget: DrmLike[K]): Unit
I would think that fit(..) should be able to return a list of errors per sample, so possibly def fit(drmFeatures: DrmLike[K], drmTarget: DrmLike[K]): Any would be a better signature for the trait.
As well, drmFeatures is somewhat confusing, with the Drm being a matrix of samples x features, i.e. features are columns. I think something like drmSamples, or drmObservations, or even drmX may be more straightforward.
By errors, do you mean code errors, or the residuals?
Yes, e.g. residuals, a Hessian matrix, etc.; anything that a user who is developing their own algorithm might like to see returned from their own fit(..) method. My point was there's no reason to limit the return value to Unit; rather, I'd think Any would be a more appropriate return value for the base trait. Though maybe this would not work with the pipeline structure that you're setting up?
@andrewpalumbo re drmFeatures vs drmX, and @dlyubimov: we're mixing conventions. I feel like, for consistency, we either use drmFeatures and drmTarget (similar to Spark ML) or drmX and drmY (similar to sklearn and R). Leaving for now, but open to debate; I have a slight bias towards drmX, drmY.
var addIntercept: Boolean = hyperparameters.asInstanceOf[Map[String, Boolean]].getOrElse("addIntercept", true)

var summary = ""
def fit(drmFeatures: DrmLike[K], drmTarget: DrmLike[K]): Unit = {
Continuing on from our discussion on Slack, I would think that fit may be a more appropriate place for hyperparameters, e.g.:
fit(observed_independent: Drm[K], observed_targets: Drm[K], hyperparameters: Option[List[Double]]): List[Double]
I think that this may be a matter of convention, so if you're following a convention that I am not familiar with, this may be fine. However, I feel that this may be more robust.
d = 2 : no auto-correlation
d > 2 : negative auto-correlation
*/
def DurbinWaton[K](model: Regressor[K]): Regressor[K] = {
misspelling- "DurbinWatson"
def transform[K: ClassTag](input: DrmLike[K]): DrmLike[K] ={
  if (!isFit) {
    //throw an error
Is this complete? I.e., do you want to throw an error here for this release?
def invTransform[K: ClassTag](input: DrmLike[K]): DrmLike[K] = {

  if (!isFit) {
    //throw an error
Same question here as before: do you want to throw an error here for 0.13.0?
def transform[K: ClassTag](input: DrmLike[K]): DrmLike[K] = {

  if (!isFit) {
    //throw an error
I find this a bit confusing (as well as the other mentions of if (!isFit) { // throw }). So if I want to standardize a Drm to (mean = 0, std_dev = 1), I would need to do something like:
val drmStandardized = StandardScaler(unscaledDrm).fit().transform()
E.g., does the Transformer trait need to extend Model? If so, we do need to throw/catch here if fit() is not called by the user. Or possibly just move fit() into the constructor in this case.
As well, I think that this could be another argument for moving hyperparameters into fit(...), e.g. if for some reason we wanted to standardize on N(mean = 0, stdDev = 2), we could still call StandardScaler and fit(Map("mu" -> 0, "sigma" -> 2)):
val drmStandardized = StandardScaler(unscaledDrm).fit(Map("mu" -> 0, "sigma" -> 2)).transform()
Correct, and agreed. The sklearn convention is to also have a fitTransform method to do it in one step. Will add that.
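The convention being agreed to here can be sketched on plain arrays rather than DRMs (the class name, centering-only behavior, and use of require are all illustrative assumptions, not the eventual Mahout API):

```scala
// Illustrative scaler: fit() learns the mean, transform() refuses to run
// before fit(), and fitTransform() does both in one step.
class CenteringScaler {
  private var mean = 0.0
  private var fitted = false

  def fit(xs: Array[Double]): this.type = {
    mean = xs.sum / xs.length
    fitted = true
    this
  }

  def transform(xs: Array[Double]): Array[Double] = {
    // throw instead of silently returning garbage when unfitted
    require(fitted, "transform() called before fit()")
    xs.map(_ - mean)
  }

  def fitTransform(xs: Array[Double]): Array[Double] = fit(xs).transform(xs)
}
```

The require guard is one concrete answer to the "//throw an error" placeholders above: an unfitted transform fails fast rather than returning unscaled data.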
(estimate - Ranswers).sum should be < epsilon
}
It would be good to have a couple more tests here; at least one for transform(...).
Trevor, this looks really good to me. I've left some comments, mainly about hyperparameters being moved to fit(...) from the model; I think that this makes sense in many ways. E.g., when doing a highly iterative hyperparameter search, it would eliminate a good amount of overhead to call … rather than re-constructing the entire class each time. As well, as I noted inline, I think that the …
I want to emphasize, though, that if you're building this up to follow a convention (sklearn-like, MLlib, etc.) which I am not familiar with, that may be better to follow than my suggestions, to make this framework as easy on new users as possible.
Thanks for the review @andrewpalumbo! sklearn parameters are set when the … MLlib, on the other hand, passes parameter maps in … Both, however, allow hyperparameters to be updated. And in the case you refer to, the model would not be re-instantiated, but something like this:
model.param1 = 1
model.fit(X, y)
model.param1 = 2
model.fit
To your point, I also want to make this as easy as possible for new users, so I think it would be best to leave the option to pass a parameter map at initiation, and also expose it as an optional parameter of the … It also seems it would be logical (to avoid duplicating code) to have a … The … would be called on the map when instantiated, and again if a …
@rawkintrevo Great, best of both worlds then!
drmTarget: DrmLike[K],
hyperparameters: Map[String, Any] = Map("" -> None)): Unit = {

var hyperparameters: Option[Map[String,Any]] = None
There should be a setHyperparameters right about here...
def fit(drmFeatures: DrmLike[K],
        drmTarget: DrmLike[K],
        hyperparameters: Map[String,Any] = Map("" -> None)): Unit
Just in case it was not quite clear, my suggestion was to use hyperparameters: (Symbol, Any)*. First, symbols are faster for something that is meant to be an id, and second, they have somewhat more palatable notation, e.g.,
val model = fit(X, Y, 'k -> 10, 'alpha -> 1e-5)
and the signature overall:
def fit(drmFeatures: DrmLike[K],
        drmTarget: DrmLike[K],
        hyperparameters: (Symbol, Any)*): Model
Of course, the implementation can easily get a map, should it need to:
val hmap = hyperparameters.toMap
That's actually a Scala pattern I developed and used in a similar situation.
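Spelled out as a compilable toy, the varargs-to-map pattern looks like this (SimpleModel and the default values are made up for illustration):

```scala
// Hypothetical model type, purely for illustration.
case class SimpleModel(k: Int, alpha: Double)

def fit(hyperparameters: (Symbol, Any)*): SimpleModel = {
  val hmap = hyperparameters.toMap // collect the varargs into a map
  SimpleModel(
    k     = hmap.getOrElse('k, 10).asInstanceOf[Int],        // default k = 10
    alpha = hmap.getOrElse('alpha, 1e-5).asInstanceOf[Double]) // default alpha = 1e-5
}
```

A call like fit('k -> 20) then picks up alpha's default. As noted above, the trade-off is that type errors in the bag (e.g. passing an Int where a Double is expected) only surface at runtime.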
/**
 * Abstract of Regressors
 */
trait Regressor[K] extends Model {
Regressor still extends Model? Regressor's fit, IMO, should be a factory method w.r.t. the model instead.
I agree that Spark and Flink follow the paradigm you are suggesting; however, sklearn doesn't. If we're just going off of what others do, following the other larger packages, yeah, we should probably follow the conventions of other Scala-based "Big Data" packages. However, I can't understand WHY they do it that way; it makes the code hard to read/follow, and I assume it is an artifact of all the serialization and the way they execute models (having to ship objects around for map/reduce phases). That is to say, they do it because they are forced to, at the expense of readability.
In Mahout, most of that is taken care of at the distributed-engine level.
If we start going down the rabbit hole of "do as Spark and Flink do" we may find ourselves with an entire class just for the summary of a linear model. I, for one, want to stay as far away from that as possible. I'd like to see algorithm code (these and future) be as succinct and tractable as possible so that:
- New contributors aren't intimidated (that is to say, are encouraged to contribute algorithms)
- Those algorithms can be easily reviewed and maintained with minimal Scala knowledge (requiring more limits the pool of willing and able contributors who understand the actual math at play)
That isn't to say, at the end of the day, that your proposal is incorrect; you usually are correct, and I value and appreciate you taking the time to review. I am saying that "I think that's how this pattern goes in most kits" is neither necessary nor sufficient, IMO, as in some respects I'm explicitly trying to avoid the approach of other packages; in this case, refactoring something to be more complex with no clear understanding of the benefit.
In most kits I know (including two I am working on right now myself), the pattern is that fit returns a model object; scikit seems to be on the outlier end on this.
Also, I think the approach "package A does it like this and we do/don't like it, therefore it is good/bad" is a dogma fallacy. IMO we just need to do what makes sense.
And it makes sense to me to serialize or persist the model, not the (fitting algorithm + model). The latter will cause problems on both the user and implementation ends, IMO.
+1 to model persistence. Though we need some thinking here. We could persist/read the model as a set of Drms, since we have native-to-Mahout support for this, i.e. drm.dfsWrite(..), dfsRead(...). This can be quite clunky, though (see the NaiveBayes model.persist(...), e.g.).
d > 2 : negative auto-correlation
*/
def DurbinWatson[K](model: Regressor[K]): Regressor[K] = {
  val e: DrmLike[K] = model.residuals(1 until model.residuals.nrow.toInt, 0 until 1)
nrow.toInt is generally dangerous, as it does not catch the algorithm limitation. The problem with this (and I have actually run into this before) is that the algorithm obviously has, in this case, a limitation of 2 bln rows, and it should explicitly fall apart once this limit is reached, instead of silently producing nonsense. I think there's a method specifically for this purpose in our math-scala drm package, safeToNonNegInt, that throws IllegalArgumentException if the conversion loses significant bits.
Of course, the best approach is to avoid such limitations in the first place, but if unavoidable, please use safeToNonNegInt.
val Y = drmTarget(1 until drmTarget.nrow.toInt, 0 until 1).checkpoint(cacheHint)
val Y_lag = drmTarget(0 until drmTarget.nrow.toInt - 1, 0 until 1).checkpoint(cacheHint)
val X = drmFeatures(1 until drmFeatures.nrow.toInt, 0 until 1).checkpoint(cacheHint)
val X_lag = drmFeatures(0 until drmFeatures.nrow.toInt - 1, 0 until 1).checkpoint(cacheHint)
Missed all of these, but have since updated with safeToNonNegInt(...).
def invTransform[K: ClassTag](input: DrmLike[K]): DrmLike[K] = {

  if (!isFit){
Style point: if (!isFit){ should have a space after the closing paren before the brace, i.e. if (!isFit) {. Here and in other places.
isFit has been more or less abstracted away by the Model/ModelFactory paradigm. Need to clean this up.
@@ -0,0 +1,8 @@
package org.apache.mahout.math.algorithms
File needs License
@@ -0,0 +1,22 @@
package org.apache.mahout.math.algorithms.regression
License
@@ -0,0 +1,8 @@
package org.apache.mahout.math.algorithms.regression.tests
license
@@ -0,0 +1,8 @@
package org.apache.mahout.math.algorithms.regression.tests

/**
Need to remove all references to your name (auto-generated by IntelliJ, etc.).
@@ -0,0 +1,135 @@
package org.apache.mahout.math.algorithms
license
val drmY = drmData(::, 4 until 5)

val model = new OrdinaryLeastSquares[Int]().fit(drmX, drmY, 'calcStandardErrors → false)
If I understand the new factory method correctly, could this just be something like:
val olsModel = OrdinaryLeastSquares[Int].fit(drmX, drmY, 'calcStandardErrors → false)
?
Looking closer at the code, I see that I was wrong in the last comment. Do we need helper objects for this?
.gitignore (Outdated)
@@ -20,3 +20,6 @@ mr/temp
temp
foo
math-tests/
derby.log
bin/*
Is this meant to be added? I don't think we want bin/* in .gitignore.
Nope, IntelliJ run amok.
import scala.reflect.ClassTag

class AsFactor extends PreprocessorModelFactory{
Space before the opening {
}

override def invTransform[K](input: DrmLike[K]) = {
  // not yet implemented
Throw NotImplementedException here?
var colCentersV: MahoutVector = means

def setCenters(centers: MahoutVector): Unit ={
  if (means.length != centers.length){
Space before the opening brace here and on the line above.
if (addIntercept) {
  X = X cbind 1
}
val drmXtXinv = solve(X.t %*% X)
Do we have any way around pulling X.t %*% X into memory here? I thought that recently you'd said this was being done differently.
Short answer: you can solve for beta without doing this, but not the var/covar matrix. I can refactor such that if 'calcStandardErrors -> false, then it does it the distributed way.
Also, so long as XtX isn't too fat here, this isn't going to be that brutal. Even 50k columns dense results in a fairly manageable matrix (50k x 50k).
Hmm, that might be useful: the refactor for the case without a needed covariance matrix. You could also calculate the covariance matrix in a distributed manner on the out-of-core Drms. My concern is that people love to hammer on new features. Although the point is moot if the answer is to use thinQR(..) to solve, which is also bottlenecked by an in-memory X.t %*% X.
Maybe a question for after the platform is finished...
@@ -28,10 +28,11 @@ import org.apache.mahout.math.drm.{drmBroadcast, _}
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.math.scalabindings.{dvec, _}
import org.apache.mahout.math.scalabindings.RLikeOps._
import sun.reflect.generics.reflectiveObjects.NotImplementedException
My mistake in my previous comment: this should be UnsupportedOperationException.
def apply(f: Vector): Double = f.max
}))
})(0, ::)
new AsFactorModel(factorMap.sum.toInt, factorMap)
Style nitpicking: since this really does not overload a trait-signed method, it would be most readable if the return type is explicitly stipulated in the declaration:
def fit[K](input: DrmLike[K]): MyModelType = {
Relevant JIRA: https://issues.apache.org/jira/browse/MAHOUT-1856
Readme.md provides a more comprehensive (yet still incomplete) overview.
Key Points:
Top Level Class:
Model has one method, fit, and coefs.
Transformers map a vector input to a vector output (of the same or a different length).
Regressors map a vector input to a single output (e.g. a Double).
Classifiers extend Transformers: they create a probability vector, then 'select' the class and return the label (instead of the entire p-vector).
Pipelines and Ensembles are models as well, except they are composed from other models listed above, or from other pipelines and ensembles.
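The key points above might be sketched as a trait hierarchy roughly like this (simplified: plain Seq[Double] vectors stand in for DRMs/MahoutVectors, and the member names are illustrative assumptions, not the PR's actual signatures):

```scala
trait Model {
  def coefs: Map[String, Seq[Double]] // fitted parameters, by name
}

trait Transformer extends Model {
  def transform(v: Seq[Double]): Seq[Double] // vector in, vector out
}

trait Regressor extends Model {
  def predict(v: Seq[Double]): Double // vector in, single output
}

// Classifiers build on Transformers: transform yields a probability
// vector, and classify() selects the label with the highest probability.
trait Classifier extends Transformer {
  def classify(v: Seq[Double]): Int = transform(v).zipWithIndex.maxBy(_._1)._2
}
```

Pipelines and Ensembles would then also implement Model, composed from instances of the traits above.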
ToDo:
This is merely a conversation starter on what to do.
I've included OLS as an example regressor and a normalizer as an example transformer, only for illustrative purposes. I really don't want to pack too many algos into this initial commit; just an example / proof of concept so we can say "yeah, this framework makes sense for this kind of model" OR "ooh, we probably want to have these features too."