[FLINK-1994] [ml] Add different gain calculation schemes to SGD #1397
Conversation
|
There are still some Scalastyle violations. You can check that locally by running the Scalastyle check. |
|
That last failure looked like it happened in Kafka, in something I never touched. Is there any way to just retry the Travis CI build? |
|
The Kafka tests are not written in a bulletproof way. It may happen that the Kafka test cluster has a hiccup, in which case the tests cannot succeed... You can (force) push changes to your branch, which re-triggers the CI. |
|
So... looks like this passed. What do I do next? |
flink-staging/flink-ml/pom.xml
Outdated
Please don't change the indentation.
|
I think I really screwed this up... I was just trying to undo the changes to flink-staging/flink-ml/pom.xml |
|
Hi @rawkintrevo, I think you should rebase your branch instead of merging the master branch. Could you update your branch? Note that you should force push. |
Force-pushed from 0a6b52f to eafe068
|
So do I need to do anything else for this to get merged? |
|
Hi @rawkintrevo, I'm sorry for keeping you waiting. I have looked at your pull request. Most of the changes are good, but I have a few comments. First, there are some meaningless changes, such as unnecessary indentation changes. Second, how about using an enumeration for selecting the method? Third, I think we should apply these changes to the documentation. Please add this content to the FlinkML documentation. :) |
Indentation:
```scala
override def optimize(
    data: ...
    initialWeights: ...)
  : DataSet[WeightVector] = {
}
```
|
On your second point: I agree that an enum would be better for human-readable variable setting, but if one were doing a grid parameter search it would be easier to search over a range of integers than to establish an array of specifically named values. Not significantly harder, but slightly more tedious. Further, since I'm just learning Scala, limited research/understanding leads me to believe that 1) Scala doesn't have an enum that works just like Java's, and 2) changing the cases to more informative string names would achieve the desired goal. In the meantime I'm working on documentation and reverting the indentation (which I thought I had already done). Thanks for bearing with me as I am still learning. |
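For reference, a minimal sketch of a grid search over a scala.Enumeration, showing that iterating over enum values is about as easy as iterating over integers; the object and value names here are illustrative, not this PR's API:

```scala
// Illustrative enumeration; value names are assumptions, not the final API.
object LearningRateMethod extends Enumeration {
  val Default, InvScaling, Xu = Value
}

// A grid search over the enum values needs no integer bookkeeping:
for (method <- LearningRateMethod.values) {
  println(s"evaluating learning-rate method: $method")
  // train and score a model with `method` here
}
```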
|
Hi @rawkintrevo, you can convert a Scala Enumeration value to an Int like the following:
```scala
object Parameter extends Enumeration {
  type Parameter = Value
  val Param1, Param2, Param3, Param4 = Value
}

val convertedInt: Int = Parameter.Param1.id
```
Also, these values can be caught by pattern matching:
```scala
import Parameter._

val param: Parameter = getSomeEnumValues()
param match {
  case Param1 => // blahblah
  case Param2 => // blahblah
  case Param3 => // blahblah
  case Param4 => // blahblah
}
```
|
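For completeness, scala.Enumeration also supports the reverse lookup, recovering a value from its integer id, so an Int-based search can still map back to enum values; a one-line sketch using the Parameter object above:

```scala
// Enumeration's apply looks a value up by its id (ids start at 0 by default).
val recovered: Parameter.Parameter = Parameter(2) // Parameter.Param3
```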
|
Still refactoring to enumeration, but wanted to toss this up in case anyone is watching: I added a decay parameter which significantly generalizes the Xu and inverse scaling methods (see the sketch after this comment). I also added some docs, but am open to suggestions; for example, once I figure out how to set the method with enumeration, I'll include a note.

UPDATE: The more I think about this, the more I feel that any implementation I can think of adds a lot of complexity for very little marginal gain. That said, I don't understand the advantage of enumeration over passing strings, as the main benefit is being better for humans reading the code. If the benefits are significant, can that be an improvement on another ticket? Any solution I come up with will feel awkward and I fear will be a candidate for refactoring anyway. I think my energy is better spent developing tests for this at the moment. |
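A hedged sketch of how a single decay exponent can generalize the two schemes mentioned, with the formulas inferred from this thread (including the negative exponent in Xu's method that gets fixed later); the names and signatures are illustrative, not the merged API:

```scala
// Illustrative gain calculation schemes; names and signatures are assumptions.
object GainSketch {
  // Xu's method: eta_j = eta0 * (1 + lambda * eta0 * j)^(-decay).
  def xu(eta0: Double, lambda: Double, iteration: Int, decay: Double): Double =
    eta0 * math.pow(1.0 + lambda * eta0 * iteration, -decay)

  // Inverse scaling: eta_j = eta0 / j^decay.
  def invScaling(eta0: Double, iteration: Int, decay: Double): Double =
    eta0 / math.pow(iteration, decay)
}
```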
Force-pushed from 1cd4b14 to 14fe0ff
|
I would still like to use an enumeration because if we use string parameters, the user cannot check that the input parameter is valid. But if you have problems using an enumeration, I'll modify your pull request before merging it. The failing CI seems unrelated to this PR; I'll check and test this. |
|
I can't get the enumeration thing to work. If you can modify it that would be awesome. Thanks. |
docs/libs/ml/optimization.md
Outdated
Where is decay explained?
In which formula is k actually used?
|
I had some minor comments. I also opened a PR against your branch @rawkintrevo which shows how one could solve the problem with the enumerations (rawkintrevo#1). It would be great if we could fix these last issues :-) |
|
For sure, definitely want to close the books on this one, and I appreciate all your help and patience as I have been learning. I will work on this tonight. |
|
👍 |
Commits in this pull request (duplicates from rebases omitted):
Added SGD gain calculation schemes
fixed optimal SGD calculation scheme
FLINK-1994: Added 4 new effective learning rates
[FLINK-1994] [ml] Add different gain calculation schemes to SGD
fixed long lines in GradientDescent.scala
FLINK-1994: Added 3 new effective learning rates
[FLINK-1994][ml] Updated docs, refactored optimizationMethod from Int to String
[FLINK-1994][ml] Added test and example to docs
[FLINK-1994][ml] Fixed Int Artifacts in LinearRegression.scala
Added LearningRateMethod to IterativeSolver (the learning rate method defines how the effective learning step is calculated for each iteration step of the IterativeSolver)
Fixed docs, merged enumeration from Till, fixed typo in Xu's method
Force-pushed from 7a781aa to 94362a2
|
Sorry for the multiple commits; it wasn't registering the changes here so I kept pushing... I can squash them if you want. I updated the docs per your comments, added a brief description of the meaning of the decay rate, and noticed a typo in Xu's method (I had forgotten a negative sign in the exponent). |
This line should be changed to .setOptimizationMethod(LearningRateMethod.Xu).
It should actually be .setLearningRateMethod(LearningRateMethod.Xu).
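A hedged usage sketch built around the corrected setter; the solver constructor and the other setter names are assumptions based on typical FlinkML solver configuration, not necessarily this PR's exact API:

```scala
// Illustrative configuration; only setLearningRateMethod is quoted from the review.
val sgd = GradientDescent()
  .setIterations(100)
  .setStepsize(0.1)
  .setLearningRateMethod(LearningRateMethod.Xu)
```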
|
@rawkintrevo Thanks for the update! I still prefer using a learning method instance as a parameter, to remove method-specific parameters. Looks good to merge. 👍 |
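A hedged sketch of what "a learning method instance as parameter" could look like: a sealed trait whose implementations carry their own parameters, so a method-specific value such as decay travels with the method rather than the solver. The names are illustrative, not necessarily the merged API:

```scala
// Illustrative design; trait and member names are assumptions.
sealed trait LearningRateMethodTrait {
  def calculateLearningRate(initialRate: Double, iteration: Int, regularization: Double): Double
}

case object Default extends LearningRateMethodTrait {
  override def calculateLearningRate(initialRate: Double, iteration: Int, regularization: Double): Double =
    initialRate / math.sqrt(iteration)
}

case class Xu(decay: Double) extends LearningRateMethodTrait {
  override def calculateLearningRate(initialRate: Double, iteration: Int, regularization: Double): Double =
    initialRate * math.pow(1.0 + regularization * initialRate * iteration, -decay)
}
```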
|
Not squashing the commits here is the right way to go. We'll squash them once we merge it. |
This should be renamed to LearningRateMethod
|
LGTM, will merge it then. I'll address @chiwanpark's comment concerning the method-specific parameters. |
|
@tillrohrmann Thanks! That's good news. |
|
Thank you guys! |
Final commits (in addition to the list above):
[FLINK-1994][ml] Added 4 new effective learning rate methods
[FLINK-1994][ml] Add different gain calculation schemes to SGD
This closes apache#1397.
Continuation of pull request 1384.
Fixed long line issue, rebased on top of current master, and changed title per instructions.
#1384