Conversation

@rawkintrevo
Contributor

Continuation of pull request 1384.

Fixed long line issue, rebased on top of current master, and changed title per instructions.

@tillrohrmann
Contributor

There are still some scalastyle violations:

error file=/home/travis/build/apache/flink/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala message=File line length exceeds 100 characters line=56
error file=/home/travis/build/apache/flink/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala message=File line length exceeds 100 characters line=221
error file=/home/travis/build/apache/flink/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/optimization/GradientDescent.scala message=File line length exceeds 100 characters line=224

You can check that locally by simply running mvn clean verify.

@rawkintrevo
Contributor Author

That last failure looked like it happened in Kafka, in something I never touched, e.g. at random. Is there any way to just retry the Travis CI build?

@StephanEwen
Contributor

The Kafka tests are not written in a bulletproof way. It may happen that the Kafka test cluster has a hiccup, in which case the tests cannot succeed...

You can (force-)push changes to your branch; that re-triggers the CI.

@rawkintrevo
Contributor Author

So... looks like this passed. What do I do next?

Member

Please don't change indent.

@rawkintrevo
Contributor Author

I think I really screwed this up... I was just trying to undo the changes to flink-staging/flink-ml/pom.xml

@chiwanpark
Member

Hi @rawkintrevo, I think you should rebase your branch instead of merging master branch. Could you update your branch? Note that you should force push (git push -f origin flink-1994) after rebasing your branch.

@rawkintrevo
Contributor Author

So do I need to do anything else for this to get merged?

@chiwanpark
Member

Hi @rawkintrevo, I'm sorry for keeping you waiting.

I have looked over your pull request. Most of the changes are good, but I have a few comments.

First, there are some unnecessary changes, such as the indentation in the flink-ml/pom.xml file and in the definition of the optimize method of the GradientDescent class. Please revert these changes.

Second, how about using an enumeration for the added parameter? Plain numbers cannot express the meaning of the optimization method.

Third, I think these changes should also be reflected in the documentation. Please add this content to the FlinkML documentation. :)

Contributor

Indentation:

override def optimize(
    data: ...
    initialWeights: ...)
  : DataSet[WeightVector] = {

}

@rawkintrevo
Contributor Author

On your second point: I agree that an enum would be better for human-readable parameter setting, but if one were doing a grid parameter search, it would be easier to search over a range of integers than to establish an array of specifically named values. Not significantly harder, but slightly more tedious.

Further, since I'm just learning Scala, my limited research and understanding leads me to believe that 1) Scala doesn't have an enum that works just like Java's, and 2) changing the cases to more informative string names would achieve the desired goal?

In the meantime, I'm working on the documentation and reverting the indentation (which I thought I had already done). Thanks for bearing with me as I am still learning.

@chiwanpark
Member

Hi @rawkintrevo, you can convert a Scala Enumeration value to an Int like the following:

object Parameter extends Enumeration {
  type Parameter = Value
  val Param1, Param2, Param3, Param4 = Value
}

val convertedInt: Int = Parameter.Param1.id

Also, these values can be handled with pattern matching:

import Parameter._
val param: Parameter = getSomeEnumValues()

param match {
  case Param1 => // blahblah
  case Param2 => // blahblah
  case Param3 => // blahblah
  case Param4 => // blahblah
}
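Regarding the grid-search concern above: scala.Enumeration also supports the reverse lookup via its apply method, so an integer range can still drive a parameter sweep. A minimal sketch, reusing the Parameter object defined in the previous comment:

```scala
object Parameter extends Enumeration {
  type Parameter = Value
  val Param1, Param2, Param3, Param4 = Value
}

// Int -> enum value: Parameter(id) returns the Value with that id
val fromInt: Parameter.Parameter = Parameter(0)  // Parameter.Param1

// Iterating over all defined values also covers the grid-search use case
for (p <- Parameter.values) {
  println(s"${p.id} -> $p")
}
```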

@rawkintrevo
Contributor Author

Still refactoring to the enumeration, but wanted to toss this up in case anyone is watching: I added a decay parameter which significantly generalizes the Xu and inverse scaling methods. I also added some docs, but am open to suggestions; for example, once I figure out how to set the method with the enumeration, I'll include a note.

UPDATE: The more I think about this, the more I feel that any implementation I can think of adds a lot of complexity for very little marginal gain. That said, I don't understand the advantage of an enumeration over passing strings, as the main benefit seems to be readability for humans reading the code.

If the benefits are significant, could that be an improvement on another ticket? Any solution I come up with will, I feel, be awkward and a candidate for refactoring anyway. I think my energy is better spent developing tests for this at the moment.
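For readers following along, here is a rough sketch of how a single decay parameter can generalize both schemes mentioned above. The formulas are illustrative and may not match the merged code exactly: eta0 is the initial stepsize, t the iteration number, and lambda the regularization constant.

```scala
object LearningRateSketch {
  // Inverse scaling: eta_t = eta0 / t^decay
  // (decay = 0.5 recovers the familiar eta0 / sqrt(t) schedule)
  def inverseScaling(eta0: Double, t: Int, decay: Double): Double =
    eta0 / math.pow(t, decay)

  // Xu's method: eta_t = eta0 * (1 + lambda * eta0 * t)^(-decay);
  // note the negative sign in the exponent
  def xu(eta0: Double, lambda: Double, t: Int, decay: Double): Double =
    eta0 * math.pow(1 + lambda * eta0 * t, -decay)
}
```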

@chiwanpark
Member

I would still like to use an enumeration, because with string parameters the user cannot check that the input parameter is valid. But if you have trouble using an enumeration, I'll modify your pull request before merging it.

The failing CI seems unrelated to this PR. I'll check and test this.

@rawkintrevo
Contributor Author

I can't get the enumeration thing to work. If you can modify it that would be awesome. Thanks.

Contributor

Where is decay explained?

Contributor

In which formula is k actually used?

@tillrohrmann
Contributor

I had some minor comments. I also opened a PR against your branch @rawkintrevo which shows how one could solve the problem with the enumerations (rawkintrevo#1). Would be great if we can fix these last issues :-)

@rawkintrevo
Contributor Author

For sure, definitely want to close the books on this one, and I appreciate all your help and patience as I have been learning. I will work on this tonight.

@tillrohrmann
Contributor

👍

Added SGD gain calculation schemes

fixed optimal SGD calculation scheme

FLINK-1994: Added 4 new effective learning rates

[FLINK-1994] [ml] Add different gain calculation schemes to SGD

fixed long lines in GradientDescent.scala

[FLINK-1994][ml]Add different gain calculation schemes to SGD

[FLINK-1994][ml] Add different gain calculation schemes to SGD

[FLINK-1994][ml] Add different gain calculation schemes to SGD

Added SGD gain calculation schemes

fixed optimal SGD calculation scheme

[FLINK-1994] [ml] Add different gain calculation schemes to SGD

FLINK-1994: Added 3 new effective learning rates

Added SGD gain calculation schemes

fixed optimal SGD calculation scheme

[FLINK-1994] [ml] Add different gain calculation schemes to SGD

fixed long lines in GradientDescent.scala

[FLINK-1994][ml] Add different gain calculation schemes to SGD

[Flink-1994][ml] Add different gain calculation schemes to SGD

[FLINK-1994][ml] Updated docs, refactored optimizationMethod from Int to String

[FLINK-1994][ml] Added test and example to docs

[FLINK-1994][ml] Fixed Int Artifacts in LinearRegression.scala

Added LearningRateMethod to IterativeSolver

The learning rate method defines how the effective learning step is calculated for
each iteration step of the IterativeSolver.

Fixed docs, merged enumeration from Till, fixed typo in Wus method

@rawkintrevo
Contributor Author

Sorry for the multiple commits; it wasn't registering the changes here, so I kept pushing... I can squash them if you want.

I updated the docs per your comments, added a brief description of the meaning of the decay rate, and noticed a typo in Xu's method (I had forgotten a negative sign in the exponent).

Member

This line should be changed to .setOptimizationMethod(LearningRateMethod.Xu).

Contributor

It should actually be .setLearningRateMethod(LearningRateMethod.Xu).

@chiwanpark
Member

@rawkintrevo Thanks for the update! I still prefer passing a learning rate method instance as the parameter, to remove method-specific parameters such as decay. But it is not mandatory.

Looks good to merge. 👍
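The "learning rate method instance as parameter" idea could be sketched as follows (hypothetical names; the merged code may differ). Each method carries its own hyperparameters, so no separate decay setter is needed:

```scala
sealed trait LearningRateMethod {
  // effective learning rate at iteration t, given the initial rate eta0
  def calculateLearningRate(eta0: Double, t: Int): Double
}

case object Default extends LearningRateMethod {
  def calculateLearningRate(eta0: Double, t: Int): Double =
    eta0 / math.sqrt(t)
}

// decay travels with the method instance instead of a separate parameter
case class InvScaling(decay: Double) extends LearningRateMethod {
  def calculateLearningRate(eta0: Double, t: Int): Double =
    eta0 / math.pow(t, decay)
}
```

A sealed trait also gives exhaustive pattern matching, so a forgotten case becomes a compiler warning rather than a runtime surprise.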

@tillrohrmann
Contributor

Not squashing the commits here is the right way to go. We'll squash them once we merge it.

Contributor

This should be renamed to LearningRateMethod

@tillrohrmann
Contributor

LGTM, will merge it then. I'll address @chiwanpark's comment concerning the decay value while merging. Thanks for your work @rawkintrevo :-)

@chiwanpark
Member

@tillrohrmann Thanks! That is good news.
@rawkintrevo Thanks for your contribution 👍

@rawkintrevo
Contributor Author

Thank you guys!

tillrohrmann pushed a commit to tillrohrmann/flink that referenced this pull request Jan 22, 2016

[FLINK-1994][ml] Added 4 new effective learning rate methods

[FLINK-1994][ml] Add different gain calulation schemes to SGD

This closes apache#1397.
@asfgit closed this in 9c2f516 on Jan 22, 2016