New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression with categorical data and introducing formula language #31

Merged
merged 55 commits into from Aug 19, 2016

Conversation

Projects
None yet
3 participants
@lokeshh

lokeshh commented Jul 4, 2016

No description provided.

@lokeshh lokeshh changed the title from Regression with categorical data and introducing formula language to [WIP] Regression with categorical data and introducing formula language Jul 4, 2016

Show outdated Hide outdated spec/spec_regression.rb
@@ -0,0 +1,26 @@
require 'rspec_helper.rb'

This comment has been minimized.

@v0dro

v0dro Jul 4, 2016

Member

could you rename that file to spec_helper please?

@v0dro

v0dro Jul 4, 2016

Member

could you rename that file to spec_helper please?

This comment has been minimized.

@lokeshh

lokeshh Jul 4, 2016

Thanks. Renamed.

@lokeshh

lokeshh Jul 4, 2016

Thanks. Renamed.

Show outdated Hide outdated .rubocop.yml
AllCops:
Include:
- 'lib/statsample-glm/glm/formula.rb'
- 'lib/statsample-glm/glm/regression.rb'

This comment has been minimized.

@lokeshh

lokeshh Jul 6, 2016

@zverok I'm trying to include rubocop for the code I will be writing. The Rubocop gives many offences in the current code. I'm not planning to solve them at this time. So, is there a way to just include these two files (the two files above), so that Rubocop only runs for these two files and none except these two? I tried just an obvious thing and it isn't working. Rubocop still runs for all other files.

Thanks

@lokeshh

lokeshh Jul 6, 2016

@zverok I'm trying to include rubocop for the code I will be writing. The Rubocop gives many offences in the current code. I'm not planning to solve them at this time. So, is there a way to just include these two files (the two files above), so that Rubocop only runs for these two files and none except these two? I tried just an obvious thing and it isn't working. Rubocop still runs for all other files.

Thanks

This comment has been minimized.

@zverok

zverok Jul 7, 2016

You can play with include/exclude settings in .rubocop.yml. As far as I can recall, it is not very straightforward, but manageable to include and exclude virtually any sets of files.
Let me know if it would became hard, I'll try to experiment myself.

@zverok

zverok Jul 7, 2016

You can play with include/exclude settings in .rubocop.yml. As far as I can recall, it is not very straightforward, but manageable to include and exclude virtually any sets of files.
Let me know if it would became hard, I'll try to experiment myself.

This comment has been minimized.

@lokeshh

lokeshh Jul 8, 2016

Ok. I will try.

@lokeshh

lokeshh Jul 8, 2016

Ok. I will try.

This comment has been minimized.

@zverok

zverok Jul 8, 2016

Unfortunately, as I checked it, there seems to be no easy way to "include only this and that file". However, there are two solution:

  1. Readable yet non-DRY: just list ALL the files in exclude (except your two ones)
  2. Smart yet not very readable: use in Exclude regexp with negative lookbehind, like this:
Exclude:
  - !ruby/regexp /(?<!formula\.rb|regression\.rb)$/

It reads like "matches end-of-line" (in fact, any line therefore), UNLESS it was this or that filename before the matched end-of-line.
If you'll prefer the latter solution, please leave the comment in .rubocop.yml, explaining it.

@zverok

zverok Jul 8, 2016

Unfortunately, as I checked it, there seems to be no easy way to "include only this and that file". However, there are two solution:

  1. Readable yet non-DRY: just list ALL the files in exclude (except your two ones)
  2. Smart yet not very readable: use in Exclude regexp with negative lookbehind, like this:
Exclude:
  - !ruby/regexp /(?<!formula\.rb|regression\.rb)$/

It reads like "matches end-of-line" (in fact, any line therefore), UNLESS it was this or that filename before the matched end-of-line.
If you'll prefer the latter solution, please leave the comment in .rubocop.yml, explaining it.

This comment has been minimized.

@lokeshh

lokeshh Jul 8, 2016

Thanks a lot for this 👍

@lokeshh

lokeshh Jul 8, 2016

Thanks a lot for this 👍

This comment has been minimized.

@lokeshh

lokeshh Jul 10, 2016

I ended up following the first solution.

@lokeshh

lokeshh Jul 10, 2016

I ended up following the first solution.

Show outdated Hide outdated spec/formula_spec.rb
it { is_expected.to be_a Array }
its(:first) { is_expected.to eq Statsample::GLM::Token }
its(:to_basic_formula) { is_expected.to eq 'y ~ 1+a(-)+b(-)' }

This comment has been minimized.

@zverok

zverok Jul 7, 2016

Considering line 9, do you plan to add #to_basic_formula to Ruby's core Array class, or to descend from it?..

@zverok

zverok Jul 7, 2016

Considering line 9, do you plan to add #to_basic_formula to Ruby's core Array class, or to descend from it?..

This comment has been minimized.

@lokeshh

lokeshh Jul 7, 2016

Oh! I didn't see that coming. While writing specs for parse_formula I thought it will be little hard to check each and every token, so I thought about converting all tokens to formula and verify the result with ease. That's why I created to_basic_formula but I think I've to resort to something else since it won't be a good idea to create a class just for to_basic_formula to work.

@lokeshh

lokeshh Jul 7, 2016

Oh! I didn't see that coming. While writing specs for parse_formula I thought it will be little hard to check each and every token, so I thought about converting all tokens to formula and verify the result with ease. That's why I created to_basic_formula but I think I've to resort to something else since it won't be a good idea to create a class just for to_basic_formula to work.

This comment has been minimized.

@zverok

zverok Jul 7, 2016

I am not sure (because don't want to interrupt at early stages of design), but "intuitively" for me the next convention could possibly be:

  • Formula.new('y ~ ...') creates already parsed formula;
  • and the class Formula has methods like #tokens, #to_basic_formula and so on.

But that's from "bird view" look, I can miss everyting about it :)

@zverok

zverok Jul 7, 2016

I am not sure (because don't want to interrupt at early stages of design), but "intuitively" for me the next convention could possibly be:

  • Formula.new('y ~ ...') creates already parsed formula;
  • and the class Formula has methods like #tokens, #to_basic_formula and so on.

But that's from "bird view" look, I can miss everyting about it :)

This comment has been minimized.

@lokeshh

lokeshh Jul 8, 2016

@zverok We might need to relate the parsed formula terms with input terms, so we need to store the input terms. I have included an option form in #parse_formula so when its form=:string it works like #to_basic_formula.

@lokeshh

lokeshh Jul 8, 2016

@zverok We might need to relate the parsed formula terms with input terms, so we need to store the input terms. I have included an option form in #parse_formula so when its form=:string it works like #to_basic_formula.

This comment has been minimized.

@zverok

zverok Jul 8, 2016

Hmm. It becames a bit mangled here (at least for me). Fix me if I'm wrong:

  • Formula#initialize, in fact, parses the formula (into left part and tokens of right part);
  • Formula#parse_formula, in fact, converts formula to some "canonical" set of tokens (and immediately returns them)
  • For testing, we want string representation of this set, so #parse_formula has an option "return string representation of canonical set of tokens";
  • #parse_formula will perform "canonicalization" on each call, yet result will always be the same.

Is it accurate description?..

If it is, what do you think of this kind of design:

  • Formula#initialize parses formula (storing original flow of tokens in @tokens) and canonicalizes it (storing tokens in @canonical_tokens);
  • Formula#canonical_to_s renders canonical formula string
  • Wherever you need this canonical tokens, you use formula, not "naked" array. For ex., here could be written something like
@formula.process_dataframe(@df)

...or something like this.

Better incapsulation, it seems.

NB: I use the word "canonical"/"canonicalize", though I'm absolutely not sure it is good term for what's going on in this formula processing :)

@zverok

zverok Jul 8, 2016

Hmm. It becames a bit mangled here (at least for me). Fix me if I'm wrong:

  • Formula#initialize, in fact, parses the formula (into left part and tokens of right part);
  • Formula#parse_formula, in fact, converts formula to some "canonical" set of tokens (and immediately returns them)
  • For testing, we want string representation of this set, so #parse_formula has an option "return string representation of canonical set of tokens";
  • #parse_formula will perform "canonicalization" on each call, yet result will always be the same.

Is it accurate description?..

If it is, what do you think of this kind of design:

  • Formula#initialize parses formula (storing original flow of tokens in @tokens) and canonicalizes it (storing tokens in @canonical_tokens);
  • Formula#canonical_to_s renders canonical formula string
  • Wherever you need this canonical tokens, you use formula, not "naked" array. For ex., here could be written something like
@formula.process_dataframe(@df)

...or something like this.

Better incapsulation, it seems.

NB: I use the word "canonical"/"canonicalize", though I'm absolutely not sure it is good term for what's going on in this formula processing :)

This comment has been minimized.

@lokeshh

lokeshh Jul 10, 2016

Yes the description you specified is accurate. The design looks great for we can do things like @formula.process_dataframe(@df) as you mentioned and parse_formula returns the same thing every time.

I'm adopting this design but before I want to let you know one thing. I also need to give Formula class a dataframe at the moment of parsing because I need to know whether a term is numerical or category while parsing. It looks a bit weird for Formula to accept a dataframe as well at the time of initialization. Although if I stick to parse_formula then I can pass the dataframe as an argument. What do you say?

@lokeshh

lokeshh Jul 10, 2016

Yes the description you specified is accurate. The design looks great for we can do things like @formula.process_dataframe(@df) as you mentioned and parse_formula returns the same thing every time.

I'm adopting this design but before I want to let you know one thing. I also need to give Formula class a dataframe at the moment of parsing because I need to know whether a term is numerical or category while parsing. It looks a bit weird for Formula to accept a dataframe as well at the time of initialization. Although if I stick to parse_formula then I can pass the dataframe as an argument. What do you say?

This comment has been minimized.

@zverok

zverok Jul 12, 2016

(Sorry for delay, was pretty busy)

I'm adopting this design but before I want to let you know one thing. I also need to give Formula class a dataframe at the moment of parsing because I need to know whether a term is numerical or category while parsing.

Is it changing Formula behavior? How? Can't this behavior change be postponed until process_dataframe?
Or, another interesting question: could the same formula be applied to different dataframes (one having a as categorical and another having it numerical)? If it is, than "numericality" of variables should not be in Formula's state. If kind of variables is important on creation, than maybe something like this could possibly work:

f1 = Formula.new('y~a+b', a: :numeric, b: :categorical)
f2 = Formula.from_df('y~a+b', df) # checks whether there are corresponding vectors + their type

I am afraid I'm not too familiar with this case, so many of my ideas are coming from generic "common reason", I'll need more details to guess the most sane architecture.

@zverok

zverok Jul 12, 2016

(Sorry for delay, was pretty busy)

I'm adopting this design but before I want to let you know one thing. I also need to give Formula class a dataframe at the moment of parsing because I need to know whether a term is numerical or category while parsing.

Is it changing Formula behavior? How? Can't this behavior change be postponed until process_dataframe?
Or, another interesting question: could the same formula be applied to different dataframes (one having a as categorical and another having it numerical)? If it is, than "numericality" of variables should not be in Formula's state. If kind of variables is important on creation, than maybe something like this could possibly work:

f1 = Formula.new('y~a+b', a: :numeric, b: :categorical)
f2 = Formula.from_df('y~a+b', df) # checks whether there are corresponding vectors + their type

I am afraid I'm not too familiar with this case, so many of my ideas are coming from generic "common reason", I'll need more details to guess the most sane architecture.

This comment has been minimized.

@lokeshh

lokeshh Jul 12, 2016

@zverok Ok. Let me make it fully working first, then you can see all the details and then we can come up with a better design.

@lokeshh

lokeshh Jul 12, 2016

@zverok Ok. Let me make it fully working first, then you can see all the details and then we can come up with a better design.

This comment has been minimized.

@zverok

zverok Jul 12, 2016

👍 Pretty reasonable plan.

@zverok

zverok Jul 12, 2016

👍 Pretty reasonable plan.

This comment has been minimized.

@lokeshh

lokeshh Jul 30, 2016

The Formula Language implementation is complete. Could you please have a look now?

@lokeshh

lokeshh Jul 30, 2016

The Formula Language implementation is complete. Could you please have a look now?

Show outdated Hide outdated spec/formula_spec.rb
subject { formula.parse_formula }
it { is_expected.to be_a Array }
its(:first) { is_expected.to eq Statsample::GLM::Token }

This comment has been minimized.

@zverok

zverok Jul 7, 2016

BTW, if all elements are expected to be tokens (I'm not sure about it), you can write something like:

it { is_expected.to all be_a Statsample::GLM::Token }
@zverok

zverok Jul 7, 2016

BTW, if all elements are expected to be tokens (I'm not sure about it), you can write something like:

it { is_expected.to all be_a Statsample::GLM::Token }

This comment has been minimized.

@lokeshh

lokeshh Jul 7, 2016

Yes, all elements are expected to be tokens. Thanks

@lokeshh

lokeshh Jul 7, 2016

Yes, all elements are expected to be tokens. Thanks

Show outdated Hide outdated spec/formula_spec.rb
it { is_expected.to be_a Array }
its(:first) { is_expected.to eq Statsample::GLM::Token }
its(:to_basic_formula) { is_expected.to eq 'y ~ 1+a(-)+b(-)+a(-):b(-)' }
end

This comment has been minimized.

@zverok

zverok Jul 7, 2016

I understand it is a pretty early stage, but those contexts are looking suspiciously similar :) Maybe, some shared context will play? If I'm not missing something, it could even be simplified up to this level or something like this:

context 'both reoccur' do
  include_context 'formulae checker', 'y ~ a+b+a:b', 'y ~ 1+a(-)+b(-)+a(-):b(-)'
  # or even
  include_context 'formulae checker', 'y ~ a+b+a:b' => 'y ~ 1+a(-)+b(-)+a(-):b(-)'
end
@zverok

zverok Jul 7, 2016

I understand it is a pretty early stage, but those contexts are looking suspiciously similar :) Maybe, some shared context will play? If I'm not missing something, it could even be simplified up to this level or something like this:

context 'both reoccur' do
  include_context 'formulae checker', 'y ~ a+b+a:b', 'y ~ 1+a(-)+b(-)+a(-):b(-)'
  # or even
  include_context 'formulae checker', 'y ~ a+b+a:b' => 'y ~ 1+a(-)+b(-)+a(-):b(-)'
end

This comment has been minimized.

@lokeshh

lokeshh Jul 7, 2016

@zverok Yes they are all similar, just the formula value changes. Thanks for this!

@lokeshh

lokeshh Jul 7, 2016

@zverok Yes they are all similar, just the formula value changes. Thanks for this!

Show outdated Hide outdated spec/formula_checker.rb
@@ -0,0 +1,7 @@
RSpec.shared_context "formula checker", a => b do
let(:model) { described_class.new df, a, :logistic }

This comment has been minimized.

@lokeshh

lokeshh Jul 10, 2016

@zverok How do I make include_context work? I do not know how to accept arugments and couldn't find easily it the internet.

I'm using it via include_context "formula checker", 'y~a:e' => %w[a e_B e_C].sort but its saying a is not defined in formula_checker.rb.

@lokeshh

lokeshh Jul 10, 2016

@zverok How do I make include_context work? I do not know how to accept arugments and couldn't find easily it the internet.

I'm using it via include_context "formula checker", 'y~a:e' => %w[a e_B e_C].sort but its saying a is not defined in formula_checker.rb.

This comment has been minimized.

@zverok

zverok Jul 11, 2016

RSpec.shared_context "formula checker" do |params|
  let(:formula) { params.keys.first }
  let(:vectors) { params.values.first }

  let(:model) { described_class.new df, formula, :logistic }
  subject { model.df_for_regression }

  it { is_expected.to be_a Daru::DataFrame }
  its(:'vectors.to_a.sort') { is_expected.to eq vectors.sort } 
end
@zverok

zverok Jul 11, 2016

RSpec.shared_context "formula checker" do |params|
  let(:formula) { params.keys.first }
  let(:vectors) { params.values.first }

  let(:model) { described_class.new df, formula, :logistic }
  subject { model.df_for_regression }

  it { is_expected.to be_a Daru::DataFrame }
  its(:'vectors.to_a.sort') { is_expected.to eq vectors.sort } 
end
Show outdated Hide outdated lib/statsample-glm/glm/formula.rb
@@ -10,7 +10,7 @@ def initialize(formula)
@y, *@tokens = split_to_tokens(formula)
@y = @y.value
@tokens = @tokens.uniq.sort
add_contant_term_if_required
manage_contant_term

This comment has been minimized.

@v0dro

v0dro Jul 10, 2016

Member

There's a typo. It's constant.

@v0dro

v0dro Jul 10, 2016

Member

There's a typo. It's constant.

This comment has been minimized.

@lokeshh

lokeshh Jul 10, 2016

Thanks solved.

@lokeshh

lokeshh Jul 10, 2016

Thanks solved.

Show outdated Hide outdated spec/formula_spec.rb
@@ -54,5 +54,28 @@
it { is_expected.to eq '1+a(-)+a:b(-)+b:d(-)' }
end
context 'contant management' do

This comment has been minimized.

@v0dro

v0dro Jul 11, 2016

Member

typo. Constant.

@v0dro

v0dro Jul 11, 2016

Member

typo. Constant.

Show outdated Hide outdated lib/statsample-glm/glm/formula.rb
@tokens.unshift Token.new('1') unless
@tokens.include?(Token.new('1')) ||
@tokens.include?(Token.new('0'))
@tokens.delete Token.new('0')

This comment has been minimized.

@zverok

zverok Jul 11, 2016

I'd say that having Token() method could make such kind of code cleaner:

@tokens.unshift Token('1') unless @tokens.include?(Token('1')) || @tokens.include?(Token('0'))
@tokens.delete Token('0')

(This approach also allows some optimizations, like Token('1') always returning the same, constant, object, instead of creating tons of same objects on each and every check "if something is Token.new('1')")
WDYT?

@zverok

zverok Jul 11, 2016

I'd say that having Token() method could make such kind of code cleaner:

@tokens.unshift Token('1') unless @tokens.include?(Token('1')) || @tokens.include?(Token('0'))
@tokens.delete Token('0')

(This approach also allows some optimizations, like Token('1') always returning the same, constant, object, instead of creating tons of same objects on each and every check "if something is Token.new('1')")
WDYT?

This comment has been minimized.

@lokeshh

lokeshh Jul 13, 2016

I think its a great idea but not sure how to implement it. Should Token be a class method of Statsample::GLM?

@lokeshh

lokeshh Jul 13, 2016

I think its a great idea but not sure how to implement it. Should Token be a class method of Statsample::GLM?

This comment has been minimized.

@zverok

zverok Jul 13, 2016

Yes, I think you can try it this way.

@zverok

zverok Jul 13, 2016

Yes, I think you can try it this way.

This comment has been minimized.

@lokeshh

lokeshh Jul 13, 2016

Can we use this way to sort the tokens with Token.new '1' involved? Currently I do @tokens.sort to sort the tokens so that Token.new '1' appears in the front. I don't think it will be possible to do the same using Token method or would it?

@lokeshh

lokeshh Jul 13, 2016

Can we use this way to sort the tokens with Token.new '1' involved? Currently I do @tokens.sort to sort the tokens so that Token.new '1' appears in the front. I don't think it will be possible to do the same using Token method or would it?

This comment has been minimized.

@zverok

zverok Jul 14, 2016

I'm not sure I understand your question. Could you please elaborate/show the code?..

@zverok

zverok Jul 14, 2016

I'm not sure I understand your question. Could you please elaborate/show the code?..

This comment has been minimized.

@lokeshh

lokeshh Jul 19, 2016

Thanks @zverok

I implemented it in this manner:

      TOKEN_1 = Token.new('1')
      def self.Token val
        return TOKEN_1 if val == '1'
        Token.new(val)
      end

But it gives, this error:

/home/ubuntu/workspace/statsample-glm/lib/statsample-glm/glm/formula.rb:59:in `<class:FormulaWrapper>': uninitialized constant Statsample::GLM::FormulaWrapper::Token (NameError)

Then I changed the first line to:

      TOKEN_1 = Statsample::GLM::Token.new('1')

but now the error is:

/home/ubuntu/workspace/statsample-glm/lib/statsample-glm/glm/formula.rb:59:in `<class:FormulaWrapper>': uninitialized constant Statsample::GLM::Token (NameError)
@lokeshh

lokeshh Jul 19, 2016

Thanks @zverok

I implemented it in this manner:

      TOKEN_1 = Token.new('1')
      def self.Token val
        return TOKEN_1 if val == '1'
        Token.new(val)
      end

But it gives, this error:

/home/ubuntu/workspace/statsample-glm/lib/statsample-glm/glm/formula.rb:59:in `<class:FormulaWrapper>': uninitialized constant Statsample::GLM::FormulaWrapper::Token (NameError)

Then I changed the first line to:

      TOKEN_1 = Statsample::GLM::Token.new('1')

but now the error is:

/home/ubuntu/workspace/statsample-glm/lib/statsample-glm/glm/formula.rb:59:in `<class:FormulaWrapper>': uninitialized constant Statsample::GLM::Token (NameError)

This comment has been minimized.

@zverok

zverok Jul 19, 2016

As far as I can see, that's because Token class is defined in the same file where you are trying to define your constant, but it is defined after constant definition. So, when Ruby evaluates TOKEN_1 = Token.new('1'), it knows absolutely nothing about Token class existance. You can solve this by moving entire Token class above FormulaWrapper.

@zverok

zverok Jul 19, 2016

As far as I can see, that's because Token class is defined in the same file where you are trying to define your constant, but it is defined after constant definition. So, when Ruby evaluates TOKEN_1 = Token.new('1'), it knows absolutely nothing about Token class existance. You can solve this by moving entire Token class above FormulaWrapper.

This comment has been minimized.

@lokeshh

lokeshh Jul 20, 2016

I decided to split FormulaWrapper and Token classes into different files because FormulaWrapper is going to get big after including support for symbols '*' and '/'. So, is there a way to make it work in this scenario?

@lokeshh

lokeshh Jul 20, 2016

I decided to split FormulaWrapper and Token classes into different files because FormulaWrapper is going to get big after including support for symbols '*' and '/'. So, is there a way to make it work in this scenario?

This comment has been minimized.

@zverok

zverok Jul 21, 2016

Just require 'token' in formula_wrapper source file, and everything should be fine.

@zverok

zverok Jul 21, 2016

Just require 'token' in formula_wrapper source file, and everything should be fine.

This comment has been minimized.

@lokeshh

lokeshh Jul 21, 2016

Haha... Yeah!

@lokeshh

lokeshh Jul 21, 2016

Haha... Yeah!

@lokeshh

This comment has been minimized.

Show comment
Hide comment
@lokeshh

lokeshh Jul 21, 2016

One question: Is it worthwhile to document classes and methods that won't be directly used by the user.

For example should I document FormulaWrapper's public methods? Although they won't be used by users directly but can be beneficial to a developer.

lokeshh commented Jul 21, 2016

One question: Is it worthwhile to document classes and methods that won't be directly used by the user.

For example should I document FormulaWrapper's public methods? Although they won't be used by users directly but can be beneficial to a developer.

@v0dro

This comment has been minimized.

Show comment
Hide comment
@v0dro

v0dro Jul 21, 2016

Member

@lokeshh one or two lines of documentation and a very brief overview of the arguments and their expected types should be enough.

Member

v0dro commented Jul 21, 2016

@lokeshh one or two lines of documentation and a very brief overview of the arguments and their expected types should be enough.

@zverok

This comment has been minimized.

Show comment
Hide comment
@zverok

zverok Jul 22, 2016

For example should I document FormulaWrapper's public methods? Although they won't be used by users directly but can be beneficial to a developer.

I think that at least at first, you should be guided by "what documentation is generated for the gem" idea. So, all classes which will not be directly used by librarie's user should be marked @private and not go into generated docs. Whether to describe them for other developers is your choice -- I myself typically add some very brief docs for really non-trivial things.

zverok commented Jul 22, 2016

For example should I document FormulaWrapper's public methods? Although they won't be used by users directly but can be beneficial to a developer.

I think that at least at first, you should be guided by "what documentation is generated for the gem" idea. So, all classes which will not be directly used by librarie's user should be marked @private and not go into generated docs. Whether to describe them for other developers is your choice -- I myself typically add some very brief docs for really non-trivial things.

@lokeshh

This comment has been minimized.

Show comment
Hide comment
@lokeshh

lokeshh Jul 30, 2016

@zverok New Rubocop guidelines are quite strict. For example the max length allowed of a method is 10. Should I consider changing them or adhere to them?

lokeshh commented Jul 30, 2016

@zverok New Rubocop guidelines are quite strict. For example the max length allowed of a method is 10. Should I consider changing them or adhere to them?

@v0dro

This comment has been minimized.

Show comment
Hide comment
@v0dro

v0dro Jul 31, 2016

Member

@zverok New Rubocop guidelines are quite strict. For example the max length allowed of a method is 10. Should I consider changing them or adhere to them?

10 is ridiculously small. It cannot be applicable to a scientific library. Is it possible to ignore the method length constraint altogether? Please do so if yes.

Member

v0dro commented Jul 31, 2016

@zverok New Rubocop guidelines are quite strict. For example the max length allowed of a method is 10. Should I consider changing them or adhere to them?

10 is ridiculously small. It cannot be applicable to a scientific library. Is it possible to ignore the method length constraint altogether? Please do so if yes.

@zverok

This comment has been minimized.

Show comment
Hide comment
@zverok

zverok Jul 31, 2016

@zverok New Rubocop guidelines are quite strict. For example the max length allowed of a method is 10. Should I consider changing them or adhere to them?

Sorry, missed this one. Of course, we should stay sane at the first place, not just follow Rubocop's style blindly (as far as I know, even Rubocop's author/maintainer doesn't do this ;))

10-line methods are ok for some business logic, where everything is delegated down by stack, but of course it is too small for scientific libraries and complex algorithms (in fact, we COULD extract from DataFrame like ~30-50 small helper objects and end up with "readable code" like @vector_names_monitor.process(something, something), but I doubt it would be any good ;))

zverok commented Jul 31, 2016

@zverok New Rubocop guidelines are quite strict. For example the max length allowed of a method is 10. Should I consider changing them or adhere to them?

Sorry, missed this one. Of course, we should stay sane at the first place, not just follow Rubocop's style blindly (as far as I know, even Rubocop's author/maintainer doesn't do this ;))

10-line methods are ok for some business logic, where everything is delegated down by stack, but of course it is too small for scientific libraries and complex algorithms (in fact, we COULD extract from DataFrame like ~30-50 small helper objects and end up with "readable code" like @vector_names_monitor.process(something, something), but I doubt it would be any good ;))

@lokeshh

This comment has been minimized.

Show comment
Hide comment
@lokeshh

lokeshh Aug 2, 2016

I've updated the restrictions following the ones mentioned in Daru.

lokeshh commented Aug 2, 2016

I've updated the restrictions following the ones mentioned in Daru.

@v0dro

This comment has been minimized.

Show comment
Hide comment
@v0dro

v0dro Aug 3, 2016

Member

If this is done, then please resolve conflicts and remove the WIP.

Member

v0dro commented Aug 3, 2016

If this is done, then please resolve conflicts and remove the WIP.

@lokeshh

This comment has been minimized.

Show comment
Hide comment
@lokeshh

lokeshh Aug 3, 2016

This can't be merged unless a Daru with categorical data is released.

lokeshh commented Aug 3, 2016

This can't be merged unless a Daru with categorical data is released.

@v0dro

This comment has been minimized.

Show comment
Hide comment
@v0dro

v0dro Aug 4, 2016

Member

Alright. Let's finish off the missing data PR and then get back to this.

Member

v0dro commented Aug 4, 2016

Alright. Let's finish off the missing data PR and then get back to this.

@lokeshh

This comment has been minimized.

Show comment
Hide comment
@lokeshh

lokeshh Aug 19, 2016

Great! The build is passing. This can be merged.

lokeshh commented Aug 19, 2016

Great! The build is passing. This can be merged.

@v0dro v0dro changed the title from [WIP] Regression with categorical data and introducing formula language to Regression with categorical data and introducing formula language Aug 19, 2016

@v0dro v0dro merged commit 39e13d6 into SciRuby:master Aug 19, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment