Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrame summary without ReportBuilder fixes #221 #288

Closed
wants to merge 8 commits into from

Conversation

krishnak9
Copy link

@krishnak9 krishnak9 commented Dec 25, 2016

#221 fixes

Copy link
Member

@v0dro v0dro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at my suggestions. Also, make the sure the build passes.

ReportBuilder.new(no_title: true).add(self).send(method)
#uncomment this line to use ReportBuilder gem to generate data frame summary
#ReportBuilder.new(no_title: true).add(self).send(method)
#comment below line when using Reportbuilder gem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for these comments or even then g_summary method. ReportBuilder should go.

@@ -1419,6 +1422,29 @@ def report_building(b) # :nodoc: #
end
end
end
#custom function for dataframe summary
def g_summary()
frame_summary = "= \t\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df_summary is better. Why the \t\n at the end? Have you seen how exactly ReportBuilder prepares reports?

def g_summary()
frame_summary = "= \t\n"
frame_summary << "\tNumber of rows: #{nrows}\n"
@raw = self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use self directly? Using instance variables like this is not acceptable since it stays for absolutely no reason in the object and consumes resources. IVs should not be initialized unless you absolutely must.


frame_summary << "\t n : #{count}\n"
frame_summary << "\t n valid : #{valid_count}\n"
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should go through the code of the report builder gem and see ALL the data that it outputs. It exists for more reasons than just telling n and n_valid.

The patch you submit must emulate that output as much as possible. Try running reportbuilder through different examples and see their respective outputs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do all the changes as you mentioned in the suggestions and will go through the reportbuilder gem code and try to emulate all of its features.

@krishnak9
Copy link
Author

Can I use text-table gem to create table for distribution in vector.

def summary(method=:to_text)
ReportBuilder.new(no_title: true).add(self).send(method)
summary_building()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of creating separate method, just leaving summary empty caller for this method? Please move that method's body here directly.

frequencies.sort_by(&:to_s).each do |k,v|
key = @index.include?(k) ? @index[k] : k
t.row [key, v, ('%0.2f%%' % (v.quo(count_values(*Daru::MISSING_VALUES))*100))]
def summary_building level=0 # rubocop:disable Metrics/AbcSize,Metrics/MethodLength
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rubocop's metrics exist for reason. You can't just disable them in new code, you need to think how you can pass the cops.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it was already disabled in the previous code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes :) But daru was evolving quickly, and now we try to make the code quality better on each PR, without compromises!

def summary_building level=0 # rubocop:disable Metrics/AbcSize,Metrics/MethodLength

v_summary = ""
v_summary << " "*2*level+"n :#{size}\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This " "*2*level repeating on each line is pretty ugly. Could it be moved to variable?..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay i will move that to variable.

v_summary = ""
v_summary << " "*2*level+"n :#{size}\n"
valid_count = 0
self.to_a.each do |val_n|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self. is unnecessary

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and we never use each to count something, use Array#count.
...and, finally, I believe vector already has method "number of valid values"

v_summary << " "*2*level+"n :#{size}\n"
valid_count = 0
self.to_a.each do |val_n|
if (val_n!=nil && !(val_n.is_a?(Float) && val_n.nan?))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • () around condition is unnecessary
  • we never do != nil in Ruby; you can do if !val.nil?, or, in the most cases, if val && ....
  • ...or even hide it in helper method if valid_value?(val)


s.text "median: #{median}" if @type==:numeric || @type==:numeric
end
if @type==:numeric || @type==:numeric
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. elsif
  2. why the || of same condition?

end
if @type==:numeric || @type==:numeric
v_summary << " "*2*level+"median: #{median}\n"
end
if @type==:numeric
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the if of same condition as above?

@zverok
Copy link
Collaborator

zverok commented Jan 9, 2017

Well, looking closer, I see most of my notes are related to pre-existing code... But still, the library becomes mature now, and so we are struggling for better code quality with each commit. Report building was one of the parts sitting here since ancient times, and the idea of rewriting it is not only drop ReportBuilder, but also make the code shiny and readable.

@krishnak9
Copy link
Author

I will try to implement as much as good as I can make.I will try to improve on my next commits. And I will make a new pull request with new branch when changes are final so that these confusing commits doesn't show up on commit history.

@krishnak9
Copy link
Author

ohh okay i will make the changes. Are vector_summary and vector_indent better ?

@zverok
Copy link
Collaborator

zverok commented Jan 9, 2017

Well, we are already inside Vectors method, why we need any prefix at all?..

s.text "factors: #{factors.to_a.join(',')}"
s.text "mode: #{mode}"

s.table(name: 'Distribution') do |t|
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To implement table can i use text-table gem ,any suggestion for that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the idea of "drop reportbuilder" is to make the list of dependencies smaller. Will text-table make the code significantly simpler? And if so, is it the better "table formatting" gem selection? It was not updated for two years.

@v0dro
Copy link
Member

v0dro commented Jan 28, 2017

Closing this. Send a new PR with fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants