-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataFrame summary without ReportBuilder fixes #221 #288
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a look at my suggestions. Also, make the sure the build passes.
ReportBuilder.new(no_title: true).add(self).send(method) | ||
#uncomment this line to use ReportBuilder gem to generate data frame summary | ||
#ReportBuilder.new(no_title: true).add(self).send(method) | ||
#comment below line when using Reportbuilder gem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need for these comments or even then g_summary
method. ReportBuilder should go.
@@ -1419,6 +1422,29 @@ def report_building(b) # :nodoc: # | |||
end | |||
end | |||
end | |||
#custom function for dataframe summary | |||
def g_summary() | |||
frame_summary = "= \t\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df_summary
is better. Why the \t\n
at the end? Have you seen how exactly ReportBuilder prepares reports?
def g_summary() | ||
frame_summary = "= \t\n" | ||
frame_summary << "\tNumber of rows: #{nrows}\n" | ||
@raw = self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use self
directly? Using instance variables like this is not acceptable since it stays for absolutely no reason in the object and consumes resources. IVs should not be initialized unless you absolutely must.
|
||
frame_summary << "\t n : #{count}\n" | ||
frame_summary << "\t n valid : #{valid_count}\n" | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should go through the code of the report builder gem and see ALL the data that it outputs. It exists for more reasons than just telling n
and n_valid
.
The patch you submit must emulate that output as much as possible. Try running reportbuilder through different examples and see their respective outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do all the changes as you mentioned in the suggestions and will go through the reportbuilder gem code and try to emulate all of its features.
Can I use text-table gem to create table for distribution in vector. |
def summary(method=:to_text) | ||
ReportBuilder.new(no_title: true).add(self).send(method) | ||
summary_building() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of creating separate method, just leaving summary
empty caller for this method? Please move that method's body here directly.
frequencies.sort_by(&:to_s).each do |k,v| | ||
key = @index.include?(k) ? @index[k] : k | ||
t.row [key, v, ('%0.2f%%' % (v.quo(count_values(*Daru::MISSING_VALUES))*100))] | ||
def summary_building level=0 # rubocop:disable Metrics/AbcSize,Metrics/MethodLength |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rubocop's metrics exist for reason. You can't just disable them in new code, you need to think how you can pass the cops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it was already disabled in the previous code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, yes :) But daru was evolving quickly, and now we try to make the code quality better on each PR, without compromises!
def summary_building level=0 # rubocop:disable Metrics/AbcSize,Metrics/MethodLength | ||
|
||
v_summary = "" | ||
v_summary << " "*2*level+"n :#{size}\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This " "*2*level
repeating on each line is pretty ugly. Could it be moved to variable?..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay i will move that to variable.
v_summary = "" | ||
v_summary << " "*2*level+"n :#{size}\n" | ||
valid_count = 0 | ||
self.to_a.each do |val_n| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.
is unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...and we never use each
to count something, use Array#count
.
...and, finally, I believe vector already has method "number of valid values"
v_summary << " "*2*level+"n :#{size}\n" | ||
valid_count = 0 | ||
self.to_a.each do |val_n| | ||
if (val_n!=nil && !(val_n.is_a?(Float) && val_n.nan?)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
()
around condition is unnecessary- we never do
!= nil
in Ruby; you can doif !val.nil?
, or, in the most cases,if val && ....
- ...or even hide it in helper method
if valid_value?(val)
|
||
s.text "median: #{median}" if @type==:numeric || @type==:numeric | ||
end | ||
if @type==:numeric || @type==:numeric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
elsif
- why the
||
of same condition?
end | ||
if @type==:numeric || @type==:numeric | ||
v_summary << " "*2*level+"median: #{median}\n" | ||
end | ||
if @type==:numeric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the if
of same condition as above?
Well, looking closer, I see most of my notes are related to pre-existing code... But still, the library becomes mature now, and so we are struggling for better code quality with each commit. Report building was one of the parts sitting here since ancient times, and the idea of rewriting it is not only drop ReportBuilder, but also make the code shiny and readable. |
I will try to implement as much as good as I can make.I will try to improve on my next commits. And I will make a new pull request with new branch when changes are final so that these confusing commits doesn't show up on commit history. |
ohh okay i will make the changes. Are vector_summary and vector_indent better ? |
Well, we are already inside |
s.text "factors: #{factors.to_a.join(',')}" | ||
s.text "mode: #{mode}" | ||
|
||
s.table(name: 'Distribution') do |t| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To implement table can i use text-table gem ,any suggestion for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the idea of "drop reportbuilder" is to make the list of dependencies smaller. Will text-table make the code significantly simpler? And if so, is it the better "table formatting" gem selection? It was not updated for two years.
Closing this. Send a new PR with fixes. |
#221 fixes