New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add importer for log files #75

Merged
merged 12 commits into from May 30, 2018

Conversation

Projects
None yet
3 participants
@rohitner
Contributor

rohitner commented May 10, 2018

This PR integrates the gem request-log-analyzer to convert rails logs to Daru::DataFrame.

@zverok

Adding to what said in comments, changes without tests are unverifiable :)

@@ -35,4 +35,6 @@ Gem::Specification.new do |spec|
spec.add_development_dependency 'yard'
spec.add_development_dependency 'guard-rspec' if RUBY_VERSION >= '2.2.5'
spec.add_runtime_dependency 'request-log-analyzer', '~> 1.13.4'

This comment has been minimized.

@zverok

zverok May 10, 2018

Collaborator

Note that no other file format gem is required unconditionally: daru-io relies on user installing the proper gems. Otherwise, anybody who wants daru-io just for CSV import, will install TON of other gems for pdf, excel and whatnot. For example, look here: https://github.com/SciRuby/daru-io/blob/master/lib/daru/io/importers/excelx.rb#L12 — we have optional_gem construct for requiring something ONLY when it is used.

module IO
module Importers
class Rails < Base
Daru::DataFrame.register_io_module :read_rails, self

This comment has been minimized.

@zverok

zverok May 10, 2018

Collaborator

read_rails_log is better name, read_rails is too vague.

def initialize
require 'request_log_analyzer'
RequestLogAnalyzer::Source::LogParser.class_eval do
def initialize(format, options = {})

This comment has been minimized.

@zverok

zverok May 10, 2018

Collaborator

Better extract it all into importers/rails_logs/request_log_analyzer_patch.rb, create there module RailsLogAnalyzerPatch, and here just include that into RailsLogAnalyzer.

This comment has been minimized.

@rohitner

rohitner May 10, 2018

Contributor

The current initialize,parse and read methods are going to be the same for all the importers that will be added. Shall I instead create module RequestLogAnalyzerPatch in importers/request_log_analyzer.rb? Then I will pass :rails3, :apache and :amazon_s3 to the module in rails.rb, apache.rb and amazon_s3.rb.

end
end
end
end

This comment has been minimized.

@zverok

zverok May 10, 2018

Collaborator

In fact, here you are copy-pasting and slightly changing code from other gem. It is totally unreliable: if the gem changes a slightest bit, you'll never follow this with your code. Why do you need this rewriting?

This comment has been minimized.

@rohitner

rohitner May 10, 2018

Contributor

I will try to come up with a single method instead of changing the code.

@rohitner rohitner force-pushed the rohitner:rails_logs branch from 0b16aff to ef50d50 May 11, 2018

@rohitner

This comment has been minimized.

Contributor

rohitner commented May 12, 2018

Please review the changes so that I can start the documentation and fix rubocop tests.

@zverok

Also, please don't ignore Rubocop's suggestions.

@@ -56,6 +56,47 @@ def self.from(relation)
def self.read(path)
new.read(path)
end
def self.parse_log(file,format)

This comment has been minimized.

@zverok

zverok May 12, 2018

Collaborator

Why do you think it is appropriate to add method parse_log, related to a very specific task, to Base importer?.. All code related to some importer should be totally local to this importer: either in importers/rails_log.rb, or in importers/rails_log/request_log_analyzer_patch.rb (the latter is preferable, as well as extracting patch to a module, instead of doing class_eval)

This comment has been minimized.

@rohitner

rohitner May 12, 2018

Contributor

parse_log is a helper method as apache and amazon_s3 importers will also use it. Making this part local will later require adding the same code for the remaining two importers. I feel the method is not that specific to be ineligible for the base class.

This comment has been minimized.

@zverok

zverok May 12, 2018

Collaborator

I feel the method is not that specific to be ineligible for the base class.

In fact, it is. The base class provides the common base all importers need, not "some 3 of 20 importers may need that method, so we'll just drop it here". It requires specific external gem. If several different importers will need it, the module with this "patch" could be stored in, like importers/shared/request_log_analyzer_patch.rb, and required/included where it is necessary.

def call()
ind = [:method, :path, :ip, :timestamp, :line_type, :lineno, :source,
:controller, :action, :format, :params, :rendered_file,
:partial_duration, :status, :duration, :view, :db]

This comment has been minimized.

@zverok

zverok May 12, 2018

Collaborator

That definitely should be a constant.

row = []
ind.each do |attr|
row << hash[attr]
end

This comment has been minimized.

@zverok

zverok May 12, 2018

Collaborator

That should be map, at least.
But if I am reading it correctly, Hash#values_at should be totally enough.

# initializes the instance variables @path and @file_data to nil
def initialize
@path = nil
@file_data = nil

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

Why do you need this vars here?

include RequestLogAnalyzerPatch
Daru::DataFrame.register_io_module :read_rails_log, self
# initializes the instance variables @path and @file_data to nil

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

Comments should not just retell the code, code should speak for itself.
For such simple methods, comments aren't necessary at all.

# @example Reading from plaintext file
# instance = Daru::IO::Importers::RailsLog.read("rails_test.log")
def read(path)
@path = path

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

Why?

# instance = Daru::IO::Importers::RailsLog.read("rails_test.log")
def read(path)
@path = path
@file_data = RequestLogAnalyzerPatch.parse_log(@path,:rails3)

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

Why :rails3, couldn't it be another log format?..

This comment has been minimized.

@athityakumar

athityakumar May 15, 2018

Member

Having the log format as another argument (with maybe :rails3 as default) would be a more generic way to do this.

@@ -0,0 +1,73 @@
module RequestLogAnalyzerPatch
require 'request_log_analyzer'

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

This file now is required always when daru-io is required. It should be made optional_gem in importer's initialize, like other importers do.

@@ -0,0 +1,73 @@
module RequestLogAnalyzerPatch

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

When I proposed this module name, I've meant the following: You define ONLY parse_hash in that module, and then, in importer (not here in separate method) you do the following:

parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format))
parser.extend(RequestLogAnalyzerPatch) # only this instance has method replaced
parser.parse_hash(path)

This way it is easy to see what you "inserting" into the third-party gem, and keep it local.

# @example Reading from rails log file
# format = RequestLogAnalyzer::FileFormat.load(:rails3)
# instance = RequestLogAnalyzer::Source::LogParser.new(format)
# list = instance.parse_hash('test_rails.log')

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

This is internal to implementation, so does NOT need any comment about "how it can be used".
Instead, the comment that is absolutely necessary here is: explain that this method only copy-pasted (from where exactly; from which version exactly) and this and that lines changed (with "how it was, why it was changed" in comments). This way, it would be possible to follow request_log_analyzer's development, if it would change the method significantly.

@@ -68,23 +68,23 @@ def call(query)
private
def attempt_sqlite3_connection(db)

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

db is totally good name. Please add this to Naming/UncommunicativeMethodParamName: cop's AllowedNames: settings.

In general, if Rubocop is starting to fail on code unrelated to your PR, it is better to notify maintainer (@athityakumar) than judge yourself.

order: %i[method path ip timestamp line_type lineno source
controller action format params rendered_file
partial_duration status duration view db],
:'timestamp.to_a' => [20180312174118], # rubocop:disable Style/NumericLiterals

This comment has been minimized.

@zverok

zverok May 15, 2018

Collaborator

Don't disable here, just do what Rubocop suggests.

@file_data.each do |hash|
data.add_row(INDEX.map { |attr| hash[attr] })
end
data

This comment has been minimized.

@athityakumar

athityakumar May 15, 2018

Member

I think df is more readable than data, and has either been consistently named so in other Importers or rather just returned a Daru::DataFrame directly.

# # 1 GET / 127.0.0.1 2018022716 completed 12 /home/roh Rails...
# # ... ... ... ... ... ... ... ... ...
def call
data = Daru::DataFrame.new({},index: INDEX).transpose

This comment has been minimized.

@athityakumar

athityakumar May 15, 2018

Member

Um, INDEX is actually supposed to be the order of the DataFrame and has been weirdly named so just due to transpose operation? 🤔

data = Daru::DataFrame.new({},index: INDEX).transpose
@file_data.each do |hash|
data.add_row(INDEX.map { |attr| hash[attr] })
end

This comment has been minimized.

@athityakumar

athityakumar May 15, 2018

Member

Can't the whole transpose + add_row logic be replaced by Daru::DataFrame.rows(array_of_arrays, order: ORDER)?

This comment has been minimized.

@rohitner

rohitner May 16, 2018

Contributor

It can be, but then the array_of_arrays will need further processing as all vectors don't have the same length due to missing attributes in some of them.

@rohitner

This comment has been minimized.

Contributor

rohitner commented May 16, 2018

@zverok @athityakumar please review and suggest changes for rubocop failure.

@rohitner

This comment has been minimized.

Contributor

rohitner commented May 17, 2018

How about adding importers of all three formats in a single file? It would eliminate the additional separate module for the patch. The log.rb file will look like this

module Importers
  class Log < Base
    Daru::Dataframe.register_io_module :read_log, self
    ORDER_RAILS  = %i[...].freeze
    ORDER_APACHE = %i[...].freeze
    ORDER_S3     = %i[...].freeze
    ...
    def read(path, format: :rails3) # default format is :rails3
      case format
      when :rails3    then order = ORDER_RAILS
      when :apache    then order = ORDER_APACHE
      when :amazon_s3 then order = ORDER_S3
      else raise ArgumentError, 'Invalid format provided.'
      ...
      @file_data = parser.parse_hash(path, order)
      self
    end
    ...
    private
    module RequestLogAnalyzerPatch
    ...
      def parse_hash(file, order)
      ...
          parsed_list[i] = order.map ...
      ...
      end
    end
  end
end

In the above method, I will have need to modify the line containing register_io_module to take one more argument specifying the format. WDYT?

@zverok

This comment has been minimized.

Collaborator

zverok commented May 17, 2018

How about adding importers of all three formats in a single file? It would eliminate the additional separate module for the patch. The log.rb file will look like this

That may work. Just the code can be simpler:

ORDERS = {
  rails: %i[...].freeze,
  apache: %i[...].freeze,
  s3: %i[...].freeze
}
# and then...
order = ORDERS.fetch(format)
Daru::DataFrame.register_io_module :read_rails_log, self
# Checks for required gem dependencies of RailsLog importer
# and requires the patch for request-log-analyzer gem

This comment has been minimized.

@zverok

zverok May 17, 2018

Collaborator

Just remove the comment. Please don't repeat what method code says clearly (even if other importers do that :)).

# Reads data from a rails log file
#
# @!method self.read(path)

This comment has been minimized.

@zverok

zverok May 17, 2018

Collaborator

Please document format parameter too.

# instance = Daru::IO::Importers::RailsLog.read("rails_test.log")
def read(path, format: :rails3)
parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format))
parser.extend(RequestLogAnalyzerPatch)

This comment has been minimized.

@zverok

zverok May 17, 2018

Collaborator

This means, that RequestLogAnalyzerPatch should look like this:

module RequestLogAnalyzerPatch
  # look, no other nested modules!
  def parse_hash(file)
  end
end

This way, when you do this extend, parse_hash becomes redefined.

# parse_io and parse_line of the LogParser class. Assigns all the necessary instance
# variables defined in the above specified methods. Creates a request for each line of
# the file stream and stores the hash of parsed information in raw_list. Each element of
# parsed_list is an array of one parsed entry in log file.

This comment has been minimized.

@zverok

zverok May 17, 2018

Collaborator

Ugh, now, I see a problem here (I thought that LogParser already has method parse_hash, you just fixing something in it):

combine the methods parse_file, parse_io and parse_line of the LogParser class

Why???? What's the point of combining three methods in one 40-lines mess?.. Why can't you use original methods as they were, just calling them?

This comment has been minimized.

@rohitner

rohitner May 17, 2018

Contributor

parse_file returns nil and so does parse_io. That's why I have tried fixing these methods in the earlier commits instead of making a new single method. But then it was a lot of rewriting. It was a hard time finding points to tap out the parsed hash from the gem as it is only meant for CLI.

This comment has been minimized.

@zverok

zverok May 17, 2018

Collaborator

Ugh. You should have spend a bit more time on understanding LogParser logic. From reading the code, I believe what you need is (without ANY patching) just call LogParser#each_request:

Reads the input, which can either be a file, sequence of files or STDIN to parse lines specified in the FileFormat. This lines will be combined into Request instances, that will be yielded.

...and, as it is aliased as each, and Enumerable is included, ALL you really need is:

RequestLogAnalyzer::Source::LogParser
  .new(RequestLogAnalyzer::FileFormat.load(format))
  .map { converting Request into line for dataframe }

...and that would be completely it.

@rohitner rohitner force-pushed the rohitner:rails_logs branch from 1b0e783 to ecc7f41 May 17, 2018

@rohitner rohitner changed the title from add importer for rails log to add importer for log files May 19, 2018

@rohitner rohitner force-pushed the rohitner:rails_logs branch from 7c1b62d to b1d77bc May 24, 2018

@rohitner

This comment has been minimized.

Contributor

rohitner commented May 27, 2018

@zverok please review

@zverok

zverok approved these changes May 28, 2018

👍

# # ... ... ... ... ... ... ... ... ...
def call
Daru::DataFrame.rows(@file_data, order: ORDERS.fetch(@format)
.map { |attr| attr == :path ? :resource_path : attr })

This comment has been minimized.

@zverok

zverok May 28, 2018

Collaborator

I'd say RENAME_FIELDS.fetch(attr, attr) with

RENAME_FIELDS = {
  path: :resource_path
}

It would be more regular (even if for now used only for one field).

@zverok

This comment has been minimized.

Collaborator

zverok commented May 28, 2018

(Just turn off the Lint/SplatKeywordArguments, it is honestly meaningless and already removed in the master).

@zverok zverok merged commit dce4be1 into SciRuby:master May 30, 2018

1 check failed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
@zverok

This comment has been minimized.

Collaborator

zverok commented May 30, 2018

Merged! Congrats 🎉

@rohitner rohitner referenced this pull request Jun 6, 2018

Open

Fix rubocop offences #74

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment