add importer for log files #75

Merged
merged 12 commits into from
May 30, 2018
Changes from 9 commits
21 changes: 18 additions & 3 deletions .rubocop.yml
@@ -52,9 +52,6 @@ Style/EmptyElse:
 Style/EmptyMethod:
   EnforcedStyle: compact

-Style/FileName:
-  Enabled: false
-
 Style/FormatString:
   EnforcedStyle: percent

@@ -110,3 +107,21 @@ RSpec/NestedGroups:

 RSpec/ContextWording:
   Enabled: false
+
+### Security -----------------------------------------------------------
+
+Security/Open:
+  Enabled: false
+
+### Naming -------------------------------------------------------------
+
+Naming/FileName:
+  Enabled: false
+
+Naming/MemoizedInstanceVariableName:
+  Exclude:
+    - 'lib/daru/io/exporters/excel.rb'
+
+Naming/UncommunicativeMethodParamName:
+  AllowedNames:
+    - 'db'
1 change: 1 addition & 0 deletions Gemfile
@@ -10,6 +10,7 @@ group :optional do
   gem 'mongo'
   gem 'nokogiri'
   gem 'redis'
+  gem 'request-log-analyzer', '~> 1.13.4'
   gem 'roo', '~> 2.7.0'
   gem 'rsruby'
   gem 'snappy'
2 changes: 1 addition & 1 deletion daru-io.gemspec
@@ -1,4 +1,4 @@
-lib = File.expand_path('../lib', __FILE__)
+lib = File.expand_path('lib', __dir__)
 $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
 require 'daru/io/version'

10 changes: 5 additions & 5 deletions lib/daru/io/importers/plaintext.rb
@@ -81,14 +81,14 @@ def process_row(row,empty)
end
end

-def try_string_to_number(s)
-  case s
+def try_string_to_number(str)
+  case str
   when INT_PATTERN
-    s.to_i
+    str.to_i
   when FLOAT_PATTERN
-    s.tr(',', '.').to_f
+    str.tr(',', '.').to_f
   else
-    s
+    str
   end
 end
end
59 changes: 59 additions & 0 deletions lib/daru/io/importers/rails_log.rb
@@ -0,0 +1,59 @@
require 'daru/io/importers/base'

module Daru
module IO
module Importers
# RailsLog Importer Class, that extends `read_rails_log` methods
# to `Daru::DataFrame`
class RailsLog < Base
Daru::DataFrame.register_io_module :read_rails_log, self

# Checks for required gem dependencies of RailsLog importer
# and requires the patch for request-log-analyzer gem
Collaborator

Just remove the comment. Please don't repeat what the method code already says clearly (even if other importers do that :)).

def initialize
optional_gem 'request-log-analyzer', '~> 1.13.4', requires: 'request_log_analyzer'
require 'daru/io/importers/shared/request_log_analyzer_patch'
end

# Reads data from a rails log file
#
# @!method self.read(path)
Collaborator

Please document the format parameter too.

#
# @param path [String] Path to rails log file, where the dataframe is to be
# imported from.
#
# @return [Daru::IO::Importers::RailsLog]
#
# @example Reading from a rails log file
# instance = Daru::IO::Importers::RailsLog.read("rails_test.log")
def read(path, format: :rails3)
parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format))
parser.extend(RequestLogAnalyzerPatch)
Collaborator

This means that RequestLogAnalyzerPatch should look like this:

module RequestLogAnalyzerPatch
  # look, no other nested modules!
  def parse_hash(file)
  end
end

This way, when you do this extend, parse_hash becomes redefined.
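
As a small illustration of the mechanism described in this comment, the sketch below uses stand-in names (`ThirdPartyParser` and `ParserPatch` are hypothetical and not part of request-log-analyzer). It shows that `Object#extend` mixes a module into a single object, so a method defined in the module takes precedence over the class's own definition for that object only.

```ruby
# Stand-in for a third-party class that we do not want to reopen globally.
class ThirdPartyParser
  def parse_hash(_file)
    :original
  end
end

# Hypothetical patch module: it defines only the method being replaced.
module ParserPatch
  def parse_hash(file)
    "patched: #{file}"
  end
end

patched = ThirdPartyParser.new
patched.extend(ParserPatch) # only this instance gets the replacement

untouched = ThirdPartyParser.new

puts patched.parse_hash('rails.log')     # uses ParserPatch#parse_hash
puts untouched.parse_hash('rails.log')   # still uses the class's own method
```

Because the module lands in the object's singleton class, other instances of the class, including any created inside the third-party gem, keep their original behaviour.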

@file_data = parser.parse_hash(path)
self
end

# header of the parsed information
ORDER = %i[method path ip timestamp line_type lineno source
controller action format params rendered_file
partial_duration status duration view db].freeze

# Imports a `Daru::DataFrame` from a RailsLog Importer instance and rails log file
#
# @return [Daru::DataFrame]
#
# @example Reading from a rails log file
# df = instance.call
#
# => #<Daru::DataFrame(150x17)>
# # method path ip timestamp line_type lineno source contr...
# # 0 GET / 127.0.0.1 2018022607 completed 5 /home/roh Rails...
# # 1 GET / 127.0.0.1 2018022716 completed 12 /home/roh Rails...
# # ... ... ... ... ... ... ... ... ...
def call
Daru::DataFrame.rows(@file_data, order: ORDER)
end
end
end
end
end
48 changes: 48 additions & 0 deletions lib/daru/io/importers/shared/request_log_analyzer_patch.rb
@@ -0,0 +1,48 @@
module RequestLogAnalyzerPatch
Collaborator

When I proposed this module name, I meant the following: you define ONLY parse_hash in that module, and then, in the importer (not here in a separate method), you do the following:

parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format))
parser.extend(RequestLogAnalyzerPatch) # only this instance has method replaced
parser.parse_hash(path)

This way it is easy to see what you are "inserting" into the third-party gem, and to keep it local.

module RequestLogAnalyzer::Source # rubocop:disable Style/ClassAndModuleChildren
# LogParser class, that reads log data from a given source and uses a file format
# definition to parse all relevant information about requests from the file
class LogParser
# Patch for the gem request-log-analyzer version 1.13.4 to combine the methods parse_file,
# parse_io and parse_line of the LogParser class. Assigns all the necessary instance
# variables defined in the above specified methods. Creates a request for each line of
# the file stream and stores the hash of parsed information in raw_list. Each element of
# parsed_list is an array of one parsed entry in log file.
Collaborator

Ugh, now I see a problem here (I thought that LogParser already had a parse_hash method and you were just fixing something in it):

combine the methods parse_file, parse_io and parse_line of the LogParser class

Why???? What's the point of combining three methods into one 40-line mess? Why can't you use the original methods as they were and just call them?

Contributor Author

@rohitner rohitner May 17, 2018

parse_file returns nil, and so does parse_io. That's why I tried fixing these methods in the earlier commits instead of writing a single new method, but that required a lot of rewriting. It was hard to find points where the parsed hash could be tapped out of the gem, as it is only meant for CLI use.

Collaborator

Ugh. You should have spent a bit more time understanding the LogParser logic. From reading the code, I believe all you need (without ANY patching) is to call LogParser#each_request:

Reads the input, which can either be a file, sequence of files or STDIN to parse lines specified in the FileFormat. This lines will be combined into Request instances, that will be yielded.

...and, as it is aliased as each, and Enumerable is included, ALL you really need is:

RequestLogAnalyzer::Source::LogParser
  .new(RequestLogAnalyzer::FileFormat.load(format))
  .map { converting Request into line for dataframe }

...and that would be completely it.
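
A minimal sketch of the pattern suggested here, under stated assumptions: `FakeLogParser`, `FakeRequest`, `rows_from` and `SAMPLE_ORDER` are hypothetical stand-ins that only mimic the shape described in the comment (an `each_request` aliased as `each`, plus `Enumerable`). They are not the gem's classes, and the real request objects may expose their data differently.

```ruby
# Stand-in for a parsed request: it only carries a hash of attributes.
class FakeRequest
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

# Stand-in for a parser that yields requests and includes Enumerable.
class FakeLogParser
  include Enumerable

  def initialize(requests)
    @requests = requests
  end

  def each_request
    return enum_for(:each_request) unless block_given?

    @requests.each { |request| yield request }
  end
  alias each each_request
end

# Hypothetical column order, shortened for the example.
SAMPLE_ORDER = %i[method path status duration].freeze

# Turn every yielded request into one row; missing attributes become nil.
def rows_from(parser, order)
  parser.map { |request| request.attributes.values_at(*order) }
end

parser = FakeLogParser.new(
  [
    FakeRequest.new({ method: 'GET', path: '/articles/9', status: 200, duration: 0.097 }),
    FakeRequest.new({ method: 'GET', path: '/' })
  ]
)
p rows_from(parser, SAMPLE_ORDER)
```

With `each` defined and `Enumerable` included, `map` comes for free, which is why no patching of the parser would be needed for the conversion step itself.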

def parse_hash(file) # rubocop:disable Metrics/AbcSize, Metrics/MethodLength
log_file = File.open(file, 'rb')
@max_line_length = max_line_length
@line_divider = line_divider
@current_lineno = 0
@file_format = file_format
@current_source = File.expand_path(file)
raw_list = []
while (line = log_file.gets(@line_divider, @max_line_length))
@current_lineno += 1
unless (request_data = @file_format.parse_line(line) { |wt, message| warn(wt, message) })
next
end
request_data = request_data.merge(source: @current_source, lineno: @current_lineno)
@parsed_lines += 1
update_current_request(request_data)
raw_hash = @file_format.request(request_data).attributes
raw_list << raw_hash unless raw_hash.nil?
end
parsed_list = []
raw_list.each do |hash|
parsed_list << hash if hash.key? :method
end
(0...parsed_list.size).each do |i|
j = raw_list.index(parsed_list[i])
k = raw_list.index(parsed_list[i+1])
k = k.nil? ? raw_list.size : k
(j...k).each do |l|
parsed_list[i].merge!(raw_list[l])
end
parsed_list[i] = Daru::IO::Importers::RailsLog::ORDER
.map { |attr| parsed_list[i].include?(attr) ? parsed_list[i][attr] : nil }
end
parsed_list
end
end
end
end
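
The grouping step at the end of `parse_hash` above (every parsed line is merged into the most recent line that carries a `:method` key, then mapped onto the column order) can be sketched more compactly with `Enumerable#slice_before`. This is only an illustration: `group_into_rows` and `SAMPLE_COLUMNS` are hypothetical names, and the input hashes are made-up examples rather than output captured from the gem.

```ruby
# Hypothetical, shortened column order for the example.
SAMPLE_COLUMNS = %i[method path controller status duration].freeze

# Group per-line hashes into one row per request.
def group_into_rows(line_hashes, columns)
  line_hashes
    .slice_before { |hash| hash.key?(:method) }    # a new chunk starts at each request line
    .select { |chunk| chunk.first.key?(:method) }  # drop lines seen before the first request
    .map { |chunk| chunk.reduce({}, :merge) }      # later lines override earlier keys
    .map { |merged| merged.values_at(*columns) }   # missing columns become nil
end

lines = [
  { line_type: :completed, status: 500 }, # orphan line before any request
  { method: 'GET', path: '/articles/9', line_type: :started },
  { controller: 'ArticlesController', line_type: :processing },
  { status: 200, duration: 0.097, line_type: :completed },
  { method: 'GET', path: '/', line_type: :started }
]
p group_into_rows(lines, SAMPLE_COLUMNS)
```

Compared with looking up positions through `raw_list.index`, this form walks the list once and does not depend on hash equality to find where a request begins.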
1 change: 1 addition & 0 deletions lib/daru/io/link.rb
@@ -18,6 +18,7 @@ class << self
 # | `Daru::DataFrame.read_json` | {Daru::IO::Importers::JSON#read} |
 # | `Daru::DataFrame.from_mongo` | {Daru::IO::Importers::Mongo#from} |
 # | `Daru::DataFrame.read_plaintext` | {Daru::IO::Importers::Plaintext#read} |
+# | `Daru::DataFrame.read_rails_log` | {Daru::IO::Importers::RailsLog#read} |
 # | `Daru::DataFrame.read_rdata` | {Daru::IO::Importers::RData#read} |
 # | `Daru::DataFrame.read_rds` | {Daru::IO::Importers::RDS#read} |
 # | `Daru::DataFrame.from_redis` | {Daru::IO::Importers::Redis#from} |
16 changes: 16 additions & 0 deletions spec/daru/io/importers/rails_log_spec.rb
@@ -0,0 +1,16 @@
RSpec.describe Daru::IO::Importers::RailsLog do
subject { described_class.read(path).call }

context 'parsing rails log' do
let(:path) { 'spec/fixtures/rails_log/rails.log' }

it_behaves_like 'exact daru dataframe',
ncols: 17,
nrows: 1,
order: %i[method path ip timestamp line_type lineno source
controller action format params rendered_file
partial_duration status duration view db],
:'timestamp.to_a' => [20_180_312_174_118],
:'duration.to_a' => [0.097]
end
end
7 changes: 7 additions & 0 deletions spec/fixtures/rails_log/rails.log
@@ -0,0 +1,7 @@
Started GET "/articles/9" for 127.0.0.1 at 2018-03-12 17:41:18 +0530
Processing by ArticlesController#show as HTML
Parameters: {"id"=>"9"}
Article Load (1.4ms) SELECT "articles".* FROM "articles" WHERE "articles"."id" = ? LIMIT ? [["id", 9], ["LIMIT", 1]]
Rendering articles/show.html.erb within layouts/application
Rendered articles/show.html.erb within layouts/application (2.9ms)
Completed 200 OK in 97ms (Views: 50.6ms | ActiveRecord: 1.4ms)
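
To make the link between this fixture and the spec expectations easier to follow (`timestamp` of `20_180_312_174_118` and `duration` of `0.097`), here is a rough sketch of how those two values relate to the `Started` and `Completed` lines. The helper names `timestamp_to_integer` and `duration_in_seconds` are hypothetical, and the sketch does not claim to reproduce how request-log-analyzer itself parses these lines.

```ruby
STARTED_LINE = 'Started GET "/articles/9" for 127.0.0.1 at 2018-03-12 17:41:18 +0530'.freeze
COMPLETED_LINE = 'Completed 200 OK in 97ms (Views: 50.6ms | ActiveRecord: 1.4ms)'.freeze

# Collapse "YYYY-MM-DD HH:MM:SS" into the integer YYYYMMDDHHMMSS.
def timestamp_to_integer(line)
  stamp = line[/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/]
  stamp && stamp.delete('^0-9').to_i
end

# Convert the total request time ("in 97ms") into seconds.
def duration_in_seconds(line)
  millis = line[/in (\d+(?:\.\d+)?)ms/, 1]
  millis && millis.to_f / 1000
end

p timestamp_to_integer(STARTED_LINE)
p duration_in_seconds(COMPLETED_LINE)
```

Both helpers return `nil` when the line does not contain the expected pattern, which mirrors how columns without data end up as `nil` in the resulting rows.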