-
Notifications
You must be signed in to change notification settings - Fork 152
APEXMALHAR-2365-Creation of generic log parser #520
Conversation
incomingString = new String(inputTuple, encoding); | ||
} else { | ||
incomingString = new String(inputTuple, CharEncoding.UTF_8); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little more compact:
{code}
String enc = encoding != null ? encoding : CharEncoding.UTF_8;
incomingString = new String(inputTuple, encoding);
{code}
Actually, if the encoding is not expected to change midstream, you can check the field and set
it in setup() -- then you can avoid the test for every tuple here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I'll move it in setup.
return; | ||
} | ||
logger.info("Input string {} ", incomingString); | ||
logger.info("Parsing with log format {}", this.geLogFileFormat()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation is not right
* @param tuple | ||
* @param errorMsg | ||
*/ | ||
public void admitError(String tuple, String errorMsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "admitError" ? Should it be emitError ?
pattern.append(field.getRegex()).append(" "); | ||
} | ||
logger.info("Created pattern for parsing the log {}", pattern.toString().trim()); | ||
this.setPattern(pattern.toString().trim()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're trimming the final result, shouldn't we also trim each field inside the loop ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are creating a pattern with regex given in schema. Pattern is created by appending space to the regex of each field. ex. "^([0-9.]+) ([a-zA-Z0-9_-]{1,16}) (\d{3}) (\d+|-)"
So we are trimming this after the whole pattern is created because we don't need space at the end of the pattern but space in between two regex groups is needed. Hence we should not trim each field.
Pattern compile = Pattern.compile(this.pattern); | ||
Matcher m = compile.matcher(log); | ||
int count = m.groupCount(); | ||
int i = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"i" can be moved inside the if.
"field": "bytes", | ||
"regex": "(\\d+|-)" | ||
}] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing final newline
"field": "bytes", | ||
"regex": "(\\d+|-)" | ||
}] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing final newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address above comments, Rest of the changes are fine
return; | ||
} | ||
logger.info("Input string {} ", incomingString); | ||
logger.info("Parsing with log format {}", this.geLogFileFormat()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you change log level to debug for this.
} | ||
|
||
private static final Logger logger = LoggerFactory.getLogger(LogParser.class); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new line at end of file
* KeyValPair<String,String><br> | ||
* Key being the tuple and Val being the reason. | ||
*/ | ||
public class LogParser extends Parser<byte[], KeyValPair<String, String>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is new operator add @InterfaceStability.Unstable until it becomes stable.
*/ | ||
public JSONObject createJsonFromLog(String log) throws JSONException | ||
{ | ||
Pattern compile = Pattern.compile(this.pattern); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache compiled pattern to avoid compilation for every tuple.
} | ||
|
||
private static final Logger logger = LoggerFactory.getLogger(LogSchemaDetails.class); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tushargosavi I have done all the changes suggested by you. Missing new line at the end is also done in this as well as previous commit.
25aaba3
to
df337fe
Compare
@jogshraddha all looks good, just one last thing |
df337fe
to
21aa3d6
Compare
Travis build failing. I will close and reopen this PR |
@tushargosavi : I have squashed all commits and reopened PR since build was failing. |
Hi @jogshraddha , @tushargosavi , This operator should be under org.apache.apex.malhar.contrib.parser. Any reason otherwise ? Thanks, |
@devtagare : All the other parser like JsonParser CsvParser etc are there under com.datatorrent.contrib.parser hence I have added this LogParser there itself. Let me know if the proper way is to put it under org.apache.apex.malhar.contrib.parser then I will make those changes. |
Jira : (https://issues.apache.org/jira/browse/APEXMALHAR-2365)
APEXMALHAR-2365-Creation of generic log parser