Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

APEXMALHAR-2365-Creation of generic log parser #520

Merged
merged 1 commit into from
Dec 20, 2016

Conversation

jogshraddha
Copy link
Contributor

Jira : (https://issues.apache.org/jira/browse/APEXMALHAR-2365)
APEXMALHAR-2365-Creation of generic log parser

incomingString = new String(inputTuple, encoding);
} else {
incomingString = new String(inputTuple, CharEncoding.UTF_8);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little more compact:
{code}
String enc = encoding != null ? encoding : CharEncoding.UTF_8;
incomingString = new String(inputTuple, encoding);
{code}

Actually, if the encoding is not expected to change midstream, you can check the field and set
it in setup() -- then you can avoid the test for every tuple here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I'll move it in setup.

return;
}
logger.info("Input string {} ", incomingString);
logger.info("Parsing with log format {}", this.geLogFileFormat());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is not right

* @param tuple
* @param errorMsg
*/
public void admitError(String tuple, String errorMsg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "admitError" ? Should it be emitError ?

pattern.append(field.getRegex()).append(" ");
}
logger.info("Created pattern for parsing the log {}", pattern.toString().trim());
this.setPattern(pattern.toString().trim());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're trimming the final result, shouldn't we also trim each field inside the loop ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are creating a pattern with regex given in schema. Pattern is created by appending space to the regex of each field. ex. "^([0-9.]+) ([a-zA-Z0-9_-]{1,16}) (\d{3}) (\d+|-)"
So we are trimming this after the whole pattern is created because we don't need space at the end of the pattern but space in between two regex groups is needed. Hence we should not trim each field.

Pattern compile = Pattern.compile(this.pattern);
Matcher m = compile.matcher(log);
int count = m.groupCount();
int i = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"i" can be moved inside the if.

"field": "bytes",
"regex": "(\\d+|-)"
}]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing final newline

"field": "bytes",
"regex": "(\\d+|-)"
}]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing final newline

Copy link
Contributor

@tushargosavi tushargosavi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address above comments, Rest of the changes are fine

return;
}
logger.info("Input string {} ", incomingString);
logger.info("Parsing with log format {}", this.geLogFileFormat());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change log level to debug for this.

}

private static final Logger logger = LoggerFactory.getLogger(LogParser.class);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line at end of file

* KeyValPair<String,String><br>
* Key being the tuple and Val being the reason.
*/
public class LogParser extends Parser<byte[], KeyValPair<String, String>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is new operator add @InterfaceStability.Unstable until it becomes stable.

*/
public JSONObject createJsonFromLog(String log) throws JSONException
{
Pattern compile = Pattern.compile(this.pattern);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache compiled pattern to avoid compilation for every tuple.

}

private static final Logger logger = LoggerFactory.getLogger(LogSchemaDetails.class);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tushargosavi I have done all the changes suggested by you. Missing new line at the end is also done in this as well as previous commit.

@tushargosavi
Copy link
Contributor

@jogshraddha all looks good, just one last thing
squash all commits and include Apex Jira id the commit message see http://apex.apache.org/contributing.html for commands to squash the commits.

@jogshraddha
Copy link
Contributor Author

Travis build failing. I will close and reopen this PR

@jogshraddha jogshraddha reopened this Dec 19, 2016
@jogshraddha
Copy link
Contributor Author

@tushargosavi : I have squashed all commits and reopened PR since build was failing.

@asfgit asfgit merged commit 21aa3d6 into apache:master Dec 20, 2016
@devtagare
Copy link
Contributor

Hi @jogshraddha , @tushargosavi ,

This operator should be under org.apache.apex.malhar.contrib.parser.

Any reason otherwise ?

Thanks,
Dev

@jogshraddha
Copy link
Contributor Author

@devtagare : All the other parser like JsonParser CsvParser etc are there under com.datatorrent.contrib.parser hence I have added this LogParser there itself. Let me know if the proper way is to put it under org.apache.apex.malhar.contrib.parser then I will make those changes.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants