Merge pull request #194 from avast/comment_at_end_of_condition
Add comment_behind and comment_before_token to Expression builder
metthal committed Mar 10, 2022
2 parents 71babc6 + 7102aba commit e17198e
Showing 8 changed files with 257 additions and 70 deletions.
6 changes: 4 additions & 2 deletions docs/rtd/creating_rulesets.rst
@@ -338,8 +338,8 @@ basic expressions and find the most suitable one.
* ``of(spec, set)`` - represents ``<spec> of <set>`` (``of(all(), them())``)
* ``of(spec, set, range)`` - represents ``<spec> of <set> in <range>`` (``of(all(), them(), range(intVal(100), intVal(200)))``)
* ``paren(expr, [newline])`` - represents parentheses around expressions and ``newline`` indicator for putting enclosed expression on its own line (``paren(intVal(10))``)
-* ``conjunction(terms, [newline])`` - represents conjunction of ``terms`` and optionally puts them on each separate line if ``newline`` is set (``conjunction({id("rule1"), id("rule2")})``)
-* ``disjunction(terms, [newline])`` - represents disjunction of ``terms`` and optionally puts them on each separate line if ``newline`` is set (``disjunction({id("rule1"), id("rule2")})``)
+* ``conjunction(terms, [newline])`` - represents conjunction of ``terms`` and, if ``newline`` is set, puts each term on its own line (``conjunction({id("rule1"), id("rule2")})``). ``terms`` is normally an array of expressions to be joined in the conjunction. Alternatively, ``terms`` can be an array of pairs, each holding a term and a comment; the comment is associated with the term and printed on the same line
+* ``disjunction(terms, [newline])`` - represents disjunction of ``terms`` and, if ``newline`` is set, puts each term on its own line (``disjunction({id("rule1"), id("rule2")})``). ``terms`` is normally an array of expressions to be joined in the disjunction. Alternatively, ``terms`` can be an array of pairs, each holding a term and a comment; the comment is associated with the term and printed on the same line

**Complex expression methods**

@@ -375,6 +375,8 @@ basic expressions and find the most suitable one.
* ``readUInt8(be)`` - represents call to special function ``uint8(be)`` (``intVal(100).readUInt8()``)
* ``readUInt16(be)`` - represents call to special function ``uint16(be)`` (``intVal(100).readUInt16()``)
* ``readUInt32(be)`` - represents call to special function ``uint32(be)`` (``intVal(100).readUInt32()``)
+* ``comment(message, [multiline], [indent], [linebreak])`` - adds the comment ``message`` to the expression; in the formatted text it appears before the expression. Only ``message`` is required; ``multiline`` (default ``false``), ``indent`` (default ``""``) and ``linebreak`` (default ``true``) are optional. ``multiline`` selects a ``/* ... */`` comment instead of a ``//`` one-line comment (a one-line ``message`` must not contain ``\n``), and ``linebreak`` controls whether a newline follows the comment
+* ``commentBehind(message, [multiline], [indent], [linebreak])`` - adds the comment ``message`` to the expression; in the formatted text it appears after the expression. The parameters and their defaults are the same as for ``comment``

**Hex strings**

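To make the commented-pairs form of `conjunction`/`disjunction` concrete, the sketch below approximates the layout it produces. It is a hypothetical, string-only model (`renderCommented` is not part of yaramod): every term except the last is followed by the operator, then by its `//` comment, and then a line break. The real spacing and indentation come from yaramod's autoformatter, so treat this only as an illustration of where the comments end up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper approximating how commented terms are laid out:
// every term except the last is followed by the operator, then the term's
// "//" comment (if any), then a line break.
std::string renderCommented(const std::vector<std::pair<std::string, std::string>>& terms,
                            const std::string& op)
{
    std::string out;
    for (std::size_t i = 0; i < terms.size(); ++i)
    {
        const bool last = (i + 1 == terms.size());
        out += terms[i].first;
        if (!last)
            out += " " + op;              // operator stays on the term's line
        if (!terms[i].second.empty())
            out += " // " + terms[i].second;
        if (!last)
            out += "\n";                  // next term starts a new line
    }
    return out;
}
```

For `{{"rule1", "first"}, {"rule2", "second"}}` joined with `and`, the sketch yields `rule1 and // first` on one line and `rule2 // second` on the next.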
6 changes: 5 additions & 1 deletion include/yaramod/builder/yara_expression_builder.h
@@ -8,6 +8,7 @@

#include <memory>
#include <string>
+#include <iostream>

#include "yaramod/types/expression.h"
#include "yaramod/types/token_stream.h"
@@ -166,7 +167,8 @@ class YaraExpressionBuilder
YaraExpressionBuilder& operator<<(const YaraExpressionBuilder& other);
YaraExpressionBuilder& operator>>(const YaraExpressionBuilder& other);

-YaraExpressionBuilder& comment(const std::string& message, bool multiline = false, const std::string& indent = {});
+YaraExpressionBuilder& comment(const std::string& message, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
+YaraExpressionBuilder& commentBehind(const std::string& message, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
YaraExpressionBuilder& call(const std::vector<YaraExpressionBuilder>& args);
/**
* Calls function from an expression
@@ -257,6 +259,8 @@ YaraExpressionBuilder range(const YaraExpressionBuilder& low, const YaraExpressi

YaraExpressionBuilder conjunction(const YaraExpressionBuilder& lhs, const YaraExpressionBuilder& rhs, bool linebreak = false);
YaraExpressionBuilder disjunction(const YaraExpressionBuilder& lhs, const YaraExpressionBuilder& rhs, bool linebreak = false);
+YaraExpressionBuilder conjunction(const YaraExpressionBuilder& lhs, const std::string& lhscomment, const YaraExpressionBuilder& rhs);
+YaraExpressionBuilder disjunction(const YaraExpressionBuilder& lhs, const std::string& lhscomment, const YaraExpressionBuilder& rhs);
YaraExpressionBuilder conjunction(const std::vector<YaraExpressionBuilder>& terms, bool linebreaks = false);
YaraExpressionBuilder disjunction(const std::vector<YaraExpressionBuilder>& terms, bool linebreaks = false);
YaraExpressionBuilder conjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& terms);
8 changes: 8 additions & 0 deletions include/yaramod/types/token_stream.h
@@ -87,6 +87,14 @@ class TokenStream
TokenIt erase(TokenIt element);
TokenIt erase(TokenIt first, TokenIt last);

+
+// Puts comment to the front.
+void comment(const std::string& message, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
+// Puts comment to the back.
+void commentBehind(const std::string& message, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
+// Puts comment before the insert_before parameter token.
+void commentBeforeToken(const std::string& message, TokenIt insert_before, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
+
// Steals all data from donor and appends it at the end.
void moveAppend(TokenStream* donor);
// Steals all data from donor and appends it at position before.
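The three comment declarations above differ only in where the comment is inserted. A minimal model, assuming tokens are plain strings in a `std::list` and a `"\n"` element stands for a `NEW_LINE` token (yaramod's real `TokenStream` stores typed tokens), shows the placement rule, including how `commentBehind` skips trailing newlines so the comment stays on the last line of the expression. The free functions only mirror the declared names; this is a hypothetical sketch that leaves out multiline comments and `indent`.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical, simplified model of the TokenStream comment methods declared
// above: tokens are plain strings and "\n" stands for a NEW_LINE token.
using Tokens = std::list<std::string>;

// Inserts a "//" comment, and optionally a line break, before `insertBefore`.
// std::list::insert keeps `insertBefore` valid, so both land in front of it.
void commentBeforeToken(Tokens& ts, const std::string& message,
                        Tokens::iterator insertBefore, bool linebreak = true)
{
    if (message.empty())
        return;                                  // empty messages are ignored
    ts.insert(insertBefore, "// " + message);
    if (linebreak)
        ts.insert(insertBefore, "\n");
}

// Puts the comment at the front of the stream.
void comment(Tokens& ts, const std::string& message, bool linebreak = true)
{
    commentBeforeToken(ts, message, ts.begin(), linebreak);
}

// Puts the comment at the back, but in front of any trailing line breaks,
// so it stays on the same line as the last real token.
void commentBehind(Tokens& ts, const std::string& message, bool linebreak = true)
{
    auto insertBefore = ts.end();
    while (insertBefore != ts.begin() && *std::prev(insertBefore) == "\n")
        --insertBefore;
    commentBeforeToken(ts, message, insertBefore, linebreak);
}
```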
164 changes: 119 additions & 45 deletions src/builder/yara_expression_builder.cpp
@@ -55,6 +55,31 @@ YaraExpressionBuilder logicalFormula(std::vector<YaraExpressionBuilder> terms, c
return formula;
}

+template <typename Op>
+YaraExpressionBuilder logicalFormula(std::vector<YaraExpressionBuilder> terms, std::vector<std::string> comments, const Op& op)
+{
+if (terms.empty())
+return boolVal(true);
+
+if (terms.size() == 1)
+return terms.front();
+
+auto formula = op(terms[0], comments[0], terms[1]);
+for (std::size_t i = 2; i < terms.size(); ++i)
+{
+if (!terms[i].canBeBool())
+{
+const auto& expr = terms[i].get();
+error_handle("Expected boolean, got '" + expr->getText() + "' of type " + expr->getTypeString());
+}
+if (i >= 2)
+formula = op(formula, comments[i-1], terms[i]);
+}
+
+formula.setType(Expression::Type::Bool);
+return formula;
+}
+
} //namespace

/**
@@ -459,39 +484,24 @@ YaraExpressionBuilder& YaraExpressionBuilder::call(const std::vector<YaraExpress
*
* @return Builder.
*/
-YaraExpressionBuilder& YaraExpressionBuilder::comment(const std::string& message, bool multiline, const std::string& indent)
+YaraExpressionBuilder& YaraExpressionBuilder::comment(const std::string& message, bool multiline, const std::string& indent, bool linebreak)
{
-if (!message.empty())
-{
-TokenIt insert_before = _tokenStream->begin();
-std::stringstream ss;
-ss << indent;
-if (multiline)
-{
-ss << "/*";
-if (message.front() != '\n')
-ss << " ";
-for (auto c : message)
-{
-ss << c;
-if (c == '\n')
-ss << indent;
-}
-if (message.back() != '\n')
-ss << " ";
-ss << "*/";
-_tokenStream->emplace(insert_before, TokenType::COMMENT, ss.str());
-}
-else
-{
-for (auto item : message)
-if (item == '\n')
-throw YaraExpressionBuilderError("Error: one-line comment must not contain \\n.");
-ss << "// " << message;
-_tokenStream->emplace(insert_before, TokenType::ONELINE_COMMENT, ss.str());
-}
-_tokenStream->emplace(insert_before, TokenType::NEW_LINE, "\n");
-}
+_tokenStream->comment(message, multiline, indent, linebreak);
+return *this;
+}
+
+/**
+* Puts comment behind the expression.
+*
+* @param message The comment message.
+* @param multiline If set, the comment will be multiline.
+* @param indent Additional indent added to the indentation computed by the autoformatter.
+*
+* @return Builder.
+*/
+YaraExpressionBuilder& YaraExpressionBuilder::commentBehind(const std::string& message, bool multiline, const std::string& indent, bool linebreak)
+{
+_tokenStream->commentBehind(message, multiline, indent, linebreak);
return *this;
}
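The comment-aware `logicalFormula` overload added near the top of this file is a left fold that pairs `comments[i-1]` with the operator joining `terms[i-1]` and `terms[i]`. The string-based sketch below, under the hypothetical name `foldWithComments`, reproduces that pairing so the indexing is easy to check; the last comment is deliberately not consumed, because the commit attaches it afterwards with `commentBehind(..., false)`. The equal-length check is an assumption of the sketch; the original relies on its callers filling both vectors together.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, string-based model of the comment-aware logicalFormula():
// a left fold in which comments[i - 1] follows the operator placed between
// terms[i - 1] and terms[i]. The last comment is intentionally left unused.
std::string foldWithComments(const std::vector<std::string>& terms,
                             const std::vector<std::string>& comments,
                             const std::string& op)
{
    if (terms.size() != comments.size())
        throw std::invalid_argument("terms and comments must have equal length");
    if (terms.empty())
        return "true";                    // mirrors boolVal(true) for an empty list
    std::string formula = terms[0];
    for (std::size_t i = 1; i < terms.size(); ++i)
        formula += " " + op + " // " + comments[i - 1] + "\n" + terms[i];
    return formula;
}
```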

@@ -1240,6 +1250,58 @@ YaraExpressionBuilder disjunction(const YaraExpressionBuilder& lhs, const YaraEx
return YaraExpressionBuilder(std::move(ts), std::move(expression), Expression::Type::Bool);
}

+/**
+* Creates conjunction.
+*
+* @param lhs Left-hand side.
+* @param lhscomment Comment placed after the `and` operator that follows lhs.
+* @param rhs Right-hand side.
+*
+* @return Builder.
+*/
+YaraExpressionBuilder conjunction(const YaraExpressionBuilder& lhs, const std::string& lhscomment, const YaraExpressionBuilder& rhs)
+{
+if (!lhs.canBeBool())
+error_handle(ArgType::Left, "and", "bool", lhs.get());
+else if (!rhs.canBeBool())
+error_handle(ArgType::Right, "and", "bool", rhs.get());
+
+auto ts = std::make_shared<TokenStream>();
+ts->moveAppend(lhs.getTokenStream());
+TokenIt andToken = ts->emplace_back(TokenType::AND, "and");
+ts->commentBehind(lhscomment, false, "", true);
+ts->moveAppend(rhs.getTokenStream());
+
+auto expression = std::make_shared<AndExpression>(lhs.get(), andToken, rhs.get(), true);
+return YaraExpressionBuilder(std::move(ts), std::move(expression), Expression::Type::Bool);
+}
+
+/**
+* Creates disjunction.
+*
+* @param lhs Left-hand side.
+* @param lhscomment Comment placed after the `or` operator that follows lhs.
+* @param rhs Right-hand side.
+*
+* @return Builder.
+*/
+YaraExpressionBuilder disjunction(const YaraExpressionBuilder& lhs, const std::string& lhscomment, const YaraExpressionBuilder& rhs)
+{
+if (!lhs.canBeBool())
+error_handle(ArgType::Left, "or", "bool", lhs.get());
+else if (!rhs.canBeBool())
+error_handle(ArgType::Right, "or", "bool", rhs.get());
+
+auto ts = std::make_shared<TokenStream>();
+ts->moveAppend(lhs.getTokenStream());
+TokenIt orToken = ts->emplace_back(TokenType::OR, "or");
+ts->commentBehind(lhscomment, false, "", true);
+ts->moveAppend(rhs.getTokenStream());
+
+auto expression = std::make_shared<OrExpression>(lhs.get(), orToken, rhs.get(), true);
+return YaraExpressionBuilder(std::move(ts), std::move(expression), Expression::Type::Bool);
+}
+
/**
* Creates conjunction of terms.
*
@@ -1280,17 +1342,23 @@ YaraExpressionBuilder disjunction(const std::vector<YaraExpressionBuilder>& term
*
* @return Builder.
*/
-YaraExpressionBuilder conjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& terms)
+YaraExpressionBuilder conjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& commented_terms)
{
-std::vector<YaraExpressionBuilder> commented_terms;
-for (const auto& pair : terms)
+std::vector<YaraExpressionBuilder> terms;
+std::vector<std::string> comments;
+for (const auto& pair : commented_terms)
{
if (!pair.first.canBeBool())
error_handle(ArgType::Single, "and", "bool", pair.first.get());
-commented_terms.push_back(pair.first);
-commented_terms.back().comment(pair.second, true);
+terms.push_back(pair.first);
+comments.push_back(pair.second);
}
-return logicalFormula(commented_terms, [](YaraExpressionBuilder& term1, YaraExpressionBuilder& term2) { return conjunction(term1, term2, true); });
+auto output = logicalFormula(terms, comments, [](YaraExpressionBuilder& term1, std::string& comment1, YaraExpressionBuilder& term2) {
+return conjunction(term1, comment1, term2);
+});
+
+if (!comments.empty()) output.commentBehind(comments.back(), false, "", false);
+return output;
}

/**
@@ -1301,17 +1369,23 @@ YaraExpressionBuilder conjunction(const std::vector<std::pair<YaraExpressionBuil
*
* @return Builder.
*/
-YaraExpressionBuilder disjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& terms)
+YaraExpressionBuilder disjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& commented_terms)
{
-std::vector<YaraExpressionBuilder> commented_terms;
-for (const auto& pair : terms)
+std::vector<YaraExpressionBuilder> terms;
+std::vector<std::string> comments;
+for (const auto& pair : commented_terms)
{
if (!pair.first.canBeBool())
-error_handle(ArgType::Single, "or", "bool", pair.first.get());
-commented_terms.push_back(pair.first);
-commented_terms.back().comment(pair.second, true);
+error_handle(ArgType::Single, "or", "bool", pair.first.get());
+terms.push_back(pair.first);
+comments.push_back(pair.second);
}
-return logicalFormula(commented_terms, [](YaraExpressionBuilder& term1, YaraExpressionBuilder& term2) { return disjunction(term1, term2, true); });
+auto output = logicalFormula(terms, comments, [](YaraExpressionBuilder& term1, std::string& comment1, YaraExpressionBuilder& term2) {
+return disjunction(term1, comment1, term2);
+});
+
+if (!comments.empty()) output.commentBehind(comments.back(), false, "", false);
+return output;
}

/**
6 changes: 4 additions & 2 deletions src/python/yaramod_python.cpp
@@ -526,7 +526,8 @@ void addTokenStreamClass(py::module& module)
.def_property_readonly("front", &TokenStream::front)
.def_property_readonly("back", &TokenStream::back)
.def_property_readonly("tokens", &TokenStream::getTokens)
-.def_property_readonly("tokens_as_text", &TokenStream::getTokensAsText);
+.def_property_readonly("tokens_as_text", &TokenStream::getTokensAsText)
+.def("comment_before_token", &TokenStream::commentBeforeToken, py::arg("message"), py::arg("insert_before"), py::arg("multiline") = false, py::arg("indent") = "", py::arg("linebreak") = true);
}

void addExpressionClasses(py::module& module)
@@ -829,7 +830,8 @@ void addBuilderClasses(py::module& module)
})
.def("__getitem__", &YaraExpressionBuilder::operator[])
.def("access", &YaraExpressionBuilder::access)
-.def("comment", &YaraExpressionBuilder::comment, py::arg("message"), py::arg("multiline") = false, py::arg("indent") = "")
+.def("comment", &YaraExpressionBuilder::comment, py::arg("message"), py::arg("multiline") = false, py::arg("indent") = "", py::arg("linebreak") = true)
+.def("comment_behind", &YaraExpressionBuilder::commentBehind, py::arg("message"), py::arg("multiline") = false, py::arg("indent") = "", py::arg("linebreak") = true)
.def("contains", &YaraExpressionBuilder::contains)
.def("matches", &YaraExpressionBuilder::matches)
.def("iequals", &YaraExpressionBuilder::iequals)
56 changes: 56 additions & 0 deletions src/types/token_stream.cpp
@@ -488,6 +488,62 @@ void TokenStream::removeRedundantDoubleNewlines()
}
}

+void TokenStream::comment(const std::string& message, bool multiline, const std::string& indent, bool linebreak)
+{
+auto insert_before = empty() ? end() : begin();
+commentBeforeToken(message, insert_before, multiline, indent, linebreak);
+}
+
+void TokenStream::commentBehind(const std::string& message, bool multiline, const std::string& indent, bool linebreak)
+{
+auto insert_before = end();
+while (insert_before != begin())
+{
+auto predecessor = std::prev(insert_before);
+if (predecessor->getType() == TokenType::NEW_LINE)
+--insert_before;
+else
+break;
+}
+commentBeforeToken(message, insert_before, multiline, indent, linebreak);
+}
+
+void TokenStream::commentBeforeToken(const std::string& message, TokenIt insert_before, bool multiline, const std::string& indent, bool linebreak)
+{
+if (!message.empty())
+{
+std::stringstream ss;
+ss << indent;
+if (multiline)
+{
+ss << "/*";
+if (message.front() != '\n')
+ss << " ";
+for (auto c : message)
+{
+ss << c;
+if (c == '\n')
+ss << indent;
+}
+if (message.back() != '\n')
+ss << " ";
+ss << "*/";
+emplace(insert_before, TokenType::COMMENT, ss.str());
+}
+else
+{
+for (auto item : message)
+if (item == '\n')
+throw YaramodError("Error: one-line comment must not contain \\n.");
+ss << "// " << message;
+emplace(insert_before, TokenType::ONELINE_COMMENT, ss.str());
+}
+// if (insert_before != end() && (linebreak || !multiline))
+if (linebreak)
+emplace(insert_before, TokenType::NEW_LINE, "\n");
+}
+}
+
void TokenStream::addMissingNewLines()
{
BracketStack brackets;
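`TokenStream::commentBeforeToken` builds the comment text before emplacing it: a multiline comment is wrapped in `/* */` and every line after a `\n` is re-indented with `indent`, while a one-line comment gets a `// ` prefix and rejects embedded newlines. The sketch below isolates that text-building step as a hypothetical free function `formatComment` that returns the string instead of inserting a token; `std::invalid_argument` stands in for yaramod's `YaramodError`.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical re-implementation of the text-building part of
// TokenStream::commentBeforeToken(); it returns the comment text instead of
// emplacing COMMENT / ONELINE_COMMENT tokens.
std::string formatComment(const std::string& message, bool multiline,
                          const std::string& indent = "")
{
    if (message.empty())
        return "";                        // the original emits nothing for an empty message
    std::ostringstream ss;
    ss << indent;
    if (multiline)
    {
        ss << "/*";
        if (message.front() != '\n')
            ss << ' ';
        for (char c : message)
        {
            ss << c;
            if (c == '\n')
                ss << indent;             // re-indent each continuation line
        }
        if (message.back() != '\n')
            ss << ' ';
        ss << "*/";
    }
    else
    {
        // One-line comments cannot span lines (YaramodError in yaramod).
        if (message.find('\n') != std::string::npos)
            throw std::invalid_argument("one-line comment must not contain \\n");
        ss << "// " << message;
    }
    return ss.str();
}
```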