Add comment_behind and comment_before_token to Expression builder #194

Merged: 4 commits, Mar 10, 2022
6 changes: 4 additions & 2 deletions docs/rtd/creating_rulesets.rst
@@ -337,8 +337,8 @@ basic expressions and find the most suitable one.
* ``forLoop(spec, set, body)`` - represents ``for`` loop over set of string references (``forLoop(any(), set({stringRef("$*")}), matchAt("$", intVal(100)))``)
* ``of(spec, set)`` - represents ``<spec> of <set>`` (``of(all(), them())``)
* ``paren(expr, [newline])`` - represents parentheses around expressions and ``newline`` indicator for putting enclosed expression on its own line (``paren(intVal(10))``)
* ``conjunction(terms, [newline])`` - represents conjunction of ``terms`` and optionally puts them on each separate line if ``newline`` is set (``conjunction({id("rule1"), id("rule2")})``)
* ``disjunction(terms, [newline])`` - represents disjunction of ``terms`` and optionally puts them on each separate line if ``newline`` is set (``disjunction({id("rule1"), id("rule2")})``)
* ``conjunction(terms, [newline])`` - represents conjunction of ``terms`` and optionally puts each term on a separate line if ``newline`` is set (``conjunction({id("rule1"), id("rule2")})``). The ``terms`` parameter is either an array of expressions to be joined in the conjunction, or an array of pairs, where each pair holds a term and a comment that is associated with the term and printed on the same line
* ``disjunction(terms, [newline])`` - represents disjunction of ``terms`` and optionally puts each term on a separate line if ``newline`` is set (``disjunction({id("rule1"), id("rule2")})``). The ``terms`` parameter is either an array of expressions to be joined in the disjunction, or an array of pairs, where each pair holds a term and a comment that is associated with the term and printed on the same line

**Complex expression methods**

@@ -374,6 +374,8 @@ basic expressions and find the most suitable one.
* ``readUInt8(be)`` - represents call to special function ``uint8(be)`` (``intVal(100).readUInt8()``)
* ``readUInt16(be)`` - represents call to special function ``uint16(be)`` (``intVal(100).readUInt16()``)
* ``readUInt32(be)`` - represents call to special function ``uint32(be)`` (``intVal(100).readUInt32()``)
* ``comment(message, [multiline], [indent], [linebreak])`` - adds a comment ``message`` to the expression, which then appears in the formatted text before the expression. Only ``message`` is required; ``multiline`` (default ``false``), ``indent`` (default ``""``) and ``linebreak`` (default ``true``) are optional
* ``commentBehind(message, [multiline], [indent], [linebreak])`` - adds a comment ``message`` to the expression, which then appears in the formatted text after the expression. Only ``message`` is required; ``multiline`` (default ``false``), ``indent`` (default ``""``) and ``linebreak`` (default ``true``) are optional

**Hex strings**

6 changes: 5 additions & 1 deletion include/yaramod/builder/yara_expression_builder.h
@@ -8,6 +8,7 @@

#include <memory>
#include <string>
#include <iostream>

#include "yaramod/types/expression.h"
#include "yaramod/types/token_stream.h"
@@ -166,7 +167,8 @@ class YaraExpressionBuilder
YaraExpressionBuilder& operator<<(const YaraExpressionBuilder& other);
YaraExpressionBuilder& operator>>(const YaraExpressionBuilder& other);

YaraExpressionBuilder& comment(const std::string& message, bool multiline = false, const std::string& indent = {});
YaraExpressionBuilder& comment(const std::string& message, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
YaraExpressionBuilder& commentBehind(const std::string& message, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
YaraExpressionBuilder& call(const std::vector<YaraExpressionBuilder>& args);
/**
* Calls function from an expression
@@ -255,6 +257,8 @@ YaraExpressionBuilder range(const YaraExpressionBuilder& low, const YaraExpressi

YaraExpressionBuilder conjunction(const YaraExpressionBuilder& lhs, const YaraExpressionBuilder& rhs, bool linebreak = false);
YaraExpressionBuilder disjunction(const YaraExpressionBuilder& lhs, const YaraExpressionBuilder& rhs, bool linebreak = false);
YaraExpressionBuilder conjunction(const YaraExpressionBuilder& lhs, const std::string& lhscomment, const YaraExpressionBuilder& rhs);
YaraExpressionBuilder disjunction(const YaraExpressionBuilder& lhs, const std::string& lhscomment, const YaraExpressionBuilder& rhs);
YaraExpressionBuilder conjunction(const std::vector<YaraExpressionBuilder>& terms, bool linebreaks = false);
YaraExpressionBuilder disjunction(const std::vector<YaraExpressionBuilder>& terms, bool linebreaks = false);
YaraExpressionBuilder conjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& terms);
8 changes: 8 additions & 0 deletions include/yaramod/types/token_stream.h
@@ -87,6 +87,14 @@ class TokenStream
TokenIt erase(TokenIt element);
TokenIt erase(TokenIt first, TokenIt last);


// Puts comment to the front.
void comment(const std::string& message, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
// Puts comment to the back.
void commentBehind(const std::string& message, bool multiline = false, const std::string& indent = {}, bool linebreak = true);
// Puts comment before the insert_before parameter token.
void commentBeforeToken(const std::string& message, TokenIt insert_before, bool multiline = false, const std::string& indent = {}, bool linebreak = true);

// Steals all data from donor and appends it at the end.
void moveAppend(TokenStream* donor);
// Steals all data from donor and appends it at position before.
164 changes: 119 additions & 45 deletions src/builder/yara_expression_builder.cpp
@@ -55,6 +55,31 @@ YaraExpressionBuilder logicalFormula(std::vector<YaraExpressionBuilder> terms, c
return formula;
}

template <typename Op>
YaraExpressionBuilder logicalFormula(std::vector<YaraExpressionBuilder> terms, std::vector<std::string> comments, const Op& op)
{
if (terms.empty())
return boolVal(true);

if (terms.size() == 1)
return terms.front();

auto formula = op(terms[0], comments[0], terms[1]);
for (std::size_t i = 2; i < terms.size(); ++i)
{
if (!terms[i].canBeBool())
{
const auto& expr = terms[i].get();
error_handle("Expected boolean, got '" + expr->getText() + "' of type " + expr->getTypeString());
}
formula = op(formula, comments[i-1], terms[i]);
}

formula.setType(Expression::Type::Bool);
return formula;
}
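The comment-aware `logicalFormula` overload above is a left fold in which `comments[i]` travels with `terms[i]`. The following is a minimal standalone sketch of that pairing, not yaramod code: `foldWithComments` and `joinAnd` are hypothetical names, and plain strings stand in for `YaraExpressionBuilder`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for logicalFormula(terms, comments, op): a left fold
// where the comment at index i is passed to op together with the term at index i.
template <typename T, typename Op>
T foldWithComments(const std::vector<T>& terms,
                   const std::vector<std::string>& comments,
                   const T& emptyValue,
                   Op op)
{
	if (terms.empty())
		return emptyValue;
	if (terms.size() == 1)
		return terms.front();

	// The first step combines terms[0] and terms[1] using comments[0].
	T formula = op(terms[0], comments[0], terms[1]);
	// Every later step attaches comments[i - 1], the comment of the previous term.
	for (std::size_t i = 2; i < terms.size(); ++i)
		formula = op(formula, comments[i - 1], terms[i]);
	return formula;
}

// Example operator (an assumption for illustration): renders one step as text.
std::string joinAnd(const std::string& lhs, const std::string& comment, const std::string& rhs)
{
	return "(" + lhs + " and /* " + comment + " */ " + rhs + ")";
}
```

Note that the fold never consumes the last comment. In the diff, the callers append it afterwards with `commentBehind(comments.back(), ...)`.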

} //namespace

/**
@@ -459,39 +484,24 @@ YaraExpressionBuilder& YaraExpressionBuilder::call(const std::vector<YaraExpress
*
* @return Builder.
*/
YaraExpressionBuilder& YaraExpressionBuilder::comment(const std::string& message, bool multiline, const std::string& indent)
YaraExpressionBuilder& YaraExpressionBuilder::comment(const std::string& message, bool multiline, const std::string& indent, bool linebreak)
{
if (!message.empty())
{
TokenIt insert_before = _tokenStream->begin();
std::stringstream ss;
ss << indent;
if (multiline)
{
ss << "/*";
if (message.front() != '\n')
ss << " ";
for (auto c : message)
{
ss << c;
if (c == '\n')
ss << indent;
}
if (message.back() != '\n')
ss << " ";
ss << "*/";
_tokenStream->emplace(insert_before, TokenType::COMMENT, ss.str());
}
else
{
for (auto item : message)
if (item == '\n')
throw YaraExpressionBuilderError("Error: one-line comment must not contain \\n.");
ss << "// " << message;
_tokenStream->emplace(insert_before, TokenType::ONELINE_COMMENT, ss.str());
}
_tokenStream->emplace(insert_before, TokenType::NEW_LINE, "\n");
}
_tokenStream->comment(message, multiline, indent, linebreak);
return *this;
}

/**
* Puts comment behind the expression.
*
* @param message The comment message.
* @param multiline If set, the comment will be multiline.
* @param indent Additional indent added to the indentation computed by the autoformatter.
* @param linebreak If set, a newline is emitted right after the comment.
*
* @return Builder.
*/
YaraExpressionBuilder& YaraExpressionBuilder::commentBehind(const std::string& message, bool multiline, const std::string& indent, bool linebreak)
{
_tokenStream->commentBehind(message, multiline, indent, linebreak);
return *this;
}

@@ -1205,6 +1215,58 @@ YaraExpressionBuilder disjunction(const YaraExpressionBuilder& lhs, const YaraEx
return YaraExpressionBuilder(std::move(ts), std::move(expression), Expression::Type::Bool);
}

/**
* Creates conjunction.
*
* @param lhs Left-hand side.
* @param lhscomment One-line comment for the left-hand side, emitted right after the 'and' token and followed by a line break.
* @param rhs Right-hand side.
*
* @return Builder.
*/
YaraExpressionBuilder conjunction(const YaraExpressionBuilder& lhs, const std::string& lhscomment, const YaraExpressionBuilder& rhs)
{
if (!lhs.canBeBool())
error_handle(ArgType::Left, "and", "bool", lhs.get());
else if (!rhs.canBeBool())
error_handle(ArgType::Right, "and", "bool", rhs.get());

auto ts = std::make_shared<TokenStream>();
ts->moveAppend(lhs.getTokenStream());
TokenIt andToken = ts->emplace_back(TokenType::AND, "and");
ts->commentBehind(lhscomment, false, "", true);
ts->moveAppend(rhs.getTokenStream());

auto expression = std::make_shared<AndExpression>(lhs.get(), andToken, rhs.get(), true);
return YaraExpressionBuilder(std::move(ts), std::move(expression), Expression::Type::Bool);
}

/**
* Creates disjunction.
*
* @param lhs Left-hand side.
* @param lhscomment One-line comment for the left-hand side, emitted right after the 'or' token and followed by a line break.
* @param rhs Right-hand side.
*
* @return Builder.
*/
YaraExpressionBuilder disjunction(const YaraExpressionBuilder& lhs, const std::string& lhscomment, const YaraExpressionBuilder& rhs)
{
if (!lhs.canBeBool())
error_handle(ArgType::Left, "or", "bool", lhs.get());
else if (!rhs.canBeBool())
error_handle(ArgType::Right, "or", "bool", rhs.get());

auto ts = std::make_shared<TokenStream>();
ts->moveAppend(lhs.getTokenStream());
TokenIt orToken = ts->emplace_back(TokenType::OR, "or");
ts->commentBehind(lhscomment, false, "", true);
ts->moveAppend(rhs.getTokenStream());

auto expression = std::make_shared<OrExpression>(lhs.get(), orToken, rhs.get(), true);
return YaraExpressionBuilder(std::move(ts), std::move(expression), Expression::Type::Bool);
}
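Both commented overloads above build the same token order: the left-hand side, the operator, the comment, a newline, then the right-hand side. The sketch below models that order with a flat list of strings. It is an illustration under assumptions: `commentedBinary` is a hypothetical helper, and yaramod's real `TokenStream` carries typed tokens and does its own formatting.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flat token model: each token is just its text.
std::vector<std::string> commentedBinary(const std::vector<std::string>& lhs,
                                         const std::string& op,
                                         const std::string& lhsComment,
                                         const std::vector<std::string>& rhs)
{
	std::vector<std::string> tokens(lhs);
	tokens.push_back(op);
	// As in commentBeforeToken, an empty message adds neither comment nor newline.
	if (!lhsComment.empty())
	{
		tokens.push_back("// " + lhsComment);
		tokens.push_back("\n");
	}
	tokens.insert(tokens.end(), rhs.begin(), rhs.end());
	return tokens;
}
```

Placing the comment after the operator keeps it on the same line as the term it describes while the next term starts on a fresh line.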

/**
* Creates conjunction of terms.
*
@@ -1245,17 +1307,23 @@ YaraExpressionBuilder disjunction(const std::vector<YaraExpressionBuilder>& term
*
* @return Builder.
*/
YaraExpressionBuilder conjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& terms)
YaraExpressionBuilder conjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& commented_terms)
{
std::vector<YaraExpressionBuilder> commented_terms;
for (const auto& pair : terms)
std::vector<YaraExpressionBuilder> terms;
std::vector<std::string> comments;
for (const auto& pair : commented_terms)
{
if (!pair.first.canBeBool())
error_handle(ArgType::Single, "and", "bool", pair.first.get());
commented_terms.push_back(pair.first);
commented_terms.back().comment(pair.second, true);
terms.push_back(pair.first);
comments.push_back(pair.second);
}
return logicalFormula(commented_terms, [](YaraExpressionBuilder& term1, YaraExpressionBuilder& term2) { return conjunction(term1, term2, true); });
auto output = logicalFormula(terms, comments, []( YaraExpressionBuilder& term1, std::string& comment1, YaraExpressionBuilder& term2) {
return conjunction(term1, comment1, term2);
});

if (!comments.empty())
output.commentBehind(comments.back(), false, "", false);
return output;
}

/**
@@ -1266,17 +1334,23 @@ YaraExpressionBuilder conjunction(const std::vector<std::pair<YaraExpressionBuil
*
* @return Builder.
*/
YaraExpressionBuilder disjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& terms)
YaraExpressionBuilder disjunction(const std::vector<std::pair<YaraExpressionBuilder, std::string>>& commented_terms)
{
std::vector<YaraExpressionBuilder> commented_terms;
for (const auto& pair : terms)
std::vector<YaraExpressionBuilder> terms;
std::vector<std::string> comments;
for (const auto& pair : commented_terms)
{
if (!pair.first.canBeBool())
error_handle(ArgType::Single, "or", "bool", pair.first.get());
commented_terms.push_back(pair.first);
commented_terms.back().comment(pair.second, true);
error_handle(ArgType::Single, "or", "bool", pair.first.get());
terms.push_back(pair.first);
comments.push_back(pair.second);
}
return logicalFormula(commented_terms, [](YaraExpressionBuilder& term1, YaraExpressionBuilder& term2) { return disjunction(term1, term2, true); });
auto output = logicalFormula(terms, comments, []( YaraExpressionBuilder& term1, std::string& comment1, YaraExpressionBuilder& term2) {
return disjunction(term1, comment1, term2);
});

if (!comments.empty())
output.commentBehind(comments.back(), false, "", false);
return output;
}

/**
6 changes: 4 additions & 2 deletions src/python/yaramod_python.cpp
@@ -526,7 +526,8 @@ void addTokenStreamClass(py::module& module)
.def_property_readonly("front", &TokenStream::front)
.def_property_readonly("back", &TokenStream::back)
.def_property_readonly("tokens", &TokenStream::getTokens)
.def_property_readonly("tokens_as_text", &TokenStream::getTokensAsText);
.def_property_readonly("tokens_as_text", &TokenStream::getTokensAsText)
.def("comment_before_token", &TokenStream::commentBeforeToken, py::arg("message"), py::arg("insert_before"), py::arg("multiline") = false, py::arg("indent") = "", py::arg("linebreak") = true);
}

void addExpressionClasses(py::module& module)
@@ -821,7 +822,8 @@ void addBuilderClasses(py::module& module)
})
.def("__getitem__", &YaraExpressionBuilder::operator[])
.def("access", &YaraExpressionBuilder::access)
.def("comment", &YaraExpressionBuilder::comment, py::arg("message"), py::arg("multiline") = false, py::arg("indent") = "")
.def("comment", &YaraExpressionBuilder::comment, py::arg("message"), py::arg("multiline") = false, py::arg("indent") = "", py::arg("linebreak") = true)
.def("comment_behind", &YaraExpressionBuilder::commentBehind, py::arg("message"), py::arg("multiline") = false, py::arg("indent") = "", py::arg("linebreak") = true)
.def("contains", &YaraExpressionBuilder::contains)
.def("matches", &YaraExpressionBuilder::matches)
.def("iequals", &YaraExpressionBuilder::iequals)
56 changes: 56 additions & 0 deletions src/types/token_stream.cpp
@@ -490,6 +490,62 @@ void TokenStream::removeRedundantDoubleNewlines()
}
}

void TokenStream::comment(const std::string& message, bool multiline, const std::string& indent, bool linebreak)
{
auto insert_before = empty() ? end() : begin();
commentBeforeToken(message, insert_before, multiline, indent, linebreak);
}

void TokenStream::commentBehind(const std::string& message, bool multiline, const std::string& indent, bool linebreak)
{
auto insert_before = end();
while (insert_before != begin())
{
auto predecessor = std::prev(insert_before);
if (predecessor->getType() == TokenType::NEW_LINE)
--insert_before;
else
break;
}
commentBeforeToken(message, insert_before, multiline, indent, linebreak);
}
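`commentBehind` walks back from the end over trailing newline tokens so that the comment lands right after the last real token. A minimal sketch of that search follows, assuming a `std::list` of token texts in place of the real `TokenStream`; `Tokens` and `trailingInsertPoint` are hypothetical names.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical minimal token list; only the token text matters here.
using Tokens = std::list<std::string>;

// Returns the position just past the last token that is not a newline,
// which is where a trailing comment would be inserted.
Tokens::iterator trailingInsertPoint(Tokens& tokens)
{
	auto insertBefore = tokens.end();
	while (insertBefore != tokens.begin() && *std::prev(insertBefore) == "\n")
		--insertBefore;
	return insertBefore;
}
```

For a stream that is empty or holds only newlines, the loop stops at `begin()`, so the comment would go to the very front.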

void TokenStream::commentBeforeToken(const std::string& message, TokenIt insert_before, bool multiline, const std::string& indent, bool linebreak)
{
if (!message.empty())
{
std::stringstream ss;
ss << indent;
if (multiline)
{
ss << "/*";
if (message.front() != '\n')
ss << " ";
for (auto c : message)
{
ss << c;
if (c == '\n')
ss << indent;
}
if (message.back() != '\n')
ss << " ";
ss << "*/";
emplace(insert_before, TokenType::COMMENT, ss.str());
}
else
{
for (auto item : message)
if (item == '\n')
throw YaramodError("Error: one-line comment must not contain \\n.");
ss << "// " << message;
emplace(insert_before, TokenType::ONELINE_COMMENT, ss.str());
}
// if (insert_before != end() && (linebreak || !multiline))
if (linebreak)
emplace(insert_before, TokenType::NEW_LINE, "\n");
}
}
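The text-building part of `commentBeforeToken` can be looked at in isolation. The sketch below mirrors only that part as a free function returning the comment text. It is an illustration, not the library API: `formatComment` is a hypothetical name, and `std::invalid_argument` stands in for `YaramodError`.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds the comment text the way the diff does,
// without touching any token stream.
std::string formatComment(const std::string& message, bool multiline = false, const std::string& indent = {})
{
	if (message.empty())
		return {};

	std::stringstream ss;
	ss << indent;
	if (multiline)
	{
		ss << "/*";
		if (message.front() != '\n')
			ss << " ";
		for (char c : message)
		{
			ss << c;
			// Re-apply the indent at the start of every continuation line.
			if (c == '\n')
				ss << indent;
		}
		if (message.back() != '\n')
			ss << " ";
		ss << "*/";
	}
	else
	{
		// A one-line comment cannot span lines.
		if (message.find('\n') != std::string::npos)
			throw std::invalid_argument("one-line comment must not contain \\n");
		ss << "// " << message;
	}
	return ss.str();
}
```

With these rules, a one-line message such as `check size` becomes `// check size`, and a multiline message gets the indent repeated after each embedded newline.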

void TokenStream::addMissingNewLines()
{
BracketStack brackets;