
Defining New Operations

Vinícius Garcia edited this page Jun 20, 2018 · 66 revisions

Declaring new operations is much like declaring new functions. The only difference is that, to add a new operation to the default operations map, you must specify which operands and which operator it accepts. For example, a subtraction operation expects the minus operator between two numbers, so it would be described by the tuple {NUM, "-", NUM}.

Since all built-in operations (except the call operation "()") are defined this way, you can see how it is done in the builtin-features/operations.h file.

To learn which built-in token types exist, read the Basic Token Types page. It will be useful when declaring new operations.

Declaring New Operations

The first thing to do is to declare a function describing how the operation should behave:

packToken my_sum(const packToken& left, const packToken& right,
                 evaluationData* data) {
  return left.asDouble() + right.asDouble();
}

Note that since the return type is packToken, you can rely on its constructors: you can return integers, doubles, TokenMaps, TokenLists, etc., and they will be converted to packTokens automatically.

The second step is to add it to the default operations map so the calculator class can access it.

This step is a little more complicated than adding a new function, for two reasons:

  1. You must declare the precedence of every operator before declaring the operations that use it.
  2. The order in which you add operations to the default operations map matters: operations added earlier are tried earlier and only the first matching operation is used, so in ambiguous cases the earlier ones take priority.
struct Startup {
  Startup() {
    OppMap_t& opp = calculator::Default().opPrecedence;

    // Define the operator precedence relative to the other operators
    // any number between 1 and 0x7FFFFFF is a valid precedence.
    // The smaller the number the greater is its precedence.
    opp.add("+", 2);

    // Add the operation function to the default opMap:
    opMap_t& opMap = calculator::Default().opMap;
    opMap.add({NUM, "+", NUM}, &my_sum);
  }
} Startup;

Declaring Unary Operations

There are two types of unary operators: left-side ones and right-side ones.

Declaring either kind is simple once you know how to declare binary operations as described above. The details are explained below:

Declaring Left Unary Operations

When declaring a left unary operation, follow the same steps as for binary operations, with two changes:

  • Use opp.addUnary() instead of opp.add()
  • Use the type UNARY on the left side of the operation mask, i.e.: {UNARY, "-", NUM}
packToken left_minus_sign(const packToken& left, const packToken& right,
                          evaluationData* data) {
  // Ignore the left operand and use only the right one:
  return -right.asDouble();
}

struct Startup {
  Startup() {
    OppMap_t& opp = calculator::Default().opPrecedence;
    opMap_t& opMap = calculator::Default().opMap;

    // 1. When declaring its opPrecedence use `addUnary` instead of `add`:
    opp.addUnary("-", 2);
    // 2. When declaring the operation mask,
    //    use the UNARY mask for the left operand:
    opMap.add({UNARY, "-", NUM}, &left_minus_sign);
  }
} Startup;

Declaring Right Unary Operations

Declaring right unary operators is also much like declaring binary operators, with two changes:

  • Use opp.addRightUnary() instead of opp.add()
  • Use the type UNARY on the right side of the operation mask, i.e.: {NUM, "-", UNARY}
packToken right_minus_sign(const packToken& left, const packToken& right,
                           evaluationData* data) {
  // Ignore the right operand and use only the left one:
  return -left.asDouble();
}

struct Startup {
  Startup() {
    OppMap_t& opp = calculator::Default().opPrecedence;
    opMap_t& opMap = calculator::Default().opMap;

    // 1. When declaring its opPrecedence use `addRightUnary` instead of `add`:
    opp.addRightUnary("-", 2);
    // 2. When declaring the operation mask,
    //    use the UNARY mask for the right operand:
    opMap.add({NUM, "-", UNARY}, &right_minus_sign);
  }
} Startup;

Operator Precedence

Operator precedence is a relative value that determines which operator is evaluated first. For example, "*" should have a higher precedence (a smaller number) than "+", since in conventional math we expect 2 + 2 * 2 to evaluate to 6 rather than 8.

Note that smaller numbers mean higher precedence. So, to give "*" a higher precedence than "+", you could set it to 1, which is a valid value and smaller than the precedence of 2 given to "+", e.g.:

    OppMap_t& opp = calculator::Default().opPrecedence;
    opp.add("*", 1);
    opp.add("+", 2);

For a fuller picture, check the Startup class in the builtin-features/operations.h file. It declares all the built-in operations in a sensible order and with sensible precedences.
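The effect of "smaller number binds tighter" can be checked with a small standalone model. The sketch below is not cparse code and does not use its API: it is a precedence-climbing evaluator over a flat token list, and the table `kPrec` and the functions `apply`, `climb` and `evaluate` are hypothetical names. It only assumes the convention described above, with "*" at precedence 1 and "+" at precedence 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical precedence table following the convention above:
// the smaller the number, the tighter the operator binds.
const std::map<std::string, int> kPrec = {
    {"*", 1}, {"/", 1}, {"+", 2}, {"-", 2}};

double apply(const std::string& op, double l, double r) {
  if (op == "*") return l * r;
  if (op == "/") return l / r;
  if (op == "+") return l + r;
  if (op == "-") return l - r;
  throw std::invalid_argument("unknown operator: " + op);
}

// Precedence climbing over a flat token list "num op num op num ...".
// `limit` is the loosest precedence level this call is allowed to consume.
double climb(const std::vector<std::string>& toks, std::size_t& pos, int limit) {
  double lhs = std::stod(toks.at(pos++));
  while (pos < toks.size()) {
    const std::string& op = toks[pos];
    int p = kPrec.at(op);
    if (p > limit) break;  // binds looser than allowed: leave it to the caller
    ++pos;
    // Left to right: the right operand may only contain operators
    // that bind strictly tighter than `op` (levels up to p - 1).
    double rhs = climb(toks, pos, p - 1);
    lhs = apply(op, lhs, rhs);
  }
  return lhs;
}

double evaluate(const std::vector<std::string>& toks) {
  int loosest = 0;
  for (const auto& entry : kPrec) loosest = std::max(loosest, entry.second);
  std::size_t pos = 0;
  return climb(toks, pos, loosest);
}
```

With this table, evaluate({"2", "+", "2", "*", "2"}) yields 6: the recursive call that reads the right operand of "+" is allowed to consume the tighter "*" first.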

Operator Associativity

Associativity describes whether an operator is evaluated from left to right or from right to left. Operators are usually expected to work from left to right: 3 - 2 - 1 should return 0, i.e. (3 - 2) - 1. If the associativity of "-" were set to right-to-left instead, the expression would evaluate as 3 - (2 - 1) == 2.

By default, all operators associate from left to right. To change this for a specific operator, set its precedence to a negative value. The sign is discarded when comparing precedences, but internally the operator is remembered as associating from right to left.

For example, to make "-" associate from right to left while keeping the same precedence as "+", we could define it like this:

    OppMap_t& opp = calculator::Default().opPrecedence;
    opp.add("*", 1);
    opp.add("+", 2); opp.add("-", -2);
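One way to picture how a single signed number carries both precedence and associativity is the standalone sketch below. It assumes only what the text states: the magnitude is the precedence level and a negative sign marks right-to-left association. `OpInfo`, `decode` and the precedence-climbing evaluator are hypothetical illustrations, not cparse internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

struct OpInfo {
  int level;        // magnitude of the stored value: smaller binds tighter
  bool rightAssoc;  // true when the stored value was negative
};

// The sign is discarded for comparisons but remembered as associativity.
OpInfo decode(int signedPrec) { return {std::abs(signedPrec), signedPrec < 0}; }

double apply(char op, double l, double r) {
  if (op == '*') return l * r;
  if (op == '+') return l + r;
  return l - r;  // '-' is the only other operator in these tables
}

double climb(const std::vector<std::string>& toks, std::size_t& pos,
             int limit, const std::map<std::string, int>& table) {
  double lhs = std::stod(toks.at(pos++));
  while (pos < toks.size()) {
    const std::string& op = toks[pos];
    OpInfo info = decode(table.at(op));
    if (info.level > limit) break;
    ++pos;
    // Left-to-right: the right operand may only hold tighter operators.
    // Right-to-left: it may also hold operators of the same level.
    int rhsLimit = info.rightAssoc ? info.level : info.level - 1;
    lhs = apply(op[0], lhs, climb(toks, pos, rhsLimit, table));
  }
  return lhs;
}

double evaluate(const std::vector<std::string>& toks,
                const std::map<std::string, int>& table) {
  int loosest = 0;
  for (const auto& entry : table)
    loosest = std::max(loosest, decode(entry.second).level);
  std::size_t pos = 0;
  return climb(toks, pos, loosest, table);
}
```

With "-" stored as 2, 3 - 2 - 1 evaluates to 0. With "-" stored as -2, as in the snippet above, it evaluates to 2.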

Operation Matching Loop

To execute an operation, the calculator first identifies the correct function to use. This is done by the Operation Matching Loop.

Internally, operations are stored by key, where the key is the operator string, e.g. "==" or "+". So the first step when matching is to select the operations registered for that specific operator; all others are discarded. If no match is found for that operator, the matching loop starts again with the operations that use the ANY_OP wildcard operator.

As a consequence, an operation using the ANY_OP wildcard is only tried after the operations registered for the specific operator, so it will not be used when one of those matches.

Within the selected operator group, operations are stored in a list, in the order they were added. Starting from the beginning of the list, the system tries each one: it matches the left operand mask against the left operand and the right operand mask against the right operand (masks are explained in the section below). If both match, the operation function is called, and if it returns normally the matching process stops there.

If it throws instead of returning, there are two possibilities:

  1. A problem was found: the matching loop stops and the error is propagated.
  2. The exception was of the type Operation::Reject. In this case the operation execution is ignored, as if it never happened, and the matching loop resumes.

The Reject exception is useful when the mask system is not enough to restrict the operands you accept: you can check them manually inside the operation function and reject the ones you do not want. This is a little slower, so use it only in exceptional cases, and put the most common operations first in the list.

To illustrate this feature, let's look at an example:

packToken my_reject_operation(const packToken& left, const packToken& right, evaluationData* data) {
  throw Operation::Reject();
}

packToken my_throw_operation(const packToken& left, const packToken& right, evaluationData* data) {
  throw my_exception("my message");
}

packToken my_operation(const packToken& left, const packToken& right, evaluationData* data) {
  return left.asDouble() + right.asDouble();
}

struct Startup {
  Startup() {
    OppMap_t& opp = calculator::Default().opPrecedence;
    opp.add("*", 1);
    opp.add("+", 2);
    
    opMap_t& opMap = calculator::Default().opMap;
    opMap.add({NUM, ANY_OP, NUM}, &my_throw_operation);
    opMap.add({REAL, "+", REAL}, &my_operation);
    opMap.add({NUM, "+", NUM}, &my_reject_operation);
    opMap.add({NUM, "+", INT}, &my_throw_operation);
    opMap.add({NUM, "+", NUM}, &my_operation);
    opMap.add({NUM, "*", NUM}, &my_throw_operation);
  }
} Startup;

First, note that the operation using the ANY_OP wildcard operator will be evaluated last, even though it is declared first, because the matching loop for this kind of operation only runs after the normal loop.

Now, if we run 10.0 + 10.0, the first match is the REAL/REAL mask. That operation returns normally, so the loop ignores all remaining operations in the list.

The INT/INT operation 10 + 10 skips the REAL/REAL mask and first matches my_reject_operation, which throws Operation::Reject, so the loop resumes. Next it matches the NUM/INT mask, but this time the exception is not an expected one, so the loop ends with an error and the operation is not processed.

The INT/REAL operation 10 + 10.0 first matches my_reject_operation, which results in nothing being done. The NUM/INT mask is then skipped because the right operand is REAL, and the next match is my_operation, which returns normally and ends the matching loop.

Finally, running 10 * 10 ignores all the operations that use "+". The only one available throws a my_exception instance, ending the matching loop with an error.
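The matching loop can be modelled in a few dozen lines of standalone C++. This is a sketch under stated assumptions, not cparse's implementation. Operands are reduced to a type tag plus a double. Masks use a simplified rule: ANY_TYPE, an exact tag, or the NUM group. `Reject`, `Entry`, `OpTable`, `exampleTable` and `run` are hypothetical names, and the ANY_OP key is an arbitrary placeholder string. Registering the same entries as the example above reproduces the outcomes just described.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

enum Type { NUM = 0x20, REAL = 0x21, INT = 0x22, ANY_TYPE = 0xFF };
const std::string ANY_OP = "<any>";  // placeholder key for the wildcard

struct Val { Type type; double v; };
struct Reject {};  // stand-in for Operation::Reject: "not me, keep looking"
struct MyException : std::runtime_error { using std::runtime_error::runtime_error; };

using OpFn = std::function<Val(const Val&, const Val&)>;
struct Entry { Type left; Type right; OpFn fn; };

bool matches(Type mask, Type t) {
  if (mask == ANY_TYPE || mask == t) return true;
  return mask == NUM && (t & NUM);  // NUM accepts INT and REAL
}

struct OpTable {
  std::map<std::string, std::vector<Entry>> byOp;  // insertion order kept per operator

  void add(Type l, const std::string& op, Type r, OpFn fn) { byOp[op].push_back({l, r, fn}); }

  // Try entries in order: Reject resumes the loop, any other exception escapes.
  bool tryGroup(const std::string& key, const Val& a, const Val& b, Val& out) const {
    auto it = byOp.find(key);
    if (it == byOp.end()) return false;
    for (const Entry& e : it->second) {
      if (!matches(e.left, a.type) || !matches(e.right, b.type)) continue;
      try { out = e.fn(a, b); return true; } catch (const Reject&) {}
    }
    return false;
  }

  Val evaluate(const Val& a, const std::string& op, const Val& b) const {
    Val out{};
    if (tryGroup(op, a, b, out)) return out;      // the specific operator first
    if (tryGroup(ANY_OP, a, b, out)) return out;  // ANY_OP only as a fallback
    throw std::domain_error("undefined operation: " + op);
  }
};

// Mirrors the registration order of the Startup example above.
OpTable exampleTable() {
  auto rejectOp = [](const Val&, const Val&) -> Val { throw Reject{}; };
  auto throwOp = [](const Val&, const Val&) -> Val { throw MyException("my message"); };
  auto sumOp = [](const Val& l, const Val& r) { return Val{REAL, l.v + r.v}; };
  OpTable t;
  t.add(NUM, ANY_OP, NUM, throwOp);
  t.add(REAL, "+", REAL, sumOp);
  t.add(NUM, "+", NUM, rejectOp);
  t.add(NUM, "+", INT, throwOp);
  t.add(NUM, "+", NUM, sumOp);
  t.add(NUM, "*", NUM, throwOp);
  return t;
}

// Returns "ok <value>" or "error: <message>" for quick experiments.
std::string run(const OpTable& t, Val a, const std::string& op, Val b) {
  try {
    return "ok " + std::to_string(t.evaluate(a, op, b).v);
  } catch (const std::exception& e) {
    return std::string("error: ") + e.what();
  }
}
```

In this model, a "-" between two numbers has no group of its own, so only then does the loop fall back to the ANY_OP entry, which throws.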

Using Masks

The mask system is mostly hidden from the user, and there isn't much to say about it, but one important rule applies:

  1. Every type is denoted in 8 bits.
  2. The uppermost 3 bits have special meaning:
    1. 100 -> Implies a reference type (not accessible by the user)
    2. 010 -> Implies an iterator type, meaning this Token inherits from the Iterable class (i.e. maps, lists and tuples).
    3. 001 -> Implies the token is a number, e.g. an integer or a real number, and possibly a complex number as well.
  3. The remaining 5 bits are normally used to store the type ID. Ideally each type should have a distinct ID, which means there can be at most 2^5 = 32 different types; this is probably enough.

There are also two masks with special meaning: ANY_OP = "0xFF" and ANY_TYPE = 0xFF. Their values are not important, but you should use them when you don't care which operator or which operand type your operation receives.
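The bit layout above can be illustrated with a standalone sketch. The numeric values below are assumptions chosen to follow that layout, with the group flag in the top 3 bits and the type ID in the low 5. They are not necessarily the constants cparse defines. The matching rule is also a simplification: `ANY_TYPE` matches anything, a bare group flag matches any type carrying that flag, and anything else needs an exact match.

```cpp
#include <cassert>
#include <cstdint>

using tok_t = std::uint8_t;

// Group flags in the uppermost 3 bits (illustrative values).
constexpr tok_t REF = 0x80;  // 100x xxxx: reference type (internal only)
constexpr tok_t IT  = 0x40;  // 010x xxxx: iterable (maps, lists, tuples)
constexpr tok_t NUM = 0x20;  // 001x xxxx: numeric

// Illustrative concrete types: group flag | 5-bit type ID.
constexpr tok_t REAL = NUM | 0x01;
constexpr tok_t INT  = NUM | 0x02;
constexpr tok_t LIST = IT | 0x01;
constexpr tok_t MAP  = IT | 0x03;

constexpr tok_t ANY_TYPE = 0xFF;

constexpr tok_t groupOf(tok_t t) { return t & 0xE0; }  // top 3 bits
constexpr tok_t idOf(tok_t t) { return t & 0x1F; }     // low 5 bits: 2^5 = 32 IDs

// Does an operand of type `t` satisfy the operand mask `mask`?
constexpr bool matches(tok_t mask, tok_t t) {
  if (mask == ANY_TYPE) return true;
  if (idOf(mask) == 0 && mask != 0)  // bare group flag such as NUM or IT
    return (groupOf(t) & mask) == mask;
  return mask == t;                  // concrete type: exact match only
}
```

Under this rule a NUM mask accepts both INT and REAL, while a REAL mask rejects an INT operand. This is why the ordering examples below place the more specific mask first.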

Ok, and why is this important?

Masks describe when your operation should be executed and when it should be ignored. For example, the my_sum operation described at the beginning of this page could be written a little differently:

// Note that the order is important:
opMap.add({INT, "+", INT}, &my_int_sum);
opMap.add({NUM, "+", NUM}, &my_sum);

This way, when both operands are integers, a dedicated function (my_int_sum) can always return a token of type INT. Otherwise, the second operation is used, and it always returns a token of type REAL.

In other occasions you might want your operation to accept any operands on one or both sides, e.g.:

opMap.add({ANY_TYPE, "==", ANY_TYPE}, &my_equality_test);

And sometimes you might want to handle all operations with a given pair of operands by yourself, e.g.:

opMap.add({NUM, ANY_OP, NUM}, &my_numerical_operations_handler);
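A handler registered with ANY_OP has to work out which operator it was called for. It can do this through `data->op`, described in the next section. It can also reject operators it does not implement so the matching loop keeps looking. The standalone sketch below shows that shape with plain doubles. `Reject` and `numeric_handler` are hypothetical stand-ins, not cparse types, and the set of operators is only an example.

```cpp
#include <cassert>
#include <cmath>
#include <string>

struct Reject {};  // stand-in for Operation::Reject

// One function serving several operators, dispatching on the operator
// string the way an ANY_OP handler would inspect data->op.
double numeric_handler(const std::string& op, double left, double right) {
  if (op == "+") return left + right;
  if (op == "-") return left - right;
  if (op == "*") return left * right;
  if (op == "/") return left / right;
  if (op == "%") return std::fmod(left, right);
  throw Reject{};  // not handled here: let the matching loop try the next entry
}
```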

Creating Advanced Operations

Sometimes an operation needs contextual data to implement its behavior correctly. To make that possible while keeping the function signature simple, all contextual information used by the parser is available through the third argument of each operation function, the data argument:

  • evaluationData* data

There are four attributes you might want to use. Two of them should already be familiar:

  • data->scope: Use this if you need access to the local scope where the operation is taking place.
  • data->op: Use this if your operation accepts many possible operators and you want to find out which one was used.

The other two are for operations that need to reference the source variable of one of their operands:

  • data->left
  • data->right

Both are of type RefToken and have two important attributes: the key and the origin of your operand:

  • data->left->key, of type INT or STR, and data->left->origin, of type LIST or MAP respectively.
  • data->right->key, of type INT or STR, and data->right->origin, of type LIST or MAP respectively.

Both attributes are packTokens, and their actual ->type depends on where your operand came from:

  • If origin->type == LIST, the key is an index of type INT and the operand came from a TokenList.
  • If origin->type == MAP, the key is a name of type STR and the operand came from a TokenMap.
  • If origin->type == NONE, the operand is a local variable and belongs to the local scope.

If your variable is in the local scope, keep a few concepts in mind:

  1. TokenMaps, especially the ones used for scopes, may have parent prototypes.
  2. The topmost prototype of a local scope is usually the global scope.
  3. Your variable may be declared at any of the prototype levels of the local scope, or not declared at all: it might just be a new variable.

Note: to handle this situation there is a special function, findMap(), that returns the actual TokenMap where your variable is declared (or null if it is a new variable). For example, to get the TokenMap where your left operand was declared, you could write: data->scope.findMap(data->left->key.asString())
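The lookup described in the note above can be pictured as a walk up the prototype chain. The sketch below is a standalone model, not cparse's TokenMap. `Scope`, its members, and the `assign` helper are hypothetical, and values are plain ints. Here `findMap` returns the first scope that declares the name, or `nullptr` for a new variable. `assign` follows the same rule as the built-in assignment operator shown below: new names and names found in the global scope are written to the local scope.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a scope TokenMap with a parent prototype.
struct Scope {
  std::map<std::string, int> vars;
  Scope* parent = nullptr;

  // Return the scope in the chain that declares `name`, or nullptr if
  // it is declared nowhere (i.e. it would be a new variable).
  Scope* findMap(const std::string& name) {
    for (Scope* s = this; s != nullptr; s = s->parent) {
      if (s->vars.count(name)) return s;
    }
    return nullptr;
  }
};

// Write to the declaring scope, except that new names and globals
// are written to the local scope instead.
void assign(Scope& local, const Scope& global, const std::string& name, int value) {
  Scope* owner = local.findMap(name);
  if (owner == nullptr || owner == &global) {
    local.vars[name] = value;
  } else {
    owner->vars[name] = value;
  }
}
```

For a chain local -> function -> global, assigning to a name declared in the function scope updates it there. Assigning to a global name creates a shadowing local variable and leaves the global untouched.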

To make sure your new operator works without unexpected behavior, you should consider all these cases.

The easiest, and also the recommended, approach is to copy and modify the built-in definition of the assignment operator "=", which is defined in the file builtin-features/operators.inc:

packToken Assign(const packToken& left, const packToken& right, evaluationData* data) {
  packToken& key = data->left->key;
  packToken& origin = data->left->origin;

  // If the left operand has a name:
  if (key->type == STR) {
    std::string& var_name = key.asString();

    // If it is an attribute of a TokenMap:
    if (origin->type == MAP) {
      TokenMap& map = origin.asMap();
      map[var_name] = right;

    // If it is a local variable:
    } else {
      // Find the parent map where this variable is stored:
      TokenMap* map = data->scope.findMap(var_name);

      // Note:
      // It is not possible to assign directly to
      // the global scope. It would be easy for the user
      // to do it by accident, thus:
      if (!map || *map == TokenMap::default_global()) {
        data->scope[var_name] = right;
      } else {
        (*map)[var_name] = right;
      }
    }
  // If the left operand has an index number:
  } else if (key->type & NUM) {
    if (origin->type == LIST) {
      TokenList& list = origin.asList();
      size_t index = key.asInt();
      list[index] = right;
    } else {
      throw std::domain_error("Left operand of assignment is not a list!");
    }
  } else {
    throw undefined_operation(data->op, key, right);
  }
  return right;
}