Skip to content

dansandu/glyph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

81 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Glyph

A simple C++ implementation of a CLR(1) parser (canonical left to right parser with one lookahead terminal).

Building and deploying the project

The project uses praline to build and deploy the project. Praline can be cloned from here. Once cloned, the project can be built and deployed using the following console command:

praline.py deploy

This will build and deploy the glyph artifact to the repository making it available to other projects.

Simple symbolic calculator

The following is a simple symbolic calculator application. The project is a directory named calculator with a file inside it named Pralinefile that has the following contents:

organization: foobar
artifact: calculator
version: 1.0.0
artifact_type: executable
dependencies:
- organization: dansandu
  artifact: glyph
  version: 1.0.0.SNAPSHOT

Running the following command inside the project directory console will quickly create the project skeleton:

praline.py main

The command will create a "Hello, world!" executable. Moving forward, the code will be changed to evaluate formulae from the command line. In the project directory open the main source file sources/foobar/calculator/executable.cpp and change it to:

#include "dansandu/glyph/node.hpp"
#include "dansandu/glyph/parser.hpp"
#include "dansandu/glyph/regex_tokenizer.hpp"

#include <iostream>
#include <string>
#include <string_view>
#include <vector>

using dansandu::glyph::node::Node;
using dansandu::glyph::parser::Parser;
using dansandu::glyph::regex_tokenizer::RegexTokenizer;

template<typename T>
auto pop(std::vector<T>& stack)
{
    auto value = std::move(stack.back());
    stack.pop_back();
    return value;
}

int main(int argumentsCount, char** arguments)
{
    if (argumentsCount <= 1)
    {
        std::cout
            << "Pass a formula as a program argument in order to parse it!"
            << std::endl;
        return 0;
    }

    // This is the grammar used to parse arithmetic formulae. Non-terminals
    // start with capital letters while terminals start with lower case
    // letters. Both non-terminals and terminals are alphanumeric. The indexing
    // of each production rule, marked with comments, is used to evaluate the
    // abstract syntax tree yielded by the parser. There are many grammars
    // which recognize arithmetic formulae. This one has operator precedence
    // as a side effect of its structure.
    const auto grammar = R"(
        /*0*/ Start    -> Sums
        /*1*/ Sums     -> Sums add Products
        /*2*/ Sums     -> Products
        /*3*/ Products -> Products multiply Value
        /*4*/ Products -> Value
        /*5*/ Value    -> number
    )";

    // Creating the parser from the grammar.
    const auto parser = Parser{grammar};

    // Each symbol in the grammar is mapped to an integer by the parser to
    // speed up parsing.
    const auto add      = parser.getTerminalSymbol("add");
    const auto multiply = parser.getTerminalSymbol("multiply");
    const auto number   = parser.getTerminalSymbol("number");

    // Whitespace doesn't influence the parsing so it can be discarded from the
    // original string. This will simplify the grammar. Here the whitespace
    // symbol is mapped to a special symbol which is skipped by the parser when
    // encountered.
    const auto whitespace = parser.getDiscardedSymbolPlaceholder();

    // Because terminals are alphanumeric the symbols need to be mapped to
    // their actual representation in the input string. The regex tokenizer
    // maps terminals to patterns. The order of the patterns matters because
    // the first matching pattern is used to generate the token.
    const auto tokenizer =
        RegexTokenizer{{{add,        "\\+"},
                        {multiply,   "\\*"},
                        {number,     "(?:[1-9]\\d*|0)(?:\\.\\d+)?"},
                        {whitespace, "\\s+"}}};

    // The formula to be parsed is an arbitrary string given as a program
    // argument. The abstract syntax tree is generated by the parser.
    const auto formula = std::string_view{arguments[1]};
    const auto nodes   = parser.parse(formula, tokenizer);

    // The abstract syntax tree can be processed with a stack to avoid
    // recursive functions.
    auto stack = std::vector<double>{};

    // Each node in the abstract syntax tree can be either a token from the
    // input string or a production rule index. The left hand side non-terminal
    // of the production rule (before ->) is the result of composing
    // non-terminals and terminals from the right hand side of the rule
    // (after ->).
    for (const auto& node : nodes)
    {
        if (node.isToken())
        {
            // The parser has encountered a terminal in the formula. Terminals
            // are mapped to tokens.
            const auto token = node.getToken();

            // For this application only number tokens are considered. These
            // are the only symbols which hold values. The add and multiply
            // terminals are used in conjunction with their production rule and
            // therefore can be omitted.
            if (token.getSymbol() == number)
            {
                // The number is extracted from the string, converted to double
                // and pushed to the stack. Later on these values are composed
                // using production rules.
                const auto string =
                    std::string{formula.cbegin() + token.begin(),
                                formula.cbegin() + token.end()};
                stack.push_back(std::stod(string));
            }
        }
        else
        {
            // The parser has applied a production rule. The
            // node.getRuleIndex() invocation yields the index of the applied
            // production rule. These indices are used to compose values from
            // the stack.
            switch (node.getRuleIndex())
            {
            case 1:
            {
                // The parser has applied the second production rule from the
                // grammar. The last two values are popped from the stack and
                // summed togheter. The sum is commutative and the order the
                // values are popped from the stack doesn't matter. For an
                // uncommutative operation like subtraction the values should
                // be popped in reverse.
                const auto rhs = pop(stack);
                const auto lhs = pop(stack);
                stack.push_back(lhs + rhs);
                break;
            }
            case 3:
            {
                // The parser has applied the 4th rule. The last two values are
                // popped from the stack and multiplied togheter.
                const auto rhs = pop(stack);
                const auto lhs = pop(stack);
                stack.push_back(lhs * rhs);
                break;
            }
            case 0:
            case 2:
            case 4:
            case 5:
                // These production rules have no net effect on the result of
                // the formula. They just forward values.
                break;
            default:
                // All production rule indices must be handled by the switch.
                throw std::logic_error{"production rules were not exhausted"};
            }
        }
    }

    // The stack should now hold the final result.
    std::cout << formula << " = " << stack.back() << std::endl;

    return 0;
}

The following command runs the application given some program arguments:

praline.py main --arguments "3 + 5 * 10 + 40"

The command should print something like this:

3 + 5 * 10 + 40 = 93

This was the simple symbolic calculator. More features can be added such as subtraction, division, powers, parentheses, signed values, variables, functions or even a fully-fledged programming language. Click here to see a more sophisticated example.