In [1]:
%pylab inline
from enum import Enum

Populating the interactive namespace from numpy and matplotlib


# Algebra Solver

## Problem statement

The first stage is to evaluate algebraic expressions that contain only operations and constants.

For example, solve: $\dfrac{1}{2}\cdot3+5\cdot(2\cdot7^3)+(-1)^2$.

In later stages it might be desirable to solve equalities and inequalities, and to express one variable in terms of the others.

### Operations

We should be able to support the following:

* Addition/subtraction
* Multiplication/division (also implicit multiplication)
* Parentheses
* Exponents
* Square root
* Trigonometric functions
* Logarithms

### Order of operations

We should follow PEMDAS:

1. Parentheses
2. Exponents
3. Multiplication/Division
4. Addition/Subtraction
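
As a rough illustration of how this ordering could be encoded, here is a minimal sketch of a precedence table. The names `PRECEDENCE`, `RIGHT_ASSOCIATIVE` and `binds_tighter` are hypothetical and not part of the notebook; the keys reuse the operator names that the tokenizer in the Parsing section produces. Parentheses are left out because they group sub-expressions rather than compete with other operators.

```python
# Hypothetical precedence table: a higher number binds more tightly.
PRECEDENCE = {
    'add': 1, 'subtract': 1,
    'multiply': 2, 'divide': 2,
    'power': 3,
}

# Exponentiation is conventionally right-associative: 2^3^2 == 2^(3^2).
RIGHT_ASSOCIATIVE = {'power'}

def binds_tighter(left_op, right_op):
    """Return True if `left_op` should be applied before `right_op`,
    given that `left_op` appears to the left of `right_op`."""
    if PRECEDENCE[left_op] != PRECEDENCE[right_op]:
        return PRECEDENCE[left_op] > PRECEDENCE[right_op]
    # Equal precedence: left-to-right, unless the operator is right-associative.
    return left_op not in RIGHT_ASSOCIATIVE

print(binds_tighter('multiply', 'add'))  # True
print(binds_tighter('add', 'multiply'))  # False
```

Multiplication and division share a level, as do addition and subtraction, so within a level the operators are applied from left to right.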

## Parsing

### Token class

A token holds its type and its value. The possible types are:

**Types:** `operator`, `function`, `term`, `constant`

In [106]:
class Token():
    def __init__(self, type, value):
        self.type = type
        self.value = value
    
    def __repr__(self):
        return "['%s': '%s']" % (self.type, self.value)

### Tokenizer class

This is the engine of the parser. It converts the characters and digits of the expression into a list of tokens, which can be used in lexical analysis later on.

**Whitespace characters:** `<space>`, `\n`, `\r`, `\t`.

**Supported operators:** `+`, `-`, `*`, `/`, `=`, `^`, `(`, `)`, `,`.

**Supported functions:** `sqrt`, `sin`, `asin`, `cos`, `acos`, `tan`, `atan`, `log`, `ln`.

In [8]:
import string

class Tokenizer():
    whitespace = [' ', '\r', '\n', '\t']
    operators = {'+':'add', '-':'subtract', '*':'multiply', '/':'divide', '=':'equal',
               '^':'power', '(':'left-parenthesis', ')':'right-parenthesis', ',':'comma'}
    functions = ['sqrt', 'sin', 'asin', 'cos', 'acos', 'tan', 'atan', 'log', 'ln']
    
    def __init__(self, expression):
        self.expression = expression
        self.index = 0
        self.tokens = []
        self.tokenize()
        
    def next(self, skip = 1):
        self.index += skip
        
    def end(self):
        return self.index >= len(self.expression)
    
    def nextCharacter(self):
        return self.expression[self.index:self.index+1]
    
    def eatWhitespace(self):
        while self.nextCharacter() in self.whitespace:
            self.next()
            
    def eatDigit(self):
        digit = ''
        while self.nextCharacter().isdigit() or self.nextCharacter() == '.':
            digit += self.nextCharacter()
            self.next()
        return digit
    
    def isLetter(self, character):
        # nextCharacter() returns '' at the end of the expression, and the empty
        # string is a substring of every string, so it must be excluded explicitly.
        return character != '' and character in string.ascii_letters

    def eatWord(self):
        word = ''
        while self.isLetter(self.nextCharacter()):
            word += self.nextCharacter()
            self.next()
        return word

    def tokenize(self):
        while not self.end():
            self.eatWhitespace()
            if self.end():
                # Only trailing whitespace was left.
                break
            character = self.nextCharacter()
            if character in self.operators:
                token = Token('operator', self.operators[character])
                self.tokens.append(token)
                self.next()
            elif self.isLetter(character):
                start = self.index
                word = self.eatWord()
                if word in self.functions:
                    token = Token('function', word)
                elif len(word) == 1:
                    token = Token('term', word)
                else:
                    raise ValueError("Unknown word '%s' at index %i" % (word, start))
                self.tokens.append(token)
            elif character.isdigit():
                digit = self.eatDigit()
                token = Token('constant', digit)
                self.tokens.append(token)
            else:
                raise ValueError('Invalid character at index %i' % self.index)

We can use the tokenizer with any mathematical expression, like:

In [9]:
tokenizer = Tokenizer('(2g + sqrt(36x) * ln(2) * acos(6)A * 2) / 3.14 tan(5) = log(23,5) + 3')

This is an example of the tokenizer's output. The result is a list of `Token` objects.

In [10]:
tokenizer.tokens

[['operator': 'left-parenthesis'],
 ['constant': '2'],
 ['term': 'g'],
 ['operator': 'add'],
 ['function': 'sqrt'],
 ['operator': 'left-parenthesis'],
 ['constant': '36'],
 ['term': 'x'],
 ['operator': 'right-parenthesis'],
 ['operator': 'multiply'],
 ['function': 'ln'],
 ['operator': 'left-parenthesis'],
 ['constant': '2'],
 ['operator': 'right-parenthesis'],
 ['operator': 'multiply'],
 ['function': 'acos'],
 ['operator': 'left-parenthesis'],
 ['constant': '6'],
 ['operator': 'right-parenthesis'],
 ['term': 'A'],
 ['operator': 'multiply'],
 ['constant': '2'],
 ['operator': 'right-parenthesis'],
 ['operator': 'divide'],
 ['constant': '3.14'],
 ['function': 'tan'],
 ['operator': 'left-parenthesis'],
 ['constant': '5'],
 ['operator': 'right-parenthesis'],
 ['operator': 'equal'],
 ['function': 'log'],
 ['operator': 'left-parenthesis'],
 ['constant': '23'],
 ['operator': 'comma'],
 ['constant': '5'],
 ['operator': 'right-parenthesis'],
 ['operator': 'add'],
 ['constant': '3']]
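
In the output above, implicit multiplication such as `2g` or `36x` shows up as two adjacent tokens with no operator between them. One possible way to make it explicit before building a tree is a small pass over the token list. The sketch below is an assumption, not part of the notebook: `insert_implicit_multiplication` is a hypothetical helper, and tokens are represented as plain `(type, value)` tuples that mirror `Token.type` and `Token.value`, so that the cell runs on its own.

```python
def insert_implicit_multiplication(tokens):
    """Return a new token list with a 'multiply' operator inserted wherever
    one operand ends and the next one starts without an operator in between.
    `tokens` is a list of (type, value) tuples."""
    def ends_operand(token):
        return (token[0] in ('constant', 'term')
                or token == ('operator', 'right-parenthesis'))

    def starts_operand(token):
        return (token[0] in ('constant', 'term', 'function')
                or token == ('operator', 'left-parenthesis'))

    result = []
    for token in tokens:
        # A function followed by '(' is a call, not a product: a function
        # token never counts as the end of an operand.
        if result and ends_operand(result[-1]) and starts_operand(token):
            result.append(('operator', 'multiply'))
        result.append(token)
    return result

# Tokens for `sqrt(36x)`
tokens = [('function', 'sqrt'), ('operator', 'left-parenthesis'),
          ('constant', '36'), ('term', 'x'), ('operator', 'right-parenthesis')]
for token in insert_implicit_multiplication(tokens):
    print(token)
```

With this rule, two adjacent constants such as `2 3` would also be multiplied; whether that should be an error instead is a design decision left open here.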

## Expression tree

The next step is to create an expression tree from the tokens. We should be able to solve expressions with only constants and operations.

### Node class

In [11]:
class Node():
    def __init__(self, left, right, operation):
        self.left = left
        self.right = right
        self.operation = operation
        
class Constant():
    def __init__(self, value):
        self.value = value
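
To show how these two classes could fit together, the following sketch builds the tree for `1+2*5` by hand and evaluates it recursively. The `OPERATIONS` table and the `evaluate` function are hypothetical additions; the two classes are repeated only so that the cell runs on its own.

```python
import operator

# Repeated from the cell above so that this sketch is self-contained.
class Node():
    def __init__(self, left, right, operation):
        self.left = left
        self.right = right
        self.operation = operation

class Constant():
    def __init__(self, value):
        self.value = value

# Hypothetical mapping from the tokenizer's operator names to Python functions.
OPERATIONS = {
    'add': operator.add,
    'subtract': operator.sub,
    'multiply': operator.mul,
    'divide': operator.truediv,
    'power': operator.pow,
}

def evaluate(node):
    """Evaluate a tree of Node and Constant objects bottom-up."""
    if isinstance(node, Constant):
        return node.value
    return OPERATIONS[node.operation](evaluate(node.left), evaluate(node.right))

# 1 + (2 * 5): the multiplication sits deeper in the tree, so it is evaluated first.
tree = Node(Constant(1), Node(Constant(2), Constant(5), 'multiply'), 'add')
print(evaluate(tree))  # 11
```

The order of operations is encoded entirely in the shape of the tree, which is why the evaluator itself needs no precedence rules.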

### ExpressionTree class

In [100]:
class ExpressionTree():
    def __init__(self, tokens):
        self.tokens = tokens
        self.stack = []
        self.index = 0
        self.parse()
        
    def next(self, skip = 1):
        self.index += skip
        
    def peek(self):
        return self.stack[-1]
        
    def end(self):
        return self.index >= len(self.tokens)
    
    def nextToken(self):
        return self.tokens[self.index:self.index+1][0]
    
    def findClosingParenthesisOffset(self):
        parenthesis = 1
        offset = 0
        while parenthesis > 0:
            if self.index+offset >= len(self.tokens):
                return -1
            if self.tokens[self.index+offset:self.index+offset+1][0].value == 'right-parenthesis':
                parenthesis -= 1
            elif self.tokens[self.index+offset:self.index+offset+1][0].value == 'left-parenthesis':
                parenthesis += 1
            offset += 1
        return offset-1
    
    def parse(self):
        iteration = 0
        while(not self.end()):
            token = self.nextToken()
            print('Iteration %i, Stack: %s, Token: %s' % (iteration, self.stack, token))
            if token.type == 'constant':
                if len(self.stack) > 0 and self.peek().type == 'operator':
                    operation = self.stack.pop()
                    ltoken = self.stack.pop()
                    if operation.value == 'add':
                        self.stack.append(Token('constant', int(ltoken.value) + int(token.value)))
                    elif operation.value == 'multiply':
                        self.stack.append(Token('constant', int(ltoken.value) * int(token.value)))
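                    elif operation.value == 'left-parenthesis':
                        offset = self.findClosingParenthesisOffset()
                        print(offset+self.index)
                    else:
                        # Fail loudly instead of silently dropping both operands.
                        raise ValueError('Operator %s is not supported yet.' % operation.value)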
                else:
                    self.stack.append(token)
            elif token.type == 'operator':
                self.stack.append(token)
            elif token.type == 'function':
                raise ValueError('Function at index %i is not supported.' % self.index)
            elif token.type == 'term':
                raise ValueError('Term at index %i is not supported.' % self.index)
            else:
                raise ValueError('Unrecognized token type: %s' % token.type)
            self.next()
            iteration+=1

We can use this to parse the tokens into an expression tree, like so:

In [105]:
tokenizer = Tokenizer('1+2*5')
expressionTree = ExpressionTree(tokenizer.tokens)
print('The result is: %i ' % expressionTree.stack[0].value)

Iteration 0, Stack: [], Token: ['constant': '1']
Iteration 1, Stack: [['constant': '1']], Token: ['operator': 'add']
Iteration 2, Stack: [['constant': '1'], ['operator': 'add']], Token: ['constant': '2']
Iteration 3, Stack: [['constant': '3']], Token: ['operator': 'multiply']
Iteration 4, Stack: [['constant': '3'], ['operator': 'multiply']], Token: ['constant': '5']
The result is: 15 


**Issues / current state**

The current implementation does not account for the order of operations: in the example above, `1+2*5` evaluates to 15 instead of 11 because the tokens are reduced strictly from left to right. It also evaluates the expression directly, whereas it should build an expression tree; this version was implemented only as a test. The parenthesis handling is untested. The idea was to call `ExpressionTree` recursively for each set of parentheses and push the output onto the stack, or to create a simplified token list (preferred).

Finally, some sort of state machine should cycle through the order of operations and simplify the equation accordingly. In theory it should be easy to solve it this way.

Other goal: try to solve the equations with a `Stack` structure, there is probably a method but I don't want to Google that yet.
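
For reference, one well-known stack-based approach is Dijkstra's shunting-yard algorithm, which keeps one stack for operands and one for pending operators. The sketch below is a simplified variant under several assumptions: tokens are plain `(type, value)` tuples mirroring `Token.type` and `Token.value`, only binary operators and parentheses are handled (no functions, terms, or unary minus), and the names `PRECEDENCE`, `APPLY` and `evaluate_tokens` are hypothetical.

```python
import operator

PRECEDENCE = {'add': 1, 'subtract': 1, 'multiply': 2, 'divide': 2, 'power': 3}
RIGHT_ASSOCIATIVE = {'power'}
APPLY = {'add': operator.add, 'subtract': operator.sub,
         'multiply': operator.mul, 'divide': operator.truediv,
         'power': operator.pow}

def evaluate_tokens(tokens):
    """Evaluate a list of (type, value) tokens using two stacks."""
    values = []     # operand stack
    operators = []  # pending operators and left parentheses

    def reduce_top():
        # Apply the operator on top of the stack to the two topmost operands.
        op = operators.pop()
        right = values.pop()
        left = values.pop()
        values.append(APPLY[op](left, right))

    for kind, value in tokens:
        if kind == 'constant':
            values.append(float(value))
        elif kind != 'operator':
            raise ValueError('Unsupported token type: %s' % kind)
        elif value == 'left-parenthesis':
            operators.append(value)
        elif value == 'right-parenthesis':
            while operators and operators[-1] != 'left-parenthesis':
                reduce_top()
            if not operators:
                raise ValueError('Unbalanced parentheses')
            operators.pop()  # discard the matching left parenthesis
        elif value in PRECEDENCE:
            # Reduce every pending operator that must be applied first.
            while (operators and operators[-1] != 'left-parenthesis'
                   and (PRECEDENCE[operators[-1]] > PRECEDENCE[value]
                        or (PRECEDENCE[operators[-1]] == PRECEDENCE[value]
                            and value not in RIGHT_ASSOCIATIVE))):
                reduce_top()
            operators.append(value)
        else:
            raise ValueError('Unsupported operator: %s' % value)

    while operators:
        if operators[-1] == 'left-parenthesis':
            raise ValueError('Unbalanced parentheses')
        reduce_top()
    if len(values) != 1:
        raise ValueError('Malformed expression')
    return values[0]

# Tokens for `1+2*5`
tokens = [('constant', '1'), ('operator', 'add'), ('constant', '2'),
          ('operator', 'multiply'), ('constant', '5')]
print(evaluate_tokens(tokens))  # 11.0
```

The same loop can build `Node` objects instead of numbers by pushing nodes onto the operand stack in `reduce_top`, which would give an expression tree rather than a result.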

## Lexical analysis

In this phase we verify that the entered syntax is valid. The checks in this phase are:

* Is there more than one `=` sign in the expression?
* Does every function get the right number of arguments?
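
The two checks above could be sketched on a flat token list as follows. This is only an illustration under stated assumptions: tokens are plain `(type, value)` tuples mirroring `Token.type` and `Token.value`, the argument counts in `FUNCTION_ARITY` are guesses (in particular, `log` is assumed to accept an optional base, as in `log(23,5)`), and `count_arguments` and `check_tokens` are hypothetical names.

```python
# Assumed argument counts per function; the notebook does not specify them.
FUNCTION_ARITY = {
    'sqrt': (1,), 'sin': (1,), 'asin': (1,), 'cos': (1,), 'acos': (1,),
    'tan': (1,), 'atan': (1,), 'ln': (1,), 'log': (1, 2),
}

def count_arguments(tokens, open_index):
    """Count the comma-separated arguments of the call whose '(' is at
    `open_index`. Returns None if that parenthesis is never closed."""
    depth = 0
    commas = 0
    for j in range(open_index, len(tokens)):
        kind, value = tokens[j]
        if kind != 'operator':
            continue
        if value == 'left-parenthesis':
            depth += 1
        elif value == 'right-parenthesis':
            depth -= 1
            if depth == 0:
                # '()' directly after the function name means no arguments.
                return 0 if j == open_index + 1 else commas + 1
        elif value == 'comma' and depth == 1:
            # Commas inside nested calls belong to those calls.
            commas += 1
    return None

def check_tokens(tokens):
    """Return a list of error messages; an empty list means both checks pass."""
    errors = []
    equal_signs = tokens.count(('operator', 'equal'))
    if equal_signs > 1:
        errors.append('Found %i "=" signs, expected at most 1' % equal_signs)
    for i, (kind, value) in enumerate(tokens):
        if kind != 'function':
            continue
        if i + 1 >= len(tokens) or tokens[i + 1] != ('operator', 'left-parenthesis'):
            errors.append('Function %s at index %i is not followed by "("' % (value, i))
            continue
        count = count_arguments(tokens, i + 1)
        if count is None:
            errors.append('Function %s at index %i is never closed' % (value, i))
        elif count not in FUNCTION_ARITY[value]:
            errors.append('Function %s at index %i got %i argument(s)' % (value, i, count))
    return errors

# Tokens for `sqrt(1,2)`: one argument too many.
tokens = [('function', 'sqrt'), ('operator', 'left-parenthesis'),
          ('constant', '1'), ('operator', 'comma'), ('constant', '2'),
          ('operator', 'right-parenthesis')]
print(check_tokens(tokens))
```

Collecting messages in a list rather than raising on the first problem lets all syntax errors be reported at once.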

### LexicalAnalysis class

In [None]:
class LexicalAnalysis():
    def __init__(self, expressionTree):
        self.expressionTree = expressionTree