
**Sample Syntax Analysis Implementation**

We'll create a parser that handles basic arithmetic assignment statements like:

* x = 5 + 3;
* y = 7;

The parser checks whether a statement follows this grammar:

**Assumed grammar:**
* stmt        → ID = expr ;
* expr        → term | term + term
* term        → NUMBER
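
Because `expr` allows at most one `+`, this grammar accepts exactly two shapes of token-type sequence. The sketch below makes that concrete. It assumes the token type names `ID`, `EQUALS`, `NUMBER`, `PLUS`, and `SEMI` that the parser in this notebook uses; `ACCEPTED_SHAPES` and `follows_grammar` are hypothetical names introduced only for illustration.

```python
# The grammar is finite, so every valid statement has one of two shapes.
ACCEPTED_SHAPES = {
    ("ID", "EQUALS", "NUMBER", "SEMI"),                    # expr -> term
    ("ID", "EQUALS", "NUMBER", "PLUS", "NUMBER", "SEMI"),  # expr -> term + term
}

def follows_grammar(token_types):
    """Return True if the sequence of token types forms a valid stmt."""
    return tuple(token_types) in ACCEPTED_SHAPES

print(follows_grammar(["ID", "EQUALS", "NUMBER", "SEMI"]))  # y = 7;  -> True
print(follows_grammar(["ID", "EQUALS", "PLUS", "SEMI"]))    # x = + ; -> False
```

A real parser does more than accept or reject: it also records the structure of the statement, which is what the AST is for.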


**Define the node classes of the Abstract Syntax Tree (AST)** that syntax analysis uses to represent **assignment statements** and **arithmetic expressions**.

In [5]:
# ------------------------------
# Syntax Analysis (AST Construction)
# ------------------------------

# Define AST node classes
class ASTNode:
    pass

class AssignNode(ASTNode):
    def __init__(self, variable, expression):
        self.variable = variable
        self.expression = expression
    def __str__(self):
        return f"Assign({self.variable}, {self.expression})"

class NumberNode(ASTNode):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return f"{self.value}"

class PlusNode(ASTNode):
    def __init__(self, left, right):
        self.left = left
        self.right = right
    def __str__(self):
        return f"Plus({self.left}, {self.right})"
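
To see how these nodes compose, the tree for `x = 5 + 3;` can be built by hand and walked recursively. This is a minimal sketch: the three node classes are repeated in compact form only so the block runs on its own, and `evaluate` is a hypothetical helper that is not part of the parser.

```python
# Compact copies of the node classes above, so this block is self-contained.
class AssignNode:
    def __init__(self, variable, expression):
        self.variable = variable
        self.expression = expression

class NumberNode:
    def __init__(self, value):
        self.value = value

class PlusNode:
    def __init__(self, left, right):
        self.left = left
        self.right = right

def evaluate(node):
    """Hypothetical helper: recursively compute the value of an expression node."""
    if isinstance(node, NumberNode):
        return node.value
    if isinstance(node, PlusNode):
        return evaluate(node.left) + evaluate(node.right)
    raise TypeError(f"Not an expression node: {type(node).__name__}")

# Hand-built tree for: x = 5 + 3;
tree = AssignNode("x", PlusNode(NumberNode(5), NumberNode(3)))
print(tree.variable, "=", evaluate(tree.expression))  # prints: x = 8
```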

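**Main function that parses the token list and builds the AST.** Each grammar rule is handled by its own nested helper: `parse_stmt`, `parse_expr`, and `parse_term`.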

In [6]:
# ------------------------------
# Parser Implementation
# ------------------------------

def parse(tokens):
    """
    Parse the token list into an AST according to the grammar.
    Returns the root AST node or None if syntax error occurs.
    """
    pos = 0 # Token position tracker

    def current_token():   # Returns the current token or None if out of bounds.
        return tokens[pos] if pos < len(tokens) else None

    def consume(expected_type):  # Consumes and returns the current token if it matches the expected type.
        nonlocal pos
        token = current_token()
        if token and token[0] == expected_type:
            pos += 1
            return token
        print(f"Syntax Error: Expected {expected_type}, got {token}")
        return None

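    def parse_term():   # Parses a numeric literal and returns a NumberNode.
        token = current_token()
        if token and token[0] == "NUMBER":
            number = consume("NUMBER")
            return NumberNode(int(number[1]))
        # Report the failure instead of returning None silently.
        print(f"Syntax Error: Expected NUMBER, got {token}")
        return None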

    def parse_expr():   # Parses an expression: either a single number or an addition of two terms.
        left = parse_term()
        if not left:
            return None
        token = current_token()
        if token and token[0] == "PLUS":
            consume("PLUS")
            right = parse_term()
            if not right:
                return None
            return PlusNode(left, right)
        return left

    def parse_stmt():    # Parses an assignment statement of the form: ID = expr ;
        token = current_token()
        if token and token[0] == "ID":
            var = consume("ID")[1]
            if not consume("EQUALS"):
                return None
            expr = parse_expr()
            if not expr:
                return None
            if not consume("SEMI"):
                return None
            return AssignNode(var, expr)
        print("Syntax Error: Expected ID at the beginning of statement.")
        return None

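    stmt = parse_stmt()
    # The grammar describes a single statement, so reject leftover tokens.
    if stmt is not None and current_token() is not None:
        print(f"Syntax Error: Unexpected token {current_token()} after end of statement.")
        return None
    return stmt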
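
The parser above reports errors by printing a message and returning `None`. An alternative, sketched below under stated assumptions, is to raise an exception so that callers can catch and handle the failure. `ParseError` and `parse_strict` are hypothetical names, and the sketch returns nested tuples rather than the AST node classes to stay short; it is a variant for comparison, not the notebook's implementation.

```python
class ParseError(Exception):
    """Hypothetical exception type for syntax errors."""

def parse_strict(tokens):
    """Parse `ID = expr ;` and return ("assign", name, expr), or raise ParseError."""
    pos = 0  # Token position tracker

    def consume(expected_type):
        nonlocal pos
        token = tokens[pos] if pos < len(tokens) else None
        if token is None or token[0] != expected_type:
            raise ParseError(f"Expected {expected_type}, got {token}")
        pos += 1
        return token

    def parse_term():
        return ("num", int(consume("NUMBER")[1]))

    def parse_expr():
        left = parse_term()
        # Optional "+ term" part of the expr rule.
        if pos < len(tokens) and tokens[pos][0] == "PLUS":
            consume("PLUS")
            return ("plus", left, parse_term())
        return left

    name = consume("ID")[1]
    consume("EQUALS")
    expr = parse_expr()
    consume("SEMI")
    return ("assign", name, expr)

try:
    parse_strict([("ID", "x"), ("EQUALS", "="), ("SEMI", ";")])  # x = ;
except ParseError as err:
    print("Syntax Error:", err)
```

With exceptions, the helpers no longer need a `None` check after every call, at the cost of requiring a `try`/`except` at the call site.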

**Token Input Simulation**

The cell below builds a list of tokens that simulates the lexical-analysis output for the statement ***x = 5 + 3;***. Each token is a `(type, value)` tuple.

In [12]:
# Simulated tokens for: x = 5 + 3;
tokens = [
    ("ID", "x"),
    ("EQUALS", "="),
    ("NUMBER", "5"),
    ("PLUS", "+"),
    ("NUMBER", "3"),
    ("SEMI", ";")
]

# Parse and print the AST
ast = parse(tokens)
print("AST Output:")
print(ast)


AST Output:
Assign(x, Plus(5, 3))
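
The token list above is written by hand. A small regex-based lexer could produce the same `(type, value)` tuples from source text. This is a minimal sketch, not the notebook's lexer: `TOKEN_SPEC`, `MASTER_RE`, and `tokenize` are hypothetical names, and the identifier pattern is an assumption, since the grammar does not define what an `ID` looks like.

```python
import re

# Hypothetical token patterns; the ID pattern is an assumption.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("EQUALS", r"="),
    ("PLUS",   r"\+"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),  # whitespace, discarded
    ("ERROR",  r"."),    # any other character
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (type, value) tuples for the given source text."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup  # name of the pattern that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"Unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens

print(tokenize("x = 5 + 3;"))
```

The resulting list has the same shape as the simulated `tokens` list, so it could be passed directly to `parse`.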
