<a href="https://colab.research.google.com/github/bekykm/phd-lowcode-prototypes/blob/main/Lexical_Analysis(Toke_Identification).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**1. Define a Token Class**

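A **token** is the smallest meaningful unit of source text. Common token types are **keywords**, **identifiers**, **operators**, **literals**, and **punctuation**. The lexer in this notebook emits three types: `IDENTIFIER`, `NUMBER`, and `OPERATOR`.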
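
Keywords usually look just like identifiers at the character level, so a common approach is to scan a whole word first and then look it up in a set of reserved words. The following is a minimal sketch of that idea; `KEYWORDS` and `classify_word` are hypothetical names chosen for illustration and are not part of this notebook's lexer.

```python
# Hypothetical reserved words for a small example language (an assumption,
# not taken from the notebook).
KEYWORDS = {"if", "else", "while", "return"}


def classify_word(word: str) -> str:
    """Return 'KEYWORD' for reserved words and 'IDENTIFIER' for anything else."""
    return "KEYWORD" if word in KEYWORDS else "IDENTIFIER"


print(classify_word("while"))  # KEYWORD
print(classify_word("var1"))   # IDENTIFIER
```

With this approach the scanning loop stays the same and only the token type assigned at the end changes.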

Create a class to represent tokens. Each token stores its type and the text that was matched.


In [7]:
class Token:
    # Initialize the token with its type and value
    def __init__(self, type: str, value: str):
        self.type = type  # Token type (e.g., IDENTIFIER, NUMBER)
        self.value = value  # Token value (the actual string)

    # String representation of the token for easier debugging
    def __repr__(self):
        return f'Token({self.type}, {self.value})'
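
Instances of a plain class like the one above compare by identity, so two tokens with the same type and value are not equal. If value-based comparison is wanted (for example in tests), one option is a frozen dataclass. This is a sketch under that assumption, not the class the notebook uses; `SimpleToken` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleToken:  # hypothetical alternative to the Token class above
    type: str   # Token type (e.g., IDENTIFIER, NUMBER)
    value: str  # The matched text

    # Keep the same debugging format as the notebook's Token class;
    # dataclass leaves an explicitly defined __repr__ in place.
    def __repr__(self) -> str:
        return f"Token({self.type}, {self.value})"


a = SimpleToken("NUMBER", "10")
b = SimpleToken("NUMBER", "10")
print(a)       # Token(NUMBER, 10)
print(a == b)  # True
```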

**2. Create a Lexer Class**

The `Lexer` class performs tokenization. It reads the input one character at a time, groups characters into tokens, and tracks its current position in the input.

In [13]:
class Lexer:
    # Initialize the lexer with the input text
    def __init__(self, text: str):
        self.text = text  # Save the input text
        self.position = 0  # Initialize the current position in the text
        # Set the current character based on the initial position
        self.current_char = self.text[self.position] if self.text else None
        self.tokens = []  # List to hold tokens

    # Raise an exception that reports the invalid character and its position
    def error(self):
        raise Exception(f'Invalid character {self.current_char!r} at position {self.position}')

    # Advance the current position and update the current character
    def advance(self):
        self.position += 1  # Move to the next character
        if self.position >= len(self.text):  # Check if the end of input is reached
            self.current_char = None  # Set current character to None
        else:
            self.current_char = self.text[self.position]  # Update the current character

    # Skip whitespace characters
    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()  # Advance while whitespace is encountered

    # Generate an IDENTIFIER token (for variable names)
    def id(self):
        result = ''  # String to build the identifier
        while self.current_char is not None and (self.current_char.isalnum() or self.current_char == '_'):
            result += self.current_char  # Append current character to the result
            self.advance()  # Move to the next character
        return Token('IDENTIFIER', result)  # Return the identifier token

    # Generate a NUMBER token (for numeric literals)
    def number(self):
        result = ''  # String to build the number
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char  # Append current character to the result
            self.advance()  # Move to the next character
        return Token('NUMBER', result)  # Return the number token

    # Generate an OPERATOR token (for +, -, *, / and =)
    def operator(self):
        result = self.current_char  # Current character is the operator
        self.advance()  # Move to the next character
        return Token('OPERATOR', result)  # Return the operator token

    # Main tokenize method to parse the input text and generate tokens
    def tokenize(self):
        while self.current_char is not None:  # Continue until all characters are processed
            if self.current_char.isspace():  # If current character is a whitespace
                self.skip_whitespace()  # Skip the whitespace
                continue  # Continue to the next character
            if self.current_char.isalpha() or self.current_char == '_':  # If current character can start an identifier
                self.tokens.append(self.id())  # Generate an identifier token and append to tokens
                continue  # Continue to the next character
            if self.current_char.isdigit():  # If current character is a digit
                self.tokens.append(self.number())  # Generate a number token and append to tokens
                continue  # Continue to the next character
            if self.current_char in '+-*/=':  # If current character is an operator
                self.tokens.append(self.operator())  # Generate an operator token and append to tokens
                continue  # Continue to the next character
            self.error()  # Raise an error if an invalid character is found
        return self.tokens  # Return the list of generated tokens
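
The `number` method above accepts only runs of digits, so an input such as `3.14` would stop at the dot and raise an error. The sketch below shows one way decimal literals could be scanned. It is a standalone, hypothetical helper (`scan_number`) that works on a string and an index rather than on the `Lexer` state, and it is not part of the notebook's implementation.

```python
from typing import Tuple


def scan_number(text: str, pos: int) -> Tuple[str, int]:
    """Scan an integer or decimal literal starting at text[pos].

    Returns the matched text and the index just past it.
    """
    start = pos
    # Integer part: consume consecutive digits.
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    # Fractional part: consume the dot only when a digit follows it,
    # so "10." leaves the dot for the caller to handle.
    if pos + 1 < len(text) and text[pos] == "." and text[pos + 1].isdigit():
        pos += 1
        while pos < len(text) and text[pos].isdigit():
            pos += 1
    return text[start:pos], pos


print(scan_number("3.14 + x", 0))  # ('3.14', 4)
print(scan_number("10.", 0))       # ('10', 2)
```

Requiring a digit after the dot is a design choice made here for simplicity; other languages allow forms such as `10.` or `.5`.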

**3. Run the Lexer**

Define a `main` function that tokenizes a sample input string and prints each token.

In [14]:
# Function to run the lexer and demonstrate its functionality
def main():
    text = "var1 = 10 + 20 * var2"  # Example input string for the lexer
    lexer = Lexer(text)  # Create a Lexer object with the input text
    tokens = lexer.tokenize()  # Tokenize the input string
    for token in tokens:  # Iterate over the list of tokens
        print(token)  # Print each token

main()  # Call the main function to execute the lexer

Token(IDENTIFIER, var1)
Token(OPERATOR, =)
Token(NUMBER, 10)
Token(OPERATOR, +)
Token(NUMBER, 20)
Token(OPERATOR, *)
Token(IDENTIFIER, var2)
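
As a cross-check, the same token stream can be produced with a single regular expression built from named groups, in the style of the tokenizer example in the Python `re` documentation. This is an alternative sketch, not the notebook's hand-written lexer: `TOKEN_SPEC`, `MASTER`, and `regex_tokenize` are hypothetical names, tokens are plain `(type, value)` tuples, and identifiers are limited to ASCII letters here, whereas `str.isalpha()` also accepts other Unicode letters.

```python
import re

# Order matters: the first alternative that matches at a position wins.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+"),
    ("OPERATOR", r"[+\-*/=]"),
    ("SKIP", r"\s+"),    # Whitespace, discarded
    ("MISMATCH", r"."),  # Any other single character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def regex_tokenize(text):
    """Return a list of (type, value) tuples for the given input text."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup  # Name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(
                f"Invalid character {match.group()!r} at position {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens


result = regex_tokenize("var1 = 10 + 20 * var2")
for kind, value in result:
    print(f"Token({kind}, {value})")
```

For the sample input this prints the same seven lines as the output above.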
