
- `'IDENTIFIER'`: This pattern uses `^[a-zA-Z_][a-zA-Z0-9_]*$`. 
  - `^`: Asserts the start of the string.
  - `[a-zA-Z_]`: Matches a single letter or underscore at the beginning.
  - `[a-zA-Z0-9_]*`: Matches zero or more occurrences of letters, digits, or underscores.
  - `$`: Asserts the end of the string.

- `'INTEGER'`: This pattern uses `^[0-9]+$`.
  - `^`: Asserts the start of the string.
  - `[0-9]+`: Matches one or more digits.
  - `$`: Asserts the end of the string.

- `'FLOAT'`: This pattern uses `^[0-9]+\.[0-9]+$`.
  - `^`: Asserts the start of the string.
  - `[0-9]+\.[0-9]+`: Matches one or more digits, a literal dot (`\.`), and one or more digits. This covers simple floats such as `3.14`; forms like `.5`, `5.`, or `1e3` are not matched.
  - `$`: Asserts the end of the string.

- `'OPERATOR'`: This pattern uses `^[+\-*/%]$`.
  - `^`: Asserts the start of the string.
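  - `[+\-*/%]`: Matches a single character among `+`, `-`, `*`, `/`, or `%`. The hyphen is escaped (`\-`) so that it is read as a literal character rather than as a range inside the character class.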
  - `$`: Asserts the end of the string.

- `'PUNCTUATION'`: This pattern uses `^[;,()]$`.
  - `^`: Asserts the start of the string.
  - `[;,()]`: Matches a single character among `;`, `,`, `(`, or `)`.
  - `$`: Asserts the end of the string.

- `'KEYWORD'`: This pattern uses `^(if|else|while|for|int|float|return|def|class|import|from)$`.
  - `^`: Asserts the start of the string.
  - `(if|else|while|for|int|float|return|def|class|import|from)`: Matches one of the specified keywords.
  - `$`: Asserts the end of the string.
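
As a quick check of how the anchors behave in Python's `re` module, the sketch below tests a few sample lexemes against the `'INTEGER'` and `'FLOAT'` patterns. The sample strings and the constant names `INTEGER` and `FLOAT` are assumptions made for illustration. One detail worth knowing: in Python, `$` also matches just before a single trailing newline, so `re.fullmatch` is the stricter choice when the input may not have been stripped.

```python
import re

# Patterns copied from the list above.
INTEGER = r'^[0-9]+$'
FLOAT = r'^[0-9]+\.[0-9]+$'

# Illustrative lexemes (assumed for this sketch).
print(bool(re.match(INTEGER, '42')))     # True: the whole string is digits
print(bool(re.match(INTEGER, '42abc')))  # False: `$` rejects the trailing letters
print(bool(re.match(FLOAT, '3.14')))     # True: digits, a dot, more digits
print(bool(re.match(FLOAT, '.5')))       # False: no digits before the dot
print(bool(re.match(FLOAT, '5.')))       # False: no digits after the dot

# `$` also matches just before a trailing newline; fullmatch does not allow it.
print(bool(re.match(INTEGER, '42\n')))      # True
print(bool(re.fullmatch(INTEGER, '42\n')))  # False
```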

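Each regular expression describes one token type, so the program can classify a lexeme by testing it against the patterns in turn and reporting the first type that matches. The order of the tests matters: every keyword (for example `if`) also matches the `'IDENTIFIER'` pattern, so `'KEYWORD'` must be tested before `'IDENTIFIER'`.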
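
The overlap between patterns can be seen by listing every token type that matches a given lexeme. This is a minimal sketch: the helper name `matching_types` and the sample lexemes are hypothetical, chosen only for illustration.

```python
import re

# Patterns copied from the list above.
TOKEN_PATTERNS = {
    'IDENTIFIER': r'^[a-zA-Z_][a-zA-Z0-9_]*$',
    'INTEGER': r'^[0-9]+$',
    'FLOAT': r'^[0-9]+\.[0-9]+$',
    'OPERATOR': r'^[+\-*/%]$',
    'PUNCTUATION': r'^[;,()]$',
    'KEYWORD': r'^(if|else|while|for|int|float|return|def|class|import|from)$',
}

def matching_types(lexeme):
    """Return every token type whose pattern matches the whole lexeme."""
    return [name for name, pattern in TOKEN_PATTERNS.items()
            if re.fullmatch(pattern, lexeme)]

# Illustrative lexemes (assumed for this sketch).
for lexeme in ['if', 'count', '42', '3.14', '+', ';', '@']:
    print(repr(lexeme), matching_types(lexeme))
```

For `'if'` the helper returns both `'IDENTIFIER'` and `'KEYWORD'`, while `'@'` matches nothing. A recognizer that returns only the first match therefore has to test the more specific pattern first.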

In [None]:
import re

def recognize_tokens(lexeme):
    # Define regular expressions for the different token types.
    # 'KEYWORD' comes first: every keyword also matches the 'IDENTIFIER'
    # pattern, and the loop below returns the first type that matches.
    token_patterns = {
        'KEYWORD': r'^(if|else|while|for|int|float|return|def|class|import|from)$',  # Reserved words
        'IDENTIFIER': r'^[a-zA-Z_][a-zA-Z0-9_]*$',  # Letter or underscore, then letters, digits, or underscores
        'INTEGER': r'^[0-9]+$',  # One or more digits
        'FLOAT': r'^[0-9]+\.[0-9]+$',  # Digits, a dot, and more digits
        'OPERATOR': r'^[+\-*/%]$',  # One of +, -, *, /, %
        'PUNCTUATION': r'^[;,()]$',  # One of ; , ( )
        # Add more token patterns as needed
    }


    # Check the patterns in definition order and return the first type that matches.
    # re.fullmatch requires the entire lexeme to match, including any trailing newline.
    for token_type, pattern in token_patterns.items():
        if re.fullmatch(pattern, lexeme):
            return token_type

    # If no pattern is matched, return 'UNKNOWN'
    return 'UNKNOWN'

if __name__ == "__main__":
    user_input = input("Enter one lexeme: ")

    # Recognize the token type for the given lexeme
    token_type = recognize_tokens(user_input)

    # Display the result
    print(f"The lexeme '{user_input}' represents a '{token_type}' token.")
