# Regex Engine

> To implement a **regular expression (regex) domain-specific language (DSL)** with a **finite state machine (FSM)**, translate each regex construct (literals, concatenation, alternation, and quantifiers) into states and transitions of the FSM.

## Design

**Identify Regex Components**:
- **Literals**: Characters that match themselves.
- **Concatenation**: When two patterns are placed side by side, they must both match in sequence.
- **Alternation (`|`)**: A choice between two patterns.
- **Quantifiers (`*`, `+`, `?`)**: Specify how many times the preceding pattern may repeat: zero or more (`*`), one or more (`+`), or zero or one (`?`).
- **Grouping (`()`)**: Groups expressions together, either for capturing or so that a quantifier applies to the whole group.
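
As a quick reference for the semantics listed above, the sketch below checks each construct against Python's built-in `re` module using `re.fullmatch`. The `examples` table is illustrative and not part of the FSM implementation in this document.

```python
import re

# Each entry: (pattern, input that should match, input that should not).
examples = [
    ("a",     "a",    "b"),    # literal
    ("ab",    "ab",   "a"),    # concatenation
    ("a|b",   "b",    "c"),    # alternation
    ("ab*",   "abbb", "ba"),   # zero or more
    ("ab+",   "ab",   "a"),    # one or more
    ("ab?",   "a",    "abb"),  # zero or one
    ("(ab)+", "abab", "aba"),  # grouping with a quantifier
]

for pattern, good, bad in examples:
    assert re.fullmatch(pattern, good) is not None
    assert re.fullmatch(pattern, bad) is None
```
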
   
**FSM Representation**:
- Each **state** in the FSM represents a position in the regex, that is, how much of the pattern has been matched so far.
- **Transitions** represent the possible state changes for a given input character.
- Special states for handling **alternation** and **quantifiers** would be added on top of this; the implementation below covers only literals and concatenation.
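
One common way to realise such special states is with epsilon transitions, which change state without consuming input. The sketch below hand-builds a small nondeterministic automaton for `a(b|c)*` and simulates it by tracking a set of current states. The state numbers, the `EPS` marker, and the helper names are assumptions made for illustration, not part of the `FSM` class in this document.

```python
EPS = None  # label used in this sketch for epsilon (empty-string) transitions

# Hand-built NFA for a(b|c)*; transitions[state] is a list of (symbol, target).
transitions = {
    0: [("a", 1)],
    1: [(EPS, 2), (EPS, 7)],  # quantifier: enter the loop body or skip it
    2: [(EPS, 3), (EPS, 5)],  # alternation: fork into the two branches
    3: [("b", 4)],
    5: [("c", 6)],
    4: [(EPS, 1)],            # loop back after matching b
    6: [(EPS, 1)],            # loop back after matching c
}
start, finals = 0, {7}


def epsilon_closure(states):
    """Return every state reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        for symbol, target in transitions.get(stack.pop(), []):
            if symbol is EPS and target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def nfa_match(text):
    """Return True if the whole text is accepted by the NFA."""
    current = epsilon_closure({start})
    for char in text:
        moved = {t for s in current
                 for sym, t in transitions.get(s, []) if sym == char}
        current = epsilon_closure(moved)
        if not current:
            return False
    return bool(current & finals)


for text in ["a", "abcb", "", "b", "ad"]:
    print(repr(text), nfa_match(text))
```

Tracking a set of states avoids backtracking: every alternative is explored at once, and the input is accepted if any surviving state is final.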

## Implementation

```python
class FSM:
    """A minimal finite state machine that follows one transition per input character."""

    def __init__(self):
        self.states = []  # All states, in insertion order
        self.final_states = set()  # Accepting states
        self.transitions = {}  # Maps a state to a list of (symbol, next_state) pairs
        self.start_state = None  # Set via set_start_state()

    def add_state(self, state, is_final=False):
        """Register a state, optionally mark it as accepting, and return it."""
        self.states.append(state)
        if is_final:
            self.final_states.add(state)
        return state

    def add_transition(self, from_state, to_state, symbol):
        """Add a transition from from_state to to_state on symbol."""
        self.transitions.setdefault(from_state, []).append((symbol, to_state))

    def set_start_state(self, state):
        self.start_state = state

    def match(self, input_string):
        """Return True if the whole input_string ends in a final state."""
        current_state = self.start_state
        for char in input_string:
            # Follow the first transition whose symbol equals the character.
            # There is no backtracking if several transitions share a symbol.
            for symbol, next_state in self.transitions.get(current_state, []):
                if symbol == char:
                    current_state = next_state
                    break
            else:
                return False  # No transition for this character
        return current_state in self.final_states
```
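
Because `match` follows at most one transition per character, the same idea can also be written as a plain lookup table. The sketch below is a hypothetical table-driven variant for the pattern `ab*`, where a self-loop plays the role of the `*` quantifier. The names `dfa` and `dfa_match` are illustrative and not part of the `FSM` class.

```python
# Hypothetical table-driven automaton for the pattern ab*.
# Keys are (state, symbol); a missing key means "no transition".
dfa = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q1",  # self-loop: stay in q1 for every further b
}
start_state, final_states = "q0", {"q1"}


def dfa_match(text):
    """Return True if the whole text drives the table from q0 to a final state."""
    state = start_state
    for char in text:
        state = dfa.get((state, char))
        if state is None:
            return False
    return state in final_states


for text in ["a", "abbb", "", "ba", "abab"]:
    print(repr(text), dfa_match(text))
```
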

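```python
def parse_regex_to_fsm(regex):
    """Build a linear FSM that matches `regex` as a sequence of literal characters.

    Only literals and concatenation are supported: metacharacters such as
    `|`, `*`, `+`, `?`, `(` and `)` are treated as ordinary characters.
    """
    fsm = FSM()
    start_state = fsm.add_state('start')
    current_state = start_state

    # One new state and one transition per character of the pattern
    for i, char in enumerate(regex):
        next_state = fsm.add_state(f'state_{i + 1}')
        fsm.add_transition(current_state, next_state, char)
        current_state = next_state

    fsm.set_start_state(start_state)
    fsm.final_states.add(current_state)  # The last state reached is accepting
    return fsm
```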
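
To go beyond literal chains, one option is a small recursive-descent parser that emits NFA fragments joined by epsilon transitions (a Thompson-style construction). The following is a minimal sketch under that assumption, not the implementation used in this document: `compile_regex`, `closure`, `matches`, and `EPS` are hypothetical names, there is no escape syntax, and the result is cross-checked against `re.fullmatch`.

```python
import re

EPS = None  # label for epsilon transitions (a convention of this sketch)


def compile_regex(pattern):
    """Build an NFA for literals, |, *, +, ? and (); returns (edges, start, accept)."""
    edges, pos = {}, 0  # edges: state -> list of (symbol, target)

    def new_state():
        state = len(edges)
        edges[state] = []
        return state

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        start, end = concatenation()
        while peek() == "|":
            pos += 1
            other_start, other_end = concatenation()
            fork, join = new_state(), new_state()
            edges[fork] += [(EPS, start), (EPS, other_start)]
            edges[end].append((EPS, join))
            edges[other_end].append((EPS, join))
            start, end = fork, join
        return start, end

    def concatenation():
        start = end = new_state()  # empty fragment, extended piece by piece
        while peek() not in (None, "|", ")"):
            next_start, next_end = quantified()
            edges[end].append((EPS, next_start))
            end = next_end
        return start, end

    def quantified():
        nonlocal pos
        start, end = atom()
        while peek() in ("*", "+", "?"):
            op = pattern[pos]
            pos += 1
            outer_start, outer_end = new_state(), new_state()
            edges[outer_start].append((EPS, start))
            edges[end].append((EPS, outer_end))
            if op in "*?":
                edges[outer_start].append((EPS, outer_end))  # zero repetitions
            if op in "*+":
                edges[end].append((EPS, start))  # repeat the fragment
            start, end = outer_start, outer_end
        return start, end

    def atom():
        nonlocal pos
        char = pattern[pos]
        if char in "*+?":
            raise ValueError(f"nothing to repeat at position {pos}")
        pos += 1
        if char != "(":
            start, end = new_state(), new_state()
            edges[start].append((char, end))
            return start, end
        start, end = alternation()
        if peek() != ")":
            raise ValueError("missing ')'")
        pos += 1
        return start, end

    start, accept = alternation()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return edges, start, accept


def closure(edges, states):
    """Return every state reachable from `states` using only epsilon moves."""
    stack, seen = list(states), set(states)
    while stack:
        for symbol, target in edges[stack.pop()]:
            if symbol is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    edges, start, accept = compile_regex(pattern)
    current = closure(edges, {start})
    for char in text:
        step = {t for s in current for sym, t in edges[s] if sym == char}
        current = closure(edges, step)
    return accept in current


for pattern, text in [("ab", "ab"), ("ab", "abc"), ("a(b|c)*d", "abcbd"), ("ab+", "a")]:
    assert matches(pattern, text) == bool(re.fullmatch(pattern, text))
```

Each grammar level maps to one construct from the design section: `alternation` adds a fork and a join state, `concatenation` chains fragments, and `quantified` wraps a fragment with skip and loop edges.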

## Use

```python
regex = "ab"  # Example regex string
fsm = parse_regex_to_fsm(regex)

input_string = "ab"
print(f"Does '{input_string}' match the regex? {fsm.match(input_string)}")  # Expected: True

input_string = "abc"
print(f"Does '{input_string}' match the regex? {fsm.match(input_string)}")  # Expected: False
```

Output:

```
Does 'ab' match the regex? True
Does 'abc' match the regex? False
```
