# WFST Tutorial
## Introduction to Weighted Finite State Transducers (WFSTs)
Weighted Finite State Transducers (WFSTs) are automata in which each transition between states is labeled with an input symbol, an output symbol, and a weight (usually a cost or a probability). They are commonly used in speech recognition, text normalization, and other tasks that map one sequence to another while scoring each possible mapping.
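
To make the definition concrete, a single transition can be pictured as a small record. The sketch below is illustrative only: the `Arc` name and its fields are assumptions made for this example and are not part of the `WFST` class used in this notebook. It also assumes that weights are additive costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One weighted transition (hypothetical name, for illustration only)."""
    from_state: int
    to_state: int
    input_symbol: str
    output_symbol: str
    weight: float


# Two arcs forming the path 0 -> 1 -> 2.
arcs = [
    Arc(0, 1, "a", "x", 0.5),
    Arc(1, 2, "b", "y", 1.0),
]

# Reading the path: inputs consumed, outputs emitted, weights accumulated.
inputs = [arc.input_symbol for arc in arcs]
outputs = [arc.output_symbol for arc in arcs]
total_cost = sum(arc.weight for arc in arcs)  # assumes additive costs
print(inputs, outputs, total_cost)
```

Following a path through the automaton therefore yields three things at once: the input it accepts, the output it produces, and a score for that pairing.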

In this notebook, we'll explore how to construct your own WFST using the provided `WFST` class. We will go through key methods, explain their purposes, and demonstrate how to implement custom transitions and states.
Let's dive into the core aspects of building a WFST.


## WFST Class Overview
The `WFST` class enables you to create a custom finite state transducer by adding states, transitions, and weights between them. Below is a list of the key methods and how they work:

- `set_start_state(state)`: Adds `state` to the WFST and sets it as the start state, the state from which processing begins.
- `add_state(state)`: Adds an intermediate state. States can represent different stages in the transducer.
- `add_final_state(state)`: Marks the state as a final state, indicating that the transduction can successfully terminate when reaching this state.
- `add_transition(from_state, to_state, input_symbol, output_symbol, weight)`:
    - Adds a transition (or arc) from `from_state` to `to_state` based on the provided input and output symbols, with an associated weight (or cost).
    - This is the core method for building relationships between states.
- `add_epsilon_transition(from_state, to_state)`: Adds an epsilon transition from `from_state` to `to_state`, that is, a transition taken without consuming an input symbol.
- `process(input_sequence)`: Processes an input sequence, attempting to traverse the WFST and return the output sequence along with the total weight (or cost) of the path.
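
To show how these methods could fit together, here is a minimal stand-in class. It is a sketch under stated assumptions, not the implementation behind the `WFST` class: `MiniWFST` is a hypothetical name, the transducer is assumed to be deterministic (one arc per state and input symbol), weights are assumed to add up along a path, and epsilon transitions are left out for brevity.

```python
class MiniWFST:
    """Toy stand-in for illustration; NOT the notebook's WFST class."""

    def __init__(self, name):
        self.name = name
        self.start_state = None
        self.states = set()
        self.final_states = set()
        # (from_state, input_symbol) -> (to_state, output_symbol, weight)
        self.transitions = {}

    def set_start_state(self, state):
        self.states.add(state)
        self.start_state = state

    def add_state(self, state):
        self.states.add(state)

    def add_final_state(self, state):
        self.states.add(state)
        self.final_states.add(state)

    def add_transition(self, from_state, to_state, input_symbol,
                       output_symbol, weight):
        key = (from_state, input_symbol)
        self.transitions[key] = (to_state, output_symbol, weight)

    def process(self, input_sequence):
        state, outputs, total = self.start_state, [], 0.0
        for symbol in input_sequence:
            key = (state, symbol)
            if key not in self.transitions:
                raise ValueError(f"no transition from {state} on {symbol!r}")
            state, out, weight = self.transitions[key]
            outputs.append(out)
            total += weight  # assumption: weights are additive costs
        if state not in self.final_states:
            raise ValueError(f"stopped in non-final state {state}")
        return outputs, total


toy = MiniWFST("demo")
toy.set_start_state(0)
toy.add_final_state(1)
toy.add_transition(0, 1, "a", "x", 0.5)
result = toy.process(["a"])
print(result)
```

Keying the transition table on `(state, input_symbol)` keeps lookup simple but rules out nondeterminism; a class that supports several arcs per symbol would need to track multiple candidate paths instead.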


## Step-by-Step Example: Creating a Simple WFST
Let's walk through an example of how to create a WFST with a few states and transitions.



```python
import spacy
from spacy.matcher import Matcher
from wfst import WFST, CompositeWFST, composite_wfst, tokenize, extract_numerical_data
```

### Step 1: Define the WFST


```python
wfst = WFST('name')
```

This creates an empty WFST. Now we need to add states and transitions.

### Step 2: Set the Start State
```python
wfst.set_start_state(0)
```
Here, we set the start state to `0`. States are represented by integers.

### Step 3: Add States and Transitions
Let's add some intermediary states and transitions between them.
```python
wfst.add_state(1)
wfst.add_state(2)
wfst.add_transition(0, 1, 'a', 'x', 0.5)
wfst.add_transition(1, 2, 'b', 'y', 1.0)
```
This code adds states `1` and `2` and links the three states into a chain. The first transition goes from state `0` to state `1`, mapping input `a` to output `x` with a weight of `0.5`; the second goes from state `1` to state `2`, mapping `b` to `y` with a weight of `1.0`.

### Step 4: Define a Final State
```python
wfst.add_final_state(2)
```
Marking state `2` as a final state means that transduction can terminate successfully there: an input sequence is accepted when the WFST has consumed it and ended in state `2`.

### Step 5: Process an Input Sequence
Now let's process an input sequence to see how it works.
```python
input_sequence = ['a', 'b']
output_sequence, total_weight = wfst.process(input_sequence)
print(f'Output: {output_sequence}, Weight: {total_weight}')
```
This processes the input sequence `['a', 'b']` through the WFST and prints the output sequence and the total weight (cost) of the transitions.
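
How the total weight is combined depends on what the weights mean. The short calculation below is a sketch of the two common conventions and makes no claim about which one the `WFST` class uses: costs add along a path, probabilities multiply, and taking negative logarithms converts the second case into the first.

```python
import math

path_weights = [0.5, 1.0]  # weights of the arcs 0 -> 1 and 1 -> 2

# Cost interpretation: weights add along a path.
total_cost = sum(path_weights)

# Probability interpretation: weights multiply along a path.
total_probability = math.prod(path_weights)

# Negative logs turn a product of probabilities into a sum of costs.
neg_log_costs = [-math.log(p) for p in path_weights]
cost_from_probability = sum(neg_log_costs)

print(total_cost, total_probability, cost_from_probability)
```

Under the additive convention the path for `['a', 'b']` would score `0.5 + 1.0 = 1.5`; under the multiplicative one it would score `0.5 * 1.0 = 0.5`.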

You can now build your own WFST by following similar steps, adding states, transitions, and processing sequences.


## Advanced Example: Using Epsilon Transitions
Sometimes, it's necessary to have transitions that don't consume any input (epsilon transitions).
Let's add an epsilon transition to our previous example.
```python
wfst.add_epsilon_transition(2, 0)
```
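This adds a transition from state `2` back to state `0` that consumes no input symbol. Once the WFST reaches state `2`, it can return to the start state and keep reading, so repeated inputs such as `a b a b` can in principle be handled. Epsilon transitions are useful in applications such as text normalization, for example to make part of a pattern optional.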
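
One common way to reason about epsilon transitions is the epsilon closure: the set of states reachable from a given state without consuming any input. The helper below is a minimal sketch; `epsilon_closure` and the `epsilon_arcs` dictionary are hypothetical names for this example, and the epsilon arcs are assumed to be unweighted.

```python
def epsilon_closure(state, epsilon_arcs):
    """Return all states reachable from `state` using only epsilon arcs."""
    closure = {state}
    stack = [state]
    while stack:
        current = stack.pop()
        for nxt in epsilon_arcs.get(current, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


# Hypothetical epsilon arcs: state 2 can move back to state 0
# without reading a symbol.
epsilon_arcs = {2: [0]}
closure_of_2 = epsilon_closure(2, epsilon_arcs)
closure_of_0 = epsilon_closure(0, epsilon_arcs)
print(closure_of_2, closure_of_0)
```

Tracking visited states in `closure` keeps the search from looping forever when epsilon arcs form a cycle.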


## Building Your Own WFST
You now have the basic tools to build your own WFST. Follow these steps to customize it for different applications:
1. Define your start, intermediate, and final states.
2. Add transitions with appropriate input/output symbols and weights.
3. Use epsilon transitions if needed.
4. Process input sequences to obtain output sequences and their associated costs.

Feel free to experiment and adapt the WFST structure for tasks like text normalization, speech processing, or sequence mapping.
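
As a starting point for experimentation, the steps above can be sketched for a toy text-normalization task. This example deliberately avoids the `WFST` class and uses a plain dictionary as the arc table, so every name in it (`arcs`, `transduce`, the token vocabulary, the weights) is an assumption chosen for illustration.

```python
# Toy single-state transducer for text normalization (illustrative only).
# Arc table: (state, input_symbol) -> (next_state, output_symbol, weight)
START, FINAL_STATES = 0, {0}
arcs = {
    (0, "2"): (0, "two", 1.0),         # rewriting a token costs 1.0
    (0, "kg"): (0, "kilograms", 1.0),
    (0, "of"): (0, "of", 0.0),         # copying a token is free
    (0, "rice"): (0, "rice", 0.0),
}


def transduce(tokens):
    """Return (outputs, cost), or None if the input is rejected."""
    state, outputs, cost = START, [], 0.0
    for token in tokens:
        if (state, token) not in arcs:
            return None  # no matching arc: reject the input
        state, out, weight = arcs[(state, token)]
        outputs.append(out)
        cost += weight
    return (outputs, cost) if state in FINAL_STATES else None


normalized = transduce(["2", "kg", "of", "rice"])
rejected = transduce(["2", "lb"])
print(normalized, rejected)
```

The same layout carries over to larger tasks: add states when the correct output depends on context, and use weights to rank competing rewrites.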
