# Aho-Corasick Algorithm for Pattern Matching
The Aho-Corasick algorithm is a search technique for matching multiple patterns against a text in a single pass. It builds a Trie (prefix tree) of the patterns and adds failure links between its states, so the text never has to be re-scanned after a mismatch. All occurrences of a set of keywords are found in time linear in the length of the text plus the total length of the keywords plus the number of matches reported.

## Key Concepts:
- **Trie Construction**: A tree structure built for all the input patterns (keywords).
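- **Failure Links**: When the current state has no transition for the next character, its failure link sends the search to the state that spells the longest proper suffix of the current string that is still a prefix of some keyword. No character of the text is ever read twice.

In this notebook, we'll break down the algorithm step by step and implement it in Scala.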
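To make the failure-link idea concrete, here is a brute-force definition of where a failure link should point. It is a minimal sketch with hypothetical helper names (`trieStrings`, `naiveFailure`) that are not part of the implementation below; it is far too slow to use inside the algorithm, but it pins down the target: the longest proper suffix of the current string that is still a prefix of some keyword.

```scala
// Every string spelled by some Trie state: all prefixes of all keywords, including "" (the root)
def trieStrings(keywords: List[String]): Set[String] =
  keywords.flatMap(k => (0 to k.length).map(i => k.take(i))).toSet

// Brute-force failure target: the longest proper suffix of `s` that is also a Trie string.
// Suffixes are tried from longest (drop 1 char) to shortest (""), so the first hit is the longest.
def naiveFailure(s: String, keywords: List[String]): String = {
  val prefixes = trieStrings(keywords)
  (1 to s.length).map(i => s.drop(i)).find(prefixes.contains).getOrElse("")
}

val keywords = List("hers", "she", "his")
println(naiveFailure("she", keywords))  // he
println(naiveFailure("hers", keywords)) // s
```

After reading `she`, the search can continue as if it had read `he`, which is exactly what lets it go on to find `hers` in a text such as `ushers`.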

## Step 1: Define the State Case Class
Each node in the Trie corresponds to a state. A state has the following properties:
- **ID** (`ID`): A unique identifier for the state.
- **Successors** (`Successor`): A map from a character, stored as a one-character `String`, to the ID of the next state (the transitions).
- **End state** (`endState`): A boolean flag marking whether this state corresponds to the end of a keyword.
- **Keyword** (`keyword`): An optional field holding the keyword itself when the state is an end state.

Let's define a `State` class to represent a node in our Trie.

```scala
import scala.collection.mutable.Queue
case class State(ID: Int, Successor: Map[String, Int], endState: Boolean, keyword: Option[String] = None)
```
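To see how the fields fit together, here is a hand-built Trie for the single keyword `he`, plus a hypothetical helper `walk` that follows transitions from the root. This is a minimal sketch: `walk` is not part of the notebook and ignores failure links entirely. Because transitions are keyed by one-character strings, lookups use `c.toString`.

```scala
// Same shape as the notebook's State; redefined here so the snippet runs on its own
case class State(ID: Int, Successor: Map[String, Int], endState: Boolean, keyword: Option[String] = None)

// Hand-built Trie for the single keyword "he": 0 -h-> 1 -e-> 2
val heTrie: Map[Int, State] = Map(
  0 -> State(0, Map("h" -> 1), endState = false),
  1 -> State(1, Map("e" -> 2), endState = false),
  2 -> State(2, Map(), endState = true, keyword = Some("he"))
)

// Hypothetical helper: follow Trie transitions for each character of `s`, starting at the root.
// Returns None as soon as a character has no transition.
def walk(states: Map[Int, State], s: String): Option[Int] =
  s.foldLeft(Option(0)) { (current, c) =>
    current.flatMap(id => states(id).Successor.get(c.toString))
  }

println(walk(heTrie, "he")) // Some(2)
println(walk(heTrie, "hx")) // None
```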

## Step 2: Build the Trie
The next step in the Aho-Corasick algorithm is constructing the Trie. For each keyword, we walk down from the root one character at a time, reusing an existing transition when one matches (so keywords with a common prefix share states) and creating a new state otherwise. The last state of each keyword is marked as an 'end state' to signify the completion of a keyword.
We will now define a function `buildGraph` to build the Trie from a list of keywords.

```scala
def buildGraph(keywords: List[String]): Map[Int, State] = {
  var nextID = 0
  var states = Map[Int, State](nextID -> State(nextID, Map(), endState = false)) // Root state

  for (keyword <- keywords) {
    var currentStateID = 0 // Start at the root state

    for (char <- keyword) {
      val currentState = states(currentStateID)

      // Check if there's already a state for this character, else create a new state
      val nextStateID = currentState.Successor.getOrElse(char.toString, {
        nextID += 1
        nextID
      })

      // Update the current state to include the new successor
      states = states.updated(currentStateID, currentState.copy(Successor = currentState.Successor + (char.toString -> nextStateID)))

      // Add the new state if it doesn't exist
      if (!states.contains(nextStateID)) {
        states += nextStateID -> State(nextStateID, Map(), endState = false)
      }

      // Move to the next state
      currentStateID = nextStateID
    }

    // Mark the last state of the keyword as an end state
    val finalState = states(currentStateID)
    states += currentStateID -> finalState.copy(endState = true, keyword = Some(keyword))
  }

  states
}
```
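Because shared prefixes reuse states, `buildGraph` should create exactly one state per distinct keyword prefix, with the empty prefix standing for the root. The hypothetical helper `expectedStateCount` below computes that number directly from the keywords, as a minimal sanity check one could compare against `buildGraph(keywords).size`.

```scala
// Distinct prefixes of all keywords; "" is counted once and corresponds to the root state
def expectedStateCount(keywords: List[String]): Int =
  keywords.flatMap(k => (0 to k.length).map(i => k.take(i))).distinct.size

println(expectedStateCount(List("hers", "she", "his"))) // 10
println(expectedStateCount(List("he", "hers")))         // 5
```

For `hers`, `she`, `his` the count is 10, matching states 0 through 9 in the listing of Step 3: `hers` and `his` share the state for `h`.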

## Step 3: Visualize the Trie
To better understand the structure of our Trie, we can visualize it. The following function `printGraph` will display each state and its successors, as well as whether it is an end state and which keyword it represents (if applicable).

```scala
def printGraph(states: Map[Int, State]): Unit = {
  // Sort by ID so the listing is stable (Map iteration order is not)
  states.toSeq.sortBy(_._1).foreach { case (id, state) =>
    val keywordStr = state.keyword match {
      case Some(kw) => s", Keyword = $kw"
      case None => ""
    }
    println(s"State $id: Successors = ${state.Successor}, End State = ${state.endState}$keywordStr")
  }
}

val keywords = List("hers", "she", "his")
printGraph(buildGraph(keywords))
```

```text
State 0: Successors = Map(h -> 1, s -> 5), End State = false
State 1: Successors = Map(e -> 2, i -> 8), End State = false
State 2: Successors = Map(r -> 3), End State = false
State 3: Successors = Map(s -> 4), End State = false
State 4: Successors = Map(), End State = true, Keyword = hers
State 5: Successors = Map(h -> 6), End State = false
State 6: Successors = Map(e -> 7), End State = false
State 7: Successors = Map(), End State = true, Keyword = she
State 8: Successors = Map(s -> 9), End State = false
State 9: Successors = Map(), End State = true, Keyword = his
```
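A text listing gets hard to read for larger keyword sets. One option is to emit the Trie in Graphviz DOT format and render it with a Graphviz tool. The helper `toDot` below is a hypothetical sketch, not part of the notebook; it draws end states as double circles labelled with their keyword and sorts nodes and edges so the output is deterministic.

```scala
// Same shape as the notebook's State; redefined so this snippet runs on its own
case class State(ID: Int, Successor: Map[String, Int], endState: Boolean, keyword: Option[String] = None)

// Hypothetical helper: render a Trie as Graphviz DOT text.
// End states are drawn as double circles and labelled "id: keyword".
def toDot(states: Map[Int, State]): String = {
  val sorted = states.toSeq.sortBy(_._1)
  val nodes = sorted.map { case (id, state) =>
    val shape = if (state.endState) "doublecircle" else "circle"
    val label = state.keyword.map(kw => s"$id: $kw").getOrElse(id.toString)
    s"""  $id [shape=$shape, label="$label"];"""
  }
  val edges = sorted.flatMap { case (id, state) =>
    state.Successor.toSeq.sortBy(_._1).map { case (c, next) =>
      s"""  $id -> $next [label="$c"];"""
    }
  }
  (Seq("digraph Trie {") ++ nodes ++ edges ++ Seq("}")).mkString("\n")
}

// Small hand-built Trie for the single keyword "he"
val heTrie = Map(
  0 -> State(0, Map("h" -> 1), endState = false),
  1 -> State(1, Map("e" -> 2), endState = false),
  2 -> State(2, Map(), endState = true, keyword = Some("he"))
)
println(toDot(heTrie))
```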

## Step 4: Compute Failure Links
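Now we need to compute the failure links, which tell the search where to continue when the current state has no transition for the next character. The failure link of a state points to the state that spells the longest proper suffix of that state's string which is also a prefix of some keyword, or to the root if no such suffix exists. For example, with the keywords `hers`, `she`, `his`, the state for `she` fails to the state for `he`, because `he` is the longest suffix of `she` that starts a keyword.
We will now define the `computeFail` function to compute these links for all states. It works breadth-first from the root, because a state's failure link depends only on states closer to the root.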
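Stated as a recurrence, write $g(s, a)$ for the Trie transition from state $s$ on character $a$ (the `Successor` map) and $f(s)$ for the failure link of $s$; this notation is ours, not the notebook's. For every state $s \neq 0$:

$$
f\bigl(g(0, a)\bigr) = 0,
\qquad
f\bigl(g(s, a)\bigr) =
\begin{cases}
g(t, a) & \text{if } g(t, a) \text{ is defined,} \\
0 & \text{otherwise,}
\end{cases}
$$

where $t$ is the first state in the chain $f(s), f(f(s)), \dots, 0$ that has a transition on $a$, or $t = 0$ if none does. For example, state 5 (`s`) is a child of the root, so $f(5) = 0$; the root has a transition on `h`, so $f(6) = g(0, \texttt{h}) = 1$. Then state 1 has a transition on `e`, so $f(7) = g(1, \texttt{e}) = 2$, the state for `he`.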

```scala
def computeFail(states: Map[Int, State]): Map[Int, Int] = {
  var fail = Map[Int, Int](0 -> 0) // The root's fail link points to itself
  val queue = Queue[Int]()

  // Depth-1 states (children of the root) always fail back to the root
  states(0).Successor.values.foreach { childID =>
    fail += childID -> 0
    queue.enqueue(childID)
  }

  // Breadth-first traversal: each state's link depends only on shallower states
  while (queue.nonEmpty) {
    val currentStateID = queue.dequeue()
    val currentState = states(currentStateID)

    currentState.Successor.foreach { case (char, nextStateID) =>
      // Walk the fail chain until some state has a transition on `char`, or we reach the root
      var failStateID = fail(currentStateID)
      while (failStateID != 0 && !states(failStateID).Successor.contains(char)) {
        failStateID = fail(failStateID)
      }
      // Follow that transition if it exists; otherwise the link falls back to the root
      fail += nextStateID -> states(failStateID).Successor.getOrElse(char, 0)
      queue.enqueue(nextStateID)
    }
  }

  fail
}
```
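Integer IDs make the links hard to check by eye. The sketch below follows the same breadth-first recurrence as `computeFail`, but names each state by the prefix it spells; the function name `failureLinksByPrefix` is hypothetical. Sorting prefixes by length stands in for the queue, which is a valid breadth-first order because a state's link depends only on strictly shorter states. Note the special case for depth-1 states: without it, a child of the root would be linked to itself.

```scala
import scala.collection.mutable

// Hypothetical string-keyed variant of computeFail: each state is named by the prefix it spells.
def failureLinksByPrefix(keywords: List[String]): Map[String, String] = {
  // All Trie states: every prefix of every keyword ("" is the root)
  val nodes: Set[String] = keywords.flatMap(k => (0 to k.length).map(i => k.take(i))).toSet
  val fail = mutable.Map[String, String]("" -> "")

  // Shortest prefixes first, so every link a state depends on is already known
  for (node <- nodes.toList.filter(_.nonEmpty).sortBy(_.length)) {
    val parent = node.init
    val a = node.last
    if (parent.isEmpty) {
      fail(node) = "" // depth-1 states always fail to the root
    } else {
      // Walk the parent's failure chain until some state can be extended by `a`, or we hit the root
      var t = fail(parent)
      while (t.nonEmpty && !nodes.contains(t + a)) t = fail(t)
      fail(node) = if (nodes.contains(t + a)) t + a else ""
    }
  }
  fail.toMap
}

val links = failureLinksByPrefix(List("hers", "she", "his"))
println(links("she"))  // he
println(links("hers")) // s
```

Read against the numeric listing from Step 3, this says state 7 (`she`) fails to state 2 (`he`), and states 4 (`hers`) and 9 (`his`) both fail to state 5 (`s`).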

## Step 5: Search the Text
Now that we have built the Trie and computed the failure links, the final step is to search the input text. We will implement a function `search` that takes the text, the Trie, and the failure-link map, and returns every keyword occurrence in the order in which the occurrences end in the text (a keyword that occurs twice is reported twice).

```scala
def search(text: String, states: Map[Int, State], fail: Map[Int, Int]): List[String] = {
  var currentStateID = 0
  var result = List[String]()

  for (char <- text) {
    // Follow failure links until the current state has a transition on this character (or we reach the root)
    while (!states(currentStateID).Successor.contains(char.toString) && currentStateID != 0) {
      currentStateID = fail(currentStateID)
    }
    // Take the transition if it exists; otherwise stay at the root
    currentStateID = states(currentStateID).Successor.getOrElse(char.toString, 0)

    // Report every keyword ending here: the current state and each state on its failure chain
    var tempStateID = currentStateID
    while (tempStateID != 0) {
      val currentState = states(tempStateID)
      if (currentState.endState) {
        result = currentState.keyword.get :: result
      }
      tempStateID = fail(tempStateID)
    }
  }
  // Matches were prepended, so reverse to restore text order
  result.reverse
}
```
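`search` returns only the matched keywords. Since each match is detected at the index where it ends, its start position is that index minus the keyword length plus one. The self-contained sketch below, `searchWithPositions` (a hypothetical name), returns `(start, keyword)` pairs; to stay short it names states by the prefix they spell instead of by integer IDs, but the scanning loop mirrors `search`.

```scala
import scala.collection.mutable

// Hypothetical variant of `search` that also reports where each match starts.
// States are named by the prefix they spell ("" is the root) instead of by integer IDs.
def searchWithPositions(text: String, keywords: List[String]): List[(Int, String)] = {
  val nodes: Set[String] = keywords.flatMap(k => (0 to k.length).map(i => k.take(i))).toSet
  val keywordSet = keywords.toSet

  // Failure links, computed shortest prefix first (a valid breadth-first order)
  val fail = mutable.Map[String, String]("" -> "")
  for (node <- nodes.toList.filter(_.nonEmpty).sortBy(_.length)) {
    if (node.length == 1) {
      fail(node) = "" // children of the root fail to the root
    } else {
      var t = fail(node.init)
      while (t.nonEmpty && !nodes.contains(t + node.last)) t = fail(t)
      fail(node) = if (nodes.contains(t + node.last)) t + node.last else ""
    }
  }

  val matches = mutable.ListBuffer[(Int, String)]()
  var state = ""
  for ((c, i) <- text.zipWithIndex) {
    // Follow failure links until the current state can be extended by c, or we are at the root
    while (state.nonEmpty && !nodes.contains(state + c)) state = fail(state)
    state = if (nodes.contains(state + c)) state + c else ""

    // Every keyword on the failure chain ends at index i, so it starts at i - length + 1
    var t = state
    while (t.nonEmpty) {
      if (keywordSet.contains(t)) matches += ((i - t.length + 1, t))
      t = fail(t)
    }
  }
  matches.toList
}

println(searchWithPositions("ushers", List("hers", "she", "his")))
// List((1,she), (2,hers))
```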

## Step 6: Test the Algorithm
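Now that we've implemented all the components, we can test the Aho-Corasick algorithm on the keywords `hers`, `she`, `his` and the text `"ushers"`. The expected matches are `she` (ending at index 3) and `hers` (ending at index 5), reported in that order.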

```scala
def testAhoCorasick(): Unit = {
  val keywords = List("hers", "she", "his")
  val text = "ushers"

  val states = buildGraph(keywords)
  val fail = computeFail(states)
  val result = search(text, states, fail)

  println(s"Matched Keywords: ${result.mkString(", ")}")
}

testAhoCorasick()
```
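A single example exercises only a few failure links. For broader checks, `search` can be compared against a brute-force oracle on many inputs. The hypothetical `naiveSearch` below tries every keyword at every start position, which is quadratic but easy to trust. It orders matches the way `search` reports them: by end position, and for matches ending at the same index, longest first, because `search` walks a failure chain from the longest suffix down.

```scala
// Hypothetical brute-force oracle for testing: try every keyword at every start position.
// Duplicate keywords are collapsed, as they are in the Trie.
def naiveSearch(text: String, keywords: List[String]): List[String] = {
  val hits = for {
    kw <- keywords.distinct if kw.nonEmpty
    start <- text.indices if text.startsWith(kw, start)
  } yield (start + kw.length - 1, start, kw)
  // Sort by end index, then by start index (smaller start = longer match first)
  hits.sortBy { case (end, start, _) => (end, start) }.map(_._3)
}

println(naiveSearch("ushers", List("hers", "she", "his"))) // List(she, hers)
```

In the notebook, one would then assert that `search(text, states, fail) == naiveSearch(text, keywords)` for a batch of generated texts.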

## Conclusion
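The Aho-Corasick algorithm searches for many keywords in a single pass over the text. Because the Trie shares common prefixes and the failure links resolve every mismatch without moving backward in the text, each character is read exactly once. Assuming constant-time transition lookups, the total running time is linear in the length of the text plus the total length of the keywords plus the number of matches reported, which is what makes the approach attractive for large keyword sets and long texts.
We have implemented the algorithm step by step and tested it with sample data.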