# State Search
## Motivation
Many problems in AI can be defined as follows:

**Given a start state and goal state, find a sequence of actions to take so that the agent reaches the goal state.**

Frequently, we want the sequence of actions to be optimal, meaning that it is the least-cost (or highest-utility) sequence of actions that reaches the goal state.


## Nodes
We can formalize the problem as a search over a graph.
This graph contains nodes, where each node has information on:
* The state of the system
* The parent node of this node
* The action that generated this node
* The actions that can be generated from this node
* The path cost of each action

Thus, by searching, we are traversing the graph, starting from the start node, moving along edges, which are defined by the actions that the agent can take at that state, and finding the least cost path to the goal node.
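
The node fields listed above can be sketched as a small Python class. This is only an illustration: the field names are assumptions, the sketch stores the accumulated path cost rather than per-action costs, and the available actions are assumed to come from a problem-specific successor function that is not shown.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """One node of the search graph (field names are illustrative)."""
    state: Any                       # the state of the system
    parent: Optional["Node"] = None  # node this one was generated from
    action: Any = None               # action that generated this node
    path_cost: float = 0.0           # total cost from the start node


def path_to(node):
    """Follow parent links back to the start and return the states in order."""
    states = []
    while node is not None:
        states.append(node.state)
        node = node.parent
    return states[::-1]


start = Node("A")
mid = Node("B", parent=start, action="A->B", path_cost=2.0)
goal = Node("C", parent=mid, action="B->C", path_cost=5.0)
print(path_to(goal))  # ['A', 'B', 'C']
```

Keeping a parent link in every node is what lets the search rebuild the full sequence of actions once a goal node is found.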


We define F as the **fringe** or **frontier**, which is the set of nodes that are yet to be explored, and E as the set of **previously explored nodes**.

Note that F is equivalent to the data structure that stores the vertices to visit next (Queue/PriorityQueue), while E is the data structure that stores the previously visited nodes.

## Complications
Since we are dealing with various possible problem formulations, it is possible that there are multiple goal nodes, or that no goal node is reachable from the start node.
Also, it is possible that the search space is infinite, for example in the case of a real life robot navigating the real world. The possible positions that the robot can take would be infinite, since its coordinates can take any real value. Thus, the analysis of the algorithms will be slightly different from that in [Algorithm Analysis](../algorithm-analysis/graph_algorithms.ipynb#traversal).

## Metric
To judge how good a search algorithm is, we look at 3 specific traits of the search: completeness, optimality and complexity.
We look at the performance under these conditions:
* Graph has finite branching factor
* There exists some goal node that is a finite depth away from the start

Note that the graph itself may be infinite, but the goal must be in a finite depth away from the start.

### Completeness
A search is **complete** if it is guaranteed to find a path to the goal whenever one exists.
Essentially, it is saying that "if a path to the goal exists, the search should be able to find some (not necessarily optimal) path in finite time".

### Optimality
A search is **optimal** if every path that it returns is the least cost path for that problem.
Note it is possible for a search to be optimal but not complete.

### Complexity
Complexity measures the number of computations that the algorithm needs to do when searching (time), and the amount of memory that it needs (space).
We define $d$ as the depth of the search, which tells us how far from the source we have traversed, and
$b$ as the branching factor, which is the number of children each node can have.

## Types of Searches
There are 2 broad categories of state search:
* Tree search
    * Does not remember the states that were visited in another branch
    * Will revisit these previously visited states
* Graph search
    * Remembers previously visited states from another branch
    * Will not revisit those previously visited states
    * Requires memory to store them

Note that tree search does not imply that the underlying graph is a tree; this is a rather unfortunate naming convention.

Thus, keep in mind that the searches below can either be the tree search variant or the graph search variant.
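
To make the difference concrete, here is a minimal sketch that runs the same breadth-first expansion with and without the explored set E. The diamond-shaped graph and the function name are made up for illustration.

```python
from collections import deque

# Hypothetical diamond-shaped graph: D is reachable through both B and C.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}


def expansion_order(start, graph_search):
    """Breadth-first expansion; E is only consulted when graph_search is True."""
    F = deque([start])   # frontier
    E = set()            # explored states (used by graph search only)
    order = []
    while F:
        state = F.popleft()
        if graph_search:
            if state in E:
                continue  # already expanded via another branch
            E.add(state)
        order.append(state)
        F.extend(GRAPH[state])
    return order


print(expansion_order("A", graph_search=False))  # ['A', 'B', 'C', 'D', 'D', 'E', 'E']
print(expansion_order("A", graph_search=True))   # ['A', 'B', 'C', 'D', 'E']
```

The tree search variant expands D and E twice because it reaches them through two branches, while the graph search variant pays for the extra set E to expand each state once.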

# Uninformed Search
Uninformed searches are searches that do not have prior knowledge of the graph.
Thus, they do not have knowledge of how far away the goal is until they complete their search.

## Depth First Search
This is exactly the same as the DFS discussed in [Algorithm Analysis](../algorithm-analysis/graph_algorithms.ipynb#dfs).

However, the small caveat is that DFS tree search is actually recursive backtracking, while DFS graph search is the "DFS" as discussed in that chapter.
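
The two variants might look like the following sketch. The small acyclic graph and the function names are assumptions for illustration; note that on a graph with cycles the tree search version below could recurse forever, since it remembers nothing outside the current path.

```python
# Hypothetical acyclic graph with the goal "G" in the second branch.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": [], "G": []}


def dfs_tree(state, is_goal, path=()):
    """Recursive backtracking: only the current path is remembered."""
    path = path + (state,)
    if is_goal(state):
        return list(path)
    for child in GRAPH[state]:
        found = dfs_tree(child, is_goal, path)
        if found is not None:
            return found
    return None  # dead end, so backtrack to the caller


def dfs_graph(start, is_goal):
    """DFS with an explicit stack F and an explored set E."""
    F = [(start, [start])]
    E = set()
    while F:
        state, path = F.pop()
        if is_goal(state):
            return path
        if state in E:
            continue
        E.add(state)
        # reversed() so that children are expanded in their listed order
        for child in reversed(GRAPH[state]):
            F.append((child, path + [child]))
    return None


print(dfs_tree("A", lambda s: s == "G"))   # ['A', 'C', 'G']
print(dfs_graph("A", lambda s: s == "G"))  # ['A', 'C', 'G']
```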

### Completeness
DFS is not complete when searching an infinite graph.
It is possible for the search to iteratively search deeper and deeper into the wrong branch of the graph, causing it to not terminate.

### Optimality
Clearly DFS is not optimal, as it does not account for the cost of each path.

### Time Complexity
By counting the number of nodes at each layer, we get that the total number of nodes = $b + b^2 + b^3 + \dots + b^d \leq b(b^d) = O(b^{d+1})$ (the inequality holds for $b \geq 2$).
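
As a quick numeric check of this bound, the sketch below sums the layer sizes for a few arbitrarily chosen values of $b$ and $d$ and compares them with $b \cdot b^d$.

```python
def nodes_up_to_depth(b, d):
    """Number of nodes in layers 1..d of a tree with branching factor b."""
    return sum(b ** k for k in range(1, d + 1))


for b, d in [(2, 3), (3, 4), (10, 5)]:
    total = nodes_up_to_depth(b, d)
    bound = b * b ** d
    print(b, d, total, bound, total <= bound)
# 2 3 14 16 True
# 3 4 120 243 True
# 10 5 111110 1000000 True
```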

### Space Complexity
When we are at a node and want to traverse deeper, we need to remember which of the children of the current node have already been visited. Since we need to remember this for each of the $d$ layers on the current path, and each layer may need to remember up to $b$ children, the space complexity is $O(bd)$.

(Technically we can simply remember the index of the last visited child for each layer, to reduce the space complexity to $O(d)$).


## Breadth First Search
This is exactly the same as the BFS discussed in [Algorithm Analysis](../algorithm-analysis/graph_algorithms.ipynb#bfs). 
The BFS discussed there is the graph search variant.
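
A minimal sketch of the graph search variant is shown below. The function signature and the toy problem (integer states with "+1" and "*2" actions) are assumptions for illustration, and the `parent` dictionary doubles as the record of every state seen so far.

```python
from collections import deque


def bfs(start, is_goal, children):
    """Graph-search BFS: returns a path with the fewest actions, or None."""
    F = deque([start])        # frontier, first-in first-out
    parent = {start: None}    # every state seen so far, with its parent
    while F:
        state = F.popleft()
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for child in children(state):
            if child not in parent:
                parent[child] = state
                F.append(child)
    return None


# Hypothetical problem: states are integers, actions are "+1" and "*2".
path = bfs(1, lambda s: s == 10, lambda s: [s + 1, s * 2] if s < 10 else [])
print(path)  # [1, 2, 4, 5, 10]
```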

### Completeness
BFS is complete. 

Suppose the goal node is at depth $d$.
It is clear that BFS will find the goal node once it starts processing layer $d$.

### Optimality
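By the same analysis as DFS, it is not optimal in general, as it does not account for the cost of each path. It does, however, return a path with the fewest actions, which is the least cost path when every action has the same cost.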

### Time Complexity
By the same analysis as DFS, the complexity is $O(b^{d+1})$.

### Space Complexity
Since we need to visit at most $b^{d+1}$ nodes to reach the goal, we would need $O(b^{d+1})$ space to store E.

And because F only stores the nodes that are to be visited at layer $k$ at the k-th step of the search, and each layer has at most $b^k$ nodes, the space needed to store F is $O(b^d)$.

## Uniform Cost Search
This is exactly the same as the Dijkstra algorithm for single source shortest path, discussed in [Algorithm Analysis](../algorithm-analysis/graph_algorithms.ipynb#dijkstra). 
The UCS discussed there is the graph search variant.

Here, we define $g(n)$ as the true distance of node $n$ from the start, and $\hat g(n)$ as the estimated distance of node $n$ from the start during the run time of the search (the same as the values that are stored in F during run time).

### Pseudocode
```
UCS ( u )
    F ← PQueue ( u ) // Sorted on gˆ
    E ← { }
    gˆ[u] ← 0
    while F not empty
        u ← F.pop ( )
        if GoalTest ( u )
            return path ( u )
        E.add ( u ) // u is now explored
        for all children v of u
            if v not in E
                if v in F
                    gˆ[v] ← min ( gˆ[v] , gˆ[u] + c [ u , v ] )
                else
                    gˆ[v] ← gˆ[u] + c [ u , v ]
                    F.push ( v )
    return Failure
```
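
A runnable Python sketch of the same idea is given below. It is not a line-by-line translation: `heapq` has no decrease-key operation, so instead of updating an entry already in F, the sketch pushes a new entry and skips stale ones when they are popped. The graph `EDGES` and the function signature are assumptions for illustration.

```python
import heapq


def ucs(start, is_goal, edges):
    """Uniform cost search. edges[u] is a list of (v, cost) pairs."""
    g_hat = {start: 0}            # best known cost from the start
    parent = {start: None}
    F = [(0, start)]              # priority queue ordered by g_hat
    E = set()
    while F:
        g, u = heapq.heappop(F)
        if u in E:
            continue              # stale entry, u was already explored
        if is_goal(u):
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return g, path[::-1]
        E.add(u)
        for v, cost in edges.get(u, []):
            new_g = g + cost
            if v not in E and new_g < g_hat.get(v, float("inf")):
                g_hat[v] = new_g
                parent[v] = u
                heapq.heappush(F, (new_g, v))
    return None


# Hypothetical weighted graph: the direct edge S->B is not the cheapest route.
EDGES = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 1)]}
print(ucs("S", lambda s: s == "G", EDGES))  # (4, ['S', 'A', 'B', 'G'])
```

The goal test is applied when a node is popped, not when it is first generated, which is what allows the cheaper route through A and B to replace the first path found to G.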

### Completeness
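Complete if every edge has cost $\geq \epsilon$ for some fixed $\epsilon > 0$. Otherwise, the search can keep expanding an endless chain of zero-cost (or ever cheaper) edges, so that a goal state is stuck in F and is never popped.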

### Optimality
This is optimal, by the same reasoning of Dijkstra's optimality.

### Time Complexity
By the same analysis as BFS, the complexity is $O(b^{d+1})$.

### Space Complexity
By the same analysis as BFS, the complexity is $O(b^{d+1})$.


# Informed Search
Informed searches have some prior knowledge of the cost needed to reach the goal from a given state.

Thus, they can utilise it to explore fewer nodes to reach the goal.


## A*
A* is similar to UCS, but instead of using the distance of a node from the start node to determine the order of exploration, it uses the distance + some heuristic on that node.
This heuristic tries to estimate the distance needed to reach the goal from that node.
Thus, A* is choosing the node to explore based on its estimation of the total distance from the start to the goal through this node.

One way to view it is that suppose you want to go from your house to the train station. 
UCS will prompt you to start the search in a circle area around your house, and slowly expand the radius until you find a station.
But suppose you know that the train station is somewhere south of your house, then A* will prompt you to search in that general direction for the train station instead of wasting resources searching the north.

Thus, the only modification to UCS is that we define $f(n) = g(n) + h(n)$ as the combined cost of the node $n$, sort F based on the estimate $\hat f(n)$, and update $\hat f(n) = \hat g(n) + h(n)$ at each step.

### Pseudocode
```
AStar ( u )
    F ← PQueue ( u ) // Sorted on fˆ
    E ← { }
    gˆ[u] ← 0
    fˆ[u] ← h [ u ]
    while F not empty
        u ← F.pop ( )
        if GoalTest ( u )
            return path ( u )
        E.add ( u ) // u is now explored
        for all children v of u
            if v not in E
                if v in F
                    gˆ[v] ← min ( gˆ[v] , gˆ[u] + c [ u , v ] )
                    fˆ[v] ← gˆ[v] + h [ v ]
                else
                    gˆ[v] ← gˆ[u] + c [ u , v ]
                    fˆ[v] ← gˆ[v] + h [ v ]
                    F.push ( v )
    return Failure
```
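
Below is a runnable Python sketch of graph search A*. As an implementation choice that differs from the pseudocode, it pushes a fresh entry instead of updating one already in F, because `heapq` has no decrease-key operation, and it skips stale entries when they are popped. The graph `EDGES`, the heuristic table `H` and the function signature are assumptions for illustration.

```python
import heapq


def a_star(start, goal, edges, h):
    """A* graph search. edges[u] lists (v, cost); h(v) estimates cost to goal."""
    g_hat = {start: 0}
    parent = {start: None}
    F = [(h(start), start)]       # ordered by f_hat = g_hat + h
    E = set()
    while F:
        _, u = heapq.heappop(F)
        if u in E:
            continue              # stale entry, u was already explored
        if u == goal:
            path = []
            node = u
            while node is not None:
                path.append(node)
                node = parent[node]
            return g_hat[goal], path[::-1]
        E.add(u)
        for v, cost in edges.get(u, []):
            new_g = g_hat[u] + cost
            if v not in E and new_g < g_hat.get(v, float("inf")):
                g_hat[v] = new_g
                parent[v] = u
                heapq.heappush(F, (new_g + h(v), v))
    return None


EDGES = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 1)]}
H = {"S": 3, "A": 3, "B": 1, "G": 0}   # hypothetical heuristic values
print(a_star("S", "G", EDGES, H.get))  # (4, ['S', 'A', 'B', 'G'])
```

Passing a heuristic that returns 0 for every state makes the priority equal to $\hat g$ alone, so the same function then behaves like UCS.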

### Completeness
Using the same analysis as UCS, it is complete.

### Optimality
Optimal, provided that the heuristic satisfies the conditions derived below.

### Time Complexity
By the same analysis as UCS, the complexity is $O(b^{d+1})$.

### Space Complexity
By the same analysis as UCS, the complexity is $O(b^{d+1})$.

### Condition for Optimality
Since we know that UCS is optimal, we just need to ensure that the inclusion of the heuristic does not cause the search to become suboptimal.
Suppose that the optimal path is through the states $s_0, s_1, s_2, \dots, u$, where $u$ is the goal node.
We require that $\hat f_{pop}(s_0)  \leq \hat f_{pop}(s_1) \leq  \hat f_{pop}(s_2) \leq \dots \leq \hat f_{pop}(u)$ and $\hat f_{pop}(s_i) = f(s_i)$, where $\hat f_{pop}(s)$ is the value of $\hat f(s)$ at the moment $s$ is popped from F.

The first inequality enforces that nodes are explored in non-decreasing order of cost, so that we will not explore a node with less cost than the nodes that we have explored before, and the second simply enforces that once we explore a node, the cost when we explore it is the same as the true cost.

Note that if we ensure that the 2nd equality holds and $f(s_0)  \leq f(s_1) \leq  f(s_2) \leq \dots \leq f(u)$ (the true cost is non-decreasing along the shortest path), it follows that the 1st inequality must hold.

Thus in general we must have 

$$
\begin{align*}
f(s_i) &\leq f(s_{i+1}) \\
\Rightarrow g(s_i) + h(s_i) &\leq g(s_{i+1}) + h(s_{i+1})\\
\Rightarrow h(s_i) &\leq g(s_{i+1}) - g(s_i)+ h(s_{i+1})\\
\Rightarrow h(s_i) &\leq c(s_i, s_{i+1}) + h(s_{i+1})\\
\end{align*}
$$

Hence, if we formulate our heuristic such that it satisfies the above inequality, then **graph search A* will be optimal**.
This property is also known as **consistency**.
In words, it means that the difference in the heuristic value between adjacent states cannot exceed the cost of the action between the states.

A weaker condition is known as **admissibility**, which is the condition that $h(s) \leq Opt(s)$, where $Opt(s)$ is the optimal cost from node $s$ to the goal.
It simply means that our heuristic should not overestimate the distance to the goal.
It can be proven that **tree search A* will be optimal** if this holds.
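
Both conditions can be checked mechanically on a small explicit graph. The sketch below is one possible way to do it: the graph, the two heuristic tables and the helper names are all made up for illustration, and $Opt(s)$ is computed by running Dijkstra backwards from the goal.

```python
import heapq


def optimal_costs_to(goal, edges):
    """Opt(s) for every state that can reach the goal (reverse Dijkstra)."""
    reverse = {}
    for u, outs in edges.items():
        for v, cost in outs:
            reverse.setdefault(v, []).append((u, cost))
    opt = {goal: 0}
    F = [(0, goal)]
    while F:
        d, v = heapq.heappop(F)
        if d > opt[v]:
            continue              # stale entry
        for u, cost in reverse.get(v, []):
            if d + cost < opt.get(u, float("inf")):
                opt[u] = d + cost
                heapq.heappush(F, (d + cost, u))
    return opt


def is_consistent(h, edges):
    """h(u) <= c(u, v) + h(v) for every edge (u, v)."""
    return all(h[u] <= cost + h[v] for u, outs in edges.items() for v, cost in outs)


def is_admissible(h, goal, edges):
    """h(s) <= Opt(s) for every state that can reach the goal."""
    opt = optimal_costs_to(goal, edges)
    return all(h[s] <= opt[s] for s in opt)


EDGES = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 1)]}
H_CONSISTENT = {"S": 3, "A": 3, "B": 1, "G": 0}
H_ADMISSIBLE_ONLY = {"S": 4, "A": 1, "B": 1, "G": 0}  # h(S) > c(S, A) + h(A)

print(is_consistent(H_CONSISTENT, EDGES), is_admissible(H_CONSISTENT, "G", EDGES))
# True True
print(is_consistent(H_ADMISSIBLE_ONLY, EDGES), is_admissible(H_ADMISSIBLE_ONLY, "G", EDGES))
# False True
```

The second table never overestimates the remaining cost, yet its value drops by 3 across an edge of cost 1, which is why it is admissible without being consistent.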

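Notes:
* If a heuristic is consistent (with a value of 0 at the goal node), it will be admissible
* Admissible heuristics must have a value of 0 at the goal node, since the cost from the goal to itself is 0 and heuristic values are non-negative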
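
A short sketch of why the first note holds, assuming $h(u) = 0$ at the goal node $u$: take an optimal path $s_0, s_1, \dots, s_k = u$ from any state $s_0$ and apply the consistency inequality once per edge.

$$
\begin{align*}
h(s_0) &\leq c(s_0, s_1) + h(s_1) \\
&\leq c(s_0, s_1) + c(s_1, s_2) + h(s_2) \\
&\;\;\vdots \\
&\leq \sum_{i=0}^{k-1} c(s_i, s_{i+1}) + h(u) \\
&= Opt(s_0)
\end{align*}
$$

The last step uses the fact that the edge costs along an optimal path sum to $Opt(s_0)$, together with $h(u) = 0$.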

### Heuristics
Notice that if we set $h(s)$ to be 0 everywhere, we will get UCS.

And ideally, we would want $h(s)$ to be as close as possible to the optimal distance from node $s$ to the goal so that it is a truer estimate of the cost of the node.

However, computing it exactly would require us to use UCS from node $s$, which defeats the purpose of using a heuristic in the first place because it would take too long to produce an accurate result.

An example of a cheap heuristic would be the straight line (Euclidean) distance from $s$ to the goal.
Suppose we were to revisit the train station example in the preamble.
Now, you have a device that can measure the straight line distance between your location and the station.
This is not a measure of the true distance, because you would need to travel on roads and there might not always be a straight road to the station. However, it does intuitively make sense that this is a good heuristic to use to find the shortest path to the station.
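
As a small sketch, the straight line heuristic can be written with `math.dist` (available from Python 3.8). The coordinates and the street-grid comparison are made-up values for illustration.

```python
import math


def euclidean_h(position, goal):
    """Straight line distance between two (x, y) points."""
    return math.dist(position, goal)


# Hypothetical coordinates: the station is 3 units east and 4 units south.
house, station = (0.0, 0.0), (3.0, -4.0)
print(euclidean_h(house, station))  # 5.0

# If travel were restricted to a grid of streets, the trip would take
# |dx| + |dy| = 7 units, so the estimate does not overestimate here.
grid_distance = abs(station[0] - house[0]) + abs(station[1] - house[1])
print(euclidean_h(house, station) <= grid_distance)  # True
```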


## Greedy Search
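Now suppose we put all our trust in our heuristic and make decisions solely on it.
This is what Greedy Search is about, where we choose the node to explore based solely on heuristic values.

It shares the same worst-case time and space complexity as A* and UCS, with the exception that it is not optimal, because it ignores the cost already paid to reach a node, and one can create a bad heuristic that fails to capture the true distance.
A misleading heuristic can also keep drawing the search deeper into a branch that never reaches the goal, so it is not guaranteed to be complete on infinite graphs.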
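
The sketch below shows greedy search returning a more expensive path than the optimum. The graph, the heuristic table and the function signature are made up for illustration; `cost_so_far` is tracked only so that the cost of the returned path can be reported, and it plays no part in the ordering of F.

```python
import heapq


def greedy_search(start, goal, edges, h):
    """Greedy best-first graph search: F is ordered by h alone."""
    parent = {start: None}        # also records every state seen so far
    cost_so_far = {start: 0}      # for reporting only, not for ordering
    F = [(h[start], start)]
    while F:
        _, u = heapq.heappop(F)
        if u == goal:
            path = []
            node = u
            while node is not None:
                path.append(node)
                node = parent[node]
            return cost_so_far[goal], path[::-1]
        for v, cost in edges.get(u, []):
            if v not in parent:
                parent[v] = u
                cost_so_far[v] = cost_so_far[u] + cost
                heapq.heappush(F, (h[v], v))
    return None


EDGES = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 1)]}
H = {"S": 3, "A": 3, "B": 1, "G": 0}
print(greedy_search("S", "G", EDGES, H))  # (5, ['S', 'B', 'G'])
```

On this graph the cheapest path is S, A, B, G with cost 4, but greedy search jumps straight to B because B has the smaller heuristic value, and ends up paying 5.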