# Problem Solving and Search

## Background

Modern research approaches problem solving from the information-processing perspective. This approach was first proposed by Allen Newell and Herbert Simon, who described their “Logic Theorist” computer program, which was designed to simulate human problem solving. Rather than considering only the initial structure of a problem and the new structure reached once it is solved, Newell and Simon described problem solving as a search that takes place between the posing of the problem and its solution.

One of the main contributions of Newell and Simon’s approach to problem solving
is that it provided a way to specify the possible pathways from the initial to goal states.

**Beyond problem space**

From a psychological perspective, how a problem is stated can affect its difficulty. This is closely related to the Gestalt psychologists’ idea of restructuring.

Thus, I want to discuss representing a general graph by placing its nodes in Euclidean space, where useful intuition can be retrieved efficiently thanks to the geometric structure and closed-form formulas.

**The thoughts here:**

1. Representation: how can the choice of representation make graph problems easier?
    - Network representation learning (NRL) can reveal the structure of a graph and is well suited to downstream machine-learning applications.
    - A Euclidean heuristic can preserve distance information, which makes it useful for informed search, and it can also serve as an NRL method. (Explore its potential.)
2. FastMap as a Euclidean heuristic, and the directed-graph setting
    - The speed improvement comes at the cost of accuracy and admissibility.
    - Which parts of the FastMap algorithm cause the loss of precision?
    - L1 vs. L2 norm: can the L2 version be adjusted to keep this property?
    - On directed graphs, where these properties are lost, how can we compensate?
    - Try to merge the two embeddings into one space; if that is not possible, show that each is valuable in applications. (An intuition: going from max to average is like an elastic space model.)
3. Differential Heuristic
    - Storing true distances is essential to this model.
    - How can the stored distances be used sensibly, e.g., distances to landmarks or to goals?
    - It improves accuracy with only a small amount of storage, which is reasonable from an application perspective.
4. Experiments that support the representation ideas in item 1.
    - Directed-graph path finding
    - State lattices, motion planning
    - TBD
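The landmark idea in item 3 can be made concrete with the usual form of a differential heuristic, $h(n) = \max_{\ell} |d(\ell, n) - d(\ell, g)|$, which stores one true-distance table per landmark $\ell$. The Python sketch below assumes an undirected, connected graph with non-negative weights stored as a dict of dicts; the function names, the toy graph, and the landmark choice are hypothetical and only for illustration.

```python
import heapq


def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps node -> {neighbor: weight}."""
    dist = {src: 0.0}
    frontier = [(0.0, src)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(frontier, (nd, v))
    return dist


def build_differential_heuristic(adj, landmarks):
    """Precompute true distances from each landmark: |landmarks| * |V| stored numbers."""
    tables = [dijkstra(adj, lm) for lm in landmarks]

    def h(n, goal):
        # On an undirected graph, |d(lm, n) - d(lm, goal)| <= d(n, goal) by the
        # triangle inequality, so the maximum over landmarks is admissible.
        return max(abs(t[n] - t[goal]) for t in tables)

    return h


# Hypothetical toy graph: a path a - b - c - d with unit weights.
adj = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
h = build_differential_heuristic(adj, landmarks=["a"])
print(h("b", "d"))  # 2.0, equal to the true distance d(b, d)
```

With the landmark at an end of the path the heuristic happens to be exact; in general it is only a lower bound, and a landmark at `"b"` would give $h(a, c) = 0$, which is why the placement question in item 3 matters.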

## Network Representation Learning

Adopting the encoder-decoder view of node embedding, we organize our discussion of the various node embedding methods
around the following four methodological components:

1. A pairwise similarity function $s_{\mathcal{G}} : V \times V \to \mathbb{R}^+$, defined over the graph $\mathcal{G}$. This function measures the similarity between nodes in $\mathcal{G}$.
2. An encoder function, $\text{ENC}$, that generates the node embeddings. This function contains a number of trainable parameters that are optimized during the training phase.
3. A decoder function, $\text{DEC}$, which reconstructs pairwise **similarity values** from the generated embeddings. This function usually contains no trainable parameters.
4. A loss function, $\mathcal{L}$, which determines how the quality of the pairwise reconstructions is evaluated in order to train the model, i.e., how $\text{DEC}(z_i, z_j)$ is compared to the true $s_{\mathcal{G}}(v_i, v_j)$ values.
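The four components can be illustrated in the simplest shallow setting: the encoder is a lookup table of embeddings, the decoder is an inner product, the loss is squared error, and $s_{\mathcal{G}}$ is the adjacency matrix. This NumPy sketch uses plain gradient descent; the learning rate, epoch count, and function name are arbitrary assumptions, not a specific published method.

```python
import numpy as np


def train_shallow_embedding(S, dim=2, lr=0.01, epochs=2000, seed=0):
    """Fit Z so that DEC(z_i, z_j) = z_i . z_j approximates S[i, j].

    ENC  : lookup table, row i of Z is the embedding of node v_i (trainable).
    DEC  : inner product, no trainable parameters.
    Loss : squared error, sum over (i, j) of (z_i . z_j - S[i, j])^2.
    Assumes S is symmetric (e.g. the adjacency matrix of an undirected graph).
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    Z = 0.1 * rng.standard_normal((n, dim))
    for _ in range(epochs):
        R = Z @ Z.T - S          # residual DEC(z_i, z_j) - s(v_i, v_j) for all pairs
        Z -= lr * (4 * R @ Z)    # gradient of ||Z Z^T - S||_F^2 for symmetric S
    loss = float(np.sum((Z @ Z.T - S) ** 2))
    return Z, loss


# First-order similarity on a 4-cycle 0-1-2-3-0: s(v_i, v_j) = A[i, j].
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
Z, loss = train_shallow_embedding(A)
print(Z.shape, round(loss, 3))
```

On the 4-cycle the loss cannot reach zero: $ZZ^\top$ is positive semidefinite, while $A$ has eigenvalue $-2$, so the best achievable squared error is $4$. This is one concrete way the choice of decoder limits which similarity values can be reconstructed.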

**The question, then, is how to choose the similarity values.**

The most common measures are adjacency and the probability of co-occurring on a random walk. These similarity functions trade off between modeling “first-order similarity”, where $s_{\mathcal{G}}$ directly measures connections between nodes (i.e., $s_{\mathcal{G}}(v_i, v_j)\triangleq A_{i,j}$), and modeling “higher-order similarity”, where $s_{\mathcal{G}}$ corresponds to more general notions of neighborhood overlap (e.g., $s_{\mathcal{G}}(v_i, v_j)\triangleq A^2_{i,j}$).
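To make the first-order/higher-order distinction concrete, the sketch below computes $A$, $A^2$, and a simplified random-walk co-occurrence matrix with NumPy. The walk length, number of walks, and window size are arbitrary assumptions, and the co-occurrence count is only a simplified stand-in for the sampling schemes used by actual random-walk embedding methods.

```python
import numpy as np


def first_order(A):
    """s(v_i, v_j) = A[i, j]: direct connections only."""
    return A.copy()


def second_order(A):
    """s(v_i, v_j) = (A^2)[i, j]: number of length-2 walks (shared neighbors if i != j)."""
    return A @ A


def random_walk_cooccurrence(A, walk_len=10, walks_per_node=50, window=2, seed=0):
    """Empirical frequency with which v_i and v_j appear within `window` steps on a walk."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    C = np.zeros((n, n))
    for start in range(n):
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = np.flatnonzero(A[walk[-1]])
                if nbrs.size == 0:
                    break  # dead end
                walk.append(int(rng.choice(nbrs)))
            for i, u in enumerate(walk):
                for v in walk[i + 1 : i + 1 + window]:
                    C[u, v] += 1
                    C[v, u] += 1  # count both orders, so C stays symmetric
    return C / C.sum()


# Path graph 0 - 1 - 2: nodes 0 and 2 are not adjacent but share neighbor 1.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(first_order(A)[0, 2], second_order(A)[0, 2])  # 0.0 1.0
```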

There are also more complex measures, such as structural similarity, which reveal more of the graph's structural features.

However, these methods are mostly applied to supervised classification and unsupervised clustering; they are much less common for regression problems. Another significant issue is the asymmetry of similarity, especially on directed graphs.

**Shortest path distance:**

The intuitions here:

1. In a uniformly weighted graph, the distance is just the BFS depth, and it cannot be refined into a higher-order approximation, since such a refinement would carry no additional meaning.
    - On the other hand, distance already accounts for the structure to some extent, which is what higher-order approximations care about, but it does not capture the neighborhood situation.
    - In other words, the representation produced by this method (the FastMap algorithm) tends to belong to the category of **Graph Constructed from Non-relational Data**.
2. In NRL, what matters is the similarity between nodes: similar nodes should stay close after embedding, regardless of what the representation itself means.
3. NRL ignores edge weights and mainly focuses on the structure of the graph.
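The first intuition can be checked on a small example: BFS depth gives the exact distance in a unit-weight graph, yet two nodes at the same distance can have quite different neighborhood overlap (the quantity that $A^2$ measures). The graph and function name below are hypothetical.

```python
from collections import deque


def bfs_depth(adj, src):
    """Hop count from src; this is the shortest-path distance when all edges weigh 1."""
    depth = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


# Hypothetical unit-weight graph: nodes 3 and 5 are both two hops from node 0.
adj = {0: [1, 2, 4], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: [0, 5], 5: [4]}
d = bfs_depth(adj, 0)
shared_03 = len(set(adj[0]) & set(adj[3]))  # 2 common neighbors (1 and 2)
shared_05 = len(set(adj[0]) & set(adj[5]))  # 1 common neighbor (4)
print(d[3], d[5], shared_03, shared_05)     # 2 2 2 1
```

Distance alone cannot tell nodes 3 and 5 apart from node 0's point of view, while the neighborhood-overlap count can.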

For the Euclidean heuristic, what we care about is distance, which can be very powerful in an informed-search setting. Here, the structure of the graph is ignored.
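How a Euclidean heuristic is consumed by informed search can be sketched as A* with $h(n) = \|y_n - y_{\text{goal}}\|_2$. This sketch assumes the coordinates $y$ are already given (here, hand-picked planar positions for a hypothetical toy graph, not the output of any embedding method). Stale heap entries are skipped rather than deleted and nodes may be re-expanded, so the returned cost is optimal whenever $h$ is admissible.

```python
import heapq
import math


def astar(adj, coords, start, goal):
    """A* search using the Euclidean heuristic h(n) = ||y_n - y_goal||_2.

    adj: node -> {neighbor: edge weight}; coords: node -> coordinate tuple y_n.
    Returns the cost of the path found, or math.inf if goal is unreachable.
    """
    def h(n):
        return math.dist(coords[n], coords[goal])

    g = {start: 0.0}
    frontier = [(h(start), start)]
    while frontier:
        f, u = heapq.heappop(frontier)
        if f > g[u] + h(u):
            continue  # stale entry: a cheaper path to u was found after this push
        if u == goal:
            return g[u]
        for v, w in adj[u].items():
            ng = g[u] + w
            if ng < g.get(v, math.inf):
                g[v] = ng
                heapq.heappush(frontier, (ng + h(v), v))
    return math.inf


# Hypothetical toy graph whose edge weights are at least the planar distances.
coords = {"s": (0, 0), "a": (1, 0), "b": (0, 1), "t": (1, 1)}
adj = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.0, "t": 1.5},
    "t": {"a": 1.0, "b": 1.5},
}
print(astar(adj, coords, "s", "t"))  # 2.0
```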

Questions here are:
- Can these two perspectives be combined?
- Is there a need to do so?
- Would combining them help each achieve its own target?
- Should we classify graph problems by their goal (structure or distance)?
- How should we handle the asymmetry problem on directed graphs?

## Euclidean space embedding

One significant feature of Euclidean space is that distances between points satisfy the triangle inequality. This feature yields useful properties when a general graph is mapped into Euclidean space.
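Shortest-path distances on a graph already satisfy the triangle inequality, which is part of why Euclidean space is a natural target for embedding them. On a directed graph, however, they are usually not symmetric, and no Euclidean distance can represent that. The sketch below (Floyd-Warshall on a hypothetical three-node directed cycle; function names are our own) checks both properties.

```python
import itertools
import math


def all_pairs_shortest_paths(nodes, edges):
    """Floyd-Warshall; edges is a dict {(u, v): weight} of directed arcs."""
    d = {(u, v): (0.0 if u == v else math.inf) for u in nodes for v in nodes}
    for (u, v), w in edges.items():
        d[u, v] = min(d[u, v], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d


def satisfies_triangle_inequality(nodes, d):
    return all(d[i, j] <= d[i, k] + d[k, j]
               for i, j, k in itertools.product(nodes, repeat=3))


def is_symmetric(nodes, d):
    return all(d[i, j] == d[j, i] for i, j in itertools.product(nodes, repeat=2))


nodes = [0, 1, 2]
directed = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0}  # 0 -> 1 -> 2 -> 0
d = all_pairs_shortest_paths(nodes, directed)
print(d[0, 1], d[1, 0])  # 1.0 2.0
print(satisfies_triangle_inequality(nodes, d), is_symmetric(nodes, d))  # True False
```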

The FastMap algorithm starts by imagining that the objects exist in a Euclidean space that preserves the distance information, and then tries to find coordinates for the nodes in a specified $K$-dimensional space that preserve as much of the original information as possible.
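A minimal sketch of the graph version of FastMap in its L1 form, as we understand it: each dimension picks two far-apart pivots $a, b$ with a farthest-node heuristic, sets $p_i = (d_{ai} + d_{ab} - d_{ib})/2$, and then subtracts $|p_u - p_v|$ from every edge weight so that later dimensions only explain the residual distance. It assumes an undirected, connected graph with non-negative weights; the parameter names `t` and `eps` and all function names are our own, and this is not a reference implementation.

```python
import heapq
import random


def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps node -> {neighbor: weight}."""
    dist = {src: 0.0}
    frontier = [(0.0, src)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(frontier, (nd, v))
    return dist


def farthest_pair(adj, nodes, rng, t):
    """Pivot heuristic: t times, jump to the node farthest from the current one."""
    a = b = rng.choice(nodes)
    for _ in range(t):
        a = b
        dist = dijkstra(adj, a)
        b = max(dist, key=dist.get)
    return a, b


def fastmap_l1(adj, K, t=5, eps=1e-9, seed=0):
    """Return node -> list of at most K coordinates (fewer if residuals vanish)."""
    rng = random.Random(seed)
    w = {u: dict(nbrs) for u, nbrs in adj.items()}  # residual edge weights
    nodes = list(w)
    coords = {u: [] for u in nodes}
    for _ in range(K):
        a, b = farthest_pair(w, nodes, rng, t)
        da, db = dijkstra(w, a), dijkstra(w, b)
        dab = da[b]
        if dab < eps:
            break  # residual distances are (almost) all zero: nothing left to embed
        for u in nodes:
            coords[u].append((da[u] + dab - db[u]) / 2.0)
        # Remove what this dimension already explains from every edge (L1 residual);
        # max(0, ...) only guards against floating-point round-off.
        for u in nodes:
            for v in w[u]:
                w[u][v] = max(0.0, w[u][v] - abs(coords[u][-1] - coords[v][-1]))
    return coords


path = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
emb = fastmap_l1(path, K=2)
print(emb)
```

The amount subtracted from an edge is at most its residual weight (by the triangle inequality on residual distances), so summed over dimensions the L1 distance between the endpoints of any edge never exceeds the original weight. On the unit path above, one dimension already explains everything and the loop stops early.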

Thus, when these two directions meet, what happens to the structural information of the graph that was ignored?

## Collection of theorems and properties on graphs

**Assuming that every edge $(u, v) \in E$ is the unique shortest $u$-$v$ path is not a limiting assumption when finding shortest paths**.

This is because any edge $(u, v)$ that does not satisfy the assumption can be removed from the graph without changing the length of any shortest path.
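The removal argument can be turned into a preprocessing step: keep an edge only if every other $u$-$v$ path is strictly longer. This sketch assumes an undirected graph with strictly positive weights stored as a dict of dicts, and runs one Dijkstra search per edge, so it illustrates the idea rather than being efficient. Function names are hypothetical.

```python
import heapq


def distance_avoiding_edge(adj, src, dst):
    """Dijkstra distance from src to dst that never uses the direct edge {src, dst}."""
    dist = {src: 0.0}
    frontier = [(0.0, src)]
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > dist[u]:
            continue  # stale queue entry
        if u == dst:
            return d
        for v, w in adj[u].items():
            if {u, v} == {src, dst}:
                continue  # skip the edge under test, in either direction
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(frontier, (nd, v))
    return float("inf")


def prune_to_unique_shortest_edges(adj):
    """Keep edge (u, v) only if it is the unique shortest u-v path."""
    return {
        u: {v: w for v, w in nbrs.items() if distance_avoiding_edge(adj, u, v) > w}
        for u, nbrs in adj.items()
    }


# Hypothetical triangle: the direct edge u-v (3.0) is longer than u-x-v (2.0).
adj = {
    "u": {"v": 3.0, "x": 1.0},
    "v": {"u": 3.0, "x": 1.0},
    "x": {"u": 1.0, "v": 1.0},
}
print(prune_to_unique_shortest_edges(adj))
# {'u': {'x': 1.0}, 'v': {'x': 1.0}, 'x': {'u': 1.0, 'v': 1.0}}
```

With strictly positive weights, removing all such edges at once is also safe: in a simple graph a replacement path has at least two edges, each strictly lighter than the removed one, so induction on edge weight shows every shortest-path length is preserved.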

**A Euclidean heuristic is admissible and consistent if and only if the heuristic is locally admissible**, i.e., $\forall (i, j) \in E,\ \|y_i - y_j\| \le \delta(i, j)$, where $E$ is the set of edges.
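Because the condition involves only edges, checking it costs one distance computation per edge, instead of a comparison against all-pairs shortest paths. A sketch using the L2 norm, with a hypothetical toy embedding and function name:

```python
import math


def is_locally_admissible(edges, y, tol=1e-12):
    """True if ||y_i - y_j||_2 <= delta(i, j) for every edge (i, j) in E.

    edges: {(i, j): delta(i, j)}; y: node -> coordinate tuple.
    """
    return all(math.dist(y[i], y[j]) <= delta + tol
               for (i, j), delta in edges.items())


# Hypothetical square with planar coordinates; every edge is at least as long as
# the straight line between its endpoints, so the embedding is locally admissible.
y = {"s": (0, 0), "a": (1, 0), "b": (0, 1), "t": (1, 1)}
edges = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 1.0, ("b", "t"): 1.5}
print(is_locally_admissible(edges, y))  # True

# Stretching the embedding by a factor of 2 violates the condition on every edge.
stretched = {n: (2 * p[0], 2 * p[1]) for n, p in y.items()}
print(is_locally_admissible(edges, stretched))  # False
```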

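**Transitivity is a common characteristic of undirected and directed graphs**: for example, reachability is transitive (if $u$ can reach $v$ and $v$ can reach $w$, then $u$ can reach $w$), and shortest-path distances satisfy $d(u, w) \le d(u, v) + d(v, w)$ in both cases.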