<a href="https://colab.research.google.com/github/Aranguri/causal-map/blob/master/Similarity_measure.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Measuring similarity
Question: given two concepts, how similar are they?

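If all the vectors live in the same space, we can measure how far apart two of them are by summing the absolute differences of their coordinates (the L1, or Manhattan, distance). The smaller the value, the more similar the vectors. For instance: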

In [0]:
import numpy as np

v1 = np.array([3, -4, 5])
v2 = np.array([2, 1, 3])

# L1 (Manhattan) distance: sum of the absolute coordinate differences
abs_dist = lambda v1, v2: np.abs(v1 - v2).sum()
abs_dist(v1, v2)

8

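Now, we can instead sum the squared differences on every coordinate (the squared Euclidean distance), which penalizes one large gap more than several small ones: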

In [0]:
# Squared Euclidean distance: sum of the squared coordinate differences
squared_distance = lambda v1, v2: ((v1 - v2) ** 2).sum()
squared_distance(v1, v2)

30
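
To make the difference between the two measures concrete, here is a minimal sketch. The function names `l1_distance` and `squared_l2_distance` and the extra vector `v3` are illustrative and not part of the notebook above. `v3` is chosen so that it is exactly as far from `v1` as `v2` is under the absolute measure, but with the whole gap concentrated in one coordinate.

```python
import numpy as np

def l1_distance(a, b):
    """Sum of absolute coordinate differences."""
    return np.abs(a - b).sum()

def squared_l2_distance(a, b):
    """Sum of squared coordinate differences."""
    return ((a - b) ** 2).sum()

v1 = np.array([3, -4, 5])
v2 = np.array([2, 1, 3])
v3 = np.array([3, -4, 13])  # hypothetical vector: differs from v1 by 8 in one coordinate

# Both measures agree with the p-norms of the difference vector.
assert l1_distance(v1, v2) == np.linalg.norm(v1 - v2, ord=1)
assert np.isclose(squared_l2_distance(v1, v2), np.linalg.norm(v1 - v2) ** 2)

print(l1_distance(v1, v2), l1_distance(v1, v3))                  # 8 8
print(squared_l2_distance(v1, v2), squared_l2_distance(v1, v3))  # 30 64
```

Under the absolute measure `v2` and `v3` are equally close to `v1`. Under the squared measure `v3` is about twice as far, because a single large gap is weighted more heavily than the same total spread over several coordinates.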

## Analogies
What we want in an analogy is two concepts that are similar in some respect. One way to get that is a similarity matrix in which some entries are zero (I'm not sure about this). That throws away part of the information and keeps the rest, and the comparison is made only on the parts we keep.
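
One way to read the idea of a similarity matrix with zeros is as a diagonal 0/1 mask placed inside a bilinear form. The sketch below is only an assumption about what such a matrix could look like, not the method the notebook settles on. The name `masked_similarity` and the `keep` argument are hypothetical. It computes a cosine-style similarity restricted to the coordinates that are kept.

```python
import numpy as np

def masked_similarity(a, b, keep):
    """Cosine-style similarity computed only on the coordinates where keep == 1.

    A = diag(keep) is symmetric, so the result does not depend on argument order.
    Undefined (division by zero) if a or b is all zeros on the kept coordinates.
    """
    A = np.diag(keep.astype(float))
    num = a @ A @ b
    den = np.sqrt((a @ A @ a) * (b @ A @ b))
    return num / den

v1 = np.array([3, -4, 5])
v2 = np.array([2, 1, 3])

full = np.array([1, 1, 1])  # keep every coordinate: ordinary cosine similarity
mask = np.array([1, 0, 1])  # throw away the second coordinate

print(masked_similarity(v1, v2, full))  # roughly 0.64
print(masked_similarity(v1, v2, mask))  # roughly 0.999
```

The two vectors disagree mostly on the second coordinate, so once that coordinate is discarded they look almost identical. This is the sense in which they are similar "in some respect".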

In [0]:
import numpy as np
import tensorflow as tf

batch_size = 10000
dims = 10

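# Inputs are random digits; each target is the input shifted by one position.
# Note: np.roll without an axis shifts the flattened array, so the first
# column of every target row comes from the end of the previous row.
x_train = np.random.randint(10, size=(batch_size, dims))
y_train = np.roll(x_train, 1)
x_test = np.random.randint(10, size=(batch_size, dims))
y_test = np.roll(x_test, 1)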

vect1 = np.matrix([10, 100, 10000, 1, 0])
vect2 = np.matrix([0, 1000, 1000, 100, 10])
vect3 = np.matrix([100, 1000, 1000, 100, 10])

# Objective: compare(vect1, vect2) = 1
# We want these properties (v1 and v2 are row vectors):
#   * v1 and v2 are unit vectors
#   * symmetry: v1 A v2.T = v2 A v1.T (then we need A = A.T)
#   * self-similarity: v1 A v1.T = 1 (then we need A to have 1 on its diagonal)

# v1 (A B C D) v2.T
# (AB).T = B.T A.T, which equals BA only when A and B are symmetric

model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(dims, activation=tf.nn.relu)
])

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.fit(x_train, x_train, epochs=5)
model.fit(y_train, y_train, epochs=5)

model.evaluate(x_test, y_test)
(W, b) = model.layers[0].get_weights()

# Notes:
# * Composing matrices, v1 A B C v2: seen as transformations, we are applying
#   several transformations in a row. Seen as similarity, I don't know.
# * Try v2 (A v1 + b).
# * Train with the inputs reversed (we want the similarity measure to be reversible).
# * How to discover matrices that measure similarity: start with a matrix that
#   measures similarity. That matrix tells us that a set of vectors are similar
#   to each other. Then we want to train another matrix that uses the set of
#   vectors the first matrix produced as training data, and we penalize it for
#   being similar to the previous matrix, so it should discover new things about
#   the input. We can repeat the process with the similarities discovered by the
#   previous matrix and use it to find more similar things.
# * Suppose we don't start with a matrix that measures similarity and want to
#   come up with one from scratch. If the matrix is less complex, it is more
#   probable. If it took little time to train, it is more probable.


'''
array([[ 8.65406170e-02,  |9.75496948e-01|,  8.31720885e-03,
        -5.46534061e-02,  3.77366925e-03,  3.52384336e-02,
         5.05577866e-03,  2.62985309e-03,  2.34701913e-02,
         2.95122173e-02],
       [ 9.89526436e-02,  4.60578687e-02,  8.59199524e-01,
        -2.43056100e-02,  1.18864572e-03,  3.65314484e-02,
         8.15669261e-03,  6.03734225e-04,  2.84412503e-02,
         1.33965733e-02],
       [ 7.46548995e-02, -1.24876155e-02,  8.38994607e-03,
         9.09765244e-01,  9.05210618e-03,  4.28244248e-02,
        -1.14847360e-04,  1.29830102e-02,  2.22217515e-02,
         1.50865568e-02],
       [ 6.41250312e-02,  1.07111474e-02,  1.08579192e-02,
         1.20886285e-02,  9.82677639e-01,  1.94820482e-02,
         1.71931053e-03,  1.95743106e-02,  3.42881046e-02,
         2.24106684e-02],
       [ 7.44421557e-02, -1.09711066e-02,  3.49380285e-03,
         6.28653541e-02,  1.69861002e-03,  7.41901875e-01,
         5.18119661e-03,  1.00192046e-02,  1.77765489e-02,
         5.26456162e-02],
       [ 1.10365935e-01,  9.77548864e-03,  1.60647649e-02,
         8.46332163e-02,  4.49927337e-03,  9.74491518e-03,
         9.16110516e-01,  7.81392679e-03,  4.09136973e-02,
         1.88619588e-02],
       [ 7.25669116e-02, -1.28480177e-02,  1.75851639e-02,
         8.03491566e-03,  4.05756058e-03,  2.39672642e-02,
         8.35803710e-03,  8.79179955e-01,  2.93481350e-02,
         1.65188555e-02],
       [ 5.57185784e-02,  1.47533594e-02,  2.06894856e-02,
         5.22893146e-02,  1.69304351e-03,  1.68965440e-02,
         1.21552674e-02,  8.66250135e-03,  6.96692228e-01,
         2.74204835e-02],
       [ 8.58424231e-02,  3.33533362e-02,  4.83619655e-03,
        -8.77163652e-03,  1.38745329e-03,  1.52551401e-02,
         8.71499907e-03, -3.37033463e-03,  1.79024674e-02,
         6.88671827e-01],
       [ 1.07347734e-01, -1.94627214e-02,  3.12978984e-03,
        -9.59053449e-03,  2.06850492e-03,  5.03478339e-03,
        -1.38251018e-03,  1.38141895e-02,  3.30601186e-02,
         4.20070626e-02]], dtype=float32)
         '''
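
The weights recorded above are close to an identity matrix shifted one column to the right, which is the matrix that implements a one-position roll. The sketch below checks that correspondence on a small example. The small `dims`, the sample `x`, and the names `shift`, `rowwise` and `flat` are illustrative. It also shows how `np.roll` without an axis differs from a per-row roll, which may be why the last row of the recorded weights has no entry near 1.

```python
import numpy as np

dims = 4
rng = np.random.default_rng(0)
x = rng.integers(10, size=(3, dims))

# Identity shifted one column to the right: shift[i, i + 1] = 1 (wrapping around).
shift = np.roll(np.eye(dims, dtype=int), 1, axis=1)

rowwise = np.roll(x, 1, axis=1)  # each row shifted independently
flat = np.roll(x, 1)             # whole array flattened, shifted, reshaped

# A per-row roll is exactly a multiplication by the shift matrix.
assert np.array_equal(x @ shift, rowwise)

# The flattened roll matches the per-row roll everywhere except column 0,
# which receives the last element of the previous row instead.
assert np.array_equal(flat[:, 1:], rowwise[:, 1:])
assert np.array_equal(flat[:, 0], np.roll(x[:, -1], 1))
```

With a flattened roll, the first target column depends on a different, independent random row, so a linear layer has nothing to learn for it. Passing `axis=1` would make every target column a function of the same input row.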



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


'\narray([[ 8.65406170e-02,  |9.75496948e-01|,  8.31720885e-03,\n        -5.46534061e-02,  3.77366925e-03,  3.52384336e-02,\n         5.05577866e-03,  2.62985309e-03,  2.34701913e-02,\n         2.95122173e-02],\n       [ 9.89526436e-02,  4.60578687e-02,  8.59199524e-01,\n        -2.43056100e-02,  1.18864572e-03,  3.65314484e-02,\n         8.15669261e-03,  6.03734225e-04,  2.84412503e-02,\n         1.33965733e-02],\n       [ 7.46548995e-02, -1.24876155e-02,  8.38994607e-03,\n         9.09765244e-01,  9.05210618e-03,  4.28244248e-02,\n        -1.14847360e-04,  1.29830102e-02,  2.22217515e-02,\n         1.50865568e-02],\n       [ 6.41250312e-02,  1.07111474e-02,  1.08579192e-02,\n         1.20886285e-02,  9.82677639e-01,  1.94820482e-02,\n         1.71931053e-03,  1.95743106e-02,  3.42881046e-02,\n         2.24106684e-02],\n       [ 7.44421557e-02, -1.09711066e-02,  3.49380285e-03,\n         6.28653541e-02,  1.69861002e-03,  7.41901875e-01,\n         5.18119661e-03,  1.00192046e-02,  1.77