## Cosine similarity!

By: Alex Comerford (alexanderjcomerford@gmail.com)

In this notebook we explain what cosine similarity is!

This notebook serves both as a tutorial and as a service-ready function that can be executed from within OpenFaaS. Any modifications made to this notebook can therefore be deployed and propagated to all consumers of this function, so be careful what you edit before you deploy!

### Parameters

We start this notebook with a list of parameters that define the context in which it is executed.

In [28]:
// VECTOR_A and VECTOR_B are used as parameters when executing cosine similarity
// as a service. By default they hold sample values.
val VECTOR_A = List(0,1,2)
val VECTOR_B = List(4,5,6)

// No output to clutter the notebook
null

null

## What is cosine similarity?

Cosine similarity can be defined as follows in LaTeX:

In [29]:
display(Latex("""$
\begin{equation}
\cos ({\bf A},{\bf B}) = {{\bf A} \cdot {\bf B} \over \|{\bf A}\| \|{\bf B}\|} = \frac{ \sum_{i=1}^{n}{{\bf A}_i{\bf B}_i} }{ \sqrt{\sum_{i=1}^{n}{({\bf A}_i)^2}} \sqrt{\sum_{i=1}^{n}{({\bf B}_i)^2}} }
\end{equation}
$"""))

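Cosine similarity measures how similar two vectors are by the cosine of the angle between them, so it depends on their direction and not on their magnitude.

The output ranges from `-1` to `1`: `-1` for vectors pointing in opposite directions, `0` for orthogonal vectors, and `1` for vectors pointing in the same direction.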
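To make the range concrete, here is a minimal sketch that evaluates the formula on three hand-picked pairs of 2-D vectors. The `RangeDemo` object and its `cosine` helper are names invented for this illustration and are not part of the service code in this notebook.

```scala
// Hypothetical helper for illustration only; it mirrors the formula above.
object RangeDemo {
  def cosine(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.size == y.size, "vectors must have the same length")
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val normX = math.sqrt(x.map(a => a * a).sum)
    val normY = math.sqrt(y.map(b => b * b).sum)
    dot / (normX * normY)
  }

  val sameDirection = cosine(Seq(1.0, 0.0), Seq(2.0, 0.0))  // vectors point the same way
  val orthogonal    = cosine(Seq(1.0, 0.0), Seq(0.0, 3.0))  // vectors at a right angle
  val opposite      = cosine(Seq(1.0, 0.0), Seq(-1.0, 0.0)) // vectors point opposite ways
}
```

The three values come out as `1.0`, `0.0` and `-1.0`, the two ends and the midpoint of the range.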

Cosine similarity is widely used in several disciplines, especially natural language processing (NLP). One example in NLP is comparing word counts across sentences: sentences whose word-count vectors point in similar directions have a higher cosine similarity and can therefore be thought of as more related!
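The word-count idea can be sketched as follows. This is only an illustration under simplifying assumptions: the tokenizer is deliberately naive (lowercase, split on anything that is not a letter `a` to `z`), and `WordCountDemo` with all of its methods is a hypothetical name, not an existing library.

```scala
// Illustrative sketch; WordCountDemo and its methods are hypothetical names.
object WordCountDemo {
  // Naive tokenizer: lowercase the sentence and split on non-letter characters.
  def tokens(sentence: String): Seq[String] =
    sentence.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  // Count how often each vocabulary word occurs in the sentence.
  def countVector(sentence: String, vocabulary: Seq[String]): Seq[Double] = {
    val counts = tokens(sentence).groupBy(identity).map { case (w, ws) => w -> ws.size.toDouble }
    vocabulary.map(w => counts.getOrElse(w, 0.0))
  }

  def cosine(x: Seq[Double], y: Seq[Double]): Double = {
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    dot / (math.sqrt(x.map(a => a * a).sum) * math.sqrt(y.map(b => b * b).sum))
  }

  // Build a shared vocabulary so both count vectors have the same length.
  def similarity(s1: String, s2: String): Double = {
    val vocabulary = (tokens(s1) ++ tokens(s2)).distinct
    cosine(countVector(s1, vocabulary), countVector(s2, vocabulary))
  }
}
```

For instance, `WordCountDemo.similarity("cats chase mice", "dogs chase cats")` shares two of three words per sentence and gives roughly `0.67`, while two sentences with no words in common give `0.0`.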

Cosine similarity is also useful in other domains where some property of the instances makes their values larger without meaning anything different, because the result ignores the overall scale of each vector. Sensor values captured over recording windows of different lengths (in time) could be such an example.
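The reason scale does not matter follows directly from the definition. As a short derivation, assume one vector is multiplied by a positive constant $k$ (a symbol introduced here only for illustration):

$$
\cos (k{\bf A},{\bf B}) = \frac{(k{\bf A}) \cdot {\bf B}}{\|k{\bf A}\| \|{\bf B}\|} = \frac{k \, ({\bf A} \cdot {\bf B})}{k \, \|{\bf A}\| \|{\bf B}\|} = \cos ({\bf A},{\bf B}), \qquad k > 0
$$

The factor $k$ cancels, so a recording that is twice as long and accumulates twice the values still compares as identical in direction.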

Below we implement an `object` in Scala that computes the cosine similarity of two "vectors" as a demonstration. Its methods take `Array[Double]`, so the `List` parameters are converted before the call.

In [30]:
object CosineSimilarity {
  
  /*
   * This method takes 2 equal-length arrays of doubles.
   * It returns a double representing the similarity of the 2 arrays,
   * e.g. 0.9925 would be 99.25% similar.
   * (x dot y) / (||x|| * ||y||)
   */
  def cosineSimilarity(x: Array[Double], y: Array[Double]): Double = {
    
    // ensure both arrays have the same length
    require(x.size == y.size, "arrays must have the same length")
    dotProduct(x, y)/(magnitude(x) * magnitude(y))
  }
  
  /*
   * Return the dot product of the 2 arrays
   * e.g. (a[0]*b[0])+(a[1]*b[1])
   */
  def dotProduct(x: Array[Double], y: Array[Double]): Double = {
    x.zip(y).map { case (a, b) => a * b }.sum
  }
  
  /*
   * Return the magnitude of an array
   * We square each element, sum the squares, then take the square root of the result.
   */
  def magnitude(x: Array[Double]): Double = {
    math.sqrt(x.map(i => i * i).sum)
  }
  
}

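One edge case is worth noting: if either input is all zeros, its magnitude is `0.0` and the formula divides `0.0` by `0.0`, which evaluates to `NaN` for doubles. A possible way to make that explicit is sketched below. `SafeCosineSimilarity` is a hypothetical variant written for this illustration; it is not the object deployed by this notebook.

```scala
// Hypothetical variant for illustration; not part of the original object.
object SafeCosineSimilarity {
  // Returns None when either vector has zero magnitude, where the formula is undefined.
  def cosineSimilarity(x: Array[Double], y: Array[Double]): Option[Double] = {
    require(x.length == y.length, "arrays must have the same length")
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val magX = math.sqrt(x.map(a => a * a).sum)
    val magY = math.sqrt(y.map(b => b * b).sum)
    if (magX == 0.0 || magY == 0.0) None else Some(dot / (magX * magY))
  }
}
```

Returning an `Option` pushes the decision about undefined inputs to the caller, which may be preferable to letting `NaN` travel silently through a service response.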

Now that we have an implementation of cosine similarity we can test it with whatever values we want! In the next cell we take the parameters from the top cell, calculate their cosine similarity, and show the result as the cell's output.

In [31]:
// Convert vectors to Arrays of type Double
val A = VECTOR_A.toArray.map(_.toDouble)
val B = VECTOR_B.toArray.map(_.toDouble)

// Compute their cosine similarity
CosineSimilarity.cosineSimilarity(A,B)

0.8664002254439633
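As a check, the same value can be worked out by hand for the sample parameters ${\bf A} = (0, 1, 2)$ and ${\bf B} = (4, 5, 6)$:

$$
{\bf A} \cdot {\bf B} = 0 \cdot 4 + 1 \cdot 5 + 2 \cdot 6 = 17, \qquad \|{\bf A}\| = \sqrt{0^2 + 1^2 + 2^2} = \sqrt{5}, \qquad \|{\bf B}\| = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{77}
$$

$$
\cos ({\bf A},{\bf B}) = \frac{17}{\sqrt{5} \, \sqrt{77}} = \frac{17}{\sqrt{385}} \approx 0.8664
$$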