# Simple algorithms

Many algorithms can help determine whether one word is similar to another, which is useful in applications such as spell checking. The first algorithm we look at is cosine similarity. It is a simple function that measures the similarity of two vectors, here two strings whose letters are encoded as numbers, and is computed by:

$$\frac{\sum_{i=1}^{n}A_i B_i}{\sqrt{\sum_{i=1}^{n}A_i^2}\sqrt{\sum_{i=1}^{n}B_i^2}}$$

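This returns a score for how similar the two vectors, or in this case words, are. For vectors with non-negative entries, such as the character codes used here, the score lies between 0 and 1 and can be read as a percentage: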

In [52]:
from math import sqrt

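def letters_to_numbers(letters):
    """Convert a string to a list of character codes (not case sensitive)."""
    # ord() gives the Unicode code point, so spaces and punctuation count too.
    return [ord(letter) for letter in letters.lower()]

def cosine_similarity(a, b):
    """Return the cosine similarity of two strings a and b."""
    a = letters_to_numbers(a)
    b = letters_to_numbers(b)

    # Squared entries, used for the two norms in the denominator.
    a2 = [x**2 for x in a]
    b2 = [x**2 for x in b]

    # Element-wise products for the numerator. zip() stops at the shorter
    # input, so trailing characters of the longer string only affect its norm.
    ab = [x * y for (x, y) in zip(a, b)]

    return sum(ab) / sqrt(sum(a2) * sum(b2))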

We can see an example of this using the two phrases: "HallO Worlds" and "hello world":

In [53]:
cosine_similarity("HallO Worlds", "hello world")

0.9482289714614682

This shows that they are approximately 94.8% similar.
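
To put that score in context, it can help to check the formula on plain numeric vectors first. The following is a minimal sketch, not part of the notebook cells above; the helper name `cosine` is an assumption introduced for illustration. It shows that parallel vectors score 1, perpendicular vectors score 0, and that raw character codes score high even for unrelated words, because every letter maps to a large positive number of similar size.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Vectors pointing in the same direction are maximally similar.
parallel = cosine([1, 2, 3], [2, 4, 6])

# Perpendicular vectors share nothing.
orthogonal = cosine([1, 0], [0, 1])

# Character codes of two unrelated words still give a score above 0.99.
unrelated = cosine([ord(c) for c in "cat"], [ord(c) for c in "dog"])

print(parallel, orthogonal, unrelated)
```

With this encoding, a score such as 94.8% is best read relative to the scores of other candidate words rather than as an absolute measure.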

This type of technique is used to find likely spelling mistakes and their best corrections, and even to compare articles to see how similar they are. For longer texts, the algorithm can be altered to work on words instead of letters.

In [None]:
words_to_numbers
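
The word-level variant is not spelled out here. One possible sketch, assuming each text is represented by how often each word occurs in it, is shown below. The function name `word_cosine_similarity` and the whitespace-based tokenization are assumptions made for illustration, not the notebook's own implementation.

```python
from collections import Counter
from math import sqrt

def word_cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts based on word counts (hypothetical sketch)."""
    # Assumption: words are separated by whitespace and case is ignored.
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())

    # Only words present in both texts contribute to the numerator.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)

    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction, so treat it as sharing nothing.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(word_cosine_similarity("the cat sat", "the cat ran"))
```

Unlike the letter-based version, texts with no words in common score exactly 0 here, since each distinct word gets its own dimension.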