Building a recommendation system using word analogies
--------

Let's say we want to build a recommendation engine for job seekers. 

Each job seeker has a list of previous job titles, and we want to suggest better jobs based on that work history.

This problem can be framed as a word analogy problem:

> Man is to king as woman is to queen

Reframed as a job promotion:

> Prince is to king as princess is to ________.

Given that word vectors can solve the word analogy problem, let's use word vectors to suggest job title promotions.
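
To make the idea concrete, here is a minimal sketch of the arithmetic behind an analogy query. It uses made-up two-dimensional vectors rather than real embeddings, and both `toy_vectors` and `solve_analogy` are hypothetical names introduced for illustration. gensim's `most_similar` is similar in spirit, but this is not its implementation.

```python
import math

# Hypothetical 2-D "embeddings" (made-up values, not GloVe):
# axis 0 roughly encodes rank, axis 1 roughly encodes gender.
toy_vectors = {
    "man": (1.0, 1.0),
    "woman": (1.0, -1.0),
    "prince": (2.0, 1.0),
    "princess": (2.0, -1.0),
    "king": (3.0, 1.0),
    "queen": (3.0, -1.0),
}


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def solve_analogy(worda, wordb, wordc, vectors=toy_vectors):
    """Complete 'worda is to wordb as wordc is to ____' by vector arithmetic."""
    a, b, c = vectors[worda], vectors[wordb], vectors[wordc]
    # Move from wordc in the same direction that leads from worda to wordb.
    target = tuple(bi - ai + ci for ai, bi, ci in zip(a, b, c))
    # Exclude the query words so the answer is never one of the inputs.
    candidates = (w for w in vectors if w not in {worda, wordb, wordc})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(solve_analogy("man", "king", "woman"))
print(solve_analogy("prince", "king", "princess"))
```

With these toy values, both queries land exactly on the vector for "queen". Real embeddings are noisier, which is why the nearest neighbor is only a best guess.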

In [6]:
import gensim
import gensim.downloader

In [7]:
model = gensim.downloader.load('glove-wiki-gigaword-300')

In [8]:
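def complete_analogy(worda, wordb, wordc):
    """Return the single best match that completes:
    worda is to wordb as wordc is to ____
    """
    try:
        # Ranked list of (word, similarity) pairs, best match first
        result = model.most_similar(negative=[worda],
                                    positive=[wordb, wordc])
        # Skip the top result if it is just the simple plural of wordc
        top_result = result[0][0]
        if top_result != wordc + 's':
            return top_result
        else:
            second_best_result = result[1][0]
            return second_best_result
    except KeyError as error:
        # Out-of-vocabulary word: return the error instead of raising,
        # so a batch of job titles can still be processed
        return error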

assert complete_analogy("man", "king", "woman") == 'queen'

In [9]:
lower_position  = "prince"
higher_position = "king"
original_job_titles = ['princess', 'valet', 'gardener'] 
promotions = [complete_analogy(lower_position, higher_position, job_title) for job_title in original_job_titles]

In [10]:
for job_title, promotion in zip(original_job_titles, promotions):
    print(f"A(n) {job_title} can be promoted to a(n) {promotion}.")

A(n) princess can be promoted to a(n) queen.
A(n) valet can be promoted to a(n) concierge.
A(n) gardener can be promoted to a(n) horticulturist.


Notes on Building a Data Product
-----

The most important element of building a data product is a large quantity of high-quality data. For the actual model, I built a custom embedding space using millions of job postings.

Another element is error handling. Above, I had to handle simple plurals. The actual system had complex logic to handle edge cases.
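
As an illustration of what such edge-case handling might look like, here is a minimal sketch that filters a ranked list of `(word, score)` pairs, the shape that `most_similar` returns. The helpers `is_trivial_variant` and `first_acceptable` are hypothetical, the scores in the usage example are made up, and the rules shown (plural and case variants) are assumptions, not the logic of the actual system.

```python
def is_trivial_variant(candidate, query_words):
    """True if candidate is a query word, its simple plural, or a case variant."""
    cand = candidate.lower()
    for word in query_words:
        base = word.lower()
        if cand in {base, base + "s", base + "es"}:
            return True
    return False


def first_acceptable(ranked_results, query_words):
    """Return the best-ranked candidate that is not a trivial variant, else None."""
    for candidate, _score in ranked_results:
        if not is_trivial_variant(candidate, query_words):
            return candidate
    return None


# Made-up ranked results for illustration only
ranked = [("gardeners", 0.71), ("Gardener", 0.70), ("horticulturist", 0.66)]
print(first_acceptable(ranked, ["gardener"]))
```

Walking the ranked list, rather than checking only the top result, keeps the function usable when several trivial variants appear in a row.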

<center><h2>Sources of Inspiration</h2></center>

- https://radimrehurek.com/gensim/models/keyedvectors.html
- https://towardsdatascience.com/how-to-solve-analogies-with-word2vec-6ebaf2354009