This notebook checks that the pickled model loads from disk and returns sensible course recommendations for a sample job description.

In [12]:
import pickle
import gensim
import pandas as pd

In [2]:
# Load the pickled model from disk (the context manager closes the file handle)
with open('model.p', 'rb') as f:
    model = pickle.load(f)

In [4]:
# Select a sample job description
js = "'\nData Scientist\n\nat Brightidea\n\nSan Francisco\n\nThe Role\n\nWe are seeking machine learning developers with natural language processing experience.\n\nIn general, we are looking for people who are self-motivated and passionate about the field of machine learning and the vast applications of it. These folks will have the ability to work with / understand / and build on top of an existing code base using their deep knowledge of various machine learning algorithms (e.g. neural networks, bayesian methods, etc).\n\nKey responsibilities include, but not limited to:\n\n\nBuild on top of an existing text processing/classification system\nWrite, maintain, and develop python machine learning modules & repos\nRun hyperparameter optimizations + collect, analyze, visualize, and present results\n\nWhat You Need to Succeed\n\nBS or MS in computer science, mathematics, physics or other hard science/engineering discipline\nProgramming in Python ~ 2+ years\nNumpy, scipy, pandas, Jupyter, and scikit-learn background\nData visualization (e.g. matplotlib, seaborn, bokeh, mpld3, etc)\nAbility to implement machine learning algorithms from scratch\nExperience with full machine learning pipeline: from data preprocessing, to building/training various models, to hyperparameter optimization, testing, and visualization of results.\nBackground in deep learning preferred but not required\n\nIn Your Application Please Include\n\n\n\nA past machine learning project you worked on in which highlights your skills, including: What tools/models did you use? What were some problems you encountered along the way, and how did you solve them?"

In [9]:
# Preprocess the job description
doc = gensim.utils.simple_preprocess(js)
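
For intuition about what the preprocessing step produces, here is a rough standard-library approximation: lowercase the text, keep runs of letters, and drop tokens that are very short or very long. The `rough_preprocess` helper and its length limits are assumptions made for illustration; this is not gensim's implementation, and its output may differ from `simple_preprocess` in edge cases.

```python
import re

def rough_preprocess(text, min_len=2, max_len=15):
    """Lowercase, split into alphabetic tokens, and drop very short or long ones."""
    # [^\W\d]+ matches runs of word characters that are not digits
    tokens = re.findall(r"[^\W\d]+", text.lower())
    return [t for t in tokens
            if min_len <= len(t) <= max_len and not t.startswith("_")]

# One line taken from the job description above
sample_tokens = rough_preprocess("Programming in Python ~ 2+ years")
print(sample_tokens)
```

Digits and punctuation are discarded, so a requirement such as "2+ years" contributes only the token "years".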

In [7]:
# Vectorize the job description
vector = model.infer_vector(doc)

In [10]:
# Extract the most similar docs from the model as (doc tag, cosine similarity) pairs
sims = model.docvecs.most_similar([vector])
sims

[(441, 0.5915085673332214),
 (1231, 0.5760758519172668),
 (2849, 0.5734542012214661),
 (3976, 0.5609011650085449),
 (1634, 0.5435008406639099),
 (1074, 0.5411732792854309),
 (3298, 0.5391528010368347),
 (18, 0.5345658659934998),
 (4269, 0.5208688378334045),
 (1656, 0.5193619728088379)]
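
The second element of each tuple is a cosine similarity between the inferred vector and a stored document vector. A minimal pure-Python sketch of that kind of ranking follows. The `cosine_similarity` and `rank_by_similarity` helpers and the toy 2-D vectors are hypothetical stand-ins for illustration, not gensim's internals.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, doc_vectors, topn=10):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy document vectors keyed by doc id (made up for illustration)
toy_vectors = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
toy_sims = rank_by_similarity([1.0, 0.1], toy_vectors, topn=2)
print(toy_sims)
```

Because cosine similarity depends only on direction, scaling the query vector would not change the ranking.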

In [13]:
# Read in the course data
course_df = pd.read_csv('../Data/Course_Data/Coursera_Catalog.csv')
course_df.head(2)

Unnamed: 0,courseType,description,domainTypes,id,slug,specializations,workload,primaryLanguages,certificates,name
0,v2.ondemand,Gamification is the application of game elemen...,"[{'subdomainId': 'design-and-product', 'domain...",69Bku0KoEeWZtA4u62x6lQ,gamification,[],4-8 hours/week,['en'],['VerifiedCert'],Gamification
1,v2.ondemand,This course will cover the steps used in weigh...,"[{'subdomainId': 'data-analysis', 'domainId': ...",0HiU7Oe4EeWTAQ4yevf_oQ,missing-data,[],"4 weeks of study, 1-2 hours/week",['en'],"['VerifiedCert', 'Specialization']",Dealing With Missing Data


In [14]:
# Extract the doc tags from the similar doc list (used below as row labels of course_df, not the 'id' column)
course_ids = [sim[0] for sim in sims]

In [15]:
# Display the names of the most similar courses
course_df.loc[course_ids, 'name']

441                              Data Science Math Skills
1231                           Mathematics for economists
2849    Scalable Machine Learning on Big Data using Ap...
3976                  Big Data Integration and Processing
1634                         Parallel Programming in Java
1074                               Tools for Data Science
3298    Programming for Everybody (Getting Started wit...
18                                 Computer Vision Basics
4269                                     Disease Clusters
1656                                  業務効率や生産性向上につながる時間管理
Name: name, dtype: object
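
Looking up the names alone drops the similarity scores. A small sketch of keeping the two together follows. It uses a plain dict as a stand-in for the `name` column of `course_df` and copies three of the ids, scores, and names printed above; `label_sims`, `sample_sims`, and `name_by_id` are hypothetical names introduced for illustration.

```python
def label_sims(sims, name_by_id):
    """Attach a course name to each (doc_id, score) pair."""
    return [(name_by_id.get(doc_id, "<unknown id>"), round(score, 4))
            for doc_id, score in sims]

# Three (doc_id, score) pairs copied from the output above
sample_sims = [
    (441, 0.5915085673332214),
    (1231, 0.5760758519172668),
    (1074, 0.5411732792854309),
]

# Stand-in for course_df['name'], limited to the same three rows
name_by_id = {
    441: "Data Science Math Skills",
    1231: "Mathematics for economists",
    1074: "Tools for Data Science",
}

labeled = label_sims(sample_sims, name_by_id)
for name, score in labeled:
    print(f"{score:.4f}  {name}")
```

Seeing the score beside each name makes it easier to judge how far down the list the matches stay relevant.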