## **Week 9: Recommender Systems Workshop**

Aman Kumar

Adapted from Armand Olivares. 2019. Building NLP Content-Based Recommender Systems https://medium.com/@armandj.olivares/building-nlp-content-based-recommender-systems-b104a709c042

### Getting Data

This demo shows how to upload files from google drive. If you prefer using upload methods from local files, make sure to modify file import code.

In [55]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [56]:
!ls -ls /content/gdrive/'My Drive'/Week9-WS1_data

total 161329
154878 -rw------- 1 root root 158594370 Mar  9 15:44 Combined_Jobs_Final.csv
  2770 -rw------- 1 root root   2836104 Mar  9 15:43 Experience.csv
  3214 -rw------- 1 root root   3291093 Mar  9 15:43 Job_Views.csv
   468 -rw------- 1 root root    478462 Mar  9 15:41 Positions_Of_Interest.csv


In [57]:
! pwd

/content


In [58]:
! cp -r /content/gdrive/'My Drive'/Week9-WS1_data /content

^C


In [59]:
! ls /content/

gdrive	sample_data  Week9-WS1_data


### Importing libraries and data exploration

In [6]:
import pandas as pd
import numpy as np
import nltk

from nltk.corpus import stopwords
import re
import string
from nltk.stem import WordNetLemmatizer
from nltk import word_tokenize
from nltk.corpus import stopwords

In [7]:
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [8]:
directory= '/content/Week9-WS1_data'

In [9]:
# ## reading the files
df_jobs = pd.read_csv(directory+'/Combined_Jobs_Final.csv')
df_exp= pd.read_csv(directory+'/Experience.csv') # previous experience of the applicant
df_views= pd.read_csv(directory+'/Job_Views.csv') # df showing job postings that the applicant has viewed
df_poi= pd.read_csv(directory+'/Positions_Of_Interest.csv') # positions of interest of the applicant

In [10]:
df_jobs.head()

Unnamed: 0,Job.ID,Provider,Status,Slug,Title,Position,Company,City,State.Name,State.Code,Address,Latitude,Longitude,Industry,Job.Description,Requirements,Salary,Listing.Start,Listing.End,Employment.Type,Education.Required,Created.At,Updated.At
0,111,1,open,palo-alto-ca-tacolicious-server,Server @ Tacolicious,Server,Tacolicious,Palo Alto,California,CA,,37.443346,-122.16117,Food and Beverages,Tacolicious' first Palo Alto store just opened...,,8.0,,,Part-Time,,2013-03-12 02:08:28 UTC,2014-08-16 15:35:36 UTC
1,113,1,open,san-francisco-ca-claude-lane-kitchen-staff-chef,Kitchen Staff/Chef @ Claude Lane,Kitchen Staff/Chef,Claude Lane,San Francisco,California,CA,,37.78983,-122.404268,Food and Beverages,\r\n\r\nNew French Brasserie in S.F. Financia...,,0.0,,,Part-Time,,2013-04-12 08:36:36 UTC,2014-08-16 15:35:36 UTC
2,117,1,open,san-francisco-ca-machka-restaurants-corp-barte...,Bartender @ Machka Restaurants Corp.,Bartender,Machka Restaurants Corp.,San Francisco,California,CA,,37.795597,-122.402963,Food and Beverages,We are a popular Mediterranean wine bar and re...,,11.0,,,Part-Time,,2013-07-16 09:34:10 UTC,2014-08-16 15:35:37 UTC
3,121,1,open,brisbane-ca-teriyaki-house-server,Server @ Teriyaki House,Server,Teriyaki House,Brisbane,California,CA,,37.685073,-122.400275,Food and Beverages,● Serve food/drinks to customers in a profess...,,10.55,,,Part-Time,,2013-09-04 15:40:30 UTC,2014-08-16 15:35:38 UTC
4,127,1,open,los-angeles-ca-rosa-mexicano-sunset-kitchen-st...,Kitchen Staff/Chef @ Rosa Mexicano - Sunset,Kitchen Staff/Chef,Rosa Mexicano - Sunset,Los Angeles,California,CA,,34.073384,-118.460439,Food and Beverages,"Located at the heart of Hollywood, we are one ...",,10.55,,,Part-Time,,2013-07-17 15:26:18 UTC,2014-08-16 15:35:40 UTC


In [11]:
df_exp.head()

Unnamed: 0,Applicant.ID,Position.Name,Employer.Name,City,State.Name,State.Code,Start.Date,End.Date,Job.Description,Salary,Can.Contact.Employer,Created.At,Updated.At
0,10001,Account Manager / Sales Administration / Quali...,Barcode Resourcing,Bellingham,Washington,WA,2012-10-15,,,,,2014-12-12 20:10:02 UTC,2014-12-12 20:10:02 UTC
1,10001,Electronics Technician / Item Master Controller,Ryzex Group,Bellingham,Washington,WA,2001-12-01,2012-04-01,,,,2014-12-12 20:10:02 UTC,2014-12-12 20:10:02 UTC
2,10001,Machine Operator,comptec inc,Custer,Washington,WA,1997-01-01,1999-01-01,,,,2014-12-12 20:10:02 UTC,2014-12-12 20:10:02 UTC
3,10003,maintenance technician,Winn residental,washington,District of Columbia,DC,,,"Necessary maintenance for ""Make Ready"" Plumbin...",10.0,False,2014-12-12 21:27:05 UTC,2014-12-12 21:27:05 UTC
4,10003,Electrical Helper,michael and son services,alexandria,Virginia,VA,,,repair and services of electrical construction,,False,2014-12-12 21:27:05 UTC,2014-12-12 21:27:05 UTC


In [12]:
df_views.head()

Unnamed: 0,Applicant.ID,Job.ID,Title,Position,Company,City,State.Name,State.Code,Industry,View.Start,View.End,View.Duration,Created.At,Updated.At
0,10000,73666,Cashiers & Valets Needed! @ WallyPark,Cashiers & Valets Needed!,WallyPark,Newark,New Jersey,NJ,,2014-12-12 20:12:35 UTC,2014-12-12 20:31:24 UTC,1129.0,2014-12-12 20:12:35 UTC,2014-12-12 20:12:35 UTC
1,10000,96655,Macy's Seasonal Retail Fragrance Cashier - Ga...,Macy's Seasonal Retail Fragrance Cashier - Ga...,Macy's,Garden City,New York,NY,,2014-12-12 20:08:50 UTC,2014-12-12 20:10:15 UTC,84.0,2014-12-12 20:08:50 UTC,2014-12-12 20:08:50 UTC
2,10001,84141,Part Time Showroom Sales / Cashier @ Grizzly I...,Part Time Showroom Sales / Cashier,Grizzly Industrial Inc.,Bellingham,Washington,WA,,2014-12-12 20:12:32 UTC,2014-12-12 20:17:18 UTC,286.0,2014-12-12 20:12:32 UTC,2014-12-12 20:12:32 UTC
3,10002,77989,Event Specialist Part Time @ Advantage Sales &...,Event Specialist Part Time,Advantage Sales & Marketing,Simpsonville,South Carolina,SC,,2014-12-12 20:39:23 UTC,2014-12-12 20:42:13 UTC,170.0,2014-12-12 20:39:23 UTC,2014-12-12 20:39:23 UTC
4,10002,69568,Bonefish - Kitchen Staff @ Bonefish Grill,Bonefish - Kitchen Staff,Bonefish Grill,Greenville,South Carolina,SC,,2014-12-12 20:43:25 UTC,2014-12-12 20:43:58 UTC,33.0,2014-12-12 20:43:25 UTC,2014-12-12 20:43:25 UTC


In [13]:
df_poi.head()

Unnamed: 0,Applicant.ID,Position.Of.Interest,Created.At,Updated.At
0,10003,security officer,2014-12-12 21:20:54 UTC,2014-12-12 21:20:54 UTC
1,10007,Server,2014-08-14 15:56:42 UTC,2015-02-26 20:35:12 UTC
2,10007,Bartender,2014-08-14 15:56:44 UTC,2015-02-19 23:21:28 UTC
3,10008,Host,2014-08-14 15:56:42 UTC,2015-02-26 20:35:12 UTC
4,10008,Barista,2014-08-14 15:56:43 UTC,2015-02-18 02:35:06 UTC


###1st inspecting jobs_df and creating jobs corpus


In [14]:
## selecting the needful columns
cols = ['Job.ID','Title','Position','Company','City','Employment.Type','Job.Description']
df_jobs =df_jobs[cols]
df_jobs.columns = ['Job_ID', 'Title', 'Position', 'Company','City', 'Empl_type','Job_Description']
df_jobs.head()

Unnamed: 0,Job_ID,Title,Position,Company,City,Empl_type,Job_Description
0,111,Server @ Tacolicious,Server,Tacolicious,Palo Alto,Part-Time,Tacolicious' first Palo Alto store just opened...
1,113,Kitchen Staff/Chef @ Claude Lane,Kitchen Staff/Chef,Claude Lane,San Francisco,Part-Time,\r\n\r\nNew French Brasserie in S.F. Financia...
2,117,Bartender @ Machka Restaurants Corp.,Bartender,Machka Restaurants Corp.,San Francisco,Part-Time,We are a popular Mediterranean wine bar and re...
3,121,Server @ Teriyaki House,Server,Teriyaki House,Brisbane,Part-Time,● Serve food/drinks to customers in a profess...
4,127,Kitchen Staff/Chef @ Rosa Mexicano - Sunset,Kitchen Staff/Chef,Rosa Mexicano - Sunset,Los Angeles,Part-Time,"Located at the heart of Hollywood, we are one ..."


In [15]:
df_jobs.isnull().sum()

Job_ID                0
Title                 0
Position              0
Company            2271
City                135
Empl_type            10
Job_Description      56
dtype: int64

In [16]:
print(len(set(df_jobs['Job_ID'])), len(df_jobs['Job_ID']))

84090 84090


Creating an additional column of jobs corpus

In [17]:
df_jobs["text"] = df_jobs["Position"] + " " + df_jobs["Company"] +" "+ df_jobs["City"]+ " "+df_jobs['Empl_type']+" "+df_jobs['Job_Description'] +" "+df_jobs['Title']
df_jobs.head(2)

Unnamed: 0,Job_ID,Title,Position,Company,City,Empl_type,Job_Description,text
0,111,Server @ Tacolicious,Server,Tacolicious,Palo Alto,Part-Time,Tacolicious' first Palo Alto store just opened...,Server Tacolicious Palo Alto Part-Time Tacolic...
1,113,Kitchen Staff/Chef @ Claude Lane,Kitchen Staff/Chef,Claude Lane,San Francisco,Part-Time,\r\n\r\nNew French Brasserie in S.F. Financia...,Kitchen Staff/Chef Claude Lane San Francisco P...


In [18]:
df_jobs = df_jobs[['Job_ID', 'text', 'Title']]
df_jobs = df_jobs.fillna(" ")
df_jobs.head()

Unnamed: 0,Job_ID,text,Title
0,111,Server Tacolicious Palo Alto Part-Time Tacolic...,Server @ Tacolicious
1,113,Kitchen Staff/Chef Claude Lane San Francisco P...,Kitchen Staff/Chef @ Claude Lane
2,117,Bartender Machka Restaurants Corp. San Francis...,Bartender @ Machka Restaurants Corp.
3,121,Server Teriyaki House Brisbane Part-Time ● Se...,Server @ Teriyaki House
4,127,Kitchen Staff/Chef Rosa Mexicano - Sunset Los ...,Kitchen Staff/Chef @ Rosa Mexicano - Sunset


Cleaning the corpus i.e. the 'Text' column

In [19]:
stop = stopwords.words('english')
stop_words_ = set(stopwords.words('english'))
wn = WordNetLemmatizer()

def black_txt(token):
    return  token not in stop_words_ and token not in list(string.punctuation)  and len(token)>2   
  
def clean_txt(text):
  clean_text = []
  clean_text2 = []
  text = re.sub("'", "",text)
  text=re.sub("(\\d|\\W)+"," ",text) 
  text = text.replace("nbsp", "")
  clean_text = [ wn.lemmatize(word, pos="v") for word in word_tokenize(text.lower()) if black_txt(word)]
  clean_text2 = [word for word in clean_text if black_txt(word)]
  return " ".join(clean_text2)


In [20]:
df_jobs['text'] = df_jobs['text'].apply(clean_txt)
df_jobs.head(2)

Unnamed: 0,Job_ID,text,Title
0,111,server tacolicious palo alto part time tacolic...,Server @ Tacolicious
1,113,kitchen staff chef claude lane san francisco p...,Kitchen Staff/Chef @ Claude Lane


TF-IDF (Generating the feature of Job corpus)

In [21]:
#initializing tfidf vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer()

tfidf_jobid = tfidf_vectorizer.fit_transform((df_jobs['text'])) #fitting and transforming the vector
tfidf_jobid

<84090x50767 sparse matrix of type '<class 'numpy.float64'>'
	with 8249764 stored elements in Compressed Sparse Row format>

### Inspecting users' dataframes and creating the user corpus

Job views dataset

In [22]:
df_views= df_views[['Applicant.ID', 'Job.ID', 'Position', 'Company','City']]
df_views["select_pos_com_city"] = df_views["Position"] + "  " + df_views["Company"] +"  "+ df_views["City"]
df_views['select_pos_com_city'] = df_views['select_pos_com_city'].map(str).apply(clean_txt)
df_views['select_pos_com_city'] = df_views['select_pos_com_city'].str.lower()
df_views = df_views[['Applicant.ID','select_pos_com_city']]
df_views.head()

Unnamed: 0,Applicant.ID,select_pos_com_city
0,10000,cashier valet need wallypark newark
1,10000,macys seasonal retail fragrance cashier garden...
2,10001,part time showroom sales cashier grizzly indus...
3,10002,event specialist part time advantage sales mar...
4,10002,bonefish kitchen staff bonefish grill greenville


In [23]:
df_views= df_views.groupby("Applicant.ID")['select_pos_com_city'].apply(' '.join).reset_index()
df_views.head(3)

Unnamed: 0,Applicant.ID,select_pos_com_city
0,42,movie extras actors model want san francisco p...
1,96,kitchen staff izakaya yuzuki san francisco ser...
2,153,valic financial advisor intern roseville aig c...


Experience data set

In [24]:
df_exp=df_exp[['Applicant.ID','Position.Name']]
#cleaning the text
df_exp['Position.Name'] = df_exp['Position.Name'].map(str).apply(clean_txt)
df_exp.head()

Unnamed: 0,Applicant.ID,Position.Name
0,10001,account manager sales administration quality a...
1,10001,electronics technician item master controller
2,10001,machine operator
3,10003,maintenance technician
4,10003,electrical helper


In [25]:
df_exp =  df_exp.sort_values(by='Applicant.ID')
df_exp = df_exp.fillna(" ")
df_exp.head()

Unnamed: 0,Applicant.ID,Position.Name
2763,2,volunteer
2762,2,writer uloop blog
3759,3,market intern
3758,3,server
3757,3,prep cook


Club multiple applicant in one

In [26]:
#adding same rows to a single row
df_exp = df_exp.groupby('Applicant.ID')['Position.Name'].apply(' '.join).reset_index()
df_exp.head(5)

Unnamed: 0,Applicant.ID,Position.Name
0,2,volunteer writer uloop blog
1,3,market intern server prep cook
2,6,project assistant
3,8,deli clerk server cashier food prep order taker
4,11,cashier


Position of interest data frame

In [27]:
#Position of interest
df_poi = df_poi.sort_values(by='Applicant.ID')
df_poi=df_poi[['Applicant.ID','Position.Of.Interest']]
#cleaning the text
df_poi['Position.Of.Interest']=df_poi['Position.Of.Interest'].map(str).apply(clean_txt)
df_poi = df_poi.fillna(" ")
df_poi.head()

Unnamed: 0,Applicant.ID,Position.Of.Interest
6437,96,server
1156,153,barista
1155,153,host
1154,153,server
1158,153,sales rep


In [28]:
df_poi = df_poi.groupby('Applicant.ID', sort=True)['Position.Of.Interest'].apply(' '.join).reset_index()
df_poi.head()

Unnamed: 0,Applicant.ID,Position.Of.Interest
0,96,server
1,153,barista host server sales rep customer service...
2,256,host production area sales rep customer servic...
3,438,customer service rep barista host server
4,568,receptionist customer service rep book keeper


###Merging users data frames

merging job_views and job_exp dataframes

In [29]:
print(len(df_views),len(df_exp))

3448 3790


In [30]:
df_view_exp= df_views.merge(df_exp, on='Applicant.ID', how='outer')
df_view_exp.head()

Unnamed: 0,Applicant.ID,select_pos_com_city,Position.Name
0,42,movie extras actors model want san francisco p...,courtesy clerk street marketer
1,96,kitchen staff izakaya yuzuki san francisco ser...,cashiet waiter receptionist cashier
2,153,valic financial advisor intern roseville aig c...,banker front desk agent operations supervisor ...
3,601,retail sales consultant retail bay area associ...,
4,1877,sales associate see candy sunnyvale,registration coordinator reeptionist front des...


In [31]:
len(df_view_exp),len(set(df_view_exp['Applicant.ID']))

(6461, 6461)

Merging position of interest with previous dataframe

In [32]:
df_user= df_poi.merge(df_view_exp, on='Applicant.ID', how='outer')
df_user= df_user.fillna(' ')
df_user.head()

Unnamed: 0,Applicant.ID,Position.Of.Interest,select_pos_com_city,Position.Name
0,96,server,kitchen staff izakaya yuzuki san francisco ser...,cashiet waiter receptionist cashier
1,153,barista host server sales rep customer service...,valic financial advisor intern roseville aig c...,banker front desk agent operations supervisor ...
2,256,host production area sales rep customer servic...,,cashier server
3,438,customer service rep barista host server,,
4,568,receptionist customer service rep book keeper,,account receivable specialist


In [33]:
df_user["text"] = df_user["select_pos_com_city"].map(str) + df_user["Position.Name"] +" "+ df_user["Position.Of.Interest"]
df_user= df_user[['Applicant.ID','text']]
df_user.columns=['Applicant_id','text'] # renaming the columns of final user dataset
df_user['text']=df_user['text'].apply(clean_txt)
df_user.head()

Unnamed: 0,Applicant_id,text
0,96,kitchen staff izakaya yuzuki san francisco ser...
1,153,valic financial advisor intern roseville aig c...
2,256,cashier server host production area sales rep ...
3,438,customer service rep barista host server
4,568,account receivable specialist receptionist cus...


###Selecting a User to whom the job should be recommended

In [34]:
u_id=326 # Applicant Id
index = np.where(df_user['Applicant_id'] == u_id)[0][0]
user_q = df_user.iloc[[index]]
user_q

Unnamed: 0,Applicant_id,text
5168,326,java developer


## Building the recommender system

### 1.) Using Cosine Similarity

In [47]:
from sklearn.metrics.pairwise import cosine_similarity
user_tfidf = tfidf_vectorizer.transform(user_q['text'])
cos_similarity_tfidf = map(lambda x: cosine_similarity(user_tfidf, x),tfidf_jobid)

In [48]:
op = list(cos_similarity_tfidf)

In [49]:
print(len(op))
op[0]

84090


array([[0.]])

In [50]:
# for demonstration
sim= cosine_similarity(user_tfidf, tfidf_jobid[50,:])
sim

array([[0.]])

In [51]:
 def get_recommendation(top, df_jobs, scores):
  recommendation = pd.DataFrame(columns = ['ApplicantID', 'JobID',  'title', 'score'])
  count = 0
  for i in top:
      recommendation.at[count, 'ApplicantID'] = u_id
      recommendation.at[count, 'JobID'] = df_jobs['Job_ID'][i]
      recommendation.at[count, 'title'] = df_jobs['Title'][i]
      recommendation.at[count, 'score'] =  scores[count]
      count += 1
  return recommendation

Getting top 10 recommendations

In [None]:
top = sorted(range(len(op)), key=lambda i: op[i], reverse=True)[:10]
list_scores = [op[i][0][0] for i in top]
get_recommendation(top,df_jobs, list_scores)

Unnamed: 0,ApplicantID,JobID,title,score
0,326,303112,Java Developer @ TransHire,0.749481
1,326,294684,Java Developer @ Kavaliro,0.740841
2,326,269922,Entry Level Java Developer / Jr. Java Develope...,0.736959
3,326,141831,Lead Java/J2EE Developer - Contract to Hire @ ...,0.671604
4,326,270171,Senior Java Developer - Contract to Hire - Gre...,0.644978
5,326,305264,Sr. Java Developer @ Paladin Consulting Inc,0.62543
6,326,309945,"Java Software Engineer @ iTech Solutions, Inc.",0.592243
7,326,245753,Java Administrator @ ConsultNet,0.530202
8,326,146640,Jr. Java Developer @ Paladin Consulting Inc,0.510518
9,326,150882,Java Consultant - Mobile Apps Development @ Co...,0.48677


### 2.) Using KNN

In [None]:
from sklearn.neighbors import NearestNeighbors
n_neighbors = 11
KNN = NearestNeighbors(n_neighbors)
KNN.fit(tfidf_jobid)
NNs = KNN.kneighbors(user_tfidf, return_distance=True) 

In [None]:
print(NNs)

(array([[0.70784066, 0.71994255, 0.72531474, 0.81042681, 0.84264137,
        0.86552883, 0.90305777, 0.96932739, 0.98942606, 1.        ,
        1.        ]]), array([[69346, 63958, 40385,  3231, 40634, 71496, 76180, 16225,  7108,
        55896, 72067]]))


Getting Top 10 recommendations using KNN

In [None]:
top = NNs[1][0]
index_score = NNs[0][0]

get_recommendation(top, df_jobs, index_score)

Unnamed: 0,ApplicantID,JobID,title,score
0,326,303112,Java Developer @ TransHire,0.707841
1,326,294684,Java Developer @ Kavaliro,0.719943
2,326,269922,Entry Level Java Developer / Jr. Java Develope...,0.725315
3,326,141831,Lead Java/J2EE Developer - Contract to Hire @ ...,0.810427
4,326,270171,Senior Java Developer - Contract to Hire - Gre...,0.842641
5,326,305264,Sr. Java Developer @ Paladin Consulting Inc,0.865529
6,326,309945,"Java Software Engineer @ iTech Solutions, Inc.",0.903058
7,326,245753,Java Administrator @ ConsultNet,0.969327
8,326,146640,Jr. Java Developer @ Paladin Consulting Inc,0.989426
9,326,285436,Bilingual Retail Sales Associate (432),1.0


### 3.) Using Spacy

In [35]:
import spacy

In [36]:
!python -m spacy download en_core_web_lg

[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('en_core_web_lg')


In [37]:
import spacy.cli 
spacy.cli.download("en_core_web_lg")

[38;5;2m✔ Download and installation successful[0m
You can now load the model via spacy.load('en_core_web_lg')


In [38]:
nlp = spacy.load('en_core_web_lg')

Transform the corpus text to Spacy's document data structure

In [39]:
%%time
list_docs = []
for i in range(len(df_jobs)):
  doc1 = nlp("u'" + df_jobs['text'][i] + "'")
  list_docs.append((doc1,i))
print(len(list_docs))

84090
CPU times: user 43min 40s, sys: 29.2 s, total: 44min 9s
Wall time: 44min 16s


In [40]:
def calculateSimWithSpaCy(nlp, df, user_text, n=6):
    # Calculate similarity using spaCy
    list_sim =[]
    doc1 = nlp("u'" + user_text + "'")
    for i in df.index:
      try:
            doc2 = list_docs[i][0]
            score = doc1.similarity(doc2)
            list_sim.append((doc1, doc2, list_docs[i][1],score))
      except:
        continue

    return  list_sim

In [41]:
user_q.text[5168]

'java developer'

In [42]:
%%time
 df3 = calculateSimWithSpaCy(nlp, df_jobs, user_q.text[5168], n=15)

CPU times: user 1min 14s, sys: 441 ms, total: 1min 15s
Wall time: 1min 14s


In [53]:
df_recom_spacy = pd.DataFrame(df3).sort_values([3], ascending=False).head(10)
df_recom_spacy.reset_index(inplace=True)
index_spacy = df_recom_spacy[2]
list_scores = df_recom_spacy[3]

Top recommendations using Spacy

In [54]:
get_recommendation(index_spacy, df_jobs, list_scores)

Unnamed: 0,ApplicantID,JobID,title,score
0,326,250216,Microstrategy Developer @ Kavaliro,0.698317
1,326,294489,Magento Developer (ONSITE) @ Creative Circle,0.654219
2,326,172946,Non-Certified Nursing Assistant (CNA),0.633758
3,326,308511,Account Manager - Sales,0.633758
4,326,308503,Account Manager - Sales,0.633758
5,326,308504,Account Manager - Sales,0.633758
6,326,308505,Account Manager - Sales,0.633758
7,326,308506,Account Manager - Sales,0.633758
8,326,308507,Account Manager - Sales,0.633758
9,326,308508,Account Manager - Sales,0.633758
