# NLP to recommend Bolt jobs

### Idea:

- Take a user's description of their experience and skills as input, then find the positions that fit best.
- Use the bag-of-words approach to turn the text into features.
- Compute the cosine similarity between the user's description and each job to rank the best matches.
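
To make the idea concrete before bringing in any libraries, here is a minimal from-scratch sketch of bag of words plus cosine similarity. The helper names (`bag_of_words`, `cosine`) and the two toy job texts are assumptions made up for illustration; they are not part of the notebook or of the scraped data.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase, whitespace-separated token occurs."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy data, not taken from the scraped job postings.
jobs = {
    "Data Analyst": "data analyst sql python",
    "Accountant": "accountant finance reporting",
}
user = bag_of_words("python sql data scientist")

# Score every job against the user's text and keep the best match.
scores = {title: cosine(user, bag_of_words(text)) for title, text in jobs.items()}
best_title = max(scores, key=scores.get)
print(best_title, scores)
```

The user text shares three of its four words with the first job and none with the second, so the first job ranks higher. The notebook itself relies on `CountVectorizer` and `cosine_similarity` from scikit-learn rather than hand-written helpers like these.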

In [12]:
# download the NLTK data (stopwords and the punkt tokenizer) that rake_nltk relies on
import nltk
nltk.download('stopwords')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [111]:
from rake_nltk import Rake
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
import joblib

Let's read the data that we scraped

In [73]:
df_job = pd.read_csv('bolt_jobs_info.csv')
df_job.head()

Unnamed: 0.1,Unnamed: 0,id,title,department,location,link,description
0,0,5380554002,Accountant,Finance,"{'city': 'Tallinn, Estonia'}",https://bolt.eu/en/careers/positions/5380554002,We are looking for an Accountant with at least...
1,1,5802857002,Account Management Specialist,Bolt Food,"{'city': 'Warsaw, Poland'}",https://bolt.eu/en/careers/positions/5802857002,Bolt Food is looking for a passionate Accoun...
2,2,5835763002,Account Management Team Lead,Bolt Food,"{'city': 'Kyiv, Ukraine'}",https://bolt.eu/en/careers/positions/5835763002,Bolt Food is looking for a passionate Account...
3,3,5770201002,Account Management Team Lead,Bolt Food,"{'city': 'Baku, Azerbaijan'}",https://bolt.eu/en/careers/positions/5770201002,Bolt Food is looking for a passionate Account...
4,4,6057289002,Account Management Team Lead,Bolt Food,"{'city': 'Vilnius, Lithuania'}",https://bolt.eu/en/careers/positions/6057289002,Bolt Food Lithuania is looking for a passiona...


- Extracting keywords from the job descriptions

In [78]:
df_job['bag_of_words'] = ''
key_extractor = Rake()
bag_of_words_list = []
for index, row in df_job.iterrows():
    # RAKE drops stopwords and punctuation, keeping the candidate keywords
    key_extractor.extract_keywords_from_text(row['description'])
    # dict mapping each remaining word to its degree score
    key_words_dict_scores = key_extractor.get_word_degrees()
    # only the words are needed here, not their scores
    key_words_list = list(key_words_dict_scores.keys())
    bag_of_words = ' '.join(key_words_list)
    bag_of_words_list.append(bag_of_words)

In [79]:
df_job['bag_of_words'] = bag_of_words_list

In [123]:
df_job.head(3)

Unnamed: 0.1,Unnamed: 0,id,title,department,location,link,description,bag_of_words,similarity
0,0,5380554002,Accountant,Finance,"{'city': 'Tallinn, Estonia'}",https://bolt.eu/en/careers/positions/5380554002,We are looking for an Accountant with at least...,looking accountant least 3 years experience jo...,0.076472
1,1,5802857002,Account Management Specialist,Bolt Food,"{'city': 'Warsaw, Poland'}",https://bolt.eu/en/careers/positions/5802857002,Bolt Food is looking for a passionate Accoun...,bolt food looking passionate account managemen...,0.131352
2,2,5835763002,Account Management Team Lead,Bolt Food,"{'city': 'Kyiv, Ukraine'}",https://bolt.eu/en/careers/positions/5835763002,Bolt Food is looking for a passionate Account...,bolt food looking passionate account managemen...,0.125988


- Bag-of-words approach to transform the text into count vectors

In [81]:
#init encoder class
count_vec = CountVectorizer()
#transform
count_matrix = count_vec.fit_transform(df_job['bag_of_words'])

- Serialize model objects to use them later

In [134]:
joblib.dump(count_matrix, '../model_objects/jobs_matrix.joblib')

['model_objects/jobs_matrix.joblib']

In [135]:
joblib.dump(count_vec, '../model_objects/jobs_encoder.joblib')

['model_objects/jobs_encoder.joblib']

- Save a data frame with only the columns the application needs (serialized with joblib)

In [136]:
joblib.dump(df_job[['title', 'department', 'location', 'link']], '../model_objects/jobs_filtered_data.joblib')
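
As a rough sketch of how an application might consume these three files, the example below saves toy stand-ins with `joblib.dump`, reloads them with `joblib.load`, and scores a new description without refitting anything. The toy data frame, the temporary directory, and the variable names (`encoder`, `matrix`, `loaded_*`, `ranked`) are assumptions for illustration only; the real application would load the files written by the cells above.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for df_job, count_vec and count_matrix (hypothetical data).
jobs = pd.DataFrame({
    "title": ["Data Analyst", "Accountant"],
    "bag_of_words": ["data analyst sql python", "accountant finance reporting"],
})
encoder = CountVectorizer()
matrix = encoder.fit_transform(jobs["bag_of_words"])

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    # Save the three artifacts in the same way as the notebook cells.
    joblib.dump(matrix, str(out / "jobs_matrix.joblib"))
    joblib.dump(encoder, str(out / "jobs_encoder.joblib"))
    joblib.dump(jobs[["title"]], str(out / "jobs_filtered_data.joblib"))

    # Application side: reload everything, no refitting needed.
    loaded_matrix = joblib.load(str(out / "jobs_matrix.joblib"))
    loaded_encoder = joblib.load(str(out / "jobs_encoder.joblib"))
    loaded_jobs = joblib.load(str(out / "jobs_filtered_data.joblib"))

# Encode a new description with the fitted vocabulary and rank the jobs.
user_vector = loaded_encoder.transform(["python sql data scientist"])
ranked = loaded_jobs.assign(
    similarity=cosine_similarity(user_vector, loaded_matrix)[0]
).sort_values("similarity", ascending=False)
print(ranked)
```

The encoder and the matrix have to be saved together because the matrix columns only make sense with the vocabulary that the encoder learned. Words in the new description that are not in that vocabulary are ignored by `transform`.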

### Example of use

In [124]:
user_description = """
Hey! I'm a data scientist with 3 years of experience. I like to use R, python, SQL and NoSQL, javascript and docker
"""

In [128]:
def transform_input(user_description):
    key_extractor = Rake() 
    key_extractor.extract_keywords_from_text(user_description)
    key_words_dict_scores = key_extractor.get_word_degrees()
    key_words_list = list(key_words_dict_scores.keys())
    return ' '.join(key_words_list) 

In [132]:
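def recommend(description, df):
    # encode the description passed in, using the vectorizer fitted on the jobs
    user_matrix = count_vec.transform([description])
    cosine_sim = cosine_similarity(user_matrix, count_matrix)
    # one similarity score per job, in the same row order as df
    df['similarity'] = cosine_sim[0]
    return df.sort_values('similarity', ascending=False).head(10)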

In [130]:
user_bag_of_words = transform_input(user_description)
print(user_bag_of_words)

hey data scientist 3 years experience like use r python sql nosql javascript docker


In [133]:
recommend(user_bag_of_words, df_job)

Unnamed: 0.1,Unnamed: 0,id,title,department,location,link,description,bag_of_words,similarity
917,917,6077232002,Supply Chain Data Analyst,Bolt Market,"{'city': 'Warsaw, Poland'}",https://bolt.eu/en/careers/positions/6077232002,We are looking for a Supply Chain Data Analys...,looking supply chain data analyst join bolt ’ ...,0.196293
119,119,6100418002,"Data Analyst, Fraud",Analytics,"{'city': 'Tallinn, Estonia'}",https://bolt.eu/en/careers/positions/6100418002,We are looking for a motivated Data Analyst to...,looking motivated data analyst join fraud team...,0.192669
915,915,6098695002,Supply Chain Data Analyst,Bolt Market,"{'city': 'Tallinn, Estonia'}",https://bolt.eu/en/careers/positions/6098695002,We are looking for a Supply Chain Data Analys...,looking supply chain data analyst join bolt ’ ...,0.172328
916,916,6098696002,Supply Chain Data Analyst,Bolt Market,"{'city': 'Lisbon, Portugal'}",https://bolt.eu/en/careers/positions/6098696002,We are looking for a Supply Chain Data Analys...,looking supply chain data analyst join bolt ’ ...,0.172328
18,18,5835461002,"Analytics Manager, Campaigns",Analytics,"{'city': 'Warsaw, Poland'}",https://bolt.eu/en/careers/positions/5835461002,"As an Analytics Manager in the Campaigns team,...",analytics manager campaigns team ’ leading cam...,0.166337
19,19,5835463002,"Analytics Manager, Campaigns",Analytics,"{'city': 'Tartu, Estonia'}",https://bolt.eu/en/careers/positions/5835463002,"As an Analytics Manager in the Campaigns team,...",analytics manager campaigns team ’ leading cam...,0.166337
21,21,5835462002,"Analytics Manager, Campaigns",Analytics,"{'city': 'Tallinn, Estonia'}",https://bolt.eu/en/careers/positions/5835462002,As an Analytics Manager in the Ride-hailing Ca...,analytics manager ride hailing campaigns team ...,0.163322
20,20,5835456002,"Analytics Manager, Campaigns",Analytics,"{'city': 'Berlin, Germany'}",https://bolt.eu/en/careers/positions/5835456002,As an Analytics Manager in the Ride-hailing Ca...,analytics manager ride hailing campaigns team ...,0.163322
855,855,4877706002,"Senior Software Engineer, Data Engineering",Engineering,"{'city': 'Tallinn, Estonia'}",https://bolt.eu/en/careers/positions/4877706002,Bolt engineering teams are working on unique p...,bolt engineering teams working unique product ...,0.16093
900,900,5845914002,"Software Engineer, Data Engineering",Engineering,"{'city': 'Berlin, Germany'}",https://bolt.eu/en/careers/positions/5845914002,Bolt engineering teams are working on unique p...,bolt engineering teams working unique product ...,0.16093
