<a href="https://colab.research.google.com/github/YaserMarey/my_openai_colab/blob/master/job_recommendation/job_recommendation_using_openai_text_embedding_ada_002.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 1. Install

The dataset consists of 10 job descriptions that I collected from LinkedIn. For each job, I combine the title, country, work arrangement (remote/onsite/hybrid), company description, responsibilities, and requirements into a single text. I then use the OpenAI text-embedding-ada-002 model to encode this combined text into a single embedding vector.

In [None]:
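# `!export` runs in a throwaway subshell and does not persist, so set the key in this process instead
import os
os.environ["OPENAI_API_KEY"] = "sk-PUT YOUR OPEN AI KEY"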

To run this notebook, you will need to install: pandas, openai, transformers, plotly, matplotlib, scikit-learn, torch (a transformers dependency), torchvision, scipy, and tiktoken.

In [None]:
# openai.embeddings_utils (imported below) is only available in openai releases before 1.0
!pip install pandas "openai<1" transformers plotly matplotlib scikit-learn torch torchvision scipy tiktoken



### 2. Define embedding model and encoding

In [None]:
# imports
import pandas as pd
import tiktoken
from openai.embeddings_utils import get_embedding, cosine_similarity


In [None]:
# embedding model parameters
embedding_model = "text-embedding-ada-002"
embedding_encoding = "cl100k_base"  # this is the encoding for text-embedding-ada-002
max_tokens = 8000  # the maximum for text-embedding-ada-002 is 8191


### 3. Define recommendation based on cosine similarity


In [None]:
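# rank job descriptions by cosine similarity to the resume embedding
def recommend_jobs(df, resume, n=3, pprint=True):
    # embed the resume with the same model used for the job descriptions
    resume_embedding = get_embedding(resume, engine=embedding_model)
    # score every job against the resume (adds a "similarity" column to df)
    df["similarity"] = df.embedding.apply(lambda x: cosine_similarity(x, resume_embedding))

    # keep the n most similar jobs, best match first
    results = df.sort_values("similarity", ascending=False).head(n)
    return results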
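
For readers who want to see what the ranking step does without calling the API, here is a minimal pure-Python sketch. The helper names `cosine_sim` and `rank_by_similarity` and the 3-dimensional toy vectors are invented for illustration; they are not part of `openai.embeddings_utils`, and real ada-002 embeddings are much longer.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, items, n=3):
    """Return the n (label, score) pairs most similar to query_vec, best first."""
    scored = [(label, cosine_sim(vec, query_vec)) for label, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Toy vectors, invented for illustration only (not real embeddings).
toy_jobs = {
    "job_a": [1.0, 0.0, 0.0],
    "job_b": [0.6, 0.8, 0.0],
    "job_c": [0.0, 0.0, 1.0],
}
toy_resume = [1.0, 0.0, 0.0]

top = rank_by_similarity(toy_resume, toy_jobs, n=2)
print(top)  # job_a ranks first, then job_b
```

The same idea applies to the DataFrame version: every job vector gets one score against the resume vector, and the rows are sorted by that score in descending order.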


### 4. Load & inspect dataset

In [None]:
input_datapath = "/content/data/job_descriptions_10_linkedin.csv"  # 10 job descriptions collected from LinkedIn
df = pd.read_csv(input_datapath, encoding="ISO-8859-1")
df

Unnamed: 0.1,Unnamed: 0,Title,Employer,Country,Remote/Onsite/Hybrid,About,Responsibilities,Requirements,What job offers,combined,n_tokens,embedding
0,0,Senior Data Scientist ML,Binance.US,"Dubai, Dubai, United Arab Emirates",Remote,"Launched in 2019, BAM Management US Holdings I...",Identify potential business opportunities with...,3+ years of experience in Data Science/Analyti...,,"Senior Data Scientist ML;Dubai, Dubai, United ...",317,"[-0.02436540648341179, -0.03042932040989399, 0..."
1,1,AI Engineer,LiquidX Studio,United Arab Emirates,Remote,LiquidX Studio is a gaming development studio ...,Building challenging and fun AI that our playe...,"Thoughtful Problem Solving: For you, problem-s...",Great work environment\nAttractive salary & be...,AI Engineer;United Arab Emirates;Remote;Liquid...,297,"[0.007941177114844322, -0.010572289116680622, ..."
2,2,Data Analyst / Data Scientist,nybl,"Dubai, Dubai, United Arab Emirates",Remote,nybl is looking for our next generation of dat...,work closely with nybl to identify issues and ...,Experience and knowledge in statistical and da...,,"Data Analyst / Data Scientist;Dubai, Dubai, Un...",575,"[-0.037207040935754776, -0.02034076116979122, ..."
3,3,Research Scientist,NEOM,"Tabuk, Saudi Arabia",Onsite,The NEOM project is being built from the groun...,Field and experimental work.\nData analysis\nI...,,,"Research Scientist;Tabuk, Saudi Arabia;Onsite;...",295,"[0.0018988176016137004, -0.021261312067508698,..."
4,4,\nData Scientist [NLP Expert; Artificial Intel...,Armaco Chemical Processes Systems Pvt Ltd,"Jiddah, Makkah, Saudi Arabia",Onsite,ramco occupies a unique position in the global...,Digital Transformation (DT) is responsible for...,As the successful candidate you will hold a Ma...,,Data Scientist [NLP Expert; Artificial Intelli...,766,"[-0.024973710998892784, -0.01902099698781967, ..."
5,5,NLP Developer,Jobskey Search and Selection,"ubail, Eastern, Saudi Arabia",Hybrid,"As the successful candidate, you will hold deg...",Work with stakeholders throughout the organiza...,"Tokenization, classification and preprocessing...",,"NLP Developer;ubail, Eastern, Saudi Arabia;Hyb...",663,"[-0.00904441624879837, -0.0057301949709653854,..."
6,6,Artificial Intelligence Researcher,AL MOZN AI,Riyadh Region,Onsite,AI Research Scientists have tasks of designing...,,Minimum Qualifications\n\nPh.D. and publicatio...,,Artificial Intelligence Researcher;Riyadh Regi...,429,"[-0.0011712032137438655, -0.001034254790283739..."
7,7,Machine Learning Engineer,IQVIA,"Riyadh, Riyadh, Saudi Arabia",Onsite,Position summary:\n\nML engineers typically wo...,Key Responsibilities:\nConsulting with manager...,Minimum 5+ experience in related field\nBachel...,,"Machine Learning Engineer;Riyadh, Riyadh, Saud...",361,"[-0.02034110017120838, -0.009696582332253456, ..."
8,8,Computer Vision Engineer,Turing,Egypt,Remote,A US-based company pioneering data-driven virt...,"Design, develop, ship, and maintain web-based ...","BachelorÂs/MasterÂs degree in Engineering, C...",,Computer Vision Engineer;Egypt;Remote;A US-bas...,268,"[-0.006810328457504511, -0.015718238428235054,..."
9,9,Software Engineering Manager,Careem,"Alexandria, Egypt",Onsite,At Careem we are led by a powerful purpose to ...,Lead a team of software engineers in implement...,You have strong software engineering skills wi...,n addition to a competitive long-term total co...,"Software Engineering Manager;Alexandria, Egypt...",666,"[0.023302186280488968, -0.013660808093845844, ..."


### 5. Calculate embeddings for job descriptions

In [None]:
# keep only the text columns; .copy() avoids pandas' SettingWithCopyWarning
df = df[["Title", "Employer", "Country", "Remote/Onsite/Hybrid", "About", "Responsibilities", "Requirements", "What job offers"]].copy()
# missing fields become empty strings so the concatenation below does not yield NaN
df = df.fillna('')
df["combined"] = df["Title"].str.strip() + ";" + df["Country"].str.strip() + ";" + df["Remote/Onsite/Hybrid"].str.strip() + ";" + df["About"].str.strip()  + ";" + df["Responsibilities"].str.strip()  + ";" + df["Requirements"].str.strip()
df["combined"]


0    Senior Data Scientist ML;Dubai, Dubai, United ...
1    AI Engineer;United Arab Emirates;Remote;Liquid...
2    Data Analyst / Data Scientist;Dubai, Dubai, Un...
3    Research Scientist;Tabuk, Saudi Arabia;Onsite;...
4    Data Scientist [NLP Expert; Artificial Intelli...
5    NLP Developer;ubail, Eastern, Saudi Arabia;Hyb...
6    Artificial Intelligence Researcher;Riyadh Regi...
7    Machine Learning Engineer;Riyadh, Riyadh, Saud...
8    Computer Vision Engineer;Egypt;Remote;A US-bas...
9    Software Engineering Manager;Alexandria, Egypt...
Name: combined, dtype: object
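
The combined field is a ";"-joined string of stripped fields, with missing values treated as empty strings. In pandas, concatenating a string with NaN gives NaN for the whole row, which is why the missing cells are filled first. Below is a minimal pure-Python sketch of the same idea for a single record. The `combine_fields` helper and the toy row are hypothetical and only illustrate the join.

```python
def combine_fields(row, fields, sep=";"):
    """Join the selected fields of one record, treating missing values as ''."""
    parts = []
    for name in fields:
        value = row.get(name)
        parts.append("" if value is None else str(value).strip())
    return sep.join(parts)

# Hypothetical record; None stands in for a missing (NaN) cell.
toy_row = {
    "Title": "  Data Scientist ",
    "Country": "Egypt",
    "Remote/Onsite/Hybrid": "Remote",
    "About": "Example employer description.",
    "Responsibilities": None,
    "Requirements": "Python",
}
fields = ["Title", "Country", "Remote/Onsite/Hybrid", "About", "Responsibilities", "Requirements"]

combined = combine_fields(toy_row, fields)
print(combined)
# Data Scientist;Egypt;Remote;Example employer description.;;Python
```

Note the empty slot between the two consecutive separators: a missing field still keeps its position, so every combined string has the same field order.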

In [None]:
encoding = tiktoken.get_encoding(embedding_encoding)
# should print [83, 1609, 5963, 374, 2294, 0]
encoding.encode("tiktoken is great!")



[83, 1609, 5963, 374, 2294, 0]

In [None]:
# drop job descriptions that are too long to embed
df = df.assign(n_tokens=df.combined.apply(lambda x: len(encoding.encode(str(x)))))
df = df[df.n_tokens <= max_tokens].copy()
df


Unnamed: 0,Title,Employer,Country,Remote/Onsite/Hybrid,About,Responsibilities,Requirements,What job offers,combined,n_tokens
0,Senior Data Scientist ML,Binance.US,"Dubai, Dubai, United Arab Emirates",Remote,"Launched in 2019, BAM Management US Holdings I...",Identify potential business opportunities with...,3+ years of experience in Data Science/Analyti...,,"Senior Data Scientist ML;Dubai, Dubai, United ...",318
1,AI Engineer,LiquidX Studio,United Arab Emirates,Remote,LiquidX Studio is a gaming development studio ...,Building challenging and fun AI that our playe...,"Thoughtful Problem Solving: For you, problem-s...",Great work environment\nAttractive salary & be...,AI Engineer;United Arab Emirates;Remote;Liquid...,299
2,Data Analyst / Data Scientist,nybl,"Dubai, Dubai, United Arab Emirates",Remote,nybl is looking for our next generation of dat...,work closely with nybl to identify issues and ...,Experience and knowledge in statistical and da...,,"Data Analyst / Data Scientist;Dubai, Dubai, Un...",575
3,Research Scientist,NEOM,"Tabuk, Saudi Arabia",Onsite,The NEOM project is being built from the groun...,Field and experimental work.\nData analysis\nI...,,,"Research Scientist;Tabuk, Saudi Arabia;Onsite;...",295
4,\nData Scientist [NLP Expert; Artificial Intel...,Armaco Chemical Processes Systems Pvt Ltd,"Jiddah, Makkah, Saudi Arabia",Onsite,ramco occupies a unique position in the global...,Digital Transformation (DT) is responsible for...,As the successful candidate you will hold a Ma...,,Data Scientist [NLP Expert; Artificial Intelli...,766
5,NLP Developer,Jobskey Search and Selection,"ubail, Eastern, Saudi Arabia",Hybrid,"As the successful candidate, you will hold deg...",Work with stakeholders throughout the organiza...,"Tokenization, classification and preprocessing...",,"NLP Developer;ubail, Eastern, Saudi Arabia;Hyb...",663
6,Artificial Intelligence Researcher,AL MOZN AI,Riyadh Region,Onsite,AI Research Scientists have tasks of designing...,,Minimum Qualifications\n\nPh.D. and publicatio...,,Artificial Intelligence Researcher;Riyadh Regi...,430
7,Machine Learning Engineer,IQVIA,"Riyadh, Riyadh, Saudi Arabia",Onsite,Position summary:\n\nML engineers typically wo...,Key Responsibilities:\nConsulting with manager...,Minimum 5+ experience in related field\nBachel...,,"Machine Learning Engineer;Riyadh, Riyadh, Saud...",361
8,Computer Vision Engineer,Turing,Egypt,Remote,A US-based company pioneering data-driven virt...,"Design, develop, ship, and maintain web-based ...","BachelorÂs/MasterÂs degree in Engineering, C...",,Computer Vision Engineer;Egypt;Remote;A US-bas...,270
9,Software Engineering Manager,Careem,"Alexandria, Egypt",Onsite,At Careem we are led by a powerful purpose to ...,Lead a team of software engineers in implement...,You have strong software engineering skills wi...,n addition to a competitive long-term total co...,"Software Engineering Manager;Alexandria, Egypt...",669


In [None]:
# Ensure you have your API key set in your environment per the README: https://github.com/openai/openai-python#usage
# This may take a few minutes
df["embedding"] = df.combined.apply(lambda x: get_embedding(x, engine=embedding_model))
df.to_csv("data/job_descriptions_10_linkedin.csv")
df


Unnamed: 0,Title,Employer,Country,Remote/Onsite/Hybrid,About,Responsibilities,Requirements,What job offers,combined,n_tokens,embedding
0,Senior Data Scientist ML,Binance.US,"Dubai, Dubai, United Arab Emirates",Remote,"Launched in 2019, BAM Management US Holdings I...",Identify potential business opportunities with...,3+ years of experience in Data Science/Analyti...,,"Senior Data Scientist ML;Dubai, Dubai, United ...",318,"[-0.023338189348578453, -0.02993198111653328, ..."
1,AI Engineer,LiquidX Studio,United Arab Emirates,Remote,LiquidX Studio is a gaming development studio ...,Building challenging and fun AI that our playe...,"Thoughtful Problem Solving: For you, problem-s...",Great work environment\nAttractive salary & be...,AI Engineer;United Arab Emirates;Remote;Liquid...,299,"[0.008481680415570736, -0.011898957192897797, ..."
2,Data Analyst / Data Scientist,nybl,"Dubai, Dubai, United Arab Emirates",Remote,nybl is looking for our next generation of dat...,work closely with nybl to identify issues and ...,Experience and knowledge in statistical and da...,,"Data Analyst / Data Scientist;Dubai, Dubai, Un...",575,"[-0.037207040935754776, -0.02034076116979122, ..."
3,Research Scientist,NEOM,"Tabuk, Saudi Arabia",Onsite,The NEOM project is being built from the groun...,Field and experimental work.\nData analysis\nI...,,,"Research Scientist;Tabuk, Saudi Arabia;Onsite;...",295,"[0.0018988176016137004, -0.021261312067508698,..."
4,\nData Scientist [NLP Expert; Artificial Intel...,Armaco Chemical Processes Systems Pvt Ltd,"Jiddah, Makkah, Saudi Arabia",Onsite,ramco occupies a unique position in the global...,Digital Transformation (DT) is responsible for...,As the successful candidate you will hold a Ma...,,Data Scientist [NLP Expert; Artificial Intelli...,766,"[-0.024974431842565536, -0.018761955201625824,..."
5,NLP Developer,Jobskey Search and Selection,"ubail, Eastern, Saudi Arabia",Hybrid,"As the successful candidate, you will hold deg...",Work with stakeholders throughout the organiza...,"Tokenization, classification and preprocessing...",,"NLP Developer;ubail, Eastern, Saudi Arabia;Hyb...",663,"[-0.00904441624879837, -0.0057301949709653854,..."
6,Artificial Intelligence Researcher,AL MOZN AI,Riyadh Region,Onsite,AI Research Scientists have tasks of designing...,,Minimum Qualifications\n\nPh.D. and publicatio...,,Artificial Intelligence Researcher;Riyadh Regi...,430,"[-0.0008614232065156102, -0.000480020069517195..."
7,Machine Learning Engineer,IQVIA,"Riyadh, Riyadh, Saudi Arabia",Onsite,Position summary:\n\nML engineers typically wo...,Key Responsibilities:\nConsulting with manager...,Minimum 5+ experience in related field\nBachel...,,"Machine Learning Engineer;Riyadh, Riyadh, Saud...",361,"[-0.02034110017120838, -0.009696582332253456, ..."
8,Computer Vision Engineer,Turing,Egypt,Remote,A US-based company pioneering data-driven virt...,"Design, develop, ship, and maintain web-based ...","BachelorÂs/MasterÂs degree in Engineering, C...",,Computer Vision Engineer;Egypt;Remote;A US-bas...,270,"[-0.006575488485395908, -0.01827809028327465, ..."
9,Software Engineering Manager,Careem,"Alexandria, Egypt",Onsite,At Careem we are led by a powerful purpose to ...,Lead a team of software engineers in implement...,You have strong software engineering skills wi...,n addition to a competitive long-term total co...,"Software Engineering Manager;Alexandria, Egypt...",669,"[0.022755378857254982, -0.014074817299842834, ..."
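
One caveat when reusing the saved file: a CSV cell can only hold text, so each embedding is written as the string form of a list, and `pd.read_csv` returns it as a string. The sketch below shows one way to turn such a cell back into numbers. The `parse_embedding` helper is a hypothetical name introduced here, and `ast.literal_eval` is used as a safer alternative to `eval`.

```python
import ast

def parse_embedding(cell):
    """Convert a stringified list such as "[0.1, -0.2]" back into a list of floats."""
    if isinstance(cell, str):
        cell = ast.literal_eval(cell)  # parses Python literals only, runs no code
    return [float(x) for x in cell]

# What a CSV cell holds after a write/read round trip (toy values).
stored = str([0.25, -0.5, 1.0])
restored = parse_embedding(stored)
print(restored)  # [0.25, -0.5, 1.0]
```

With a DataFrame loaded from the CSV, the same helper could be applied as `df["embedding"] = df.embedding.apply(parse_embedding)` before computing similarities.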


In [None]:
import numpy as np
df["embedding"] = df.embedding.apply(np.array)
df


Unnamed: 0,Title,Employer,Country,Remote/Onsite/Hybrid,About,Responsibilities,Requirements,What job offers,combined,n_tokens,embedding
0,Senior Data Scientist ML,Binance.US,"Dubai, Dubai, United Arab Emirates",Remote,"Launched in 2019, BAM Management US Holdings I...",Identify potential business opportunities with...,3+ years of experience in Data Science/Analyti...,,"Senior Data Scientist ML;Dubai, Dubai, United ...",318,"[-0.023338189348578453, -0.02993198111653328, ..."
1,AI Engineer,LiquidX Studio,United Arab Emirates,Remote,LiquidX Studio is a gaming development studio ...,Building challenging and fun AI that our playe...,"Thoughtful Problem Solving: For you, problem-s...",Great work environment\nAttractive salary & be...,AI Engineer;United Arab Emirates;Remote;Liquid...,299,"[0.008481680415570736, -0.011898957192897797, ..."
2,Data Analyst / Data Scientist,nybl,"Dubai, Dubai, United Arab Emirates",Remote,nybl is looking for our next generation of dat...,work closely with nybl to identify issues and ...,Experience and knowledge in statistical and da...,,"Data Analyst / Data Scientist;Dubai, Dubai, Un...",575,"[-0.037207040935754776, -0.02034076116979122, ..."
3,Research Scientist,NEOM,"Tabuk, Saudi Arabia",Onsite,The NEOM project is being built from the groun...,Field and experimental work.\nData analysis\nI...,,,"Research Scientist;Tabuk, Saudi Arabia;Onsite;...",295,"[0.0018988176016137004, -0.021261312067508698,..."
4,\nData Scientist [NLP Expert; Artificial Intel...,Armaco Chemical Processes Systems Pvt Ltd,"Jiddah, Makkah, Saudi Arabia",Onsite,ramco occupies a unique position in the global...,Digital Transformation (DT) is responsible for...,As the successful candidate you will hold a Ma...,,Data Scientist [NLP Expert; Artificial Intelli...,766,"[-0.024974431842565536, -0.018761955201625824,..."
5,NLP Developer,Jobskey Search and Selection,"ubail, Eastern, Saudi Arabia",Hybrid,"As the successful candidate, you will hold deg...",Work with stakeholders throughout the organiza...,"Tokenization, classification and preprocessing...",,"NLP Developer;ubail, Eastern, Saudi Arabia;Hyb...",663,"[-0.00904441624879837, -0.0057301949709653854,..."
6,Artificial Intelligence Researcher,AL MOZN AI,Riyadh Region,Onsite,AI Research Scientists have tasks of designing...,,Minimum Qualifications\n\nPh.D. and publicatio...,,Artificial Intelligence Researcher;Riyadh Regi...,430,"[-0.0008614232065156102, -0.000480020069517195..."
7,Machine Learning Engineer,IQVIA,"Riyadh, Riyadh, Saudi Arabia",Onsite,Position summary:\n\nML engineers typically wo...,Key Responsibilities:\nConsulting with manager...,Minimum 5+ experience in related field\nBachel...,,"Machine Learning Engineer;Riyadh, Riyadh, Saud...",361,"[-0.02034110017120838, -0.009696582332253456, ..."
8,Computer Vision Engineer,Turing,Egypt,Remote,A US-based company pioneering data-driven virt...,"Design, develop, ship, and maintain web-based ...","BachelorÂs/MasterÂs degree in Engineering, C...",,Computer Vision Engineer;Egypt;Remote;A US-bas...,270,"[-0.006575488485395908, -0.01827809028327465, ..."
9,Software Engineering Manager,Careem,"Alexandria, Egypt",Onsite,At Careem we are led by a powerful purpose to ...,Lead a team of software engineers in implement...,You have strong software engineering skills wi...,n addition to a competitive long-term total co...,"Software Engineering Manager;Alexandria, Egypt...",669,"[0.022755378857254982, -0.014074817299842834, ..."


In [None]:
# read the resume text; the context manager closes the file automatically
with open('data/resume.txt', 'r', encoding="ISO-8859-1") as file:
    resume_contents = file.read()
resume_contents



"Yaser Mohye Marey\nMSc. in Computer Science - Machine Learning\nhttps://yasermarey.github.io/ https://www.kaggle.com/yasermarey\n https://github.com/YaserMarey, https://medium.com/@yasser.maree\nyasser_maree@hotmail.com\n+201017332998\n\nProfile\nVeteran software engineer with extensive hands-on experience in software systems envisioning and development. Master's in Computer Science from Georgia Institute of Technology specializing in Machine Learning. Crafted my first Neural Network in 1996 and have updated knowledge in Deep Learning, DNN, CNN, and RNN applied to Computer Vision and NLP problems. I have solid hands-on experience in JavaScript, Python, R, and C#.\nEducation\nMaster of Science in Computer Science with a focus on Machine Learning \nGeorgia Institute of Technology, GPA 3.58, 2020, Atlanta, USA\n\nSome of the Master\x92s coursework included:\n* For Computer Vision: 1) Detecting traffic signals and lights using Hough Transform. 2) Motion Detection using Pyramidal Lucas and

In [None]:
results = recommend_jobs(df, resume_contents, n=10)

In [None]:
display(results[['Title', 'About','Responsibilities','Requirements','similarity']])

Unnamed: 0,Title,About,Responsibilities,Requirements,similarity
5,NLP Developer,"As the successful candidate, you will hold deg...",Work with stakeholders throughout the organiza...,"Tokenization, classification and preprocessing...",0.831803
6,Artificial Intelligence Researcher,AI Research Scientists have tasks of designing...,,Minimum Qualifications\n\nPh.D. and publicatio...,0.821803
7,Machine Learning Engineer,Position summary:\n\nML engineers typically wo...,Key Responsibilities:\nConsulting with manager...,Minimum 5+ experience in related field\nBachel...,0.817558
2,Data Analyst / Data Scientist,nybl is looking for our next generation of dat...,work closely with nybl to identify issues and ...,Experience and knowledge in statistical and da...,0.811574
4,\nData Scientist [NLP Expert; Artificial Intel...,ramco occupies a unique position in the global...,Digital Transformation (DT) is responsible for...,As the successful candidate you will hold a Ma...,0.810253
8,Computer Vision Engineer,A US-based company pioneering data-driven virt...,"Design, develop, ship, and maintain web-based ...","BachelorÂs/MasterÂs degree in Engineering, C...",0.808425
9,Software Engineering Manager,At Careem we are led by a powerful purpose to ...,Lead a team of software engineers in implement...,You have strong software engineering skills wi...,0.800007
3,Research Scientist,The NEOM project is being built from the groun...,Field and experimental work.\nData analysis\nI...,,0.786109
0,Senior Data Scientist ML,"Launched in 2019, BAM Management US Holdings I...",Identify potential business opportunities with...,3+ years of experience in Data Science/Analyti...,0.775974
1,AI Engineer,LiquidX Studio is a gaming development studio ...,Building challenging and fun AI that our playe...,"Thoughtful Problem Solving: For you, problem-s...",0.765223
