# Recommender using Embeddings

Data taken from https://www.kaggle.com/datasets/mexwell/us-software-engineer-jobs

In [1]:
import pandas as pd
from pathlib import Path
file = 'us-software-engineer-jobs-zenrows.csv'

df = pd.read_csv(Path('data',file))
df.shape

(58433, 29)

In [2]:
df.describe()

,rating,review_count,ad_id,source_id,job_location_postal
count,58433.0,58433.0,23626.0,58433.0,26121.0
mean,2.56504,1604.46905,337699100.0,6605754.0,52579.873627
std,1.832927,5894.934474,83079660.0,7333885.0,33371.51903
min,0.0,0.0,98269.0,17.0,603.0
25%,0.0,0.0,361697100.0,15710.0,20877.0
50%,3.5,14.0,369429600.0,3370807.0,53122.0
75%,4.0,624.0,371688700.0,13074500.0,87124.0
max,5.0,223345.0,372569600.0,20773080.0,99901.0


In [3]:
columns = ['title', 'company', 'types', 'location', 'snippet']
df = df[columns]
df.shape

(58433, 5)

In [4]:
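# Note: this line has no effect on df. The boolean indexing returns a copy,
# so the in-place dropna modifies a temporary and triggers the warning below.
# Rows with missing values are actually removed in the next two cells.
df[df['snippet'].isnull()].dropna(inplace=True, axis='rows')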

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[df['snippet'].isnull()].dropna(inplace=True, axis='rows')


In [5]:
# Find rows where title or snippet is missing
mask = df[['title', 'snippet']].isnull().any(axis='columns')
df[mask].index

Index([928, 2796, 24639, 36774, 38167, 39357, 44512, 57079], dtype='int64')

In [6]:
df.drop(index=df[mask].index, inplace=True)
df.shape

(58425, 5)
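
The mask-and-drop steps above can also be written as a single `dropna` call restricted to the columns of interest. The sketch below uses a small made-up frame (`toy`, `dropped`, and `cleaned` are illustrative names, not part of the notebook) to show that both approaches remove the same rows.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "title": ["Backend Engineer", None, "SRE"],
    "snippet": ["Build APIs", "Write Go", None],
})

# Mask-and-drop, as in the cells above.
mask = toy[["title", "snippet"]].isnull().any(axis="columns")
dropped = toy.drop(index=toy[mask].index)

# Single call; returns a new frame and leaves `toy` untouched.
cleaned = toy.dropna(subset=["title", "snippet"])

print(cleaned)
```

Because `dropna` returns a new frame here rather than modifying a temporary in place, it avoids the chained-indexing warning seen in cell 4.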

In [7]:
df.sample(5)

,title,company,types,location,snippet
40628,Software Engineer - Crypto Products,TradeStation,,Remote,Work and collaborate with DevOps engineers to ...
56524,Principal Software Engineer,Liberty Mutual Insurance,,Remote,7+ years of software engineering experience.\n...
32993,C# Senior Software Developer (Contractor) - Fr...,DataAxxis,"Full-time, Contract","New York, NY",The Front Office Senior Software Developer wil...
3546,Site Reliability Engineer,SHIELD AI,,"San Diego, CA",Collaborate with a diverse group of supportive...
37609,c# Engineer,Optimum Technologies,Full-time,"San Francisco Bay Area, CA",Versed in software engineering best practices ...


In [22]:
pd.set_option('display.max_colwidth', None)

documents = list(df['snippet'][:5])

In [10]:
from fastembed.embedding import FlagEmbedding as Embedding
import numpy as np

Embedding.list_supported_models()

[{'model': 'BAAI/bge-small-en',
  'dim': 384,
  'description': 'Fast English model',
  'size_in_GB': 0.2},
 {'model': 'BAAI/bge-small-en-v1.5',
  'dim': 384,
  'description': 'Fast and Default English model',
  'size_in_GB': 0.13},
 {'model': 'BAAI/bge-small-zh-v1.5',
  'dim': 512,
  'description': 'Fast and recommended Chinese model',
  'size_in_GB': 0.1},
 {'model': 'BAAI/bge-base-en',
  'dim': 768,
  'description': 'Base English model',
  'size_in_GB': 0.5},
 {'model': 'BAAI/bge-base-en-v1.5',
  'dim': 768,
  'description': 'Base English model, v1.5',
  'size_in_GB': 0.44},
 {'model': 'sentence-transformers/all-MiniLM-L6-v2',
  'dim': 384,
  'description': 'Sentence Transformer model, MiniLM-L6-v2',
  'size_in_GB': 0.09},
 {'model': 'intfloat/multilingual-e5-large',
  'dim': 1024,
  'description': 'Multilingual model, e5-large. Recommend using this model for non-English languages',
  'size_in_GB': 2.24}]

In [23]:
embedding_model = Embedding(model_name="BAAI/bge-small-en-v1.5", max_length=512) 
embeddings: list[np.ndarray] = list(embedding_model.embed(documents))  # embed() returns a generator, so materialize it with list()



In [35]:
def print_top_k(query, embeddings, documents, top_k=5):
    query_embedding = list(embedding_model.query_embed(query))[0]
    
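    # Dot product of each document embedding with the query embedding.
    # This equals cosine similarity only if the embeddings are unit-normalized.
    scores = np.dot(embeddings, query_embedding)

    # Document indices, highest score first.
    sorted_scores = np.argsort(scores)[::-1]
    for rank, idx in enumerate(sorted_scores[:top_k], start=1):
        print(f"Rank {rank}: {documents[idx]}")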
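
A plain dot product matches cosine similarity only when both vectors have unit length, so it may be worth normalizing explicitly if the model's output is not known to be normalized. The following is a minimal sketch with made-up 2-D vectors; `cosine_scores`, `toy_docs`, and `toy_query` are hypothetical names introduced here, not part of the notebook or of fastembed.

```python
import numpy as np

def cosine_scores(doc_vectors, query_vector):
    """Cosine similarity between each row of doc_vectors and query_vector."""
    docs = np.asarray(doc_vectors, dtype=float)
    query = np.asarray(query_vector, dtype=float)
    # Normalize explicitly so the result does not depend on the
    # inputs already being unit-length.
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    return docs @ query

# Toy vectors, invented for illustration only.
toy_docs = np.array([[3.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
toy_query = np.array([2.0, 0.0])

scores = cosine_scores(toy_docs, toy_query)
ranking = np.argsort(scores)[::-1]  # indices, best match first
print(scores, ranking)
```

With unit-length inputs the normalization is a no-op, so the ranking is the same as with `np.dot` alone.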

In [36]:
print_top_k("I am a backend engineer looking for golang job", embeddings, documents)

Rank 1: This person will be a senior member of the team and will be responsible for architecting, building complex features and providing technical guidance to other…
Rank 2: Proficiency in Agile software development principles is required.
 Advanced knowledge of industry software development methodologies, standards and architecture…
Rank 3: The ideal candidate will have a skill for tough puzzles, a talent for communicating complex ideas simply, and a drive to meet high expectations with great…
Rank 4: Reports to* DIRECTOR OF MARKETING.
 PHP - Equivalent of 3 years education and/or experience.
 Java script - Equivalent of 3 years education and/or experience.
Rank 5: Throughout the day, you will collaborate with your teammates and interact with our clients.
 Benefits! Shockoe offers a comprehensive and competitive benefits…


In [37]:
print_top_k("I am a backend engineer looking for php job", embeddings, documents)

Rank 1: Reports to* DIRECTOR OF MARKETING.
 PHP - Equivalent of 3 years education and/or experience.
 Java script - Equivalent of 3 years education and/or experience.
Rank 2: This person will be a senior member of the team and will be responsible for architecting, building complex features and providing technical guidance to other…
Rank 3: The ideal candidate will have a skill for tough puzzles, a talent for communicating complex ideas simply, and a drive to meet high expectations with great…
Rank 4: Throughout the day, you will collaborate with your teammates and interact with our clients.
 Benefits! Shockoe offers a comprehensive and competitive benefits…
Rank 5: Proficiency in Agile software development principles is required.
 Advanced knowledge of industry software development methodologies, standards and architecture…


In [100]:
recsys_df = df.sample(1000).copy()
recsys_df

,title,company,types,location,snippet
7629,Machine Learning Engineer,Capital One,,"Braintree, MA","Contributed to open source ML software.\n Deliver ML software models and components that solve real-world business problems, while working in collaboration with…"
50568,"Software Engineer, Audio Machine Learning",Roblox,,"San Mateo, CA","At least a Bachelors in computer science, engineering, mathematics, machine learning, (computational) physics or statistics, or equivalent industry experience."
51859,Application Programmer Analyst - Informatin Tech Specialist I,Virginia Germanna Community College,Full-time,"Locust Grove, VA","Modifies, maintains and upgrades existing website and intranet environment.\n Oversees college users’ security for VCCS enterprise applications."
8670,10+ years - Java Full Stack Developer - Secaucus NJ,CBL SOLUTIONS INC,"Full-time, Contract",Remote,Build software solutions where the problem is not well defined.\n Assist with the generation and analysis of business and functional requirements for proposed…
47952,Software Engineer - TS/SCI with Polygraph Required,Logistics Management Institute,Full-time,"Herndon, VA",The candidate shall design innovative software solutions to meet client user requirements with the aim of optimizing operational efficiency while complying with…
...,...,...,...,...,...
55560,Back End Engineer- Senior Associate,PRICE WATERHOUSE COOPERS,,"San Jose, CA","Our skilled technologists, data scientists, product managers and business strategists are passionate about using technology to accelerate change."
10089,Technical Lead-II/ SCM .Net,CitiusTech,Full-time,Remote,Title – Technical Lead-II/ SCM .Net.\n Candidate will have technical experience with .\n NET programming and relational databases such as MS SQL Server and/or…
5075,Software Engineer - College Football,Electronic Arts,Full-time,"Orlando, FL","We are looking for engineers who enjoy prototyping and planning, adding amazing new features to an existing and beloved game, and improving existing code."
28070,Software Engineer,JetHead Development,Full-time,"Carlsbad, CA",Work with software testers to resolve issues and ensure test plans cover product requirements.\n Knowledgeable of Linux device drivers and the Linux kernel.


In [102]:
recsys_df['embeddings'] = list(embedding_model.embed(recsys_df['snippet']))

In [103]:
def top_k_df(df, query, top_k=5):
    query_embedding = list(embedding_model.query_embed(query))[0]
    # Score the frame that was passed in rather than the global recsys_df.
    scores = np.dot(df['embeddings'].tolist(), query_embedding)
    scores_sorted = np.argsort(scores)[::-1]
    return df.iloc[scores_sorted[:top_k]]
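
When inspecting results it can help to see the similarity score next to each row. The sketch below is one possible variant of the lookup above, assuming the query has already been embedded; `top_k_with_scores` and the toy 2-D vectors are hypothetical and stand in for real model output.

```python
import numpy as np
import pandas as pd

def top_k_with_scores(frame, query_vector, top_k=5, column="embeddings"):
    # Stack the per-row vectors into one (n_rows, dim) matrix.
    matrix = np.vstack(frame[column].tolist())
    scores = matrix @ np.asarray(query_vector)
    # Highest scores first; slicing tolerates top_k > len(frame).
    order = np.argsort(scores)[::-1][:top_k]
    return frame.iloc[order].assign(score=scores[order])

# Toy frame with invented vectors, for illustration only.
toy = pd.DataFrame({"title": ["Go backend", "Frontend", "SRE"]})
toy["embeddings"] = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.6, 0.8])]

top = top_k_with_scores(toy, np.array([1.0, 0.0]), top_k=2)
print(top[["title", "score"]])
```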

In [133]:
preference = 'I like backend development using golang'
result = top_k_df(recsys_df, preference).drop(columns='embeddings')
result

,title,company,types,location,snippet
12050,Backend Engineer,Tempus Ex,,"Atlanta, GA","As a backend engineer, you will help build out our GraphQL API and support our R&amp;D data engineering efforts.\n Experience with modern, safe backend languages such…"
57637,"Full Stack DevOps Engineer | Portland, OR",Apexon,Contract,"Portland, OR","Our client based in Portland, OR looking for Full Stack DevOps Engineer.\n Develop full stack applications (vue.js frontend, node or python/flask backend)."
21162,"Software Engineer - Opportunity for Working Remotely Philadelphia, PA",Carbon Black,Full-time,"Philadelphia, PA","Knowledge of building high quality software in Java and python/Golang.\n Experience building enterprise-level software with a focus on performance, scalability,…"
47484,Senior Software Developer - Full Stack,Stilt,,Remote,Work with backend/frontend engineers to integrate the solution in current product flow.\n Stilt (YC W16) is building an online bank for those typically…
10447,"Sr. Engineer, Site Reliability Engineering",Nordstrom,Full-time,"Seattle, WA",Author software using Go and Open Source.\n Experience with popular operations software and concepts.\n Proficiency with software development in a well-known…


In [134]:
recommendations = ['']
for _, row in result.iterrows():
    recommendations.append('\n'.join(': '.join(map(str, item)) for item in row.items()))
        
recommendation_text = '\n---\n'.join(recommendations)
print(recommendation_text)


---
title: Backend Engineer
company: Tempus Ex
types: nan
location: Atlanta, GA
snippet: As a backend engineer, you will help build out our GraphQL API and support our R&amp;D data engineering efforts.
 Experience with modern, safe backend languages such…
---
title: Full Stack DevOps Engineer | Portland, OR
company: Apexon
types: Contract
location: Portland, OR
snippet: Our client based in Portland, OR looking for Full Stack DevOps Engineer.
 Develop full stack applications (vue.js frontend, node or python/flask backend).
---
title: Software Engineer - Opportunity for Working Remotely Philadelphia, PA
company: Carbon Black
types: Full-time
location: Philadelphia, PA
snippet: Knowledge of building high quality software in Java and python/Golang.
 Experience building enterprise-level software with a focus on performance, scalability,…
---
title: Senior Software Developer - Full Stack
company: Stilt
types: nan
location: Remote
snippet: Work with backend/frontend engineers to integrate th
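
The text above renders missing values as `types: nan`, which adds noise to whatever consumes it. One possible refinement, sketched here on a small made-up frame, is to skip empty fields when formatting each row; `format_listing` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def format_listing(row):
    # Keep only fields that have a value, so missing data is not
    # rendered as the literal string "nan".
    return "\n".join(
        f"{key}: {value}" for key, value in row.items() if pd.notna(value)
    )

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "title": ["Backend Engineer", "DevOps Engineer"],
    "types": [None, "Contract"],
    "location": ["Atlanta, GA", "Portland, OR"],
})

listing_text = "\n---\n".join(format_listing(row) for _, row in toy.iterrows())
print(listing_text)
```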

In [129]:
import gemini

In [136]:
prompt = """
You are given the following job listings, which have been filtered from a larger list of possible jobs. Based on the user's preference, try to persuade them that one of these jobs is the one for them:
{listing}

Preference: {preference}
Here's my recommendation:
"""

In [137]:
result = gemini.generate_content(prompt.format(listing=recommendation_text, preference=preference))
print(result.text)

**Sr. Engineer, Site Reliability Engineering** at **Nordstrom** in **Seattle, WA**

**Why this job is perfect for you:**

* You're passionate about backend development using Go.
* You have experience with popular operations software and concepts.
* You're proficient with software development in a well-known programming language.

At Nordstrom, you'll have the opportunity to:

* Author software using Go and Open Source.
* Work on a team of talented engineers who are passionate about building high-quality software.
* Make a real impact on the company's bottom line.

Nordstrom is a Fortune 500 company with a strong reputation for innovation and customer service. They offer competitive salaries and benefits, as well as a supportive and collaborative work environment.

If you're looking for a challenging and rewarding career in backend development, this job is perfect for you. Apply today!


In [138]:
from IPython.display import Markdown

Markdown(result.text)

**Sr. Engineer, Site Reliability Engineering** at **Nordstrom** in **Seattle, WA**

**Why this job is perfect for you:**

* You're passionate about backend development using Go.
* You have experience with popular operations software and concepts.
* You're proficient with software development in a well-known programming language.

At Nordstrom, you'll have the opportunity to:

* Author software using Go and Open Source.
* Work on a team of talented engineers who are passionate about building high-quality software.
* Make a real impact on the company's bottom line.

Nordstrom is a Fortune 500 company with a strong reputation for innovation and customer service. They offer competitive salaries and benefits, as well as a supportive and collaborative work environment.

If you're looking for a challenging and rewarding career in backend development, this job is perfect for you. Apply today!