# Purpose

- Using the OpenAI API to find the discrepancy between the user's existing skill set and the recommended jobs
- Exploring different embedding techniques (TF-IDF, Word2Vec, BERT and SBERT) and building a simple course recommendation system using a weighted cosine similarity score

Select the top recommended jobs from Part 1 (Job Recommendation).

Prepare the user and job information as input to the GPT API.

# Discrepancy Prompt

In [178]:
# Replace 'xxxxxx' with your OpenAI API key, or omit api_key to read it from the OPENAI_API_KEY environment variable
from openai import OpenAI
client = OpenAI(api_key='xxxxxx')

In [179]:
user_skills = [
    "C", "C#", "C++", "HTML/CSS", "JavaScript", "PHP", "Python", "Rust", "SQL", 
    "TypeScript", "jQuery", "Next.js", "Node.js", "React", "Svelte", "Vue.js", 
    "Electron", "Flutter", "NPM", "Unreal Engine", "Visual Studio", "Vim", 
    "Visual Studio Code", "Discord", "WhatsApp", "ChatGPT", "Codeium", "GitHub", "Copilot"
]

In [180]:
# User info
user_info = {
    "current_skills": user_skills,
}

# Top 2 jobs recommended for the user
jobs_info = [
    {
        "job_title": "Full Stack Java Developer",
        "required_skills": ["Java/J2EE", "REACT JavaScript", "HTML5", "CSS3", "UI/UX"],
        "job_description": "Our client looking for a 8+ Years experienced Java/J2EE Full Stack Developer in NYCDescription:• Strong experience in JavaScript• Experience in React• Experience in Full stack Web Developping with strong in UI/UX sense• Core Java, J2EE, Development experience• Experience in Middle-tier and Backend• Experience in API"
    },
    {
        "job_title": "Front end javascript Developer",
        "required_skills": ["JavaScript", "HTML5", "CSS", "Node.js", "AngularJS", "React", "Unity3D", "OpenGL", "iOS", "Android", "SASS", "LESS"],
        "job_description": "HOT POSITION !!!Job Description: Front-End DeveloperLocation: San Jose, CAMandatory : GITHUB /PORTFOLIO LINK....... The Chief Technology and Architecture Office, establishes the technology vision and strategy for the Client Development organization. Through cross-functional engagement and close customer interactions, the CTAO team leads disruptive innovation transition and captures customer mindshare. CTAO Experience Design Group is looking for an exceptionally technical, creative, and talented JavaScript Developer to join our User Experience Innovation Team. The team develops forward-looking prototypes, visualizations and products across desktop and mobile platforms. Help us bring the next Internet vision to life! This is a unique opportunity to influence dynamic front ends that are driving next generation cloud services in SDN, analytics, security, and network management. Minimum QualificationsHTML5, CSS3, JavaScriptDeep knowledge of JavaScriptExperience with Node.js, AngularJS, and/or React3+ years of front-end web development Nice-To-Have ExperienceCSS Preprocessors (SASS, LESS)Responsive UI Frameworks (Foundation, Bootstrap)Connecting front-end UI to back-end APIsContinuous integration processes and toolsUnity3D or OpenGL for AR/VR or game developmentiOS/Android native development KeywordsJavaScript, HTML5, CSS, Node.js, AngularJS, React, Unity3D, OpenGL, iOS, Android If interested please do mail me your resume at asen@netpace.com or call me at 805 298 0704Thanks,Annie SenNetpace, Inc"
    }
]


# Function to generate the prompt for the ChatGPT API based on user and job info
def generate_discrepancy_prompt(user_info, jobs_info):
    prompt = "Identify the skills discrepancies for a user based on their profile and recommended jobs. "
    prompt += f"The user has the following current skills: {', '.join(user_info['current_skills'])}. "
    prompt += "Here are the top job recommendations, their skill requirements and job descriptions:\n"
    
    for job in jobs_info:
        prompt += f"- Job Title: {job['job_title']}, Required Skills: {', '.join(job['required_skills'])}, Job Description: {job['job_description']}\n"
    return prompt

# Generate the prompt
prompt = generate_discrepancy_prompt(user_info, jobs_info)

print(prompt)


Identify the skills discrepancies for a user based on their profile and recommended jobs. The user has the following current skills: C, C#, C++, HTML/CSS, JavaScript, PHP, Python, Rust, SQL, TypeScript, jQuery, Next.js, Node.js, React, Svelte, Vue.js, Electron, Flutter, NPM, Unreal Engine, Visual Studio, Vim, Visual Studio Code, Discord, WhatsApp, ChatGPT, Codeium, GitHub, Copilot. Here are the top job recommendations, their skill requirements and job descriptions:
- Job Title: Full Stack Java Developer, Required Skills: Java/J2EE, REACT JavaScript, HTML5, CSS3, UI/UX, Job Description: Our client looking for a 8+ Years experienced Java/J2EE Full Stack Developer in NYCDescription:• Strong experience in JavaScript• Experience in React• Experience in Full stack Web Developping with strong in UI/UX sense• Core Java, J2EE, Development experience• Experience in Middle-tier and Backend• Experience in API
- Job Title: Front end javascript Developer, Required Skills: JavaScript, HTM
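For comparison, the simplest possible gap check is an exact, case-insensitive set difference between a job's required skills and the user's skills. The sketch below (the helper name `lexical_skill_gap` is hypothetical, and the skill lists are shortened versions of the ones above) shows why that is not enough on its own: it cannot tell that the user's "HTML/CSS" already covers "CSS", so near-synonyms show up as false gaps. That is part of the motivation for asking GPT instead.

```python
def lexical_skill_gap(user_skills, required_skills):
    """Hypothetical baseline: required skills with no exact (case-insensitive) match in user_skills."""
    have = {skill.strip().lower() for skill in user_skills}
    return [skill for skill in required_skills if skill.strip().lower() not in have]


user = ["HTML/CSS", "JavaScript", "Node.js", "React"]
required = ["JavaScript", "HTML5", "CSS", "Node.js", "AngularJS", "React"]
print(lexical_skill_gap(user, required))
# ['HTML5', 'CSS', 'AngularJS']
```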

In [181]:
# Define the system message
system_msg = prompt

# Define the user message
# user_msg = "For each job, list the current skills the user lacks, the desired skills that match the job requirements, and any additional skills the user might consider developing. The output should be in point form and limited to 5 points."
user_msg = "Analyze the skill discrepancies between all the jobs provided and the current skills that the user has. Indicate the lacking skills. Output skills after 'Lacking skills:'. Separate skills by commas."

# Query the Chat Completions API: the generated prompt is the system message, the instruction is the user message
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg},
    ],
)
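The request above passes the generated profile as the system message and the instruction as the user message. A common alternative is the reverse layout: fixed instructions in the system message, per-user data in the user message. The sketch below builds such a message list; `build_messages` is a hypothetical helper, it omits the job descriptions for brevity, and the instruction wording is only an example.

```python
def build_messages(user_info, jobs_info):
    """Hypothetical helper: instructions go in the system message, user data in the user message."""
    system = (
        "You compare a user's current skills with the requirements of recommended jobs. "
        "Reply with one line that starts with 'Lacking skills:' followed by comma-separated skills."
    )
    lines = [f"Current skills: {', '.join(user_info['current_skills'])}"]
    for job in jobs_info:
        # Job descriptions are omitted here to keep the example short
        lines.append(f"- {job['job_title']}: required {', '.join(job['required_skills'])}")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": "\n".join(lines)},
    ]


msgs = build_messages(
    {"current_skills": ["Python", "SQL"]},
    [{"job_title": "Data Engineer", "required_skills": ["Python", "Spark"]}],
)
print(msgs[1]["content"])
# Current skills: Python, SQL
# - Data Engineer: required Python, Spark
```

The returned list can be passed as `messages=` to `client.chat.completions.create(model="gpt-3.5-turbo", messages=...)` in the same way as the request above.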

In [182]:
print(response.choices[0].message.content)
print(response.usage.total_tokens)

Lacking skills: Java/J2EE, UI/UX, AngularJS, Unity3D, OpenGL, iOS, Android, SASS, LESS, AR/VR, game development, connecting front-end UI to back-end APIs, Continuous integration processes and tools.
700


In [183]:
discrepancy = response.choices[0].message.content
discrepancy

'Lacking skills: Java/J2EE, UI/UX, AngularJS, Unity3D, OpenGL, iOS, Android, SASS, LESS, AR/VR, game development, connecting front-end UI to back-end APIs, Continuous integration processes and tools.'

# GPT Output Preprocessing

In [184]:
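import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# word_tokenize and stopwords need NLTK data, e.g. nltk.download('stopwords') and the 'punkt'
# tokenizer models (named 'punkt_tab' in recent NLTK releases)

# Use a regex to capture everything after "Lacking skills:"
match = re.search(r"Lacking skills:\s*(.*)", discrepancy, re.DOTALL)
# Fall back to the whole response if the model did not use the requested format
lacking_skills = match.group(1) if match else discrepancy
print("Lacking skills: ", lacking_skills)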

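# Tokenize the text
tokens = word_tokenize(lacking_skills)
print("Tokens:", tokens)

# Remove stop words and punctuation; note that isalpha() also drops tokens such as
# 'Java/J2EE', 'UI/UX', 'Unity3D' and 'front-end', as the filtered output below shows
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words and word.isalpha()]
# Lowercase the remaining tokens
filtered_tokens = [word.lower() for word in filtered_tokens]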

print("Filtered Tokens:", filtered_tokens)

Lacking skills:  Java/J2EE, UI/UX, AngularJS, Unity3D, OpenGL, iOS, Android, SASS, LESS, AR/VR, game development, connecting front-end UI to back-end APIs, Continuous integration processes and tools.
Tokens: ['Java/J2EE', ',', 'UI/UX', ',', 'AngularJS', ',', 'Unity3D', ',', 'OpenGL', ',', 'iOS', ',', 'Android', ',', 'SASS', ',', 'LESS', ',', 'AR/VR', ',', 'game', 'development', ',', 'connecting', 'front-end', 'UI', 'to', 'back-end', 'APIs', ',', 'Continuous', 'integration', 'processes', 'and', 'tools', '.']
Filtered Tokens: ['angularjs', 'opengl', 'ios', 'android', 'sass', 'less', 'game', 'development', 'connecting', 'ui', 'apis', 'continuous', 'integration', 'processes', 'tools']
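Because `isalpha()` discards any token containing '/', digits or hyphens, skills such as 'Java/J2EE', 'Unity3D' and 'AR/VR' never reach the similarity step (compare the token lists above). Since the prompt asks GPT to separate skills by commas, one alternative is to split on commas and keep each skill phrase intact. A minimal sketch; `parse_lacking_skills` is a hypothetical helper, and it assumes the model followed the requested 'Lacking skills:' format, falling back to the whole text otherwise.

```python
import re


def parse_lacking_skills(text):
    """Split the text after 'Lacking skills:' on commas, keeping multi-word skills intact."""
    match = re.search(r"Lacking skills:\s*(.*)", text, re.DOTALL)
    body = match.group(1) if match else text
    # Strip whitespace and trailing full stops, drop empty items, lowercase for matching
    skills = (item.strip().rstrip(".") for item in body.split(","))
    return [skill.lower() for skill in skills if skill]


resp = "Lacking skills: Java/J2EE, UI/UX, Unity3D, game development, tools."
print(parse_lacking_skills(resp))
# ['java/j2ee', 'ui/ux', 'unity3d', 'game development', 'tools']
```

The resulting phrases could then be joined into the `skill_gap` string or embedded one by one.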


In [185]:
skill_gap = [' '.join(filtered_tokens)]
skill_gap

['angularjs opengl ios android sass less game development connecting ui apis continuous integration processes tools']

# Course Info Preprocessing

In [186]:
# get course data
import pandas as pd
import os

# Set the directory where the CSV files are located
directory = '/Users/auroraxu/Desktop/BT4222/BT4222 Proj Code/Data/'
course_file = 'Course Data/Course_clean.csv'
job_file = 'processed_jobs.csv'
user_file = 'processed_users.csv'

# Full paths to the CSV files
course_path = os.path.join(directory, course_file)
job_path = os.path.join(directory, job_file)
user_path = os.path.join(directory, user_file)

course = pd.read_csv(course_path)
job = pd.read_csv(job_path)
user = pd.read_csv(user_path)


In [187]:
print(course.shape)
print(job.shape)
print(user.shape)

(13942, 6)
(22000, 14)
(50000, 86)


In [188]:
job

Unnamed: 0,advertiserurl,company,employmenttype_jobstatus,jobdescription,jobid,joblocation_address,jobtitle,postdate,shift,site_name,skills,uniq_id,Job_Features_Merged,processed_text
0,https://www.dice.com/jobs/detail/AUTOMATION-TE...,"Digital Intelligence Systems, LLC","C2H Corp-To-Corp, C2H Independent, C2H W2, 3 M...",Looking for Selenium engineers...must have sol...,Dice Id : 10110693,"Atlanta, GA",AUTOMATION TEST ENGINEER,1 hour ago,Telecommuting not available|Travel not required,,,418ff92580b270ef4e7c14f0ddfc36b4,AUTOMATION TEST ENGINEER Looking for Selenium ...,automation test engineer looking selenium engi...
1,https://www.dice.com/jobs/detail/Information-S...,University of Chicago/IT Services,Full Time,The University of Chicago has a rapidly growin...,Dice Id : 10114469,"Chicago, IL",Information Security Engineer,1 week ago,Telecommuting not available|Travel not required,,"linux/unix, network monitoring, incident respo...",8aec88cba08d53da65ab99cf20f6f9d9,Information Security Engineer The University o...,information security engineer university chica...
2,https://www.dice.com/jobs/detail/Business-Solu...,"Galaxy Systems, Inc.",Full Time,"GalaxE.SolutionsEvery day, our solutions affec...",Dice Id : CXGALXYS,"Schaumburg, IL",Business Solutions Architect,2 weeks ago,Telecommuting not available|Travel not required,,"Enterprise Solutions Architecture, business in...",46baa1f69ac07779274bcd90b85d9a72,Business Solutions Architect GalaxE.SolutionsE...,business solution architect galaxe.solutionsev...
3,https://www.dice.com/jobs/detail/Java-Develope...,TransTech LLC,Full Time,Java DeveloperFull-time/direct-hireBolingbrook...,Dice Id : 10113627,"Bolingbrook, IL","Java Developer (mid level)- FT- GREAT culture,...",2 weeks ago,Telecommuting not available|Travel not required,,Please see job description,3941b2f206ae0f900c4fba4ac0b18719,"Java Developer (mid level)- FT- GREAT culture,...",java developer ( mid level ) - ft- great cultu...
4,https://www.dice.com/jobs/detail/DevOps-Engine...,Matrix Resources,Full Time,Midtown based high tech firm has an immediate ...,Dice Id : matrixga,"Atlanta, GA",DevOps Engineer,48 minutes ago,Telecommuting not available|Travel not required,,"Configuration Management, Developer, Linux, Ma...",45efa1f6bc65acc32bbbb953a1ed13b7,DevOps Engineer Midtown based high tech firm h...,devops engineer midtown based high tech firm i...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21995,https://www.dice.com/jobs/detail/Web-Designer-...,IAC Publishing,Full Time,Company Description We are searching for a ta...,Dice Id : 10112803,"Oakland, CA",Web Designer,3 weeks ago,Telecommuting not available|Travel not required,,"UI/UX mobile apps, interaction design, digital...",86e27ce6b7e631e55d69d142c7d43df2,Web Designer Company Description We are searc...,web designer company description searching tal...
21996,https://www.dice.com/jobs/detail/Senior-Front-...,Omega Solutions Inc,Full Time,CONTACT - priya@omegasolutioninc.com / 408-45...,Dice Id : 10289500,"San Francisco, CA",Senior Front End Web Developer - Full Time at ...,3 weeks ago,Telecommuting not available|Travel not required,,"JavaScript, HTML5, CSS3, Bootstrap, AJAX, Reac...",4287c7ee3317ccf1edd76e238cf8e584,Senior Front End Web Developer - Full Time at ...,senior front end web developer - full time vis...
21997,https://www.dice.com/jobs/detail/QA-Analyst-Sa...,San Francisco Health Plan,Full Time,Do you take pride in your work knowing that th...,Dice Id : 10115761,"San Francisco, CA",QA Analyst,2 weeks ago,Telecommuting not available|Travel not required,,"SDLC, ALM, SQL, T-SQL, RedGate, Team Foundatio...",d7512f0181d69f83f96db38cd77a4d08,QA Analyst Do you take pride in your work know...,qa analyst take pride work knowing thousand li...
21998,https://www.dice.com/jobs/detail/Tech-Lead%252...,IAC Publishing,Full Time,Company Description What We Can Offer YouAs th...,Dice Id : 10112803,"Oakland, CA",Tech Lead-Full Stack,2 weeks ago,Telecommuting not available|Travel not required,,"Python, Ruby, Go, Clojure, Java, NoSQL-Databas...",ec375268b494b3bcbed1635d64226112,Tech Lead-Full Stack Company Description What ...,tech lead-full stack company description offer...


In [189]:
# Replace commas with spaces so comma-separated skill lists tokenize into separate words
course_info = course['Course Info'].apply(lambda x: ' '.join(x.split(',')))

# Tokenize the course info
course_tokens = course_info.apply(lambda x: word_tokenize(x))

# Remove stop words
course_tokens = course_tokens.apply(lambda x: [word for word in x if word.lower() not in stop_words and word.isalpha()])

# Lowercase the tokens
course_tokens = course_tokens.apply(lambda x: [word.lower() for word in x])

# Combine the tokens back to a string
course_info = course_tokens.apply(lambda x: ' '.join(x))

In [190]:
course_tokens[1]

['new',
 'ultimate',
 'aws',
 'certified',
 'cloud',
 'practitioner',
 'full',
 'practice',
 'exam',
 'included',
 'explanations',
 'learn',
 'cloud',
 'computing',
 'pass',
 'aws',
 'cloud',
 'practitioner',
 'exam']

In [191]:
course_info[1]

'new ultimate aws certified cloud practitioner full practice exam included explanations learn cloud computing pass aws cloud practitioner exam'

In [192]:
# Tokenize the job info
job_info_tokens = job['processed_text'].apply(lambda x: word_tokenize(x))

In [193]:
# Tokenize the user info
user_info_tokens = user['processed_text'].apply(lambda x: word_tokenize(x))

# Cosine Similarity and TF-IDF

In [221]:
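from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Fit TF-IDF on the preprocessed course descriptions
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(course_info)

# Project the skill gap into the same vocabulary (words never seen in the courses are ignored)
skill_gap_vector = tfidf.transform(skill_gap)

# Cosine similarity between the skill gap and every course
similarity_scores = cosine_similarity(skill_gap_vector, tfidf_matrix)[0]

course['Similarity_TFidf'] = similarity_scores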

top_n = 5
recommended_courses = course.sort_values(by='Similarity_TFidf', ascending=False).head(top_n)

print("Recommended courses based on skill gap:")
print(recommended_courses[['Course Title', 'Similarity_TFidf']])

Recommended courses based on skill gap:
                                            Course Title  Similarity_TFidf
4455   App Center: Continuous Integration and Deliver...          0.334742
12386        Create the user interface in Android Studio          0.295163
8936     Meta Android Developer Professional Certificate          0.278876
9400   Development of mobile applications with Androi...          0.261287
8331   Development of mobile applications with Androi...          0.260994
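What the TF-IDF cell computes can be written out by hand. The pure-Python sketch below mirrors `TfidfVectorizer`'s default weighting (raw term counts, smoothed IDF, L2 normalization) but assumes simple whitespace tokenization instead of scikit-learn's default token pattern; the three toy course strings are made up for illustration. It also shows why a word like 'tools' in the skill gap can contribute nothing: words that never occur in the course corpus are outside the fitted vocabulary.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed IDF as TfidfVectorizer computes it by default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """Raw term counts times IDF, L2-normalized; terms outside the fitted vocabulary are dropped."""
    tf = Counter(term for term in text.split() if term in idf)
    vec = {term: count * idf[term] for term, count in tf.items()}
    norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
    return {term: x / norm for term, x in vec.items()}


def cosine(a, b):
    # Both vectors are unit length (or empty), so the dot product is the cosine similarity
    return sum(x * b.get(term, 0.0) for term, x in a.items())


courses = ["android ui kotlin", "continuous integration ios", "python data analysis"]
idf = fit_idf(courses)
course_vecs = [tfidf_vector(doc, idf) for doc in courses]
query = tfidf_vector("ios continuous integration tools", idf)
scores = [cosine(query, v) for v in course_vecs]
print([round(s, 3) for s in scores])
# [0.0, 1.0, 0.0]
```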


In [224]:
print("Top Course 1")
print(recommended_courses.iloc[0]['Course Title'])
print(recommended_courses.iloc[0]['Course Info'])
print()
print("Top Course 2")
print(recommended_courses.iloc[1]['Course Title'])
print(recommended_courses.iloc[1]['Course Info'])

Top Course 1
App Center: Continuous Integration and Delivery for iOS
App Center: Continuous Integration and Delivery for iOS
Automate your iOS development process

Top Course 2
Create the user interface in Android Studio
Using UI component libraries to create Android UI Composing UI using Kotlin UI views Creating a simple UI with the Design Editor Creating a basic Android UI, Creating UI with Jetpack Compose, Advanced UI with Jetpack Compose, UI styling, Final project Personal Development


# Cosine Similarity and Word2Vec

In [196]:
import gensim
import numpy as np

# Train a skip-gram (sg=1) Word2Vec model on the course, job and user tokens
corpus = course_tokens.to_list() + job_info_tokens.to_list() + user_info_tokens.to_list()

model = gensim.models.Word2Vec(sentences=corpus, vector_size=200,
                               workers=4, sg=1, min_count=1, epochs=10, window=10)

# save the model
model.save("word2vec.model")

In [225]:
# Load the model
model = gensim.models.Word2Vec.load("word2vec.model")

In [227]:
# Get embeddings for the course tokens
course_embeddings = course_tokens.apply(lambda x: np.mean([model.wv[word] for word in x if word in model.wv], axis=0))

# Get embeddings for the skill gap
skill_gap_embedding = np.mean([model.wv[word] for word in filtered_tokens if word in model.wv], axis=0)

# Courses with no in-vocabulary tokens give NaN (np.mean of an empty list);
# replace those with a zero vector of the same dimensionality as the Word2Vec vectors
dimensionality = model.wv.vector_size
course_embeddings = course_embeddings.apply(lambda vec: np.zeros(dimensionality) if isinstance(vec, float) else vec)

# Apply the same fallback to the skill gap embedding
if isinstance(skill_gap_embedding, float):
    skill_gap_embedding = np.zeros(dimensionality)

# Convert the list of vectors to a 2D array
course_vectors = np.array(list(course_embeddings))

# Make sure skill_gap_embedding is also in the correct shape
skill_gap_embedding = np.reshape(skill_gap_embedding, (1, -1))

# Now, calculate cosine similarity
similarity_scores = cosine_similarity(skill_gap_embedding, course_vectors)[0]

# Store the Word2Vec similarity for each course
course['Similarity_Word2Vec'] = similarity_scores

top_n = 5
recommended_courses = course.sort_values(by='Similarity_Word2Vec', ascending=False).head(top_n)
recommended_courses

Unnamed: 0,Course Title,Course Info,Course Rating,Course Duration,Course Level,Source,Similarity_TFidf,Similarity_Word2Vec,Similarity_BERT,Similarity_SBERT,Similarity_TFidf_norm,Similarity_word2vec_norm,Similarity_BERT_norm,Similarity_SBERT_norm,Combined_Similarity
8936,Meta Android Developer Professional Certificate,Gain the skills required for an entry-level ca...,4.7,224.0,Beginner,Coursera,0.278876,0.905553,0.704135,0.28754,0.833107,1.0,0.759504,0.699578,0.823047
13349,Meta iOS Developer Professional Certificate,Gain the skills required for an entry-level ca...,4.7,224.0,Beginner,Coursera,0.224659,0.896217,0.702645,0.330005,0.671142,0.984605,0.756807,0.767335,0.794972
8935,Meta iOS Developer Professional Certificate,Gain the skills required for an entry-level ca...,4.7,224.0,Beginner,Coursera,0.224804,0.894419,0.702041,0.327703,0.671576,0.98164,0.755713,0.763662,0.793148
8926,IBM Full-Stack JavaScript Developer Profession...,"Master the full-stack development languages, f...",4.8,200.0,Beginner,Coursera,0.041372,0.889117,0.761081,0.324575,0.123594,0.972898,0.862604,0.75867,0.679441
8931,IBM Front-End Developer Professional Certificate,Master the most up-to-date practical skills an...,4.6,120.0,Beginner,Coursera,0.100062,0.88795,0.709296,0.201478,0.298924,0.970975,0.768849,0.562256,0.650251


In [228]:
# print top 5 course titles and similarity scores
print("Recommended courses based on skill gap:")
print(recommended_courses[['Course Title', 'Similarity_Word2Vec']])

Recommended courses based on skill gap:
                                            Course Title  Similarity_Word2Vec
8936     Meta Android Developer Professional Certificate             0.905553
13349        Meta iOS Developer Professional Certificate             0.896217
8935         Meta iOS Developer Professional Certificate             0.894419
8926   IBM Full-Stack JavaScript Developer Profession...             0.889117
8931    IBM Front-End Developer Professional Certificate             0.887950


In [229]:
print("Top Course 1")
print(recommended_courses.iloc[0]['Course Title'])
print(recommended_courses.iloc[0]['Course Info'])
print()
print("Top Course 2")
print(recommended_courses.iloc[1]['Course Title'])
print(recommended_courses.iloc[1]['Course Info'])

Top Course 1
Meta Android Developer Professional Certificate
Gain the skills required for an entry-level career as an Android developer. Learn how to create applications for Android including how to build and manage the lifecycle of a mobile app using Android Studio.Learn coding in Kotlin and the programming fundamentals for how to create the user interface (UI) and best practices for design.   Create cross-platform mobile applications using React Native. Demonstrate your new skills by creating a job-ready portfolio you can show during interviews. Kotlin Playground, practice using and extending protocols., declare and initialize different types of variables, create arrays, create control flow patterns using conditionals and loops, Version Control, Github, Bash (Unix Shell), Web Development, Linux, React (Javascript Library), Application development, React, Mobile Development, Android Studio, •\tCreate simple JavaScript code, •\tCreate and manipulate objects and arrays, •\tWrite unit te
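The Word2Vec cells represent a text as the average of its in-vocabulary word vectors and fall back to a zero vector when no word is known. A dependency-free sketch of that logic follows, with a made-up two-dimensional vocabulary standing in for `model.wv`; returning 0 for an all-zero vector matches how scikit-learn's `cosine_similarity` treats such rows.

```python
import math


def mean_vector(tokens, wv, dim):
    """Average the vectors of tokens found in wv; zero vector if none are known."""
    known = [wv[t] for t in tokens if t in wv]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as dissimilar to everything
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy word vectors standing in for model.wv (illustrative values only)
wv = {"android": [1.0, 0.0], "ios": [0.8, 0.2], "sql": [0.0, 1.0]}
gap = mean_vector(["android", "ios", "unseenword"], wv, dim=2)  # about [0.9, 0.1]
empty = mean_vector(["unseenword"], wv, dim=2)                   # [0.0, 0.0]
print(round(cosine(gap, wv["android"]), 3), cosine(empty, gap))
# 0.994 0.0
```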

# Cosine Similarity and BERT
- Contextual Embeddings

In [None]:
from transformers import BertTokenizer, BertModel
import torch
from sklearn.metrics.pairwise import cosine_similarity

# Load pre-trained model tokenizer (vocabulary) and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

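# Encode a text into a single mean-pooled BERT vector of shape (1, 768)
def encode(text):
    # Tokenize text into input IDs and an attention mask (truncated to BERT's 512-token limit)
    encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    # Get the token embeddings without tracking gradients
    with torch.no_grad():
        model_output = model(**encoded_input)
    # Mean-pool over tokens, using the attention mask so padding tokens are ignored
    mask = encoded_input['attention_mask'].unsqueeze(-1).float()
    summed = (model_output.last_hidden_state * mask).sum(dim=1)
    return summed / mask.sum(dim=1).clamp(min=1e-9)

# Encode course info and the skill gap; unlike the other methods, this uses the
# raw GPT response (discrepancy) rather than the preprocessed skill_gap
course_vector_bert = course_info.apply(encode)
skill_gap_embedding = encode(discrepancy)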

# Calculate cosine similarity
course['Similarity_BERT'] = course_vector_bert.apply(lambda x: cosine_similarity(skill_gap_embedding, x)[0][0])
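Mean pooling over `last_hidden_state` only ignores padding if the attention mask is used as a weight; `.mean(dim=1)` on its own averages every position. That makes no difference when texts are encoded one at a time (a single text is never padded), but it does once several texts are batched with `padding=True` to speed up encoding. The plain-Python sketch below shows the effect on toy vectors; `masked_mean` is a hypothetical helper, and in PyTorch the same idea is `(hidden * mask).sum(1) / mask.sum(1)` with the mask unsqueezed to the hidden dimension.

```python
def masked_mean(token_vectors, attention_mask):
    """Mean of token vectors at positions where the attention mask is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            total = [t + x for t, x in zip(total, vec)]
            count += 1
    return [t / max(count, 1) for t in total]


# Two real tokens followed by one padding position (toy 2-d "embeddings")
tokens = [[1.0, 1.0], [3.0, 3.0], [0.0, 0.0]]
mask = [1, 1, 0]
print(masked_mean(tokens, mask))                  # [2.0, 2.0]
print([sum(c) / len(c) for c in zip(*tokens)])    # plain mean includes padding: about [1.33, 1.33]
```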

In [234]:
# Get top N recommended courses
top_n = 5
recommended_courses = course.sort_values(by='Similarity_BERT', ascending=False).head(top_n)
recommended_courses

Unnamed: 0,Course Title,Course Info,Course Rating,Course Duration,Course Level,Source,Similarity_TFidf,Similarity_Word2Vec,Similarity_BERT,Similarity_SBERT,Similarity_TFidf_norm,Similarity_word2vec_norm,Similarity_BERT_norm,Similarity_SBERT_norm,Combined_Similarity
4043,Microservices: Build GraphQL APIs with SpringB...,Microservices: Build GraphQL APIs with SpringB...,5.0,1.0,All Levels,Udemy,0.08511,0.810223,0.83697,0.184299,0.254255,0.842812,1.0,0.534847,0.657978
1150,"SERENITY BDD Framework for Selenium, Appium an...","SERENITY BDD Framework for Selenium, Appium an...",4.6,21.5,All Levels,Udemy,0.130932,0.825058,0.827499,0.316463,0.391144,0.867273,0.982853,0.745726,0.746749
2202,Building GraphQL APIs with Python: Beginner To...,Building GraphQL APIs with Python: Beginner To...,4.7,12.0,Beginner,Udemy,0.084725,0.825613,0.820184,0.209112,0.253105,0.868188,0.969609,0.574439,0.666335
1052,Selenium Python with Behave BDD(Basic + Advanc...,Selenium Python with Behave BDD(Basic + Advanc...,4.6,23.5,All Levels,Udemy,0.0,0.833675,0.817598,0.079516,0.0,0.881481,0.964927,0.367653,0.553515
3878,Mastering GitLab Building Continuous Integrati...,Mastering GitLab Building Continuous Integrati...,5.0,9.0,Beginner,Udemy,0.089267,0.788967,0.815874,0.24149,0.266676,0.807762,0.961806,0.626101,0.665586


In [235]:
# print top 5 course titles and similarity scores
print("Recommended courses based on skill gap:")
print(recommended_courses[['Course Title', 'Similarity_BERT']])

Recommended courses based on skill gap:
                                           Course Title  Similarity_BERT
4043  Microservices: Build GraphQL APIs with SpringB...         0.836970
1150  SERENITY BDD Framework for Selenium, Appium an...         0.827499
2202  Building GraphQL APIs with Python: Beginner To...         0.820184
1052  Selenium Python with Behave BDD(Basic + Advanc...         0.817598
3878  Mastering GitLab Building Continuous Integrati...         0.815874


In [236]:
print("Top Course 1")
print(recommended_courses.iloc[0]['Course Title'])
print(recommended_courses.iloc[0]['Course Info'])
print()
print("Top Course 2")
print(recommended_courses.iloc[1]['Course Title'])
print(recommended_courses.iloc[1]['Course Info'])

Top Course 1
Microservices: Build GraphQL APIs with SpringBoot 3 & JDK 17
Microservices: Build GraphQL APIs with SpringBoot 3 & JDK 17
Building Microservices with GraphQL APIs: Simple easy steps to effective schema modularization.

Top Course 2
SERENITY BDD Framework for Selenium, Appium and REST Assured
SERENITY BDD Framework for Selenium, Appium and REST Assured
Single Framework for UI, Mobile and REST APIs Testing, Integration with design patterns like PageObjects and CucumberBDD


# Cosine Similarity and SBERT

In [204]:
from sentence_transformers import SentenceTransformer

# Load the pre-trained SBERT model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode course info and skill gap
course_vector_SBERT = course_info.apply(model.encode)
skill_gap_embedding_SBERT = model.encode(skill_gap)

In [205]:
# check dimension of the embeddings
print("Course Embedding Dimension:", course_vector_SBERT[0].shape)
print("Skill Gap Embedding Dimension:", skill_gap_embedding_SBERT.shape)

Course Embedding Dimension: (384,)
Skill Gap Embedding Dimension: (1, 384)


In [238]:
# Calculate cosine similarity
course['Similarity_SBERT'] = course_vector_SBERT.apply(
    lambda x: cosine_similarity([x], skill_gap_embedding_SBERT)[0][0]
)
# Get top N recommended courses
top_n = 5
recommended_courses = course.sort_values(by='Similarity_SBERT', ascending=False).head(top_n)

# print top 5 course titles and similarity scores
print("Recommended courses based on skill gap:")
print(recommended_courses[['Course Title', 'Similarity_SBERT']])

Recommended courses based on skill gap:
                                            Course Title  Similarity_SBERT
11824  Full Stack Web Development in Spanish Speciali...          0.475822
9480   Learning MEAN Stack by Building Real world App...          0.470527
4455   App Center: Continuous Integration and Deliver...          0.468848
8393   Learning MEAN Stack by Building Real world App...          0.468579
6240                                 Learn AngularJS 1.X          0.465272
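Several rankings above list what appears to be the same course twice under different row indices (the two "Learning MEAN Stack..." rows here, and the two "Meta iOS Developer Professional Certificate" rows in the Word2Vec results), which wastes slots in a top-5 list. In pandas this could be handled with `course.sort_values(by=..., ascending=False).drop_duplicates(subset='Course Title').head(top_n)`. The same idea in plain Python, as a sketch (`top_unique` is a hypothetical helper operating on a list of dicts, and the rows are toy values):

```python
def top_unique(rows, score_key, k=5, title_key="Course Title"):
    """Return the k highest-scoring rows, keeping only the best-scoring row per title."""
    seen = set()
    result = []
    for row in sorted(rows, key=lambda r: r[score_key], reverse=True):
        if row[title_key] in seen:
            continue  # a higher-scoring row with this title was already kept
        seen.add(row[title_key])
        result.append(row)
        if len(result) == k:
            break
    return result


rows = [
    {"Course Title": "Learning MEAN Stack", "Similarity_SBERT": 0.4705},
    {"Course Title": "Learn AngularJS 1.X", "Similarity_SBERT": 0.4653},
    {"Course Title": "Learning MEAN Stack", "Similarity_SBERT": 0.4686},
]
print([r["Course Title"] for r in top_unique(rows, "Similarity_SBERT", k=2)])
# ['Learning MEAN Stack', 'Learn AngularJS 1.X']
```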


In [239]:
print("Top Course 1")
print(recommended_courses.iloc[0]['Course Title'])
print(recommended_courses.iloc[0]['Course Info'])
print()
print("Top Course 2")
print(recommended_courses.iloc[1]['Course Title'])
print(recommended_courses.iloc[1]['Course Info'])

Top Course 1
Full Stack Web Development in Spanish Specialization
Design and implement a client-side web page with Bootstrap. Develop single-page applications (SPA) with Angular. Develop native cross-platform applications using NativeScript4. Develop support for server-side applications. Node.Js, Nativescript, Bootstrap, Mongodb, Angularjs Designing web pages with Bootstrap 4, Developing pages with Angular, Developing cross-platform mobile applications with Nativescript, Angular and Redux, Server-side development: NodeJS, Express and MongoDB Personal Development

Top Course 2
Learning MEAN Stack by Building Real world Application Specialization
Master Angular for frontend development; explore HTML, CSS, and JavaScript essentials for dynamic and interactive web applications. Build RESTful APIs using Node.js & Express; learn MongoDB for database interaction, advanced error handling, security, and testing. Integrate Angular, Node.js, and MongoDB to create a full-fledged MEAN stack applica

# Similarity Combination
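Written out, the combined score for a course $c$ is a weighted sum of the four cosine similarities between the skill-gap representation $\mathbf{q}_m$ and the course representation $\mathbf{c}_m$ under each method $m$; the cell below uses equal weights $w_m = 1/4$.

$$
\cos(\mathbf{q}_m, \mathbf{c}_m) = \frac{\mathbf{q}_m \cdot \mathbf{c}_m}{\lVert \mathbf{q}_m \rVert \, \lVert \mathbf{c}_m \rVert},
\qquad
S(c) = \sum_{m \in \{\text{TF-IDF},\,\text{Word2Vec},\,\text{BERT},\,\text{SBERT}\}} w_m \cos(\mathbf{q}_m, \mathbf{c}_m),
\qquad
\sum_m w_m = 1 .
$$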

In [255]:
# Combine the four similarity scores into a weighted average (equal weights of 1/4 each)
course['Combined_Similarity'] = (
    1/4 * course['Similarity_TFidf'] +
    1/4 * course['Similarity_Word2Vec'] +
    1/4 * course['Similarity_BERT'] +
    1/4 * course['Similarity_SBERT']
)

# Get top N recommended courses
top_n = 5
recommended_courses = course.sort_values(by='Combined_Similarity', ascending=False).head(top_n)
recommended_courses

Unnamed: 0,Course Title,Course Info,Course Rating,Course Duration,Course Level,Source,Similarity_TFidf,Similarity_Word2Vec,Similarity_BERT,Similarity_SBERT,Combined_Similarity
4455,App Center: Continuous Integration and Deliver...,App Center: Continuous Integration and Deliver...,4.7,1.5,Beginner,Udemy,0.334742,0.790912,0.708951,0.468848,0.575863
9480,Learning MEAN Stack by Building Real world App...,Master Angular for frontend development; explo...,3.4,60.0,Intermediate,Coursera,0.104809,0.868639,0.812658,0.470527,0.564158
8393,Learning MEAN Stack by Building Real world App...,Master Angular for frontend development; explo...,3.4,60.0,Intermediate,Coursera,0.10475,0.865974,0.813434,0.468579,0.563184
12387,Working with data in Android,Review some of the most useful tools and packa...,0.0,24.0,Intermediate,Coursera,0.246948,0.87698,0.799968,0.319259,0.560789
12386,Create the user interface in Android Studio,Using UI component libraries to create Android...,0.0,32.0,Intermediate,Coursera,0.295163,0.839003,0.752677,0.350601,0.559361


In [256]:
# print top 5 course titles and similarity scores
print("Recommended courses based on skill gap:")
print(recommended_courses[['Course Title', 'Combined_Similarity']])

Recommended courses based on skill gap:
                                            Course Title  Combined_Similarity
4455   App Center: Continuous Integration and Deliver...             0.575863
9480   Learning MEAN Stack by Building Real world App...             0.564158
8393   Learning MEAN Stack by Building Real world App...             0.563184
12387                       Working with data in Android             0.560789
12386        Create the user interface in Android Studio             0.559361
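One caveat with averaging the raw scores: the four methods produce similarities on very different scales (in the outputs above the top TF-IDF score is about 0.33, while the top Word2Vec scores are around 0.8 to 0.9), so equal weights do not give each method an equal say in the ranking; the methods whose scores vary most across courses have the most influence. A common remedy, sketched here as an assumption rather than the method used above, is to min-max normalize each score column to [0, 1] before applying the weights. `min_max` and `combine` are hypothetical helpers; with pandas, the normalization of a column `s` is `(s - s.min()) / (s.max() - s.min())`.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combine(score_columns, weights):
    """Weighted sum of min-max-normalized score columns (one list of scores per method)."""
    normalized = [min_max(col) for col in score_columns]
    n_items = len(score_columns[0])
    return [sum(w * col[i] for w, col in zip(weights, normalized)) for i in range(n_items)]


# Toy scores for three courses under two methods (illustrative values only)
tfidf = [0.33, 0.10, 0.0]
word2vec = [0.79, 0.87, 0.80]
combined = combine([tfidf, word2vec], weights=[0.5, 0.5])
print([round(s, 2) for s in combined])
```

With raw averaging, the first toy course would rank highest (0.56 vs 0.485); after normalization the second course ranks first, because it has the best Word2Vec score in that column.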


In [253]:
print("Top Course 1")
print(recommended_courses.iloc[0])
print(recommended_courses.iloc[0]['Course Title'])
print(recommended_courses.iloc[0]['Course Info'])
print()
print("Top Course 2")
print(recommended_courses.iloc[1])
print(recommended_courses.iloc[1]['Course Title'])
print(recommended_courses.iloc[1]['Course Info'])

Top Course 1
Course Title           App Center: Continuous Integration and Deliver...
Course Info            App Center: Continuous Integration and Deliver...
Course Rating                                                        4.7
Course Duration                                                      1.5
Course Level                                                    Beginner
Source                                                             Udemy
Similarity_TFidf                                                0.334742
Similarity_Word2Vec                                             0.790912
Similarity_BERT                                                 0.708951
Similarity_SBERT                                                0.468848
Combined_Similarity                                             0.575863
Name: 4455, dtype: object
App Center: Continuous Integration and Delivery for iOS
App Center: Continuous Integration and Delivery for iOS
Automate your iOS development process

Top Cou