<a href="https://colab.research.google.com/github/Siliconvalley4uYouthProjects/SpringBoard-Swatcloud/blob/main/Recommendation_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [44]:
# Importing libraries

import pandas as pd
import numpy as np
import nltk
import regex as re
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.stem.wordnet import WordNetLemmatizer
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
from pandas.errors import SettingWithCopyWarning  # public location since pandas 1.5; older versions exposed it in pandas.core.common
import warnings
warnings.simplefilter(action="ignore", category=SettingWithCopyWarning)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [45]:
# Reading in the jobs

df = pd.read_csv('swe_marketing_jobs.csv',header=None,names=['Industry','Company','Job Title','Job Description'],skiprows=1)
df.dropna(inplace=True)
df.reset_index(inplace=True, drop=True)

In [46]:
df.head(3)

               Industry  Company                           Job Title                     Job Description
0  Software Engineering   Amazon  Senior Software Development Eng...   4 year professional software d...
1  Software Engineering   Amazon  Software Development Engineer -...   programming experience least o...
2  Software Engineering   Amazon  Software Development Engineer -...  bachelor degree computer scienc...


### Now we supply a new data point, and let the model output recommended job titles based on the calculated cosine similarity of this new data point and the existing job descriptions.
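
As a rough illustration of the score involved: cosine similarity between two count vectors `a` and `b` is `a·b / (|a| |b|)`, so it depends on shared vocabulary rather than text length. The sketch below uses only the standard library and a plain whitespace tokenizer, which is a simplification; `CountVectorizer` applies its own tokenization and stop-word list, so real scores will differ. The function name `cosine_from_counts` and the toy texts are hypothetical.

```python
import math
from collections import Counter

def cosine_from_counts(text_a, text_b):
    """Cosine similarity between two whitespace-tokenized texts using raw word counts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Dot product only needs the words that appear in both texts
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as no match
    return dot / (norm_a * norm_b)

# Hypothetical toy inputs, not taken from the dataset
resume = "python sql data analysis"
job_a = "data analysis sql reporting"
job_b = "marketing campaign strategy"

print(cosine_from_counts(resume, job_a))  # 0.75 (3 shared words, each vector has norm 2)
print(cosine_from_counts(resume, job_b))  # 0.0 (no shared words)
```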

In [47]:
def top_x_recommendations(x,DataFrame,description):
  # Text Cleaning tasks
  # Removing new line characters
  description = description.replace('\n', ' ')
  # Removing special characters (anything that is not a letter or digit becomes a space)
  description = re.sub(r'[^a-zA-Z0-9]', ' ', description)
  # Converting the text to lowercase
  description = description.lower()
  # Removing empty leading and trailing spaces 
  description = description.strip()
  # Splitting each word
  description = description.split(' ')

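  text = []
  lemmatizer = WordNetLemmatizer()
  # Build the stop-word set once so each membership test is a constant-time lookup
  stop_words = set(stopwords.words('english'))
  for token in description:
      if token not in stop_words:
        word = lemmatizer.lemmatize(token)
        # Keep only alphanumeric characters (ch avoids reusing the parameter name x)
        word = ''.join(ch for ch in word if ch.isalnum())
        text.append(word)
  description = text
  description = ' '.join(description)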
  description = re.sub(' +', ' ', description)
  new_data_input = [description]

  # transform the new data point using CountVectorizer
  countVector = CountVectorizer(stop_words = 'english')
  countMatrix = countVector.fit_transform(DataFrame['Job Description'])
  new_data_transformed = countVector.transform(new_data_input)

  # calculate cosine similarities of the new data point with all of the job descriptions
  cosine_sim = cosine_similarity(new_data_transformed, countMatrix)

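  # collect the top x recommendations
  # Note: the [1:x+1] slice skips the highest-scoring row. That assumes the input
  # description is itself a row of DataFrame (its self-match); for an external
  # resume it drops the best match instead.
  top_x = pd.DataFrame(cosine_sim.T, columns=['Similarity Score']).sort_values(by='Similarity Score', ascending=False)[1:x+1]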
  similarity = []
  for score in top_x['Similarity Score']:
    if score > 0.4:
      similarity.append('Very Strong Match')
    elif score > 0.35:
      similarity.append('Strong Match')
    elif score > 0.25:
      similarity.append('Good Match')
    else:
      similarity.append('Loose Match')
  top_x['Match Strength'] = similarity  
  
  top_x = top_x.reset_index()

  # Look up company and job title by the original row label in one step.
  # Assigning whole columns avoids chained assignment (top_x['Company'][i] = ...),
  # which triggers SettingWithCopyWarning.
  top_x['Company'] = DataFrame['Company'].loc[top_x['index']].to_numpy()
  top_x['Job Title'] = DataFrame['Job Title'].loc[top_x['index']].to_numpy()
    
  top_x = top_x[['index', 'Job Title', 'Company', 'Similarity Score', 'Match Strength']]
  top_x = top_x.rename(columns = {'Similarity Score': 'Similarity', 'Match Strength': 'Match'})

  # print out the top x job descriptions
  cols = ['Job Title', 'Company', 'Similarity', 'Match']
  print(top_x[cols])
  
  # print applicant's qualifications
  print("\nApplicant's qualifications: ", new_data_input[0], '\n' )
  print('Job Descriptions for the Recommended jobs:')

  # print job descriptions for the recommended jobs
  for j in range(len(top_x.index)):
    job_index = top_x.iloc[j]['index']
    print(j, '. ', 'Job Description:', DataFrame['Job Description'][job_index], '\n')
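
The cleaning steps at the top of `top_x_recommendations` can be sketched in isolation as follows. This is a simplified, standard-library-only version: it skips lemmatization and uses a small hypothetical stop-word set (`STOP_WORDS`) in place of NLTK's English list, so its output will not match the notebook's exactly. The helper name `clean_text` is also an assumption.

```python
import re

# Small hypothetical stop-word set; the notebook uses NLTK's full English list
STOP_WORDS = frozenset({'a', 'an', 'and', 'in', 'of', 'the', 'to', 'with'})

def clean_text(description, stop_words=STOP_WORDS):
    """Lowercase, strip punctuation, and drop stop words (lemmatization omitted)."""
    # Replace every non-alphanumeric character, newlines included, with a space
    description = re.sub(r'[^a-zA-Z0-9]', ' ', description)
    # split() with no argument collapses runs of whitespace and drops empty tokens
    tokens = description.lower().split()
    return ' '.join(t for t in tokens if t not in stop_words)

print(clean_text('Built a web server in Golang,\nand wrote unit tests.'))
# prints: built web server golang wrote unit tests
```

Because `split()` already discards empty tokens, this version needs neither the `strip()` call nor the final `re.sub(' +', ' ', ...)` pass.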

In [48]:
#pd.set_option('max_colwidth', 35)
pd.options.display.width = 0
top_x_recommendations(10,df,df['Job Description'].iloc[10])

                            Job Title Company  Similarity              Match
0  Software Development Engineer I...  Amazon    0.751005  Very Strong Match
1       Software Development Engineer  Amazon    0.745466  Very Strong Match
2  iOS Software Engineer, Amazon.c...  Amazon    0.745420  Very Strong Match
3       Software Development Engineer  Amazon    0.739808  Very Strong Match
4  Software Development Engineer I...  Amazon    0.737304  Very Strong Match
5  Software Development Engineer I...  Amazon    0.737304  Very Strong Match
6  Software Development Engineer (...  Amazon    0.733350  Very Strong Match
7       Software Development Engineer  Amazon    0.730831  Very Strong Match
8  SDE- AFT, Amazon Fulfillment Te...  Amazon    0.725731  Very Strong Match
9  Tech Lead / Senior Software Dev...  Amazon    0.722166  Very Strong Match

Applicant's qualifications:  candidate must bachelor computer science engineering related field equivalent experience8 year professional experience soft
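
The ranking and the `Match` column above come from a sort on the similarity scores followed by fixed thresholds. The sketch below reproduces that logic on a plain list, using the same cut-offs as `top_x_recommendations` (0.4, 0.35, 0.25). The names `match_strength` and `top_matches`, the `skip_first` flag, and the sample scores are hypothetical; `skip_first=True` mimics the `[1:x+1]` slice, which suits an input that is already a row of the dataset, while `skip_first=False` keeps the best match for an external resume.

```python
def match_strength(score):
    """Map a cosine similarity score to the labels used in top_x_recommendations."""
    if score > 0.4:
        return 'Very Strong Match'
    elif score > 0.35:
        return 'Strong Match'
    elif score > 0.25:
        return 'Good Match'
    return 'Loose Match'

def top_matches(scores, x, skip_first=False):
    """Return up to x (row_index, score, label) tuples, highest score first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    # Skipping the first entry drops the self-match when the query is in the dataset
    start = 1 if skip_first else 0
    return [(i, s, match_strength(s)) for i, s in ranked[start:start + x]]

# Hypothetical similarity scores for five postings
scores = [0.12, 0.45, 0.38, 0.27, 0.41]
for row in top_matches(scores, 3):
    print(row)
# (1, 0.45, 'Very Strong Match')
# (4, 0.41, 'Very Strong Match')
# (2, 0.38, 'Strong Match')
```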

**Testing Using a Non-Amazon Job**

In [49]:
#Now let's try to use an input from a non-Amazon job.
df.iloc[2514]

Industry                                Data Analysis
Company                                  Esolvit Inc.
Job Title                  Data Analyst/Report Writer
Job Description    performs complex journeylevel d...
Name: 2514, dtype: object

In [58]:
top_x_recommendations(10,df,df['Job Description'].iloc[2514])

                            Job Title                           Company  \
0          Data Analyst/Report Writer                      Esolvit Inc.   
1                    Sr. Data Analyst           Internal Data Resources   
2                        Data Analyst                   Exeter Hospital   
3                        Data Analyst  Pacific Southwest Container, LLC   
4                    Business Analyst                            Amazon   
5                    Business Analyst                            Amazon   
6  Youth Apprentice, Data Analytic...            Amazon Advertising LLC   
7               Research Data Analyst          Johns Hopkins University   
8     Data Engineer, Factory Software                             Tesla   
9                    SQL Data Analyst                  The Ternio Group   

   Similarity              Match  
0    1.000000  Very Strong Match  
1    0.722599  Very Strong Match  
2    0.630176  Very Strong Match  
3    0.589091  Very Strong Match  

In [51]:
df.iloc[2655]

Industry                         Technology (Non-SWE)
Company                                         Cisco
Job Title                    NOC Incident Coordinator
Job Description    experience operating hightransa...
Name: 2655, dtype: object

In [52]:
# Note: this call queries row 2662 (an AppD posting), not row 2655 displayed above
top_x_recommendations(10,df,df['Job Description'].iloc[2662])

                            Job Title Company  Similarity              Match
0   AppD Principal Software Architect   Cisco    1.000000  Very Strong Match
1          Software Engineer (Python)   Cisco    0.423732  Very Strong Match
2          Software Engineer (Python)   Cisco    0.423732  Very Strong Match
3  Leader, Software Engineering Co...   Cisco    0.421329  Very Strong Match
4  Leader, Software Engineering Co...   Cisco    0.421329  Very Strong Match
5  Support Engineer (.Net, AWS, Ku...   Cisco    0.415647  Very Strong Match
6  Support Engineer (.Net, AWS, Ku...   Cisco    0.415647  Very Strong Match
7  Senior Systems Software Enginee...   Cisco    0.412750  Very Strong Match
8  Senior Systems Software Enginee...   Cisco    0.412750  Very Strong Match
9  Senior Software Engineer, Accou...   Cisco    0.411633  Very Strong Match

Applicant's qualifications:  u appdynamics application performance monitoring solution provides realtime visibility insight application environment uniq

**Testing Using Sampled Resumes**

Candidate #1: Software Engineer @ Microsoft with prior experience as a team lead for engineers working in sales analytics

In [53]:

top_x_recommendations(10,df,'Azure Kubernetes Service (AKS) team - Built a web server in Golang and wrote unit tests. Designed and deployed it with a microservices architecture, leveraging Kubernetes communication patterns. - Containerized the web server using a multistage Docker build process and packaged it into a Helm Chart. Set up a CI/CD pipeline to automatically test and deploy application to an AKS cluster. - Created a logging and metrics infrastructure, using Azure Log Analytics, Prometheus, and Grafana to monitor application behavior and system health. Led a team of engineers and product managers to shape the evolution of Dropbox’s central sales analytics tool. While leading the team, we: - Increased adoption of the platform by 2.4x in 2020. - Built a PowerPoint generation engine to deliver customized, data-driven sales materials at scale, enhancing productivity of the sales organization by ~500 hours per quarter. - Advised global sales teams on strategic customers through bespoke analytics. Designed a scalable model that resulted in a 10% increase in ARR influenced each quarter.')

                            Job Title Company  Similarity       Match
0     Account Manager 1, Inside Sales    Dell    0.287772  Good Match
1         Leader Software Development   Cisco    0.273297  Good Match
2         Leader Software Development   Cisco    0.273297  Good Match
3  Software Engineering Manager, C...   Cisco    0.270210  Good Match
4  Software Engineering Manager, C...   Cisco    0.270210  Good Match
5  Senior Software Development Eng...   Cisco    0.263937  Good Match
6  Senior Software Development Eng...   Cisco    0.263937  Good Match
7          Frontend Software Engineer  Spruce    0.259763  Good Match
8   AppD Software Engineering Manager   Cisco    0.258377  Good Match
9   AppD Software Engineering Manager   Cisco    0.258377  Good Match

Applicant's qualifications:  azure kubernetes service ak team built web server golang wrote unit test designed deployed microservices architecture leveraging kubernetes communication pattern containerized web server using multist

Candidate #2: Software Engineer @ Amazon with academic background in computer science and data science

In [54]:
top_x_recommendations(10,df,'-Worked on the backend team to develop the Edtera web application, a learning engagement platform, using the Java Spring MVC Framework -Implemented data access layer using Spring Data JPA to allow various CRUD services to Edtera’s PostgreSQL database -Developed a performance tracker using Spring RestTemplate to retrieve students’ enrollment and grade data from third party learning management systems and configured RestTemplate Interceptor to reduce redundancy in the code -Built RESTful services to publish data by creating Rest Controllers, such as grades, course, enrollment information, etc. -Developed a high-performance laser health monitoring program with Python, which was highly recognized by the course instructor and project sponsor and selected for exhibition at the Department Senior Design Day -Implemented Random Forest Regression using Scikit-Learn library to predict laser survival rate, achieving MAPE of 12% -Created an interactive data visualization web application with Python Dash framework for explorative analysis -Designed a feature engineering procedure to sum the time series data and convert it to a supervised-learning problem')

                            Job Title                           Company  \
0                        Data Analyst                   Exeter Hospital   
1      Machine Learning/Data Engineer                             Rosen   
2     Data Engineer, Factory Software                             Tesla   
3                        Data Analyst  Pacific Southwest Container, LLC   
4  Youth Apprentice, Data Analytic...            Amazon Advertising LLC   
5                    Business Analyst                            Amazon   
6                    Business Analyst                            Amazon   
7  Data Engineer, Quality Data Ana...                              Tela   
8          Data Analyst/Report Writer                Rose International   
9          Data Analyst/Report Writer                      Esolvit Inc.   

   Similarity              Match  
0    0.430846  Very Strong Match  
1    0.419233  Very Strong Match  
2    0.418190  Very Strong Match  
3    0.412346  Very Strong Match  

Candidate #3: Senior Marketing Analytics Manager @ Rippling with extensive work history as a marketing data analyst

In [55]:
top_x_recommendations(10,df,'1. Create measurement framework across different funnel stages (TOF, MOF, and BOF) and marketing channels & campaigns: 1) identify KPIs (primary & secondary) and 2) define leading indicators 2. Build reporting foundation and consistently report on 1) actuals against goals by segment, marketing channel, campaign, and product and 2) trend on performance across acquisition and cross-sell motions. Identify gaps on data tracking, data connection, and reporting infrastructure and implement solutions. 3. Develop a framework to measure channel and campaign effectiveness and efficiency through attribution (FT, LT, & MT) and incrementality (MMM & geo-based experiments). Marketing channels include Paid Social (LinkedIn & Facebook & YouTube), Paid & Organic Search, Review Sites, and OOH 4. Hire and grow a Marketing Analytics team, mentor and coach analyst(s) to deliver high-quality work 1. Acquisition Marketing Channel and Campaign Analysis - Measure acquisition marketing campaign performance through Geo experiments, time series models, and platform lift studies; Channels include brand media, such as TV, OTT, Streaming Audio, & Podcast, and OA media, such as Display, OLV, Paid Social, Paid Search, and Affiliates - Design and measure channel incrementality tests (PPC Brand & OTT/CTV) and other tests, such as bid algorithm test and landing page test, across different product segments - Provide on-going channel and campaign performance analysis via prediction models, pre-post analysis, and A/B tests; Create channel performance dashboard (Holistic Search) to inform efficiency - Provide insights and recommendations on channel and campaign performance to marketing stakeholders and the leadership team based on the test and analysis results 2. Product Performance Analysis - Analytics lead on weekly marketing acquisition performance across all channels (Brand and OA) for Quickbooks Online product. - Work with finance, marketing, sales, and other analytics teams to identify performance root causes. - Define full-funnel metrics to measure marketing acquisition performance, such as brand metrics, QBO brand and industry search demand, traffic and conversion rate, sales, and CPA. 1. CRM/Database Marketing Management and Strategies - CRM database and email & direct mail channels owner and work with sales team for outbound call campaigns. Develop, optimize, monitor, and execute database marketing campaigns including re-marketing, cross-sell, acquisition, and win-back. - Analyze CRM customer & campaign data on different segments to improve overall DBM campaign performance. Auto re-marketing campaign sales contribute 13.5% of total company sales in 2016. - Provide on-going analysis on CRM data to define targeting and segmentation strategies for marketing acquisition campaigns. Design A/B and multivariate tests and define KPIs to measure test success. - Oversee design, development, and maintenance of a CRM database. Work with a cross-functional team (product, analytics, development, and BI) to define database and data integration requirements. - Collaborate with data science, sales, BI, and IT teams to build and implement models used for segmentation and targeting strategies to increase customer lifetime value and campaign ROI. - Provide insights and recommendations to the senior management team to influence decision making. 2. Marketing Analytics - Lead all aspects of Marketing Analytics, including campaign analysis, reporting, and predictive modeling. Marketing channels include email, SEM, affiliates, paid social, and sponsorship. - Provide on-going analysis on CRM data to inform targeting and segmentation strategies for marketing acquisition campaigns. Design tests and define KPIs to measure campaign and test success. - Analyze cost, engagement, and sales data to propose U.S. budget re-allocation recommendations across marketing channels to increase conversion, ROI, and improve customer mix. - Set KPIs and lead efforts to provide and consolidate analysis and reporting for website testing. - Provide insights and recommendations to the senior management team to influence decision making.')

                            Job Title Company  Similarity              Match
0  Analyst, Consumer Behavior, Dat...  NASCAR    0.488839  Very Strong Match
1  Principal Growth & Demand Gen M...   Cisco    0.434763  Very Strong Match
2  Principal Growth & Demand Gen M...   Cisco    0.434763  Very Strong Match
3       Functional Marketing II #0000  Amazon    0.389761       Strong Match
4  Principal Growth & Demand Gen M...   Cisco    0.368751       Strong Match
5  Principal Growth & Demand Gen M...   Cisco    0.368751       Strong Match
6  Principal Growth & Demand Gen M...   Cisco    0.368386       Strong Match
7  Principal Growth & Demand Gen M...   Cisco    0.368386       Strong Match
8  Integrated Marketing Manager - ...    Meta    0.366217       Strong Match
9  Senior Marketing Manager (Mater...  Amazon    0.359683       Strong Match

Applicant's qualifications:  1 create measurement framework across different funnel stage tof mof bof marketing channel campaign 1 identify kpis primary

Candidate #4: Product Manager @ Meta with prior experience as a consultant at Deloitte Digital

In [56]:
top_x_recommendations(10,df,'Product Manager for Strategic Transformation Tool: Led a nine-person product team to design, build, and launch StrategyAccelerator® - a single, customizable digital platform to help companies drive ideation to implementation across their business strategy life cycle, resulting in 17 client wins and $xxM revenue in direct asset fees and services for Deloitte Product and Monetization Strategy for a Pharmaceutical Company: Managed a five-person team to develop and prioritize 42 use cases and monetization opportunities of facial and retinal scans towards disruptive applications of AI and ML for disease detection and treatment targeting direct-to-consumer opportunities Marketing Strategy, Operations, and Annual Planning for $9B Cloud Technology Company: Led a five-person team working directly with the CMO and SVPs to spearhead annual strategic planning, and successfully secured ~$600M, the highest % of investment allocation for the marketing organization thus far from the C-suite eCommerce Partnership Strategy for $600B Global Social Media Company: Led leadership strategy workshops engaging cross-functionally with sales, product development, and privacy compliance to identify target large and mid-size retailers for ecommerce offering expansion, and develop a GTM plan for ~60M potential buyers Customer Experience and Journey Mapping for Fortune 500 Financial Services Company: Designed product roadmap to improve customer satisfaction and retention; directed ethnographic research leading to insights on key customer behaviors via quantitative surveys and interviews to understand customer journeys and shape new service offerings')

                            Job Title Company  Similarity         Match
0  Cisco Meraki - Director of Prod...   Cisco    0.396238  Strong Match
1  Sr. Product Manager, UCS Networ...   Cisco    0.384502  Strong Match
2  Sr. Product Manager, UCS Networ...   Cisco    0.384502  Strong Match
3           Sr. Associate, Salesforce    KPMG    0.364361  Strong Match
4   Database Go-To-Market Practice...  Google    0.359623  Strong Match
5       Program/Project Manager Webex   Cisco    0.352488  Strong Match
6       Program/Project Manager Webex   Cisco    0.352488  Strong Match
7  Principal Product Designer, Pro...   Cisco    0.351922  Strong Match
8  Principal Product Designer, Pro...   Cisco    0.351922  Strong Match
9  Principal Technical Product Mar...   Cisco    0.344719    Good Match

Applicant's qualifications:  product manager strategic transformation tool led nine person product team design build launch strategyaccelerator single customizable digital platform help company drive ideatio

Candidate #5: Investment Banker @ Brookwood Associates. No background in tech - purely a finance person.

In [57]:
top_x_recommendations(10,df,'- Represented CAIRE on its second U.S. acquisition of MGC Diagnostics, a manufacturer of cardiorespiratory diagnostics systems - Represented provider of transactional communications solutions on its sale to Doxim, a portfolio company of GI Partners - Represented CAIRE, a subsidiary of NGK Spark Plugs, on its acquisition of an e-commerce seller of portable oxygen concentrators and other respiratory products - Represented manufacturer of advanced composite materials on its recapitalization - Represented provider of water and wastewater infrastructure services on its sale to Sciens Water - Represented provider of center-based, home-based, and school-based behavioral healthcare services for individuals with autism spectrum disorders on its sale to LEARN Behavioral, a portfolio company of Gryphon Investors - Represented provider of urgent care services on its merger with CRH Healthcare - Represented distributor of industrial air compressors and compressed air automation systems on its recapitalization - Represented manufacturer of domestic and imported hardwood lumber on the divestiture of its distribution business to the Rugby Architectural Building Products division of Hardwoods Distribution Inc. (TSX:HDI) - Represented vertically integrated manufacturer of technical and performance fabrics on its sale to Milliken & Company - Represented manufacturer of wakeboard towers and accessories and custom-patterned boat covers on its sale to a financial sponsor - Represented specialty mattress retailer with 130+ locations in the Midwest and Southeast on its sale to Mattress Firm')

                            Job Title  Company  Similarity        Match
0  Senior Manager I, Advertising S...  Walmart    0.210477  Loose Match
1  Partner Manager, Food (Senior M...  Walmart    0.210477  Loose Match
2  Senior Manager I, Advertising S...  Walmart    0.210477  Loose Match
3  Senior Manager I, Advertising S...  Walmart    0.210477  Loose Match
4  Senior Manager I, Advertising S...  Walmart    0.210477  Loose Match
5  Partner Manager - Sam's Club Me...  Walmart    0.210477  Loose Match
6  (USA) Senior Manager I, Adverti...  Walmart    0.210477  Loose Match
7  Senior Manager I, Advertising S...  Walmart    0.210477  Loose Match
8  Senior Manager I, Advertising S...  Walmart    0.210477  Loose Match
9    Partner Manager, Walmart Connect  Walmart    0.210477  Loose Match

Applicant's qualifications:  represented caire second u acquisition mgc diagnostics manufacturer cardiorespiratory diagnostics system represented provider transactional communication solution sale doxim port