<a href="https://colab.research.google.com/github/Siliconvalley4uYouthProjects/SpringBoard-Swatcloud/blob/main/Recommendation_system_content_based_CV_without_wanted_unwanted_keywords.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [13]:
# Importing libraries

import pandas as pd
import numpy as np
import nltk
import regex as re
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.stem.wordnet import WordNetLemmatizer
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

In [7]:
# Reading in the jobs

df = pd.read_csv('swe_marketing_jobs.csv',header=None,names=['Industry','Company','Job Title','Job Description'],skiprows=1)
df.dropna(inplace=True)
df.reset_index(inplace=True, drop=True)

In [8]:
df.head()

| | Industry | Company | Job Title | Job Description |
|---|---|---|---|---|
| 0 | Software Engineering | Amazon | Senior Software Development Engineer | 4 year professional software development expe... |
| 1 | Software Engineering | Amazon | Software Development Engineer - Payments | programming experience least one modern langu... |
| 2 | Software Engineering | Amazon | Software Development Engineer - Fintech | bachelor degree computer science related field... |
| 3 | Software Engineering | Amazon | Software Development Engineer | 1 year experience contributing system design a... |
| 4 | Software Engineering | Amazon | Embedded Software Development Engineer, Satell... | 1 year experience contributing system design a... |


### Next, we supply a new data point and let the model recommend job titles based on the cosine similarity between that data point and the existing job descriptions.
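
To make the scoring concrete, here is a minimal, dependency-free sketch of cosine similarity over bag-of-words counts, which is the quantity `cosine_similarity` computes on `CountVectorizer` output. The helper name `bow_cosine` and the toy strings are illustrative and not part of the notebook, and plain whitespace tokenization is a simplification of what `CountVectorizer` does.

```python
import math
from collections import Counter

def bow_cosine(text_a, text_b):
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    # Dot product over the words the two texts share
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy inputs
resume = "python data analysis sql"
job_a = "data analysis sql reporting"
job_b = "marketing campaign strategy"

score_a = bow_cosine(resume, job_a)  # three shared words out of four in each text
score_b = bow_cosine(resume, job_b)  # no shared words
print(score_a, score_b)
```

A score near 1 means the two texts use nearly the same words in similar proportions, while 0 means they share no vocabulary.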

In [9]:
def top_x_recommendations(x,DataFrame,description):
  # Text Cleaning tasks
  # Removing new line characters
  description = description.replace('\n', ' ')
  # Removing special characters: replace every non-alphanumeric character with a space
  # (str.replace treats its pattern literally, so the regex substitution is done with re.sub)
  description = re.sub(r'[^a-zA-Z0-9]', ' ', description)
  # Converting the text to lowercase
  description = description.lower()
  # Removing empty leading and trailing spaces 
  description = description.strip()
  # Splitting each word
  description = description.split(' ')

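  text = []
  lemmatizer = WordNetLemmatizer()
  # Build the stop-word set once instead of rebuilding the list on every iteration
  stop_words = set(stopwords.words('english'))
  for token in description:
      if token not in stop_words:
        word = lemmatizer.lemmatize(token)
        # Keep only alphanumeric characters (avoids reusing the parameter name x)
        word = ''.join(ch for ch in word if ch.isalnum())
        text.append(word)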
  description = text
  description = ' '.join(description)
  description = re.sub(' +', ' ', description)
  new_data_input = [description]

  # transform the new data point using CountVectorizer
  countVector = CountVectorizer(stop_words = 'english')
  countMatrix = countVector.fit_transform(DataFrame['Job Description'])
  new_data_transformed = countVector.transform(new_data_input)

  # calculate cosine similarities of the new data point with all of the job descriptions
  cosine_sim = cosine_similarity(new_data_transformed, countMatrix)

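  # collect the top x recommendations
  # NOTE: the [1:x+1] slice skips the highest-scoring row, which assumes the query is itself
  # a row of DataFrame; for outside text (e.g. a resume) it drops the best match
  top_x = pd.DataFrame(cosine_sim.T, columns=['Similarity Score']).sort_values(by='Similarity Score', ascending=False)[1:x+1]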
  similarity = []
  for score in top_x['Similarity Score']:
    if score > 0.4:
      similarity.append('Very Strong Match')
    elif score > 0.35:
      similarity.append('Strong Match')
    elif score > 0.25:
      similarity.append('Good Match')
    else:
      similarity.append('Loose Match')
  top_x['Match Strength'] = similarity
  top_x = top_x.reset_index()
  print(top_x)

  # print out the top x job descriptions
  print("\nApplicant's qualifications: ", new_data_input[0], '\n' )
  print('Recommended jobs:')
  for i in range(len(top_x['index'])):
    print('\nJob Title: ', DataFrame['Job Title'][top_x.iloc[i]['index']])
    print('Company:', DataFrame['Company'][top_x.iloc[i]['index']])
    print('Job Description:', DataFrame['Job Description'][top_x.iloc[i]['index']])
    match_strength = top_x['Match Strength'][i]
    print('Match Strength:', match_strength)
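
As a complement to the printing function above, the following is a rough sketch of a variant that returns the ranked rows instead of printing them, and makes skipping the top row optional. The name `rank_jobs`, the `skip_first` flag, and the toy data are assumptions made for illustration; the vectorizer and similarity calls are the same scikit-learn ones the notebook imports.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_jobs(x, jobs, description, skip_first=False):
    """Return the top-x rows of `jobs` ranked by cosine similarity to `description`."""
    vectorizer = CountVectorizer(stop_words='english')
    job_matrix = vectorizer.fit_transform(jobs['Job Description'])
    query = vectorizer.transform([description])
    scores = cosine_similarity(query, job_matrix).ravel()
    ranked = jobs.assign(**{'Similarity Score': scores}).sort_values(
        'Similarity Score', ascending=False)
    # skip_first=True mimics the [1:x+1] slice, for queries drawn from `jobs` itself
    start = 1 if skip_first else 0
    return ranked.iloc[start:start + x]

# Hypothetical toy data
toy_jobs = pd.DataFrame({
    'Job Title': ['Data Analyst', 'Backend Engineer', 'Marketing Manager'],
    'Job Description': [
        'sql reporting dashboards data analysis',
        'java spring kubernetes docker postgresql',
        'campaign strategy brand marketing budget',
    ],
})
top = rank_jobs(2, toy_jobs, 'java spring kubernetes developer')
print(top[['Job Title', 'Similarity Score']])
```

Returning a DataFrame keeps the ranking reusable, and leaving `skip_first` off by default avoids discarding the best match when the query is an outside resume.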
  

In [14]:
top_x_recommendations(10,df,df['Job Description'].iloc[10])

   index  Similarity Score     Match Strength
0    160          0.751005  Very Strong Match
1     63          0.745466  Very Strong Match
2    207          0.745420  Very Strong Match
3     25          0.739808  Very Strong Match
4    165          0.737304  Very Strong Match
5   3607          0.737304  Very Strong Match
6    220          0.733350  Very Strong Match
7     15          0.730831  Very Strong Match
8    417          0.725731  Very Strong Match
9    242          0.722166  Very Strong Match

Applicant's qualifications:  candidate must bachelor computer science engineering related field equivalent experience8 year professional experience software developmentexperience contributing architecture design architecture design pattern reliability scaling new current system industry experience architecting designing scalable system interact multiple system designed expansion business growspossess extremely sound understanding basic area computer science algorithm data structure object
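
The Match Strength column comes from fixed score cutoffs inside `top_x_recommendations` (above 0.4, above 0.35, above 0.25, otherwise loose). As a standalone sketch, the hypothetical helper `match_strength` below applies the same cutoffs to a few scores taken from the result tables in this notebook.

```python
def match_strength(score):
    """Map a cosine similarity score to the label used in the notebook."""
    if score > 0.4:
        return 'Very Strong Match'
    elif score > 0.35:
        return 'Strong Match'
    elif score > 0.25:
        return 'Good Match'
    return 'Loose Match'

# Scores copied from the result tables in this notebook
sample_scores = (0.751005, 0.398647, 0.287772, 0.210477)
labels = [match_strength(s) for s in sample_scores]
print(labels)
```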

**Testing Using a Non-Amazon Job**

In [15]:
# Now let's try an input from a non-Amazon job.
df.iloc[2514]

Industry                                               Data Analysis
Company                                                 Esolvit Inc.
Job Title                                 Data Analyst/Report Writer
Job Description    performs complex journeylevel data analysis da...
Name: 2514, dtype: object

In [16]:
top_x_recommendations(10,df,df['Job Description'].iloc[2514])

   index  Similarity Score     Match Strength
0   2514          1.000000  Very Strong Match
1   2530          0.722599  Very Strong Match
2   2511          0.630176  Very Strong Match
3   2495          0.589091  Very Strong Match
4    477          0.556532  Very Strong Match
5   3461          0.556532  Very Strong Match
6   2503          0.555492  Very Strong Match
7   2522          0.553509  Very Strong Match
8   2467          0.549713  Very Strong Match
9   2519          0.536577  Very Strong Match

Applicant's qualifications:  performs complex journeylevel data analysis data research work work involves conducting detailed analysis extensive research data providing result monitoring implementing data quality develop implement customercentric metric reporting support performance quality improvement effort help streamline process data analysis visualization manage analysis data embrace evaluative thinking includes posing thoughtful question data getting feedback key stakeholder using d

In [17]:
df.iloc[2655]

Industry                                        Technology (Non-SWE)
Company                                                        Cisco
Job Title                                   NOC Incident Coordinator
Job Description    experience operating hightransactional 24x7 pr...
Name: 2655, dtype: object

In [18]:
top_x_recommendations(10,df,df['Job Description'].iloc[2662])

   index  Similarity Score     Match Strength
0   7226          1.000000  Very Strong Match
1   2662          1.000000  Very Strong Match
2   2795          0.423732  Very Strong Match
3   7359          0.423732  Very Strong Match
4   5506          0.423732  Very Strong Match
5   7279          0.421329  Very Strong Match
6   5426          0.421329  Very Strong Match
7   2715          0.421329  Very Strong Match
8   7204          0.415647  Very Strong Match
9   2640          0.415647  Very Strong Match

Applicant's qualifications:  u appdynamics application performance monitoring solution provides realtime visibility insight application environment unique solution take right action precisely right time automated anomaly detection rapid rootcause analysis unified view entire application ecosystem including private public cloud using appdynamics youll finally align devops engineering business around information help protect bottom line deliver magnificent customer experience responsibility
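
Several result tables list different row indices with identical scores (for example, 7226 and 2662 both at 1.000000), which may indicate repeated job postings in the data. That is an inference from the output, not something the notebook states. A minimal sketch of collapsing such repeats before ranking, using toy rows and assuming that a posting is identified by its company, title, and description:

```python
import pandas as pd

# Hypothetical toy data with one repeated posting
toy = pd.DataFrame({
    'Company': ['Cisco', 'Cisco', 'Amazon'],
    'Job Title': ['NOC Incident Coordinator', 'NOC Incident Coordinator',
                  'Software Development Engineer'],
    'Job Description': ['monitor incidents 24x7', 'monitor incidents 24x7',
                        'design scalable systems'],
})

# Keep the first occurrence of each (company, title, description) combination
deduped = toy.drop_duplicates(
    subset=['Company', 'Job Title', 'Job Description']).reset_index(drop=True)
print(len(toy), len(deduped))
```

With repeats removed, the top-x list would not spend several slots on what is effectively the same posting.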

**Testing using sampled resumes**

Candidate #1: Software Engineer @ Microsoft with prior experience as a team lead for engineers working in sales analytics

In [19]:

top_x_recommendations(10,df,'Azure Kubernetes Service (AKS) team - Built a web server in Golang and wrote unit tests. Designed and deployed it with a microservices architecture, leveraging Kubernetes communication patterns. - Containerized the web server using a multistage Docker build process and packaged it into a Helm Chart. Set up a CI/CD pipeline to automatically test and deploy application to an AKS cluster. - Created a logging and metrics infrastructure, using Azure Log Analytics, Prometheus, and Grafana to monitor application behavior and system health. Led a team of engineers and product managers to shape the evolution of Dropbox’s central sales analytics tool. While leading the team, we: - Increased adoption of the platform by 2.4x in 2020. - Built a PowerPoint generation engine to deliver customized, data-driven sales materials at scale, enhancing productivity of the sales organization by ~500 hours per quarter. - Advised global sales teams on strategic customers through bespoke analytics. Designed a scalable model that resulted in a 10% increase in ARR influenced each quarter.')

   index  Similarity Score Match Strength
0   6060          0.287772     Good Match
1   6059          0.287772     Good Match
2   4207          0.287772     Good Match
3   5401          0.273297     Good Match
4   2690          0.273297     Good Match
5   7254          0.273297     Good Match
6   2938          0.270210     Good Match
7   7502          0.270210     Good Match
8   5649          0.270210     Good Match
9   2848          0.263937     Good Match

Applicant's qualifications:  azure kubernetes service ak team built web server golang wrote unit test designed deployed microservices architecture leveraging kubernetes communication pattern containerized web server using multistage docker build process packaged helm chart set ci cd pipeline automatically test deploy application ak cluster created logging metric infrastructure using azure log analytics prometheus grafana monitor application behavior system health led team engineer product manager shape evolution dropbox central sal

Candidate #2: Software Engineer @ Amazon with academic background in computer science and data science

In [20]:
top_x_recommendations(10,df,'-Worked on the backend team to develop the Edtera web application, a learning engagement platform, using the Java Spring MVC Framework -Implemented data access layer using Spring Data JPA to allow various CRUD services to Edtera’s PostgreSQL database -Developed a performance tracker using Spring RestTemplate to retrieve students’ enrollment and grade data from third party learning management systems and configured RestTemplate Interceptor to reduce redundancy in the code -Built RESTful services to publish data by creating Rest Controllers, such as grades, course, enrollment information, etc. -Developed a high-performance laser health monitoring program with Python, which was highly recognized by the course instructor and project sponsor and selected for exhibition at the Department Senior Design Day -Implemented Random Forest Regression using Scikit-Learn library to predict laser survival rate, achieving MAPE of 12% -Created an interactive data visualization web application with Python Dash framework for explorative analysis -Designed a feature engineering procedure to sum the time series data and convert it to a supervised-learning problem')

   index  Similarity Score     Match Strength
0   2511          0.430846  Very Strong Match
1   2352          0.419233  Very Strong Match
2   2467          0.418190  Very Strong Match
3   2495          0.412346  Very Strong Match
4   2503          0.398647       Strong Match
5    477          0.389556       Strong Match
6   3461          0.389556       Strong Match
7   3687          0.376542       Strong Match
8   2514          0.368227       Strong Match
9   2532          0.368227       Strong Match

Applicant's qualifications:  worked backend team develop edtera web application learning engagement platform using java spring mvc framework implemented data access layer using spring data jpa allow various crud service edtera postgresql database developed performance tracker using spring resttemplate retrieve student enrollment grade data third party learning management system configured resttemplate interceptor reduce redundancy code built restful service publish data creating rest cont

Candidate #3: Senior Marketing Analytics Manager @ Rippling with extensive work history as a marketing data analyst

In [21]:
top_x_recommendations(10,df,'1. Create measurement framework across different funnel stages (TOF, MOF, and BOF) and marketing channels & campaigns: 1) identify KPIs (primary & secondary) and 2) define leading indicators 2. Build reporting foundation and consistently report on 1) actuals against goals by segment, marketing channel, campaign, and product and 2) trend on performance across acquisition and cross-sell motions. Identify gaps on data tracking, data connection, and reporting infrastructure and implement solutions. 3. Develop a framework to measure channel and campaign effectiveness and efficiency through attribution (FT, LT, & MT) and incrementality (MMM & geo-based experiments). Marketing channels include Paid Social (LinkedIn & Facebook & YouTube), Paid & Organic Search, Review Sites, and OOH 4. Hire and grow a Marketing Analytics team, mentor and coach analyst(s) to deliver high-quality work 1. Acquisition Marketing Channel and Campaign Analysis - Measure acquisition marketing campaign performance through Geo experiments, time series models, and platform lift studies; Channels include brand media, such as TV, OTT, Streaming Audio, & Podcast, and OA media, such as Display, OLV, Paid Social, Paid Search, and Affiliates - Design and measure channel incrementality tests (PPC Brand & OTT/CTV) and other tests, such as bid algorithm test and landing page test, across different product segments - Provide on-going channel and campaign performance analysis via prediction models, pre-post analysis, and A/B tests; Create channel performance dashboard (Holistic Search) to inform efficiency - Provide insights and recommendations on channel and campaign performance to marketing stakeholders and the leadership team based on the test and analysis results 2. Product Performance Analysis - Analytics lead on weekly marketing acquisition performance across all channels (Brand and OA) for Quickbooks Online product. - Work with finance, marketing, sales, and other analytics teams to identify performance root causes. - Define full-funnel metrics to measure marketing acquisition performance, such as brand metrics, QBO brand and industry search demand, traffic and conversion rate, sales, and CPA. 1. CRM/Database Marketing Management and Strategies - CRM database and email & direct mail channels owner and work with sales team for outbound call campaigns. Develop, optimize, monitor, and execute database marketing campaigns including re-marketing, cross-sell, acquisition, and win-back. - Analyze CRM customer & campaign data on different segments to improve overall DBM campaign performance. Auto re-marketing campaign sales contribute 13.5% of total company sales in 2016. - Provide on-going analysis on CRM data to define targeting and segmentation strategies for marketing acquisition campaigns. Design A/B and multivariate tests and define KPIs to measure test success. - Oversee design, development, and maintenance of a CRM database. Work with a cross-functional team (product, analytics, development, and BI) to define database and data integration requirements. - Collaborate with data science, sales, BI, and IT teams to build and implement models used for segmentation and targeting strategies to increase customer lifetime value and campaign ROI. - Provide insights and recommendations to the senior management team to influence decision making. 2. Marketing Analytics - Lead all aspects of Marketing Analytics, including campaign analysis, reporting, and predictive modeling. Marketing channels include email, SEM, affiliates, paid social, and sponsorship. - Provide on-going analysis on CRM data to inform targeting and segmentation strategies for marketing acquisition campaigns. Design tests and define KPIs to measure campaign and test success. - Analyze cost, engagement, and sales data to propose U.S. budget re-allocation recommendations across marketing channels to increase conversion, ROI, and improve customer mix. - Set KPIs and lead efforts to provide and consolidate analysis and reporting for website testing. - Provide insights and recommendations to the senior management team to influence decision making.')

   index  Similarity Score     Match Strength
0   4233          0.542013  Very Strong Match
1   2507          0.488839  Very Strong Match
2   7402          0.434763  Very Strong Match
3   2838          0.434763  Very Strong Match
4   5549          0.434763  Very Strong Match
5   6250          0.389761       Strong Match
6   4397          0.389761       Strong Match
7   2836          0.368751       Strong Match
8   7400          0.368751       Strong Match
9   5547          0.368751       Strong Match

Applicant's qualifications:  1 create measurement framework across different funnel stage tof mof bof marketing channel campaign 1 identify kpis primary secondary 2 define leading indicator 2 build reporting foundation consistently report 1 actuals goal segment marketing channel campaign product 2 trend performance across acquisition cross sell motion identify gap data tracking data connection reporting infrastructure implement solution 3 develop framework measure channel campaign effecti

Candidate #4: Product Manager @ Meta with prior experience as a consultant at Deloitte Digital

In [22]:
top_x_recommendations(10,df,'Product Manager for Strategic Transformation Tool: Led a nine-person product team to design, build, and launch StrategyAccelerator® - a single, customizable digital platform to help companies drive ideation to implementation across their business strategy life cycle, resulting in 17 client wins and $xxM revenue in direct asset fees and services for Deloitte Product and Monetization Strategy for a Pharmaceutical Company: Managed a five-person team to develop and prioritize 42 use cases and monetization opportunities of facial and retinal scans towards disruptive applications of AI and ML for disease detection and treatment targeting direct-to-consumer opportunities Marketing Strategy, Operations, and Annual Planning for $9B Cloud Technology Company: Led a five-person team working directly with the CMO and SVPs to spearhead annual strategic planning, and successfully secured ~$600M, the highest % of investment allocation for the marketing organization thus far from the C-suite eCommerce Partnership Strategy for $600B Global Social Media Company: Led leadership strategy workshops engaging cross-functionally with sales, product development, and privacy compliance to identify target large and mid-size retailers for ecommerce offering expansion, and develop a GTM plan for ~60M potential buyers Customer Experience and Journey Mapping for Fortune 500 Financial Services Company: Designed product roadmap to improve customer satisfaction and retention; directed ethnographic research leading to insights on key customer behaviors via quantitative surveys and interviews to understand customer journeys and shape new service offerings')

   index  Similarity Score Match Strength
0   2617          0.396238   Strong Match
1   7181          0.396238   Strong Match
2   7404          0.384502   Strong Match
3   2840          0.384502   Strong Match
4   5551          0.384502   Strong Match
5   3365          0.364361   Strong Match
6   6118          0.359623   Strong Match
7   4265          0.359623   Strong Match
8   2932          0.352488   Strong Match
9   7496          0.352488   Strong Match

Applicant's qualifications:  product manager strategic transformation tool led nine person product team design build launch strategyaccelerator single customizable digital platform help company drive ideation implementation across business strategy life cycle resulting 17 client win xxm revenue direct asset fee service deloitte product monetization strategy pharmaceutical company managed five person team develop prioritize 42 use case monetization opportunity facial retinal scan towards disruptive application ai ml disease detectio

Candidate #5: Investment Banker @ Brookwood Associates. No background in tech - purely a finance person.

In [23]:
top_x_recommendations(10,df,'- Represented CAIRE on its second U.S. acquisition of MGC Diagnostics, a manufacturer of cardiorespiratory diagnostics systems - Represented provider of transactional communications solutions on its sale to Doxim, a portfolio company of GI Partners - Represented CAIRE, a subsidiary of NGK Spark Plugs, on its acquisition of an e-commerce seller of portable oxygen concentrators and other respiratory products - Represented manufacturer of advanced composite materials on its recapitalization - Represented provider of water and wastewater infrastructure services on its sale to Sciens Water - Represented provider of center-based, home-based, and school-based behavioral healthcare services for individuals with autism spectrum disorders on its sale to LEARN Behavioral, a portfolio company of Gryphon Investors - Represented provider of urgent care services on its merger with CRH Healthcare - Represented distributor of industrial air compressors and compressed air automation systems on its recapitalization - Represented manufacturer of domestic and imported hardwood lumber on the divestiture of its distribution business to the Rugby Architectural Building Products division of Hardwoods Distribution Inc. (TSX:HDI) - Represented vertically integrated manufacturer of technical and performance fabrics on its sale to Milliken & Company - Represented manufacturer of wakeboard towers and accessories and custom-patterned boat covers on its sale to a financial sponsor - Represented specialty mattress retailer with 130+ locations in the Midwest and Southeast on its sale to Mattress Firm')

   index  Similarity Score Match Strength
0   5731          0.210477    Loose Match
1   5733          0.210477    Loose Match
2   3878          0.210477    Loose Match
3   3876          0.210477    Loose Match
4   3875          0.210477    Loose Match
5   3880          0.210477    Loose Match
6   5732          0.210477    Loose Match
7   3879          0.210477    Loose Match
8   3866          0.210477    Loose Match
9   3882          0.210477    Loose Match

Applicant's qualifications:  represented caire second u acquisition mgc diagnostics manufacturer cardiorespiratory diagnostics system represented provider transactional communication solution sale doxim portfolio company gi partner represented caire subsidiary ngk spark plug acquisition e commerce seller portable oxygen concentrators respiratory product represented manufacturer advanced composite material recapitalization represented provider water wastewater infrastructure service sale sciens water represented provider center base