# Matching profiles and offers

Objective: find the employee profiles that best match each offer.

Exercises:

1. For each offer, find the employee whose technical skills and long skills (experience descriptions) best match the offer.
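One possible reading of "technical skills match" is the share of the skills requested in an offer that a profile lists. The sketch below is a minimal illustration under that assumption. It assumes the skill vocabulary is built from the `skill` column of `employee_skills.csv`, and the names `extract_skills` and `tech_skill_score` are hypothetical. Plain `re` stands in for the `flashtext.KeywordProcessor` that the notebook imports, so the example stays dependency-free.

```python
import re


def extract_skills(text, vocabulary):
    """Return the vocabulary skills that appear in `text` as whole words (case-insensitive)."""
    lowered = text.lower()
    found = set()
    for skill in vocabulary:
        # re.escape keeps skills such as "C#" or "C++" from being read as regex syntax;
        # the lookarounds require the skill not to be glued to other word characters.
        pattern = r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(skill)
    return found


def tech_skill_score(offer_text, profile_skills, vocabulary):
    """Fraction of the skills requested by the offer that the profile lists (0.0 if none)."""
    requested = extract_skills(offer_text, vocabulary)
    if not requested:
        return 0.0
    return len(requested & profile_skills) / len(requested)


# Toy usage (hypothetical data)
vocabulary = {"Linux", "C#", "Oracle", "Java"}
offer_text = "Developpeur logiciel embarque C# sous Linux"
print(tech_skill_score(offer_text, {"Linux", "Oracle"}, vocabulary))  # -> 0.5
```

Normalizing by the number of requested skills keeps the score in [0, 1], so it can be summed with a cosine similarity without one term dominating.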

Note: The `offer_to_word_doc` and `profile_to_word_doc` functions in `utils.py` export an offer or a profile as a Word document, which makes it easier to inspect.

In [1]:
import re

import nltk
import pandas as pd
from flashtext import KeywordProcessor
from nltk import stem
from os.path import join
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from tqdm.notebook import tqdm
from unidecode import unidecode
from utils import offer_to_word_doc, profile_to_word_doc

tqdm.pandas()
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /home/philippe/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

## Load and clean data
We load all profile data. The tech skills and experience descriptions are used to compute similarities, and the profile job titles are used to check the quality of the matches. We also load the 10 offers for which we want to find the best-matching profile.
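Before computing similarities, skill and description text usually needs normalizing. The notebook imports `re`, `unidecode`, `nltk.stem`, and nltk's `stopwords` corpus for this step. The following minimal sketch uses only the standard library. It assumes lowercasing, accent stripping, tokenization, and stopword removal are the cleaning steps wanted. `unicodedata` replaces `unidecode`, and the short `FRENCH_STOPWORDS` set is a hypothetical stand-in for nltk's list. Stemming, for example with `nltk.stem.SnowballStemmer("french")`, is left out.

```python
import re
import unicodedata

# Hypothetical mini stopword list; the notebook downloads nltk's stopwords instead.
FRENCH_STOPWORDS = {"de", "la", "le", "les", "et", "en", "des", "du", "un", "une", "pour", "d", "l"}


def strip_accents(text):
    """Drop combining accents, e.g. 'développeur' -> 'developpeur'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text):
    """Lowercase, remove accents, split into tokens, and drop stopwords."""
    text = strip_accents(text.lower())
    # Keep '+' and '#' inside tokens so skills like "c++" and "c#" survive
    tokens = re.findall(r"[a-z0-9+#]+", text)
    return [tok for tok in tokens if tok not in FRENCH_STOPWORDS]


print(normalize("Développeur logiciel embarqué C#"))
# -> ['developpeur', 'logiciel', 'embarque', 'c#']
```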

In [2]:
df_profiles = pd.read_csv('employee_profiles.csv')
df_skills = pd.read_csv('employee_skills.csv')
df_experiences = pd.read_csv('employee_experiences.csv')
df_offers = pd.read_csv('offers.csv')

In [3]:
# Keep only profiles that list more than n_min_tech_skill tech skills
n_min_tech_skill = 10
skill_counts = df_skills.groupby("id_profile")["skill"].count()  # non-null skills per profile
selected_profile_ids = skill_counts[skill_counts > n_min_tech_skill].index
df_profiles = df_profiles[df_profiles['id_profile'].isin(selected_profile_ids)]
df_skills = df_skills[df_skills['id_profile'].isin(selected_profile_ids)]
df_experiences = df_experiences[df_experiences['id_profile'].isin(selected_profile_ids)]
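The filter keeps a profile only if it has strictly more than `n_min_tech_skill` skills, so a profile with exactly 10 skills is dropped. The rule is sketched below in plain Python on toy `(id_profile, skill)` rows to make that boundary explicit. The function name `profiles_with_enough_skills` is hypothetical, and missing skills (`None`) are skipped the same way pandas' `count()` skips NaN.

```python
from collections import Counter


def profiles_with_enough_skills(skill_rows, n_min):
    """Return the ids of profiles listing strictly more than n_min non-missing skills."""
    counts = Counter(pid for pid, skill in skill_rows if skill is not None)
    return {pid for pid, n in counts.items() if n > n_min}


# Toy rows (hypothetical data): profile 2 has one missing skill, so only 2 count
rows = [(1, "Linux"), (1, "AIX"), (1, "Oracle"), (2, "Java"), (2, None), (2, "SQL")]
print(profiles_with_enough_skills(rows, 2))  # -> {1}
```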

In [4]:
# Subsample data: only work with 3000 profiles to speed things up
# n_sub_sample = 3000
# df_profiles = df_profiles.iloc[:n_sub_sample]
# selected_profile_ids = set(df_profiles['id_profile'])
# df_skills = df_skills[df_skills['id_profile'].isin(selected_profile_ids)]
# df_experiences = df_experiences[df_experiences['id_profile'].isin(selected_profile_ids)]

## Matching 10 offers against all employees
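For each offer, we compute a tech-skill similarity score and a long-skill (mission) similarity score against every profile, sum the two, and output the profile with the highest total.
The matching step is still a TODO in the cell below. It uses a placeholder profile (#49282) with zero scores, so the results printed for the 10 offers are not real matches yet.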
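The scoring described above can be sketched in plain Python. This minimal sketch makes three assumptions:

- The offer and experience descriptions are already tokenized.
- A tech-skill score has been precomputed for each profile.
- The long-skill score is the cosine similarity between TF-IDF vectors, using the same smoothed IDF formula as scikit-learn's `TfidfVectorizer` default.

The global score is the plain sum of both scores, as in the text. The names `tfidf_vectors`, `cosine`, and `best_profile` are hypothetical. In the notebook itself, the already imported `TfidfVectorizer` and `cosine_similarity` would compute all profile scores in one vectorized call.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors (dict term -> weight) for a list of tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, same formula as scikit-learn's default: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts (0.0 if either is empty)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def best_profile(offer_tokens, profile_tokens, tech_scores):
    """Return (profile_id, global_score, tech_score, long_score) with the highest global score.

    profile_tokens: dict profile_id -> tokens of that profile's experience descriptions
    tech_scores: dict profile_id -> precomputed tech-skill score
    """
    ids = list(profile_tokens)
    # Assumption: IDF is fitted on the profiles plus the offer, so every offer term has a weight
    vectors = tfidf_vectors([profile_tokens[pid] for pid in ids] + [offer_tokens])
    offer_vec = vectors[-1]
    best = None
    for pid, vec in zip(ids, vectors[:-1]):
        long_score = cosine(offer_vec, vec)
        total = tech_scores[pid] + long_score
        if best is None or total > best[1]:
            best = (pid, total, tech_scores[pid], long_score)
    return best


# Toy usage (hypothetical data)
profiles = {
    101: ["developpement", "logiciel", "embarque", "c#"],
    102: ["gestion", "projet", "infrastructure"],
}
offer = ["developpeur", "logiciel", "embarque", "c#"]
result = best_profile(offer, profiles, {101: 0.5, 102: 0.0})
print(result[0])  # -> 101
```

Summing unweighted scores treats both signals as equally important. A weighted sum is an easy variation if one signal turns out to be noisier.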

In [5]:
import os

match_dir = "matching"
os.makedirs(match_dir, exist_ok=True)  # make sure the output folder exists

offer_fields = ["jobtitle", "description", "contract", "company", "location"]
for i, offer in df_offers.iterrows():
    # Export the offer to a Word document (use the row itself rather than iloc[i])
    offer_to_word_doc(*offer[offer_fields], out_file_name=join(match_dir, f"{i}_offre.docx"))

    ##################
    # TODO: Matching #
    ##################
    best_profile_id = 49282  # placeholder until the matching is implemented
    global_score = 0.0
    tech_skill_score = 0.0
    long_skill_score = 0.0

    # Export the best profile to a Word document
    profile_to_word_doc(best_profile_id, df_profiles, df_skills, df_experiences,
                        out_file_name=join(match_dir, f"{i}_candidate.docx"))

    # Print the best candidate
    print(f"\nMatching results for offer {i} ({offer['jobtitle']}):\n"
          f"\tCandidate #{best_profile_id}\n"
          f"\tScore: {global_score:.3f} (tech_score={tech_skill_score:.3f}, mission_score={long_skill_score:.3f})\n"
          f"\tJob title: {df_profiles.loc[df_profiles['id_profile'] == best_profile_id, 'jobtitle'].item()}\n"
          f"\tSkills: {', '.join(df_skills.loc[df_skills['id_profile'] == best_profile_id, 'skill'])}\n"
          f"\tExperiences: {', '.join(df_experiences.loc[df_experiences['id_profile'] == best_profile_id, 'title'])}\n")





Matching results for offer 0 (developpeur logiciel):
 	Candidate #49282
	Score: 0.000 (tech_score=0.000, mission_score=0.000)
	Job title: responsable de projet
	Skills: ITIL, PMP, Project Management, Architecture, PMI, Lean Management, Linux, AIX, Oracle, Websphere, Tomcat, HACMP, KPI Dashboards, Gestion de projet, MBA, Gestion d’équipe
	Experiences: Co-fondateur, Chef de projet devops, Chef de projet informatique, Chef de projet d’infrastructure, Chef de projet technique, Ingénieur system et Réseaux


Matching results for offer 1 (developpeur logiciel embarque c#):
 	Candidate #49282
	Score: 0.000 (tech_score=0.000, mission_score=0.000)
	Job title: responsable de projet
	Skills: ITIL, PMP, Project Management, Architecture, PMI, Lean Management, Linux, AIX, Oracle, Websphere, Tomcat, HACMP, KPI Dashboards, Gestion de projet, MBA, Gestion d’équipe
	Experiences: Co-fondateur, Chef de projet devops, Chef de projet informatique, Chef de projet d’infrastructure, Chef de projet technique, I