Hybrid-Talent-Recommendation-of-LinkedIn

Aim

This project builds a hybrid recommender for talent. We combined different feature representation approaches (N-grams, word embeddings) with different learning algorithms (SVM, logistic regression) to find the best approach for expert recommendation on LinkedIn user profiles.

Content-based Recommender System Design

LinkedIn Profile Data Overview and Data Analysis

Data overview:

The LinkedIn user profile data is a 261.2 MB CSV file with 158096 LinkedIn user profiles.

Every user profile has 67 attributes, which can be categorized as follows:

User id
Username
Number of connections of the user
Seven parts of work experience (current and past work experience, each combining job title, company name, company type, work duration, company location)
Four parts of educational background (university name, degree of education, major, end date of education, education details)
Skills
Languages

Data analysis:

Number of connections: the number of other LinkedIn users each user is connected to.

Top 15 job positions across all user profiles:

Based on the statistics, there are mainly three kinds of job positions: technical positions (engineer), management positions (manager) and academic positions (professor).

Thus, we will mainly select job positions from those areas.

Average number of effective work experiences per user: based on the statistics, nearly 82% of users have at least 1 past job experience, and more than 48% of users have 2 job experiences.

Average number of effective education backgrounds per user: based on the statistics, nearly 89% of users have at least 1 education background, and more than 56% of users have more than 2 education backgrounds.

Average number of skills per user: 19.72

Summary of the user data and selection of user profile attributes:

Based on the statistics for job positions, education background and skills, we decided to use the six past work experiences, the four parts of education (except university name and end date of education), skills and languages.

Coding style

To make the system robust and scalable, we use object-oriented (OO) programming. The code organization is shown in linkedindata_old.py, and the organization of the user profile is:

Feature representation lists:

N-gram models used:

Unigram(Bag-of-Words)

Bigram

Trigram
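
As a rough illustration of the three N-gram models listed above, the sketch below counts word-level unigrams, bigrams and trigrams with the Python standard library. The helper name `ngram_counts`, the whitespace tokenization and the sample text are assumptions made for this example; the repository's own n-gram scripts may tokenize and count differently.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count word-level n-grams in a lowercased, whitespace-tokenized text.

    Hypothetical helper for illustration only, not taken from the repository.
    """
    tokens = text.lower().split()
    # Slide a window of n tokens over the text and join each window into one key.
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)


# Made-up profile text, not real LinkedIn data.
profile_text = "Senior software engineer software engineer"

unigrams = ngram_counts(profile_text, 1)  # bag-of-words
bigrams = ngram_counts(profile_text, 2)
trigrams = ngram_counts(profile_text, 3)

print(unigrams)
print(bigrams)
print(trigrams)
```

Each counter can then be mapped onto a shared vocabulary to obtain one feature vector per user profile. The vocabulary grows quickly with n, which is why bigram and trigram models produce much larger vector spaces than unigrams.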

Word2vec:

Doc2vec:

Machine learning algorithms used:

Logistic Regression

Logistic Regression CV

SVM SVC

SVM NuSVC

SVM LinearSVC

Naive Bayes

Decision Tree

Random Forest

Data Description: The LinkedIn profile consists of the following content:

1. user id

2. user name

3. number of connections of the user

4. six parts of work experience (each combining job title, company name, company type, work duration, company location)

5. highest education background

6. three parts of other education background

7. skills and languages

Function of each Python file:

datafilter.py: filter the data into two datasets based on the aim of the recommendation (users who are relevant and users who are not relevant)

calculate_data_job_now.py: calculate the work years of a user's past work experience using regular expressions

datanormalize.py: normalize the work year data

bag_of_words.py: generate data based on bag-of-words

n_grams.py: generate data based on bigrams or trigrams

generate_train_test_set.py: generate the train and test sets

generateweightingfile.py: merge user data with the normalized work year data

globalparameter.py: store the global parameters of the data, including the data path, train/test split ratio and other parameters

main.py: main function of the program
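
The work year steps in the list above (extracting work years with regular expressions, then normalizing them) could look roughly like the sketch below. The duration format ("2 years 3 months"), the function names and the choice of min-max scaling are all assumptions made for this example; the actual format in the CSV and the normalization used in datanormalize.py are not stated here.

```python
import re

# Assumed duration format such as "2 years 3 months"; the real data may differ.
YEARS_RE = re.compile(r"(\d+)\s*years?", re.IGNORECASE)
MONTHS_RE = re.compile(r"(\d+)\s*months?", re.IGNORECASE)


def work_years(duration):
    """Convert a duration string into a number of years (hypothetical helper)."""
    years = YEARS_RE.search(duration)
    months = MONTHS_RE.search(duration)
    year_part = int(years.group(1)) if years else 0
    month_part = int(months.group(1)) if months else 0
    return year_part + month_part / 12


def min_max_normalize(values):
    """Scale values into [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


durations = ["2 years 3 months", "8 months", "5 years"]  # made-up examples
years = [work_years(d) for d in durations]
print(years)
print(min_max_normalize(years))
```

A string without any recognizable duration yields 0.0 in this sketch; a real pipeline would probably treat such rows as missing data instead.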

Collaborative-Filtering-User-Profile-Recommendation

Design and Methodology of the Collaborative Filtering Approach

The design of the collaborative filtering approach is shown in the figure. We introduce its main components:

Data preprocessing: we discuss which attributes are suitable for collaborative filtering approaches.

Forming the matrix for the learning algorithm: we present the process of forming the user rating matrix and illustrate how the rating values are formed for the collaborative learning algorithm.

Data preprocessing for Collaborative Filtering Recommendation

Collaborative filtering usually requires a matrix shaped by the goal of the recommendation. For example, an e-commerce recommender usually forms a user-item matrix to find similarities between customers, while a similar-book recommender usually builds an item-item matrix to capture the connections between books. In our approach, we form a user-workexperience-rating matrix: users are the rows, the work experience attributes (except the first work experience) are the columns, and each cell holds the rating of that work experience, which will be discussed thoroughly in the feature representation section. We chose this strategy because it fits our research aim, which is to find a suitable approach for recommending candidates for jobs, so we use work experience in the collaborative filtering approach. To build data for supervised learning, the job title of the first work experience is the output (label) of learning and the user-workexperience-rating matrix is the input.

Building Matrix for Collaborative Filtering Approach

We build a user-workexperience-rating matrix from the user profile data. The columns of the matrix are the six work experiences of a user described above. For a given job title (e.g. software engineer), the rating of each work experience is calculated with the following rules:

1. If the job title in the work experience is the same as the given job title and the work year is greater than or equal to 3 years, the rating value is 2.

2. If the job title in the work experience is the given job title and the work year is less than 3 years, the rating value is 1.

3. If the job title in the work experience is not the given job title, the rating value is -1.
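
The three rating rules can be sketched in a few lines of Python. The function names, the `(job_title, work_years)` tuple layout and the case-insensitive title comparison are assumptions made for this example; the repository's matrix_generator scripts may represent the data differently.

```python
def rate_experience(job_title, work_years, target_title):
    """Rate one work experience against the given job title.

    Returns 2 for a matching title held for at least 3 years,
    1 for a matching title held for less than 3 years, and -1 otherwise.
    Case-insensitive matching is an assumption of this sketch.
    """
    if job_title.strip().lower() != target_title.strip().lower():
        return -1
    return 2 if work_years >= 3 else 1


def build_rating_matrix(users, target_title):
    """Build one row per user and one column per work experience."""
    return [
        [rate_experience(title, years, target_title) for title, years in experiences]
        for experiences in users
    ]


# Made-up users with three work experiences each, as (job_title, work_years).
users = [
    [("Software Engineer", 4.0), ("software engineer", 1.5), ("Intern", 0.5)],
    [("Project Manager", 6.0), ("Software Engineer", 3.0), ("Teacher", 2.0)],
]

matrix = build_rating_matrix(users, "software engineer")
for row in matrix:
    print(row)
```

With real profiles each row would have as many columns as there are work experience attributes, and the job title of the first work experience would be kept aside as the label.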

High Dimension of User Profile Vector Space and Solution

Some feature representation methods (such as N-grams) produce a huge user profile vector space, which lengthens the training time of the recommendation system.

The dimension of the user profile vector space using bigrams is:

Thus, we decided to use Singular Value Decomposition (SVD) to reduce the dimension to 50:
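
A minimal sketch of such a reduction with NumPy is given below. The function name `reduce_dimensions` and the random stand-in matrix are assumptions made for this example, and the repository may use a different SVD routine. For a large sparse N-gram matrix, a sparse truncated SVD would be preferable to the dense decomposition used here.

```python
import numpy as np


def reduce_dimensions(matrix, k):
    """Project each row onto the top-k singular directions (truncated SVD)."""
    # Thin SVD: matrix = u @ diag(s) @ vt, singular values in descending order.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    # u[:, :k] * s[:k] equals matrix @ vt[:k].T, the rows in the reduced space.
    return u[:, :k] * s[:k]


# Synthetic stand-in for a user profile matrix: 200 users, 300 features.
rng = np.random.default_rng(0)
profiles = rng.random((200, 300))

reduced = reduce_dimensions(profiles, 50)
print(profiles.shape, "->", reduced.shape)
```

The reduced matrix keeps one row per user but only 50 columns, so the learning algorithms train on far fewer features.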

Comparison of training time:

Recommendation Results:

We use precision and recall to evaluate the results. The evaluation results are as follows:

Average precision & recall using all user profile data and the N-gram model (work experience, education background, skills and languages)

Average precision using all user profile data and the word embedding model (work experience, education background, skills and languages)

Average precision using all user profile data and the document embedding model (work experience, education background, skills and languages)
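
For reference, precision and recall for a single recommendation list can be computed as in the sketch below. The function name and the user ids are made up for this example, and how the project averages these values across job titles is not specified here.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one recommendation list.

    precision = relevant recommended users / all recommended users
    recall    = relevant recommended users / all relevant users
    """
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up user ids: 4 recommended users, 3 truly relevant users, 2 in common.
precision, recall = precision_recall(
    recommended=["u1", "u2", "u3", "u4"],
    relevant=["u2", "u4", "u7"],
)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```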

Here are part of the recommendation results. For further details, please send me a message.


YuzhouPeng/Linedin-User-profile-Hybrid-Recommendation
