# Testing the model with LinkedIn profile descriptions

<div class="alert alert-info">
    <H2>Explanation</H2>
<p>In the other notebooks I
    1) developed a web scraper that extracted 3000 job descriptions from Indeed.com, and
    2) analysed the data and built a machine learning model to see whether it could determine a job's title just by looking at its description.
    
In this short notebook, I randomly extracted 15 profile descriptions from LinkedIn to see whether the model can guess each person's title. Note: if your description is one of those included here and you would like it removed, please let me know.
</p>
</div>

In [1]:
#we only need these three imports
import pandas as pd
import numpy as np
import pickle

In [2]:
#we load the test data from the Excel file
df = pd.read_excel('data/test_data.xlsx')

In [3]:
#we check the data
df

| Unnamed: 0 | description | title |
|---|---|---|
| 0 | Currently work as an analyst at Facebook. I'm ... | data analyst |
| 1 | I enjoy working with data because it's like a ... | data analyst |
| 2 | Experienced Senior Analyst with a demonstrated... | data analyst |
| 3 | __ is a Data Scientist on the styling recommen... | data scientist |
| 4 | With a customer-centric mindset, I build data ... | data scientist |
| 5 | Data scientist with a background in economics,... | data scientist |
| 6 | I currently work as a Software Engineer at Goo... | software engineer |
| 7 | All my life I have been observant on how to ma... | software engineer |
| 8 | Software Engineer working for Google Assistant... | software engineer |
| 9 | Business Analyst continuously delivering and d... | business analyst |


In [4]:
#we map each title to a numeric class label
df['label'] = df['title'].map({'business analyst':1,'data analyst':2, 'data scientist':3,'machine learning engineer':4, 'software engineer':5})
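The same title-to-label dictionary is written out again in cell `In [7]`. `Series.map` also silently returns `NaN` for any title the dictionary does not contain, for example a capitalised "Data Analyst". Below is a minimal sketch that defines the mapping once and fails loudly on unmapped titles. It assumes a hypothetical shared constant `TITLE_TO_LABEL` and a hypothetical helper `encode_titles`. Neither exists in the original notebook.

```python
import pandas as pd

# Hypothetical shared constant: one source of truth for the class encoding.
TITLE_TO_LABEL = {
    'business analyst': 1,
    'data analyst': 2,
    'data scientist': 3,
    'machine learning engineer': 4,
    'software engineer': 5,
}

def encode_titles(titles: pd.Series) -> pd.Series:
    """Map job titles to integer labels, raising if any title is not in the mapping."""
    labels = titles.map(TITLE_TO_LABEL)          # unknown titles become NaN here
    unknown = titles[labels.isna()].unique()
    if len(unknown) > 0:
        raise ValueError(f'Unmapped titles: {list(unknown)}')
    return labels.astype(int)

# Toy input (not the notebook's data).
print(encode_titles(pd.Series(['data analyst', 'software engineer'])).tolist())  # [2, 5]
```

With this helper, `encode_titles(df['title'])` and `encode_titles(df['prediction'])` could replace the two inline `map` calls, so both columns are always encoded the same way.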

In [5]:
#we load the vectorizer and the model from the model folder, closing each file afterwards
with open('model/Vectorizer.sav', 'rb') as f:
    loaded_vect = pickle.load(f)
with open('model/Multinomial_NB.sav', 'rb') as f:
    loaded_model = pickle.load(f)
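The `.sav` files are presumably pickled scikit-learn objects saved by the training notebooks. The sketch below shows how such a pair can be written and read back with `pickle`. It rests on several assumptions. It uses a `CountVectorizer` and made-up training sentences, while the original `Vectorizer.sav` may use a different vectorizer (for example TF-IDF) or different settings. A temporary directory stands in for the `model` folder. The `reloaded_*` names are chosen so that the notebook's own variables are not overwritten.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training texts (assumption, not the Indeed data set).
texts = [
    'build dashboards and sql reports for stakeholders',
    'train machine learning models and run experiments',
    'write backend services and review code',
]
titles = ['data analyst', 'data scientist', 'software engineer']

vect = CountVectorizer()
model = MultinomialNB().fit(vect.fit_transform(texts), titles)

with tempfile.TemporaryDirectory() as model_dir:
    vect_path = os.path.join(model_dir, 'Vectorizer.sav')
    model_path = os.path.join(model_dir, 'Multinomial_NB.sav')
    # Save both fitted objects in binary mode.
    with open(vect_path, 'wb') as f:
        pickle.dump(vect, f)
    with open(model_path, 'wb') as f:
        pickle.dump(model, f)
    # Reload them the way the notebook does.
    with open(vect_path, 'rb') as f:
        reloaded_vect = pickle.load(f)
    with open(model_path, 'rb') as f:
        reloaded_model = pickle.load(f)

prediction = reloaded_model.predict(reloaded_vect.transform(['sql reports and dashboards']))
print(prediction)  # ['data analyst']
```

`pickle.load` can execute arbitrary code while unpickling, so only load `.sav` files from a source you trust. scikit-learn also does not guarantee that a pickle saved with one version loads correctly in another.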

In [6]:
#we transform the text with our vectorizer and predict the label
X_test = loaded_vect.transform(df['description'])
result = loaded_model.predict(X_test)
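`predict` returns only the single most likely title. `MultinomialNB` also exposes `predict_proba`, whose columns follow the order of `model.classes_`. That can show how close a near-miss such as software engineer versus machine learning engineer actually was. In the sketch below, the toy `vect` and `model` are assumptions that stand in for `loaded_vect` and `loaded_model`, and `top_k` is a hypothetical helper.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for loaded_vect / loaded_model (made-up training texts).
texts = [
    'deploy machine learning models to production',
    'write backend services and review code',
    'train models and analyse experiments',
]
titles = ['machine learning engineer', 'software engineer', 'data scientist']
vect = CountVectorizer()
model = MultinomialNB().fit(vect.fit_transform(texts), titles)

def top_k(descriptions, k=2):
    """Return the k most probable titles, with probabilities, for each description."""
    proba = model.predict_proba(vect.transform(descriptions))
    order = np.argsort(proba, axis=1)[:, ::-1][:, :k]  # column indices, highest first
    return [[(model.classes_[j], round(float(row[j]), 3)) for j in idx]
            for row, idx in zip(proba, order)]

for ranking in top_k(['software engineer working on machine learning models']):
    print(ranking)
```

In this toy setup the words "software" and "engineer" never occur in the training texts. `CountVectorizer.transform` therefore drops them, and only "machine learning models" contributes to the probabilities. If the real vectorizer is a scikit-learn `CountVectorizer` or `TfidfVectorizer`, it ignores out-of-vocabulary words in the same way. That is one possible reason a profile gets pulled toward the wrong title.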

In [7]:
#we add the prediction to the dataframe
df['prediction'] = result
df['label_predicted'] = df['prediction'].map({'business analyst':1,'data analyst':2, 'data scientist':3,'machine learning engineer':4, 'software engineer':5})
df

| Unnamed: 0 | description | title | label | prediction | label_predicted |
|---|---|---|---|---|---|
| 0 | Currently work as an analyst at Facebook. I'm ... | data analyst | 2 | data analyst | 2 |
| 1 | I enjoy working with data because it's like a ... | data analyst | 2 | data scientist | 3 |
| 2 | Experienced Senior Analyst with a demonstrated... | data analyst | 2 | data analyst | 2 |
| 3 | __ is a Data Scientist on the styling recommen... | data scientist | 3 | data scientist | 3 |
| 4 | With a customer-centric mindset, I build data ... | data scientist | 3 | data scientist | 3 |
| 5 | Data scientist with a background in economics,... | data scientist | 3 | data scientist | 3 |
| 6 | I currently work as a Software Engineer at Goo... | software engineer | 5 | machine learning engineer | 4 |
| 7 | All my life I have been observant on how to ma... | software engineer | 5 | software engineer | 5 |
| 8 | Software Engineer working for Google Assistant... | software engineer | 5 | software engineer | 5 |
| 9 | Business Analyst continuously delivering and d... | business analyst | 1 | business analyst | 1 |


In [8]:
#we check the confusion matrix, classification report, and accuracy score
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
print('Classification Report')
print(classification_report(df['label'], df['label_predicted']))
print('-'*20)
print('Confusion Matrix')
print(confusion_matrix(df['label'], df['label_predicted']))
print('-'*20)
print('Accuracy Score')
print(accuracy_score(df['label'], df['label_predicted']))

Classification Report
              precision    recall  f1-score   support

           1       1.00      0.33      0.50         3
           2       0.67      0.67      0.67         3
           3       0.60      1.00      0.75         3
           4       0.75      1.00      0.86         3
           5       1.00      0.67      0.80         3

    accuracy                           0.73        15
   macro avg       0.80      0.73      0.71        15
weighted avg       0.80      0.73      0.71        15

--------------------
Confusion Matrix
[[1 1 1 0 0]
 [0 2 1 0 0]
 [0 0 3 0 0]
 [0 0 0 3 0]
 [0 0 0 1 2]]
--------------------
Accuracy Score
0.7333333333333333
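The per-class scores in the classification report can be read straight off the confusion matrix. Precision divides each diagonal entry by its column total, which counts everything predicted as that class. Recall divides it by its row total, which counts everything truly in that class. The check below reproduces the report from the printed matrix using NumPy only.

```python
import numpy as np

# Confusion matrix as printed above: rows = true label 1..5, columns = predicted label 1..5.
cm = np.array([
    [1, 1, 1, 0, 0],
    [0, 2, 1, 0, 0],
    [0, 0, 3, 0, 0],
    [0, 0, 0, 3, 0],
    [0, 0, 0, 1, 2],
])

correct = np.diag(cm)
precision = correct / cm.sum(axis=0)  # correct / everything predicted as that class
recall = correct / cm.sum(axis=1)     # correct / everything truly in that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

print(np.round(precision, 2).tolist())  # [1.0, 0.67, 0.6, 0.75, 1.0]
print(np.round(recall, 2).tolist())     # [0.33, 0.67, 1.0, 1.0, 0.67]
print(np.round(f1, 2).tolist())         # [0.5, 0.67, 0.75, 0.86, 0.8]
print(round(float(accuracy), 4))        # 0.7333
```

Take business analyst (label 1) as an example. Its precision is 1.00 because the model predicted it only once and was right. Its recall is 0.33 because the other two business analysts were labelled data analyst and data scientist.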


In [9]:
#we check the profile descriptions that the model did not guess correctly 
for i in df[df['label'] != df['label_predicted']]['description']:
    print('*'*100)
    print()
    print(i)
    print()

****************************************************************************************************

I enjoy working with data because it's like a business mystery to be solved. Facebook's mission is to give people the power to build community and bring the world closer together. I am incredibly honored to work at Facebook, which has no shortage of big data for engineering, processing, analyzing, and modeling. 

****************************************************************************************************

I currently work as a Software Engineer at Google Assistant Natural Language Understanding (NLU) team.

Previously, as a graduate student at Cornell University, I worked on AI Driving Olympics challenge by NIPS, under mentorship of Prof. Hadas Kress-Gazit. I completed my undergraduate studies at Manipal University.

In 2018, I interned at Amazon India Machine Learning Team, with my work primarily concerning ranking, content personalization, large-scale data analyses to uncover

<div class="alert alert-danger">
    <H4>Note</H4>
<p>The model can misclassify a profile description for many reasons. One is that the model is simply not very good (always a possibility!). Another is a genuine mismatch between the person's title, what they write on their profile, and what the market expects from someone with that title. 
</p>
</div>
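The loop in cell `In [9]` prints only the description text. The reader therefore has to look up each miss's true title and what the model predicted instead. The minimal sketch below prints both next to the text. It also builds a `pd.crosstab` confusion table that uses title names instead of the numbers 1 to 5. The three-row `toy_df` is made up and stands in for the notebook's `df`.

```python
import pandas as pd

# Made-up rows standing in for the notebook's df (assumption).
toy_df = pd.DataFrame({
    'description': ['sql dashboards for sales', 'ml models in production', 'backend apis in go'],
    'title': ['data analyst', 'software engineer', 'software engineer'],
    'prediction': ['data analyst', 'machine learning engineer', 'software engineer'],
})

# Each miss with its true and predicted title, not just the text.
misses = toy_df.loc[toy_df['title'] != toy_df['prediction'], ['title', 'prediction', 'description']]
for row in misses.itertuples(index=False):
    print(f'true: {row.title} | predicted: {row.prediction}')
    print(row.description)

# Confusion table with readable title names on both axes.
table = pd.crosstab(toy_df['title'], toy_df['prediction'], rownames=['true'], colnames=['predicted'])
print(table)
```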