## Job Recommender for Jobs at Google ##

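This project recommends job titles that are similar to a given Google job posting. It is content-based: the text of each posting is turned into word counts, and postings are ranked by cosine similarity. This can easily be adapted into the Bounty system.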

#### Step 1: Connecting and Querying the DB

In [2]:
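import sqlite3

# Connect to the local SQLite file. Note: sqlite3 silently creates an empty
# database if 'job_db' does not exist, in which case the query below fails
# with "no such table".
db = sqlite3.connect('job_db')
c = db.cursor()

# Pull every posting from the google_jobs table as a list of tuples
c.execute('select * from google_jobs')
my_data = c.fetchall()
db.close()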
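A slightly more defensive variant of this step is sketched below under a few assumptions. `pd.read_sql_query` takes the column names from the table itself instead of relying on the hard-coded list in Step 2, and opening the file with SQLite's read-only URI mode (`mode=ro`) makes a missing `job_db` file raise an error rather than silently creating an empty database. The helper names `load_jobs` and `open_read_only` and the in-memory demo table are hypothetical.

```python
import sqlite3

import pandas as pd


def load_jobs(conn: sqlite3.Connection, table: str = "google_jobs") -> pd.DataFrame:
    """Read every row of `table` into a DataFrame, keeping the DB's column names."""
    # Table names cannot be bound as SQL parameters, so only pass trusted names here
    return pd.read_sql_query(f"SELECT * FROM {table}", conn)


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only; raises if the file does not exist."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


# Demo on a hypothetical in-memory table with the same columns as google_jobs
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE google_jobs (company TEXT, title TEXT, category TEXT, location TEXT, "
    "responsibilities TEXT, minimum_qualifications TEXT, preferred_qualifications TEXT)"
)
conn.execute(
    "INSERT INTO google_jobs VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Google", "Data Analyst", "Technical Solutions", "New York, NY, United States",
     "Collect and analyze data", "Bachelor's degree", "SQL experience"),
)
jobs = load_jobs(conn)
conn.close()
print(jobs.columns.tolist())
```

In the notebook this would become `df = load_jobs(open_read_only('job_db'))`, replacing both the cursor code and the explicit column list.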

#### Step 2: Adding DB data to DataFrame

In [3]:
import pandas as pd

# `select *` returns bare tuples, so these names must follow the
# column order of the google_jobs table
df = pd.DataFrame(data=my_data, columns=['company',
    'title',
    'category',
    'location',
    'responsibilities',
    'minimum_qualifications',
    'preferred_qualifications'])

df.head(3)

| | company | title | category | location | responsibilities | minimum_qualifications | preferred_qualifications |
|---|---|---|---|---|---|---|---|
| 0 | Google | Google Cloud Program Manager | Program Management | Singapore | Shape, shepherd, ship, and show technical prog... | BA/BS degree or equivalent practical experienc... | Experience in the business technology market a... |
| 1 | Google | Supplier Development Engineer (SDE), Cable/Con... | Manufacturing & Supply Chain | Shanghai, China | Drive cross-functional activities in the suppl... | BS degree in an Engineering discipline or equi... | BSEE, BSME or BSIE degree.\nExperience of usin... |
| 2 | Google | Data Analyst, Product and Tools Operations, Go... | Technical Solutions | New York, NY, United States | Collect and analyze data to draw insight and i... | Bachelor’s degree in Business, Economics, Stat... | Experience partnering or consulting cross-func... |


In [4]:
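# Concatenate the text fields into one "soup" per posting for vectorizing.
# Note: no separator is added, so words at field boundaries fuse together
# (e.g. "Program ManagementShape" in the output below).
df['soup'] = df['category'] + df['responsibilities'] + df['minimum_qualifications'] \
    + df['preferred_qualifications']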

df['soup'].head()

0    Program ManagementShape, shepherd, ship, and s...
1    Manufacturing & Supply ChainDrive cross-functi...
2    Technical SolutionsCollect and analyze data to...
3    Developer RelationsWork one-on-one with the to...
4    Program ManagementPlan requirements with inter...
Name: soup, dtype: object
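Because the four fields are joined with `+` and no separator, the last word of one field runs into the first word of the next (`Program ManagementShape`), so `CountVectorizer` sees a single token such as `managementshape` instead of `management` and `shape`. Below is a minimal sketch of a space-separated alternative using `Series.str.cat`; the one-row frame is hypothetical, and switching the notebook to it would change the vocabulary size and the recommendations shown later.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical one-row frame mimicking the first posting
jobs = pd.DataFrame({
    "category": ["Program Management"],
    "responsibilities": ["Shape, shepherd, ship, and show technical programs"],
    "minimum_qualifications": ["BA/BS degree"],
    "preferred_qualifications": ["Experience in the business technology market"],
})

# Notebook style: plain `+` fuses neighbouring words across field boundaries
fused = jobs["category"] + jobs["responsibilities"]

# Alternative: join all four fields with a space so word boundaries survive
joined = jobs["category"].str.cat(
    [jobs["responsibilities"], jobs["minimum_qualifications"], jobs["preferred_qualifications"]],
    sep=" ",
)

fused_vocab = set(CountVectorizer().fit(fused).vocabulary_)
joined_vocab = set(CountVectorizer().fit(joined).vocabulary_)
print("managementshape" in fused_vocab)          # True
print({"management", "shape"} <= joined_vocab)   # True
```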

#### Step 3: Data Exploration

In [5]:
df.groupby('category')['title'].count()

category
Administrative                       40
Business Strategy                    98
Data Center & Network                 2
Developer Relations                   5
Finance                             115
Hardware Engineering                 26
IT & Data Management                  5
Legal & Government Relations         46
Manufacturing & Supply Chain         16
Marketing & Communications          165
Network Engineering                   6
Partnerships                         60
People Operations                    86
Product & Customer Support           50
Program Management                   74
Real Estate & Workplace Services     25
Sales & Account Management          168
Sales Operations                     31
Software Engineering                 31
Technical Infrastructure             11
Technical Solutions                 101
Technical Writing                     5
User Experience & Design             84
Name: title, dtype: int64
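Titles are not unique across postings: the results at the end of this notebook list "Data Analyst, Consumer Hardware" twice (rows 10 and 304). Because the recommender looks postings up by title, it is worth checking how many titles repeat. A minimal sketch on a hypothetical toy frame; on the real data the same calls would run against `df`.

```python
import pandas as pd

# Hypothetical toy data; in the notebook, use df instead
jobs = pd.DataFrame({
    "title": [
        "Data Analyst, Consumer Hardware",
        "Staffing Project Lead",
        "Data Analyst, Consumer Hardware",
    ],
})

# keep=False flags every copy of a repeated title, not just the later ones
dupes = jobs[jobs["title"].duplicated(keep=False)]
print(dupes["title"].value_counts())

# Number of distinct titles versus number of postings
print(jobs["title"].nunique(), len(jobs))
```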

In [6]:
data = df.copy()

# Features (x) are every column except the title; the title is the label (y)
y = data.pop('title')
x = data

print(f'x columns: {list(x.columns)}\n\ny columns: {[y.name]}')

x columns: ['company', 'category', 'location', 'responsibilities', 'minimum_qualifications', 'preferred_qualifications', 'soup']

y columns: ['title']


In [7]:
x.isna().sum()

company                     0
category                    0
location                    0
responsibilities            0
minimum_qualifications      0
preferred_qualifications    0
soup                        0
dtype: int64

In [8]:
pd.DataFrame(y).isna().sum()

title    0
dtype: int64
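Neither frame has missing values here, which matters because the `soup` column is built with `+`: a single `NaN` field makes the whole concatenation `NaN`, and `CountVectorizer` rejects `NaN` documents with a `ValueError`. If a future export of `job_db` does contain gaps, filling the text columns with empty strings before building the soup is one simple guard, sketched below on hypothetical data.

```python
import numpy as np
import pandas as pd

text_cols = ["category", "responsibilities", "minimum_qualifications", "preferred_qualifications"]

# Hypothetical frame where one posting lacks preferred qualifications
jobs = pd.DataFrame({
    "category": ["Finance", "Legal & Government Relations"],
    "responsibilities": ["Build financial models", "Advise on policy"],
    "minimum_qualifications": ["BA/BS degree", "JD degree"],
    "preferred_qualifications": ["CPA", np.nan],
})

# Without a guard, the NaN propagates through string concatenation
raw_soup = jobs["category"] + " " + jobs["preferred_qualifications"]
print(raw_soup.isna().sum())  # 1

# Guard: replace missing text with "" before joining the fields
filled = jobs[text_cols].fillna("")
soup = filled["category"].str.cat([filled[c] for c in text_cols[1:]], sep=" ")
print(soup.isna().sum())  # 0
```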

#### Step 4: Building the Similarity Model

In [9]:
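from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Turn each soup into a sparse vector of word counts, ignoring English stop words
vec = CountVectorizer(stop_words='english')
matrix = vec.fit_transform(x['soup'])

# Cosine similarity between every pair of postings (a 1250 x 1250 array)
cosine_sim = cosine_similarity(matrix, matrix)

matrix.shape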

(1250, 5647)
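`matrix` has one row per posting (1,250) and one column per vocabulary term (5,647), each entry being a word count. `cosine_similarity` compares two rows by the angle between them, $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$, so postings that use the same words in similar proportions score close to 1 regardless of length, and postings with no words in common score 0. The sketch below recomputes this for a toy count matrix (hypothetical values) with NumPy and checks it against scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical word-count rows for three postings over a 4-term vocabulary
counts = np.array([
    [2, 1, 0, 0],
    [4, 2, 0, 0],   # same proportions as row 0, twice as long
    [0, 0, 3, 1],   # no terms in common with row 0
], dtype=float)

# cos(a, b) = a . b / (|a| |b|), computed for every pair of rows at once
norms = np.linalg.norm(counts, axis=1, keepdims=True)
manual = (counts @ counts.T) / (norms * norms.T)

print(np.round(manual, 3))
print(np.allclose(manual, cosine_similarity(counts)))  # True
```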

In [10]:
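def display_results(job_title, cos=cosine_sim):
    # Row position of the first posting with this title. Titles can repeat
    # (e.g. "Data Analyst, Consumer Hardware"), and a title-indexed lookup
    # would then return several rows instead of one.
    idx = y[y == job_title].index[0]

    # Pair every posting's position with its similarity to the query, best first
    scores = sorted(enumerate(cos[idx]), key=lambda pair: pair[1], reverse=True)

    # Skip the query posting itself and keep the ten closest matches
    top_positions = [pos for pos, _ in scores if pos != idx][:10]

    return df['title'].iloc[top_positions]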
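The ranking step can also be written with NumPy directly on one row of the similarity matrix, which avoids building and sorting a Python list of tuples and makes excluding the query posting explicit. A minimal sketch; `top_n_similar` is a hypothetical helper that works on row positions rather than titles, which sidesteps repeated titles entirely.

```python
import numpy as np


def top_n_similar(sim: np.ndarray, idx: int, n: int = 10) -> np.ndarray:
    """Return row positions of the n postings most similar to row `idx`, best first."""
    row = sim[idx].astype(float)  # astype copies, so the caller's matrix is untouched
    row[idx] = -np.inf            # never recommend the query posting itself
    # Stable sort on the negated scores: descending order, ties keep original order
    order = np.argsort(-row, kind="stable")
    return order[:n]


# Hypothetical 4x4 similarity matrix
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
print(top_n_similar(sim, 0, n=2))  # [2 3]
```

With the notebook's variables, the equivalent call would be `df['title'].iloc[top_n_similar(cosine_sim, 2)]`.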

In [11]:
display_results(y[2])

277     Customer Experience Data Scientist, Google Clo...
10                        Data Analyst, Consumer Hardware
304                       Data Analyst, Consumer Hardware
578     Data Science Analyst, Revenue Strategy and Ope...
873     Strategy and Business Analyst, Go-to-Market, G...
928           Associate, Business Operations and Strategy
935      Business Systems Analyst, Financial Applications
85            Quantitative Analyst Lead, Trust and Safety
379           Quantitative Analyst Lead, Trust and Safety
1190                                Staffing Project Lead
Name: title, dtype: object
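Because the recommender is content-based, the fitted vectorizer can also score free text that is not an existing posting, for example a candidate's own description of their skills, which is one way the idea could carry over to another system. A minimal sketch on a hypothetical three-posting corpus: `transform` (not `fit_transform`) maps the query into the existing vocabulary, and words the vectorizer has never seen are ignored.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical postings standing in for the notebook's soup column
soup = [
    "Technical Solutions collect and analyze data SQL statistics",
    "Finance build financial models forecasting",
    "User Experience design prototypes research",
]
titles = ["Data Analyst", "Financial Analyst", "UX Designer"]

vec = CountVectorizer(stop_words="english")
matrix = vec.fit_transform(soup)

# Score a free-text query against every posting using the fitted vocabulary
query = vec.transform(["I analyze data with SQL and statistics"])
scores = cosine_similarity(query, matrix).ravel()

best_first = np.argsort(-scores, kind="stable")
print([titles[i] for i in best_first])
```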