## Project 4: Career Prospects In Data


## Mission: 
- Determine factors that affect job salary
- Determine factors that determine job category

- Bonus: To evaluate models using a confusion matrix that is optimized towards generating false negatives (predict low, actual high) 

<div class="alert alert-block alert-info">

### Cleaning Checklist
- Use all scraped data from two searches, reset index
- Drop reference number rows then check for duplicates
- Look into null values
- Clean up regions =/= north south east west central and impute as islandwide
- Inspect 'islandwide' jobs and make a decision to drop (look at impact and relevance)
- Drop jobs with no salary data (or perhaps put them aside for predicting)
- Split jobs by 'to' and create lower and upper salary range
- Check for jobs with annual salary stated and apply a divisor of 12 to both upper and lower range
- Clean up unrelated jobs by title using industry as a checker
- Clean up seniority (jobs with multiple seniority stated)
- Finally, drop job link and salary payment type

</div>

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.linear_model import RidgeCV
from sklearn.linear_model import Lasso
from sklearn.linear_model import LassoCV

from pprint import pprint
import scipy.stats as stats
import csv
import matplotlib.pyplot as plt
import seaborn as sns
import math

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
sns.set(style="ticks", color_codes=True)


In [2]:
# Loading and concatenating two datasets: One for search term: Data; one for search term: Business Intelligence
jobs = pd.read_csv('/Users/paklun/Desktop/materials-master/projects/project-4/Career_Database.csv')
bizint = pd.read_csv('/Users/paklun/Desktop/materials-master/projects/project-4/Career_Database_bizint.csv')
alljobs = pd.concat([jobs,bizint])
alljobs.reset_index(drop=True)

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Links,Region,Company,Title,Employment Type,Seniority,Industry,Salary,Payment,Description,Requirements
0,0,0,https://www.mycareersfuture.sg/job/data-engine...,South,MONEYSMART SINGAPORE PTE. LTD.,Data Engineer,Full Time,Executive,Information Technology,"$5,000to$7,000",Monthly,Roles & Responsibilities\nMission\nAs part of ...,Requirements\nCompetencies\nDegree in Computer...
1,1,1,https://www.mycareersfuture.sg/job/data-scient...,South,PORTCAST PTE. LTD.,Data Scientist,Full Time,Middle Management,Engineering,"$2,200to$6,000",Monthly,"Roles & Responsibilities\nIn this role, you wi...",Requirements\n● Comfortable working with larg...
2,2,2,https://www.mycareersfuture.sg/job/data-visual...,Central,SINGAPORE PRESS HOLDINGS LIMITED,Data Visualisation Designer,Permanent,Junior Executive,Design,"$3,500to$4,500",Monthly,Roles & Responsibilities\nDigital arm of Engli...,Requirements\nPrior experience in a data visua...
3,3,3,https://www.mycareersfuture.sg/job/data-analys...,Central,GRABTAXI HOLDINGS PTE. LTD.,Data Analyst,Full Time,Executive,Information Technology,,,Roles & Responsibilities\nGet to know our Team...,Requirements\nThe must haves:\nA Bachelor's/Ma...
4,4,4,https://www.mycareersfuture.sg/job/data-center...,Central,AMAZON ASIA-PACIFIC RESOURCES PRIVATE LIMITED,Data Center Engineering Project Engineer APAC,Full Time,Professional,"Design, Engineering","$9,000to$12,000",Monthly,Roles & Responsibilities\nThe Data Center Glob...,Requirements\nBasic Qualifications -\nMinimum ...
5,5,5,https://www.mycareersfuture.sg/job/lead-data-c...,Central,JOHNSON & JOHNSON PTE. LTD.,Lead Data Center Engineer,Permanent,Professional,"Engineering, Information Technology","$5,000to$10,000",Monthly,Roles & Responsibilities\nThe role of Lead Dat...,Requirements\nRequired Minimum Education: Bac...
6,6,6,https://www.mycareersfuture.sg/job/data-scient...,East,JABIL CIRCUIT (SINGAPORE) PTE. LTD.,Data Scientist - Intern (6 months),Internship,Fresh/entry level,"Information Technology, Manufacturing, Others","$800to$1,500",Monthly,Roles & Responsibilities\nEssential Duties & R...,Requirements\nEducation & Experience Requireme...
7,7,7,https://www.mycareersfuture.sg/job/data-engine...,Islandwide,ADECCO PERSONNEL PTE LTD,Data Engineer,"Contract, Full Time",Executive,Information Technology,"$5,000to$8,500",Monthly,Roles & Responsibilities\nThe Opportunity\nOur...,Requirements\nThe Talent\nMinimum of 3 - 5 yea...
8,8,8,https://www.mycareersfuture.sg/job/data-center...,Central,OPTIMUM SOLUTIONS (SINGAPORE) PTE LTD,Data Center Fiber Optics Cabling Engineer,"Contract, Full Time",Non-executive,Information Technology,"$3,000to$5,000",Monthly,Roles & Responsibilities\nCompany UEN: 1997008...,Requirements\nUrgent & Immediate Position.\nMu...
9,9,9,https://www.mycareersfuture.sg/job/data-analys...,East,EDELMAN SINGAPORE PTE. LTD.,Data Analyst,Full Time,Senior Executive,Others,,,Roles & Responsibilities\nDevelop predictive m...,Requirements\nQualifications\nYou should have ...


In [3]:
# Dropping reference columns
alljobs.drop(['Unnamed: 0', 'Unnamed: 0.1'], axis=1,inplace=True)
alljobs.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4300 entries, 0 to 99
Data columns (total 11 columns):
Links              4300 non-null object
Region             4300 non-null object
Company            4283 non-null object
Title              4283 non-null object
Employment Type    4284 non-null object
Seniority          4219 non-null object
Industry           4284 non-null object
Salary             3806 non-null object
Payment            3806 non-null object
Description        4284 non-null object
Requirements       4143 non-null object
dtypes: object(11)
memory usage: 403.1+ KB


In [4]:
# Duplicate check
print('Pre-duplicate drop Shape:',alljobs.shape)
alljobs.drop_duplicates(inplace=True)
print('Post-duplicate drop Shape:',alljobs.shape)

Pre-duplicate drop Shape: (4300, 11)
Post-duplicate drop Shape: (4227, 11)


In [5]:
alljobs.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4227 entries, 0 to 95
Data columns (total 11 columns):
Links              4227 non-null object
Region             4227 non-null object
Company            4210 non-null object
Title              4210 non-null object
Employment Type    4211 non-null object
Seniority          4146 non-null object
Industry           4211 non-null object
Salary             3736 non-null object
Payment            3736 non-null object
Description        4211 non-null object
Requirements       4071 non-null object
dtypes: object(11)
memory usage: 396.3+ KB


<div class="alert alert-block alert-warning">
Since we might still want to retain job links with salary, lets deal with the entries without Companies and Title before proceeding
    
</div>

In [6]:
alljobs[alljobs['Company'].isnull()]

Unnamed: 0,Links,Region,Company,Title,Employment Type,Seniority,Industry,Salary,Payment,Description,Requirements
625,https://www.mycareersfuture.sg/job/pre-approva...,South,,,,,,,,,
645,https://www.mycareersfuture.sg/job/senior-huma...,East,,,,,,,,,
717,https://www.mycareersfuture.sg/job/performance...,East,,,,,,,,,
792,https://www.mycareersfuture.sg/job/technical-p...,East,,,,,,,,,
846,https://www.mycareersfuture.sg/job/purchasing-...,Central,,,,,,,,,
944,https://www.mycareersfuture.sg/job/transition-...,"South, Central",,,,,,,,,
1094,https://www.mycareersfuture.sg/job/assistant-l...,South,,,,,,,,,
1452,https://www.mycareersfuture.sg/job/senior-data...,"South, East, Central",,,,,,,,,
1684,https://www.mycareersfuture.sg/job/digital-ele...,North,,,,,,,,,
1811,https://www.mycareersfuture.sg/job/accounts-ex...,East,,,"Permanent, Full Time",Executive,Accounting / Auditing / Taxation,"$2,000to$3,200",Monthly,Roles & Responsibilities\nThe Accounts Executi...,Requirements\n~ Assign account codes to all tr...


In [7]:
alljobs[alljobs['Seniority'].isnull()]

Unnamed: 0,Links,Region,Company,Title,Employment Type,Seniority,Industry,Salary,Payment,Description,Requirements
39,https://www.mycareersfuture.sg/job/assistant-d...,Permanent,MINISTRY OF DEFENCE,Assistant Director (Integrated Feedback System),Permanent,,"Human Resources , Public / Civil Service",,,"Roles & Responsibilities\nYou lead, develop an...",
137,https://www.mycareersfuture.sg/job/manager-sen...,Full Time,SMART NATION AND DIGITAL GOVERNMENT OFFICE,"Manager / Senior Manager, Finance and Resourci...",Full Time,,Public / Civil Service,,,Roles & Responsibilities\nExcited to make a di...,
141,https://www.mycareersfuture.sg/job/senior-mana...,Full Time,SMART NATION AND DIGITAL GOVERNMENT OFFICE,"Senior Manager / Manager (ICT), WOG ICT Infra ...",Full Time,,Public / Civil Service,,,Roles & Responsibilities\nAre you a change mak...,
166,https://www.mycareersfuture.sg/job/senior-mana...,Full Time,SKILLSFUTURE SINGAPORE AGENCY,"Senior Manager / Manager, Funding Policy (Reso...",Full Time,,Public / Civil Service,,,Roles & Responsibilities\n Responsibilities \...,
422,https://www.mycareersfuture.sg/job/manager-sen...,Full Time,MINISTRY OF MANPOWER,"Manager / Senior Manager, Compliance Strategy ...",Full Time,,Public / Civil Service,,,Roles & Responsibilities\nThe job holder is re...,
484,https://www.mycareersfuture.sg/job/procurement...,Full Time,MINISTRY OF HOME AFFAIRS,Procurement Executive,Full Time,,Public / Civil Service,,,Roles & Responsibilities\nThe Ministry of Home...,
613,https://www.mycareersfuture.sg/job/assistant-m...,Full Time,MINISTRY OF SOCIAL AND FAMILY DEVELOPMENT,"Assistant Manager (Systems, Exclusion and Visi...",Full Time,,Public / Civil Service,,,Roles & Responsibilities\nThe Gambling Safegua...,
619,https://www.mycareersfuture.sg/job/assistant-m...,Full Time,MINISTRY OF SOCIAL AND FAMILY DEVELOPMENT,Assistant Manager / Manager (Child and Family ...,Full Time,,Public / Civil Service,,,Roles & Responsibilities\nThe ComCare and Soci...,
625,https://www.mycareersfuture.sg/job/pre-approva...,South,,,,,,,,,
645,https://www.mycareersfuture.sg/job/senior-huma...,East,,,,,,,,,


<div class="alert alert-block alert-warning">
Seems like jobs with Seniority field null are govt jobs and most are not data related & also problematic in the Salary field (null values). Lets drop them
    
</div>

In [8]:
alljobs.dropna(subset=['Company','Title','Seniority'],inplace=True)
alljobs.shape

(4145, 11)

In [9]:
alljobs.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4145 entries, 0 to 95
Data columns (total 11 columns):
Links              4145 non-null object
Region             4145 non-null object
Company            4145 non-null object
Title              4145 non-null object
Employment Type    4145 non-null object
Seniority          4145 non-null object
Industry           4145 non-null object
Salary             3725 non-null object
Payment            3725 non-null object
Description        4145 non-null object
Requirements       4070 non-null object
dtypes: object(11)
memory usage: 388.6+ KB


<div class="alert alert-block alert-warning">
    Let's split the dataset into two, depending if salary is specified.
    </div>

In [18]:
nopay = alljobs[alljobs['Salary'].isnull()].copy().reset_index(drop=True)

In [19]:
nopay.shape

(420, 11)

In [12]:
paidjobs = alljobs[alljobs['Salary'].notnull()].copy().reset

In [21]:
paidjobs.shape
paidjobs.tail()

Unnamed: 0,Links,Region,Company,Title,Employment Type,Seniority,Industry,Salary,Payment,Description,Requirements
47,https://www.mycareersfuture.sg/job/bi-sales-ma...,Central,XPLORE INFOCOMZ SOLUTION (PTE. LTD.),BI Sales Manager (ASEAN),Permanent,Professional,Information Technology,"$6,000to$8,000",Monthly,Roles & Responsibilities\nIt is a Account Mana...,Requirements\n8+ years overall sales experienc...
66,https://www.mycareersfuture.sg/job/software-en...,South,ENHANZCOM PTE. LTD.,Software Engineer,Full Time,Executive,Information Technology,"$2,200to$2,500",Monthly,Roles & Responsibilities\nMaintain and impleme...,Requirements\nDiploma or Degree in IT or Comp...
80,https://www.mycareersfuture.sg/job/senior-sale...,Central,MIDEA ELECTRIC TRADING (SINGAPORE) CO. PTE. LTD.,Senior Sales & Marketing Manager,Full Time,"Manager, Executive, Senior Executive","Marketing / Public Relations , Sales / Retail","$5,000to$6,500",Monthly,Roles & Responsibilities\nManager of Sales & M...,Requirements\nExperience & Background\n3+ year...
89,https://www.mycareersfuture.sg/job/business-in...,South,ADECCO PERSONNEL PTE LTD,Business Intelligence Software Engineer,"Contract, Full Time",Executive,Information Technology,"$5,000to$7,500",Monthly,Roles & Responsibilities\nMain responsibilitie...,Requirements\nThis position requires 5+ years ...
95,https://www.mycareersfuture.sg/job/business-in...,Central,HAYS SPECIALIST RECRUITMENT PTE. LTD.,Business Intelligence Analyst,"Permanent, Contract, Full Time",Junior Executive,Information Technology,"$4,000to$5,000",Monthly,Roles & Responsibilities\nThis consultancy com...,Requirements\nExcellent communication and stak...


In [14]:
paidjobs.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3725 entries, 0 to 95
Data columns (total 11 columns):
Links              3725 non-null object
Region             3725 non-null object
Company            3725 non-null object
Title              3725 non-null object
Employment Type    3725 non-null object
Seniority          3725 non-null object
Industry           3725 non-null object
Salary             3725 non-null object
Payment            3725 non-null object
Description        3725 non-null object
Requirements       3673 non-null object
dtypes: object(11)
memory usage: 349.2+ KB


<div class="alert alert-block alert-warning">
    Setting unclean region data to 'islandwide'
    </div>

In [15]:
paidjobs.Region.value_counts()

Central                        1370
South                           614
East                            569
West                            452
Islandwide                      386
North                           202
Full Time                        33
South, Central                   18
North, Central                   18
East, Central                    18
Contract                         12
South, East, Central              8
Permanent                         7
North, West                       3
South, East                       3
West, Central                     3
North, South, Central             2
North, East                       2
North, South, East, Central       2
East, West                        1
South, West                       1
Permanent ...                     1
Name: Region, dtype: int64

In [16]:
paidjobs.loc[paidjobs.Region == 'Permanent ...','Region'] = 'Islandwide'
paidjobs.loc[paidjobs.Region == 'Full Time','Region'] = 'Islandwide'
paidjobs.loc[paidjobs.Region == 'Permanent','Region'] = 'Islandwide'
paidjobs.loc[paidjobs.Region == 'Contract','Region'] = 'Islandwide'
paidjobs.Region.value_counts()

Central                        1370
South                           614
East                            569
West                            452
Islandwide                      439
North                           202
South, Central                   18
North, Central                   18
East, Central                    18
South, East, Central              8
North, West                       3
South, East                       3
West, Central                     3
North, South, Central             2
North, East                       2
North, South, East, Central       2
East, West                        1
South, West                       1
Name: Region, dtype: int64

In [17]:
paidjobs.loc[paidjobs.Region == 'Islandwide']

Unnamed: 0,Links,Region,Company,Title,Employment Type,Seniority,Industry,Salary,Payment,Description,Requirements
7,https://www.mycareersfuture.sg/job/data-engine...,Islandwide,ADECCO PERSONNEL PTE LTD,Data Engineer,"Contract, Full Time",Executive,Information Technology,"$5,000to$8,500",Monthly,Roles & Responsibilities\nThe Opportunity\nOur...,Requirements\nThe Talent\nMinimum of 3 - 5 yea...
13,https://www.mycareersfuture.sg/job/big-data-te...,Islandwide,ITCAN PTE. LIMITED,Big Data Tester,"Contract, Full Time",Executive,Information Technology,"$4,500to$7,500",Monthly,Roles & Responsibilities\nAbility to write tes...,Requirements\n2 to 7 Years of IT Experience\nE...
48,https://www.mycareersfuture.sg/job/sap-bods-co...,Islandwide,TECHCOM SOLUTIONS & CONSULTANCY PTE. LTD.,SAP BODS Consultant,"Contract, Full Time",Professional,"Banking and Finance, Information Technology","$5,000to$6,500",Monthly,Roles & Responsibilities\nSeeking a skilled an...,Requirements\nSeeking a skilled and experience...
73,https://www.mycareersfuture.sg/job/senior-desk...,Islandwide,RAPSYS TECHNOLOGIES PTE. LTD.,Senior Desktop Support Engineer,Full Time,Executive,Information Technology,"$2,000to$2,300",Monthly,"Roles & Responsibilities\nInstall, upgrade, su...",Requirements\nWorks with vendor support contac...
77,https://www.mycareersfuture.sg/job/senior-quan...,Islandwide,LEIGHTON CONTRACTORS (ASIA) LIMITED (SINGAPORE...,Senior Quantity Surveyor,Permanent,"Manager, Senior Executive",Building and Construction,"$5,000to$10,000",Monthly,Roles & Responsibilities\nUpdate monthly proje...,Requirements\nDiploma/Degree in Quantity Surve...
78,https://www.mycareersfuture.sg/job/software-qu...,Islandwide,SCIENTE INTERNATIONAL PTE. LTD.,Software Quality Assurance Specialist,Full Time,Senior Executive,Information Technology,"$4,000to$8,000",Monthly,Roles & Responsibilities\nJob Summary:\nWe are...,Requirements\nDesired Skill-set:\nCertified So...
101,https://www.mycareersfuture.sg/job/temp-financ...,Islandwide,EDUCARE HUMAN CAPITAL PRIVATE LIMITED,Temp Finance Senior Exec,Temporary,Senior Executive,Accounting / Auditing / Taxation,"$2,000to$2,600",Monthly,Roles & Responsibilities\nTo support National ...,Requirements\nPreferably 1-2 years of relevant...
102,https://www.mycareersfuture.sg/job/senior-solu...,Islandwide,GOVERNMENT TECHNOLOGY AGENCY,Senior Solution Architect (Cloud Native),Permanent,Professional,"Information Technology, Public / Civil Service","$8,000to$12,000",Monthly,Roles & Responsibilities\nAs a member of Solut...,
122,https://www.mycareersfuture.sg/job/apac-web-co...,Islandwide,COGNIZANT TECHNOLOGY SOLUTIONS ASIA PACIFIC PT...,APAC Web Content and Localization Manager,Full Time,"Manager, Professional",Information Technology,"$9,500to$13,500",Monthly,Roles & Responsibilities\nResponsible for over...,Requirements\nEducation: Bachelor's degree in ...
128,https://www.mycareersfuture.sg/job/portfolio-m...,Islandwide,STANDARD CHARTERED BANK,Portfolio Management (C I Banking),Full Time,Manager,Banking and Finance,"$7,000to$11,000",Monthly,Roles & Responsibilities\nAbout Standard Chart...,"Requirements\nExperience:\nMaster’s degree, an..."
