# Project 4: Web Scraping Job Postings

### Factors that impact salary
    To predict salary you will be building either a classification or regression model, using features like the location, title, and summary of the job. If framing this as a regression problem, you will be estimating the listed salary amounts. You may instead choose to frame this as a classification problem, in which case you will create labels from these salaries (high vs. low salary, for example) according to thresholds (such as median salary). 
    
    Predictors: Location (Use NLP to extract Country), Title, Job Summary
    Target: Salary Amount (Regression) / Salary Category (Classification)
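
For the classification framing, the salary labels can be derived from a threshold such as the median. The following is a minimal sketch using only the standard library; `label_by_median` is a hypothetical helper name, and the sample values are the `min_salary` figures shown in the first rows of the scraped data further down.

```python
from statistics import median


def label_by_median(salaries):
    """Label each salary 'high' if it is strictly above the median, else 'low'."""
    threshold = median(salaries)
    return ['high' if s > threshold else 'low' for s in salaries]


# Minimum monthly salaries from the first five scraped postings.
min_salaries = [4000.0, 8000.0, 7500.0, 7000.0, 20800.0]
labels = label_by_median(min_salaries)
print(labels)
```

A salary exactly equal to the median is labelled `low` here; that tie-breaking rule is an arbitrary choice and could just as well go the other way.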

### Factors that distinguish job category
    Using the job postings you scraped for part 1 (or potentially new job postings from a second round of scraping), identify features in the data related to job postings that can distinguish job titles from each other. There are a variety of interesting ways you can frame the target variable, for example:

    What components of a job posting distinguish data scientists from other data jobs?
    
    Predictors: To be determined
    Target: Data Scientist Or Not 
    
    What features are important for distinguishing junior vs. senior positions?
    
    Predictors: To be determined
    Target: Junior / Senior Position [Position Level]
    
    Do the requirements for titles vary significantly with industry (e.g. healthcare vs. government)?
    
    Predictors: Title Features
    Target: Job Category (id=job-categories)
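
One way to build the junior vs. senior target is to look for level keywords in the job title. This is only a sketch: `position_level` and both keyword lists are hypothetical and would need tuning against the titles actually scraped. The site's own seniority field is another possible source for this label.

```python
import re

# Hypothetical keyword lists (assumptions, not taken from the data).
SENIOR_WORDS = {'senior', 'sr', 'lead', 'head', 'principal'}
JUNIOR_WORDS = {'junior', 'jr', 'assistant', 'associate', 'intern'}


def position_level(title):
    """Return 'senior', 'junior', or None when the title has no level keyword."""
    words = set(re.findall(r'[a-z]+', title.lower()))
    if words & SENIOR_WORDS:
        return 'senior'
    if words & JUNIOR_WORDS:
        return 'junior'
    return None


print(position_level('Senior Data Scientist'))
print(position_level('Assistant Project Manager'))
print(position_level('Data Scientist'))
```

Titles without any keyword return `None`, so they can be dropped or labelled separately before modelling.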
    

In [1]:
# Define Search Terms: data scientist, data analyst, research scientist, business intelligence

### Let's select a website to crawl

| Website | Location (NWES) | Salary | HTML class Structure |
|------|------|------|------|
| JobCentral | Absent | Partial | Messy |
| MyCareersFuture | Present | Partial | OK |
| CareersGov | Absent | Absent | Messy |



In [2]:
# import requests

# url_front = 'https://www.mycareersfuture.sg/search?search='
# url_back = '&sortBy=new_posting_date&page='

# search_terms = ['data scientist', 'data analyst']
# for search_term in search_terms:
#     response = requests.get(url_front+search_term.replace(' ', '%20')+url_back+str(0)).text
#     soup = BeautifulSoup(response)
#     print(soup)
#     print(soup.find_all('a'))
# # https://www.mycareersfuture.sg/search?search=data%20analyst&sortBy=new_posting_date&page=0
# The page is rendered dynamically, so requests alone does not return the listings; use Selenium instead.
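
The commented-out URL above shows the pattern the search pages follow. Here is a minimal sketch of building that URL, assuming `quote` from the standard library for the percent-encoding (the notebook itself uses `replace(' ', '%20')`); `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote

URL_FRONT = 'https://www.mycareersfuture.sg/search?search='
URL_BACK = '&sortBy=new_posting_date&page='


def build_search_url(search_term, page_number):
    """Return the search-results URL for one term and one zero-based page number."""
    return URL_FRONT + quote(search_term) + URL_BACK + str(page_number)


print(build_search_url('data analyst', 0))
```

Using `quote` also encodes characters other than spaces, which matters only if a search term ever contains them.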

In [3]:
from time import sleep
from selenium import webdriver
from bs4 import BeautifulSoup

url_front = 'https://www.mycareersfuture.sg/search?search='
url_back = '&sortBy=new_posting_date&page='

search_terms = ['data scientist', 'data analyst', 'research scientist', 'business intelligence', 'software developer', 'project manager', 'accountant']

job_links = []  # list of [search_term, relative job URL]

for search_term in search_terms:
    for page_number in range(0, 100):  # assume max 100 pages
        n_links = 0
        # Passing the driver path as a string is the Selenium 3 style;
        # newer Selenium 4 releases take a Service object instead.
        browser = webdriver.Chrome('./chromedriver')
        browser.get(url_front + search_term.replace(' ', '%20') + url_back + str(page_number))
        sleep(5)  # wait for the dynamically rendered results to load
        soup = BeautifulSoup(browser.page_source, 'html.parser')
        browser.close()
        all_listing = soup.find('div', attrs={'class': 'card-list'})
        if all_listing is None:
            break
        for link in all_listing.find_all('a', href=True):
            if '/job/' in link['href']:  # filter away non-job related links
                print(link['href'])
                job_links.append([search_term, link['href']])
                n_links += 1
        if n_links == 0:  # page doesn't exist, stop going to the next page
            break

/job/data-scientist-singapore-power-5731c18be188d7d1f80a840a834ad207
/job/senior-data-scientist-singapore-power-b8f943fb05cb5ef1ec504e530dc425f3
/job/data-scientist-spotify-singapore-1801998abb817910b9acb022c3b636dc
/job/senior-data-scientist-lynx-analytics-702973c3a33b2b7a251a49cc10a66b8b
/job/data-scientist-information-technology-nomura-singapore-0463f4e6680987a4e8aa1ed80616846d
/job/data-scientist-standard-chartered-bank-f7a349b15079f2ffbc25872bd09570f4
/job/data-analyst-propertyguru-4b5f0b9ca4e8ecd00e941ed4eebbf9a0
/job/data-scientist-21861e90483b457a87d98b573aeedb8e
/job/data-scientist-jabil-circuit-51ea2ce5b0eedff231dc658097a05543
/job/data-scientist-hewlett-packard-enterprise-singapore-4cd401024939234386449a21c78de05b
/job/data-scientist-smartsoft-c605a552bf2a0ae3caad295b5a238394
/job/data-scientist-schellden-global-services-2cce3a0fd90690d4b4505d77e0eff1e7
/job/data-scientist-schellden-global-7b446f16f1967685b15c4d50565d63bd
/job/data-scientist-siemens-f76d3f9f081a63cff0d2e4319

/job/lead-business-intelligence-laminaar-aviation-infotech-efc93b23745533a2ab1cb8ca751afd81
/job/engineer-business-intelligence-turner-broadcasting-sales-southeast-asia-adddd860d2a1515040cea9f567414eaa
/job/data-scientist-singapore-power-5731c18be188d7d1f80a840a834ad207
/job/senior-data-scientist-singapore-power-b8f943fb05cb5ef1ec504e530dc425f3
/job/hr-business-partner-sabre-asia-pacific-8e344f14b64d82b21ef85bea331ed146
/job/financial-analyst-wilhelmsen-ships-service-38a28304ba1fc8a963dc370cf6df0d57
/job/head-data-engineering-fixed-mobile-09c037328fd2d76b67d47d6a0c1f90cc
/job/developer-0028c0ba1aaf96a6010e17ed2ed31b27
/job/business-analyst-consultant-infosys-consulting-5e6bcb059ecd569fd7bf8c389bf60437
/job/data-scientist-jabil-circuit-51ea2ce5b0eedff231dc658097a05543
/job/analytics-architect-icon-consulting-group-440ccbcb6320596315d08cbc26e03cc1
/job/finance-manager-ito-standard-chartered-bank-1380582446cd0645db18375eef1e61d7
/job/senior-database-specialist-streetsine-singapore-4a0b065

/job/engineering-client-experience-client-onboarding-tech-regional-lead-executive-director-goldman-sachs-services-4ba6ab2ca6a4ff2110c986f8cf76415f
/job/business-intelligence-analyst-alphatech-business-solutions-5e8fbce0e793642e0718ea51c4086ce2
/job/bi-software-sales-consultant-account-manager-xplore-infocomz-solution-78b6044845de290f5aee9a589171d779
/job/software-developer-geniegoo-56fbfda78bb0166cb2e757ad8a57ae2e
/job/net-software-developer-talent-trader-group-eb8cf3678e01e4c2b92c7ff0f7aa7a8b
/job/executive-sky-premium-international-8881bad4eff5a832584f6921eb637a78
/job/executive-sky-premium-international-2209fad2e90fd89209f59c03dd0443d2
/job/senior-software-engineer-developer-geenet-abb0ca0d0b11646dbe9e14c0289b2d35
/job/sr-software-engineer-magento-developer-ranosys-technologies-ccbcf2333a5b135f28c3aae28c71a99c
/job/developer-senior-developer-onelink-hr-consultancy-f793641b9f5b1b97538ce90be332231e
/job/mobile-developer-fehmarn-consulting-7bc3f6ce5ecb2f77e0c696de9f4cf38e
/job/senior-s

/job/software-engineer-kelly-services-4c6f7d94ba00c66f273eec54e5098aee
/job/software-engineer-kelly-services-1270f0d07702eccbad9bd741e2703fd7
/job/software-engineer-perx-technologies-76b8daaf581734b603b916a575914ef5
/job/performance-testing-engineer-power-consultancy-services-e9a890a3166bf3fee6b7064aaf04b9ec
/job/software-engineer-charterhouse-3258b02413352eb8e93cfc2a2887c948
/job/sap-abap-alphatech-business-solutions-777cea8993d97ff915c2c94fb915f513
/job/senior-software-developer-e1767dfe8363db27b6011c326758fffd
/job/senior-software-developer-6875e9f6ecceeb8ee32baa6a1b46eeea
/job/software-developer-a994727b6ea58244371baed91a57bfc0
/job/nav-consultant-5058734daa63d3af7cf8ababf6eece68
/job/senior-software-engineer-net-046858d2669ee9b18ab4d7fff42134d4
/job/senior-front-end-developer-03611f77a7df95b0fa14e6830e70c110
/job/senior-software-engineer-8ead103c4b5a19d87917e2fc4d31ecc2
/job/senior-software-engineer-562fd7766daada9f4855f39aa8c36ef9
/job/software-engineer-b990b40fda3b6e9e4f459bc89b

/job/software-engineer-golang-zalora-south-east-asia-100959b167e5b7439cce09ecbe83ef50
/job/backend-software-engineer-trusty-cars-9db1e450e997177af1b6c09e988c80b9
/job/staff-senior-engineer-ncs-af4ad64abac671df7d0bdf21a2b2aecd
/job/software-engineer-vital-vision-technology-0e2b3068510c3862efbdc64cbbadf846
/job/software-design-engineer-hirata-fa-engineering-d3a20d1a1a999faf880901de3960de8e
/job/mobile-framework-developer-tata-consultancy-services-asia-pacific-7bdc690a6147e562f9f354d1638586ab
/job/net-web-api-developer-spencer-ogden-8de11cbf7062cbd224bb85074ba82759
/job/rpa-developer-modular-infotech-a15527f1ecaad63341b529172e01ecb9
/job/big-data-developer-unisoft-infotech-6bcc4860e8128696c66558007b50b566
/job/front-end-developer-hydrogen-consulting-solutions-e48e5b0916f659625136afa0a094d01a
/job/net-hydrogen-consulting-solutions-f833ef291fdb64e971890cf6f1c203f0
/job/software-engineering-team-lead-doctor-anywhere-operations-9ca25f0c7b1200e5618acdd27b02317a
/job/full-stack-software-enginee

/job/software-test-automation-lead-international-sos-technology-services-05a7eb01428413e3fc1c0a5b3694ebf4
/job/backend-developer-%E2%80%93-listed-investment-firm-globesoft-services-c1e62459ed6977d8b86a138a158d9a1d
/job/advanced-software-engineer-sun-electric-7905e9d586099bd32cd3f909f22a3504
/job/android-developer-allegis-group-singapore-e43f939a531c4c266220b05eb12f1384
/job/android-developer-allegis-group-singapore-9caf87f3f320cacb3a27880b52411a42
/job/servicenow-developer-alphatech-business-solutions-49fcca78a58ed7cb5e57e25632d9ee5b
/job/rpa-developer-alphatech-business-solutions-6ad6dc57a1db94f23c9d3de24247cb9b
/job/bi-developer-alphatech-business-solutions-133329740ab94efdd027bcc337e6dd28
/job/software-engineer-i-access-solutions-de0e68c026479c5d9f9d8e47b8744aae
/job/senior-software-engineer-i-access-solutions-c0d961f18cc031cb4a54c3fa3bbf18d4
/job/full-stack-developer-bettertradeoff-b96c57a15307a2f882b6570e3718e3e0
/job/front-end-developer-bettertradeoff-4b6d22586f2f0f49101bab5b7a16

/job/senior-software-engineer-carousell-3973c7cbe28d31d10fd13d6317b8f9ef
/job/software-engineer-carousell-18ff9faca04684ce03726bbdb3a4050b
/job/software-engineer-carousell-1114ea9d0557781e102f6f4a4c4bec44
/job/infrastructure-engineer-allegiance-marketing-0c8d90a7322fd0afec927d624b9e156e
/job/software-engineer-charterhouse-0edac85b74352887bd533c3e80d72e0f
/job/embedded-engineer-sciente-international-242387709b699e43d52f45c7bd07ac1d
/job/software-developer-bioquest-advisory-af04797bb3479cfa060a263eb3d06a6e
/job/senior-software-engineer-izeno-1e885dbb786983dc0702b99f1783b32f
/job/software-engineer-sopra-steria-asia-55b29df74481e9d0d2b838a842a194f0
/job/android-developer-haulio-e0f4b292ba8f35666214b7d62705e110
/job/ios-developer-haulio-b2c5a309b78a1577061d3067d922e20f
/job/front-end-developer-haulio-fae539d8664a3a288929fdf2ba97eb82
/job/backend-developer-haulio-b4e52c6a5536aea735f9e71ca74bf5a0
/job/java-developer-sopra-steria-asia-3d14d850e16d3f913b526913b2ee417d
/job/machine-learning-engi

/job/software-developer-manager-4918045d35b384324cc098e7818c1288
/job/net-software-developer-hitec-sourcing-d74bded045ec9cf97fd7cd46eea199f6
/job/senior-software-developer-capita-78ea71745829b38b874b524c887a4ec9
/job/senior-software-developer-primalux-technology-c525d4e6ccd4f6a3df986c8fe66054d6
/job/java-software-developer-payments-application-associate-jpmorgan-chase-bank-na-b7809be76d686c1ce8e5e9c519df29bd
/job/software-developer-mindteck-singapore-925ca7b1af7086debe7e336bfbe24194
/job/application-developer-net-international-application-solutions-dc9e00685a0eb7873dca0f62a44fcd4d
/job/software-engineer-kanepi-1b23658a7569e02f0c6e0f5263b57e3c
/job/backend-developer-innosparks-d905e7d28cf6e2e43b7af3f88deb4f84
/job/senior-fullstack-developer-appvantage-18f76b9dc30f5447cefc6af6412b6031
/job/java-developer-encora-technologies-b8a28a5340f88a66bbad0c96136a2fb4
/job/full-stack-developer-perm-banking-david-goliath-9620a4132218802f18d895839d7c7758
/job/software-engineer-orion-systems-integratio

/job/software-engineer-ajax-mysql-c-html-css-javascript-supreme-hr-advisory-e7e9bc2f0dd994bb79b83b81f9bedee2
/job/technical-business-analyst-wirecard-asia-holding-7b75ca1ca448acc37c36f129836e03de
/job/mobile-developer-26d3dfad291b99f9ada1c6b2a3cbfcb9
/job/mobile-developer-f3f13c6010b5bf0388b4517a11cc1523
/job/software-developer-alphagrep-b162634f4eb3e09c00bc963630f27d14
/job/junior-software-developer-global-blue-service-company-singapore-5c1ffc38d3bbb4414be3698405278eab
/job/software-developer-asiacloud-solutions-7668897f73291788ea7dfb1e822edefd
/job/senior-software-developer-apar-technologies-9126334b4f93b208c1ea1e86b0502988
/job/engineer-software-agoda-company-ee4e660e907136cb1bc94f9981dc25bb
/job/software-engineer-infogain-solutions-cd8e390646aab49a1dffd1150fe41561
/job/software-engineer-cargo-community-network-90ae097da7dab3d378894e51473fc984
/job/software-engineer-toss-ex-e6b2102f46352b83556bef7481f92505
/job/senior-engineer-software-agoda-company-3b4846ea57791ce2d860ebdf81c2729d


/job/project-manager-tbwa-singapore-91f497458995c49834d896f574b0bb6f
/job/project-manager-lourdes-gavin-563c3862f08cd8c8a1b8f30117f7a96b
/job/project-manager-innocellence-systems-6b14c9330c0e61e39aa15212d2778d2e
/job/site-project-manager-calibre-consulting-2ee03e03d618d726e5398a321f23ab11
/job/project-manager-energizer-singapore-5a67049d9fea3efd1cda7237b1a9e2d1
/job/assistant-project-manager-spacelogic-80a5b6705dfc9ad3d6262ba209942a52
/job/assistant-project-manager-econ-geotech-59b4e35736925ac35ffadde6033207f8
/job/solar-project-manager-sunpro-energies-2ae8d78169c08c70406326efa882bf83
/job/project-manager-wilson-associates-0c1bb0d8a232f6db784508065785acf8
/job/project-manager-four-media-7a6a41e29918ad9e11cf9b1fc736a934
/job/sap-sd-mm-project-manager-3i-infotech-asia-pacific-4767ebf20c197e8cc7d38756aa133857
/job/project-manager-saksoft-ee50f6e8e7c6e3e3553fab0195ede28c
/job/project-manager-rj-crocker-consultants-0eecc30be9b33ed9e4c95a20d35a9cb3
/job/events-manager-x3-gmp-recruitment-serv

/job/business-intelligence-project-manager-ninja-logistics-2abca6a0e3f40defff5a053cc0e80826
/job/project-sales-manager--target-recruitment-e4f4473398960c474f5e0591390c471e
/job/project-engineer-manager-durapower-holdings-b7c92b2f845b995cddb7774ef6186ed2
/job/practice-lead-project-management-swift-terminal-services-6305ca2f3bc05f239f3aabce7fc02a8a
/job/bim-manager-hne-consultants-ede9d4f16c43a6b416bc411c9e346d8b
/job/planner-straits-construction-singapore-19740b2ca45f9a7f4c0563a1d8ff2c46
/job/site-engineer-china-railway-tunnel-group-co-c1e42ac1801a4e5e305e0fb93c2da9ea
/job/project-engineer-construction-industry-flintex-consulting-6cb349ac7930a387e5f7751e007e6739
/job/microbiologist-glaxosmithkline-f03b8164dbbd6cc303150096083b7768
/job/operations-specialist-novogeneait-genomics-singapore-3a6b79535e07b31615eac94e24a62669
/job/senior-system-analyst-oaktree-consulting-3fba5c8f393f35e769320698678809bc
/job/administrative-assistant-zara-isoteam-710a8f8ae75f9522c2404274235f2699
/job/required-s

/job/project-manager-property-casualty-asia-pacific-chubb-asia-pacific-4d6bc5c7e179ce412c555657d6f60e49
/job/project-manager-capgemini-singapore-82e5c7f96731c649be458de6f1a11779
/job/sap-project-manager-icon-consulting-group-7b25859a9f253cdd43c253a450a47f40
/job/project-coordinator-project-manager-supreme-hr-advisory-20760ca48dd9d3e73cbc967610d9f929
/job/sap-project-manager-vui-systems-ae62ba895fd4d56fbc567d0600be405c
/job/senior-project-manager-linkedin-singapore-c4185b906d0349a2a98c7f563f93b7ff
/job/senior-project-manager-xm-asia-pacific-08dc04b2d4de11d27bd22454147a4559
/job/project-manager-sats-43a4258f115e9564e1c21b523ad22f33
/job/required-process-manager-project-manager-itil-path-infotech-9aa50fb7436eba800d063165a0105999
/job/project-manager-7a449f2efa45677329a5836a0573bf9f
/job/project-manager-business-analyst-data-processes-antaes-asia-e0566ece2fcddd2c37fca036f10c6ddb
/job/vp-analytics-project-manager-dbs-transformation-group-to-dbs-bank-c5249dadfaa3fb17f3159b965637a8f3
/job/pro

/job/database-administrator-starhub-28dbf59120f098ed839bf6b3ef933d85
/job/ui-ux-designer-quinnox-solutions-9e522f99fc976280d5022983944353a2
/job/scrum-master-intver-global-consulting-a5721bfd4dcbef839333bd24a84c9c49
/job/project-manager-36321d9a04ef09359baa3c1ceeff29e4
/job/assistant-project-manager-4d01e9de96e583e18f0478b7aedac0b1
/job/business-analyst-project-manager-data-science-2fe50cb7a17176b9cfb57102153837de
/job/project-engineer-manager-17b4d73745dfd0e7571137bc096579eb
/job/safety-officer-eco-667557256d2466c803d984059f030292
/job/structural-supervisor-87ff0765237649945afcce00e05a459a
/job/resident-project-manager-air-energi-group-singapore-7754483b1db2f65b0e257a58e8334821
/job/associate-project-manager-abb-fce8dde5055d1535b3b10172b9ef16c2
/job/technical-project-manager-tocco-studios-2117ede4a90a10e93200aa778be29208
/job/project-manager-stone-forest-f6da2964e3d0f6f4da300e947427c9e8
/job/client-project-manager-global-collect-services-asia-pacific-e55541e3f1366a34deb5ed41b6edfbaf
/

/job/shore-coordinator-smit-singapore-66effcac052b6f86c8dfe73d900f8f08
/job/site-engineer-glory-construction-5e89991aaa0ddcbfe0f7c3006d68bec9
/job/solution-architect-upper-spring-consulting-59ba6761a4ee9cf567300edf6f90b591
/job/manager-mergers-acquisitions-integrations-sats-df941ecb2b85ada6a54ff7923d135cea
/job/sales-support-representative-expereo-singapore-fd069ce91c0d18bcccd4f6ab910b2d21
/job/software-technical-manager-activate-interactive-8917c00b1532cd73541f759f58e3b156
/job/senior-associate-data-analytics-advisory-ernst-young-advisory-e69441151b61f82e1365e2cbeac67a39
/job/senior-software-engineers-thatz-international-6db29cb219b441c384d4ee3cc61f6a82
/job/sales-designer-search-personnel-9543b3432b0cb5c5b32c1c45c00b045c
/job/architectural-id-coordinator-hon-builder-a1be2422fff8d9e9d27f6a19b105d7ea
/job/senior-systems-project-manager-113bfe394a5c6d33e5fef41406139415
/job/site-manager-cdbb58ebd613ad9c8ff43e962e9d7036
/job/project-manager-zhengda-corporation-933f7d231e9b4efc34f4520b45c

/job/site-project-manager-89e95c495f29273fb84c7e611b0c30e9
/job/project-manager-93adddc93d91557e1c7103e797db9131
/job/manager-services-delivery-c839532ba96dd5c446d9cac6c810a57f
/job/project-engineer-1274d916881d99e4689cea846562bb2f
/job/site-safety-health-officer-e97f76d173b3bbc2a2334239f2a28e56
/job/software-developers-ff717d45a2ddc75c26ea03ede66f26e5
/job/senior-civil-engineer-f477a9dd8c767ddad8fbfbe9d116f736
/job/senior-project-manager-appvantage-1a1729f4338057b0c6306f47d4b4f1cf
/job/pmo-jr-project-manager-infogain-solutions-4bee4f352339c1b35b99f5e33cf1d29e
/job/pmo-jr-project-manager-infogain-solutions-fe7011b6730ce95bec2a16674fb326b6
/job/project-manager-scb-building-construction-5601a4cf5be35383b2289f70dd975b54
/job/project-manager-rh-synergy-ab8f17faee27250cce15d90992ed7cc8
/job/project-manager-logicalis-singapore-abd3d1d5370ebeaba4b2eaeb998254c2
/job/project-manager-hays-specialist-recruitment-d65243f3a454eab7011a3261b5ac89e8
/job/project-manager-indonesia-south-east-asia-ceva-

/job/project-supervisor-engineer-manager-c2-system-a185324c5bdedceb623a1c05f04f23a6
/job/manager-building-construction-authority-75833954480a4fdee15246ff722595dd
/job/project-engineer-alpine-engineering-services-9bfd3818d2d4049d07d642c79166add3
/job/construction-mgrs-blue-barrel-fe33a6cd5805779dbf29288021dcfc97
/job/project-engineer-alpine-engineering-services-27bdd89ce60aba855029b70ddf8fd0ff
/job/project-engineer-alpine-engineering-services-2b4466086015069d8751aa5a4f443257
/job/site-engineer-chartworth-enterprise-singapore-e61b6eceebfb07ff7848ec0499937a71
/job/planning-engineer-ed-zublin-ag-singapore-branch-270d2138388ded276b4d0b6181c9a212
/job/senior-project-executive-singapore-semiconductor-industry-association-5fe16c466e6366f44405610223c57269
/job/project-engineer-dae118b5ff3b084ebdbcf80bf339f118
/job/site-manager-ckr-contract-services-e23d761e3babaca5fd8691bacfd6dcdb
/job/costing-engineer-jabil-circuit-cb0030124b8c6b2b6aea3f4955fbde2c
/job/automation-developer-bot-developer-keyrep

/job/accountant-hebei-jinbiao-construction-materials-25efdd38bde0fe30a026b88c324cd652
/job/accountant-j-b-boda-62f08de56540044e05788f595b3972b9
/job/senior-fund-accountant-bluechip-platforms-asia-14f9739489dbb99110c2d601be0b0e1f
/job/accountant-anixter-singapore-7f51857149662715f582918d04484099
/job/assistant-accountant-anixter-singapore-5e073d0376da68b6c244f156809673fd
/job/accountant-baxter-healthcare-c45ff078d81f8a1b35cb38174b56048a
/job/senior-accountant-halcyon-agri-corporation-e49cbf83cce535812369aa702cda4506
/job/accountant-wilhelmsen-ships-service-8d8391d22c149ef82ff6bc87e904c8f2
/job/accountant-64cc1a8daa88b21eec2cc0575d66aee2
/job/accountant-green-earth-international-d9889ef66010ab6e43c5026996b7eb2e
/job/accountant-banyan-tree-hotels-resorts-5f91ac2ca13196705068dbb221f15a4e
/job/project-accountant-hometeamns-f11dbb0b9897ae1f8443c16978754d7b
/job/technical-accountant-candidateasia-group-c67bcfab85b8dbf2cf6eaefce561102a
/job/accountant-rf360-singapore-5a4da758cc745c50034c82fbd3

/job/external-auditor-crowe-horwath-first-trust-3381c0cb767992c7eb5d0dbfd60881e0
/job/accounts-assistant-ceramique-aesthetics-a89cdc38996f263eadad92f4e98659b2
/job/accountant-a25d02073643748fe29433ad078c7101
/job/financial-accountant-merit-medical-singapore-d1a60974df1bf253d38ee02411751a3b
/job/accountant-tts-eurocars-f2ae5396749aad2ad8f7f4816aa4ba20
/job/accountant-hai-di-lao-holdings-bfafbf198f5b4e6a53d31c0fe54c801a
/job/accountant-zhengda-corporation-d1e7501ccd011ebf7e7de92e4327eb12
/job/deputy-head-accountant-generals-department-510002ad37434c4d078d5bdcd667c229
/job/asst-manager-financial-accounting-skillsforce-management-consultancy-54631deb3984577492be772bb999d90a
/job/finance-manager-firma-technologies-217ee2270f594efc75d035df10ca4ebd
/job/senior-accounts-executive-tidy-maintenance-engineering-2e2a7fc7c1838bbbaf6871860f5f599a
/job/accountant-esta-d1317516bd2d87498a7365c7cdcac2ca
/job/accountant-ibc-asia-deecf4ae9e109b1732cb427d575df880
/job/accountant-ibc-asia-da25431cf07fe4c161

In [4]:
print(len(job_links))

2100
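
The printed links above show the same posting turning up more than once (for example, the Singapore Power data scientist link appears in two blocks), so the 2100 entries are not all distinct. Here is a minimal sketch of de-duplicating by URL while keeping the first search term seen; `dedupe_links` is a hypothetical helper, and the second search term in the sample is an assumption for illustration.

```python
def dedupe_links(job_links):
    """Keep the first [search_term, href] pair seen for each href."""
    seen = set()
    unique = []
    for search_term, href in job_links:
        if href not in seen:
            seen.add(href)
            unique.append([search_term, href])
    return unique


# Illustrative sample; the 'business intelligence' pairing is assumed.
sample = [
    ['data scientist', '/job/data-scientist-singapore-power-5731c18be188d7d1f80a840a834ad207'],
    ['business intelligence', '/job/data-scientist-singapore-power-5731c18be188d7d1f80a840a834ad207'],
    ['data scientist', '/job/data-scientist-spotify-singapore-1801998abb817910b9acb022c3b636dc'],
]
unique_links = dedupe_links(sample)
print(len(unique_links))
```

Keeping the first search term is a design choice: if the search term is later used as a label, a posting found under several terms is ambiguous, and it may be better to record all of its terms or drop it.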


In [5]:
job_links[:4]

[['data scientist',
  '/job/data-scientist-singapore-power-5731c18be188d7d1f80a840a834ad207'],
 ['data scientist',
  '/job/senior-data-scientist-singapore-power-b8f943fb05cb5ef1ec504e530dc425f3'],
 ['data scientist',
  '/job/data-scientist-spotify-singapore-1801998abb817910b9acb022c3b636dc'],
 ['data scientist',
  '/job/senior-data-scientist-lynx-analytics-702973c3a33b2b7a251a49cc10a66b8b']]

In [6]:
job_links[0][1]

'/job/data-scientist-singapore-power-5731c18be188d7d1f80a840a834ad207'

In [7]:
job_url_front = 'https://www.mycareersfuture.sg'


def text_or_none(tag):
    """Return the tag's text, or None if soup.find() did not match anything."""
    return tag.text if tag is not None else None


data_store = []
for search_term, job_path in job_links:
    browser = webdriver.Chrome('./chromedriver')
    browser.get(job_url_front + job_path)
    sleep(3)  # wait for the dynamically rendered page to load
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    browser.close()

    # Each field is None when the element is missing from the page.
    job_title = text_or_none(soup.find('h1', attrs={'id': 'job_title'}))
    job_company_name = text_or_none(soup.find('p', attrs={'name': 'company'}))
    job_categories = text_or_none(soup.find('p', attrs={'id': 'job-categories'}))
    job_location = text_or_none(soup.find('a', attrs={'href': '#location_map'}))
    job_employment_type = text_or_none(soup.find('p', attrs={'id': 'employment_type'}))
    job_seniority = text_or_none(soup.find('p', attrs={'id': 'seniority'}))
    job_last_posted_date = text_or_none(soup.find('span', attrs={'id': 'last_posted_date'}))
    job_expiry_date = text_or_none(soup.find('span', attrs={'id': 'expiry_date'}))
    job_description = text_or_none(soup.find('div', attrs={'id': 'description-content'}))
    job_company_info = text_or_none(soup.find('div', attrs={'data-cy': 'companyinfo-writeup'}))
    job_requirement = text_or_none(soup.find('div', attrs={'id': 'requirements-content'}))

    # get salary; both values stay None when no parsable range is listed
    min_salary = None
    max_salary = None
    salary_range = soup.find('span', attrs={'class': 'salary_range dib f2-5 fw6 black-80'})
    try:
        low, high = salary_range.text.replace('$', '').replace(',', '').split('to')
        min_salary, max_salary = int(low), int(high)

        salary_type = soup.find('div', attrs={'class': 'salary tr-l'}).text
        is_annual = 'Annual' in salary_type
        print('salary_type:', is_annual)
        if is_annual:  # convert annual figures to monthly
            min_salary = min_salary / 12
            max_salary = max_salary / 12
    except (AttributeError, ValueError):
        pass

    # Column order: search_term, job_url, job_title, job_categories, job_location,
    # job_employment_type, job_seniority, job_last_posted_date, job_expiry_date,
    # job_description, job_company_name, job_company_info, job_requirement,
    # min_salary, max_salary
    data_per_job = [
        search_term,
        job_path,
        job_title,
        job_categories,
        job_location,
        job_employment_type,
        job_seniority,
        job_last_posted_date,
        job_expiry_date,
        job_description,
        job_company_name,
        job_company_info,
        job_requirement,
        min_salary,
        max_salary
    ]
    data_store.append(data_per_job)

salary_type: False
salary_type: False
salary_type: True
salary_type: False
salary_type: False
...

In [8]:
len(data_store)

2100
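
The salary handling inside the scraping loop can be isolated into a small function that is easy to test without opening a browser. This is a sketch only: `parse_salary_range` is a hypothetical name, and the exact text format on the page (two dollar amounts separated by `to`) is an assumption inferred from the `replace` and `split('to')` calls in the loop.

```python
def parse_salary_range(salary_text, is_annual=False):
    """Parse text such as '$4,000to$8,000' into (min, max) monthly amounts.

    Returns (None, None) when the text does not hold exactly two integers
    separated by 'to'. Annual figures are divided by 12.
    """
    cleaned = salary_text.replace('$', '').replace(',', '')
    parts = cleaned.split('to')
    if len(parts) != 2:
        return None, None
    try:
        low, high = int(parts[0]), int(parts[1])
    except ValueError:
        return None, None
    if is_annual:
        return low / 12, high / 12
    return low, high


print(parse_salary_range('$4,000to$8,000'))
print(parse_salary_range('$90,000to$110,000', is_annual=True))
```

The second call yields roughly 7500 and 9166.67 per month, the same kind of fractional value that appears in the `max_salary` column when an annual range is converted.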

In [9]:
import pandas as pd
column_names = ['search_term', 'job_url', 'job_title', 'job_categories', 'job_location', 'job_employment_type', 'job_seniority','job_last_posted_date', 'job_expiry_date', 'job_description', 'job_company_name', 'job_company_info', 'job_requirement', 'min_salary', 'max_salary']
df = pd.DataFrame(data_store, columns=column_names)
df.head()

| Unnamed: 0 | search_term | job_url | job_title | job_categories | job_location | job_employment_type | job_seniority | job_last_posted_date | job_expiry_date | job_description | job_company_name | job_company_info | job_requirement | min_salary | max_salary |
|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
| 0 | data scientist | /job/data-scientist-singapore-power-5731c18be1... | Data Scientist | Information Technology | SP GROUP BUILDING, 2 KALLANG SECTOR 349277 | Full Time | Professional | Posted 28 Jan 2019 | Closing on 27 Feb 2019 | Why Work for Us We Power the Nation. Make the... | SINGAPORE POWER LIMITED | SINGAPORE POWER LIMITED\nA leading energy util... | What You'll Need We are looking for Passion an... | 4000.0 | 8000.0 |
| 1 | data scientist | /job/senior-data-scientist-singapore-power-b8f... | Senior Data Scientist | Information Technology | SP GROUP BUILDING, 2 KALLANG SECTOR 349277 | Full Time | Middle Management | Posted 28 Jan 2019 | Closing on 27 Feb 2019 | Why Work for Us We Power the Nation. Make the... | SINGAPORE POWER LIMITED | SINGAPORE POWER LIMITED\nA leading energy util... | What You'll Need We are looking for Passion an... | 8000.0 | 14000.0 |
| 2 | data scientist | /job/data-scientist-spotify-singapore-1801998a... | Data Scientist | Others | MARINA BAY FINANCIAL CENTRE, 8 MARINA BOULEVAR... | Permanent | Executive | Posted 28 Jan 2019 | Closing on 27 Feb 2019 | We seek an outstanding Data Scientist to join ... | SPOTIFY SINGAPORE PTE. LTD. | At Spotify, we’re passionate about providing t... | Who you are Degree in Computer Science/Engine... | 7500.0 | 9166.666667 |
| 3 | data scientist | /job/senior-data-scientist-lynx-analytics-7029... | Senior Data Scientist | Information Technology | PRUDENTIAL TOWER, 30 CECIL STREET 049712 | Full Time | Senior Executive | Posted 28 Jan 2019 | Closing on 27 Feb 2019 | Reporting to the CIO, this role in an integral... | LYNX ANALYTICS PTE. LTD. | \n\tLynx Analytics is a predictive analytics o... | Requirements ● Industry experience in data ... | 7000.0 | 9500.0 |
| 4 | data scientist | /job/data-scientist-information-technology-nom... | Data Scientist - Information Technology | Banking and Finance | MARINA BAY FINANCIAL CENTRE, 10 MARINA BOULEVA... | Full Time | Professional | Posted 28 Jan 2019 | Closing on 27 Feb 2019 | Nomura Overview Nomura is an Asia-based fina... | NOMURA SINGAPORE LIMITED | Nomura is a leading financial services group a... | Key Experience & Skills Strong academic back... | 20800.0 | 33000.0 |
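
The `job_last_posted_date` and `job_expiry_date` columns are stored as strings such as `Posted 28 Jan 2019` and `Closing on 27 Feb 2019`. Here is a minimal sketch of turning them into dates, assuming the last three words are always day, abbreviated English month, and year; `parse_posting_date` is a hypothetical helper name.

```python
from datetime import datetime


def parse_posting_date(text):
    """Parse 'Posted 28 Jan 2019' or 'Closing on 27 Feb 2019' into a date."""
    day_month_year = ' '.join(text.split()[-3:])
    return datetime.strptime(day_month_year, '%d %b %Y').date()


posted = parse_posting_date('Posted 28 Jan 2019')
closing = parse_posting_date('Closing on 27 Feb 2019')
print(posted, closing, (closing - posted).days)
```

Once both columns are dates, the number of days a posting stays open becomes available as a numeric feature.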


In [10]:
df.to_csv('data_adhoc.csv')

In [11]:
df[df['min_salary'] > 100000]

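| Unnamed: 0 | search_term | job_url | job_title | job_categories | job_location | job_employment_type | job_seniority | job_last_posted_date | job_expiry_date | job_description | job_company_name | job_company_info | job_requirement | min_salary | max_salary |
|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|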
