# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Project 4: Web Scraping Job Postings

## Business Case Overview

You're working as a data scientist for a contracting firm that's rapidly expanding. Now that they have their most valuable employee (you!), they need to leverage data to win more contracts. Your firm offers technology and scientific solutions and wants to be competitive in the hiring market. Your principal has two main objectives:

   1. Determine the industry factors that are most important in predicting the salary amounts for these data.
   2. Determine the factors that distinguish job categories and titles from each other. For example, can required skills accurately predict job title?

To limit the scope, your principal has suggested that you *focus on data-related job postings*, e.g. data scientist, data analyst, research scientist, business intelligence, and any others you might think of. You may also want to decrease the scope by *limiting your search to a single region.*

Hint: Aggregators like [Indeed.com](https://www.indeed.com) regularly pool job postings from a variety of markets and industries. 

**Goal:** Scrape your own data from a job aggregation tool like Indeed.com in order to collect the data to best answer these two questions.


### Importing the necessary libraries

In [1]:
#importing libraries for scraping
import pandas as pd
from bs4 import BeautifulSoup
import urllib
import os
from selenium import webdriver
import requests
import numpy as np
from time import sleep

### Scraping from Indeed.com.sg  
#### Defining a function to scrape data from a single job posting.

In [2]:
#The function below scrapes a single job posting page
def onepage(link):
    #partial link expected
    joburl = 'https://www.indeed.com.sg'+link 
    jobhtml = requests.get(joburl).text
    jobsoup = BeautifulSoup(jobhtml, 'html.parser')
    #print(joburl)

    #Job title is in the header directly. Easily found.
    title = jobsoup.find('h3', {'class':'icl-u-xs-mb--xs icl-u-xs-mt--none jobsearch-JobInfoHeader-title'}).text
    #print(title)

    #Company and location share the same class, with metadata hidden inside. Remove the meta tags, grab both, and split on the dash.
    #Sample below
    '''
    <div class="jobsearch-InlineCompanyRating icl-u-xs-mt--xs icl-u-xs-mb--md">
        <div class="icl-u-lg-mr--sm icl-u-xs-mr--xs">
         Rakuten Asia Pte Ltd
        </div>
        <div class="icl-u-lg-mr--sm icl-u-xs-mr--xs">
         -
        </div>
        <div>
         Singapore
    '''
    companylocationmeta = jobsoup.find('div',{'class':'jobsearch-InlineCompanyRating icl-u-xs-mt--xs icl-u-xs-mb--md'})

    #extracting meta info if present
    meta = companylocationmeta('meta')
    
    #Meta info may contain ratings. Get company rating and rating count (if any), else return NaN
    try:
        rating = meta[0].find('a')['aria-label']
        ratingCount = meta[0].find('meta',{'itemprop':"ratingCount"})['content']
    except Exception:
        rating = np.nan
        ratingCount = np.nan
    #print(rating)
    #print(ratingCount)
    
    #Get company and location info
    for element in companylocationmeta(['meta']):
        element.extract()    
  
    companyandlocation = companylocationmeta.text.split('-')
    company = companyandlocation[0]
    location = companyandlocation[1]

    #Get job salary and job type (if any), else return NaN
    try:
        salaryjobtype = jobsoup.findAll('div',{'class':'jobsearch-JobMetadataHeader icl-u-xs-mb--md'}).pop()
        salary = salaryjobtype.find('div',{'class':'jobsearch-JobMetadataHeader-item'}).text
        jobtype = salaryjobtype.find('div',{'class':'jobsearch-JobMetadataHeader-item icl-u-xs-mt--xs'}).text
    except Exception:
        salary = np.nan
        jobtype = np.nan

    #Get job description
    description = jobsoup.find('div',{'class':'jobsearch-JobComponent-description icl-u-xs-mt--md'}).text

    return {'job_title':title, 'company':company, 'location':location, 'company_rating':rating, 'rating_count':ratingCount, 'salary':salary, 'job_type':jobtype, 'description':description, 'url':joburl}

#### Writing nested loops: an outer loop over all results pages and an inner loop over each job posting on the page.

In [32]:
#luckily, Indeed limits search results to 100 pages.
pages = list(range(0,991,10))
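
Each `start` offset corresponds to one search-results URL. As a minimal sketch (the `build_search_url` helper and its parameter names are illustrative and not part of the original notebook), the query string can be assembled with the standard library instead of string concatenation:

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.indeed.com.sg/jobs'

def build_search_url(query, location, start):
    """Return the search-results URL for one offset (hypothetical helper)."""
    # urlencode escapes spaces as '+', e.g. 'data science' -> 'data+science'
    return BASE_URL + '?' + urlencode({'q': query, 'l': location, 'start': start})

pages = list(range(0, 991, 10))  # 100 offsets: 0, 10, ..., 990
urls = [build_search_url('data science', 'singapore', page) for page in pages]
print(len(urls))
print(urls[0])
```

Building the URL from a dict makes it easier to change the keyword or region later without editing a long string by hand.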

In [33]:
#Iterate through the 100 pages, getting the data for each page and appending to a DataFrame
#Defining empty DataFrame with features
df = pd.DataFrame(columns=['job_title', 'company', 'location', 'company_rating', 'rating_count', 'salary', 'job_type', 'description', 'url'])
pagecount = 0
for page in pages:
    #The below url will update with the next page number for every iteration. This should loop 100 times.
    url = 'https://www.indeed.com.sg/jobs?q=data+science&l=singapore&start='+str(page) #search results for keyword 'data science' in Singapore region.
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    
    #For this page, we shall retrieve the links for all the job postings and put them in a list
    joblinks = []
    for job in soup.findAll('a',{'data-tn-element':'jobTitle'}):
        joblinks.append(job['href'])
        
    #Iterating through individual job postings in each single page of search results.
    jobcount=0
    for link in joblinks:
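        try:
            #DataFrame.append was removed in pandas 2.0, so concatenate a one-row frame instead
            df = pd.concat([df, pd.DataFrame([onepage(link)])], ignore_index=True)
            jobcount+=1
        except Exception:
            #Skip postings whose layout doesn't match the selectors in onepage()
            pass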
    pagecount+=1
    print(jobcount, 'jobs on page', pagecount, 'successful.')
    #Probably a good idea to wait 1 sec before moving to next page.
    sleep(1)
    
print(len(df),'entries successfully scraped.')
df.to_csv('./jobsdf.csv')

15 jobs on page 1 successful.
15 jobs on page 2 successful.
15 jobs on page 3 successful.
15 jobs on page 4 successful.
15 jobs on page 5 successful.
15 jobs on page 6 successful.
15 jobs on page 7 successful.
15 jobs on page 8 successful.
15 jobs on page 9 successful.
14 jobs on page 10 successful.
15 jobs on page 11 successful.
15 jobs on page 12 successful.
14 jobs on page 13 successful.
15 jobs on page 14 successful.
15 jobs on page 15 successful.
14 jobs on page 16 successful.
14 jobs on page 17 successful.
14 jobs on page 18 successful.
15 jobs on page 19 successful.
15 jobs on page 20 successful.
14 jobs on page 21 successful.
15 jobs on page 22 successful.
15 jobs on page 23 successful.
14 jobs on page 24 successful.
14 jobs on page 25 successful.
14 jobs on page 26 successful.
14 jobs on page 27 successful.
14 jobs on page 28 successful.
15 jobs on page 29 successful.
15 jobs on page 30 successful.
15 jobs on page 31 successful.
14 jobs on page 32 successful.
14 jobs on page 3
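
Growing a DataFrame one row at a time re-copies the data on every iteration (and `DataFrame.append` itself was removed in pandas 2.0). A minimal sketch of an alternative pattern collects one dict per posting in a list and builds the DataFrame once at the end. It assumes a stand-in `fake_onepage` function in place of the real scraper so that it runs offline; the links are made up.

```python
import pandas as pd

COLUMNS = ['job_title', 'company', 'location', 'company_rating',
           'rating_count', 'salary', 'job_type', 'description', 'url']

def fake_onepage(link):
    """Hypothetical stand-in for onepage(); returns a dict with the same keys."""
    if 'broken' in link:
        raise AttributeError('expected tag not found')
    row = dict.fromkeys(COLUMNS)
    row['job_title'] = 'Data Scientist'
    row['url'] = link
    return row

rows = []
for link in ['/rc/clk?jk=1', '/broken', '/rc/clk?jk=2']:
    try:
        rows.append(fake_onepage(link))
    except AttributeError:
        # Skip postings whose page layout does not match the expected tags
        continue

scraped = pd.DataFrame(rows, columns=COLUMNS)
print(len(scraped), 'entries successfully scraped.')
```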

#### Checking the DataFrame

In [34]:
#checking the DataFrame
df.sample(10)

Unnamed: 0,job_title,company,location,company_rating,rating_count,salary,job_type,description,url
404,Senior Data Science Developer,DEX Pte Ltd,Singapore,,,,,Do you enjoy leading Agile/iterative product d...,https://www.indeed.com.sg/company/DEX-Pte-Ltd/...
1376,Backend Server Developer,SIMONE STUDIOS PTE LTD,Singapore,,,,,Job DescriptionWe are looking for a Back-End D...,https://www.indeed.com.sg/company/SIMONE-STUDI...
894,"Senior Consultant, Forensic Technology (Singap...","FTI Consulting, Inc.",Singapore,,,,,"Company Background\n\nFTI Consulting, Inc. is ...",https://www.indeed.com.sg/pagead/clk?mo=r&ad=-...
518,"Manager, Data Management",AIA,Singapore,,,,,"Manager, Data Management\n\nThe Data Manager (...",https://www.indeed.com.sg/rc/clk?jk=e20fb79773...
1258,Economy Services Senior Associate,Grab Taxi,Bishan New Town,4.1 out of 5,101.0,,,"JOB DESCRIPTION\n\nAt Grab, our mission is to ...",https://www.indeed.com.sg/rc/clk?jk=18ed761112...
369,Researcher,NANYANG TECHNOLOGICAL UNIVERSITY,Singapore,4 out of 5,104.0,Contract,Contract,ContractRoles & Responsibilities\nThe Centre f...,https://www.indeed.com.sg/rc/clk?jk=8e5f6efbea...
218,Staff Data Scientist,Seagate Technology,Singapore,3.8 out of 5,547.0,,,The Analytics Business Solutions team from Ope...,https://www.indeed.com.sg/rc/clk?jk=bb9779b5bb...
808,"Analyst, Customer Delivery",MasterCard,Singapore,4.1 out of 5,472.0,,,"Analyst, Customer Delivery will play the suppo...",https://www.indeed.com.sg/rc/clk?jk=3391c249bc...
1388,"Senior Consultant, Forensic Technology (Singap...","FTI Consulting, Inc.",Singapore,,,,,"Company Background\n\nFTI Consulting, Inc. is ...",https://www.indeed.com.sg/pagead/clk?mo=r&ad=-...
671,"Data Scientist, Advisory, Performance Improvem...",EY,Singapore,4 out of 5,5621.0,,,Data Scientist\n\n\nEY Data and Analytics is t...,https://www.indeed.com.sg/rc/clk?jk=f8917c168f...


In [35]:
#checking to see how many jobs have salary info
df.loc[df['salary'].notna()]
#This data is very unclean: for postings without a salary, the job type (e.g. 'Internship') is recorded under 'salary' instead.

Unnamed: 0,job_title,company,location,company_rating,rating_count,salary,job_type,description,url
0,"Internship - Associate Researcher – Science, E...",Procter and Gamble,Singapore,4.2 out of 5,4776,Internship,Internship,InternshipKeen to work for one of the top 3 Em...,https://www.indeed.com.sg/pagead/clk?mo=r&ad=-...
3,College Intern - Data Science,HP,Singapore,4 out of 5,10791,Internship,Internship,InternshipHP is the world’s leading personal s...,https://www.indeed.com.sg/rc/clk?jk=ccbd058d8d...
4,"Intern, Data & Innovation (January - May 2019)",AXA,Singapore,3.8 out of 5,761,Internship,Internship,InternshipThe AXA Group is a worldwide leader ...,https://www.indeed.com.sg/rc/clk?jk=f290e7fac9...
7,College Intern – Data Analytics,HP,Singapore,4 out of 5,10791,Internship,Internship,InternshipHP is the world’s leading personal s...,https://www.indeed.com.sg/rc/clk?jk=3dcb2e7899...
10,Data Technologist,Rakuten Asia Pte Ltd,Singapore,,,"$5,000 - $8,000 a month",Permanent,"$5,000 - $8,000 a monthPermanentResponsibiliti...",https://www.indeed.com.sg/company/Rakuten-Asia...
11,Data Management Intern,IQVIA,Singapore,3.7 out of 5,138,Internship,Internship,InternshipIQVIA™ is The Human Data Science Com...,https://www.indeed.com.sg/rc/clk?jk=c9a5e38a86...
12,College Intern - Data Analyst,HP,Singapore,4 out of 5,10791,Internship,Internship,InternshipHP is the world’s leading personal s...,https://www.indeed.com.sg/rc/clk?jk=474023d15a...
16,"Internship - Associate Researcher – Science, E...",Procter and Gamble,Singapore,4.2 out of 5,4776,Internship,Internship,InternshipKeen to work for one of the top 3 Em...,https://www.indeed.com.sg/pagead/clk?mo=r&ad=-...
18,Data Technologist,Rakuten Asia Pte Ltd,Singapore,,,"$5,000 - $8,000 a month",Permanent,"$5,000 - $8,000 a monthPermanentResponsibiliti...",https://www.indeed.com.sg/company/Rakuten-Asia...
20,"Data Scientist Intern, Real World Insights",IQVIA,Singapore,3.7 out of 5,138,Internship,Internship,InternshipIQVIA™ is The Human Data Science Com...,https://www.indeed.com.sg/rc/clk?jk=4342615346...
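
In the rows above, a real salary always contains a `$`, while the misplaced values are job types such as `Internship`. A minimal sketch of telling the two apart, assuming that pattern holds for the whole dataset (the `split_metadata` helper is hypothetical and not part of the notebook):

```python
import math

def split_metadata(items):
    """Split header metadata strings into (salary, job_type).

    Assumption: a salary string contains '$'; anything else is a job type.
    Missing values are returned as NaN.
    """
    salary = math.nan
    job_type = math.nan
    for text in items:
        if '$' in text:
            salary = text
        else:
            job_type = text
    return salary, job_type

print(split_metadata(['$5,000 - $8,000 a month', 'Permanent']))
print(split_metadata(['Internship']))
```

Inside `onepage`, the input list could plausibly be built from the text of every `jobsearch-JobMetadataHeader-item` div, though that has not been checked against the live page.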


In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1400 entries, 0 to 1399
Data columns (total 9 columns):
job_title         1400 non-null object
company           1400 non-null object
location          1400 non-null object
company_rating    800 non-null object
rating_count      800 non-null object
salary            344 non-null object
job_type          344 non-null object
description       1400 non-null object
url               1400 non-null object
dtypes: object(9)
memory usage: 98.5+ KB
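
Every column is stored as `object`, including the ratings. A minimal sketch of converting the two rating columns to numbers, using a small hand-built frame with values in the same format as the scraped data (the `sample` frame is illustrative):

```python
import pandas as pd

sample = pd.DataFrame({
    'company_rating': ['4.2 out of 5', None, '3.8 out of 5'],
    'rating_count': ['4776', None, '761'],
})

# Pull the leading number out of strings such as '4.2 out of 5'
sample['company_rating'] = (
    sample['company_rating']
    .str.extract(r'([\d.]+) out of 5', expand=False)
    .astype(float)
)
# Coerce counts to numbers; missing or unparseable values become NaN
sample['rating_count'] = pd.to_numeric(sample['rating_count'], errors='coerce')

print(sample.dtypes)
```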


In [3]:
#Scraping from Indeed.com
url = 'https://www.indeed.com.sg/jobs?q=data+science&l=singapore&start=0' #search results for keyword 'data science' in Singapore region.
response = requests.get(url)
print(response.status_code)

200
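
A status code of 200 means the request succeeded, but over 100 pages some requests may fail. A minimal sketch of a retry wrapper follows; the `fetch_html` helper and the `FakeResponse` class are hypothetical, and `get` is assumed to be any callable that behaves like `requests.get` (returning an object with `status_code` and `text`). A fake getter is used here so the example runs offline.

```python
import time

class FakeResponse:
    """Minimal stand-in for a requests response (illustration only)."""
    def __init__(self, status_code, text=''):
        self.status_code = status_code
        self.text = text

def fetch_html(get, url, retries=3, wait=1.0):
    """Return the page text, retrying when the status code is not 200."""
    for attempt in range(retries):
        response = get(url)
        if response.status_code == 200:
            return response.text
        if attempt < retries - 1:
            time.sleep(wait)  # pause before the next attempt
    raise RuntimeError(f'{url} still failing after {retries} attempts')

# Simulate one failed request followed by a successful one
replies = iter([FakeResponse(503), FakeResponse(200, '<html>ok</html>')])
html = fetch_html(lambda url: next(replies),
                  'https://www.indeed.com.sg/jobs?q=data+science&l=singapore&start=0',
                  wait=0)
print(html)
```

With the real library, the call would look like `fetch_html(requests.get, url)`.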


In [4]:
html = response.text

### Parsing into Beautiful Soup

In [5]:
soup = BeautifulSoup(html, 'html.parser')

In [6]:
print(soup.prettify())

<!DOCTYPE html>
<html dir="ltr" lang="en">
 <head>
  <meta content="text/html;charset=utf-8" http-equiv="content-type"/>
  <script src="//d3fw5vlhllyvee.cloudfront.net/s/292c549/en_SG.js" type="text/javascript">
  </script>
  <link href="//d3fw5vlhllyvee.cloudfront.net/s/970d98c/jobsearch_all.css" rel="stylesheet" type="text/css"/>
  <link href="http://www.indeed.com.sg/rss?q=data+science&amp;l=singapore" rel="alternate" title="Data Science Jobs, careers in Singapore" type="application/rss+xml"/>
  <link href="/m/jobs?q=data+science&amp;l=singapore" media="only screen and (max-width: 640px)" rel="alternate"/>
  <link href="/m/jobs?q=data+science&amp;l=singapore" media="handheld" rel="alternate"/>
  <script type="text/javascript">
   if (typeof window['closureReadyCallbacks'] == 'undefined') {
        window['closureReadyCallbacks'] = [];
    }

    function call_when_jsall_loaded(cb) {
        if (window['closureReady']) {
            cb();
        } else {
            window['closureR

In [9]:
soup.find('a',{'target':'_blank'})

<a class="jobtitle turnstileLink" data-tn-element="jobTitle" href="/pagead/clk?mo=r&amp;ad=-6NYlbfkN0Dv_aZmso27iWtM_bVWXES_0lLikeUoghrn2NiLI-rOnZfvU1s2PQaBSOSqOULZnveZBBNWkWO6BYy0kk6GwTP4Pi7YkkfITPtEmeKlWvXEiRQREpWiKZTmbeArJssR6wBLdyVJdaCjnERkh3sce2ieX7rxJOL_MSy7_NEG3LwRNQYL6HEhEzKMDrSKjpD1OrCqkZAJ8PxMGErShLrXf5K2XuGL-7nIXt_BZqK00LnVKexV5Rif5mEDcMgzAeSzJDBxSbutAdg1wFH9wmYCDBwsk4zvNIvz4C74yJRgAcIYP6jYt0q2-qr1fb39EdjNkTt0cN-1e3cJ-a7cX7TeT9UjXgIeSU-O5UZxq3Q7Y3Yf4FWAlHrvrQQyBXtneJewRVKWp6rI4h3JSNIeWp2Y9w2Cc_4Xmx6dRC4=&amp;vjs=3&amp;p=1&amp;sk=&amp;fvj=0" id="sja1" onclick="setRefineByCookie([]); sjoc('sja1',0); convCtr('SJ')" onmousedown="sjomd('sja1'); clk('sja1');" rel="noopener nofollow" target="_blank" title="Internship - Associate Researcher – Science, Engineering or IT fields">Internship - Associate Researcher – <b>Science</b>, Engineering or...</a>

In [None]:
joblist = []
for n in soup.findAll('a',{'data-tn-element':'jobTitle'}):
    joblist.append(n['title'])
joblist

In [None]:
joblinks = []
for n in soup.findAll('a',{'data-tn-element':'jobTitle'}):
    joblinks.append(n['href'])
joblinks
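
The two loops above can be combined so each title stays paired with its link. A minimal sketch on a hand-written HTML snippet that imitates the search-results markup (the snippet and its links are made up, not real scraped HTML); `urljoin` turns the partial `href` values into absolute URLs:

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hand-written snippet imitating the search-results markup (illustration only)
sample_html = '''
<a data-tn-element="jobTitle" href="/rc/clk?jk=abc123" title="Data Scientist">Data <b>Scientist</b></a>
<a data-tn-element="jobTitle" href="/company/Example/jobs/xyz" title="Data Analyst">Data Analyst</a>
<a href="/other">Not a job link</a>
'''
sample_soup = BeautifulSoup(sample_html, 'html.parser')

jobs = [
    {'title': a['title'], 'url': urljoin('https://www.indeed.com.sg', a['href'])}
    for a in sample_soup.find_all('a', {'data-tn-element': 'jobTitle'})
]
print(jobs)
```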

chromedriver = './chromedriver/chromedriver.exe'
os.environ["webdriver.chrome.driver"] = chromedriver

driver = webdriver.Chrome(chromedriver)
driver.get('https://www.indeed.com.sg/pagead/clk?mo=r&ad=-6NYlbfkN0Ap4a4VXGcDnXwDcGiByOShHL5ovIsFElHTxgiYVXc3tN1VmZVroO8UOtAuwIE8r1xk9hlVj6DxKhWWpD_lyuwa64ksAKZyvqgAQjwqjECL46cIXkaDgF3qYjmU38bTMJ5KldYal9c5MYJhgBLebRZBjD1ld2yQGbXWo6ugr7GpJOmCIE9HfJmvBdkR9Pb01kF3xtjifSoDVxW0eM3mgVbaCFrNFXCm3_1SXHwdBzYUen0UWtcbdMQzGdOkhzrCqn-nICH0Y6txwUraqn-ky_7_fPKoD9WcG8-s15HTCTl5oQd-ryUAIHzn2TY9jd6m4J9Qf3Bg4kwnKsm9_y0fFEo6UAXydH6EuoipZVz4dJ_5Qp015EWq9KtmQSQhRFpDD98J3uUB3sqoSRJ68A2HGQVKIPu8DI8NT1CIv3__9zKboYLdrSiovCo2iys7BDp90M1LmEPqtflTCmiohqgLsc1YbIux9KO1MrY=&vjs=3&p=1&sk=&fvj=0')

In [None]:
#testing
job4url ='https://www.indeed.com.sg/pagead/clk?mo=r&ad=-6NYlbfkN0Dv_aZmso27iWtM_bVWXES_0lLikeUoghrn2NiLI-rOnZfvU1s2PQaB5UnOWhvkxiXcI3Cyvq15orYiltOjS58zootaHsgxXkoFqVhQzSwJv8LZ4b0oOZHf8aQisO9IbYlDYxlEoHE8AY9XfxaAQ-f8EIvYRN2KgQ9I1xv8qURIV4lHESusVphmFsOY5OWrCD8mGSFLmZwultrffxtGj3hxVB_DQsSPURl9lFOb46HtGEq9iwWadKDgl9YKMaRKmSGK1zJUCGNs2nUNOLcl93veOeDZOnxgVKgSGyQ01GjMo7uG8t437Gm_Yl2ksd6bbhWATc0HBkJXTNiR4ZRlxSjLcC-ScSFTk7ALKjiRdxs5ZfhCQYbLVi7uEqscmQ2nMmFfdFvDvbEBJZk76cGrJG2M&vjs=3&p=2&sk=&fvj=0'
job4html = requests.get(job4url).text
job4soup = BeautifulSoup(job4html, 'html.parser')
job4soup.find('h3',{'class':'icl-u-xs-mb--xs icl-u-xs-mt--none jobsearch-JobInfoHeader-title'}).text

In [None]:
print(job4soup.prettify())

In [19]:
test = pd.DataFrame(columns=['one','two','three'])

In [20]:
test

Unnamed: 0,one,two,three


In [21]:
data = {'one':111, 'two':222, 'three':333}

In [22]:
data2 = {'one':121, 'two':232, 'three':343}

In [23]:
test = pd.concat([test, pd.DataFrame([data])], ignore_index=True)
test

Unnamed: 0,one,two,three
0,111,222,333


In [24]:
test = pd.concat([test, pd.DataFrame([data2])], ignore_index=True)
test

Unnamed: 0,one,two,three
0,111,222,333
1,121,232,343


In [4]:
df = pd.read_csv('./jobsdf.csv')

In [5]:
df = df.drop("Unnamed: 0", axis=1)

In [6]:
df

Unnamed: 0,job_title,company,location,company_rating,rating_count,salary,job_type,description,url
0,"Internship - Associate Researcher – Science, E...",Procter and Gamble,Singapore,4.2 out of 5,4776.0,Internship,Internship,InternshipKeen to work for one of the top 3 Em...,https://www.indeed.com.sg/pagead/clk?mo=r&ad=-...
1,Information Technology - Senior Data Scientist...,Procter and Gamble,Central Singapore,4.2 out of 5,4776.0,,,Data Scientists at P&G; creates algorithms / A...,https://www.indeed.com.sg/pagead/clk?mo=r&ad=-...
2,"Senior Consultant, Forensic Technology (Singap...","FTI Consulting, Inc.",Singapore,,,,,"Company Background\r\n\r\nFTI Consulting, Inc....",https://www.indeed.com.sg/pagead/clk?mo=r&ad=-...
3,College Intern - Data Science,HP,Singapore,4 out of 5,10791.0,Internship,Internship,InternshipHP is the world’s leading personal s...,https://www.indeed.com.sg/rc/clk?jk=ccbd058d8d...
4,"Intern, Data & Innovation (January - May 2019)",AXA,Singapore,3.8 out of 5,761.0,Internship,Internship,InternshipThe AXA Group is a worldwide leader ...,https://www.indeed.com.sg/rc/clk?jk=f290e7fac9...
5,"Machine Learning / Deep Learning Developer , S...",SAP,Singapore,4.3 out of 5,1786.0,,,Requisition ID: 191857\r\nWork Area: Software-...,https://www.indeed.com.sg/rc/clk?jk=6b1139220d...
6,Data Scientist,Apple,Singapore,4.2 out of 5,6305.0,,,"Summary\r\nPosted: Sep 27, 2018\r\nWeekly Hour...",https://www.indeed.com.sg/rc/clk?jk=9fa4596440...
7,College Intern – Data Analytics,HP,Singapore,4 out of 5,10791.0,Internship,Internship,InternshipHP is the world’s leading personal s...,https://www.indeed.com.sg/rc/clk?jk=3dcb2e7899...
8,Data Scientist,ADVANCE.AI,Singapore,,,,,ADVANCE.AI is a data-driven financial technolo...,https://www.indeed.com.sg/rc/clk?jk=f14476179a...
9,Data Analytics,JP Morgan Chase,Singapore,3.9 out of 5,21083.0,,,"As an experienced Data Analyst, your mission i...",https://www.indeed.com.sg/rc/clk?jk=5cfaf62f78...


In [7]:
df.loc[df.duplicated()]

Unnamed: 0,job_title,company,location,company_rating,rating_count,salary,job_type,description,url
18,Data Technologist,Rakuten Asia Pte Ltd,Singapore,,,"$5,000 - $8,000 a month",Permanent,"$5,000 - $8,000 a monthPermanentResponsibiliti...",https://www.indeed.com.sg/company/Rakuten-Asia...
23,College Intern - Data Analyst,HP,Singapore,4 out of 5,10791.0,Internship,Internship,InternshipHP is the world’s leading personal s...,https://www.indeed.com.sg/rc/clk?jk=474023d15a...
34,Data Scientist - Optimization - Singapore,Grab Taxi,Singapore,4.1 out of 5,101.0,,,Get to know our Team:\r\n\r\nGrab’s Data Scien...,https://www.indeed.com.sg/rc/clk?jk=e1f43fb1cb...
36,Data Scientist Intern,PropertyGuru,Singapore,,,Internship,Internship,"InternshipIn this intern program, you will be ...",https://www.indeed.com.sg/rc/clk?jk=87f96b1891...
48,Data Analyst,Dyson,Singapore,3.7 out of 5,231.0,,,We are currently seeking a Data Analyst to joi...,https://www.indeed.com.sg/rc/clk?jk=0b0b1980f9...
51,Data Management Specialist,Pulse Metrics,Singapore,,,,,The Role\r\nThe Data Engineer will be expected...,https://www.indeed.com.sg/rc/clk?jk=1e4d46bd6a...
79,Data Science Analyst,Accenture,Singapore,4 out of 5,14930.0,,,Build your career here\r\nDo you enjoy inspiri...,https://www.indeed.com.sg/rc/clk?jk=9555640c29...
80,"Software Engineer, University Graduate, 2019 S...",Google,Singapore,4.3 out of 5,2629.0,,,Google's software engineers develop the next-g...,https://www.indeed.com.sg/rc/clk?jk=3faadb5133...
92,"Internship - Associate Researcher – Science, E...",Procter & Gamble,Singapore,4.2 out of 5,4776.0,Internship,Internship,InternshipKeen to work for one of the top 3 Em...,https://www.indeed.com.sg/rc/clk?jk=f13174e8ce...
93,Data Scientist,Wego Pte Ltd,Singapore,,,,,As part of the Data Science and Analytics Team...,https://www.indeed.com.sg/rc/clk?jk=9d1ccab493...
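
`df.duplicated()` compares every column, so a posting that appears under two different URLs (for example a sponsored `/pagead/` link and an organic `/rc/` link) would not be flagged. A minimal sketch of deduplicating on a subset of columns, using made-up rows; which columns identify a posting is an assumption that should be checked against the real data:

```python
import pandas as pd

sample = pd.DataFrame({
    'job_title': ['Data Scientist', 'Data Scientist', 'Data Analyst'],
    'company': ['Example Pte Ltd', 'Example Pte Ltd', 'Example Pte Ltd'],
    'description': ['Build models.', 'Build models.', 'Write reports.'],
    'url': ['https://www.indeed.com.sg/pagead/clk?ad=1',
            'https://www.indeed.com.sg/rc/clk?jk=2',
            'https://www.indeed.com.sg/rc/clk?jk=3'],
})

# Full-row comparison: the differing URLs hide the repeated posting
print(sample.duplicated().sum())

# Compare only the columns assumed to identify a posting
deduped = sample.drop_duplicates(subset=['job_title', 'company', 'description'])
print(len(deduped))
```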


In [8]:
usefuldf = df.loc[df['salary'].notna()]

In [17]:
usefuldf.loc[usefuldf['job_type']=='Temporary, Contract, Internship, Permanent']

Unnamed: 0,job_title,company,location,company_rating,rating_count,salary,job_type,description,url
157,Data Analyst,Diversiteam,Singapore,,,"Temporary, Contract, Internship, Permanent","Temporary, Contract, Internship, Permanent","Temporary, Contract, Internship, PermanentYou ...",https://www.indeed.com.sg/company/Diversiteam/...
535,AI / ML Engineer,Diversiteam,Singapore,,,"Temporary, Contract, Internship, Permanent","Temporary, Contract, Internship, Permanent","Temporary, Contract, Internship, PermanentYou ...",https://www.indeed.com.sg/company/Diversiteam/...
732,Full-Stack Developer,Diversiteam,Singapore,,,"Temporary, Contract, Internship, Permanent","Temporary, Contract, Internship, Permanent","Temporary, Contract, Internship, PermanentYou ...",https://www.indeed.com.sg/company/Diversiteam/...
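
The `description` column carries the same metadata text fused onto its start (for example `Temporary, Contract, Internship, PermanentYou ...`). A minimal sketch of stripping it, assuming the fused prefix is always the `salary` text followed by the `job_type` text as the rows shown suggest (the `clean_description` helper is hypothetical):

```python
def clean_description(description, salary, job_type):
    """Remove salary / job-type text fused onto the start of a description."""
    for prefix in (salary, job_type):
        # Missing values are NaN (a float), so only string prefixes are stripped
        if isinstance(prefix, str) and description.startswith(prefix):
            description = description[len(prefix):]
    return description

print(clean_description('$5,000 - $8,000 a monthPermanentResponsibilities',
                        '$5,000 - $8,000 a month', 'Permanent'))
print(clean_description('InternshipKeen to work', 'Internship', 'Internship'))
```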


In [20]:
usefuldf.loc[535,'description']

"Temporary, Contract, Internship, PermanentYou are an ideal candidate if you have aspiration in implementing Machine Learning algorithms in big data environment and developing effective solutions for data pipeline. You’ll decipher the pattern of a large amount of data and take proof of concepts into production. You are the motor for tomorrow’s state-of-the-art technology.Job ResponsibilitiesBuilding and maintaining Machine Learning architectureAdapting standard AI/Machine Learning models to exploit modern parallel efficientlyImplementing AI/Machine Learning algorithms in a big data environmentData modelling and evaluationQualificationsBachelors in Computer Science, Statistics, Biostatistics or equivalentFamiliarity in writing code with Ruby, Python, C/C++, Go, Java, ScalaProficient knowledge of deep learning frameworks such as Tensorflow, scikit-learn, AzureInterests in Artificial Intelligence, Machine Learning, Virtual Reality, Augmented RealityStrong problem solving and analytical sk

In [9]:
#Drop rows where 'salary' actually holds a job type rather than a pay figure
finaldf = usefuldf.drop(usefuldf.index[usefuldf['salary'].isin(['Permanent','Internship','Temporary','Contract','Temporary, Internship', 'Part-time','Contract, Permanent','Temporary, Contract, Internship, Permanent'])])

In [14]:
finaldf = finaldf.drop_duplicates()

In [15]:
finaldf.shape

(30, 9)

In [16]:
finaldf.loc[finaldf['job_type']=='Contract, Permanent']

Unnamed: 0,job_title,company,location,company_rating,rating_count,salary,job_type,description,url
413,Network Engineer (ACI and Data Center),MTS Global Pte Ltd,Singapore,,,"$6,000 - $9,000 a month","Contract, Permanent","$6,000 - $9,000 a monthContract, PermanentThe ...",https://www.indeed.com.sg/company/MTS-Global-P...
876,VMWare Consultant,Ark Virtualization Pte Ltd,Singapore,,,"$3,500 - $6,000 a month","Contract, Permanent","$3,500 - $6,000 a monthContract, PermanentJob ...",https://www.indeed.com.sg/company/Ark-Virtuali...
