This is a basic web scraping script designed for personal use.

It takes three user inputs (keywords, an exact phrase and a location) to filter job search results.
Job details are then scraped automatically from every page of the Indeed search results.
A CSV file containing the extracted data is saved locally, with the following column headers:

- Job Title, Company, Location, Salary, Summary, Post Date, Easy Apply and Page URL.

Please note that this was built with only job queries in Canada in mind.
There is no error handling for the user input, as I built this for personal use.

In [1]:
import requests #needed in this cell for requests.utils.quote

#Build the search url from the user input; start is the optional result offset used for paging
def build_base_url(keywords, phrase, location, start=None):

    keywords=requests.utils.quote(keywords)
    phrase=requests.utils.quote(phrase)
    location=requests.utils.quote(location)
    
    url = "https://ca.indeed.com/jobs?q={}&as_phr={}&l={}".format(keywords,phrase,location)
    if start is not None:
        url+="&start={}".format(start)
    return url
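
As a quick check of the url format, here is a minimal sketch of the same function. It uses `urllib.parse.quote` from the standard library instead of `requests.utils.quote` (as far as I know the latter is the same function re-exported, but treat that as an assumption), so it runs without `requests` installed. The sample query values are just for illustration.

```python
from urllib.parse import quote

def build_base_url(keywords, phrase, location, start=None):
    # Percent-encode each piece of user input before placing it in the query string
    url = "https://ca.indeed.com/jobs?q={}&as_phr={}&l={}".format(
        quote(keywords), quote(phrase), quote(location))
    # The start offset is only appended for pages after the first one
    if start is not None:
        url += "&start={}".format(start)
    return url

first = build_base_url("data scientist", "", "vancouver")
second = build_base_url("data scientist", "", "vancouver", start=10)
print(first)
print(second)
```

The space in the keywords becomes `%20`, and an empty phrase simply leaves `as_phr=` blank.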

In [2]:
#Convert the Post Date text to an integer number of days for easy sorting later on

def get_days_ago(date):
    if date in ["Today","Just posted"]:
        return 0
    else:
        #Assumes the text starts with the day count, e.g. "3 days ago" or "30+ days ago"
        return int(date[:2])
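
The slice above depends on the day count being the first two characters of the text. A possible, more tolerant variant is sketched below. The name `get_days_ago_safe` is hypothetical, and the extra input formats it handles are assumptions, not formats I have confirmed on Indeed.

```python
import re

def get_days_ago_safe(date):
    """Return the day count found in a post-date string, 0 for same-day posts,
    or None when no number can be found."""
    text = date.strip()
    if text in ("Today", "Just posted"):
        return 0
    # Take the first run of digits wherever it appears in the text
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

print(get_days_ago_safe("3 days ago"))
print(get_days_ago_safe("30+ days ago"))
print(get_days_ago_safe("Just posted"))
```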

In [3]:
#Extracts job details from the html tags of one job card and returns a dictionary
def parse_job_details(item):
    jobdetails={}
    jobdetails["Job Title"]=item.find("div",{"class":"title"}).text.replace("\n","")
    jobdetails["Company"]=item.find("span",{"class":"company"}).text.replace("\n","")
    #Optional fields: find() returns None when the tag is missing, so .text raises AttributeError
    try:
        jobdetails["Location"]=item.find("span",{"class":"location accessible-contrast-color-location"}).text
    except AttributeError:
        jobdetails["Location"]=""
    try:
        jobdetails["Salary"]=item.find("span",{"class":"salaryText"}).text.replace("\n","")
    except AttributeError:
        jobdetails["Salary"]=""
    jobdetails["Summary"]=item.find("div",{"class":"summary"}).text.replace("\n","")
    jobdetails["Post Date"]=get_days_ago(item.find("span",{"class":"date"}).text.replace("\n",""))
    try:
        jobdetails["Easy Apply"]=item.find("span",{"class":"iaLabel"}).text.replace("\n","")
    except AttributeError:
        jobdetails["Easy Apply"]=""
    #Links in the search results are relative, so the site root is prepended
    jobdetails["Page URL"]="https://ca.indeed.com"+item.find("a").get('href')
    return jobdetails
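
The repeated try/except blocks for the optional fields could be folded into one small helper. The sketch below is only one way to do it. `safe_text` is a hypothetical name, and it relies only on the object having a `find(tag, attrs)` method that returns either `None` or something with a `.text` attribute, which is how a BeautifulSoup tag behaves.

```python
def safe_text(item, tag, css_class, default=""):
    """Return the text of the first matching tag with newlines removed,
    or default when the tag is not present."""
    node = item.find(tag, {"class": css_class})
    if node is None:
        return default
    return node.text.replace("\n", "")
```

Inside `parse_job_details` an optional field would then read, for example, `jobdetails["Salary"]=safe_text(item,"span","salaryText")`.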

In [4]:
import requests
from bs4 import BeautifulSoup

#Get user input
keywords=input("Enter job title, keywords, or company : ")
phrase=input("With the exact phrase : ")
location=input("Enter city or province in Canada: ")
first_page=build_base_url(keywords,phrase,location)

#Checking the first page of the search results
loadpage= requests.get(first_page,headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
pagecon = loadpage.content
soup=BeautifulSoup(pagecon,"html.parser")

#Find out how many pages there are in the search results
page=soup.find_all("span",{"class":"pn"})
#Fall back to one page when no pagination links are found (results fit on a single page)
lastpage=max(len(page),1)

#For each page in Indeed.com search results, the value after "&start=" is incremented by 10
increment=10

Enter job title, keywords, or company : data scientist
With the exact phrase : 
Enter city or province in Canada: vancouver
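
The paging arithmetic can be checked on its own before any requests are sent. This is a small sketch: `start_offsets` is a hypothetical helper, and the page count of 3 is made up for the example.

```python
def start_offsets(page_count, increment=10):
    """Return the value to place after "&start=" for each results page."""
    return [page * increment for page in range(page_count)]

offsets = start_offsets(3)
print(offsets)  # [0, 10, 20]
```

Each offset is what gets passed as the `start` argument of `build_base_url`.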


In [5]:
results=[] #main table where all job details are saved

for page in range(0,lastpage):
    url = build_base_url(keywords,phrase,location,page*increment)

    #Request info for each page in the search results
    callpage=requests.get(url,headers={'User-agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0'})
    savepage=callpage.content
    soup=BeautifulSoup(savepage,"html.parser")

    #Defines the web scraping scope: one div per job card on the page
    #(named job_cards so that the built-in all() is not shadowed)
    job_cards=soup.find_all("div",{"class":"jobsearch-SerpJobCard unifiedRow row result"})

    #Process each job card on the page
    for item in job_cards:
        indeed = parse_job_details(item) #returns a dictionary
        results.append(indeed) #job details are saved as a row in the list
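
The same posting can show up on more than one results page (the sample output further down contains repeated rows). One possible way to drop the repeats before building the table is sketched below. `drop_duplicate_jobs` and its choice of key columns are my own assumptions, and the sample rows are made up.

```python
def drop_duplicate_jobs(rows, keys=("Job Title", "Company", "Summary")):
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(row.get(key, "") for key in keys)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

# Made-up rows: the first two differ only in their Page URL
sample = [
    {"Job Title": "Data Scientist", "Company": "A", "Summary": "x", "Page URL": "u1"},
    {"Job Title": "Data Scientist", "Company": "A", "Summary": "x", "Page URL": "u2"},
    {"Job Title": "Lead Data Scientist", "Company": "B", "Summary": "y", "Page URL": "u3"},
]
deduped = drop_duplicate_jobs(sample)
print(len(deduped))
```

Page URL is left out of the key on purpose, since the repeated rows in the sample output carry different links.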


In [6]:
#Converts the list of dictionaries to a dataframe, sorted by newest posting first
import pandas
indeed_df=pandas.DataFrame(results)
indeed_df.sort_values(by=['Post Date'], inplace=True,ascending=True)


#Displays the dataframe (showing all job details in a table format)
indeed_df

Unnamed: 0,Job Title,Company,Location,Salary,Summary,Post Date,Easy Apply,Page URL
0,Data Scientist - RACE21,Teck Resources Limited,,,"Reporting to the Manager of Technology, the Da...",1,,https://ca.indeed.com/cmp/Teck-Resources-Limited
29,Data Scientist - RACE21,Teck Resources Limited,,,"Reporting to the Manager of Technology, the Da...",1,,https://ca.indeed.com/pagead/clk?mo=r&ad=-6NYl...
44,Data Scientist - RACE21,Teck Resources Limited,,,"Reporting to the Manager of Technology, the Da...",1,,https://ca.indeed.com/pagead/clk?mo=r&ad=-6NYl...
69,Lead Data Scientist,Global Relay,"Vancouver, BC",,You will be applying a wide variety of models ...,3,,https://ca.indeed.com/rc/clk?jk=83529dd1c84e54...
49,Senior Data Scientist,Global Relay,"Vancouver, BC",,As a Senior Data Scientist in the Analytics gr...,3,,https://ca.indeed.com/rc/clk?jk=d21e69dcac32c8...
7,Senior Data Scientist,Global Relay,"Vancouver, BC",,As a Senior Data Scientist in the Analytics gr...,3,,https://ca.indeed.com/rc/clk?jk=d21e69dcac32c8...
52,Lead Data Scientist,Global Relay,"Vancouver, BC",,You will be applying a wide variety of models ...,3,,https://ca.indeed.com/rc/clk?jk=83529dd1c84e54...
66,Senior Data Scientist,Global Relay,"Vancouver, BC",,As a Senior Data Scientist in the Analytics gr...,3,,https://ca.indeed.com/rc/clk?jk=d21e69dcac32c8...
10,Lead Data Scientist,Global Relay,"Vancouver, BC",,You will be applying a wide variety of models ...,3,,https://ca.indeed.com/rc/clk?jk=83529dd1c84e54...
3,Data Scientist,Asana,"Vancouver, BC",,You will design and use data analysis tools th...,4,Easily apply,https://ca.indeed.com/rc/clk?jk=2ac74ddbfe1f75...


In [7]:
#Saves the table locally in csv format as "Indeed.csv"
indeed_df.to_csv("Indeed.csv")
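
For comparison, the same list of dictionaries could also be written without pandas. The sketch below uses `csv.DictWriter` from the standard library, which is a swapped-in technique and not what this notebook does. `rows_to_csv` is a hypothetical helper, the sample row is made up, and an in-memory buffer stands in for the file.

```python
import csv
import io

COLUMNS = ["Job Title", "Company", "Location", "Salary",
           "Summary", "Post Date", "Easy Apply", "Page URL"]

def rows_to_csv(rows, stream):
    """Write the job dictionaries to stream with a fixed column order."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

# Made-up row, for illustration only
sample = [{
    "Job Title": "Data Scientist", "Company": "Example Co",
    "Location": "Vancouver, BC", "Salary": "", "Summary": "Made-up row",
    "Post Date": 1, "Easy Apply": "", "Page URL": "https://ca.indeed.com/",
}]
buffer = io.StringIO()
rows_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk, the buffer can be swapped for `open("Indeed.csv", "w", newline="")`.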