# Web Scraping - Indeed.com
General steps for Web Scraping
1. Check whether the website allows web scraping
2. Obtain the source code (HTML file) by using the website URL
3. Download the website content
4. Parse the content using the tags and class names of the elements of interest
5. Extract relevant data/features
6. Organize raw data in structured format (e.g., CSV)

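Step 1 can be checked programmatically. The sketch below uses the standard library's `urllib.robotparser` to test URLs against a set of robots.txt rules. The rules are a made-up sample for illustration, not the robots.txt of any real site, and `is_allowed` is a hypothetical helper name. For a live site, one would call `set_url(...)` and `read()` on the parser instead of `parse(...)`. robots.txt is only one signal; a site's terms of service may impose further limits.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for illustration only (not taken from a real site)
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /jobs
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/jobs?q=data+analyst"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/report"))
```
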
### Import Dependencies 

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.common.by import By

### Path to webdriver (Firefox, Chrome) 

In [2]:
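# Ensure that the driver path is correct before running this script.
# Microsoft Windows
# driver_path = "./drivers/windows/geckodriver.exe"
# Linux
driver_path = r"./drivers/geckodriver"
# Recent Selenium 4 releases no longer accept `executable_path`;
# pass the path through a Service object instead, for example:
# from selenium.webdriver.firefox.service import Service
# driver = webdriver.Firefox(service=Service(driver_path))
# With no arguments, Selenium 4.6+ locates a matching driver on its own.
driver = webdriver.Chrome()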

### Define position and location 

In [3]:
## Enter a job position
position = "data analyst"
## Enter a location (City, State or Zip or remote)
locations = "Canada"

def get_url(position, location):
    url_template = "https://ca.indeed.com/jobs?q={}&l={}"
    url = url_template.format(position, location)
    return url

url = get_url(position, locations)
dataframe = pd.DataFrame(columns=["Title", "Company", "Location", "Rating", "Date", "Salary", "Description", "Links"])

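`get_url` inserts the raw strings into the template, so a value such as "data analyst" puts a literal space in the URL. Browsers usually tolerate that, but encoding the query explicitly is safer. The following is a minimal sketch using `urllib.parse.urlencode`; `build_search_url`, `BASE_URL`, and the optional `start` argument are hypothetical names introduced for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://ca.indeed.com/jobs"


def build_search_url(position, location, start=0):
    """Build a search URL with percent-encoded query parameters."""
    params = {"q": position, "l": location}
    if start:
        # Offset of the first posting on the requested results page
        params["start"] = start
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("data analyst", "Canada"))
# https://ca.indeed.com/jobs?q=data+analyst&l=Canada
print(build_search_url("data analyst", "Toronto, ON", start=10))
# https://ca.indeed.com/jobs?q=data+analyst&l=Toronto%2C+ON&start=10
```
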
### Scrape job postings

In [4]:
## Number of postings to scrape
postings = 300

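# Running count of scraped postings
jn = 0
# Step through the result pages 10 postings at a time via the "start" parameter.
# The range must begin below `postings`, otherwise the loop never runs.
for i in range(0, postings, 10):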
    driver.get(url + "&start=" + str(i))
    driver.implicitly_wait(3)

    jobs = driver.find_elements(By.CLASS_NAME, 'job_seen_beacon')

    for job in jobs:
        result_html = job.get_attribute('innerHTML')
        soup = BeautifulSoup(result_html, 'html.parser')
        
        jn += 1
        
        liens = job.find_elements(By.TAG_NAME, "a")
        links = liens[0].get_attribute("href")
        
        title = soup.select('.jobTitle')[0].get_text().strip()
        company = soup.select('.companyName')[0].get_text().strip()
        location = soup.select('.companyLocation')[0].get_text().strip()
        # Optional fields: select() returns an empty list when the element
        # is missing, so indexing [0] raises IndexError.
        try:
            salary = soup.select('.salary-snippet-container')[0].get_text().strip()
        except IndexError:
            salary = 'NaN'
        try:
            rating = soup.select('.ratingNumber')[0].get_text().strip()
        except IndexError:
            rating = 'NaN'
        try:
            date = soup.select('.date')[0].get_text().strip()
        except IndexError:
            date = 'NaN'
        try:
            description = soup.select('.job-snippet')[0].get_text().strip()
        except IndexError:
            description = ''
       
        dataframe = pd.concat([dataframe, pd.DataFrame([{'Title': title,
                                          "Company": company,
                                          'Location': location,
                                          'Rating': rating,
                                          'Date': date,
                                          "Salary": salary,
                                          "Description": description,
                                          "Links": links}])], ignore_index=True)
        print("Job number {0:4d} added - {1:s}".format(jn,title))

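The repeated lookups for optional fields in the loop can be folded into one helper. This is a sketch only: `select_text` is a hypothetical function, and `sample_card` is hand-written markup that reuses the class names from the loop rather than real Indeed HTML. It relies on BeautifulSoup's `select_one`, which returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup


def select_text(soup, selector, default="NaN"):
    """Return the stripped text of the first match for `selector`, or `default`."""
    node = soup.select_one(selector)
    return node.get_text().strip() if node is not None else default


# Hand-written sample card (an assumption for illustration, not scraped data)
sample_card = """
<div class="job_seen_beacon">
  <h2 class="jobTitle"><a href="/rc/clk?jk=abc123">Data Analyst</a></h2>
  <span class="companyName">Example Corp</span>
  <div class="companyLocation">Toronto, ON</div>
  <div class="job-snippet"> Build dashboards and reports. </div>
</div>
"""

soup = BeautifulSoup(sample_card, "html.parser")
row = {
    "Title": select_text(soup, ".jobTitle"),
    "Company": select_text(soup, ".companyName"),
    "Location": select_text(soup, ".companyLocation"),
    "Salary": select_text(soup, ".salary-snippet-container"),  # missing in the sample
    "Description": select_text(soup, ".job-snippet", default=""),
}
print(row)
```
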
In [5]:
driver.quit()

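The scraping loop grows `dataframe` with one `pd.concat` call per posting, which copies the whole frame each time. An alternative, sketched below under the assumption that all rows fit in memory, is to collect plain dictionaries in a list and build the DataFrame once at the end. The records in `scraped` are placeholders, and `rows` and `jobs_df` are hypothetical names.

```python
import pandas as pd

COLUMNS = ["Title", "Company", "Location", "Rating", "Date",
           "Salary", "Description", "Links"]

# Placeholder records standing in for the values extracted from each job card
scraped = [
    {"Title": "Data Analyst", "Company": "Example Corp",
     "Location": "Toronto, ON", "Links": "https://example.com/job/1"},
    {"Title": "Business Analyst", "Company": "Sample Inc",
     "Location": "Remote", "Salary": "$30 an hour",
     "Links": "https://example.com/job/2"},
]

rows = []
for record in scraped:
    # Inside the real loop this would be the dict built for one posting
    rows.append(record)

# Build the frame once; keys absent from a record become NaN
jobs_df = pd.DataFrame(rows, columns=COLUMNS)
print(jobs_df.shape)
```
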
### Scrape full job descriptions

In [163]:
Links_list = dataframe['Links'].tolist()
#Links_list

In [164]:
import random
import time

In [167]:
driver = webdriver.Chrome()
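descriptions = []
for i in Links_list:
    driver.get(i)
    # implicitly_wait sets the element-lookup timeout; it does not pause the script
    driver.implicitly_wait(random.randint(3, 8))
    try:
        jd = driver.find_element(By.XPATH, '//div[@id="jobDescriptionText"]').text
        descriptions.append(jd)
    except Exception:
        # Keep the list aligned with Links_list when no description is found
        descriptions.append('n/a')
    # Random pause between requests to avoid hammering the site
    time.sleep(random.randint(5, 10))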

dataframe['Descriptions'] = descriptions

In [168]:
driver.quit()


### Save results

In [169]:
# Convert the dataframe to a csv file
date = datetime.today().strftime('%Y-%m-%d')
dataframe.to_csv(date + "_" + position + "_" + locations + ".csv", index=False)

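Because `position` may contain spaces, the generated file name does too. A small sketch of one way to normalise it follows; `build_csv_name` is a hypothetical helper, and the date passed in is an arbitrary example value.

```python
from datetime import date


def build_csv_name(position, location, day):
    """Join date, position and location into a file name without spaces."""
    parts = [day.strftime("%Y-%m-%d"), position, location]
    return "_".join(part.strip().replace(" ", "-") for part in parts) + ".csv"


print(build_csv_name("data analyst", "Canada", date(2023, 1, 15)))
# 2023-01-15_data-analyst_Canada.csv
```
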
In [170]:
dataframe

|  | Title | Company | Location | Rating | Date | Salary | Description | Links | Descriptions |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Bilingual Support Analyst | Kelly Services (Canada) | Remote | 3.8 | PostedPosted 1 day ago | $27 an hour | Strong ability to analyze data and communicate... | https://ca.indeed.com/pagead/clk?mo=r&ad=-6NYl... | Kelly's customer, located in Markham, ON, is a... |
| 1 | Business Analyst- Accounting | Black Diamond Group | Calgary, AB | 3.3 | EmployerActive 2 days ago | $22.34–$70.01 an hour | Design test plans/scripts to confirm reporting... | https://ca.indeed.com/pagead/clk?mo=r&ad=-6NYl... | Black Diamond rents and sells modular space so... |
| 2 | SAP CRM Functional Analyst | Canadian Pacific | Calgary, AB | 2.9 | PostedPosted 30+ days ago |  | Perform data analysis to understand how data i... | https://ca.indeed.com/pagead/clk?mo=r&ad=-6NYl... | Req ID: 100398\nDepartment: Information Servic... |
| 3 | Data analyst student | SNC-Lavalin | Toronto, ON | 4.0 | PostedPosted 22 days ago |  | To… | https://ca.indeed.com/rc/clk?jk=1f2318c2acc741... | To provide |
| 4 | REMOTE Project-based roles - Data Analyst | Deloitte | Remote in Toronto, ON | 3.9 | PostedPosted 16 days ago |  | In addition, for this particular data analyst ... | https://ca.indeed.com/rc/clk?jk=ad5ed76a4a1a89... | Job Type: Independent Contractor\nReference co... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 295 | Business Analyst (PL432) | Paralucent | Remote |  | PostedPosted 27 days ago |  | Strong problem solving, critical thinking, and... | https://ca.indeed.com/rc/clk?jk=d6d6bf9ce46a0a... | We are seeking a Business Analyst for our clie... |
| 296 | Workforce Analyst - 9 Month Contract | 407 ETR | Woodbridge, ON | 3.7 | PostedPosted 2 days ago |  | 1-2 years of data analysis experience preferre... | https://ca.indeed.com/rc/clk?jk=bc5dde23545fb4... | Position Summary:\nAs a successful Workforce A... |
| 297 | Senior Business Analyst - Contract | KPMG | Toronto, ON | 3.9 | PostedPosted 23 days ago |  | Perform research, data gathering and analysis.... | https://ca.indeed.com/rc/clk?jk=1fbf6879d6f6e5... | Overview:\nKPMG is an industry leading firm th... |
| 298 | Digital Marketing Analyst | CARFAX | Toronto, ON |  | PostedPosted 30+ days ago |  | Wrangle data from online and offline data sour... | https://ca.indeed.com/pagead/clk?mo=r&ad=-6NYl... | Join Team CARFAX as a Digital Marketing Analys... |

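The Date values in the output carry a fused leading label, as in "PostedPosted 1 day ago" and "EmployerActive 2 days ago". The sketch below shows one possible cleanup. It assumes the first capitalised word is a label that was concatenated onto the visible text during extraction; the source does not confirm that. `clean_date` and `days_ago` are hypothetical helpers.

```python
import re

# Assumption: a leading "Posted" or "Employer" directly followed by another
# capitalised word is a fused label and can be dropped.
_FUSED_LABEL = re.compile(r"^(?:Posted|Employer)(?=[A-Z])")


def clean_date(raw):
    """Remove the fused leading label from a scraped date string."""
    return _FUSED_LABEL.sub("", raw.strip())


def days_ago(raw):
    """Return the day count as an int, or None if the string has no count.

    A value such as "30+ days ago" yields 30, which is a lower bound.
    """
    match = re.search(r"(\d+)\+?\s+days?\s+ago", raw)
    return int(match.group(1)) if match else None


for value in ["PostedPosted 1 day ago", "EmployerActive 2 days ago",
              "PostedPosted 30+ days ago"]:
    print(clean_date(value), "|", days_ago(value))
```
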