# Automating Job Search on LinkedIn with Python and Selenium

Job hunting is time-consuming. Automating the search lets you collect relevant listings quickly without leaving your Python environment.

In this guide, we'll walk through automating LinkedIn job searches step by step. Using Selenium, a popular browser automation tool, together with Python, you'll search for jobs matching a role you specify and save the results for later analysis.

![](https://scrapfly.io/blog/content/images/web-scraping-with-selenium-and-python_banner_light.svg)

## Project Workflow:

**1. Setting Up Your Environment:**

- <u>Package Installation</u>: Install necessary packages such as Selenium and pandas using pip.

- <u>Web Driver Configuration</u>: Configure the appropriate web driver (in this case, Chrome) for Selenium.

**2. User Authentication:**

- <u>LinkedIn Login</u>: Automate the login process using your LinkedIn username and password securely.

- <u>Hidden Input</u>: Use the getpass library to enter your LinkedIn password without echoing it to the screen. Note that getpass hides the input; it does not encrypt it.

**3. Job Search and Navigation:**

- <u>Navigating to Jobs</u>: Access the LinkedIn jobs page and input your desired job domain or role.

- <u>Pagination Handling</u>: Navigate through multiple pages of job listings, automatically scrolling and collecting data from each page.

**4. Data Extraction and Storage:**

- <u>Extracting Job Data</u>: Collect job titles, locations, companies, and corresponding URLs from the job listings.

- <u>Data Structuring</u>: Organize the extracted data into lists or dictionaries for easy manipulation and storage.

- <u>Error Handling</u>: Implement error handling to manage situations where specific elements might not be present on the page.

**5. Exporting Data:**

- <u>Creating a DataFrame</u>: Utilize the Pandas library to create a DataFrame from the extracted data.

- <u>CSV Export</u>: Export the DataFrame to a CSV file, making it accessible for further analysis in tools like Excel or data science libraries in Python.

---
## <center>Let's Begin!!</center>
---

- Install the necessary libraries.

In [20]:
# !pip install selenium
# !pip install pandas

- Import the necessary modules.

In [21]:
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
# from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.actions.wheel_input import ScrollOrigin
import pandas as pd
import os
import time
import getpass

- Initialize Selenium Webdriver.

In [22]:
def init_driver():
    
#     options = Options()
#     options.add_argument("--headless=new")
#     driver = webdriver.Chrome(options=options)
    driver = webdriver.Chrome()
    driver.maximize_window()
    return driver

- Navigate to LinkedIn.

In [23]:
def navigate_to_linkedin(driver):
    
    driver.get("https://www.linkedin.com/feed/")
    time.sleep(1)

- Sign in to LinkedIn.
    - The **sign_in()** function prompts for your LinkedIn username and password, then clicks the sign-in button to authenticate and proceed to the next step.
    
    - ***Note:** If you run the code several times, you may have to complete a CAPTCHA manually before continuing. Selenium will take care of the rest of the process.*

In [24]:
def sign_in(driver):
    
    # Use a distinct name so the local variable does not shadow sign_in() itself
    sign_in_link = driver.find_element(By.XPATH, '/html/body/div[1]/main/div/p/a')
    sign_in_link.click()
    
    uname = driver.find_element(By.ID, "username")
    username = getpass.getpass("Enter your LinkedIn username: ")
    uname.send_keys(username)
    
    pword = driver.find_element(By.ID, "password")
    p = getpass.getpass("Enter your LinkedIn password: ")
    pword.send_keys(p)
    
    final_sign_in = driver.find_element(By.XPATH, '//*[@id="organic-div"]/form/div[3]/button')
    final_sign_in.click()
    
    driver.implicitly_wait(15)
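
Typing credentials on every run gets tedious when you re-run the notebook. As a minimal sketch, the prompts could fall back to `getpass` only when no value is present in the environment. The variable names `LINKEDIN_USERNAME` and `LINKEDIN_PASSWORD` and the helper `read_credential` are assumptions made for illustration; they are not part of the original code.

```python
import getpass
import os


def read_credential(env_var, prompt, secret=True):
    """Return a credential from the environment, or prompt for it.

    env_var is a hypothetical variable name chosen for this sketch.
    """
    value = os.environ.get(env_var)
    if value:
        return value
    # getpass hides what is typed; input() echoes it
    ask = getpass.getpass if secret else input
    return ask(prompt)
```

Inside `sign_in()`, the two prompts could then become `read_credential("LINKEDIN_USERNAME", "Enter your LinkedIn username: ", secret=False)` and `read_credential("LINKEDIN_PASSWORD", "Enter your LinkedIn password: ")`.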

- Perform Job Search.
    - The **perform_job_search()** function asks for the job you're looking for and navigates to the 'Jobs' section of the search results.
    - First, we locate the search box by its XPath, type the job query entered by the user, and press Enter.
    - Once the page loads, we find the 'Jobs' button by its XPath and click it.

In [25]:
def perform_job_search(driver):
    
    search_box = driver.find_element(By.XPATH, '//*[@id="global-nav-typeahead"]/input')
    job_search_domain = input("\nPlease specify the job domain/job role that you are searching for: \n")
    search_box.send_keys(job_search_domain)
    search_box.send_keys(Keys.ENTER)
    
    time.sleep(5)
    
    jobs_button = driver.find_element(By.XPATH, '//button[text()="Jobs"]')
    jobs_button.click()
    time.sleep(5)
    
    return job_search_domain
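
The fixed `time.sleep(5)` calls either waste time on a fast connection or fail on a slow one. One alternative is to poll until the element shows up. Selenium ships `WebDriverWait` in `selenium.webdriver.support.ui` for this purpose; the sketch below shows the same idea in plain Python so that it runs without a browser. `wait_until` and its parameters are hypothetical names introduced here.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Possible use inside perform_job_search(), replacing the first time.sleep(5).
# find_elements returns an empty (falsy) list until the button exists:
#
# buttons = wait_until(
#     lambda: driver.find_elements(By.XPATH, '//button[text()="Jobs"]')
# )
# buttons[0].click()
```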

- Scrape Job Listings.
    - The **scrape_jobs()** function checks if there are any jobs that match the user's criteria. If there are no matching jobs, it tells the user there are none.
    - If jobs are available, it scrolls to the end of the page so that every listing on that page is loaded, extracts the required information, and moves on to the next page if there is one.
    - We locate the required tags by XPath, class name, link text, or CSS selector.

In [26]:
def scrape_jobs(driver):

    # Now, we are on the jobs page
    # Each page contains 25 jobs
    # Thus, we want to visit page number 1,2,3,4
    # to get the top 100 jobs
    

    # If we find 'No matching jobs found.' after job search, we print the same
    # If we do not find 'No matching jobs found.', selenium will throw
    # 'NoSuchElementException' Exception
    # Thus, we use exception handling for scraping relevant information
    try:
        no_matching_jobs = driver.find_element(By.XPATH, '/html/body/div[5]/div[3]/div[4]/div/div[1]/div/h1')
        print('\nNo matching jobs found.')
        # Return empty lists so the caller can still unpack four values
        return [], [], [], []
    except NoSuchElementException:

        job_titles_list = []
        job_links_list = []
        job_locations_list = []
        company_names_list = []
        page_number = 2

        print('\n---> Job search begins....\n')

        for _ in range(4):  # visit up to 4 pages of 25 jobs each

            print('Please wait, searching for jobs....')
            
            # The tag stored in footer variable helps us
            # scroll at the bottom of the page
            footer = driver.find_element(By.XPATH, '//*[@id="main"]/div/div[1]/div')
            scroll_origin = ScrollOrigin.from_element(footer)
            for i in range(6):
                ActionChains(driver).scroll_from_origin(scroll_origin, 0, 550).perform()
                time.sleep(1)

            job_titles = driver.find_elements(By.CLASS_NAME, 'full-width.artdeco-entity-lockup__title.ember-view')
            # job_titles_list = []
            for job_title in job_titles:
                job_titles_list.append(job_title.text)

            # We extract the URL of a job posting
            # By iterating over all the tags in 'job_titles[]' list
            # And extracting the link associated with each job title

            # job_links_list = []
            for job_title in job_titles:
                job_links_list.append(job_title.find_element(By.LINK_TEXT, job_title.text).get_attribute("href"))        

            job_locations = driver.find_elements(By.CLASS_NAME, 'artdeco-entity-lockup__caption')
            # job_locations_list = []
            for job_location in job_locations:
                job_locations_list.append(job_location.text)        

            company_names = driver.find_elements(By.CLASS_NAME, 'job-card-container__primary-description')
            # company_names_list = []
            for company_name in company_names:
                company_names_list.append(company_name.text)

            try:
                page_button = driver.find_element(By.CSS_SELECTOR, f'li[data-test-pagination-page-btn="{page_number}"]').find_element(By.TAG_NAME, 'button')
                page_button.click()
                page_number += 1
                time.sleep(2)
            except NoSuchElementException:
                break
        
        return job_titles_list, job_locations_list, company_names_list, job_links_list
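
Because titles, locations, and company names are collected by separate page-wide queries, the lists can end up with different lengths (for example, when one card lacks a company element), and `pd.DataFrame` raises a `ValueError` for a dict of unequal-length lists. A minimal sketch of one workaround pads the shorter lists with `None` before the DataFrame is built. `pad_columns` is a hypothetical helper; padding only avoids the error and cannot restore row alignment after a missing element.

```python
def pad_columns(columns, fill=None):
    """Return a copy of `columns` with every list padded to the longest length."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        name: list(values) + [fill] * (longest - len(values))
        for name, values in columns.items()
    }


# Made-up sample data: one company name is missing
sample = {
    "Job Title": ["ML Engineer", "Data Scientist"],
    "Company": ["Acme"],
}
padded = pad_columns(sample)
print(padded["Company"])  # ['Acme', None]
```

In `export_to_csv()`, the padded dict could be passed to `pd.DataFrame` in place of `list_dict`.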

- Export to CSV.
    - The **export_to_csv()** function creates a folder named 'Jobs', builds a Pandas DataFrame from the scraped jobs (up to 100), and saves it as a CSV file inside that folder.

In [27]:
def export_to_csv(job_search_domain, job_titles_list, job_locations_list, company_names_list, job_links_list):
    
    list_dict = {'Job Title': job_titles_list, 'Location': job_locations_list, 'Company': company_names_list, 'Link': job_links_list} 
    
    folder_path = 'Jobs/'
    os.makedirs(folder_path, exist_ok=True)
    
    pd.set_option('display.max_colwidth', None)
    df = pd.DataFrame(list_dict)
    df.to_csv(os.path.join(folder_path, f"{job_search_domain} Jobs.csv"), index=None)
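
The search term goes straight into the file name, so input such as `AI/ML Engineer` would make `to_csv` look for a subfolder that does not exist. A small sketch of sanitizing the term first follows; `safe_filename` and its replacement rule are assumptions, not part of the original code.

```python
import re


def safe_filename(text, replacement="_"):
    """Replace characters that are unsafe in file names on common platforms."""
    # Collapse each run of path separators, reserved characters, and control chars
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]+', replacement, text)
    # Trim surrounding whitespace and dots; fall back to a default name
    return cleaned.strip(" .") or "untitled"


print(safe_filename("AI/ML Engineer"))  # AI_ML Engineer
```

The file name could then be built as `f"{safe_filename(job_search_domain)} Jobs.csv"`.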

- Close the browser.

In [28]:
def close_browser(driver):
    print('\nEnd Of Job Search!')
    # quit() ends the whole WebDriver session; close() only closes the current window
    driver.quit()

## Let's start scraping the top 100 jobs!

In [29]:
if __name__ == "__main__":
    driver = init_driver()
    navigate_to_linkedin(driver)
    sign_in(driver)
    job_search_domain = perform_job_search(driver)
    information = scrape_jobs(driver)
    export_to_csv(job_search_domain, *information)
    close_browser(driver)

Enter your LinkedIn username: ········
Enter your LinkedIn password: ········

Please specify the job domain/job role that you are searching for: 
Machine Learning Engineer

---> Job search begins....

Please wait, searching for jobs....
Please wait, searching for jobs....
Please wait, searching for jobs....
Please wait, searching for jobs....

End Of Job Search!
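
If any step in the run raises an exception (a changed XPath, a CAPTCHA page), `close_browser()` is never reached and the browser stays open. One way to guard against that is a context manager that always calls `quit()`. This is a sketch under assumptions: `managed_driver` is a hypothetical helper, and `FakeDriver` stands in for `webdriver.Chrome()` so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a Selenium driver; only records whether quit() ran."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


try:
    with managed_driver(FakeDriver) as driver:
        raise RuntimeError("simulated scraping failure")
except RuntimeError:
    pass

print(driver.closed)  # True
```

With the real code, `managed_driver(init_driver)` could wrap the calls from `navigate_to_linkedin(driver)` through `export_to_csv(...)`.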
