# LinkedIn Web Scraping

In an increasingly competitive job market, up-to-date information about job openings matters to both job seekers and employers. LinkedIn is one of the platforms most widely used to search for and advertise vacancies. Because the volume of data on LinkedIn is large and changes constantly, scraping is an effective way to access and collect information about job postings.

One way to scrape data from the web is to combine the BeautifulSoup and Selenium libraries.

In [None]:
!pip install selenium
!pip install chromedriver-autoinstaller
!pip install pandas
!pip install beautifulsoup4

In [26]:
from bs4 import BeautifulSoup
import random
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import chromedriver_autoinstaller

- BeautifulSoup parses the HTML and extracts the data of interest.
- selenium is an automation tool for controlling a browser programmatically.
- chromedriver_autoinstaller automatically downloads and installs a ChromeDriver build compatible with the version of Google Chrome on the user's system.
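
As a minimal sketch of how the work is divided, the snippet below parses a small hand-written HTML fragment with BeautifulSoup. In this notebook the HTML comes from Selenium's `browser.page_source` instead. The fragment and its contents are made up for illustration; only the class names mirror those used later in the notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for browser.page_source.
html = """
<div class="base-card">
  <h3 class="base-search-card__title"> Data Analyst </h3>
  <h4 class="base-search-card__subtitle"> Example Corp </h4>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("div", class_="base-card")

# find() returns the first matching tag, or None when nothing matches.
title = card.find("h3", class_="base-search-card__title").text.strip()
company = card.find("h4", class_="base-search-card__subtitle").text.strip()
missing = card.find("span", class_="job-search-card__location")

print(title, "|", company, "|", missing)
```

Because `find()` returns `None` for a missing element, the scraping cell later in the notebook checks each result before reading `.text`.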

In [83]:
# install the latest chromedriver if necessary
chromedriver_autoinstaller.install()

# configure chrome options
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")

# launch chrome browser
browser = webdriver.Chrome(options=options)

In [84]:
# open the LinkedIn job search page
# (modify the keywords as needed; the URL below must stay on one line)
browser.get('https://www.linkedin.com/jobs/search?keywords=Data%20Analysis&location=Indonesia&geoId=102478259&f_TPR=&f_E=1%2C2&original_referer=https%3A%2F%2Fwww.linkedin.com%2Fjobs%2Fsearch%3Fkeywords%3DData%2520Analysis%26location%3DIndonesia%26geoId%3D102478259%26trk%3Dpublic_jobs_jobs-search-bar_search-submit%26position%3D1%26pageNum%3D0&position=1&pageNum=0')

#set the number of pages to scrape
pages = 15
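
Editing the keywords inside a long, percent-encoded URL is error-prone. The sketch below builds a shorter search URL from plain values using only the standard library. `build_search_url` is a hypothetical helper, not part of the original notebook. The parameter names `keywords`, `location`, `geoId` and `f_E` are taken from the URL above. Reading `f_E` as an experience-level filter, and expecting LinkedIn to accept a URL without the remaining parameters, are both assumptions.

```python
from urllib.parse import quote, urlencode

def build_search_url(keywords, location, geo_id, experience_levels=()):
    """Build a LinkedIn job-search URL (hypothetical helper)."""
    params = {"keywords": keywords, "location": location, "geoId": geo_id}
    if experience_levels:
        # Assumption: f_E takes comma-separated experience-level codes.
        params["f_E"] = ",".join(str(level) for level in experience_levels)
    # quote_via=quote encodes spaces as %20, matching the URL used above.
    return "https://www.linkedin.com/jobs/search?" + urlencode(params, quote_via=quote)

url = build_search_url("Data Analysis", "Indonesia", "102478259", (1, 2))
print(url)
```

The result could then be passed to `browser.get(url)` in place of the hard-coded string.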

### Automating the scroll and the click on the "See more jobs" button

In [85]:
# repeat the scroll-and-click step once per page to load more job postings
for i in range(pages):
    print(f'Scraping page {i + 1}')
    # scroll to the bottom so the next batch of postings is loaded
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    try:
        # click the "See more jobs" button if it appears within 5 seconds
        element = WebDriverWait(browser, 5).until(
            EC.presence_of_element_located(
                (By.XPATH, "//button[text()='See more jobs']")
            )
        )
        element.click()
    except Exception:
        # button missing or not clickable: continue with the next scroll
        pass

Scraping page 1
Scraping page 2
Scraping page 3
Scraping page 4
Scraping page 5
Scraping page 6
Scraping page 7
Scraping page 8
Scraping page 9
Scraping page 10
Scraping page 11
Scraping page 12
Scraping page 13
Scraping page 14
Scraping page 15
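
The loop above scrolls and clicks as fast as the browser allows. The notebook imports `random` and `time` without using them. One plausible use, sketched below, is a randomized pause between iterations so that lazily loaded postings have time to render. `polite_pause` and its default bounds are hypothetical and not part of the original notebook.

```python
import random
import time

def polite_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration between the two bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay

# Demonstration with a stub in place of time.sleep, so it returns instantly.
recorded = []
delay = polite_pause(1.0, 3.0, sleep=recorded.append)
print(f"would have paused for {delay:.2f} seconds")
```

Inside the scraping loop, a call such as `polite_pause()` could follow the `execute_script` line. The `sleep` parameter exists only to make the helper easy to test.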


### Scraping the Data

In [86]:
# Scrape job postings
jobs = []
soup = BeautifulSoup(browser.page_source, "html.parser")
job_listings = soup.find_all("div", class_="base-card")

for job in job_listings:
    job_title = job.find("h3", class_="base-search-card__title")
    job_title = job_title.text.strip() if job_title else "N/A"

    job_company = job.find("h4", class_="base-search-card__subtitle")
    job_company = job_company.text.strip() if job_company else "N/A"

    job_location = job.find("span", class_="job-search-card__location")
    job_location = job_location.text.strip() if job_location else "N/A"

    job_posting_date = job.find("time", class_="job-search-card__listdate")
    job_posting_date = job_posting_date["datetime"] if job_posting_date else "N/A"

    apply_link = job.find("a", class_="base-card__full-link")
    apply_link = apply_link["href"] if apply_link else "N/A"
    job_ID = apply_link.split('?')[0][-10:] if apply_link != "N/A" else "N/A"
    
    # optionally, extract more details from the job description
    # try:
    #     description_soup = BeautifulSoup(browser.page_source, "html.parser")
    #     job_description = description_soup.find_all("div", class_="decorated-job-posting__details")
    #     job_requirements = description_soup.find("div", class_="show-more-less-html__markup relative overflow-hidden").text.strip()
    #     job_senioritylevel = description_soup.find("span", class_="description__job-criteria-text description__job-criteria-text--criteria").text.strip()
    #     job_employmenttype = description_soup.find("span", class_="description__job-criteria-text description__job-criteria-text--criteria").text.strip()
    # except AttributeError:
    #     job_requirements = None
    #     job_senioritylevel = None
    #     job_employmenttype = None

    # optionally, extract the job description
    # try:
    #     description_soup = BeautifulSoup(browser.page_source, "html.parser")
    #     job_description = description_soup.find("div", class_="jobs-aplly-button-").text.strip()
    # except AttributeError:
    #     job_description = None
    
    jobs.append({
        "job ID": job_ID,
        "posting date": job_posting_date,
        "title": job_title,
        "company": job_company,
        "location": job_location,
        "link": apply_link
    })
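
The cell above takes the job ID as the last 10 characters of the link, which only works while every ID has exactly 10 digits. A possible alternative, sketched below, reads the trailing run of digits from the URL path instead. `extract_job_id` is a hypothetical helper. The example link is made up and assumes that job URLs end in `-<numeric id>` before the query string, as the truncated links in the results suggest.

```python
import re
from urllib.parse import urlparse

def extract_job_id(link):
    """Return the trailing numeric ID of a job URL path, or "N/A"."""
    # Drop the query string and any trailing slash before matching.
    path = urlparse(link).path.rstrip("/")
    match = re.search(r"(\d+)$", path)
    return match.group(1) if match else "N/A"

# Hypothetical link in the assumed shape of the scraped ones.
example = ("https://id.linkedin.com/jobs/view/"
           "data-analyst-at-example-corp-3959614450?position=1&pageNum=0")
print(extract_job_id(example))
print(extract_job_id("N/A"))
```

The helper also returns `"N/A"` for the placeholder string, so it could replace the slicing expression without a separate check.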


### Scraping Results

In [87]:
#make data frame
df = pd.DataFrame(jobs)
df

|  | job ID | posting date | title | company | location | link |
|---|---|---|---|---|---|---|
| 0 | 3943485503 | 2024-06-05 | Consumer Insight Analyst | Kalbe Nutritionals (PT Sanghiang Perkasa) | Jakarta, Jakarta, Indonesia | https://id.linkedin.com/jobs/view/consumer-ins... |
| 1 | 3996426597 | 2024-08-12 | Sustainability Data Analyst | PT Lion Super Indo | Jakarta, Indonesia | https://id.linkedin.com/jobs/view/sustainabili... |
| 2 | 3818183787 | 2024-02-01 | Kalbe Nutritionals Internship | Kalbe Nutritionals (PT Sanghiang Perkasa) | Jakarta, Jakarta, Indonesia | https://id.linkedin.com/jobs/view/kalbe-nutrit... |
| 3 | 3948515197 | 2024-06-13 | Marketing Internship - PT Kalbe Blackmores Nut... | Kalbe Nutritionals (PT Sanghiang Perkasa) | Jakarta, Jakarta, Indonesia | https://id.linkedin.com/jobs/view/marketing-in... |
| 4 | 3988578383 | 2024-07-31 | PPIC Internship | Kalbe Nutritionals (PT Sanghiang Perkasa) | West Karawang, West Java, Indonesia | https://id.linkedin.com/jobs/view/ppic-interns... |
| ... | ... | ... | ... | ... | ... | ... |
| 71 | 3959614450 | 2024-06-27 | Data Analyst | PT Solomon Indo Global | Greater Surabaya | https://id.linkedin.com/jobs/view/data-analyst... |
| 72 | 3953798332 | 2024-06-20 | Selection Staff | PT TWO WIN INDONESIA | Jakarta, Jakarta, Indonesia | https://id.linkedin.com/jobs/view/selection-st... |
| 73 | 3994643522 | 2024-08-07 | Research and Development Supervisor | Kalbe Nutritionals (PT Sanghiang Perkasa) | West Karawang, West Java, Indonesia | https://id.linkedin.com/jobs/view/research-and... |
| 74 | 3956085661 | 2024-06-21 | Business Operation Specialist | PT. ASTRA DAIHATSU MOTOR | Jakarta, Jakarta, Indonesia | https://id.linkedin.com/jobs/view/business-ope... |
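
A result list built by repeated scrolling may contain the same posting more than once. The sketch below shows one possible cleanup step before the records are turned into a DataFrame: keep only the first record for each job ID. `dedupe_jobs` and the sample records are hypothetical. Whether duplicates actually occur in this dataset has not been checked here.

```python
def dedupe_jobs(jobs, key="job ID"):
    """Keep the first record seen for each key value, preserving order."""
    seen = set()
    unique = []
    for job in jobs:
        if job[key] in seen:
            continue
        seen.add(job[key])
        unique.append(job)
    return unique

# Hypothetical records in the same shape as the scraped dictionaries.
sample = [
    {"job ID": "111", "title": "Data Analyst"},
    {"job ID": "222", "title": "Data Engineer"},
    {"job ID": "111", "title": "Data Analyst"},
]
unique_jobs = dedupe_jobs(sample)
print(len(unique_jobs))
```

With pandas already loaded, `df.drop_duplicates(subset="job ID")` would be an equivalent step on the DataFrame itself.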


In [76]:
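# save the data to a CSV file; index=False omits the DataFrame index,
# which would otherwise be written as an unnamed first column
df = pd.DataFrame(jobs)
df.to_csv("jobs_data_analyst.csv", index=False)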