<div style="background-color: #01377D; padding: 10px; text-align: center; color: white; font-size: 32px; font-family: 'Arial', sans-serif;">
    Python Web Scraping Using Selenium and Beautiful Soup <br><br>
    
</div>

## What is Web Scraping?

- Web scraping is the process of extracting data from websites automatically using programs or scripts.

- Instead of copying data by hand, a script fetches the page's HTML, parses it, and extracts the required fields.

### What are Selenium and Beautiful Soup?

- Selenium and Beautiful Soup are two of the most widely used Python tools for web scraping. Selenium drives a real browser and Beautiful Soup parses HTML, so together they cover both static and JavaScript-rendered (dynamic) websites.

## 1. BeautifulSoup (bs4)

- A Python library for parsing HTML/XML documents.

- It only parses the HTML it is given. It does not execute JavaScript, so content rendered in the browser is not visible to it on its own.

- Works best on static websites (where HTML source already has the data).

### 👉 Example workflow:

- Get page source (via requests or Selenium).

- Parse it using BeautifulSoup.

- Extract data using tags, classes, IDs, etc.
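
The three steps above can be sketched on a small inline HTML string, which keeps the example independent of any live site. The markup, the `job-title` class, and the variable names are illustrative assumptions, not taken from a real page.

```python
from bs4 import BeautifulSoup

# Step 1: hypothetical static HTML standing in for a downloaded page source.
html = """
<html><body>
  <h3 class="job-title">Data Analyst</h3>
  <h3 class="job-title"> Backend Developer </h3>
  <p id="count">2 jobs found</p>
</body></html>
"""

# Step 2: parse the HTML with the parser bundled with Python.
soup = BeautifulSoup(html, "html.parser")

# Step 3: extract data by tag and class, and by ID.
titles = [h3.get_text(strip=True) for h3 in soup.find_all("h3", class_="job-title")]
count_text = soup.find(id="count").get_text(strip=True)

print(titles)
print(count_text)
```

`get_text(strip=True)` trims the surrounding whitespace, so both headings come back as clean strings.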



## 2. Selenium

- A browser automation tool, used here through its Python bindings, that controls a real browser (Chrome, Firefox, etc.).

- Can interact with dynamic websites (JavaScript-rendered content, buttons, scrolling, login).

- Useful when the data is not present in the initial HTML source and only appears after JavaScript runs.

### 👉 Features:

- Automates clicks, scrolls, forms.

- Waits for elements to load (WebDriverWait).

- Exposes the rendered HTML through `driver.page_source`, which can then be parsed with BeautifulSoup.

## 3. Selenium + BeautifulSoup Workflow

A common way to combine the two:

- Selenium → loads the website and executes its JavaScript.

- Selenium → exposes the rendered HTML (`driver.page_source`).

- BeautifulSoup → parses that HTML and extracts structured data.
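
A minimal sketch of this division of labour. The helper names `extract_headings`, `scrape`, and `make_chrome_driver` are hypothetical, and the `<h3>` target is an illustrative choice. `scrape` accepts any object that offers `get()`, `page_source`, and `quit()`, which a Selenium WebDriver does. Selenium is imported inside `make_chrome_driver` so the parsing step can be tried without a browser installed; that function assumes Selenium 4.6 or later, where the bundled Selenium Manager locates the driver.

```python
from bs4 import BeautifulSoup


def extract_headings(page_source):
    """Parse rendered HTML and return the text of every <h3> element."""
    soup = BeautifulSoup(page_source, "html.parser")
    return [h3.get_text(strip=True) for h3 in soup.find_all("h3")]


def scrape(driver, url):
    """Load the page, grab the rendered HTML, then hand it to BeautifulSoup."""
    try:
        driver.get(url)                # Selenium loads the page and runs its JavaScript
        html = driver.page_source      # Selenium exposes the rendered HTML
        return extract_headings(html)  # BeautifulSoup extracts structured data
    finally:
        driver.quit()                  # always release the browser, even on errors


def make_chrome_driver():
    """Start headless Chrome (assumes Selenium 4.6+ and a local Chrome install)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```

A call would look like `scrape(make_chrome_driver(), "https://www.foundit.in/search/software-engineer-jobs")`. On pages that render content late, an explicit wait before reading `page_source` is usually still needed.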

## Importing Libraries

In [1]:
!pip install selenium

Defaulting to user installation because normal site-packages is not writeable


In [2]:
!pip install webdriver-manager

Defaulting to user installation because normal site-packages is not writeable


In [3]:
from selenium import webdriver 
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd

In [4]:
chrome_options = Options()                  # Configuration object for Chrome
chrome_options.add_argument("--headless")   # Run Chrome without opening a visible window
# Present a regular desktop Chrome user-agent string to the site
chrome_options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.6778.265 Safari/537.36"
)

In [5]:
# Download a matching ChromeDriver binary, then start Chrome with the options above
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=chrome_options)

In [6]:
url = "https://www.foundit.in/search/software-engineer-jobs"
driver.get(url)

In [7]:
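# Not used: By.CLASS_NAME expects a single class name, so a space-separated
# list would not match. A CSS selector can chain the classes instead:
# WebDriverWait(driver, 30).until(
#     EC.presence_of_element_located((By.CSS_SELECTOR, ".flex.w-full.flex-col.gap-3"))
# )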

In [8]:
# Not used: "h3" is a tag name, not a class, so By.TAG_NAME is the matching locator.
# WebDriverWait(driver, 30).until(
#     EC.presence_of_element_located((By.TAG_NAME, "h3"))
# )

In [9]:
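# Not used: "div" is a tag name, not a class, and nearly every page has a <div>,
# so this wait would not confirm that the job cards have loaded.
# WebDriverWait(driver, 30).until(
#     EC.presence_of_element_located((By.TAG_NAME, "div"))
# )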

In [10]:
# Not used: XPath alternative that waits for any <h3> element.
# WebDriverWait(driver, 30).until(
#     EC.presence_of_element_located((By.XPATH, "//h3"))
# )

In [11]:
# Wait up to 30 seconds for the first element with this class to appear in the DOM.
WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.CLASS_NAME, "text-darkKnight-700"))
)

<selenium.webdriver.remote.webelement.WebElement (session="e361be1e422d75c7b896f3a87978c1d1", element="f.C18EA7468A86EA0DCF7DBBD3C6171620.d.F1435AEDFCF557D1866D051493069765.e.62")>

In [12]:
# Not used: this value is a space-separated list of classes; By.CLASS_NAME
# expects a single class name, so a value with spaces would not match.
# WebDriverWait(driver, 30).until(
#     EC.presence_of_element_located((By.CLASS_NAME, "flex flex-col gap-4 rounded-2xl border-opacity-50 p-4 relative w-auto cursor-pointer border border-solid border-jobCardBorder bg-surface-primary-normal shadow-job-card hover:shadow-job-card-hover md:!w-[570px]q"))
# )
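
`By.CLASS_NAME` expects a single class name, so a full `class` attribute value like the one in the commented-out wait above will not match. One workaround is a CSS selector that chains the classes. The sketch below uses a hypothetical helper, `classes_to_css`, that builds such a selector and backslash-escapes characters such as `:` or `[`, which utility-class names often contain. It assumes no class name starts with a digit, since that case needs a different escape.

```python
import re


def classes_to_css(class_attr):
    """Turn a space-separated class attribute into a chained CSS class selector."""
    def escape(name):
        # Backslash-escape every character that is not a letter, digit, "_" or "-".
        return re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name)

    return "".join("." + escape(name) for name in class_attr.split())


print(classes_to_css("flex w-full flex-col gap-3"))
print(classes_to_css("hover:shadow-job-card-hover"))
```

The returned string can be passed as `(By.CSS_SELECTOR, classes_to_css(...))` to `EC.presence_of_element_located`. Waiting on one short, stable class, as the cell with `text-darkKnight-700` does, is often the simpler choice.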

In [13]:
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")

In [19]:
job_profiles_section = soup.find_all('h3', class_='text-darkKnight-700')

print("Top Job Profiles:")
for i, job in enumerate(job_profiles_section[:15], start=1):
    print(f"{i}. {job.text.strip()}")

Top Job Profiles:
1. Software Engineer
2. Software Engineer III
3. Senior Java Software Engineer
4. Senior Software Engineer
5. Software Engineer Java
6. Software Engineer Java
7. Software Engineer Java
8. Staff Software Engineer
9. Software Engineer Java Fullstack
10. Software Engineer Java Fullstack
11. Software Engineer Java Fullstack
12. Software Engineer Java Fullstack
13. Software Engineer Java Fullstack
14. Software Engineer Java Fullstack
15. Senior Software Engineer Java Fullstack
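
The listing repeats several titles. One quick way to summarise them is the standard library's `collections.Counter`, sketched here on the fifteen titles printed above. The `titles` list is typed in by hand for illustration; in the notebook it would be built from `job_profiles_section`.

```python
from collections import Counter

# The fifteen titles printed above, copied by hand for this sketch.
titles = [
    "Software Engineer",
    "Software Engineer III",
    "Senior Java Software Engineer",
    "Senior Software Engineer",
    "Software Engineer Java",
    "Software Engineer Java",
    "Software Engineer Java",
    "Staff Software Engineer",
    "Software Engineer Java Fullstack",
    "Software Engineer Java Fullstack",
    "Software Engineer Java Fullstack",
    "Software Engineer Java Fullstack",
    "Software Engineer Java Fullstack",
    "Software Engineer Java Fullstack",
    "Senior Software Engineer Java Fullstack",
]

title_counts = Counter(titles)

# List titles from most to least frequent.
for title, count in title_counts.most_common():
    print(f"{count:>2}  {title}")
```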


In [20]:
column = ["Top Job Profiles"]

In [21]:
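# Build the DataFrame from the title strings rather than the raw Tag objects.
job_titles = [job.get_text(strip=True) for job in job_profiles_section]
df = pd.DataFrame(job_titles, columns=column)
df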

|   | Top Job Profiles |
|---|---|
| 0 | Software Engineer |
| 1 | Software Engineer III |
| 2 | Senior Java Software Engineer |
| 3 | Senior Software Engineer |
| 4 | Software Engineer Java |
| 5 | Software Engineer Java |
| 6 | Software Engineer Java |
| 7 | Staff Software Engineer |
| 8 | Software Engineer Java Fullstack |
| 9 | Software Engineer Java Fullstack |


In [22]:
driver.quit()  # Close the browser and end the WebDriver session