**Preparation**
- Install the required Python libraries: selenium, beautifulsoup4 (bs4), and chromedriver-autoinstaller.
- Choose a dynamic webpage to scrape. This project scrapes dynamically rendered product data (hosting plans) from the InMotion Hosting website, https://www.inmotionhosting.com/.

**Task**
- Initialize the Selenium WebDriver.
- Load the web page.
- Identify the elements that contain hosting plan details.
- Extract the necessary data, such as plan names, features, and pricing.
- Store and save the data.
- Close the Selenium WebDriver.

In [None]:
# Install necessary libraries
!pip install selenium beautifulsoup4 chromedriver-autoinstaller

Collecting selenium
  Downloading selenium-4.34.2-py3-none-any.whl.metadata (7.5 kB)
Collecting chromedriver-autoinstaller
  Downloading chromedriver_autoinstaller-0.6.4-py3-none-any.whl.metadata (2.1 kB)
Collecting trio~=0.30.0 (from selenium)
  Downloading trio-0.30.0-py3-none-any.whl.metadata (8.5 kB)
Collecting trio-websocket~=0.12.2 (from selenium)
  Downloading trio_websocket-0.12.2-py3-none-any.whl.metadata (5.1 kB)
Collecting outcome (from trio~=0.30.0->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl.metadata (2.6 kB)
Collecting wsproto>=0.14 (from trio-websocket~=0.12.2->selenium)
  Downloading wsproto-1.2.0-py3-none-any.whl.metadata (5.6 kB)
Downloading selenium-4.34.2-py3-none-any.whl (9.4 MB)
Downloading chromedriver_autoinstaller-0.6.4-py3-none-any.whl (7.6 kB)
Downloading trio-0.30.0-py3-none-any.whl (499 kB)

In [None]:
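# Install the Chromium browser (it is run headless later through the Selenium options)
!apt-get update
!apt-get install -y chromium-browser

# Record the browser path in an environment variable.
# Note: the scraping cell below never reads CHROMIUM_PATH; it is kept only for reference.
import os
os.environ['CHROMIUM_PATH'] = '/usr/bin/chromium-browser'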

Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Get:3 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,853 kB]
Hit:8 https:
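
The cell above hard-codes `/usr/bin/chromium-browser`. As a hedged alternative, the sketch below looks the browser up on `PATH` with the standard-library `shutil.which`. The helper name `find_browser` and the candidate binary names are assumptions made for illustration; they are not part of the original notebook, and the list is not exhaustive.

```python
import os
import shutil


def find_browser(candidates=("chromium-browser", "chromium", "google-chrome")):
    """Return the full path of the first candidate found on PATH, or None.

    The candidate names are assumed defaults; adjust them for your system.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


browser_path = find_browser()
if browser_path:
    # Store the discovered location instead of a hard-coded path
    os.environ["CHROMIUM_PATH"] = browser_path
print("Browser found at:", browser_path)
```

If no candidate is installed, `browser_path` is `None`, which gives an early signal before the WebDriver is started.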

In [None]:
import csv

import chromedriver_autoinstaller
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Install ChromeDriver
chromedriver_autoinstaller.install()

# Configure Chrome options for headless execution
chrome_options = Options()
chrome_options.add_argument("--headless") # Run Chrome in headless mode (without a GUI)
chrome_options.add_argument("--no-sandbox") # Bypass OS security model, crucial in Docker/Colab
chrome_options.add_argument("--disable-dev-shm-usage") # Overcome limited resource problems

# Initialize Selenium WebDriver with options
driver = webdriver.Chrome(options=chrome_options)

# Load the webpage
driver.get("https://www.inmotionhosting.com/")
# Implicit wait: later find_element(s) calls poll for up to 10 seconds before giving up
driver.implicitly_wait(10)

# Find the hosting plan cards by their CSS class name
plan_elements = driver.find_elements(By.CLASS_NAME, "imh-rostrum-card")


# Extract data from each plan
data = []
# Only proceed if plan_elements are found by Selenium
if plan_elements:
    for plan_element in plan_elements:
        element_soup = BeautifulSoup(plan_element.get_attribute("innerHTML"), "html.parser")
        try:
            # Plan name: the card's title heading
            plan_name_element = element_soup.find("h3", class_="imh-rostrum-card-title")
            plan_name = plan_name_element.text.strip() if plan_name_element else "N/A"

            # Features: prefer list items inside the card
            features = [li.text.strip() for li in element_soup.find_all("li") if li.text.strip()]

            # Fallback: <p>/<div> elements whose class mentions "feature" or "description"
            if not features:
                text_features = element_soup.find_all(
                    ["p", "div"],
                    class_=lambda x: x and ("feature" in x or "description" in x),
                )
                features.extend(tf.text.strip() for tf in text_features if tf.text.strip())

            # Price: the span that carries the price text
            plan_price_element = element_soup.find("span", class_="rostrum-price")
            plan_price = plan_price_element.text.strip() if plan_price_element else "N/A"

            plan_data = {
                "Plan Name": plan_name,
                "Features": ", ".join(features),
                "Price": plan_price
            }
            data.append(plan_data)
        except Exception as e:
            print(f"Error extracting data from a plan element: {e}")
            print(f"Problematic element HTML:\n{plan_element.get_attribute('innerHTML')}")

# Save the data to a CSV file
if data:
    with open("hosting_plans.csv", "w", newline="", encoding="utf-8") as csvfile:
        fieldnames = ["Plan Name", "Features", "Price"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)
    print(f"\nSuccessfully extracted and saved {len(data)} plan(s) to hosting_plans.csv")
else:
    print("\nNo data extracted; hosting_plans.csv was not written.")


# Close the WebDriver
driver.quit()


Successfully extracted and saved 8 plan(s) to hosting_plans.csv
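
The scraper stores each price as the raw text of the price element, so the `Price` column holds strings such as `$3.19` rather than numbers. If numeric prices are needed for sorting or comparison, one possible approach is sketched below. The helper `parse_price` is hypothetical and assumes prices are written with a `.` decimal separator and optional `,` thousands separators.

```python
import re
from decimal import Decimal


def parse_price(text):
    """Return the first number in a price string as a Decimal, or None.

    Assumes "." is the decimal separator and "," groups thousands.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if not match:
        return None
    return Decimal(match.group().replace(",", ""))


print(parse_price("$3.19"))  # a price string in the scraped format
print(parse_price("N/A"))    # the placeholder used when no price element is found
```

`Decimal` is used instead of `float` so that currency values keep their exact decimal representation.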


In [None]:
import pandas as pd

try:
    df = pd.read_csv('hosting_plans.csv')
    display(df.head())
except FileNotFoundError:
    print("Error: 'hosting_plans.csv' not found. Please make sure the scraping code ran successfully.")

|   | Plan Name | Features | Price |
|---|---|---|---|
| 0 | Shared Hosting | | $3.19 |
| 1 | cPanel WordPress | | $3.69 |
| 2 | VPS Hosting | | $9.99 |
| 3 | Dedicated Hosting | | $35.00 |
| 4 | Shared Hosting | Free Domain & SSL, Free Website Builder, Unmet... | $3.19 |
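
The preview lists Shared Hosting twice, once with an empty `Features` value and once with features filled in, which suggests the page contains more than one card per plan. A minimal sketch of one way to collapse such rows follows, assuming rows shaped like the dictionaries built in the scraping cell. The helper `merge_duplicate_plans` and the sample rows are illustrative, not part of the original notebook.

```python
def merge_duplicate_plans(rows):
    """Keep one row per plan name, preferring a row with non-empty features."""
    merged = {}
    for row in rows:
        name = row["Plan Name"]
        kept = merged.get(name)
        # Take the first row seen, or replace it if it lacks features and this one has them
        if kept is None or (not kept["Features"] and row["Features"]):
            merged[name] = row
    return list(merged.values())


# Illustrative sample modeled on the preview; not real scraper output
sample = [
    {"Plan Name": "Shared Hosting", "Features": "", "Price": "$3.19"},
    {"Plan Name": "VPS Hosting", "Features": "", "Price": "$9.99"},
    {"Plan Name": "Shared Hosting", "Features": "Free Domain & SSL", "Price": "$3.19"},
]
result = merge_duplicate_plans(sample)
for row in result:
    print(row)
```

Calling this on `data` before writing the CSV would leave one row per plan name. Whether duplicates should be merged at all depends on whether the repeated cards really describe the same plan.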
