1. Setup and Importing Libraries

Explanation:

webdriver and By are used to drive the browser and locate elements on the page.
WebDriverWait and expected_conditions wait for elements to load before the script reads them.
Service and ChromeDriverManager download and configure the Chrome WebDriver automatically.
ActionChains simulates mouse actions, such as clicking.
time adds fixed delays (e.g., pausing while a page loads).
csv is the standard-library module for reading and writing CSV files.
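
To illustrate the difference between a fixed delay and an explicit wait, here is a minimal pure-Python sketch of the polling idea behind WebDriverWait. It is not Selenium's implementation; the helper name wait_until and its parameters are assumptions made for this example. The 0.5-second default poll interval mirrors WebDriverWait's default.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    (Hypothetical helper, shown only to illustrate polling.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Sleep briefly between checks instead of one long fixed delay
        time.sleep(poll_interval)
```

Unlike time.sleep(3), which always pauses for the full three seconds, a polling wait returns as soon as the condition holds and fails loudly if it never does.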

In [None]:
# Import necessary libraries
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.action_chains import ActionChains
import time
import csv

2. Starting the WebDriver

Explanation:

ChromeDriverManager().install() downloads and sets up the appropriate version of the Chrome driver.
webdriver.Chrome(service=service) launches a Chrome browser window controlled by that driver.

In [73]:
# Start WebDriver with Service configuration
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)


3. Opening the URL

Explanation:

driver.get(url) opens the given URL in the browser.

In [74]:
# Open the Petfinder URL
url = "https://www.petfinder.com/search/dogs-for-adoption/ca/newfoundland-and-labrador/?distance=Anywhere"
driver.get(url)
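
The search options travel in the URL's query string (here, distance=Anywhere), and later cells add a page parameter to the same address. As a sketch, such URLs could be assembled with urllib.parse instead of typed by hand. The helper name build_search_url is an assumption for this example, not part of the original notebook.

```python
from urllib.parse import urlencode

# Search path taken from the URL used in this notebook
BASE = "https://www.petfinder.com/search/dogs-for-adoption/ca/newfoundland-and-labrador/"


def build_search_url(page=None, distance="Anywhere"):
    """Return the search URL, optionally for a specific results page.

    (Hypothetical helper; `page` mirrors the query parameter used in later cells.)
    """
    params = {"distance": distance}
    if page is not None:
        params["page"] = page
    return f"{BASE}?{urlencode(params)}"


print(build_search_url(page=2))
# https://www.petfinder.com/search/dogs-for-adoption/ca/newfoundland-and-labrador/?distance=Anywhere&page=2
```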


4. Scraping Pet Names from a Single Page

Explanation:

The function scrape_pets_on_page() waits for the pet cards to load using WebDriverWait.
It extracts the text (pet names) from each pet card and prints it out.
The total_pets_scraped counter keeps track of how many pets have been scraped.
Two try-except blocks handle errors: the inner one catches a failure on an individual pet card, and the outer one catches a failure to load the cards and prints the page source for debugging.

In [77]:
# Variable to count the total number of pets scraped
total_pets_scraped = 0

# Define function to scrape pet names from a single page
def scrape_pets_on_page():
    global total_pets_scraped
    try:
        # Wait until the pet cards are loaded (timeout: 20 seconds)
        pets = WebDriverWait(driver, 20).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, '[data-test="Pet_Card_Pet_Details_List"]'))
        )

        # Extract and print the names of all pets on the page
        for pet in pets:
            try:
                # The visible text of the matched element is the pet's name
                pet_name = pet.text
                print(f"Dog's Name: {pet_name}")
                total_pets_scraped += 1
            except Exception as e:
                print(f"Error scraping pet data: {e}")
    
    except Exception as e:
        print(f"Error loading pets on the page: {e}")
        # Log the page source for debugging
        print(driver.page_source)
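
The function above only prints each name. If the names were instead collected in a list (an assumption; the original does not do this), the csv module imported in step 1 could write them to a file. A minimal sketch follows. The function name save_names_to_csv and the single "name" column are illustrative choices.

```python
import csv


def save_names_to_csv(names, path):
    """Write one pet name per row to `path`, under a "name" header.

    (Hypothetical helper; assumes `names` is a list of strings gathered
    while scraping.)
    """
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name"])
        for name in names:
            writer.writerow([name.strip()])
```

For example, save_names_to_csv(["Hoss", "Mr. Bill"], "pet_names.csv") would produce a three-line file; the file name here is only a placeholder.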


5. Code to Limit Scraping to 10 Pages

Explanation:

Scrape pets on the current page: the scrape_pets_on_page function extracts pet names from the current page.
Navigate to the next page: the go_to_next_page function clicks the "Next" button to move to the next page.
Limit to 10 pages: the max_pages variable is set to 10, and a for loop iterates through the first 10 pages. After scraping each page, the script moves to the next one unless it is the last page.
Graceful exit: once 10 pages are scraped, the loop breaks and the total count is printed. The driver.quit() call that would close the browser is commented out, so the browser stays open for the following cells.
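
The page-limit logic can be sketched independently of Selenium. The version below assumes a navigation function that returns True or False to report whether it reached the next page. The go_to_next_page function in this notebook does not return a value, so that part is an assumption, as is the helper name scrape_pages.

```python
def scrape_pages(scrape_page, go_to_next, max_pages=10):
    """Scrape up to `max_pages` pages and stop early if navigation fails.

    scrape_page: callable that scrapes the page currently shown.
    go_to_next:  callable returning True if it moved to the next page
                 (assumed behaviour, see the note above).
    Returns the number of pages scraped.
    """
    pages_scraped = 0
    for page in range(1, max_pages + 1):
        scrape_page()
        pages_scraped += 1
        if page == max_pages:
            break  # Page limit reached; no need to navigate further
        if not go_to_next():
            break  # Navigation failed; avoid re-scraping the same page
    return pages_scraped
```

Stopping when navigation fails keeps the loop from scraping the same page repeatedly, which can happen when a failed click is only reported and the loop carries on.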

In [78]:
# Function to navigate to the next page
def go_to_next_page():
    try:
        # Find the "Next" button and click it
        next_button = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, '/html/body/pf-app/div[4]/div/div/div[2]/pfdc-animal-search/div/div/div[2]/div[2]/div[2]/pfdc-element[2]/div/div/div[2]/pfdc-page-controls/div/div[3]/button'))
        )
        next_button.click()
        print("Navigated to the next page.")
    except Exception as e:
        print(f"Error navigating to the next page: {e}")


# Main script to scrape 10 pages
total_pets_scraped = 0
max_pages = 10  # Limit to the first 10 pages

try:
    for page in range(1, max_pages + 1):
        print(f"Scraping page {page}...")
        scrape_pets_on_page()  # Scrape the current page
        if page < max_pages:  # Go to the next page if we haven't reached the last page
            go_to_next_page()
        else:
            print("Reached the limit of pages to scrape.")
            break

except Exception as e:
    print(f"Error during scraping: {e}")

finally:
    print(f"Total pets scraped: {total_pets_scraped}")
    #driver.quit()  # Close the browser after scraping

Scraping page 1...
Dog's Name: Murphy#A-3043
Dog's Name: Hoss
Dog's Name: Bull
Dog's Name: Hazel
Dog's Name: Ava
Dog's Name: Trickster
Dog's Name: CJ
Dog's Name: Tigger
Dog's Name: Bri
Dog's Name: Rambo and Maggie
Dog's Name: Zeus
Dog's Name: Hoss
Dog's Name: Zeus
Dog's Name: Kai and Kurt
Dog's Name: Sam
Dog's Name: Natalie
Dog's Name: Britt
Dog's Name: Wilma
Dog's Name: Ruthie
Dog's Name: Oliver
Dog's Name: Charles
Dog's Name: Max
Dog's Name: Winston
Dog's Name: Amelia
Dog's Name: Esmeraldo
Dog's Name: Ruben
Dog's Name: Mr. Bill
Dog's Name: Demi
Dog's Name: Hera
Dog's Name: Paris
Dog's Name: Valiant
Dog's Name: Wonderous
Dog's Name: Zeke
Dog's Name: Otis
Dog's Name: Red Sox
Dog's Name: Achilles
Dog's Name: Fenway
Dog's Name: Harry
Dog's Name: Bones
Dog's Name: Robby
Error navigating to the next page: Message: element click intercepted: Element <button class="fieldBtn fieldBtn_altHover m-fieldBtn_iconRt m-fieldBtn_tight m-fieldBtn_full" type="button" pf-mix-click="$closest.goToNextPage
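
The "element click intercepted" message means another element would have received the click instead of the button. One common workaround is to scroll the button into view and, if the normal click still fails, trigger the click through JavaScript. The sketch below is untested against this site and may not resolve this particular case. The helper name click_with_fallback is an assumption. It only relies on the element's click() method and the driver's execute_script() method.

```python
def click_with_fallback(driver, element):
    """Try a normal click; if it raises, scroll into view and click via JavaScript.

    Returns "native" or "javascript" to show which path succeeded.
    (Hypothetical helper, not part of the original notebook.)
    """
    try:
        element.click()
        return "native"
    except Exception:
        # The broad catch mirrors the notebook's style; catching Selenium's
        # ElementClickInterceptedException would be a narrower choice.
        driver.execute_script(
            "arguments[0].scrollIntoView({block: 'center'});", element
        )
        driver.execute_script("arguments[0].click();", element)
        return "javascript"
```

Inside go_to_next_page, this could replace the next_button.click() call as click_with_fallback(driver, next_button). A JavaScript click bypasses whatever is covering the button, so any overlay that caused the interception stays on the page.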

6. Collecting Pet Image URLs

In [79]:
try:
    # Wait for all pet card images to load
    images = WebDriverWait(driver, 30).until(
        EC.presence_of_all_elements_located((By.XPATH, '//pfdc-animal-search-results//pfdc-pet-card//pfdc-lazy-load/img'))
    )
    
    # Extract image URLs
    image_urls = [img.get_attribute('src') for img in images if img.get_attribute('src')]
    
    # Print the fetched image URLs (image_url avoids overwriting the url variable from step 3)
    for idx, image_url in enumerate(image_urls):
        print(f"Image {idx + 1}: {image_url}")

except Exception as e:
    print(f"Error fetching images: {e}")
    
finally:
    print(f"Total pets scraped: {total_pets_scraped}")
    #driver.quit()  # Close the browser after scraping

Image 1: https://dl5zpyw5k3jeb.cloudfront.net/photos/pets/73147859/1/?bust=1732117005&width=300
Image 2: https://dbw3zep4prcju.cloudfront.net/animal/00ac39c6-384f-47e3-8243-dff999c1978e/image/0d129fda-aaa8-48f9-9c45-45febb5fc9d7.jpg?versionId=roMrWJtV2FF8_GxjKV7x8s2putxKjXy6&bust=1732429169&width=300
Image 3: https://dbw3zep4prcju.cloudfront.net/animal/5813962f-dac0-40d0-b476-e1601a5e75c2/image/d836abd6-133f-4ce5-9ce3-6ddfc93214f3.jpg?versionId=nyIDWUBg0zrW3Iq2pxeuyQFCHxQ0H9Xu&bust=1732231226&width=300
Image 4: https://dbw3zep4prcju.cloudfront.net/animal/96126ef5-53f9-4be0-95fd-18e7a78fdda8/image/ce024541-4e8b-49c3-8022-b629e435f340.jpg?versionId=Md0UXZ_2ttwD6NL45zDjVv8lGIfd_ADY&bust=1732230539&width=300
Image 5: https://dbw3zep4prcju.cloudfront.net/animal/6e923937-47a5-41c7-b058-3c7a5a10fa70/image/94975c57-8ae6-4b6b-a584-906032f858a3.jpg?versionId=b2swj0dLPqxfCHMMIR0HSE0C6s_I8mZb&bust=1732230358&width=300
Image 6: https://dbw3zep4prcju.cloudfront.net/animal/eee70bfd-c867-441f-bd8c-705

7. Collecting Dog Ages

In [80]:
# Set up Selenium WebDriver
driver = webdriver.Chrome()  # Replace with the WebDriver for your browser

# Base URL with a placeholder for the page number
base_url = "https://www.petfinder.com/search/dogs-for-adoption/ca/newfoundland-and-labrador/?distance=Anywhere&page={}"

# Initialize an empty list to store dog ages
all_dog_ages = []

try:
    # Loop through the first 10 pages
    for page in range(1, 11):  # Adjust range for more pages if needed
        # Construct the URL for the current page
        current_url = base_url.format(page)
        print(f"Scraping page {page}...")
        
        # Load the page
        driver.get(current_url)
        
        # Allow the page to load
        time.sleep(3)  # Adjust the delay as needed
        
        # Wait for the dog elements to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, '//pfdc-pet-card'))
        )
        
        # Locate the age information using the provided XPath
        dog_ages_elements = driver.find_elements(By.XPATH, '//pfdc-pet-card/a/div/div/div[2]/ul/li[1]/ul/li[1]')
        
        # Extract text from each element and append to the list
        dog_ages = [age.text for age in dog_ages_elements]
        all_dog_ages.extend(dog_ages)  # Append to the main list
        
        # Print the extracted data for the current page
        for age in dog_ages:
            print(age)  # Print only the age (e.g., Adult, Young, Puppy)

    # Show the total number of pets scraped
    print(f"\nTotal pets scraped: {len(all_dog_ages)}")

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Close the browser
    driver.quit()


Scraping page 1...
Young
Adult
Adult
Puppy
Puppy
Puppy
Adult
Adult
Adult
Puppy
Puppy
Adult
Adult
Young
Adult
Puppy
Puppy
Young
Young
Puppy
Puppy
Puppy
Puppy
Puppy
Puppy
Puppy
Adult
Young
Adult
Adult
Young
Young
Young
Adult
Adult
Adult
Adult
Adult
Young
Adult
Scraping page 2...
Adult
Adult
Adult
Puppy
Puppy
Adult
Puppy
Young
Adult
Young
Young
Young
Adult
Adult
Young
Adult
Puppy
Young
Adult
Young
Adult
Puppy
Young
Adult
Adult
Adult
Young
Young
Young
Adult
Young
Young
Puppy
Puppy
Puppy
Puppy
Young
Puppy
Puppy
Puppy
Scraping page 3...
Adult
Puppy
Young
Adult
Adult
Young
Adult
Puppy
Puppy
Adult
Adult
Young
Adult
Adult
Young
Young
Adult
Adult
Adult
Puppy
Young
Young
Adult
Adult
Adult
Young
Puppy
Puppy
Puppy
Young
Young
Young
Young
Young
Young
Puppy
Adult
Adult
Adult
Young
Scraping page 4...
Adult
Young
Young
Young
Adult
Puppy
Adult
Young
Adult
Adult
Young
Young
Young
Adult
Adult
Adult
Young
Puppy
Puppy
Puppy
Puppy
Young
Young
Young
Young
Puppy
Adult
Puppy
Puppy
Puppy
Puppy
Adult
Adult
Young
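
Once the ages are in a list such as all_dog_ages, they can be tallied by category. The sketch below uses collections.Counter. The helper name summarize_ages is an assumption, and the sample list is simply the first six values printed for page 1 above.

```python
from collections import Counter


def summarize_ages(ages):
    """Count how often each age label occurs, most frequent first.

    (Hypothetical helper; blank entries are skipped.)
    """
    counts = Counter(age.strip() for age in ages if age.strip())
    return counts.most_common()


sample = ["Young", "Adult", "Adult", "Puppy", "Puppy", "Puppy"]
print(summarize_ages(sample))
# [('Puppy', 3), ('Adult', 2), ('Young', 1)]
```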


8. Collecting Dog Breeds

In [82]:
# Set up Selenium WebDriver
driver = webdriver.Chrome()  # Replace with the WebDriver for your browser

# Base URL with a placeholder for the page number
base_url = "https://www.petfinder.com/search/dogs-for-adoption/ca/newfoundland-and-labrador/?distance=Anywhere&page={}"

# Initialize an empty list to store dog breeds
all_dog_breeds = []

try:
    # Loop through the first 10 pages
    for page in range(1, 11):  # Adjust range for more pages if needed
        # Construct the URL for the current page
        current_url = base_url.format(page)
        driver.get(current_url)
        print(f"Scraping page {page}...")

        # Allow the page to load
        time.sleep(5)
        
        # Wait for the breed elements to load
        WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, '//pfdc-pet-card'))
        )
        
        # Locate the breed information using the provided XPath
        dog_breeds_elements = driver.find_elements(By.XPATH, '//pfdc-pet-card/a/div[1]/div/div[2]/ul/li[1]/ul/li[2]/pf-truncate')
        
        # Extract text from each element
        dog_breeds = [breed.text for breed in dog_breeds_elements if breed.text.strip()]
        all_dog_breeds.extend(dog_breeds)  # Append to the main list
        
        # Print the extracted data for the current page
        for breed in dog_breeds:
            print(breed)

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Print total pets scraped and all collected breeds
    total_scraped = len(all_dog_breeds)
    print(f"\nTotal pets scraped: {total_scraped}")
    
    # Print all collected dog breeds
    print("\nAll collected dog breeds:")
    for idx, breed in enumerate(all_dog_breeds, 1):
        print(f"{idx}: {breed}")
    
    # Close the browser
    driver.quit()


Scraping page 1...
Labrador Retriever Mix
Basset Hound
Pit Bull Terrier
Labrador Retriever Mix
Black Labrador Retriever Mix
Boxer & Black Labrador Retriever Mix
German Shepherd Dog Mix
Catahoula Leopard Dog
Shar-Pei & Catahoula Leopard Dog Mix
Black Labrador Retriever Mix
German Shepherd Dog
Weimaraner
American Bulldog Mix
German Shepherd Dog Mix
American Bulldog Mix
Bulldog & Shar-Pei Mix
Terrier Mix
Chocolate Labrador Retriever Mix
Beagle & Labrador Retriever Mix
Schnauzer Mix
Schnauzer Mix
Schnauzer Mix
Schnauzer Mix
Schnauzer Mix
Pit Bull Terrier Mix
Australian Cattle Dog / Blue Heeler & Beagle Mix
Irish Wolfhound Mix
Labrador Retriever
Husky Mix
Collie Mix
Pit Bull Terrier Mix
Pit Bull Terrier Mix
German Shepherd Dog
Doberman Pinscher
Doberman Pinscher
Doberman Pinscher
Doberman Pinscher
Doberman Pinscher
Hound Mix
Terrier Mix
Scraping page 2...
Pit Bull Terrier Mix
Catahoula Leopard Dog Mix
Catahoula Leopard Dog Mix
Black Labrador Retriever
Black Labrador Retriever
Black Labrador