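This cell installs Chromium's chromedriver package on the Colab VM so Selenium can automate the browser, copies the driver binary to /usr/bin, and installs the Selenium library for Python. The commands are commented out here; uncomment them when running on a fresh runtime.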

In [None]:
# !apt-get update
# !apt-get install -y chromium-chromedriver
# !cp /usr/lib/chromium-browser/chromedriver /usr/bin
# !pip install selenium


Hit:1 https://cli.github.com/packages stable InRelease
Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:5 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:6 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:8 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,940 kB]
Hit:9 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Get:11 https://r2u.stat.illinois.edu/

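This cell was intended to tell Python where chromedriver lives. Note that sys.path only controls where Python searches for importable modules; Selenium finds the driver binary through the system PATH, which the copy to /usr/bin above already covers, so this cell is not needed and stays commented out.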

In [None]:
# import sys
# sys.path.insert(0, '/usr/lib/chromium-browser/chromedriver')
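
Selenium launches chromedriver as an external program, so what matters is the operating system's PATH (or an explicit driver location), not Python's module search path. The following is a minimal sketch, using only the standard library, of checking whether a driver binary is discoverable. The helper name find_driver is hypothetical, and the Selenium calls in the trailing comment are shown only as an illustration of where the path would go.

```python
import shutil
from typing import Optional


def find_driver(name: str = "chromedriver") -> Optional[str]:
    """Return the full path of the driver binary if it is on PATH, else None."""
    return shutil.which(name)


driver_path = find_driver()
if driver_path is None:
    print("chromedriver is not on PATH; install it or pass its location explicitly.")
else:
    print(f"chromedriver found at {driver_path}")

# With Selenium 4, an explicit location can be passed through Service, e.g.:
#   service = Service(executable_path=driver_path)
#   driver = webdriver.Chrome(service=service, options=options)
```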


These imports cover browser automation and explicit waits (Selenium), pauses between actions (time), timestamps (datetime), and tabular data with CSV export (pandas).

In [1]:
import time
import pandas as pd
from datetime import datetime
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException


This class automates a Pinterest search using Selenium. It scrolls the results page until it has collected the requested number of pins or the page stops growing, and extracts each pin's URL, ID, title (taken from the image alt text), and image URL.
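
The scroll-and-collect idea can be sketched independently of the browser. The class below stops as soon as the page height stops changing; the variant here swaps in a different stopping rule, giving up only after a few consecutive rounds that add nothing new, which may be more forgiving when content loads slowly. The names collect_until_stable, fetch_batch, and patience are hypothetical and not part of the notebook.

```python
from typing import Callable, Dict, Iterable, List


def collect_until_stable(
    fetch_batch: Callable[[], Iterable[Dict[str, str]]],
    max_items: int,
    patience: int = 3,
    key: str = "pin_url",
) -> List[Dict[str, str]]:
    """Collect unique records until max_items is reached or `patience`
    consecutive calls to fetch_batch add nothing new."""
    seen = set()          # keys already collected
    collected = []        # records in first-seen order
    idle_rounds = 0       # consecutive rounds without a new record
    while len(collected) < max_items and idle_rounds < patience:
        added = 0
        for record in fetch_batch():
            if len(collected) >= max_items:
                break
            value = record.get(key)
            if value is None or value in seen:
                continue
            seen.add(value)
            collected.append(record)
            added += 1
        idle_rounds = 0 if added else idle_rounds + 1
    return collected


# Simulated "scroll" results: each call returns what is currently visible.
_batches = iter([
    [{"pin_url": "a"}, {"pin_url": "b"}],
    [{"pin_url": "b"}, {"pin_url": "c"}],
])
demo = collect_until_stable(lambda: next(_batches, []), max_items=10, patience=2)
print([r["pin_url"] for r in demo])
```

In the scraper, fetch_batch would be a function that scrolls, waits, and returns the records extracted from the pins currently in the DOM.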



In [2]:
class PinterestScraper:
    def __init__(self):
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-dev-shm-usage')
        options.add_argument('--window-size=1920,1080')
        options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/114.0.0.0 Safari/537.36')
        self.driver = webdriver.Chrome(options=options)

    def handle_cookie_popup(self):
        try:
            WebDriverWait(self.driver, 5).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, "button[data-test-id='cookie-banner-accept-button']"))
            ).click()
            print("Cookie popup handled.")
            time.sleep(2)
        except TimeoutException:
            print("No cookie popup found.")

    def _extract_pin_data(self, pin, search_term):
        try:
            pin_link = pin.find_element(By.TAG_NAME, "a").get_attribute("href")
            pin_id = pin_link.split('/')[-2] if pin_link else None
            img = pin.find_element(By.TAG_NAME, "img")
            return {
                'pin_url': pin_link,
                'pin_id': pin_id,
                'title': img.get_attribute("alt"),
                'description': img.get_attribute("alt"),  # Pinterest removed detailed description
                'image_url': img.get_attribute("src"),
                'search_term': search_term,
                'scraped_date': datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            }
        except Exception:
            # Skip pins with a missing link or image, or nodes that went stale while scrolling
            return None

    def scrape_pins(self, search_term, max_pins=100):
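        # Percent-encode the whole term so characters such as '&' or '#' cannot break the query string
        from urllib.parse import quote
        url = f"https://www.pinterest.com/search/pins/?q={quote(search_term, safe='')}"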
        self.driver.get(url)
        time.sleep(3)
        self.handle_cookie_popup()

        pins_data = []
        seen_urls = set()  # URLs already collected, for constant-time duplicate checks
        last_height = self.driver.execute_script("return document.body.scrollHeight")

        while len(pins_data) < max_pins:
            self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)
            pins = self.driver.find_elements(By.CSS_SELECTOR, "div[data-test-id='pin']")
            for pin in pins:
                if len(pins_data) >= max_pins:
                    break
                data = self._extract_pin_data(pin, search_term)
                if data and data['pin_url'] not in seen_urls:
                    seen_urls.add(data['pin_url'])
                    pins_data.append(data)
                    print(f"\rScraped {len(pins_data)} pins...", end='')

            new_height = self.driver.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                break
            last_height = new_height

        # Close the browser; this scraper instance cannot be reused after this call
        self.driver.quit()
        return pd.DataFrame(pins_data)
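
The two URL steps in the class, building the search URL and pulling the pin ID out of a pin link, can be sketched as small standalone helpers. This is an illustration under assumptions rather than the notebook's own code: the helper names are hypothetical, and the /pin/<id>/ path layout is assumed from the way the class takes the second-to-last URL segment.

```python
from typing import Optional
from urllib.parse import quote, urlparse


def build_search_url(search_term: str) -> str:
    """Percent-encode the search term and append it to the search endpoint."""
    return f"https://www.pinterest.com/search/pins/?q={quote(search_term, safe='')}"


def pin_id_from_url(pin_url: Optional[str]) -> Optional[str]:
    """Return the ID from a URL whose path looks like /pin/<id>/ (assumed layout)."""
    if not pin_url:
        return None
    parts = [p for p in urlparse(pin_url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "pin":
        return parts[1]
    return None


print(build_search_url("tree pose"))
print(pin_id_from_url("https://www.pinterest.com/pin/123456/"))
```

Parsing the path instead of splitting the raw string keeps the result the same whether or not the link has a trailing slash or a query string.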


We create an instance of the scraper and call scrape_pins() with a search term and pin limit.

If pins were successfully scraped, we save the data into a CSV file with a timestamp.
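
The timestamped filename can be built by a small helper. The sketch below is one possible approach, not the notebook's own: the name csv_filename is hypothetical, and it goes a step further than replacing spaces by collapsing every run of non-alphanumeric ASCII characters into an underscore, so search terms containing slashes or punctuation still give a valid filename. Non-ASCII letters would also be replaced under this rule.

```python
import re
from datetime import datetime
from typing import Optional


def csv_filename(search_term: str, now: Optional[datetime] = None) -> str:
    """Build pinterest_<slug>_<YYYYmmdd_HHMMSS>.csv from a search term."""
    now = now or datetime.now()
    # Collapse anything that is not an ASCII letter or digit into one underscore
    slug = re.sub(r"[^A-Za-z0-9]+", "_", search_term).strip("_") or "search"
    return f"pinterest_{slug}_{now.strftime('%Y%m%d_%H%M%S')}.csv"


print(csv_filename("Vrikshasana", datetime(2025, 9, 5, 21, 10, 8)))
```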


In [3]:
SEARCH_TERM = "Vrikshasana"
MAX_PINS = 100  # You can change to 300 if needed

scraper = PinterestScraper()
df = scraper.scrape_pins(SEARCH_TERM, MAX_PINS)

if not df.empty:
    filename = f"pinterest_{SEARCH_TERM.replace(' ', '_')}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
    df.to_csv(filename, index=False)
    print(f"\nScraping complete. Saved to: {filename}")
else:
    print("No pins scraped.")


No cookie popup found.
Scraped 100 pins...
Scraping complete. Saved to: pinterest_Vrikshasana_20250905_211008.csv


This cell downloads the CSV from the Colab VM to your local machine. The google.colab module is only available inside Colab.

In [None]:
# from google.colab import files
# files.download(filename)
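
Because google.colab cannot be imported outside Colab, the download step can be guarded so the notebook also runs locally. A minimal sketch, assuming a hypothetical helper named download_if_colab; it returns False when the file is missing or when the code is not running in Colab.

```python
import os


def download_if_colab(path: str) -> bool:
    """Trigger a browser download in Colab; return False if that is not possible."""
    if not os.path.isfile(path):
        return False
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        return False
    files.download(path)
    return True


if not download_if_colab("pinterest_results.csv"):  # hypothetical filename
    print("Download skipped: file not found or not running in Colab.")
```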

