# Image scraper

## Step 1: Create the Conda Environment

Open your terminal (macOS/Linux) or Anaconda Prompt (Windows) and create a new Conda environment named `image_scraper` with Python 3.11:

```console
conda create --name image_scraper python=3.11
```

## Step 2: Activate the Environment

Activate the newly created environment using:

```console
conda activate image_scraper
```

## Step 3: Install Required Packages

Install Selenium and Requests for web scraping and Jupyter for running notebooks, then install WebDriver Manager, which downloads the browser driver that Selenium needs:

```console
conda install selenium requests jupyter -c conda-forge
pip install webdriver-manager
```

The `-c conda-forge` option tells Conda to install these packages from the conda-forge channel, which often carries more recent versions than the default channel.
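
To confirm that the installation worked, the installed versions can be checked from Python. This is a minimal sketch using only the standard library; the helper name `installed_versions` and the list of distribution names are illustrative assumptions, not part of the original setup.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names assumed from the install commands in this step
REQUIRED = ["selenium", "requests", "webdriver-manager"]

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` was probably installed into a different environment, so check that `image_scraper` is active before re-running the install commands.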

## Step 4: Install Jupyter Kernel

To make the `image_scraper` environment available as a kernel in Jupyter Notebooks, install `ipykernel` and register the environment with it:

```console
conda install ipykernel -c conda-forge
python -m ipykernel install --user --name=image_scraper --display-name="Python 3.11 (image_scraper)"
```

The second command registers the `image_scraper` environment as a kernel in Jupyter, so you can select it under the display name "Python 3.11 (image_scraper)" when working on notebooks for this project.
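
Once a notebook is open, it can be worth checking that the selected kernel really runs inside the new environment. The sketch below prints the interpreter details from within a cell; the function name `describe_environment` is an illustrative assumption, and `CONDA_DEFAULT_ENV` may be unset or reflect the launching environment, depending on how Jupyter was started.

```python
import os
import sys


def describe_environment():
    """Report which interpreter and environment the current kernel is using."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Set by `conda activate`; may be None inside a Jupyter kernel
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the kernel is set up as described, the `executable` and `prefix` paths should point into the `image_scraper` environment directory.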

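## Step 5: Run Jupyter Notebook

Start Jupyter with `jupyter notebook`, create a notebook that uses the `Python 3.11 (image_scraper)` kernel, and run the image scraper code below: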


In [7]:
import os
import time

import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


def download_image(image_url, directory, index):
    """
    Downloads an image from a given URL into a specified directory.
    """
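    try:
        # A timeout keeps a stalled connection from hanging the whole run
        response = requests.get(image_url, timeout=10)
        if response.status_code == 200:
            file_path = os.path.join(directory, f'image_{index}.jpg')
            with open(file_path, 'wb') as file:
                file.write(response.content)
            print(f"Downloaded {file_path}")
        else:
            # Report failed requests instead of skipping them silently
            print(f"Skipped {image_url}: HTTP {response.status_code}")
    except Exception as e:
        print(f"Error downloading {image_url}: {e}")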

def scrape_images(search_term, limit):
    """
    Scrapes images from DuckDuckGo based on a search term up to a specified limit.
    """
    # Initialize the Chrome driver; Selenium 4 expects the driver path wrapped in a Service
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

    # Go to DuckDuckGo's image search page
    driver.get(f"https://duckduckgo.com/?q={search_term}&iax=images&ia=images")
    time.sleep(3)  # Allow time for the page to load

    # Scroll the page to ensure that images are loaded
    last_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(3)  # Wait for new images to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    # Find image elements on the page
    images = driver.find_elements(By.CSS_SELECTOR, "img.tile--img__img.js-lazyload")
    images = images[:limit]  # Limit the number of images to download

    # Create a directory for the downloaded images (no error if it already exists)
    directory = f"downloaded_images_{search_term.replace(' ', '_')}"
    os.makedirs(directory, exist_ok=True)

    # Download the images
    for index, image in enumerate(images):
        image_url = image.get_attribute('src')
        if image_url:
            download_image(image_url, directory, index)

    driver.quit()
    print(f"Finished downloading images to {directory}.")

# Example usage
search_term = "flowers"  # Specify the search term here
limit = 5  # Specify the maximum number of images to download

scrape_images(search_term, limit)


Downloaded downloaded_images_flowers\image_0.jpg
Downloaded downloaded_images_flowers\image_1.jpg
Downloaded downloaded_images_flowers\image_2.jpg
Downloaded downloaded_images_flowers\image_3.jpg
Downloaded downloaded_images_flowers\image_4.jpg
Finished downloading images to downloaded_images_flowers.
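
The script places the search term into the URL as-is, which works for a single word like "flowers" but can break for terms containing characters such as `&` or `#`. A minimal sketch of one way to build the URL safely with the standard library follows; the helper name `build_search_url` is an assumption and does not appear in the original script.

```python
from urllib.parse import urlencode


def build_search_url(search_term):
    """Build a DuckDuckGo image-search URL with the term percent-encoded."""
    query = urlencode({"q": search_term, "iax": "images", "ia": "images"})
    return f"https://duckduckgo.com/?{query}"


print(build_search_url("red & white roses"))
# → https://duckduckgo.com/?q=red+%26+white+roses&iax=images&ia=images
```

Inside `scrape_images`, the result of this helper could be passed to `driver.get(...)` in place of the f-string.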

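The scroll loop in `scrape_images` keeps going until the page height stops changing, which can take a long time on an endlessly loading results page even when only a few images are wanted. One possible variation caps the number of scroll rounds. The sketch below is written against plain callables so it runs without a browser; `scroll_until_stable`, `max_rounds`, and `pause` are hypothetical names chosen for illustration.

```python
import time


def scroll_until_stable(get_height, scroll, max_rounds=10, pause=0.0):
    """Scroll until the height stops changing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = get_height()
    rounds = 0
    while rounds < max_rounds:
        scroll()
        rounds += 1
        time.sleep(pause)  # Give newly requested content time to load
        new_height = get_height()
        if new_height == last_height:
            break
        last_height = new_height
    return rounds


# Simulated page whose height grows twice and then stays the same
fake_heights = iter([100, 200, 300, 300])
print(scroll_until_stable(lambda: next(fake_heights), lambda: None))
# → 3
```

With Selenium, `get_height` could be `lambda: driver.execute_script("return document.body.scrollHeight")` and `scroll` could be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")`, with `pause=3` to match the original wait.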

## Step 6: Deactivating the Environment

Once you're done working in the `image_scraper` environment, deactivate it to return to the previously active environment (typically `base`):

```console
conda deactivate
```

## Important Considerations

- This script is for educational purposes. Always respect the website's terms of service and copyright laws.
- The performance and reliability of web scraping scripts can vary based on network conditions and changes to the target website's layout and design.
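
Because network conditions vary, a download that fails once may succeed on a second attempt. The following is a minimal sketch of a retry wrapper with exponential backoff, using only the standard library; `with_retries` and its parameters are hypothetical and not part of the original script.

```python
import time


def with_retries(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:
            last_error = error
            # Wait before the next try, but not after the final failure
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

In the scraper, the request inside `download_image` could be wrapped as `with_retries(lambda: requests.get(image_url, timeout=10))`. The `sleep` parameter exists only so that the delay can be replaced in tests.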