### Python Image Scraper Notebook
This notebook searches DuckDuckGo for images matching a query and saves them locally. It relies on `ddgs` for search, `requests` for downloading, and `os` for file handling; `beautifulsoup4` is installed alongside them.

In [1]:
%pip install requests beautifulsoup4 ddgs

Note: you may need to restart the kernel to use updated packages.


A simple text query to the DuckDuckGo search engine using the `ddgs` library.

In [2]:
from ddgs import DDGS

with DDGS() as ddgs:
    results = ddgs.text("What is python", max_results=5)

    for r in results:
        # print(r["title"])
        print(r["href"])
        # print(r["body"])
        print()


https://en.wikipedia.org/wiki/Python_(programming_language)

https://www.python.org/doc/essays/blurb/

https://www.w3schools.com/python/python_intro.asp

https://www.coursera.org/articles/what-is-python-used-for-a-beginners-guide-to-using-python

https://www.pythontutorial.net/getting-started/what-is-python/



Image search with the same library, printing each result's image URL, title, and source.

In [3]:
with DDGS() as ddgs:
    images = ddgs.images("Cute cats", max_results=5)

    for img in images:
        print(img["image"])
        print(img["title"])
        print(img["source"])
        print()

https://www.rd.com/wp-content/uploads/2021/04/GettyImages-138468381-scaled-e1619028416767.jpg
50 Cute Kittens You Need to See | The Cutest Kitten Photos Ever
Bing

https://static.boredpanda.com/blog/wp-content/uploads/2016/08/Cute-kittens-46-57b323088a692__605.jpg
91 Cute Cats To Make Your Heart Melt | Bored Panda
Bing

https://www.rd.com/wp-content/uploads/2021/04/GettyImages-10100201-scaled.jpg?w=2560
50 Cute Kittens You Need to See | The Cutest Kitten Photos Ever
Bing

https://www.rd.com/wp-content/uploads/2021/04/GettyImages-540542926-scaled-e1619016093503.jpg?w=1593
50 Cute Kittens You Need to See | The Cutest Kitten Photos Ever
Bing

https://wallpapertag.com/wallpaper/full/6/c/1/434699-popular-cute-kitten-wallpapers-2560x1440.jpg
Cute Kitten Wallpapers ·① WallpaperTag
Bing
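
The results printed above expose at least the `image`, `title`, and `source` keys, and some image URLs carry a query string such as `?w=2560`. If only certain file types are wanted, the URL path can be parsed before downloading. The following is a minimal sketch; `filter_by_extension` is a hypothetical helper introduced for illustration and is not part of `ddgs`.

```python
import os
from urllib.parse import urlparse


def filter_by_extension(results, allowed=(".jpg", ".jpeg", ".png")):
    """Keep results whose image URL path ends with an allowed extension."""
    kept = []
    for item in results:
        # urlparse separates the path from any query string like "?w=2560".
        path = urlparse(item["image"]).path
        ext = os.path.splitext(path)[1].lower()
        if ext in allowed:
            kept.append(item)
    return kept


# Hand-written sample in the same shape as the search results.
sample = [
    {"image": "https://www.rd.com/wp-content/uploads/2021/04/GettyImages-10100201-scaled.jpg?w=2560"},
    {"image": "https://example.com/picture.webp"},
]
print(len(filter_by_extension(sample)))
```

The extension in a URL is only a hint about the actual file type, so this filter is best treated as a cheap first pass.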



Image scraping: download the top image results for a query into a local folder.

In [4]:
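import os

import requests
from ddgs import DDGS

# Folder that receives every downloaded image; created if it does not exist.
save_folder = "test_images"
os.makedirs(save_folder, exist_ok=True)

def download_image(query, num_images):
    """Search DuckDuckGo Images for `query` and save up to `num_images` results."""
    with DDGS() as ddgs:
        results = ddgs.images(
            query=query,
            max_results=num_images
        )

        for i, img in enumerate(results):
            url = img["image"]
            try:
                response = requests.get(url, timeout=10)
                # Raise an HTTPError for 4xx/5xx responses so they are reported below.
                response.raise_for_status()

                # Files are always named .jpg, whatever format the server returns.
                filename = f'{save_folder}/{query.replace(" ", "_")}_{i+1}.jpg'

                with open(filename, 'wb') as f:
                    f.write(response.content)
                print(f"Downloaded {filename}")
            except requests.exceptions.RequestException as e:
                # Covers timeouts, connection errors, and HTTP error statuses.
                print(f"Failed to download image from {url}: {e}")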
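
The function above names every file `.jpg` and only replaces spaces in the query, so a PNG or WebP response gets a misleading extension and a query containing `/` would produce an invalid path. One possible refinement, sketched below, derives the extension from the response's `Content-Type` header and sanitises the query. `build_filename` and the `EXTENSIONS` table are hypothetical names introduced here, and the MIME mapping is an assumed, non-exhaustive list.

```python
import os
import re

# Assumed mapping from common image MIME types to extensions (not exhaustive).
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/webp": ".webp",
    "image/gif": ".gif",
}


def build_filename(folder, query, index, content_type):
    """Return a path of the form folder/<query slug>_<index><ext>."""
    # Replace each run of characters other than word characters and hyphens
    # with a single underscore, then trim underscores from both ends.
    slug = re.sub(r"[^\w-]+", "_", query).strip("_")
    # Drop parameters such as "; charset=binary" and normalise case.
    mime = content_type.split(";")[0].strip().lower()
    # Fall back to .jpg (the notebook's original choice) for unknown types.
    ext = EXTENSIONS.get(mime, ".jpg")
    return os.path.join(folder, f"{slug}_{index}{ext}")


print(build_filename("test_images", "cute cats", 2, "image/webp"))
```

Inside `download_image`, the last argument could be supplied as `response.headers.get("Content-Type", "")`.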


In [5]:
download_image("tungtungtung sahur", 5)

Downloaded test_images/tungtungtung_sahur_1.jpg
Downloaded test_images/tungtungtung_sahur_2.jpg
Downloaded test_images/tungtungtung_sahur_3.jpg
Downloaded test_images/tungtungtung_sahur_4.jpg
Failed to download image from https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Tung_tung_tung_sahur.webp/500px-Tung_tung_tung_sahur.webp.png: 403 Client Error: Forbidden for url: https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Tung_tung_tung_sahur.webp/500px-Tung_tung_tung_sahur.webp.png
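
The `403 Client Error: Forbidden` above means the host refused the request. One plausible cause is the default `python-requests` User-Agent, which some hosts reject; sending a descriptive User-Agent through a shared `requests.Session` may help. This is a sketch under that assumption, not a guaranteed fix, since hosts can also block on referrer or hotlink rules. `make_session`, `fetch_image`, and the `USER_AGENT` string are hypothetical names introduced for illustration.

```python
import requests

# Hypothetical identifier; replace with your own project name and contact details.
USER_AGENT = "image-scraper-notebook/0.1 (contact: you@example.com)"


def make_session(user_agent=USER_AGENT):
    """Create a requests session that sends a descriptive User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_image(session, url, timeout=10):
    """Return the image bytes, or None if the request fails."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        print(f"Failed to download image from {url}: {exc}")
        return None
    return response.content
```

A single session also reuses connections across downloads, which is convenient when several images come from the same host.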


In [7]:
search_images = [
    "tungtungtung sahur",
    "cute cats",
    "beautiful landscapes",
    "modern architecture",
    "delicious food"]

for query in search_images:
    download_image(query, 5)

Downloaded test_images/tungtungtung_sahur_1.jpg
Downloaded test_images/tungtungtung_sahur_2.jpg
Downloaded test_images/tungtungtung_sahur_3.jpg
Downloaded test_images/tungtungtung_sahur_4.jpg
Failed to download image from https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Tung_tung_tung_sahur.webp/500px-Tung_tung_tung_sahur.webp.png: 403 Client Error: Forbidden for url: https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Tung_tung_tung_sahur.webp/500px-Tung_tung_tung_sahur.webp.png
Failed to download image from https://www.rd.com/wp-content/uploads/2021/04/GettyImages-138468381-scaled-e1619028416767.jpg: 403 Client Error: Forbidden for url: https://www.rd.com/wp-content/uploads/2021/04/GettyImages-138468381-scaled-e1619028416767.jpg
Downloaded test_images/cute_cats_2.jpg
Failed to download image from https://www.rd.com/wp-content/uploads/2021/04/GettyImages-10100201-scaled.jpg?w=2560: 403 Client Error: Forbidden for url: https://www.rd.com/wp-content/uploads/2021/04/GettyI