### XKCD Image Scraper Script – Code Documentation

This script downloads the most recent XKCD comic images. It starts at https://xkcd.com, saves the comic on that page, then follows the “Prev” button to the next older comic. It stops after 10 comics or when it reaches the first comic, whichever happens sooner.

### 1. Importing Required Libraries

In [None]:
import bs4
import requests
import time
import os


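i. bs4 (Beautiful Soup): Parses HTML pages, making it easy to extract elements.

ii. requests: Downloads web pages and images.

iii. time: Adds delays between downloads to avoid hitting the server too quickly.

iv. os: Handles folder creation and file paths.

bs4 and requests are third-party packages and must be installed separately; time and os are part of the standard library.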

### 2. Starting URL

In [None]:
url = 'https://xkcd.com'


This is the homepage of XKCD. The script begins here and then moves backward through older comics.

### 3. Create a Folder to Store Images

In [None]:
os.makedirs('xkcd', exist_ok=True)


i. Creates a directory named xkcd in the current working directory.

ii. exist_ok=True prevents an error if the folder already exists.
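
The effect of exist_ok=True can be checked in isolation. The following is a minimal sketch that uses a temporary directory (an assumption made here so the example does not touch the working directory) instead of the script's real xkcd folder:

```python
import os
import tempfile

# Work inside a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, 'xkcd')

    os.makedirs(target, exist_ok=True)   # creates the folder
    os.makedirs(target, exist_ok=True)   # second call does nothing and raises no error

    created = os.path.isdir(target)

    try:
        os.makedirs(target)              # without exist_ok, an existing folder is an error
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)
```

This is why the script can be run repeatedly without deleting the folder first.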

### 4. Download Limit Counters

In [None]:
num_downloads = 0
max_downloads = 10


num_downloads counts the comics processed so far, and max_downloads sets the limit. The script stops after 10 comics, or earlier if it reaches the first comic.

### 5. Main Download Loop

In [None]:
while not url.endswith('#') and num_downloads < max_downloads:


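The loop continues as long as both conditions hold:

i. The URL doesn’t end with “#”. On the first comic, the “Prev” link’s href is “#”, so the URL built from it ends with “#”.

ii. Fewer than max_downloads (10) comics have been downloaded.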
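
Both stop conditions can be exercised without any network access. The sketch below is illustrative only: count_pages_visited and the three-page fake_site are hypothetical stand-ins, and the URL is built the same way the script builds it (site root plus the href of the “Prev” link).

```python
def count_pages_visited(prev_links, start_url, max_downloads):
    """Walk a fake chain of 'Prev' links and return how many pages were visited."""
    url = start_url
    num_downloads = 0
    while not url.endswith('#') and num_downloads < max_downloads:
        # Same URL construction as the script: site root + href of the Prev link.
        url = 'https://xkcd.com' + prev_links[url]
        num_downloads += 1
    return num_downloads


# Hypothetical three-comic site: each page maps to the href of its Prev link.
fake_site = {
    'https://xkcd.com': '/2/',
    'https://xkcd.com/2/': '/1/',
    'https://xkcd.com/1/': '#',   # the first comic has nothing older to link to
}

stopped_by_hash = count_pages_visited(fake_site, 'https://xkcd.com', max_downloads=10)
stopped_by_limit = count_pages_visited(fake_site, 'https://xkcd.com', max_downloads=2)
print(stopped_by_hash, stopped_by_limit)
```

With a limit of 10 the walk ends when the URL becomes https://xkcd.com#, after all three pages. With a limit of 2 the counter ends it first.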

### 6. Download and Parse the Current Page

In [None]:
print(f'Downloading page {url}')
response = requests.get(url)
response.raise_for_status()

soup = bs4.BeautifulSoup(response.text, 'html.parser')


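i. requests.get() downloads the HTML of the current XKCD page.

ii. raise_for_status() raises an exception if the server returns a 4xx or 5xx status code, which stops the script unless the exception is caught.

iii. BeautifulSoup parses the HTML into a searchable tree, using Python's built-in html.parser.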

### 7. Locate the Comic Image

In [None]:
comic_elem = soup.select('#comic img')


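soup.select('#comic img') returns a list of all <img> elements inside the element with id comic. The list is empty if nothing matches.

This is where XKCD pages normally place the comic image.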

### 8. Handle Missing Image

In [None]:
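if not comic_elem:
    print('Could not find comic image.')


If no image was found, the script prints a warning. The image steps in sections 9 and 10 must then be skipped, for example by placing them in the else branch of this check; otherwise comic_elem[0] raises an IndexError on the empty list. The script still follows the “Prev” link in section 11.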
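
One way to keep the empty case away from the download steps is to return early. This is a sketch, not the script's own structure: first_image_src is a hypothetical helper, and plain dicts stand in for bs4 Tag objects, which also provide a .get() method.

```python
from typing import Optional


def first_image_src(comic_elem) -> Optional[str]:
    """Return the src of the first matched image, or None if nothing matched."""
    if not comic_elem:
        print('Could not find comic image.')
        return None
    return comic_elem[0].get('src')


# Plain dicts stand in for bs4 Tag objects here; both provide .get().
found = first_image_src([{'src': '//imgs.xkcd.com/comics/example.png'}])
missing = first_image_src([])

for src in (found, missing):
    if src is None:
        continue          # skip the download steps for this page
    print('Downloading image https:' + src)
```

Only the page with an image reaches the download message; the empty result is skipped without indexing the list.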

### 9. Build the Image URL

In [None]:
comic_url = 'https:' + comic_elem[0].get('src')
print(f'Downloading image {comic_url}')


The src attribute of an XKCD comic image is a protocol-relative URL, which starts with // and omits the scheme:

//imgs.xkcd.com/comics/example.png


So we prepend https: to create a valid full URL.
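
String concatenation assumes the src always starts with //. As an alternative to what the script does, the standard library's urllib.parse.urljoin resolves the src against the page URL and also copes with a src that is already absolute. PAGE_URL and absolute_image_url below are names chosen for this sketch.

```python
from urllib.parse import urljoin

PAGE_URL = 'https://xkcd.com/'   # page on which the <img> tag was found


def absolute_image_url(src: str) -> str:
    """Resolve an <img> src against the page URL."""
    return urljoin(PAGE_URL, src)


protocol_relative = absolute_image_url('//imgs.xkcd.com/comics/example.png')
already_absolute = absolute_image_url('https://imgs.xkcd.com/comics/example.png')
print(protocol_relative)
print(already_absolute)
```

Both calls yield https://imgs.xkcd.com/comics/example.png, whereas prepending https: to the second src would produce an invalid URL.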

### 10. Download and Save the Comic

In [None]:
img_res = requests.get(comic_url)
img_res.raise_for_status()

image_path = os.path.join('xkcd', os.path.basename(comic_url))
with open(image_path, 'wb') as image_file:
    for chunk in img_res.iter_content(1024):
        image_file.write(chunk)


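i. Requests the actual image file.

ii. Saves it into the xkcd/ folder, named after the last segment of the image URL.

iii. Uses .iter_content(1024) to write the image in chunks of up to 1 KB. Because requests.get() is called without stream=True, the whole response is already in memory at this point, so the chunking only limits the size of each write.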
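
The file-naming and chunked-write logic can be tried without downloading anything. In this sketch, save_chunks is a hypothetical helper, a short list of byte strings stands in for img_res.iter_content(1024), and a temporary directory stands in for the xkcd folder.

```python
import os
import tempfile


def save_chunks(chunks, comic_url, folder):
    """Write an iterable of byte chunks to folder, named after the URL's last segment."""
    image_path = os.path.join(folder, os.path.basename(comic_url))
    with open(image_path, 'wb') as image_file:   # closed even if a write fails
        for chunk in chunks:
            image_file.write(chunk)
    return image_path


# Stand-in for img_res.iter_content(1024): three small byte chunks.
fake_chunks = [b'abc', b'def', b'gh']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = save_chunks(fake_chunks, 'https://imgs.xkcd.com/comics/example.png', tmp_dir)
    file_name = os.path.basename(path)
    with open(path, 'rb') as saved:
        data = saved.read()

print(file_name, data)
```

The chunks are written in order, so the saved file holds their concatenation under the name example.png.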

### 11. Find the “Previous” Comic Link

In [None]:
prev_link = soup.select('a[rel="prev"]')[0]
url = 'https://xkcd.com' + prev_link.get('href')


i. The “Prev” button is an anchor tag with rel="prev".

ii. The script reads its href, a site-relative path to the earlier comic.

iii. Prepends https://xkcd.com to that path and stores the result in url for the next iteration.

### 12. Update Counters and Add Delay

In [None]:
num_downloads += 1
time.sleep(1)


i. Increments the download count.

ii. Waits 1 second to avoid hammering the website.

### 13. Completion Message

In [None]:
print('Done')


### Summary

This script downloads the most recent XKCD comics by:

1. Starting at the homepage

2. Extracting the comic image from each page

3. Saving the image to a local folder

4. Following the "Prev" button to older comics

5. Repeating until 10 comics are saved or the first comic is reached

It uses BeautifulSoup for HTML parsing, Requests for network calls, and basic file handling to store the images.