Srcset images not being archived

grab-site does not seem to archive the image URLs listed in the `srcset` attributes inside `<picture>` elements on a Substack blog that I tried the tool on. I believe this may be an issue in [wpull](https://github.com/ArchiveTeam/ludios_wpull).

![image](https://github.com/user-attachments/assets/863223b2-2694-4aa0-a981-0a3457d431d4)

![image](https://github.com/user-attachments/assets/966fc14b-bbbe-442e-b0b6-caf1a540f3b3)

### Step-by-step reproduction instructions

First I run:
```
grab-site --level=2 --concurrency=20 --page-requisites-level=2 --import-ignores=$(pwd)/ignores 'https://promptingweekly.substack.com/p/prompting-principle-if-youre-fighting' 'https://substackcdn.com/bundle/assets/store.modern-3dec36e9.js' 'https://substack-post-media.s3.amazonaws.com/public/images/4206cf36-9fcc-4b06-95e1-d751f9f4c3b7_388x388.jpeg'
```

I include the other two URLs as seeds so that their domains are not considered "offsite".

The contents of the ignores file are:
```
platform.openai.com
reddit.com
discord.com
discordapp.com
^https?://[^p][^.]+.substack.com
shopify.com
^https://static.airtable.com/esbuild/by_sha
https://promptingweekly.substack.com/account\?utm_medium=web&utm_source=subscribe-widget
https://promptingweekly.substack.com/p/[^?/]+\?utm_source=substack&utm_medium=email&utm_content=share&action=share&token=
```
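To rule out the ignores file as the cause, the patterns can be checked against the seed URLs. The following is a minimal sketch, assuming grab-site treats each ignore as a Python regular expression searched against the full URL (I have not verified this against the grab-site source). The helper name `matching_ignores` and the `otherblog` URL are hypothetical and used only for illustration.

```python
import re

# Patterns copied verbatim from the ignores file above.
IGNORES = [
    r"platform.openai.com",
    r"reddit.com",
    r"discord.com",
    r"discordapp.com",
    r"^https?://[^p][^.]+.substack.com",
    r"shopify.com",
    r"^https://static.airtable.com/esbuild/by_sha",
    r"https://promptingweekly.substack.com/account\?utm_medium=web&utm_source=subscribe-widget",
    r"https://promptingweekly.substack.com/p/[^?/]+\?utm_source=substack&utm_medium=email&utm_content=share&action=share&token=",
]

# Seed URLs from the grab-site command above.
SEEDS = [
    "https://promptingweekly.substack.com/p/prompting-principle-if-youre-fighting",
    "https://substackcdn.com/bundle/assets/store.modern-3dec36e9.js",
    "https://substack-post-media.s3.amazonaws.com/public/images/4206cf36-9fcc-4b06-95e1-d751f9f4c3b7_388x388.jpeg",
]


def matching_ignores(url):
    """Return the ignore patterns that match url (assumes re.search semantics)."""
    return [pattern for pattern in IGNORES if re.search(pattern, url)]


if __name__ == "__main__":
    for seed in SEEDS:
        print(seed, "->", matching_ignores(seed) or "not ignored")
    # Hypothetical URL on another Substack blog, to show the fifth pattern firing.
    print(matching_ignores("https://otherblog.substack.com/p/some-post"))
```

Under that assumption, none of the three seed URLs is matched by any ignore pattern, so the ignores alone should not exclude the `substackcdn.com` or `substack-post-media.s3.amazonaws.com` hosts.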

Then I open the archive using ReplayWeb.page-2.2.4.AppImage, and navigate to the page: `https://promptingweekly.substack.com/p/prompting-principle-if-youre-fighting`

You can download the WARC here: https://drive.google.com/file/d/1fJuWwgSTVfh9IdD47RC2lw67tWSryG4S/view?usp=sharing

### Appearance of replayed page

Several images on the page display immediately when the live site is opened. However, after archiving the page with grab-site and replaying it with [ReplayWeb.page](https://github.com/webrecorder/replayweb.page), those images do not load and appear as broken images or blank spaces.

Archived:
![image](https://github.com/user-attachments/assets/80c57f63-2d86-4791-ba22-3003b1a306c9)

Live site:
![image](https://github.com/user-attachments/assets/4c8363ba-17d2-46f3-8d83-edbad293e6a8)

Archived:
![image](https://github.com/user-attachments/assets/a328cc89-fae2-4012-947c-86a23fa6b59c)

Live site:
![image](https://github.com/user-attachments/assets/149c462b-0ef9-4abc-b4d5-46e05f651ee0)

The same issues are observed with [pywb](https://github.com/Webrecorder/pywb).

In addition, some scripts don't work properly. When navigating to the previous or next blog page, ReplayWeb.page will first display a page saying "Post not found". Refreshing the page will make it load properly (but still with the missing images).

![image](https://github.com/user-attachments/assets/bcbbdb3e-a3b7-470c-9be7-6cbfcb7ee874)

My belief is that both the missing images and the script errors are caused by missing files in the crawl.
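One way to check this would be to list every `srcset` candidate URL in the saved HTML and compare that list against the URLs recorded in the WARC. The following is a minimal sketch using only the Python standard library. The names `parse_srcset` and `SrcsetCollector` are hypothetical, the `example.com` URLs are placeholders rather than the real Substack URLs, and this is not how wpull itself extracts links. The URLs are split on whitespace rather than on commas, because a `srcset` URL may itself contain commas.

```python
from html.parser import HTMLParser


def parse_srcset(value):
    """Return the candidate URLs from a srcset attribute value."""
    urls = []
    pos, end = 0, len(value)
    while pos < end:
        # Skip whitespace and commas that separate candidates.
        while pos < end and (value[pos].isspace() or value[pos] == ","):
            pos += 1
        if pos >= end:
            break
        # The URL runs up to the next whitespace character.
        start = pos
        while pos < end and not value[pos].isspace():
            pos += 1
        url = value[start:pos]
        if url.endswith(","):
            # No descriptor follows; the trailing comma ends this candidate.
            url = url.rstrip(",")
        else:
            # Skip the descriptor (such as "424w" or "2x") up to the next comma.
            while pos < end and value[pos] != ",":
                pos += 1
        if url:
            urls.append(url)
    return urls


class SrcsetCollector(HTMLParser):
    """Collect srcset candidate URLs from <img> and <source> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("img", "source"):
            return
        for name, value in attrs:
            if name == "srcset" and value:
                self.urls.extend(parse_srcset(value))


if __name__ == "__main__":
    # Placeholder markup; in practice, feed the HTML of the archived page.
    html = (
        '<picture><source srcset="https://example.com/w_424,c_limit/x.jpg 424w, '
        'https://example.com/w_848,c_limit/x.jpg 848w">'
        '<img src="https://example.com/x.jpg"></picture>'
    )
    collector = SrcsetCollector()
    collector.feed(html)
    for url in collector.urls:
        print(url)
```

Any URL printed by this sketch that has no response record in the WARC would be a candidate for the missing files.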

### Additional details

I run Ubuntu 20.04 LTS.

Srcset images not being archived #243