Sometimes substack pages have malformed image exports that break migrator #962

Closed
randyau opened this issue Dec 30, 2023 · 2 comments

randyau commented Dec 30, 2023

I ran the CLI migrate tool on my Substack, and about half of the posts came out with broken feature images. The same thing happens with the Beta migrator tool in Labs, probably because it uses the same code.

migrate substack -v
0.36.2

The feature_image value in ghost-import.json contains a remote URL instead of the expected locally scraped copy.
Following that link yields an "Access Denied" error.

"title": "We might not see leap seconds after 2035 🤯",
...
"feature_image": "https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7c193e59-fad6-49df-b659-b16976e1ce59_1024x683.jpeg",
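To see how many posts are affected, something like the following could list every feature_image that still points at an S3 host. This is only a sketch: it walks the JSON generically because the exact nesting of ghost-import.json is not shown in this issue, and the `.s3.amazonaws.com` suffix check, the function names, and the demo data are assumptions made for illustration.

```python
from urllib.parse import urlparse


def iter_posts(node):
    """Yield every dict that has a "feature_image" key, at any depth."""
    if isinstance(node, dict):
        if "feature_image" in node:
            yield node
        for value in node.values():
            yield from iter_posts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_posts(item)


def suspicious_feature_images(export, host_suffix=".s3.amazonaws.com"):
    """Return (title, url) pairs whose feature_image host ends with host_suffix.

    The suffix is an assumption based on the one URL quoted in this report.
    """
    hits = []
    for post in iter_posts(export):
        url = post.get("feature_image") or ""
        host = urlparse(url).hostname or ""
        if host.endswith(host_suffix):
            hits.append((post.get("title"), url))
    return hits


# Made-up data standing in for a parsed ghost-import.json.
demo = {
    "posts": [
        {"title": "Old post", "feature_image": "https://bucketeer-example.s3.amazonaws.com/a.jpeg"},
        {"title": "New post", "feature_image": "https://example.com/content/images/b.jpeg"},
    ]
}
print(suspicious_feature_images(demo))
```

For a real export, the parsed file would be passed in instead of the demo dict, e.g. the result of `json.load` on ghost-import.json.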

I went to the originating post in the HTML exported from Substack (exported 2023-12-28). The top image, which should have been converted to the feature_image, is the img tag below. Its src points at the "bucketeer" AWS host, which is the broken URL being imported, and its data-attrs references the same broken URL. I'm not sure which of the two the migration tool is reading. The tag also provides a srcset of images that are actually downloadable, which is why the HTML still displays in a browser.

<img src="https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7c193e59-fad6-49df-b659-b16976e1ce59_1024x683.jpeg"
     width="1200"
     height="800.390625"
     data-attrs="{&quot;src&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/7c193e59-fad6-49df-b659-b16976e1ce59_1024x683.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:683,&quot;width&quot;:1024,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:188889,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null}"
     class="sizing-large"
     alt=""
     srcset="https://substackcdn.com/image/fetch/w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c193e59-fad6-49df-b659-b16976e1ce59_1024x683.jpeg 424w,
             https://substackcdn.com/image/fetch/w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c193e59-fad6-49df-b659-b16976e1ce59_1024x683.jpeg 848w,
             https://substackcdn.com/image/fetch/w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c193e59-fad6-49df-b659-b16976e1ce59_1024x683.jpeg 1272w,
             https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F7c193e59-fad6-49df-b659-b16976e1ce59_1024x683.jpeg 1456w"
     sizes="100vw"
     fetchpriority="high">
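One possible workaround for a tag like this is to fall back to the largest srcset candidate whenever src points at the S3 host. The sketch below is not the migrator's code and not necessarily the fix that was later released; the function names, the shortened sample tag, and the "treat any `s3.amazonaws.com` host as blocked" rule are all assumptions made for illustration. It uses only Python's standard library.

```python
import json
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag (entities are decoded for us)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def find_images(html_text):
    collector = ImgCollector()
    collector.feed(html_text)
    return collector.images


def parse_srcset(srcset):
    """Return (url, width) pairs. URLs may contain commas, so split on whitespace."""
    return [(url, int(width)) for url, width in re.findall(r"(\S+)\s+(\d+)w", srcset)]


def is_top_image(img_attrs):
    """True when the data-attrs JSON marks this image as the post's top image."""
    try:
        meta = json.loads(img_attrs.get("data-attrs") or "{}")
    except json.JSONDecodeError:
        return False
    return isinstance(meta, dict) and bool(meta.get("topImage"))


def pick_feature_image(img_attrs, blocked_host="s3.amazonaws.com"):
    """Prefer src, unless its host is blocked (assumed rule), then use the widest srcset entry."""
    src = img_attrs.get("src") or ""
    host = urlparse(src).hostname or ""
    if host != blocked_host and not host.endswith("." + blocked_host):
        return src
    candidates = parse_srcset(img_attrs.get("srcset") or "")
    if not candidates:
        return src  # nothing better available
    return max(candidates, key=lambda pair: pair[1])[0]


# Shortened, made-up stand-in for the tag quoted above.
SAMPLE = (
    '<img src="https://bucketeer-example.s3.amazonaws.com/public/images/a_1024x683.jpeg"\n'
    ' data-attrs="{&quot;src&quot;:&quot;https://bucketeer-example.s3.amazonaws.com/'
    'public/images/a_1024x683.jpeg&quot;,&quot;topImage&quot;:true}"\n'
    ' srcset="https://substackcdn.com/image/fetch/w_424,c_limit/'
    'https%3A%2F%2Fbucketeer-example.s3.amazonaws.com%2Fa.jpeg 424w,\n'
    'https://substackcdn.com/image/fetch/w_1456,c_limit/'
    'https%3A%2F%2Fbucketeer-example.s3.amazonaws.com%2Fa.jpeg 1456w">'
)

img = find_images(SAMPLE)[0]
print(is_top_image(img), pick_feature_image(img))
```

Matching srcset entries with a whitespace-based pattern rather than splitting on commas matters here, because the substackcdn.com URLs carry commas inside their transformation parameters.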

The problem seems to affect all my posts prior to around January 2023, but it's not clear why there's a difference at all.

An example of a broken HTML file from the export is attached here:
buggy_html.zip

@PaulAdamDavis
Member

Hi @randyau,

Thanks for the detailed report and sample file! 🙌

I've had a quick look and can see a solution, which I'll get implemented and released to the CLI tools and beta migrator soon. I'll update this issue when that's done.

PaulAdamDavis added the bug (Something isn't working) label Dec 30, 2023
PaulAdamDavis self-assigned this Dec 30, 2023
PaulAdamDavis added a commit that referenced this issue Jan 2, 2024
@PaulAdamDavis
Member

This is now fixed and released in @tryghost/migrate@0.37.0 & @tryghost/mg-substack@0.4.0, and in the self-service migration tools.
