Feature request: garbage collection / dead file detection #191

Open
jameshfisher opened this issue Dec 18, 2020 · 7 comments

Comments

@jameshfisher

I recently wanted to detect and clear out all the old files in my static site. I did this by

  1. running hyperlink
  2. munging its text output to get a list of files accessible from the roots
  3. using find to list all files in my static site
  4. using diff to list those files in my static site that are not accessible from the roots
  5. deleting those files

Hyperlink was very useful here, but I think it would be cool if the feature was built in, or if there was a good example in the README for how to do this with hyperlink. Ultimately I'd like to have this test in my CI to ensure I keep the site clean.
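The five steps above can be sketched in Node without `find` and `diff`. This is a minimal sketch, not a hyperlink feature: it assumes you already have the list of reachable files as paths relative to the site root (how that list is extracted from hyperlink is not shown here), and the names `listFiles` and `findDeadFiles` are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively list every regular file under `root`,
// as forward-slash paths relative to `root`.
function listFiles(root, dir = root) {
  const out = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      out.push(...listFiles(root, full));
    } else if (entry.isFile()) {
      out.push(path.relative(root, full).split(path.sep).join('/'));
    }
  }
  return out.sort();
}

// Files that exist on disk but are not in the reachable list.
// `reachable` is assumed to hold paths relative to `root`.
function findDeadFiles(root, reachable) {
  const reachableSet = new Set(reachable);
  return listFiles(root).filter((file) => !reachableSet.has(file));
}
```

In CI, a non-empty result from `findDeadFiles` could be printed and turned into a failing exit code (for example by setting `process.exitCode = 1`) rather than deleting anything automatically.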

(Not posting my script here because it's awful messy, mostly due to scraping hyperlink's stdout!)

@papandreou
Collaborator

That's a great idea. If you're naming all your entry point pages, it should be fairly easy to diff the set of visited asset file:// urls against what's on disc (minus favicon.ico, which might not be explicitly referenced 😼 )
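A rough sketch of that diff, assuming the visited URLs are available as plain strings and the files on disk as paths relative to the site root. The function name `orphanedFiles` and the `ignore` parameter are hypothetical; only Node's built-in `url.fileURLToPath` is used, and nothing here reflects hyperlink's actual output format.

```javascript
const path = require('path');
const { fileURLToPath } = require('url');

// Return the files on disk that were never visited.
// `visitedUrls`: strings such as "file:///site/index.html" (assumed input).
// `filesOnDisk`: forward-slash paths relative to `root`.
// `ignore`: files that are allowed to be unreferenced, e.g. favicon.ico.
function orphanedFiles(root, visitedUrls, filesOnDisk, ignore = ['favicon.ico']) {
  const visited = new Set(
    visitedUrls
      .filter((u) => u.startsWith('file://')) // skip http(s) and other schemes
      .map((u) => path.relative(root, fileURLToPath(u)).split(path.sep).join('/'))
  );
  const ignored = new Set(ignore);
  return filesOnDisk.filter((f) => !visited.has(f) && !ignored.has(f));
}
```

Keeping the exceptions in an explicit `ignore` list makes it visible which files are deliberately unreferenced.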

@jameshfisher
Author

Also minus things like 404.html, which probably shouldn't be explicitly referenced 😅

@papandreou
Collaborator

You probably still want to check that your 404.html doesn't contain broken links, so that should already be in the set of entrypoints that you're using hyperlink to check? 🤔

@jameshfisher
Author

That's ....... actually a good point 😬

@Munter
Owner

Munter commented Dec 18, 2020

I like the idea, but I'm actually not sure hyperlink is the best tool for it. It's probably better to create a new tool that runs an assetgraph population based on entry points, like hyperlink does, with a population query of { followRelations: { crossorigin: false }}, and just lists the files. Unless your site is huge, in which case hyperlink's custom population, which tries to keep memory usage low during the run, might be needed.

@papandreou
Collaborator

Sure, a separate tool is also a possibility. But hyperlink already maintains a set of processed URLs so that it won't re-process already visited assets after they've become unloaded:

const processedUrls = new Set();
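To illustrate why such a set is useful here, the following toy sketch (not hyperlink's code) crawls an in-memory link graph and uses a `processedUrls` set to skip anything already visited. When the crawl finishes, the same set is the list of everything reachable from the entry points. The `crawl` function and the `linksFrom` map are hypothetical stand-ins for real asset loading.

```javascript
// Breadth-first walk over a toy link graph.
// `linksFrom` maps each URL to the URLs it references (hypothetical data).
function crawl(entryPoints, linksFrom) {
  const processedUrls = new Set();
  const queue = [...entryPoints];
  while (queue.length > 0) {
    const url = queue.shift();
    if (processedUrls.has(url)) continue; // already visited, do not re-process
    processedUrls.add(url);
    for (const target of linksFrom[url] || []) {
      queue.push(target);
    }
  }
  return processedUrls;
}

// Example graph with a cycle and one file nothing links to.
const exampleLinks = {
  'index.html': ['about.html', 'main.css'],
  'about.html': ['index.html', 'main.css'],
  '404.html': ['index.html'],
  'old.html': ['main.css'], // orphan: not reachable from the entry points
};
const reachable = crawl(['index.html', '404.html'], exampleLinks);
```

Anything on disk that is missing from `reachable` (here `old.html`) would be a candidate dead file.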

It's also a plus that hyperlink already has the pretty mode implemented.

There'll be a challenge with the followSourceMaps: false mode for projects that contain original sources, but other than that it seems like low-hanging fruit.

That being said, I'd be happy to make a separate tool if you can come up with a good name that's not already taken in the npm registry :)

@papandreou
Collaborator

orphans isn't taken.
