Support downloading seed file from URL #852
Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
Force-pushed from 4408c5f to ba8041f
…me types and use default ext; ensure exceptions logged correctly using formatErr
It looks like GitHub always returns a content-type of […]. This might be harder than it appears at first glance. Alternatively, we could use a file characterization tool like Siegfried, which may yield much more accurate results once the content is written to a file.
@ikreymer I reverted the MIME check changes and think this should be good to go now. I'm not sure that we need to enforce a …
Fixes #841
Crawler work toward long URL lists in Browsertrix. This PR moves seed handling from the arg parser's validation step to the crawler's bootstrap step, so that the seed file can be fetched asynchronously from a URL.
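
For illustration only, here is a minimal sketch of what loading seeds during an async bootstrap step might look like. It is not the code from this PR: the helper names `isHttpUrl`, `parseSeedList`, and `loadSeeds` are hypothetical, and the sketch assumes a one-URL-per-line seed file and a Node.js version with a global `fetch` (18 or later).

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical helper: true when the seed file argument is an http(s) URL.
function isHttpUrl(source: string): boolean {
  try {
    const { protocol } = new URL(source);
    return protocol === "http:" || protocol === "https:";
  } catch {
    // Not parseable as an absolute URL, so treat it as a local path.
    return false;
  }
}

// Hypothetical helper: split seed file text into one entry per line,
// trimming whitespace and skipping blank lines.
function parseSeedList(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical bootstrap-time loader. Because it is async, it cannot run
// inside a synchronous argument validation step.
async function loadSeeds(source: string): Promise<string[]> {
  if (isHttpUrl(source)) {
    const resp = await fetch(source);
    if (!resp.ok) {
      throw new Error(`Failed to fetch seed file: HTTP ${resp.status}`);
    }
    return parseSeedList(await resp.text());
  }
  return parseSeedList(await readFile(source, "utf8"));
}
```

The network fetch is the reason the work has to move out of argument validation: the parser can still record the seed file location, while the crawler awaits something like `loadSeeds` before queueing any pages.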