
Following redirects #56

Open
ZackMattor opened this issue Nov 29, 2016 · 4 comments

@ZackMattor

Howdy! Just wondering if I'm implementing this right. I need to follow redirects, and there doesn't seem to be an option to toggle this, so I tried implementing it this way. It seems to work, but I'd like some feedback!

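Spidr.site(@url, max_depth: 2, limit: 20) do |spider|
  # Called for every page whose response is a redirect (3xx).
  spider.every_redirect_page do |page|
    # Allow the spider to visit the host the redirect points to...
    spider.visit_hosts << URI.parse(page.location).host
    # ...and queue the redirect target for crawling.
    spider.enqueue page.location
  end
end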
@ZackMattor
Author

It seems to throw an error if the location is a relative path such as "index.html".

@postmodern
Owner

Is the error coming from spidr or your code example? page.location grabs the Location header which may not always be absolute. Maybe try page.to_absolute(page.location)?
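To illustrate why a relative Location header breaks the snippet in the original post, here is a stdlib-only sketch (no spidr needed) using Ruby's URI module. `URI.join` stands in for resolving the header against the page's URL; spidr's own `page.to_absolute` may differ in details, and the URLs below are made up.

```ruby
require "uri"

# Hypothetical page URL and Location header value, for illustration only.
page_url = "https://example.com/docs/start"
location = "index.html"

# A relative Location header has no host component, so
# URI.parse(location).host returns nil.
URI.parse(location).host # => nil

# Resolving it against the URL of the page that issued the redirect
# yields an absolute URL whose host can be added to visit_hosts.
absolute = URI.join(page_url, location)
absolute.to_s # => "https://example.com/docs/index.html"
absolute.host # => "example.com"

# An already-absolute Location is returned unchanged by the join.
URI.join(page_url, "https://cdn.example.net/x").host # => "cdn.example.net"
```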

@chamnap

chamnap commented Jun 29, 2017

This should probably be added to the README.

@postmodern
Owner

postmodern commented Jan 29, 2022

Spidr should automatically follow redirects, so the above code is redundant. The Page#each_url method converts everything yielded by Page#each_link to an absolute URL. Page#each_link in turn calls Page#each_redirect, which checks for the Location header. If you use page.location manually, it may not be an absolute URL, so you'll need to call page.to_absolute(page.location).
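The chain described above (each_redirect feeding each_link, which feeds each_url) can be pictured with a toy, stdlib-only class. `ToyPage`, its constructor arguments, and the naive `href` regex are hypothetical stand-ins, not spidr's Page implementation, which parses HTML properly; this version only looks at the Location header for redirects.

```ruby
require "uri"

# Hypothetical, simplified stand-in for a fetched page. This is NOT
# spidr's Page class; it only models the each_redirect -> each_link ->
# each_url chain.
class ToyPage
  def initialize(url, headers, body)
    @url = URI(url)
    @headers = headers
    @body = body
  end

  # Yields the raw Location header value, if any (may be relative).
  def each_redirect
    return enum_for(:each_redirect) unless block_given?

    location = @headers["location"]
    yield location if location
  end

  # Yields redirect targets plus naive href values, still unresolved.
  def each_link
    return enum_for(:each_link) unless block_given?

    each_redirect { |link| yield link }
    @body.scan(/href="([^"]+)"/).flatten.each { |href| yield href }
  end

  # Resolves every link against the page URL, skipping unparsable ones.
  def each_url
    return enum_for(:each_url) unless block_given?

    each_link do |link|
      begin
        yield @url.merge(link)
      rescue URI::Error
        next
      end
    end
  end
end

page = ToyPage.new(
  "https://example.com/docs/start",
  { "location" => "index.html" },
  '<a href="/about">About</a>'
)
page.each_link.to_a       # => ["index.html", "/about"]
page.each_url.map(&:to_s) # => ["https://example.com/docs/index.html", "https://example.com/about"]
```

Because the redirect target is just another link in this chain, a crawler that walks each_url picks it up already resolved, which is why a separate every_redirect_page handler is unnecessary.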

I might consider adding Page#redirect_urls or Page#location_urls which would return absolute URLs for convenience.
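A Page#redirect_urls helper does not exist at the time of this comment. One possible shape, sketched as a hypothetical standalone method that takes the page URL and a headers hash (both assumed parameters, not spidr API), would resolve the Location header and return an array of absolute URIs:

```ruby
require "uri"

# Hypothetical helper sketching the proposed Page#redirect_urls; not part
# of spidr. Returns the Location header resolved to absolute URIs.
def redirect_urls(page_url, headers)
  base = URI(page_url)
  Array(headers["location"]).filter_map do |location|
    base.merge(location.strip)
  rescue URI::InvalidURIError
    nil # skip malformed header values instead of raising
  end
end

redirect_urls("https://example.com/a/b", { "location" => "../c" }).map(&:to_s)
# => ["https://example.com/c"]
redirect_urls("https://example.com/a/b", {}) # => []
```

Returning an array (empty when there is no redirect) lets callers iterate without a nil check.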

3 participants