Closed
Description
This code causes hyperlink to recursively crawl http://www.cc.com, even though it was called with the -i flag, which should restrict the crawl to pages within the site's own root:
<embed src='http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml'></embed>

Output:
$ hyperlink -ri BUG.html
Guessing --root from input files: file:///Users/pbm/
TAP version 13
# Crawling internal assets
ok 1 load BUG.html
ok 2 load http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml
ok 3 load http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
ok 4 load http://www.cc.com/shows
ok 5 load http://www.cc.com/shows/hart-of-the-city
ok 6 load http://www.cc.com/shows/crank-yankers
^C
It looks like the redirect chain from this embed src ends on an HTML page on another domain (www.cc.com), which hyperlink doesn't recognize as external, so it goes on to crawl that page's links:
$ curl -I http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml
HTTP/1.1 301 Moved Permanently
Date: Tue, 12 May 2020 08:12:53 GMT
Content-Type: text/html
Content-Length: 166
Connection: keep-alive
Cache-Control: no-store, no-cache, must-revalidate
Expires: Tue, 12 May 2020 08:12:53 GMT
Location: http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
Server: EasyRedir
$ curl -I http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Server: Apache/2.4.29 (Unix)
X-Powered-By: PHP/7.1.1
Location: /shows
Cache-Control: max-age=60
Date: Tue, 12 May 2020 08:13:22 GMT
Connection: keep-alive
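The check that -i seems to need can be sketched in Python. hyperlink itself is a Node.js tool, so this is only an illustration of the idea, not its code, and the function names are hypothetical. The sketch assumes two things: the internal/external decision should be made on the final URL of the redirect chain (where the HTML was actually served from), and a relative Location header such as `/shows` must be resolved against the previous hop rather than the original URL. It also assumes a simple prefix match against the root is a good enough definition of "internal".

```python
from urllib.parse import urljoin


def resolve_redirect_chain(start_url, locations):
    """Turn a list of Location header values into absolute URLs.

    A Location may be relative (e.g. "/shows"), so each hop is resolved
    against the URL that produced it, not against the original URL.
    """
    chain = [start_url]
    for location in locations:
        chain.append(urljoin(chain[-1], location))
    return chain


def is_internal(url, root):
    # Naive prefix test (assumption): root should end with "/" so that
    # "http://example.com/site/" does not also match ".../site2/".
    return url.startswith(root)


def should_follow_links(start_url, locations, root):
    """Hypothetical helper: crawl an asset's outgoing links only if the
    final URL after all redirects lies within the root."""
    final_url = resolve_redirect_chain(start_url, locations)[-1]
    return is_internal(final_url, root)


if __name__ == "__main__":
    root = "file:///Users/pbm/"
    start = "http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml"
    # Location headers as reported by curl -I above
    locations = [
        "http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml",
        "/shows",
    ]
    print(resolve_redirect_chain(start, locations)[-1])  # http://www.cc.com/shows
    print(should_follow_links(start, locations, root))   # False
```

With the headers from this issue, the chain ends at http://www.cc.com/shows, which is outside file:///Users/pbm/, so the links on that page (hart-of-the-city, crank-yankers, ...) would not be crawled. Following the redirects to check that the embed resolves is still fine. Only the recursion into the final page is skipped.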