embed src causes internal traversal to extend to foreign domain #181

@Munter

This markup causes hyperlink to recursively traverse http://www.cc.com, even though it is called with the -i flag, which should ensure that only site-internal pages are traversed:

<embed src='http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml'></embed>

Output:

$ hyperlink -ri BUG.html
Guessing --root from input files: file:///Users/pbm/
TAP version 13
# Crawling internal assets
ok 1 load BUG.html
ok 2 load http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml
ok 3 load http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
ok 4 load http://www.cc.com/shows
ok 5 load http://www.cc.com/shows/hart-of-the-city
ok 6 load http://www.cc.com/shows/crank-yankers
^C

It looks like the redirect chain from this embed src ends up on an HTML page, which hyperlink doesn't correctly identify as cross-domain.

$ curl -I http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml
HTTP/1.1 301 Moved Permanently
Date: Tue, 12 May 2020 08:12:53 GMT
Content-Type: text/html
Content-Length: 166
Connection: keep-alive
Cache-Control: no-store, no-cache, must-revalidate
Expires: Tue, 12 May 2020 08:12:53 GMT
Location: http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
Server: EasyRedir

$ curl -I http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Server: Apache/2.4.29 (Unix)
X-Powered-By: PHP/7.1.1
Location: /shows
Cache-Control: max-age=60
Date: Tue, 12 May 2020 08:13:22 GMT
Connection: keep-alive
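
To make the expected behaviour concrete, here is a minimal sketch of the check that -i implies for the two responses above. It is not hyperlink's actual code: the helper names `followLocations` and `isInternal` are hypothetical, and the definition of "internal" (same scheme and host as the guessed root, with a path under it) is an assumption. Note that the second Location header is the relative path `/shows`, which has to be resolved against the URL that returned it before any comparison with the root.

```typescript
// Hypothetical helpers for illustration; not taken from hyperlink's source.

// Resolve a chain of Location headers, each one relative to the URL
// that returned it, and return every URL visited along the way.
function followLocations(start: string, locations: string[]): string[] {
  const chain: string[] = [start];
  for (const location of locations) {
    const base = chain[chain.length - 1];
    chain.push(new URL(location, base).href);
  }
  return chain;
}

// Assumed definition of "internal": same scheme and host as the root,
// and a path that sits under the root's path.
function isInternal(url: string, root: string): boolean {
  const u = new URL(url);
  const r = new URL(root);
  return (
    u.protocol === r.protocol &&
    u.host === r.host &&
    u.pathname.startsWith(r.pathname)
  );
}

// Root as guessed in the output above, and the Location headers from curl.
const root = "file:///Users/pbm/";
const chain = followLocations(
  "http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml",
  [
    "http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml",
    "/shows",
  ]
);

const finalUrl = chain[chain.length - 1];
console.log(finalUrl, isInternal(finalUrl, root));
// prints "http://www.cc.com/shows false"
```

Under these assumptions the redirect target resolves to http://www.cc.com/shows, which is not under the root, so the page should be checked but not crawled further.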
