Commit 20b0235 (1 parent: e7dfe89)
Showing 1 changed file with 50 additions and 0 deletions.
@@ -0,0 +1,50 @@
#!/usr/bin/env ruby
# Author: Benjamin Oakes <hello@benjaminoakes.com>

require 'open-uri'

require 'rubygems'
require 'hpricot'

def show_usage
  STDERR.puts <<EOF
Usage: #{$PROGRAM_NAME} [--help] extension html_url1 [ html_url2 ... html_urlN ]
Extract links from a webpage by filetype. Nice for passing off to wget or curl.
  --help      Show this help text.
  extension   File extension to look for (e.g., mp3, pdf, etc.)
  html_urls   Links to HTML documents to scan.
Example:
  $ #{$PROGRAM_NAME} mp3 http://johnvanderslice.com/romanian-names
  http://johnvanderslice.com/mp3sjv/romanian-names/Fetal%20Horses.mp3
  http://johnvanderslice.com/mp3sjv/romanian-names/Too%20Much%20Time.mp3
EOF
  exit(-1)
end

if ARGV.any? { |a| '--help' == a }
  show_usage
end

extension = ARGV.shift

if ARGV.empty?
  show_usage
end

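ARGV.each do |html_url|
  # Kernel#open fetches URLs via open-uri; on Ruby 3.0+ this must be URI.open.
  doc = Hpricot.parse(open(html_url))

  # TODO would it be better to just search for URLs instead?
  doc.search('a').each do |anchor|
    href = anchor['href'] # TODO Probably need to add the server in some cases
    next if href.nil? # anchors without an href would otherwise raise on nil

    # Escape the extension so characters such as "." or "+" match literally
    if href.match(/\.#{Regexp.escape(extension)}$/i)
      puts href
    end
  end
end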
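The TODO about "adding the server" refers to relative hrefs such as `/mp3sjv/song.mp3`, which the script prints as-is and which wget or curl cannot fetch on their own. One possible way to handle this, sketched below, is to resolve every href against the page URL with `URI.join` from Ruby's standard library before filtering by extension. The helper name `matching_links` and the example URLs are hypothetical and are not part of this commit; the sketch works on a plain array of href strings so it runs without Hpricot or network access.

```ruby
require 'uri'

# Hypothetical helper (not part of the commit): resolve each href against the
# URL of the page it came from, then keep only the URLs that end in the
# requested extension (case-insensitive).
def matching_links(page_url, hrefs, extension)
  pattern = /\.#{Regexp.escape(extension)}\z/i

  resolved = hrefs.compact.map do |href|
    begin
      # URI.join turns relative hrefs into absolute URLs and leaves
      # already-absolute URLs unchanged.
      URI.join(page_url, href).to_s
    rescue URI::Error
      nil # skip hrefs that cannot be parsed as URIs
    end
  end

  resolved.compact.select { |url| url.match(pattern) }
end

# Example hrefs as they might come from anchor['href'] (nil = no href attribute)
hrefs = ['/mp3sjv/a.mp3', 'notes.html', nil, 'http://example.org/b.MP3']
puts matching_links('http://example.com/albums/index.html', hrefs, 'mp3')
# http://example.com/mp3sjv/a.mp3
# http://example.org/b.MP3
```

In the script itself, this would amount to calling `URI.join(html_url, href)` inside the anchor loop before the extension check; whether that is the behavior the author intended is an assumption.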