NAME
    scrape.pl - simple HTML scraping from the command line

ABSTRACT
    This is a simple program to extract data from HTML by specifying CSS3
    or XPath selectors.

SYNOPSIS
        scrape.pl URL selector selector ...

        # Print page title
        scrape.pl http://perl.org title
        # The Perl Programming Language - www.perl.org

        # Print links with titles, make links absolute
        scrape.pl http://perl.org a //a/@href --uri=2

        # Print all links to JPG images, make links absolute
        scrape.pl http://perl.org a[@href=$"jpg"]

DESCRIPTION
    This program fetches an HTML page and extracts the nodes matched by the
    given XPath or CSS selectors. Each selector produces one output column.

    If URL is `-', input will be read from STDIN.

OPTIONS
    --sep
        Separator character to use for columns. Default is tab.

    --uri COLUMNS
        Numbers of the columns to convert into absolute URIs, for cases
        where the automatic handling of known attributes does not do
        everything you want.

    --no-uri
        Switches off the automatic translation to absolute URIs for known
        attributes like `href' and `src'.

REPOSITORY
    The public repository of this module is
    http://github.com/Corion/App-scrape.

SUPPORT
    The public support forum of this program is http://perlmonks.org/.

AUTHOR
    Max Maischein `firstname.lastname@example.org'

COPYRIGHT (c)
    Copyright 2011-2011 by Max Maischein `email@example.com'.

LICENSE
    This module is released under the same terms as Perl itself.
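
EXAMPLE SKETCH
    The column handling described under OPTIONS (--sep and --uri) can be
    illustrated with a small Python sketch. This is not how scrape.pl is
    implemented; it only mimics the second SYNOPSIS example (link text plus
    href, with column 2 made absolute). The names `LinkCollector',
    `scrape_links' and `SAMPLE' are hypothetical, and the 1-based column
    numbering is an assumption inferred from `--uri=2' in the SYNOPSIS.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect [link text, href] rows for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(["".join(self._text).strip(), self._href])
            self._href = None


def scrape_links(html, base_url, uri_columns=(2,), sep="\t"):
    """Return one separator-joined line per link.

    uri_columns lists the columns (assumed 1-based, as in --uri=2) whose
    values are resolved against base_url; sep plays the role of --sep.
    """
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    lines = []
    for row in parser.rows:
        for col in uri_columns:
            row[col - 1] = urljoin(base_url, row[col - 1])
        lines.append(sep.join(row))
    return lines


# Hypothetical input standing in for a fetched page.
SAMPLE = '<p><a href="/get.html">Download</a> <a href="docs/">Docs</a></p>'

for line in scrape_links(SAMPLE, "http://perl.org/"):
    print(line)
```

    Each output line holds the link text and the absolute link, separated
    by a tab. Passing a different `sep' changes the column separator, and
    an empty `uri_columns' leaves the links as they appear in the page.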