Simple HTML scraping from the command line
Perl
bin
lib/App
t
.gitignore
Changes
MANIFEST
MANIFEST.skip
MYMETA.json
MYMETA.yml
Makefile.PL
README

README

NAME
    scrape.pl - simple HTML scraping from the command line

ABSTRACT
    This is a simple program to extract data from HTML by specifying CSS3 or
    XPath selectors.

SYNOPSIS
        scrape.pl URL selector selector ...

        # Print page title
        scrape.pl http://perl.org title
        # The Perl Programming Language - www.perl.org

        # Print links with titles, make links absolute
        scrape.pl http://perl.org a //a/@href --uri=2
    
        # Print all links to JPG images, make links absolute
        scrape.pl http://perl.org 'a[href$="jpg"]'

DESCRIPTION
    This program fetches an HTML page and extracts nodes matched by XPath or
    CSS selectors from it.

    If URL is `-', input will be read from STDIN.
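
    Reading from STDIN makes it possible to scrape HTML that is already on
    disk or produced by another command. The following is a minimal sketch,
    assuming scrape.pl is installed and on your PATH; the sample HTML and
    the `html' variable are made up for illustration.

```shell
# Hypothetical sample document; any HTML on standard input would do.
html='<html><head><title>Example page</title></head><body><p>Hi</p></body></html>'

# Assumes scrape.pl is installed and on PATH; skip the call otherwise.
if command -v scrape.pl >/dev/null 2>&1; then
    # "-" in place of the URL makes scrape.pl read the HTML from STDIN.
    printf '%s\n' "$html" | scrape.pl - title
else
    echo "scrape.pl not found on PATH; nothing to run" >&2
fi
```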

OPTIONS
    --sep
        Separator character to use for columns. Default is tab.

    --uri COLUMNS
        Numbers of the columns to convert into absolute URIs, if the known
        attributes do not cover everything you want.

    --no-uri
        Switches off the automatic translation to absolute URIs for known
        attributes like `href' and `src'.
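
    Because columns are separated by a tab by default, the output can be
    passed to standard text tools. The sketch below uses hypothetical
    sample rows rather than real program output. It assumes that each
    selector yields one column and each match one line, which the SYNOPSIS
    suggests but does not state outright.

```shell
# Hypothetical sample of two-column output (link text, then href),
# shaped like what "scrape.pl http://perl.org a //a/@href --uri=2"
# is assumed to print with the default tab separator.
sample=$(printf 'About\thttp://example.org/about\nDocs\thttp://example.org/docs')

# cut splits on tab by default, so -f2 selects the second column.
urls=$(printf '%s\n' "$sample" | cut -f2)
printf '%s\n' "$urls"
```

    With the real program, the same idea would be to pipe its output
    through `cut -f2'.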

REPOSITORY
    The public repository of this module is
    http://github.com/Corion/App-scrape.

SUPPORT
    The public support forum of this program is http://perlmonks.org/.

AUTHOR
    Max Maischein `corion@cpan.org'

COPYRIGHT (c)
    Copyright 2011-2011 by Max Maischein `corion@cpan.org'.

LICENSE
    This module is released under the same terms as Perl itself.
