
zzamboni/grabcartoons


GrabCartoons

GrabCartoons is a comic-summarizing utility. It is modular, and it is easy to write modules for new comics.

(ChangeLog)

You can see a sample of grabcartoons output here.


Installation

You can download the latest source code for this project in either zip or tar formats. It should run as-is on most modern Perl installations.

You can also clone this git repository:

git clone https://github.com/zzamboni/grabcartoons.git

You can run ./grabcartoons.pl directly from within the source directory, or run make install to install it under /usr/local. You can specify the PREFIX variable if you want to install somewhere else (e.g. make install PREFIX=/some/path).

GrabCartoons works out of the box on Linux/Unix/macOS. Windows is not explicitly supported, but it can be made to work with some changes. See issue #11 for details.

Usage

Basic usage example:
./grabcartoons.pl sinfest xkcd savage_chickens gocomics.com:gasoline > sample-output.html

And then open sample-output.html in your web browser.

Full set of options:

./grabcartoons.pl --help
GrabCartoons version 2.8.4
Usage: ./grabcartoons.pl [ options ] [ comic_id ...]
    --all       or -a  generate a page with all the known comics on stdout.
    --list [t:] or -l  produce a list of the known comic_id's on stdout. If
                       t: is given, the list of comics from the given template
                       is produced.
    --htmllist [t:]    produce HTML list of known comic_id's on stdout. If
                       t: is given, the list of comics from the given template
                       is produced.
    --file     or -f   read list of comics from specified file.
    --random n         select n comics at random (they will be output after
                       any other comics requested)
    --write    or -w   write output to specified file instead of stdout.
    --version  or -V   print version number
    --verbose  or -v   be verbose
    --help     or -h   print this message.
    --notitles or -t   do not show comic titles (for those that have them)
    --templates        produce a list of defined templates
    --genmodules       for any template specifications (template:comictag),
                       write a snippet to comictag.pl in the directory
                       specified by --genout.
    --genout dir       output directory for generated comics.
                       (default: /Users/taazadi1/.grabcartoons/modules)

By default, it will produce a page with the given comics on stdout.

comic_id can be:
  - Any of the predefined modules (e.g. sinfest, adam_at_home)
  - Of the form 'template:comic title', including quotes if the title has
    spaces (e.g. 'gocomics.com:Citizen Dog', comics.com:Frazz). This will
    generate on the fly a module for the given comic.
  - Of the form 'template:*' or 'template:', which means "all the comics
    from the named template". This can also be passed as argument to
    the --list and --htmllist options to produce the listing from the
    given template instead of from the built-in modules.
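To make the three accepted comic_id forms concrete, here is a small Perl sketch that classifies a comic_id string the way the list above describes. classify_comic_id is a hypothetical helper written for this illustration, not a function from grabcartoons, and the tool's actual argument handling may differ in edge cases.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of grabcartoons): tells apart the three
# comic_id forms described in the --help text.
sub classify_comic_id {
    my ($id) = @_;
    # 'template:*' or 'template:' -> all the comics from the named template
    if ($id =~ /^([^:]+):\*?$/) {
        return { kind => 'all_from_template', template => $1 };
    }
    # 'template:comic title' -> module generated on the fly
    if ($id =~ /^([^:]+):(.+)$/) {
        return { kind => 'on_the_fly', template => $1, title => $2 };
    }
    # anything else is the name of a predefined module
    return { kind => 'module', name => $id };
}

for my $id ('sinfest', 'gocomics.com:Citizen Dog', 'comicskingdom.com:*', 'arcamax.com:') {
    my $r = classify_comic_id($id);
    print join(', ', map { $_ . '=' . $r->{$_} } sort keys %$r), "\n";
}
```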

Available comics

You can see the list of available comics by using the --list or --htmllist options.

Here’s the list of comics for which we currently have modules:

Templates

GrabCartoons also includes templates that allow you to fetch any comic from a given site or using a common mechanism. At the moment we have the following templates:

Templates defined:
	arcamax.com	Comics hosted at arcamax.com
	comics.com	Comics hosted at gocomics.com
	comicskingdom.com	Comics hosted at comicskingdom.com
	gocomics.com	Comics hosted at gocomics.com
	og-image	Comics that can be extracted from the og:image property on their page

Templates define a common way of fetching comics, either from sites that host multiple comic strips (such as comics.com or comicskingdom.com) or through a shared mechanism (e.g. sites that publish their latest comic using the og:image property). If a template exists, you can easily define new modules for comics from that site, or even request them on the fly without writing a module, by specifying the comic_id as template:title.

How to define your own comics:

Modules are defined in files with a .pl extension, which specify where and how to fetch each comic.

Each comic definition is a set of key/value pairs, assigned as a Perl hashref to an element of the %COMIC hash. For example:
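As a minimal sketch, a definition for a hypothetical comic could look like the following. The short name examplecomic, the URL and the image path are invented; writing Regex as a qr// pattern is an assumption about the usual style, and the `our %COMIC;` line is only there so the snippet also runs on its own under `use strict`. The last part is not module code: it applies the definition to a made-up HTML fragment to show that Regex must leave the image URL in $1 and that Prepend is then added in front of it.

```perl
use strict;
use warnings;

our %COMIC;    # only needed to run this snippet standalone

# Hypothetical comic: short name, URL and image path are invented.
$COMIC{examplecomic} = {
    Title   => 'Example Comic',
    Page    => 'https://www.example.com/',
    # The (relative) image URL must end up in $1.
    Regex   => qr{<img[^>]+src="(/strips/[^"]+\.png)"},
    # Turns the relative path into an absolute URL.
    Prepend => 'https://www.example.com',
};

# --- Not part of a module file: a standalone check of what the fields do ---
my $c    = $COMIC{examplecomic};
my $html = '<div id="comic"><img class="strip" src="/strips/2024-01-01.png"></div>';
if ($html =~ $c->{Regex}) {
    my $url = $c->{Prepend} . $1;
    print "$url\n";    # prints: https://www.example.com/strips/2024-01-01.png
}
```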

If the comic is from a site for which a template exists, the definition is even simpler: you just have to specify the comic name and the template. For example:
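A hedged sketch of a template-based definition, using the gocomics.com template from the list above. The short name frazz is chosen for illustration, and the `our %COMIC;` line is only there so the snippet runs standalone; the template is expected to supply Page, Regex and the other fields when the two are merged.

```perl
use strict;
use warnings;

our %COMIC;    # only needed to run this snippet standalone

# Template-based definition: just the title and the template name.
# The short name 'frazz' is illustrative.
$COMIC{frazz} = {
    Title    => 'Frazz',
    Template => 'gocomics.com',
};

print "$COMIC{frazz}{Title} (template: $COMIC{frazz}{Template})\n";
```

Without any module file, the same comic can also be requested on the fly as gocomics.com:Frazz, as described under Usage.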

Each template defines how to automatically convert the comic title into a “tag” (which normally becomes part of the URL for the comic). If the automatic conversion does not work appropriately, you can manually specify the tag. For example:
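The following sketch shows the idea. Two things in it are assumptions: the name of the override field (written here as Tag; check modules/20templates.pl for the field your template actually reads) and title_to_tag, a hypothetical conversion rule (lowercase, runs of other characters replaced by hyphens) standing in for whatever rule a given template uses. The tag value citizendog is likewise illustrative.

```perl
use strict;
use warnings;

our %COMIC;    # only needed to run this snippet standalone

# ASSUMPTION: the manual override is written as a 'Tag' field here;
# check modules/20templates.pl for the name your template uses.
$COMIC{citizendog} = {
    Title    => 'Citizen Dog',
    Template => 'gocomics.com',
    Tag      => 'citizendog',    # illustrative value
};

# Hypothetical title-to-tag rule (lowercase, other characters -> '-'),
# standing in for whatever conversion a real template performs.
sub title_to_tag {
    my ($title) = @_;
    (my $tag = lc $title) =~ s/[^a-z0-9]+/-/g;
    $tag =~ s/^-+|-+$//g;    # trim leading/trailing hyphens
    return $tag;
}

# A manually specified tag is used instead of the automatic conversion.
my $c   = $COMIC{citizendog};
my $tag = $c->{Tag} // title_to_tag($c->{Title});
print "automatic: ", title_to_tag($c->{Title}), "\n";    # citizen-dog
print "used:      $tag\n";                               # citizendog
```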

The key used for the %COMIC hash is the “short name” of the comic. The valid fields in the hash are:

Title
title of the comic
Page
URL of the page from which to fetch the comic
Regex
regex to obtain image, must put the image in $1 (the first parenthesized group)
LinkRelImageSrc
if true, the image URL is automatically obtained from the first <link rel="image_src"> element in the page. This element is increasingly used by web comics to ease sharing on Facebook and other sites. If this flag is set, no Regex or other method needs to be specified.
MultipleMatches
if true, then all matches of Regex will be returned, concatenated, after doing any changes specified by SubstOnRegexResult or Prepend / Append on each element. If MultipleMatches is in effect, then the result of $1 + SubstOnRegexResult + Prepend / Append is expected to be an HTML snippet, not just an image URL.
ExtraImgAttrsRegex
regular expression to obtain additional attributes of the comic’s <img> tag. It has to match on the same line that Regex matches. If not specified, a generic text is used for the “alt” image attribute.
TitleRegex
regular expression to capture the title of the comic. It can match on any line before Regex matches. If it does not match, no title is displayed (just the comic name). Only works for comics for which Regex is also defined.
SubstOnRegexResult
an array of two- or three-element array references of the form [ regex, string, [global] ]. If specified, the substitution described by each element is applied to the string captured by Regex or by StartRegex / EndRegex, before any Prepend / Append strings are applied. The substitutions are applied in the order in which they are specified. If “global” is given and true, a global replacement is done; otherwise only the first occurrence is replaced. The replacement string may include other fields, referenced as {FieldName}.
Prepend/Append
strings to prepend or append to $1 (or to the string captured by StartRegex / EndRegex) before returning it. May make use of other fields, referenced as {FieldName}.
StartRegex/EndRegex
regular expressions that specify the first and last lines to capture. The matching lines are included in the output if InclusiveCapture is true, and excluded if it is false (the default). If EndRegex is not specified, everything from StartRegex to the end of the page is captured. If Regex is also specified, it is only matched inside the region defined by StartRegex / EndRegex.
InclusiveCapture
true/false value that specifies whether the lines that match StartRegex / EndRegex should be returned in the output. False by default.
RedirectMatch / RedirectURLCapture / RedirectURLAppend / RedirectURLPrepend / MultipleRedirects
These parameters control generalized redirection support. By default, they are set so that standard redirection using the META REFRESH tag is followed, but they can be set to redirect on arbitrary patterns. This is how it works: if the RedirectMatch regex matches on any line of the page, then the RedirectURLCapture pattern is applied to the same line; it should contain one capture group, which returns the new URL to fetch and use. If RedirectURLAppend / RedirectURLPrepend are specified, these strings are concatenated with the result of the capture group before it is used as the new URL. By default, the Redirect* patterns are NOT passed along when fetching the new page, to prevent infinite redirection. This behavior can be changed by setting MultipleRedirects to a true value, so that multiple redirects using the same parameters are supported.
StaticURL
static image URL to return
StaticHTML
static HTML snippet to return
Function
a function to call. It receives the comic snippet as argument, and must return ($html, $title, $error).
NoShowTitle
if true, do not display the title of the comic (for those that always have it in the drawing).
Template
if present, specifies a template that will be used for this comic (e.g. for comics coming from a single syndicated site, so the mechanism is the same for all of them). Essentially, the fields from the template and the $COMIC snippet are merged and then processed in the usual way. If the template contains a _Template_Code attribute, it is executed on the merged snippet before processing it. Templates are defined in the file modules/20templates.pl.
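The processing order described for SubstOnRegexResult and Prepend / Append can be sketched as follows: substitutions run in order on the captured string (globally only when the third element is true), then Prepend and Append are added, with {FieldName} references expanded from the comic's other fields. This is a hypothetical re-implementation for illustration, not grabcartoons' own code; the comic definition and file names are invented, and the sketch treats each replacement as a literal string, which may not match the real behavior in every case.

```perl
use strict;
use warnings;

# Hypothetical helper: expand {FieldName} references using the comic's
# own fields; unknown names are left untouched.
sub expand_fields {
    my ($str, $c) = @_;
    $str =~ s/\{(\w+)\}/defined $c->{$1} ? $c->{$1} : '{' . $1 . '}'/ge;
    return $str;
}

# Hypothetical helper: apply SubstOnRegexResult in order, then Prepend/Append.
sub postprocess {
    my ($captured, $c) = @_;
    for my $s (@{ $c->{SubstOnRegexResult} || [] }) {
        my ($re, $repl, $global) = @$s;
        $repl = expand_fields($repl, $c);    # used as a literal string here
        if ($global) {
            $captured =~ s/$re/$repl/g;      # every occurrence
        }
        else {
            $captured =~ s/$re/$repl/;       # first occurrence only
        }
    }
    return expand_fields($c->{Prepend} // '', $c)
         . $captured
         . expand_fields($c->{Append} // '', $c);
}

# Invented example definition.
my %comic = (
    Title              => 'Example Comic',
    Page               => 'https://www.example.com/',
    SubstOnRegexResult => [
        [ qr/_thumb/, '' ],        # drop the thumbnail suffix once
        [ qr/_/,      '-', 1 ],    # then replace every underscore
    ],
    Prepend            => '{Page}',
);

print postprocess('strips/2024_01_01_thumb.png', \%comic), "\n";
# prints: https://www.example.com/strips/2024-01-01.png
```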

Precedence (from higher to lower) is Function, StaticURL, StaticHTML, StartRegex / EndRegex and Regex.
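A minimal sketch of that precedence as a dispatch function. fetch_method is a hypothetical helper, not part of grabcartoons; LinkRelImageSrc is left out because the text above does not say where it ranks, and StartRegex alone triggers region capture because EndRegex is optional. The Function entry follows the documented contract of returning ($html, $title, $error).

```perl
use strict;
use warnings;

# Hypothetical helper: which method would be used for a comic definition,
# following the documented precedence (highest first).
sub fetch_method {
    my ($c) = @_;
    return 'Function'            if $c->{Function};
    return 'StaticURL'           if $c->{StaticURL};
    return 'StaticHTML'          if $c->{StaticHTML};
    return 'StartRegex/EndRegex' if $c->{StartRegex};
    return 'Regex'               if $c->{Regex};
    return;                      # no fetching method defined
}

# Invented definition that sets two methods: StaticURL wins over Regex.
my %comic = (
    Page      => 'https://www.example.com/',
    Regex     => qr{<img src="([^"]+)"},
    StaticURL => 'https://www.example.com/latest.png',
);

# Invented definition with a Function, which must return ($html, $title, $error).
my %with_function = (
    Function   => sub { my ($snippet) = @_; return ('<p>placeholder</p>', 'Placeholder', undef) },
    StaticHTML => '<p>never used</p>',
);

print fetch_method(\%comic), "\n";            # prints: StaticURL
print fetch_method(\%with_function), "\n";    # prints: Function
```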

Both Regex and StartRegex / EndRegex use Page, and optionally Prepend, Append, ExtraImgAttrsRegex, TitleRegex and SubstOnRegexResult.

StartRegex / EndRegex optionally uses InclusiveCapture.
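The line-based capture described for StartRegex / EndRegex and InclusiveCapture can be sketched like this. capture_region is a hypothetical helper, not grabcartoons' implementation, and it simplifies by never treating the StartRegex line itself as the end of the region. In the real tool, Regex (if given) would then be matched only inside the returned region.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines between StartRegex and EndRegex,
# including the boundary lines only when InclusiveCapture is true.
sub capture_region {
    my ($c, @lines) = @_;
    my ($in_region, @out);
    for my $line (@lines) {
        if (!$in_region) {
            next unless $line =~ $c->{StartRegex};
            $in_region = 1;
            push @out, $line if $c->{InclusiveCapture};
            next;
        }
        # Without EndRegex, everything up to the end of the page is kept.
        if (defined $c->{EndRegex} && $line =~ $c->{EndRegex}) {
            push @out, $line if $c->{InclusiveCapture};
            last;
        }
        push @out, $line;
    }
    return join '', @out;
}

# Invented page and definitions.
my @page = (
    qq{<html><body>\n},
    qq{<div id="comic">\n},
    qq{<img src="/strips/today.png">\n},
    qq{</div>\n},
    qq{<p>footer</p>\n},
);
my %comic     = (StartRegex => qr/id="comic"/, EndRegex => qr{^</div>});
my %inclusive = (%comic, InclusiveCapture => 1);

print capture_region(\%comic, @page);        # only the <img ...> line
print capture_region(\%inclusive, @page);    # the <div>, <img> and </div> lines
```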

Comic definitions are loaded from the modules directory, from your $HOME/.grabcartoons/modules directory, and from any directories (separated by colons) listed in the GRABCARTOONS_DIRS environment variable.
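As a rough sketch of that search order: module_dirs and module_files are hypothetical helpers, the built-in directory is passed in as 'modules', and $HOME/.grabcartoons/modules is assumed as the per-user path (it matches the default shown for --genout). The sketch only lists candidate files; whether definitions found later override earlier ones is not stated above, so it makes no assumption about that.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: directories to search for module files, in the
# order listed in the text (built-in, per-user, then GRABCARTOONS_DIRS).
sub module_dirs {
    my ($builtin) = @_;
    my @dirs = ($builtin);
    push @dirs, File::Spec->catdir($ENV{HOME}, '.grabcartoons', 'modules')
        if defined $ENV{HOME};
    # GRABCARTOONS_DIRS is a colon-separated list; skip empty entries.
    push @dirs, grep { length } split /:/, ($ENV{GRABCARTOONS_DIRS} // '');
    return @dirs;
}

# Hypothetical helper: all *.pl files in the directories that exist.
sub module_files {
    my @files;
    for my $dir (@_) {
        opendir(my $dh, $dir) or next;    # skip missing directories
        push @files, map { File::Spec->catfile($dir, $_) }
                     sort { $a cmp $b } grep { /\.pl\z/ } readdir $dh;
        closedir $dh;
    }
    return @files;
}

{
    local $ENV{GRABCARTOONS_DIRS} = '/opt/comics:/srv/more-comics';
    print "$_\n" for module_dirs('modules');
    print "$_\n" for module_files(module_dirs('modules'));
}
```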

The easiest way is probably to take one of the existing modules and base yours on that.

Contributions

If you develop any new modules, please share them! You can either post them to the project’s issue tracker, or fork the project, add your modules, and submit a pull request.

Authors
