GrabCartoons

GrabCartoons is a comic-summarizing utility. It is modular, and it is easy to write modules for new comics.

You can see a sample of GrabCartoons output in sample-output.html.

Installation

You can download the latest source code for this project in either zip or tar formats. It should run as-is on most modern Perl installations.

You can also clone this git repository:

git clone https://github.com/zzamboni/grabcartoons.git

You can run ./grabcartoons.pl directly from within the source directory, or run make install to install it under /usr/local. You can specify the PREFIX variable if you want to install somewhere else (e.g. make install PREFIX=/some/path).

GrabCartoons works out of the box on Linux/Unix/macOS. Windows is not explicitly supported, but it can be made to work with some changes. See issue #11 for details.

Usage

Basic usage example:

./grabcartoons.pl sinfest xkcd savage_chickens gocomics.com:gasoline > sample-output.html

And then open sample-output.html in your web browser.

Full set of options:

./grabcartoons.pl --help

GrabCartoons version 2.8.4
Usage: ./grabcartoons.pl [ options ] [ comic_id ...]
    --all       or -a  generate a page with all the known comics on stdout.
    --list [t:] or -l  produce a list of the known comic_id's on stdout. If
                       t: is given, the list of comics from the given template
                       is produced.
    --htmllist [t:]    produce HTML list of known comic_id's on stdout. If
                       t: is given, the list of comics from the given template
                       is produced.
    --file     or -f   read list of comics from specified file.
    --random n         select n comics at random (they will be output after
                       any other comics requested)
    --write    or -w   write output to specified file instead of stdout.
    --version  or -V   print version number
    --verbose  or -v   be verbose
    --help     or -h   print this message.
    --notitles or -t   do not show comic titles (for those that have them)
    --templates        produce a list of defined templates
    --genmodules       for any template specifications (template:comictag),
                       write a snippet to comictag.pl in the directory
                       specified by --genout.
    --genout dir       output directory for generated comics.
                       (default: /Users/taazadi1/.grabcartoons/modules)

By default, it will produce a page with the given comics on stdout.

comic_id can be:
  - Any of the predefined modules (e.g. sinfest, adam_at_home)
  - Of the form 'template:comic title', including quotes if the title has
    spaces (e.g. 'gocomics.com:Citizen Dog', comics.com:Frazz). This will
    generate on the fly a module for the given comic.
  - Of the form 'template:*' or 'template:', which means "all the comics
    from the named template". This can also be passed as argument to
    the --list and --htmllist options to produce the listing from the
    given template instead of from the built-in modules.

Available comics

You can see the list of available comics using the --list or --htmllist options.

Here’s the list of comics for which we currently have modules:

Abstruse Goose (abstrusegoose)
Achewood (achewood)
Adam@Home (adam_at_home)
A Girl And Her Fed (agirlandherfed)
Alien Loves Predator (alien_loves_predator)
Applegeeks (applegeeks)
A Softer World (asofterworld)
Atland (atland)
Better Book Titles (betterbooktitles)
Bloom County 2019 (bloom-county)
Bloom County (bloom-county-old)
Buttersafe (buttersafe)
Calvin and Hobbes (calvin_and_hobbes)
Camp Weedonwantcha (campcomic)
Cathy Classics (cathy)
Chopping Block (choppingblock)
Cow and Boy (cowandboy)
Ctrl+Alt+Del (ctrlaltdel)
Dan’s Daily Cartoon (danscartoons)
Dick Tracy (dick_tracy)
Diesel Sweeties (diesel_sweeties)
Dilbert (dilbert)
Dinosaur Comics (dinosaur_comics)
Doonesbury (doonesbury)
Errant Story (errantstory)
Extra Ordinary (extraordinary)
Full Frontal Nerdity (ffn)
Formal Sweatpants (formalsweatpants)
FoxTrot (foxtrot)
Garfield (garfield)
Get Fuzzy (getfuzzy)
Glasbergen (glasbergen)
Goats (goats)
Goblins (goblins)
Girls with Slingshots (gws)
Herman (herman)
Irregular Webcomic (irregular)
The Joy of Tech (joy_of_tech)
Junior Scientist Power Hour (jspowerhour)
Kevin and Kell (kevin_and_kell)
The Last Halloween (lasthalloween)
Liberty Meadows (liberty_meadows)
Lighter than Heir (lighter_than_heir)
Little Gamers (little_gamers)
MacHall (machall)
MegaTokyo (megatokyo)
Monty (monty)
Mother Goose & Grimm (mother_goose)
Scenes From A Multiverse (multiverse)
Nedroid (nedroid)
9 to 5 (nine_to_five)
Nodwick (nodwick)
Non Sequitur (non_sequitur)
The Oatmeal (oatmeal)
Off the Mark (offthemark)
Order of the Stick (oots)
Pearls Before Swine (pearls)
Penny Arcade (penny_arcade)
Piled Higher and Deeper (phd)
Power Nap (powernap)
pVp (pvp)
Questionable Content (questionable_content)
Real Life Adventures (real_life_adventures)
Red Meat (redmeat)
Robot Hugs (robot_hugs)
Rose is Rose (rose_is_rose)
Savage Chickens (savage_chickens)
Schlock Mercenary (schlock_mercenary)
Sherman’s Lagoon (sherman)
Shit Happens (shithappens)
Sinfest (sinfest)
Skadi (skadi)
Sluggy Freelance (sluggy_freelance)
Saturday Morning Breakfast Cereal (smbc)
Sufficiently Remarkable (sufficiently_remarkable)
The Trenches (the_trenches)
The Zombie Hunters (the_zombie_hunters)
Three Panel Soul (three_panel_soul)
Toothpaste for Dinner (toothpastefordinner)
Unshelved (unshelved)
User Friendly (user_friendly)
What’s Normal Anyway? (whatsnormalanyway)
Wondermark (wondermark)
xkcd (xkcd)
Zen Pencils (zenpencils)
Ziggy (ziggy)

Templates

GrabCartoons also includes templates that allow you to fetch any comic from a given site or using a common mechanism. At the moment we have the following templates:

Templates defined:
	arcamax.com	Comics hosted at arcamax.com
	comics.com	Comics hosted at gocomics.com
	comicskingdom.com	Comics hosted at comicskingdom.com
	gocomics.com	Comics hosted at gocomics.com
	og-image	Comics that can be extracted from the og:image property on their page

Templates define a common way of fetching all the comics from certain sites (such as comics.com or comicskingdom.com) that host multiple comic strips, or by using a common mechanism (e.g. sites that publish their latest comic using the og:image property). If a template exists, you can easily define new modules for comics from that site, or even request them on the fly without having to write a module, by specifying the comic_id as template:title.

How to define your own comics:

Modules are defined in files with a .pl extension, which specify where and how to fetch the comic.

Each comic definition is a set of key/value pairs assigned as a Perl hashref to an element of the %COMIC hash.
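A minimal sketch of such a definition, assuming a hypothetical comic hosted at example.com. The short name example_comic, the URL and the regex are illustrative and do not correspond to an existing module; writing Regex with qr{} is also an assumption:

```perl
# Hypothetical module file, e.g. example_comic.pl (all values are illustrative)
$COMIC{example_comic} = {
    Title   => 'Example Comic',
    Page    => 'https://www.example.com/',
    # $1 (the first parenthesized group) must capture the image URL;
    # here it is assumed to be a site-relative path
    Regex   => qr{<img\s+src="(/comics/[^"]+)"}i,
    # Turn the relative path captured in $1 into an absolute URL
    Prepend => 'https://www.example.com',
};
```

Saved in one of the module directories, a definition like this should be selectable by its short name, e.g. ./grabcartoons.pl example_comic.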

If the comic is from a site for which a template exists, the definition is even easier: you just have to specify the comic name and the template.
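A sketch of a template-based definition, using the Citizen Dog title and the gocomics.com template that appear in the usage text above. The short name citizen_dog is chosen here for illustration only:

```perl
# Template-based definition: the template supplies Page, Regex, etc.
# The short name citizen_dog is an illustrative choice.
$COMIC{citizen_dog} = {
    Title    => 'Citizen Dog',
    Template => 'gocomics.com',
};
```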

Each template defines how to automatically convert the comic title into a “tag” (which normally becomes part of the URL for the comic). If the automatic conversion does not work appropriately, you can manually specify the tag.
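A sketch of a manually specified tag. The field name Tag is an assumption (it is not among the documented fields), and the tag value shown is hypothetical; check an existing template-based module for the exact spelling before relying on it:

```perl
# ASSUMPTION: the override field is called Tag; the value is hypothetical.
$COMIC{citizen_dog} = {
    Title    => 'Citizen Dog',
    Tag      => 'citizendog',     # used instead of the automatically derived tag
    Template => 'gocomics.com',
};
```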

The key used for the %COMIC hash is the “short name” of the comic. The valid fields in the hash are:

Title: title of the comic
Page: URL where to get it
Regex: regex to obtain image, must put the image in $1 (the first parenthesized group)
LinkRelImageSrc: if true, the image URL will be automatically obtained from the first <link rel="image_src"> element in the page. This is increasingly being used by web comics to ease sharing on Facebook and other sites. If this flag is specified, no Regex or other method needs to be specified.
MultipleMatches: if true, then all matches of Regex will be returned, concatenated, after doing any changes specified by SubstOnRegexResult or Prepend / Append on each element. If MultipleMatches is in effect, then the result of $1 + SubstOnRegexResult + Prepend / Append is expected to be an HTML snippet, not just an image URL.
ExtraImgAttrsRegex: regular expression to obtain additional attributes of the comic’s <img> tag. It has to match on the same line that Regex matches. If not specified, a generic text is used for the “alt” image attribute.
TitleRegex: regular expression to capture the title of the comic. It can match on any line before Regex matches. If it does not match, no title is displayed (just the comic name). Only works for comics for which Regex is also defined.
SubstOnRegexResult: an array of two- or three-element array references containing [ regex, string, [global] ]. If specified, the substitution given by each element will be applied to the string captured by Regex or by StartRegex / EndRegex, before applying any Prepend / Append strings. The tuples are applied in the order in which they are specified. If “global” is given and true, a global replace will be done; otherwise only the first occurrence will be replaced. The replacement string may include other fields, referenced as {FieldName}.
Prepend/Append: strings to prepend or append to $1 (or to the string captured by StartRegex / EndRegex) before returning it. May make use of other fields, referenced as {FieldName}.
StartRegex/EndRegex: regular expressions that specify the first and last lines to capture. The matching lines are included in the output if InclusiveCapture is true, and not included if InclusiveCapture is false (the default). If EndRegex is not specified, everything from StartRegex to the end of the page is captured. If Regex is also specified, it is only matched for inside the region defined by StartRegex / EndRegex.
InclusiveCapture: true/false value that specifies whether the lines that match StartRegex / EndRegex should be returned in the output. False by default.
RedirectMatch / RedirectURLCapture / RedirectURLAppend / RedirectURLPrepend / MultipleRedirects: These parameters control generalized redirection support. By default, these parameters are set so that standard redirection using the META REFRESH tag is followed, but they can be set to redirect on arbitrary patterns. This is how it works: if the RedirectMatch regex matches on any line of the page, then the RedirectURLCapture pattern is applied to the same line, and should contain one capture group which returns the new URL to fetch and use. If RedirectURLAppend / RedirectURLPrepend are specified, these strings are concatenated with the result of the capture group before using it as the new URL. By default the Redirect* patterns are NOT passed along when fetching the new page, to prevent infinite redirection. This behavior can be modified by setting MultipleRedirects to a true value, so that multiple redirects using the same parameters are supported.
StaticURL: static image URL to return
StaticHTML: static HTML snippet to return
Function: a function to call. It receives the comic snippet as argument, and must return ($html, $title, $error).
NoShowTitle: if true, do not display the title of the comic (for those that always have it in the drawing).
Template: if present, specifies a template that will be used for this comic (e.g. for comics coming from a single syndicated site, so the mechanism is the same for all of them). Essentially the fields from the template and the $COMIC snippet are merged and then processed in the usual way. If the template contains a _Template_Code attribute, it is executed on the merged snippet before processing it. Templates are defined in the file modules/20templates.pl.
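Several of these fields are usually combined in one definition. The following sketch assumes a hypothetical page that links to a thumbnail whose URL contains HTML-escaped ampersands; the short name, URL and patterns are illustrative, and passing the patterns as qr{} objects is an assumption:

```perl
# Hypothetical definition combining Regex, SubstOnRegexResult and Prepend
$COMIC{example_subst} = {
    Title => 'Example Subst Comic',
    Page  => 'https://www.example.com/',
    # $1 captures a site-relative thumbnail path
    Regex => qr{<img\s+src="(/thumbs/[^"]+)"}i,
    # Each tuple is [ regex, string, [global] ], applied in order to $1
    SubstOnRegexResult => [
        [ qr{/thumbs/}, '/full/' ],   # first occurrence only
        [ qr{&amp;},    '&', 1 ],     # global: replace every occurrence
    ],
    # Applied after the substitutions above
    Prepend => 'https://www.example.com',
};
```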

Precedence (from higher to lower) is Function, StaticURL, StaticHTML, StartRegex / EndRegex and Regex.
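Function sits at the top of this precedence order. A minimal sketch of a Function-based definition, assuming the field takes a code reference and that the snippet arrives as a hash reference; the short name, the URL and the use of undef for "no title" and "no error" are also assumptions:

```perl
# Hypothetical Function-based definition (argument and return conventions assumed)
$COMIC{example_function} = {
    Title    => 'Example Function Comic',
    Function => sub {
        my ($snippet) = @_;    # assumed: the comic definition as a hashref
        my $url  = 'https://www.example.com/comics/today.png';
        my $html = qq(<img src="$url" alt="$snippet->{Title}">);
        # Must return ($html, $title, $error)
        return ($html, undef, undef);
    },
};
```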

Both Regex and StartRegex / EndRegex use Page, and optionally Prepend, Append, ExtraImgAttrsRegex, TitleRegex and SubstOnRegexResult.

StartRegex / EndRegex optionally uses InclusiveCapture.
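A sketch of a region capture with StartRegex / EndRegex, assuming a hypothetical page that wraps its comic in a div element with id "comic". The short name, URL and patterns are illustrative. With InclusiveCapture set, the matching lines themselves are part of the returned HTML:

```perl
# Hypothetical definition capturing a block of lines instead of a single image URL
$COMIC{example_region} = {
    Title            => 'Example Region Comic',
    Page             => 'https://www.example.com/',
    StartRegex       => qr{<div id="comic">},   # first line of the region
    EndRegex         => qr{</div>},             # last line of the region
    InclusiveCapture => 1,                      # keep the matching lines too
};
```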

Comic definitions are loaded from the modules directory, from your $HOME/.grabcartoons/modules directory, and from any directories (separated by colons) listed in the GRABCARTOONS_DIRS environment variable.

The easiest way is probably to take one of the existing modules and base yours on that.

Contributions

If you develop any new modules, please share them! You can either post them to the project’s issue tracker, or fork the project, add your modules, and submit a pull request.
