bfmartin/finder_dsi


This is the Dilbert Strip Index (finder_dsi), used by the Dilbert Strip Finder at http://www.bfmartin.ca/finder.

It contains:

  • The source data in JSON format used to index the content of the Dilbert comic strips so a full-text search can be performed.

  • Ruby programs that load the data into a MariaDB database and test the validity of the data, plus command-line utilities to report on and maintain the data.

You can reach me at http://www.bfmartin.ca/contact

This project is dedicated to all fans of Dilbert and to Scott Adams in particular for creating the comic strip.

Copyright and Licenses

JSON files

The JSON data files (dsistrips.json and dsibooks.json) are copyright 1999 - 2017 by Byron F. Martin. They are licensed under the Creative Commons Attribution 2.5 Canada license. The short version is that you can do whatever you want with these files as long as you give me credit for creating them. A link to my site will be good, though it's not required.

Programs written by me

The program files (Ruby and Rakefile files) that were written by me (which is everything except the bundled module lib/dsi/stem.rb) are donated to the public domain. Do whatever you want with them.

Bundled module

A bundled Ruby module for word stemming (lib/dsi/stem.rb) is copyrighted by its author(s). Please refer to its documentation for license information. It was obtained from http://tartarus.org/~martin/PorterStemmer/ruby.txt.

JSON files

Dates

All dates are in the format YYYY-MM-DD.
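As a minimal sketch of what this format implies, the check below accepts only strings that are strictly YYYY-MM-DD and that name a real calendar date. The helper name `valid_dsi_date?` is hypothetical and is not part of this package.

```ruby
require 'date'

# Hypothetical helper, not part of finder_dsi: true only for a real
# calendar date written strictly as YYYY-MM-DD.
def valid_dsi_date?(text)
  match = /\A(\d{4})-(\d{2})-(\d{2})\z/.match(text)
  return false unless match

  Date.valid_date?(*match.captures.map(&:to_i))
end

puts valid_dsi_date?('1989-04-16') # true
puts valid_dsi_date?('1989-4-16')  # false: month is not zero-padded
puts valid_dsi_date?('1989-02-30') # false: no such calendar date
```

The regular expression enforces the zero-padded layout, and `Date.valid_date?` rejects dates that look well-formed but do not exist.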

dsistrips.json

This is the main data file that describes each Dilbert strip. For each strip, the JSON file contains:

  • Date of newspaper publication.

  • Synopsis, a one or two sentence description of the strip.

  • Subject (optional), the main topic.

  • Keywords (optional), many words to describe the action and important concepts.

  • Characters (optional), the characters with speaking parts.

  • Saga (optional), to mark the beginning of three or more strips with a common theme.

  • Notes (optional), to describe important items about the strip, like the first appearance of a character.

  • Comment (optional), text about the strip for my own purposes. It is not meant to be displayed in search results.
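The field list above could be checked with a sketch like the following. The key names (`date`, `synopsis`, and so on), the top-level array, and the sample record are all assumptions made for illustration; look at dsistrips.json itself for the names and structure it really uses.

```ruby
require 'json'

# Key names below are assumptions for illustration; check dsistrips.json
# for the names the file really uses.
REQUIRED_KEYS = %w[date synopsis].freeze
OPTIONAL_KEYS = %w[subject keywords characters saga notes comment].freeze

# Made-up sample data, not taken from the real file.
SAMPLE = <<~JSON
  [
    {
      "date": "2000-01-01",
      "synopsis": "Made-up synopsis for illustration.",
      "keywords": ["example", "placeholder"],
      "comment": "Private note, not shown in search results."
    }
  ]
JSON

# Returns a list of problems found in one strip record (empty if none).
def strip_problems(strip)
  missing = REQUIRED_KEYS.reject { |key| strip.key?(key) }
  unknown = strip.keys - REQUIRED_KEYS - OPTIONAL_KEYS
  missing.map { |key| "missing #{key}" } + unknown.map { |key| "unknown #{key}" }
end

# Fields suitable for display: everything except the private comment.
def displayable(strip)
  strip.reject { |key, _value| key == 'comment' }
end

JSON.parse(SAMPLE).each do |strip|
  puts strip_problems(strip).inspect
  puts displayable(strip).keys.join(', ')
end
```

Dropping the comment before display mirrors the note above that comments are private; everything else in the sketch is only one possible way to treat required and optional fields.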

dsidialog.json

This data is not included here but is generated from some text files available on the Internet. The programs format the data for easy searching.

This file is generated by running the command:

    rake dialog:prepare

This will download the necessary file(s) from their respective sites and generate dsidialog.json. See the Rakefile for details on how this happens.

dsibooks.json

NOTE: This file is retired and is no longer maintained. You can find all strips online, so finding them in books is superfluous. The file remains here, but it has no books past 2014.

This describes each Dilbert comic collection and its contents. For each book, the file contains (all are required):

  • A one-word code ID for the book.

  • The book title.

  • The layout of a week of strips, beginning with Sunday, and separated by commas. Most books are laid out as 1,3,3 which means:

    • Sunday on one page

    • three more strips on the next page

    • three more strips on the next page.

  • A list containing the start and end dates and the page number that they start on. This is a list because sometimes strips are not in chronological order.
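To illustrate how a layout such as 1,3,3 and a start page can locate a strip, here is a rough sketch. It assumes the date range begins on a Sunday and that the layout counts add up to seven strips per week. The helper names `page_offsets` and `page_for` are hypothetical, and this is not necessarily how the included programs do it.

```ruby
require 'date'

# Hypothetical helper: for each day of the week (0 = Sunday), the page
# offset within that week. "1,3,3" gives [0, 1, 1, 1, 2, 2, 2].
def page_offsets(layout)
  layout.split(',').map(&:to_i).each_with_index.flat_map do |count, page|
    [page] * count
  end
end

# Hypothetical helper: page on which the strip for `date` appears, given
# the first date of a range (assumed to be a Sunday) and its start page.
def page_for(date, range_start, start_page, layout)
  pages_per_week = layout.split(',').size
  weeks, day = (date - range_start).to_i.divmod(7)
  start_page + weeks * pages_per_week + page_offsets(layout)[day]
end

# A made-up range starting Sunday 2014-01-05 on page 10.
puts page_for(Date.new(2014, 1, 8), Date.new(2014, 1, 5), 10, '1,3,3')  # 11
puts page_for(Date.new(2014, 1, 12), Date.new(2014, 1, 5), 10, '1,3,3') # 13
```

With a 1,3,3 layout each week takes three pages, so the following Sunday lands three pages after the first one.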

Ruby Requirements

If you like to play around with the Ruby programming language, there are some example programs included in this package. To make them work you will need to have the following software packages installed. Your operating system's package manager should be able to help with these.

  • Ruby, at least version 2.3.

  • Rake

  • The following Ruby gems:

    • json (to parse JSON files)
    • net/http (to fetch the dialog file)
    • optimist (to parse command-line options)
    • rake_notes (shows TODOs and FIXMEs in code)

    Try: gem install <gemname>

Database

Some programs refer to a database. The SQL will work with MariaDB, and uses the following table definitions. The search feature relies on MariaDB's full text indexing.

bin directory

There are several programs included to work with the DSI data.

If the program name begins with 'dsi', then its purpose is to maintain or report on the dsi data.

  • dsi-generate-dialog.rb

    This reads the raw dialog files (as downloaded from different web locations) and merges them into a standard-format JSON file.

  • dsi-key.rb

    This reads the dsistrips JSON file and formats it for printing. It takes the keywords, characters, or subjects and, for each item, prints a list of the dates of the strips that contain that item.

  • dsi-notes.rb

    This reads the dsistrips JSON file and prints all items with notes. Useful for browsing and debugging.
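The kind of listing dsi-key.rb produces can be pictured as an index from each item to the dates it appears on. The sketch below is only an illustration of that idea, not the program's actual code; the records, the key names, and the helper `dates_by_item` are made up.

```ruby
# Hypothetical in-memory records; the key names are assumptions.
strips = [
  { 'date' => '2000-01-01', 'keywords' => %w[meeting budget] },
  { 'date' => '2000-01-02', 'keywords' => %w[budget] },
  { 'date' => '2000-01-03' } # optional field absent
]

# Builds { item => [dates] } for one field, skipping strips without it.
def dates_by_item(strips, field)
  index = Hash.new { |hash, key| hash[key] = [] }
  strips.each do |strip|
    Array(strip[field]).each { |item| index[item] << strip['date'] }
  end
  index
end

dates_by_item(strips, 'keywords').sort.each do |item, dates|
  puts "#{item}: #{dates.join(', ')}"
end
```

`Array(...)` lets the same helper handle a missing field, a single value, or a list, which suits optional fields such as keywords, characters, and subject.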

If the program name begins with 'finder', then it generates data to update the Finder web site.

  • finder-load.rb

    Reads dsistrips and creates lines to be loaded into the bfmartin.ca database. See the database schema in the README. It also loads the dialog if available; otherwise it inserts nulls.

  • finder-reindex.sh

    A wrapper that does everything. It downloads all required files, reformats them, and loads the database.
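As a loose sketch of the "dialog if available, otherwise nulls" behavior described for finder-load.rb, the example below emits one tab-separated line per strip. The line format, the column order, the key names, and the helper `load_line` are all assumptions; the real program may produce something quite different. The `\N` marker is what MariaDB's LOAD DATA INFILE reads as NULL.

```ruby
# Assumed NULL marker: backslash followed by N, as read by LOAD DATA INFILE.
NULL_MARKER = '\N'

# Hypothetical helper: one tab-separated line for a strip, with the
# dialog column set to the NULL marker when no dialog is known.
# Escaping of tabs or newlines inside the text is left out for brevity.
def load_line(strip, dialog_by_date)
  dialog = dialog_by_date[strip['date']]
  [strip['date'], strip['synopsis'], dialog || NULL_MARKER].join("\t")
end

# Made-up data for illustration.
strips = [
  { 'date' => '2000-01-01', 'synopsis' => 'First made-up synopsis.' },
  { 'date' => '2000-01-02', 'synopsis' => 'Second made-up synopsis.' }
]
dialog_by_date = { '2000-01-01' => 'Made-up dialog text.' }

strips.each { |strip| puts load_line(strip, dialog_by_date) }
```

Passing an empty hash for `dialog_by_date` covers the case where the dialog file was never generated, so every line ends with the NULL marker.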

Feedback

If you have questions or feedback, you can reach me at http://www.bfmartin.ca/contact

You can visit the Dilbert Strip Finder web site at http://www.bfmartin.ca/finder
