timefind

timefind lets you find the exact moment that something was added to a website.
It quickly flips through Web Archive snapshots using binary search, pinpointing the date of the modification.

For example, you can search for the first mention of the iPhone on Apple's homepage:

$ timefind apple.com iphone
Looking for string predicate “iphone”.

Getting snapshot list for apple.com...
Got 212,432 snapshots, from 96-10-23 18:55:02 to 21-05-30 14:37:26.

Checking extremities...
Doesn't match: 96-10-23 18:55:02 (http://web.archive.org/web/19961023165502/http://www.apple.com:80/)
Matches: 21-05-30 14:37:26 (http://web.archive.org/web/20210530123726/https://www.apple.com/)

Searching...
Matches: 09-02-10 20:10:32 (http://web.archive.org/web/20090210191032/http://www.apple.com:80/)
Doesn't match: 02-12-06 00:43:50 (http://web.archive.org/web/20021205234350/http://www.apple.com:80/?)
Doesn't match: 06-01-08 21:04:20 (http://web.archive.org/web/20060108200420/http://apple.com:80/)
Matches: 07-07-25 23:28:27 (http://web.archive.org/web/20070725212827/http://www.apple.com/)
[...]
Doesn't match: 07-01-09 19:48:16 (http://web.archive.org/web/20070109184816/http://www.apple.com/)
Doesn't match: 07-01-09 19:48:16 (http://web.archive.org/web/20070109184816/http://www.apple.com/)

Bisecting completed!
Last non-matching snapshot is 07-01-09 19:48:16 (http://web.archive.org/web/20070109184816/http://www.apple.com/).
First matching snapshot is 07-01-10 06:21:28 (http://web.archive.org/web/20070110052128/http://www.apple.com:80/).

Voilà! Click through that last URL (the “first matching snapshot”), and you'll see how the very first iPhone was marketed. Or, follow the preceding link (the “last non-matching snapshot”) to see the website right before the announcement was made.

🛠 Usage

Installing timefind

With Node.js present, install timefind globally by running npm install -g timefind.

timefind supports macOS, Linux, and Windows.

Performing a search

To use timefind, supply it with a URL to investigate and a string to search for. It'll find the very first appearance of the string.
For instance, this will look for the first mention of “community” on the Elm homepage:

$ timefind elm-lang.org community

timefind's default behavior is fairly specific (it only looks for complete words, in user-visible text, that were added but not removed). Look through the options to pick the right settings for your search.

💬 Important note: binary search

To quickly scan through thousands of snapshots, timefind relies on binary search, the same algorithm used by the git bisect command.
The downside of this method is that it can only search for changes that do not get reversed. If the change you're looking for was eventually undone—for instance, a promo banner was displayed for a month only—you'll have to restrict the search timeframe for timefind to work.
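
As a rough illustration of the idea (not timefind's actual source), here is a minimal bisecting sketch in JavaScript. The `findFirstMatch` helper and the toy snapshot texts are made up for this example. It assumes the predicate is false for every snapshot before the change and true from then on.

```javascript
// Hypothetical helper: returns the index of the first snapshot for which
// `matches` returns true, plus the number of snapshots that were checked.
// Assumes all snapshots before the transition fail and all later ones match.
function findFirstMatch(snapshots, matches) {
  let low = 0;                  // lowest index that may still be the first match
  let high = snapshots.length;  // snapshots.length stands for "no match at all"
  let checks = 0;

  while (low < high) {
    const mid = Math.floor((low + high) / 2);
    checks += 1;
    if (matches(snapshots[mid])) {
      high = mid;               // the first match is at mid or earlier
    } else {
      low = mid + 1;            // the first match is after mid
    }
  }

  return { index: low < snapshots.length ? low : -1, checks };
}

// Toy data: made-up page texts standing in for archived snapshots.
const snapshots = ["mac", "mac ipod", "mac ipod", "mac ipod iphone", "mac iphone"];
const result = findFirstMatch(snapshots, (text) => text.includes("iphone"));
console.log(result.index, result.checks);
```

Each check halves the remaining range, so even a list of 212,432 snapshots needs only around 18 page checks. If the change is later undone, the assumption breaks: for the toy list `["", "promo", "", "", ""]`, this sketch only looks at indexes 2 and 4, never sees the banner, and reports no match.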

🎛 Options

timefind's default behavior is to look for a string being added, and then never removed, for the complete lifetime of the page. You can change this behavior using options.

Use a different kind of predicate, like a regex or a function, using predicate arguments. Alter search behavior, such as looking for the disappearance of something, using search behavior arguments.

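Predicate types: --string, --regex, --function, or no predicate (interactive mode)

Search behavior: --inverse, --oldest and --newest, --full-source, --no-smart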

--string: specify a string predicate

The default option. To match, a page must contain the string.

For instance, look for when the Undertale site started mentioning merchandise:

$ timefind undertale.com merch

-r --regex: specify a regular expression predicate

To match, a page must match the regex.
You can omit the regex's surrounding slashes, unless you want to specify flags (although by default, most flags are redundant because of smart matching).
Make sure to escape backslashes.

For instance, find when a Wikipedia edition first reached a million articles:

$ timefind wikipedia.org -r '\\d \\d{3} \\d{3}'
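
To get a feel for what this pattern accepts, here is the same expression written as a JavaScript regex literal (so with single backslashes) and tried against made-up strings. It assumes the page writes large numbers as space-separated groups of digits. The sample strings are illustrative, not taken from an actual snapshot.

```javascript
// One digit, a space, three digits, a space, three digits:
// the shape of a number of at least one million written with spaces.
const millionPattern = /\d \d{3} \d{3}/;

// Made-up sample texts.
const samples = [
  "999 000+ articles",
  "1 000 000+ articles",
  "6 207 000+ articles",
];

const results = samples.map((text) => millionPattern.test(text));
samples.forEach((text, i) => console.log(text, "->", results[i]));
```

The pattern contains only regular spaces. With smart matching enabled, other whitespace on the page is collapsed into a single space before matching.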

-f --function: specify a function predicate

The supplied function is called for each page, receiving the page's root Document node. It must return true if the page matches.
Make sure to escape backslashes and nested quotes.

For instance, search for the moment when the W3C published its one-thousandth standard:

$ timefind www.w3.org/TR/ -f 'dom => dom.getElementsByClassName("pubdetails").length >= 1000'
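
Since a function predicate only sees the Document it is handed, you can try it out before running a search. The sketch below calls the predicate above on a stand-in object. `fakeDocument` is a made-up helper that implements only `getElementsByClassName`; it is not a real DOM and not part of timefind.

```javascript
// The predicate from the command above.
const predicate = (dom) => dom.getElementsByClassName("pubdetails").length >= 1000;

// Hypothetical stand-in for a Document: pretends the page contains `count`
// elements with the "pubdetails" class, and none with any other class.
function fakeDocument(count) {
  return {
    getElementsByClassName(name) {
      return name === "pubdetails" ? new Array(count).fill({}) : [];
    },
  };
}

const below = predicate(fakeDocument(999));
const reached = predicate(fakeDocument(1000));
console.log(below, reached);
```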

No predicate: interactive mode

If you don't specify a predicate, timefind runs in interactive mode: for each page it considers, it opens the page in your default browser and asks you whether or not the page matches.

Use this as a last resort, as this way of searching is significantly slower than non-interactive mode.

You could for instance try to find when Stripe last redesigned their website. Reply yes when you see the new design, and no when you see an older one:

$ timefind stripe.com

-i --inverse: invert the predicate

By flipping the predicate, you can ask timefind to look for the removal of a string, instead of the addition of a string.

For instance, the early Khan Academy website emphasized SAT preparation, but the current site no longer mentions it at all. You can find out when the SAT stopped being one of their selling points:

$ timefind khanacademy.org 'sat prep' -i

This works with all predicate types.
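
One way to picture why inverting works: negating the predicate turns a removal into an addition, so the same bisecting logic applies. The sketch below is illustrative only. `invert`, `firstIndexWhere`, and the sample page texts are made up and do not reflect timefind's internals.

```javascript
// Wraps a predicate so that it matches exactly when the original does not.
const invert = (predicate) => (page) => !predicate(page);

// Binary search for the first index where `predicate` is true, assuming it is
// false for every earlier snapshot and true for every later one.
function firstIndexWhere(snapshots, predicate) {
  let low = 0;
  let high = snapshots.length;
  while (low < high) {
    const mid = Math.floor((low + high) / 2);
    if (predicate(snapshots[mid])) {
      high = mid;
    } else {
      low = mid + 1;
    }
  }
  return low < snapshots.length ? low : -1;
}

// Made-up page texts, oldest first: the phrase disappears at index 2.
const pages = [
  "sat prep videos",
  "sat prep and math",
  "math and science",
  "math, science and more",
];
const mentionsSat = (text) => text.includes("sat prep");

const firstWithoutSat = firstIndexWhere(pages, invert(mentionsSat));
console.log(firstWithoutSat);
```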

--oldest, --newest: restrict the search timeframe

To search for a change that was eventually reverted, you need to pick a timeframe during which the change was not reverted. The --oldest and --newest options let you pick a start date and an end date for the search. You can use only one or both at the same time.

The options accept multiple levels of precision, from 2011-03-12 10:30, to simply 2011 (which is interpreted as 2011-01-01 00:00).
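
To make the precision levels concrete, here is a hedged sketch of how such a value could be expanded into a full date. `parsePartialDate` is a hypothetical helper, not timefind's parser. The intermediate forms (`2011-03` and `2011-03-12`) and the use of local time are assumptions on top of the two forms given above.

```javascript
// Hypothetical parser for "YYYY", "YYYY-MM", "YYYY-MM-DD" or "YYYY-MM-DD hh:mm".
// Missing parts default to the start of the period (January, day 1, 00:00).
function parsePartialDate(text) {
  const pattern = /^(\d{4})(?:-(\d{2})(?:-(\d{2})(?: (\d{2}):(\d{2}))?)?)?$/;
  const match = pattern.exec(text.trim());
  if (!match) {
    throw new Error(`Unrecognized date: ${text}`);
  }
  // Unmatched groups are undefined, so the defaults below apply to them.
  const [, year, month = "01", day = "01", hour = "00", minute = "00"] = match;
  // Assumption: the value is read as local time.
  return new Date(Number(year), Number(month) - 1, Number(day), Number(hour), Number(minute));
}

// Formats a date back to "YYYY-MM-DD hh:mm" in local time for display.
const pad = (n) => String(n).padStart(2, "0");
const format = (d) =>
  `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())} ` +
  `${pad(d.getHours())}:${pad(d.getMinutes())}`;

console.log(format(parsePartialDate("2011")));
console.log(format(parsePartialDate("2011-03-12 10:30")));
```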

For instance, we know that version 5 “Juno” of elementary OS was released some time in 2018. This means “Juno” was mentioned on the project's site for a while, before eventually being replaced by the name of the following release.
If we simply execute timefind elementary.io juno, the search will fail, as the name no longer appears on the page. But we can assume that it was still there in January 2019, and restrict the search:

$ timefind elementary.io juno --newest 2019

This search works because the truncated timeframe contains only a single transition, from “Juno isn't mentioned” to “Juno is mentioned”.

-a --full-source: don't limit search to user-visible text

By default, timefind only searches through user-visible text: the text displayed on the page, and the contents of alt and title attributes.
The --full-source option instructs timefind to search the complete raw page source instead.

For instance, you can find out when a video was first added to the Celeste website, by looking for the “youtube” string in the source, which shows up once they start using the YouTube embedded player:

$ timefind celestegame.com youtube -a

-b --no-smart: disable smart matching

By default, for string and regex predicates, timefind performs smart matching:

  • case is ignored: uppercase and lowercase are considered equivalent
  • runs of whitespace are collapsed into a single space: multiple spaces, line breaks, non-breaking spaces, etc. are all replaced with a single space
  • only complete words match: for instance, the predicate “possible” will not match the word “impossible”

The --no-smart option disables these three behaviors.
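
The three rules can be approximated in a few lines of JavaScript. This is a sketch for string predicates only, and `smartMatches` and `escapeRegExp` are made-up names. In particular, treating Unicode letters, digits, and underscores as "word" characters is an assumption, since the exact word-boundary rules are not spelled out here.

```javascript
// Escapes regex syntax characters so the needle is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical approximation of smart matching for a string predicate.
function smartMatches(pageText, needle) {
  // Rules 1 and 2: collapse whitespace runs (including non-breaking spaces
  // and line breaks) into one space, then ignore case.
  const normalize = (text) => text.replace(/\s+/g, " ").toLowerCase();

  // Rule 3: the needle must not be glued to a word character on either side.
  const pattern = new RegExp(
    `(?<![\\p{L}\\p{N}_])${escapeRegExp(normalize(needle))}(?![\\p{L}\\p{N}_])`,
    "u"
  );
  return pattern.test(normalize(pageText));
}

const partOfLongerWord = smartMatches("Nothing is Impossible", "possible");
const mixedCaseAndSpacing = smartMatches("It is\u00a0 POSSIBLE\n today", "possible");
const multiWord = smartMatches("Free SAT\n\n  prep videos", "sat prep");
console.log(partOfLongerWord, mixedCaseAndSpacing, multiWord);
```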

👩🏿‍💻 Contributing

If you'd like to contribute code to timefind (thank you for considering it!), be warned: timefind is written using Tasklemon. The API is nice and all, but Tasklemon wasn't meant for creating npm packages; the main resulting limitation is that the source code pretty much has to be all contained within a single file.
