timefind lets you find the exact moment that something was added to a website.
It quickly flips through Web Archive snapshots using binary search, pinpointing the date of the modification.
For example, you can search for the first mention of the iPhone on Apple's homepage:
$ timefind apple.com iphone
Looking for string predicate “iphone”.
Getting snapshot list for apple.com...
Got 212,432 snapshots, from 96-10-23 18:55:02 to 21-05-30 14:37:26.
Checking extremities...
Doesn't match: 96-10-23 18:55:02 (http://web.archive.org/web/19961023165502/http://www.apple.com:80/)
Matches: 21-05-30 14:37:26 (http://web.archive.org/web/20210530123726/https://www.apple.com/)
Searching...
Matches: 09-02-10 20:10:32 (http://web.archive.org/web/20090210191032/http://www.apple.com:80/)
Doesn't match: 02-12-06 00:43:50 (http://web.archive.org/web/20021205234350/http://www.apple.com:80/?)
Doesn't match: 06-01-08 21:04:20 (http://web.archive.org/web/20060108200420/http://apple.com:80/)
Matches: 07-07-25 23:28:27 (http://web.archive.org/web/20070725212827/http://www.apple.com/)
[...]
Doesn't match: 07-01-09 19:48:16 (http://web.archive.org/web/20070109184816/http://www.apple.com/)
Doesn't match: 07-01-09 19:48:16 (http://web.archive.org/web/20070109184816/http://www.apple.com/)
Bisecting completed!
Last non-matching snapshot is 07-01-09 19:48:16 (http://web.archive.org/web/20070109184816/http://www.apple.com/).
First matching snapshot is 07-01-10 06:21:28 (http://web.archive.org/web/20070110052128/http://www.apple.com:80/).
Voilà! Click through that last URL (the “first matching snapshot”), and you'll see how the very first iPhone was marketed. Or, follow the preceding link (the “last non-matching snapshot”) to see the website right before the announcement was made.
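Each snapshot URL embeds a 14-digit capture timestamp (YYYYMMDDhhmmss, in UTC), which is why the times printed above can differ from the digits in the URL: the printed times appear to be shown in the local timezone. As a rough sketch, the timestamp can be read back out of a URL like this. The helper name snapshotDate is made up for illustration and is not part of timefind:

```javascript
// Hypothetical helper: extracts the 14-digit capture timestamp
// (YYYYMMDDhhmmss, UTC) from a Wayback Machine snapshot URL.
function snapshotDate(url) {
  const m = /\/web\/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})\//.exec(url);
  if (!m) {
    throw new Error(`Not a snapshot URL: ${url}`);
  }
  const [year, month, day, hour, minute, second] = m.slice(1).map(Number);
  // Date.UTC expects a zero-based month.
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

const firstMatch = snapshotDate(
  "http://web.archive.org/web/20070110052128/http://www.apple.com:80/"
);
console.log(firstMatch.toISOString());
```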
With Node.js installed, install timefind globally by running npm install -g timefind.
timefind supports macOS, Linux, and Windows.
To use timefind, supply it with a URL to investigate and a string to search for. It'll find the very first appearance of the string.
For instance, this will look for the first mention of “community” on the Elm homepage:
$ timefind elm-lang.org community
timefind's default behavior is fairly specific (it only looks for complete words, in user-visible text, that were added but not removed). Look through the options to pick the right settings for your search.
To quickly scan through thousands of snapshots, timefind relies on binary search, the same algorithm used by the git bisect command.
The downside of this method is that it can only search for changes that do not get reversed. If the change you're looking for was eventually undone—for instance, a promo banner was displayed for a month only—you'll have to restrict the search timeframe for timefind to work.
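The idea can be sketched in a few lines of JavaScript. This is a minimal illustration rather than timefind's actual code: the bisect function, the toy snapshot list, and the predicate are all hypothetical. The sketch assumes the oldest snapshot doesn't match, the newest one does, and there is exactly one transition in between.

```javascript
// Hypothetical sketch of bisecting a chronologically sorted snapshot list.
// Assumes the first snapshot doesn't match, the last one does,
// and there is a single transition between them.
function bisect(snapshots, matches) {
  let lo = 0;                    // index of a known non-matching snapshot
  let hi = snapshots.length - 1; // index of a known matching snapshot
  let checks = 0;
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    checks += 1;
    if (matches(snapshots[mid])) {
      hi = mid; // the transition happened at or before mid
    } else {
      lo = mid; // the transition happened after mid
    }
  }
  return {
    lastNonMatching: snapshots[lo],
    firstMatching: snapshots[hi],
    checks,
  };
}

// Toy data: 1,000 "snapshots" where the string appears from index 700 onward.
const snapshots = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  hasString: i >= 700,
}));
const result = bisect(snapshots, (s) => s.hasString);
console.log(result.lastNonMatching.id, result.firstMatching.id, result.checks);
```

Each check halves the remaining interval, so the number of pages to fetch grows with the logarithm of the snapshot count. If the change was reverted at some point, the single-transition assumption no longer holds, and the result depends on which snapshots happen to be sampled.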
timefind's default behavior is to look for a string being added, and then never removed, for the complete lifetime of the page. You can change this behavior using options.
Use a different kind of predicate, like a regex or a function, using predicate arguments. Alter search behavior, such as looking for the disappearance of something, using search behavior arguments.
- #1 --string: specify a string predicate
- -r --regex: specify a regular expression predicate
- -f --function: specify a function predicate
- No predicate: interactive mode
- -i --inverse: invert the predicate
- --oldest, --newest: restrict the search timeframe
- -a --full-source: don't limit search to user-visible text
- -b --no-smart: disable smart matching
The default option. To match, a page must contain the string.
For instance, look for when the Undertale site started mentioning merchandise:
$ timefind undertale.com merch
To match, a page must match the regex.
You can omit the regex's surrounding slashes, unless you want to specify flags (although most flags are redundant by default, because of smart matching).
Make sure to escape backslashes.
For instance, find when a Wikipedia edition first reached a million articles:
$ timefind wikipedia.org -r '\\d \\d{3} \\d{3}'
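One way to accept a regex argument with optional surrounding slashes is sketched below. The parseRegexArg helper is hypothetical and not timefind's own parser. It assumes that flags can only be given in the /pattern/flags form. The doubled backslashes in the code are JavaScript string-literal escapes.

```javascript
// Hypothetical helper: accepts either "pattern" or "/pattern/flags".
function parseRegexArg(arg) {
  // Leading slash, anything in between, closing slash, optional flag letters.
  const slashed = /^\/(.*)\/([a-z]*)$/s.exec(arg);
  if (slashed) {
    return new RegExp(slashed[1], slashed[2]);
  }
  // No surrounding slashes: treat the whole argument as the pattern.
  return new RegExp(arg);
}

const plain = parseRegexArg("\\d \\d{3} \\d{3}");
const flagged = parseRegexArg("/sat prep/i");
console.log(plain.test("1 000 000 articles"));
console.log(flagged.test("SAT Prep"));
```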
The supplied function is called for each page, receiving the page's root Document node. It must return true if the page matches.
Make sure to escape backslashes and nested quotes.
For instance, search for the moment when the W3C published their one-thousandth standard:
$ timefind www.w3.org/TR/ -f 'dom => dom.getElementsByClassName("pubdetails").length >= 1000'
If you don't specify a predicate, timefind will be in interactive mode: for each page it considers, it'll open the page in your default browser, and ask you whether or not the page matches.
Use this as a last resort, as this way of searching is significantly slower than non-interactive mode.
You could for instance try to find when Stripe last redesigned their website. Reply with yes when you see their latest design, and no when you see an older one:
$ timefind stripe.com
By flipping the predicate, you can ask timefind to look for the removal of a string, instead of the addition of a string.
For instance, the early Khan Academy website emphasized SAT preparation, but the current one no longer mentions it at all. You can find out when the SAT stopped being one of their selling points:
$ timefind khanacademy.org 'sat prep' -i
This works with all predicate types.
To search for a change that was eventually reverted, you need to pick a timeframe during which the change was not reverted. The --oldest and --newest options let you pick a start date and an end date for the search. You can use either one alone, or both at the same time.
The options accept multiple levels of precision, from 2011-03-12 10:30 to simply 2011 (which is interpreted as 2011-01-01 00:00).
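The way a partial date expands to its earliest possible moment can be sketched as follows. The parseBound function is hypothetical. The intermediate precisions it accepts (year-month and year-month-day) and the use of UTC are assumptions made for this sketch, not documented timefind behavior.

```javascript
// Hypothetical parser for values like "2011", "2011-03", "2011-03-12",
// or "2011-03-12 10:30". Missing parts default to their earliest value.
// Interpreting the result as UTC is an assumption made for this sketch.
function parseBound(text) {
  const m =
    /^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:[ T](\d{2}):(\d{2}))?)?)?$/.exec(text.trim());
  if (!m) {
    throw new Error(`Unrecognized date: ${text}`);
  }
  // Unmatched groups are undefined, so the destructuring defaults apply.
  const [, year, month = "01", day = "01", hour = "00", minute = "00"] = m;
  return new Date(Date.UTC(+year, +month - 1, +day, +hour, +minute));
}

console.log(parseBound("2011").toISOString());
console.log(parseBound("2011-03-12 10:30").toISOString());
```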
For instance, we know that version 5 “Juno” of elementary OS was released some time in 2018. This means “Juno” was mentioned on the project's site for a while, before eventually being replaced by the name of the following release.
If we simply execute timefind elementary.io juno, the search will fail, as the name no longer appears on the page. But we can assume that it was still there in January 2019, and restrict the search:
$ timefind elementary.io juno --newest 2019
This search works because the truncated timeframe only contains a single transition, from “Juno isn't mentioned” to “Juno is mentioned”.
By default, timefind only searches through user-visible text: the text displayed on the page, and the contents of alt and title attributes.
The --full-source option instructs timefind to search the complete raw page source instead.
For instance, you can find out when a video was first added to the Celeste website, by looking for the “youtube” string in the source, which shows up once they start using the YouTube embedded player:
$ timefind celestegame.com youtube -a
By default, for string and regex predicates, timefind performs smart matching:
- case is ignored: uppercase and lowercase are considered equivalent
- runs of whitespace are collapsed into a single space: multiple spaces, line breaks, non-breaking spaces, etc., are all replaced with a single space
- only complete words match: for instance, the predicate “possible” will not match the word “impossible”
The --no-smart option disables these three behaviors.
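The three rules can be approximated for a string predicate as shown below. This is a sketch of the described behavior, not timefind's implementation. The smartMatch and escapeRegExp functions are hypothetical, and treating any letter or digit as part of a word is an assumption.

```javascript
// Escapes regex metacharacters so the needle is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch of smart matching for a string predicate.
function smartMatch(pageText, needle) {
  // Collapse whitespace runs to one space. JavaScript's \s already
  // covers line breaks and non-breaking spaces.
  const normalize = (s) => s.replace(/\s+/g, " ").trim();
  // Whole words only: no letter or digit may touch the needle on either side.
  const pattern = new RegExp(
    `(?<![\\p{L}\\p{N}])${escapeRegExp(normalize(needle))}(?![\\p{L}\\p{N}])`,
    "iu" // i: ignore case, u: enable \p{...} character classes
  );
  return pattern.test(normalize(pageText));
}

console.log(smartMatch("This is impossible", "possible"));
console.log(smartMatch("Anything is Possible here", "possible"));
console.log(smartMatch("SAT\n\u00a0 prep courses", "sat prep"));
```

Disabling smart matching would correspond to a plain, case-sensitive substring test on the unmodified text.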
When the search succeeds, this option opens the two result snapshots in your browser: the last non-matching snapshot and the first matching snapshot.
Want to contribute code to timefind? (thank you for considering it!)
Submit a PR and let's talk about it! Or just send feedback—that's very useful too. Open an issue, or send me a note on Mastodon.