Stacks is a text archive, created by the Rare Adventist Book Project, that anyone can use to create indexes, build concordances, and do amazing searches of historic Adventist and related texts.
Normally, we strive for utter perfection. This is not one of those times. This isn't our certified texts repository; it's OCR text in various states of correction, but more than adequate for searching and indexing.
How many words are currently in Stacks?
Here's an approximation:
find . -type f -name '*.txt' | xargs wc -w
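The one-liner above is fine for a rough number. A slightly sturdier sketch follows; `count_words` is just a made-up name. It uses `-print0` with `xargs -0` so file names containing spaces aren't split apart, and it pipes all the text into a single `wc -w`, so you get one grand total instead of the several "total" lines `xargs` can produce on a very large file list. Because `find -type f` skips symbolic links, files linked from Stacks/ aren't counted twice, but if both the /ocr/ and /txt/ copies of a title end in .txt, both are still counted, so treat the result as an approximation too.

```shell
# Hypothetical helper: total words in every .txt file under a directory.
# Usage: count_words [DIR]   (DIR defaults to the current directory)
count_words() {
  # -print0 / xargs -0: NUL-separated names, so spaces in paths are safe.
  # All the text flows into one `wc -w`, which prints a single total.
  # find -type f does not match symbolic links, so linked files are skipped.
  find "${1:-.}" -type f -name '*.txt' -print0 | xargs -0 cat | wc -w
}
```

For example, `count_words ~/Stacks` counts everything in a checkout kept in your home directory.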
If you want to search this dataset but have no experience using grep or any of that:
Download the repo using this link, or choose "Clone Or Download .. Download ZIP" above. [It's better to clone with git. Downloading will work, but git will help you with updates, which are frequent.]
Consider sending Laurence Anthony a $10 donation for using AntConc: click this link, then click the "Support This Tool" button (please fill in the amount, $10). Also subscribe to his YouTube channel; he has excellent tutorials on using AntConc.
Extract the files from Stacks_master.zip to a location (path) you'll be able to find later. Your personal home directory should be a safe bet. Don't let your OS just stick them somewhere you can't find; you need to be able to find the files.
Run the AntConc program, and use File .. Open File.. or Open Directory to open whatever sources (corpora) you'd like to search.
The downside of downloading directly (rather than cloning with git) is that you'll have to download the whole repository every time there is an addition, and additions sometimes come thick and fast. Also, the links, which are painstakingly created to make the texts easy to find and open, will likely be broken. If you're going to use this resource in any serious way, see the section below on installing git.
1: AntConc is a freeware corpus analysis toolkit for concordancing and text analysis. AntConc is free to use but is licensed through its author, Laurence Anthony; please refer to his website https://www.laurenceanthony.net/software/antconc/ for details.
If your links are broken (missing), it's likely because you downloaded the repository instead of using git. Most of this section still applies, but the index.html file for each title will be found in src/<prefix>/<title>/idx/index.html (or src\<prefix>\<title>\idx\index.html, with the slashes going the other direction, if you're using Windows).
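If you'd rather not click around, a short `find` command can list every per-title index page for you. This is a sketch assuming the src/<prefix>/<title>/idx/index.html layout described above and a Unix-like shell (git-bash on Windows should do); `list_indexes` is only an illustrative name.

```shell
# Hypothetical helper: list every per-title index page under src/.
# Usage: list_indexes [STACKS_DIR]   (defaults to the current directory)
list_indexes() {
  # Match only index.html files whose parent folder is named idx.
  find "${1:-.}/src" -path '*/idx/index.html' -type f | sort
}
```

Any path it prints can be pasted into your browser's location bar. On Linux, `xdg-open <path>` usually opens it in your default browser; on macOS, `open <path>` does the same.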
If you want to use an index, as is typical for periodicals, use a web browser (like Firefox) and enter
in the location bar. You'll then see that you're looking at a web page with links. It's pretty important at this point that you know where you put the files. You can navigate to the individual index pages by clicking on them, or by entering something like:
<some path where you extracted the files>/Stacks/Indexes/<title>-index.html
If you've downloaded the repository instead of cloning it, the main directory will be Stacks_master. Also, the links may be broken (Windows). In this case you're on your own navigating the src tree. You'll be able to use all of the texts; it's just going to take longer to find and open files.
Indexes are just html files with links to the original sources on the web. The original is the definitive way to read the documents. The OCR text may be readable, but in the case of rough, raw, or column-intruded text (periodicals), be sure to read the originals.
Stacks are just directories which create a collection by linking to files within the src tree of the Archive.
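To see how a Stack is wired up, you can list its symbolic links together with the src/ files they point to. A minimal sketch, assuming you run it from the top of your checkout; `show_links` is a made-up name, and if the links didn't survive your download or clone (see above), it may print nothing at all.

```shell
# Hypothetical helper: show each symbolic link under a Stacks directory
# together with its target in the src/ tree.
# Usage: show_links [DIR]   (defaults to ./Stacks)
show_links() {
  # -type l matches symbolic links only; ls -ld prints "name -> target"
  # for the link itself instead of listing a linked directory's contents.
  find "${1:-Stacks}" -type l -exec ls -ld {} +
}
```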
The SRC tree
The src tree is where we store some complexity, since most people will only use Stacks, Titles, and Indexes. Any title in the src directory might contain the following directories:
/ocr/ The text result from the OCR process.
/pro/ The provenance of the document.
/txt/ The OCR text with corrections (if any), wrapped at a width of 72 characters.
/idx/ An optional html index, often in the case of periodicals.
Feel free to rummage around. You can't hurt anything.
Having redundant text in the /txt/ folders doubles the size of each archive, but it also allows us to easily see the differences between the two copies, which is important if we want to compare back. Large periodical groupings don't have this distinction (yet), and the /txt/ directory is just a link to the
If you don't understand what's going on in the /src/ tree, you may end up opening both folders and thus get double results (two duplicate files) for every directory you open when searching. If you can, stick to the links provided in /Stacks/ to avoid that problem. For Stacks, you usually want to open them as a directory (AntConc File.. Open Dir). For Titles, use File.. Open File(s). Indexes are opened with a web browser.
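If you do use grep, you can sidestep the duplicate-results problem by searching only the /txt/ copies. A minimal sketch, assuming the src/<prefix>/<title>/txt/ layout described above; `search_txt` is a hypothetical name, and the search ignores case.

```shell
# Hypothetical helper: list the .txt files under src/*/*/txt/ that contain
# a word or phrase (ignoring case), so each title is matched once instead
# of once in /ocr/ and again in /txt/.
# Usage: search_txt PATTERN [STACKS_DIR]   (STACKS_DIR defaults to .)
search_txt() {
  local pattern=$1
  local root=${2:-.}
  # -r recurse into each txt/ folder, -i ignore case,
  # -l print only the names of matching files.
  grep -ril --include='*.txt' -e "$pattern" "$root"/src/*/*/txt/
}
```

For example, `search_txt sanctuary ~/Stacks` prints one file name per matching title. Drop the `-l` if you'd rather see the matching lines themselves.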
Git is free (like freedom) software to manage things like source code. We're using it mainly for data in this case. You should not be afraid of git. If you can possibly install git on your platform, it will be a great benefit for keeping your Stacks in sync with the master branch here. Then, any time you wish to update, just run git pull from the Stacks directory (or use whatever Git client you like), and only the new items or changes will be downloaded. That is as much as you'll ever need to know about it to keep your branch in sync, so proceed without fear!
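For example, assuming you cloned into a folder named Stacks (what `git clone` creates for this repository by default), updating is a one-liner. Here it's wrapped in a hypothetical `update_stacks` function so you can run it from any directory:

```shell
# Hypothetical helper: bring an existing clone up to date.
# Usage: update_stacks [CLONE_DIR]   (defaults to ./Stacks)
update_stacks() {
  # git -C runs git as if it were started inside CLONE_DIR; pull fetches
  # only new or changed objects and updates your current branch.
  git -C "${1:-Stacks}" pull
}
```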
This repository makes a lot of use of symbolic links. If you want them to work on Windows, run git-bash as Administrator: find the Git Bash program, right-click it, and choose "Run as Administrator".
Clone as below:
git clone -c core.symlinks=true https://github.com/RareAdventistBookProject/Stacks.git
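After cloning, it's worth checking that the symbolic links really arrived as links. A minimal sketch: `check_symlinks` is a made-up name, it assumes the clone is in ./Stacks, and it just reports the core.symlinks setting git recorded for the clone plus how many symbolic links `find` can see. A count of 0 most likely means the links were checked out as plain files; treat the result as a rough hint rather than proof, especially on Windows.

```shell
# Hypothetical helper: report whether a clone has working symbolic links.
# Usage: check_symlinks [CLONE_DIR]   (defaults to ./Stacks)
check_symlinks() {
  local repo=${1:-Stacks}
  local setting count
  # With core.symlinks=false, git writes links out as small plain files.
  setting=$(git -C "$repo" config --get core.symlinks || echo unset)
  # Count the symbolic links actually present in the working tree.
  count=$(find "$repo" -type l | wc -l | tr -d ' ')
  echo "core.symlinks = $setting"
  echo "symbolic links found: $count"
}
```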
We have tentatively turned on Issue tracking for Stacks, but please remember this is not supported software, it's a data set. We have no time, no paid employees, and no ability to re-create your setup, but we'll DO WHAT WE CAN. Take an interest, google, read, solve your own problems if you can. If you solve a problem that would help others, write it up, and we'll publish your solution.
If you can use git and want to participate (contribute corrections and additions), just send your pull request or email email@example.com.
License & Stipulations
All material is provided "as is" and without warranty. Additionally, you may never sue us for anything, in any place, for any reason, at any time. You may not abuse any of our server resources or in any way be naughty toward the project. No license is asserted over any individual public corpus. The project as a whole is licensed under Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/. This means you cannot take this data and sell it, but you may distribute it, as long as you provide attribution. Any/all software not specifically licensed is issued under GPL-v2. https://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
All texts in this repository were published into the Public Domain or were published prior to 1923 and are in the Public Domain in the United States of America. You are free to use these texts any way you like. If you live outside the United States of America, it is your responsibility to ascertain the copyright status for the jurisdiction in which you live. Please see the provenance information in the /pro/ folder for each title, although we make no claim of accuracy. The text of the King James Bible (KJV.txt) is provided under the Project Gutenberg License. If this is a problem for you, there are many other text sources of the KJV available online. Our recommendation is the Blue Letter Bible.
Do nice things.
Sometimes in life you like to do nice things, like smelling a flower, petting a kitten, or donating to our book acquisition fund.