centillion: a document search engine that searches across Github issues, Github pull requests, Github files, Google Drive documents, and Disqus comment threads.
a centillion: a very large number consisting of a 1 with 303 zeros after it.
One centillion is 3.03 log-times better than a googol.
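The "3.03 log-times" figure is the ratio of the two exponents: a centillion is 10^303 and a googol is 10^100, so 303 / 100 = 3.03. A quick check in Python (the helper name `count_zeros` is made up for this illustration):

```python
def count_zeros(n: int) -> int:
    """Return the number of zeros after the leading 1 in a power of ten."""
    digits = str(n)
    # This helper only makes sense for a 1 followed by zeros.
    if digits[0] != "1" or set(digits[1:]) - {"0"}:
        raise ValueError("expected a power of ten")
    return len(digits) - 1


centillion = 10 ** 303
googol = 10 ** 100

# Ratio of the exponents, i.e. of the base-10 logarithms.
log_ratio = count_zeros(centillion) / count_zeros(googol)
print(log_ratio)  # prints 3.03
```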
What is centillion
centillion is a search engine that can index different kinds of document collections: Google Documents (.docx files), Google Drive files, Github issues, Github files, Github Markdown files, and Disqus comment threads.
How centillion works
The backend of centillion defines how documents are obtained and how the search index is constructed. centillion builds the search index, and keeps it up to date, by using APIs to fetch the latest versions of documents.
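The update step described above can be pictured with a small sketch. This is not centillion's actual code: the `Document` class, the `sync_index` function, and the use of a plain dictionary as the "index" are all assumptions made for illustration. The idea is only that documents fetched from an API are compared against what is already indexed, and the index is changed to match.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Document:
    doc_id: str    # unique identifier (hypothetical, e.g. an issue URL)
    modified: str  # last-modified timestamp reported by the API
    text: str      # content to be searched


def sync_index(index: Dict[str, Document], fetched: List[Document]) -> Dict[str, int]:
    """Bring `index` (doc_id -> Document) in line with the `fetched` documents."""
    stats = {"added": 0, "updated": 0, "removed": 0}
    fetched_by_id = {doc.doc_id: doc for doc in fetched}

    # Drop documents that the remote source no longer returns.
    for doc_id in list(index):
        if doc_id not in fetched_by_id:
            del index[doc_id]
            stats["removed"] += 1

    # Add new documents and replace those whose timestamp changed.
    for doc_id, doc in fetched_by_id.items():
        if doc_id not in index:
            index[doc_id] = doc
            stats["added"] += 1
        elif index[doc_id].modified != doc.modified:
            index[doc_id] = doc
            stats["updated"] += 1
    return stats


index: Dict[str, Document] = {}
first_fetch = [
    Document("issue-1", "2018-01-01T00:00:00Z", "hello"),
    Document("issue-2", "2018-01-01T00:00:00Z", "world"),
]
print(sync_index(index, first_fetch))  # {'added': 2, 'updated': 0, 'removed': 0}
```

Running `sync_index` again with a changed or shorter list of documents would report updates and removals instead of additions.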
The centillion frontend provides a web interface for running queries and interfacing with the search index. (More information)
How to configure centillion
To configure centillion, read the full documentation
at http://nih-data-commons.us/centillion/. Some example
configuration files are in the
Before configuring centillion to search your organization's file systems,
we recommend following the Quickstart.
This quickstart will get you started with a centillion instance that is populated with fake documents (avoiding the need to make real API calls). This will allow you to try out centillion before you enable any APIs.
Start by cloning a copy of the repo:
cd
git clone https://github.com/dcppc/centillion
cd ~/centillion/
(This step is optional but recommended. If you do not have virtualenv installed, follow these installation instructions.)
Set up a virtual environment, where centillion will be installed:
virtualenv vp
source vp/bin/activate
To install centillion, first install the required packages:
pip install -r requirements.txt
Now install centillion:
python setup.py build install
Test that your centillion installation went okay:
python -m centillion
If you see no output, that means centillion has been successfully installed.
If you see an error message, check that you have activated your virtual environment.
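If the installation test fails, a short script can help narrow down the cause. The sketch below uses only the Python standard library; the function names `in_virtualenv` and `is_installed` are made up for this example and are not part of centillion.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer releases (and venv)
    # make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_installed(package: str) -> bool:
    """Return True if the top-level `package` can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("centillion importable:", is_installed("centillion"))
```

If the first line reports `False`, activate the virtual environment and try again. If only the second reports `False`, repeat the install steps inside the activated environment.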
Create a temporary working directory:
mkdir -p /tmp/my-centillion-instance && cd /tmp/my-centillion-instance
Now create a minimal centillion instance with the following Python program:
import centillion
app = centillion.webapp.get_flask_app(config_file='config.py')
app.run()
The config.py file can be copied verbatim from the example
configuration file in the repository:
cp ~/centillion/config/config_centillion.example.py config.py
Now run the centillion instance by running the script:
This will run the webapp on port 5000, so navigate to http://localhost:5000 in the browser.
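To confirm from a second terminal that the webapp is answering on port 5000, a small standard-library check such as the following can be used. The function name `webapp_is_up` is made up for this example; it is not part of centillion.

```python
import urllib.error
import urllib.request


def webapp_is_up(url: str = "http://localhost:5000", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`."""
    # Ignore any system proxy settings: the target is a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (4xx/5xx).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not reachable.
        return False
```

Calling `webapp_is_up()` with no arguments checks the default address used in this quickstart. A `False` result usually means the script is not running or is listening on a different port.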
centillion does not populate the search index automatically, so the first time you run it you will not see any documents in the search index.
Before you can use centillion, you must manually populate the search index.
Populate the Search Index:
To populate the search index, visit the control panel:
From here you can re-index the search engine. The example configuration file uses fake documents instead of real API calls, so the reindexing will work even without a network connection. To return to the index, click the centillion banner.
Visit the Master List:
The master list shows a list of every document indexed by centillion. Visit the master list:
Visit the help page for more information about running searches:
Try searching for the following terms to see search results:
Resources for centillion
centillion on Github: https://github.com/dcppc/centillion
centillion documentation: http://dcppc.github.io/centillion