Search and browse documents and data; find the people and companies you are looking for.

Truth cannot penetrate a closed mind. If all places in the universe are in the Aleph, then all stars, all lamps, all sources of light are in it, too.

The Aleph, Jorge Luis Borges

Aleph is a tool for indexing large amounts of both documents (PDF, Word, HTML) and structured data (CSV, XLS, SQL) for easy browsing and search. It is built with investigative reporting as a primary use case. Aleph allows cross-referencing mentions of well-known entities (such as people and companies) against watchlists, e.g. from prior research or public datasets.

Here are some key features:

  • Web-based search across large document and data sets.
  • Imports many file formats, including popular office formats, spreadsheets, email and zipped archives. Processing includes optical character recognition, language and encoding detection and named entity extraction.
  • Load structured entity graph data from databases and CSV files. This allows navigation of complex datasets like company registries, sanctions lists or procurement data. Import tools for OpenSanctions are included.
  • Receive notifications for new search matches with a personal watchlist.
  • OAuth authorization and access control on a per-source and per-watchlist basis.
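
To make the watchlist cross-referencing idea concrete, here is a minimal sketch in plain Python. It is not Aleph's implementation or API: the function names `normalize` and `cross_reference`, the sample documents and the watchlist entries are all hypothetical, and the matching is a simple normalized phrase lookup rather than the named entity extraction Aleph performs.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).split())


def cross_reference(documents, watchlist):
    """Return {document_id: [watchlist names mentioned in that document]}.

    Hypothetical helper: matches whole normalized phrases only.
    """
    normalized_names = [(name, normalize(name)) for name in watchlist]
    matches = {}
    for doc_id, text in documents.items():
        # Pad with spaces so a name only matches on word boundaries.
        padded = f" {normalize(text)} "
        hits = [name for name, norm in normalized_names
                if norm and f" {norm} " in padded]
        if hits:
            matches[doc_id] = hits
    return matches


# Illustrative data only; none of these names come from a real dataset.
documents = {
    "a.pdf": "Payment approved by José Pérez for ACME Holdings Ltd.",
    "b.html": "Nothing of interest here.",
}
watchlist = ["Jose Perez", "Acme Holdings Ltd", "Globex"]
result = cross_reference(documents, watchlist)
```

With this sample data, only `a.pdf` is reported, with both of the watchlist names it mentions. A real deployment would rely on Aleph's own indexing and entity extraction instead of substring matching.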

Documentation

The documentation for Aleph is available on our Wiki. If you wish to run your own copy of Aleph (or contribute to the development), get started with the installation documentation.

Support

Aleph is used by multiple organisations, including Code for Africa, OCCRP and OpenOil. For coordination, please join the Aleph Slack workspace: alephdata

If you find any errors or issues using Aleph, please file an issue on GitHub or contact the mailing list.