Haystack — Neural Question Answering At Scale

Introduction

The performance of modern Question Answering models (BERT, ALBERT, ...) has improved drastically within the last year, enabling many new opportunities for accessing information more efficiently. However, those models are designed to find answers within rather small text passages. Haystack lets you scale QA models to large collections of documents!

Haystack is designed in a modular way and lets you use any QA model trained with FARM or Transformers.

Swap your models easily from BERT to RoBERTa, and scale the database from development (SQLite) to production (Elasticsearch).

Core Features

  • Powerful models: Utilize all the latest transformer-based models (BERT, ALBERT, RoBERTa, ...)
  • Modular & future-proof: Easily switch to newer models once they get published.
  • Developer friendly: Easy to debug, extend and modify.
  • Scalable: Production-ready deployments via Elasticsearch backend.
  • Customizable: Fine-tune models to your own domain.

Components

  1. Retriever: A fast, simple algorithm that identifies candidate passages from a large collection of documents. Typical algorithms are TF-IDF and BM25, the latter being what Elasticsearch uses by default. The Retriever narrows down the scope for the Reader to smaller units of text where a given question could be answered.
  2. Reader: A powerful neural model that reads through texts in detail to find an answer. Use diverse models like BERT, RoBERTa, or XLNet trained via FARM or Transformers on SQuAD-like tasks. The Reader takes multiple passages of text as input and returns top-n answers with corresponding confidence scores. You can load a pretrained model from Hugging Face's model hub or fine-tune it on your own domain data.
  3. Finder: Glues together a Reader and a Retriever as a pipeline to provide an easy-to-use question answering interface.
  4. Labeling Tool: (Coming soon)
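To make the Retriever → Reader → Finder division of labor concrete, here is a toy, self-contained sketch in plain Python. This is an illustration only, not Haystack's actual API: the class and method names (`TfidfRetriever`, `KeywordReader`, `get_answer`) are made up, and the "Reader" is a trivial keyword matcher standing in for a neural model.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class TfidfRetriever:
    """Toy TF-IDF retriever: scores each document against the query."""
    def __init__(self, documents):
        self.documents = documents
        self.doc_tokens = [tokenize(d) for d in documents]
        self.df = Counter()  # document frequency, for IDF
        for tokens in self.doc_tokens:
            self.df.update(set(tokens))
        self.n_docs = len(documents)

    def retrieve(self, query, top_k=2):
        q_tokens = tokenize(query)
        scored = []
        for i, tokens in enumerate(self.doc_tokens):
            tf = Counter(tokens)
            score = sum(
                tf[t] * math.log((1 + self.n_docs) / (1 + self.df[t]))
                for t in q_tokens
            )
            scored.append((score, i))
        scored.sort(reverse=True)
        return [self.documents[i] for _, i in scored[:top_k]]

class KeywordReader:
    """Stand-in for a neural Reader: picks the sentence with most query-term overlap."""
    def predict(self, query, passages):
        q_terms = set(tokenize(query))
        best, best_score = None, -1
        for passage in passages:
            for sentence in re.split(r"(?<=[.!?])\s+", passage):
                overlap = len(q_terms & set(tokenize(sentence)))
                if overlap > best_score:
                    best, best_score = sentence, overlap
        return best

class Finder:
    """Glues a Reader and a Retriever into one pipeline."""
    def __init__(self, reader, retriever):
        self.reader = reader
        self.retriever = retriever

    def get_answer(self, question, top_k_retriever=2):
        passages = self.retriever.retrieve(question, top_k=top_k_retriever)
        return self.reader.predict(question, passages)

docs = [
    "Paris is the capital of France. It is known for the Eiffel Tower.",
    "Berlin is the capital of Germany. It has a famous zoo.",
    "The Rhine is a major European river.",
]
finder = Finder(KeywordReader(), TfidfRetriever(docs))
print(finder.get_answer("What is the capital of France?"))
# → Paris is the capital of France.
```

The key design point carries over to the real library: the Retriever is cheap and runs over the whole corpus, while the (expensive) Reader only sees the top-k candidates.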

Quickstart

Installation

There are two ways to install:

  • (recommended) from source: git clone <url>, then run pip install [--editable] . from the root of the repository.
  • from PyPI: pip install farm-haystack

Usage

https://raw.githubusercontent.com/deepset-ai/haystack/master/docs/img/code_snippet_usage.png
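The image above shows the usage snippet. As Python-style pseudocode, the flow it illustrates looks roughly like the following; the exact module paths, class names, and parameters are assumptions and may differ in the installed package — treat the image (or the tutorials) as authoritative:

```
# Pseudocode sketch -- not a runnable example.
document_store = SQLDocumentStore(...)       # dev backend; swap for Elasticsearch in production
retriever = Retriever(document_store)        # narrows candidates (TF-IDF / BM25)
reader = Reader(model="<pretrained QA model from the model hub>")
finder = Finder(reader, retriever)
answers = finder.get_answers(question="...", top_k=3)
```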

Deployment

Haystack has an extensible document store layer. There are currently implementations of Elasticsearch and SQL (see haystack.database.elasticsearch.ElasticsearchDocumentStore and haystack.database.sql.SQLDocumentStore).

Elasticsearch Backend

Elasticsearch is recommended for large-scale deployments. Documents can optionally be chunked into smaller units (e.g., paragraphs) before indexing to make the results returned by the Retriever more granular and accurate. Retrievers query an Elasticsearch index to find the relevant paragraphs (or documents) for a question. The default ElasticsearchRetriever uses Elasticsearch's native scoring (BM25), but can easily be extended with custom implementations.

You can get started by running a single Elasticsearch node using docker:

docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.5.1

SQL Backend

The SQL backend layer is mainly meant to simplify the first development steps. By default, a local file-based SQLite database is initialized. However, if you prefer a PostgreSQL or MySQL backend for production, you can easily configure this since our implementation is based on SQLAlchemy.
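To make the idea concrete, here is a minimal, self-contained sketch of what a SQL-backed document store does: write documents into a table, then read them back for a retriever. This uses only the stdlib sqlite3 module and is not Haystack's SQLDocumentStore; the class and method names are invented for illustration.

```python
import sqlite3

class ToySQLDocumentStore:
    """Minimal stand-in for a SQL-backed document store (illustration only)."""
    def __init__(self, url=":memory:"):
        # ":memory:" mimics a zero-setup dev default; pass a file path
        # (e.g. "qa.db") to persist documents on disk instead.
        self.conn = sqlite3.connect(url)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS document "
            "(id INTEGER PRIMARY KEY, name TEXT, text TEXT)"
        )

    def write_documents(self, documents):
        self.conn.executemany(
            "INSERT INTO document (name, text) VALUES (?, ?)",
            [(d["name"], d["text"]) for d in documents],
        )
        self.conn.commit()

    def get_document_count(self):
        return self.conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]

    def get_all_documents(self):
        rows = self.conn.execute("SELECT name, text FROM document").fetchall()
        return [{"name": n, "text": t} for n, t in rows]

store = ToySQLDocumentStore()
store.write_documents([
    {"name": "doc1", "text": "Paris is the capital of France."},
    {"name": "doc2", "text": "Berlin is the capital of Germany."},
])
print(store.get_document_count())
# → 2
```

Because the real implementation is built on SQLAlchemy, switching from the SQLite dev default to PostgreSQL or MySQL is mainly a matter of changing the database connection URL rather than the code.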

REST API

A simple REST API based on FastAPI is included for answering questions at inference time. To serve the API, run uvicorn haystack.api.inference:app. You will find the interactive Swagger documentation at http://127.0.0.1:8000/docs.
