# Training Your Own "Dense Passage Retrieval" Model

EXECUTABLE VERSION: [colab](https://colab.research.google.com/github/deepset-ai/haystack/blob/master/tutorials/Tutorial9_DPR_training.ipynb)

Haystack contains all the tools needed to train your own Dense Passage Retrieval model.
This tutorial will guide you through the steps required to create a retriever that is specifically tailored to your domain.


In [None]:
from haystack.document_store.memory import InMemoryDocumentStore
from haystack.preprocessor.utils import fetch_archive_from_http
from haystack.retriever.dense import DensePassageRetriever

In [None]:
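# Data preparation
# Download the Game of Thrones Wikipedia articles (plain text) used as example documents.
# Note: retriever.train() below does not read these text files; it expects DPR-formatted
# JSON training files (e.g. nq-dev-small.json) in data/dpr_training/, which must be provided separately.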

doc_dir = "data/article_txt_got"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
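
The training files read by `retriever.train()` are expected to follow the JSON layout of the original DPR training data: a list of examples, each holding a question, its answers, and lists of positive, negative and hard-negative passages. The sketch below writes one toy example in that layout using only the standard library. The question, passage texts, `passage_id` values, the `dataset` name and the file name `my_dpr_train.json` are all made up for illustration, and the exact set of fields Haystack requires may differ between versions.

```python
import json
from pathlib import Path

def make_ctx(title, text, passage_id):
    # One passage entry, mirroring the fields of the original DPR training files
    return {"title": title, "text": text, "score": 0, "title_score": 0, "passage_id": passage_id}

# A single hypothetical training example: one question, one passage that answers it,
# and one hard negative (same topic, but it does not contain the answer).
example = {
    "dataset": "my_domain",
    "question": "Who gave Arya Stark the sword Needle?",
    "answers": ["Jon Snow"],
    "positive_ctxs": [
        make_ctx("Arya Stark", "Jon Snow gives his sister Arya a small sword, which she names Needle.", "1")
    ],
    "negative_ctxs": [],
    "hard_negative_ctxs": [
        make_ctx("Arya Stark", "Arya Stark is the younger daughter of Eddard Stark and Catelyn Tully.", "2")
    ],
}

out_dir = Path("data/dpr_training")
out_dir.mkdir(parents=True, exist_ok=True)
out_file = out_dir / "my_dpr_train.json"
with open(out_file, "w", encoding="utf-8") as f:
    # The file holds a JSON list of examples, not a single object
    json.dump([example], f, indent=2)
```

Passing `train_filename="my_dpr_train.json"` to `retriever.train()` would then point training at this file; a real training set needs many such examples.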

In [None]:
document_store = InMemoryDocumentStore()
# Start from DPR encoders pretrained on Natural Questions (NQ); train() below fine-tunes them.
retriever = DensePassageRetriever(document_store=document_store,
                                  query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
                                  passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
                                  max_seq_len_query=64,
                                  max_seq_len_passage=256,
                                  batch_size=16,
                                  use_gpu=True,
                                  embed_title=True,
                                  use_fast_tokenizers=True)
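
Conceptually, the retriever configured above embeds queries and passages with two separate encoders and ranks passages by the dot product of the two embeddings. The toy sketch below shows only that ranking step in plain Python; the hand-written 3-dimensional vectors are hypothetical stand-ins for the 768-dimensional embeddings produced by the BERT-base DPR encoders, not real model outputs.

```python
def dot(u, v):
    # Inner product of two equal-length vectors
    return sum(a * b for a, b in zip(u, v))

def rank_passages(query_emb, passage_embs, top_k=2):
    # Score every passage against the query and return (index, score) pairs, best first
    scores = [(i, dot(query_emb, p)) for i, p in enumerate(passage_embs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]

# Hypothetical embeddings: one query and three passages
query_emb = [0.9, 0.1, 0.0]
passage_embs = [
    [0.1, 0.9, 0.0],  # passage 0: mostly unrelated
    [0.8, 0.2, 0.1],  # passage 1: closest to the query
    [0.0, 0.0, 1.0],  # passage 2: unrelated
]
print(rank_passages(query_emb, passage_embs))
```

Because the passage encoder runs independently of the query, passage embeddings can be computed once at indexing time, and only the query has to be embedded at search time.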


In [None]:
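# Fine-tune the query and passage encoders and save them to save_dir.
# Reusing one small file for train, dev and test only checks that training runs end to end;
# use separate data splits for a meaningful evaluation.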


retriever.train(
    data_dir="data/dpr_training/",
    train_filename="nq-dev-small.json",
    dev_filename="nq-dev-small.json",
    test_filename="nq-dev-small.json",
    n_epochs=1,
    batch_size=4,
    save_dir="../saved_models/dpr",
    embed_title=True
)
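
DPR is trained with in-batch negatives: within a batch, each question's own positive passage should score higher than the positive passages of the other questions, and the loss is the negative log-likelihood of the correct passage under a softmax over dot-product scores. The plain-Python sketch below computes that objective for a toy batch of hand-written embeddings. It illustrates the objective described for DPR, not Haystack's actual training code, and it leaves out hard negatives for brevity.

```python
import math

def dot(u, v):
    # Inner product of two equal-length vectors
    return sum(a * b for a, b in zip(u, v))

def in_batch_nll(query_embs, passage_embs):
    # passage_embs[i] is the positive passage for query_embs[i];
    # every other passage in the batch acts as a negative for query i.
    total = 0.0
    for i, q in enumerate(query_embs):
        scores = [dot(q, p) for p in passage_embs]
        # log-sum-exp with max subtraction for numerical stability
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[i]  # equals -log(softmax(scores)[i])
    return total / len(query_embs)  # mean loss over the batch

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive is most similar to its own query
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong queries
print(in_batch_nll(queries, aligned), in_batch_nll(queries, swapped))
```

A lower loss means each question sits closer to its own passage than to the other passages in the batch; larger batches supply more negatives per question.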

In [None]:
# TODO: load the fine-tuned query and passage encoders from the save_dir used in train()