# Firestore: Index the documents

This notebook takes the documents (context passages) from the SQuAD training split and indexes them in Firestore, one Firestore document per unique passage.

In [1]:
import datasets

In [2]:
squad_dataset = datasets.load_dataset('squad')

In [3]:
data = squad_dataset["train"].to_pandas()

In [4]:
data.drop_duplicates(subset='context', keep='first', inplace=True)
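`drop_duplicates(subset='context', keep='first')` keeps only the first row for each distinct passage, so any later questions about the same passage are discarded. A minimal pure-Python sketch of the same keep-first rule, using a hypothetical list of `(id, context)` pairs in place of the DataFrame (`keep_first_by_key` is an illustrative helper, not a pandas function):

```python
def keep_first_by_key(rows, key):
    """Return rows in original order, keeping only the first row per key value.

    Mirrors the keep='first' behaviour of pandas' drop_duplicates(subset=...).
    """
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


# Hypothetical sample: three questions, two of which share a passage.
rows = [
    ("q1", "passage A"),
    ("q2", "passage A"),
    ("q3", "passage B"),
]
print(keep_first_by_key(rows, key=lambda r: r[1]))
# → [('q1', 'passage A'), ('q3', 'passage B')]
```

The consequence for this notebook is that only one question per passage ends up in Firestore.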

In [5]:
len(data)

18891

In [6]:
data.head(3)

| | id | title | context | question | answers |
|---|---|---|---|---|---|
| 0 | 5733be284776f41900661182 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | To whom did the Virgin Mary allegedly appear i... | {'text': ['Saint Bernadette Soubirous'], 'answ... |
| 5 | 5733bf84d058e614000b61be | University_of_Notre_Dame | As at most other universities, Notre Dame's st... | When did the Scholastic Magazine of Notre dame... | {'text': ['September 1876'], 'answer_start': [... |
| 10 | 5733bed24776f41900661188 | University_of_Notre_Dame | The university is the major seat of the Congre... | Where is the headquarters of the Congregation ... | {'text': ['Rome'], 'answer_start': [119]} |


## Index into Firestore

In [7]:
from tqdm import tqdm

import firebase_admin
from firebase_admin import firestore

In [8]:
# With no arguments, initialize_app() falls back to Google Application Default Credentials.
app = firebase_admin.initialize_app()
db = firestore.client()

In [9]:
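def make_doc(row):
    """Convert a DataFrame row into a (document ID, Firestore payload) pair."""
    doc = row.to_dict()

    # Use the SQuAD question ID as the Firestore document ID.
    doc_id = doc.pop("id")

    # to_pandas() leaves the nested answer fields as NumPy arrays, which the
    # Firestore client does not accept; convert them to plain Python lists.
    doc["answers"] = {
        "text": doc["answers"]["text"].tolist(),
        "answer_start": doc["answers"]["answer_start"].tolist(),
    }
    return doc_id, doc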

Example:

In [10]:
make_doc(data.iloc[0])

('5733be284776f41900661182',
 {'title': 'University_of_Notre_Dame',
  'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
  'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
  'answers': {'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]}})
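`make_doc` converts the two answer fields by hand because `to_pandas()` returns them as NumPy arrays. If more nested array columns were added, a generic converter would avoid listing each field explicitly. A minimal sketch, assuming duck typing on `.tolist()` is acceptable (NumPy arrays and NumPy scalars both provide it); `to_plain` and `FakeArray` are hypothetical names introduced here, and `FakeArray` only stands in for a NumPy array so the sketch runs without NumPy:

```python
from collections.abc import Mapping


def to_plain(value):
    """Recursively convert NumPy-style values into plain Python types.

    Any object exposing .tolist() is converted with it; dicts, lists and
    tuples are walked recursively (tuples come back as lists).
    """
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, Mapping):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


class FakeArray:
    """Hypothetical stand-in for a NumPy array, used only for this demo."""

    def __init__(self, items):
        self._items = list(items)

    def tolist(self):
        return list(self._items)


raw = {
    "title": "University_of_Notre_Dame",
    "answers": {
        "text": FakeArray(["Saint Bernadette Soubirous"]),
        "answer_start": FakeArray([515]),
    },
}
print(to_plain(raw))
# → {'title': 'University_of_Notre_Dame', 'answers': {'text': ['Saint Bernadette Soubirous'], 'answer_start': [515]}}
```

In the notebook, `to_plain(row.to_dict())` would produce the converted row, from which `id` can then be popped as `make_doc` does.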

In [13]:
# Write one document per unique passage; each set() call is a separate request.
for _, row in tqdm(data.iterrows(), total=len(data)):
    doc_id, doc = make_doc(row)
    db.collection("questions").document(doc_id).set(doc)

100%|████████████████████████████████████████████████████████████████████████████████████| 18891/18891 [34:36<00:00,  9.10it/s]
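The loop above issues one `set()` call per document, and that per-request round trip likely accounts for most of the roughly 35-minute run at about 9 writes per second. The Firestore client also supports batched writes (`db.batch()`, then `batch.set(ref, data)` and `batch.commit()`), which send many writes in a single request. A minimal sketch, assuming a batch size of 500 stays within Firestore's per-batch limits; `chunked` and `write_in_batches` are hypothetical helpers, and the speed-up has not been measured here:

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_in_batches(db, collection, items, batch_size=500):
    """Write (doc_id, data) pairs to `collection`, committing one batch per chunk.

    `db` is a Firestore client such as the one returned by firestore.client().
    batch_size=500 is an assumed safe value, not a measured optimum.
    Returns the number of batches committed.
    """
    commits = 0
    for chunk in chunked(items, batch_size):
        batch = db.batch()
        for doc_id, data in chunk:
            batch.set(db.collection(collection).document(doc_id), data)
        batch.commit()
        commits += 1
    return commits


# Intended use in the notebook (not executed here):
# pairs = (make_doc(row) for _, row in data.iterrows())
# write_in_batches(db, "questions", pairs)
```

Passing a generator keeps memory flat, since `chunked` only materializes one batch of rows at a time.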
