In [4]:
%load_ext autoreload
%autoreload 2
from pymongo import MongoClient
import sys
from pathlib import Path
from tqdm import tqdm
import json
import random
import bson

sys.path.append(str(Path("..").resolve()))
from src import *

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Define indexes on embedded docs' id
Tickets are embedded inside visitor documents, and events reference them. Each ticket therefore needs its own ID to be retrievable.

First, I added an `_id` field holding an _ObjectId_ to each embedded ticket document. Then I defined an index on that field.
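The first step can be sketched roughly as follows. This is not necessarily how the ids were added in the preprocessing; it assumes that `tickets` is an array of sub-documents inside each visitor, and the helper name `assign_ticket_ids` and its `new_id` parameter are hypothetical. The id factory is passed in so that the function can be tried on plain dictionaries without a running server.

```python
from typing import Any, Callable


def assign_ticket_ids(visitor: dict, new_id: Callable[[], Any]) -> int:
    """Give each embedded ticket that lacks an `_id` a fresh one.

    Mutates `visitor` in place and returns the number of ids added.
    Assumes `visitor["tickets"]` is a list of dicts (may be missing).
    """
    added = 0
    for ticket in visitor.get("tickets", []):
        if "_id" not in ticket:
            ticket["_id"] = new_id()
            added += 1
    return added


# Hypothetical usage against the live collection (requires pymongo):
#   from bson import ObjectId
#   for visitor in db.visitors.find({}):
#       if assign_ticket_ids(visitor, ObjectId):
#           db.visitors.update_one(
#               {"_id": visitor["_id"]},
#               {"$set": {"tickets": visitor["tickets"]}},
#           )
```

Tickets that already carry an `_id` are left untouched, so running the helper twice does not change existing references.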

## Performance improvements
Before defining the index:
```python
db.visitors.find({"tickets._id": ObjectId("<OID>")}).explain()
```

Winning plan = `COLLSCAN`. `docsExamined = 50`. `totalKeysExamined = 0`. **The query scanned the whole collection.**

Then,
```python
db.visitors.create_index({ "tickets._id": 1 })
```

Re-running the same query with `explain()` now gives: winning plan = `IXSCAN` -> `FETCH`. `indexName = tickets._id_1`. `totalKeysExamined = 1`. `totalDocsExamined = 1`. **The query examined a single document.**

`FETCH` appears because MongoDB must read the full document once the index has located the matching key.
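To compare the two plans without reading the raw `explain()` output, a small helper can pull out the stage chain and the two counters. The sketch below assumes the classic explain layout (`queryPlanner.winningPlan` with nested `inputStage`, plus `executionStats`); other server versions may nest the plan differently. The function names and the `sample` dictionary are hypothetical, and `sample` is a hand-written stand-in that only reuses the values reported above.

```python
def plan_stages(plan: dict) -> list:
    """Return stage names from the outermost stage down to the leaf."""
    stages = []
    node = plan
    while node is not None:
        stages.append(node["stage"])
        node = node.get("inputStage")  # follows single-child plans only
    return stages


def summarize_explain(explain: dict) -> dict:
    """Collect the stage chain and the examined-keys/docs counters."""
    winning = explain["queryPlanner"]["winningPlan"]
    stats = explain.get("executionStats", {})
    return {
        "stages": plan_stages(winning),
        "totalKeysExamined": stats.get("totalKeysExamined"),
        "totalDocsExamined": stats.get("totalDocsExamined"),
    }


# Hand-written stand-in shaped like the indexed query's explain output.
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "tickets._id_1"},
        }
    },
    "executionStats": {"totalKeysExamined": 1, "totalDocsExamined": 1},
}

print(summarize_explain(sample))
```

The stage list is ordered from the outermost stage inward, so `FETCH` comes first even though data flows from `IXSCAN` into `FETCH`.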

## Index update

MongoDB maintains indexes automatically on writes, so the `tickets._id` index needs no manual refresh when tickets are inserted, updated, or removed.

## Load post-preprocessing data
The following cell resets the `omero_museum` database by loading the `4_consistent` snapshot.

In [5]:
%%capture
MongoClient("mongodb://localhost:27017/").drop_database("omero_museum")
!mongorestore --host localhost:27017 --drop --db omero_museum  "../backup/4_consistent/omero_museum"

In [6]:
connector = MongoDBConnector("omero_museum")
db = connector.db

The collections of the omero_museum db are:
----------------------------------------
[activities]: _id capacity duration enrolled room start_date ticketIds workshop_title
[artworks]: _id authorIds comments_star_1 comments_star_2 comments_star_3 comments_star_4 comments_star_5 date description donation_state donator_id is_original location_name materials period seller_id size tecniques trade type
[authors]: _id birth_date gender home_town name surname
[departments]: _id floor free_spots room ...

## Define indexes on _Embedded_ + _Referenced_ documents

In [7]:
db.visitors.create_index({ "tickets._id": 1 })

'tickets._id_1'
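One way to double-check that the index is in place is to inspect what `db.visitors.index_information()` returns. The sketch below works on a plain dictionary so it can be tried without a server; the helper name is hypothetical, and `sample_info` is a hand-written stand-in for the expected shape (index name mapped to a spec whose `key` is a list of `(field, direction)` pairs), not actual output from this database.

```python
def has_single_field_index(index_info: dict, field: str) -> bool:
    """True if some index covers exactly `field` and nothing else."""
    return any(
        [key for key, _direction in spec["key"]] == [field]
        for spec in index_info.values()
    )


# Stand-in shaped like the mapping returned by index_information().
sample_info = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "tickets._id_1": {"v": 2, "key": [("tickets._id", 1)]},
}

print(has_single_field_index(sample_info, "tickets._id"))
```

Against the live database, the call would be `has_single_field_index(db.visitors.index_information(), "tickets._id")`.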

---
## Dump Final Database

In [8]:
%%capture
!mongodump --host localhost:27017 --db omero_museum --out "../backup/5_optimized"