In [2]:
%load_ext autoreload
%autoreload 2
from pymongo import MongoClient
import sys
from pathlib import Path
from tqdm import tqdm
import json

sys.path.append(str(Path("..").resolve()))
from src import *

# Migrate MySQL to MongoDB
Because the database schema has evolved significantly since the current MySQL database was created, we first migrated a subset of the collections using [_MongoDB Relational Migrator_](https://www.mongodb.com/resources/solutions/use-cases/mysql-to-mongodb).
The resulting snapshot is stored in `backup/1_after_migration` and is loaded in the next section.

In this notebook we complete the migration using handwritten queries.

## Load post-migration data
The following cell resets the `omero_museum` database by loading the `1_after_migration` snapshot.

In [38]:
%%capture
MongoClient("mongodb://localhost:27017/").drop_database("omero_museum")
!mongorestore --host localhost:27017 --drop --db omero_museum  "../backup/1_after_migration/omero_museum"

## Visualizing the migrated database

In [3]:
# Read the target schema with a context manager so the file handle is closed.
with open("schema.json") as f:
    schema = json.load(f)
connector = MongoDBConnector("omero_museum")
db = connector.db

The collections of the omero_museum db are:
----------------------------------------
[activities]: _id capacity duration enrolled room start_date ticket_ids workshop_title
[artworks]: _id author_ids comments_star_1 comments_star_2 comments_star_3 comments_star_4 comments_star_5 date description donation_state donator_id is_original location_name materials period seller_id size tecniques trade type
[authors]: _id birth_date gender home_town name surname
[departments]: _id floor free_spots room

Three kinds of problems stand out:
1. Some collections need to be **renamed**.
2. Some collections are **missing**.
3. Several fields have **changed** as the design evolved.
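
The three checks above can also be spotted programmatically by comparing the layout found in the database with a target layout. The following is only a minimal sketch on plain dictionaries: `diff_layout`, `actual`, and `expected` are hypothetical names, and the sample data is made up for illustration (it is not the content of `schema.json`).

```python
def diff_layout(actual, expected):
    """Compare two {collection: set(fields)} layouts.

    Returns (missing, unexpected, changed):
      - collections in `expected` but not in `actual`,
      - collections in `actual` but not in `expected` (rename candidates),
      - field differences for collections present in both.
    """
    missing = sorted(set(expected) - set(actual))
    unexpected = sorted(set(actual) - set(expected))
    changed = {}
    for name in sorted(set(actual) & set(expected)):
        to_add = sorted(expected[name] - actual[name])
        to_drop = sorted(actual[name] - expected[name])
        if to_add or to_drop:
            changed[name] = {"to_add": to_add, "to_drop": to_drop}
    return missing, unexpected, changed


# Hypothetical sample layouts, for illustration only.
actual = {
    "opere": {"_id", "titolo"},
    "authors": {"_id", "nome", "surname"},
}
expected = {
    "artworks": {"_id", "title"},
    "authors": {"_id", "name", "surname"},
    "rooms": {"_id"},
}

missing, unexpected, changed = diff_layout(actual, expected)
print(missing)     # collections to create, or to obtain by renaming
print(unexpected)  # collections that are candidates for renaming
print(changed)     # fields to add or drop in shared collections
```

A collection that appears in both `missing` and, under its old name, in `unexpected` is a renaming candidate; deciding which pairs match still has to be done by hand.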

### 1. Renaming collections

In [40]:
collections_renaming = {
    "biglietti": "tickets",
    "reparti": "departments",
    "dipendenti": "roles",
    "opere": "artworks",
    "clienti": "visitors",
    "laboratori": "workshops",
    "questionari": "surveys",
    "artisti": "authors",
}
for old_name, new_name in collections_renaming.items():
    if old_name not in connector.collections:
        continue
    db[old_name].rename(new_name)
    cprint("Renaming collection", f"red:{old_name} --> {new_name}")

Renaming collection biglietti --> tickets
Renaming collection reparti --> departments
Renaming collection dipendenti --> roles
Renaming collection opere --> artworks
Renaming collection clienti --> visitors
Renaming collection laboratori --> workshops
Renaming collection questionari --> surveys
Renaming collection artisti --> authors
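
Before touching the database it can help to preview which renames would actually run. The sketch below is a dry-run planner on plain Python data: `plan_renames` and the sample names passed to it are hypothetical, and it only mirrors the "skip if the source is missing" check of the cell above, adding a check for targets that already exist (a case in which, as far as we know, the rename is rejected unless `dropTarget=True` is passed).

```python
def plan_renames(existing, mapping):
    """Split an {old: new} mapping into renames that can run and ones to skip."""
    existing = set(existing)
    to_rename, skipped = [], []
    for old_name, new_name in mapping.items():
        if old_name not in existing:
            skipped.append((old_name, "source not found"))
        elif new_name in existing:
            skipped.append((old_name, f"target {new_name} already exists"))
        else:
            to_rename.append((old_name, new_name))
            # Track the effect of the rename so later entries see it.
            existing.discard(old_name)
            existing.add(new_name)
    return to_rename, skipped


# Hypothetical state: one rename was already applied in a previous run.
sample_existing = ["opere", "authors", "artisti"]
sample_mapping = {"opere": "artworks", "artisti": "authors", "clienti": "visitors"}

to_rename, skipped = plan_renames(sample_existing, sample_mapping)
print(to_rename)
print(skipped)
```

The `to_rename` pairs could then be fed to `db[old_name].rename(new_name)` exactly as in the cell above.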


### 2. Adding missing collections

In [41]:
collections_missing = [
    "rooms",
    "messages",
    "suppliers",
    "limited_events",
    "activities",
]
for name in collections_missing:
    db.create_collection(name)
    cprint("Creating collection", f"green:{name}")

Creating collection rooms
Creating collection messages
Creating collection suppliers
Creating collection limited_events
Creating collection activities


### 3. Renaming documents' entries

In [42]:
entries_renaming = {
    "artworks": {
        "data": "date",
        "descrizione": "description",
        "sala": "room",
        "tipologia": "type",
        "titolo": "title",
    },
    "authors": {
        "cognome": "surname",
        "data_nasc": "birth_date",
        "luogo_nasc": "home_town",
        "nome": "name",
        "sesso": "gender",
    },
    "departments": {
        "nome": "name",
        "piano": "floor",
        "posti_occ": "free_spots",
        "stanza": "room",
    },
    "roles": {
        "cellulare": "phone_number",
        "cognome": "surname",
        "curriculum": "curriculum",
        "data_nasc": "birth_date",
        "data_registrazione": "date_start",
        "email": "email",
        "luogo_nasc": "hometown",
        "nome": "name",
        "sesso": "gender",
    },
    "surveys": {
        "accompagnatori_visita": "accompanying_persons_visit",
        "data_compilazione": "date_of_compilation",
        "motivazione_visita": "reason_for_visit",
        "numero_visite": "number_of_visits",
        "ritorno": "return",
        "tipologia_visita": "type_of_visit",
        "titolo_studi": "title_of_studies",
        "valutazione_esperienza": "evaluation_of_experience",
        "valutazione_struttura": "evaluation_of_facility",
        "valutazione_visita": "evaluation_of_visit",
    },
    "tickets": {
        "costo": "price",
        "data_stampa": "date",
    },
    "visitors": {
        "cellulare": "cell phone number",
        "cognome": "surname",
        "email": "email",
        "name": "name",
        "tariffa": "fare",
    },
    "workshops": {
        "costo_classe": "price_class",
        "costo_persona": "price_person",
        "durata": "duration",
        "nome": "title",
        "tipologia": "type",
    },
}
for coll_name, mapping in tqdm(entries_renaming.items()):
    coll = db[coll_name]

    for doc in coll.find({}):
        # Rebuild each document from the mapped keys only:
        # fields that are not listed in the mapping are dropped.
        new_doc = {"_id": doc["_id"]}
        for old_key, new_key in mapping.items():
            if old_key in doc:
                new_doc[new_key] = doc[old_key]

        coll.replace_one({"_id": doc["_id"]}, new_doc)

100%|██████████| 8/8 [00:00<00:00, 43.93it/s]
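
As an alternative to rewriting documents one by one, the same mapping could be applied server-side with MongoDB's `$rename` update operator. This is a sketch of a different technique, not what the cell above does: `$rename` keeps the fields that are not listed in the mapping, whereas the loop above rebuilds each document from the mapped keys only. The helper `build_rename_spec` and the shortened `roles_mapping` are hypothetical; identity entries such as `"email": "email"` are filtered out, since `$rename` requires source and target to differ.

```python
def build_rename_spec(mapping):
    """Return a `$rename` update document, skipping identity entries.

    Returns None when there is nothing to rename.
    """
    renames = {old: new for old, new in mapping.items() if old != new}
    return {"$rename": renames} if renames else None


# Shortened, illustrative subset of the "roles" mapping.
roles_mapping = {"cellulare": "phone_number", "email": "email", "nome": "name"}
spec = build_rename_spec(roles_mapping)
print(spec)

# With a live connection, the spec could be applied to every document with:
#     db["roles"].update_many({}, spec)
```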


In [43]:
connector.stats()

The collections of the omero_museum db are:
----------------------------------------
[activities]:
[artworks]: _id date description room title type
[authors]: _id birth_date gender home_town name surname
[departments]: _id floor free_spots name room
[limited_events]:
[messages]:
[roles]: _id birth_date curriculum date_start email gender hometown name phone_number surname
[rooms]:
[suppliers]:
[surveys]: _id accompanying_persons_visit date_of_compilation evaluation_of_experience evaluation_of_facility evaluation_of_visit number_of_visits

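## Setting the correct `_id` values
Some collections still use the auto-generated `_id` as primary key, while they should be keyed by one of their own fields (`title` or `name`). Since `_id` is immutable, each document is re-inserted under its new key and the old copy is deleted.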

In [44]:
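collections_id = {"workshops": "title", "artworks": "title", "departments": "name"}
for coll, key in collections_id.items():
    # Materialize the cursor first: the loop modifies the collection it reads from.
    for doc in list(db[coll].find({})):
        old_id = doc["_id"]
        new_id = doc.pop(key)
        # Insert before deleting, so a failed insert (e.g. a duplicate key) loses no data.
        db[coll].insert_one({**doc, "_id": new_id})
        db[coll].delete_one({"_id": old_id})
        cprint(f"[{coll}]:", f"red:{old_id}", "-->", f"green:{new_id}")
    print("-" * 30)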

[workshops]: 68b572f8845fcdbabc1c8941 --> Bestiario immaginario
[workshops]: 68b572f8845fcdbabc1c8942 --> Conosci Louis?
[workshops]: 68b572f8845fcdbabc1c8943 --> Contemporaneamente di-segno
[workshops]: 68b572f8845fcdbabc1c8944 --> Di-segno
[workshops]: 68b572f8845fcdbabc1c8945 --> Il Museo delle meraviglie
[workshops]: 68b572f8845fcdbabc1c8946 --> Impronte
[workshops]: 68b572f8845fcdbabc1c8947 --> Le cose raccontano storie
[workshops]: 68b572f8845fcdbabc1c8948 --> Libri tattili
[workshops]: 68b572f8845fcdbabc1c8949 --> Mini corso di ceramica
[workshops]: 68b572f8845fcdbabc1c894a --> Ri-tratto con tatto
[workshops]: 68b572f8845fcdbabc1c894b --> Ricordi da toccare
[workshops]: 68b572f8845fcdbabc1
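
Promoting a field to `_id` only works if its values are unique within the collection. The sketch below shows, on in-memory dictionaries, how the re-keying step and a preliminary uniqueness check could look. The helpers `rekey` and `duplicate_keys` are hypothetical, and the sample documents (including the `duration` values and the repeated title) are made up for illustration.

```python
from collections import Counter


def rekey(doc, key):
    """Return a copy of `doc` whose `_id` is the value stored under `key`."""
    new_doc = {k: v for k, v in doc.items() if k not in ("_id", key)}
    new_doc["_id"] = doc[key]
    return new_doc


def duplicate_keys(docs, key):
    """Return the values of `key` that occur in more than one document."""
    counts = Counter(doc[key] for doc in docs)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up sample documents; the repeated title is deliberate.
docs = [
    {"_id": 1, "title": "Impronte", "duration": 60},
    {"_id": 2, "title": "Di-segno", "duration": 90},
    {"_id": 3, "title": "Impronte", "duration": 45},
]

print(duplicate_keys(docs, "title"))  # values that cannot serve as unique _id
print(rekey(docs[1], "title"))
```

Running such a check before the cell above would reveal collisions up front, instead of discovering them through a duplicate-key error halfway through the loop.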

---
## Dump Final Database

In [45]:
%%capture
!mongodump --host localhost:27017 --db omero_museum --out "../backup/2_migration_preprocessed"