# JSON Support in Databases

## JSON: The Quiz

## JSON Support in Document-Oriented Databases (Using MongoDB as an Example)

TODO: Describe MongoDB.

TODO: Describe the example: user profiles in a job board.
We use the MongoDB instance `demo-mongo` for this.

### Managing Database Connections

The `pymongo` package is licensed under the Apache License 2.0.
It provides access to MongoDB from Python.

The package's most important class is `pymongo.MongoClient`.
It provides access to MongoDB instances.

#### Example

In [1]:
!pip install pymongo~=4.1.1

Collecting pymongo~=4.1.1
  Downloading pymongo-4.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (471 kB)
Installing collected packages: pymongo
Successfully installed pymongo-4.1.1


In [2]:
import pymongo
demo_mongo_client = pymongo.MongoClient("mongodb://demo-mongo")

### Organizing Documents

Every MongoDB instance manages its documents in a three-level hierarchy:

1. MongoDB instances contain databases.
2. Databases contain collections.
3. Collections contain documents.

Databases and collections do not need to be created explicitly.
MongoDB creates them automatically the first time data is written to them.
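The hierarchy and the lazy creation can be illustrated with a small in-memory model. This is only a sketch in plain Python, not how MongoDB or `pymongo` is implemented; the class `ToyInstance` and its methods are hypothetical names introduced for illustration.

```python
class ToyInstance:
    """Toy model of instance -> databases -> collections -> documents."""

    def __init__(self):
        # Maps database name -> collection name -> list of documents.
        self._databases = {}

    def insert_one(self, database, collection, document):
        # Database and collection come into existence with the first write.
        collections = self._databases.setdefault(database, {})
        collections.setdefault(collection, []).append(document)

    def find(self, database, collection):
        # Reading from a missing database or collection creates nothing
        # and simply yields no documents.
        return list(self._databases.get(database, {}).get(collection, []))

    def list_database_names(self):
        return sorted(self._databases)


instance = ToyInstance()
empty_result = instance.find("employment", "profiles")
names_before = instance.list_database_names()
instance.insert_one("employment", "profiles", {"_id": 251, "last_name": "Gates"})
names_after = instance.list_database_names()
```

In this model, reading from a database that does not exist yet returns an empty result and leaves the list of databases unchanged; only the insert makes `employment` appear.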

#### Example

We manage the documents of our job board in the database `employment`:

In [3]:
employment = demo_mongo_client["employment"]

We store the user profiles in the collection `profiles`:

In [4]:
profiles = employment["profiles"]

At the start of our example, we make sure that the collection `profiles` is empty:

In [5]:
profiles.drop()

### Inserting Documents

In [6]:
profiles.insert_one({
    "_id": 251,
    "first_name": "Bill",
    "last_name": "Gates",
    "summary": "Co-chair of the Bill & Melinda Gates... Active blogger.",
    "region": "Greater Seattle Area",
    "industry": "Philanthropy",
    "positions": [
        {"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"},
        {"job_title": "Co-founder, Chairman", "organization": "Microsoft"}
    ],
    "education": [
        {"school_name": "Harvard University", "start": 1973, "end": 1975},
        {"school_name": "Lakeside School, Seattle"}
    ],
    "contact_info": {
        "blog": "https://www.gatesnotes.com/",
        "twitter": "https://twitter.com/BillGates"
    }
})

### Querying Documents

#### Querying All Documents

In [None]:
documents = profiles.find({})
documents

In [None]:
list(documents)

#### Restricting Fields

In [None]:
# Load only the last name and the organizations
# (the _id field is included by default):
documents = profiles.find({}, {"last_name": 1, "positions.organization": 1})
list(documents)

In [None]:
# Load only the attributes first_name and last_name *without the primary key*:
names_only = {"_id": 0, "first_name": 1, "last_name": 1}
documents = profiles.find({}, names_only)
list(documents)
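To make the projection rules concrete, the following plain-Python sketch mimics an inclusion projection on top-level fields. It is a simplified model under stated assumptions, not MongoDB's implementation: the function `project` is a hypothetical helper, and nested paths such as `positions.organization` are deliberately left out.

```python
def project(document, projection):
    """Mimic an inclusion projection on top-level fields.

    Simplification: nested paths such as "positions.organization"
    and exclusion projections on other fields are not modelled.
    """
    # Fields explicitly marked with 1 are kept.
    wanted = {field for field, flag in projection.items() if flag and field != "_id"}
    # _id is returned unless the projection sets it to 0.
    if projection.get("_id", 1):
        wanted.add("_id")
    return {field: value for field, value in document.items() if field in wanted}


profile = {
    "_id": 251,
    "first_name": "Bill",
    "last_name": "Gates",
    "region": "Greater Seattle Area",
}

with_id = project(profile, {"last_name": 1})
without_id = project(profile, {"_id": 0, "first_name": 1, "last_name": 1})
```

Here `with_id` still carries the `_id` field, while `without_id` contains only the two name attributes.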

#### Filtering Documents

In [None]:
# All documents with the primary key _id=251:
documents = profiles.find({"_id": 251}, names_only)
list(documents)

In [None]:
# All documents with the primary key _id=12 (none exist, so the result is empty):
documents = profiles.find({"_id": 12}, names_only)
list(documents)

In [None]:
# All documents with the attribute last_name="Gates":
documents = profiles.find({"last_name": "Gates"}, names_only)
list(documents)

In [None]:
# All documents with the Twitter account "https://twitter.com/BillGates":
documents = profiles.find(
    {"contact_info.twitter": "https://twitter.com/BillGates"}, names_only
)
list(documents)

In [None]:
# All documents whose education attribute
# contains the element '{"school_name": "Lakeside School, Seattle"}':
documents = profiles.find(
    {"education": {"$all": [{"school_name": "Lakeside School, Seattle"}]}},
    {"_id": 0, "first_name": 1, "last_name": 1, "education": 1}
)
list(documents)

In [None]:
# All documents whose positions attribute
# contains an object with the attribute organization="Microsoft":
documents = profiles.find({"positions.organization": "Microsoft"}, names_only)
list(documents)
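The dotted paths used in the filters above (`contact_info.twitter`, `positions.organization`) descend into nested objects and look inside every element of an array. The following sketch models that behaviour in plain Python. The helpers `resolve` and `matches` are hypothetical, and the model is simplified: positional paths such as `positions.0` and matching a scalar against the elements of an array at the end of the path are not covered.

```python
def resolve(value, path):
    """Return all values reachable from `value` via the list of path segments."""
    if not path:
        return [value]
    head, *rest = path
    if isinstance(value, list):
        # Arrays are searched element by element.
        found = []
        for element in value:
            found.extend(resolve(element, path))
        return found
    if isinstance(value, dict) and head in value:
        return resolve(value[head], rest)
    return []


def matches(document, dotted_path, expected):
    """True if any value found under the dotted path equals `expected`."""
    return expected in resolve(document, dotted_path.split("."))


profile = {
    "positions": [
        {"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"},
        {"job_title": "Co-founder, Chairman", "organization": "Microsoft"},
    ],
    "contact_info": {"twitter": "https://twitter.com/BillGates"},
}

by_twitter = matches(profile, "contact_info.twitter", "https://twitter.com/BillGates")
by_organization = matches(profile, "positions.organization", "Microsoft")
no_match = matches(profile, "positions.organization", "Apple")
```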

### Updating Documents

In [None]:
# Change the attribute "first_name" of the profile with the primary key _id=251:
profiles.update_many({"_id": 251}, {"$set": {"first_name": "William"}})
documents = profiles.find({"_id": 251}, names_only)
list(documents)

### Deleting Documents

In [None]:
# Delete all profiles with last_name="Gates":
profiles.delete_many({"last_name": "Gates"})
documents = profiles.find({"_id": 251}, names_only)
list(documents)

### References Between Documents

In [None]:
industries = employment["industries"]
industries.drop()
regions = employment["regions"]
regions.drop()

In [None]:
# Add JSON documents to the collections industries and regions:
industries.insert_many([
    # The _id attribute automatically serves as the primary key:
    {"_id": 43, "name": "Financial Services", "description": "Banking, etc."},
    {"_id": 48, "name": "Construction"},
    {"_id": 131, "name": "Philanthropy"}
])
regions.insert_many([
    {"_id": "us:7", "name": "Greater Boston Area"},
    {"_id": "us:91", "name": "Greater Seattle Area", "state": "Washington"}
])

In [None]:
# Add a JSON document with references to other documents:
import bson.dbref
profiles.insert_one({
    "_id": 251,
    "first_name": "Bill",
    "last_name": "Gates",
    "summary": "Co-chair of the Bill & Melinda Gates... Active blogger.",
    "region": bson.dbref.DBRef(collection=regions.name, id="us:91"),
    "industry": bson.dbref.DBRef(collection=industries.name, id=131),
    "positions": [
        {"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"},
        {"job_title": "Co-founder, Chairman", "organization": "Microsoft"}
    ],
    "education": [
        {"school_name": "Harvard University", "start": 1973, "end": 1975},
        {"school_name": "Lakeside School, Seattle"}
    ],
    "contact_info": {
        "blog": "https://www.gatesnotes.com/",
        "twitter": "https://twitter.com/BillGates"
    }
})

# Load the last name and region of the profile with the primary key _id=251:
documents = list(profiles.find({"_id": 251}, {"_id": 0, "last_name": 1, "region": 1}))
documents

In [None]:
# Find the name of the region by dereferencing the reference:
employment.dereference(documents[0]["region"])
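Conceptually, a database reference is just a collection name paired with a primary key, and dereferencing is a lookup by that pair. The following sketch models this in plain Python. `Ref`, `database`, and `dereference` are hypothetical stand-ins introduced for illustration, not the `bson` or `pymongo` API.

```python
from collections import namedtuple

# Hypothetical stand-in for a database reference: collection name plus _id.
Ref = namedtuple("Ref", ["collection", "id"])

# Hypothetical in-memory "database": collection name -> {_id: document}.
database = {
    "regions": {"us:91": {"_id": "us:91", "name": "Greater Seattle Area"}},
    "industries": {131: {"_id": 131, "name": "Philanthropy"}},
}


def dereference(ref):
    """Look up the referenced document; return None if it does not exist."""
    return database.get(ref.collection, {}).get(ref.id)


region = dereference(Ref("regions", "us:91"))
industry = dereference(Ref("industries", 131))
# The lookup is type-sensitive: the string "131" is not the integer 131.
dangling = dereference(Ref("industries", "131"))
```

The last lookup finds nothing because the key types differ, which is why the `id` stored in a reference has to have the same type as the `_id` of the target document.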

## JSON Support in Relational Databases (Using PostgreSQL as an Example)

In [None]:
!pip install "psycopg[binary]~=3.0.13"

In [None]:
import psycopg.types.json

conn = psycopg.connect("postgresql://postgres@postgresql", autocommit=True)
cur = conn.cursor()

cur.execute("drop table if exists profiles")
cur.execute("create table profiles(doc jsonb)")

### Inserting Documents

In [None]:
document = {
    "_id": 251,
    "first_name": "Bill",
    "last_name": "Gates",
    "summary": "Co-chair of the Bill & Melinda Gates... Active blogger.",
    "region": "Greater Seattle Area",
    "industry": "Philanthropy",
    "positions": [
        {
            "job_title": "Co-chair",
            "organization": "Bill & Melinda Gates Foundation"
        },
        {"job_title": "Co-founder, Chairman", "organization": "Microsoft"}
    ],
    "education": [
        {"school_name": "Harvard University", "start": 1973, "end": 1975},
        {"school_name": "Lakeside School, Seattle"}
    ],
    "contact_info": {
        "blog": "https://www.gatesnotes.com/",
        "twitter": "https://twitter.com/BillGates"
    }
}
cur.execute(
    "insert into profiles (doc) values (%s)",
    (psycopg.types.json.Jsonb(document),)
)

### Querying Documents

#### Querying All Documents

In [None]:
# Load all documents of the table profiles:
cur.execute("select doc from profiles")
cur.fetchall()

#### Restricting Fields

In [None]:
# Load only the last name and the organizations
# (jsonb subscripting such as doc['last_name'] requires PostgreSQL 14 or later):
cur.execute("""
    select
        doc['last_name'],
        jsonb_path_query_array(doc, '$.positions[*].organization')
    from profiles
""")
cur.fetchall()

#### Filtering Documents

In [None]:
# All documents with the attribute last_name="Gates":
cur.execute("""
    select
        doc['last_name'],
        jsonb_path_query_array(doc, '$.positions[*].organization')
    from profiles
    where doc['last_name'] = '"Gates"'
""")
cur.fetchall()
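The comparison value `'"Gates"'` in the query above is a JSON string literal wrapped in an SQL string literal: the single quotes belong to SQL, the double quotes to JSON. As a small sketch, Python's `json` module shows how such JSON literals look for different value types; the variable names are illustrative only.

```python
import json

# A Python string becomes a JSON string, including the double quotes.
json_string = json.dumps("Gates")

# A Python integer becomes a JSON number, without any quotes.
json_number = json.dumps(251)

print(json_string)  # "Gates"
print(json_number)  # 251
```

This is also why a numeric attribute is compared against a literal like `'251'` rather than `'"251"'`: the latter would be a JSON string, not a JSON number.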

In [None]:
# All documents with the attribute last_name="Jobs":
cur.execute("""
    select
        doc['last_name'],
        jsonb_path_query_array(doc, '$.positions[*].organization')
    from profiles
    where doc['last_name'] = '"Jobs"'
""")
cur.fetchall()

In [None]:
# All documents with the Twitter account "https://twitter.com/BillGates":
cur.execute("""
    select
        doc['last_name'],
        jsonb_path_query_array(doc, '$.positions[*].organization')
    from profiles
    where doc['contact_info']['twitter'] = '"https://twitter.com/BillGates"'
""")
cur.fetchall()

In [None]:
# All documents whose education attribute
# contains the element '{"school_name": "Lakeside School, Seattle"}':
cur.execute("""
    select
        doc['last_name'],
        jsonb_path_query_array(doc, '$.positions[*].organization'),
        doc['education']
    from profiles
    where doc['education'] @> '[{"school_name": "Lakeside School, Seattle"}]'
""")
cur.fetchall()

In [None]:
# All documents whose positions attribute
# contains an object with the attribute organization="Microsoft":
cur.execute("""
    select
        doc['last_name'],
        doc['positions']
    from profiles
    where doc['positions'] @> '[{"organization": "Microsoft"}]'
""")
cur.fetchall()
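The `@>` operator used in the last two queries tests containment: the right-hand JSON value has to be found, structurally, inside the left-hand one. The following plain-Python function approximates the general rule. It is a sketch and not PostgreSQL's implementation: `contains` is a hypothetical helper, Python's equality for numbers and booleans differs slightly from jsonb, and the special case in which an array may contain a bare primitive value is not modelled.

```python
def contains(container, contained):
    """Approximate jsonb containment (container @> contained)."""
    if isinstance(contained, dict):
        # Every key/value pair of the right side must be contained on the left.
        return isinstance(container, dict) and all(
            key in container and contains(container[key], value)
            for key, value in contained.items()
        )
    if isinstance(contained, list):
        # Every right-hand element must be contained in some left-hand element;
        # order and duplicates are ignored.
        return isinstance(container, list) and all(
            any(contains(candidate, element) for candidate in container)
            for element in contained
        )
    # Scalars must be equal.
    return container == contained


positions = [
    {"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"},
    {"job_title": "Co-founder, Chairman", "organization": "Microsoft"},
]

has_microsoft = contains(positions, [{"organization": "Microsoft"}])
has_apple = contains(positions, [{"organization": "Apple"}])
```

Because objects are compared key by key, the pattern only needs to name the attributes of interest; the extra `job_title` attribute in the stored objects does not prevent a match.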

### Updating Documents

In [None]:
# Change the attribute "first_name" of the profile with the primary key _id=251:
cur.execute("""
    update profiles
    set doc['first_name'] = '"William"'
    where doc['_id'] = '251'
""")
cur.execute("""
    select doc['first_name'], doc['last_name']
    from profiles
    where doc['_id'] = '251'
""")
cur.fetchall()

### Deleting Documents

In [None]:
# Delete all profiles with last_name="Gates":
cur.execute("""
    delete from profiles
    where doc['last_name'] = '"Gates"'
""")
cur.execute("""
    select
        doc['last_name'],
        jsonb_path_query_array(doc, '$.positions[*].organization')
    from profiles
    where doc['_id'] = '251'
""")
cur.fetchall()

### References Between Documents

In [None]:
# Add JSON documents to the tables industries and regions:
cur.execute("drop table if exists industries")
cur.execute("create table industries (id int primary key, doc jsonb)")
cur.executemany(
    "insert into industries (id, doc) values (%s, %s)",
    [
        (43, psycopg.types.json.Jsonb(
            {"name": "Financial Services", "description": "Banking, etc."})),
        (48, psycopg.types.json.Jsonb({"name": "Construction"})),
        (131, psycopg.types.json.Jsonb({"name": "Philanthropy"}))
    ]
)
cur.execute("select id, doc from industries")
cur.fetchall()

In [None]:
cur.execute("drop table if exists regions")
cur.execute("create table regions (id text primary key, doc jsonb)")
cur.executemany(
    "insert into regions (id, doc) values (%s, %s)",
    [
        ("us:7", psycopg.types.json.Jsonb({"name": "Greater Boston Area"})),
        ("us:91", psycopg.types.json.Jsonb(
            {"name": "Greater Seattle Area", "state": "Washington"}))
    ]
)
cur.execute("select id, doc from regions")
cur.fetchall()

In [None]:
cur.execute("drop table if exists profiles")
cur.execute("""
    create table profiles (
        id integer primary key,
        industry_id integer references industries(id),
        region_id text references regions(id),
        doc jsonb
    )
""")
cur.execute(
    "insert into profiles (id, industry_id, region_id, doc) values (%s, %s, %s, %s)",
    (251, 131, "us:91", psycopg.types.json.Jsonb({
        "first_name": "Bill",
        "last_name": "Gates",
        "summary": "Co-chair of the Bill & Melinda Gates... Active blogger.",
        "positions": [
            {"job_title": "Co-chair",
             "organization": "Bill & Melinda Gates Foundation"},
            {"job_title": "Co-founder, Chairman", "organization": "Microsoft"}
        ],
        "education": [
            {"school_name": "Harvard University", "start": 1973, "end": 1975},
            {"school_name": "Lakeside School, Seattle"}
        ],
        "contact_info": {
            "blog": "https://www.gatesnotes.com/",
            "twitter": "https://twitter.com/BillGates"
        }
    }))
)
cur.execute("select id, industry_id, region_id, doc from profiles")
cur.fetchall()

In [None]:
# Load the last name and region of the profile with the primary key id=251:
cur.execute("""
    select p.doc['last_name'], r.doc['name']
    from profiles p join regions r on p.region_id = r.id
    where p.id = 251
""")
cur.fetchall()

In [None]:
cur.close()
conn.close()

## Summary

TODO