# Feb 7th - Document-oriented DB 1

#### Literature
* K. Banker, P. Bakkum, S. Verch, D. Garrett  _"MongoDB in Action, Second Edition"_
    - Chapter 1
    - Chapter 2
    - Chapter 3
* https://github.com/datsoftlyngby/soft2019spring-databases/blob/master/literature/session_2.zip
* *Transactions* are an important part of databases. In case you have not read about the **ACID** criteria, this blog post is both correct and short: https://blog.yugabyte.com/a-primer-on-acid-transactions/.

#### Handin

See end of this document

#### Study activity

  * Read 3 hrs (notice - a lot to read)
  * Exercises 5 hrs

# Follow up on last hand-in
(Some examples from the student hand-ins, with many good ideas and illustrative problems)


```python
def add(key, value):
    bytesData = toBytes(key + ',' + value)
    file.write(bytesData)
    _hashMap[key] = _hashMap[sorted(_hashMap.keys())[-1]] + sys.getsizeof(bytesData)
```

```java
public void SetNewEntry(String key, String value) throws IOException {
        if (readContentMap.containsKey(key)) {
            System.out.println("DB already contains the key of " + key);
        } else if(!readContentMap.containsKey(key)){
            writeMap.put(key, value);
            WriteFile();
            ReadFile();
        }
    }
```

```java
public void set(String key, String value) throws IOException {
        File f = new File(fileName);
        if (!f.exists()) {
            f.createNewFile();
        }
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.seek(out.length());
            byte[] keyBytes = key.getBytes(encoding);
            byte[] valueBytes = value.getBytes(encoding);
            out.writeInt(keyBytes.length);
            out.write(keyBytes, 0, keyBytes.length);
            long offset = out.getFilePointer();
            out.writeInt(valueBytes.length);
            out.write(valueBytes, 0, valueBytes.length);                      
            store.put(key, (int)offset);
        }
    }
```


# Data Models


  > Data models are perhaps the most important part of developing software, because they have such a profound effect: not only on how the software is written, but also on how we think about the problem that we are solving.
  >
  > Martin Kleppmann, _Designing Data-Intensive Applications_
  
  (_His usage of "data model" is what some call "database schema"_)

## What was the Data Model in the Last Lecture and Exercise?

# Object relational mismatch

```java
class Person {
    String name;
    int yearBorn;
    Image avatar;
    HashMap<String, String> socialMedia;
    List<String> posts;
    List<Person> friends;
}
```

## In class exercise: Make a relational model which can store Person objects

## The Document Data Model


A document is essentially a set of property names and their values.

```json
{ name: "Þjóðbjörg", born:1998, ...}
```

The values can be simple data types, such as _strings_, _numbers_, and _dates_. 

But these values can also be arrays and even other documents.

Objects: {key<sub>0</sub>:value<sub>0</sub>, key<sub>1</sub>:value<sub>1</sub>, ...}<br>
Arrays: \[value<sub>0</sub>, value<sub>1</sub>, value<sub>2</sub>, ...\]



```javascript
{
  _id: ObjectID('4bd9e8e17cefd644108961bb'),     // _id field, primary key
  title: 'Adventures in Databases',
  url: 'http://example.com/databases.txt',
  author: 'msmith',
  vote_count: 20,
  tags: ['databases', 'mongodb', 'indexing'],    // Tags stored as array of strings
  image: {                                       // Attribute pointing to another document
    url: 'http://example.com/db.jpg',
    caption: 'A database.',
    type: 'jpg',
    size: 75381,
    data: 'Binary'
  },
  comments: [                                    // Comments stored as array of comment objects
    {
      user: 'bjones',
      text: 'Interesting article.'
    },
    {
      user: 'sverch',
      text: 'Color me skeptical!'
    }
  ]
}
```

Note that strict JSON requires double quotes around all keys and string values; numbers, booleans, and `null` are written without quotes. The listing shows the JavaScript version of a JSON document. Internally, MongoDB stores documents in a format called _Binary JSON_, or **BSON**.
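To make the quoting rule concrete, here is a minimal sketch that uses Python's standard `json` module (not MongoDB itself) to check whether a string is strict JSON. The two sample strings and the helper `is_valid_json` are made up for illustration.

```python
import json

# JavaScript-style notation: unquoted keys and single-quoted strings.
js_style = "{ title: 'Adventures in Databases', vote_count: 20 }"

# Strict JSON: keys and strings in double quotes, numbers unquoted.
strict = '{ "title": "Adventures in Databases", "vote_count": 20 }'


def is_valid_json(text):
    """Return True if `text` parses as strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(js_style))  # rejected by a strict parser
print(is_valid_json(strict))    # accepted
doc = json.loads(strict)        # a plain Python dict
```

The JavaScript-style string is fine in the Mongo shell, but a strict JSON parser rejects it.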

## The Document Data Model in MongoDB

On top of documents, MongoDB has the concept of _collections_. A _collection_ is a group of _documents_.

_Collections_ are similar to tables in the relational world.

The document-oriented data model naturally represents data in an aggregate form, allowing you to work with an object holistically.
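As a rough illustration of "aggregate form", the sketch below turns a simplified version of the `Person` class from the object-relational mismatch section into a single nested document. The field values are invented, `avatar` is left out, and `friends` holds names rather than `Person` objects to keep the example short.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class Person:
    name: str
    year_born: int
    social_media: Dict[str, str] = field(default_factory=dict)
    posts: List[str] = field(default_factory=list)
    friends: List[str] = field(default_factory=list)  # names only (assumption)


person = Person(
    name="Hansen",
    year_born=1997,
    social_media={"twitter": "@hansen"},
    posts=["Hello world"],
    friends=["Nielsen"],
)

# The whole object, including the nested map and lists, becomes one document.
document = asdict(person)
print(json.dumps(document, indent=2))
```

In a relational model the same data would typically be spread over several tables that are joined by keys.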

## A few admin infos


* Please pick up student cards at the reception. **You need to bring id with a picture for that**.
* Open house - Thursday February 21<sup>st</sup> at 15:00 - 18:00
* Open house - Thursday May 2<sup>nd</sup> at 15:00 - 18:00


# Architecture of a Database System for Experimenting

## A Database Container

![](images/DB_containers_internal.png)

```bash
$ docker run --rm --publish=27017:27017 --name dbms -d mongo:latest
$ docker run -it --link dbms:mongo --rm mongo sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'
```


**Note:** Do not do this in production. This is a setup that we will use for experimentation only!



## A Database Container

What is the issue with such a setup?

## Containerized DB Setup for Production

![](images/DB_containers_external.png)


```bash
$ docker run --rm -v $(pwd)/data:/data/db --publish=27017:27017 --name dbms -d mongo:latest
$ docker run -it --link dbms:mongo --rm mongo sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'
```

## Containerized DB Setup for Production


What is the advantage of such a setup?

## Starting a MongoDB Instance for the Lectures

  * Via a container, see https://hub.docker.com/_/mongo/:
  ```bash
  docker run --rm --publish=27017:27017 --name dbms -d mongo:latest
  ```
  ```bash
  docker run --rm -v $(pwd)/data:/data/db --publish=27017:27017 --name dbms -d mongo:latest
  ```
  * Installation in the provided VM, see https://github.com/datsoftlyngby/soft2019spring-databases
  * Installation of MongoDB on the host machine, see https://docs.mongodb.com/manual/administration/install-community/



## Connecting to MongoDB

  * Via a Mongo shell installed in a container: 
  ```bash
  docker run -it --link dbms:mongo --rm mongo sh -c 'exec mongo "$MONGO_PORT_27017_TCP_ADDR:$MONGO_PORT_27017_TCP_PORT/test"'
  ```
  * Via the Mongo shell installed on a host: 
  ```bash
  mongo --host 127.0.0.1:27017
  ```
  * Via the GUI client RoboMongo, see https://robomongo.org/download
![](https://robomongo.org/static/screens-transparent-6e2a44fd.png)
  * Via your own application, see the end of the lecture.
  * Via DataGrip (free for students)

# The MongoDB Query Language

  > MongoDB queries are represented as a JSON-like structure, just like documents. To build a query, you specify a document with properties you wish the results to match. MongoDB treats each property as having an implicit boolean AND. It natively supports boolean OR queries, but you must use a special operator ($or) to achieve it. In addition to exact matches, MongoDB has operators for greater than, less than, etc.
  >
  > https://www.safaribooksonline.com/library/view/mongodb-and-python/9781449312817/ch02s06.html
  
The MongoDB documentation calls the Query Language itself _Query Documents_, see https://docs.mongodb.com/manual/tutorial/query-documents/.

## Hey, I think I know everything you want to tell us here! 

![](http://static3.businessinsider.com/image/4fbfb86becad044879000001-506-253/suddenly-startups-have-gotten-very-boring.jpg)

Cool! Then I would like to ask for your help. I would like those who know Mongo well to check out MySQL 8.

In particular:

  * Figure out how to use the JSON support in MySQL 8.0
  * Figure out how to do JSON queries
  * Figure out how to do queries that merge SQL and JSON

See:
  * https://mysqlserverteam.com/json_table-the-best-of-both-worlds/
  


## Switching to a collection / Creating a new collection

In [12]:
import pymongo
from pymongo import MongoClient
client = MongoClient()
db = client.testDB
users = db.users
print("Done")

Done


## Select All Documents in a Collection

To select all documents in the collection, pass an empty document as the query filter parameter to the `find` method.

In [13]:
import pprint

def pp(obj):
    pprint.pprint(obj)
    
def ppall(col):
    for p in col:
        pp( p )
ppall( users.find({} ) )

{'_id': ObjectId('5c5bd9dc3d356cdb51342701'), 'age': 25, 'username': 'Møller'}
{'_id': ObjectId('5c5bd9e33d356cdb51342702'), 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bd9e83d356cdb51342703'), 'age': 24, 'username': 'Nielsen'}


```javascript
db.users.find({})
``` 

is equivalent to `db.users.find()`. 

However, the former is more explicit and preferred.

In SQL, the above query corresponds to


```sql
SELECT * FROM users
```

## Inserting Data

To be able to query some data in the following, let's first create some in the database.

In [14]:
res = users.insert_one({"username": "Larsen", "age": 21})
pp( res )

<pymongo.results.InsertOneResult object at 0x108561488>


In [15]:
ppall( users.find({}) )

{'_id': ObjectId('5c5bd9dc3d356cdb51342701'), 'age': 25, 'username': 'Møller'}
{'_id': ObjectId('5c5bd9e33d356cdb51342702'), 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bd9e83d356cdb51342703'), 'age': 24, 'username': 'Nielsen'}
{'_id': ObjectId('5c5bf6803d356cdb51342705'), 'age': 21, 'username': 'Larsen'}


## Deleting Documents

### Delete a Single Document 

In [16]:
res = users.delete_one({"username": "Møller"})
print("Deleted: " + str(res.deleted_count) )

Deleted: 1


### Delete all Documents of a Collection

In [17]:
users.delete_many({}).deleted_count

3

In [18]:
ppall( users.find({}) )

### Deleting an Entire Collection

In [22]:
db.users.drop()

### What is the _id field?

The `_id` value can be considered a document’s primary key. Every MongoDB document requires an `_id`.

If none is present at creation time, a special MongoDB ObjectID will be generated and added to the document.
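A generated ObjectId is a 12-byte value, and its first four bytes hold the creation time in seconds since the Unix epoch. The following sketch decodes that timestamp with the standard library only. The helper name `objectid_timestamp` is made up, and the id is taken from the `find` output shown earlier.

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId has exactly 12 bytes")
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("5c5bd9dc3d356cdb51342701")
print(created.isoformat())
```

For this id the decoded time falls on 7 February 2019. Because the timestamp comes first, generated ids sort roughly by creation time.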

In [23]:
res = db.users.insert_one({"_id": 177, "username": "Hansen", "age": 22})
res.inserted_id

177

In [24]:
db.users.insert_one({"username": "Nielsen", "age": 24})

<pymongo.results.InsertOneResult at 0x108561408>

In [25]:
ppall( db.users.find({}) )

{'_id': 177, 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'), 'age': 24, 'username': 'Nielsen'}


In [26]:
users.count_documents({})

2

## Query Documents in a Collection

### Matching Selector

A query selector is a document that is used to match against all documents in the collection. 

It specifies the _equality condition_, i.e. fields and values, which must be equal in the documents you want to select.

To specify equality conditions, use `<field>:<value>` expressions in the query selector document:

```
{ <field1>: <value1>, ... }
```
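What such an equality selector means can be sketched in plain Python. This only illustrates the semantics and is not how MongoDB implements matching: it ignores arrays, nested paths and `null` handling. The sample documents and the helpers `matches` and `find` are assumptions for the example.

```python
_MISSING = object()  # sentinel so that a missing field never equals a value


def matches(document, selector):
    """Return True if every field in `selector` equals the document's value."""
    return all(
        document.get(field, _MISSING) == value
        for field, value in selector.items()
    )


def find(collection, selector):
    """Return all documents of `collection` that match `selector`."""
    return [doc for doc in collection if matches(doc, selector)]


users = [
    {"_id": 177, "username": "Hansen", "age": 22},
    {"_id": 178, "username": "Nielsen", "age": 24},
]

print(find(users, {"username": "Hansen"}))
print(find(users, {"username": "Hansen", "age": 24}))  # no match: both must hold
print(len(find(users, {})))  # the empty selector matches every document
```

Several fields in one selector are combined with a logical AND, which is why the second call returns nothing.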

In [27]:
res = users.find({"username": "Hansen"})
ppall( res )

{'_id': 177, 'age': 22, 'username': 'Hansen'}


That query is equivalent to the following SQL query:

```SQL
SELECT * FROM users WHERE username = 'Hansen'
```

In [30]:
res = db.users.find({ "username": "Hansen",
                "age" : 22 })
ppall( res )

{'_id': 177, 'age': 22, 'username': 'Hansen'}


A matching selector with several fields and values:

```javascript
db.users.find({ username: "Hansen",
                "age" : 22 })
```

is equivalent to the following with an explicit conjunction (`$and`).

In [31]:
ppall( db.users.find({ "$and": [ { "username": "Hansen" },
                        { "age": 22 } ] }) )

{'_id': 177, 'age': 22, 'username': 'Hansen'}


In [33]:
ppall( db.users.find({ "$or": [ { "username": "Nielsen" }, 
                       { "age": 22 } ]}) )

{'_id': 177, 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'), 'age': 24, 'username': 'Nielsen'}


## The `$in` Operator: A More Compact Containment Check

In [34]:
ppall( db.users.find( 
    { "username": { "$in": [ "Nielsen", "Hansen" ] } } ) )

{'_id': 177, 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'), 'age': 24, 'username': 'Nielsen'}


## Regular Expressions in Matching Queries

You can use regular expressions in your matching queries in either of the following two forms:

```
db.users.find({ username: /en$/ })
db.users.find({ username: { $regex: "en$" } })
```
(Note: the first form depends on the language, here a JavaScript regex literal; the second is generic.)

In [35]:
ppall( users.find({ "username": {"$regex": "sen$"} }) )

{'_id': 177, 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'), 'age': 24, 'username': 'Nielsen'}


## Range Queries

The Mongo query language supports the following query operators: `$eq`, `$gt`, `$gte`, `$in`, `$lt`, `$lte`, `$ne`, `$nin`
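One way to read these operators is as named comparisons that are applied to a field's value. The sketch below maps each operator name to a Python predicate. It is an illustration only and leaves out MongoDB's rules for comparing values of different types. The names `COMPARISONS` and `satisfies` are made up.

```python
import operator

# Map each comparison operator name to a Python predicate.
COMPARISONS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def satisfies(value, condition):
    """Check one field value against a condition such as {"$gt": 20, "$lt": 30}."""
    return all(
        COMPARISONS[name](value, operand)
        for name, operand in condition.items()
    )


ages = [21, 22, 24, 25]
in_range = [age for age in ages if satisfies(age, {"$gte": 22, "$lt": 25})]
not_listed = [age for age in ages if satisfies(age, {"$nin": [21, 25]})]
print(in_range)
print(not_listed)
```

Several operators on the same field, as in `{"$gte": 22, "$lt": 25}`, must all hold, which is how a range is expressed.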

In [36]:
ppall( users.find( { "username": {"$regex": "en$"}, 
                 "age": { "$lte": 24 } } ))

{'_id': 177, 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'), 'age': 24, 'username': 'Nielsen'}


## Updating Documents

Generally, there are two types of updates with different semantics and use cases:

  * Updating a single document or many documents, i.e., modification of corresponding fields and values.
  * Replacement of old documents with new ones.
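The difference between the two kinds of update can be sketched on plain dictionaries. This is a simplified model of the semantics, not PyMongo code. The helpers `apply_set` and `apply_replacement` and the sample document are assumptions.

```python
import copy


def apply_set(document, fields):
    """Operator-style update: change only the given fields, keep the rest."""
    updated = copy.deepcopy(document)
    updated.update(fields)
    return updated


def apply_replacement(document, new_document):
    """Replacement-style update: keep only `_id`, replace everything else."""
    replaced = {"_id": document["_id"]}
    replaced.update(copy.deepcopy(new_document))
    return replaced


nielsen = {"_id": 1, "username": "Nielsen", "age": 24}
after_set = apply_set(nielsen, {"country": "Denmark"})
after_replacement = apply_replacement(nielsen, {"country": "Canada"})
print(after_set)
print(after_replacement)
```

After the replacement only `_id` and `country` remain, so a replacement silently drops every field that is not part of the new document.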
  


### Operator Update

In [37]:
res = users.update( { "username": "Nielsen" }, 
                 { "$set": { "country": "Denmark" } } )
pp( res )

{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}


  


In [38]:
res = users.update_one( { "username": "Nielsen" }, 
                 { "$set": { "country": "Denmark" } } )
res.raw_result

{'n': 1, 'nModified': 0, 'ok': 1.0, 'updatedExisting': True}

In [42]:
ppall( db.users.find({}) )

{'_id': 177, 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'),
 'age': 24,
 'country': 'Denmark',
 'username': 'Nielsen'}


### Replacement Update

In [43]:
ppall( users.update( { "username": "Nielsen" }, 
                 { "country": "Canada" } ))

'n'
'nModified'
'ok'
'updatedExisting'


  


In [44]:
ppall (db.users.find( {} ) )

{'_id': 177, 'age': 22, 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'), 'country': 'Canada'}


In [97]:
ppall( users.find( { "country": "Canada" } ))

{'_id': ObjectId('5c5954553d356cc77adcbd73'), 'country': 'Canada'}


Let's add the username back to the record for our example.

In [45]:
res = db.users.update( { "country": "Canada" }, 
                 { "$set": { "username": "Nielsen" } } )
ppall(res)

'n'
'nModified'
'ok'
'updatedExisting'


  


In [46]:
ppall( db.users.find( { "country": "Canada" } ) )

{'_id': ObjectId('5c5bf7cb3d356cdb51342708'),
 'country': 'Canada',
 'username': 'Nielsen'}


### Removing a field from a document

A field can be removed with the help of the `$unset` operator.

In [47]:
res = db.users.update( { "username": "Nielsen" }, 
                 { "$unset": { "country": 1 } } )
ppall(res)

'n'
'nModified'
'ok'
'updatedExisting'


  


In [49]:
ppall( db.users.find( { "username": "Nielsen" } ) )

{'_id': ObjectId('5c5bf7cb3d356cdb51342708'), 'username': 'Nielsen'}


### Complex Updates

In [50]:
ppall( db.users.update( { "username": "Nielsen" }, 
                 {  "$set": {
                      "favorites": { 
                        "restaurant": [ "La Petanque", "Hija de Sanchez" ], 
                        "cafe": [ "Paludan Bog & Café", "Café Retro", "Conditori La Glace" ] 
                      }
                    } 
                  }))

'n'
'nModified'
'ok'
'updatedExisting'


In [51]:
ppall(db.users.update( { "username": "Hansen" }, 
                 {  "$set": {
                      "favorites": { 
                        "cafe": [ "Vaffelbageren", "Café BoPa", "Conditori La Glace" ] 
                      }
                    } 
                 }))

'n'
'nModified'
'ok'
'updatedExisting'




In [52]:
ppall(db.users.find({}))

{'_id': 177,
 'age': 22,
 'favorites': {'cafe': ['Vaffelbageren', 'Café BoPa', 'Conditori La Glace']},
 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'),
 'favorites': {'cafe': ['Paludan Bog & Café',
                        'Café Retro',
                        'Conditori La Glace'],
               'restaurant': ['La Petanque', 'Hija de Sanchez']},
 'username': 'Nielsen'}


In [53]:
ppall(users.find( { "favorites.cafe": "Conditori La Glace" } ))

{'_id': 177,
 'age': 22,
 'favorites': {'cafe': ['Vaffelbageren', 'Café BoPa', 'Conditori La Glace']},
 'username': 'Hansen'}
{'_id': ObjectId('5c5bf7cb3d356cdb51342708'),
 'favorites': {'cafe': ['Paludan Bog & Café',
                        'Café Retro',
                        'Conditori La Glace'],
               'restaurant': ['La Petanque', 'Hija de Sanchez']},
 'username': 'Nielsen'}
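The query above uses dot notation (`favorites.cafe`) to reach into a nested document, and it matches because the array contains the requested value. A minimal sketch of that behaviour in plain Python follows. The helpers `get_path` and `field_matches` are made up, and the sketch ignores further cases such as arrays of sub-documents.

```python
def get_path(document, path):
    """Follow a dotted path such as "favorites.cafe" through nested dicts."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def field_matches(document, path, wanted):
    """Equality match that also succeeds if an array field contains `wanted`."""
    value = get_path(document, path)
    if isinstance(value, list):
        return wanted in value
    return value == wanted


hansen = {
    "username": "Hansen",
    "favorites": {"cafe": ["Vaffelbageren", "Conditori La Glace"]},
}
print(field_matches(hansen, "favorites.cafe", "Conditori La Glace"))
print(field_matches(hansen, "favorites.restaurant", "La Petanque"))
print(field_matches(hansen, "username", "Hansen"))
```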


#### Adding Elements to Nested Sets

Suppose we know that any user who likes _Café Retro_ also likes _Lagkagehuset_ and that our database shall reflect this fact.

##### What do the `false` and the `true` mean?

```javascript
> db.users.update
function (query, obj, upsert, multi) {
...
}
```
(_Upsert_: an operation that inserts rows into a database table if they do not already exist, or updates them if they do.)
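According to the signature, the third and fourth arguments are `upsert` and `multi`. Their effect can be sketched on a plain Python list of dictionaries. This models the semantics only, with equality queries and a `$set`-like change. The function `update` below is a hypothetical stand-in, not the driver method.

```python
def update(collection, query, new_fields, upsert=False, multi=False):
    """Set `new_fields` on matching documents; return how many were changed."""
    matched = [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in query.items())
    ]
    if not matched:
        if upsert:
            # Nothing matched: insert a new document built from query and fields.
            collection.append({**query, **new_fields})
        return 0
    targets = matched if multi else matched[:1]
    for doc in targets:
        doc.update(new_fields)
    return len(targets)


people = [{"city": "Lyngby", "name": "A"}, {"city": "Lyngby", "name": "B"}]
first_only = update(people, {"city": "Lyngby"}, {"country": "Denmark"})
all_matches = update(people, {"city": "Lyngby"}, {"country": "Denmark"}, multi=True)
no_match = update(people, {"city": "Aarhus"}, {"country": "Denmark"}, upsert=True)
print(first_only, all_matches, no_match, len(people))
```

Without `multi` only the first matching document is changed, and with `upsert` a query that matches nothing adds a new document instead.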

In [110]:
ppall( db.users.update( { "favorites.cafe": "Café Retro" }, 
                 { "$addToSet": { "favorites.cafe": "Lagkagehuset" } }, 
                 False, True))

'n'
'nModified'
'ok'
'updatedExisting'




# Querying with Indexes

Let's start with creating a large collection of numbers.


In [54]:
res = db.numbers.drop()
pp(res)

None


In [55]:
for i in range(0,20000): 
    db.numbers.insert_one( { "num": i } )

In [56]:
db.numbers.count_documents({})

20000

In [57]:
res = db.numbers.find( {} )
ppall( res.limit(5) )

{'_id': ObjectId('5c5c01843d356cdb51342709'), 'num': 0}
{'_id': ObjectId('5c5c01843d356cdb5134270a'), 'num': 1}
{'_id': ObjectId('5c5c01843d356cdb5134270b'), 'num': 2}
{'_id': ObjectId('5c5c01843d356cdb5134270c'), 'num': 3}
{'_id': ObjectId('5c5c01843d356cdb5134270d'), 'num': 4}


In [59]:
ppall( db.numbers.find( { "num": { "$gt": 20, "$lt": 30 } } ) )

{'_id': ObjectId('5c5c01843d356cdb5134271e'), 'num': 21}
{'_id': ObjectId('5c5c01843d356cdb5134271f'), 'num': 22}
{'_id': ObjectId('5c5c01843d356cdb51342720'), 'num': 23}
{'_id': ObjectId('5c5c01843d356cdb51342721'), 'num': 24}
{'_id': ObjectId('5c5c01843d356cdb51342722'), 'num': 25}
{'_id': ObjectId('5c5c01843d356cdb51342723'), 'num': 26}
{'_id': ObjectId('5c5c01843d356cdb51342724'), 'num': 27}
{'_id': ObjectId('5c5c01843d356cdb51342725'), 'num': 28}
{'_id': ObjectId('5c5c01843d356cdb51342726'), 'num': 29}


## Execution Statistics

When any database receives a query, it must plan out how to execute it. This is called a _query plan_.

The `explain` method describes query paths and allows developers to diagnose slow operations by determining which indexes a query has used.

In [60]:
db.numbers.find( { 
                   "num": { 
                     "$gt": 19995 
                   } 
                 } ).explain()["executionStats"]

{'executionSuccess': True,
 'nReturned': 4,
 'executionTimeMillis': 10,
 'totalKeysExamined': 0,
 'totalDocsExamined': 20000,
 'executionStages': {'stage': 'COLLSCAN',
  'filter': {'num': {'$gt': 19995}},
  'nReturned': 4,
  'executionTimeMillisEstimate': 10,
  'works': 20002,
  'advanced': 4,
  'needTime': 19997,
  'needYield': 0,
  'saveState': 156,
  'restoreState': 156,
  'isEOF': 1,
  'invalidates': 0,
  'direction': 'forward',
  'docsExamined': 20000},
 'allPlansExecution': []}

## Creating an Index in MongoDB

In addition to the indexes that you create manually, MongoDB automatically creates an index on the `_id` field of every collection.

MongoDB indexes use a _B-tree_ data structure. We will return to this data structure later.
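To get a feeling for why an index helps with the `num > 19995` query, the sketch below compares a full scan with a lookup in a sorted list of keys. The sorted list with binary search (`bisect`) is only a stand-in for a B-tree and not MongoDB's implementation. All names are made up for the example.

```python
import bisect

documents = [{"num": i} for i in range(20000)]


def collection_scan(docs, lower):
    """Examine every document, comparable to a COLLSCAN stage."""
    examined = 0
    result = []
    for doc in docs:
        examined += 1
        if doc["num"] > lower:
            result.append(doc)
    return result, examined


# A sorted list of (key, position) pairs stands in for the B-tree index.
index = sorted((doc["num"], pos) for pos, doc in enumerate(documents))
index_keys = [key for key, _ in index]


def index_scan(docs, lower):
    """Binary-search the sorted keys, then fetch only the matching documents."""
    start = bisect.bisect_right(index_keys, lower)
    hits = index[start:]
    return [docs[pos] for _, pos in hits], len(hits)


scan_result, scan_examined = collection_scan(documents, 19995)
index_result, index_examined = index_scan(documents, 19995)
print(scan_examined, index_examined)
```

Both approaches return the same four documents, but the scan looks at all 20000 documents while the index lookup touches only the matching keys.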

In [61]:
db.numbers.create_index( "num" )

'num_1'

In [62]:
ppall( db.numbers.list_indexes()) # getIndexes

{'key': SON([('_id', 1)]),
 'name': '_id_',
 'ns': 'testDB.numbers',
 'v': 2}
{'key': SON([('num', 1)]),
 'name': 'num_1',
 'ns': 'testDB.numbers',
 'v': 2}


In [63]:
db.numbers.find( { 
                   "num": { 
                     "$gt": 19995 
                   } 
                 } ).explain()["executionStats"]

{'executionSuccess': True,
 'nReturned': 4,
 'executionTimeMillis': 1,
 'totalKeysExamined': 4,
 'totalDocsExamined': 4,
 'executionStages': {'stage': 'FETCH',
  'nReturned': 4,
  'executionTimeMillisEstimate': 0,
  'works': 5,
  'advanced': 4,
  'needTime': 0,
  'needYield': 0,
  'saveState': 0,
  'restoreState': 0,
  'isEOF': 1,
  'invalidates': 0,
  'docsExamined': 4,
  'alreadyHasObj': 0,
  'inputStage': {'stage': 'IXSCAN',
   'nReturned': 4,
   'executionTimeMillisEstimate': 0,
   'works': 5,
   'advanced': 4,
   'needTime': 0,
   'needYield': 0,
   'saveState': 0,
   'restoreState': 0,
   'isEOF': 1,
   'invalidates': 0,
   'keyPattern': {'num': 1},
   'indexName': 'num_1',
   'isMultiKey': False,
   'multiKeyPaths': {'num': []},
   'isUnique': False,
   'isSparse': False,
   'isPartial': False,
   'indexVersion': 2,
   'direction': 'forward',
   'indexBounds': {'num': ['(19995, inf.0]']},
   'keysExamined': 4,
   'seeks': 1,
   'dupsTested': 0,
   'dupsDropped': 0,
   'seenInval

In [64]:
for i in range(20000,50000): 
    db.numbers.insert_one( { "num": i } )
    

In [68]:
db.numbers.find( { 
                   "num": { 
                     "$gt": 49995 
                   } 
                 } ).explain()["executionStats"]

{'executionSuccess': True,
 'nReturned': 4,
 'executionTimeMillis': 0,
 'totalKeysExamined': 4,
 'totalDocsExamined': 4,
 'executionStages': {'stage': 'FETCH',
  'nReturned': 4,
  'executionTimeMillisEstimate': 0,
  'works': 5,
  'advanced': 4,
  'needTime': 0,
  'needYield': 0,
  'saveState': 0,
  'restoreState': 0,
  'isEOF': 1,
  'invalidates': 0,
  'docsExamined': 4,
  'alreadyHasObj': 0,
  'inputStage': {'stage': 'IXSCAN',
   'nReturned': 4,
   'executionTimeMillisEstimate': 0,
   'works': 5,
   'advanced': 4,
   'needTime': 0,
   'needYield': 0,
   'saveState': 0,
   'restoreState': 0,
   'isEOF': 1,
   'invalidates': 0,
   'keyPattern': {'num': 1},
   'indexName': 'num_1',
   'isMultiKey': False,
   'multiKeyPaths': {'num': []},
   'isUnique': False,
   'isSparse': False,
   'isPartial': False,
   'indexVersion': 2,
   'direction': 'forward',
   'indexBounds': {'num': ['(49995, inf.0]']},
   'keysExamined': 4,
   'seeks': 1,
   'dupsTested': 0,
   'dupsDropped': 0,
   'seenInval

# Connect to MongoDB from a Java Maven Project

https://mongodb.github.io/mongo-java-driver/3.0/driver/getting-started/installation-guide/

```xml
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-driver</artifactId>
    <version>3.9.1</version>
</dependency>
```

Based on https://mongodb.github.io/mongo-java-driver/3.0/driver/getting-started/quick-tour/

```java
package dk.cphbusiness.db.meassurements;

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class MongoTest {

    public static void main(String[] args) {
        MongoClientURI connStr = new MongoClientURI("mongodb://localhost:27017");
        MongoClient mongoClient = new MongoClient(connStr);

        MongoDatabase db = mongoClient.getDatabase("test-database");
        MongoCollection<Document> collection = db.getCollection("tweets");

        Document myDoc = collection.find().first();
        System.out.println(myDoc.toJson());
    }
}
```

# Importing Data

You can either write a program that inserts documents into a database or use MongoDB's CLI import tool, `mongoimport`.

```bash
mongoimport --drop --db social_net --collection tweets --type csv --headerline --file testdata.manual.2009.06.14.csv
```
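With `--headerline`, the first line of the CSV file provides the field names of the imported documents. The same idea can be sketched with Python's `csv` module. The sample data and the helper `csv_to_documents` are made up and do not reflect the columns of the real file.

```python
import csv
import io

# Made-up sample data; the real file has different columns.
csv_text = "user,text\nalice,hello\nbob,good morning\n"


def csv_to_documents(text):
    """Turn CSV text with a header line into a list of dict documents."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


documents = csv_to_documents(csv_text)
print(documents)
```

Each dictionary could then be passed to `insert_one`, as in the earlier cells. Note that all values stay strings in this sketch.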


In [69]:
%%bash
mongoimport --help

bash: line 1: mongoimport: command not found


In [70]:
db.books.drop()
"all dropped"

'all dropped'

In [71]:
from urllib.request import urlopen
import json
from bson.json_util import loads

link = "https://raw.githubusercontent.com/ozlerhakan/mongodb-json-files/master/datasets/catalog.books.json"
f = urlopen(link)
myfile = f.read()
allBooks = myfile.decode("utf-8")
count = 0
for line in allBooks.splitlines():
    jsonbook = loads(line)
    #print( str(count) +": " + str(jsonbook) )
    db.books.insert_one(jsonbook)
    count = count + 1
db.books.count_documents({})

431

In [73]:
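ppall(db.books.find({"title": {"$regex": "Android"}}))
# To project only some fields, pass a projection as the second argument:
# db.books.find({"title": {"$regex": "Android"}}, {"title": 1, "authors": 1, "_id": 0})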

{'_id': 1,
 'authors': ['W. Frank Ableson', 'Charlie Collins', 'Robi Sen'],
 'categories': ['Open Source', 'Mobile'],
 'isbn': '1933988673',
 'longDescription': 'Android is an open source mobile phone platform based on '
                    'the Linux operating system and developed by the Open '
                    'Handset Alliance, a consortium of over 30 hardware, '
                    'software and telecom companies that focus on open '
                    'standards for mobile devices. Led by search giant, '
                    'Google, Android is designed to deliver a better and more '
                    'open and cost effective mobile experience.    Unlocking '
                    "Android: A Developer's Guide provides concise, hands-on "
                    'instruction for the Android operating system and '
                    'development tools. This book teaches important '
                    'architectural concepts in a straightforward writing style '
                    

# Handin

See exercise at:

https://github.com/datsoftlyngby/soft2019spring-databases/blob/master/assignments/assignment3.md


# Video

  * https://www.youtube.com/watch?v=1sLjWlWvCsc

# Literature

  * http://www.redbook.io/pdf/redbook-5th-edition.pdf
  * http://15721.courses.cs.cmu.edu/spring2016/papers/whatgoesaround-stonebraker.pdf
  * https://docs.mongodb.com/manual/tutorial/query-documents/
  * https://docs.mongodb.com/manual/reference/
  
  * https://docs.mongodb.com/manual/indexes/
  * https://docs.mongodb.com/manual/reference/program/mongoimport/
