# Why do we need yet another database?


## MongoDB’s key features

A database is defined in large part by its data model. In this section, you’ll look at the document data model, and then you’ll see the features of MongoDB that allow you to operate effectively on that model. This section also explores operations, focusing on MongoDB’s flavor of replication and its strategy for scaling horizontally.


### Document data model



Document databases offer the most immediately familiar paradigm for developers used to working with hierarchically structured documents. Document databases store and retrieve documents, just like an electronic filing cabinet. Documents tend to comprise maps and lists, allowing for natural hierarchies—much as we’re used to with formats like JSON and XML.


A document is essentially a **set of property names and their values**. The values can be simple data types, such as strings, numbers, and dates. But these values can also be arrays and even other JSON documents. 

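Note that strict JSON requires double quotes around every property name and every string value; numbers, booleans, and `null` are written without quotes. The following listing shows the JavaScript version of a JSON document, where property names don’t need quotes and strings may use single quotes.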

```javascript
{
  _id: ObjectId('4bd9e8e17cefd644108961bb'),     // _id field, primary key
  title: 'Adventures in Databases',
  url: 'http://example.com/databases.txt',
  author: 'msmith',
  vote_count: 20,
  tags: ['databases', 'mongodb', 'indexing'],    // Tags stored as array of strings
  image: {                                       // Attribute pointing to another document
    url: 'http://example.com/db.jpg',
    caption: 'A database.',
    type: 'jpg',
    size: 75381,
    data: 'Binary'
  },
  comments: [                                    // Comments stored as array of comment objects
    {
      user: 'bjones',
      text: 'Interesting article.'
    },
    {
      user: 'sverch',
      text: 'Color me skeptical!'
    }
  ]
}
```

Internally, MongoDB stores documents in a format called Binary JSON, or **BSON**. BSON has a similar structure but is intended for storing many documents.

Where relational databases have tables, MongoDB has **collections**. In other words, MySQL (a popular relational database) keeps its data in tables of rows, while MongoDB keeps its data in collections of documents; you can think of a collection as a group of documents.

A document-oriented data model naturally represents **data in an aggregate form**, allowing you to work with an object holistically: all the data representing a post, from comments to tags, can be fitted into a single database object.

You’ve probably noticed that in addition to providing a richness of structure, **documents needn’t conform to a prespecified schema**. With a relational database, you store rows in a table. Each table has a strictly defined schema specifying which columns and types are permitted. If any row in a table needs an extra field, you have to alter the table explicitly.



#### Schema-less Model Advantages

This lack of imposed schema confers some advantages. 

  * Application code, and not the database, enforces the data’s structure. This can speed up initial application development when the schema is changing frequently.
  * A schema-less model allows you to represent data with truly variable properties. 
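
When the application enforces the structure, a common pattern is a small validation step before each insert. The following is a minimal sketch in plain JavaScript, with no database involved. The `validatePost` function and its choice of required fields are hypothetical, picked to resemble the sample post document shown earlier.

```javascript
// Hypothetical application-side check; MongoDB itself is not involved here.
function validatePost(doc) {
  const errors = [];
  if (typeof doc.title !== 'string' || doc.title.length === 0) {
    errors.push('title must be a non-empty string');
  }
  if (typeof doc.author !== 'string') {
    errors.push('author must be a string');
  }
  // Optional field: only checked when present, so documents may omit it.
  if ('tags' in doc && !Array.isArray(doc.tags)) {
    errors.push('tags must be an array when present');
  }
  return errors;
}

const good = { title: 'Adventures in Databases', author: 'msmith', tags: ['mongodb'] };
const bad = { title: '', tags: 'mongodb' };

console.log(validatePost(good)); // []
console.log(validatePost(bad).length); // 3
```

Because the rules live in code, changing them is a code change, not a table migration. The flip side is that every application writing to the collection has to apply the same checks.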







### Ad hoc queries

Not all databases support dynamic queries. For instance, key-value stores are queryable on one axis only: the value’s key. Like many other systems, key-value stores sacrifice rich query power in exchange for a simple scalability model. 

One of MongoDB’s design goals is to preserve most of the query power that’s been so fundamental to the relational database world.

To see how MongoDB’s query language works, let’s take a simple example involving posts and comments. Suppose you want to find all posts tagged with the term politics having more than 10 votes. A SQL query would look like this:

```sql
SELECT * FROM posts
  INNER JOIN posts_tags ON posts.id = posts_tags.post_id
  INNER JOIN tags ON posts_tags.tag_id = tags.id
  WHERE tags.text = 'politics' AND posts.vote_count > 10;
```


The equivalent query in MongoDB is specified using a document as a matcher. The special $gt key indicates the greater-than condition:

```js
db.posts.find({'tags': 'politics', 'vote_count': {'$gt': 10}});
```

**Note**: The two queries assume different data models. The SQL query relies on a strictly normalized model, where posts and tags are stored in distinct tables, whereas the MongoDB query assumes that tags are stored within each post document. But both queries demonstrate an ability to query on arbitrary combinations of attributes, which is the essence of ad hoc query ability.
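
To make the idea of a document acting as a matcher concrete, here is a simplified model of selector matching in plain JavaScript. It is a sketch, not MongoDB’s implementation. The helper names `matches` and `matchesField` and the three sample posts are made up for illustration, and only equality, `$gt`, and `$lt` are handled.

```javascript
// Simplified model of selector matching; not MongoDB's actual implementation.
function matchesField(value, condition) {
  if (condition !== null && typeof condition === 'object' && !Array.isArray(condition)) {
    // Operator document such as {$gt: 10}; only $gt and $lt are modeled here.
    return Object.entries(condition).every(([op, operand]) => {
      if (op === '$gt') return value > operand;
      if (op === '$lt') return value < operand;
      throw new Error('unsupported operator: ' + op);
    });
  }
  // Equality: an array field matches if any of its elements equals the condition.
  return Array.isArray(value) ? value.includes(condition) : value === condition;
}

// Every field in the selector must match (implicit AND).
function matches(doc, selector) {
  return Object.entries(selector).every(([field, cond]) => matchesField(doc[field], cond));
}

const posts = [
  { title: 'A', tags: ['politics', 'news'], vote_count: 12 },
  { title: 'B', tags: ['politics'], vote_count: 3 },
  { title: 'C', tags: ['sports'], vote_count: 40 },
];

const selector = { tags: 'politics', vote_count: { $gt: 10 } };
console.log(posts.filter((p) => matches(p, selector)).map((p) => p.title)); // [ 'A' ]
```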


### Indexes

A critical element of ad hoc queries is that they search for values that you don’t know when you create the database. As you add more and more documents to your database, searching for a value becomes increasingly expensive; it’s a needle in an ever-expanding haystack. 

Indexes in MongoDB are implemented as a B-tree data structure. B-tree indexes, also used in many relational databases, are optimized for a variety of queries, including range scans and queries with sort clauses.
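
The reason an ordered index helps with range scans can be sketched in a few lines of plain JavaScript. A sorted array with binary search stands in for the B-tree here. This is an assumption made to keep the example short, and a real B-tree is a different structure that shares the property of keeping keys in order. The function names are hypothetical.

```javascript
// Stand-in for an ordered index: a sorted array of keys plus binary search.
const keys = Array.from({ length: 20000 }, (_, i) => i);

// Collection scan: test every key, as a query without an index would.
function scanGreaterThan(all, bound) {
  let examined = 0;
  const hits = [];
  for (const k of all) {
    examined++;
    if (k > bound) hits.push(k);
  }
  return { hits, examined };
}

// Index range scan: binary-search the first key > bound, then read forward.
function indexGreaterThan(sorted, bound) {
  let lo = 0;
  let hi = sorted.length;
  let probes = 0;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    probes++;
    if (sorted[mid] > bound) hi = mid;
    else lo = mid + 1;
  }
  return { hits: sorted.slice(lo), probes };
}

const scan = scanGreaterThan(keys, 19995);
const viaIndex = indexGreaterThan(keys, 19995);
console.log(scan.hits, scan.examined); // [ 19996, 19997, 19998, 19999 ] 20000
console.log(viaIndex.hits.length, viaIndex.probes <= 15); // 4 true
```

Both approaches return the same four keys, but the scan touches all 20,000 while the ordered lookup needs at most 15 comparisons to find where the range starts.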

Most databases give each document or row a primary key, a unique identifier for that datum. The primary key is generally indexed automatically so that each datum can be efficiently accessed using its unique key, and MongoDB is no different. But not every database allows you to also index the data inside that row or document. These are called secondary indexes. Many NoSQL databases, such as HBase, are considered key-value stores because they don’t allow any secondary indexes. This is a significant feature in MongoDB; by permitting multiple secondary indexes MongoDB allows users to optimize for a wide variety of queries.

With MongoDB, you can create up to 64 indexes per collection. MongoDB supports all the kinds of indexes you’d find in an RDBMS: ascending, descending, unique, compound-key, hashed, text, and even geospatial indexes.


### Replication

MongoDB provides database replication via a topology known as a replica set. Replica sets distribute data across two or more machines for redundancy and automate failover in the event of server and network outages. Additionally, replication is used to scale database reads. If you have a read-intensive application, as is commonly the case on the web, it’s possible to spread database reads across machines in the replica set cluster.

Replica sets consist of many MongoDB servers, usually with each server on a separate physical machine; we’ll call these nodes. At any given time, one node serves as the replica set primary node and one or more nodes serve as secondaries. Like the master-slave replication that you may be familiar with from other databases, a replica set’s primary node can accept both reads and writes, but the secondary nodes are read-only. What makes replica sets unique is their support for automated failover: if the primary node fails, the cluster will pick a secondary node and automatically promote it to primary. When the former primary comes back online, it’ll do so as a secondary.
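
The failover behavior described above can be illustrated with a toy model in plain JavaScript. This is only a sketch: the `ReplicaSet` class, its methods, and the node names are hypothetical, and the promotion rule (take the first healthy node) is a simplification that does not model how a real replica set chooses its new primary.

```javascript
// Toy model of a replica set; the promotion rule is a deliberate simplification.
class ReplicaSet {
  constructor(names) {
    this.nodes = names.map((name, i) => ({
      name,
      up: true,
      role: i === 0 ? 'primary' : 'secondary',
    }));
  }

  primary() {
    return this.nodes.find((n) => n.up && n.role === 'primary') || null;
  }

  // Mark a node as down; if it was the primary, promote the first healthy node.
  fail(name) {
    const node = this.nodes.find((n) => n.name === name);
    node.up = false;
    if (node.role === 'primary') {
      node.role = 'secondary';
      const next = this.nodes.find((n) => n.up);
      if (next) next.role = 'primary';
    }
  }

  // A recovered node rejoins as a secondary.
  recover(name) {
    const node = this.nodes.find((n) => n.name === name);
    node.up = true;
    node.role = 'secondary';
  }
}

const rs = new ReplicaSet(['node-a', 'node-b', 'node-c']);
rs.fail('node-a');
console.log(rs.primary().name); // node-b
rs.recover('node-a');
console.log(rs.primary().name); // node-b
```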


### Speed and durability

In the realm of database systems there exists an inverse relationship between *write speed* and *durability*. Write speed can be understood as the volume of inserts, updates, and deletes that a database can process in a given time frame. Durability refers to the level of assurance that these write operations have been made permanent.

For instance, suppose you write 100 records of 50 KB each to a database and then immediately cut the power on the server. Will those records be recoverable when you bring the machine back online? The answer depends on your database system, its configuration, and the hardware hosting it.

In MongoDB’s case, users control the speed and durability trade-off by choosing write semantics and deciding whether to enable journaling. 

You can configure MongoDB to **fire-and-forget**, sending off a write to the server without waiting for an acknowledgment. You can also configure MongoDB to guarantee that a write has gone to **multiple replicas before** considering it committed. For high-volume, low-value data (like clickstreams and logs), fire-and-forget-style writes can be ideal. For important data, a safe mode setting is necessary. 

Since MongoDB v2.0, **journaling** is enabled by default. With journaling, every write is flushed to the journal file every 100 ms. If the server is ever shut down uncleanly (say, in a power outage), the journal will be used to ensure that MongoDB’s data files are restored to a consistent state when you restart the server. This is the safest way to run MongoDB.
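
To make the flush interval concrete, here is a toy model in plain JavaScript. It is a sketch under simplifying assumptions, not a description of MongoDB’s storage layer: the `JournaledStore` class and its methods are hypothetical, and the periodic flush is triggered by hand instead of by a timer.

```javascript
// Toy model: writes wait in memory until the next journal flush.
class JournaledStore {
  constructor() {
    this.journal = []; // stands in for the on-disk journal, survives a crash
    this.pending = []; // accepted writes that have not been flushed yet
  }

  write(doc) {
    this.pending.push(doc);
  }

  // In the text this happens every 100 ms; here it is called explicitly.
  flush() {
    this.journal.push(...this.pending);
    this.pending = [];
  }

  // An unclean shutdown loses whatever was still pending.
  crashAndRecover() {
    this.pending = [];
    return this.journal.slice();
  }
}

const store = new JournaledStore();
store.write({ num: 1 });
store.flush();
store.write({ num: 2 }); // not flushed before the crash
console.log(store.crashAndRecover()); // [ { num: 1 } ]
```

Writes that arrive between two flushes are the ones at risk, which is why the choice of write semantics matters for important data.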

### Scaling

The easiest way to scale most databases is to upgrade the hardware. If your application is running on a single node, it’s usually possible to add some combination of faster disks, more memory, and a beefier CPU to ease any database bottlenecks. The technique of augmenting a single node’s hardware for scale is known as **vertical scaling**, or scaling up. Vertical scaling has the advantages of being simple, reliable, and cost-effective up to a certain point, but eventually you reach a point where it’s no longer feasible to move to a better machine.

It then makes sense to consider **scaling horizontally**, or scaling out. Instead of beefing up a single node, scaling horizontally means distributing the database across multiple machines. A horizontally scaled architecture can run on many smaller, less expensive machines, often reducing your hosting costs. What’s more, the distribution of data across machines mitigates the consequences of failure. 

MongoDB was designed to make horizontal scaling manageable. It does so via a range-based partitioning mechanism, known as sharding, which automatically manages the distribution of data across nodes.
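
Range-based partitioning can be sketched as a lookup from a key to the range that contains it. The snippet below is a minimal illustration in plain JavaScript. The chunk boundaries, the shard names, and the `shardFor` function are all hypothetical, and a real sharded cluster also splits and moves ranges as data grows, which is not modeled here.

```javascript
// Hypothetical chunk table: each chunk owns the half-open key range [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: 'shard-a' },
  { min: 1000, max: 5000, shard: 'shard-b' },
  { min: 5000, max: Infinity, shard: 'shard-c' },
];

// Route a key to the shard whose range contains it.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(shardFor(42));    // shard-a
console.log(shardFor(1000));  // shard-b
console.log(shardFor(19999)); // shard-c
```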

# Comparison to other types of databases


The following is an excerpt from: Ian Robinson, Jim Webber, Emil Eifrem. *"Graph Databases."*


Volume has become the principal driver behind the adoption of NOSQL stores by organizations. Volume may be defined simply as the size of the stored data.

As is well known, large datasets become unwieldy when stored in relational databases. In particular, query execution times increase as the size of tables and the number of joins grow (so-called join pain). 

But volume isn’t the only problem modern web-facing systems have to deal with. Besides being big, today’s data often changes very rapidly. Velocity is the rate at which data changes over time.


There is another aspect to velocity, which is the rate at which the structure of the data changes. In other words, in addition to the value of specific properties changing, the overall structure of the elements hosting those properties can change as well. This commonly occurs for two reasons. The first is fast-moving business dynamics.



## ACID versus BASE
When we first encounter NOSQL it’s often in the context of what many of us are already familiar with: relational databases. Although we know the data and query model will be different (after all, there’s no SQL), the **consistency models used by NOSQL stores can also be quite different from those employed by relational databases**. Many NOSQL databases use different consistency models to support the differences in volume, velocity, and variety of data discussed earlier.


The **ACID** guarantees provide us with a safe environment in which to operate on data:

  * **Atomic**
    All operations in a transaction succeed or every operation is rolled back.
  * **Consistent**
    On transaction completion, the database is structurally sound.
  * **Isolated**
    Transactions do not contend with one another. Contentious access to state is moderated by the database so that transactions appear to run sequentially.
  * **Durable**
    The results of applying a transaction are permanent, even in the presence of failures.

These properties mean that once a transaction completes, its data is consistent (so-called write consistency) and stable on disk (or disks, or indeed in multiple distinct memory locations). 
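
Atomicity, the first of these guarantees, can be illustrated with a toy example in plain JavaScript. This is a sketch of the all-or-nothing idea only, not of how a database implements transactions. The account names and the helpers `runTransaction`, `debit`, and `credit` are hypothetical.

```javascript
// Toy illustration of atomicity: work on a copy and publish it only if every step succeeds.
function runTransaction(accounts, steps) {
  const draft = { ...accounts };
  try {
    for (const step of steps) step(draft);
  } catch (err) {
    return accounts; // roll back: the original state is returned untouched
  }
  return draft; // commit
}

function debit(name, amount) {
  return (state) => {
    if (state[name] < amount) throw new Error('insufficient funds');
    state[name] -= amount;
  };
}

function credit(name, amount) {
  return (state) => {
    state[name] += amount;
  };
}

const before = { alice: 50, bob: 10 };
console.log(runTransaction(before, [debit('alice', 30), credit('bob', 30)]));
// { alice: 20, bob: 40 }
console.log(runTransaction(before, [credit('bob', 80), debit('alice', 80)]));
// { alice: 50, bob: 10 }
```

In the second call the credit has already been applied to the draft when the debit fails, yet none of it becomes visible.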




In the NOSQL world, ACID transactions have gone out of fashion as stores loosen the requirements for immediate consistency, data freshness, and accuracy in order to gain other benefits, like scale and resilience. Instead of using ACID, the term **BASE** has arisen as a popular way of describing the properties of a more optimistic storage strategy:

  * **Basic availability** The store appears to work most of the time.
  * **Soft-state** Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
  * **Eventual consistency** Stores exhibit consistency at some later point (e.g., lazily at read time).
  
The BASE properties are considerably looser than the ACID guarantees, and there is no direct mapping between them. A BASE store values availability (because that is a core building block for scale), but does not offer guaranteed consistency of replicas at write time. BASE stores provide a less strict assurance: that data will be consistent in the future, perhaps at read time (e.g., Riak), or will always be consistent, but only for certain processed past snapshots (e.g., Datomic).
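
Eventual consistency can be illustrated with two replicas and a delayed propagation step. The following toy model in plain JavaScript is a sketch only: the class names, the `sync` method, and the key `user:1` are hypothetical, and real stores use far more elaborate replication mechanisms.

```javascript
// Toy model of two replicas with lazy (asynchronous) propagation.
class Replica {
  constructor() {
    this.data = new Map();
  }
  read(key) {
    return this.data.get(key);
  }
}

class EventuallyConsistentStore {
  constructor() {
    this.a = new Replica();
    this.b = new Replica();
    this.backlog = []; // writes accepted by replica a but not yet applied to b
  }

  // The write is acknowledged as soon as one replica has it.
  write(key, value) {
    this.a.data.set(key, value);
    this.backlog.push([key, value]);
  }

  // Propagation step: replay the backlog on the lagging replica.
  sync() {
    for (const [key, value] of this.backlog) this.b.data.set(key, value);
    this.backlog = [];
  }
}

const store = new EventuallyConsistentStore();
store.write('user:1', 'Møller');
console.log(store.a.read('user:1'), store.b.read('user:1')); // Møller undefined
store.sync();
console.log(store.b.read('user:1')); // Møller
```

Between the write and the sync, the two replicas disagree. That window is the soft state the list above refers to.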



# A Popular GUI Client - Robomongo

Robomongo (https://robomongo.org) is a cross-platform MongoDB manager.

![](https://robomongo.org/static/screens-transparent-6e2a44fd.png)

You can connect to the MongoDB instance on the VM by creating a new connection with the address `localhost:27017`.



# MongoDB through the JavaScript shell


## Starting the shell

See the Vagrant provision script for an example of how to install MongoDB on a Linux system.

On the VM you can start the MongoDB shell by running the mongo executable:

```bash
mongo
```

### Databases, Collections, and Documents

Imagine we are creating a small social network application and our database has to store different types of documents, like *users*, *comments*, etc. Those are stored in separate places, i.e., **collections**, which are similar to tables in an RDBMS. MongoDB divides collections into separate databases. Unlike the usual overhead that databases produce in the SQL world, databases in MongoDB are just **namespaces** to distinguish between collections.


To keep all the subsequent tutorial exercises under the same namespace, let’s start by switching to the `class` database:


```mongo
> use class
switched to db class
```    
    
Why does MongoDB have both databases and collections? The answer lies in how MongoDB writes its data out to disk. All collections in a database are grouped in the same files, so it makes sense, from a memory perspective, to keep related collections in the same database. You might also want to have different applications access the same collections, and it’s useful to keep your data organized so you’re prepared for future requirements.


### Inserts and Queries

```mongo
> db.users.insert({username: "Møller"})
WriteResult({ "nInserted" : 1 })
```

```mongo
> db.users.find()
{ "_id" : ObjectId("58de3ef059f6af55dbf09bbc"), "username" : "Møller" }
```

#### `_id` Fields in MongoDB

Note that an `_id` field has been added to the document. You can think of the `_id` value as the document’s primary key. Every MongoDB document requires an `_id`, and if one isn’t present when the document is created, a special MongoDB ObjectID will be generated and added to the document at that time.
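
A generated ObjectId is not random noise: its first four bytes hold the creation time in seconds since the Unix epoch. This detail is not covered in the text above, so treat the following as a side note. The sketch decodes the timestamp in plain JavaScript from the hex string shown in the earlier output; the function name `objectIdTimestamp` is hypothetical.

```javascript
// The first 8 hex characters (4 bytes) of an ObjectId encode seconds since the Unix epoch.
function objectIdTimestamp(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp('58de3ef059f6af55dbf09bbc').toISOString());
// 2017-03-31T11:35:12.000Z
```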

Let’s continue for now by adding a second user to the collection:

```mongo
> db.users.insert({username: "Hansen"})
WriteResult({ "nInserted" : 1 })
```

There should now be two documents in the collection. Go ahead and verify this by running the count command:

```mongo
> db.users.count()
2
> db.users.find()
{ "_id" : ObjectId("58de3ef059f6af55dbf09bbc"), "username" : "Møller" }
{ "_id" : ObjectId("58de3fbf59f6af55dbf09bbd"), "username" : "Hansen" }
```


You can also pass a simple query selector to the find method. A query selector is a document that is used to match against all documents in the collection. To query for all documents where the username is `Hansen`, you pass a simple document that acts as your query selector like this:

```mongo
> db.users.find({username: "Hansen"})
{ "_id" : ObjectId("58de3fbf59f6af55dbf09bbd"), "username" : "Hansen" }
```


You can also specify multiple fields in the query predicate, which creates an implicit **AND** among the fields. For example, you can query with the following selector:

```mongo
> db.users.find({ _id: ObjectId("58de3fbf59f6af55dbf09bbd"), username: "Hansen" })
{ "_id" : ObjectId("58de3fbf59f6af55dbf09bbd"), "username" : "Hansen" }
```

You can also use MongoDB’s `$and` operator explicitly. The previous query is identical to:

```mongo
> db.users.find({ $and: [ { _id: ObjectId("58de3fbf59f6af55dbf09bbd") }, { username: "Hansen" } ] })
{ "_id" : ObjectId("58de3fbf59f6af55dbf09bbd"), "username" : "Hansen" }
```

Selecting documents with an **OR** is similar: just use the `$or` operator.

```mongo
> db.users.find({ $or: [ { username: "Møller" }, { username: "Hansen" } ]})
{ "_id" : ObjectId("58de3ef059f6af55dbf09bbc"), "username" : "Møller" }
{ "_id" : ObjectId("58de3fbf59f6af55dbf09bbd"), "username" : "Hansen" }
```

This example is different than previous ones, because it does not just insert or search for a specific document. Rather, the query itself is a document. The idea of representing commands as documents is used often in MongoDB and may come as a surprise if you’re used to relational databases.

### Updating Documents

There are two general types of updates, with different properties and use cases:
  * Apply modification operations to a document or documents
  * Replace an old document with a new one
  
#### Operator Update

The first type of update involves passing a document with some kind of operator description as the second argument to the update function. Here we see an example of how to use the `$set` operator, which sets a single field to the specified value.

```mongo
> db.users.update({username: "Møller"}, {$set: {country: "Denmark"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.find({username: "Møller"})
{ "_id" : ObjectId("58de3ef059f6af55dbf09bbc"), "username" : "Møller", "country" : "Denmark" }
```

#### Replacement Update

Another way to update a document is to replace it rather than just set a field. This is sometimes mistakenly used when an operator update with a `$set` was intended.

```mongo
> db.users.update({username: "Møller"}, {country: "Canada"})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.find({username: "Møller"})
> db.users.find({country: "Canada"})
{ "_id" : ObjectId("58de3ef059f6af55dbf09bbc"), "country" : "Canada" }
```

In this case, the document is **replaced** with one that contains only the `country` field, and the `username` field is removed. The first argument is used only for matching; the second argument replaces the document that was matched. The `_id` stays the same, but the rest of the data has been replaced. Be sure to use the `$set` operator if you intend to add or set fields rather than replace the entire document.
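
The difference between the two update styles can be modeled in a few lines of plain JavaScript. This is a simplified sketch, not MongoDB’s implementation. The `applyUpdate` function is hypothetical and handles only `$set`, `$unset`, and whole-document replacement.

```javascript
// Simplified model of the two update styles; only $set and $unset are handled.
function applyUpdate(doc, update) {
  const isOperatorUpdate = Object.keys(update).some((k) => k.startsWith('$'));
  if (!isOperatorUpdate) {
    // Replacement: keep only _id, take everything else from the new document.
    return { _id: doc._id, ...update };
  }
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) result[field] = value;
  for (const field of Object.keys(update.$unset || {})) delete result[field];
  return result;
}

const user = { _id: 1, username: 'Møller' };
console.log(applyUpdate(user, { $set: { country: 'Denmark' } }));
// { _id: 1, username: 'Møller', country: 'Denmark' }
console.log(applyUpdate(user, { country: 'Canada' }));
// { _id: 1, country: 'Canada' }
```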


To add the username back to the record:

```mongo
> db.users.update({country: "Canada"}, {$set: {username: "Møller"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.find({country: "Canada"})
{ "_id" : ObjectId("58de3ef059f6af55dbf09bbc"), "country" : "Canada", "username" : "Møller" }
```

A value can be removed as easily using the `$unset` operator:

```mongo
> db.users.update({username: "Møller"}, {$unset: {country: 1}} )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.find({username: "Møller"})
{ "_id" : ObjectId("58de3ef059f6af55dbf09bbc"), "username" : "Møller" }
```

#### Updating Complex Data

You are representing your data with documents, which can contain complex data structures. Let’s suppose that, in addition to storing profile information, your users can store lists of their favorite things. A document representation might look something like this:

```javascript
{
  username: "Møller",
  favorites: {
    restaurant: ["La Petanque", "Hija de Sanchez"],
    cafe: ["Paludan Bog & Café", "Café Retro", "Conditori La Glace"]
  }
}
```


```mongo
> db.users.update( {username: "Møller"}, { $set: {favorites: { restaurant: ["La Petanque", "Hija de Sanchez"], cafe: ["Paludan Bog & Café", "Café Retro", "Conditori La Glace"] }} })
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.update( {username: "Hansen"}, { $set: {favorites: { cafe: ["Vaffelbageren", "Café BoPa", "Conditori La Glace"] }} })
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.users.find().pretty()
{
	"_id" : ObjectId("58de3ef059f6af55dbf09bbc"),
	"username" : "Møller",
	"favorites" : {
		"restaurant" : [
			"La Petanque",
			"Hija de Sanchez"
		],
		"cafe" : [
			"Paludan Bog & Café",
			"Café Retro",
			"Conditori La Glace"
		]
	}
}
{
	"_id" : ObjectId("58de3fbf59f6af55dbf09bbd"),
	"username" : "Hansen",
	"favorites" : {
		"cafe" : [
			"Vaffelbageren",
			"Café BoPa",
            "Conditori La Glace"
		]
	}
}
```

```mongo
> db.users.find({"favorites.cafe": "Conditori La Glace"})
{ "_id" : ObjectId("58de3ef059f6af55dbf09bbc"), "username" : "Møller", "favorites" : { "restaurant" : [ "La Petanque", "Hija de Sanchez" ], "cafe" : [ "Paludan Bog & Café", "Café Retro", "Conditori La Glace" ] } }
{ "_id" : ObjectId("58de3fbf59f6af55dbf09bbd"), "username" : "Hansen", "favorites" : { "cafe" : [ "Vaffelbageren", "Café BoPa", "Conditori La Glace" ] } }
```



#### More Advanced Updates

To see a more involved example, suppose you know that any user who likes `"Café Retro"` also likes `"Lagkagehuset"` and that you want to update your database to reflect this fact. How would you represent this as a MongoDB update?

You could conceivably use the `$set` operator again, but doing so would require you to rewrite and send the entire array of cafés. Because all you want to do is add an element to the list, you’re better off using either `$push` or `$addToSet`. Both operators add an item to an array, but the second does so uniquely, preventing a duplicate addition. This is the update you’re looking for:

```mongo
> db.users.update( {"favorites.cafe": "Café Retro"}, {$addToSet: {"favorites.cafe": "Lagkagehuset"} }, false, true)
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

```

The third argument, `false`, controls whether an upsert is allowed. This tells the update operation whether it should insert a document if it doesn’t already exist, which has different behavior depending on whether the update is an operator update or a replacement update.
The fourth argument, `true`, indicates that this is a multi-update. By default, a MongoDB update operation will apply only to the first document matched by the query selector. If you want the operation to apply to all documents matched, you must be explicit about that.
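
The difference between `$push` and `$addToSet` can be sketched in plain JavaScript. This is a simplified model, not MongoDB’s implementation. The helper names `arrayAt`, `push`, and `addToSet` are made up for illustration, and the dotted path is resolved the same way the `"favorites.cafe"` key is read in the update above.

```javascript
// Simplified model of $push and $addToSet on a dotted path such as "favorites.cafe".
function arrayAt(doc, path) {
  return path.split('.').reduce((node, key) => node[key], doc);
}

function push(doc, path, item) {
  arrayAt(doc, path).push(item); // always appends
}

function addToSet(doc, path, item) {
  const arr = arrayAt(doc, path);
  if (!arr.includes(item)) arr.push(item); // appends only if absent
}

const user = { username: 'Møller', favorites: { cafe: ['Café Retro'] } };
addToSet(user, 'favorites.cafe', 'Lagkagehuset');
addToSet(user, 'favorites.cafe', 'Lagkagehuset');
console.log(user.favorites.cafe); // [ 'Café Retro', 'Lagkagehuset' ]
push(user, 'favorites.cafe', 'Lagkagehuset');
console.log(user.favorites.cafe); // [ 'Café Retro', 'Lagkagehuset', 'Lagkagehuset' ]
```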


### Deleting data

To delete all contents of a collection, pass an empty query selector, which matches every document:

```mongo
> db.foo.remove({})
```

Note that the `remove()` operation doesn’t actually delete the collection; it merely removes documents from a collection. You can think of it as being analogous to SQL’s `DELETE` command.

You often need to remove only a certain subset of a collection’s documents, and for that, you can pass a query selector to the `remove()` method. If you want to remove all users whose favorite cafés include `"Café Retro"`, you would do:


```mongo
> db.users.remove({"favorites.cafe": "Café Retro"})
WriteResult({ "nRemoved" : 1 })
```


To delete a collection along with all of its indexes, use the `drop()` method:

```mongo
> db.users.drop()
```

### Getting `help`

```mongo
> help
```


```mongo
> db.users.help()
```


### Creating and Querying with Indexes

#### Creating a large collection

An indexing example makes sense only if you have a collection with many documents. So you’ll add 20,000 simple documents to a numbers collection. Because the MongoDB shell is also a JavaScript interpreter, the code to accomplish this is simple:

```mongo
> for(i = 0; i < 20000; i++) { db.numbers.save({num: i}); }
WriteResult({ "nInserted" : 1 })
```


That’s a lot of documents, so don’t be surprised if the insert takes a few seconds to complete. Once it returns, you can run a couple of queries to verify that all the documents are present:
       
```mongo       
> db.numbers.count()
20000
> db.numbers.find()
{ "_id" : ObjectId("58de4ec559f6af55dbf09bbe"), "num" : 0 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bbf"), "num" : 1 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc0"), "num" : 2 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc1"), "num" : 3 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc2"), "num" : 4 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc3"), "num" : 5 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc4"), "num" : 6 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc5"), "num" : 7 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc6"), "num" : 8 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc7"), "num" : 9 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc8"), "num" : 10 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bc9"), "num" : 11 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bca"), "num" : 12 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bcb"), "num" : 13 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bcc"), "num" : 14 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bcd"), "num" : 15 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bce"), "num" : 16 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bcf"), "num" : 17 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bd0"), "num" : 18 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bd1"), "num" : 19 }
Type "it" for more
> db.numbers.find({num: 500})
{ "_id" : ObjectId("58de4ec559f6af55dbf09db2"), "num" : 500 }
```

#### Range Queries

More interestingly, you can also issue range queries using the special `$gt` and `$lt` operators. They stand for greater than and less than, respectively. Here’s how you query for all documents with a `num` value greater than 19,995:

```mongo
> db.numbers.find( {num: {"$gt": 19995 }} )
{ "_id" : ObjectId("58de4ecb59f6af55dbf0e9da"), "num" : 19996 }
{ "_id" : ObjectId("58de4ecb59f6af55dbf0e9db"), "num" : 19997 }
{ "_id" : ObjectId("58de4ecb59f6af55dbf0e9dc"), "num" : 19998 }
{ "_id" : ObjectId("58de4ecb59f6af55dbf0e9dd"), "num" : 19999 }
```

You can also combine the two operators to specify upper and lower boundaries:
 
```mongo
> db.numbers.find( {num: {"$gt": 20, "$lt": 25 }} )
{ "_id" : ObjectId("58de4ec559f6af55dbf09bd3"), "num" : 21 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bd4"), "num" : 22 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bd5"), "num" : 23 }
{ "_id" : ObjectId("58de4ec559f6af55dbf09bd6"), "num" : 24 }
```     

`$gt` and `$lt` are only two of a host of operators that comprise the MongoDB query language. Others include `$gte` for greater than or equal to, `$lte` for (you guessed it) less than or equal to, and `$ne` for not equal to. You’ll see other operators and many more example queries in later chapters.


#### Indexing and `explain()`

If you’ve spent time working with relational databases, you’re probably familiar with SQL’s `EXPLAIN`, an invaluable tool for debugging or optimizing a query. When any database receives a query, it must plan out how to execute it; this is called a query plan. `EXPLAIN` describes query paths and allows developers to diagnose slow operations by determining which indexes a query has used. Often a query can be executed in multiple ways, and sometimes this results in behavior you might not expect. `EXPLAIN` explains. MongoDB has its own version of `EXPLAIN` that provides the same service. To get an idea of how it works, let’s apply it to one of the queries you just issued. Try running the following on your system:

```mongo
> db.numbers.find({num: {"$gt": 19995}}).explain("executionStats")
{
	"cursor" : "BasicCursor",
	"isMultiKey" : false,
	"n" : 4,
	"nscannedObjects" : 20000,
	"nscanned" : 20000,
	"nscannedObjectsAllPlans" : 20000,
	"nscannedAllPlans" : 20000,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 156,
	"nChunkSkips" : 0,
	"millis" : 5,
	"allPlans" : [
		{
			"cursor" : "BasicCursor",
			"isMultiKey" : false,
			"n" : 4,
			"nscannedObjects" : 20000,
			"nscanned" : 20000,
			"scanAndOrder" : false,
			"indexOnly" : false,
			"nChunkSkips" : 0
		}
	],
	"server" : "ubuntu-xenial:27017",
	"filterSet" : false,
	"stats" : {
		"type" : "COLLSCAN",
		"works" : 20002,
		"yields" : 156,
		"unyields" : 156,
		"invalidates" : 0,
		"advanced" : 4,
		"needTime" : 19997,
		"needFetch" : 0,
		"isEOF" : 1,
		"docsTested" : 20000,
		"children" : [ ]
	}
}
```


What this collection needs is an index. You can create an index for the num key within the documents using the `createIndex()` method. Try entering the following index creation code:

```mongo
> db.numbers.createIndex({num: 1})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.numbers.getIndexes()
[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "users.numbers"
	},
	{
		"v" : 1,
		"key" : {
			"num" : 1
		},
		"name" : "num_1",
		"ns" : "users.numbers"
	}
]
```

The collection now has two indexes. The first is the standard `_id` index that’s automatically built for every collection; the second is the index you created on `num`. The indexes for those fields are called `_id_` and `num_1`, respectively. If you don’t provide a name, MongoDB generates one automatically from the indexed field names and their sort directions.

If you run your query with the `explain()` method, you’ll now see the dramatic difference: the query examines 4 documents instead of 20,000, as shown in the following listing.

```mongo
> db.numbers.find({num: {"$gt": 19995 }}).explain("executionStats")
{
	"cursor" : "BtreeCursor num_1",
	"isMultiKey" : false,
	"n" : 4,
	"nscannedObjects" : 4,
	"nscanned" : 4,
	"nscannedObjectsAllPlans" : 4,
	"nscannedAllPlans" : 4,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"num" : [
			[
				19995,
				Infinity
			]
		]
	},
	"allPlans" : [
		{
			"cursor" : "BtreeCursor num_1",
			"isMultiKey" : false,
			"n" : 4,
			"nscannedObjects" : 4,
			"nscanned" : 4,
			"scanAndOrder" : false,
			"indexOnly" : false,
			"nChunkSkips" : 0,
			"indexBounds" : {
				"num" : [
					[
						19995,
						Infinity
					]
				]
			}
		}
	],
	"server" : "ubuntu-xenial:27017",
	"filterSet" : false,
	"stats" : {
		"type" : "FETCH",
		"works" : 5,
		"yields" : 0,
		"unyields" : 0,
		"invalidates" : 0,
		"advanced" : 4,
		"needTime" : 0,
		"needFetch" : 0,
		"isEOF" : 1,
		"alreadyHasObj" : 0,
		"forcedFetches" : 0,
		"matchTested" : 0,
		"children" : [
			{
				"type" : "IXSCAN",
				"works" : 4,
				"yields" : 0,
				"unyields" : 0,
				"invalidates" : 0,
				"advanced" : 4,
				"needTime" : 0,
				"needFetch" : 0,
				"isEOF" : 1,
				"keyPattern" : "{ num: 1.0 }",
				"isMultiKey" : 0,
				"boundsVerbose" : "field #0['num']: (19995.0, inf.0]",
				"yieldMovedCursor" : 0,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0,
				"matchTested" : 0,
				"keysExamined" : 4,
				"children" : [ ]
			}
		]
	}
}
```



# Connect to MongoDB from a Java Maven Project

Add the driver as a dependency in your `pom.xml`, following the installation guide: https://mongodb.github.io/mongo-java-driver/3.0/driver/getting-started/installation-guide/


```xml
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-driver</artifactId>
    <version>3.0.4</version>
</dependency>
```




Based on https://mongodb.github.io/mongo-java-driver/3.0/driver/getting-started/quick-tour/

```java
package dk.cphbusiness.db.meassurements;

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

/**
 *
 * @author Helge
 */
public class MongoTest {
      
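    public static void main(String[] args) {
        MongoClientURI connStr = new MongoClientURI("mongodb://localhost:27017");
        MongoClient mongoClient = new MongoClient(connStr);
        try {
            MongoDatabase db = mongoClient.getDatabase("test-database");
            MongoCollection<Document> collection = db.getCollection("tweets");

            // first() returns null when the collection is empty
            Document myDoc = collection.find().first();
            if (myDoc != null) {
                System.out.println(myDoc.toJson());
            } else {
                System.out.println("No documents found in 'tweets'.");
            }
        } finally {
            // Release the client's connections before exiting
            mongoClient.close();
        }
    }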
}
```

# Importing Data

You can either write a program that inserts documents into a database or use MongoDB’s CLI import tool, `mongoimport`.

```bash
mongoimport --drop --db social_net --collection tweets --type csv --headerline --file testdata.manual.2009.06.14.csv
```

### References

This lecture is almost entirely based on chapter one of *"MongoDB in Action, Second Edition"* by Kyle Banker, Peter Bakkum, Shaun Verch, Doug Garrett. Additionally, it incorporates parts of "Appendix A. NOSQL Overview" from *"Graph Databases"* by Ian Robinson, Jim Webber, Emil Eifrem.