# Non Relational Databases

### Objectives:
* Install `mongo` and `pymongo`.
* *Compare and contrast* SQL and NoSQL.
* *Perform* basic operations using Mongo.

## Resources

* [The Little MongoDB Book](http://openmymind.net/mongodb.pdf)
* [PyMongo tutorial](http://api.mongodb.org/python/current/tutorial.html)

## Installing Mongo and PyMongo

### Mongo

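These commands use Homebrew on macOS; substitute your own package manager if using another operating system.

1. Add MongoDB's Homebrew tap: `brew tap mongodb/brew`
2. Install MongoDB: `brew install mongodb-community`
3. Start MongoDB: `brew services start mongodb-community`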

#### Do *not* run services as `root`.  Ever.  Even if someone tells you to.

### PyMongo

1. Install PyMongo: `conda install pymongo`

### SQL vs NoSQL

NoSQL does not stand for "No SQL". SQL is useful for many things, and it's not going away.

> NoSQL ==> "Not Only SQL"

It's a different paradigm for dealing with messy data that does not lend itself to an RDBMS. It's also a quick and painless way to store data when designing a full relational model would take much more thought and investment.

### Mongo Clients

The command line program we use to interact with Mongo is a *client*. Its only job is to send messages to another program, a *server*, which holds all our data and knows how to operate on it.

The command line Mongo client is a JavaScript shell, so interacting with Mongo through this client looks like writing JavaScript code.

<img src="images/client-server.png" width = 500>

There are other clients. Later on we will use `pymongo` to interact with our databases from Python.

## JavaScript Object Notation

JavaScript Object Notation, or JSON, is a simple text format for storing and exchanging data. It was designed by [Douglas Crockford](https://en.wikipedia.org/wiki/Douglas_Crockford) based on the notation JavaScript uses for object literals.

It is meant as a lightweight alternative to XML, and has become very popular.

```json
{
    "name": "TwilightSparkle",
    "friends": ["Applejack", "Fluttershy"],
    "age": 16,
    "gender": "f",
    "wings": true,
    "horn": true,
    "residence": {
        "town": "Ponyville",
        "address": "15 Gandolfini Lane"
    }
}
```

It is very similar to nested Python data structures like dictionaries and lists, but it is important to realize:

**JSON data is TEXT**

JSON is meant to pass data between different programs: it is a **data interchange format**.
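Python's standard `json` module makes the "JSON is text" point concrete. `json.dumps` turns a nested dictionary into a plain string, and `json.loads` parses that string back into Python objects. The sketch below uses a trimmed copy of the pony document above:

```python
import json

# A Python dictionary: a live, in-memory data structure
pony = {
    "name": "TwilightSparkle",
    "friends": ["Applejack", "Fluttershy"],
    "wings": True,
    "residence": {"town": "Ponyville"},
}

# Serializing produces nothing more than a str of JSON text
text = json.dumps(pony)
print(type(text))  # <class 'str'>
print(text)

# Parsing the text rebuilds an equal Python structure
restored = json.loads(text)
print(restored == pony)  # True
```

The string is what travels between programs. Each side turns it back into its own native objects.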

A few important points about JSON:

  - You cannot use single quotes to enclose strings in a JSON file or stream; always use **double quotes**.
  - Things that look like `[a, sequence, of, things]` are called **arrays**, and they are always written with square brackets (there is no tuple syntax).
  - Booleans are spelled `true` and `false`.
  - Keys must be **strings** in double quotes. The Mongo shell accepts unquoted keys like `name: "Rarity"` because it is JavaScript, but that is not valid JSON.
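The same rules can be checked with Python's `json` parser. The sketch below shows booleans and arrays converting cleanly. It also shows single-quoted strings and unquoted keys being rejected:

```python
import json

# Double-quoted keys and strings parse fine
print(json.loads('{"name": "Rarity"}'))  # {'name': 'Rarity'}

# Lowercase true/false become Python's True/False
print(json.loads('[true, false]'))  # [True, False]

# JSON has no tuples, so Python tuples are written out as arrays
print(json.dumps(("a", "b")))  # ["a", "b"]

# Single-quoted strings and unquoted keys are both rejected
rejected = []
for bad in ["{'name': 'Rarity'}", '{name: "Rarity"}']:
    try:
        json.loads(bad)
    except json.JSONDecodeError as err:
        rejected.append(bad)
        print("invalid JSON:", bad, "->", err.msg)
```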

## Working with Mongo DB

### MongoDB Concepts

#### What is it about? 

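* MongoDB is a document-oriented database, an alternative to an RDBMS, used for storing semi-structured data.
* JSON-like documents, rather than RDBMS tables, form the data model.
* No fixed schema. Joins and multi-document transactions were historically absent; newer versions add `$lookup` and multi-document transactions, but the data model is not built around them.
* Less suited than SQL to complicated queries that combine many kinds of records.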

#### Structure of the database.

* MongoDB is made up of **databases** which contain within them **collections** (collections are analogous to tables in a SQL type database system).
* A collection is made up of documents (documents are analogous to rows or records).
* Each document is a JSON object made up of key-value pairs (key-value pairs are analogous to columns and their data).

So where an RDBMS defines columns at the table level, a document-oriented database defines fields at the document level.
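To picture fields living at the document level, model a collection as a plain Python list of dictionaries. This is only an analogy, not how MongoDB stores data. The two documents are trimmed versions of unicorns from these notes, and each one carries its own set of keys:

```python
# A "collection" modeled as a list of dicts (an analogy, not MongoDB internals)
unicorns = [
    {"name": "Applejack", "age": 15, "wings": False, "horn": False},
    {"name": "RainbowDash", "age": 16,
     "friends": ["TwilightSparkle", "Applejack", "Rarity"]},
]

# A relational table would force every row to share the same columns.
# Here each document has whatever fields it needs.
for doc in unicorns:
    print(doc["name"], sorted(doc))

# The union of all fields seen anywhere in the collection
all_fields = set().union(*(doc.keys() for doc in unicorns))
print(sorted(all_fields))  # ['age', 'friends', 'horn', 'name', 'wings']
```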

### Importing Data into Mongo

I created a `unicorns.json` file that can be imported into MongoDB.

```
mongoimport --db unicorns --collection unicorns < unicorns.json
```

**Note**: By default `mongoimport` expects one JSON document per line. If `unicorns.json` is a single JSON array of documents, you need to add the `--jsonArray` switch.
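The minimal sketch below shows what the `--jsonArray` switch is about. It writes the same two documents in both layouts `mongoimport` understands: one JSON document per line, and a single JSON array (the layout that needs `--jsonArray`). The file names and the temporary directory are hypothetical choices for illustration:

```python
import json
import os
import tempfile

docs = [
    {"name": "Applejack", "age": 15},
    {"name": "Fluttershy", "age": 15},
]

out_dir = tempfile.mkdtemp()

# Layout 1: one JSON document per line (mongoimport's default)
lines_path = os.path.join(out_dir, "unicorns_lines.json")
with open(lines_path, "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Layout 2: one JSON array holding every document (needs --jsonArray)
array_path = os.path.join(out_dir, "unicorns_array.json")
with open(array_path, "w") as f:
    json.dump(docs, f)

# Both files hold the same documents, just laid out differently
with open(lines_path) as f:
    from_lines = [json.loads(line) for line in f]
with open(array_path) as f:
    from_array = json.load(f)
print(from_lines == from_array)  # True
```

With the array layout, the import command becomes `mongoimport --db unicorns --collection unicorns --jsonArray < unicorns_array.json`.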

Now start the Mongo shell (in MongoDB 6.0 and later the shell is called `mongosh` instead of `mongo`):

```
mongo
```

You'll be dropped into a shell, similar to bash and Python.

### Playing Around

A MongoDB server can hold many databases, so let's check that the `unicorns` database exists.

```
show dbs
```

To use the `unicorns` database, we simply do the following:

```
use unicorns
```

As mentioned, a database is made of `collection`s, which are containers for the actual stored data. A `collection` is analogous to a `table` in a classical relational database, but can contain much more flexible data than a table. To list the collections in the current database:


```
db.getCollectionNames()
```

### Inserting Data

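To put new data into our database, use the `insert` method. (Newer shells such as `mongosh` prefer `insertOne` and `insertMany`; `insert` still works there but is deprecated.)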

```javascript
db.unicorns.insert({
    name: "Applejack",
    age: 15,
    friends: ["TwilightSparkle", "Fluttershy"],
    wings: false,
    horn: false
})

db.unicorns.insert({
    name: "Fluttershy",
    age: 15,
    friends: ["Applejack", "TwilightSparkle"],
    wings: true,
    horn: false
})
                 
```

## Querying Data

Without any arguments, `find` returns every document in the collection (the shell prints them in batches of 20; type `it` to see the next batch).

```javascript
db.unicorns.find()
```

`find` is much more flexible than just that though:

```javascript
// find by single field
db.unicorns.find({name: 'TwilightSparkle'})

// find by presence of field
db.unicorns.find({friends: {$exists : true}})

// find by value in array
db.unicorns.find({friends: 'TwilightSparkle'})

// To return only certain fields, pass a projection as the second argument.
// This returns only the names of unicorns who are friends with
// TwilightSparkle (plus _id, which is included unless you pass _id: false).
db.unicorns.find({friends: 'TwilightSparkle'}, {name: true})
```
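It can help to see these matching rules as a toy model. The helpers `matches`, `project`, and `find` below are hypothetical, written in plain Python only to illustrate the behavior shown above. They cover just the filter forms used in those queries (equality, array membership, `$exists`) and are not how MongoDB actually evaluates queries:

```python
# Toy, in-memory model of the filter rules above.
# Hypothetical helpers for illustration; NOT how MongoDB evaluates queries.

def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, cond in query.items():
        if isinstance(cond, dict) and "$exists" in cond:
            # {field: {$exists: true}} checks only whether the key is present
            if (field in doc) != cond["$exists"]:
                return False
        elif isinstance(doc.get(field), list):
            # A plain value matches an array field if the array contains it
            if cond not in doc[field]:
                return False
        elif doc.get(field) != cond:
            # Otherwise the stored value must equal the query value
            return False
    return True


def project(doc, fields):
    """Keep only the requested fields, plus _id unless it is switched off."""
    keep = {name for name, on in fields.items() if on}
    if fields.get("_id", True):
        keep.add("_id")
    return {k: v for k, v in doc.items() if k in keep}


def find(collection, query, fields=None):
    hits = [doc for doc in collection if matches(doc, query)]
    return [project(doc, fields) for doc in hits] if fields else hits


collection = [
    {"_id": 1, "name": "TwilightSparkle", "friends": ["Applejack", "Fluttershy"]},
    {"_id": 2, "name": "Applejack", "friends": ["TwilightSparkle", "Fluttershy"]},
    {"_id": 3, "name": "Nightmare Moon"},
]

print([d["_id"] for d in find(collection, {"name": "TwilightSparkle"})])  # [1]
print([d["_id"] for d in find(collection, {"friends": {"$exists": True}})])  # [1, 2]
print(find(collection, {"friends": "TwilightSparkle"}, {"name": True}))
# [{'_id': 2, 'name': 'Applejack'}]
```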

**Exercise**: Try to find all the unicorns with wings. Then return only the `friends` field of the unicorns with wings.

## Updating Data

The `$set` operator in Mongo sets the value of a field in a document.
Note that the dollar sign is used in Mongo to denote an **operator**, which you can think of as a command that instructs Mongo to behave in a particular way.

```javascript
// Replaces friends array
db.unicorns.update({
    name: 'TwilightSparkle'}, {
    $set: {
        friends: ['Shutterfly', 'Rarity', 'Applejack']}})
```

To *append* to an array instead of replacing it, use `$push`:

```javascript
// Adds to friends array
db.unicorns.update({
    name: 'Applejack'}, {
    $push: {
        friends: 'Rarity'}})
```

It is important to use the `$set` and `$push` operators: without them, `update` **replaces the entire document** (keeping only its `_id`).

```javascript
// Replaces the TwilightSparkle document completely!
// It will no longer even have a name field after this!
db.unicorns.update({
    name: 'TwilightSparkle'}, {
    friends: ['Shutterfly', 'Rarity', 'Applejack']})
```
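The three kinds of update can be compared side by side in a toy Python model. `apply_update` is a hypothetical helper that handles only `$set`, `$push`, and plain replacement. It is meant to show the semantics described above, not how MongoDB applies updates:

```python
import copy

def apply_update(doc, update):
    """Toy model of update(): operator documents modify, others replace."""
    if not any(key.startswith("$") for key in update):
        # No operators: the whole document is replaced; only _id survives
        return {"_id": doc["_id"], **copy.deepcopy(update)}
    new = copy.deepcopy(doc)  # leave the original document untouched
    for field, value in update.get("$set", {}).items():
        new[field] = value  # overwrite (or add) the field
    for field, value in update.get("$push", {}).items():
        new.setdefault(field, []).append(value)  # append to the array
    return new


twilight = {"_id": 1, "name": "TwilightSparkle", "friends": ["Applejack"]}

print(apply_update(twilight, {"$set": {"friends": ["Rarity"]}}))
# {'_id': 1, 'name': 'TwilightSparkle', 'friends': ['Rarity']}
print(apply_update(twilight, {"$push": {"friends": "Rarity"}}))
# {'_id': 1, 'name': 'TwilightSparkle', 'friends': ['Applejack', 'Rarity']}
print(apply_update(twilight, {"friends": ["Rarity"]}))
# {'_id': 1, 'friends': ['Rarity']}
```

The last call shows the trap: with no operator, the `name` field is gone.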


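An `upsert` updates the matching document if one exists, or creates a new document if nothing matches. Pass `{upsert: true}` as the third argument to `update`. A newly created document gets the equality fields from the query (here, `name`) plus the result of the update.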

```
// Upsert: This one is created
db.unicorns.update({
    name: "Rarity"}, {
    $push: {
        friends: {
            $each: ["TwilightSparkle", "Applejack", "Fluttershy"]}}}, {
    upsert: true})

// Upsert: This one is updated
db.unicorns.update({
    name: "Fluttershy"}, {
    $push: {
        friends: {
            $each: ["Rarity", "PrincessCelestia"]}}}, {
    upsert: true})
```

Note: Syntax highlighting is not enabled for this last code block due to [this bug](https://github.com/jupyter/notebook/issues/2667) in jupyter notebook.
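The two upserts above can be traced with a toy Python model. `upsert_push_each` is a hypothetical helper that handles only equality queries and a `$push` with `$each`. The counter standing in for MongoDB's generated `ObjectId` values is likewise an illustrative assumption:

```python
import itertools

_ids = itertools.count(1)  # stand-in for MongoDB's generated ObjectIds

def upsert_push_each(collection, query, field, values):
    """Toy model of update(query, {$push: {field: {$each: values}}}, {upsert: true}).

    Handles equality-only queries.
    """
    for doc in collection:
        if all(doc.get(k) == v for k, v in query.items()):
            # A match exists: push every value onto its array
            doc.setdefault(field, []).extend(values)
            return "updated"
    # No match: build a new document from the query's equality fields
    collection.append({"_id": next(_ids), **query, field: list(values)})
    return "created"


unicorns = [{"_id": 100, "name": "Fluttershy", "friends": ["Applejack"]}]

first = upsert_push_each(unicorns, {"name": "Rarity"}, "friends",
                         ["TwilightSparkle", "Applejack", "Fluttershy"])
second = upsert_push_each(unicorns, {"name": "Fluttershy"}, "friends",
                          ["Rarity", "PrincessCelestia"])
print(first, second)          # created updated
print(unicorns[0]["friends"])  # ['Applejack', 'Rarity', 'PrincessCelestia']
print(unicorns[1])
# {'_id': 1, 'name': 'Rarity', 'friends': ['TwilightSparkle', 'Applejack', 'Fluttershy']}
```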

**Exercise**: Enter a unicorn named RainbowDash into the database that is friends with TwilightSparkle, Rarity, and Applejack.

## Deleting Data

**Don't run this one!** It deletes every document in the collection, because the empty filter `{}` matches everything.

```javascript
db.unicorns.remove({})
```

## PyMongo


`pymongo` allows python to connect to and manipulate MongoDB.

```python
from pymongo import MongoClient
import pprint
```

```python
# Connect to the local MongoDB server
client = MongoClient('mongodb://localhost:27017/')
```

```python
# Select the unicorns database
db = client.unicorns
```

```python
# Get the unicorns collection (MongoDB creates a collection the first time you insert into it)
unicorns = db.unicorns
```

```python
unicorns.insert_one({
    'name': 'RainbowDash',
    'age': 16,
    'friends': ['TwilightSparkle', 'Applejack', 'Rarity']})
```

```
<pymongo.results.InsertOneResult at 0x10fa9a8c8>
```

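```python
# Count the documents in the collection
# (Cursor.count() was removed in PyMongo 4; use count_documents instead)
unicorns.count_documents({})
```

```
6
```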

```python
pprint.pprint(unicorns.find_one())
```

```
{'_id': ObjectId('5b060cfb6d4f1312238c271e'),
 'age': 34,
 'horn': True,
 'name': 'Nightmare Moon',
 'wings': True}
```


```python
rarity = unicorns.find_one({'name': 'Rarity'})
pprint.pprint(rarity)
```

```
None
```

`find_one` returns `None` when no document matches the filter.


The same selectors used in the Mongo shell work in `pymongo`, written as Python dictionaries:

```python
friend_of_twilight = unicorns.find_one({'friends': 'TwilightSparkle'})
pprint.pprint(friend_of_twilight)
```

```
{'_id': ObjectId('596d2c4d3b80fb5c849ab0f0'),
 'age': 34,
 'friends': ['TwilightSparkle'],
 'horn': True,
 'name': 'PrincessCelestia',
 'wings': True}
```


To get multiple results back, use `find`, which returns a `Cursor` you can iterate over.

```python
friends_of_twilight = unicorns.find({'friends': 'TwilightSparkle'})
for friend in friends_of_twilight:
    pprint.pprint(friend)
```

```
{'_id': ObjectId('5b060cfb6d4f1312238c2720'),
 'age': 34,
 'friends': ['TwilightSparkle'],
 'horn': True,
 'name': 'PrincessCelestia',
 'wings': True}
{'_id': ObjectId('5b0624210402bd591ac92858'),
 'age': 15.0,
 'friends': ['TwilightSparkle', 'Fluttershy'],
 'horn': False,
 'name': 'Applejack',
 'wings': False}
{'_id': ObjectId('5b0624210402bd591ac92859'),
 'age': 15.0,
 'friends': ['Applejack', 'TwilightSparkle'],
 'horn': False,
 'name': 'Fluttershy',
 'wings': True}
{'_id': ObjectId('5b062668ad6c292660d702cf'),
 'age': 16,
 'friends': ['TwilightSparkle', 'Applejack', 'Rarity'],
 'name': 'RainbowDash'}
```


```python
# $lt means "less than"; slicing the cursor keeps only the first two results
young_unicorns = unicorns.find({'age': {'$lt': 16}})
for unicorn in young_unicorns[:2]:
    pprint.pprint(unicorn)
```

```
{'_id': ObjectId('596d307dc12460fb29b091ed'),
 'age': 15.0,
 'friends': ['TwilightSparkle', 'Fluttershy'],
 'horn': False,
 'name': 'Applejack',
 'wings': False}
{'_id': ObjectId('596d3086c12460fb29b091ee'),
 'age': 15.0,
 'friends': ['Applejack', 'TwilightSparkle'],
 'horn': False,
 'name': 'Fluttershy',
 'wings': True}
```


**Exercise:** Find all the unicorns that have a horn and wings.