## Chapter 7: NoSQL Data Analytics

For the first 50+ years of computing, database developers made extensive use of SQL relational (table-based) databases. However, the explosive growth of Web- and mobile-based applications has introduced challenges to traditional databases:

    •	The need to store documents, videos, and less “structured” data
    •	The need to scale database storage and processing power up and down based upon demand
    •	The continued need for high performance and reliability

In response to these demands, NoSQL databases have emerged. The name reflects that they do not rely on SQL as their query language; each provides its own query mechanism instead. Further, unlike relational databases, which store data in tables, NoSQL databases store data in a less structured way, often as JavaScript Object Notation (JSON) objects. This chapter examines NoSQL database processing through the popular MongoDB database.

MongoDB is an open-source, document-oriented database created to provide a fast, scalable solution for Web applications. MongoDB stores data as JSON-like documents. The three primary components of a MongoDB database are:

    •	Database: stores data within named collections
    •	Collection: similar to a relational table in that it groups or collects objects
    •	Document: similar to a record in that it contains name-value pairs

This notebook performs operations on a database that has been pre-built for you. The commands used to build this database can be found in the chapter07-database.js script file located in the chapter folder.

*Note: None of the code in this notebook will run in the Jupyter environment. Run it in the MongoDB Server instance you prepared by following the instructions in "Getting Started with MongoDB".*

# Querying a MongoDB Collection


A MongoDB collection is similar to a table within a relational database (such as MySQL) in that it groups data objects, which MongoDB refers to as documents. A MongoDB database may store many different collections, such as Customers, Products, and Orders.

Before accessing collections, we must first open the database we intend to run queries on. We can do this by way of the use command. 

The use command in MongoDB differs from its counterpart in relational database management systems such as MySQL in that it is also the way by which a database is created. By specifying a non-existent database, in the form of use DATABASE_NAME, and then adding data to it, you create a new database.

In this case, we are connecting to the existing database MyBusiness. To do so, we issue the following command:

In [None]:
use MyBusiness

To refer to a specific collection within the current database, you use the notation db.collectionName, where db represents the current database. For example, the statement db.Employees.find() displays all documents (records) within the Employees collection. Passing a query document to find() narrows the result; the following example returns only the employees whose Position is Supervisor:

*Note: MongoDB statements are case sensitive. For example, db.collection.find() with a lowercase f is correct, whereas db.collection.Find() is not.*

In [None]:
// ############################
// # Chapter 7 / Deliverable 1
// ############################

db.Employees.find({ "Position": "Supervisor" })

Likewise, the following query would list only those documents with the field value State equal to Arizona:

In [None]:
db.Employees.find({ "State": "Arizona" })

If the field that you want to query is part of a nested object, you use dot notation to refer to the field:

    db.collection.find({ "Somefield.NestedField": "value" })
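
To make dot notation concrete, the following plain JavaScript sketch (not a MongoDB call) resolves a dotted path against a nested object. The `getPath` helper and the `Address` sub-document are hypothetical and exist only for illustration; MongoDB's own path resolution also handles arrays, which this sketch ignores.

```javascript
// Plain JavaScript sketch (not MongoDB itself): resolve a dotted field
// path such as "Address.State" against a nested object.
// getPath and the sample document are hypothetical.
function getPath(doc, path) {
  // Walk one key at a time; yield undefined once a level is missing.
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const employee = {
  Name: "Julie Adams",
  Address: { City: "Phoenix", State: "Arizona" }, // assumed nested field
};

console.log(getPath(employee, "Address.State")); // Arizona
console.log(getPath(employee, "Address.Zip"));   // undefined
```

Under this model, a query on "Address.State" compares the value found at the end of the path, just as a query on a top-level field compares that field's value.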

There will often be times when you want a query to return only specific fields. The first argument to find(), enclosed in braces, holds the query conditions (as seen in our previous examples). The second argument is the projection, which specifies the fields to include in or exclude from the output.
To direct MongoDB to return a field, you follow the field name with a 1. By default, MongoDB always returns the \_id field; assigning it a value of 0 removes it from the result.

The following query uses find to display only employee names.

In [None]:
db.Employees.find({}, { "Name": 1, "_id": 0 })

If these key-value pairs were specified in the first set of braces, the request would effectively be "return records where Name = 1 and \_id = 0." Instead, our statement leaves the first (query) braces empty to select all documents in the Employees collection, and the second set of braces specifies the fields to project to the output.
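
The effect of an inclusion projection can be modeled in plain JavaScript. This is a rough sketch rather than MongoDB's implementation: the `project` function is hypothetical, and it covers only the case described above, where listed fields carry a 1 and \_id is returned unless set to 0.

```javascript
// Plain JavaScript model of an inclusion projection (not MongoDB itself).
// project() is a hypothetical helper for illustration.
function project(doc, spec) {
  const result = {};
  // _id is returned unless the projection explicitly sets it to 0.
  if (spec._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  // Copy every other field that the projection marks with a 1.
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

const employee = { _id: 50101, Name: "Julie Adams", Position: "Sales", Phone: "555-1316" };

console.log(project(employee, { Name: 1, _id: 0 })); // only Name
console.log(project(employee, { Name: 1 }));         // _id and Name
```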

As one more example, the following query displays student names and phone numbers:

In [None]:
// ############################
// # Chapter 7 / Deliverable 2
// ############################

db.Students.find({}, { "Name": 1, "Phone": 1, "_id": 0 })

# Using Relational Operators within a Query

As you perform MongoDB queries, there will be times when you will want to retrieve documents for which a field value is equal to, not equal to, greater than, less than, greater than or equal to, or less than or equal to a specified value. In such cases, you use the MongoDB relational operators (which the MongoDB documentation calls comparison operators), such as \$eq, \$ne, \$gt, \$gte, \$lt, and \$lte.

The following query uses $ne to list employees who do not live in Arizona:

In [None]:
db.Employees.find({ "State": { $ne: "Arizona" }})

In a similar way, the following query uses the \$gte operator to display sales for which the total is greater than or equal to \$39.99:

In [None]:
// ############################
// # Chapter 7 / Deliverable 3
// ############################

db.Sales.find({ "Total": { $gte: 39.99 }})
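
As a way to reason about what these operators test, here is a small plain JavaScript model. It is only a sketch: the `comparators` table, the `matches` function, and the `OrderID` field are hypothetical, and MongoDB's real comparison rules (for mixed types or missing fields, for example) are more involved.

```javascript
// Plain JavaScript model of a few comparison operators (not MongoDB itself).
const comparators = {
  $eq: (a, b) => a === b,
  $ne: (a, b) => a !== b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// Returns true when doc[field] satisfies every operator in the condition,
// for example matches(sale, "Total", { $gte: 39.99 }).
function matches(doc, field, condition) {
  return Object.entries(condition).every(([op, operand]) =>
    comparators[op](doc[field], operand)
  );
}

// Hypothetical sample data; OrderID is an assumed field name.
const sales = [
  { OrderID: 1, Total: 19.99 },
  { OrderID: 2, Total: 39.99 },
  { OrderID: 3, Total: 120.5 },
];

const bigSales = sales.filter((s) => matches(s, "Total", { $gte: 39.99 }));
console.log(bigSales.map((s) => s.OrderID)); // [ 2, 3 ]
```

Because a condition may hold several operators at once, the same model also covers ranges such as `{ $gte: 20, $lt: 100 }`.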

# Using Logical Operators

As your MongoDB queries become more complex, there will be many times when you must specify multiple conditions, such as the students who are taking a specific course and who have a GPA greater than 3.0. To specify compound conditions, you use the MongoDB logical operators.

The following query uses the logical $or operator to display employees whose State field is either Arizona or AZ:

In [None]:
db.Employees.find({ $or: [{ "State": "Arizona"} , { "State": "AZ" }]})

To display the names and phone numbers for such employees, you can modify the query as follows:

In [None]:
// ############################
// # Chapter 7 / Deliverable 4
// ############################

db.Employees.find({ $or: [{ "State": "Arizona"} , { "State": "AZ" }]}, { Name:1, Phone:1, _id:0 })

The following query lists supervisors who live in Arizona:

In [None]:
db.Employees.find({ $and: [{ "State": "Arizona"} , { "Position": "Supervisor" }]}, { Name:1, Phone:1, _id:0 })

This query selects for sales team members who do not live in California:

In [None]:
db.Employees.find({ $and: [{"State": { $ne: "California"}} , { "Position": "Sales" }]}, { Name:1, State:1, _id:0})

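And finally, the following query again selects supervisors who work in Arizona, this time placing both conditions inside a single document. MongoDB combines the conditions within one document with an implicit AND, so the result matches the earlier two-condition \$and query:

In [None]:
// ############################
// # Chapter 7 / Deliverable 5
// ############################

db.Employees.find({ $and: [{ "State": "Arizona", "Position": "Supervisor" }]}, { Name:1, Phone:1, _id:0 })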
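
The relationship between an explicit \$and, several fields in one condition document, and \$or can be sketched in plain JavaScript. The `matchesFilter` function is hypothetical and handles only equality tests; the State and Position values assigned to the sample employees are assumptions made for the example.

```javascript
// Plain JavaScript model of $and / $or over equality conditions
// (not MongoDB itself). matchesFilter is a hypothetical helper.
function matchesFilter(doc, filter) {
  // Every entry of a filter document must hold (implicit AND).
  return Object.entries(filter).every(([key, value]) => {
    if (key === "$and") return value.every((sub) => matchesFilter(doc, sub));
    if (key === "$or") return value.some((sub) => matchesFilter(doc, sub));
    return doc[key] === value; // plain field: equality test
  });
}

// Assumed sample data for illustration.
const employees = [
  { Name: "Julie Adams", State: "Arizona", Position: "Sales" },
  { Name: "Bobby Lewis", State: "AZ", Position: "Supervisor" },
  { Name: "Marvin Train", State: "Arizona", Position: "Supervisor" },
];

const explicitAnd = { $and: [{ State: "Arizona" }, { Position: "Supervisor" }] };
const implicitAnd = { State: "Arizona", Position: "Supervisor" };
const eitherSpelling = { $or: [{ State: "Arizona" }, { State: "AZ" }] };

const names = (filter) =>
  employees.filter((e) => matchesFilter(e, filter)).map((e) => e.Name);

console.log(names(explicitAnd));    // [ 'Marvin Train' ]
console.log(names(implicitAnd));    // [ 'Marvin Train' ]
console.log(names(eitherSpelling)); // all three employees
```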

# Sorting Your Query Results	

Often, when you perform a database query, you will want to sort your results based on a specific field in ascending (lowest to highest) or descending (highest to lowest) order. To sort your MongoDB query results, you use the *sort* method, using JSON to specify the field upon which you want to sort, as well as the sort order. For example, the following query would sort the Employees collection based on the State field:

In [None]:
db.Employees.find().sort({ "State": 1 })  

The value 1 within the JSON specifies that you want to sort the data in ascending order. To sort in descending order, specify the value -1. Take the following example, which selects for Name and State, and then sorts by State in descending order.

In [None]:
db.Employees.find({}, { "Name":1, "State":1, "_id":0 }).sort({ "State": -1 })

The previous query sorted the documents based on a single field. Often there will be times when you will want to sort documents based on two or more fields. In such cases, you separate the fields with commas. The following query, for example, sorts the Employees collection by State and then by Position:

In [None]:
// ############################
// # Chapter 7 / Deliverable 6
// ############################

db.Employees.find().sort({"State": 1, "Position": 1})
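
A multi-field sort uses the second field only to break ties in the first. The plain JavaScript sketch below illustrates that rule with a hypothetical `sortBy` function that accepts the same field and direction pairs as sort; the State and Position values in the sample data are assumed.

```javascript
// Plain JavaScript model of a multi-field sort (not MongoDB itself).
// sortBy is a hypothetical helper; spec maps field names to 1 or -1.
function sortBy(docs, spec) {
  const keys = Object.entries(spec); // [[field, direction], ...]
  // Copy the array so the input is left unchanged.
  return [...docs].sort((a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
      // Equal on this field: fall through to the next one.
    }
    return 0;
  });
}

// Assumed sample data for illustration.
const employees = [
  { Name: "Marvin Train", State: "Utah", Position: "Supervisor" },
  { Name: "Julie Adams", State: "Arizona", Position: "Supervisor" },
  { Name: "Bobby Lewis", State: "Arizona", Position: "Sales" },
];

const sorted = sortBy(employees, { State: 1, Position: 1 });
console.log(sorted.map((e) => e.Name));
// [ 'Bobby Lewis', 'Julie Adams', 'Marvin Train' ]
```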

# Limiting the Number of Documents a Query Returns

If you are working with a large collection, there will be times when you will want to limit the number of documents your query returns. You may, for example, only want to quickly view a collection’s general format, as opposed to waiting for all the records to display. To limit the number of documents a query displays, you use the limit method. The following query, for example, limits the output to three documents:

In [None]:
// ############################
// # Chapter 7 / Deliverable 7
// ############################

db.Students.find().limit(3)

# Grouping MongoDB Query Results

Often, to analyze data, you will want to perform operations on groups of data, such as grouping sales by region, students by year, and so on. To help you perform such grouping operations, MongoDB provides the \$group operator. The following query, for example, uses \$group to display a count of the number of documents in the SalesTeam collection:

In [None]:
// ############################
// # Chapter 7 / Deliverable 8
// ############################

db.SalesTeam.aggregate(
   [
      {
        $group : {
           _id : null,
           count: { $sum: 1 }
        }
      } 
   ]
)

To create an aggregation in MongoDB, you use the aggregate method, which passes the documents in a collection through an array of pipeline stages. Within the method call, the \$group stage uses \_id to specify the field upon which you want to group (null places every document in a single group) and names the operation to perform. In this case, \$sum adds 1 to the accumulated value for each document, which counts the records.

As you analyze MongoDB data, there will be many times when you will need to perform aggregation operations, such as summing and averaging data, determining the minimum or maximum value, or calculating a standard deviation. To help you perform such operations, MongoDB provides the following operators:

   * \$avg
   * \$sum
   * \$max
   * \$min
   * \$stdDevPop
   * \$stdDevSamp
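
The two standard deviation operators differ only in the divisor: the population form divides the summed squared deviations by n, while the sample form divides by n - 1. The plain JavaScript sketch below shows the difference; the `stdDev` function and the price list are hypothetical.

```javascript
// Plain JavaScript sketch of population vs. sample standard deviation
// (the calculations behind $stdDevPop and $stdDevSamp).
// stdDev and the sample prices are hypothetical.
function stdDev(values, sample) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const sumSquares = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  // Population: divide by n. Sample: divide by n - 1.
  return Math.sqrt(sumSquares / (sample ? n - 1 : n));
}

const prices = [2, 4, 4, 4, 5, 5, 7, 9];

console.log(stdDev(prices, false)); // population: 2
console.log(stdDev(prices, true));  // sample: about 2.138
```

Use the population form when the documents are the entire data set and the sample form when they are a subset drawn from a larger one.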


The following query, for example, uses $avg to calculate the average product price:

In [None]:
// ############################
// # Chapter 7 / Deliverable 9
// ############################

db.Products.aggregate(
   [
     {
       $group:
         {
           _id: "1",
           avgPrice: { $avg: "$Price" }
         }
     }
   ]
)

In a similar way, the following query displays the maximum product price:

In [None]:
// #############################
// # Chapter 7 / Deliverable 10
// #############################

db.Products.aggregate(
   [
     {
       $group:
         {
           _id: "1",
           MaxPrice: { $max: "$Price" }
         }
     }
   ]
)

This query sums the company’s sales:

In [None]:
db.Sales.aggregate(
   [
     {
       $group:
         {
           _id: "1",
           TotalSales: { $sum: "$Total" }
         }
     }
   ]
)

And finally, the following query uses $group to group sales by customer:

In [None]:
db.Sales.aggregate([{ $group: { _id: "$CustomerID", total: { $sum: "$Total" }}}, { $sort: { total: -1 } }])
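
The two pipeline stages in this query, grouping with a running sum and then sorting by the total, can be modeled in plain JavaScript. This is a sketch under assumptions: `groupAndSum` is a hypothetical helper, and the sample sales are invented for the example.

```javascript
// Plain JavaScript model of a $group + $sum stage followed by a
// descending $sort (not MongoDB itself). groupAndSum is hypothetical.
function groupAndSum(docs, keyField, valueField) {
  const totals = new Map();
  for (const doc of docs) {
    // Add this document's value to its group's accumulated total.
    const key = doc[keyField];
    totals.set(key, (totals.get(key) ?? 0) + doc[valueField]);
  }
  // Emit one result document per group, largest total first.
  return [...totals]
    .map(([_id, total]) => ({ _id, total }))
    .sort((a, b) => b.total - a.total);
}

// Assumed sample data for illustration.
const sales = [
  { CustomerID: 101, Total: 25 },
  { CustomerID: 102, Total: 100 },
  { CustomerID: 101, Total: 40 },
  { CustomerID: 103, Total: 10 },
];

console.log(groupAndSum(sales, "CustomerID", "Total"));
// customer 102 (100), then 101 (65), then 103 (10)
```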

# Inserting Data into a MongoDB Collection

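MongoDB databases store data as documents within a collection. To insert a document (record) into a collection, you use the insert method, specifying the object’s name-value pairs using JSON. (Current versions of the MongoDB shell mark insert as deprecated in favor of insertOne and insertMany, which accept the same document format.) The following statement, for example, inserts a document into the Students collection: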

In [None]:
db.Students.insert( { "StudentID": 601, "Name": "Alvin Dawson", "Year": "F", "Phone": "555-1232", "Email": "AD@gmail.com" });

When you insert document values, the order in which you specify the JSON name-value pairs does not matter. The following statements, with the order of the name-value pairs changed, are each valid:

In [None]:
db.Students.insert({ "StudentID": 601, "Year": "F", "Phone": "555-1312", "Email": "MR@gmail.com", "Name": "Manny Reece"  });
db.Students.insert({ "Name": "Ben Salmon", "StudentID": 701, "Year": "F", "Phone": "555-1362", "Email": "BS@gmail.com" });

MongoDB is often described as “less structured” than a traditional relational database. As such, when you insert a document within a collection, you do not need to provide a fixed number of name-value pairs. Instead, you can specify name-value pairs for all the collection fields, some of the fields, or for even more fields than you thought the collection contained. For example, the following queries will each create documents within the Employees collection:

In [None]:
// #############################
// # Chapter 7 / Deliverable 11
// #############################

db.Employees.insert( { "_id": 50101, "Name": "Julie Adams", "Position": "Sales", "Phone": "555-1316", "Region": "West" } );
db.Employees.insert( { "_id": 52345, "Name": "Bobby Lewis", "Phone": "555-1217", "Region": "East" } );
db.Employees.insert( { "_id": 50301, "Name": "Marvin Train", "Position": "Supervisor", "Phone": "555-3218" } );

As you can see, each insert operation specifies a different number of name-value pairs. One even specifies extra fields. MongoDB will insert each of these documents into the collection.

# Updating a MongoDB Document

Just as there will be times when you must insert new documents into a collection, there will also be times when you must update an existing document. To update a MongoDB document, you use the update method, using JSON to specify the fields and values you desire. (Current versions of the MongoDB shell mark update as deprecated in favor of updateOne and updateMany.) The following query, for example, sets the position of the employee with the ID 12345 to Supervisor:

In [None]:
db.Employees.update({ "_id": 12345}, { $set: { "Position": "Supervisor"}} )

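The query uses \$set to name the fields to change; all other fields in the document are left untouched. If you omit \$set, MongoDB treats the second argument as a complete replacement document rather than a list of changes, so the document's other fields would be lost (some shell versions reject such a call instead).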
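
The difference between changing named fields and replacing a whole document can be modeled in plain JavaScript. Both functions below are hypothetical stand-ins, not MongoDB calls, and the sample employee is invented for the example.

```javascript
// Plain JavaScript model of a $set update vs. a full replacement
// (not MongoDB itself). Both helpers are hypothetical.
function applySet(doc, fields) {
  // Keep every existing field and overwrite only the named ones.
  return { ...doc, ...fields };
}

function replaceDocument(doc, replacement) {
  // Keep only _id; every other field comes from the replacement.
  return { _id: doc._id, ...replacement };
}

// Assumed sample data for illustration.
const employee = { _id: 12345, Name: "Sample Employee", Position: "Sales", Phone: "555-0000" };

console.log(applySet(employee, { Position: "Supervisor" }));
// Name and Phone survive; Position changes
console.log(replaceDocument(employee, { Position: "Supervisor" }));
// only _id and Position remain
```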

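In a similar way, the following query updates employees who have the abbreviation AZ within the State field to use the complete state name instead. By default, update modifies only the first matching document, so the query passes the multi option to change every match:

In [None]:
db.Employees.update({ "State": "AZ" }, { $set: { "State": "Arizona"}}, { multi: true })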

Finally, this query uses $set to update multiple fields for employee 10301:

In [None]:
// #############################
// # Chapter 7 / Deliverable 12
// #############################

db.Employees.update({ "_id": 10301 }, { $set: { "State": "Montana", "Phone": "777-3333" }} )

# Deleting MongoDB Documents 

Just as there will be times when you insert or update documents within a collection, there will also be times when you must delete documents. To delete one or more documents, you use the deleteOne or deleteMany methods. The following query, for example, deletes the employee with the ID 12345:

In [None]:
db.Employees.deleteOne({ "_id": 12345})

Likewise, this query will delete all employees that work in California:

In [None]:
db.Employees.deleteMany( { "State": "California" } )

Be careful when you use the deleteMany method. If you pass it an empty filter document, it deletes all records within the collection:

```javascript
db.Employees.deleteMany({ })
```

# Creating and Dropping MongoDB Collections and Databases

As you have learned, a MongoDB database stores objects within a named collection. To create a collection, you use the db.createCollection("collection name") method. For example, to create a collection named PartTimeEmployees, you would use:

In [None]:
db.createCollection("PartTimeEmployees")

To delete (drop) a collection, you use the drop method. The following query, for example, drops the Students collection:

In [None]:
db.Students.drop() 

Likewise, to delete the current database, you use the db.dropDatabase() method. Be careful when you execute the db.dropDatabase() method, as MongoDB will delete the database even if it contains collections.