MongoDB Best Practices


Introduction

MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.

A record in MongoDB is a document, which is a data structure composed of field and value pairs. The values of fields may include other documents, arrays, and arrays of documents.

The advantages of using documents are:

  • Documents (i.e. objects) correspond to native data types in many programming languages.
  • Embedded documents and arrays reduce the need for expensive joins.
  • Dynamic schema supports fluent polymorphism.

Documents are stored in collections. A collection in MongoDB is comparable to a table in a relational database, with the difference that the documents in a collection do not need to have the same schema.

Every document has the field "_id". If this field is not provided by the user, it will be automatically generated.


Best Practices

Create indexes

Indexes are one of the most important features of any database system. They are key to query performance, so you should create indexes that support your queries. On the other hand, indexes take up space and, more importantly, every additional index must be maintained on each write, which lowers insert performance. Do not create indexes that your queries do not use.

MongoDB always creates a default unique index on the _id field of every collection. This index cannot be dropped.

When you create an index, you can define ascending or descending order for its fields. The following index types are supported:

  • Single field: an index on a single field of the documents.
  • Compound index: an index on multiple fields. The order can be defined for each of the fields.
  • Multikey index: a special type of index that MongoDB creates when you index a field that holds an array.
  • Geospatial index: an index for geospatial coordinate data. MongoDB supports two types of geospatial index:
    • 2d index uses planar geometry.
    • 2dsphere index uses spherical geometry.
  • Text index: an index that supports text search queries on string content. It does not store language-specific stop words such as “the”, “a”, “or”, etc.
  • Hashed index: indexes the hash of a field's value instead of the value itself. Because of this, hashed indexes only support equality matches and cannot support range-based queries.

Moreover, indexes can have special properties:

  • Unique: if an index is defined as unique, documents with duplicate values for the indexed field are rejected.
  • Sparse: if an index is defined as sparse, it skips documents that do not contain the indexed field.
  • TTL: a TTL index makes MongoDB automatically remove documents from a collection after a certain amount of time.

When you create indexes, keep in mind the following limitations:

  • A single collection can have no more than 64 indexes.
  • Fully qualified index names, which include the namespace and the dot separators (i.e. <database name>.<collection name>.$<index name>), cannot be longer than 128 characters.
  • There can be no more than 31 fields in a compound index.
  • Queries cannot use both text and geospatial indexes.
  • Fields with 2dsphere indexes can only hold geometries.
  • In a compound multikey index, each indexed document can have at most one indexed field whose value is an array. You therefore cannot create a compound multikey index if more than one of the fields to be indexed is an array in some document, and once a compound multikey index exists, you cannot insert a document that would violate this restriction.
  • The total size of an index entry must be less than 1024 bytes.
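The numeric limits above are easy to check before calling createIndex(). The following is a minimal sketch in plain JavaScript, run with Node.js and without a server; checkIndexLimits and defaultIndexName are hypothetical helpers, the limit values are the MongoDB 3.x ones listed above, and the database and collection names are only examples.

```javascript
// Hypothetical helper: checks a proposed index against some of the limits
// listed above (MongoDB 3.x values). It does not talk to a server.
const MAX_INDEXES_PER_COLLECTION = 64;
const MAX_COMPOUND_FIELDS = 31;
const MAX_FULL_INDEX_NAME = 128;

function checkIndexLimits(dbName, collName, existingIndexCount, keySpec, indexName) {
  const problems = [];
  if (existingIndexCount + 1 > MAX_INDEXES_PER_COLLECTION) {
    problems.push("too many indexes on collection");
  }
  if (Object.keys(keySpec).length > MAX_COMPOUND_FIELDS) {
    problems.push("too many fields in compound index");
  }
  // Fully qualified name: <database name>.<collection name>.$<index name>
  const fullName = dbName + "." + collName + ".$" + indexName;
  if (fullName.length > MAX_FULL_INDEX_NAME) {
    problems.push("fully qualified index name longer than 128 characters");
  }
  return problems;
}

// Default index names join each field and its direction with underscores.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec).map(([field, dir]) => field + "_" + dir).join("_");
}

const spec = { state: 1, pop: -1 };
console.log(defaultIndexName(spec));                                   // state_1_pop_-1
console.log(checkIndexLimits("test", "zips", 1, spec, defaultIndexName(spec))); // []
```

The existing index count passed in should include the mandatory _id index, since it counts toward the 64-index limit.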

Always use replica sets

A replica set in MongoDB is a group of mongod processes that maintain the same data set.

Replication keeps your data available if a node in your cluster fails. A replica set provides an automatic failover mechanism: if the primary node fails, a secondary node is elected as the new primary and the cluster remains functional.
In some cases, replication can also increase read capacity, because clients can send read operations to different servers.

In a replica set there is always one primary node, which receives write operations, and one or more secondaries, which replicate the data through the oplog, applying the operations asynchronously.
You should replicate with at least 3 nodes in any MongoDB deployment.

If the primary node goes down, a secondary must be elected as the new primary. This is done through an election in which each voting node casts a vote, and a candidate needs a majority of the votes. Therefore, it is highly recommended to have an odd number of voting nodes.
If your replica set has an even number of nodes and you do not want to add more hardware just to break ties in elections, use an arbiter. An arbiter is a replica set member that takes part in elections but does not hold data, so it is not eligible to become primary and does not need powerful hardware.
In any case, remember that although a replica set can have up to 50 members (12 members before version 3.0), no more than seven of them can be voting members.
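The preference for an odd number of voting members follows from majority arithmetic: a node must collect votes from a strict majority of the voting members to become primary. The plain-JavaScript sketch below (hypothetical helper names, runnable with Node.js) computes the majority and how many voters can be unreachable while an election can still succeed.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be unavailable while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5, 7]) {
  console.log(n + " voting members: majority " + majority(n) +
              ", tolerates " + faultTolerance(n) + " unavailable");
}
// Expected (majority / tolerated): 3 -> 2/1, 4 -> 3/1, 5 -> 3/2, 7 -> 4/3
```

Going from 3 to 4 voting members leaves the tolerance at 1, while a fifth voter (for example an arbiter) raises it to 2.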

In some cases you may want secondary nodes that can never become primary. For those nodes, set priority 0.

Additionally, you may want to keep a node for dedicated tasks, such as reporting or backups. In that case, also mark the node as hidden. A hidden member is invisible to client applications, and in a sharded cluster, mongos instances do not interact with hidden members, so a hidden member does not receive read operations from clients. A hidden member must also have priority 0.

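This is an example of hiding the member at index 0 of the members array of a replica set:

cfg = rs.conf()                  // current replica set configuration
cfg.members[0].priority = 0      // a hidden member must never become primary
cfg.members[0].hidden = true     // invisible to client applications
rs.reconfig(cfg)                 // apply the new configuration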

Human errors happen. To help recover from them, provided they are detected within a reasonable time, you can add a delayed member to your replica set.
A delayed member is a hidden member that applies oplog operations with a delay. It therefore reflects an earlier state of the data set, so you can recover from it the data as it was on the other members that long ago.
Choose a delay large enough to cover your expected recovery or maintenance window. On the other hand, the delay must be smaller than the capacity of the oplog.
Nevertheless, delayed members are not recommended in sharded clusters, because chunk migrations may happen during the delay.
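A delayed member is configured like the hidden member in the example above, with an extra delay setting. The sketch below works on a plain object shaped like the document rs.conf() returns, so it runs in Node.js without a server. slaveDelay is the field name used by MongoDB 3.x (later releases renamed it secondaryDelaySecs); makeDelayed, checkDelay and the host names are hypothetical, and checkDelay only encodes the two sizing rules above.

```javascript
// Turns member `index` of a replica set config object (as returned by rs.conf())
// into a delayed member. Delayed members must have priority 0 and should be hidden.
function makeDelayed(cfg, index, delaySecs) {
  const member = cfg.members[index];
  member.priority = 0;
  member.hidden = true;
  member.slaveDelay = delaySecs; // MongoDB 3.x field name
  return cfg;
}

// Hypothetical sizing check: the delay must cover the time needed to notice and
// recover from a mistake, and must be shorter than the oplog window, otherwise
// the operations the member still has to apply may already be gone from the oplog.
function checkDelay(delaySecs, expectedRecoverySecs, oplogWindowSecs) {
  if (delaySecs < expectedRecoverySecs) return "delay too short";
  if (delaySecs >= oplogWindowSecs) return "delay exceeds oplog window";
  return "ok";
}

const cfg = { _id: "rs0", members: [{ _id: 0, host: "db0:27017" }, { _id: 1, host: "db1:27017" }] };
makeDelayed(cfg, 1, 3600);                      // one-hour delay on member 1
console.log(checkDelay(3600, 1800, 24 * 3600)); // ok
// In the mongo shell the modified configuration would be applied with rs.reconfig(cfg).
```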

Turn journaling on by default

MongoDB supports write-ahead journaling of operations to facilitate crash recovery and node durability.

Journaling holds write-ahead redo logs. In the event of a crash, the journal files are used for recovery, which ensures data consistency and durability.

The journaling process differs depending on the storage engine. Journaling is recommended for storage engines that persist data on disk, such as MMAPv1 or the newer WiredTiger.

However, the In-Memory Storage Engine, available since MongoDB Enterprise 3.2.6, has no separate journal, because its data is kept in memory.

Your working set should fit in memory (for MMAPv1)

The working set is the set of data and indexes accessed during normal operations. Proper capacity planning is important for a highly performant application. Being able to keep the working set in memory is an important factor in overall cluster performance.
MongoDB works best when the data set can reside in memory. Nothing performs better than a MongoDB instance that does not require disk I/O. Whenever possible select a platform that has more available RAM than your working data set size.

If you notice the number of page faults increasing, your working set is very likely larger than your available RAM. When this happens, increase the RAM of your instance or, if you can, consider sharding to increase the amount of available RAM in the cluster.

This is important if you are using MMAPv1 storage engine.

Scale up if your metrics show heavy use

If your instance shows a load over 60% - 65%, you should consider scaling up. Your load should stay consistently below this threshold during normal operations; keeping this headroom also matters during recovery and vertical scaling operations.

Once you identify that you need to scale, consider sharding. With sharding, MongoDB distributes the data across the shards of a sharded cluster.

When to shard

In addition to the reasons mentioned above, you should consider sharding if you anticipate a large data set.
Sharding may also help write performance, so you may decide to shard even a small data set if it receives a high volume of updates or inserts.

However, sharding may not be the solution to a performance problem. If performance is worse than expected, you should first review your schema and indexes.

When you decide to shard, one of the most important decisions is the choice of the shard key, since the distribution of the data depends on it and it affects the overall efficiency and performance of operations within the sharded cluster.
Furthermore, the shard key is immutable: once you shard a collection, you cannot change its shard key.

To shard a collection, use the sh.shardCollection() method, specifying the target collection and the shard key.

All sharded collections must have an index that supports the shard key, i.e. an index on the shard key or a compound index where the shard key is a prefix of the index. So, if you want to shard a non-empty collection, you must create that index first. If the collection is empty, MongoDB creates the index if it does not exist.

The choice of a shard key depends on your data and its nature. Keep the following criteria in mind.

The first (and obvious) consideration is that every document in the collection must contain the fields that are part of the shard key; otherwise it would be impossible to know where the document should be located.

When choosing the shard key, consider how important data distribution, write performance and read performance are to you. Depending on that, try to:

  • Create an easily divisible shard key. For this, consider fields (or combinations of fields) that can take a large number of distinct values and are not expected to share the same value across many documents. This provides a good distribution of data.
  • Create a shard key with a high degree of randomness. This way, write operations are distributed across the cluster, preventing a single shard from becoming a bottleneck. This provides write performance.
    • Do not create a monotonically changing shard key. A shard key on a value that always increases or always decreases is likely to send all inserts to a single shard within the cluster, making it a bottleneck.
    • If the nature of your data makes it difficult to choose a sufficiently random shard key, or you are considering a monotonically changing field for other reasons, consider using a hashed index for the shard key. In this case the shard key can only contain a single field.
  • Create a shard key that lets queries target a single shard, so that mongos can route each query to a specific shard. This provides read performance.
  • Create a shard key based on a compound index. Using a group of fields instead of a single field as the shard key makes it easier to get a good shard key.
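The effect of a monotonically increasing shard key can be shown with a small simulation. The sketch below is plain JavaScript for Node.js with hypothetical names: it models three shards that own fixed key ranges and, for the hashed case, uses 32-bit FNV-1a as a stand-in hash (MongoDB's hashed indexes use their own hash function, not FNV-1a). It only illustrates where new inserts land, not how chunks are split or balanced.

```javascript
// Three shards own contiguous key ranges split at 1000 and 2000; the last
// range is unbounded above, like the chunk that ends at MaxKey.
const splitPoints = [1000, 2000];

function rangeShard(key) {
  let shard = 0;
  while (shard < splitPoints.length && key >= splitPoints[shard]) shard++;
  return shard;
}

// 32-bit FNV-1a over the key's string form (stand-in for MongoDB's hash).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Hashed sharding: split the 32-bit hash space into equal ranges, one per shard.
function hashedShard(key, shardCount) {
  return Math.floor((fnv1a(String(key)) / 2 ** 32) * shardCount);
}

// Counts how many keys each of the three shards receives.
function distribute(keys, pickShard) {
  const counts = [0, 0, 0];
  for (const k of keys) counts[pickShard(k)]++;
  return counts;
}

// New inserts with an always-increasing key (e.g. a counter or timestamp).
const newKeys = Array.from({ length: 1000 }, (_, i) => 3000 + i);
console.log("range :", distribute(newKeys, rangeShard)); // [ 0, 0, 1000 ]
console.log("hashed:", distribute(newKeys, k => hashedShard(k, 3)));
```

With the range-based key, every new insert lands on the last shard; with the hashed key, the same inserts are spread over all three shards.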

In addition, when you create the shard key, keep in mind the following limitations:

  • If the shard key index is not a hashed index, then the index, or the prefix of the index that corresponds to the shard key, must have ascending order.
  • A shard key index cannot be a multikey index, a text index or a geospatial index on the shard key fields.
  • As mentioned, the shard key is immutable.
  • The shard key value in a document is immutable, that is, you cannot update the values of the shard key fields.
  • A shard key cannot exceed 512 bytes.
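Several of these rules can be checked on the key pattern before calling sh.shardCollection(). The helpers below are a hypothetical plain-JavaScript sketch for Node.js that covers only the pattern rules above (ascending or hashed fields, a hashed key on a single field) and the 512-byte limit; the size check approximates the key size with the JSON encoding of the shard key values, not the BSON size MongoDB actually measures.

```javascript
// Hypothetical validator for a shard key pattern such as { state: 1, _id: 1 }
// or { userId: "hashed" }, following the 3.x rules listed above.
function validateShardKeyPattern(pattern) {
  const fields = Object.keys(pattern);
  if (fields.length === 0) return "empty shard key";
  const hashed = fields.filter(f => pattern[f] === "hashed");
  if (hashed.length > 0 && fields.length > 1) {
    return "a hashed shard key must be on a single field";
  }
  for (const f of fields) {
    if (pattern[f] !== 1 && pattern[f] !== "hashed") {
      return "field '" + f + "' must be ascending (1) or hashed";
    }
  }
  return "ok";
}

// Rough size check of a document's shard key values (JSON length as a proxy for BSON size).
function shardKeyTooLarge(doc, pattern) {
  const values = Object.keys(pattern).map(f => doc[f]);
  return Buffer.byteLength(JSON.stringify(values), "utf8") > 512;
}

console.log(validateShardKeyPattern({ state: 1, _id: 1 }));       // ok
console.log(validateShardKeyPattern({ pop: -1 }));                // field 'pop' must be ascending (1) or hashed
console.log(validateShardKeyPattern({ _id: "hashed", city: 1 })); // a hashed shard key must be on a single field
```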

Hardware

Keep each mongo instance on its own machine

MongoDB instances try to use as many resources as they can, so you should not run more than one instance on the same machine.
If you run more than one mongod instance on a single machine, all of those instances will contend for the same resources.

Don't run MongoDB on 32-bit systems

MongoDB has a 2 GB data limit on 32-bit systems, and 32-bit systems also have memory limitations, so you should run MongoDB on a 64-bit platform, not a 32-bit one.

Furthermore, since version 3.0 MongoDB does not provide commercial support for 32-bit platforms, and starting in MongoDB 3.2, 32-bit binaries are deprecated and will be unavailable in future releases.

Use Solid State Disks (SSD)

MongoDB has good results and a good price-performance ratio with SATA SSD.

Use SSD if available and economical. Spinning disks can be performant, but the capacity of SSD drives for random I/O operations works well with the update model of MMAPv1.

Commodity (SATA) spinning drives are often a good option, as the random I/O performance increase with more expensive spinning drives is not that dramatic (only on the order of 2x). But using SSD drives may be more effective in increasing I/O throughput.

Prefer local disks

Local disks should be used if possible as network storage can cause high latency and poor performance for your deployment.

With the MMAPv1 storage engine, the Network File System protocol (NFS) is not recommended, as you may see performance problems when both the data files and the journal files are hosted on NFS. You may get better performance if you place the journal on local or iSCSI volumes.

With the WiredTiger storage engine, WiredTiger objects may be stored on remote file systems if the remote file system conforms to ISO/IEC 9945-1:1996 (POSIX.1). Because remote file systems are often slower than local file systems, using a remote file system for storage may degrade performance.

If you decide to use NFS, add the following NFS options to your /etc/fstab file: bg, nolock, and noatime.

Use RAID-10

With solid state drives seek-time is significantly reduced. SSDs also provide more gradual performance degradation if the working set no longer fits in memory.

If you use disk arrays, the recommendation is RAID-10, as RAID-5 and RAID-6 do not provide sufficient performance. RAID-0 offers good write performance but limited read performance and insufficient fault tolerance.

Separate Components onto Different Storage Devices

For improved performance, consider separating your database’s data, journal, and logs onto different storage devices, based on your application’s access and write pattern. Mount the components as separate filesystems and use symbolic links to map each component’s path to the device storing it.

For the WiredTiger storage engine, you can also store the indexes on a different storage device.

Swap

Assign swap space for your systems. Allocating swap space can avoid issues with memory contention and can prevent the OOM Killer on Linux systems from killing mongod.

For the MMAPv1 storage engine, the method mongod uses to map files to memory ensures that the operating system will never store MongoDB data in swap space. On Windows systems, using MMAPv1 requires extra swap space due to commitment limits.

For the WiredTiger storage engine, given sufficient memory pressure, WiredTiger may store data in swap space.

MongoDB and NUMA Hardware

Running MongoDB on a system with Non-Uniform Memory Access (NUMA) can cause a number of operational problems, including periods of slow performance and high system process usage.

When running MongoDB servers and clients on NUMA hardware, you should configure a memory interleave policy so that the host behaves in a non-NUMA fashion. MongoDB checks NUMA settings on start up when deployed on Linux (since version 2.0) and Windows (since version 2.6) machines. If the NUMA configuration may degrade performance, MongoDB prints a warning.

CPU

MongoDB will deliver better performance on faster CPUs. When running a MongoDB instance with the majority of the data being in memory, clock speed can have a major impact on overall performance.

For the MMAPv1 storage engine, the core speed is clearly more important than the number of cores. Increasing the number of cores will not improve performance significantly.

However, the WiredTiger storage engine is multithreaded and can take advantage of additional CPU cores.

Monitoring and testing

An important thing to do once you have determined your deployment strategy is to test it with a data set similar to your production data.

Test within the context of your application and against traffic patterns that are representative of your production system. A test environment that does not resemble your production traffic will block you from discovering performance bottlenecks and architectural design flaws.

MongoDB provides some tools that will help you to test and monitor your deployment.

MongoDB also offers products such as MongoDB Cloud Manager (formerly MongoDB MMS), a SaaS tool that monitors your MongoDB cluster and makes it easy to see what is going on in a production deployment, and MongoDB Ops Manager, available in MongoDB Enterprise Advanced.

Some useful monitoring utilities in MongoDB are:

  • mongostat captures and returns the counts of database operations by type (e.g. insert, query, update, delete, etc.). These counts report on the load distribution on the server.
  • mongotop tracks and reports the current read and write activity of a MongoDB instance, and reports these statistics on a per-collection basis.
  • HTTP console (deprecated since version 3.2) is an HTTP interface for MongoDB that exposes diagnostic and monitoring information in a simple web page.

MongoDB also provides some useful commands:

  • serverStatus returns a general overview of the status of the database, detailing disk usage, memory use, connections, journaling, and index access. The command returns quickly and does not impact MongoDB performance.
  • dbStats returns a document that addresses storage use and data volumes. The dbStats output reflects the amount of storage used, the quantity of data contained in the database, and object, collection, and index counters.
  • collStats provides statistics that resemble dbStats at the collection level, including a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about its indexes.
  • replSetGetStatus returns an overview of your replica set's status. The replSetGetStatus document details the state and configuration of the replica set and statistics about its members.
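As an example of post-processing one of these command results, the sketch below computes replication lag from a replSetGetStatus document. It assumes the members array with name, stateStr and optimeDate fields that rs.status() returns in MongoDB 3.x; replicationLag is a hypothetical helper, the sample document is made up, and the code is plain JavaScript so it can be tried in Node.js without a server.

```javascript
// Replication lag of each secondary, in seconds, relative to the primary's
// last applied operation. `status` is shaped like the replSetGetStatus output.
function replicationLag(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no primary in replica set status");
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({ name: m.name, lagSecs: (primary.optimeDate - m.optimeDate) / 1000 }));
}

// Made-up sample; in the mongo shell you would pass rs.status() instead.
const sampleStatus = {
  set: "rs0",
  members: [
    { name: "db0:27017", stateStr: "PRIMARY",   optimeDate: new Date("2016-05-01T10:00:30Z") },
    { name: "db1:27017", stateStr: "SECONDARY", optimeDate: new Date("2016-05-01T10:00:28Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2016-05-01T10:00:30Z") },
  ],
};
console.log(replicationLag(sampleStatus));
// [ { name: 'db1:27017', lagSecs: 2 }, { name: 'db2:27017', lagSecs: 0 } ]
```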

There are also some third party tools:

  • Self Hosted (you must install, configure and maintain on your own servers). Most are open source.
    • mongodb-ganglia: Python script to report operations per second, memory usage, btree statistics, master/slave status and current connections.
    • gmond_python_modules: Parses output from the serverStatus and replSetGetStatus commands.
    • motop: Realtime monitoring tool for MongoDB servers. Shows current operations ordered by durations every second.
    • mtop: A top like tool.
    • mongo-munin: Retrieves server statistics.
    • mongomom: Retrieves collection statistics (sizes, index sizes, and each (configured) collection count for one DB).
    • nagios-plugin-mongodb: A simple Nagios check script, written in Python.
    • spm-agent-mongodb: monitoring, anomaly detection and alerting. SPM monitors all key MongoDB metrics together with infrastructure metrics (including Docker) and other application metrics, e.g. Node.js, Java, NGINX, Apache, HAProxy or Elasticsearch. SPM is available on premises and in the cloud (SaaS) and provides correlation of metrics and logs.
  • Hosted (SaaS) monitoring tools. These are monitoring tools provided as a hosted service, usually through a paid subscription.

MongoDB can also provide database metrics via SNMP, available for Linux and Windows, but only in MongoDB Enterprise Advanced.

Keep current with versions

Keep your version of MongoDB current. Each release has significant performance enhancements, improvements and fixes.


Storage Engine

This component is responsible for managing how data is stored, both in memory and on disk. MongoDB supports several storage engines, and you can choose the one that best fits your application to improve performance. In MongoDB 3.2 the default storage engine is WiredTiger, which provides a document-level concurrency model, checkpointing, compression and other features. In earlier versions the default storage engine is MMAPv1, which works well with high volumes of reads and writes, as well as in-place updates.

WiredTiger Storage Engine

WiredTiger is designed to maximize the performance of multi-core hardware and to minimize disk access through a compact file format and data compression. Its main features are detailed below:

Document level concurrency

WiredTiger uses document-level concurrency control for write operations. As a result, multiple clients can modify different documents of a collection at the same time, which substantially increases the performance of MongoDB.

For most read and write operations, WiredTiger uses optimistic concurrency control. WiredTiger uses only intent locks at the global, database and collection levels. When the storage engine detects conflicts between two operations, one will incur a write conflict causing MongoDB to transparently retry that operation.

Snapshots and Checkpoints

WiredTiger uses MultiVersion Concurrency Control (MVCC): at the start of an operation, it provides a point-in-time snapshot of the data that presents a consistent view of the in-memory data.

When writing a checkpoint, WiredTiger writes all the data in a snapshot to disk. This data then acts as a checkpoint that ensures the data files are consistent up to and including the last checkpoint, so a checkpoint can be used for recovery. MongoDB configures WiredTiger to create checkpoints at intervals of 60 seconds or 2 gigabytes of journal data.

The new checkpoint becomes accessible and permanent when WiredTiger's metadata table is atomically updated to reference it. Once the new checkpoint is accessible, WiredTiger frees pages from the old checkpoints.

With WiredTiger, MongoDB can recover from the last checkpoint, but all data written since that checkpoint is lost. To recover that data as well, you must use journaling.
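The interplay between checkpoints and the journal can be illustrated with a toy model. This is not WiredTiger's implementation: the plain-JavaScript sketch below (hypothetical names, runnable with Node.js) only encodes the rules described above, namely that a checkpoint is due every 60 seconds or after 2 GB of journal data, that recovery without a journal restores only the last checkpoint, and that recovery with a journal also replays the writes logged after it. It assumes every write already reached the on-disk journal.

```javascript
const CHECKPOINT_INTERVAL_SECS = 60;
const CHECKPOINT_JOURNAL_BYTES = 2 * 1024 ** 3; // 2 GB

// A checkpoint is due when either threshold is reached.
function checkpointDue(secsSinceLast, journalBytesSinceLast) {
  return secsSinceLast >= CHECKPOINT_INTERVAL_SECS ||
         journalBytesSinceLast >= CHECKPOINT_JOURNAL_BYTES;
}

// Each write has a time `t` in seconds. Writes up to and including
// `checkpointTime` are in the data files; later ones only exist in the journal
// (assumed here to have been flushed to disk before the crash).
function recover(writes, checkpointTime, journalEnabled) {
  return writes.filter(w => w.t <= checkpointTime || journalEnabled);
}

const writes = [{ t: 10, op: "a" }, { t: 55, op: "b" }, { t: 70, op: "c" }, { t: 95, op: "d" }];
// Crash at t = 100 with the last checkpoint taken at t = 60:
console.log(recover(writes, 60, false).map(w => w.op)); // [ 'a', 'b' ]
console.log(recover(writes, 60, true).map(w => w.op));  // [ 'a', 'b', 'c', 'd' ]
```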

Journal

WiredTiger uses a write-ahead transaction log in combination with checkpoints to ensure data durability. It uses the journal to persist all write operations made between checkpoints. The journal is compressed using the snappy library. Journaling is important for standalone instances, to avoid losing data if mongod crashes between checkpoints. It is less critical for replica set members, because the replication process can provide sufficient durability for the data.

Compression

WiredTiger uses block compression with the snappy library for all collections and prefix compression for all indexes. Collections can also use the zlib compression library instead. Compression settings are also configurable per collection and per index, during collection and index creation.
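Per-collection compression is set through the storageEngine option of db.createCollection(), using a WiredTiger configuration string. The helper below is a hypothetical plain-JavaScript sketch (runnable with Node.js) that only builds that options document; the block_compressor values none, snappy and zlib are the ones available in the 3.x releases this guide covers, and the collection name in the comment is just an example.

```javascript
// Builds the options document for db.createCollection() that selects the
// WiredTiger block compressor for a single collection.
function compressedCollectionOptions(compressor) {
  const allowed = ["none", "snappy", "zlib"];
  if (!allowed.includes(compressor)) {
    throw new Error("unsupported block compressor: " + compressor);
  }
  return { storageEngine: { wiredTiger: { configString: "block_compressor=" + compressor } } };
}

const opts = compressedCollectionOptions("zlib");
console.log(JSON.stringify(opts));
// {"storageEngine":{"wiredTiger":{"configString":"block_compressor=zlib"}}}
// In the mongo shell: db.createCollection("zips_archive", opts)
```

zlib usually compresses better than snappy at a higher CPU cost, so it tends to suit collections that are written once and read rarely.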

Memory use

With WiredTiger, MongoDB uses both the WiredTiger internal cache and the filesystem cache. By default, the WiredTiger cache uses:

  • In version 3.0, 50% of RAM, or 1 GB, whichever is larger.
  • In version 3.2, 60% of RAM minus 1 GB, or 1GB, whichever is larger.
  • In version 3.4, 50% of RAM minus 1 GB, or 256MB, whichever is larger.
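As a quick check of these formulas, the sketch below computes the default cache size for a given amount of RAM, with sizes in GB (256 MB written as 0.25 GB). defaultWiredTigerCacheGB is a hypothetical plain-JavaScript helper for Node.js; if the default does not suit your deployment, the cache size can be set explicitly with the storage.wiredTiger.engineConfig.cacheSizeGB option.

```javascript
// Default WiredTiger cache size in GB for `ramGB` of RAM, following the
// per-version rules listed above.
function defaultWiredTigerCacheGB(ramGB, version) {
  switch (version) {
    case "3.0": return Math.max(0.5 * ramGB, 1);
    case "3.2": return Math.max(0.6 * ramGB - 1, 1);
    case "3.4": return Math.max(0.5 * ramGB - 1, 0.25);
    default: throw new Error("the rules above only cover 3.0, 3.2 and 3.4");
  }
}

for (const v of ["3.0", "3.2", "3.4"]) {
  // Roughly: 16 GB of RAM -> 8, 8.6 and 7 GB; 1 GB of RAM -> 1, 1 and 0.25 GB
  console.log(v, defaultWiredTigerCacheGB(16, v), defaultWiredTigerCacheGB(1, v));
}
```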

MMAPv1 Storage Engine

MMAPv1 is MongoDB’s original storage engine based on memory mapped files. It excels at workloads with high volume inserts, reads, and in-place updates.

Journal

MongoDB, by default, records all modifications to an on-disk journal. MongoDB writes more frequently to the journal than it writes the data files (MongoDB writes to the data files on disk every 60 seconds and writes to the journal files roughly every 100 milliseconds). The journal allows MongoDB to successfully recover data from data files after a mongod instance exits without flushing all changes.

Record Storage Characteristics

All records are contiguously located on disk, and when a document becomes larger than the allocated record, MongoDB must allocate a new record.

New allocations require MongoDB to move a document and update all indexes that refer to the document, which takes more time than in-place updates and leads to storage fragmentation. To avoid this situation we can use two different record allocation strategies:

  • Power of 2 sized allocations: Because documents in MongoDB may grow after insertion and all records are contiguous on disk, the padding can reduce the need to relocate documents on disk following updates. Relocations are less efficient than in-place updates and can lead to storage fragmentation. As a result, all padding strategies trade additional space for increased efficiency and decreased fragmentation.

    With the power of 2 sizes allocation strategy, each record has a size in bytes that is a power of 2 (e.g. 32, 64, 128, 256, 512 ... 2 MB). For documents larger than 2 MB, the allocation is rounded up to the nearest multiple of 2 MB.
    This allows the document to grow without having to reallocate it and to reduce fragmentation, because a new document can reuse freed records.

    This strategy works more efficiently for insert/update/delete workloads.

  • No padding allocation: This strategy can be used for collections whose workloads do not change the document sizes, such as workloads that consist of insert-only operations or update operations that do not increase document size.
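The power of 2 sizing rule described above can be written as a short function. This is a simplified plain-JavaScript sketch for Node.js (powerOf2RecordSize is a hypothetical name) that ignores the record header MMAPv1 adds; under the no padding strategy, the record size would simply be the document size.

```javascript
const MB = 1024 * 1024;
const MIN_RECORD = 32;   // smallest size in the sequence 32, 64, 128, ...
const MAX_POW2 = 2 * MB; // above this, round up to a multiple of 2 MB

// Record size allocated for a document of `docBytes` bytes under the
// power of 2 sized allocation strategy.
function powerOf2RecordSize(docBytes) {
  if (docBytes > MAX_POW2) {
    return Math.ceil(docBytes / MAX_POW2) * MAX_POW2;
  }
  let size = MIN_RECORD;
  while (size < docBytes) size *= 2;
  return size;
}

console.log(powerOf2RecordSize(100));    // 128
console.log(powerOf2RecordSize(3 * MB)); // 4194304 (4 MB)
```

A 100-byte document can therefore grow by 28 bytes before it has to be moved, and a freed 128-byte record can be reused by any document of 65 to 128 bytes.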

Memory use

MMAPv1 uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server’s RAM, MongoDB will yield cached memory to the other process. Technically, the operating system’s virtual memory subsystem manages MongoDB’s memory. This means that MongoDB will use as much free memory as it can, swapping to disk as needed. Deployments with enough memory to fit the application’s working data set in RAM will achieve the best performance.

In-Memory Storage Engine

The in-memory storage engine is generally available (GA) in the 64-bit builds. It does not persist data on disk, including configuration data, indexes, user credentials, etc.

By avoiding disk I/O, the in-memory storage engine allows for more predictable latency of database operations.

Concurrency

The in-memory storage engine uses document-level concurrency control for write operations. As a result, multiple clients can modify different documents of a collection at the same time.

Memory use

By default, the in-memory storage engine uses 50% of physical RAM minus 1 GB.

Durability

The in-memory storage engine is non-persistent and does not write data to persistent storage. Non-persisted data includes application data and system data, such as users, permissions, indexes, replica set configuration, sharded cluster configuration, etc. As such, the concept of a journal or of waiting for data to become durable does not apply to the in-memory storage engine.

Write operations that specify a journaled write concern (j: true) are acknowledged immediately. When a mongod instance shuts down, either as a result of the shutdown command or due to a system error, recovery of in-memory data is impossible.

CRUD cheatsheet

Let's take a look at the CRUD operations in MongoDB. Through these operations we can read and manipulate the content of our collections. A collection stores a set of documents that should be related to each other (although MongoDB is schemaless, a collection should store documents with the same functional meaning).

Here we have some records of the zips collection as an example:

[
	{"_id":"07017","city":"EAST ORANGE","loc":[-74.207723,40.769614],"pop":41737,"state":"NJ"},
	{"_id":"06040","city":"MANCHESTER","loc":[-72.52444,41.777732],"pop":51618,"state":"CT"},
	{"_id":"10011","city":"NEW YORK","loc":[-73.99963,40.740225],"pop":46560,"state":"NY"},
	{"_id":"10468","city":"BRONX","loc":[-73.900259,40.866231],"pop":65854,"state":"NY"},
	{"_id":"11225","city":"BROOKLYN","loc":[-73.954588,40.662776],"pop":66752,"state":"NY"},
	{"_id":"22110","city":"MANASSAS","loc":[-77.489474,38.768922],"pop":50680,"state":"VA"},
	{"_id":"37311","city":"CLEVELAND","loc":[-84.875006,35.131257],"pop":40633,"state":"TN"},
						.
						.
						.
	{"_id":"99205","city":"SPOKANE","loc":[-117.439912,47.69641],"pop":42032,"state":"WA"}						
]

MongoDB provides a series of commands for performing operations to create, delete, query and update documents. Please bear in mind that these operations are based on the version 3.2 of MongoDB. These operations are summarized below:

Create documents

This operation inserts a document into a given collection. If the collection does not exist yet, the insert operation creates it. Each document stored in a collection requires a unique _id field that acts as its primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for it. All write operations in MongoDB are atomic at the level of a single document.

MongoDB provides the following methods for inserting documents into a collection:

db.collection.insertOne()

This operation inserts a single document into a collection.

// loc is [longitude, latitude], as in the zips documents above
db.getCollection('zips').insertOne({_id:"28015", city:"Madrid", loc:[-3.691944,40.418889], pop:6543031})

db.collection.insertMany()

It inserts a set of documents, passed in an array, into a collection:

db.getCollection('zips').insertMany([
  {"_id":"27001","city":"Lugo","loc":[-7.557222,43.011667],"pop":98134},
  {"_id":"27002","city":"Lugo","loc":[-7.557222,43.011667],"pop":98134},
  {"_id":"27003","city":"Lugo","loc":[-7.557222,43.011667],"pop":98134},
  {"_id":"27004","city":"Lugo","loc":[-7.557222,43.011667],"pop":98134}
])

db.collection.insert()

This operation inserts a single document or multiple documents into a collection. To insert a single document, pass a document to the method; to insert multiple documents, pass an array of documents to the method.

db.getCollection('zips').insert({_id:"28015", city:"Madrid", loc:[-3.691944,40.418889], pop:6543031})
db.getCollection('zips').insert([
  {"_id":"27001","city":"Lugo","loc":[-7.557222,43.011667],"pop":98134},
  {"_id":"27002","city":"Lugo","loc":[-7.557222,43.011667],"pop":98134},
  {"_id":"27003","city":"Lugo","loc":[-7.557222,43.011667],"pop":98134},
  {"_id":"27004","city":"Lugo","loc":[-7.557222,43.011667],"pop":98134}
])

First of all, don't worry about the data modeling of the examples; it does not matter here whether the zips collection is modeled appropriately.

Also bear in mind that documents can be created in other ways. Update operations with the upsert option set to true insert a new document when no existing document matches the query.
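The upsert behaviour can be sketched in plain JavaScript over an in-memory Map keyed by _id. upsertById is a hypothetical helper that only handles an equality filter on _id and $set-style fields, and its return value only loosely mimics the matchedCount and upsertedId fields of updateOne(); it shows that a matching document is updated, while a missing one is inserted from the filter plus the $set fields. In the mongo shell, the second call would correspond to db.getCollection('zips').updateOne({_id: "08001"}, {$set: {city: "Barcelona"}}, {upsert: true}).

```javascript
// In-memory stand-in for a collection: a Map from _id to document.
function upsertById(collection, id, setFields) {
  const existing = collection.get(id);
  if (existing) {
    Object.assign(existing, setFields);          // matched: apply the $set fields
    return { matchedCount: 1, upsertedId: null };
  }
  collection.set(id, { _id: id, ...setFields }); // no match: insert filter field + $set fields
  return { matchedCount: 0, upsertedId: id };
}

const zipsById = new Map([["28015", { _id: "28015", city: "Madrid", pop: 6543031 }]]);
console.log(upsertById(zipsById, "28015", { pop: 6550000 }));      // { matchedCount: 1, upsertedId: null }
console.log(upsertById(zipsById, "08001", { city: "Barcelona" })); // { matchedCount: 0, upsertedId: '08001' }
```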

Read documents

MongoDB provides the db.collection.find() method to read documents from a collection. The db.collection.find() method returns a cursor to the matching documents.

db.collection.find( <query filter>, <projection> )

The db.collection.find() method takes the following optional parameters:

  • a query filter to specify which documents to return.

  • a query projection to specify which fields from the matching documents to return. The projection limits the amount of data that MongoDB returns to the client over the network.

You can optionally add a cursor modifier to impose limits, skips, and sort orders:

  • sort

This operation specifies the order in which the query returns matching documents. In the sort parameter, list the field or fields to sort by, each with a value of 1 for ascending or -1 for descending order.

db.orders.find().sort( { amount: -1 } )

The following example specifies a descending sort by the age field and then an ascending sort by the posts field:

db.user.find().sort( { age : -1, posts: 1 } )

  • limit

This operation specifies the maximum number of documents the cursor will return.

db.collection.find(<query>).limit(<number>)
  • skip

This operation controls where MongoDB begins returning results, which is useful for implementing “paged” results. It takes a numeric value: the number of documents to skip.

Consider the following JavaScript function as an example of the skip function:

function printStudents(pageNumber, nPerPage) {
   print("Page: " + pageNumber);
   // Pages are numbered from 1: skip the documents of all previous pages, then return one page.
   db.students.find()
      .skip(pageNumber > 0 ? ((pageNumber - 1) * nPerPage) : 0)
      .limit(nPerPage)
      .forEach( function(student) { print(student.name + "<p>"); } );
}
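To make the interplay of these modifiers concrete, here is a plain JavaScript sketch that emulates a cursor over an in-memory array. It is not MongoDB's implementation, and the helper names compareBy and page and the sample users are hypothetical. It mirrors the server's behavior of applying the sort first, then the skip, then the limit, whatever order the cursor methods are chained in.

```javascript
// Build a comparator from a MongoDB-style sort spec such as { age: -1, posts: 1 }.
// Keys are compared in the order they appear; 1 = ascending, -1 = descending.
// Simplification: only numbers and strings are compared.
function compareBy(sortSpec) {
  const keys = Object.keys(sortSpec);
  return (a, b) => {
    for (const key of keys) {
      if (a[key] < b[key]) return -sortSpec[key];
      if (a[key] > b[key]) return sortSpec[key];
    }
    return 0; // equal on every sort key
  };
}

// Emulate find().sort(spec).skip(...).limit(nPerPage) for a page numbered from 1,
// using the same skip arithmetic as printStudents() above.
function page(docs, sortSpec, pageNumber, nPerPage) {
  const skip = pageNumber > 0 ? (pageNumber - 1) * nPerPage : 0;
  return [...docs].sort(compareBy(sortSpec)).slice(skip, skip + nPerPage);
}

const users = [
  { name: "ana", age: 30, posts: 5 },
  { name: "bob", age: 25, posts: 2 },
  { name: "eva", age: 30, posts: 1 },
  { name: "joe", age: 41, posts: 9 },
];

// Descending by age, then ascending by posts: joe, eva, ana, bob.
console.log(page(users, { age: -1, posts: 1 }, 1, 2).map(u => u.name)); // [ 'joe', 'eva' ]
console.log(page(users, { age: -1, posts: 1 }, 2, 2).map(u => u.name)); // [ 'ana', 'bob' ]
```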

Now let's look at a few uses of the find() operation.

Select All Documents in a Collection

An empty query filter document ({}) selects all documents in the collection:

db.users.find( {} )

Specify Query Filter Conditions

You can use a query filter to specify conditions for the query:

Specify Equality Condition
db.cities.find( { name: "Roma" } )
Specify Conditions Using Query Operators

A query filter document can use query operators to specify conditions. The following example selects documents whose status is either "P" or "D". Although you could express this query with the $or operator, use the $in operator rather than $or when performing equality checks on the same field.

db.users.find( { status: { $in: [ "P", "D" ] } } )
Specify OR Conditions

Using the $or operator, you can specify a compound query that joins each clause with a logical OR conjunction so that the query selects the documents in the collection that match at least one condition.

db.zips.find({$or:[{pop: {$lt:2000}}, {state: "MA"}]})
Specify AND as well as OR Conditions

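With additional clauses, you can specify precise conditions for matching documents. The following query selects documents whose city is "BELCHERTOWN" and that either have a pop less than 30000 or a state of "MA":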

db.getCollection('zips').find({ city:"BELCHERTOWN", $or: [ { pop: { $lt: 30000 } }, { state: "MA" } ]  })

Query on Embedded Documents

When the field holds an embedded document, a query can either specify an exact match on the embedded document or specify a match by individual fields in the embedded document using the dot notation.

Exact Match on the Embedded Document

To specify an exact equality match on the whole embedded document, use the query document { <field>: <value> } where <value> is the document to match. Equality matches on an embedded document require an exact match of the specified <value>, including the field order.

db.users.find( { favorites: { artist: "Picasso", food: "pizza" } } )
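Because an exact match compares the whole embedded document, a document with the same fields in a different order does not match. The plain JavaScript sketch below mimics that rule for JSON-like values; the helper exactMatch is hypothetical, and it inherits JavaScript's key ordering, which moves integer-like keys to the front.

```javascript
// Hypothetical helper: BSON-style equality for plain JSON-like values.
// Documents must have the same keys in the same order; arrays must match element by element.
function exactMatch(a, b) {
  if (Array.isArray(a) || Array.isArray(b)) {
    return Array.isArray(a) && Array.isArray(b) &&
      a.length === b.length && a.every((v, i) => exactMatch(v, b[i]));
  }
  if (a !== null && b !== null && typeof a === "object" && typeof b === "object") {
    const ka = Object.keys(a);
    const kb = Object.keys(b);
    return ka.length === kb.length &&
      ka.every((k, i) => k === kb[i] && exactMatch(a[k], b[k]));
  }
  return a === b;
}

const stored = { favorites: { artist: "Picasso", food: "pizza" } };
console.log(exactMatch(stored.favorites, { artist: "Picasso", food: "pizza" })); // true
console.log(exactMatch(stored.favorites, { food: "pizza", artist: "Picasso" })); // false: order differs
console.log(exactMatch(stored.favorites, { artist: "Picasso" }));                // false: missing field
```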
Equality Match on Fields within an Embedded Document

Use the dot notation to match by specific fields in an embedded document. Equality matches for specific fields in an embedded document will select documents in the collection where the embedded document contains the specified fields with the specified values.

db.users.find( { "favorites.sports": "Football" } )

Query on Arrays

When the field holds an array, you can query for an exact array match or for specific values in the array. If the array holds embedded documents, you can query for specific fields in the embedded documents using dot notation.

Exact Match on an Array

To match an array exactly, specify the entire array, including the order of its elements.

db.employee.find({courses: ["Python","MongoDB"]})
Match an Array Element

Equality matches can specify a single element in the array to match. These specifications match if the array contains at least one element with the specified value.

db.employee.find({courses:"NodeJs"})
Match a Specific Element of an Array

Using the dot notation, an equality condition can target the element at a particular index (position) of the array. Array indexes start at 0.

db.employee.find({"courses.0":"Polymer"})
Specify Multiple Criteria for Array Elements

Single element satisfies the criteria

Use $elemMatch operator to specify multiple criteria on the elements of an array such that at least one array element satisfies all the specified criteria.

db.employee.find({ certification_notes: { $elemMatch: { $gt: 5, $lte: 10 } } })

Combination of elements satisfies the criteria

If you don't use the $elemMatch operator one element can satisfy one condition and another element can satisfy the other condition, or a single element can satisfy both:

db.user.find({ finished: { $gt: 15, $lt: 20 } })
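The difference is easiest to see with an array such as [ 10, 25 ]: no single element lies between 15 and 20, yet 25 satisfies $gt: 15 and 10 satisfies $lt: 20. The plain JavaScript sketch below emulates both behaviors; the helper names are hypothetical and only $gt and $lt are supported.

```javascript
// Does a single value satisfy every operator in a condition like { $gt: 15, $lt: 20 }?
function satisfies(value, cond) {
  return Object.entries(cond).every(([op, arg]) => {
    if (op === "$gt") return value > arg;
    if (op === "$lt") return value < arg;
    throw new Error("unsupported operator " + op);
  });
}

// { field: { $elemMatch: cond } }: one element must satisfy all the conditions.
function elemMatch(array, cond) {
  return array.some(v => satisfies(v, cond));
}

// { field: cond } on an array: each condition may be met by a different element.
function combinedMatch(array, cond) {
  return Object.entries(cond).every(([op, arg]) =>
    array.some(v => satisfies(v, { [op]: arg })));
}

const finished = [10, 25];
console.log(elemMatch(finished, { $gt: 15, $lt: 20 }));     // false
console.log(combinedMatch(finished, { $gt: 15, $lt: 20 })); // true
```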
Array of Embedded Documents

Consider the following document from a restaurants collection:

{
	"_id" : ObjectId("585a53a9bfd4a35c91fd9cac"),
	"address" : {
		"building" : "1007",
		"coord" : [
			-73.856077,
			40.848447
		],
		"street" : "Morris Park Ave",
		"zipcode" : "10462"
	},
	"borough" : "Bronx",
	"cuisine" : "Bakery",
	"grades" : [
		{
			"date" : ISODate("2014-03-03T00:00:00Z"),
			"grade" : "A",
			"score" : 2
		},
		{
			"date" : ISODate("2013-09-11T00:00:00Z"),
			"grade" : "A",
			"score" : 6
		},
		{
			"date" : ISODate("2013-01-24T00:00:00Z"),
			"grade" : "A",
			"score" : 10
		},
		{
			"date" : ISODate("2011-11-23T00:00:00Z"),
			"grade" : "A",
			"score" : 9
		},
		{
			"date" : ISODate("2011-03-10T00:00:00Z"),
			"grade" : "B",
			"score" : 14
		}
	],
	"name" : "Morris Park Bake Shop",
	"restaurant_id" : "30075445"
}

Match a field in the embedded document using the array index

If you know the array index of the embedded document, you can specify the document using the embedded document’s position using the dot notation.

db.restaurants.find( { 'grades.0.score': { $lte: 55 } } )

Match a field in the embedded documents without specifying the array index

If you do not know the index position of the document in the array, concatenate the name of the field that contains the array, with a dot (.) and the name of the field in the embedded document.

db.restaurants.find( { 'grades.score': { $lte: 55 } } )
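To contrast the two paths: a numeric segment such as 0 selects a single array position, while a field name applied to an array is evaluated against every element. The plain JavaScript sketch below resolves dotted paths that way; the helpers valuesAtPath and matchesLte and the sample document are hypothetical, and MongoDB's handling of nested arrays is more involved than shown.

```javascript
// Collect the values that a dotted path (already split into segments) reaches.
// Simplified: a numeric segment indexes an array; a non-numeric segment applied
// to an array is applied to each element in turn.
function valuesAtPath(value, segments) {
  if (segments.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = segments;
  if (Array.isArray(value) && !/^\d+$/.test(head)) {
    return value.flatMap(el => valuesAtPath(el, segments));
  }
  return head in value ? valuesAtPath(value[head], rest) : [];
}

// { path: { $lte: limit } } matches if any value reached by the path is <= limit.
function matchesLte(doc, path, limit) {
  return valuesAtPath(doc, path.split(".")).some(v => typeof v === "number" && v <= limit);
}

const restaurant = { grades: [ { grade: "B", score: 60 }, { grade: "A", score: 9 } ] };
console.log(matchesLte(restaurant, "grades.0.score", 55)); // false: only element 0 is checked
console.log(matchesLte(restaurant, "grades.score", 55));   // true: element 1 has score 9
```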
Specify Multiple Criteria for Array of Documents

Single element satisfies the criteria

Use $elemMatch operator to specify multiple criteria on an array of embedded documents such that at least one embedded document satisfies all the specified criteria.

db.restaurants.find( { grades: { $elemMatch: { score: { $lte: 70 }, grade: "A" } } } )

Combination of elements satisfies the criteria

If you don't use the $elemMatch operator one element can satisfy one condition and another element can satisfy the other condition, or a single element can satisfy both:

db.restaurants.find( { "grades.score": { $lte: 70 } , "grades.grade": "A"} )

Query operators

Comparison
  • $eq

Specifies equality condition. The $eq operator matches documents where the value of a field equals the specified value.

db.inventory.find( { qty: { $eq: 20 } } )
  • $gt

Selects those documents where the value of the field is greater than the specified value.

db.inventory.find( { qty: { $gt: 20 } } )
  • $gte

Selects the documents where the value of the field is greater than or equal to (i.e. >=) the specified value.

db.inventory.find( { qty: { $gte: 20 } } )
  • $lt

Selects the documents where the value of the field is less than the specified value.

db.inventory.find( { qty: { $lt: 20 } } )
  • $lte

Selects the documents where the value of the field is less than or equal to the specified value.

db.inventory.find( { qty: { $lte: 20 } } )
  • $ne

Selects the documents where the value of the field is not equal (i.e. !=) to the specified value. This includes documents that do not contain the field.

db.inventory.find( { qty: { $ne: 20 } } )
  • $in

The $in operator selects the documents where the value of a field equals any value in the specified array.

db.inventory.update(
     { tags: { $in: ["appliances", "school"] } },
     { $set: { sale:true } }
   )
  • $nin

Selects the documents where the field value is not in the specified array or the field does not exist.

db.inventory.find( { qty: { $nin: [ 5, 15 ] } } )
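As noted above, $ne and $nin also match documents in which the field is missing, whereas range operators such as $gt never do. A plain JavaScript sketch of those rules over an in-memory array; the helpers ne, nin and gt are hypothetical and do not handle array-valued fields.

```javascript
const inventory = [
  { _id: 1, qty: 5 },
  { _id: 2, qty: 20 },
  { _id: 3 },            // no qty field
];

// { qty: { $ne: value } }: true when the field is absent or differs from value.
const ne = (doc, field, value) => !(field in doc) || doc[field] !== value;

// { qty: { $nin: values } }: true when the field is absent or not in the list.
const nin = (doc, field, values) => !(field in doc) || !values.includes(doc[field]);

// { qty: { $gt: value } }: a missing field never satisfies a range comparison.
const gt = (doc, field, value) => field in doc && doc[field] > value;

console.log(inventory.filter(d => ne(d, "qty", 20)).map(d => d._id));       // [ 1, 3 ]
console.log(inventory.filter(d => nin(d, "qty", [5, 15])).map(d => d._id)); // [ 2, 3 ]
console.log(inventory.filter(d => gt(d, "qty", 1)).map(d => d._id));        // [ 1, 2 ]
```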
Logical
  • $or

Joins query clauses with a logical OR, returning all documents that match the conditions of either clause.

db.inventory.find( { $or: [ { quantity: { $lt: 20 } }, { price: 10 } ] } )
  • $and

Joins query clauses with a logical AND, returning all documents that match the conditions of both clauses.

db.inventory.find( { $and: [ { price: { $ne: 1.99 } }, { price: { $exists: true } } ] } )
  • $not

Inverts the effect of a query expression and returns documents that do not match the query expression.

db.inventory.find( { price: { $not: { $gt: 1.99 } } } )
  • $nor

Joins query clauses with a logical NOR, returning all documents that fail to match both clauses.

db.inventory.find( { $nor: [ { price: 1.99 }, { sale: true } ]  } )
Element
  • $exists

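When set to true, $exists matches documents that contain the field, including documents where the field value is null; when set to false, it matches documents that do not contain the field. The following example selects documents that have a qty field whose value is neither 5 nor 15: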

db.inventory.find( { qty: { $exists: true, $nin: [ 5, 15 ] } } )
  • $type

Selects documents if a field is of the specified type.

{ field: { $type: <BSON type number> | <String alias> } }
db.addressBook.find( { "zipCode" : { $type : "string" } } );
db.addressBook.find( { "zipCode" : { $type : 2 } } );
Evaluation
  • $mod

Performs a modulo operation on the value of a field and selects documents with a specified result.

db.inventory.find( { qty: { $mod: [ 4, 0 ] } } )
  • $regex

Selects documents where values match a specified regular expression.

db.products.find( { description: { $regex: /^S/, $options: 'm' } } )
  • $text

Performs a text search on the content of the fields indexed with a text index.

{
  $text:
    {
      $search: <string>,
      $language: <string>,
      $caseSensitive: <boolean>,
      $diacriticSensitive: <boolean>
    }
}
db.articles.find( { $text: { $search: "coffee" } } )
db.articles.find( { $text: { $search: "\"coffee shop\"" } } )
db.articles.find( { $text: { $search: "leche", $language: "es" } })
db.articles.find( { $text: { $search: "Coffee", $caseSensitive: true } } )
  • $where

Matches documents that satisfy a JavaScript expression.

db.myCollection.find( { $where: "obj.credits == obj.debits" } );
db.myCollection.find( { $where: function() { return obj.credits == obj.debits; } } );
db.myCollection.find( { active: true, $where: function() { return obj.credits - obj.debits < 0; } } );
Geospatial
  • $geoWithin

Selects documents with geospatial data that exists entirely within a specified shape. When determining inclusion, MongoDB considers the border of a shape to be part of the shape, subject to the precision of floating point numbers.

db.places.find(
   {
     loc: {
       $geoWithin: {
          $geometry: {
             type : "Polygon" ,
             coordinates: [ [ [ 0, 0 ], [ 3, 6 ], [ 6, 1 ], [ 0, 0 ] ] ]
          }
       }
     }
   }
)
  • $geoIntersects

Selects documents whose geospatial data intersects with a specified GeoJSON object; i.e. where the intersection of the data and the specified object is non-empty. This includes cases where the data and the specified object share an edge.

db.places.find(
   {
     loc: {
       $geoIntersects: {
          $geometry: {
             type: "Polygon" ,
             coordinates: [
               [ [ 0, 0 ], [ 3, 6 ], [ 6, 1 ], [ 0, 0 ] ]
             ]
          }
       }
     }
   }
)
  • $near

Specifies a point for which a geospatial query returns the documents from nearest to farthest. The $near operator can specify either a GeoJSON point or legacy coordinate point.

db.places.find(
   {
     location:
       { $near :
          {
            $geometry: { type: "Point",  coordinates: [ -73.9667, 40.78 ] },
            $minDistance: 1000,
            $maxDistance: 5000
          }
       }
   }
)
  • $nearSphere

Returns geospatial objects in proximity to a point on a sphere. Requires a geospatial index. The 2dsphere and 2d indexes support $nearSphere.

db.places.find(
   {
     location: {
        $nearSphere: {
           $geometry: {
              type : "Point",
              coordinates : [ -73.9667, 40.78 ]
           },
           $minDistance: 1000,
           $maxDistance: 5000
        }
     }
   }
)

Array

  • $all

The $all operator selects the documents where the value of a field is an array that contains all the specified elements.

db.articles.find( { tags: { $all: [ "ssl", "security" ] } } )
  • $elemMatch

The $elemMatch operator matches documents that contain an array field with at least one element that matches all the specified query criteria.

db.scores.find(
   { results: { $elemMatch: { $gte: 80, $lt: 85 } } }
)
  • $size

The $size operator matches any array with the number of elements specified by the argument.

db.collection.find( { field: { $size: 2 } } );
Bitwise
  • $bitsAllSet

$bitsAllSet matches documents where all of the bit positions given by the query are set (i.e. 1) in the field.

db.collection.find( { a: { $bitsAllSet: 35 } } )
  • $bitsAnySet

$bitsAnySet matches documents where any of the bit positions given by the query are set (i.e. 1) in the field.

db.collection.find( { a: { $bitsAnySet: [ 1, 5 ] } } )
  • $bitsAllClear

$bitsAllClear matches documents where all of the bit positions given by the query are clear (i.e. 0) in the field.

db.collection.find( { a: { $bitsAllClear: [ 1, 5 ] } } )
  • $bitsAnyClear

$bitsAnyClear matches documents where any of the bit positions given by the query are clear (i.e. 0) in the field.

db.collection.find( { a: { $bitsAnyClear: [ 1, 5 ] } } )
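A numeric bitmask and a list of bit positions are two ways to express the same bits: 35 is binary 100011, so $bitsAllSet: 35 is equivalent to $bitsAllSet: [ 0, 1, 5 ]. The plain JavaScript sketch below emulates the four operators, assuming non-negative integer field values below 2^31 so JavaScript's 32-bit bitwise operators apply; the helper names are hypothetical.

```javascript
// Normalize a bitmask (number) or a list of bit positions (0..30) into a mask.
function toMask(bits) {
  return Array.isArray(bits) ? bits.reduce((m, pos) => m | (1 << pos), 0) : bits;
}

const bitsAllSet   = (value, bits) => (value & toMask(bits)) === toMask(bits);
const bitsAnySet   = (value, bits) => (value & toMask(bits)) !== 0;
const bitsAllClear = (value, bits) => (value & toMask(bits)) === 0;
const bitsAnyClear = (value, bits) => (value & toMask(bits)) !== toMask(bits);

const a = 54; // binary 110110: bits 1, 2, 4 and 5 are set
console.log(toMask([0, 1, 5]));       // 35
console.log(bitsAllSet(a, [1, 5]));   // true
console.log(bitsAnySet(a, [0, 3]));   // false
console.log(bitsAllClear(a, [0, 3])); // true
console.log(bitsAnyClear(a, 35));     // true: bit 0 is clear
```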
Comments
  • $comment

The $comment query operator associates a comment with any expression that takes a query predicate.

db.records.find(
   {
     x: { $mod: [ 2, 0 ] },
     $comment: "Find even values."
   }
)

Projection Operators

  • $

The positional $ operator limits the contents of an array from the query results to contain only the first element matching the query document. To specify an array element to update, see the positional $ operator for updates.

Use $ in the projection document of the find() method or the findOne() method when you only need one particular array element in selected documents.

db.students.find( { grades: { $elemMatch: {
		    mean: { $gt: 70 },
		    grade: { $gt:90 }
		  } } },
{ "grades.$": 1 } )
  • $elemMatch

The $elemMatch operator limits the contents of an array field from the query results to contain only the first element matching the $elemMatch condition.

db.schools.find( { zipcode: "63109" },
    { students: { $elemMatch: { school: 102 } } } )
  • $meta

The $meta projection operator returns for each matching document the metadata (e.g. "textScore") associated with the query.

db.collection.find(
   [query],
   { score: { $meta: "textScore" } }
)
  • $slice

The $slice operator controls the number of items of an array that a query returns. For information on limiting the size of an array during an update with $push, see the $slice modifier instead.

db.collection.find( { field: value }, { array: {$slice: count } } );
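The count can be positive (keep the first elements), negative (keep the last elements), or a [ skip, limit ] pair. A plain JavaScript sketch of how those forms select elements; the helper sliceProjection is hypothetical, and MongoDB's handling of edge cases may differ.

```javascript
// Emulate the { array: { $slice: spec } } projection on an in-memory array.
function sliceProjection(array, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec;
    // A negative skip counts back from the end of the array.
    const start = skip < 0 ? Math.max(array.length + skip, 0) : skip;
    return array.slice(start, start + limit);
  }
  // A positive count keeps the first n elements, a negative one the last n.
  return spec >= 0 ? array.slice(0, spec) : array.slice(spec);
}

const comments = [1, 2, 3, 4, 5];
console.log(sliceProjection(comments, 3));       // [ 1, 2, 3 ]
console.log(sliceProjection(comments, -2));      // [ 4, 5 ]
console.log(sliceProjection(comments, [1, 2]));  // [ 2, 3 ]
console.log(sliceProjection(comments, [-3, 2])); // [ 3, 4 ]
```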

Update documents

MongoDB defines a series of operations to update documents in a collection.

db.collection.updateOne()

This operation updates a single document within the collection based on the filter.

db.collection.updateOne(
   <filter>,
   <update>,
   {
     upsert: <boolean>,
     writeConcern: <document>,
     collation: <document>
   }
)

Let's look at each parameter in detail:

  • filter: The selection criteria for the update. The same query selectors as in the find() method are available. Specify an empty document { } to update the first document returned in the collection.

  • update: The modifications to apply. Later we will go deeper into the update operators.

  • upsert: Optional. Defaults to false. When true, updateOne() either:

    • Creates a new document if no documents match the filter.
    • Updates a single document that matches the filter.
  • writeConcern: Optional. A document expressing the write concern, which describes the level of acknowledgement requested from MongoDB for write operations. To use the default write concern, omit it. The document can contain:

    • The w option to request acknowledgment that the write operation has propagated to a specified number of mongod instances or to mongod instances with specified tags.
    • The j option to request acknowledgement that the write operation has been written to the journal.
    • The wtimeout option to specify a time limit to prevent write operations from blocking indefinitely.
{ w: <value>, j: <boolean>, wtimeout: <number> }
  • collation: Optional. Specifies the collation to use for the operation. Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.
{
   locale: <string>,
   caseLevel: <boolean>,
   caseFirst: <string>,
   strength: <int>,
   numericOrdering: <boolean>,
   alternate: <string>,
   maxVariable: <string>,
   backwards: <boolean>
}

db.collection.updateMany()

Updates multiple documents within the collection based on the filter.

db.collection.updateMany(
   <filter>,
   <update>,
   {
     upsert: <boolean>,
     writeConcern: <document>,
     collation: <document>
   }
)

db.collection.update()

Modifies an existing document or documents in a collection. The method can modify specific fields of an existing document or documents or replace an existing document entirely, depending on the update parameter.

By default, the update() method updates a single document. Set the multi option to true to update all documents that match the query criteria.

db.collection.update(
   <query>,
   <update>,
   {
     upsert: <boolean>,
     multi: <boolean>,
     writeConcern: <document>
   }
)

db.collection.replaceOne()

Replaces a single document within the collection based on the filter.

db.collection.replaceOne(
   <filter>,
   <replacement>,
   {
     upsert: <boolean>,
     writeConcern: <document>,
     collation: <document>
   }
)

Update operators

Fields
  • $inc

The $inc operator increments a field by a specified value, which may be negative. The following example decrements quantity by 2 and increments the embedded metrics.orders field by 1:

db.products.update(
   { sku: "abc123" },
   { $inc: { quantity: -2, "metrics.orders": 1 } }
)
  • $mul

The $mul operator multiplies the value of a field by a number.

db.products.update(
   { _id: 1 },
   { $mul: { price: 1.25 } }
)
  • $rename

The $rename operator updates the name of a field. The following example renames nickname to alias and cell to mobile:

db.students.update( { _id: 1 }, { $rename: { 'nickname': 'alias', 'cell': 'mobile' } } )
  • $setOnInsert

If an update operation with upsert: true results in an insert of a document, then $setOnInsert assigns the specified values to the fields in the document. If the update operation does not result in an insert, $setOnInsert does nothing.

db.products.update(
  { _id: 1 },
  {
     $set: { item: "apple" },
     $setOnInsert: { defaultQty: 100 }
  },
  { upsert: true }
)
  • $set

The $set operator replaces the value of a field with the specified value.

db.products.update(
   { _id: 100 },
   { $set:
      {
        quantity: 500,
        details: { model: "14Q3", make: "xyz" },
        tags: [ "coats", "outerwear", "clothing" ]
      }
   }
)
  • $unset

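The $unset operator deletes the specified fields from a document. The value given for each field in the $unset expression (here "") does not affect the operation.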

db.products.update(
   { sku: "unknown" },
   { $unset: { quantity: "", instock: "" } }
)
  • $min

The $min operator updates the value of the field to a specified value if the specified value is less than the current value of the field.

db.scores.update( { _id: 1 }, { $min: { lowScore: 150 } } )
  • $max

The $max operator updates the value of the field to a specified value if the specified value is greater than the current value of the field.

db.scores.update( { _id: 1 }, { $max: { highScore: 870 } } )
  • $currentDate

The $currentDate operator sets the value of a field to the current date, either as a Date or a timestamp. The default type is Date.

db.users.update(
   { _id: 1 },
   {
     $currentDate: {
        lastModified: true,
        "cancellation.date": { $type: "timestamp" }
     },
     $set: {
        status: "D",
        "cancellation.reason": "user request"
     }
   }
)
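To see what several field operators do to one document, here is a plain JavaScript sketch that applies $set, $unset, $inc, $mul, $min and $max to an in-memory object. The helper applyFieldUpdate is hypothetical, handles top-level fields only, and skips MongoDB's type checks. It follows the documented behavior for missing fields: $inc creates the field with the increment, $mul creates it with the value 0, and $min/$max create it with the given value.

```javascript
function applyFieldUpdate(doc, update) {
  const out = { ...doc }; // shallow copy; the input document is not modified
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      const has = field in out;
      switch (op) {
        case "$set":   out[field] = arg; break;
        case "$unset": delete out[field]; break; // the value given to $unset is ignored
        case "$inc":   out[field] = (has ? out[field] : 0) + arg; break;
        case "$mul":   out[field] = has ? out[field] * arg : 0; break;
        case "$min":   if (!has || arg < out[field]) out[field] = arg; break;
        case "$max":   if (!has || arg > out[field]) out[field] = arg; break;
        default: throw new Error("unsupported operator " + op);
      }
    }
  }
  return out;
}

const before = { _id: 1, price: 10, quantity: 5, lowScore: 200, highScore: 800, instock: true };
const after = applyFieldUpdate(before, {
  $inc: { quantity: -2, orders: 1 },
  $mul: { price: 1.25 },
  $min: { lowScore: 150 },
  $max: { highScore: 870 },
  $unset: { instock: "" },
});
// after: quantity 3, new field orders 1, price 12.5, lowScore 150, highScore 870, no instock
console.log(after);
```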
Array
  • $

The positional $ operator identifies an element in an array to update without explicitly specifying the position of the element in the array. To project, or return, an array element from a read operation, see the $ projection operator.

When used with update operations, e.g. db.collection.update():

  • the positional $ operator acts as a placeholder for the first element that matches the query document.
  • the array field must appear as part of the query document.

For example, to change the first element with value 80 in the grades array of the document with _id 1 to 82, use the positional $ operator if you do not know the position of the element in the array:

db.students.update(
   { _id: 1, grades: 80 },
   { $set: { "grades.$" : 82 } }
)
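The positional operator only ever changes the first matching element. A plain JavaScript sketch of that behavior on a scalar array; the helper setFirstMatch and the sample grades are hypothetical.

```javascript
// Emulate update({ grades: match }, { $set: { "grades.$": replacement } }) on one document.
function setFirstMatch(doc, field, match, replacement) {
  const index = doc[field].indexOf(match); // position of the first element equal to the query value
  if (index === -1) return doc;             // the query would not have matched this document
  const updated = [...doc[field]];
  updated[index] = replacement;
  return { ...doc, [field]: updated };
}

const student = { _id: 1, grades: [85, 80, 80] };
console.log(setFirstMatch(student, "grades", 80, 82).grades); // [ 85, 82, 80 ]
```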
  • $addToSet

The $addToSet operator adds a value to an array unless the value is already present, in which case $addToSet does nothing to that array. This operator can work with the $each modifier.

db.test.update(
   { _id: 1 },
   { $addToSet: {letters: [ "c", "d" ] } }
)
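Without $each, the array [ "c", "d" ] in the example above is added as one single element; with { $each: [ "c", "d" ] }, each value is added separately, and only if it is not already present. A plain JavaScript sketch of both forms; the helper addToSet is hypothetical, and comparing values through JSON.stringify is a simplification (like BSON equality, it is sensitive to field order).

```javascript
// Emulate { $addToSet: { field: value } } and { $addToSet: { field: { $each: values } } }.
function addToSet(array, value) {
  const result = [...array];
  const candidates = value !== null && typeof value === "object" && "$each" in value
    ? value.$each   // $each: consider every listed value separately
    : [value];      // a plain value (even an array) is one candidate element
  for (const candidate of candidates) {
    const key = JSON.stringify(candidate);
    if (!result.some(existing => JSON.stringify(existing) === key)) {
      result.push(candidate);
    }
  }
  return result;
}

const letters = ["a", "b", "c"];
console.log(addToSet(letters, ["c", "d"]));            // [ 'a', 'b', 'c', [ 'c', 'd' ] ]
console.log(addToSet(letters, { $each: ["c", "d"] })); // [ 'a', 'b', 'c', 'd' ]
```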
  • $pop

The $pop operator removes the first or last element of an array. Pass $pop a value of -1 to remove the first element of an array and 1 to remove the last element in an array.

db.students.update( { _id: 1 }, { $pop: { scores: -1 } } )
  • $pullAll

The $pullAll operator removes all instances of the specified values from an existing array. Unlike the $pull operator that removes elements by specifying a query, $pullAll removes elements that match the listed values.

db.survey.update( { _id: 1 }, { $pullAll: { scores: [ 0, 5 ] } } )
  • $pull

The $pull operator removes from an existing array all instances of a value or values that match a specified condition.

db.stores.update(
    { },
    { $pull: { fruits: { $in: [ "apples", "oranges" ] }, vegetables: "carrots" } },
    { multi: true }
)
  • $pushAll

Deprecated since version 2.4: Use the $push operator with $each instead.

  • $push

The $push operator appends a specified value to an array. If the value is an array, $push appends the whole array as a single element. To add each element of the value separately, use the $each modifier with $push.

The following example appends 89 to the scores array:

db.students.update(
   { _id: 1 },
   { $push: { scores: 89 } }
)

These are the modifiers that you can use with $push:

  • $each: Appends multiple values to the array field. The following example appends each element of [ 90, 92, 85 ] to the scores array for the document where the name field equals joe:
db.students.update(
   { name: "joe" },
   { $push: { scores: { $each: [ 90, 92, 85 ] } } }
)
  • $slice: Limits the number of array elements. Requires the use of the $each modifier.

  • $sort: Orders elements of the array. Requires the use of the $each modifier.

The following $push operation uses the $each modifier to add multiple documents to the quizzes array, the $sort modifier to sort all the elements of the modified quizzes array by the score field in descending order, and the $slice modifier to keep only the first three sorted elements of the quizzes array.

db.students.update(
   { _id: 5 },
   {
     $push: {
       quizzes: {
          $each: [ { wk: 5, score: 8 }, { wk: 6, score: 7 }, { wk: 7, score: 6 } ],
          $sort: { score: -1 },
          $slice: 3
       }
     }
   }
)
  • $position: Specifies the location in the array at which to insert the new elements. Requires the use of the $each modifier. Without the $position modifier, the $push appends the elements to the end of the array.

The following operation updates the scores field to add the elements 20 and 30 at the array index of 2:

db.students.update(
   { _id: 1 },
   {
     $push: {
        scores: {
           $each: [ 20, 30 ],
           $position: 2
        }
     }
   }
)
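Whatever order the modifiers are written in, MongoDB first inserts the $each values (at $position if given, otherwise at the end), then applies $sort, then trims the array with $slice. The plain JavaScript sketch below follows that sequence; the helper push is hypothetical, supports only a non-negative $position and the document form of $sort, and relies on JavaScript's stable sort for ties.

```javascript
function push(array, { $each, $position, $sort, $slice }) {
  let result = [...array];
  // 1. Insert the new values at $position, or append them if it is omitted.
  const at = $position === undefined ? result.length : Math.min($position, result.length);
  result.splice(at, 0, ...$each);
  // 2. Sort by the fields of the $sort document (1 ascending, -1 descending).
  if ($sort !== undefined) {
    const keys = Object.keys($sort);
    result.sort((a, b) => {
      for (const k of keys) {
        if (a[k] < b[k]) return -$sort[k];
        if (a[k] > b[k]) return $sort[k];
      }
      return 0;
    });
  }
  // 3. Keep the first n elements ($slice >= 0) or the last n ($slice < 0).
  if ($slice !== undefined) {
    result = $slice >= 0 ? result.slice(0, $slice) : result.slice($slice);
  }
  return result;
}

const quizzes = [ { wk: 1, score: 10 }, { wk: 2, score: 8 }, { wk: 3, score: 5 }, { wk: 4, score: 6 } ];
const top3 = push(quizzes, {
  $each: [ { wk: 5, score: 8 }, { wk: 6, score: 7 }, { wk: 7, score: 6 } ],
  $sort: { score: -1 },
  $slice: 3,
});
console.log(top3.map(q => q.wk)); // [ 1, 2, 5 ]
console.log(push([50, 60, 70, 100], { $each: [20, 30], $position: 2 })); // [ 50, 60, 20, 30, 70, 100 ]
```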
Bitwise
  • $bit

The $bit operator performs a bitwise update of a field. The operator supports bitwise and, bitwise or, and bitwise xor (i.e. exclusive or) operations.

db.switches.update(
   { _id: 1 },
   { $bit: { expdata: { and: NumberInt(10) } } }
)
db.switches.update(
   { _id: 2 },
   { $bit: { expdata: { or: NumberInt(5) } } }
)
db.switches.update(
   { _id: 3 },
   { $bit: { expdata: { xor: NumberInt(5) } } }
)
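Each $bit form combines the stored integer with the operand. Assuming, purely for illustration, that expdata holds 13, 3 and 1 in the three documents above, a plain JavaScript sketch of the three updates (the helper bitUpdate is hypothetical) gives:

```javascript
// Emulate { $bit: { field: { and|or|xor: operand } } } on a 32-bit integer value.
function bitUpdate(value, spec) {
  const [op, operand] = Object.entries(spec)[0];
  switch (op) {
    case "and": return value & operand;
    case "or":  return value | operand;
    case "xor": return value ^ operand;
    default: throw new Error("unsupported $bit operation " + op);
  }
}

console.log(bitUpdate(13, { and: 10 })); // 1101 AND 1010 = 1000 -> 8
console.log(bitUpdate(3, { or: 5 }));    // 0011 OR  0101 = 0111 -> 7
console.log(bitUpdate(1, { xor: 5 }));   // 0001 XOR 0101 = 0100 -> 4
```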
Isolation
  • $isolated

Prevents a write operation that affects multiple documents from yielding to other reads or writes once the first document is written. By using the $isolated option, you can ensure that no client sees the changes until the operation completes or errors out.

db.foo.update(
    { status : "A" , $isolated : 1 },
    { $inc : { count : 1 } },
    { multi: true }
)

In this example, if you don't use the $isolated operator, the multi-update operation will allow other operations to interleave with its update of the matched documents.

Delete documents

Delete operations do not drop indexes, even if deleting all documents from a collection. All write operations in MongoDB are atomic on the level of a single document. MongoDB provides the following methods for document elimination:

db.collection.remove()

The db.collection.remove() method can have one of two syntaxes. The remove() method can take a query document and an optional justOne boolean.

db.collection.remove(
   <query>,
   <justOne>
)

Or the method can take a query document and an optional remove options document:

db.collection.remove(
   <query>,
   {
     justOne: <boolean>,
     writeConcern: <document>
   }
)

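For example, the following operation removes at most one document whose qty is greater than 20, using a majority write concern with a 5-second timeout:

db.products.remove(
   { qty: { $gt: 20 } },
   { justOne: true, writeConcern: { w: "majority", wtimeout: 5000 } }
)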

db.collection.deleteOne()

This operation removes a single document from a collection: the first document that matches the filter. To delete one specific document, filter on a field that is part of a unique index, such as _id.

db.collection.deleteOne(
   <filter>,
   {
      writeConcern: <document>
   }
)

For example:

db.orders.deleteOne( { "productCode" : "78452dfa25564l" } );
db.orders.deleteOne(
   { "_id" : ObjectId("563237a41a4d68582c2509da") },
   { writeConcern: { w : "majority", wtimeout : 100 } }
);

db.collection.deleteMany()

This method removes all documents that match the filter from a collection.

db.collection.deleteMany(
   <filter>,
   {
      writeConcern: <document>
   }
)

db.orders.deleteMany( { "stock" : "Brent Crude Futures", "limit" : { $gt : 48.88 } } );

BEEVA | Technology and innovative solutions for companies