### MongoDB
### Intro
Unlike relational databases, MongoDB is a document-oriented database with the following properties:
- Often faster than relational databases for document-shaped reads (related data is stored together, so fewer joins are needed)
- Scalability (data can be spread across machines, and each document can be easily extended with new fields)
- High availability (data is stored on different nodes)
- Cross-platform (Windows, Linux, macOS, ...)

The main concepts are the **collection and the document.**

### Collection
A collection is a group of MongoDB documents (similar to a table in a relational database). However, a collection allows storing objects with different structures and properties.
<br>
<img src="img/mongo_structure.jpg" alt="drawing" width="500"/>

### Document
A document is similar to a row in a table in a relational database. It can be considered a set of keys and values, where a key is a field name (e.g. ```'is_ordered': true```). Each document has an ```_id``` that is generated automatically unless the user defines it.

A document is stored as BSON (binary JSON). The binary format makes working with the data (e.g. searching or processing) much faster. The main disadvantage of BSON is size: a BSON document can be larger than its JSON equivalent. However, this is compensated by the speed.

MongoDB provides two types of data models: the embedded data model and the normalized data model. Depending on the requirements, you can use either model while preparing your documents.
<br>
<img src="img/document_model.png" alt="drawing" width="500"/>
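The difference between the two data models can be sketched with plain JavaScript objects. This is only an illustration that runs without a database; the field names (`address`, `address_id`) and the sample values are assumptions made up for the example.

```javascript
// Embedded model: related data lives inside the parent document.
const embeddedUser = {
  _id: 1,
  name: "Tom",
  address: { city: "Berlin", zip: "10115" }, // nested sub-document
};

// Normalized model: related data lives in its own collection
// and is referenced by _id (hypothetical field name: address_id).
const addresses = [{ _id: 100, city: "Berlin", zip: "10115" }];
const normalizedUser = { _id: 1, name: "Tom", address_id: 100 };

// Embedded: a single read returns everything.
const cityEmbedded = embeddedUser.address.city;

// Normalized: a second lookup resolves the reference.
const address = addresses.find(a => a._id === normalizedUser.address_id);
const cityNormalized = address.city;

console.log(cityEmbedded, cityNormalized);
```

Embedding tends to suit data that is always read together, while references avoid duplicating data shared by many documents.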

### Replicas
Replication is data synchronization across different servers (machines). It protects against failures and provides high availability. MongoDB has had two replication models:
- **Master-Slave** (legacy model, deprecated in MongoDB 3.2 and removed in 4.0; it was used when more than 11 slaves were needed)
- **Replica Set** (a single set can have up to 50 members, of which at most 7 vote; before MongoDB 3.0 the limit was 12 members)

A replica set is a group of mongod instances that host the same data set. In a replica set, **one node is the primary node** that receives all write operations. All other instances, the **secondaries**, apply operations from the primary so that they have the same data set. A replica set can have only one primary node.
<br>
<img src="img/replica-mech.png" alt="drawing" width="500"/>
<br>
During automatic failover or maintenance, an election is held and a new primary node is elected. After the failed node recovers, it rejoins the replica set and works as a secondary node.

In both models there is only **one node** that is responsible for **write operations**. The remaining nodes only read these operations and apply them to their own copy of the data. The main idea of replication is to provide data redundancy.

Replication and data back-up are different:
- **Data Back-Up** - is a snapshot at some point in time
- **Replica** - is kept continuously up to date (a secondary may lag slightly behind the primary)

A replica set is based on two mechanisms:
- **OpLog** - makes replication possible (i.e. it is a journal where all changes are stored)
- **Heartbeat** - monitors the condition of the nodes and triggers the failure handling procedure

Each member of a replica set sends heartbeats to the other members (every 2 seconds). Replicas allow scaling reads by distributing the read load among the nodes.
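The oplog idea can be sketched as a toy model in plain JavaScript. This is not MongoDB's implementation: the `Member` class, the entry format and the `write`/`sync` helpers are all hypothetical names invented for the illustration, and heartbeats and elections are left out.

```javascript
// Toy model of oplog-based replication (illustrative only).
class Member {
  constructor() {
    this.data = new Map(); // _id -> document
    this.applied = 0;      // number of oplog entries already applied
  }
  apply(entry) {
    if (entry.op === "insert") this.data.set(entry.doc._id, entry.doc);
    if (entry.op === "delete") this.data.delete(entry._id);
    this.applied += 1;
  }
}

const oplog = [];                  // ordered journal of all changes
const primary = new Member();
const secondaries = [new Member(), new Member()];

// All writes go through the primary, which also records them in the oplog.
function write(entry) {
  oplog.push(entry);
  primary.apply(entry);
}

// A secondary replays, in order, the entries it has not applied yet.
function sync(secondary) {
  while (secondary.applied < oplog.length) {
    secondary.apply(oplog[secondary.applied]);
  }
}

write({ op: "insert", doc: { _id: 1, name: "Tom" } });
write({ op: "insert", doc: { _id: 2, name: "Ann" } });
write({ op: "delete", _id: 1 });

secondaries.forEach(sync);
console.log(secondaries.map(s => [...s.data.keys()])); // [ [ 2 ], [ 2 ] ]
```

Because every member replays the same ordered journal, all members end up with the same data set once they have caught up.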

### Sharding
Sharding is the process of storing data records across multiple machines and it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. With sharding, you add more machines to support data growth and the demands of read and write operations.
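A rough way to picture horizontal scaling is hash-based routing: a key of each document decides which machine stores it. The sketch below is plain JavaScript and only an assumption-laden illustration; the shard count, the `user_id` key and the hash function are made up, and real MongoDB sharding works with chunks of shard-key ranges behind a router rather than this simple modulo.

```javascript
// Toy hash-based routing of documents to shards (illustrative only).
const SHARD_COUNT = 3; // assumed number of shards
const shards = Array.from({ length: SHARD_COUNT }, () => []);

// Simple deterministic string hash, kept as an unsigned 32-bit integer.
function hash(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

// The same key always maps to the same shard.
function shardFor(key) {
  return hash(key) % SHARD_COUNT;
}

function insert(doc) {
  shards[shardFor(doc.user_id)].push(doc);
}

// A query on the shard key touches only one shard.
function findByUserId(userId) {
  return shards[shardFor(userId)].filter(d => d.user_id === userId);
}

["u1", "u2", "u3", "u4", "u5", "u6"].forEach(id => insert({ user_id: id }));

console.log(shards.map(s => s.length)); // documents stored per shard
console.log(findByUserId("u3"));
```

Adding machines in this picture means raising the shard count, so both storage and read/write load are divided among more nodes.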

**Why To Use Sharding**
1. In replication, all writes go to the primary node
2. Latency-sensitive queries still go to the primary
3. A single replica set is limited to 50 members (12 before MongoDB 3.0)
4. Memory may not be large enough when the active dataset is big
5. Local disk is not big enough
6. Vertical scaling is too expensive

More Info: https://www.tutorialspoint.com/mongodb/mongodb_sharding.htm

### GridFS
A common problem for any database is storing large pieces of data. For example, SQL databases provide a special type called BLOB. MongoDB allows storing different objects, but the document size is limited to 16 MB. A special technology called **GridFS** allows storing data larger than 16 MB.

It allows storing and retrieving large files such as images, audio files, video files, etc. It is a kind of **file system**, but its data is stored within MongoDB collections. GridFS divides a file into chunks and stores each chunk in a separate document. Each chunk has a maximum size of 255 KB by default (256 KB in versions before 2.4.10). For example, an MP3 file can be divided into 40 documents (40 chunks of data).

GridFS consists of two main collections:
- **Files** (```fs.files``` by default) - stores file names and metadata (e.g. file size)
- **Chunks** (```fs.chunks``` by default) - stores file segments, one chunk per document
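The chunking step can be sketched in plain JavaScript. This is a simplified illustration, not the GridFS driver code: `splitIntoChunks` and the fixed `fileId` are hypothetical, and the chunk size is a parameter (the sample call passes 256 KB only to get round numbers).

```javascript
// Split a file into one "files" document and several "chunks" documents.
function splitIntoChunks(filename, bytes, chunkSize) {
  const fileId = 1; // hypothetical id; a real one would be generated
  const fileDoc = { _id: fileId, filename, length: bytes.length, chunkSize };
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    chunkDocs.push({
      files_id: fileId,                                // link back to the file document
      n,                                               // position of the chunk in the file
      data: bytes.subarray(offset, offset + chunkSize) // last chunk may be shorter
    });
  }
  return { fileDoc, chunkDocs };
}

const song = new Uint8Array(1024 * 1024 + 100); // 1 MiB plus 100 bytes
const { fileDoc, chunkDocs } = splitIntoChunks("song.mp3", song, 256 * 1024);

console.log(chunkDocs.length);                     // 5
console.log(chunkDocs[chunkDocs.length - 1].data.length); // 100
```

Reading the file back means fetching the chunks with the matching `files_id` and concatenating them in order of `n`.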

### Cursors
```find()``` returns a **cursor**, i.e. a pointer to the result set of a query. A cursor allows iterating over and processing the returned documents.

A cursor can be created using: ```var cursor = db.col_name.find()```
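The way a cursor hands out documents one at a time can be mimicked in plain JavaScript. `ToyCursor` below is a hypothetical stand-in written for this note; it only borrows the `hasNext()`/`next()` naming of the shell cursor and does not talk to a database.

```javascript
// Minimal cursor-like iterator over an in-memory array (illustrative only).
class ToyCursor {
  constructor(docs, predicate = () => true) {
    this.docs = docs;
    this.predicate = predicate; // plays the role of the query filter
    this.position = 0;
  }
  // Advance to the next matching document, if there is one.
  hasNext() {
    while (this.position < this.docs.length && !this.predicate(this.docs[this.position])) {
      this.position += 1;
    }
    return this.position < this.docs.length;
  }
  // Return the current matching document and move past it.
  next() {
    if (!this.hasNext()) throw new Error("cursor exhausted");
    const doc = this.docs[this.position];
    this.position += 1;
    return doc;
  }
}

const people = [
  { name: "Tom", age: 28 },
  { name: "Ann", age: 17 },
  { name: "Bob", age: 41 },
];

const cursor = new ToyCursor(people, doc => doc.age >= 18);
const names = [];
while (cursor.hasNext()) {
  names.push(cursor.next().name);
}
console.log(names); // [ 'Tom', 'Bob' ]
```

The point of the pattern is that documents are produced lazily, so the caller can stop early without materializing the whole result.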

### Aggregation
There are the following types of aggregation:
- **Pipeline** (preferable method)
- **Map Reduce**
- **Single Purpose**
<br>
<img src="img/aggreagation.png" alt="drawing" width="500"/>
<br>

**Pipeline**

Documents go through a multistage conveyor (pipeline) that processes the data into an aggregated result. There are stages for filtering, for grouping and sorting by one or many fields, and for aggregating arrays. The main command:

- ```db.collection.aggregate(pipeline, options)``` - runs a pipeline aggregation

The following operators may be included into ```pipeline:```
- ```{$match: {key:value}}``` - provides filtration by using a certain condition

- ```{$group: {key: {$agg_funct: value}}}``` - provides grouping and applies $agg_funct

- ```{$sort: {key: 1 or -1}}``` - sorts the result

- ```{$project: {key: 1 or 0, key: 1 or 0 ...}}``` - selects those fields that we need

- ```{$out: 'collection_name'}``` - saves an aggregation result into a new collection (must be the last stage)

- ```{$unwind: '$field_name'}``` - each array element becomes a new document

- ```{$limit: value}``` - limits the final output

- ```{$addFields: {key: value}}``` - adds a new field to the result

- ```{$count: 'name'}``` - counts the number of documents

- ```{$lookup: {from: 'collection_to_join', localField: name, foreignField: name, as: name}}``` - joins documents from another collection (a left outer join; the joined field of the other collection should be indexed for fast processing)


- ```{$sortByCount : '$key'}``` - allows grouping, counting and then sorting in DESC order


The following operators may be included into ```options:```
- ```{allowDiskUse: true}``` - if not enough RAM, disk storage will be used

**Examples**
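```
db.authors.aggregate([
  // keep only authors whose spec is "prog"
  { $match: { spec: "prog" } },
  // pass only the lvl field (and _id) to the next stage
  { $project: { lvl: 1 } },
  // constant _id: all documents fall into one group; level is the sum of lvl
  { $group: { _id: 'level', level: { $sum: '$lvl' } } },
  // sort by the computed field (lvl no longer exists after $group)
  { $sort: { level: 1 } }
])
```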
**Important**
- ```$group``` - has a memory limitation: no more than 100 MB of RAM. It can be solved using either ```allowDiskUse``` or a preceding ```$project``` to filter out unwanted fields
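To see what the stages do to the data, the pipeline idea can be imitated with plain JavaScript arrays. This is a sketch under assumptions, not MongoDB's engine: the helpers `match`, `groupSum`, `sort` and `runPipeline`, the `team` field and the sample authors are all invented for the illustration.

```javascript
const authors = [
  { name: "Ann", spec: "prog", lvl: 3, team: "a" },
  { name: "Bob", spec: "prog", lvl: 5, team: "b" },
  { name: "Cid", spec: "design", lvl: 4, team: "a" },
  { name: "Dan", spec: "prog", lvl: 3, team: "a" },
];

// Like $match: keep documents whose fields equal the given values.
const match = cond => docs =>
  docs.filter(d => Object.entries(cond).every(([k, v]) => d[k] === v));

// Like $group with one $sum accumulator: group by `key`, add up `field`.
const groupSum = (key, field) => docs => {
  const totals = new Map();
  for (const d of docs) {
    totals.set(d[key], (totals.get(d[key]) || 0) + d[field]);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// Like $sort on one numeric field; dir is 1 (ascending) or -1 (descending).
const sort = (field, dir) => docs =>
  [...docs].sort((a, b) => (a[field] - b[field]) * dir);

// A pipeline is a list of stages applied in order.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(authors, [
  match({ spec: "prog" }),
  groupSum("team", "lvl"),
  sort("total", -1),
]);

console.log(result); // [ { _id: 'a', total: 6 }, { _id: 'b', total: 5 } ]
```

Each stage receives the output of the previous one, which is why the order of stages changes the result.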

**Map Reduce**

This algorithm was introduced by Google for Big Data processing. The concept is simple. There are two main functions:
- ```map``` - emits key-value pairs for each document; the emitted values are then grouped by key
- ```reduce``` - rolls up the values of each group (makes the aggregation)
<br>
<img src="img/map-reduce.png" alt="drawing" width="700"/>
<br>
The main advantage of this algorithm is that computations can run in parallel, which allows processing Big Data much faster.
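The two functions can be tried out on an in-memory array. The sketch below is plain JavaScript and only mirrors the idea: `mapReduce`, the `orders` data and the explicit `emit` argument are assumptions for the example (in MongoDB the map function receives the document differently), and nothing here runs in parallel.

```javascript
// Toy map-reduce over an array of documents (illustrative only).
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> list of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: every document may emit key-value pairs.
  docs.forEach(doc => map(doc, emit));
  // Reduce phase: roll up the values collected for each key.
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const orders = [
  { cust: "A", amount: 10 },
  { cust: "B", amount: 5 },
  { cust: "A", amount: 7 },
];

const totals = mapReduce(
  orders,
  (doc, emit) => emit(doc.cust, doc.amount),            // map: key = customer
  (key, values) => values.reduce((sum, v) => sum + v, 0) // reduce: sum per customer
);

console.log(totals); // [ { _id: 'A', value: 17 }, { _id: 'B', value: 5 } ]
```

Since each key is reduced independently, the groups could be processed on different machines, which is where the parallelism comes from.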

**Single Purpose Aggregation**

It is a simple aggregation over a single collection, typically by a certain key (a field name).

**Examples**
- ```db.users.count()``` - counts all documents in a collection
- ```db.products.distinct('name')``` - returns only unique names in a collection

### MongoDB Schema
A schema in a database determines the structure of a db. For example, in a relational database, we have to define the tables, primary and foreign keys, types of relationships and so on. In MongoDB, no schema is enforced by default. But it doesn't mean that we should create a collection of unstructured documents. Instead, we have to determine the required fields, the number of collections and their structure.

For instance, we can define a collection structure like this (it's called the reference type)
<img src="img/ref_type_col.png" alt="drawing" width="500"/>
<br>
Or maybe a **nested architecture** would be better
<img src="img/embedded_col_type.png" alt="drawing" width="500"/>
<br>

It must be noted that collections can be validated using validators (i.e. obligatory fields, field types, ... can be defined). It means that any new document will be checked and inserted only if it meets the requirements defined by the validator.
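The check-then-insert behaviour of a validator can be imitated in plain JavaScript. The sketch is an assumption-based illustration, not MongoDB's validation syntax: `userSchema`, `validate` and `insertValidated` are hypothetical names, and only required fields and simple types are checked.

```javascript
// Hypothetical schema: which fields are obligatory and what type they have.
const userSchema = {
  required: ["name", "email"],
  types: { name: "string", email: "string", age: "number" },
};

// Return a list of problems; an empty list means the document is valid.
function validate(doc, schema) {
  const errors = [];
  for (const field of schema.required) {
    if (!(field in doc)) errors.push(`missing field: ${field}`);
  }
  for (const [field, type] of Object.entries(schema.types)) {
    if (field in doc && typeof doc[field] !== type) {
      errors.push(`wrong type for field: ${field}`);
    }
  }
  return errors;
}

// Insert only documents that pass validation.
function insertValidated(collection, doc, schema) {
  const errors = validate(doc, schema);
  if (errors.length > 0) return { inserted: false, errors };
  collection.push(doc);
  return { inserted: true, errors: [] };
}

const users = [];
const ok = insertValidated(users, { name: "Tom", email: "tom@example.com", age: 28 }, userSchema);
const rejected = insertValidated(users, { name: "Ann", age: "17" }, userSchema);

console.log(ok.inserted, rejected.inserted, rejected.errors);
```

The rejected document never reaches the collection, which is the same guarantee a validator gives for new documents.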

### Document Links
Collections can be linked using document fields. There are two ways:

**Manual**

Fields reference ids of other documents.

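```
// _id is set manually, so it can serve as the reference key
db.companies.insert({_id: 'Apple', year: 1976})
db.users.insert({name: 'Tom', age: 28, company: 'Apple'})

// resolve the reference with a second query
user = db.users.findOne()
db.companies.findOne({_id: user.company})
```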

**DBRef**

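DBRef is a standard format for references between documents: it stores the name of the target collection together with the ```_id``` of the target document (some drivers can resolve such references automatically). It has the following syntax: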
```
apple = ({'name': 'apple', 'year': 1976})
db.companies.save(apple)

steve = ({'name': 'Steve', 'age': 25, company: new DBRef('companies', apple._id)})
db.users.save(steve)
```

```company``` in the users collection will contain a reference to (the ```_id``` of) a company document. Thus we can later easily find the employees of the Apple company.


### Indexation
An index is a **special data structure** that stores a part of the collection's data in a form that is convenient for searching. Indexes keep data ordered by a field for fast searching. Without indexes, MongoDB has to look for a value by scanning the entire collection.

By default, every collection has an index on ```_id```. However, indexes can be different:
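Why an ordered structure helps can be shown with a small plain-JavaScript comparison. This is a sketch under assumptions: a sorted array with binary search stands in for the real index structure, and `scanFind`, `buildIndex`, `indexFind` and the sample users are invented for the example.

```javascript
const users = [
  { _id: 1, email: "d@x.io" },
  { _id: 2, email: "a@x.io" },
  { _id: 3, email: "c@x.io" },
  { _id: 4, email: "b@x.io" },
];

// Without an index: examine documents one by one.
function scanFind(docs, field, value) {
  let examined = 0;
  for (const doc of docs) {
    examined += 1;
    if (doc[field] === value) return { doc, examined };
  }
  return { doc: null, examined };
}

// Index: (key, position) entries kept sorted by key.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// With an index: binary search over the sorted entries.
function indexFind(docs, index, value) {
  let lo = 0, hi = index.length - 1, examined = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    examined += 1;
    if (index[mid].key === value) return { doc: docs[index[mid].pos], examined };
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, examined };
}

const emailIndex = buildIndex(users, "email");
const viaScan = scanFind(users, "email", "c@x.io");
const viaIndex = indexFind(users, emailIndex, "c@x.io");

console.log(viaScan.doc._id, viaIndex.doc._id); // 3 3
```

On a few documents the difference is negligible, but the number of comparisons in the indexed search grows logarithmically with the collection size instead of linearly.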

**Single Index**

Created only for a single field. For a single-field index, the sort order isn't important.
- ```db.users.createIndex({email: 1})``` - creates a single index

**Compound Index**

Created for several fields. The order of the fields is important.
- ```db.users.createIndex({city: 1, email: -1})``` - creates a compound index (entries are sorted first by city, then by email)

**Multikey Index**

Used for indexing array elements. An array usually consists of several elements, each of which gets its own index entry pointing to the same document. That's why it's called multikey.

**Indexes Properties**

- ```db.users.createIndex({email: 1}, {unique: true})``` - creates a unique index. A document with an email that already exists in the index is rejected

**Sparse Indexes**

By default indexes aren't sparse (i.e. every document gets an index entry, even when it lacks the indexed field). For example, some products may not be assigned to any category and will have no category field. In this case, a sparse index is appropriate: it contains entries only for documents that have the indexed field.
- ```db.users.createIndex({email: 1}, {sparse: true})``` - creates a sparse index

**TTL Indexes**

Created for documents that must be dropped after some time (e.g. logs, session info, ...). The indexed field must hold a date.
- ```db.users.createIndex({created_at: 1}, {expireAfterSeconds: 120})``` - each document is dropped about 120 seconds after the date stored in ```created_at``` (a field name assumed here for illustration)

Each index can be named
- ```db.users.createIndex({email: 1}, {name: 'catIdx'})```

**Index Effectiveness**

Indexes speed up reads. However, write operations such as inserts become slower (each document must be inserted into the collection and also added to the index data structure). As a rule of thumb, drop indexes that aren't used.
- ```db.collection.dropIndex("catIdx")```

To list or drop all indexes
- ```db.collection.getIndexes()``` - lists all indexes of a collection
- ```db.collection.dropIndexes()``` - drops all indexes except the one on ```_id```


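### Queries Execution Time
- ```db.users.find({}).explain('allPlansExecution')``` - shows how a query is executed, including execution time statistics for the candidate plans
- ```db.setProfilingLevel(1, time)``` - enables the profiler for operations that take longer than ```time``` milliseconds (they are recorded in ```system.profile```)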

### NoSQL Types
<img src="img/sql_no_sql.png" alt="drawing" width="700"/>
<br>

**Important**
- If a **collection doesn't exist**, MongoDB creates the collection when you first store data for that collection
- A document is written like a JavaScript object (JSON) and stored as BSON
- An ```_id``` is not obligatory on insert. If not provided, it is created automatically by Mongo
- To **access a field inside an array element** or inside an embedded object (a field whose value is in {}), use 'field_name.variable'
- Quotes are important: a dotted field name such as 'field_name.variable' must be quoted
- Every record in a Collection is called a **Document**
- Documents are **elastic** (i.e. documents can have different structures, which doesn't lead to any conflicts in a DB)
- Queries are **case sensitive**
- Mongo shell scripts have the ```.js``` extension
- Without an explicit sort, documents are usually returned in the order in which they were added to a db, but this order is not guaranteed
- A document can't contain several fields with the same name
- Documents can be nested
- To access embedded objects, use **the dot notation**
- Use ```skip()``` only when skipping a few elements (i.e. skipping more than 100 documents decreases performance)
- To embed fields during a document update, use ```field_name: {embedded elements}```
- ```True``` can't be used, only ```true```
- Array elements and embedded objects can be accessed using the dot notation
- To make ```count()``` respect ```limit()``` and ```skip()```, use ```count(true)```
- If the document being updated doesn't have the field targeted by $set, the field is created
- MongoDB allows creating up to 64 indexes for a collection
- A capped collection guarantees the insertion order of documents. Capped collections are bounded (by size and, optionally, by number of documents)
- The order of $sort and $limit in the pipeline is important
- If arrays are nested inside arrays, $unwind must be applied once per nesting level (twice for an array of arrays)
- When referencing a field in an aggregate expression, you typically precede the field name with a dollar sign and enclose it in quotes.
- The same data can be treated as atomic (a single value) or as structured, depending on the use case (e.g. an address)
- If you rarely use your collection for read operations, it makes sense not to use indexes
- ```db.collection.insert({})``` - allows inserting a single and multiple documents

### Main Commands
- ```db``` - shows the current db name
- ```use db_name``` - switches to a db
- ```show collections``` - shows all collections
- ```db.createCollection('collection_name')``` - creates a collection
- ```db.stats()``` - returns the statistics about the current db
- ```db.collection.stats()``` - returns the statistics about a collection




### Data Inserting
There are several ways to insert data:
- ```db.collection.insertOne({record_1})``` - inserts a single document
- ```db.collection.insertMany([{record_1}, {record_2}, {record_n}])``` - allows inserting many documents
- ```db.collection.insert({record or records})``` - combination of insertOne and insertMany (legacy)
- ```load(path)``` - runs a JavaScript file (e.g. a script that inserts data)

### Data Updating
- ```db.collection.updateOne({data_filter}, {$set: {field_name: new_value}})``` - updates the first document that matches the filter
- ```db.collection.updateMany({data_filter}, {$set: {field_name: new_value}})``` - updates all documents that match the filter
- ```db.collection.update({data_filter}, {$set: {field_name: new_value}})``` - similar to updateOne (legacy; updates several documents only if ```{multi: true}``` is passed)
- ```db.collection.update({data_filter}, {$unset: {field_name: value}})``` - unsets/drops a field in a document
- ```db.collection.update({data_filter}, {$rename: {old_field_name: new_field_name}})``` - renames a field in a document

**Important**
- If no update operator such as ```$set``` is provided to ```update()```, the **entire document will be replaced** by the new value (```updateOne``` and ```updateMany``` reject such an update)
- Values can be incremented using ```{$inc: {age: 5}}``` (e.g. ```$inc``` will increase age by 5)
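The difference between an operator update and a full replacement can be sketched in plain JavaScript. `applyUpdate` is a hypothetical helper written for this note; it covers only ```$set```, ```$inc``` and ```$unset``` and is not how MongoDB implements updates.

```javascript
// Apply an update document to a single in-memory document (illustrative only).
function applyUpdate(doc, update) {
  const hasOperators = Object.keys(update).some(key => key.startsWith("$"));
  if (!hasOperators) {
    // No operators: the whole document is replaced, only _id survives.
    return { _id: doc._id, ...update };
  }
  const result = { ...doc };
  // $set: overwrite or create fields.
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  // $inc: add to a numeric field (a missing field counts as 0).
  for (const [field, by] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + by;
  }
  // $unset: drop fields.
  for (const field of Object.keys(update.$unset || {})) {
    delete result[field];
  }
  return result;
}

const tom = { _id: 1, name: "Tom", age: 28, city: "Oslo" };

const withSet = applyUpdate(tom, { $set: { age: 29 } });
const replaced = applyUpdate(tom, { age: 29 });
const incremented = applyUpdate(tom, { $inc: { age: 5 } });

console.log(withSet);     // { _id: 1, name: 'Tom', age: 29, city: 'Oslo' }
console.log(replaced);    // { _id: 1, age: 29 }
console.log(incremented); // { _id: 1, name: 'Tom', age: 33, city: 'Oslo' }
```

The replaced document has lost ```name``` and ```city```, which is the pitfall the note above warns about.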

### Documents Deletion
- ```db.collection.deleteOne({data_filter})``` - deletes one document that meets a condition
- ```db.collection.deleteMany({data_filter})``` - deletes all documents that meet a condition
- ```db.collection.remove({data_filter})``` - removes all documents that meet the condition (legacy)
- ```db.collection.remove({data_filter}, {justOne: true})``` - removes only one document from the match

### Documents Replacement
- ```db.collection.replaceOne({data_filter}, {new fields})``` - replaces the first document that meets the condition with a new one (the ```_id``` is kept)


### Data Querying
- ```db.collection.find({})``` - returns all documents from a collection (preferable)
- ```db.getCollection('collection_name').find()``` - queries the data (not preferable, but needed when the collection name is not a valid identifier)
- ```db.col_name.find({}).pretty()``` - returns all documents from a collection in a pretty format


### Filtering Documents
- ```db.collection.find({field: value})``` - returns documents that meet the condition
- ```db.collection.find({'field_name.field': value})``` - condition for a field of an embedded object
- ```db.collection.find({$or: [{cond_1}, {cond_2}, ... {cond_n}]})``` - using the ```OR``` operator
- ```db.collection.find({field_name: {$gt: value}})``` - using ```$gt``` (greater than)


**Fields Filtering (Projection)**

To specify which fields/columns must be returned by a query, use the following syntax:
- ```db.collection.find({}, {field_name: 1 or 0})``` - 1 includes a field, 0 excludes a field

### Sorting
- ```db.collection.find().sort({field_name: 1 or -1})``` - returns a sorted result (1 ascending, -1 descending)

### Count
- ```db.collection.find().count()``` - counts documents

### Limit
- ```db.collection.find().limit(n)``` - limits the result to n documents