# Introduction
### NoSQL
Although relational databases are the most widely used tool for managing data today, this does not mean that they are the only solution available. An objective of relational databases is to allow the modeling of any use case through tables, which makes them very general. On the other hand, there are many more specific contexts where one does not need all that the relational model offers. Furthermore, in a highly specialized context, using a relational database may be less efficient precisely because of the generality of the relational model.
To address this problem, the NoSQL paradigm was born (from non-SQL, or Not only SQL).

|  | SQL | NoSQL |
| --- | --- | --- |
| Model | Relational | Non-Relational |
| Data | Structured Tables | Semi-Structured |
| Flexibility | Fixed or Predefined Schema | Dynamic Schema |
| Scale | Vertically by upgrading hardware | Horizontally by Data Partitioning |
| Language | SQL | Specific |
| Joins | Yes | No |

### Distribution
Databases typically store data centrally, which limits how well the system scales when processing high volumes of data. Distributed systems are a powerful tool here, although they cannot guarantee consistency, availability, and partition tolerance all at the same time (see the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem)).

![image](https://overcoded.dev/static/0fac77a439e00e4a88142b031b872856/eea79/cap.png)

### Data Models
NoSQL data models vary, and generally try to balance the complexity of the operations they support against the size of the data they can handle efficiently. Among the most popular are key-value (dictionary-based) databases, column-family databases, document databases, and graph databases.

![image](https://tech.ebayinc.com/assets/Uploads/Blog/2014/10/nosql_comparisons.png)

### MongoDB

[MongoDB](https://www.mongodb.com) is a source-available, cross-platform, NoSQL document database. It uses JSON-like documents and stores them as [BSON](https://en.wikipedia.org/wiki/BSON).
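To give a feel for what BSON looks like, the sketch below hand-encodes a tiny document using only the standard library. It is a toy that handles just string and 32-bit integer values, and `encode_tiny_bson` is a hypothetical name; in practice the `bson` package that ships with pymongo takes care of this.

```python
import struct


def encode_tiny_bson(document):
    """Encode a flat dict of str/int32 values as BSON (toy encoder)."""
    body = b""
    for key, value in document.items():
        name = key.encode("utf-8") + b"\x00"  # keys are null-terminated strings
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # Type 0x02: string, prefixed by its byte length (including the null)
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # Type 0x10: 32-bit signed integer, little-endian
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported value type: {type(value).__name__}")
    # Total length = 4-byte length prefix + elements + trailing null byte
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


print(encode_tiny_bson({"hello": "world"}))
```

Unlike JSON text, every value carries an explicit type byte and length information, which is what lets BSON be traversed quickly.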

Depending on the programming language used, there are multiple drivers available. In the case of Python (and this tutorial), we'll be focusing on [pymongo](https://www.mongodb.com/docs/drivers/pymongo), which is the official driver for synchronous applications and can be installed through `pip`. Check the full list of official drivers [here](https://www.mongodb.com/docs/drivers).

**Warning:** In this tutorial we'll connect to a MongoDB instance running version 6.0 (the latest at the time of writing) using pymongo 4.3.3. MongoDB is updated frequently, so some features shown here might not work with older versions, or might behave somewhat differently depending on the version of the MongoDB instance you connect to. We'll try to point out the most egregious incompatibilities, but handling all of them is out of scope, so pay attention to any error messages you receive during regular use.

## Getting started

If the requirements for this tutorial have been properly installed, you should already have `pymongo` version 4.3.3 in your environment. To check this, run the following (it should not raise any exceptions):

In [None]:
import pymongo

The tutorial has been set up to use an instance of MongoDB running inside a container. To have the instance up and running, use the following command from the root directory of the repository:

In [None]:
! docker compose up -d  # Depending on your version of docker compose, you might need to use the below command instead
# ! docker-compose up -d

The container will have an instance of MongoDB running that can be accessed through `localhost` at port 27017 (the typical default for MongoDB). For more details check the file `docker-compose.yml`. 

**Note:** Any data written to the database will be removed once the container is stopped. This is not an issue for this tutorial, since we'll create the data to be queried every time. For the data in the container to persist, a volume should be explicitly declared in the docker compose file.

### Connecting to a MongoDB instance

In order to connect to this particular instance we'll run the following:

In [None]:
from pymongo import MongoClient

# Credentials in this case are set in the compose file
client = MongoClient(host='localhost', port=27017, username='mongo', password='mongo')

# Check if the connection is established
client.admin.command('ping')

Alternatively, all the information can be passed as a single string, using a MongoDB URI:

In [None]:
client = MongoClient('mongodb://mongo:mongo@localhost:27017')
client.admin.command('ping')
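If the username or password contains characters such as `@`, `:` or `/`, they need to be percent-encoded before being placed in a URI. Here is a minimal sketch using only the standard library. The credentials are made up for illustration, and `build_mongo_uri` is a hypothetical helper, not part of pymongo:

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host="localhost", port=27017):
    """Build a MongoDB URI, percent-encoding the credentials.

    Hypothetical helper for illustration; not part of pymongo.
    """
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}:{port}"


# Made-up credentials containing reserved characters
uri = build_mongo_uri("mongo", "p@ss:word/1")
print(uri)
```

The resulting string can then be passed to `MongoClient` in the same way as the URI above.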

### Accessing specific databases

Within any instance of MongoDB there can be multiple independent databases. Let's begin by listing the existing databases:

In [None]:
client.list_database_names()

The ones listed above are the default databases. In order to access the databases, the attribute (dot) notation can be used:

In [None]:
client.admin

As you can see, the `ping` commands used above were actually performed over the `admin` database.

Alternatively, a dictionary-like access can also be used to access a specific database. This is particularly useful for databases with names that are not valid Python variable names, since those cannot use the dot notation.

In [None]:
client['admin']
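One way to decide which notation applies is to check whether the name is a valid Python identifier. The sketch below only checks Python syntax; `supports_dot_notation` is a hypothetical helper and the database names are made up. Names that collide with existing `MongoClient` attributes (such as `close`) also need the dictionary-like form.

```python
import keyword


def supports_dot_notation(name):
    """Return True if `name` can be written after a dot in Python source."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Made-up database names, for illustration only
for name in ["admin", "my-database", "2023_logs", "class"]:
    if supports_dot_notation(name):
        style = f"client.{name}"
    else:
        style = f"client[{name!r}]"
    print(f"{name:12} -> {style}")
```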

### Accessing specific collections

Within each database, we can have multiple collections. A collection in MongoDB is roughly equivalent to a table in a relational database.

To list all collections inside a database:

In [None]:
client.admin.list_collection_names()

As with databases, either the attribute notation or the dictionary-like access can be used to select a specific collection (in this case we use only the dictionary-like access, given that the collection names are not compatible with the attribute notation):

In [None]:
client.admin['system.users']

## Documents in MongoDB

Inside each collection, each entry is called a document. In MongoDB all documents are JSON-style, and in Python they are represented as dictionaries.

To access an individual document, we must do so from the collection (we'll look at the specific method in more depth later):

In [None]:
collection = client.admin['system.users']
collection.find_one()

As can be seen above, the document itself can have nested fields, lists, etc., as long as it follows the key-value pair structure typical of JSON files.
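To make the shape concrete, here is a made-up document (not one stored in the instance) written as a Python dictionary with a nested field and a list, along with how its fields are read:

```python
import json

# Hypothetical document, for illustration only
book = {
    "title": "Hamlet",
    "author": {"first_name": "William", "last_name": "Shakespeare"},
    "genres": ["tragedy", "drama"],
    "copies": 3,
}

# Nested fields and list items are read with ordinary dict/list indexing
print(book["author"]["last_name"])
print(book["genres"][0])

# The same structure serialised as JSON text
print(json.dumps(book, indent=2))
```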

### The `_id` field

The `_id` field is special in MongoDB and is always present in all documents.

When inserting a document, `_id` can either be set explicitly or left out, in which case MongoDB sets it for us. This field always acts as an index for the collection and *must* be unique within the collection.
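When `_id` is left out, the generated value is an `ObjectId`: a 12-byte value whose first 4 bytes hold a creation timestamp in seconds since the Unix epoch. As a rough sketch, that timestamp can be read from the hexadecimal form using only the standard library. The hex string below is a sample value, not one produced by this tutorial, and `objectid_timestamp` is a hypothetical helper (pymongo's `ObjectId` exposes the same information through its `generation_time` attribute).

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id):
    """Return the creation time encoded in a 24-character ObjectId hex string."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId has 12 bytes, i.e. 24 hex characters")
    seconds = int(hex_id[:8], 16)  # first 4 bytes: big-endian Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


sample = "507f1f77bcf86cd799439011"  # sample value for illustration
print(objectid_timestamp(sample).isoformat())
```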

## Creating databases and collections

Let's once more list the databases present:

In [None]:
client.list_database_names()

If we want to access a new database, we can do so in the usual fashion:

In [None]:
db = client['library']
db

And the same for collections within the database:

In [None]:
collection = db['books']
collection

It is important to note that in MongoDB collections and databases are created lazily. The ones above haven't been created yet and will only exist once at least one document has been inserted in the collection. This is, of course, assuming you have write access to the MongoDB instance. Let's check that the database is not present yet:

In [None]:
client.list_database_names()

By default, there is no restriction on the keys that need to be present in any document inside the collection. This makes MongoDB highly flexible, but at the same time prone to user error if care is not taken (e.g., misspelled keys).
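Since a misspelled key is accepted silently by default, one option is a small client-side check before inserting. Here is a minimal sketch, assuming a hypothetical set of expected keys for a collection of books; `unexpected_keys` is not a pymongo function:

```python
# Hypothetical set of keys we expect in a collection of books
EXPECTED_KEYS = {"_id", "title", "author"}


def unexpected_keys(document, expected=EXPECTED_KEYS):
    """Return the top-level keys of `document` that are not in `expected`."""
    return sorted(set(document) - set(expected))


good = {"title": "Hamlet", "author": "William Shakespeare"}
typo = {"titel": "Hamlet", "author": "William Shakespeare"}

print(unexpected_keys(good))  # no unexpected keys
print(unexpected_keys(typo))  # flags the misspelled "titel"
```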

We'll now insert a document into our collection:

In [None]:
collection.insert_one({"title": "Hamlet", "author": "William Shakespeare"})
collection.find_one()

Since we didn't explicitly provide the `_id` field, it was created for us.

Now let's check the existing databases again:

In [None]:
client.list_database_names()

As a final exercise for this section, we'll insert a new document, this time with an explicit `_id`. The methods used below will be explained in detail in the next section:

In [None]:
collection.insert_one({"title": "Moby Dick", "_id": "sample_id"})
collection.find_one({"_id": "sample_id"})

## Summary

* Connections can be made using `MongoClient`. 
* Important concepts:
  * Database: Can contain a number of collections. There can be multiple *independent* databases within a single MongoDB instance
  * Collection: Roughly equivalent to SQL tables. There can be more than one collection in a database, and each one stores documents
  * Document: An entry in a collection. They have a JSON style and are represented as dictionaries in Python
* Databases and collections are created lazily (i.e., they are only actually created once a document has been inserted)