# NoSQL: Basic queries on non-relational databases

## NoSQL: Review

A NoSQL database is designed to handle data that are not structured in tabular relations. The data can be stored in key-value pairs (like Python dictionaries), documents (JSON), or graphs.

<img src="../assets/nosql_vs_sql.jpeg"  width = 500 px></img>

Unlike relational (SQL) databases, NoSQL databases do not necessarily use tables and rows. They are also more flexible, because the structure of the entries is not predefined: you can add data without declaring a schema first, or even creating the collection that will hold it.
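To make the flexible-schema idea concrete, here is a minimal sketch that uses plain Python dictionaries to stand in for documents. The two sample records and their field names are made up for illustration and are not part of the tutorial's dataset.

```python
import json

# Two hypothetical documents meant for the same collection.
# They do not share the same set of fields.
documents = [
    {"first_name": "Ada", "last_name": "Lovelace", "country": "gb"},
    {"first_name": "Grace", "languages": ["COBOL"], "rank": "Rear admiral"},
]

# Each document serializes to JSON independently of the others.
for doc in documents:
    print(json.dumps(doc))

# Fields that appear in at least one document but not in all of them.
all_fields = set().union(*(doc.keys() for doc in documents))
shared_fields = set.intersection(*(set(doc) for doc in documents))
optional_fields = sorted(all_fields - shared_fields)
print(optional_fields)
```

In a relational table, every one of these optional fields would need its own nullable column declared in advance.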

[This article](https://www.integrate.io/blog/the-sql-vs-nosql-difference/) summarizes the biggest differences between SQL and NoSQL database systems:

|SQL|NoSQL|
|---|---|
|SQL databases are relational| NoSQL databases are non-relational. Relationships between collections are not enforced, and `JOIN`-style queries are limited or unavailable, depending on the system|
|SQL databases are table-based |NoSQL databases are document, key-value, graph, or wide-column stores.|
|SQL databases use structured query language and have a predefined schema.| NoSQL databases have dynamic schemas for unstructured data.|
|SQL databases are vertically scalable (by adding processing power)|NoSQL databases are horizontally scalable (by adding servers/machines).|
|SQL databases are better for multi-row transactions|NoSQL is better for unstructured data like documents or JSON.|

The image below shows examples of popular SQL and NoSQL databases.

<img src="../assets/popular_examples_nosql_sql.jpeg" width =500px></img>

In the next section, we'll cover an example using MongoDB. You will see that we can translate many of the basic SQL queries you know.

## MongoDB

<img src="https://upload.wikimedia.org/wikipedia/fr/thumb/4/45/MongoDB-Logo.svg/527px-MongoDB-Logo.svg.png" />

Many vendors offer NoSQL databases. One of the most popular is [MongoDB](https://www.mongodb.com/). A MongoDB database contains collections (the counterpart of tables) of documents (the counterpart of rows). Documents are stored in a JSON-like format (BSON), which is very convenient to handle with Python!

The query syntax is based on JavaScript. In Python, queries look like dictionaries.

Before diving into the exercises, have a look at [this quick intro](https://www.mongodb.com/docs/manual/tutorial/query-documents/) to MongoDB queries.

### Creating the database

We have created and filled a MongoDB database for you. You probably already know the data: it is the list of country leaders that you used in the Wikipedia project.

You can set it up by deploying the Docker image we have pre-built:

In [1]:
# You can run it from this notebook with:
!docker-compose up -d

# Or in your terminal with:
# docker-compose up -d



### Connecting to the database

To use MongoDB from Python, you need to install the `pymongo` library.

In [2]:
from pymongo import MongoClient

# Creation of a MongoDB Client (by giving the host and the port)
client = MongoClient(host="localhost", port=27017)

# Instantiation of the database
db = client["becode"]

# Let's see which collections are in the database
db.list_collection_names()

['leaders']
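`MongoClient` connects lazily, so a wrong host or a stopped container only shows up when the first operation runs. The sketch below checks the connection explicitly with a `ping` command. The helper names `build_mongo_uri` and `check_connection` and the 2-second timeout are illustrative choices, not part of the tutorial.

```python
def build_mongo_uri(host: str = "localhost", port: int = 27017) -> str:
    """Return a connection string for an unauthenticated MongoDB server."""
    return f"mongodb://{host}:{port}"


def check_connection(uri: str, timeout_ms: int = 2000) -> bool:
    """Return True if a MongoDB server answers a ping at `uri`."""
    # Imported here so that build_mongo_uri works even without pymongo installed.
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")  # forces a round trip to the server
        return True
    except PyMongoError:
        return False
    finally:
        client.close()


print(build_mongo_uri())
```

Calling `check_connection(build_mongo_uri())` before running the queries gives a quick yes/no answer instead of a long traceback.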

### Basic Queries

**1. Show the first leader of the `leaders` collection by using the `find_one` method**

The corresponding `SQL` query would be:

```sql
SELECT * FROM leaders LIMIT 1;
```

In [3]:
db["leaders"].find_one()

{'_id': ObjectId('66bf5bbb624e2272be3418c4'),
 'id': 'Q7747',
 'first_name': 'Vladimir',
 'last_name': 'Putin',
 'birth_date': '1952-10-07',
 'death_date': None,
 'place_of_birth': 'Saint Petersburg',
 'wikipedia_url': 'https://ru.wikipedia.org/wiki/%D0%9F%D1%83%D1%82%D0%B8%D0%BD,_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%B8%D1%87',
 'start_mandate': '2000-05-07',
 'end_mandate': '2008-05-07',
 'country': 'ru'}

**2. Show the first leader of the collection whose country is Belgium**

To do so, we pass a query as the first argument of the `find_one` method. The query is formatted as a dictionary.

In SQL it would be like adding a `WHERE` condition to the query:

```sql
SELECT * FROM leaders WHERE country = 'be' LIMIT 1;
```

In [30]:
db["leaders"].find_one({"country": "be"})

{'_id': ObjectId('66bf5bbb624e2272be3418ef'),
 'id': 'Q12978',
 'first_name': 'Guy',
 'last_name': 'Verhofstadt',
 'birth_date': '1953-04-11',
 'death_date': None,
 'place_of_birth': 'Dendermonde',
 'wikipedia_url': 'https://nl.wikipedia.org/wiki/Guy_Verhofstadt',
 'start_mandate': '1999-07-12',
 'end_mandate': '2008-03-20',
 'country': 'be'}
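A filter dictionary can hold several conditions, which MongoDB combines with an implicit `AND`, and it can use comparison operators such as `$gte`. The sketch below builds such a filter; the function name `leaders_filter` and the chosen date are illustrative. It assumes dates are stored as `YYYY-MM-DD` strings, as in the documents shown above, so string comparison follows chronological order.

```python
def leaders_filter(country: str, born_on_or_after: str) -> dict:
    """Build a filter roughly equivalent to:
    WHERE country = <country> AND birth_date >= <born_on_or_after>
    """
    return {
        "country": country,                        # equality condition
        "birth_date": {"$gte": born_on_or_after},  # comparison operator
    }


query = leaders_filter("be", "1950-01-01")
print(query)

# With a running database, the filter is used like any other query:
# db["leaders"].find_one(query)
```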

**3. Select some fields to display**

Let's run the same query, but display only the `first_name` and the `last_name` of the leader. This corresponds to listing columns after `SELECT` in SQL.

We pass a [projection](https://www.mongodb.com/docs/manual/tutorial/project-fields-from-query-results/) as the second argument of the method. It is also formatted as a dictionary: each key is a field name, and the value is `1` if we want the field to be displayed. Note that `_id` is always returned unless it is explicitly excluded.

The corresponding SQL query would be:

```sql
SELECT first_name, last_name FROM leaders WHERE country = 'be' LIMIT 1;
```

In [5]:
db["leaders"].find_one({"country": "be"}, {"first_name": 1, "last_name": 1})

{'_id': ObjectId('66bf5bbb624e2272be3418ef'),
 'first_name': 'Guy',
 'last_name': 'Verhofstadt'}

We can also hide fields instead. In that case, we set the value to `0` for each field to exclude.

In [6]:
db["leaders"].find_one({"country": "be"}, {"wikipedia_url": 0, "id": 0})

{'_id': ObjectId('66bf5bbb624e2272be3418ef'),
 'first_name': 'Guy',
 'last_name': 'Verhofstadt',
 'birth_date': '1953-04-11',
 'death_date': None,
 'place_of_birth': 'Dendermonde',
 'start_mandate': '1999-07-12',
 'end_mandate': '2008-03-20',
 'country': 'be'}
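Both results above still contain `_id`, because MongoDB returns it unless the projection sets `"_id": 0`. Apart from `_id`, a projection cannot mix `1` and `0` values. A small helper can make this explicit; `inclusion_projection` is a hypothetical name used only for this sketch.

```python
def inclusion_projection(*fields: str, keep_id: bool = False) -> dict:
    """Build a projection that keeps only `fields`, dropping `_id` by default."""
    projection = {field: 1 for field in fields}
    if not keep_id:
        projection["_id"] = 0  # the only exclusion allowed next to inclusions
    return projection


print(inclusion_projection("first_name", "last_name"))

# With a running database:
# db["leaders"].find_one({"country": "be"}, inclusion_projection("first_name", "last_name"))
```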

**4. Find the distinct countries**

The SQL equivalent:

```sql
SELECT DISTINCT country FROM leaders;
```

In [7]:
db["leaders"].distinct("country")

['be', 'fr', 'ma', 'ru', 'us']

**5. Find all the leaders who are still alive**

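We can assume that they have no `death_date`, right? Note that the filter `{"death_date": None}` matches documents where the field is null as well as documents where it is missing.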

In [8]:
db["leaders"].find({"death_date": None}, {"last_name":1, "first_name":1, "country":1})

<pymongo.cursor.Cursor at 0x7f9a2eb64980>

As you can see, the `find` method returns a cursor. No worries: we can process it with a simple Python loop!

In [9]:
for leader in db["leaders"].find({"death_date": None}, {"last_name":1, "first_name":1, "country":1}):
    print(f"{leader['first_name']} {leader['last_name']} ({leader['country']})")

Vladimir Putin (ru)
George Bush (us)
Donald Trump (us)
Jimmy Carter (us)
Dmitry Medvedev (ru)
Mohammed None (ma)
Bill Clinton (us)
Joe Biden (us)
Guy Verhofstadt (be)
Yves Leterme (be)
Herman Van Rompaey (be)
Mohammed None (ma)
Elio Di Rupo (be)
Mohammed None (ma)
Mark Eyskens (be)
Alexander De Croo (be)
Sophie Wilmès (be)
Barack Obama (us)
François Hollande (fr)
Nicolas Sarkozy (fr)
Emmanuel Macron (fr)
Charles Michel (be)
Mohammed None (ma)
Donald Trump (us)
Jimmy Carter (us)
Barack Obama (us)
Vladimir Putin (ru)
Mohammed None (ma)
George Bush (us)
Mohammed None (ma)
Dmitry Medvedev (ru)
Guy Verhofstadt (be)
Yves Leterme (be)
Herman Van Rompaey (be)
Bill Clinton (us)
Mark Eyskens (be)
Elio Di Rupo (be)
Charles Michel (be)
François Hollande (fr)
Sophie Wilmès (be)
Nicolas Sarkozy (fr)
Joe Biden (us)
Alexander De Croo (be)
Emmanuel Macron (fr)
Mohammed None (ma)
Mohammed None (ma)
Mohammed None (ma)
Dmitry Medvedev (ru)
Barack Obama (us)
George Bush (us)
Bill Clinton (us)
Joe Biden (us
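A cursor is lazy: documents are only fetched when it is iterated, and it can be refined with `sort` and `limit` before that. The sketch below chains these calls; the function name `oldest_leaders` is an illustrative choice, and it assumes `birth_date` holds `YYYY-MM-DD` strings, which sort chronologically.

```python
def oldest_leaders(collection, n: int = 10) -> list:
    """Return the `n` leaders with the earliest known birth dates."""
    cursor = (
        collection
        .find(
            {"birth_date": {"$ne": None}},  # skip null or missing birth dates
            {"_id": 0, "first_name": 1, "last_name": 1, "birth_date": 1},
        )
        .sort("birth_date", 1)  # 1 = ascending, -1 = descending
        .limit(n)
    )
    # Nothing is fetched until the cursor is iterated; list() consumes it.
    return list(cursor)


# With a running database:
# for leader in oldest_leaders(db["leaders"], n=10):
#     print(leader)
```

Sorting and limiting on the server avoids transferring the whole collection just to keep a handful of documents.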

**6. Let's now insert the leader of tomorrow: you?**

As you know, MongoDB is flexible. This means that you can add entries even if some fields are missing. Let's give it a try:

In [10]:
you = {
    'first_name': 'ADD HERE YOUR FIRST NAME',
    'last_name': 'ADD HERE YOUR LAST NAME',
    'birth_date': 'ADD HERE YOUR BIRTH DATE',
    'country': 'ADD HERE YOUR COUNTRY CODE'
}
db["leaders"].insert_one(you)

InsertOneResult(ObjectId('66bf6aaa04f60ace1419f1dd'), acknowledged=True)

Let's have a look at your data!

In [11]:
db["leaders"].find_one({"first_name": "ADD HERE YOUR FIRST NAME"})

{'_id': ObjectId('66bf6aaa04f60ace1419f1dd'),
 'first_name': 'ADD HERE YOUR FIRST NAME',
 'last_name': 'ADD HERE YOUR LAST NAME',
 'birth_date': 'ADD HERE YOUR BIRTH DATE',
 'country': 'ADD HERE YOUR COUNTRY CODE'}

We can observe two things here:
- An `_id` field has been added automatically by MongoDB. Its value is an `ObjectId`, which starts with a creation timestamp, so values generated later generally sort after earlier ones. It is not a simple incrementing counter.
- Some fields are missing (`place_of_birth`, for instance). This is a property of NoSQL: not all fields are mandatory!
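The first 4 bytes of an `ObjectId` store its creation time as seconds since the Unix epoch. As a minimal sketch, the function below decodes that timestamp from the hexadecimal string using only the standard library; `objectid_timestamp` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hexadecimal characters")
    seconds = int(hex_id[:8], 16)  # big-endian seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The _id generated by the insert above.
created = objectid_timestamp("66bf6aaa04f60ace1419f1dd")
print(created)
```

With `pymongo`, the same information is available through the `generation_time` attribute of an `ObjectId`.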

We can find the documents in which a field is missing by using the `{"$exists": False}` operator:

In [12]:
for leader in db["leaders"].find({"place_of_birth":{"$exists":False}}):
    print(leader)

{'_id': ObjectId('66bf6aaa04f60ace1419f1dd'), 'first_name': 'ADD HERE YOUR FIRST NAME', 'last_name': 'ADD HERE YOUR LAST NAME', 'birth_date': 'ADD HERE YOUR BIRTH DATE', 'country': 'ADD HERE YOUR COUNTRY CODE'}
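A field that is missing and a field set to `None` are not the same thing: a `{field: None}` filter matches both cases, while `{"$exists": False}` matches only missing fields. The pure-Python sketch below emulates these two filters on made-up documents. It illustrates the semantics and is not how MongoDB evaluates queries internally.

```python
def matches_null(doc: dict, field: str) -> bool:
    """Emulate the filter {field: None}: true when the field is null OR absent."""
    return doc.get(field) is None


def matches_missing(doc: dict, field: str) -> bool:
    """Emulate {field: {"$exists": False}}: true only when the field is absent."""
    return field not in doc


# Hypothetical documents covering the three possible cases.
docs = [
    {"name": "a", "place_of_birth": "Ghent"},
    {"name": "b", "place_of_birth": None},
    {"name": "c"},
]

null_or_missing = [d["name"] for d in docs if matches_null(d, "place_of_birth")]
only_missing = [d["name"] for d in docs if matches_missing(d, "place_of_birth")]
print(null_or_missing)  # ['b', 'c']
print(only_missing)     # ['c']
```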


### Update data

Since your place of birth is missing from the data, let's add it now. The `update_one` method has two main arguments:
- a query that selects the entry to update
- an update operation, which, as always, is formatted as a dictionary

In [13]:
db["leaders"].update_one({"first_name": "ADD HERE YOUR FIRST NAME"}, {"$set": {"place_of_birth": "ADD HERE YOUR PLACE OF BIRTH"}})

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)
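The returned `UpdateResult` exposes `matched_count` and `modified_count`, which are easier to check in code than the printed representation. The sketch below wraps the call; `set_field` is a hypothetical helper name, and `collection` is expected to be a `pymongo` collection such as `db["leaders"]`.

```python
def set_field(collection, selector: dict, field: str, value) -> bool:
    """Set `field` to `value` on the first document matching `selector`.

    Returns True when a document matched, whether or not its content changed.
    """
    result = collection.update_one(selector, {"$set": {field: value}})
    print(f"matched={result.matched_count}, modified={result.modified_count}")
    return result.matched_count == 1


# With a running database:
# set_field(db["leaders"], {"first_name": "ADD HERE YOUR FIRST NAME"},
#           "place_of_birth", "ADD HERE YOUR PLACE OF BIRTH")
```

A match with `modified_count == 0` simply means that the document already held that value.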

### Remove data

Your dream is over ;-) Since you will not be a leader of tomorrow, we will remove you from the collection. The `delete_one` (or `delete_many`) method has one main argument: the query that selects the entries to remove.

In [14]:
db["leaders"].delete_one({"first_name": "ADD HERE YOUR FIRST NAME"})

DeleteResult({'n': 1, 'ok': 1.0}, acknowledged=True)
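Deletions cannot be undone, and `delete_many` with an empty filter removes every document in the collection. One cautious pattern, sketched below, is to count the matches before deleting. The helper name `delete_matching` and the `expected_max` parameter are illustrative assumptions, not part of `pymongo`.

```python
def delete_matching(collection, selector: dict, expected_max: int = 1) -> int:
    """Delete documents matching `selector`, refusing suspiciously broad deletes.

    Returns the number of deleted documents.
    """
    if not selector:
        raise ValueError("refusing to delete with an empty filter")
    found = collection.count_documents(selector)
    if found > expected_max:
        raise RuntimeError(f"{found} documents match, expected at most {expected_max}")
    return collection.delete_many(selector).deleted_count


# With a running database:
# delete_matching(db["leaders"], {"first_name": "ADD HERE YOUR FIRST NAME"})
```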

## Your Turn!

Based on what you have learned and some searching online, try to write the following queries:

- Remove the leaders who have an empty or null (`None`) last name
- Display all unique first names from the collection
- Transform all the dates in the dataset into datetime objects (they are currently strings, which is not good practice). You can use a Python script that interacts with the DB instead of doing everything in a single query
- Display the 10 oldest leaders, ordered by birth date (search for how to sort and limit results in MongoDB)
- Create a Python script that computes the number of leaders per country
- Do the same by using a MongoDB [aggregation pipeline](https://www.mongodb.com/docs/manual/aggregation/)

In [15]:
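# Your code here (feel free to add some extra code blocks!)
# Note: this filter matches the literal string 'None' only. To also match
# null (Python None) or empty last names, use {'$in': [None, '', 'None']}.
db.leaders.delete_many({
    'last_name': {'$eq': 'None'}
})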


DeleteResult({'n': 45, 'ok': 1.0}, acknowledged=True)

In [44]:
db.leaders.distinct('first_name')

['Abraham',
 'Achille',
 'Adolphe',
 'Alain',
 'Albert',
 'Alexander',
 'Alexandre',
 'Aloïs',
 'Andrew',
 'Auguste',
 'Barack',
 'Benjamin',
 'Bill',
 'Boris',
 'Camille',
 'Charles',
 'Chester',
 'Clément',
 'Dmitry',
 'Donald',
 'Dwight',
 'Elio',
 'Emmanuel',
 'Franklin',
 'Frans',
 'François',
 'Félix',
 'Gaston',
 'George',
 'Georges',
 'Gerald',
 'Guy',
 'Gérard',
 'Harry',
 'Henri',
 'Henry',
 'Herbert',
 'Herman',
 'Hubert',
 'Jacques',
 'James',
 'Jean',
 'Jean-Baptiste',
 'Jean-Luc',
 'Jimmy',
 'Joe',
 'John',
 'Joseph',
 'Jules',
 'Leo',
 'Louis',
 'Lyndon',
 'Léon',
 'Marie',
 'Mark',
 'Martin',
 'Millard',
 'Napoléon',
 'Nicolas',
 'Patrice',
 'Paul',
 'Paul-Henri',
 'Pierre',
 'Pieter',
 'Prosper',
 'Raymond',
 'René',
 'Richard',
 'Ronald',
 'Rutherford',
 'Sophie',
 'Stephen',
 'Sylvain',
 'Theodore',
 'Thomas',
 'Théodore',
 'Ulysses',
 'Valéry',
 'Vincent',
 'Vladimir',
 'Walthère',
 'Warren',
 'Wilfried',
 'William',
 'Woodrow',
 'Yves',
 'Zachary',
 'Émile',
 'Étie

In [48]:
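# Note: `date_field` is a placeholder, not a field of this collection. Run as-is,
# the pipeline adds `date_field: None` to every document. Use a real field name
# such as `birth_date` on both sides to convert that field.
db.leaders.update_many(
    {},
    [{'$set': {'date_field': {'$dateFromString': {'dateString': '$date_field'}}}}]
)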

UpdateResult({'n': 366, 'nModified': 366, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)
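The cell above targets `date_field`, which is not one of the fields of the documents shown earlier, so the actual date strings are left untouched. Below is a hedged sketch of the Python-script approach suggested in the exercise. It assumes the four date fields seen in the sample documents and the `YYYY-MM-DD` format. The names `DATE_FIELDS`, `parse_date` and `convert_dates` are illustrative, and mapping `''` and `'None'` to `None` is a choice made for this sketch.

```python
from datetime import datetime

# Date fields seen in the sample documents of this tutorial.
DATE_FIELDS = ("birth_date", "death_date", "start_mandate", "end_mandate")


def parse_date(value):
    """Convert a 'YYYY-MM-DD' string to a datetime; leave other values alone."""
    if value in ("", "None"):
        return None  # treat empty or literal 'None' strings as missing
    if isinstance(value, str):
        try:
            return datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            return value  # unexpected format: keep the original string
    return value  # None or an already converted datetime


def convert_dates(collection) -> int:
    """Rewrite the date fields of every document; return how many were modified."""
    modified = 0
    # list() reads all documents first, so updates do not interfere with the cursor.
    for doc in list(collection.find({}, {field: 1 for field in DATE_FIELDS})):
        changes = {f: parse_date(doc[f]) for f in DATE_FIELDS if f in doc}
        if changes:
            result = collection.update_one({"_id": doc["_id"]}, {"$set": changes})
            modified += result.modified_count
    return modified


print(parse_date("1952-10-07"))

# With a running database:
# convert_dates(db["leaders"])
```

`pymongo` stores `datetime.datetime` values as native BSON dates, so no extra conversion is needed when writing them back.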

In [49]:
for leader in db.leaders.find():
    print(leader)

{'_id': ObjectId('66bf5bbb624e2272be3418c4'), 'id': 'Q7747', 'first_name': 'Vladimir', 'last_name': 'Putin', 'birth_date': '1952-10-07', 'death_date': None, 'place_of_birth': 'Saint Petersburg', 'wikipedia_url': 'https://ru.wikipedia.org/wiki/%D0%9F%D1%83%D1%82%D0%B8%D0%BD,_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80_%D0%92%D0%BB%D0%B0%D0%B4%D0%B8%D0%BC%D0%B8%D1%80%D0%BE%D0%B2%D0%B8%D1%87', 'start_mandate': '2000-05-07', 'end_mandate': '2008-05-07', 'country': 'ru', 'date_field': None}
{'_id': ObjectId('66bf5bbb624e2272be3418c5'), 'id': 'Q11812', 'first_name': 'Thomas', 'last_name': 'Jefferson', 'birth_date': None, 'death_date': '1826-07-04', 'place_of_birth': 'Shadwell', 'wikipedia_url': 'https://en.wikipedia.org/wiki/Thomas_Jefferson', 'start_mandate': '1801-03-04', 'end_mandate': '1809-03-04', 'country': 'us', 'date_field': None}
{'_id': ObjectId('66bf5bbb624e2272be3418c6'), 'id': 'Q207', 'first_name': 'George', 'last_name': 'Bush', 'birth_date': '1946-07-06', 'death_date': Non
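For the last two exercises, counting leaders per country, one possible approach is sketched below: a `Counter` over the documents on the Python side, and a `$group` stage on the MongoDB side. The names `count_by_country` and `COUNT_BY_COUNTRY_PIPELINE` and the sample documents are illustrative.

```python
from collections import Counter


def count_by_country(leaders) -> dict:
    """Count documents per `country` value from any iterable of documents."""
    return dict(Counter(doc.get("country") for doc in leaders))


# Server-side equivalent: group by country, count, and sort by decreasing count.
COUNT_BY_COUNTRY_PIPELINE = [
    {"$group": {"_id": "$country", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# Hypothetical documents to try the Python version without a database.
sample = [{"country": "be"}, {"country": "fr"}, {"country": "be"}]
counts = count_by_country(sample)
print(counts)

# With a running database:
# count_by_country(db["leaders"].find({}, {"country": 1}))
# list(db["leaders"].aggregate(COUNT_BY_COUNTRY_PIPELINE))
```

The aggregation pipeline does the counting on the server, so only one small document per country travels over the network.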

## Resources:
* [NoSQL Concepts (DataCamp)](https://www.datacamp.com/courses/nosql-concepts)
* [Introduction to MongoDB using Python (DataCamp)](https://www.datacamp.com/courses/introduction-to-using-mongodb-for-data-science-with-python)
* [Getting started with MongoDB](https://docs.mongodb.com/manual/tutorial/)
* [Python MongoDB Tutorial](https://www.mongodb.com/blog/post/getting-started-with-python-and-mongodb)
* [Introduction to MongoDB Learning Path](https://learn.mongodb.com/learning-paths/introduction-to-mongodb)
* [Build an App With Python, Flask, and MongoDB to Track UFOs](https://www.mongodb.com/developer/languages/python/flask-app-ufo-tracking/)