# MongoDB Tutorial: N MongoDB Query Patterns Everybody Must Know

## Table of contents

1. What is MongoDB and why should you care?
2. Core concepts around MongoDB
3. MongoDB setup and connecting to data sources
4. Elementary queries - selecting
    - Selecting all documents in MongoDB
    - Selecting based on a condition in MongoDB (Equality, inequality, `$gt, $gte, $lt, $lte` operators)
    - Selecting with logical conditional operators in MongoDB (`$and, $or, $in, $nin` operators)
    - Counting documents in MongoDB
    
5. Querying for null or missing fields in MongoDB
6. Querying arrays in MongoDB
7. Projections, aka restricting fields
8. Conclusion

## H2: What is MongoDB and why should you care?

You've probably heard of relational databases, or even worked with them. The row-and-table format is one of the most popular and intuitive structures for storing information. Unfortunately, you can't store all the data that comes your way in rows and tables. In fact, many real-world problems call for non-relational databases. So, are there alternatives?

The answer is YES! There are four main types of databases that don't follow the relational row-and-table model. They are commonly called NoSQL databases because they are not queried with standard SQL. The four types are:

- Key-value databases
- Document databases
- Column family databases
- Graph databases

This article focuses on document databases and how to work with them using a server called MongoDB. But before we jump into the technical details, let's look at the use cases of document databases.

## H2: When to use document databases?

One of the main use-cases for choosing document databases is when you have data that doesn't neatly fit into a pre-defined schema like a table. There are many processes or applications in industries that store these types of data. Here are some examples:

- Web and mobile apps: User profiles, preferences, content and interactions
- Content management systems: Storing a wider range of media such as text, images, video, GIFs, etc.
- E-commerce platforms: Product catalogs, customer information, order history, inventory, etc.
- Gaming: Player profiles and leaderboard rankings
- Logging and data collection: Large volumes of logs, events and metrics for analysis

Take a moment to think about how data collected from these industries would fit into tables. For example, e-commerce platforms would have a hard time fitting product catalogs into a pre-defined schema. Different products have different attributes or, worse, a different number of attributes. Do you need 10 columns to store 10 physical attributes of drones from 100 different brands, or just 5-6 to store book information?

Fitting such data into tables is awkward: you end up with either many sparse columns or a separate table for each product type. Document databases such as MongoDB offer the following benefits instead:

- Little upfront cost for schema design, since a collection does not enforce a fixed schema by default
- Documents (data) can vary over time (including the data types, the number of attributes, etc.)
- Related data can be embedded in a single document, which avoids many joins and often makes reads faster
- Intuitive for developers, as documents look like JSON objects, which are basically nested dictionaries for Pythonistas
- Document DBs scale horizontally: as the database grows, you can spread it across more machines instead of moving to an ever larger single server
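To make the flexible-schema point concrete, here is a minimal sketch with two product documents of different shapes sitting in the same list, just as they could sit in the same collection. All field names and values are made up for illustration.

```python
# Two hypothetical catalog entries with different attributes.
# In a table, every row would need the union of all these columns.
book = {
    "type": "book",
    "title": "Example Book",
    "author": "Jane Doe",
    "pages": 320,
}
drone = {
    "type": "drone",
    "brand": "ExampleBrand",
    "weight_grams": 249,
    "max_speed_kmh": 57.6,
    "battery": {"capacity_mah": 2250, "cells": 2},  # nested sub-document
    "sensors": ["gps", "barometer"],                # array field
}

catalog = [book, drone]

# Each document carries only the fields that apply to it.
for product in catalog:
    print(product["type"], "->", sorted(product))

# The only field the two documents have in common.
shared_fields = set(book) & set(drone)
print("Fields both documents share:", shared_fields)
```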

Now, let's take a look at the core concepts around document databases and MongoDB.

## H2: Core concepts around MongoDB

## H2: MongoDB setup: connecting to data sources

For Windows, follow this [link](https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-windows/).

For Debian-based Linux distributions such as Ubuntu (the package name may differ depending on your release):

```shell
$ sudo apt-get install -y mongodb
```

```shell
$ pip install pymongo
$ pip install requests
```

In [1]:
import json

from pymongo import MongoClient

# Establish connection to MongoDB
client = MongoClient("localhost", 27017)
# Get a handle to a database named "drones" (MongoDB creates it on first write)
drones = client["drones"]
# Get a handle to a collection named "races" inside that database
races = drones["races"]

# Load dataset into MongoDB
with open("data/drone_races.json", "r") as file:
    data = json.load(file)
    races.insert_many(data)
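One thing to watch out for: running the loading cell twice inserts every document twice. A small guard can prevent that. The helper below is only a sketch, not part of PyMongo; `load_once` is a hypothetical name, and it assumes the JSON file holds a list of documents.

```python
import json


def load_once(collection, path):
    """Insert the documents from a JSON file only if the collection is empty.

    `collection` is expected to behave like a PyMongo collection
    (it needs `count_documents` and `insert_many`).
    Returns the number of documents inserted.
    """
    if collection.count_documents({}) > 0:
        return 0  # data already loaded, skip to avoid duplicates

    with open(path, "r") as file:
        documents = json.load(file)

    if not documents:
        return 0  # insert_many raises an error on an empty list

    collection.insert_many(documents)
    return len(documents)
```

With the objects from the cell above, the call would be `load_once(races, "data/drone_races.json")`.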

## H2: Elementary queries

### H3: Counting documents in MongoDB

In [3]:
races.count_documents({})

3000

You can replace the empty filter `{}` with any of the conditions described below to count only the matching documents.

### H3: Extracting one document in MongoDB

In [6]:
from pprint import pprint

pprint(races.find_one())

{'_id': ObjectId('659d31e9255ec0cf4bab529d'),
 'laps': 3,
 'league': 'F1 Drones',
 'location': {'city': 'Ford',
              'country': 'United Kingdom',
              'date': 'error: invalid date "2024-10-25"',
              'venue': 'Manhattan Seas'},
 'name': 'Honorable',
 'pilots': {'drone': 'DJI3-old',
            'finishing_position': 66,
            'name': 'Kariotta Cow',
            'qualification_time': 27.39,
            'team': 'Sky Crusaders',
            'telemetry': {'altitude': 34.3,
                          'battery_voltage': 12.1,
                          'speed': 68.3,
                          'timestamp': 'error: invalid date '
                                       '"2024-10-25T14:09:26Z"'}},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat'],
 'weather_conditions': 'snowy'}


### H3: Selecting all documents in MongoDB

In [7]:
from pprint import pprint

for race in races.find():
    pprint(race)
    break

{'_id': ObjectId('659d31e9255ec0cf4bab529d'),
 'laps': 3,
 'league': 'F1 Drones',
 'location': {'city': 'Ford',
              'country': 'United Kingdom',
              'date': 'error: invalid date "2024-10-25"',
              'venue': 'Manhattan Seas'},
 'name': 'Honorable',
 'pilots': {'drone': 'DJI3-old',
            'finishing_position': 66,
            'name': 'Kariotta Cow',
            'qualification_time': 27.39,
            'team': 'Sky Crusaders',
            'telemetry': {'altitude': 34.3,
                          'battery_voltage': 12.1,
                          'speed': 68.3,
                          'timestamp': 'error: invalid date '
                                       '"2024-10-25T14:09:26Z"'}},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat'],
 'weather_conditions': 'snowy'}
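`find()` returns a cursor, so the loop above can stop after the first document without iterating over the whole collection. When you want a fixed number of documents, the cursor's `limit()` method states that intent directly. A minimal sketch, where `first_n` is a hypothetical helper and `collection` is any PyMongo collection such as `races`:

```python
def first_n(collection, n):
    """Return the first `n` documents of a collection as a list.

    Note: in MongoDB, a limit of 0 means "no limit".
    """
    # limit() caps how many documents the cursor will return.
    cursor = collection.find().limit(n)
    return list(cursor)
```

For example, `first_n(races, 3)` would return a list of three race documents that you can pass to `pprint`.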


### H3: Selecting based on a condition in MongoDB (Equality, inequality, `$gt, $gte, $lt, $lte` operators)

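Equality is the counterpart of SQL's `WHERE field = value`. Because `sponsors` is an array, the filter below matches any race whose array contains "Fat Shark":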

In [8]:
criteria = {"sponsors": "Fat Shark"}

fat_shark_races = races.count_documents(criteria)
fat_shark_races

2056

The greater-than and less-than operators compare a field against a value:

In [9]:
criteria = {"pilots.qualification_time": {"$lt": 10}}

quick_races = races.count_documents(criteria)
quick_races

1016

The `$gt`, `$gte` and `$lte` operators follow the same pattern.
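Comparison operators can also be combined on a single field to express a range, and inequality has its own operator, `$ne`. The sketch below builds both kinds of filter as plain dictionaries; `between` and `not_equal` are hypothetical helper names, and the bounds 10 and 20 are arbitrary.

```python
def between(field, low, high):
    """Filter for low <= field < high, combining two operators on one field."""
    return {field: {"$gte": low, "$lt": high}}


def not_equal(field, value):
    """Filter for documents whose field differs from `value`.

    Note: $ne also matches documents that do not contain the field.
    """
    return {field: {"$ne": value}}


mid_pack = between("pilots.qualification_time", 10, 20)
not_snowy = not_equal("weather_conditions", "snowy")

print(mid_pack)
print(not_snowy)
```

Either filter can be passed to `races.count_documents(...)` or `races.find(...)` like the criteria above.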

### H3: Selecting with logical conditional operators in MongoDB (`$and, $or, $in, $nin` operators)

Retrieve races with either "rainy" or "snowy" weather conditions:

In [29]:
criteria = {"weather_conditions": {"$in": ["rainy", "snowy"]}}

races.count_documents(criteria)

1213

The `$or` operator matches documents that satisfy at least one of the listed conditions:

In [27]:
criteria = {
    "$or": [
        {"location.country": "United Kingdom"},
        {"sponsors": "Etisalat"},
    ]
}
races.count_documents(criteria)

2065

The `$and` operator matches documents that satisfy every condition:

In [26]:
criteria = {
    "$and": [
        {"location.country": "United Kingdom"},
        {"sponsors": "Etisalat"},
    ]
}
races.count_documents(criteria)

159

A simpler way to write the same AND: put both fields in one filter, and MongoDB combines them implicitly:

In [25]:
criteria = {
    "location.country": "United Kingdom",
    "sponsors": "Etisalat",
}

races.count_documents(criteria)

159
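The shorter form above, which lists several fields in one filter, has one limit worth knowing: a Python dictionary cannot hold the same key twice, so two `$or` clauses in one filter silently collapse into the last one. That is a case where the explicit `$and` is needed. A small sketch using fields from this dataset (the combination of conditions is made up for illustration):

```python
uk_or_us = {"$or": [
    {"location.country": "United Kingdom"},
    {"location.country": "United States"},
]}
dji_or_etisalat = {"$or": [
    {"sponsors": "DJI"},
    {"sponsors": "Etisalat"},
]}

# Broken: the second "$or" key overwrites the first in a dict literal.
collapsed = {
    "$or": uk_or_us["$or"],
    "$or": dji_or_etisalat["$or"],
}
print(len(collapsed))  # number of keys that survived

# Working: wrap both clauses in an explicit $and.
criteria = {"$and": [uk_or_us, dji_or_etisalat]}
```

Passing `criteria` to `races.count_documents(...)` would count races that satisfy both `$or` clauses at once.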

Find races held in neither the United Kingdom nor the United States:

In [24]:
criteria = {
    "location.country": {"$nin": ["United States", "United Kingdom"]}
}

races.count_documents(criteria)

142

## H2: Querying for null or missing values in MongoDB

Check whether a field exists with the `$exists` operator:

In [23]:
criteria = {"location.district": {"$exists": True}}

races.count_documents(criteria)

0

In [30]:
criteria = {"laps": {"$exists": True}}

races.count_documents(criteria)

3000

Comparing a field with `None` finds documents where the field is null or missing:

In [31]:
criteria = {"pilots.finishing_position": None}

races.count_documents(criteria)

0
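A `None` filter lumps two cases together: a field that is explicitly null and a field that is absent. To separate them, MongoDB offers the `$type` and `$exists` operators. A sketch of the three variants as filter builders (the function names are illustrative, not part of PyMongo):

```python
def null_or_missing(field):
    """Matches documents where the field is null OR does not exist."""
    return {field: None}


def explicitly_null(field):
    """Matches only documents where the field exists and holds null."""
    return {field: {"$type": "null"}}  # "null" is the alias for BSON type 10


def missing(field):
    """Matches only documents that do not contain the field at all."""
    return {field: {"$exists": False}}


field = "pilots.finishing_position"
for build in (null_or_missing, explicitly_null, missing):
    print(build.__name__, "->", build(field))
```

Each result can be passed to `races.count_documents(...)` to see how many documents fall into each case.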

Find the races with at most one sponsor, that is, those whose `sponsors` array has no second element (index 1):

In [None]:
criteria = {"sponsors.1": {"$exists": False}}

races.count_documents(criteria)
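The `sponsors.1` check above matches arrays with fewer than two elements, including empty ones. When you need an exact length, the `$size` operator is an alternative; it accepts only an exact number, not a range. A short sketch, where `with_array_length` is a hypothetical helper:

```python
def with_array_length(field, length):
    """Filter for documents whose array `field` has exactly `length` elements."""
    if not isinstance(length, int) or length < 0:
        raise ValueError("length must be a non-negative integer")
    return {field: {"$size": length}}


single_sponsor = with_array_length("sponsors", 1)
print(single_sponsor)
```

`races.count_documents(single_sponsor)` would then count only the races with exactly one sponsor.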

## H2: Projections, aka restricting fields

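A projection is the second argument to `find()` and controls which fields come back. Fields set to 1 are included:

In [39]: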
criteria = {"pilots.telemetry.speed": {"$gte": 20}}
projection = {
    "sponsors": 1,
    "location.country": 1,
    "pilots.telemetry.speed": 1,
    "pilots.name": 1,
}

fast_pilots = races.find(criteria, projection)

for pilot in fast_pilots:
    pprint(pilot)
    break

{'_id': ObjectId('659d31e9255ec0cf4bab529d'),
 'location': {'country': 'United Kingdom'},
 'pilots': {'name': 'Kariotta Cow', 'telemetry': {'speed': 68.3}},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat']}


The `_id` field is returned by default. Set it to 0 to leave it out:

In [40]:
criteria = {"pilots.telemetry.speed": {"$gte": 20}}
projection = {
    "sponsors": 1,
    "location.country": 1,
    "pilots.telemetry.speed": 1,
    "pilots.name": 1,
    "_id": 0,
}

fast_pilots = races.find(criteria, projection)

for pilot in fast_pilots:
    pprint(pilot)
    break

{'location': {'country': 'United Kingdom'},
 'pilots': {'name': 'Kariotta Cow', 'telemetry': {'speed': 68.3}},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat']}


Alternatively, list only the fields to exclude and keep everything else:

In [44]:
projection = {"_id": 0, "league": 0, "pilots": 0}

# Empty criteria for this one
races.find_one({}, projection)

{'name': 'Honorable',
 'location': {'venue': 'Manhattan Seas',
  'city': 'Ford',
  'country': 'United Kingdom',
  'date': 'error: invalid date "2024-10-25"'},
 'sponsors': ['Fat Shark', 'DJI', 'Etisalat'],
 'laps': 3,
 'weather_conditions': 'snowy'}
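The examples above follow one rule: a projection either lists fields to include (`1`) or fields to exclude (`0`), and the two styles cannot be mixed, with `_id` as the one exception. Below is a hedged sketch of a helper that enforces the rule before the query reaches the server; `make_projection` is a hypothetical name, not a PyMongo function.

```python
def make_projection(include=(), exclude=(), keep_id=False):
    """Build a projection document from field names.

    Pass either `include` or `exclude`, not both. `_id` is the only
    field that may be excluded alongside included fields.
    """
    if include and exclude:
        raise ValueError("use either include or exclude, not both")

    if include:
        projection = {field: 1 for field in include}
    else:
        projection = {field: 0 for field in exclude}

    if not keep_id:
        projection["_id"] = 0  # drop the ObjectId unless asked to keep it
    return projection


pilot_view = make_projection(include=["pilots.name", "location.country"])
no_pilots = make_projection(exclude=["league", "pilots"])
```

The result goes in the second position of a query, for example `races.find_one({}, no_pilots)`.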

## Conclusion