## MongoDB 1

In [None]:
# import statements
import os
from pymongo import MongoClient
import bson.json_util  # json_util is a submodule and must be imported explicitly
from datetime import datetime

### Connection establishment

In [None]:
client = MongoClient('mongodb://localhost:27017/')

In [None]:
client.server_info()

### MongoDB sample datasets

- Source: https://www.mongodb.com/docs/atlas/sample-data/sample-training/

### Lazy database creation

- `client.sample_training` only creates a reference to the database
- the database is not actually created until the first write operation, such as inserting a document
- until then, the database does not physically exist on the server
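
The lazy behaviour described above can be checked with a small helper. This is only a sketch: `database_exists` is a hypothetical name introduced here, and it assumes `client` is a connected `MongoClient`. It relies on `list_database_names()`, which reports only databases that already exist on the server.

```python
def database_exists(client, name):
    """Return True if the server already stores a database called `name`."""
    # list_database_names() only lists databases that physically exist,
    # so a database that was merely referenced will not show up here
    return name in client.list_database_names()
```

On a fresh server, `database_exists(client, "sample_training")` would be expected to return `False` before the first insert and `True` after it.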

In [None]:
db = client.sample_training

In [None]:
# directory where the JSON files are stored
json_dir = 'sample_training'
json_files = [f for f in os.listdir(json_dir) if f.endswith(".json")]
collections = [f.replace(".json", "") for f in json_files]
collections

In [None]:
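for json_file, collection in zip(json_files, collections):
    with open(os.path.join(json_dir, json_file), 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, which are not valid JSON
            # each line holds one document in MongoDB Extended JSON
            data = bson.json_util.loads(line)
            db[collection].insert_one(data)

    print(f"Loaded {json_file} into the '{collection}' collection.")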

### Verify collection names

In [None]:
db.list_collection_names()

### MongoDB API: Querying documents

#### Select all documents in a collection `db.collection.find(query, projection, options)`

- with an empty query (`{}`), retrieves all documents from a collection
- equivalent to `SELECT * FROM <TABLE>` SQL query
- returns a cursor that can be used to iterate over the results of the query
- `query`:
    - selection filter
    - `{ <field1>: <value>, <field2>: {conditions} ... }`
- `projection`:
    - determines which fields are returned in the matching documents
    - `{ <field1>: <value>, <field2>: <value> ... }`
- documentation: https://www.mongodb.com/docs/manual/reference/method/db.collection.find/ 
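
As an illustration of how the `query` and `projection` documents fit together, here is a minimal sketch. `build_projection` is a hypothetical helper introduced for this example, not part of PyMongo. The field names come from the `trips` collection used in this notebook.

```python
def build_projection(fields, include_id=False):
    """Build a projection document that includes only `fields`."""
    projection = {field: 1 for field in fields}
    if not include_id:
        # _id is returned by default, so it has to be excluded explicitly
        projection["_id"] = 0
    return projection


query = {"gender": 1}  # selection filter on a single field
projection = build_projection(["tripduration", "birth year"])
print(projection)
# → {'tripduration': 1, 'birth year': 1, '_id': 0}
```

With a live connection, the two documents would be passed as `db.trips.find(query, projection)`.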

Let's explore the `trips` collection.

In [None]:
cursor = db.trips.find({})
cursor

In [None]:
cursor = db.trips.find({})
trips = list(cursor)
trips[:3]

Let's explore the `inspections` collection.

In [None]:
cursor = db.inspections.find({})
inspections = list(cursor)
inspections[:3]

#### Q1: Find all trips taken by passengers born in 1988.

- equivalent to `SELECT * FROM <TABLE> WHERE <SOME COLUMN> = <SOME VALUE>` SQL query

In [None]:
trips = db.trips.find({
    'birth year': 1988
})
trips = list(trips)
trips[:3]

#### Q2: Find all inspection sectors.

- equivalent to `SELECT <SPECIFIC COLUMN> FROM <TABLE>` SQL query

In [None]:
cursor = db.inspections.find({}, {"sector": 1})
inspections = list(cursor)
inspections[:5]

The `_id` field is returned by default. To keep it from cluttering the output, set it to 0 in the projection.

In [None]:
cursor = db.inspections.find({}, {
    "_id": 0, 
    "sector": 1
})
inspections = list(cursor)
inspections[:10]

#### Q3: Find all inspections that occurred in "Home Improvement Contractor - 100" and "Home Improvement Salesperson - 101" sectors.

- equivalent to `SELECT * FROM <TABLE NAME> WHERE <SOME COLUMN> IN (<VALUE1>, <VALUE2>)`

In [None]:
cursor = db.inspections.find({
    "sector": {
        "$in": ["Home Improvement Contractor - 100", 
                "Home Improvement Salesperson - 101"]
    }
})
home_inspections = list(cursor)
home_inspections[:3]

#### Q4: Find all trips that have duration between 200 and 4000 taken by gender 1.

- equivalent to:
```
    SELECT * FROM <TABLE NAME>
    WHERE <SOME COLUMN1> = <SOME VALUE> AND
        <SOME COLUMN2> >= <SOME VALUE1> AND <SOME COLUMN2> <= <SOME VALUE2>
```

In [None]:
cursor = db.trips.find({
    "tripduration": {"$gte": 200, "$lte": 4000}, 
    "gender": 1
})
trips = list(cursor)
trips[:5]

#### Q5: Find all inspections that occurred in either Manhattan or Brooklyn.

- equivalent to:
```
    SELECT * FROM <TABLE NAME>
    WHERE <SOME COLUMN> = <SOME VALUE1> OR
        <SOME COLUMN> = <SOME VALUE2>
```

In [None]:
cursor = db.inspections.find({
    "$or": [
        { "address.city": "MANHATTAN" },
        { "address.city": "BROOKLYN" }
    ]
})

manhattan_brooklyn_inspections = list(cursor)
manhattan_brooklyn_inspections

### MongoDB comparison operators

- `$eq`: Matches values that are equal to a specified value.
- `$gt`: Matches values that are greater than a specified value.
- `$gte`: Matches values that are greater than or equal to a specified value.
- `$in`: Matches any of the values specified in an array.
- `$lt`: Matches values that are less than a specified value.
- `$lte`: Matches values that are less than or equal to a specified value.
- `$ne`: Matches all values that are not equal to a specified value.
- `$nin`: Matches none of the values specified in an array.

Documentation: https://www.mongodb.com/docs/manual/reference/operator/query-comparison/
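
To make the semantics of these operators concrete, here is a toy pure-Python matcher. It is only an illustration, not how MongoDB evaluates queries: it ignores arrays, `null` handling, and MongoDB's cross-type ordering. The names `matches` and `filter_docs` and the sample documents are hypothetical.

```python
import operator

_MISSING = object()  # marks a field that is absent from the document

# operators that order the field value against a single operand
_ORDERING = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(value, condition):
    """Toy check of one field value against a {"$op": operand} condition."""
    for op, operand in condition.items():
        if op == "$eq":
            ok = value is not _MISSING and value == operand
        elif op == "$ne":
            ok = value is _MISSING or value != operand
        elif op == "$in":
            ok = value is not _MISSING and value in operand
        elif op == "$nin":
            ok = value is _MISSING or value not in operand
        elif op in _ORDERING:
            try:
                ok = value is not _MISSING and _ORDERING[op](value, operand)
            except TypeError:
                ok = False  # values of different types do not match here
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False  # every operator in the condition must hold
    return True


def filter_docs(docs, field, condition):
    """Return the documents whose `field` satisfies `condition`."""
    return [d for d in docs if matches(d.get(field, _MISSING), condition)]


docs = [
    {"tripduration": 150, "gender": 1},
    {"tripduration": 900, "gender": 2},
    {"tripduration": 5000},
]
print(filter_docs(docs, "tripduration", {"$gte": 200, "$lte": 4000}))
# → [{'tripduration': 900, 'gender': 2}]
print(len(filter_docs(docs, "gender", {"$ne": 1})))
# → 2
```

Note that `$ne` and `$nin` also match documents that lack the field altogether, which is why the second call counts the document without `gender`.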

### `limit()` method

- specify the maximum number of documents the cursor will return
- documentation: https://www.mongodb.com/docs/manual/reference/method/cursor.limit/#mongodb-method-cursor.limit

#### Q6: Find the first five trips.

- equivalent to: `SELECT * FROM <TABLE NAME> LIMIT <N>`

In [None]:
five_trips = list(db.trips.find().limit(5))
five_trips

### Sorting using the `sort()` method

- specify the field or fields to sort by, each with a value of 1 for ascending or -1 for descending order
- in PyMongo, pass the field name and direction as arguments, e.g. `sort("id", 1)`
- documentation: https://www.mongodb.com/docs/manual/reference/method/cursor.sort/#mongodb-method-cursor.sort
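
Sorting by more than one field can be sketched as follows. In PyMongo, `sort()` also accepts a list of `(field, direction)` pairs. The function name `sorted_trips` is hypothetical, and the example assumes it is handed a collection such as `db.trips`.

```python
def sorted_trips(collection, limit=5):
    """Return up to `limit` trips, longest first, ties broken by birth year."""
    cursor = (
        collection.find({})
        # -1 sorts tripduration descending, 1 sorts birth year ascending
        .sort([("tripduration", -1), ("birth year", 1)])
        .limit(limit)
    )
    return list(cursor)
```

With a live connection this would be called as `sorted_trips(db.trips)`.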

### `$regex`
- selects documents in which a string field matches a regular expression
- documentation: https://www.mongodb.com/docs/manual/reference/operator/query/regex/
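
An unanchored pattern matches anywhere in the string, so anchors such as `$` matter. The sketch below uses Python's `re` module as a stand-in for the server's regex engine. Simple patterns like these should behave the same way, but that is an assumption. The business names are made up for illustration.

```python
import re

names = [
    "ACME SUPPLY INC",
    "ACME SUPPLY INC.",
    "INCREDIBLE PIZZA",
    "LINCOLN DELI LLC",
]

# "INC" anywhere in the name
unanchored = [n for n in names if re.search(r"INC", n)]
# "INC" or "INC." only at the end of the name
anchored = [n for n in names if re.search(r"INC\.?$", n)]

print(len(unanchored))
# → 4
print(anchored)
# → ['ACME SUPPLY INC', 'ACME SUPPLY INC.']
```

The anchored pattern would be written in a PyMongo filter as `{"business_name": {"$regex": "INC\\.?$"}}`.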

#### Q7: Find all inspections that occurred in 2015 and sort them in ascending order of `id`.

- equivalent to: `SELECT * FROM <TABLE NAME> WHERE <SOME COL> LIKE <SOME SEARCH TERM> ORDER BY <SOME COL> ASC`

In [None]:
inspections_2015 = db.inspections.find({
    "date": {"$regex": "2015"}
}).sort("id", 1)  # 1 for ascending order
list(inspections_2015)

Sort the same results in descending order.

In [None]:
inspections_2015 = db.inspections.find({
    "date": {"$regex": "2015"}
}).sort("id", -1)  # -1 for descending order
list(inspections_2015)

#### Q8: Find all inspections of incorporated businesses (names ending in `INC` or `INC.`).

In [None]:
inc_inspections = db.inspections.find({
    "business_name": {"$regex": "INC$|INC\\.$"}
})
list(inc_inspections)

### `findOne(query, projection, options)`

- Fetches the first document that matches the query
- documentation: https://www.mongodb.com/docs/manual/reference/method/db.collection.findOne/
- **IMPORTANT**: PyMongo uses snake_case method names instead of camelCase, so `findOne` becomes `find_one`.
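
One detail worth handling: `find_one` returns `None` rather than raising an error when no document matches. A minimal sketch, where `find_trip_by_birth_year` is a hypothetical helper and the collection is assumed to be something like `db.trips`:

```python
def find_trip_by_birth_year(collection, year):
    """Return one trip for the given birth year, or None if there is none."""
    # second argument is a projection that hides the _id field
    trip = collection.find_one({"birth year": year}, {"_id": 0})
    if trip is None:
        # find_one returns None when nothing matches the filter
        print(f"No trip found for birth year {year}")
    return trip
```

With a live connection this would be called as `find_trip_by_birth_year(db.trips, 1988)`.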

#### Q9: Find the first trip.

In [None]:
trip = db.trips.find_one()
trip

### MongoDB shell `mongosh`

```
docker exec -it <container name> mongosh
show dbs
use sample_training
show collections
db.trips.find().limit(5).pretty()
```

### `db.collection.countDocuments(query, options)`

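- returns an integer count of the documents in the collection or view that match the query
- documentation: https://www.mongodb.com/docs/manual/reference/method/db.collection.countDocuments/
- in PyMongo, the method is named `count_documents` and requires a filter argument (use `{}` to count all documents)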

#### Q10: How many trips are in the trips collection?

In [None]:
db.trips.count_documents({})

#### Q11: How many trips were taken by people born after the year 1988?

In [None]:
db.trips.count_documents({ "birth year": { "$gt": 1988 } })