# MongoDB Filtering Operations

This notebook demonstrates various filtering techniques in MongoDB, from basic to complex queries.

In [1]:
import sys
!{sys.executable} -m pip install pandas pymongo --quiet

import pandas as pd
from pymongo import MongoClient
from datetime import datetime
import json
import time
import warnings
warnings.filterwarnings('ignore')

def print_mongo(obj):
    """Pretty print MongoDB output"""
    print(json.dumps(obj, indent=2, default=str))

def get_mongo_client(max_retries=5, retry_delay=5):
    """Connect to MongoDB with retry logic"""
    for attempt in range(max_retries):
        try:
            client = MongoClient('mongodb://admin:admin@router1:27017/businessdb?authSource=admin')
            client.admin.command('ping')
            print("Successfully connected to MongoDB")
            return client
        except Exception as e:
            print(f"Connection attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                print(f"Retrying in {retry_delay} seconds...")
                time.sleep(retry_delay)
            else:
                raise

client = get_mongo_client()
db = client.businessdb
org_collection = db["organizations"]

Successfully connected to MongoDB


## Simple Filters

### 1. Filtering by Country
This query filters organizations with a simple equality condition. It retrieves all organizations located in Cote d'Ivoire.

In [2]:
# Equality filter: match documents whose "country" field equals the given value
cote_divoire_orgs = org_collection.find({"country": "Cote d'Ivoire"})
print("Organizations in Cote d'Ivoire:")
for org in cote_divoire_orgs:
    print_mongo(org)


Organizations in Cote d'Ivoire:
{
  "_id": "678ae70038e354c77751265b",
  "organizationId": "f8D4B99e11fAF5D",
  "name": "Odom Ltd",
  "website": "https://www.humphrey-hess.com/",
  "country": "Cote d'Ivoire",
  "description": "Advanced static process improvement",
  "founded": 2012,
  "industry": "Management Consulting",
  "numberOfEmployees": 1825
}
{
  "_id": "678ae70038e354c77751265a",
  "organizationId": "EF5B55FadccB8Fe",
  "name": "Charles-Phillips",
  "website": "https://bowman.com/",
  "country": "Cote d'Ivoire",
  "description": "Monitored client-server implementation",
  "founded": 2012,
  "industry": "Mental Health Care",
  "numberOfEmployees": 3450
}
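
To make the semantics of an equality filter concrete, the sketch below models it in plain Python over a few trimmed documents from the output above. `matches_equality` is a hypothetical helper, not part of PyMongo, and it deliberately ignores the array, dotted-path and null handling that the real server performs.

```python
# Hypothetical illustration: a pure-Python approximation of how an equality
# filter such as {"country": "Cote d'Ivoire"} selects documents. It handles
# only top-level scalar fields; real MongoDB matching also covers arrays,
# dotted paths, null semantics and type-aware comparison.
def matches_equality(doc: dict, query: dict) -> bool:
    # Every field in the query must be present and equal (implicit AND).
    return all(field in doc and doc[field] == value for field, value in query.items())


# Trimmed copies of documents shown in this notebook's output.
docs = [
    {"name": "Odom Ltd", "country": "Cote d'Ivoire", "founded": 2012},
    {"name": "Pineda-Cox", "country": "Bolivia", "founded": 2010},
    {"name": "Charles-Phillips", "country": "Cote d'Ivoire", "founded": 2012},
]

query = {"country": "Cote d'Ivoire"}
selected = [d["name"] for d in docs if matches_equality(d, query)]
print(selected)  # ['Odom Ltd', 'Charles-Phillips']
```

Adding a second key, for example `{"country": "Cote d'Ivoire", "founded": 2012}`, can only narrow the result, because every key in the filter must match.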


### 2. Filtering by Industry
This query filters organizations by industry, returning every organization whose `industry` field is exactly "Human Resources / HR".

In [3]:
# Filter by industry (exact string match)
hr_orgs = org_collection.find({"industry": "Human Resources / HR"})
print("\nOrganizations in Human Resources / HR:")
for org in hr_orgs:
    print_mongo(org)


Organizations in Human Resources / HR:
{
  "_id": "678ae70038e354c777512649",
  "organizationId": "4D4d7E18321eaeC",
  "name": "Pineda-Cox",
  "website": "http://aguilar.org/",
  "country": "Bolivia",
  "description": "Fundamental asynchronous capability",
  "founded": 2010,
  "industry": "Human Resources / HR",
  "numberOfEmployees": 1312
}
{
  "_id": "678ae70038e354c777512661",
  "organizationId": "aAeb29ad43886C6",
  "name": "Potter-Walsh",
  "website": "http://thomas-french.org/",
  "country": "Turkey",
  "description": "Optional non-volatile open system",
  "founded": 2008,
  "industry": "Human Resources / HR",
  "numberOfEmployees": 6923
}
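
When several industries are of interest, the same equality filter can be widened with MongoDB's `$in` operator. The sketch below shows one way to build such a filter; `industry_filter` is a hypothetical helper, and it only constructs the filter document without contacting the server.

```python
def industry_filter(*industries: str) -> dict:
    """Build a filter matching one or more exact industry names (hypothetical helper)."""
    if not industries:
        raise ValueError("at least one industry is required")
    if len(industries) == 1:
        # A single value uses plain equality, as in the cell above.
        return {"industry": industries[0]}
    # Several values use $in, which matches a document whose field equals any listed value.
    return {"industry": {"$in": list(industries)}}


print(industry_filter("Human Resources / HR"))
print(industry_filter("Human Resources / HR", "Accounting"))
```

The result can be passed straight to PyMongo, e.g. `org_collection.find(industry_filter("Human Resources / HR", "Accounting"))` or `org_collection.count_documents(...)`. Values are still compared exactly, so "Human Resources" alone would not match "Human Resources / HR".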


## Complex Filters

### 1. Multiple Field Criteria
This query combines multiple conditions with implicit AND logic: listing several fields in one filter document requires all of them to match. It finds Public Safety organizations in China that have more than 5200 employees.

In [4]:
complex_filter = org_collection.find({
    "country": "China",
    "industry": "Public Safety",
    "numberOfEmployees": {"$gt": 5200}
})
print("\nPublic Safety organizations in China with more than 5200 employees:")
for org in complex_filter:
    print_mongo(org)


Public Safety organizations in China with more than 5200 employees:
{
  "_id": "678ae70038e354c777512628",
  "organizationId": "0bFED1ADAE4bcC1",
  "name": "Hester Ltd",
  "website": "http://sullivan-reed.com/",
  "country": "China",
  "description": "Switchable scalable moratorium",
  "founded": 1971,
  "industry": "Public Safety",
  "numberOfEmployees": 5287
}
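
One pitfall with implicit AND in Python is that a dict cannot hold the same key twice, so two conditions on the same field would silently overwrite each other. The hypothetical helper below sketches one way to combine conditions safely: distinct fields are merged into one document, and a repeated field falls back to an explicit `$and`.

```python
def and_filter(*conditions: dict) -> dict:
    """Combine filter documents with AND semantics (hypothetical helper).

    Conditions on distinct fields are merged into one document, which MongoDB
    treats as an implicit AND. If two conditions name the same field, a plain
    dict merge would keep only the last one, so an explicit $and is returned.
    """
    merged: dict = {}
    for cond in conditions:
        if any(field in merged for field in cond):
            return {"$and": list(conditions)}
        merged.update(cond)
    return merged


# Same filter as the cell above.
same_as_cell = and_filter(
    {"country": "China"},
    {"industry": "Public Safety"},
    {"numberOfEmployees": {"$gt": 5200}},
)
print(same_as_cell)

# Two conditions on one field (illustrative upper bound) produce an explicit $and.
range_filter = and_filter(
    {"numberOfEmployees": {"$gt": 5200}},
    {"numberOfEmployees": {"$lte": 10000}},
)
print(range_filter)
```

For a simple range on one field, the operators can also share a single sub-document: `{"numberOfEmployees": {"$gt": 5200, "$lte": 10000}}`.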


### 2. Nested Conditions with OR Logic
This query demonstrates logical OR. It finds Plastics organizations that are either located in the USA or have more than 500 employees; the `$or` clause is combined with the `industry` condition through implicit AND.
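
The way `$or` nests inside an implicitly ANDed filter can be sketched with a small in-memory matcher. This is a hypothetical illustration supporting only equality, four comparison operators and `$or`; two of the sample documents are made up (marked in the code) and the others are trimmed from this notebook's output.

```python
import operator

# Hypothetical, minimal subset of MongoDB's comparison operators.
OPS = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def condition_holds(value, condition) -> bool:
    if isinstance(condition, dict):
        # Operator document such as {"$gt": 500}; a missing field never satisfies it.
        return value is not None and all(OPS[op](value, arg) for op, arg in condition.items())
    return value == condition


def matches(doc: dict, query: dict) -> bool:
    for key, condition in query.items():
        if key == "$or":
            # $or: at least one branch must match the whole document.
            if not any(matches(doc, branch) for branch in condition):
                return False
        elif not condition_holds(doc.get(key), condition):
            return False
    # Every top-level entry held, so the implicit AND is satisfied.
    return True


query = {
    "$or": [{"country": "USA"}, {"numberOfEmployees": {"$gt": 500}}],
    "industry": "Plastics",
}
docs = [
    {"name": "Ferrell LLC", "country": "Papua New Guinea", "industry": "Plastics", "numberOfEmployees": 3498},
    {"name": "Example US Co", "country": "USA", "industry": "Plastics", "numberOfEmployees": 40},  # hypothetical
    {"name": "Example Small Co", "country": "Chile", "industry": "Plastics", "numberOfEmployees": 300},  # hypothetical
    {"name": "Odom Ltd", "country": "Cote d'Ivoire", "industry": "Management Consulting", "numberOfEmployees": 1825},
]
print([d["name"] for d in docs if matches(d, query)])  # ['Ferrell LLC', 'Example US Co']
```

Odom Ltd satisfies the `$or` branch on employee count but is excluded by the `industry` condition, which shows that the `$or` clause and the sibling field are combined with AND.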

In [15]:
nested_filter = org_collection.find({
    "$or": [
        {"country": "USA"},
        {"numberOfEmployees": {"$gt": 500}}
    ],
    "industry": "Plastics"
})
print("\nPlastics organizations in the USA or with more than 500 employees:")
for org in nested_filter:
    print_mongo(org)


Plastics organizations in the USA or with more than 500 employees:
{
  "_id": "678ae70038e354c777512626",
  "organizationId": "FAB0d41d5b5d22c",
  "name": "Ferrell LLC",
  "website": "https://price.net/",
  "country": "Papua New Guinea",
  "description": "Horizontal empowering knowledgebase",
  "founded": 1990,
  "industry": "Plastics",
  "numberOfEmployees": 3498
}
{
  "_id": "678ae70038e354c77751262e",
  "organizationId": "0B4F93aA06ED03e",
  "name": "Carr Inc",
  "website": "http://ross.com/",
  "country": "Kuwait",
  "description": "Distributed impactful customer loyalty",
  "founded": 1996,
  "industry": "Plastics",
  "numberOfEmployees": 8167
}
{
  "_id": "678ae70038e354c777512639",
  "organizationId": "c1Ce9B350BAc66b",
  "name": "Weiss and Sons",
  "website": "https://barrett.com/",
  "country": "Korea",
  "description": "Sharable optimal functionalities",
  "founded": 2011,
  "industry": "Plastics",
  "numberOfEmployees": 5984
}
{
  "_id": "678ae70038e354c77751265f",
  "organizat

### 3. Pattern Matching with Regex
This query shows how to use regular expressions for pattern matching in MongoDB. It finds organizations whose names start with 'Ma' (case-insensitive).

In [17]:
regex_filter = org_collection.find({"name": {"$regex": "^Ma", "$options": "i"}})
print("\nOrganizations with names starting with 'Ma':")
for org in regex_filter:
    print_mongo(org)



Organizations with names starting with 'Ma':
{
  "_id": "678ae70038e354c77751266f",
  "organizationId": "208044AC2fe52F3",
  "name": "Massey LLC",
  "website": "https://frazier.biz/",
  "country": "Suriname",
  "description": "Configurable zero administration Graphical User Interface",
  "founded": 1986,
  "industry": "Accounting",
  "numberOfEmployees": 5004
}
{
  "_id": "678ae70038e354c77751262a",
  "organizationId": "9eE8A6a4Eb96C24",
  "name": "Mayer Group",
  "website": "http://www.brewer.com/",
  "country": "Mauritius",
  "description": "Synchronized needs-based challenge",
  "founded": 1991,
  "industry": "Transportation",
  "numberOfEmployees": 7870
}
{
  "_id": "678ae70038e354c77751267c",
  "organizationId": "a0a6f9b3DbcBEb5",
  "name": "Mays-Preston",
  "website": "http://www.browning-key.com/",
  "country": "Mali",
  "description": "User-centric heuristic focus group",
  "founded": 2006,
  "industry": "Military Industry",
  "numberOfEmployees": 5786
}
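
If the prefix comes from user input, characters such as `.` or `(` would be read as regex syntax. A minimal sketch, assuming a hypothetical helper `name_prefix_filter`, escapes the prefix with Python's `re.escape` and checks the same pattern locally against names from the output above. Python's `re` and MongoDB's regex engine differ in some details, but for a simple anchored prefix like this they should agree.

```python
import re


def name_prefix_filter(prefix: str, case_insensitive: bool = True) -> dict:
    """Build a $regex filter for names starting with `prefix` (hypothetical helper)."""
    # re.escape keeps characters such as "." or "(" from being treated as regex syntax.
    condition = {"$regex": "^" + re.escape(prefix)}
    if case_insensitive:
        condition["$options"] = "i"
    return {"name": condition}


print(name_prefix_filter("Ma"))  # {'name': {'$regex': '^Ma', '$options': 'i'}}

# Local check of the same pattern with Python's re module on names from the output.
names = ["Massey LLC", "Mayer Group", "Mays-Preston", "Odom Ltd", "Carr Inc"]
pattern = re.compile("^" + re.escape("Ma"), re.IGNORECASE)
print([n for n in names if pattern.search(n)])  # ['Massey LLC', 'Mayer Group', 'Mays-Preston']
```

As a design note, the MongoDB documentation indicates that case-insensitive regex queries generally cannot use indexes efficiently, so passing `case_insensitive=False` may be preferable on large collections when the casing is known.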


### 4. Array Filtering with $elemMatch
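This query demonstrates how to filter documents based on conditions on an array of embedded documents. It finds organizations whose `departments` array contains at least one element that satisfies all the criteria at once (located on Floor 2 and more than 15 employees). The cell inserts two sample organizations, runs the query, and then deletes the samples again.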

In [32]:
# Remove leftovers from a previous run so the sample documents are not duplicated
org_collection.delete_one({"organizationId": "org999", "industry": "Software"})
org_collection.delete_one({"organizationId": "org111", "industry": "Software"})

sample_org = {
    "organizationId": "org999",
    "name": "Tech Solutions Inc",
    "industry": "Software",
    "departments": [
        {"name": "Engineering", "employees": 50, "location": "Floor 1"},
        {"name": "Marketing", "employees": 20, "location": "Floor 2"},
        {"name": "HR", "employees": 10, "location": "Floor 2"}
    ]
}
org_collection.insert_one(sample_org)

sample_org2 = {
    "organizationId": "org111",
    "name": "Infinity software",
    "industry": "Software",
    "departments": [
        {"name": "Engineering", "employees": 5, "location": "Floor 1"},
        {"name": "Marketing", "employees": 2, "location": "Floor 2"},
        {"name": "HR", "employees": 3, "location": "Floor 2"}
    ]
}
org_collection.insert_one(sample_org2)

array_filter = org_collection.find({
    "departments": {
        "$elemMatch": {
            "location": "Floor 2",
            "employees": {"$gt": 15}
        }
    }
})

print("Organizations with departments on Floor 2 having more than 15 employees:")
for org in array_filter:
    print_mongo(org)

# Clean up: remove the sample documents inserted above
org_collection.delete_one({"organizationId": "org999", "industry": "Software"})
org_collection.delete_one({"organizationId": "org111", "industry": "Software"})

Organizations with departments on Floor 2 having more than 15 employees:
{
  "_id": "678aeb572b0c9c2482d26be8",
  "organizationId": "org999",
  "name": "Tech Solutions Inc",
  "industry": "Software",
  "departments": [
    {
      "name": "Engineering",
      "employees": 50,
      "location": "Floor 1"
    },
    {
      "name": "Marketing",
      "employees": 20,
      "location": "Floor 2"
    },
    {
      "name": "HR",
      "employees": 10,
      "location": "Floor 2"
    }
  ]
}


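
The reason `$elemMatch` matters is that conditions written on dotted paths, such as `{"departments.location": "Floor 2", "departments.employees": {"$gt": 15}}`, may be satisfied by different array elements. The pure-Python sketch below contrasts the two semantics on a hypothetical document whose large department is on Floor 1; the function names are illustrative and not part of PyMongo.

```python
def dept_matches(dept: dict) -> bool:
    # Both conditions from the $elemMatch in the cell above.
    return dept.get("location") == "Floor 2" and dept.get("employees", 0) > 15


def elem_match(doc: dict) -> bool:
    # $elemMatch semantics: one single element must satisfy every condition.
    return any(dept_matches(d) for d in doc.get("departments", []))


def dotted_paths(doc: dict) -> bool:
    # Dotted-path semantics: each condition may be satisfied by a different element.
    depts = doc.get("departments", [])
    return any(d.get("location") == "Floor 2" for d in depts) and any(
        d.get("employees", 0) > 15 for d in depts
    )


split_org = {  # hypothetical document: the large department is on Floor 1
    "name": "Split Example",
    "departments": [
        {"name": "Engineering", "employees": 50, "location": "Floor 1"},
        {"name": "HR", "employees": 10, "location": "Floor 2"},
    ],
}
print(elem_match(split_org), dotted_paths(split_org))  # False True
```

Tech Solutions Inc matches under both semantics because its Marketing department alone is on Floor 2 with 20 employees; only documents like `split_org` expose the difference.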