1. What are the key differences between SQL and NoSQL databases?

SQL databases are relational and table-based, with predefined schemas. They use SQL for queries and follow ACID principles strictly, making them ideal for structured data and complex joins. Examples include MySQL and PostgreSQL. NoSQL databases are non-relational and schema-flexible, and store data as documents (as MongoDB does), key-value pairs, wide columns, or graphs. They prioritize horizontal scaling, high availability, and performance for unstructured or semi-structured data. SQL is best for applications needing strict consistency, while NoSQL suits rapidly evolving, large-scale applications.

2. What makes MongoDB a good choice for modern applications?

MongoDB’s flexible, document-based model supports dynamic schemas, allowing developers to adapt to changing requirements quickly. It scales horizontally through sharding and ensures high availability via replication. Its aggregation pipeline offers advanced data processing without complex joins. MongoDB integrates well with modern languages, microservices, and cloud environments. Features like geospatial queries, full-text search, and ACID transactions add versatility, making it suitable for e-commerce, analytics, IoT, and real-time applications.

3. Explain the concept of collections in MongoDB.

A collection in MongoDB is roughly the counterpart of a table in SQL, but it stores BSON documents instead of rows. Collections do not enforce a schema by default, so documents in the same collection can have different fields. A collection groups related documents, and queries and indexes are scoped to it. Types include regular collections, capped collections (fixed-size), and system collections for internal use.
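
To make the flexible-schema point concrete, here is a minimal Python sketch. The field names and values are made up for the example, and `insert_products` assumes it receives a PyMongo `Collection` object; nothing here is taken from a real dataset.

```python
# Two documents destined for the same (hypothetical) "products" collection.
# They share some fields but not all, which a SQL table would not allow
# without NULL columns or a schema change.
book = {"sku": "B-100", "name": "MongoDB Basics", "pages": 320}
laptop = {"sku": "L-200", "name": "Ultrabook", "specs": {"ram_gb": 16, "cpu": "8-core"}}


def insert_products(collection, docs):
    """Insert documents of differing shapes into one collection.

    `collection` is assumed to be a pymongo.collection.Collection.
    """
    result = collection.insert_many(docs)
    return result.inserted_ids


# Fields present in one document but not in the other.
only_in_book = set(book) - set(laptop)
only_in_laptop = set(laptop) - set(book)
print(sorted(only_in_book), sorted(only_in_laptop))
```

Both documents could be passed to `insert_products` in one call, since the collection itself does not require them to share a shape.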

4. How does MongoDB ensure high availability using replication?

MongoDB uses replica sets: a primary node handles writes, and secondary nodes maintain copies of the data. If the primary fails, an automatic election promotes a secondary to primary, ensuring minimal downtime and fault tolerance.
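
From the application side, connecting to a replica set usually means giving the driver several member addresses plus the replica set name. The sketch below only builds such a connection string; the host names and the `rs0` set name are placeholders, and `replica_set_uri` is a helper invented for this example.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, **options):
    """Build a mongodb:// URI that lists several replica set members."""
    params = {"replicaSet": replica_set, **options}
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(params)


# Hypothetical hosts and replica set name.
uri = replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    "rs0",
)
print(uri)

# With PyMongo installed and the replica set reachable, the driver discovers
# the current primary from this seed list and re-routes after a failover:
#     from pymongo import MongoClient
#     client = MongoClient(uri)
```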

5. What are the main benefits of MongoDB Atlas?

MongoDB Atlas is a managed cloud database service that automates backups, scaling, and monitoring. It offers global data distribution, integrated security, performance optimization tools, and built-in analytics, reducing administrative tasks and improving reliability.

6. What is the role of indexes in MongoDB, and how do they improve performance?

Indexes allow MongoDB to locate documents without scanning the entire collection, which can improve query performance significantly. Types include single-field, compound, text, and geospatial indexes. However, every index must be updated on each write, so excessive indexing slows writes and uses extra storage.
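
The four index types named above can be written as PyMongo key specifications. This is a sketch under assumptions: the field names are hypothetical, and `create_indexes` expects a PyMongo `Collection` to be passed in.

```python
# Index key specifications in the list-of-tuples form PyMongo accepts.
# 1 / -1 mean ascending / descending; "text" and "2dsphere" select the
# text and geospatial index types. Field names are hypothetical.
INDEX_SPECS = {
    "single_field": [("email", 1)],
    "compound": [("customer_id", 1), ("created_at", -1)],
    "text": [("description", "text")],
    "geospatial": [("location", "2dsphere")],
}


def create_indexes(collection, specs=INDEX_SPECS):
    """Create each index and return the index names the server reports.

    `collection` is assumed to be a pymongo.collection.Collection.
    """
    return [collection.create_index(keys) for keys in specs.values()]
```

In the compound specification the order of the fields matters: an index on `customer_id` then `created_at` supports queries on `customer_id` alone, but not queries on `created_at` alone.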

7. Describe the stages of the MongoDB aggregation pipeline.

Common stages include $match (filter), $group (aggregate data), $project (reshape), $sort (order), $lookup (join), and $limit/$skip (pagination). Each stage processes documents and passes results to the next.
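
A pipeline is just an ordered list of stage documents. The sketch below strings together the stages listed above (apart from `$lookup`) for a hypothetical `orders` collection with `status`, `customer_id`, and `amount` fields; those names and the `sales_pipeline` helper are assumptions made for the example.

```python
def sales_pipeline(status, page, page_size):
    """Return an aggregation pipeline as a list of stage documents."""
    return [
        # filter: keep only orders with the requested status
        {"$match": {"status": status}},
        # aggregate: total amount per customer
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
        # reshape: rename _id to customer_id
        {"$project": {"_id": 0, "customer_id": "$_id", "total": 1}},
        # order: highest total first, customer_id as a tie-breaker
        {"$sort": {"total": -1, "customer_id": 1}},
        # pagination
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
    ]


pipeline = sales_pipeline("paid", page=2, page_size=10)
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)

# With a PyMongo collection, the pipeline would run as:
#     results = list(orders.aggregate(pipeline))
```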

8. What is sharding in MongoDB? How does it differ from replication?

Sharding splits data across multiple servers for scalability. Replication duplicates data across servers for fault tolerance. Sharding increases capacity; replication increases availability.
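
The contrast can be illustrated with a toy model in plain Python. This is not how MongoDB routes documents internally (it assigns ranges of shard key values, called chunks, to shards); it only shows the difference in where data ends up. The shard and member names are invented.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]            # hypothetical shard names
REPLICAS = ["primary", "secondary1", "secondary2"]  # hypothetical members


def shard_for(key, shards=SHARDS):
    """Pick one shard for a key by hashing it (toy stand-in for a hashed shard key)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def place_sharded(docs, key_field):
    """Sharding: each document is stored on exactly one shard."""
    placement = {shard: [] for shard in SHARDS}
    for doc in docs:
        placement[shard_for(doc[key_field])].append(doc)
    return placement


def place_replicated(docs):
    """Replication: every member holds a full copy of the data."""
    return {member: list(docs) for member in REPLICAS}


docs = [{"user_id": i} for i in range(9)]
sharded = place_sharded(docs, "user_id")
replicated = place_replicated(docs)

print({shard: len(items) for shard, items in sharded.items()})
print({member: len(items) for member, items in replicated.items()})
```

In the sharded placement the per-shard counts add up to the number of documents, while in the replicated placement every member holds all of them. In production the two are combined: each shard is itself a replica set.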

9. What is PyMongo, and why is it used?

PyMongo is the official Python driver for MongoDB, enabling Python applications to connect to MongoDB, perform CRUD operations, run queries, and use aggregation pipelines.
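
A minimal sketch of the four CRUD operations follows. The function takes the collection as a parameter so it does not hard-code a connection; the document contents, and the `demo` and `people` names in the comment, are hypothetical.

```python
def crud_roundtrip(collection):
    """Run one create, read, update, and delete against a collection.

    `collection` is assumed to be a pymongo.collection.Collection; the
    field names and values are hypothetical.
    """
    inserted = collection.insert_one({"name": "Ada", "role": "engineer"})
    found = collection.find_one({"_id": inserted.inserted_id})
    updated = collection.update_one(
        {"_id": inserted.inserted_id}, {"$set": {"role": "architect"}}
    )
    deleted = collection.delete_one({"_id": inserted.inserted_id})
    return {
        "found_name": found["name"] if found else None,
        "modified": updated.modified_count,
        "deleted": deleted.deleted_count,
    }


# Typical wiring, assuming PyMongo is installed and a server is listening
# on the default local port (database and collection names are made up):
#     from pymongo import MongoClient
#     client = MongoClient("mongodb://localhost:27017")
#     print(crud_roundtrip(client["demo"]["people"]))
```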

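10. What are the ACID properties in the context of MongoDB transactions?

MongoDB supports multi-document ACID transactions that can span multiple collections (on replica sets since version 4.0 and on sharded clusters since version 4.2); writes to a single document are always atomic. The properties are Atomicity (all-or-nothing), Consistency (data stays valid), Isolation (concurrent transactions don’t interfere), and Durability (changes persist after commit).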
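
As an illustration of atomicity, the sketch below moves an amount between two account documents inside one transaction. It assumes a PyMongo `MongoClient` connected to a replica set; the `bank` database, `accounts` collection, `balance` field, and both helper functions are hypothetical.

```python
def transfer_ops(from_id, to_id, amount):
    """Return the (filter, update) pairs that make up one transfer."""
    return [
        # Debit only if the balance covers the amount.
        ({"_id": from_id, "balance": {"$gte": amount}}, {"$inc": {"balance": -amount}}),
        # Credit the receiving account.
        ({"_id": to_id}, {"$inc": {"balance": amount}}),
    ]


def transfer(client, from_id, to_id, amount):
    """Apply both updates atomically in a multi-document transaction.

    `client` is assumed to be a pymongo.MongoClient connected to a
    replica set.
    """
    accounts = client["bank"]["accounts"]
    with client.start_session() as session:
        # Commits on normal exit, aborts if an exception escapes the block.
        with session.start_transaction():
            for flt, update in transfer_ops(from_id, to_id, amount):
                result = accounts.update_one(flt, update, session=session)
                if result.matched_count != 1:
                    raise ValueError("account missing or insufficient funds")
```

If the credit step fails, the exception aborts the transaction and the debit is rolled back, so no other client observes a half-finished transfer.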

11. What is the purpose of MongoDB’s explain() function?

explain() reveals how a query is executed, showing index usage, scan type, and execution stats. It helps identify bottlenecks and optimize performance.
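
One common use of the output is to check whether the winning plan falls back to a full collection scan (`COLLSCAN`) rather than an index scan (`IXSCAN`). The helper below walks a plan tree looking for stage names. The two sample documents are hand-written and heavily trimmed; real `explain()` output has many more fields and its layout varies between server versions.

```python
def plan_stages(plan):
    """Collect every "stage" name found in an explain() plan tree."""
    stages = []
    if isinstance(plan, dict):
        if "stage" in plan:
            stages.append(plan["stage"])
        for value in plan.values():
            stages.extend(plan_stages(value))
    elif isinstance(plan, list):
        for item in plan:
            stages.extend(plan_stages(item))
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan contains a full collection scan."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    return "COLLSCAN" in plan_stages(winning)


# Hand-written, trimmed examples of the shape explain() returns.
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "email_1"},
}}}
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(uses_collection_scan(indexed), uses_collection_scan(unindexed))

# Against a live server the input would come from PyMongo, for example:
#     explain_output = collection.find({"email": "a@example.com"}).explain()
```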

12. How does MongoDB handle schema validation?

MongoDB supports JSON Schema validation at the collection level, enforcing rules like required fields and value types to ensure data consistency.
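
A validator is an ordinary document passed when the collection is created. The sketch below assumes a hypothetical `users` collection with `name`, `email`, and `age` fields, and a PyMongo `Database` object handed to `create_validated_collection`.

```python
# A $jsonSchema validator for a hypothetical "users" collection.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
            # Accept both 32-bit and 64-bit integers.
            "age": {"bsonType": ["int", "long"], "minimum": 0},
        },
    }
}


def create_validated_collection(db, name="users"):
    """Create a collection whose writes are checked against the schema.

    `db` is assumed to be a pymongo.database.Database.
    """
    return db.create_collection(
        name,
        validator=user_validator,
        validationAction="error",  # reject invalid documents instead of warning
    )


required = user_validator["$jsonSchema"]["required"]
print(required)
```

With this validator in place, an insert that omits `email` or stores `age` as a string would be rejected by the server.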

13. What is the difference between a primary and a secondary node in a replica set?

The primary handles all writes and provides the latest data. Secondary nodes replicate from the primary and can handle reads if enabled. On failure, a secondary becomes primary.

14. What security mechanisms does MongoDB provide for data protection?

MongoDB offers authentication, role-based access control for authorization, encryption in transit (TLS/SSL) and at rest, auditing, and network restrictions such as IP access lists to protect data.

15. Explain the concept of embedded documents and when they should be used.

Embedded documents store related data in the same document. They’re efficient when related data is always accessed together, reducing the need for joins.
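
The two modeling options can be compared side by side as plain Python dictionaries. The order, customer, and item fields are invented for the example.

```python
# Embedded: the line items live inside the order document, so a single
# read returns everything needed to display the order.
order_embedded = {
    "_id": 1001,
    "customer": {"name": "Ada", "city": "London"},
    "items": [
        {"sku": "B-100", "qty": 1, "price": 30.0},
        {"sku": "L-200", "qty": 2, "price": 5.5},
    ],
}

# Referenced: the order only stores ids, so the related documents must be
# fetched separately (or joined with $lookup).
order_referenced = {"_id": 1001, "customer_id": 77, "item_ids": [501, 502]}


def order_total(order):
    """Compute the total from the embedded items, with no extra lookups."""
    return sum(item["qty"] * item["price"] for item in order["items"])


print(order_total(order_embedded))
```

Embedding fits here because an order's items are read together with the order. Data that is shared by many parents or that grows without bound is usually a better fit for references.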

16. What is the purpose of MongoDB’s $lookup stage in aggregation?

$lookup performs a left outer join with another collection, adding matched documents as arrays to the source documents.
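
The sketch below shows the stage document and a pure-Python imitation of its result shape. The `orders` and `customers` data are hypothetical, and `emulate_lookup` only covers the simple equality form on scalar fields; it is not a reimplementation of the server's behavior.

```python
# The $lookup stage itself, as it would appear in a pipeline.
lookup_stage = {
    "$lookup": {
        "from": "customers",          # collection to join with
        "localField": "customer_id",  # field in the source documents
        "foreignField": "_id",        # field in the "from" collection
        "as": "customer",             # name of the output array field
    }
}


def emulate_lookup(source, foreign, local_field, foreign_field, as_field):
    """Pure-Python imitation of $lookup's equality match on scalar fields."""
    joined = []
    for doc in source:
        matches = [f for f in foreign if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


orders = [{"_id": 1, "customer_id": 10}, {"_id": 2, "customer_id": 99}]
customers = [{"_id": 10, "name": "Ada"}]

joined_orders = emulate_lookup(orders, customers, "customer_id", "_id", "customer")
for row in joined_orders:
    print(row)
```

The second order has no matching customer, yet it still appears in the output with an empty `customer` array. That is the "left outer" part of the join.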

17. What are some common use cases for MongoDB?

Common use cases include content management, e-commerce, IoT, real-time analytics, geospatial applications, mobile app backends, and personalized recommendation systems.

18. What are the advantages of using MongoDB for horizontal scaling?

Sharding distributes data across multiple servers, handling large volumes of traffic and storage without overloading a single machine.

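19. How do MongoDB transactions differ from SQL transactions?

MongoDB added multi-document transactions in version 4.0, and they can span multiple collections; SQL databases have long supported multi-table ACID transactions. Because related data is often embedded in a single document, and single-document writes are atomic, many MongoDB operations do not need a transaction at all.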

20. What are the main differences between capped collections and regular collections?

Capped collections have fixed sizes and overwrite oldest documents when full; regular collections grow without size limits.
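
The overwrite behavior resembles a bounded queue. The sketch below lists the options a capped collection could be created with and uses a `deque` as a pure-Python analogy; the collection name and the numbers are arbitrary.

```python
from collections import deque

# Options a capped collection would be created with. `size` is the maximum
# size in bytes and `max` an optional cap on the number of documents.
capped_options = {"capped": True, "size": 1024 * 1024, "max": 3}

# With a PyMongo database handle `db`, creation would look like:
#     db.create_collection("recent_events", **capped_options)

# Pure-Python analogy: a bounded deque also discards its oldest entry
# when a new one arrives and it is already full.
recent_events = deque(maxlen=capped_options["max"])
for n in range(5):
    recent_events.append({"event": n})

print(list(recent_events))
```

After five appends only the last three events remain, in insertion order, which is also the order a capped collection returns documents in by default.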

21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

$match filters documents early in the pipeline, reducing the number processed in later stages, similar to SQL’s WHERE clause.

22. How can you secure access to a MongoDB database?

Enable authentication, use role-based access control, enforce TLS/SSL, restrict IP addresses, and enable auditing to monitor access.
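
On the client side, authentication and TLS are configured through connection options. The sketch below only assembles those options; the user name, password, certificate path, and host name are placeholders, and `secure_client_options` is a helper invented for the example.

```python
def secure_client_options(username, password, ca_file):
    """Keyword arguments for a MongoClient that authenticates over TLS."""
    return {
        "username": username,
        "password": password,
        "authSource": "admin",  # database that holds the user's credentials
        "tls": True,            # encrypt traffic between client and server
        "tlsCAFile": ca_file,   # CA certificate used to verify the server
    }


# Placeholder values; real credentials should come from the environment
# or a secrets manager, not from source code.
options = secure_client_options("app_user", "change-me", "/path/to/ca.pem")
print(sorted(options))

# With PyMongo installed (hypothetical host name):
#     from pymongo import MongoClient
#     client = MongoClient("mongodb://db.example.net:27017", **options)
```

Role assignment, IP restrictions, and auditing are configured on the server or in Atlas rather than in the client.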

23. What is MongoDB’s WiredTiger storage engine, and why is it important?

WiredTiger, the default storage engine, offers document-level concurrency, compression, and efficient memory usage, improving performance and durability.

In [2]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.14.0
