# **MongoDB Assignment**

**Theoretical Questions**

**1. What are the key differences between SQL and NoSQL databases?**

SQL databases are relational, use tables with fixed schemas, and scale vertically, making them ideal for structured data and complex queries.

NoSQL databases are non-relational, can be document, key-value, graph, or wide-column stores, have flexible schemas, and scale horizontally, which is better for unstructured data, large datasets, and high availability.

**2. What makes MongoDB a good choice for modern applications?**

MongoDB is a good choice for modern applications due to its flexible, document-based data model, which allows for faster development and easier handling of evolving data structures.

**3. Explain the concept of collections in MongoDB.**

A collection in MongoDB is a group of related documents stored together within a database. It is analogous to a table in a relational database, but by default a collection does not enforce a rigid schema, so documents in the same collection can have different fields and structures.
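As a minimal sketch of this flexibility, the two Python dictionaries below could be stored in the same collection even though their fields differ. All field names and values are invented for illustration, and the collection argument is assumed to be a PyMongo Collection object obtained elsewhere.

```python
# Two documents with different shapes that could share one collection.
# All field names and values here are hypothetical.
book = {"name": "MongoDB Basics", "price": 25.0, "author": "A. Writer"}
laptop = {"name": "UltraBook", "price": 899.0, "specs": {"ram_gb": 16, "cpu": "8-core"}}


def insert_products(collection, documents):
    """Insert the documents and return how many were written.

    `collection` is assumed to be a pymongo Collection; insert_many()
    returns a result whose `inserted_ids` lists the new _id values.
    """
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


# Fields that appear in only one of the two documents.
differing_fields = set(book) ^ set(laptop)
```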

**4. How does MongoDB ensure high availability using replication?**

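MongoDB ensures high availability through replication using replica sets. A replica set is a group of mongod instances that maintain the same dataset, with one instance designated as the primary and the others as secondary members. If the primary becomes unavailable, the remaining members hold an election and promote a secondary to primary, so the database stays available without manual intervention.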

**5. What are the main benefits of MongoDB Atlas?**

The main benefits of MongoDB Atlas include its fully managed and automated cloud database, which simplifies deployment and management, allowing developers to focus on application development rather than infrastructure.

**6. What is the role of indexes in MongoDB, and how do they improve performance?**

Indexes in MongoDB are data structures that improve query performance by allowing MongoDB to quickly locate matching documents without scanning an entire collection.

They do this by storing the values of one or more fields in sorted order, enabling faster data retrieval for queries that use those fields. While indexes speed up reads, they add overhead to writes, because every insert, delete, or update that touches an indexed field must also update the index.
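A minimal sketch of creating an index from Python follows. The field name email is a hypothetical example, and the collection argument is assumed to be a PyMongo Collection.

```python
# Index specification: a list of (field, direction) pairs; 1 means ascending.
# The field name "email" is a hypothetical example.
EMAIL_INDEX_KEYS = [("email", 1)]


def ensure_email_index(collection):
    """Create a unique ascending index on `email` and return its name.

    `collection` is assumed to be a pymongo Collection. Calling this again
    when an identical index already exists leaves the index unchanged.
    """
    return collection.create_index(EMAIL_INDEX_KEYS, unique=True)
```

Queries that filter or sort on email can then use the index instead of scanning every document, at the cost of extra work on each write.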

**7. Describe the stages of the MongoDB aggregation pipeline.**

The MongoDB aggregation pipeline processes data through a sequence of stages to filter, transform, and aggregate documents into desired results. Documents pass from one stage to the next, with each stage performing one operation: $match filters documents, $project reshapes them, $group groups them and computes aggregate values, and $sort orders them. This flow enables powerful, multi-step data manipulation within the database.
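In PyMongo a pipeline is written as a list of stage documents. The sketch below assumes a hypothetical sales collection with amount and region fields; it only builds the pipeline, which could then be passed to a collection's aggregate() method.

```python
def build_sales_pipeline(min_amount):
    """Return an aggregation pipeline as an ordered list of stages."""
    return [
        # 1. Keep only sales at or above the threshold.
        {"$match": {"amount": {"$gte": min_amount}}},
        # 2. Sum the amounts per region.
        {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
        # 3. Rename _id to region and keep the total.
        {"$project": {"_id": 0, "region": "$_id", "total": 1}},
        # 4. Largest totals first.
        {"$sort": {"total": -1}},
    ]


sales_pipeline = build_sales_pipeline(100)
# With a pymongo Collection: results = list(collection.aggregate(sales_pipeline))
stage_names = [next(iter(stage)) for stage in sales_pipeline]
```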

**8. What is sharding in MongoDB? How does it differ from replication?**

Sharding distributes a large dataset across multiple servers, while replication creates copies of data on different servers.

Sharding is used for horizontal scaling to handle big data and high throughput, whereas replication is used for high availability and fault tolerance by creating redundant copies of the data.
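The idea of splitting data by a shard key can be illustrated with a toy model in plain Python. This is not how MongoDB routes documents internally (it assigns ranges of shard-key values, called chunks, to shards); the hash-and-modulo scheme and the user_id field are assumptions made only to show that each document lands on exactly one shard.

```python
import hashlib


def pick_shard(shard_key_value, shard_count):
    """Map a shard-key value to a shard index with a stable hash (toy model)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(documents, shard_key, shard_count):
    """Split documents into `shard_count` buckets by their shard-key value."""
    shards = [[] for _ in range(shard_count)]
    for doc in documents:
        shards[pick_shard(doc[shard_key], shard_count)].append(doc)
    return shards


docs = [{"user_id": i, "name": f"user{i}"} for i in range(12)]
shards = distribute(docs, "user_id", 3)
```

Replication, by contrast, would place a full copy of all twelve documents on every member of a replica set.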

**9. What is PyMongo, and why is it used?**

PyMongo is the official MongoDB driver for Python, used to connect to and interact with MongoDB databases from Python applications. It lets Python developers work with the database using familiar Python types such as dictionaries and lists.
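A minimal sketch of this mapping is shown below. The database, collection, and field names are hypothetical, and the functions assume they are handed a PyMongo Collection; the commented lines show one typical way to obtain it.

```python
def build_user(name, languages):
    """Build a document as a plain dict; Python lists become BSON arrays."""
    return {"name": name, "languages": list(languages)}


def save_user(collection, user):
    """Insert one document and return the generated _id."""
    return collection.insert_one(user).inserted_id


def find_by_language(collection, language):
    """Return users whose `languages` array contains `language`."""
    return list(collection.find({"languages": language}))


user = build_user("Asha", ["python", "sql"])
# Typical wiring (requires the pymongo package and a running server):
#   from pymongo import MongoClient
#   users = MongoClient("mongodb://localhost:27017")["school"]["users"]
#   save_user(users, user)
```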

**10. What are the ACID properties in the context of MongoDB transactions?**

In MongoDB, the ACID properties—Atomicity, Consistency, Isolation, and Durability—ensure that transactions are reliable and data integrity is maintained. Atomicity treats a transaction as a single unit of work that is either all committed or all rolled back, while Consistency ensures the database remains in a valid state before, during, and after a transaction. Isolation prevents concurrent transactions from interfering with each other, and Durability guarantees that committed changes are permanent.
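Atomicity can be sketched with a money transfer that updates two documents in one transaction. The bank database, accounts collection, and balance field are hypothetical, and the client is assumed to be a PyMongo MongoClient connected to a replica set, which transactions require.

```python
def debit(amount):
    """Update document that subtracts `amount` from the balance."""
    return {"$inc": {"balance": -amount}}


def credit(amount):
    """Update document that adds `amount` to the balance."""
    return {"$inc": {"balance": amount}}


def transfer(client, from_id, to_id, amount):
    """Move `amount` between two accounts inside one transaction."""
    accounts = client["bank"]["accounts"]  # hypothetical names

    def apply_transfer(session):
        # Both updates carry the session, so they commit or abort together.
        accounts.update_one({"_id": from_id}, debit(amount), session=session)
        accounts.update_one({"_id": to_id}, credit(amount), session=session)

    with client.start_session() as session:
        # Commits if the callback returns, aborts if it raises.
        session.with_transaction(apply_transfer)
```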

**11. What is the purpose of MongoDB’s explain() function?**

The explain() function in MongoDB provides detailed information on how MongoDB plans and executes a given query or aggregation pipeline. Its primary purpose is to help developers and database administrators understand, diagnose, and optimize query performance.
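One common use is checking whether a query falls back to a full collection scan. The sketch below walks the winning plan in an explain() result and looks for a COLLSCAN stage. The sample dictionary is hand-written in the classic output shape, not captured from a server, and some server versions nest the plan differently, so the key paths should be treated as an assumption.

```python
def plan_stages(plan):
    """Collect stage names from a plan tree, outermost stage first."""
    stages = [plan["stage"]]
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.insert(0, plan["inputStage"])
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan contains a COLLSCAN stage."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)


# Hand-written sample, not real server output.
sample_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "email_1"},
        }
    }
}
# With pymongo: explain_output = collection.find({"email": "a@b.c"}).explain()
```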

**12. How does MongoDB handle schema validation?**

MongoDB's schema validation allows users to enforce rules on the structure and content of documents within a collection, despite its flexible schema model. This helps maintain data consistency and prevent unintended schema changes.
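As a sketch, validation rules can be expressed with the $jsonSchema operator and attached when a collection is created. The users collection and its fields are hypothetical, and db is assumed to be a PyMongo Database.

```python
# Validation rules for a hypothetical "users" collection.
USER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}


def create_validated_collection(db):
    """Create the collection with the validator attached.

    `db` is assumed to be a pymongo Database. Inserts that violate the
    schema are then rejected by the server.
    """
    return db.create_collection("users", validator=USER_VALIDATOR)
```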

**13. What is the difference between a primary and a secondary node in a replica set?**

A primary node in a replica set handles all write operations, while secondary nodes replicate the data from the primary and can be used to serve read operations. The primary is the single source of truth for writes, and if it fails, an election takes place to choose a new primary from the secondaries.
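From an application's point of view, the choice between primary and secondary reads is usually made in the connection string. The sketch below builds one with the standard replicaSet and readPreference options; the host names and the replica set name rs0 are placeholders.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a connection string that lists several replica set members."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


# Placeholder hosts; writes still go to the primary, reads prefer secondaries.
rs_uri = replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017"],
    "rs0",
    "secondaryPreferred",
)
```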

**14. What security mechanisms does MongoDB provide for data protection?**

MongoDB provides security through a combination of built-in mechanisms like authentication, role-based access control, and encryption for data in transit, at rest, and in use. Other security features include auditing, network isolation, and customizable guardrails to protect sensitive data throughout its lifecycle.

**15. Explain the concept of embedded documents and when they should be used.**

Embedded documents store related data within a single document, creating a hierarchical structure ideal for data that is frequently accessed together, like a customer's address or a blog post with its comments.

We should use embedding when data is tightly coupled, read performance is a priority, and the embedded data doesn't frequently change or exceed the document size limit of 16 MB in MongoDB.
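The difference between embedding and referencing can be sketched with plain Python dictionaries. All names and values below are invented for illustration.

```python
# Embedded: the address lives inside the customer document.
customer_embedded = {
    "_id": 1,
    "name": "Ravi",
    "address": {"city": "Pune", "zip": "411001"},
}

# Referenced: the order points at the customer by _id instead.
customer = {"_id": 1, "name": "Ravi"}
order = {"_id": 501, "customer_id": 1, "total": 49.5}

# Embedded data is read from the same document, with no second lookup.
city = customer_embedded["address"]["city"]
# On the server, the matching filter uses dot notation: {"address.city": "Pune"}
```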

**16. What is the purpose of MongoDB’s $lookup stage in aggregation?**

The $lookup stage in MongoDB's aggregation pipeline performs a left outer join with another collection in the same database (before MongoDB 5.1, the joined collection had to be unsharded). Its primary purpose is to combine data from two collections, integrating related information into a single result set for further processing within the aggregation pipeline.
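A minimal sketch of such a join is shown below, assuming hypothetical orders and customers collections linked by a customer_id field. The pipeline would be run against the orders collection.

```python
def orders_with_customer_pipeline():
    """Join each order to its customer document."""
    return [
        {
            "$lookup": {
                "from": "customers",          # collection to join with
                "localField": "customer_id",  # field on the orders side
                "foreignField": "_id",        # field on the customers side
                "as": "customer",             # output array field
            }
        },
        # $lookup outputs an array; unwind it while keeping orders that
        # matched no customer, which preserves left outer join behaviour.
        {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}},
    ]


lookup_pipeline = orders_with_customer_pipeline()
```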

**17. What are some common use cases for MongoDB?**

MongoDB stores data in flexible, JSON-like documents instead of tables, which lets it manage and query large amounts of structured, semi-structured, and unstructured data. Common use cases include content management systems, product catalogs, user profiles, real-time analytics, Internet of Things (IoT) data, and backends for web and mobile applications.

**18. What are the advantages of using MongoDB for horizontal scaling?**

The advantages of MongoDB's horizontal scaling include improved performance and speed by distributing load across multiple servers, enhanced availability and fault tolerance through replica sets that ensure data redundancy, and cost-efficiency by allowing the use of cheaper commodity hardware to handle large datasets.

**19. How do MongoDB transactions differ from SQL transactions?**

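MongoDB and SQL transactions both aim to ensure data consistency, but they differ in scope and implementation. SQL transactions, inherent to relational databases, typically span multiple tables and operations and enforce ACID properties across a predefined schema. MongoDB, a NoSQL document database, historically guaranteed atomicity only for operations on a single document. Multi-document ACID transactions were added in version 4.0 for replica sets and in version 4.2 for sharded clusters, but because related data is often embedded in one document, many workloads never need them.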

**20. What are the main differences between capped collections and regular collections?**

The main differences are that capped collections have a fixed size, don't allow documents to be deleted or grow in size, and overwrite the oldest documents when full, whereas regular collections allow for flexible growth, document deletion, and updates that change a document's size.
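The sketch below shows the options typically passed when creating a capped collection, followed by a plain-Python analogy for the overwrite behaviour. The collection name and limits are hypothetical, db is assumed to be a PyMongo Database, and the deque is only an analogy, not MongoDB's implementation.

```python
from collections import deque

# size is the maximum size in bytes; max optionally limits the document count.
CAPPED_OPTIONS = {"capped": True, "size": 1024 * 1024, "max": 1000}


def create_log_collection(db, name="app_logs"):
    """Create a capped collection; `db` is assumed to be a pymongo Database."""
    return db.create_collection(name, **CAPPED_OPTIONS)


# Analogy: a bounded queue drops its oldest entry when a new one arrives.
recent = deque(maxlen=3)
for seq in range(5):
    recent.append({"seq": seq})
kept = [doc["seq"] for doc in recent]
```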

**21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?**

The purpose of the $match stage in MongoDB's aggregation pipeline is to filter documents, passing only those that meet the specified criteria to the next stage. This is functionally similar to the find() method but operates within the aggregation framework and is crucial for performance, as it reduces the number of documents that later, more complex stages need to process.
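As a short sketch, the same filter document can be used with find() and as a $match stage. The status, age, and city fields are hypothetical.

```python
# A filter in the same syntax that find() accepts.
ACTIVE_ADULTS = {"status": "active", "age": {"$gte": 18}}


def count_by_city_pipeline(filter_doc):
    """Filter first, then group, so $group sees fewer documents."""
    return [
        {"$match": filter_doc},
        {"$group": {"_id": "$city", "count": {"$sum": 1}}},
    ]


match_pipeline = count_by_city_pipeline(ACTIVE_ADULTS)
# With pymongo: collection.find(ACTIVE_ADULTS) returns the same documents
# that pass the $match stage of collection.aggregate(match_pipeline).
```

Placing $match first also gives the server the chance to use an index on the filtered fields.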

**22. How can you secure access to a MongoDB database?**

Securing access to a MongoDB database involves implementing a multi-layered approach. Key steps include enabling authentication with strong, unique passwords for each user and application, and utilizing role-based access control (RBAC) to grant granular permissions based on the principle of least privilege. Network security is also crucial, involving restricting MongoDB to listen only on necessary network interfaces (e.g., localhost or specific internal IPs) and configuring firewalls to limit inbound connections to trusted sources.
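Two of these steps can be sketched in Python: building an authenticated connection string and describing a least-privilege user. The user name, password, host, and database below are placeholders. The second function only builds the document for the createUser database command, which an administrator could pass to a PyMongo Database's command() method.

```python
from urllib.parse import quote_plus


def authenticated_uri(username, password, host, auth_db="admin"):
    """Build a TLS-enabled connection string with escaped credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)  # characters such as @ and / must be escaped
    return f"mongodb://{user}:{pwd}@{host}/?authSource={auth_db}&tls=true"


def read_only_user_command(username, password, database):
    """Document for the createUser command, granting only the read role."""
    return {
        "createUser": username,
        "pwd": password,
        "roles": [{"role": "read", "db": database}],
    }


secure_uri = authenticated_uri("app", "p@ss/word", "db.example.net:27017")
```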

**23. What is MongoDB’s WiredTiger storage engine, and why is it important?**

MongoDB's WiredTiger storage engine is the default, high-performance system that manages how data is stored on disk and in memory.

Its importance stems from features like document-level concurrency, which allows for more efficient handling of multiple writes to the same collection, and data compression, which reduces storage space and can improve performance.