# 1. Introduction from DBMS to Cloud DBMS
### 1. Introduction to Database Management Systems (DBMS)

A **Database Management System (DBMS)** is software that interacts with end-users, applications, and the database itself to capture and analyze data. It is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications.

### 2. Importance of DBMS

- **Data Abstraction and Independence**: DBMS abstracts the physical data to logical data structures and provides data independence. This means changes in data storage structures or strategies do not affect the application programs.
- **Efficient Data Access**: DBMS uses a variety of sophisticated techniques to store and retrieve data efficiently.
- **Data Integrity and Security**: DBMS provides integrity constraints to ensure data accuracy and consistency, and it also provides security mechanisms to protect the data from unauthorized access.
- **Data Administration**: DBMS provides tools for database administration, which includes performance monitoring and tuning.
- **Concurrent Access and Crash Recovery**: DBMS allows multiple users to access the database concurrently without compromising the integrity of the data, and it provides mechanisms for recovering the database in case of a crash.

### 3. Database Models

DBMSs are categorized based on the data model they support. Here are the major types:

#### 3.1 Hierarchical Model

In the hierarchical model, data is organized into a tree-like structure where each record has a single parent and possibly many children. It is represented by parent-child relationships.

#### 3.2 Network Model

The network model allows more complex relationships between entities. Unlike the hierarchical model, a record can have multiple parents. It uses graph structures with nodes and edges.

#### 3.3 Relational Model

The relational model organizes data into tables (relations), where each table consists of rows and columns. Each row is a record, and each column is an attribute of the record.
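As a minimal sketch of the row/column idea, the snippet below builds a small relation with Python's built-in `sqlite3` module. The `student` table and its contents are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the `student` table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO student (id, name, major) VALUES (?, ?, ?)",
    [(1, "Asha", "Physics"), (2, "Ben", "History")],
)

# Each row is one record; each column is one attribute of that record.
cursor = conn.execute("SELECT id, name, major FROM student ORDER BY id")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

Here `columns` holds the attribute names and `rows` holds one tuple per record.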

#### 3.4 Object-oriented Model

In this model, data is stored in the form of objects, similar to object-oriented programming. Each object contains both data and methods that operate on the data.

#### 3.5 Entity-Relationship Model

The ER model uses entities and relationships to represent data. Entities are objects that exist independently, and relationships represent the associations between these objects.

### 4. Components of DBMS

A DBMS consists of several key components that work together to manage data effectively:

#### 4.1 Hardware

The physical devices like computers, storage devices, and network devices.

#### 4.2 Software

This includes the DBMS software itself, operating system, network software, and application programs.

#### 4.3 Data

The actual data stored in the database. This includes user data, metadata, and indexes.

#### 4.4 Procedures

Instructions and rules that govern the design and use of the database.

#### 4.5 Database Access Language

The language used to access and manipulate the database. SQL (Structured Query Language) is the most common database access language.

#### 4.6 Users

There are different types of users who interact with the DBMS:

- **Database Administrators (DBA)**: They are responsible for managing the entire database system.
- **Database Designers**: They design the database structure.
- **End Users**: They are the people who query and update the data.
- **Application Programmers**: They develop applications that interact with the database.

### 5. Database Architecture

DBMS architecture can be categorized as follows:

#### 5.1 One-tier Architecture

The database is directly available to the user. Any changes in the database are done directly.

#### 5.2 Two-tier Architecture

This architecture involves a client and a server. The application at the client end communicates with the database at the server end.

#### 5.3 Three-tier Architecture

In this architecture, an additional layer (often called an application server) is added between the client and the server to manage business logic and database access.

![](https://cdn.educba.com/academy/wp-content/uploads/2021/04/DBMS-3-tier-Architecture.jpg)

### 6. Database Schema

A schema is a blueprint or architecture of a database. It defines how data is organized and how relationships among data are associated.

#### 6.1 Physical Schema

Describes how data is physically stored, including storage structures, file organization, and indexes. This is the lowest (least abstract) level.

#### 6.2 Logical Schema

Describes the logical structure of the data, such as tables, columns, relationships, and integrity constraints, independent of how the data is physically stored. In a relational database this is the tabular structure of the data.

#### 6.3 View Schema

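Describes the part of the database that a particular user or application sees, such as views and the permissions that apply to them. This is the highest level of abstraction, and one database can have many view schemas.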

![](https://afteracademy.com/images/what-is-a-schema-three-levels-of-schema-84a896db453efdac.jpg)

### 7. Normalization

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.

#### 7.1 Normal Forms

- **First Normal Form (1NF)**: Ensures that the values in each column of a table are atomic.
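- **Second Normal Form (2NF)**: The table is in 1NF and every non-key attribute is fully functionally dependent on the whole primary key, not on just part of it.
- **Third Normal Form (3NF)**: The table is in 2NF and no non-key attribute depends transitively on the key through another non-key attribute.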
- **Boyce-Codd Normal Form (BCNF)**: A stricter version of 3NF where every determinant is a candidate key.
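The following sketch illustrates one normalization step in plain Python, under the assumption of a hypothetical employee table in which `dept_name` depends on `dept_id` rather than directly on `emp_id` (a transitive dependency). It is an illustration of the idea, not a general normalization algorithm.

```python
# Hypothetical unnormalized rows: dept_name depends on dept_id, not on emp_id,
# so it is repeated for every employee in the same department.
employees_flat = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10, "dept_name": "Research"},
    {"emp_id": 2, "name": "Ben", "dept_id": 10, "dept_name": "Research"},
    {"emp_id": 3, "name": "Chen", "dept_id": 20, "dept_name": "Sales"},
]

# Decompose into two relations so each department name is stored once.
employees = [
    {"emp_id": r["emp_id"], "name": r["name"], "dept_id": r["dept_id"]}
    for r in employees_flat
]
departments = {r["dept_id"]: r["dept_name"] for r in employees_flat}

# Joining the two relations reproduces the original rows, so nothing was lost.
rejoined = [dict(e, dept_name=departments[e["dept_id"]]) for e in employees]
print(departments)
```

After the split, renaming a department means changing one entry in `departments` instead of every matching employee row.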

### 8. Database Keys

Keys are attributes or a set of attributes that help in identifying a row (or record) in a table uniquely.

#### 8.1 Primary Key

A primary key is a field (or combination of fields) that uniquely identifies each record in a table.

#### 8.2 Foreign Key

A foreign key is a field (or combination of fields) in one table that references the primary key (or another unique key) of another table. It establishes a relationship between the two tables, and its values need not be unique within the referencing table.
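A short sketch of how a foreign key constrains data, using Python's `sqlite3` module. The `department` and `employee` tables are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " dept_id INTEGER REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES (10, 'Research')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # valid reference

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ben', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The second insert is refused because no `department` row has `dept_id` 99.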

#### 8.3 Candidate Key

A candidate key is a field (or combination of fields) that can uniquely identify each record in a table. Each table can have multiple candidate keys, out of which one is selected as the primary key.

#### 8.4 Composite Key

A composite key is a key composed of two or more fields that together uniquely identify a record. It is most often encountered as a multi-column primary key.

#### 8.5 Super Key

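A super key is a set of one or more columns (attributes) that uniquely identifies a row in a table. It may contain extra columns that are not needed for uniqueness; a candidate key is a super key with no such redundant columns.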
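The relationship between super keys and candidate keys can be sketched in plain Python. The helper names `is_superkey` and `candidate_keys` and the sample rows are hypothetical. One caveat: a real key is a rule declared in the schema, so uniqueness in a data sample only suggests a key and does not prove one.

```python
from itertools import combinations

def is_superkey(rows, columns):
    """Return True if the given columns take distinct values in every row."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows, all_columns):
    """Return the minimal superkeys for this sample, smallest first."""
    found = []
    for size in range(1, len(all_columns) + 1):
        for combo in combinations(all_columns, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_superkey(rows, combo):
                found.append(combo)
    return found

rows = [
    {"emp_id": 1, "email": "a@example.com", "dept": "Research"},
    {"emp_id": 2, "email": "b@example.com", "dept": "Research"},
    {"emp_id": 3, "email": "c@example.com", "dept": "Sales"},
]
keys = candidate_keys(rows, ["emp_id", "email", "dept"])
print(keys)
```

In this sample `("emp_id", "dept")` is a super key but not a candidate key, because `emp_id` alone is already unique.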

### 9. Transactions in DBMS

A transaction is a sequence of operations performed as a single logical unit of work.

#### 9.1 ACID Properties

Transactions have the following ACID properties to ensure database reliability:

- **Atomicity**: Ensures that either all operations within the transaction are completed or none are; if any operation fails, the transaction is aborted and its partial changes are rolled back.
- **Consistency**: Ensures that the database remains consistent before and after the transaction.
- **Isolation**: Ensures that transactions are isolated from each other until they are completed.
- **Durability**: Ensures that the results of a transaction are permanently recorded in the database even if there is a system failure.
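Atomicity and consistency can be illustrated with a small transfer between two accounts. This is a sketch using Python's `sqlite3` module; the `account` table, the `transfer` function, and the `CHECK` rule are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 has already been applied when the debit violates the `CHECK` constraint. The rollback undoes that credit, so the balances stay as they were after the first transfer.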

#### 9.2 Concurrency Control

Concurrency control ensures that multiple transactions can occur concurrently without leading to inconsistencies. Methods include locking, timestamp ordering, and multi-version concurrency control.

#### 9.3 Deadlocks

A deadlock occurs when two or more transactions are waiting for each other to release resources, causing them to be stuck indefinitely. Techniques to handle deadlocks include timeout, deadlock detection, and deadlock prevention.
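Deadlock detection is commonly described in terms of a wait-for graph, where an edge from one transaction to another means the first is waiting for a resource the second holds; a cycle in that graph is a deadlock. The sketch below finds such a cycle with a depth-first search. The function name `find_deadlock` and the dictionary representation are assumptions for illustration and do not describe how any particular DBMS implements detection.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle in the wait-for graph, or None."""
    visiting, done = set(), set()

    def visit(txn, path):
        visiting.add(txn)
        path.append(txn)
        for blocker in wait_for.get(txn, ()):
            if blocker in visiting:
                # We reached a transaction already on the current path: a cycle.
                return path[path.index(blocker):]
            if blocker not in done:
                cycle = visit(blocker, path)
                if cycle:
                    return cycle
        visiting.discard(txn)
        done.add(txn)
        path.pop()
        return None

    for txn in wait_for:
        if txn not in done:
            cycle = visit(txn, [])
            if cycle:
                return cycle
    return None

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a deadlock.
deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})
# T1 waits for T2, which waits for nothing: no deadlock.
clear = find_deadlock({"T1": ["T2"], "T2": []})
print(deadlocked, clear)
```

Once a cycle is found, a system would typically abort one transaction in it (the "victim") so the others can proceed.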

### 10. SQL: Structured Query Language

SQL is the standard language for interacting with relational databases. It includes the following components:

#### 10.1 Data Definition Language (DDL)

Commands to define and manage database structures:

- **CREATE**: Creates a new table or database.
- **ALTER**: Modifies an existing table structure.
- **DROP**: Deletes a table or database.

#### 10.2 Data Manipulation Language (DML)

Commands to manipulate data within tables:

- **SELECT**: Retrieves data from one or more tables. (Some texts classify SELECT separately as Data Query Language, DQL.)
- **INSERT**: Adds new data to a table.
- **UPDATE**: Modifies existing data in a table.
- **DELETE**: Removes data from a table.
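The DDL and DML commands listed above can be tried out with Python's `sqlite3` module. This is a small sketch; the `product` table is hypothetical, and exact SQL syntax varies somewhat between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change structure (the `product` table is hypothetical).
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE product ADD COLUMN price REAL")

# DML: add, change, remove, and read rows.
conn.executemany(
    "INSERT INTO product (id, name, price) VALUES (?, ?, ?)",
    [(1, "pen", 1.5), (2, "notebook", 4.0), (3, "stapler", 9.0)],
)
conn.execute("UPDATE product SET price = 2.0 WHERE id = 1")
conn.execute("DELETE FROM product WHERE id = 3")
remaining = conn.execute("SELECT id, name, price FROM product ORDER BY id").fetchall()
print(remaining)
```

`DROP TABLE product` would remove the table itself along with its rows.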

#### 10.3 Data Control Language (DCL)

Commands to control access to data within the database:

- **GRANT**: Gives a user permission to perform specific tasks.
- **REVOKE**: Removes user permissions.

#### 10.4 Transaction Control Language (TCL)

Commands to manage transactions:

- **COMMIT**: Saves the changes made by the current transaction.
- **ROLLBACK**: Reverts the changes made by the current transaction.
- **SAVEPOINT**: Sets a point within a transaction to which you can roll back.
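A brief sketch of the three TCL commands working together, again with `sqlite3`. The `log` table and the savepoint name `before_step_2` are hypothetical. Passing `isolation_level=None` is a detail of the Python module: it stops the module from opening transactions implicitly, so the explicit statements below control the transaction.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN/COMMIT statements below are the only transaction control in play.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('step 1')")
conn.execute("SAVEPOINT before_step_2")
conn.execute("INSERT INTO log VALUES ('step 2')")
conn.execute("ROLLBACK TO SAVEPOINT before_step_2")  # undoes only step 2
conn.execute("COMMIT")                               # keeps step 1

messages = [row[0] for row in conn.execute("SELECT msg FROM log")]
print(messages)
```

A plain `ROLLBACK` in place of `ROLLBACK TO SAVEPOINT` would have discarded step 1 as well.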

### 11. Indexing

Indexes improve the speed of data retrieval operations by providing quick access paths to data. Common types of indexes include:

- **Primary Index**: Built on the primary key.
- **Secondary Index**: Built on non-primary key attributes.
- **Clustered Index**: Determines the physical order of data in a table.
- **Non-clustered Index**: Does not alter the physical order of data.
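To see an index change the access path, the sketch below asks SQLite for its query plan before and after creating a secondary index. The `customer` table, the index name `idx_customer_email`, and the helper `plan` are hypothetical. The exact wording of the plan text differs between SQLite versions, so the code only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", f"city{i % 10}") for i in range(1000)],
)

def plan(conn, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT id FROM customer WHERE email = 'user42@example.com'"
before = plan(conn, query)  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_customer_email ON customer (email)")
after = plan(conn, query)   # expected to mention idx_customer_email

uses_index_before = any("idx_customer_email" in detail for detail in before)
uses_index_after = any("idx_customer_email" in detail for detail in after)
print(before, after)
```

The trade-off is that every index must be updated on each insert, update, or delete, so indexes speed up reads at some cost to writes.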

### 12. Views

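A view is a virtual table that provides a specific representation of the data in one or more tables. It is defined by a SQL query. An ordinary view does not store data physically; its query runs each time the view is used. (Materialized views, where supported, store the query result.)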
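The sketch below defines a view over a hypothetical `orders` table and shows that it reflects later changes to the base table. The table, the view name `customer_totals`, and the data are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("Asha", 20.0), ("Ben", 5.0), ("Asha", 15.0)],
)

# The view stores only its defining query, not a copy of the rows.
conn.execute(
    "CREATE VIEW customer_totals AS "
    "SELECT customer, SUM(total) AS spent FROM orders GROUP BY customer"
)
first = conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall()

# New base-table rows show up the next time the view is queried.
conn.execute("INSERT INTO orders (customer, total) VALUES ('Ben', 10.0)")
second = conn.execute("SELECT * FROM customer_totals ORDER BY customer").fetchall()
print(first, second)
```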

### 13. Database Security

Database security involves protecting the database against unauthorized access, corruption, or theft.

#### 13.1 Authentication

Verifies the identity of a user or application (for example, with a password, certificate, or token) before it is allowed to connect to the database.

#### 13.2 Authorization

Defines what actions users can perform on the database objects.

#### 13.3 Encryption

Protects data by converting it into a format that is unreadable without a decryption key.

#### 13.4 Auditing

Tracks database access and changes to ensure compliance with policies and regulations.

### 14. Backup and Recovery

Backup and recovery procedures ensure that data can be restored in case of data loss or corruption.

#### 14.1 Types of Backups

- **Full Backup**: A complete copy of the database.
- **Incremental Backup**: Copies only the data that has changed since the last backup.
- **Differential Backup**: Copies all the data that has changed since the last full backup.
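The difference between incremental and differential backups is easiest to see on a small timeline. The sketch below is a toy model, not a backup tool: the change log, the page names, and the function `changed_since` are all hypothetical.

```python
def changed_since(change_log, since_day):
    """Return the sorted set of pages modified after `since_day`."""
    return sorted({page for day, page in change_log if day > since_day})

# Hypothetical change log of (day, page modified). A full backup is taken at the
# end of day 0, followed by one backup every night.
change_log = [(1, "A"), (2, "B"), (3, "A"), (3, "C")]

# Night 3 incremental: changes since the previous night's backup (day 2).
incremental_day3 = changed_since(change_log, since_day=2)
# Night 3 differential: changes since the last full backup (day 0).
differential_day3 = changed_since(change_log, since_day=0)
print(incremental_day3, differential_day3)
```

Restoring from incrementals needs the full backup plus every incremental in order, while restoring from a differential needs only the full backup plus the latest differential.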

#### 14.2 Backup Modes and Recovery Techniques

- **Cold Backup**: Taken while the database is offline.
- **Hot Backup**: Taken while the database is running.
- **Point-in-Time Recovery**: Restores the database to a specific point in time.
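As a small illustration of a hot (online) full backup, Python's `sqlite3` module provides `Connection.backup`, which copies a database while the source connection stays open. The `note` table is hypothetical, and in-memory databases stand in for real files here.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE note (body TEXT)")
source.execute("INSERT INTO note VALUES ('before backup')")
source.commit()

# Copy the live database into a second connection without closing the source.
target = sqlite3.connect(":memory:")
source.backup(target)

# Changes made after the backup are not in the copy.
source.execute("INSERT INTO note VALUES ('after backup')")
source.commit()

source_count = source.execute("SELECT COUNT(*) FROM note").fetchone()[0]
backup_count = target.execute("SELECT COUNT(*) FROM note").fetchone()[0]
print(source_count, backup_count)
```

The copy reflects the database as it was when the backup ran. Recovering to a later point in time would additionally require a log of the changes made since then.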

### 15. Distributed Databases

A distributed database is a database that is spread across multiple locations, either on the same network or on different networks.

#### 15.1 Advantages

- **Scalability**: Easily scale the database by adding more nodes.
- **Availability**: Data is available even if some nodes are down.
- **Performance**: Improved performance due to data localization.

#### 15.2 Challenges

- **Complexity**: More complex to manage and maintain.
- **Consistency**: Ensuring data consistency across multiple nodes is challenging.
- **Latency**: Data retrieval may be slower due to network delays.

### 16. NoSQL Databases

NoSQL databases are non-relational databases designed for specific data models and have flexible schemas.

#### 16.1 Types of NoSQL Databases

- **Document-based**: Stores data in JSON or BSON format (e.g., MongoDB).
- **Key-value**: Stores data as key-value pairs (e.g., Redis).
- **Column-family**: Stores each row as a set of columns grouped into column families, and rows in the same table need not share the same columns (e.g., Cassandra).
- **Graph-based**: Stores data in graph structures with nodes, edges, and properties (e.g., Neo4j).

#### 16.2 Advantages

- **Scalability**: Easily scale horizontally.
- **Flexibility**: Schema-less design allows for dynamic changes.
- **Performance**: Optimized for read/write performance.

### 17. Data Warehousing

A data warehouse is a centralized repository for storing large volumes of structured and semi-structured data from multiple sources. It is optimized for query and analysis.

#### 17.1 ETL Process

- **Extract**: Extract data from various sources.
- **Transform**: Transform data into a suitable format.
- **Load**: Load transformed data into the data warehouse.
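The three ETL steps can be sketched end to end with the Python standard library. Everything here is a stand-in: an in-memory CSV plays the source system, an in-memory SQLite database plays the warehouse, and the `sales` table and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: read raw records (an in-memory CSV stands in for a real source).
raw = io.StringIO("name,amount\n asha ,10.50\nBEN,3\n")
extracted = list(csv.DictReader(raw))

# Transform: normalize names and convert amounts to integer cents.
transformed = [
    (row["name"].strip().title(), round(float(row["amount"]) * 100))
    for row in extracted
]

# Load: write the cleaned rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount_cents INTEGER)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", transformed)
loaded = warehouse.execute(
    "SELECT customer, amount_cents FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Real pipelines add validation, error handling, and scheduling around the same three stages.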

#### 17.2 Benefits

- **Data Integration**: Integrates data from multiple sources.
- **Historical Analysis**: Provides historical data for analysis.
- **Improved Decision Making**: Supports business intelligence and analytics.

### 18. Big Data

Big data refers to datasets that are too large, fast-changing, or varied to be handled efficiently by a traditional single-server DBMS. Working with it typically involves distributed computing and storage systems.

#### 18.1 Characteristics

- **Volume**: Large amounts of data.
- **Velocity**: High-speed data generation and processing.
- **Variety**: Diverse data types and sources.
- **Veracity**: Uncertainty and inconsistency in data.

#### 18.2 Technologies

- **Hadoop**: An open-source framework for distributed storage and processing.
- **Spark**: An open-source distributed computing system for big data processing.
- **NoSQL Databases**: Used for storing and retrieving big data.

### 19. Cloud Databases

Cloud databases are databases that run on cloud computing platforms and provide various database services.

#### 19.1 Advantages

- **Scalability**: Easily scale up or down based on demand.
- **Cost-Effective**: Pay for what you use.
- **Accessibility**: Access data from anywhere.

#### 19.2 Types of Cloud Databases

- **Relational**: Cloud-based relational databases (e.g., Amazon RDS, Google Cloud SQL).
- **NoSQL**: Cloud-based NoSQL databases (e.g., Amazon DynamoDB, Google Cloud Datastore).
- **Data Warehouse**: Cloud-based data warehouses (e.g., Amazon Redshift, Google BigQuery).

### 20. Future Trends in DBMS

#### 20.1 Artificial Intelligence and Machine Learning

AI and ML are increasingly being integrated into DBMSs for improved data management, query optimization, and predictive analytics.

#### 20.2 Blockchain

Blockchain technology is being explored for secure, transparent, and tamper-evident databases, in which past records cannot be altered without the change being detectable.

#### 20.3 Multi-Model Databases

Multi-model databases support multiple data models (relational, document, graph, etc.) within a single database engine.

#### 20.4 Automation

Automation in database management, such as self-healing databases and automated tuning, is becoming more prevalent.

#### 20.5 Edge Computing

Edge computing involves processing data closer to where it is generated. This trend is driving the need for lightweight and distributed DBMS solutions.

### Conclusion

Database Management Systems play a crucial role in the efficient storage, retrieval, and management of data across applications and industries. A solid understanding of DBMS concepts covers architecture, components, data models, normalization, SQL, transactions, security, and emerging trends. As technology advances, DBMSs continue to evolve, incorporating new features and addressing the growing demands of data-driven environments.

This is Chapter 01, so it covers only the introductory material.

#### Prepared By,
Ahamed Basith