# Database Systems
## Lecture 01 - Introduction to Databases and Database Management Systems

---
<a name="top"></a>
* [Welcome to Database Systems Course](#welcome)
* [Literature](#literature)
* [Brief History of Database Development](#history)
* [Definition of Database (DB)](#definition)
* [Database Management Systems (DBMS)](#dbms)
* [DBMS Facilities](#dbms_fac)
* [Components of the DBMS Environment](#components)
* [Advantages and Disadvantages of DBMSs](#adv_disadv)
* [Roles of Database Management Systems (DBMS)](#roles)
* [Three-Level ANSI-SPARC Architecture](#3level)
* [Three-Tier Client-Server](#3tier)
* [Distributed DBMS](#distrib)

---
<a name="welcome"></a>
## [Welcome to Database Systems Course](#top)

We are thrilled to have you join us on this journey into the fascinating world of databases!

In today's interconnected digital age, data is the lifeblood of organizations across all industries. Understanding how to efficiently store, manage, and retrieve data is crucial for success in the field of computer science and beyond. This course will equip you with the fundamental knowledge and skills needed to navigate the complexities of database systems.

Throughout this course, we will explore various topics, including relational database concepts, SQL query language, database design, normalization, transaction management, and much more. Whether you're a beginner or have some prior experience with databases, this course is designed to accommodate learners of all levels.

Our aim is not only to impart theoretical knowledge but also to provide hands-on experience through practical exercises, case studies, and real-world examples. By the end of this course, you will have gained a solid foundation in database systems and be well-prepared to tackle real-world database challenges with confidence.


---
<a name="literature"></a>
## [Literature](#top)
* <a name="lit1"></a>[1] Thomas M. Connolly, Carolyn E. Begg - Database Systems: A Practical Approach to Design, Implementation, and Management (6th Edition). 2014.
* <a name="lit2"></a>[2] Ramez Elmasri, Shamkant B. Navathe - Fundamentals of Database Systems (7th Edition). 2017.


---
<a name="history"></a>
## [Brief History of Database Development](#top)

`Information Technology (IT)` refers to the use of computers, software, networks, and other technologies to **store**, **retrieve**, **process**, **transmit** and **present** data and information. The terms `information` and `data` are related but distinct concepts. Here is how they differ:

***Data:***
   - Data refers to raw facts, figures, or symbols that represent information. It is typically unprocessed and lacks context.
   - Examples of data include numbers, text, images, and sounds.
   - Data by itself does not convey meaning or understanding until it is processed and interpreted.

***Information:***
   - Information is processed and organized data that provides context, relevance, and meaning.
   - It results from analyzing, interpreting, and structuring data to make it useful for decision-making or understanding.
   - Information answers questions like who, what, where, and when, providing insights or answers to specific queries.
   - For example, a list of student grades (data) becomes information when organized into a report showing the average grade for each class.

In summary, data is raw and unprocessed, whereas information is data that has been processed and given meaning.
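
The student grades example above can be sketched in a few lines of Python. This is only an illustration: the class names, student names, grades, and the helper `average_per_class` are all made up for this sketch.

```python
from collections import defaultdict

# Raw data: (class, student, grade) facts without any interpretation.
# All names and values are hypothetical.
grades = [
    ("DB-101", "Alice", 5.0),
    ("DB-101", "Bob", 3.0),
    ("PY-102", "Alice", 4.0),
    ("PY-102", "Carol", 4.5),
]


def average_per_class(records):
    """Turn raw grade records into a per-class average (information)."""
    by_class = defaultdict(list)
    for class_name, _student, grade in records:
        by_class[class_name].append(grade)
    return {name: sum(values) / len(values) for name, values in by_class.items()}


# The report answers a specific question: how did each class do on average?
report = average_per_class(grades)
for class_name, avg in sorted(report.items()):
    print(f"{class_name}: average grade {avg:.2f}")
```

The list `grades` on its own is data; the dictionary `report` is information, because it has been organized to answer a question.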

`Databases` deal with the storage and retrieval of data, while the `business logic` of an application processes that data, which is then presented by the `user interface` (UI). We will take a closer look at typical application architectures with databases later on.

### 1. File-based Systems (Pre-1960s):
   - In the early days of computing, data was typically stored in flat files using file-based systems. (Flat files are still a viable solution for some tasks; see the *Learn more* section.)
   - Each application maintained its own set of files (separation and isolation of data), leading to redundancy (additional storage space), data inconsistency (loss of integrity) and incompatible file formats.
   - Programs directly interacted with files using low-level file operations (fixed queries).

<div class="learnmore"><b>Learn more:</b> <br><a href="https://en.wikipedia.org/wiki/Comma-separated_values">Comma-separated values (CSV)</a><br>
<a href="https://en.wikipedia.org/wiki/JSON">JavaScript Object Notation (JSON)</a>
</div>
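
The inconsistency problem of file-based systems can be illustrated with a small sketch. It assumes two hypothetical applications (sales and billing) that each keep a private CSV copy of the same customer; `io.StringIO` stands in for files on disk, and the helper names are invented for this example.

```python
import csv
import io

FIELDS = ["customer_id", "name", "city"]

# Each application owns its own copy of the customer data (redundancy).
sales_file = io.StringIO("customer_id,name,city\n1,Ann Smith,Gdansk\n")
billing_file = io.StringIO("customer_id,name,city\n1,Ann Smith,Gdansk\n")


def read_rows(f):
    """Read all records from a CSV 'file' as dictionaries."""
    f.seek(0)
    return list(csv.DictReader(f))


def write_rows(f, rows):
    """Overwrite the CSV 'file' with the given records."""
    f.seek(0)
    f.truncate()
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# The sales application records that the customer moved ...
rows = read_rows(sales_file)
rows[0]["city"] = "Warsaw"
write_rows(sales_file, rows)

# ... but nothing propagates the change to the billing application's copy.
sales_city = read_rows(sales_file)[0]["city"]
billing_city = read_rows(billing_file)[0]["city"]
print(sales_city, billing_city, sales_city == billing_city)
```

The two copies now disagree about the same customer, which is the loss of integrity described above.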

### 2. Hierarchical and Network Databases (1960s-1970s):
   - Hierarchical databases, such as IBM's IMS (Information Management System), organized data in a tree-like structure.
   - Network databases, like CODASYL's network model, introduced more flexible relationships between data elements.
   - These models offered better organization and retrieval compared to flat files but were complex to implement and manage.

### 3. Relational Databases (1970s-Present):
   - Relational databases, pioneered by Edgar F. Codd, introduced the relational model based on mathematical set theory.
   - Data is organized into tables (relations) with rows (tuples) and columns (attributes).
   - SQL (Structured Query Language) became the standard language for interacting with relational databases.
   - Relational databases, such as Oracle, MySQL, and SQL Server, gained widespread adoption due to their simplicity and flexibility.
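
As a minimal sketch of these ideas, the snippet below uses SQLite through Python's standard `sqlite3` module (chosen here only because it needs no installation). The table `student` and its contents are hypothetical.

```python
import sqlite3

# An in-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# A table (relation) with three columns (attributes).
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, year INTEGER)"
)

# Each inserted row is one tuple of the relation.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Alice", 1), (2, "Bob", 2), (3, "Carol", 2)],
)

# SQL states *what* we want, not how to scan the underlying storage.
second_year = conn.execute(
    "SELECT name FROM student WHERE year = 2 ORDER BY name"
).fetchall()
print(second_year)

conn.close()
```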

### 4. Object-Oriented Databases (1980s-2000s):
   - Object-oriented databases store data as objects and support complex data types and inheritance; object-relational systems instead extended the relational model with such features.
   - They aimed to bridge the gap between programming languages and databases by directly supporting object-oriented programming concepts.
   - Examples include ObjectStore and GemStone, but they faced challenges in scalability and interoperability with relational systems.

### 5. NoSQL Databases (2000s-Present):
   - NoSQL (Not Only SQL) databases emerged to address the limitations of relational databases in handling large-scale distributed data.
   - These databases offer flexible schemas, horizontal scalability, and high availability.
   - Types of NoSQL databases include document stores (e.g., MongoDB), key-value stores (e.g., Redis), column-family stores (e.g., Apache Cassandra), and graph databases (e.g., Neo4j).

### 6. New Paradigms: Graph and Vector-based Databases (2010s-Present):
   - Graph databases, such as Neo4j, focus on representing and querying relationships between data entities.
   - Vector-based databases, like those used in machine learning and analytics, optimize storage and processing for high-dimensional data.
   - These databases cater to specialized use cases, such as recommendation systems, spatial data analysis, and deep learning.

The evolution of databases has been driven by the need for more efficient data storage, retrieval, and manipulation, as well as advancements in computing technologies and data science.

---
<a name="definition"></a>
## [Definition of Database (DB)](#top)

A `database` is a collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning [[2]](#lit2).

A `database` is a shared collection of logically related data, and a description of this data, designed to meet the information needs of an organization [[1]](#lit1).

- *shared collection*: a single, possibly large, repository of data with minimal redundancy instead of separate files. It can be accessed by many departments and users simultaneously using a variety of tools.
- *logically related data*: means that the data stored in a database is not just a random collection of information, but rather it is structured and related in a way that supports efficient storage, retrieval, management, and analysis. This organization is based on the logical structure of the data rather than its physical storage.
- *description of this data*: The database contains not only the operational data of the organization, but also a description of that data. The description of the data is known as the `system catalog` (or `data dictionary` or `metadata` - the "data about data"). 
- *to meet the information needs*: provides timely, accurate, and relevant information to support the operations, decision-making processes, and strategic goals of an organization. Essentially, a database is designed to serve as a foundational component of an organization's information system, enabling the organization to achieve its objectives by providing the right information to the right people at the right time.
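
To make the "description of this data" concrete, here is a small sketch using SQLite, which exposes its catalog through the `sqlite_master` table and the `PRAGMA table_info` command. Other DBMSs organize their catalogs differently; the `employee` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational data: one table with one row.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO employee VALUES (1, 'Ann')")

# Metadata: SQLite records every schema object in sqlite_master.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(catalog)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(employee)").fetchall()
column_info = [(row[1], row[2]) for row in columns]
print(column_info)

conn.close()
```

The database answers questions about its own structure in the same way it answers questions about the stored data.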

## OLAP vs OLTP

---
<a name="dbms"></a>
## [Database Management Systems (DBMS)](#top)

A `database management system` is a collection of programs that enables users to create and maintain a database. The DBMS is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various users and applications [[2]](#lit2).

A `database management system` is a software system that enables users to define, create, maintain, and control access to the database [[1]](#lit1).

A `database management system` is software that controls the storage, organization, and retrieval of data (Oracle).

---
<a name="dbms_fac"></a>
## [DBMS Facilities](#top)

Typically, a DBMS provides the following facilities:

- It allows users to define the database, usually through a `Data Definition Language` (DDL). The DDL allows users to specify the data types and structures and the constraints on the data to be stored in the database.
- It allows users to insert, update, delete, and retrieve data from the database, usually through a `Data Manipulation Language` (DML). Having a central repository for all data and data descriptions allows the DML to provide a general inquiry facility to this data, called a query language.

- It provides controlled access to the database. For example, it may provide:
    - security system, which prevents unauthorized users from accessing the database
    - integrity system, which maintains the consistency of stored data
    - concurrency control system, which allows shared access of the database
    - recovery control system, which restores the database to a previous consistent state following a hardware or software failure
    - user-accessible catalog, which contains descriptions of the data in the database
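
The facilities listed above can be sketched with SQLite via Python's `sqlite3` module. The `account` table and its `CHECK` constraint are hypothetical, and SQLite is used only as a convenient stand-in for a DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define structure, data types, and a constraint on the stored data.
conn.execute(
    """
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        owner      TEXT NOT NULL,
        balance    REAL NOT NULL CHECK (balance >= 0)
    )
    """
)

# DML: insert, update, and retrieve data.
conn.execute("INSERT INTO account VALUES (1, 'Ann', 100.0)")
conn.execute("UPDATE account SET balance = balance + 50 WHERE account_id = 1")
balance = conn.execute(
    "SELECT balance FROM account WHERE account_id = 1"
).fetchone()[0]
print(balance)

# Integrity system: the DBMS itself rejects data that violates the constraint.
try:
    conn.execute("UPDATE account SET balance = -10 WHERE account_id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("negative balance rejected:", rejected)

conn.close()
```

Note that the constraint is declared once in the DDL and then enforced for every application that uses the database, rather than being re-implemented in each program.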

---
<a name="components"></a>

## [Components of the DBMS Environment](#top)
- Hardware
- Software
- Data
- Procedures
    - log on to the DBMS
    - use a particular DBMS facility or application program
    - start and stop the DBMS
    - make backup copies of the database
    - handle hardware or software failures
- People (Users)

---
<a name="adv_disadv"></a>
## [Advantages and Disadvantages of DBMSs](#top)
Advantages of DBMS:
- Control of data redundancy
- Data consistency
- More information from the same amount of data
- Sharing of data
- Improved data integrity
- Improved security
- Enforcement of standards
- Economy of scale
- Balance of conflicting requirements
- Improved data accessibility and responsiveness
- Increased productivity
- Improved maintenance through data independence
- Increased concurrency
- Improved backup and recovery services

Disadvantages of DBMS:
- Complexity
- Size
- Cost of DBMSs
- Additional hardware costs
- Cost of conversion
- Performance
- Higher impact of a failure

---
<a name="roles"></a>
## [Roles of Database Management Systems (DBMS)](#top)

A Database Management System (DBMS) plays several crucial roles in the management and organization of data within a computer system. Here are the key roles:

1. **Data Definition:**
   - The DBMS allows users to define the data structure, create tables, and specify the relationships between tables. This role involves defining the schema that represents the organization of data.

2. **Data Storage Management:**
   - The DBMS is responsible for efficiently storing and managing the data on physical storage devices. It determines how data is stored, indexed, and retrieved for optimal performance.

3. **Data Retrieval:**
   - The DBMS provides a query language (e.g., SQL) to retrieve data from the database. Users can perform queries to extract specific information based on their requirements.

4. **Data Manipulation:**
   - The DBMS allows users to insert, update, and delete data in the database. It ensures the integrity of the data by enforcing constraints and rules defined during the data definition phase.

5. **Security and Authorization:**
   - The DBMS manages access control and ensures that only authorized users have access to specific data. It includes features such as user authentication, encryption, and role-based access control.

6. **Concurrency Control:**
   - In a multi-user environment, the DBMS manages concurrent access to the database to ensure data consistency. It employs techniques like locking and transaction management to handle multiple transactions simultaneously.

7. **Data Integrity and Constraints:**
   - The DBMS enforces data integrity through constraints (e.g., primary key, foreign key) that ensure the accuracy and reliability of the stored data.

8. **Backup and Recovery:**
   - The DBMS provides mechanisms for backing up the database regularly and recovering data in case of system failures or data corruption. This ensures the availability and durability of data.

9. **Database Maintenance and Optimization:**
   - The DBMS includes tools and utilities for database maintenance, such as indexing, query optimization, and performance monitoring, to enhance the overall efficiency of the system.

10. **Data Independence:**
    - The DBMS provides a level of abstraction between the application programs and the physical storage of data. This enables changes to the database structure without affecting the application programs.

These roles collectively contribute to the effective and organized management of data within a database system.
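
Transaction management, mentioned under concurrency control and recovery, can be sketched as follows. The example assumes a hypothetical `account` table and a helper `transfer`; it relies on the documented behavior of `sqlite3` connections used as context managers, which commit on success and roll back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "account_id INTEGER PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 20.0)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30.0)       # enough funds: both updates are kept
failed = transfer(conn, 1, 2, 500.0)  # overdraft: the credit is undone as well

balances = conn.execute(
    "SELECT account_id, balance FROM account ORDER BY account_id"
).fetchall()
print(ok, failed, balances)

conn.close()
```

In the failed transfer the credit to the target account had already been executed, yet it does not survive, because the whole transaction is rolled back to the previous consistent state.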


---

<a name="3level"></a>
## [Three-Level ANSI-SPARC Architecture](#top)

<center><img height="400" width="400" align="center" src="attachment:37f2e77a-d9b0-41a3-a013-2ec62f739790.png" /></center>

* External level – the users’ view of the database. This level describes that part of the database that is relevant to each user
* Conceptual level – the community view of the database. This level describes what data is stored in the database and the relationships among the data
* Internal level – the physical representation of the database on the computer. This level describes how the data is stored in the database


<center><img height="400" width="400" align="center" src="attachment:06ee41e4-925a-426b-877d-faab370996ee.png" /></center>

The overall description of the database is called the `database schema`. A major objective of the three-level architecture is to provide `data independence`, which means that upper levels are unaffected by changes to lower levels.

A `data model` is an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organization.

A model is a representation of "real world" objects and events, and their associations. It is an abstraction that concentrates on the essential, inherent aspects of an organization and ignores the accidental properties.
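
The three levels and data independence can be approximated in SQLite, with the caveat that this is a loose analogy rather than a full ANSI-SPARC implementation: a table plays the role of the conceptual level, a view plays an external level, and an index is a change at the internal level. The `staff` table and all names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: what data is stored and how it is related.
conn.execute(
    "CREATE TABLE staff ("
    "staff_id INTEGER PRIMARY KEY, name TEXT, branch TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [(1, "Ann", "B001", 3000.0), (2, "Bob", "B002", 2500.0)],
)

# External level: one user's view, which hides the salary column.
conn.execute(
    "CREATE VIEW staff_directory AS SELECT staff_id, name, branch FROM staff"
)
view_columns = [
    d[0] for d in conn.execute("SELECT * FROM staff_directory").description
]

query = "SELECT name, branch FROM staff_directory ORDER BY staff_id"
before = conn.execute(query).fetchall()

# Internal level: change how the data is physically accessed.
conn.execute("CREATE INDEX idx_staff_branch ON staff (branch)")

# The external view and the query against it are unaffected.
after = conn.execute(query).fetchall()
print(view_columns, before == after)

conn.close()
```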




---

<a name="3tier"></a>
## [Three-Tier Client-Server](#top)

<center><img height="400" width="400" align="center" src="attachment:45789fad-f2d0-4437-84d6-5d9784aeca06.png" /></center>
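
As a rough sketch of the idea, the three tiers can be mimicked inside a single Python script: a data tier that stores and retrieves data, an application tier holding the business logic, and a presentation tier that formats the result for the user. In a real deployment each tier typically runs as a separate process or machine; the `product` table, the tax rate, and all function names are assumptions made for this example.

```python
import sqlite3


# --- Data tier: storage and retrieval only ---
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)", [("pen", 2.0), ("notebook", 5.0)]
    )
    return conn


def fetch_products(conn):
    return conn.execute("SELECT name, price FROM product ORDER BY name").fetchall()


# --- Application tier: business logic, no SQL and no formatting ---
def price_list_with_tax(conn, tax_rate):
    return [
        (name, round(price * (1 + tax_rate), 2))
        for name, price in fetch_products(conn)
    ]


# --- Presentation tier: turns results into something a user can read ---
def render(rows):
    return "\n".join(f"{name:<10}{price:>8.2f}" for name, price in rows)


conn = open_database()
rows = price_list_with_tax(conn, tax_rate=0.25)
print(render(rows))
conn.close()
```

Keeping the tiers separate means, for instance, that the presentation could be replaced by a web page without touching the business logic or the database.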


---

<a name="distrib"></a>
## [Distributed DBMS](#top)

* A distributed database is a logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
* A distributed DBMS is the software system that permits the management of the distributed database and makes the distribution transparent to users.
* A DDBMS consists of a single logical database split into a number of `fragments`. 
* Each fragment is stored on one or more computers (`replicas`) under the control of a separate DBMS, with the computers connected by a network. 
* Each site is capable of independently processing user requests that require access to local data (that is, each site has some degree of local autonomy) and is also capable of processing data stored on other computers in the network.
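
Fragmentation and local autonomy can be imitated with two independent in-memory SQLite databases standing in for two sites. This is a toy model under strong assumptions: there is no network, no replication, and no distributed transaction handling, and the site names, the `staff` table, and the helper functions are invented for the sketch.

```python
import sqlite3


def make_site(rows):
    """Create one site with its own local DBMS and its fragment of staff."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)
    return conn


# Horizontal fragmentation: each site stores the rows for its own city.
sites = {
    "london": make_site([(1, "Ann", "London"), (2, "Bob", "London")]),
    "glasgow": make_site([(3, "Carol", "Glasgow")]),
}


def local_query(site):
    """A request that needs only local data is answered by one site."""
    return sites[site].execute(
        "SELECT staff_id, name FROM staff ORDER BY staff_id"
    ).fetchall()


def global_query():
    """A global request is answered by combining all fragments."""
    rows = []
    for conn in sites.values():
        rows.extend(conn.execute("SELECT staff_id, name FROM staff").fetchall())
    return sorted(rows)


glasgow_staff = local_query("glasgow")
all_staff = global_query()
print(glasgow_staff)
print(all_staff)
```

A real distributed DBMS hides the equivalent of `global_query` from users, so that they see a single logical `staff` table regardless of where the fragments are stored.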

---
# Summary

In this lecture you have met the most basic definitions and building blocks of databases. These include in particular:

* [Brief History of Database Development](#history)
* [Definition of Database (DB)](#definition)
* [Database Management Systems (DBMS)](#dbms)
* [DBMS Facilities](#dbms_fac)
* [Components of the DBMS Environment](#components)
* [Advantages and Disadvantages of DBMSs](#adv_disadv)
* [Roles of Database Management Systems (DBMS)](#roles)
* [Three-Level ANSI-SPARC Architecture](#3level)
* [Three-Tier Client-Server](#3tier)
* [Distributed DBMS](#distrib)

The [next lecture](next_lecture_filename.ipynb) will introduce you to the world of data structures in Python.