# History of Database Systems

- 1950s to early 1960s
    - transistor-based programmable digital computers
    - punched cards and magnetic tapes
- late 1960s to 1970s
    - hard disks allowed more direct access to data
    - network and hierarchical data models in widespread use
    - Ted Codd defines the relational data model
        - won the ACM Turing Award in 1981
        - IBM -> System R prototype
        - UCB -> Ingres prototype
    - high-performance transaction processing
- 1980s
    - research relational prototypes evolve into commercial systems
    - SQL becomes industry standard
    - first parallel and distributed database systems
    - object-oriented database systems
- 1990s
    - large multi-terabyte data warehouses
- early 2000s
    - XML and XQuery standards
    - automated database administration
- late 2000s
    - unstructured and semi-structured data
    - scalable NoSQL systems (Bigtable, Dynamo)

# Drawbacks of File Systems

- file systems lead to data **redundancy and inconsistency**
    - multiple file formats, duplication of information in different files
- file systems **cannot answer queries directly**
    - write a new program to carry out each new task
    - one data set may be scattered across multiple files
- file systems do **not enforce integrity**
    - integrity constraints become buried in program code rather than being stated explicitly
    - adding new constraints or changing existing ones is cumbersome
- updates in a file system are **not always atomic**
    - failures may leave data in an inconsistent state with partial updates carried out
- file systems offer **limited support for concurrent access** by multiple users
    - concurrent access needed for performance
    - uncontrolled concurrent accesses can lead to inconsistencies
- file systems do not provide sufficient **security**
    - difficult to provide access to a specific subset of the data

A database management system (DBMS) offers solutions to the problems above.
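
As a rough illustration of two of the drawbacks listed above (integrity and atomicity), the sketch below uses Python's built-in `sqlite3` module. The `account` table, its `CHECK` constraint, and the `transfer` helper are hypothetical names chosen for this example; they are not part of the original notes.

```python
import sqlite3

# Hypothetical schema: an in-memory database with one table of account balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "  id INTEGER PRIMARY KEY,"
    "  balance INTEGER NOT NULL CHECK (balance >= 0)"  # constraint stated explicitly
    ")"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the second update; the first was rolled back.
        return False


ok = transfer(conn, 1, 2, 30)        # valid transfer
failed = transfer(conn, 1, 2, 500)   # would overdraw account 1
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

The second transfer credits account 2 before the debit fails, yet no partial update survives: the rollback restores the state left by the first transfer. With plain files, the application itself would have to detect and undo the half-finished write.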


# Levels of Abstraction in a Database

![Snip20190909_13.png](img/Snip20190909_13.png)

- **physical level**: describes how a record (e.g., customer) is stored
    - e.g., customers are stored in ascending order by ID, and there is a secondary index on the name attribute
- **logical level**: describes the structure of the data stored in a database, and the relationships among the data
    - e.g., an instructor has an ID and name, and belongs to some department
- **view level**: describes a virtual structure of the data imposed by the database designer on top of the logical level
    - e.g., a virtual table defined as the result of querying a base table or joining two base tables
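
The logical and view levels can be sketched with `sqlite3` as follows. The `instructor` table mirrors the instructor example above; the `salary` column, the sample rows, and the view name `instructor_public` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: a base table describing instructors.
    CREATE TABLE instructor (
        id INTEGER PRIMARY KEY,
        name TEXT,
        dept_name TEXT,
        salary INTEGER
    );
    INSERT INTO instructor VALUES (1, 'Ada', 'CS', 90000), (2, 'Emmy', 'Math', 85000);

    -- View level: a virtual table that exposes everything except salary.
    CREATE VIEW instructor_public AS
        SELECT id, name, dept_name FROM instructor;
""")

cursor = conn.execute("SELECT * FROM instructor_public ORDER BY id")
cols = [d[0] for d in cursor.description]  # column names visible through the view
rows = cursor.fetchall()
print(cols, rows)
```

The view stores no data of its own; each query against it is answered from the base table, so users of `instructor_public` never see the `salary` column.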

# Physical vs Logical Schemas

- **Schema**: the shape or structure of the database
    - **physical schema**: design at physical level (storage, file formats, indexes)
    - **logical schema**: design at logical level (data types, relations, rows, columns)
- **Physical Data Independence**: the ability to modify the physical schema without changing the logical schema or the application program
    - e.g., a table that is accessed frequently can be moved to a faster disk without breaking any of the queries
    - e.g., an index can be added to speed up a specific query without breaking that query or other queries
- **Logical Data Independence**: the ability to modify the logical schema without changing the application program
    - e.g., a new table can be added to the logical schema without breaking any of the queries defined over existing tables or views
    - e.g., a column can be added to a table without breaking any of the queries that access the table or any view defined over the table
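
Both kinds of independence can be demonstrated in a small sketch, again using `sqlite3`. The `customer` table, the index name `idx_customer_name`, and the `email` column are hypothetical. The same query is run before and after each schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer VALUES (1, 'Bob'), (2, 'Alice');
""")

# An application query that names its columns explicitly.
QUERY = "SELECT id, name FROM customer WHERE name = 'Alice'"
before = conn.execute(QUERY).fetchall()

# Physical change: add a secondary index on name. The query text is unchanged.
conn.execute("CREATE INDEX idx_customer_name ON customer(name)")
after_index = conn.execute(QUERY).fetchall()

# Logical change: add a column. The query still returns the same rows.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
after_column = conn.execute(QUERY).fetchall()

print(before == after_index == after_column)
```

One caveat: the query here lists its columns by name. A query written as `SELECT *` would still run after the column is added, but its result would gain an extra field, which may matter to code that depends on the row shape.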

# Structure of a Modern Database

![Snip20190909_14.png](img/Snip20190909_14.png)