# Database : 

- A database is an organized collection of data stored and accessed electronically.
- Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage.
- The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance.


# DBMS : 

- A database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data.
- The DBMS software additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system.
- Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.



- Computer scientists may classify database management systems according to the database models that they support.
- Relational databases became dominant in the 1980s. 
- These model data as rows and columns in a series of tables, and the vast majority use SQL for writing and querying data. 
- In the 2000s, non-relational databases became popular, collectively referred to as NoSQL, because they use different query languages.





- A "database" refers to a set of related data and the way it is organized.
- Access to this data is usually provided by a "database management system" (DBMS) consisting of an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). 
- The DBMS provides various functions that allow entry, storage and retrieval of large quantities of information and provides ways to manage how that information is organized.


- Because of the close relationship between them, the term "database" is often used casually to refer to both a database and the DBMS used to manipulate it.


- Outside the world of professional information technology, the term database is often used to refer to any collection of related data (such as a spreadsheet or a card index). In professional use, size and usage requirements typically necessitate a database management system.


- Existing DBMSs provide various functions that allow management of a database and its data which can be classified into four main functional groups:

1. Data definition – Creation, modification and removal of definitions that define the organization of the data.

2. Update – Insertion, modification, and deletion of the actual data.

3. Retrieval – Providing information in a form directly usable or for further processing by other applications. The retrieved data may be made available in a form basically the same as it is stored in the database or in a new form obtained by altering or combining existing data from the database.

4. Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure.

- Examples of DBMSs include MySQL, PostgreSQL, Microsoft SQL Server, Oracle Database, and Microsoft Access.

# Database languages : 

- Database languages are special-purpose languages, which allow one or more of the following tasks, sometimes distinguished as sublanguages:


    - Data control language (DCL) – controls access to data
    - Data definition language (DDL) – defines data structures, for example by creating, altering, or dropping tables and the relationships among them
    - Data manipulation language (DML) – performs tasks such as inserting, updating, or deleting data occurrences
    - Data query language (DQL) – allows searching for information and computing derived information

- Database languages are specific to a particular data model.
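As a rough illustration of how these sublanguages look in practice, the sketch below drives SQL from Python's built-in `sqlite3` module against an in-memory database. The `employee` table and its columns are hypothetical. SQLite has no user accounts, so the DCL statement appears only as a comment showing what it might look like in a server DBMS.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the data.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# DML: insert, update, and delete rows.
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Ada", 5000.0), (2, "Grace", 6000.0), (3, "Edgar", 5500.0)],
)
conn.execute("UPDATE employee SET salary = 6500.0 WHERE id = 2")
conn.execute("DELETE FROM employee WHERE id = 3")

# DQL: search the data.
rows = conn.execute("SELECT name, salary FROM employee ORDER BY id").fetchall()
print(rows)

# DCL is not supported by SQLite; in a server DBMS it might look like:
#   GRANT SELECT ON employee TO some_user;
conn.close()
```

After the update and delete, the query returns only the two remaining employees, with the changed salary for the second one.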

- SQL combines the roles of data definition, data manipulation, and query in a single language. It was one of the first commercial languages for the relational model, although it departs in some respects from the relational model as described by Codd (for example, the rows and columns of a table can be ordered). 
- SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987. 
- The standards have been regularly enhanced since and are supported (with varying degrees of conformance) by all mainstream commercial relational DBMSs.

- OQL is an object model language standard (from the Object Data Management Group). It has influenced the design of some of the newer query languages like JDOQL and EJB QL.

- XQuery is a standard XML query language implemented by XML database systems such as MarkLogic and eXist, by relational databases with XML capability such as Oracle and Db2, and also by in-memory XML processors such as Saxon.

- SQL/XML combines XQuery with SQL.

- A database language may also incorporate features like:

    - DBMS-specific configuration and storage engine management
    - Computations to modify query results, like counting, summing, averaging, sorting, grouping, and cross-referencing
    - Constraint enforcement (e.g. in an automotive database, only allowing one engine type per car)
    - Application programming interface version of the query language, for programmer convenience
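The computation and constraint features listed above can be sketched with Python's built-in `sqlite3` module. The `car_engine` table, its columns, and the allowed engine types are assumptions made up for this example; making `car_id` the primary key is one simple way to allow only one engine type per car.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical automotive table. Because car_id is the primary key, each car
# can have only one row, and therefore only one engine type.
conn.execute(
    """
    CREATE TABLE car_engine (
        car_id      INTEGER PRIMARY KEY,
        engine_type TEXT NOT NULL
                    CHECK (engine_type IN ('petrol', 'diesel', 'electric')),
        power_kw    INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO car_engine VALUES (?, ?, ?)",
    [(1, "petrol", 90), (2, "diesel", 110), (3, "petrol", 130), (4, "electric", 150)],
)

# Counting, averaging, grouping, and sorting in a single query.
summary = conn.execute(
    """
    SELECT engine_type, COUNT(*) AS cars, AVG(power_kw) AS avg_power
    FROM car_engine
    GROUP BY engine_type
    ORDER BY engine_type
    """
).fetchall()
print(summary)

# Constraint enforcement: a second engine row for car 1 is rejected.
try:
    conn.execute("INSERT INTO car_engine VALUES (1, 'diesel', 100)")
    second_engine_rejected = False
except sqlite3.IntegrityError:
    second_engine_rejected = True
print(second_engine_rejected)
conn.close()
```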

# Models : 

- A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized, and manipulated.
- The most popular example of a database model is the relational model (or the SQL approximation of relational), which uses a table-based format.

- Common logical data models for databases include:

    - Navigational databases
        - Hierarchical database model
        - Network model
        - Graph database
    - Relational model
    - Entity–relationship model
        - Enhanced entity–relationship model
    - Object model
    - Document model
    - Entity–attribute–value model
    - Star schema

An object–relational database combines the relational model and the object model.

- Physical data models include:

    - Inverted index
    - Flat file

- Other models include:

    - Multidimensional model
    - Array model
    - Multivalue model

- Specialized models are optimized for particular types of data:

    - XML database
    - Semantic model
    - Content store
    - Event store
    - Time series model

# Entity - Relationship Model : 

-  An entity–relationship model (or ER model) describes interrelated things of interest in a specific domain of knowledge.
- A basic ER model is composed of entity types (which classify the things of interest) and specifies relationships that can exist between entities (instances of those entity types).

- An ER model is commonly formed to represent things a business needs to remember in order to perform business processes.
- Consequently, the ER model becomes an abstract data model that defines a data or information structure which can be implemented in a database, typically a relational database.
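One possible way to implement a small ER model in a relational database is sketched below with Python's built-in `sqlite3` module. The entity types `student` and `course` and the relationship `enrolls_in` are hypothetical. Each entity type becomes a table, and the many-to-many relationship becomes a linking table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    -- Entity types become tables.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- The many-to-many relationship becomes a linking table.
    CREATE TABLE enrolls_in (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    -- Entities (instances of the entity types) and their relationships.
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrolls_in VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Follow the relationship from students to the courses they take.
enrolled = conn.execute(
    """
    SELECT s.name, c.title
    FROM enrolls_in AS e
    JOIN student AS s ON s.student_id = e.student_id
    JOIN course  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
    """
).fetchall()
print(enrolled)
conn.close()
```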




# Star schema : 

- The star schema is the simplest style of data mart schema and is the approach most widely used to develop data warehouses and dimensional data marts.

- The star schema consists of `one or more fact tables` referencing `any number of dimension tables`.
- The star schema is an important special case of the snowflake schema, and is more effective for handling simpler queries.

- The star schema gets its name from the physical model's resemblance to a star shape with a fact table at its center and the dimension tables surrounding it representing the star's points.

## Fact tables : 

- Fact tables record measurements or metrics for a specific event. 
- Fact tables generally consist of numeric values, and foreign keys to dimensional data where descriptive information is kept.
- Fact tables are designed to a low level of uniform detail (referred to as "granularity" or "grain"), meaning facts can record events at a very atomic level.
- This can result in the accumulation of a large number of records in a fact table over time.
- Fact tables are defined as one of three types:

    - Transaction fact tables record facts about a specific event (e.g., sales events)
    - Snapshot fact tables record facts at a given point in time (e.g., account details at month end)
    - Accumulating snapshot tables record aggregate facts at a given point in time (e.g., total month-to-date sales for a product)
- Fact tables are generally assigned a surrogate key to ensure each row can be uniquely identified. This key is a simple primary key.

## Dimension tables : 

- Dimension tables usually have a relatively small number of records compared to fact tables, but each record may have a very large number of attributes to describe the fact data.
- Dimensions can define a wide variety of characteristics, but some of the most common attributes defined by dimension tables include:

    - Time dimension tables describe time at the lowest level of time granularity for which events are recorded in the star schema
    - Geography dimension tables describe location data, such as country, state, or city
    - Product dimension tables describe products
    - Employee dimension tables describe employees, such as sales people
    - Range dimension tables describe ranges of time, dollar values or other measurable quantities to simplify reporting
- Dimension tables are generally assigned a surrogate primary key, usually a single-column integer data type, mapped to the combination of dimension attributes that form the natural key.
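A minimal star schema might look like the sketch below, written with Python's built-in `sqlite3` module. The tables `fact_sales`, `dim_date`, and `dim_product`, along with all sample rows, are assumptions for illustration. The fact table holds numeric measures and foreign keys, and a typical query joins it to its dimensions and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: surrogate integer keys plus descriptive attributes.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Transaction fact table: one row per sale, measures plus foreign keys.
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        quantity    INTEGER NOT NULL,
        amount      REAL NOT NULL
    );

    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO fact_sales VALUES
        (1, 20240101, 1, 2, 20.0),
        (2, 20240101, 2, 1, 15.0),
        (3, 20240201, 1, 5, 50.0);
    """
)

# Join the fact table to its dimensions and total the sales amount.
sales_by_category = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
print(sales_by_category)
conn.close()
```

Each dimension is reached from the fact table with a single join, which is what gives the schema its star shape.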

### Primary Key : 

- In the relational model of databases, a primary key is a specific choice of a minimal set of attributes (columns) that `uniquely specify a tuple (row) in a relation (table)`.

- Informally, a primary key answers the question "which attributes identify a record?" In simple cases it is a single attribute: a unique ID.
- More formally, a primary key is a choice of candidate key (a minimal superkey); any other candidate key is an alternate key.


- A primary key may consist of real-world observables, in which case it is called a natural key, while an attribute created to function as a key and not used for identification outside the database is called a surrogate key. For example, for a database of people (of a given nationality), time and location of birth could be a natural key.
- National identification number is another example of an attribute that may be used as a natural key.

### Defining primary keys in SQL : 

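    -- General syntax for adding a primary key to an existing table:
    ALTER TABLE <table identifier>
        ADD [ CONSTRAINT <constraint identifier> ]
        PRIMARY KEY ( <column name> [ {, <column name> }... ] )

    -- Declaring the primary key when the table is created:
    CREATE TABLE table_name (
        id_col  INT  PRIMARY KEY,
        col2    CHARACTER VARYING(20),
        ...
    );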
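To see what a key constraint does at run time, here is a small sketch using Python's built-in `sqlite3` module. The `person` table, its columns, and the helper `try_insert` are hypothetical. `person_id` plays the role of a surrogate primary key, and `national_id` stands in for a natural key kept as an alternate key through `UNIQUE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,   -- surrogate key
        national_id TEXT NOT NULL UNIQUE,  -- natural key kept as an alternate key
        name        TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO person VALUES (1, 'A-100', 'Ada')")


def try_insert(row):
    """Return True if the row was inserted, False if a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_primary = try_insert((1, "B-200", "Grace"))  # person_id already used
duplicate_natural = try_insert((2, "A-100", "Edgar"))  # national_id already used
fresh_row = try_insert((2, "B-200", "Grace"))          # both keys are new
print(duplicate_primary, duplicate_natural, fresh_row)
```

Only the last insert succeeds, because the first two each repeat a key value that already identifies an existing row.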

### Foreign Key : 

- A foreign key is a set of attributes in a table that refers to the primary key of another table.
- The foreign key links these two tables. More formally, in the context of relational databases, a foreign key is a set of attributes subject to an inclusion dependency constraint: the tuples formed by the foreign key attributes in one relation, R, must also exist in some other (not necessarily distinct) relation, S, and those attributes must be a candidate key in S.

        -- Composite foreign key declared as a table constraint
        CREATE TABLE child_table (
          col1 INTEGER PRIMARY KEY,
          col2 CHARACTER VARYING(20),
          col3 INTEGER,
          col4 INTEGER,
          FOREIGN KEY(col3, col4) REFERENCES parent_table(col1, col2) ON DELETE CASCADE
        );

        -- Single-column foreign key declared inline on the column
        CREATE TABLE child_table (
          col1 INTEGER PRIMARY KEY,
          col2 CHARACTER VARYING(20),
          col3 INTEGER,
          col4 INTEGER REFERENCES parent_table(col1) ON DELETE CASCADE
        );
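The effect of a foreign key with `ON DELETE CASCADE` can be tried out with Python's built-in `sqlite3` module, as in the sketch below. The definition of `parent_table` and all sample rows are assumptions, since the notes only show the child table. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign key constraints unless this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Assumed parent table; only the child table appears in the notes.
    CREATE TABLE parent_table (
        col1 INTEGER PRIMARY KEY,
        col2 CHARACTER VARYING(20)
    );
    CREATE TABLE child_table (
        col1 INTEGER PRIMARY KEY,
        col2 CHARACTER VARYING(20),
        col3 INTEGER,
        col4 INTEGER REFERENCES parent_table(col1) ON DELETE CASCADE
    );
    INSERT INTO parent_table VALUES (1, 'first'), (2, 'second');
    INSERT INTO child_table VALUES (10, 'a', 0, 1), (11, 'b', 0, 1), (12, 'c', 0, 2);
    """
)

# A child row that points at a missing parent row is rejected.
try:
    conn.execute("INSERT INTO child_table VALUES (13, 'd', 0, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting a parent row also deletes the child rows that reference it.
conn.execute("DELETE FROM parent_table WHERE col1 = 1")
remaining = conn.execute("SELECT col1 FROM child_table ORDER BY col1").fetchall()
print(orphan_rejected, remaining)
conn.close()
```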