<h1 style="color: rgb(241, 90, 36)"><img src="./images/SQLIcon.png?modified=23232323342" width=80px height=80px style="vertical-align: middle;">  What is SQL?</h1>

*SQL (Structured Query Language)* is a programming language designed for managing and manipulating data in *relational databases*. It is the standard language for interacting with and querying *relational database management systems (RDBMS)*. The term "relational" in **relational databases** refers to the way the data is stored: in tables within the database that have some "relation" between them.

SQL is used to create and modify *database* *schema*, insert, update, delete, and query data stored in a **relational database**. With SQL, you can create tables, define their structure, set relationships between tables, and add or remove data from those tables. You can also filter and sort data based on specific criteria, aggregate data to summarize information, and perform various data manipulation and analysis tasks.
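
The operations listed above can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module and an in-memory database; the `person` table and its rows are made up for the example and are not part of this lesson's data.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create a table and define its structure (hypothetical `person` table).
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)

# Insert data.
conn.executemany(
    "INSERT INTO person (id, name, age) VALUES (?, ?, ?)",
    [(1, "Ada", 36), (2, "Grace", 45), (3, "Linus", 28)],
)

# Update and delete data.
conn.execute("UPDATE person SET age = 37 WHERE id = 1")
conn.execute("DELETE FROM person WHERE id = 3")

# Filter and sort data based on specific criteria.
adults = conn.execute(
    "SELECT name, age FROM person WHERE age > 30 ORDER BY age DESC"
).fetchall()

# Aggregate data to summarize information.
average_age = conn.execute("SELECT AVG(age) FROM person").fetchone()[0]

print(adults)
print(average_age)
```

Each statement describes *what* should happen to the data; the database engine decides how to carry it out.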

SQL is used in a wide range of applications, including e-commerce websites, online banking systems, healthcare applications, and many others. Its simplicity and power have made it a popular choice among database administrators, data analysts, and developers.


<h2 style="color: rgb(241, 90, 36)"> What is Data?</h2>

First of all, what is data? When working in a data-related role, you will need to understand what data is:

- Data can be thought of as recorded measurements of something in the real world. This something is a *unit of observation (sample)*. For example, a list of people’s heights is data.
- A person would be a **sample**. Data can describe a vast number of different things
- For example, there is a lot of data we can use to describe a person
- Each kind of **measurement** (such as height) is called a *variable (feature)*
- Each **observation** (the height of a person) is a *data point*
- Several **data points** together create a *dataset*

<h2 style="color: rgb(241, 90, 36)"> Relational databases</h2>

Data is useful for obtaining valuable information. Data can be processed manually, but computers are much quicker. There are two common tools for storing, organizing, and processing data on a computer. The first is **relational databases** and the second is *spreadsheet software*.

<h3 style="color: rgb(241, 90, 36)"> What is a relational database?</h3>

A *relational database* is a type of *database* that organises data into one or more *tables*, which consist of *rows* and *columns*. Each row or *record* represents a sample of data and each column or *field* represents a *variable (feature)*. A **database** is a structured collection of data organised in an efficient manner, allowing the retrieval, storage and management of information.

What is different about a **relational database**?

- It uses the **relational model of data**: data corresponding to the same `ID` is used to link records in different **tables**
- Data is organized as *relations*, each containing a *relation key (`ID`)*
- **Relations** are usually implemented as **tables**
- **Tables** are grouped into common collections in databases called *schemas*
- For example: One table contains (`ID`, `name`, `last_name`, `age`). Another table contains (`ID`, `height`, `smoker?`). And another table contains (`smoker?`, `cancer_development`)
- The software used to manage relational databases is referred to as a *Relational DataBase Management System (RDBMS)*
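
As a rough sketch of how a shared `ID` links records, the first two example tables above can be built and joined with Python's built-in `sqlite3` module. The table names `person` and `health`, the column name `smoker` (standing in for `smoker?`), and the sample rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables that describe the same people, linked by a shared `id`.
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, last_name TEXT, age INTEGER);
    CREATE TABLE health (id INTEGER PRIMARY KEY, height REAL, smoker INTEGER);

    INSERT INTO person VALUES (1, 'Ada', 'Lovelace', 36), (2, 'Alan', 'Turing', 41);
    INSERT INTO health VALUES (1, 165.0, 0), (2, 178.0, 1);
""")

# The join matches rows from both tables that carry the same `id`.
rows = conn.execute("""
    SELECT p.name, p.last_name, h.height, h.smoker
    FROM person AS p
    JOIN health AS h ON h.id = p.id
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Neither table repeats the other's columns; the relation between them exists only through the matching `id` values.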

<img src="./images/RDBMS_graphics.png">

The `Account` table is **related** to the three other tables by the following column relations:


- **`Customer`** - `cust_id` column
- **`Product`** - `product_cd` column
- **`Transaction`** - `account_id` column


<h2 style="color: rgb(241, 90, 36)"> A Brief History</h2>

> <p align=center>“Along with Codd’s definition of the relational model, he proposed a language called DSL/Alpha for
> manipulating the data in relational tables. Shortly after Codd’s paper was released, IBM commissioned
> a group to build a prototype based on Codd’s ideas. This group created a simplified version of DSL/
> Alpha that they called SQUARE. Refinements to SQUARE led to a language called SEQUEL, which was,
>finally, shortened to SQL. While SQL began as a language used to manipulate data in relational
>databases, it has evolved […] to be a language for manipulating data across various database
>technologies […]</p>
>
><p align=center>One final note: SQL is not an acronym for anything (although many people will insist it stands for
>“Structured Query Language”). When referring to the language, it is equally acceptable to say the letters
>individually (i.e., S. Q. L.) or to use the word sequel.”</p>
>
><p align=center>Extracted from ‘Learning SQL’ by Alan Beaulieu</p>


<h2 style="color: rgb(241, 90, 36)"> SQL database - Pros</h2>

SQL databases provide many **advantages** that make them the de facto standard for many applications:
- <b style="color: rgb(241, 90, 36)">Intuitive</b>: Relations that almost anyone can understand
- <b style="color: rgb(241, 90, 36)">Efficient</b>: They use *normalization* so that the same data is not stored repeatedly, which requires less space
- <b style="color: rgb(241, 90, 36)">Declarative</b>: You tell the database what you want, and the system takes care of how to execute the query
- <b style="color: rgb(241, 90, 36)">Robust</b>: Most databases have *ACID* compliance


**ACID (*Atomicity*, *Consistency*, *Isolation*, and *Durability*)** is a set of properties that ensure database reliability and consistency:

  - <b style="color: rgb(241, 90, 36)">*Atomicity*</b>: All statements in an SQL transaction are treated as a single unit. Either the whole transaction completes successfully or the database is rolled back to its previous state. This helps to prevent data loss or corruption.
  - <b style="color: rgb(241, 90, 36)">*Consistency*</b>: No transaction may violate the constraints placed on the database. If a transaction would leave the database in an invalid state, it is aborted and the database is rolled back to its previous state.
  - <b style="color: rgb(241, 90, 36)">*Isolation*</b>: Even though multiple users appear to be reading from and writing to the database at the same time, each transaction is treated separately. Concurrent transactions do not interfere with each other, so the result is the same as if they had run one after another.
  - <b style="color: rgb(241, 90, 36)">*Durability*</b>: Changes made by successfully committed transactions are saved, even if the system fails.
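
Atomicity and consistency can be seen in a small experiment. The sketch below uses Python's built-in `sqlite3` module; the `account` table, the `transfer` helper and the amounts are hypothetical and exist only for this illustration. A `CHECK` constraint forbids negative balances, and a transfer that would break it is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts; both updates succeed or neither does."""
    try:
        # `with conn` commits on success and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint was violated, so the whole transaction was undone.
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


failed = transfer(conn, 1, 2, 500)   # account 1 cannot cover 500
after_failure = balances(conn)       # the credit to account 2 was rolled back too

succeeded = transfer(conn, 1, 2, 30)
after_success = balances(conn)

print(failed, after_failure)
print(succeeded, after_success)
```

In the failed transfer the first `UPDATE` had already run when the second one broke the constraint, yet neither change remains afterwards.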

<h2 style="color: rgb(241, 90, 36)"> SQL database - Cons</h2>

However, we might find some **downsides** when working with SQL databases:
- <b style="color: rgb(241, 90, 36)">Lower specificity</b>: Occasionally, a task falls outside what SQL can be programmed to do. This is not common, since database management systems are regularly updated.
- <b style="color: rgb(241, 90, 36)">Limited Scalability</b>: The strict schema requirements can be an impediment when scaling data
- <b style="color: rgb(241, 90, 36)">Object-relational impedance mismatch</b>: Sometimes objects have attributes with *many-to-many relationships*. For example, a customer may own multiple products, but each product may be owned by multiple customers.

<h2 style="color: rgb(241, 90, 36)"> Entity Relationship Diagrams</h2>

An <b style="color: rgb(241, 90, 36)">*Entity Relationship Diagram (ERD)*</b> is a type of structural diagram for use in database design. An ERD contains different symbols and connectors that visualize two pieces of important information: the *major entities* within the system scope, and the *inter-relationships* among these **entities**.
- <b style="color: rgb(241, 90, 36)">*Primary Keys*</b>: A primary key uniquely identifies a record in a database table.
  - There must not be two (or more) records that share the same value for the **primary key** attribute.
  - For example `ID` (in the diagram below)
- <b style="color: rgb(241, 90, 36)">*Foreign Keys*</b>: A foreign key references a **primary key** in another table. It is used to identify the relationships between entities. Foreign key values don’t need to be unique, for example (`ShipmentID`, `CourierID`)
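
The two key types can be tried out with a short sketch using Python's built-in `sqlite3` module. The `courier` and `orders` tables, the `try_insert` helper and the sample rows are assumptions for this illustration (`orders` is used as the table name because `ORDER` is a reserved word in SQL). Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
    CREATE TABLE courier (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        courier_id INTEGER NOT NULL REFERENCES courier (id)  -- foreign key
    );

    INSERT INTO courier VALUES (1, 'FastShip');
    -- Two orders share the same foreign key value: that is allowed.
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")


def try_insert(sql):
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second courier with the same primary key value is rejected.
duplicate_primary_key = try_insert("INSERT INTO courier VALUES (1, 'Other')")

# An order pointing at a courier that does not exist is rejected.
dangling_foreign_key = try_insert("INSERT INTO orders VALUES (12, 99)")

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(duplicate_primary_key, dangling_foreign_key, order_count)
```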

### Cardinality

*Cardinality* is the possible number of occurrences in **ONE** **entity** that can be associated with the number of occurrences in another. For example, **ONE** `team` has **MANY** `player`s. So `team` and `player` are connected by a *one-to-many relationship*.

<img src="./images/entity_relationship_diagrams.png?modified=33342">

- Here **ONE** shipment has **many** orders and **ONE** courier has **MANY** orders. So the `Courier` and `Shipment` tables have a *ONE to MANY relationship* with the `Order` table. 

The connections between the tables in an **ERD** can be used to represent many different **cardinality** relationships between entities.

<h3 style="color: rgb(241, 90, 36)"> ONE to ONE relationship</h3>

<img src="./images/one_to_one.png?modified=33122342">

Here **ONE** `customer` has **ONE** `address`. The **|** represents the cardinality of one.

<img src="./images/one_to_possibly_none.png?modified=32233242">

Optionally, there is the case where it could be a *one to possibly none relationship*. 
A customer might or might not have created a website login. So there is a **ONE** to possibly **ONE** or **NONE** relationship. 
The **circle** in the connection represents that a website login just might not exist for that user and the **|** again represents that there could be **ONE**.
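
One possible way to model the *one to possibly none* case is a `UNIQUE` foreign key: each login belongs to exactly one customer, and a customer can have at most one login. The sketch below uses Python's built-in `sqlite3` module; the table and column names and the sample rows are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE website_login (
        id INTEGER PRIMARY KEY,
        -- UNIQUE limits each customer to at most one login.
        customer_id INTEGER NOT NULL UNIQUE REFERENCES customer (id),
        username TEXT NOT NULL
    );

    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO website_login VALUES (1, 1, 'ada_l');  -- Grace has no login
""")

# LEFT JOIN keeps customers without a login; their username comes back as NULL (None).
rows = conn.execute("""
    SELECT c.name, w.username
    FROM customer AS c
    LEFT JOIN website_login AS w ON w.customer_id = c.id
    ORDER BY c.id
""").fetchall()
print(rows)

# A second login for the same customer violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO website_login VALUES (2, 1, 'ada_2')")
    second_login_allowed = True
except sqlite3.IntegrityError:
    second_login_allowed = False
print(second_login_allowed)
```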

### ONE to MANY relationship

<img src="./images/one_to_many.png?modified=3333242">

With the *ONE to MANY relationship*, the one is again represented by the **|** symbol, and the three-pronged *crow's foot* symbol means many.
**ONE** to **MANY** means, **ONE** record in **ONE** table can be associated with many records of another table.
In this case **ONE** student can be associated with many different courses. 

<img src="./images/one_to_possibly_many.png?modified=3333242">

Equally there is the possibility that a student could be associated with no courses at all. Again we can add a **circle** to the connection representing there could be no records associated. 

These decisions will be up to the designer of the **database schema** and what makes sense in the particular use case. 
In this case, do you want to enforce that all students must be enrolled before picking a course? 
Is the student record created when a student signs up to a course, or can the student enrol beforehand?
These will be the type of decisions you will have to make when designing your database. 

### MANY to MANY relationship

<img src="./images/many_to_many_v2.png?modified=3333242">

The *MANY to MANY relationship* occurs when **MANY** records in **ONE** table can be associated with **MANY** records in another table.
This relationship is established with a *Junction table*, otherwise known as a *Bridge or Association table*.

This is a special type of table which contains **foreign keys** which link to the **primary keys** in the tables being joined. Each row of the **junction table** represents the relationship between records in one of the tables being joined to the other. 

The **junction table** is needed due to the uniqueness of the **primary keys** in the tables on which the relationship is based.
It helps to establish the **many to many** relationship by holding **foreign keys** that link to the **primary key** columns of both tables.

Let's take the example above. In this case students can sign up for **MANY** courses and a course can contain **MANY** students. 
We can call this event of signing up an **enrolment**, which requires an `ID` from the `student` table and an `ID` from the `course` table. So the `enrolment` table will act as the **junction table**, saving the student `ID`s and course `ID`s.
To see why we need the **junction table**, imagine it wasn't there to link the tables.
Then for each course `ID` there would be multiple student `ID` entries for the same course.
This would violate the **primary key constraint** in the `course` table.
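
The enrolment example can be sketched with Python's built-in `sqlite3` module. The column names (`student_id`, `course_id`, `title`) and the sample rows are assumptions for this illustration; the composite primary key on `enrolment` is one common design choice, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- Junction table: each row records one student enrolled on one course.
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL REFERENCES student (id),
        course_id INTEGER NOT NULL REFERENCES course (id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO course VALUES (1, 'SQL'), (2, 'Python');
    INSERT INTO enrolment VALUES (1, 1), (1, 2), (2, 1);
""")

# ONE student, MANY courses.
courses_for_ada = [
    title
    for (title,) in conn.execute("""
        SELECT c.title
        FROM enrolment AS e
        JOIN course AS c ON c.id = e.course_id
        WHERE e.student_id = 1
        ORDER BY c.title
    """)
]

# ONE course, MANY students.
students_on_sql = [
    name
    for (name,) in conn.execute("""
        SELECT s.name
        FROM enrolment AS e
        JOIN student AS s ON s.id = e.student_id
        WHERE e.course_id = 1
        ORDER BY s.name
    """)
]

print(courses_for_ada)
print(students_on_sql)
```

Each `ID` still appears only once in `student` and in `course`; the repetition needed for the many-to-many relationship lives entirely in `enrolment`.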

### Other relations

There are other ways to represent relationships between the tables. Here is a summary of the different relationships:

<img src="./images/summary_relationship.png?modified=3333242">

<h3 style="color: rgb(241, 90, 36)"> Use case</h3>

This **ERD** is a rather complicated example generated from the **pagila** database.
See if you can spot the different types of connections:

<img src="./images/use_case.png?modified=3333242">


## Key Takeaways

- SQL is the language that allows the manipulation and management of data in a **relational database** system
- Databases have many pros and few cons, which makes them one of the most common structures for storing data today
- The `ACID` properties of a database ensure that the database data is always consistent and reliable
- Cardinality is the possible number of occurrences of records in a table relative to the occurrences of records in another table
- `ERD`s are the standard used to model and view the relationships in **relational databases**
- Database **Primary** and **Foreign** keys help to ensure consistency between relationships in databases

