### Some Terminology
An **RDBMS** manages a relational database, which is a collection of relations (tables).  
Each relation has a set of **attributes** or columns.  
Each **tuple** or row has a value for each attribute.  
Each attribute has a **type** or domain.  

![structure](https://i.imgur.com/8lFNpkT.png)  

**Schema** is the structural description of the relations in a database.  
**Key** is an attribute (or set of attributes) whose value (or combined value) is unique in each tuple.
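
To make these terms concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `Student` table mirrors the relation used in the next section; the sample rows and the choice of `id` as the key are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the relation name, its attributes, and the type of each attribute.
conn.execute(
    "CREATE TABLE Student ("
    "id INTEGER PRIMARY KEY, "  # key: its value must be unique in every tuple
    "name TEXT, gpa REAL, highSchoolSize INTEGER)"
)

# One tuple (row): a value for each attribute.
conn.execute("INSERT INTO Student VALUES (1, 'Amy', 3.9, 1000)")

# A second tuple with the same key value is rejected.
try:
    conn.execute("INSERT INTO Student VALUES (1, 'Bob', 3.6, 1500)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Attribute names as recorded in the schema.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]
```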

### Relational Algebra
Relational Algebra is the underlying maths behind relational database querying. Below we have defined three relations:
- `College(name, state, enrollment)`
- `Student(id, name, gpa, highSchoolSize)`
- `Apply(id, collegeName, major, decision)`

The simplest query is just the name of a relation, for example `Student`, which returns the whole relation. We use operators to filter, slice or combine relations. 

**Select operator:** picks certain rows. Syntax is `Sigma with condition in subscript applied to relation`. Examples:
1. Pick students with GPA > 3.7: $\sigma_{gpa\gt 3.7}Student$
2. Pick students with GPA > 3.7 and high school size < 1000: $\sigma_{gpa\gt 3.7 \wedge highSchoolSize\lt 1000}Student$
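
The select operator can be sketched in plain Python by treating a relation as a list of dicts, one per tuple. The helper name `select` and the sample rows are made up for illustration; this is not how a database engine implements selection.

```python
# A relation as a list of dicts (one dict per tuple); sample data is made up.
student = [
    {"id": 1, "name": "Amy", "gpa": 3.9, "highSchoolSize": 800},
    {"id": 2, "name": "Bob", "gpa": 3.6, "highSchoolSize": 1500},
    {"id": 3, "name": "Craig", "gpa": 3.8, "highSchoolSize": 1200},
]


def select(relation, condition):
    """Sigma: keep only the tuples for which condition(tuple) is true."""
    return [t for t in relation if condition(t)]


high_gpa = select(student, lambda t: t["gpa"] > 3.7)
high_gpa_small_school = select(
    student, lambda t: t["gpa"] > 3.7 and t["highSchoolSize"] < 1000
)

print([t["name"] for t in high_gpa])               # ['Amy', 'Craig']
print([t["name"] for t in high_gpa_small_school])  # ['Amy']
```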

**Project operator:** picks certain columns. Syntax is `Pi with comma separated columns as subscript applied to relation`. Examples:
1. List the id and decision of all applications: $\pi_{id,\ decision} Apply$  

**Duplicates:** unlike SQL, relational algebra works on sets, so duplicate tuples in a result are removed automatically.  
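
A minimal sketch of projection with set semantics, again using a list of dicts as the relation. The helper name `project`, the sample rows, and the decision codes `'A'` (accepted) and `'R'` (rejected) are assumptions for illustration.

```python
# Sample Apply relation; student 1 has two applications with the same decision.
apply = [
    {"id": 1, "collegeName": "Stanford", "major": "CS", "decision": "A"},
    {"id": 1, "collegeName": "MIT", "major": "CS", "decision": "A"},
    {"id": 2, "collegeName": "MIT", "major": "EE", "decision": "R"},
]


def project(relation, attributes):
    """Pi: keep only the named attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:  # set semantics: each tuple appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


id_decision = project(apply, ["id", "decision"])
print(len(apply), len(id_decision))  # 3 2
```

Three input tuples collapse to two because both applications of student 1 project to the same `(id, decision)` pair. The SQL counterpart would need `SELECT DISTINCT` to get the same result.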

**Cross Product (Cartesian Product):** combines two relations.
1. Cross product of the Student and Apply relations: $Student \times Apply$. The resulting relation has 8 attributes (4 from each relation), and its number of tuples equals the number of tuples in Student multiplied by the number of tuples in Apply.
2. Names and GPAs of students with high school size > 1000 who applied to CS major and were rejected: $\pi_{name,\ gpa}(\sigma_{Student.id = Apply.id\ \wedge\ highSchoolSize\gt 1000\ \wedge\ major = 'CS'\ \wedge\ decision = 'R'}(Student \times Apply))$
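
The two cross product examples above can be sketched as follows. The helper name `cross`, the relation-name prefixes on attribute names, and the sample rows are assumptions made for illustration.

```python
from itertools import product

student = [
    {"id": 1, "name": "Amy", "gpa": 3.9, "highSchoolSize": 1200},
    {"id": 2, "name": "Bob", "gpa": 3.6, "highSchoolSize": 1500},
]
apply = [
    {"id": 1, "collegeName": "MIT", "major": "CS", "decision": "R"},
    {"id": 2, "collegeName": "MIT", "major": "EE", "decision": "R"},
    {"id": 2, "collegeName": "Cornell", "major": "CS", "decision": "A"},
]


def cross(left, left_name, right, right_name):
    """Cross product; attributes are prefixed with their relation name."""
    return [
        {
            **{f"{left_name}.{k}": v for k, v in s.items()},
            **{f"{right_name}.{k}": v for k, v in a.items()},
        }
        for s, a in product(left, right)
    ]


pairs = cross(student, "Student", apply, "Apply")
print(len(pairs), len(pairs[0]))  # 6 8

# Select on the combined tuples, then project onto (name, gpa).
rejected_cs = {
    (t["Student.name"], t["Student.gpa"])
    for t in pairs
    if t["Student.id"] == t["Apply.id"]
    and t["Student.highSchoolSize"] > 1000
    and t["Apply.major"] == "CS"
    and t["Apply.decision"] == "R"
}
print(rejected_cs)  # {('Amy', 3.9)}
```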

**Natural Join:** an extension of the cross product that:
- enforces equality on all attributes with the same name
- eliminates one copy of each duplicated attribute.

Examples:
1. Names and GPAs of students with high school size > 1000 who applied to CS major and were rejected: $\pi_{name,\ gpa}(\sigma_{highSchoolSize\gt 1000\ \wedge\ major = 'CS'\ \wedge\ decision = 'R'}(Student \bowtie Apply))$
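
A rough sketch of the natural join on lists of dicts, shown here with a naive nested loop. The helper name `natural_join` and the sample rows are assumptions; real systems use far more efficient join algorithms.

```python
student = [
    {"id": 1, "name": "Amy", "gpa": 3.9, "highSchoolSize": 1200},
    {"id": 2, "name": "Bob", "gpa": 3.6, "highSchoolSize": 1500},
]
apply = [
    {"id": 1, "collegeName": "MIT", "major": "CS", "decision": "R"},
    {"id": 2, "collegeName": "MIT", "major": "EE", "decision": "R"},
    {"id": 2, "collegeName": "Cornell", "major": "CS", "decision": "A"},
]


def natural_join(left, right):
    """Join on every attribute the two relations share; keep one copy of each."""
    result = []
    for s in left:
        for a in right:
            shared = s.keys() & a.keys()  # here: {"id"}
            if all(s[k] == a[k] for k in shared):
                result.append({**s, **a})  # shared attributes appear once
    return result


joined = natural_join(student, apply)
print(len(joined), len(joined[0]))  # 3 7

rejected_cs = {
    (t["name"], t["gpa"])
    for t in joined
    if t["highSchoolSize"] > 1000 and t["major"] == "CS" and t["decision"] == "R"
}
print(rejected_cs)  # {('Amy', 3.9)}
```

Compared with the cross product version, the `Student.id = Apply.id` condition disappears from the selection because the join already enforces it.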

### Table Relationships
**One to one relationship:** A pair of tables bears a one-to-one relationship when a single record in the first table is related to only one record in the second table, and a single record in the second table is related to only one record in the first table. A column of one table is linked to a candidate key of the other table.

![1 to 1](https://i.imgur.com/1wD5r7x.jpg)

**One to Many:** A one-to-many relationship exists between a pair of tables when a single record in the first table can be related to many records in the second table, but a single record in the second table can be related to only one record in the first table.

![1 to many](https://i.imgur.com/o1JCRnH.jpg)

**Many to Many:** A pair of tables bears a many-to-many relationship when a single record in the first table can be related to many records in the second table and a single record in the second table can be related to many records in the first table. You establish this relationship with a *linking table*.

![Many to Many](https://i.imgur.com/WoQk7Ke.jpg)
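
As a sketch of a linking table, the relations from the relational algebra section fit this pattern: a student can apply to many colleges and a college receives applications from many students, with `Apply` linking the two. The reduced column sets, the composite primary key, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE College (name TEXT PRIMARY KEY);
    -- Linking table: one row per (student, college) pair.
    CREATE TABLE Apply (
        id INTEGER REFERENCES Student(id),
        collegeName TEXT REFERENCES College(name),
        PRIMARY KEY (id, collegeName)
    );
    INSERT INTO Student VALUES (1, 'Amy'), (2, 'Bob');
    INSERT INTO College VALUES ('MIT'), ('Cornell');
    INSERT INTO Apply VALUES (1, 'MIT'), (1, 'Cornell'), (2, 'MIT');
""")

# One student related to many colleges.
colleges_for_amy = [row[0] for row in conn.execute(
    "SELECT collegeName FROM Apply WHERE id = 1 ORDER BY collegeName")]

# One college related to many students.
students_for_mit = [row[0] for row in conn.execute(
    "SELECT Student.name FROM Student JOIN Apply ON Student.id = Apply.id "
    "WHERE Apply.collegeName = 'MIT' ORDER BY Student.name")]

print(colleges_for_amy)  # ['Cornell', 'MIT']
print(students_for_mit)  # ['Amy', 'Bob']
```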

### Database Normalization
Normalization is the process of organizing data in a database. This is done in order to reduce redundancy and inconsistent dependency. What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee's salary is related to, or dependent on, the employee and thus should be moved to the Employees table.  

Normalization involves following certain rules. If the first rule is followed, the database is said to be in first normal form. If the first three rules are followed, it is in third normal form. Third normal form is considered the highest level necessary for most applications.

**First Normal Form:** rules:
- Eliminate repeating groups in individual tables.
- Create a separate table for each set of related data.
- Identify each set of related data with a primary key.

**Second Normal Form:** rules:
- Create separate tables for sets of values that apply to multiple records.
- Relate these tables with a foreign key.

**Third Normal Form:** rules:
- Eliminate fields that do not depend on the key.

![Normalization Example](https://i.imgur.com/GQnQ4V1.png)
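
The Customers/Employees example from the start of this section can be sketched in plain Python. The field names and sample values are hypothetical; the point is only that the salary depends on the employee, not on the customer.

```python
# Unnormalized: the employee's salary is repeated on every customer row.
customers_flat = [
    {"customer": "Acme", "address": "1 Main St", "employee": "Dana", "salary": 50000},
    {"customer": "Bolt", "address": "9 Elm St", "employee": "Dana", "salary": 50000},
]

# Normalized: salary depends on the employee, so it moves to Employees.
employees = {row["employee"]: {"salary": row["salary"]} for row in customers_flat}
customers = [
    {"customer": row["customer"], "address": row["address"], "employee": row["employee"]}
    for row in customers_flat
]

# A raise is now one update instead of one update per customer row.
employees["Dana"]["salary"] = 55000

# Every customer served by Dana sees the same, single salary value.
salaries_seen = {employees[c["employee"]]["salary"] for c in customers}
print(salaries_seen)  # {55000}
```

In the flat version, updating only one of the two rows would leave Dana with two different salaries, which is the kind of inconsistency normalization is meant to prevent.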

### ACID Properties
ACID = Atomicity + Consistency + Isolation + Durability

**Database Transaction:** A database transaction is a collection of queries. Either all the queries are *committed* or all are *rolled back*.  
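
A minimal sketch of commit versus rollback using Python's built-in `sqlite3` module. The `Account` table, its `CHECK` constraint, and the amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("UPDATE Account SET balance = balance + 150 WHERE name = 'bob'")
        # Violates the CHECK constraint, so the credit to bob is undone too.
        conn.execute("UPDATE Account SET balance = balance - 150 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM Account"))
print(balances)  # {'alice': 100, 'bob': 0}
```

Both updates belong to one transaction, so the failure of the second one leaves neither applied.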

**Atomicity:** All queries in a transaction must succeed. If even one fails, the whole transaction is rolled back.  
**Isolation:** This relates to how database handles concurrent transactions. Some of the problems associated with multiple concurrent transactions are:
- **Dirty Reads:** A dirty read occurs when a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed.  
![DR](https://i.imgur.com/RgXv5u7.png)
- **Non Repeatable Reads:** A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.  
![NRR](https://i.imgur.com/HebK9jE.png)
- **Phantom Reads:** A phantom read occurs when, in the course of a transaction, another transaction adds or removes rows that match the records being read, so repeating the same query returns a different set of rows.  
![PR](https://i.imgur.com/FLaChm0.png)
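
The three anomalies above can be imitated with a toy model in plain Python. This is not a real database engine: `committed`, `pending`, `rows`, and the `read` helper are hypothetical names, and the "transactions" are just sequential steps that mimic an interleaving of T1 (the writer) and T2 (the reader).

```python
# Toy model: committed data plus T1's pending (uncommitted) writes.
committed = {"balance": 100}
pending = {}


def read(key, allow_dirty):
    """Read as transaction T2 would, optionally seeing uncommitted data."""
    if allow_dirty and key in pending:
        return pending[key]  # dirty read
    return committed[key]


# Dirty read: T1 has written 50 but not committed.
pending["balance"] = 50
dirty = read("balance", allow_dirty=True)    # 50
clean = read("balance", allow_dirty=False)   # 100
pending.clear()                              # T1 rolls back; 50 was never real

# Non-repeatable read: T2 reads the same row twice, T1 commits in between.
first = read("balance", allow_dirty=False)   # 100
committed["balance"] = 70                    # T1 commits an update
second = read("balance", allow_dirty=False)  # 70

# Phantom read: T2 runs the same range query twice, T1 inserts in between.
rows = [{"id": 1, "gpa": 3.9}]
count_before = sum(1 for r in rows if r["gpa"] > 3.7)  # 1
rows.append({"id": 2, "gpa": 3.8})                     # T1 inserts and commits
count_after = sum(1 for r in rows if r["gpa"] > 3.7)   # 2
```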