# Relational Databases

## What is a Database?

A database is a collection of data (information) stored electronically in a structured manner. The way the data is organized determines the type of database.

## Types of Databases

There are 4 common types of databases:
1. Hierarchical
2. Network
3. Relational
4. NoSQL

### Hierarchical Databases

Just as in any hierarchy, this database categorizes data in ranks or levels, and the ranks are expressed using links. One way to picture it is that the data is organized in parent-child relationships, so the entire database resembles a tree.

![Hierarchical Database](https://i.imgur.com/23nTU9D.png)

There are 3 components of hierarchical databases:
- attributes
- records
- links

A hierarchical database consists of a collection of records connected to each other through **links**. Each **record** is a collection of **fields** or **attributes**, each of which contains only one data value. A **child** record can be linked to only one **parent** record.
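To make the one-parent rule concrete, here is a toy in-memory sketch in Python. It is not how a real hierarchical DBMS stores data; the `Record` class, `add_child`, `path_to_root`, and the sample records are all hypothetical. Each record keeps its attributes in a dictionary and holds a single parent link, so following links upward always leads to one root.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False: records compare by identity
class Record:
    """Toy hierarchical record: attributes plus one parent link."""
    fields: dict
    parent: Optional["Record"] = None               # at most one parent
    children: list = field(default_factory=list)

    def add_child(self, child: "Record") -> None:
        # Refuse a second parent so the records always form a tree.
        if child.parent is not None:
            raise ValueError("a child record can only have one parent")
        child.parent = self
        self.children.append(child)


def path_to_root(record: Optional[Record]) -> list:
    """Follow parent links upward and collect each record's name."""
    names = []
    while record is not None:
        names.append(record.fields["name"])
        record = record.parent
    return names


company = Record({"name": "Company"})
sales = Record({"name": "Sales"})
alice = Record({"name": "Alice", "role": "Manager"})
company.add_child(sales)
sales.add_child(alice)
print(path_to_root(alice))  # ['Alice', 'Sales', 'Company']
```

Trying `company.add_child(alice)` afterwards raises `ValueError`, which is the toy's version of "a child can only have one parent".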

### Network Databases

A network database is similar to a hierarchical database, with one major difference: child records are free to associate with multiple parent records. As a result, the records form a network (or net) of links rather than a tree.

![Network Databases](https://i.imgur.com/MhP6vhP.png)

There are 3 components of network databases:
- attributes
- records
- links

A network database consists of a collection of records connected to each other through **links**. Each **record** is a collection of **fields** or **attributes**, each of which contains only one data value. A **child** record can be linked to *multiple* **parent** records.
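A toy in-memory sketch in Python shows the difference: each record keeps a *list* of parent links instead of a single one. As before, this is not a real network DBMS, and the `Record` class, `link_to`, `children_of`, and the sample records are hypothetical. Here one employee record is linked to two project records.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: records compare by identity
class Record:
    """Toy network record: attributes plus any number of parent links."""
    fields: dict
    parents: list = field(default_factory=list)   # many parents allowed

    def link_to(self, parent: "Record") -> None:
        # Unlike a hierarchy, a record may be linked to several parents.
        if parent not in self.parents:
            self.parents.append(parent)


def children_of(parent: Record, records: list) -> list:
    """Find every record that lists `parent` among its parents."""
    return [r.fields["name"] for r in records if parent in r.parents]


apollo = Record({"name": "Apollo"})
gemini = Record({"name": "Gemini"})
alice = Record({"name": "Alice"})
bob = Record({"name": "Bob"})

alice.link_to(apollo)
alice.link_to(gemini)   # Alice now has two parent records
bob.link_to(gemini)

print([p.fields["name"] for p in alice.parents])   # ['Apollo', 'Gemini']
print(children_of(gemini, [alice, bob]))           # ['Alice', 'Bob']
```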

### Relational Databases

Relational databases are among the most mature and widely used types of database. Data is stored in tables, and any record can be linked to any other record through shared key values.

There are 4 components of relational databases:
- attributes
- entities
- relationships
- cardinality

### NoSQL Databases

A NoSQL database is a non-relational database that provides a mechanism for storing and retrieving data. NoSQL databases are typically simpler in design, which makes them easier to scale across clusters of machines and allows finer control over availability. The data structures they use are very different from the tables in relational databases, which makes some operations faster in NoSQL.

## Database Management Systems

A DBMS (Database Management System) is a computerized data-keeping system. Users of the system are given facilities to perform several kinds of operations, either to manipulate the data in the database or to manage the database structure itself. DBMSs are categorized according to their database structures or types:
- hierarchical
- network
- relational

DBMSs that manage relational databases are called RDBMSs (Relational Database Management Systems) and are often treated as a category of their own.

## Introduction to RDBMS

Web apps can be split into two major components: a front-end that displays and collects information, and a back-end for storing the information. It is important to know how to properly design the back-end database to store that information.

A database stores data in an organised way so that it can be searched and retrieved later. A relational database contains one or more tables. Much like a spreadsheet, a table uses columns and rows to store and display data. Data can be Created, Retrieved, Updated, and Deleted from a table, and these four operations are generally known by the acronym CRUD.

![Database Table](https://cdn.tutsplus.com/net/authors/lalith-polepeddi/relational-databases-for-dummies-fig1.png)
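In SQL, each CRUD operation maps to one statement: `INSERT` (Create), `SELECT` (Retrieve), `UPDATE`, and `DELETE`. The sketch below assumes SQLite through Python's built-in `sqlite3` module; the `users` table and the "Jane Doe" row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, full_name TEXT)")

# Create: INSERT adds a new row
conn.execute("INSERT INTO users (username, full_name) VALUES (?, ?)",
             ("jdoe", "Jane Doe"))

# Retrieve: SELECT reads rows back
created = conn.execute("SELECT full_name FROM users WHERE username = ?",
                       ("jdoe",)).fetchone()

# Update: UPDATE changes values in existing rows
conn.execute("UPDATE users SET full_name = ? WHERE username = ?",
             ("Jane Smith", "jdoe"))
updated = conn.execute("SELECT full_name FROM users WHERE username = ?",
                       ("jdoe",)).fetchone()

# Delete: DELETE removes rows
conn.execute("DELETE FROM users WHERE username = ?", ("jdoe",))
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(created, updated, remaining)  # ('Jane Doe',) ('Jane Smith',) 0
```

The `?` placeholders let `sqlite3` pass values separately from the SQL text, which avoids quoting mistakes and SQL injection.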

A Relational Database is a type of database that organizes data into tables, linking them based on defined relationships. These relationships enable users to retrieve and combine data from one or more tables with a single query.

### Get Some Data

In the article, the example used is a sample of tweets related to the hashtag `databases`. The data was retrieved in a tabulated format with headers defining each column and data points stored per row. While easy to read, there is repetitive data that makes it difficult to efficiently store and retrieve the relevant data.

There are two types of repetition:
1. Duplicated rows
2. Redundant columns

Repetitions such as these can be removed by splitting the data into separate tables.

full_name | username | text | created_at | following_username
---|---|---|---|---
"Boris Hadjur" | "_DreamLead" | "What do you think about #emailing #campaigns #traffic in #USA? Is it a good market nowadays? do you have #databases?" | "Tue, 12 Feb 2013 08:43:09 +0000" | "Scootmedia", "MetiersInternet"
"Gunnar Svalander" | "GunnarSvalander" | "Bill Gates Talks Databases, Free Software on Reddit https://t.co/ShX4hZlA #billgates #databases"| "Tue, 12 Feb 2013 07:31:06 +0000" | "klout", "zillow"
"GE Software" | "GEsoftware" | "RT @KirkDBorne: Readings in #Databases: excellent reading list, many categories: http://t.co/S6RBUNxq  via @rxin Fascinating." | "Tue, 12 Feb 2013 07:30:24 +0000" | "DayJobDoc", "byosko"
"Adrian Burch" | "adrianburch" | "RT @tisakovich: @NimbusData at the @Barclays Big Data conference in San Francisco today, talking #virtualization, #databases, and #flash memory." | "Tue, 12 Feb 2013 06:58:22 +0000" | "CindyCrawford", "Arjantim"
"Andy Ryder" | "AndyRyder5" | "http://t.co/D3KOJIvF article about Madden 2013 using AI to prodict the super bowl #databases #bus311" | "Tue, 12 Feb 2013 05:29:41 +0000" | "MichaelDell", "Yahoo"
"Andy Ryder" | "AndyRyder5" | "http://t.co/rBhBXjma an article about privacy settings and facebook #databases #bus311" | "Tue, 12 Feb 2013 05:24:17 +0000" | "MichaelDell", "Yahoo"
"Brett Englebert" | "Brett_Englebert" | "#BUS311 University of Minnesota's NCFPD is creating #databases to prevent "food fraud." http://t.co/0LsAbKqJ" | "Tue, 12 Feb 2013 01:49:19 +0000" | "RealSkipBayless", "stephenasmith"
"Brett Englebert" | "Brett_Englebert" | "#BUS311 companies might be protecting their production #databases, but what about their backup files? http://t.co/okJjV3Bm" | "Tue, 12 Feb 2013 01:31:52 +0000" | "RealSkipBayless", "stephenasmith"
"Nimbus Data Systems" | "NimbusData" | "@NimbusData CEO @tisakovich @BarclaysOnline Big Data conference in San Francisco today, talking #virtualization, #databases,& #flash memory" | "Mon, 11 Feb 2013 23:15:05 +0000" | "dellock6", "rohitkilam"
"SSWUG.ORG" | "SSWUGorg" | "Don't forget to sign up for our FREE expo this Friday: #Databases, #BI, and #Sharepoint: What You Need to Know! http://t.co/Ijrqrz29" | "Mon, 11 Feb 2013 22:15:37 +0000" | "drsql", "steam_games"

### Remove Repetitive Data Across Columns

In this example, the columns `username` and `following_username` are repetitive: both hold usernames, and `following_username` packs several of them into a single cell. To solve this, another table was created to record the relationship between each user and the users they follow, so the `following_username` column is no longer needed in the original table and can be removed.

This step of removing repetitive data across columns, so that each cell holds a single value, brings the table into **first normal form** (1NF).

![1NF](https://cdn.tutsplus.com/net/authors/lalith-polepeddi/relational-databases-for-dummies-fig2.png)
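Splitting out the follow relationships can be sketched with SQLite through Python's `sqlite3` module. This is an illustration under assumptions: the table name `following` matches the text, but its column names are hypothetical, and only two users' lists from the sample are loaded. Each follow relationship becomes its own row, so every cell holds a single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (user, followed user) pair instead of a comma-separated list.
conn.execute(
    "CREATE TABLE following (username TEXT, following_username TEXT, "
    "PRIMARY KEY (username, following_username))"
)

# The packed lists from the original table, keyed by username.
packed = {
    "AndyRyder5": "MichaelDell, Yahoo",
    "Brett_Englebert": "RealSkipBayless, stephenasmith",
}

# Unpack each list into individual rows.
for user, follows in packed.items():
    for followed in follows.split(", "):
        conn.execute("INSERT INTO following VALUES (?, ?)", (user, followed))

# "Who does Andy follow?" is now a plain WHERE clause, not string parsing.
andy_follows = [
    row[0]
    for row in conn.execute(
        "SELECT following_username FROM following "
        "WHERE username = ? ORDER BY following_username",
        ("AndyRyder5",),
    )
]
print(andy_follows)  # ['MichaelDell', 'Yahoo']
```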

### Remove Repetitive Data Across Rows

After fixing the column repetitions, information repeated across rows needs to be removed. In the remaining table, two users have multiple tweets, so their full names and usernames are repeated on each of those rows. It is therefore best to move the tweet information into its own table and the user information into another.

This step of removing data repeated across rows brings the tables into **second normal form** (2NF).

![2NF](https://cdn.tutsplus.com/net/authors/lalith-polepeddi/relational-databases-for-dummies-fig3.png)

> **full_name**: The user's full name \
> **username**: The Twitter handle \
> **text**: The tweet itself \
> **created_at**: The timestamp of the tweet \
> **following_username**: A list of people this user follows, separated by commas. For brevity, I limited the list length to two.
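The split into separate user and tweet tables can be sketched the same way, again assuming SQLite via Python's `sqlite3`. The column layout is a guess rather than a copy of the figure, `username` is assumed to be the link between the tables, and the new name "A. Ryder" is hypothetical. The payoff is that user details live in exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each user's details are stored once, keyed by username...
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, full_name TEXT)")
# ...and each tweet refers back to its author by username.
conn.execute(
    "CREATE TABLE tweets (text TEXT, created_at TEXT, "
    "username TEXT REFERENCES users(username))"
)

conn.execute("INSERT INTO users VALUES (?, ?)", ("AndyRyder5", "Andy Ryder"))
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        ("http://t.co/D3KOJIvF article about Madden 2013 using AI to prodict "
         "the super bowl #databases #bus311",
         "Tue, 12 Feb 2013 05:29:41 +0000", "AndyRyder5"),
        ("http://t.co/rBhBXjma an article about privacy settings and facebook "
         "#databases #bus311",
         "Tue, 12 Feb 2013 05:24:17 +0000", "AndyRyder5"),
    ],
)

# Correcting the user's name touches one row, not one row per tweet.
changed = conn.execute(
    "UPDATE users SET full_name = ? WHERE username = ?",
    ("A. Ryder", "AndyRyder5"),  # hypothetical new name
).rowcount

# A join puts the user details back next to each tweet when needed.
joined = conn.execute(
    "SELECT users.full_name, tweets.created_at FROM tweets "
    "JOIN users ON tweets.username = users.username"
).fetchall()
print(changed)  # 1
print(joined)
```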

### Linking Tables with Keys

The final form of the organised data is now split between 3 new tables:
- following
- tweets
- users

In order to retrieve data, meaningful links must be defined between tables. The way to draw links between tables is to give each row in a table a unique identifier, termed a primary key, and then reference that primary key in the other tables where necessary.

The `users` and `tweets` tables are already linked as a consequence of how they were split: `username` uniquely identifies each user, so it serves as a natural key. Whether a natural key is a good choice depends on whether its values can change (for example, a user may change their Twitter handle). If they can, it may be better to create a separate column of unique IDs.

A common way to generate IDs is to add a numerical, auto-incrementing ID column and use it as the primary key. The final form of this database is below.

![Final Tables](https://cdn.tutsplus.com/net/authors/lalith-polepeddi/relational-databases-for-dummies-fig4.png)
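A minimal sketch of surrogate keys, assuming SQLite via Python's `sqlite3`. It covers only `users` and `tweets`, and the column names (`id`, `user_id`) may differ from the figure. In SQLite an `INTEGER PRIMARY KEY` column is filled in automatically when omitted (MySQL spells this `AUTO_INCREMENT`), and foreign keys are only enforced after `PRAGMA foreign_keys = ON`. The handle change to "AndyRyder" is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,      -- auto-assigned surrogate key
    full_name TEXT NOT NULL,
    username  TEXT NOT NULL UNIQUE
);
CREATE TABLE tweets (
    id         INTEGER PRIMARY KEY,
    text       TEXT NOT NULL,
    created_at TEXT NOT NULL,
    user_id    INTEGER NOT NULL REFERENCES users(id)  -- foreign key
);
""")

cur = conn.execute("INSERT INTO users (full_name, username) VALUES (?, ?)",
                   ("Andy Ryder", "AndyRyder5"))
andy_id = cur.lastrowid  # the id SQLite generated
conn.execute(
    "INSERT INTO tweets (text, created_at, user_id) VALUES (?, ?, ?)",
    ("http://t.co/rBhBXjma an article about privacy settings and facebook "
     "#databases #bus311", "Tue, 12 Feb 2013 05:24:17 +0000", andy_id),
)

# Changing the handle (hypothetical) edits one row; tweets still link by id.
conn.execute("UPDATE users SET username = ? WHERE id = ?", ("AndyRyder", andy_id))

# Retrieve each tweet with its author's current handle by following the key.
rows = conn.execute(
    "SELECT users.username, tweets.created_at "
    "FROM tweets JOIN users ON tweets.user_id = users.id"
).fetchall()
print(rows)  # [('AndyRyder', 'Tue, 12 Feb 2013 05:24:17 +0000')]

rejected = False
try:
    # No user has id 99, so the foreign key check should refuse this row.
    conn.execute("INSERT INTO tweets (text, created_at, user_id) "
                 "VALUES ('x', 'y', 99)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

With a natural key, the same rename would have to be repeated in every tweet row that stored the old handle.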

## Relational Database Management Systems

The example above shows the thought process behind designing a relational database, but how is such a database implemented? An RDBMS is software for creating and interacting with relational databases. Several commercial and open-source options are available:
- Commercial
    - Oracle Database
    - IBM DB2
    - Microsoft SQL Server
- Open Source
    - MySQL
    - SQLite
    - PostgreSQL



### Structured Query Language (SQL)

Once a database is implemented, the standard language for interacting with it is SQL.

SQL statements read much like regular English sentences. Each RDBMS vendor uses a slightly different variant of SQL, called a SQL dialect, but the differences are small enough that someone fluent in one dialect can usually work with another.

## Entity Relationship Diagram (ERD)

An ERD can help you understand, explore, and document a database. It can also help troubleshoot logic or deployment problems, spot inefficiencies, and improve processes. ERDs can also be used to design and model new databases, giving engineers, or anyone who will need to work with the database, an opportunity to identify logic or design flaws before they reach production.

In general, a data professional can use an ERD to:
- document an existing database structure
- debug, troubleshoot, and analyze
- design a new database
- gather design requirements

To better understand how the tables in a relational database are interconnected, we can use an Entity Relationship Diagram. This kind of diagram displays each table as a box and links the boxes together to indicate the kind of relationship between them.

- `ENTITY`: An entity is an object or concept about which you want to store information. Each table is an entity.
- `RELATIONSHIPS`: This shows how two entities share information in the database.
- `ATTRIBUTES`: An attribute is a property of an entity, stored as a column of its table. A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute.
- `CARDINALITY`: This specifies how many instances of one entity can be related to instances of another. It can be one-to-one, one-to-many (or many-to-one), or many-to-many.
> The cardinality between entities can be described by the [Crow's Foot Notation](https://www.vertabelo.com/blog/crow-s-foot-notation/).
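In table terms, a one-to-many relationship is usually a foreign key on the "many" side, and a many-to-many relationship needs a junction table holding pairs of keys. The sketch below assumes SQLite via Python's `sqlite3`; all table names are hypothetical. It then reads the foreign keys back with SQLite's `PRAGMA foreign_key_list`, which lists the same links an ERD would draw (other RDBMSs expose similar metadata, for example through `information_schema`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One-to-many: each employee belongs to one department,
-- and a department can have many employees.
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (
    id            INTEGER PRIMARY KEY,
    name          TEXT,
    department_id INTEGER REFERENCES departments(id)
);

-- Many-to-many: an employee can work on many projects and a project
-- can have many employees, so a junction table stores the pairs.
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee_projects (
    employee_id INTEGER REFERENCES employees(id),
    project_id  INTEGER REFERENCES projects(id),
    PRIMARY KEY (employee_id, project_id)
);
""")

# Collect every foreign key, i.e. every line an ERD would draw.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()]
edges = []
for table in tables:
    for fk in conn.execute(f"PRAGMA foreign_key_list({table})").fetchall():
        # fk columns: id, seq, referenced table, from column, to column, ...
        edges.append((table, fk[3], fk[2], fk[4]))

for edge in edges:
    print("{}.{} -> {}.{}".format(*edge))
```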

# SQL Queries

Structured Query Language is a database programming language that allows a user to create relational databases within a specific RDBMS and efficiently search through them for specific information. Queries can be optimized through query-planning techniques, but these techniques differ in functionality and syntax between RDBMSs.

Example of joining 2 entities in a relational database to query for a specific attribute value:
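```sql
-- Combine rows of entity_1 and entity_2 that share the same id,
-- keep only rows where attribute equals 'value',
-- and sort the result by name_attr in descending order
SELECT name_attr, number_attr FROM entity_1
    JOIN entity_2
    ON entity_1.id = entity_2.id
WHERE attribute = 'value'
ORDER BY name_attr DESC;
```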
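To see the join run end to end, here is a minimal sketch that executes the same query in SQLite via Python's `sqlite3`. The column layout (`name_attr` in `entity_1`; `number_attr` and `attribute` in `entity_2`) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_1 (id INTEGER PRIMARY KEY, name_attr TEXT);
CREATE TABLE entity_2 (id INTEGER PRIMARY KEY, number_attr REAL, attribute TEXT);
INSERT INTO entity_1 VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO entity_2 VALUES (1, 10.5, 'value'), (2, 20.0, 'other'), (3, 30.25, 'value');
""")

# The query from the example above, unchanged.
rows = conn.execute("""
    SELECT name_attr, number_attr FROM entity_1
        JOIN entity_2
        ON entity_1.id = entity_2.id
    WHERE attribute = 'value'
    ORDER BY name_attr DESC
""").fetchall()
print(rows)  # [('gamma', 30.25), ('alpha', 10.5)]
```

Row 2 (`beta`) is joined but then filtered out by the `WHERE` clause, and the remaining rows come back in reverse alphabetical order.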

## Changing Rows with UPDATE and DELETE

A user can update and delete rows using simple SQL commands. However, these commands are destructive and should be used with care: once committed, a mistake cannot be undone, and an `UPDATE` or `DELETE` without a `WHERE` clause affects every row in the table.

Example of updating and deleting values in a database:
```sql
-- Change the content of the row whose row_id is 1
UPDATE entity_1 SET content = 'updated'
    WHERE row_id = 1;

-- Remove the row whose row_id is 2
DELETE FROM entity_2 WHERE row_id = 2;
```
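One way to make these statements safer is to run each one inside a transaction and roll back if it touches an unexpected number of rows. A minimal sketch, assuming SQLite via Python's `sqlite3` (whose connection context manager commits on success and rolls back on an exception); the `change_one_row` helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_1 (row_id INTEGER PRIMARY KEY, content TEXT);
INSERT INTO entity_1 VALUES (1, 'original'), (2, 'original');
""")


def change_one_row(conn, sql, params=()):
    """Run one UPDATE/DELETE; commit only if exactly one row changed."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(sql, params)
        if cur.rowcount != 1:
            raise RuntimeError(f"expected to change 1 row, changed {cur.rowcount}")


change_one_row(conn, "UPDATE entity_1 SET content = 'updated' WHERE row_id = ?", (1,))

try:
    change_one_row(conn, "DELETE FROM entity_1")  # mistake: no WHERE clause
except RuntimeError as err:
    print("rolled back:", err)

print(conn.execute("SELECT row_id, content FROM entity_1 ORDER BY row_id").fetchall())
# [(1, 'updated'), (2, 'original')]
```

The accidental `DELETE` removes both rows inside the transaction, fails the row-count check, and is rolled back, so both rows survive.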

## Changing the Entity

Entities can be altered to take on additional attributes after the entity has been created. Redefining a table after the fact with `CREATE TABLE` can be reckless, because the existing table must first be dropped, which deletes all of its data. It is much safer to use `ALTER TABLE`, which changes the table in place.

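Example of altering and dropping tables:
```sql
-- Add a REAL column to entity_1; existing rows get the default value 10
ALTER TABLE entity_1 ADD COLUMN another_attr REAL DEFAULT 10;

-- Delete entity_3 entirely, including all of its data
DROP TABLE entity_3;
```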
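A minimal sketch of both statements in SQLite via Python's `sqlite3`, showing that `ALTER TABLE` keeps the existing rows while `DROP TABLE` removes the table completely. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_1 (id INTEGER PRIMARY KEY, name_attr TEXT);
INSERT INTO entity_1 VALUES (1, 'alpha'), (2, 'beta');
CREATE TABLE entity_3 (id INTEGER PRIMARY KEY);
""")

# Adding a column keeps the existing rows; they read the default value.
conn.execute("ALTER TABLE entity_1 ADD COLUMN another_attr REAL DEFAULT 10")
rows = conn.execute("SELECT * FROM entity_1 ORDER BY id").fetchall()
print(rows)

# DROP TABLE removes the table definition and all of its rows.
conn.execute("DROP TABLE entity_3")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['entity_1']
```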