# IBM Data Science | Databases and SQL - SQL basics

Created by: Sangwook Cheon

Date: June 3, 2019 ~ June 17, 2019

Supplementary notes for the Databases and SQL course provided by IBM, written to accompany the course's Jupyter Notebooks. These notes were taken while watching the lecture videos, to catch anything the notebooks might not include.

### Table of Contents:

- Introduction to SQL and Databases
- Basic SQL Commands
  - Create Table
  - Select
  - Count, Distinct, Limit
  - Insert
  - Update, Delete
- Supplementary Relational Database Concepts


## Introduction to SQL and Databases

**SQL** (Structured Query Language) is a language used to query data in relational databases.

- A relational database stores data in tabular form, as tables made of columns and rows.
- An RDBMS (Relational Database Management System) is a software tool for storing, organizing, and managing data in such tables. Note that not all data is tabular.

**Basic SQL Commands**: Create, Insert, Select, Update, Delete

**Cloud Database**: A database service built and accessed through a Cloud platform. It serves many of the same functions as a traditional database, with the added flexibility of **Cloud computing**. Users install software on a Cloud infrastructure to implement the database. Advantages include:

- Ease of use: users can access Cloud databases from virtually anywhere using a vendor's API or web interface.
- Scalability: Cloud databases can expand their storage capacity at runtime to accommodate changing needs, and organizations pay only for what they use.
- Disaster recovery: in the event of a natural disaster, equipment failure, or power outage, data is kept safe through backups on remote servers.

Services such as IBM Db2 or AWS can be used (I would personally go for AWS).



## Basic SQL Commands

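DDL (Data Definition Language) statements: Define, change, or drop database objects such as tables (for example, CREATE, ALTER, DROP).

DML (Data Manipulation Language) statements: Read and modify data (for example, SELECT, INSERT, UPDATE, DELETE).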

### Create Table Statement

General Syntax:

```sql
create table TABLENAME (
    COLUMN1 datatype,
    COLUMN2 datatype,
    COLUMN3 datatype,
        ...
    ) ;
```

To walk through an example:

```sql
drop table COUNTRY;
create table COUNTRY (
    ID int NOT NULL,
    CCODE char(2),
    NAME varchar(60),
    PRIMARY KEY (ID)
    );
```

In the example above, we create a table called COUNTRY. We first drop any existing table with the same name. Depending on the database, this statement may raise an error if the table does not exist yet. The ID column has the "NOT NULL" constraint after its datatype, meaning it cannot contain a NULL value. ID is also the table's "PRIMARY KEY," which by definition cannot contain NULL values. A primary key uniquely identifies each row in a table. Databases typically index it, so lookups by primary key are fast.

CCODE is a two-letter country code column, and NAME is a variable-length country name column (up to 60 characters).
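To see the constraints in action, here is a minimal sketch that runs the COUNTRY example through Python's built-in `sqlite3` module with an in-memory database. SQLite is an assumption here, standing in for the Db2 instance used in the course. It accepts `drop table if exists`, which avoids the missing-table error. The helper `try_insert` is hypothetical and written only for this illustration.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Drop the table only if it already exists, then create it
cur.execute("drop table if exists COUNTRY")
cur.execute("""
    create table COUNTRY (
        ID int NOT NULL,
        CCODE char(2),
        NAME varchar(60),
        PRIMARY KEY (ID)
    )
""")
cur.execute("insert into COUNTRY values (1, 'CA', 'Canada')")

def try_insert(sql):
    """Hypothetical helper: run an insert, return the constraint error text or None."""
    try:
        cur.execute(sql)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

# No ID supplied -> violates NOT NULL
null_id_error = try_insert("insert into COUNTRY (CCODE, NAME) values ('US', 'United States')")
# ID 1 already used -> violates the primary key's uniqueness
duplicate_id_error = try_insert("insert into COUNTRY values (1, 'MX', 'Mexico')")

country_count = cur.execute("select COUNT(*) from COUNTRY").fetchone()[0]
```

Both rejected inserts leave the table with its single valid row.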



### Select Statement

Use the SELECT statement to read data. It is a DML statement.

Select Statement: Query

Result from the query: Result set/table

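**Comparison operators:** SQL supports the comparison operators familiar from Python (```<```, ```>```, ```<=```, ```>=```), with two differences:

- Equality is written ```=```, not ```==```.
- Not equal is written ```<>```, not ```!=``` (```<>``` is the standard form, although many databases also accept ```!=```).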
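The two differences can be checked with a small sketch using Python's `sqlite3` module and an in-memory table. SQLite is an assumption and stands in for Db2. The three sample countries are illustrative only. SQLite happens to accept both `<>` and `!=`.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
conn.execute("create table COUNTRY (ID int NOT NULL, CCODE char(2), NAME varchar(60), PRIMARY KEY (ID))")
conn.executemany(
    "insert into COUNTRY values (?, ?, ?)",
    [(1, "CA", "Canada"), (2, "US", "United States"), (3, "MX", "Mexico")],
)

# Equality is a single "=" in SQL (Python uses "==")
equal_rows = conn.execute("select NAME from COUNTRY where CCODE = 'US'").fetchall()

# Standard SQL "not equal" is <>; SQLite also accepts !=
not_equal_rows = conn.execute("select NAME from COUNTRY where CCODE <> 'US' order by ID").fetchall()
also_not_equal = conn.execute("select NAME from COUNTRY where CCODE != 'US' order by ID").fetchall()

# <, >, <=, >= are written the same way as in Python
ids_from_2 = conn.execute("select ID from COUNTRY where ID >= 2 order by ID").fetchall()
```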

General Syntax for Select Statement:

```sql
select COLUMN1, COLUMN2, ... from TABLE1 ;
```

To retrieve a list of all country names and their IDs from the COUNTRY table:

```sql
select ID, NAME from COUNTRY ;
```

To retrieve all columns from the COUNTRY table we could use "*" instead of specifying individual column names:

```sql
select * from COUNTRY ;
```

The ```where``` clause can be added to the query to filter results or get specific rows of data. To retrieve data for all rows in the COUNTRY table where the ID is less than 5:

```sql
select * from COUNTRY where ID < 5 ;
```

For character-based columns, literal values in the ```where``` clause predicate must be enclosed in **single quotes**. To retrieve the data for the country with country code "CA":

```sql
select * from COUNTRY where CCODE = 'CA'; 
```
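Here is a runnable sketch of the SELECT queries above, assuming Python's `sqlite3` module and an in-memory COUNTRY table with illustrative rows. `order by ID` is added only so the result order is predictable. The last query shows a `?` placeholder from Python's DB-API, which lets the driver handle quoting of the string value.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
conn.execute("create table COUNTRY (ID int NOT NULL, CCODE char(2), NAME varchar(60), PRIMARY KEY (ID))")
conn.executemany(
    "insert into COUNTRY values (?, ?, ?)",
    [(1, "CA", "Canada"), (2, "US", "United States"), (7, "FR", "France")],
)

# Specific columns
id_and_name = conn.execute("select ID, NAME from COUNTRY order by ID").fetchall()

# All columns with *
all_columns = conn.execute("select * from COUNTRY order by ID").fetchall()

# Numeric predicate: no quotes needed
low_ids = conn.execute("select * from COUNTRY where ID < 5 order by ID").fetchall()

# Character predicate: the literal goes in single quotes
canada = conn.execute("select * from COUNTRY where CCODE = 'CA'").fetchall()

# From Python, a ? placeholder passes the value without manual quoting
canada_param = conn.execute("select * from COUNTRY where CCODE = ?", ("CA",)).fetchall()
```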



### Count, Distinct, Limit

**Count:** A built-in function that returns the number of rows matching the query criteria.

Number of rows in a table:

```sql
select COUNT(*) from tablename
```

Number of rows where a column matches a certain criterion (a number, string, etc.):

```sql
select COUNT(column_name) from table_name where column_name = criterion
```
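The following sketch shows both COUNT forms, assuming Python's `sqlite3` module and a MEDALS table with COUNTRY, MEDAL_TYPE, and YEAR columns. The column types and sample rows are hypothetical. It also shows a standard SQL behavior: `COUNT(column)` skips NULLs in that column, while `COUNT(*)` counts every row.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
conn.execute("create table MEDALS (COUNTRY varchar(60), MEDAL_TYPE varchar(10), YEAR int)")
# Hypothetical sample rows for illustration only
conn.executemany(
    "insert into MEDALS values (?, ?, ?)",
    [
        ("Norway", "gold", 2018),
        ("Norway", "silver", 2018),
        ("Germany", "gold", 2018),
        ("Canada", None, 2014),
    ],
)

# COUNT(*) counts every row in the table
total_rows = conn.execute("select COUNT(*) from MEDALS").fetchone()[0]

# COUNT(column) with a WHERE criterion counts the matching rows
gold_rows = conn.execute(
    "select COUNT(MEDAL_TYPE) from MEDALS where MEDAL_TYPE = 'gold'"
).fetchone()[0]

# COUNT(column) ignores NULLs in that column, so it can be smaller than COUNT(*)
non_null_types = conn.execute("select COUNT(MEDAL_TYPE) from MEDALS").fetchone()[0]
```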

**Distinct:** Used to remove duplicates from a result set.

Retrieve the unique values in a column, optionally restricted to rows that match a criterion:

```sql
select DISTINCT column_name from table_name where column_name = criterion

select DISTINCT COUNTRY from MEDALS where MEDAL_TYPE = 'gold'
```

> The column in the ```where``` clause does not have to be the one passed to ```DISTINCT```. You can filter on one column and retrieve unique values from another, as in the MEDALS example above.
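This sketch reproduces the MEDALS DISTINCT example with Python's `sqlite3` module. The sample rows are hypothetical. Without DISTINCT, a country appears once per matching row. With DISTINCT, each country appears once, even though the filter is on MEDAL_TYPE rather than COUNTRY. `COUNT(DISTINCT ...)` counts those unique values.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
conn.execute("create table MEDALS (COUNTRY varchar(60), MEDAL_TYPE varchar(10), YEAR int)")
# Hypothetical sample rows for illustration only
conn.executemany(
    "insert into MEDALS values (?, ?, ?)",
    [
        ("Norway", "gold", 2018),
        ("Norway", "gold", 2018),
        ("Germany", "gold", 2018),
        ("Canada", "silver", 2018),
    ],
)

# Without DISTINCT: one result row per matching table row
all_gold = conn.execute(
    "select COUNTRY from MEDALS where MEDAL_TYPE = 'gold' order by COUNTRY"
).fetchall()

# With DISTINCT: duplicates removed; filter column differs from selected column
unique_gold = conn.execute(
    "select DISTINCT COUNTRY from MEDALS where MEDAL_TYPE = 'gold' order by COUNTRY"
).fetchall()

# DISTINCT inside COUNT counts unique values
unique_gold_count = conn.execute(
    "select COUNT(DISTINCT COUNTRY) from MEDALS where MEDAL_TYPE = 'gold'"
).fetchone()[0]
```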

**Limit:** Used to restrict the number of rows retrieved from the database. The syntax varies by database; Db2, for example, also supports ```FETCH FIRST n ROWS ONLY```.

Retrieve only 10 rows from a table:

```sql
select * from table_name LIMIT 10
```

Example: retrieve 5 rows from the MEDALS table for the year 2018:

```sql
select * from MEDALS where YEAR = 2018 LIMIT 5
```
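Here is a small sketch of LIMIT, assuming Python's `sqlite3` module and hypothetical MEDALS rows generated in a loop (8 for 2018 and 3 for 2014). LIMIT caps the number of rows returned. Without ORDER BY, which rows you get is up to the database, so `order by COUNTRY` is added to make the 2018 query repeatable.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
conn.execute("create table MEDALS (COUNTRY varchar(60), MEDAL_TYPE varchar(10), YEAR int)")
# Hypothetical rows: 8 for 2018, 3 for 2014
sample = [(f"Country{i}", "gold", 2018) for i in range(8)]
sample += [(f"Country{i}", "silver", 2014) for i in range(3)]
conn.executemany("insert into MEDALS values (?, ?, ?)", sample)

# At most 10 rows from the whole table (11 rows exist, so exactly 10 come back)
first_ten = conn.execute("select * from MEDALS LIMIT 10").fetchall()

# At most 5 rows for 2018; ORDER BY makes the choice of rows deterministic
five_2018 = conn.execute(
    "select * from MEDALS where YEAR = 2018 order by COUNTRY LIMIT 5"
).fetchall()
```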



### Insert Statement

Use the INSERT statement to add rows to a table after it has been created. This is a DML statement for manipulating data.

```sql
INSERT INTO table_name (column_name1, column_name2)
VALUES (value1, value2), (value1, value2);
```

A single INSERT can add one row or several rows at once. Each parenthesized list of values after VALUES is one row.
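Here is a minimal sketch of single-row and multi-row INSERT, using Python's `sqlite3` module and the COUNTRY table from earlier. The values are illustrative. The last insert lists only some columns, so the omitted CCODE becomes NULL because no default is defined.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
conn.execute("create table COUNTRY (ID int NOT NULL, CCODE char(2), NAME varchar(60), PRIMARY KEY (ID))")

# One row
conn.execute("INSERT INTO COUNTRY (ID, CCODE, NAME) VALUES (1, 'CA', 'Canada')")

# Several rows in one statement: one parenthesized tuple per row
conn.execute(
    "INSERT INTO COUNTRY (ID, CCODE, NAME) VALUES (2, 'US', 'United States'), (3, 'MX', 'Mexico')"
)

# Columns left out of the column list get NULL (no default defined here)
conn.execute("INSERT INTO COUNTRY (ID, NAME) VALUES (4, 'Unknown')")

conn.commit()
rows = conn.execute("select * from COUNTRY order by ID").fetchall()
```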



### Update and Delete Statements

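**Update:** Existing rows can be modified using the ```UPDATE``` statement. This is a DML statement. Separate multiple assignments with commas, and use a ```WHERE``` clause to choose which rows change. Without one, every row in the table is updated.

```sql
UPDATE table_name
SET column1 = value1,
    column2 = value2
WHERE ID = value;
```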
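Here is a sketch of the corrected UPDATE syntax, assuming Python's `sqlite3` module and an illustrative COUNTRY table. Two columns of one row are changed. `cursor.rowcount` reports how many rows the statement modified, which is a quick check that the WHERE clause matched what you expected.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
conn.execute("create table COUNTRY (ID int NOT NULL, CCODE char(2), NAME varchar(60), PRIMARY KEY (ID))")
conn.executemany(
    "insert into COUNTRY values (?, ?, ?)",
    [(1, "CA", "Canada"), (2, "XX", "Placeholder")],
)

# Change two columns of the row with ID 2; commas separate the assignments
cur = conn.execute(
    """
    UPDATE COUNTRY
    SET CCODE = 'US',
        NAME = 'United States'
    WHERE ID = 2
    """
)
updated_count = cur.rowcount  # number of rows the UPDATE modified

conn.commit()
rows = conn.execute("select * from COUNTRY order by ID").fetchall()
```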

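**Delete:** Remove one or more rows. The ```WHERE``` clause selects the rows to delete (here, rows whose column value appears in a list). Without it, every row is deleted.

```sql
DELETE FROM table_name WHERE column_name IN (value1, value2);
```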
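Here is a sketch of DELETE with an IN list, assuming Python's `sqlite3` module and illustrative COUNTRY rows. Rows whose CCODE is in the list are removed. `cursor.rowcount` shows how many were deleted.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
conn.execute("create table COUNTRY (ID int NOT NULL, CCODE char(2), NAME varchar(60), PRIMARY KEY (ID))")
conn.executemany(
    "insert into COUNTRY values (?, ?, ?)",
    [(1, "CA", "Canada"), (2, "US", "United States"), (3, "MX", "Mexico"), (4, "FR", "France")],
)

# Delete every row whose CCODE appears in the list
cur = conn.execute("DELETE FROM COUNTRY WHERE CCODE IN ('US', 'MX')")
deleted_count = cur.rowcount  # number of rows removed

conn.commit()
remaining = conn.execute("select * from COUNTRY order by ID").fetchall()
```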





## Supplementary Relational Database Concepts

- Building blocks of a relationship:
  - Entity sets - represented by **rectangles**
  - Relationship sets - represented by **diamonds and connecting lines**
  - Crow's foot notation - represented by **less-than, greater-than, and vertical-bar symbols**



#### Entity Relationship Diagram (ERD)

An entity is drawn as a rectangle.

Attributes are drawn as ovals. They are properties of an entity, such as the title, author, and year of the entity 'book'. Each attribute is connected to exactly one entity.

Crow's foot notation is used to represent relationships between entities.

![image-20190617121044151](https://github.com/SangwookCheon/learning-python-data-science/blob/master/Images%26Resources/Screen%20Shot%202019-06-17%20at%2012.10.39.png)

![image-20190617121117289](https://github.com/SangwookCheon/learning-python-data-science/blob/master/Images%26Resources/Screen%20Shot%202019-06-17%20at%2012.11.11.png)

![image-20190617121151356](https://github.com/SangwookCheon/learning-python-data-science/blob/master/Images%26Resources/Screen%20Shot%202019-06-17%20at%2012.11.45.png)



#### Mapping Entities to Tables

Each entity becomes a table, and its attributes become the table's columns.
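As an illustration of this mapping, here is a sketch that turns the 'book' entity and its title, author, and year attributes into a BOOK table. It uses Python's `sqlite3` module. The table name, column names, and column types are assumptions. `PRAGMA table_info` is SQLite-specific and is used only to read the resulting columns back.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")

# Entity "book" -> table BOOK; attributes title, author, year -> columns
# (names and types are hypothetical, chosen for illustration)
conn.execute(
    """
    create table BOOK (
        TITLE varchar(100),
        AUTHOR varchar(60),
        YEAR int
    )
    """
)

# SQLite-specific: each PRAGMA row is (cid, name, type, notnull, dflt_value, pk)
columns = [row[1] for row in conn.execute("PRAGMA table_info(BOOK)")]
```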

#### Relational Model Concepts

**A Relation** is made up of two parts: Relational Schema and Relational Instance.

- Relational Schema: specifies the name of a relation and the name and type of each column (attribute)
- Relational Instance: a table made up of attributes (columns) and tuples (rows)
  - Column = Attribute = Field
  - Row = Tuple
- Degree: Number of attributes in a relation.
- Cardinality: Number of rows (or tuples)
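To make degree and cardinality concrete, here is a sketch on the COUNTRY table, assuming Python's `sqlite3` module and three illustrative rows. The schema (column names and types) is read with the SQLite-specific `PRAGMA table_info`. Degree is the number of columns, and cardinality is the row count.

```python
import sqlite3

# In-memory SQLite database (assumption: stands in for the course's Db2)
conn = sqlite3.connect(":memory:")
# Relational schema: relation name COUNTRY plus each column's name and type
conn.execute("create table COUNTRY (ID int NOT NULL, CCODE char(2), NAME varchar(60), PRIMARY KEY (ID))")
# Relational instance: the tuples (rows) currently stored
conn.executemany(
    "insert into COUNTRY values (?, ?, ?)",
    [(1, "CA", "Canada"), (2, "US", "United States"), (3, "MX", "Mexico")],
)

# SQLite-specific: each PRAGMA row is (cid, name, type, notnull, dflt_value, pk)
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(COUNTRY)")]

degree = len(schema)  # number of attributes (columns)
cardinality = conn.execute("select COUNT(*) from COUNTRY").fetchone()[0]  # number of tuples (rows)
```

Degree stays fixed unless the schema changes, while cardinality changes with every INSERT or DELETE.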