
# Table of Contents

1. [Table of Contents](#Table-of-Contents)
2. [SQL](#SQL)
3. [Terminology](#Terminology)
4. [SQL Basics](#SQL-Basics)
5. [SQL commands](#SQL-commands)
    1. [SHOW](#SHOW)
    2. [SELECT](#SELECT)
    3. [DISTINCT](#DISTINCT)
    4. [WHERE](#WHERE)

6. [Exercises](#Exercises)
    1. [Exercise 1 - Simple SELECT](#Exercise-1---Simple-SELECT)
    2. [Exercise 2 - Different values](#Exercise-2---Different-values)
    3. [Exercise 3 - Filtering Selections](#Exercise-3---Filtering-Selections)
    4. [Exercise 4 - Logical Operators](#Exercise-4---Logical-Operators)

# SQL

SQL = Structured Query Language

SQL lets us access and manipulate databases. SQL is an ANSI/ISO standard, but there are different dialects. The major commands are supported by all of them.

We use MySQL Workbench as the GUI. It is a client for MySQL, which is only one of many RDBMSs (Relational Database Management Systems). There are different server types, and I need the client that matches the type of my server.

When working with Workbench we are always connected to one server. One server can hold several databases, and the databases have to match the server type. If we want to work on several databases, they all have to be on the same server. It is possible to connect to different servers, just not through the same connection.

I can write scripts in SQL and load them into Workbench to run. If another database has the same server type, I can reuse these scripts (with tweaks to allow for the other database's structure).

# Terminology

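- A field is the column definition, i.e. the column name, data type, rules etc. (We will call this the column header and use field only for a single row/column combination.)
- A record is a row (a horizontal entity in a table).
- A column is a vertical entity in a table.
- PK stands for primary key and FK for foreign key. In the `Key` column of `SHOW COLUMNS` output, MySQL marks primary key columns with `PRI` and columns that start a non-unique index with `MUL`, which is typically how foreign key columns show up.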

# SQL Basics

SQL keywords are case insensitive. It is convention to write SQL keywords in capital letters, but the system does not require it. Whether table names and string comparisons are case sensitive depends on the server and its settings.

Statements are closed with a `;`. Everything before the `;`, even across several lines, is recognized as one statement. Indentation and line breaks don't matter.

We add single-line comments with `-- comment` (in MySQL the two dashes must be followed by a space) or multiline comments with `/* comment */`.
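
The rules above can be tried out with a few table-free statements. This is only a small sketch for MySQL; the arithmetic and the text literal are placeholders chosen for illustration.

```sql
-- A single-line comment: everything after the two dashes is ignored.
select 1 + 1;   -- lower-case keywords work as well

SELECT
    1 + 1       -- one statement spread over several lines
;               -- the statement only ends here

/* A multiline comment
   can span several lines. */
SELECT 'done';
```

All three statements run without a database being selected, so they are a quick way to check that the connection works.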

# SQL commands 

## SHOW

Very handy, especially when working without a GUI (e.g. with MariaDB in a terminal), as it displays the content in the terminal.

- `SHOW DATABASES;` shows all databases in the system
- `SHOW TABLES;` shows all tables in the active database
- `SHOW TABLES FROM dbs;` shows all tables in the database called dbs
- `SHOW COLUMNS FROM table_name;` shows all columns of the table called table_name
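
A short sketch of how these commands can be chained when exploring an unknown server. It uses `information_schema`, a metadata database that MySQL and MariaDB servers provide, so it should not depend on the course database; picking its `SCHEMATA` table is just an arbitrary choice for illustration.

```sql
-- Which databases exist on this server?
SHOW DATABASES;

-- Which tables does one of them contain?
SHOW TABLES FROM information_schema;

-- Which columns does one of those tables have?
-- The Key column of the output shows PRI / MUL where an index exists.
SHOW COLUMNS FROM information_schema.SCHEMATA;
```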

## SELECT

SELECT retrieves data from a database. The returned data is stored in a temporary result table, the result set. SELECT works row by row.

SELECT requires

- the table we want to get data from
- the columns we want included in the result

Optionally

- more clauses, e.g. conditions, can be added to SELECT
- multiple SELECT statements can be combined (subqueries) to make use of the relations between tables

General Syntax:

    SELECT column1, column2, ..., columnLast
    FROM table_name;

To look at all columns, use `*` to stand for all columns:

    SELECT * FROM table_name;

Note: we can't combine the wildcard with text. `SELECT Protein* FROM proteins` *doesn't* work.

Order of clauses in a SELECT statement:

- SELECT
- DISTINCT / COUNT / AVG etc.
- FROM
- WHERE
- GROUP BY
- HAVING
- ORDER BY
- LIMIT
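
To see the whole order in one statement, here is a minimal sketch for MySQL. The table `demo_proteins` and its rows are made up for this example (they are not part of the course database), and `CREATE TEMPORARY TABLE` needs a default database to be selected.

```sql
-- Hypothetical demo table; names and values are invented for this sketch.
CREATE TEMPORARY TABLE demo_proteins (
    Protein_Name   VARCHAR(50),
    Organism_ID    INT,
    Protein_Length INT
);

INSERT INTO demo_proteins VALUES
    ('A', 1, 1200),
    ('B', 1, 1500),
    ('C', 1,  300),
    ('D', 2, 2000),
    ('E', 3, 1100),
    ('F', 3, 1800);

SELECT Organism_ID, COUNT(*) AS long_proteins  -- SELECT with an aggregate
FROM demo_proteins                             -- FROM names the table
WHERE Protein_Length > 1000                    -- WHERE filters single rows
GROUP BY Organism_ID                           -- GROUP BY builds one group per organism
HAVING COUNT(*) >= 2                           -- HAVING filters whole groups
ORDER BY Organism_ID DESC                      -- ORDER BY sorts the result
LIMIT 1;                                       -- LIMIT caps the number of rows
```

With these rows, organisms 1 and 3 each have two proteins longer than 1000, so the query returns a single row: `Organism_ID` 3 with `long_proteins` 2.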

## DISTINCT

To show only unique values in a column, use DISTINCT.

Syntax:

    SELECT DISTINCT column1 FROM table1;

It also works for distinct combinations of columns:

    SELECT DISTINCT column1, column2 FROM table1;
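
The difference between one column and a combination of columns is easiest to see on a tiny table. The following is a sketch for MySQL with a made-up temporary table `demo_structures` (not part of the course database); it assumes a default database is selected.

```sql
-- Hypothetical demo table with one fully duplicated row.
CREATE TEMPORARY TABLE demo_structures (
    Protein_ID INT,
    Method_ID  INT
);

INSERT INTO demo_structures VALUES (1, 1), (1, 1), (1, 2), (2, 1);

-- Unique values of one column: returns 2 rows (Protein_ID 1 and 2).
SELECT DISTINCT Protein_ID FROM demo_structures;

-- Unique combinations of both columns: returns 3 rows,
-- (1, 1), (1, 2) and (2, 1); only the exact duplicate is dropped.
SELECT DISTINCT Protein_ID, Method_ID FROM demo_structures;
```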

## WHERE

WHERE filters records (rows) according to specific conditions. Multiple conditions can be combined with AND, OR and NOT.

Syntax:

    SELECT column1, column2, ...
    FROM table_name
    WHERE condition;

Operators: 

- Equal `=`
- Greater than `>`
- Less than `<`
- Greater than or equal `>=`
- Less than or equal `<=`
- Not equal `<>` (in some SQL dialects also `!=`)

These operators work for numbers *and strings*. Strings are compared alphabetically, letter by letter, until a mismatch is found. Whether the comparison is case insensitive depends on the collation; with MySQL's default settings it is. Text has to be in `' '`; MySQL also accepts `" "` by default. Numbers can be in quotation marks but don't have to be.
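
A few table-free comparisons illustrate this. It is a sketch for MySQL, where a comparison returns 1 for true and 0 for false; the literals are arbitrary examples.

```sql
SELECT 'apple' < 'banana';   -- 1: 'a' sorts before 'b'
SELECT 'apple' < 'apricot';  -- 1: first mismatch is the third letter, 'p' before 'r'
SELECT '10' < '9';           -- 1: both are text, so '1' is compared with '9'
SELECT 10 < 9;               -- 0: both are numbers
SELECT 'Helix' = 'helix';    -- 1 under a case-insensitive collation (the MySQL default)
```

The third and fourth lines show why it matters whether a value is stored and compared as text or as a number.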

Condition Syntax:

    column_name operator value

Example:

    SELECT * FROM secondary_structure
    WHERE Structure_Name = 'Helix';

Note that the column we filter on in the WHERE clause doesn't have to be displayed (selected with SELECT), but it does have to be part of the table we select FROM.
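
The following sketch for MySQL shows filtering on a column that is not displayed, and how AND, OR and NOT interact. The temporary table `demo_atoms` is made up for this example (masses are rounded and only meant as demo values), and a default database is assumed to be selected.

```sql
-- Hypothetical demo table; not part of the course database.
CREATE TEMPORARY TABLE demo_atoms (
    Atom_Name VARCHAR(10),
    Charge    INT,
    Mass      INT
);

INSERT INTO demo_atoms VALUES
    ('Na',  1,  23),
    ('Cl', -1,  35),
    ('Fe',  2,  56),
    ('I',  -1, 127),
    ('Xe',  0, 131);

-- Mass is used for filtering but not displayed: returns Fe, I, Xe.
SELECT Atom_Name FROM demo_atoms
WHERE Mass > 50;

-- AND binds more tightly than OR, so this reads as
-- Charge > 0 OR (Charge < 0 AND Mass > 100): returns Na, Fe, I.
SELECT Atom_Name FROM demo_atoms
WHERE Charge > 0 OR Charge < 0 AND Mass > 100;

-- Parentheses change the grouping: returns only I.
SELECT Atom_Name FROM demo_atoms
WHERE (Charge > 0 OR Charge < 0) AND Mass > 100;

-- NOT negates a condition: returns Na, Cl, Fe, I.
SELECT Atom_Name FROM demo_atoms
WHERE NOT Charge = 0;
```

When mixing AND and OR, adding parentheses even where they are not strictly needed makes the intended grouping obvious.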

# Exercises

## Exercise 1 - Simple SELECT

Write SELECT statements that return the following data from the *tables* marked in italics:

1. The Name and taxonomic name of all *organisms*
2. All information about *kingdoms*
3. The resolution, the R-free value and the Clashscore for *structural_data*
4. All information about *secondary_structure*

Optional:

5. Charge, Mass and simplified formula for small molecules
6. Weight, Melting Point and IUPAC-Name for non-canonical amino acids

       -- Exercise 1
       SELECT Organism_name, taxonomy FROM organisms;
       SELECT * FROM kingdoms;
       SELECT Resolution, R_Free, Clashscore FROM structural_data;
       SELECT * FROM secondary_structure;
       SELECT Charge, Mass, SMILES FROM atom_information;
       SELECT Molecular_Weight, Melting_Point, IUPAC_Name FROM modification_data;

## Exercise 2 - Different values

Write SELECT statements that fulfill the following criteria:

- For which Kingdom_IDs do you have organisms in the *database*?
- For which Structure_IDs do you have data in *secondary_protein*?
- For which Method_IDs do you have *structures* in your database?
- For which Protein_IDs do you have data in *modifications_proteins*?

Optional:

- For which proteins can you find structures in the database?
- For which hetero atoms do you have IUPAC-Names in the database?
- What are the different maximal repeats you can find for domains in the database?

      -- Exercise 2
      SELECT DISTINCT Kingdom_ID FROM organisms;
      SELECT DISTINCT Structure_ID FROM secondary_protein;
      SELECT DISTINCT Method_ID FROM structures;
      SELECT DISTINCT Protein_ID FROM modifications_proteins;

      -- Optional Exercises
      SELECT DISTINCT Protein_ID FROM structures;
      SELECT DISTINCT Hetero_ID FROM IUPAC_names;
      SELECT DISTINCT Max_Repeats FROM domain_data;

## Exercise 3 - Filtering Selections

Write SELECT statements that fulfill the following criteria:

- All *proteins* with more than 1000 amino acids
- All *structures* with a Source_ID of 1
- All *structural_data* with a Resolution smaller than 2.0
- All *organisms* that have a Kingdom_ID of 1
- All *proteins* with a mass smaller than 25000
- All *proteins* named 'Cytochrome c oxidase subunit 1'

Optional:

- Atoms with a positive charge
- Atoms with a mass between 50 and 150

        -- Exercise 3
        SELECT * FROM proteins
        WHERE Protein_Length > 1000;

        SELECT * FROM structures
        WHERE Source_ID = 1;

        SELECT * FROM structural_data
        WHERE Resolution < 2.0;

        SELECT * FROM organisms
        WHERE Kingdom_ID = 1;

        SELECT * FROM proteins
        WHERE Mass < 25000;

        SELECT * FROM proteins
        WHERE Protein_Name = 'Cytochrome c oxidase subunit 1';

        -- Optional Exercises
        SELECT * FROM atom_information
        WHERE Charge > 0;

        SELECT * FROM atom_information
        WHERE Mass >= 50 AND Mass <= 150;

## Exercise 4 - Logical Operators

Write SELECT statements that fulfill the following criteria:

- All *proteins* with more than 1000 amino acids and a Mass greater than 100000
- All *structural_data* with a Resolution less than 2.0 or an R-free value smaller than 0.25
- All *proteins* where the Organism_ID is not 4
- All *organisms* with a Kingdom_ID of 1 or 2
- All *proteins* with an Organism_ID of 3 or 28

Optional:

- Atoms with a positive charge, a mass higher than 100 and a CHEBI-ID over 20000
- Structures generated with the Methods 1, 2 or 3
- Modifications with a mass over 125, more than 4 Hydrogenbond-donors and -acceptors

```
-- Exercise 4
SELECT * FROM proteins
WHERE Protein_Length > 1000 AND Mass > 100000;

SELECT * FROM structural_data
WHERE Resolution < 2.0 OR R_Free < 0.25;

SELECT * FROM proteins
WHERE NOT Organism_ID = 4;

SELECT * FROM organisms
WHERE Kingdom_ID = 1 OR Kingdom_ID = 2;

SELECT * FROM proteins
WHERE Organism_ID = 3 OR Organism_ID = 28;


-- Optional Exercises
SELECT * FROM atom_information
WHERE Charge > 0 AND Mass > 100 AND CHEBI_ID > 20000;

SELECT * FROM structures
WHERE Method_ID = 1 OR Method_ID = 2 OR Method_ID = 3;

SELECT * FROM modification_data
WHERE Molecular_Weight > 125 AND Hydrogenbond_acceptors > 4 AND Hydrogenbond_donors > 4;
```


