# Structured Query Language: SQL

## SQL Select Statement

Viewing table data is a core part of interacting with a database.
This is accomplished with the `SELECT` command.

The general form of a `SELECT` statement is as follows:

```SQL
SELECT <tuple_projection_definition>
FROM <table_expressions>
WHERE <tuple_restrictions>
```

**NOTE:** The result of a `SELECT` statement is a _table_, also known as a relation.
The result table has structured columns, inherited from the source table definition or from the functions/transforms applied to it.
Every row of the result table has the same structure, as is customary for any table in the relational model.
For instance, if row one has 10 positional columns, all subsequent rows will have the same number of positional columns.



The parts are defined as follows:
 * **tuple_projection_definition** : a list of columns, functions over columns, or other column expressions.
 * **table_expressions** : a list of tables and table expressions that provide source data for selected columns.
 * **tuple_restrictions**  : a boolean expression that is evaluated for every row composited from the _table_expressions_. Only the rows for which it evaluates to TRUE are returned in the result table.
 
Additionally, SQL supports the aggregation of data:

```SQL
SELECT <tuple_projection_definition>, <aggregate_projection>
FROM <table_expressions>
WHERE <tuple_restrictions>
GROUP BY <tuple_projection_definition>
HAVING <restriction_on_aggregate_projection>
```
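
As a minimal sketch of how the aggregate form fits together, the Python snippet below loads only the `rad` rows of the `Survey` table (shown in the next section) into an in-memory SQLite database and groups them by person. Using SQLite through Python's `sqlite3` module is an assumption made to keep the example self-contained; the notebook itself connects to PostgreSQL, where the same query should behave the same way. The `ORDER BY` clause is included only to make the output order predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the course's PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Survey (taken INTEGER, person TEXT, quant TEXT, reading REAL)")

# Only the 'rad' rows of the Survey table are loaded here.
rad_rows = [
    (619, "dyer", "rad", 9.82), (622, "dyer", "rad", 7.8),
    (734, "pb", "rad", 8.41), (735, "pb", "rad", 7.22),
    (751, "pb", "rad", 4.35), (752, "lake", "rad", 2.19),
    (837, "lake", "rad", 1.46), (844, "roe", "rad", 11.25),
]
conn.executemany("INSERT INTO Survey VALUES (?, ?, ?, ?)", rad_rows)

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
query = """
SELECT person, COUNT(*) AS n_readings, MAX(reading) AS max_reading
FROM Survey
WHERE quant = 'rad'
GROUP BY person
HAVING COUNT(*) >= 2
ORDER BY person;
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Here `roe` is dropped by the `HAVING` clause because that group contains a single `rad` reading.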


## Example : Survey Database

The survey database comes from the open-source Software Carpentry SQL lessons: [External Link](http://swcarpentry.github.io/sql-novice-survey/)

<div class="row">
  <div class="col-md-6">

   <p>  
   <strong>Person</strong>: people who took readings.</p>

   <table>
      <thead>
        <tr>
          <th>id</th>
          <th>personal</th>
          <th>family</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>dyer</td>
          <td>William</td>
          <td>Dyer</td>
        </tr>
        <tr>
          <td>pb</td>
          <td>Frank</td>
          <td>Pabodie</td>
        </tr>
        <tr>
          <td>lake</td>
          <td>Anderson</td>
          <td>Lake</td>
        </tr>
        <tr>
          <td>roe</td>
          <td>Valentina</td>
          <td>Roerich</td>
        </tr>
        <tr>
          <td>danforth</td>
          <td>Frank</td>
          <td>Danforth</td>
        </tr>
      </tbody>
    </table>

    
   <p><strong>Site</strong>: locations where readings were taken.</p>

   <table>
      <thead>
        <tr>
          <th>name</th>
          <th>lat</th>
          <th>long</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>DR-1</td>
          <td>-49.85</td>
          <td>-128.57</td>
        </tr>
        <tr>
          <td>DR-3</td>
          <td>-47.15</td>
          <td>-126.72</td>
        </tr>
        <tr>
          <td>MSK-4</td>
          <td>-48.87</td>
          <td>-123.4</td>
        </tr>
      </tbody>
    </table>

   <p><strong>Visited</strong>: when readings were taken at specific sites.</p>

   <table>
      <thead>
        <tr>
          <th>id</th>
          <th>site</th>
          <th>dated</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>619</td>
          <td>DR-1</td>
          <td>1927-02-08</td>
        </tr>
        <tr>
          <td>622</td>
          <td>DR-1</td>
          <td>1927-02-10</td>
        </tr>
        <tr>
          <td>734</td>
          <td>DR-3</td>
          <td>1930-01-07</td>
        </tr>
        <tr>
          <td>735</td>
          <td>DR-3</td>
          <td>1930-01-12</td>
        </tr>
        <tr>
          <td>751</td>
          <td>DR-3</td>
          <td>1930-02-26</td>
        </tr>
        <tr>
          <td>752</td>
          <td>DR-3</td>
          <td>-null-</td>
        </tr>
        <tr>
          <td>837</td>
          <td>MSK-4</td>
          <td>1932-01-14</td>
        </tr>
        <tr>
          <td>844</td>
          <td>DR-1</td>
          <td>1932-03-22</td>
        </tr>
      </tbody>
    </table>

  </div>
  <div class="col-md-6">

   <p><strong>Survey</strong>: the actual readings.</p>

   <table>
      <thead>
        <tr>
          <th>taken</th>
          <th>person</th>
          <th>quant</th>
          <th>reading</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>619</td>
          <td>dyer</td>
          <td>rad</td>
          <td>9.82</td>
        </tr>
        <tr>
          <td>619</td>
          <td>dyer</td>
          <td>sal</td>
          <td>0.13</td>
        </tr>
        <tr>
          <td>622</td>
          <td>dyer</td>
          <td>rad</td>
          <td>7.8</td>
        </tr>
        <tr>
          <td>622</td>
          <td>dyer</td>
          <td>sal</td>
          <td>0.09</td>
        </tr>
        <tr>
          <td>734</td>
          <td>pb</td>
          <td>rad</td>
          <td>8.41</td>
        </tr>
        <tr>
          <td>734</td>
          <td>lake</td>
          <td>sal</td>
          <td>0.05</td>
        </tr>
        <tr>
          <td>734</td>
          <td>pb</td>
          <td>temp</td>
          <td>-21.5</td>
        </tr>
        <tr>
          <td>735</td>
          <td>pb</td>
          <td>rad</td>
          <td>7.22</td>
        </tr>
        <tr>
          <td>735</td>
          <td>-null-</td>
          <td>sal</td>
          <td>0.06</td>
        </tr>
        <tr>
          <td>735</td>
          <td>-null-</td>
          <td>temp</td>
          <td>-26.0</td>
        </tr>
        <tr>
          <td>751</td>
          <td>pb</td>
          <td>rad</td>
          <td>4.35</td>
        </tr>
        <tr>
          <td>751</td>
          <td>pb</td>
          <td>temp</td>
          <td>-18.5</td>
        </tr>
        <tr>
          <td>751</td>
          <td>lake</td>
          <td>sal</td>
          <td>0.1</td>
        </tr>
        <tr>
          <td>752</td>
          <td>lake</td>
          <td>rad</td>
          <td>2.19</td>
        </tr>
        <tr>
          <td>752</td>
          <td>lake</td>
          <td>sal</td>
          <td>0.09</td>
        </tr>
        <tr>
          <td>752</td>
          <td>lake</td>
          <td>temp</td>
          <td>-16.0</td>
        </tr>
        <tr>
          <td>752</td>
          <td>roe</td>
          <td>sal</td>
          <td>41.6</td>
        </tr>
        <tr>
          <td>837</td>
          <td>lake</td>
          <td>rad</td>
          <td>1.46</td>
        </tr>
        <tr>
          <td>837</td>
          <td>lake</td>
          <td>sal</td>
          <td>0.21</td>
        </tr>
        <tr>
          <td>837</td>
          <td>roe</td>
          <td>sal</td>
          <td>22.5</td>
        </tr>
        <tr>
          <td>844</td>
          <td>roe</td>
          <td>rad</td>
          <td>11.25</td>
        </tr>
      </tbody>
    </table>

  </div>
</div>

Notice that three entries (one in the `Visited` table
and two in the `Survey` table) don't contain any actual
data, but instead hold the special marker `-null-`, which indicates a missing value.
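
Missing values need their own test in a `WHERE` clause. The sketch below assumes that `-null-` corresponds to SQL `NULL` and uses an in-memory SQLite database (via Python's `sqlite3`, an assumption made so the example runs on its own) loaded with the `Visited` rows above. It compares `= NULL` with `IS NULL`.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the course's PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visited (id INTEGER, site TEXT, dated TEXT)")

visited_rows = [
    (619, "DR-1", "1927-02-08"), (622, "DR-1", "1927-02-10"),
    (734, "DR-3", "1930-01-07"), (735, "DR-3", "1930-01-12"),
    (751, "DR-3", "1930-02-26"), (752, "DR-3", None),  # None is stored as NULL
    (837, "MSK-4", "1932-01-14"), (844, "DR-1", "1932-03-22"),
]
conn.executemany("INSERT INTO Visited VALUES (?, ?, ?)", visited_rows)

# A comparison with NULL is never TRUE, so this query matches no rows.
equals_null = conn.execute("SELECT id FROM Visited WHERE dated = NULL;").fetchall()

# IS NULL is the dedicated test for missing values.
is_null = conn.execute("SELECT id FROM Visited WHERE dated IS NULL;").fetchall()

print(equals_null)
print(is_null)
```

Only the `IS NULL` form finds visit 752; the `= NULL` form returns an empty result.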


## Example : Queries

### Interpretation Legend

 * Selected Columns are marked in <span style="background:pink">PINK</span>    

 * Constrained Rows are marked in <span style="background:lightblue">LIGHT BLUE</span>    

 * Result Cells are marked in <span style="background:yellow">YELLOW</span>    


In [1]:
%load_ext sql

In [2]:
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dsa_ro

'Connected: dsa_ro_user@dsa_ro'

### Constraining Rows : Where

```SQL
SELECT id FROM Person WHERE personal = 'Anderson';
```

   <table>
      <thead>
        <tr>
          <th style="background:pink">id</th>
          <th>personal</th>
          <th>family</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td style="background:pink">dyer</td>
          <td>William</td>
          <td>Dyer</td>
        </tr>
        <tr>
          <td style="background:pink">pb</td>
          <td>Frank</td>
          <td>Pabodie</td>
        </tr>
        <tr>
          <td style="background:yellow">lake</td>
          <td style="background:lightblue">Anderson</td>
          <td style="background:lightblue">Lake</td>
        </tr>
        <tr>
          <td style="background:pink">roe</td>
          <td>Valentina</td>
          <td>Roerich</td>
        </tr>
        <tr>
          <td style="background:pink">danforth</td>
          <td>Frank</td>
          <td>Danforth</td>
        </tr>
      </tbody>
    </table>
    

In [3]:
%sql SELECT id FROM Person WHERE personal = 'Anderson';

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
1 rows affected.


id
lake


### Constraining Rows : Where, with Multiple conditions

**WHERE ... AND**
```SQL
SELECT taken, person FROM Survey WHERE quant = 'rad' AND  reading = 9.82;
```

<table>
  <thead>
    <tr>
      <th>taken</th>
      <th>person</th>
      <th>quant</th>
      <th>reading</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="background:yellow">619</td>
      <td style="background:yellow">dyer</td>
      <td style="background:lightblue">rad</td>
      <td style="background:lightblue">9.82</td>
    </tr>
    <tr>
      <td>619</td>
      <td>dyer</td>
      <td>sal</td>
      <td>0.13</td>
    </tr>
    <tr>
      <td>622</td>
      <td>dyer</td>
      <td>rad</td>
      <td>7.8</td>
    </tr>
    <tr>
      <td>622</td>
      <td>dyer</td>
      <td>sal</td>
      <td>0.09</td>
    </tr>
    <tr>
      <td>734</td>
      <td>pb</td>
      <td>rad</td>
      <td>8.41</td>
    </tr>
    <tr>
      <td>734</td>
      <td>lake</td>
      <td>sal</td>
      <td>0.05</td>
    </tr>
    <tr>
      <td>734</td>
      <td>pb</td>
      <td>temp</td>
      <td>-21.5</td>
    </tr>
    <tr>
      <td>735</td>
      <td>pb</td>
      <td>rad</td>
      <td>7.22</td>
    </tr>
    <tr>
      <td>735</td>
      <td>-null-</td>
      <td>sal</td>
      <td>0.06</td>
    </tr>
    <tr>
      <td>735</td>
      <td>-null-</td>
      <td>temp</td>
      <td>-26.0</td>
    </tr>
    <tr>
      <td>751</td>
      <td>pb</td>
      <td>rad</td>
      <td>4.35</td>
    </tr>
    <tr>
      <td>751</td>
      <td>pb</td>
      <td>temp</td>
      <td>-18.5</td>
    </tr>
    <tr>
      <td>751</td>
      <td>lake</td>
      <td>sal</td>
      <td>0.1</td>
    </tr>
    <tr>
      <td>752</td>
      <td>lake</td>
      <td>rad</td>
      <td>2.19</td>
    </tr>
    <tr>
      <td>752</td>
      <td>lake</td>
      <td>sal</td>
      <td>0.09</td>
    </tr>
    <tr>
      <td>752</td>
      <td>lake</td>
      <td>temp</td>
      <td>-16.0</td>
    </tr>
    <tr>
      <td>752</td>
      <td>roe</td>
      <td>sal</td>
      <td>41.6</td>
    </tr>
    <tr>
      <td>837</td>
      <td>lake</td>
      <td>rad</td>
      <td>1.46</td>
    </tr>
    <tr>
      <td>837</td>
      <td>lake</td>
      <td>sal</td>
      <td>0.21</td>
    </tr>
    <tr>
      <td>837</td>
      <td>roe</td>
      <td>sal</td>
      <td>22.5</td>
    </tr>
    <tr>
      <td>844</td>
      <td>roe</td>
      <td>rad</td>
      <td>11.25</td>
    </tr>
  </tbody>
</table>




In [4]:
%sql SELECT  taken, person FROM Survey WHERE (quant='rad' AND reading = '9.82');

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
1 rows affected.


taken,person
619,dyer



### WHERE ... OR 

```SQL
SELECT taken, person FROM Survey WHERE (reading = 9.82 OR reading = 11.25);
```


### WHERE with both AND and OR 
Note: `AND` is evaluated before `OR`, so use parentheses to make the intended grouping of conditions explicit.

```SQL
SELECT taken, person FROM Survey WHERE (quant = 'sal' AND reading = 0.05) OR reading = 11.25;
```
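
To illustrate why the grouping matters, the following sketch runs the same conditions with three different parenthesizations. It uses an in-memory SQLite database through Python's `sqlite3` (an assumption, chosen so the example is self-contained) and loads only four rows of the `Survey` table. The helper `run` is hypothetical and exists only for this illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the course's PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Survey (taken INTEGER, person TEXT, quant TEXT, reading REAL)")

# A four-row subset of the Survey table is enough to show the effect.
subset = [
    (734, "pb", "rad", 8.41), (734, "lake", "sal", 0.05),
    (751, "lake", "sal", 0.1), (844, "roe", "rad", 11.25),
]
conn.executemany("INSERT INTO Survey VALUES (?, ?, ?, ?)", subset)


def run(condition):
    """Hypothetical helper: run the query with the given WHERE condition."""
    sql = f"SELECT taken, person FROM Survey WHERE {condition} ORDER BY taken;"
    return conn.execute(sql).fetchall()


grouped_and = run("(quant = 'sal' AND reading = 0.05) OR reading = 11.25")
no_parens = run("quant = 'sal' AND reading = 0.05 OR reading = 11.25")
grouped_or = run("quant = 'sal' AND (reading = 0.05 OR reading = 11.25)")

print(grouped_and)  # AND grouped explicitly
print(no_parens)    # same rows, because AND binds tighter than OR
print(grouped_or)   # different rows: the 'rad' reading of 11.25 is excluded
```

The first two queries return the same two rows, while the third returns only the `sal` reading from visit 734.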


### WHERE ... IN () 
Note: `IN ()` tests whether the column value matches any value in the list; conceptually it is a chain of `OR` comparisons.

```SQL
SELECT taken, person FROM Survey WHERE reading IN (9.82, 11.25);
```


### WHERE with LIKE  
Note: `LIKE` provides pattern matching, where the wildcard `%` matches any sequence of zero or more characters. Some databases, including PostgreSQL, also offer `ILIKE`, a case-insensitive variant of `LIKE`.

```SQL
SELECT dated FROM Visited WHERE site LIKE 'DR%'
```
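
As a rough sketch of how wildcard placement changes the match, the snippet below applies three patterns to the site names. It assumes an in-memory SQLite database via Python's `sqlite3` rather than the notebook's PostgreSQL connection, and it also uses the standard single-character wildcard `_`, which is not covered above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the course's PostgreSQL server.
# SQLite's LIKE ignores ASCII case by default, unlike PostgreSQL's LIKE; the patterns
# below match the stored case exactly, so that difference does not affect the results.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Site (name TEXT, lat REAL, long REAL)")
conn.executemany(
    "INSERT INTO Site VALUES (?, ?, ?)",
    [("DR-1", -49.85, -128.57), ("DR-3", -47.15, -126.72), ("MSK-4", -48.87, -123.4)],
)


def names_like(pattern):
    """Hypothetical helper: return site names matching a LIKE pattern."""
    sql = "SELECT name FROM Site WHERE name LIKE ? ORDER BY name;"
    return [row[0] for row in conn.execute(sql, (pattern,))]


starts_with_dr = names_like("DR%")     # '%' at the end: prefix match
ends_with_3 = names_like("%3")         # '%' at the start: suffix match
three_then_one = names_like("___-_")   # '_' matches exactly one character

print(starts_with_dr)
print(ends_with_3)
print(three_then_one)
```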


In [5]:
%sql SELECT taken, person FROM Survey WHERE (reading = '9.82' OR reading = '11.25');

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
2 rows affected.


taken,person
619,dyer
844,roe


In [6]:
%sql SELECT taken, person FROM Survey WHERE (quant = 'sal' AND reading = '0.05') OR (reading = '11.25');

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
2 rows affected.


taken,person
734,lake
844,roe


In [7]:
%sql SELECT taken, person FROM Survey WHERE reading in (9.82,11.25);

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
2 rows affected.


taken,person
619,dyer
844,roe


In [8]:
%sql SELECT dated FROM Visited WHERE site LIKE 'DR%'

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
7 rows affected.


dated
1927-02-08
1927-02-10
1930-01-07
1930-01-12
1930-02-26
""
1932-03-22


### Removing Repetition : DISTINCT

The `DISTINCT` keyword, placed in front of the list of columns, removes duplicate rows from the results.

```SQL
SELECT DISTINCT person FROM Survey WHERE quant = 'rad';
```
Compare the two queries below.

In [9]:
%sql SELECT person FROM Survey WHERE quant = 'rad';

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
8 rows affected.


person
dyer
dyer
pb
pb
pb
lake
lake
roe


In [10]:
%sql SELECT DISTINCT person FROM Survey WHERE quant = 'rad';

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
4 rows affected.


person
dyer
lake
pb
roe
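
`DISTINCT` applies to the whole list of selected columns, not only the first one. The sketch below shows this with the first eight rows of the `Survey` table, loaded into an in-memory SQLite database through Python's `sqlite3` (an assumption made to keep the example runnable on its own).

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the course's PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Survey (taken INTEGER, person TEXT, quant TEXT, reading REAL)")

# The first eight rows of the Survey table.
first_rows = [
    (619, "dyer", "rad", 9.82), (619, "dyer", "sal", 0.13),
    (622, "dyer", "rad", 7.8), (622, "dyer", "sal", 0.09),
    (734, "pb", "rad", 8.41), (734, "lake", "sal", 0.05),
    (734, "pb", "temp", -21.5), (735, "pb", "rad", 7.22),
]
conn.executemany("INSERT INTO Survey VALUES (?, ?, ?, ?)", first_rows)

# One column: each person appears once.
distinct_people = conn.execute(
    "SELECT DISTINCT person FROM Survey ORDER BY person;"
).fetchall()

# Two columns: each (person, quant) combination appears once.
distinct_pairs = conn.execute(
    "SELECT DISTINCT person, quant FROM Survey ORDER BY person, quant;"
).fetchall()

print(distinct_people)
print(distinct_pairs)
```

With two columns selected, `dyer` appears twice in the result because the `rad` and `sal` combinations are different rows.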


### Sorting : ORDER BY

The `ORDER BY` clause takes a list of columns used to sort the results.
The columns need not appear in the output, although they usually do.
The keywords `ASC` and `DESC` set the sort direction for each column; `ASC` is the default.

```SQL
SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY reading;
```

```SQL
SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY reading DESC;
```

In the third example below, the result data is first sorted ascending on `person`, then descending on `reading`.

```SQL
SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY person, reading DESC;
```

In [11]:
%%sql SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY reading;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
8 rows affected.


person,reading
lake,1.46
lake,2.19
pb,4.35
pb,7.22
dyer,7.8
pb,8.41
dyer,9.82
roe,11.25


In [12]:
%%sql SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY reading DESC;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
8 rows affected.


person,reading
roe,11.25
dyer,9.82
pb,8.41
dyer,7.8
pb,7.22
pb,4.35
lake,2.19
lake,1.46


In [13]:
%%sql SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY person, reading DESC;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
8 rows affected.


person,reading
dyer,9.82
dyer,7.8
lake,2.19
lake,1.46
pb,8.41
pb,7.22
pb,4.35
roe,11.25
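
The sort column does not have to be part of the output. As a small sketch of that case, the snippet below selects only `person` while ordering by `reading`. It assumes an in-memory SQLite database via Python's `sqlite3` in place of the notebook's PostgreSQL connection, and loads only the `rad` rows of `Survey`.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the course's PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Survey (taken INTEGER, person TEXT, quant TEXT, reading REAL)")

# Only the 'rad' rows of the Survey table are loaded here.
rad_rows = [
    (619, "dyer", "rad", 9.82), (622, "dyer", "rad", 7.8),
    (734, "pb", "rad", 8.41), (735, "pb", "rad", 7.22),
    (751, "pb", "rad", 4.35), (752, "lake", "rad", 2.19),
    (837, "lake", "rad", 1.46), (844, "roe", "rad", 11.25),
]
conn.executemany("INSERT INTO Survey VALUES (?, ?, ?, ?)", rad_rows)

# 'reading' drives the sort but is not selected.
query = """
SELECT person
FROM Survey WHERE quant = 'rad'
ORDER BY reading DESC;
"""
people_by_reading = [row[0] for row in conn.execute(query)]
print(people_by_reading)
```

The people come back in the same order as in the `ORDER BY reading DESC` example above, just without the `reading` column.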


### LIMIT
The `LIMIT` clause caps the number of rows returned. Without an `ORDER BY` clause, the rows returned are not guaranteed to be the first rows of the table.

```SQL
SELECT * FROM Survey
LIMIT 5
```

In [14]:
%%sql
SELECT * FROM Survey
LIMIT 5

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
5 rows affected.


taken,person,quant,reading
619,dyer,rad,9.82
619,dyer,sal,0.13
622,dyer,rad,7.8
622,dyer,sal,0.09
734,pb,rad,8.41


### Putting it all together

Examples : SELECT columns using a WHERE clause in conjunction with ORDER BY and LIMIT



An example of finding the data for the **maximum `rad` reading.**
```SQL
SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY reading DESC
LIMIT 1;
```

An example of finding the data for the **minimum `rad` reading.**
```SQL
SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY reading
LIMIT 1;
```

In [15]:
%%sql
SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY reading DESC
LIMIT 1;


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
1 rows affected.


person,reading
roe,11.25


In [16]:
%%sql
SELECT person, reading 
FROM Survey WHERE quant = 'rad'
ORDER BY reading
LIMIT 1;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
1 rows affected.


person,reading
lake,1.46


---

Subsequent labs will explore SQL Queries in practice.

Please refer to documentation resources for details and examples:
 * http://www.w3schools.com/sql/sql_select.asp
 

# SAVE YOUR NOTEBOOK, then `File > Close and Halt`

---