In [1]:
%load_ext sql

In [2]:
%sql sqlite:///survey.db

# 4. Calculating new values
We often need to alter the data we're analyzing. For example, we might need to transform the data, or we might realize that something is wrong with it and needs correcting.

Let's say that the radiation measurements in our dataset were measured incorrectly and need to be corrected upward by 5%. We could go back to the source files and edit them directly, or we could make this adjustment in our query:

In [4]:
%%sql
SELECT 1.05 * reading FROM Survey WHERE quant = 'rad';

 * sqlite:///survey.db
Done.


1.05 * reading
10.311
8.19
8.8305
7.581
4.5675
2.2995
1.533
11.8125


In [5]:
%%sql
SELECT reading AS reading_original, 1.05*reading AS reading_corrected
FROM Survey 
WHERE quant = 'rad';

 * sqlite:///survey.db
Done.


reading_original,reading_corrected
9.82,10.311
7.8,8.19
8.41,8.8305
7.22,7.581
4.35,4.5675
2.19,2.2995
1.46,1.533
11.25,11.8125


We can also do more complex operations. For example, let's try converting temperatures from Fahrenheit to Celsius, and then rounding to two decimal places.

The process for converting Fahrenheit to Celsius:
1. Subtract 32
2. Multiply by 5/9
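One detail matters when these steps are written in SQL: SQLite truncates division between two integers, so a factor written literally as `5/9` evaluates to 0. Multiplying the floating-point reading by 5 first and then dividing by 9 avoids the problem. The sketch below uses Python's built-in `sqlite3` module with a throwaway in-memory database (an assumption made for illustration, not the lesson's `survey.db`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, not survey.db

# Integer / integer is truncated in SQLite, so 5/9 is 0.
int_factor = conn.execute("SELECT 5/9").fetchone()[0]

# Making one operand a float gives real division.
real_factor = conn.execute("SELECT 5.0/9").fetchone()[0]

# Convert -21.5 degrees F: subtract 32, multiply by 5, then divide by 9.
celsius = conn.execute("SELECT ROUND(5*(-21.5 - 32)/9, 2)").fetchone()[0]

print(int_factor, real_factor, celsius)
conn.close()
```

The same ordering works for any column that holds real numbers; if a column held integers, writing `5.0` instead of `5` would be one way to force real division.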

In [6]:
%%sql
SELECT *, ROUND(5*(reading - 32)/9, 2) celsius
FROM Survey
WHERE quant = 'temp';

 * sqlite:///survey.db
Done.


taken,person,quant,reading,celsius
734,pb,temp,-21.5,-29.72
735,,temp,-26.0,-32.22
751,pb,temp,-18.5,-28.06
752,lake,temp,-16.0,-26.67


#### TODO
Salinity should be reported as proportions, but it looks like they were put in as percentages. Write a new query that makes them proportions.

In [8]:
%%sql
SELECT reading sal_pct, ROUND(reading/100, 4) sal_prop
FROM Survey WHERE quant = 'sal';

 * sqlite:///survey.db
Done.


sal_pct,sal_prop
0.13,0.0013
0.09,0.0009
0.05,0.0005
0.06,0.0006
0.1,0.001
0.09,0.0009
41.6,0.416
0.21,0.0021
22.5,0.225


We might also want to **concatenate** values from multiple columns. For example, let's say we want the scientists' full names.

#### TODO
What data fields do we need for this?

`personal` and `family` from `Person`.

The concatenation operator looks like:

```sql
COLUMN1 || ' ' || COLUMN2
```
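One behavior worth knowing before relying on `||`: in SQLite, if either operand is NULL, the whole concatenation is NULL. The sketch below shows this with Python's built-in `sqlite3` module and a hypothetical two-row `People` table (not part of `survey.db`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration; the second row has no last name.
conn.execute("CREATE TABLE People (first TEXT, last TEXT)")
conn.executemany(
    "INSERT INTO People VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Plato", None)],
)

rows = conn.execute(
    "SELECT first || ' ' || last AS full_name FROM People ORDER BY first"
).fetchall()
full_names = [row[0] for row in rows]

print(full_names)  # the second entry is None because last is NULL
conn.close()
```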

In [7]:
%%sql
SELECT personal, family, personal || ' ' || family
    , family || ', ' || personal
    , personal || ' ' || family || ', ID:' || id
FROM Person;

 * sqlite:///survey.db
Done.


personal,family,personal || ' ' || family,"family || ', ' || personal","personal || ' ' || family || ', ID:' || id"
William,Dyer,William Dyer,"Dyer, William","William Dyer, ID:dyer"
Frank,Pabodie,Frank Pabodie,"Pabodie, Frank","Frank Pabodie, ID:pb"
Anderson,Lake,Anderson Lake,"Lake, Anderson","Anderson Lake, ID:lake"
Valentina,Roerich,Valentina Roerich,"Roerich, Valentina","Valentina Roerich, ID:roe"
Frank,Danforth,Frank Danforth,"Danforth, Frank","Frank Danforth, ID:danforth"


## Set operations
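Set operations combine the results of two queries. `UNION` returns rows that appear in either result, `INTERSECT` returns rows that appear in both, and `EXCEPT` returns rows that appear in the first result but not the second. A Venn diagram is a helpful way to picture them.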
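To see the three operators side by side on data small enough to check by hand, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `A` and `B` and the helper `run` are hypothetical and exist only for this illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical single-column tables that overlap on 2 and 3.
conn.execute("CREATE TABLE A (x INTEGER)")
conn.execute("CREATE TABLE B (x INTEGER)")
conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO B VALUES (?)", [(2,), (3,), (4,)])

def run(op):
    """Combine A and B with the given set operator and return sorted values."""
    sql = f"SELECT x FROM A {op} SELECT x FROM B ORDER BY x"
    return [row[0] for row in conn.execute(sql)]

union_rows = run("UNION")          # in A or in B
intersect_rows = run("INTERSECT")  # in both A and B
except_rows = run("EXCEPT")        # in A but not in B

print(union_rows, intersect_rows, except_rows)
conn.close()
```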

In [15]:
%%sql
SELECT * FROM Person WHERE id = 'dyer' 
UNION 
SELECT * FROM Person WHERE id = 'roe';


 * sqlite:///survey.db
Done.


id,personal,family
dyer,William,Dyer
roe,Valentina,Roerich


#### TODO
What will each of the three queries below return?

In [20]:
%%sql
SELECT person, quant FROM Survey WHERE person = 'dyer'
UNION
SELECT person, quant FROM Survey WHERE quant = 'rad'

 * sqlite:///survey.db
Done.


person,quant
dyer,rad
dyer,sal
lake,rad
pb,rad
roe,rad


In [21]:
%%sql
SELECT person, quant FROM Survey WHERE person = 'dyer'
INTERSECT
SELECT person, quant FROM Survey WHERE quant = 'rad'

 * sqlite:///survey.db
Done.


person,quant
dyer,rad


In [22]:
%%sql
SELECT person, quant FROM Survey WHERE person = 'dyer'
EXCEPT
SELECT person, quant FROM Survey WHERE quant = 'rad'

 * sqlite:///survey.db
Done.


person,quant
dyer,sal


`UNION` returns only distinct rows, whereas `UNION ALL` keeps every row from both queries, duplicates included.

#### TODO
What will be the difference between these two queries? Which do you think is faster? What's another way to write both of these queries?

In [18]:
%%sql
SELECT person, quant FROM Survey WHERE person = 'dyer'
UNION
SELECT person, quant FROM Survey WHERE quant = 'rad'

 * sqlite:///survey.db
Done.


person,quant
dyer,rad
dyer,sal
lake,rad
pb,rad
roe,rad


In [19]:
%%sql
SELECT person, quant FROM Survey WHERE person = 'dyer'
UNION ALL
SELECT person, quant FROM Survey WHERE quant = 'rad'

 * sqlite:///survey.db
Done.


person,quant
dyer,rad
dyer,sal
dyer,rad
dyer,sal
dyer,rad
dyer,rad
pb,rad
pb,rad
pb,rad
lake,rad
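One possible answer to the "another way" question, sketched under assumptions: when both halves read from the same table, `UNION` behaves like `SELECT DISTINCT` with the two conditions joined by `OR`. `UNION ALL` is not quite the same as a plain `OR`, because a row that satisfies both conditions is emitted once per half. The `Mini` table and the `fetch` helper below are hypothetical stand-ins, run with Python's built-in `sqlite3` module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the Survey table, for illustration only.
conn.execute("CREATE TABLE Mini (person TEXT, quant TEXT)")
conn.executemany(
    "INSERT INTO Mini VALUES (?, ?)",
    [("dyer", "rad"), ("dyer", "rad"), ("dyer", "sal"), ("pb", "rad")],
)

def fetch(sql):
    """Run a query and return its rows sorted, so results compare easily."""
    return sorted(conn.execute(sql).fetchall())

union_rows = fetch(
    "SELECT person, quant FROM Mini WHERE person = 'dyer' "
    "UNION "
    "SELECT person, quant FROM Mini WHERE quant = 'rad'"
)
distinct_rows = fetch(
    "SELECT DISTINCT person, quant FROM Mini "
    "WHERE person = 'dyer' OR quant = 'rad'"
)
union_all_rows = fetch(
    "SELECT person, quant FROM Mini WHERE person = 'dyer' "
    "UNION ALL "
    "SELECT person, quant FROM Mini WHERE quant = 'rad'"
)
or_rows = fetch(
    "SELECT person, quant FROM Mini WHERE person = 'dyer' OR quant = 'rad'"
)

print(union_rows == distinct_rows, len(union_all_rows), len(or_rows))
conn.close()
```

In this toy data the two `dyer`/`rad` rows match both conditions, so `UNION ALL` returns them twice (six rows in total) while the plain `OR` query returns each stored row once (four rows).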


#### TODO
Use UNION to create a consolidated list of salinity measurements in which Valentina Roerich’s, and only Valentina’s, have been corrected as described in the previous challenge.

In [24]:
%%sql
SELECT person, taken, reading, reading AS reading FROM Survey WHERE quant = 'sal' AND person != 'roe'
UNION
SELECT person, taken, reading, reading/100 AS reading FROM Survey WHERE quant = 'sal' AND person = 'roe';

 * sqlite:///survey.db
Done.


person,taken,reading,reading_1
dyer,619,0.13,0.13
dyer,622,0.09,0.09
lake,734,0.05,0.05
lake,751,0.1,0.1
lake,752,0.09,0.09
lake,837,0.21,0.21
roe,752,41.6,0.416
roe,837,22.5,0.225
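One thing to watch in a query like this: comparing NULL with `!=` yields NULL, which `WHERE` treats as false, so rows with a missing `person` are silently dropped by `person != 'roe'`. The 0.06 reading listed in the earlier salinity output does not appear in the consolidated list above, which is consistent with that row having no recorded person. The sketch below reproduces the effect with Python's built-in `sqlite3` module and a hypothetical three-row `Mini` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-row table; the middle row has no recorded person.
conn.execute("CREATE TABLE Mini (person TEXT, reading REAL)")
conn.executemany(
    "INSERT INTO Mini VALUES (?, ?)",
    [("lake", 0.05), (None, 0.06), ("roe", 41.6)],
)

# NULL != 'roe' evaluates to NULL, so the middle row is filtered out.
naive = conn.execute(
    "SELECT reading FROM Mini WHERE person != 'roe' ORDER BY reading"
).fetchall()

# An explicit IS NULL test keeps the row with the missing person.
inclusive = conn.execute(
    "SELECT reading FROM Mini "
    "WHERE person != 'roe' OR person IS NULL ORDER BY reading"
).fetchall()

print(naive, inclusive)
conn.close()
```

Whether unattributed readings belong in the consolidated list is a judgment call for the analysis; the point is only that the choice should be deliberate.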


## Working with strings
The `instr(X, Y)` function takes two strings as arguments and tells us whether, and where, `X` contains `Y`. It will return either:
- The starting position of the first occurrence of `Y` in `X`, counting from 1
- 0 if `Y` does not occur in `X`

So for example we could find whether/where an `l` occurs in each of the first names:

In [25]:
%%sql
SELECT personal, instr(personal, 'l') FROM Person;

 * sqlite:///survey.db
Done.


personal,"instr(personal, 'l')"
William,3
Frank,0
Anderson,0
Valentina,3
Frank,0
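The return values can also be checked on literal strings. This is a minimal sketch with Python's built-in `sqlite3` module and an in-memory database (an assumption for illustration, not `survey.db`); it also shows that the match is case-sensitive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# instr counts positions from 1 and reports the first match only.
first_l = conn.execute("SELECT instr('William', 'l')").fetchone()[0]

# A substring that never occurs gives 0.
missing = conn.execute("SELECT instr('Frank', 'l')").fetchone()[0]

# The comparison is case-sensitive, so 'w' is not found in 'William'.
wrong_case = conn.execute("SELECT instr('William', 'w')").fetchone()[0]

print(first_l, missing, wrong_case)
conn.close()
```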


#### TODO
How would we filter to rows that have a first name containing the letter `e`?

In [31]:
%%sql
SELECT personal FROM Person WHERE instr(personal, 'e') >0 ;

 * sqlite:///survey.db
Done.


personal
Anderson
Valentina


Another related function is `substr(X, I, [L])`, which takes a string `X`, a starting position `I` (counted from 1), and an optional length `L` in characters. For example, the query below will return first names starting at the fifth letter:

In [29]:
%%sql
SELECT substr(personal, 5) FROM Person;

 * sqlite:///survey.db
Done.


"substr(personal, 5)"
iam
k
rson
ntina
k


#### TODO
How would we get the fifth letter *and only* the fifth letter?

In [30]:
%%sql
SELECT substr(personal, 5, 1) FROM Person;

 * sqlite:///survey.db
Done.


"substr(personal, 5, 1)"
i
k
r
n
k
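A few edge cases of `substr` are easy to check on literal strings. The sketch below uses Python's built-in `sqlite3` module with an in-memory database (an assumption for illustration); it shows the two-argument and three-argument forms, a start position past the end of the string, and a negative start position, which SQLite counts from the right:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]

tail = scalar("SELECT substr('Anderson', 5)")         # from position 5 to the end
one_char = scalar("SELECT substr('Anderson', 5, 1)")  # exactly one character
past_end = scalar("SELECT substr('Frank', 10)")       # start beyond the string
from_right = scalar("SELECT substr('Anderson', -3)")  # last three characters

print(repr(tail), repr(one_char), repr(past_end), repr(from_right))
conn.close()
```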


#### TODO
How would we get, for each first name that contains an `e`, the part of the name that follows its first `e`?

In [37]:
%%sql
SELECT personal, substr(personal, instr(personal, 'e')+1) FROM Person WHERE instr(personal, 'e') >0 ;

 * sqlite:///survey.db
Done.


personal,"substr(personal, instr(personal, 'e')+1)"
Anderson,rson
Valentina,ntina
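Whether the result includes the matched letter depends on the start position passed to `substr`: `instr` gives the position of the `e` itself, and adding 1 starts one character later. A small sketch with Python's built-in `sqlite3` module and an in-memory database (an assumption for illustration) shows both variants on a literal string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Start at the matched letter: the result begins with 'e'.
from_e = conn.execute(
    "SELECT substr('Anderson', instr('Anderson', 'e'))"
).fetchone()[0]

# Start one position later: the result begins just after the 'e'.
after_e = conn.execute(
    "SELECT substr('Anderson', instr('Anderson', 'e') + 1)"
).fetchone()[0]

print(from_e, after_e)
conn.close()
```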


#### TODO
Look at the list of sites in `Visited`. Let's get rid of the dashes in the site names.

In [49]:
%%sql
SELECT
    site 
    ,substr(site, 1, instr(site, '-')-1) || substr(site, instr(site, '-')+1) site2
FROM Visited 


 * sqlite:///survey.db
Done.


site,site2
DR-1,DR1
DR-1,DR1
DR-3,DR3
DR-3,DR3
DR-3,DR3
DR-3,DR3
MSK-4,MSK4
DR-1,DR1
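The `substr`/`instr` approach removes only the first dash, because `instr` reports the first occurrence. SQLite also provides `replace(X, Y, Z)`, which substitutes every occurrence of `Y` in `X`. The sketch below is an alternative rather than the lesson's intended answer; it uses Python's built-in `sqlite3` module and a hypothetical `Sites` table standing in for `Visited`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for Visited, with extra made-up names to show edge cases.
conn.execute("CREATE TABLE Sites (site TEXT)")
conn.executemany(
    "INSERT INTO Sites VALUES (?)",
    [("DR-1",), ("MSK-4",), ("A-B-C",), ("XYZ",)],
)

# replace() removes every dash and leaves dash-free names unchanged.
rows = conn.execute(
    "SELECT site, replace(site, '-', '') AS site2 FROM Sites ORDER BY site"
).fetchall()

for site, site2 in rows:
    print(site, site2)
conn.close()
```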
