<img src="./intro_images/MIE.png" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right"><a href="https://alandavies.netlify.com" target="_blank">Dr Alan Davies</a></div>
            <div style="text-align: right">Lecturer health data science</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
         <td>
             <img src="./intro_images/alan.png" width="30%" />
         </td>
     </tr>
</table>

# Conditional queries
****

Let's start by recreating the database we left off with last time. Run the next few cells to set up the database with the two tables <code>med_data</code> and <code>drug_table</code>. 

In [5]:
%load_ext sql
%sql sqlite://

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


'Connected: @None'

In [6]:
%%sql
DROP TABLE IF EXISTS med_data;
CREATE TABLE med_data (ID INTEGER NOT NULL PRIMARY KEY, Name VARCHAR(255), Age INTEGER, Sex CHAR, "Blood pressure" CHAR(7), "Heart rate" INTEGER);
INSERT INTO med_data (Name, Age, Sex, "Blood pressure", "Heart rate") VALUES("Alan Smith", 24, "M", "120/70", 78);
INSERT INTO med_data (Name, Age, Sex, "Blood pressure", "Heart rate") VALUES("Maureen Gdiver", 87, "F", "156/82", 82);
INSERT INTO med_data (Name, Age, Sex, "Blood pressure", "Heart rate") VALUES("Adam Blythe", 54, "M", "132/73", 72);
INSERT INTO med_data (Name, Age, Sex, "Blood pressure", "Heart rate") VALUES("Darren Sanders", 34, "M", "155/67", 67);
INSERT INTO med_data (Name, Age, Sex, "Blood pressure", "Heart rate") VALUES("Sally-Ann Joyce", 19, "F", "121/72", 65);
DROP TABLE IF EXISTS drug_table;
CREATE TABLE drug_table (ID INTEGER NOT NULL PRIMARY KEY, medication VARCHAR(255), route CHAR(4), "freq per day" INTEGER, dose VARCHAR(255), patient_id INTEGER, FOREIGN KEY(patient_id) REFERENCES med_data(ID));
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("AMOXICILLIN", "PO", 3, "500mg", 1);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("IRBESARTAN", "PO", 1, "150mg", 2);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("DIGOXIN", "PO", 1, "1.5mg", 2);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("SIMVASTATIN", "PO", 1, "40mg", 3);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("RAMIPRIL", "PO", 1, "2.5mg", 4);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("WARFARIN", "PO", 1, "variable", 4);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("SENNA", "PO", 1, "15mg", 4);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("None", "NA", 0, "NA", 5);

 * sqlite://
Done.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.


[]

Queries are good for asking questions of the data. You probably wouldn't want to look through an entire database to answer a question. Instead you would like to see a <code>subset</code> of the data. For example, let's say we wanted to see all the patients who had a heart rate above 70 beats per minute. We could write a query for this.

In [7]:
%%sql
SELECT ID, name, "Heart rate" FROM med_data WHERE "Heart rate" > 70;

 * sqlite://
Done.


ID,Name,Heart rate
1,Alan Smith,78
2,Maureen Gdiver,82
3,Adam Blythe,72


Here we are retrieving a subset of the data containing the patient's ID, name and heart rate for all records where the heart rate is greater than (>) 70 bpm.
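
The same kind of conditional query can also be run from plain Python. The following is a minimal sketch, assuming the standard-library <code>sqlite3</code> module and a cut-down in-memory copy of <code>med_data</code>; the helper name <code>patients_above</code> is made up for illustration. The threshold is passed as a bound parameter (<code>?</code>) rather than pasted into the SQL text.

```python
import sqlite3

# Cut-down, in-memory copy of med_data (only the columns this query needs).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE med_data (ID INTEGER PRIMARY KEY, Name TEXT, "Heart rate" INTEGER)'
)
conn.executemany(
    'INSERT INTO med_data (Name, "Heart rate") VALUES (?, ?)',
    [
        ("Alan Smith", 78),
        ("Maureen Gdiver", 82),
        ("Adam Blythe", 72),
        ("Darren Sanders", 67),
        ("Sally-Ann Joyce", 65),
    ],
)


def patients_above(threshold):
    """Hypothetical helper: rows whose heart rate is above the threshold."""
    query = 'SELECT ID, Name, "Heart rate" FROM med_data WHERE "Heart rate" > ?'
    return conn.execute(query, (threshold,)).fetchall()


print(patients_above(70))
```

Changing the argument (for example <code>patients_above(80)</code>) changes the subset without rewriting the query.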

<div class="alert alert-block alert-info">
<b>Task 1:</b>
<br> 
Have a go at writing a query to return the same fields but for heart rates less than 70 bpm.
</div>

In [64]:
%%sql
SELECT ID, name, "Heart rate" FROM med_data WHERE "Heart rate" < 70;

 * sqlite://
Done.


ID,Name,Heart rate
4,Darren Sanders,67
5,Sally-Ann Joyce,65


In [None]:
%%sql # type in your code below


<div class="alert alert-success">
<b>Note:</b> For fields with spaces in the field name we wrap the name in double quotation marks, e.g. <code>"Blood pressure"</code>. This is not necessary with fields that have no spaces, e.g. <code>name</code>.
</div>

<div class="alert alert-block alert-info">
<b>Task 2:</b>
<br> 
Can we write a similar query for a patient's blood pressure? If not, why?
</div>

No, because we are currently storing the blood pressure as text (e.g. "120/70"), and applying numeric comparison operators to a text field would not give a meaningful result.
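
To see why text comparisons mislead, here is a small Python sketch. The reading <code>"99/60"</code> is a made-up value added for illustration, and <code>split_bp</code> is a hypothetical helper. Strings are compared character by character, so the text "99/60" ranks above "156/82" even though 99 is the smaller number.

```python
# Blood pressure stored as text, in the same "systolic/diastolic" form as med_data.
readings = ["120/70", "156/82", "99/60"]  # "99/60" is a made-up example value

# Text comparison works character by character: "9" sorts after "1",
# so "99/60" is treated as the largest string.
text_max = max(readings)


def split_bp(reading):
    """Hypothetical helper: split 'systolic/diastolic' text into two integers."""
    systolic, diastolic = reading.split("/")
    return int(systolic), int(diastolic)


# Once the systolic part is a number, the comparison behaves as expected.
numeric_max = max(split_bp(reading)[0] for reading in readings)

print(text_max, numeric_max)
```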

Let's remove the blood pressure column from the table and add 2 new columns with the systolic (top number) and diastolic (bottom number) blood pressure values. We do this by first making a new temporary table with the same fields minus the one we want to delete. We then insert the data from the selected fields of the <code>med_data</code> table into this new temp table. Next we delete the old <code>med_data</code> table using the <code>DROP TABLE</code> command. Finally we rename our temp table back to <code>med_data</code>.

In [8]:
%%sql

DROP TABLE IF EXISTS tmp_table;
CREATE TABLE tmp_table (
    ID INTEGER NOT NULL PRIMARY KEY,
    Name VARCHAR(255),
    Age INTEGER,
    Sex CHAR,
    "Heart rate" INTEGER
);

INSERT INTO tmp_table SELECT ID, Name, Age, Sex, "Heart rate" FROM med_data; 
DROP TABLE IF EXISTS med_data;
ALTER TABLE tmp_table RENAME TO med_data;
SELECT * FROM med_data;

 * sqlite://
Done.
Done.
5 rows affected.
Done.
Done.
Done.


ID,Name,Age,Sex,Heart rate
1,Alan Smith,24,M,78
2,Maureen Gdiver,87,F,82
3,Adam Blythe,54,M,72
4,Darren Sanders,34,M,67
5,Sally-Ann Joyce,19,F,65


<div class="alert alert-success">
<b>Note:</b> This might seem more complicated than necessary. This is because SQLite has limited support for <code>ALTER TABLE</code>; in particular, <code>DROP COLUMN</code> was only added in SQLite 3.35.0. In other versions of SQL (and in recent SQLite releases) you can simply do <code>ALTER TABLE med_data DROP COLUMN "Blood pressure";</code>
</div>
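
As a rough sketch of how the two approaches relate, the Python below checks which SQLite version the <code>sqlite3</code> module is linked against and uses <code>DROP COLUMN</code> when it is available (SQLite 3.35.0 or later), falling back to the copy-and-rename approach otherwise. The table name <code>demo</code> and its single row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE demo (ID INTEGER PRIMARY KEY, Name TEXT, "Blood pressure" TEXT)'
)
conn.execute(
    'INSERT INTO demo (Name, "Blood pressure") VALUES (?, ?)',
    ("Alan Smith", "120/70"),
)

if sqlite3.sqlite_version_info >= (3, 35, 0):
    # Newer SQLite: the column can be dropped directly.
    conn.execute('ALTER TABLE demo DROP COLUMN "Blood pressure"')
else:
    # Older SQLite: rebuild the table without the unwanted column.
    conn.execute("CREATE TABLE tmp_demo (ID INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("INSERT INTO tmp_demo SELECT ID, Name FROM demo")
    conn.execute("DROP TABLE demo")
    conn.execute("ALTER TABLE tmp_demo RENAME TO demo")

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(demo)")]
print(columns)
```

Either branch should leave the table with only the <code>ID</code> and <code>Name</code> columns.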

We can now add the two new columns and populate them with the correct data.

In [9]:
%%sql
ALTER TABLE med_data ADD COLUMN sys INTEGER;
ALTER TABLE med_data ADD COLUMN dia INTEGER;

 * sqlite://
Done.
Done.


[]

In [10]:
%%sql
UPDATE med_data SET sys = 120 WHERE ID = 1;
UPDATE med_data SET sys = 156 WHERE ID = 2;
UPDATE med_data SET sys = 132 WHERE ID = 3;
UPDATE med_data SET sys = 155 WHERE ID = 4;
UPDATE med_data SET sys = 121 WHERE ID = 5;

UPDATE med_data SET dia = 70 WHERE ID = 1;
UPDATE med_data SET dia = 82 WHERE ID = 2;
UPDATE med_data SET dia = 73 WHERE ID = 3;
UPDATE med_data SET dia = 67 WHERE ID = 4;
UPDATE med_data SET dia = 72 WHERE ID = 5;
SELECT * FROM med_data;

 * sqlite://
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Done.


ID,Name,Age,Sex,Heart rate,sys,dia
1,Alan Smith,24,M,78,120,70
2,Maureen Gdiver,87,F,82,156,82
3,Adam Blythe,54,M,72,132,73
4,Darren Sanders,34,M,67,155,67
5,Sally-Ann Joyce,19,F,65,121,72
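
Typing each value by hand works for five patients, but the numbers could also be derived from the original text. The sketch below is one possible approach, not the method used in this workbook: it assumes a separate demo table <code>bp_demo</code> that still holds the text reading in a column called <code>bp</code>, and uses SQLite's <code>substr</code>, <code>instr</code> and <code>CAST</code> to split it at the slash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical demo table that still holds the reading as text.
conn.execute(
    "CREATE TABLE bp_demo (ID INTEGER PRIMARY KEY, bp TEXT, sys INTEGER, dia INTEGER)"
)
conn.executemany(
    "INSERT INTO bp_demo (bp) VALUES (?)",
    [("120/70",), ("156/82",), ("132/73",), ("155/67",), ("121/72",)],
)

# Everything before the slash is the systolic value, everything after is diastolic.
conn.execute(
    """
    UPDATE bp_demo
    SET sys = CAST(substr(bp, 1, instr(bp, '/') - 1) AS INTEGER),
        dia = CAST(substr(bp, instr(bp, '/') + 1) AS INTEGER)
    """
)

rows = conn.execute("SELECT ID, sys, dia FROM bp_demo ORDER BY ID").fetchall()
print(rows)
```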


Grade 1 or mild hypertension is defined as a systolic blood pressure between 140 and 159 mmHg inclusive. We can write a query to find all the patients in our database with grade 1 hypertension.

In [11]:
%%sql
SELECT Name, sys FROM med_data WHERE sys >= 140 AND sys <=159;

 * sqlite://
Done.


Name,sys
Maureen Gdiver,156
Darren Sanders,155
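
An inclusive range like this can also be written with the <code>BETWEEN</code> operator, which includes both end values. The sketch below assumes a cut-down in-memory copy of <code>med_data</code> holding only the name and systolic value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down copy of med_data with only the columns needed here.
conn.execute("CREATE TABLE med_data (ID INTEGER PRIMARY KEY, Name TEXT, sys INTEGER)")
conn.executemany(
    "INSERT INTO med_data (Name, sys) VALUES (?, ?)",
    [
        ("Alan Smith", 120),
        ("Maureen Gdiver", 156),
        ("Adam Blythe", 132),
        ("Darren Sanders", 155),
        ("Sally-Ann Joyce", 121),
    ],
)

# BETWEEN 140 AND 159 is equivalent to sys >= 140 AND sys <= 159.
grade_1 = conn.execute(
    "SELECT Name, sys FROM med_data WHERE sys BETWEEN 140 AND 159"
).fetchall()

# Both boundaries are included: 1 means true, 0 means false.
boundaries = conn.execute(
    "SELECT 140 BETWEEN 140 AND 159, 159 BETWEEN 140 AND 159, 160 BETWEEN 140 AND 159"
).fetchone()

print(grade_1, boundaries)
```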


<div class="alert alert-block alert-info">
<b>Task 3:</b>
<br> 
We can define hypertension (high blood pressure) as follows:
<br />
<table class="table-bordered">
<thead>
<th>Grade</th>
<th>Systolic (mmHg)</th>
<th>Diastolic (mmHg)</th>
</thead>
<tbody>
<tr>
<td>Normal/optimal</td>
<td>&lt; 140</td>
<td>&lt; 90</td>
</tr>
<tr>
<td>Grade 1 (mild)</td>
<td>140-159</td>
<td>90-99</td>
</tr>
<tr>
<td>Grade 2 (moderate)</td>
<td>160-179</td>
<td>100-109</td>
</tr>
<tr>
<td>Grade 3 (severe)</td>
<td>&ge; 180</td>
<td>&ge; 110</td>
</tr>
</tbody>
</table>
<br />
Blood pressure is typically measured in millimeters of mercury (mmHg). The top number (systolic) is the pressure when the heart contracts and pumps blood. The bottom number (diastolic) is the pressure when the heart relaxes between beats.<br /><br />  
Write queries to see how many patients fit into each of the categories in the table (using just the systolic column). We already did grade 1 in the example above.
</div>

In [66]:
%%sql
SELECT Name, sys FROM med_data WHERE sys < 140;

 * sqlite://
Done.


Name,sys
Alan Smith,120
Adam Blythe,132
Sally-Ann Joyce,121


In [67]:
%%sql
SELECT Name, sys FROM med_data WHERE sys >= 140 AND sys <=159;

 * sqlite://
Done.


Name,sys
Maureen Gdiver,156
Darren Sanders,155


In [68]:
%%sql
SELECT Name, sys FROM med_data WHERE sys >= 160 AND sys <=179;

 * sqlite://
Done.


Name,sys


In [69]:
%%sql
SELECT Name, sys FROM med_data WHERE sys >= 180;

 * sqlite://
Done.


Name,sys


In [None]:
%%sql # type in your code below


In [None]:
%%sql # type in your code below


In [None]:
%%sql # type in your code below


In [None]:
%%sql # type in your code below


Another way of accomplishing something similar is to use a <code>CASE</code> expression. This works in a similar way to the if/else statements we used in Python, generating an additional results column based on the selection criteria. The <code>WHEN</code> conditions are checked from top to bottom and the first one that matches is used, so we test for the most severe grade first. Here we can grade each person according to the hypertension criteria. 

In [12]:
%%sql
SELECT Name, sys, 
CASE
WHEN sys >= 180 THEN 'Grade 3'
WHEN sys >= 160 THEN 'Grade 2'
WHEN sys >= 140 THEN 'Grade 1'
ELSE 'Normal'
END AS "BP classification"
FROM med_data;

 * sqlite://
Done.


Name,sys,BP classification
Alan Smith,120,Normal
Maureen Gdiver,156,Grade 1
Adam Blythe,132,Normal
Darren Sanders,155,Grade 1
Sally-Ann Joyce,121,Normal
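
A <code>CASE</code> expression can also be combined with <code>COUNT</code> and <code>GROUP BY</code> to count how many patients fall into each grade in a single query. The sketch below is one way this might look; it assumes a cut-down in-memory copy of <code>med_data</code>, and the two "Example Patient" rows are made-up readings added only so that Grade 2 and Grade 3 appear in the result. Because the first matching <code>WHEN</code> wins, the most severe grade is tested first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE med_data (ID INTEGER PRIMARY KEY, Name TEXT, sys INTEGER)")
conn.executemany(
    "INSERT INTO med_data (Name, sys) VALUES (?, ?)",
    [
        ("Alan Smith", 120),
        ("Maureen Gdiver", 156),
        ("Adam Blythe", 132),
        ("Darren Sanders", 155),
        ("Sally-Ann Joyce", 121),
        ("Example Patient A", 165),  # made-up reading to exercise Grade 2
        ("Example Patient B", 185),  # made-up reading to exercise Grade 3
    ],
)

query = """
SELECT CASE
         WHEN sys >= 180 THEN 'Grade 3'
         WHEN sys >= 160 THEN 'Grade 2'
         WHEN sys >= 140 THEN 'Grade 1'
         ELSE 'Normal'
       END AS bp_class,
       COUNT(*) AS n
FROM med_data
GROUP BY bp_class
ORDER BY bp_class
"""

# Each result row is (classification, count), so it converts neatly to a dict.
counts = dict(conn.execute(query).fetchall())
print(counts)
```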


<div class="alert alert-block alert-info">
<b>Task 4:</b>
<br> 
Write queries for the following:<br />
1. Get all the males (return name and sex) from <code>med_data</code>.<br />
2. Get all patients over the age of 50 years (return name and age) from <code>med_data</code>.<br />
3. Get all the people that have medication more than once a day (return medication and frequency) from <code>drug_table</code>.
</div>

In [70]:
%%sql 
SELECT Name, Sex FROM med_data WHERE Sex = "M";

 * sqlite://
Done.


Name,Sex
Alan Smith,M
Adam Blythe,M
Darren Sanders,M


In [71]:
%%sql
SELECT Name, Age FROM med_data WHERE Age > 50;

 * sqlite://
Done.


Name,Age
Maureen Gdiver,87
Adam Blythe,54


In [72]:
%%sql
SELECT medication, "freq per day" FROM drug_table WHERE "freq per day" > 1;

 * sqlite://
Done.


medication,freq per day
AMOXICILLIN,3


In [None]:
%%sql # type in your code below


In [None]:
%%sql # type in your code below


In [None]:
%%sql # type in your code below


#### 3.1 Dealing with text

We have looked at how to extract subsets of data from tables based on numerical values. Another useful operation is being able to extract data based on textual conditions. This can be more challenging depending on how the data is entered. Two of the most useful clauses/operators are <code>LIKE</code> and <code>GLOB</code>.

In [14]:
%%sql
SELECT * FROM med_data;

 * sqlite://
Done.


ID,Name,Age,Sex,Heart rate,sys,dia
1,Alan Smith,24,M,78,120,70
2,Maureen Gdiver,87,F,82,156,82
3,Adam Blythe,54,M,72,132,73
4,Darren Sanders,34,M,67,155,67
5,Sally-Ann Joyce,19,F,65,121,72


Let's say we were not sure exactly how <code>Maureen Gdiver</code>'s name was spelled. We can use <code>LIKE</code> to retrieve all the names that match a pattern. For example:

In [15]:
%%sql
SELECT * FROM med_data WHERE Name LIKE "mau%";

 * sqlite://
Done.


ID,Name,Age,Sex,Heart rate,sys,dia
2,Maureen Gdiver,87,F,82,156,82


<div class="alert alert-success">
<b>Note:</b> It is common practice to store first and last name in separate fields to facilitate searching more easily (i.e. by last or first name).
</div>

Firstly, <code>LIKE</code> is not case sensitive (for ASCII letters in SQLite), so we didn't need to use the capital M for Maureen's first name. Here we are using the <code>%</code> wildcard to select all names that start with <code>mau</code> but can end with anything. If we just wanted all people with names beginning with 'A' we could write:

In [16]:
%%sql
SELECT * FROM med_data WHERE Name LIKE "a%";

 * sqlite://
Done.


ID,Name,Age,Sex,Heart rate,sys,dia
1,Alan Smith,24,M,78,120,70
3,Adam Blythe,54,M,72,132,73


<div class="alert alert-block alert-info">
<b>Task 5:</b>
<br> 
Using the <code>%</code> wildcard and <code>LIKE</code>, write a query to return all the people's names that <strong>end</strong> with the letter 'e'.
</div>

In [17]:
%%sql
SELECT * FROM med_data WHERE Name LIKE "%e";

 * sqlite://
Done.


ID,Name,Age,Sex,Heart rate,sys,dia
3,Adam Blythe,54,M,72,132,73
5,Sally-Ann Joyce,19,F,65,121,72


In [None]:
%%sql # type in your code below


You can also use the wildcard at both ends of the text if you are not sure about the beginning or end, but know the middle. For example:

In [18]:
%%sql
SELECT * FROM med_data WHERE Name LIKE "%reen%";

 * sqlite://
Done.


ID,Name,Age,Sex,Heart rate,sys,dia
2,Maureen Gdiver,87,F,82,156,82


<div class="alert alert-success">
<b>Note:</b> We can also use the <code>&#95;</code> wildcard, which matches exactly one character, when we want to limit results to a known number of characters. For example <code>WHERE Name LIKE "ada&#95;"</code> will match only names that begin with 'ada' and are exactly 4 characters long (such as 'Adam'). 
</div>
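
The difference between the two <code>LIKE</code> wildcards can be seen in the short sketch below. It assumes a small demo table called <code>people</code>; the name "Adam" on its own is a made-up entry added so that the <code>&#95;</code> pattern has something to match, and <code>like</code> is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO people (Name) VALUES (?)",
    [("Alan Smith",), ("Adam Blythe",), ("Adam",)],  # "Adam" is a made-up entry
)


def like(pattern):
    """Hypothetical helper: names matching a LIKE pattern, in ID order."""
    query = "SELECT Name FROM people WHERE Name LIKE ? ORDER BY ID"
    return [row[0] for row in conn.execute(query, (pattern,))]


# "_" stands for exactly one character, so only four-letter names match.
print(like("ada_"))
# "%" stands for zero or more characters, so longer names match too.
print(like("ada%"))
```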

Another useful way of searching for text patterns is with <code>GLOB</code>, which, unlike <code>LIKE</code>, is case sensitive. It uses <code>*</code> to match zero or more characters and brackets (<code>[]</code>) to match any single character from the list contained within the brackets. For example, using the <code>drug_table</code>:

In [19]:
%%sql
SELECT * FROM drug_table;

 * sqlite://
Done.


ID,medication,route,freq per day,dose,patient_id
1,AMOXICILLIN,PO,3,500mg,1
2,IRBESARTAN,PO,1,150mg,2
3,DIGOXIN,PO,1,1.5mg,2
4,SIMVASTATIN,PO,1,40mg,3
5,RAMIPRIL,PO,1,2.5mg,4
6,WARFARIN,PO,1,variable,4
7,SENNA,PO,1,15mg,4
8,None,NA,0,NA,5


Let's return all the <code>dose</code> values that end in <code>mg</code>.

In [21]:
%%sql
SELECT * FROM drug_table WHERE dose GLOB "*mg";

 * sqlite://
Done.


ID,medication,route,freq per day,dose,patient_id
1,AMOXICILLIN,PO,3,500mg,1
2,IRBESARTAN,PO,1,150mg,2
3,DIGOXIN,PO,1,1.5mg,2
4,SIMVASTATIN,PO,1,40mg,3
5,RAMIPRIL,PO,1,2.5mg,4
7,SENNA,PO,1,15mg,4


We could return all the drugs whose names start with a letter from A to E.

In [24]:
%%sql
SELECT * FROM drug_table WHERE medication GLOB "[A-E]*";

 * sqlite://
Done.


ID,medication,route,freq per day,dose,patient_id
1,AMOXICILLIN,PO,3,500mg,1
3,DIGOXIN,PO,1,1.5mg,2


Or all the drugs whose names <strong>do not</strong> start with a letter from A to E.

In [25]:
%%sql
SELECT * FROM drug_table WHERE medication GLOB "[^A-E]*";

 * sqlite://
Done.


ID,medication,route,freq per day,dose,patient_id
2,IRBESARTAN,PO,1,150mg,2
4,SIMVASTATIN,PO,1,40mg,3
5,RAMIPRIL,PO,1,2.5mg,4
6,WARFARIN,PO,1,variable,4
7,SENNA,PO,1,15mg,4
8,None,NA,0,NA,5


We can also use the <code>?</code> wildcard, which matches exactly one character, to fix the position of some text within a string. For example, if we want <code>mg</code> to start at position 3 (as in 40mg and 15mg), we put two <code>?</code> wildcards in front of it:

In [34]:
%%sql
SELECT * FROM drug_table WHERE dose GLOB "??mg";

 * sqlite://
Done.


ID,medication,route,freq per day,dose,patient_id
4,SIMVASTATIN,PO,1,40mg,3
7,SENNA,PO,1,15mg,4
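
The sketch below contrasts <code>GLOB</code> with <code>LIKE</code> on the dose values. It assumes a cut-down in-memory copy of <code>drug_table</code> with only the <code>dose</code> column, and the helper name <code>doses_where</code> is made up for illustration. It illustrates that <code>GLOB</code> is case sensitive whereas <code>LIKE</code> is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down copy of drug_table with only the dose column.
conn.execute("CREATE TABLE drug_table (ID INTEGER PRIMARY KEY, dose TEXT)")
conn.executemany(
    "INSERT INTO drug_table (dose) VALUES (?)",
    [("500mg",), ("150mg",), ("1.5mg",), ("40mg",), ("2.5mg",),
     ("variable",), ("15mg",), ("NA",)],
)


def doses_where(operator, pattern):
    """Hypothetical helper: doses matching a pattern with LIKE or GLOB."""
    if operator not in ("LIKE", "GLOB"):
        raise ValueError("operator must be LIKE or GLOB")
    query = f"SELECT dose FROM drug_table WHERE dose {operator} ? ORDER BY ID"
    return [row[0] for row in conn.execute(query, (pattern,))]


# Exactly two characters followed by "mg".
print(doses_where("GLOB", "??mg"))
# GLOB is case sensitive, so an upper-case pattern matches nothing here.
print(doses_where("GLOB", "*MG"))
# LIKE ignores case for ASCII letters, so the same idea matches every mg dose.
print(doses_where("LIKE", "%MG"))
```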


<div class="alert alert-block alert-info">
<b>Task 6:</b>
<br> 
Write a query to return the details of all medications whose names end in 'IN'.
</div>

In [35]:
%%sql
SELECT * FROM drug_table WHERE medication GLOB "*IN";

 * sqlite://
Done.


ID,medication,route,freq per day,dose,patient_id
1,AMOXICILLIN,PO,3,500mg,1
3,DIGOXIN,PO,1,1.5mg,2
4,SIMVASTATIN,PO,1,40mg,3
6,WARFARIN,PO,1,variable,4


In [None]:
%%sql # type in your code below


#### 3.2 Combining data with conditional queries

We can also combine data from both our tables using a <code>join</code>. We can construct a query to check if all the people who are hypertensive (have high blood pressure) are prescribed an antihypertensive (blood pressure medication) by combining data from both tables.

In [76]:
%%sql 
SELECT Name, sys, medication FROM med_data 
INNER JOIN drug_table ON drug_table.patient_id = med_data.ID 
WHERE med_data.sys >= 140;

 * sqlite://
Done.


Name,sys,medication
Maureen Gdiver,156,IRBESARTAN
Maureen Gdiver,156,DIGOXIN
Darren Sanders,155,RAMIPRIL
Darren Sanders,155,WARFARIN
Darren Sanders,155,SENNA


<div class="alert alert-success">
<b>Note:</b> To be clear about which table a field is in, we use a dot. The convention is table name (dot) field name, e.g. <code>drug_table.patient_id</code>.
</div>

<div class="alert alert-block alert-info">
<b>Task 7:</b>
<br> 
Which items of medication presented in the last query are for the treatment of hypertension (high blood pressure)? Go to the <a href="https://bnf.nice.org.uk/" target="_blank">British National Formulary (BNF)</a> and search for the medications. Look at their indications for use.  
</div>

<ul>
<li>IRBESARTAN</li>
<li>RAMIPRIL</li>
</ul>
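
Once the antihypertensives are known, the join can be narrowed to just those medications with the <code>IN</code> operator. The sketch below assumes cut-down in-memory copies of both tables, and the list of drug names is hard-coded from the answer above rather than looked up anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down copies of the two tables with only the columns the query needs.
conn.execute("CREATE TABLE med_data (ID INTEGER PRIMARY KEY, Name TEXT, sys INTEGER)")
conn.execute(
    "CREATE TABLE drug_table (ID INTEGER PRIMARY KEY, medication TEXT, patient_id INTEGER)"
)
conn.executemany(
    "INSERT INTO med_data (Name, sys) VALUES (?, ?)",
    [("Alan Smith", 120), ("Maureen Gdiver", 156), ("Adam Blythe", 132),
     ("Darren Sanders", 155), ("Sally-Ann Joyce", 121)],
)
conn.executemany(
    "INSERT INTO drug_table (medication, patient_id) VALUES (?, ?)",
    [("AMOXICILLIN", 1), ("IRBESARTAN", 2), ("DIGOXIN", 2), ("SIMVASTATIN", 3),
     ("RAMIPRIL", 4), ("WARFARIN", 4), ("SENNA", 4), ("None", 5)],
)

# Hypertensive patients (systolic >= 140) and only their antihypertensive drugs.
query = """
SELECT Name, sys, medication FROM med_data
INNER JOIN drug_table ON drug_table.patient_id = med_data.ID
WHERE med_data.sys >= 140
  AND drug_table.medication IN ('IRBESARTAN', 'RAMIPRIL')
ORDER BY med_data.ID
"""
treated = conn.execute(query).fetchall()
print(treated)
```

In this data set both hypertensive patients appear in the result, so each has at least one antihypertensive prescribed.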

There are many different operators that can be used in SQL for arithmetic, comparison and logic. To see a complete list, take a look at this link: <a href="https://www.w3schools.com/sql/sql_operators.asp" target="_blank">SQL operators</a>. 

What we used in the query above is called a join. We will look at joins in more detail in the next workbook.