<img src="./intro_images/MIE.png" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right"><a href="https://alandavies.netlify.com" target="_blank">Dr Alan Davies</a></div>
            <div style="text-align: right">Lecturer health data science</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
         <td>
             <img src="./intro_images/alan.png" width="30%" />
         </td>
     </tr>
</table>

# Linking tables together
****

In the last notebook we looked at how we could use a <code>join</code> to combine results from several tables in a query. Joins allow us to combine rows from multiple tables based on a related column. Here we will look at the different ways we can combine data that is stored across multiple tables.

In [5]:
%load_ext sql
%sql sqlite://

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


'Connected: @None'

First we need to construct some tables so that we can make some joins. Let's also add a medical history table to store information about the patients' past medical issues. The structure will look like this:

<img src="./intro_images/rel.png" width="60%" />

In [67]:
%%sql
DROP TABLE IF EXISTS med_data;
CREATE TABLE med_data (
    ID INTEGER NOT NULL PRIMARY KEY,
    Name VARCHAR(255),
    Age INTEGER,
    Sex CHAR,
    sys INTEGER,
    dia INTEGER,
    "Heart rate" INTEGER
);

DROP TABLE IF EXISTS drug_table;
CREATE TABLE drug_table (
    ID INTEGER NOT NULL PRIMARY KEY,
    medication VARCHAR(255),
    route VARCHAR(4), 
    "freq per day" INTEGER,
    dose VARCHAR(255),
    patient_id INTEGER,
    FOREIGN KEY(patient_id) REFERENCES med_data(ID)
);

DROP TABLE IF EXISTS medical_history;
CREATE TABLE medical_history (
    ID INTEGER NOT NULL PRIMARY KEY,
    condition VARCHAR(255),
    date_diagnosed CHAR(19), 
    patient_id INTEGER,
    FOREIGN KEY(patient_id) REFERENCES med_data(ID)
);

INSERT INTO med_data (Name, Age, Sex, sys, dia, "Heart rate") VALUES("Alan Smith", 24, "M", 120, 70, 78);
INSERT INTO med_data (Name, Age, Sex, sys, dia, "Heart rate") VALUES("Maureen Gdiver", 87, "F", 156, 82, 101);
INSERT INTO med_data (Name, Age, Sex, sys, dia, "Heart rate") VALUES("Adam Blythe", 54, "M", 132, 73, 72);
INSERT INTO med_data (Name, Age, Sex, sys, dia, "Heart rate") VALUES("Darren Sanders", 34, "M", 155, 67, 120);
INSERT INTO med_data (Name, Age, Sex, sys, dia, "Heart rate") VALUES("Sally-Ann Joyce", 19, "F", 121, 72, 65);

INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("AMOXICILLIN", "PO", 3, "500mg", 1);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("IRBESARTAN", "PO", 1, "150mg", 2);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("DIGOXIN", "PO", 1, "1.5mg", 2);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("SIMVASTATIN", "PO", 1, "40mg", 3);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("RAMIPRIL", "PO", 1, "2.5mg", 4);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("WARFARIN", "PO", 1, "variable", 4);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("SENNA", "PO", 1, "15mg", 4);
INSERT INTO drug_table (medication, route, "freq per day", dose, patient_id) VALUES("None", "NA", 0, "NA", 5);

INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("LRTI", "2019-10-18 00:00:00", 1);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Appendectomy", "2004-11-05 00:00:00", 1);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Hypertension", "2003-12-12 00:00:00", 2);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Atrial fibrillation", "2003-12-12 00:00:00", 2);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("#NOF", "1992-07-06 00:00:00", 2);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Otitis media", "1990-10-18 00:00:00", 2);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Pulmonary embolism", "1987-03-12 00:00:00", 2);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Hypercholesterolemia", "2018-04-02 00:00:00", 3);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Gonorrhea", "2012-06-14 00:00:00", 3);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("RTC", "1994-12-16 00:00:00", 3);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Hypertension", "2019-08-01 00:00:00", 4);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Constipation", "2019-04-12 00:00:00", 4);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("Atrial fibrillation", "2017-05-03 00:00:00", 4);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("CVA", "2016-12-16 00:00:00", 4);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("MI", "2014-12-12 00:00:00", 4);
INSERT INTO medical_history (condition, date_diagnosed, patient_id) VALUES("PCOS", "2016-06-08 00:00:00", 5);

 * sqlite://
Done.
Done.
Done.
Done.
Done.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.


[]

In [68]:
%%sql
SELECT * FROM med_data;

 * sqlite://
Done.


ID,Name,Age,Sex,sys,dia,Heart rate
1,Alan Smith,24,M,120,70,78
2,Maureen Gdiver,87,F,156,82,101
3,Adam Blythe,54,M,132,73,72
4,Darren Sanders,34,M,155,67,120
5,Sally-Ann Joyce,19,F,121,72,65


In [45]:
%%sql
SELECT * FROM drug_table;

 * sqlite://
Done.


ID,medication,route,freq per day,dose,patient_id
1,AMOXICILLIN,PO,3,500mg,1
2,IRBESARTAN,PO,1,150mg,2
3,DIGOXIN,PO,1,1.5mg,2
4,SIMVASTATIN,PO,1,40mg,3
5,RAMIPRIL,PO,1,2.5mg,4
6,WARFARIN,PO,1,variable,4
7,SENNA,PO,1,15mg,4
8,,,0,,5


In [46]:
%%sql
SELECT * FROM medical_history;

 * sqlite://
Done.


ID,condition,date_diagnosed,patient_id
1,LRTI,2019-10-18 00:00:00,1
2,Appendectomy,2004-11-05 00:00:00,1
3,Hypertension,2003-12-12 00:00:00,2
4,Atrial fibrillation,2003-12-12 00:00:00,2
5,#NOF,1992-07-06 00:00:00,2
6,Otitis media,1990-10-18 00:00:00,2
7,Pulmonary embolism,1987-03-12 00:00:00,2
8,Hypercholesterolemia,2018-04-02 00:00:00,3
9,Gonorrhea,2012-06-14 00:00:00,3
10,RTC,1994-12-16 00:00:00,3


<div class="alert alert-block alert-info">
<b>Task 1:</b>
<br> 
We have added some conditions, for example <code>#NOF</code>. The hash is often used to denote a fracture (break) and NOF stands for neck of femur, so this is a broken hip. There are many different ways to refer to conditions like these (CVA = cerebrovascular accident (stroke), PCOS = polycystic ovary syndrome, LRTI = lower respiratory tract infection, ...). Can you see any problems with representing the data like this, and if so, what solutions could be applied?
</div>

As conditions can be referred to in a multitude of different ways, it can be very hard to extract such data. If we wanted to query type II diabetes, what would we search for (e.g. NIDDM, DM type II, type 2, etc.)? Representing information like this in electronic patient records is challenging. A simple solution would be to have another column with a more detailed textual description of the condition. A better solution, which also improves interoperability, is to use a recognised clinical coding standard (such as Read codes, ICD-10 or SNOMED). These allow a common representation of conditions and procedures using a standard coding system. Of course, the data we are using here is meant for trying out the features of SQL rather than for producing a sensible, well-designed patient record system.
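
The "extra description" idea can be sketched with a small lookup table that is joined to the history table. This is only an illustration using Python's built-in <code>sqlite3</code> module: the <code>condition_lookup</code> table and its column names are hypothetical and are not part of this notebook's schema, and a real system would rely on a recognised coding standard rather than hand-written descriptions.

```python
import sqlite3

# In-memory database so the sketch does not touch the notebook's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medical_history (
    ID INTEGER NOT NULL PRIMARY KEY,
    condition VARCHAR(255),
    patient_id INTEGER
);
-- Hypothetical lookup table: one row per shorthand used in the records.
CREATE TABLE condition_lookup (
    shorthand VARCHAR(255) PRIMARY KEY,
    description VARCHAR(255)
);
INSERT INTO medical_history (condition, patient_id)
    VALUES ('#NOF', 2), ('CVA', 4), ('PCOS', 5);
INSERT INTO condition_lookup VALUES
    ('#NOF', 'Fractured neck of femur'),
    ('CVA', 'Cerebrovascular accident (stroke)'),
    ('PCOS', 'Polycystic ovary syndrome');
""")

# Join each shorthand to its fuller description.
rows = conn.execute("""
    SELECT medical_history.patient_id, condition_lookup.description
    FROM medical_history
    INNER JOIN condition_lookup
        ON condition_lookup.shorthand = medical_history.condition
    ORDER BY medical_history.patient_id
""").fetchall()

for patient_id, description in rows:
    print(patient_id, description)
```

A lookup like this only helps if every record uses one of the listed shorthands, which is exactly the problem a shared coding standard is designed to remove.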

#### 4.1 Joins

We saw an example in the last notebook of using joins to combine data from our <code>med_data</code> and <code>drug_table</code> tables. We used the <code>patient_id</code> column in the <code>drug_table</code> table to connect to the <code>ID</code> column in the <code>med_data</code> table and return information where the systolic blood pressure was greater than 140 mmHg. 

In [14]:
%%sql 
SELECT Name, sys, medication FROM med_data 
INNER JOIN drug_table ON drug_table.patient_id = med_data.Id 
WHERE med_data.sys > 140;

 * sqlite://
Done.


Name,sys,medication
Maureen Gdiver,156,IRBESARTAN
Maureen Gdiver,156,DIGOXIN
Darren Sanders,155,RAMIPRIL
Darren Sanders,155,WARFARIN
Darren Sanders,155,SENNA


<div class="alert alert-block alert-info">
<b>Task 2:</b>
<br> 
Using the <code>medical_history</code> table, modify the query above to return the same information, this time also returning the condition and keeping only patients who have a medical history of <code>hypertension</code>. Hint: the string comparison is case sensitive, so remember to use a capital letter for Hypertension. You will also need to add another <code>INNER JOIN</code> clause. 
</div>

In [18]:
%%sql 
SELECT Name, sys, medication, condition FROM med_data 
INNER JOIN drug_table ON drug_table.patient_id = med_data.Id 
INNER JOIN medical_history ON medical_history.patient_id = med_data.Id
WHERE med_data.sys > 140 AND condition = "Hypertension";

 * sqlite://
Done.


Name,sys,medication,condition
Maureen Gdiver,156,DIGOXIN,Hypertension
Maureen Gdiver,156,IRBESARTAN,Hypertension
Darren Sanders,155,RAMIPRIL,Hypertension
Darren Sanders,155,SENNA,Hypertension
Darren Sanders,155,WARFARIN,Hypertension


In [None]:
%%sql # type in your code below


We could modify this further to include results for antihypertensive drugs only. This would let us check that patients have hypertension in their past medical history, are currently hypertensive and are taking appropriate treatment for their condition.

In [22]:
%%sql 
SELECT Name, sys, medication, condition FROM med_data 
INNER JOIN drug_table ON drug_table.patient_id = med_data.Id 
INNER JOIN medical_history ON medical_history.patient_id = med_data.Id
WHERE med_data.sys > 140 AND condition = "Hypertension" AND drug_table.medication IN ("IRBESARTAN", "RAMIPRIL");

 * sqlite://
Done.


Name,sys,medication,condition
Maureen Gdiver,156,IRBESARTAN,Hypertension
Darren Sanders,155,RAMIPRIL,Hypertension


<div class="alert alert-success">
<b>Note:</b> If we are not sure exactly how something may be written, we can use the <code>LIKE</code> keyword to look for something similar, e.g. <code>condition LIKE "%tension"</code>. This would match anything ending in "tension".
</div>
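
As a rough illustration of the difference between <code>LIKE</code> and an exact match, the sketch below uses Python's built-in <code>sqlite3</code> module on a throwaway table. The lower-case "hypertension" and the "Hypotension" rows are made-up values added only to show what the pattern catches. Note that in SQLite, <code>LIKE</code> ignores case for ASCII letters by default, whereas <code>=</code> does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medical_history (condition VARCHAR(255))")
conn.executemany(
    "INSERT INTO medical_history (condition) VALUES (?)",
    [("Hypertension",), ("hypertension",), ("Hypotension",), ("Constipation",)],
)

# '%' matches any run of characters, so this finds values ending in "tension".
like_rows = conn.execute(
    "SELECT condition FROM medical_history "
    "WHERE condition LIKE '%tension' ORDER BY rowid"
).fetchall()

# '=' is case sensitive by default, so only the exact spelling matches.
exact_rows = conn.execute(
    "SELECT condition FROM medical_history WHERE condition = 'Hypertension'"
).fetchall()

print(like_rows)
print(exact_rows)
```

The pattern also picks up "Hypotension", which is a different condition, so a loose <code>LIKE</code> pattern is worth checking by eye before relying on its results.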

<div class="alert alert-block alert-info">
<b>Task 3:</b>
<br> 
Having looked at a few <code>JOIN</code> statements, can you see any potential issues with this? 
</div>

One issue with joining data is that, depending on the number of tables and the data you want to return, you can end up with some very complex SQL join statements. This is one of the advantages of <code>NoSQL</code> alternatives like <code>MongoDB</code>, where data can be nested, which avoids the need for complex joins. Have a look at some of the differences between relational and NoSQL database types here: <a href="https://www.mongodb.com/nosql-explained" target="_blank">NoSQL Databases</a>.
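
To make the idea of nesting more concrete, the sketch below (Python's built-in <code>sqlite3</code> module, cut-down copies of the notebook's tables) shows the same information first as joined rows and then regrouped into a nested structure of the kind a document store might hold. The <code>documents</code> layout is an assumption for illustration and is not taken from any particular NoSQL product.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE med_data (ID INTEGER NOT NULL PRIMARY KEY, Name VARCHAR(255));
CREATE TABLE drug_table (
    ID INTEGER NOT NULL PRIMARY KEY,
    medication VARCHAR(255),
    patient_id INTEGER,
    FOREIGN KEY(patient_id) REFERENCES med_data(ID)
);
INSERT INTO med_data (Name) VALUES ('Maureen Gdiver');
INSERT INTO drug_table (medication, patient_id)
    VALUES ('IRBESARTAN', 1), ('DIGOXIN', 1);
""")

# Relational form: one row per (patient, medication) pair, so the name repeats.
rows = conn.execute("""
    SELECT Name, medication FROM med_data
    INNER JOIN drug_table ON drug_table.patient_id = med_data.ID
    ORDER BY drug_table.ID
""").fetchall()

# Nested form: each patient holds their own list of medications.
nested = defaultdict(list)
for name, medication in rows:
    nested[name].append(medication)
documents = [{"Name": name, "medications": meds} for name, meds in nested.items()]

print(rows)
print(documents)
```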

There are different types of joins that can be used depending on the requirements (note that not all versions of SQL support all types of join). Joins include (where A and B are different tables):

<img src="./intro_images/dbjoins.png" width="100%" />

<ul>
<li><code>Inner</code>: Records with matches in both A and B</li> 
<li><code>Left</code>: All records in A and any matches from B</li>
<li><code>Right</code>: All records in B and any matches from A</li>
<li><code>Full</code>: All records from both A and B, matched where possible</li>
</ul>

<div class="alert alert-success">
<b>Note:</b> Versions of <code>SQLite</code> before 3.39.0 do not support right (right outer join) or full (full outer join) joins.
</div>
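
The difference between an inner and a left join only shows up when some rows have no match. The sketch below uses Python's built-in <code>sqlite3</code> module with cut-down copies of the notebook's tables. As an assumption for this example only, the second patient has no row at all in <code>drug_table</code> (in the notebook's data she has a placeholder row instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE med_data (ID INTEGER NOT NULL PRIMARY KEY, Name VARCHAR(255));
CREATE TABLE drug_table (
    ID INTEGER NOT NULL PRIMARY KEY,
    medication VARCHAR(255),
    patient_id INTEGER
);
INSERT INTO med_data (Name) VALUES ('Alan Smith'), ('Sally-Ann Joyce');
-- Assumption for this sketch: the second patient has no drug_table row at all.
INSERT INTO drug_table (medication, patient_id) VALUES ('AMOXICILLIN', 1);
""")

# Same query text for both joins; only the join type changes.
query = """
    SELECT Name, medication FROM med_data
    {kind} JOIN drug_table ON drug_table.patient_id = med_data.ID
    ORDER BY med_data.ID
"""
inner_rows = conn.execute(query.format(kind="INNER")).fetchall()
left_rows = conn.execute(query.format(kind="LEFT")).fetchall()

print(inner_rows)  # only patients with a matching drug_table row
print(left_rows)   # every patient; a missing medication comes back as None (NULL)
```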

Another example is to look at all the patients who received a diagnosis in or after the year 2018.

In [57]:
%%sql 
SELECT Name, date_diagnosed FROM med_data 
INNER JOIN medical_history ON medical_history.patient_id = med_data.Id
WHERE medical_history.date_diagnosed >= "2018-01-01 00:00:00";

 * sqlite://
Done.


Name,date_diagnosed
Alan Smith,2019-10-18 00:00:00
Adam Blythe,2018-04-02 00:00:00
Darren Sanders,2019-08-01 00:00:00
Darren Sanders,2019-04-12 00:00:00
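
The date filter works even though the dates are stored as text, because strings in the <code>YYYY-MM-DD HH:MM:SS</code> format sort alphabetically in the same order as they sort chronologically. The sketch below (Python's built-in <code>sqlite3</code> module, with three rows copied from the table above) selects a single year using a half-open range, which is one common way to avoid missing dates late in the final year. The choice of 2018 is just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE medical_history (condition VARCHAR(255), date_diagnosed CHAR(19))"
)
conn.executemany(
    "INSERT INTO medical_history VALUES (?, ?)",
    [
        ("LRTI", "2019-10-18 00:00:00"),
        ("Atrial fibrillation", "2017-05-03 00:00:00"),
        ("Hypercholesterolemia", "2018-04-02 00:00:00"),
    ],
)

# Half-open range: from the start of 2018 up to, but not including, the start of 2019.
rows = conn.execute(
    """
    SELECT condition FROM medical_history
    WHERE date_diagnosed >= '2018-01-01 00:00:00'
      AND date_diagnosed <  '2019-01-01 00:00:00'
    """
).fetchall()
print(rows)

# The same ordering holds for plain Python strings in this format.
earlier_sorts_first = "2017-05-03 00:00:00" < "2018-01-01 00:00:00"
print(earlier_sorts_first)
```

This only holds when every value uses the same zero-padded format. A value such as "3/5/2017" would not compare correctly.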


<div class="alert alert-block alert-info">
<b>Task 4:</b>
<br> 
1. Adapt the above to return the patients that received a diagnosis between (and including) the years 1990 and 2017.<br />
2. Use the <code>ORDER BY</code> command to order the results by date diagnosed.
</div>

In [59]:
%%sql 
SELECT Name, date_diagnosed FROM med_data 
INNER JOIN medical_history ON medical_history.patient_id = med_data.Id
WHERE medical_history.date_diagnosed >= "1990-01-01 00:00:00" AND medical_history.date_diagnosed < "2018-01-01 00:00:00" 
ORDER BY medical_history.date_diagnosed;

 * sqlite://
Done.


Name,date_diagnosed
Maureen Gdiver,1990-10-18 00:00:00
Maureen Gdiver,1992-07-06 00:00:00
Adam Blythe,1994-12-16 00:00:00
Maureen Gdiver,2003-12-12 00:00:00
Maureen Gdiver,2003-12-12 00:00:00
Alan Smith,2004-11-05 00:00:00
Adam Blythe,2012-06-14 00:00:00
Darren Sanders,2014-12-12 00:00:00
Sally-Ann Joyce,2016-06-08 00:00:00
Darren Sanders,2016-12-16 00:00:00
Darren Sanders,2017-05-03 00:00:00


In [None]:
%%sql # type in your code below


#### 4.2 Unions

So we have seen how to combine data with joins. Another way of doing this is with a <code>union</code>. You can think of it as a way of placing tables on top of one another. For this to work, the two queries being combined need to return the same number of columns, and the columns need to be of the same or similar types. This could be used to combine data from different sources. Let's say we had these two tables of admission data from two hospitals with the same number of columns.

In [60]:
%%sql
DROP TABLE IF EXISTS admin_data_hospital_one;
CREATE TABLE admin_data_hospital_one (
    ID INTEGER NOT NULL PRIMARY KEY,
    Name VARCHAR(255),
    hospital_number VARCHAR(8),
    Age INTEGER,
    Sex CHAR
);

DROP TABLE IF EXISTS admin_data_hospital_two;
CREATE TABLE admin_data_hospital_two (
    ID INTEGER NOT NULL PRIMARY KEY,
    patient_name VARCHAR(255),
    hospital_number VARCHAR(8),
    Age INTEGER,
    Sex CHAR
);

INSERT INTO admin_data_hospital_one (Name, hospital_number, Age, Sex) VALUES("Alan Smith", "342432", 34, "M");
INSERT INTO admin_data_hospital_one (Name, hospital_number, Age, Sex) VALUES("Paul Jones", "643334", 54, "M");
INSERT INTO admin_data_hospital_one (Name, hospital_number, Age, Sex) VALUES("Mohamed Aziz", "322432", 64, "M");

INSERT INTO admin_data_hospital_two (patient_name, hospital_number, Age, Sex) VALUES("Jane Smith", "544543", 88, "F");
INSERT INTO admin_data_hospital_two (patient_name, hospital_number, Age, Sex) VALUES("Allen Daniels", "435433", 78, "M");
INSERT INTO admin_data_hospital_two (patient_name, hospital_number, Age, Sex) VALUES("Sandra Jones", "4534534", 44, "F");
INSERT INTO admin_data_hospital_two (patient_name, hospital_number, Age, Sex) VALUES("Jan Golas", "3434534", 62, "M");

 * sqlite://
Done.
Done.
Done.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.


[]

In [63]:
%%sql
SELECT * FROM admin_data_hospital_one;

 * sqlite://
Done.


ID,Name,hospital_number,Age,Sex
1,Alan Smith,342432,34,M
2,Paul Jones,643334,54,M
3,Mohamed Aziz,322432,64,M


In [64]:
%%sql
SELECT * FROM admin_data_hospital_two;

 * sqlite://
Done.


ID,patient_name,hospital_number,Age,Sex
1,Jane Smith,544543,88,F
2,Allen Daniels,435433,78,M
3,Sandra Jones,4534534,44,F
4,Jan Golas,3434534,62,M


We can combine them with a union like so:

In [62]:
%%sql
SELECT Name, hospital_number, Age, Sex FROM admin_data_hospital_one 
UNION
SELECT patient_name, hospital_number, Age, Sex FROM admin_data_hospital_two

 * sqlite://
Done.


Name,hospital_number,Age,Sex
Alan Smith,342432,34,M
Allen Daniels,435433,78,M
Jan Golas,3434534,62,M
Jane Smith,544543,88,F
Mohamed Aziz,322432,64,M
Paul Jones,643334,54,M
Sandra Jones,4534534,44,F


<div class="alert alert-success">
<b>Note:</b> <code>UNION ALL</code> will keep all rows, including duplicates, whereas <code>UNION</code> returns only unique rows.
</div>
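
The difference is easiest to see when the same row appears in both tables. The sketch below uses Python's built-in <code>sqlite3</code> module with cut-down versions of the two hospital tables. As an assumption for this example only, Alan Smith is recorded at both hospitals with identical details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admin_data_hospital_one (Name VARCHAR(255), Age INTEGER);
CREATE TABLE admin_data_hospital_two (patient_name VARCHAR(255), Age INTEGER);
INSERT INTO admin_data_hospital_one VALUES ('Alan Smith', 34), ('Paul Jones', 54);
-- Assumption for this sketch: Alan Smith is also recorded at the second hospital.
INSERT INTO admin_data_hospital_two VALUES ('Alan Smith', 34), ('Jane Smith', 88);
""")

# UNION removes the duplicate row.
union_rows = conn.execute("""
    SELECT Name, Age FROM admin_data_hospital_one
    UNION
    SELECT patient_name, Age FROM admin_data_hospital_two
    ORDER BY Name
""").fetchall()

# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute("""
    SELECT Name, Age FROM admin_data_hospital_one
    UNION ALL
    SELECT patient_name, Age FROM admin_data_hospital_two
    ORDER BY Name
""").fetchall()

print(len(union_rows), union_rows)
print(len(union_all_rows), union_all_rows)
```

Note that the column names of the combined result come from the first <code>SELECT</code>, which is why the output is headed <code>Name</code> rather than <code>patient_name</code>.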

#### 4.3 Views

We can create <code>views</code> in a relational database. A view is a database object that is searchable and is defined by a query. Views can be thought of as virtual tables containing the results of a query. Each time the view is queried, the query that defines it is run again, which can have an impact on performance. Two main uses of views are to simplify complex relationships and to improve security. In the latter case, the view can be used to restrict access to a table, giving specific users only the information they need or are allowed to see. Many views are read-only, so the data they return cannot be modified or deleted through the view. Let's create a view of the data to get the patients' age, name and heart rate for all patients that have tachycardia (heart rate > 100 bpm).   

In [75]:
%%sql
DROP VIEW IF EXISTS tachycardia;
CREATE VIEW tachycardia AS SELECT Name, "Heart rate", Age FROM med_data WHERE "Heart rate" > 100;

 * sqlite://
Done.
Done.


[]

We can treat the view like any other table, view it and perform queries on it. We can use this to hide query complexity or to hide restricted data from certain database users.

In [73]:
%%sql
SELECT * FROM tachycardia;

 * sqlite://
Done.


Name,Heart rate,Age
Maureen Gdiver,101,87
Darren Sanders,120,34


In [74]:
%%sql
SELECT Age FROM tachycardia;

 * sqlite://
Done.


Age
87
34
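
Because a view stores a query rather than a copy of the data, its results follow any changes to the underlying table. The sketch below illustrates this with Python's built-in <code>sqlite3</code> module and a cut-down <code>med_data</code> table. The change to Adam Blythe's heart rate is invented purely for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE med_data (
    ID INTEGER NOT NULL PRIMARY KEY,
    Name VARCHAR(255),
    "Heart rate" INTEGER
);
INSERT INTO med_data (Name, "Heart rate")
    VALUES ('Maureen Gdiver', 101), ('Adam Blythe', 72);
CREATE VIEW tachycardia AS
    SELECT Name, "Heart rate" FROM med_data WHERE "Heart rate" > 100;
""")

before = conn.execute("SELECT Name FROM tachycardia ORDER BY Name").fetchall()

# Update the underlying table only; the view definition is untouched.
conn.execute("UPDATE med_data SET \"Heart rate\" = 120 WHERE Name = 'Adam Blythe'")

after = conn.execute("SELECT Name FROM tachycardia ORDER BY Name").fetchall()

print(before)
print(after)
```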


<div class="alert alert-block alert-info">
<b>Task 5:</b>
<br> 
1. Create a view called <code>hypertension</code> for the following query:<br />
<code>
SELECT Name, sys, medication FROM med_data 
INNER JOIN drug_table ON drug_table.patient_id = med_data.Id 
WHERE med_data.sys > 140;
</code>
<br />
2. Using the <code>hypertension</code> view and the <code>DISTINCT</code> keyword, return the unique patient names from the view.
</div>

In [84]:
%%sql
DROP VIEW IF EXISTS hypertension;
CREATE VIEW hypertension AS
SELECT Name, sys, medication FROM med_data 
INNER JOIN drug_table ON drug_table.patient_id = med_data.Id 
WHERE med_data.sys > 140;

 * sqlite://
Done.
Done.


[]

In [85]:
%%sql
SELECT DISTINCT Name FROM hypertension;

 * sqlite://
Done.


Name
Maureen Gdiver
Darren Sanders


In [None]:
%%sql # type in your code below


In [None]:
%%sql # type in your code below


In the next notebook we will look at using some of the inbuilt SQL functions to aggregate data.