<img src="./intro_images/MIE.PNG" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right"><a href="https://alandavies.netlify.com" target="_blank">Dr Alan Davies</a></div>
            <div style="text-align: right">Senior Lecturer in Health Data Science</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
         <td>
             <img src="./intro_images/alan.PNG" width="30%" />
         </td>
     </tr>
</table>

# 1.0 Introduction to SQL and Databases
****

#### About this Notebook
This notebook introduces the concept of relational database systems with SQL.

<div class="alert alert-block alert-warning"><b>Learning Objectives:</b> 
<br/> At the end of this notebook you will be able to:
    
- Investigate key features of relational database systems

- Explore the structure of SQL databases

</div> 

<a id="top"></a>

<b>Table of contents</b><br>

1.1 [Making tables](#makingtables)

1.2 [Using basic queries](#basicqueries)

SQL (often pronounced "sequel") stands for <code>Structured Query Language</code> and is used for creating, deleting, adding, changing and retrieving data stored in relational database systems. This set of notebooks works through some of the basic concepts of relational databases using SQL. It is not an exhaustive explanation of every feature of SQL, but rather an introduction that covers the main concepts and commands. We recommend supplementing this workbook with other resources if you want to extend your knowledge of SQL beyond this introduction.

Relational databases store data in tables (like a spreadsheet). Each table (ideally) stores data about one kind of thing. For example, a <code>customer</code> table would store relevant details about customers for a business. There may be another separate table for <code>products</code> that stores information about product type, price etc. Relationships can be made between the various tables, which is where the term relational database comes from. SQL allows us to manage the data stored within these structured databases. There are many different database systems that implement SQL, including:<br />
<ul>
<li>SQLite</li>
<li>MySQL</li>
<li>PostgreSQL</li>
<li>Microsoft SQL Server</li>
</ul><br />
For the purpose of these workbooks, we will be using <code>SQLite</code>.

SQLite is a <code>relational database management system</code> (RDBMS) that is mainly used for small programs or mobile development. It was designed to be lightweight and fast. To get SQLite to work in the Jupyter notebook environment we first need to run a few commands. We will run these commands at the start of each notebook in order to run SQL code in Jupyter.

In [1]:
%load_ext sql

In [2]:
%sql sqlite://

The two lines above load the SQL extension for the Jupyter notebook and open a connection for SQLite.

<div class="alert alert-success">
<b>Note:</b> In the notebook we use the <code>%sql</code> command before each single-line SQL statement, and <code>%%sql</code> at the top of a cell that contains multiple lines of SQL. These prefixes are only needed in the notebook environment. When you use SQL in other settings, you do not need to precede SQL statements in this way.
</div>
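
As a minimal sketch of what "other settings" can look like, the example below assumes Python's built-in <code>sqlite3</code> module instead of the <code>%sql</code> extension used in this notebook. It opens a temporary in-memory database (comparable to <code>sqlite://</code>) and runs a statement with no <code>%sql</code> prefix. The variable names are illustrative only.

```python
import sqlite3

# ":memory:" creates a temporary database that exists only while the
# connection is open (an assumption made for this sketch)
conn = sqlite3.connect(":memory:")

# Run a plain SQL statement; no %sql or %%sql prefix is needed here
version = conn.execute("SELECT sqlite_version();").fetchone()[0]
print("Connected to SQLite version", version)

conn.close()
```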

<a id="makingtables"></a>
#### 1.1 Making tables

In a relational database, data is stored in database objects called <code>tables</code>. Tables organise data in rows and columns. The columns in a database table are often referred to as <code>fields</code> and each row represents a single <code>record</code>. Tables have some important characteristics: each row contains a single value for each column, each value in a column has the same data type (more on this later), and each table holds information about a specific concept. For example, the <code>stock</code> table below contains information about some stock for a shop. The field <code>Stock item</code> contains the stock item for each row. The highlighted row represents a single record, or item, in the table.

<img src="./intro_images/exampletable.PNG" width="60%" />

Relational databases are used to form relationships between tables. Often in larger systems we would have multiple tables to store different information. For example, we might have a table with patients’ names and addresses, while another could contain their past medical history and yet another could contain a list of their medication. Storing all this data in a single table is possible, but it is an inefficient way of storing and retrieving data and would also lead to duplication of data. Consider the two tables below, which contain some admission data and some information about the patients’ medication. Although we have separated out the tables, there is still some duplication here (i.e. the hospital number and the patient’s name). If we made a mistake or wanted to update a field like the name, we would have to do this across multiple tables, and failing to update every copy would leave the tables inconsistent with each other.

<img src="./intro_images/badtables.PNG" width="80%" />

To overcome this, we instead store this data in multiple separate tables and form <code>relationships</code> between them, linking them with a unique id:

<img src="./intro_images/goodtables.PNG" width="80%" />
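
The idea of linking tables with a unique id can be sketched as follows. This is only an illustration, assuming Python's built-in <code>sqlite3</code> module; the table names, column names and drug names are hypothetical and are not taken from the figures above. The patient's name is stored once, and each medication row refers to it through <code>patient_id</code>.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables for illustration only
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE medication (patient_id INTEGER, drug TEXT)")

conn.execute("INSERT INTO patients VALUES (1, 'Alan Smith')")
conn.executemany(
    "INSERT INTO medication VALUES (?, ?)",
    [(1, "Aspirin"), (1, "Ramipril")],
)

# SQL that combines the two tables using the shared patient_id
query = """
    SELECT patients.name, medication.drug
    FROM patients
    JOIN medication ON medication.patient_id = patients.patient_id
    ORDER BY medication.drug
"""
rows = conn.execute(query).fetchall()
print(rows)

# Correcting the name in one place is reflected in every linked row
conn.execute("UPDATE patients SET name = 'Alan Smyth' WHERE patient_id = 1")
rows_after = conn.execute(query).fetchall()
print(rows_after)
```

Because the name lives in a single row of <code>patients</code>, one update is enough to correct it everywhere it is displayed.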

We will examine how to link data stored in different tables later. First we need to become familiar with some of the common SQL statements for manipulating and querying data in a single table.

The code below removes the table <code>med_data</code> if it already exists and then creates a new table in the database called <code>med_data</code> with the columns (fields) <code>Name</code>, <code>Age</code>, <code>Sex</code>, <code>Blood pressure</code> and <code>Heart rate</code>. Column names are written in double quotes here, which is required for names that contain a space. We will then add some data to our table for some fictitious patients. Also note that each statement ends with a semi-colon (<code>;</code>).

In [3]:
%%sql
DROP TABLE IF EXISTS med_data;
CREATE TABLE med_data ("Name", "Age", "Sex", "Blood pressure", "Heart rate");

 * sqlite://
Done.
Done.


[]
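
The reason for starting with <code>DROP TABLE IF EXISTS</code> is that it lets the cell be run more than once without an error. A minimal sketch of this, assuming Python's built-in <code>sqlite3</code> module rather than the <code>%%sql</code> cell above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Running the same two statements twice succeeds because the existing
# table is removed before it is created again
for _ in range(2):
    conn.execute("DROP TABLE IF EXISTS med_data;")
    conn.execute(
        'CREATE TABLE med_data ("Name", "Age", "Sex", "Blood pressure", "Heart rate");'
    )

# Read the column names back from an (empty) query result
cursor = conn.execute("SELECT * FROM med_data;")
columns = [col[0] for col in cursor.description]
print(columns)
```

Without the <code>DROP TABLE</code> line, the second <code>CREATE TABLE</code> would fail because the table already exists.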

Now that we have created our table with column names, we can use the <code>INSERT INTO</code> command to specify the table we want to add data to (<code>med_data</code>). We then supply the data as a comma-separated list after the <code>VALUES</code> keyword, in the same order as the columns.

In [4]:
%%sql
INSERT INTO med_data VALUES('Alan Smith', 24, 'M', '120/70', 78);
INSERT INTO med_data VALUES('Maureen Gdiver', 87, 'F', '156/82', 82);
INSERT INTO med_data VALUES('Adam Blythe', 54, 'M', '132/73', 72);
INSERT INTO med_data VALUES('Darren Sanders', 34, 'M', '120/70', 67);
INSERT INTO med_data VALUES('Sally-Ann Joyce', 19, 'F', '121/72', 65);

 * sqlite://
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.


[]
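
When rows are inserted from a program rather than typed by hand, the values are usually passed separately from the SQL text. The sketch below assumes Python's built-in <code>sqlite3</code> module and uses its <code>?</code> placeholders; it is an illustration of the same <code>INSERT INTO ... VALUES</code> statement, not part of the notebook's <code>%%sql</code> workflow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE med_data ("Name", "Age", "Sex", "Blood pressure", "Heart rate");'
)

# One tuple per record, in the same order as the table's columns
patients = [
    ("Alan Smith", 24, "M", "120/70", 78),
    ("Maureen Gdiver", 87, "F", "156/82", 82),
]

# Each ? is filled in from the matching position of a tuple
conn.executemany("INSERT INTO med_data VALUES (?, ?, ?, ?, ?);", patients)

row_count = conn.execute("SELECT COUNT(*) FROM med_data;").fetchone()[0]
print(row_count)  # 2
```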

To ensure that we put the right data into the right field, we can specify which fields we want to insert the values into. It is best practice to do this to avoid issues related to incorrect data entry. For example:

In [8]:
%%sql
INSERT INTO med_data (Name, Age, Sex, "Blood pressure", "Heart rate")
VALUES('David Davies', 34, 'M', '124/73', 88);

 * sqlite://
1 rows affected.


[]
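
To see why naming the fields helps, the sketch below (assuming Python's built-in <code>sqlite3</code> module) lists the columns in a different order from the table definition. Each value still ends up in the field it was paired with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE med_data ("Name", "Age", "Sex", "Blood pressure", "Heart rate");'
)

# The column list is deliberately out of order; values follow the list,
# not the order used when the table was created
conn.execute(
    """
    INSERT INTO med_data ("Heart rate", "Name", "Age", "Sex", "Blood pressure")
    VALUES (88, 'David Davies', 34, 'M', '124/73');
    """
)

row = conn.execute("SELECT * FROM med_data;").fetchone()
print(row)  # ('David Davies', 34, 'M', '124/73', 88)
```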

<div class="alert alert-success">
<b>Note:</b> SQL keywords such as <code>SELECT</code> and <code>INSERT INTO</code> are not case sensitive, but by convention they are written in upper case so that they stand out from table and column names.
</div>
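
As a small illustration of the note above, assuming SQLite accessed through Python's built-in <code>sqlite3</code> module, the same query written in upper and lower case returns the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE med_data ("Name", "Age");')
conn.execute("INSERT INTO med_data VALUES ('Alan Smith', 24);")

# Same query, different letter case for the keywords
upper = conn.execute("SELECT * FROM med_data;").fetchall()
lower = conn.execute("select * from med_data;").fetchall()

print(upper == lower)  # True
```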

<a id="basicqueries"></a>
#### 1.2 Using basic queries  

To retrieve all the information in our table we can write a <code>query</code>. A query is how we ask a question about our data. In the example below we select all the columns (using the asterisk) and specify which table we want the data from (in this case the <code>med_data</code> table).

In [9]:
%%sql 
SELECT * FROM med_data;

 * sqlite://
Done.


Name,Age,Sex,Blood pressure,Heart rate
Alan Smith,24,M,120/70,78
Maureen Gdiver,87,F,156/82,82
Adam Blythe,54,M,132/73,72
Darren Sanders,34,M,120/70,67
Sally-Ann Joyce,19,F,121/72,65
Alan Smith,24,M,120/70,78
David Davies,34,M,124/73,88


<div class="alert alert-danger">
<b>Note:</b> In practice we would rarely want to return the contents of an entire table, because with large amounts of data this would be very time consuming and inefficient. Instead we would typically write queries that return only the subsets of the information we are interested in.
</div>

We can extract individual columns by specifying the column name in place of the star symbol. For example:

In [30]:
%%sql 
SELECT "Blood pressure" FROM med_data;

 * sqlite://
Done.


Blood pressure
120/70
156/82
132/73
120/70
121/72
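
The same column selection can be sketched outside the notebook, assuming Python's built-in <code>sqlite3</code> module and a two-row copy of the table. Several columns can be requested at once by separating their names with commas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE med_data ("Name", "Age", "Sex", "Blood pressure", "Heart rate");'
)
conn.executemany(
    "INSERT INTO med_data VALUES (?, ?, ?, ?, ?);",
    [
        ("Alan Smith", 24, "M", "120/70", 78),
        ("Maureen Gdiver", 87, "F", "156/82", 82),
    ],
)

# One column: every returned row is a tuple with a single value
blood_pressures = conn.execute('SELECT "Blood pressure" FROM med_data;').fetchall()

# Two columns: names are separated by a comma
name_and_bp = conn.execute('SELECT "Name", "Blood pressure" FROM med_data;').fetchall()

print(blood_pressures)
print(name_and_bp)
```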


<div class="alert alert-block alert-info">
<b>Task 1:</b>
<br> 
In the cells below:<br />
1. Try selecting the <code>Age</code> from the <code>med_data</code> table<br> 
2. Try selecting the <code>Heart rate</code> from the <code>med_data</code> table<br> 
3. Try selecting the <code>Age</code> and the <code>Heart rate</code> together<br> 
</div>

In [None]:
%%sql 
SELECT "Age" FROM med_data;

In [None]:
%%sql 
SELECT "Heart rate" FROM med_data;

In [None]:
%%sql 
SELECT "Age","Heart rate" FROM med_data;

In [None]:
%%sql # type in your code below


In [None]:
%%sql # type in your code below


<div class="alert alert-block alert-info">
<b>Task 2:</b>
<br> 
1. Using the <code>INSERT INTO</code> command, add a record (row) to the <code>med_data</code> table<br />
2. Using <code>SELECT &ast;</code>, view the entire table (<code>med_data</code>)
</div>

In [None]:
%%sql 
INSERT INTO med_data VALUES('Alan Davies', 38, 'M', '156/83', 67);

In [None]:
%%sql
SELECT * FROM med_data;

In [None]:
%%sql # type in your code below


In [None]:
%%sql # type in your code below


What if we make a mistake or notice an error in a record? We can use the <code>UPDATE</code> command to change existing data in a table. Let’s say that we accidentally entered the same blood pressure for <code>Alan Smith</code> and <code>Darren Sanders</code>. Darren’s correct blood pressure recording should be <code>155/67</code>.

In [35]:
%%sql
UPDATE med_data SET "Blood pressure" = '155/67' WHERE Name = 'Darren Sanders';
SELECT * FROM med_data;

 * sqlite://
1 rows affected.
Done.


Name,Age,Sex,Blood pressure,Heart rate
Alan Smith,24,M,120/70,78
Maureen Gdiver,87,F,156/82,82
Adam Blythe,54,M,132/73,72
Darren Sanders,34,M,155/67,67
Sally-Ann Joyce,19,F,121/72,65


<div class="alert alert-block alert-info">
<b>Task 3:</b>
<br> 
Although this works, can you think of any possible issues with this approach to updating data?
</div>

If there were two (or more) patients with the same name, the update would match all of them and set all of their blood pressure results to the same value. This could cause big problems. We will look at how to overcome this in the next workbook, where we look at data types and database schemas.

<div class="alert alert-success">
<b>Note:</b> If you omit the <code>WHERE</code> clause, the update will be applied to every record in the table.
</div>
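
Both risks can be demonstrated with a small sketch, assuming Python's built-in <code>sqlite3</code> module. The second "Darren Sanders" record and its blood pressure value are hypothetical and exist only to show the effect; <code>rowcount</code> reports how many records each statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE med_data ("Name", "Blood pressure");')
conn.executemany(
    "INSERT INTO med_data VALUES (?, ?);",
    [
        ("Darren Sanders", "120/70"),
        ("Darren Sanders", "118/64"),  # hypothetical second patient, same name
        ("Alan Smith", "120/70"),
    ],
)

# Matching on the name changes both patients called Darren Sanders
updated_by_name = conn.execute(
    """
    UPDATE med_data SET "Blood pressure" = '155/67'
    WHERE "Name" = 'Darren Sanders';
    """
).rowcount

# With no WHERE clause, every record in the table is changed
updated_without_where = conn.execute(
    """
    UPDATE med_data SET "Blood pressure" = '100/60';
    """
).rowcount

print(updated_by_name, updated_without_where)  # 2 3
```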

Another useful thing to be able to do is to delete records from a table.

Here we will delete the patient <code>Adam Blythe</code> from the table.

In [36]:
%%sql
DELETE FROM med_data WHERE Name = 'Adam Blythe';
SELECT * FROM med_data;

 * sqlite://
1 rows affected.
Done.


Name,Age,Sex,Blood pressure,Heart rate
Alan Smith,24,M,120/70,78
Maureen Gdiver,87,F,156/82,82
Darren Sanders,34,M,155/67,67
Sally-Ann Joyce,19,F,121/72,65


If you want to delete all the records from a table, you can write <code>DELETE FROM med_data;</code> where <code>med_data</code> is the name of the table you want to delete the records from. This removes the records but leaves the (now empty) table in place.
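
The difference between deleting one record and deleting every record can be sketched as follows, assuming Python's built-in <code>sqlite3</code> module and a cut-down copy of the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE med_data ("Name", "Age");')
conn.executemany(
    "INSERT INTO med_data VALUES (?, ?);",
    [("Alan Smith", 24), ("Adam Blythe", 54), ("Sally-Ann Joyce", 19)],
)

# With a WHERE clause only the matching record is removed
deleted_one = conn.execute(
    'DELETE FROM med_data WHERE "Name" = ?;', ("Adam Blythe",)
).rowcount

# Without a WHERE clause all remaining records are removed
conn.execute("DELETE FROM med_data;")

# The table itself still exists, so it can still be queried
remaining = conn.execute("SELECT COUNT(*) FROM med_data;").fetchone()[0]
print(deleted_one, remaining)  # 1 0
```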

In the next notebook, we will look at using data types and modelling data using database schemas.

### Notebook details
<br>
<i>Notebook created by <strong>Dr. Alan Davies</strong> 

Publish date: March 2021<br>
Review date: March 2022</i>

Please give your feedback using the button below:

<a class="typeform-share button" href="https://hub11.typeform.com/to/Cz5cyIIB" data-mode="popup" style="display:inline-block;text-decoration:none;background-color:#3A7685;color:white;cursor:pointer;font-family:Helvetica,Arial,sans-serif;font-size:18px;line-height:45px;text-align:center;margin:0;height:45px;padding:0px 30px;border-radius:22px;max-width:100%;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;font-weight:bold;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale;" target="_blank">Rate this notebook </a> <script> (function() { var qs,js,q,s,d=document, gi=d.getElementById, ce=d.createElement, gt=d.getElementsByTagName, id="typef_orm_share", b="https://embed.typeform.com/"; if(!gi.call(d,id)){ js=ce.call(d,"script"); js.id=id; js.src=b+"embed.js"; q=gt.call(d,"script")[0]; q.parentNode.insertBefore(js,q) } })() </script>

## Notes: