# Introduction to Databases and SQL #
## Databases ##
A **database** is a collection of data stored in a computer system. Databases allow us to store, retrieve, and manipulate data, and they are used in many applications, such as websites, mobile apps, and desktop applications. Relational databases, the kind used in this notebook, store data in tables. Each table has columns and rows: columns represent the *attributes* of the data, and rows represent *individual records*.
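
To make the column and row idea concrete, here is a minimal sketch. It assumes Python's built-in `sqlite3` module as a stand-in database (chosen only so the snippet runs without extra installs), and the `planets` table is a made-up example rather than part of this notebook.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table with two columns (attributes): name and moons.
conn.execute("CREATE TABLE planets (name VARCHAR, moons INTEGER)")

# Each tuple becomes one row (one individual record).
conn.executemany(
    "INSERT INTO planets VALUES (?, ?)",
    [("Earth", 1), ("Mars", 2)],
)

cursor = conn.execute("SELECT * FROM planets")
columns = [d[0] for d in cursor.description]  # the column names
rows = cursor.fetchall()                      # one tuple per row

print(columns)
print(rows)
conn.close()
```

Every row has one value for each column, which is why a table is often pictured as a grid.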

## SQL ##
**SQL (Structured Query Language)** is a programming language used to interact with databases. SQL lets us perform operations on a database such as querying, inserting, updating, and deleting data.

**The basic SQL commands are:**
- `SELECT`: Used to retrieve data from a database.
- `INSERT`: Used to insert data into a database.
- `UPDATE`: Used to update data in a database.
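
As a rough sketch of how these three commands fit together, the snippet below inserts a value, reads it back, updates it, and reads it again. It assumes Python's built-in `sqlite3` module as a stand-in for the database, and the `scores` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (points INTEGER)")  # hypothetical table

# INSERT adds a new row.
conn.execute("INSERT INTO scores VALUES (10)")

# SELECT reads rows back without changing them.
before = conn.execute("SELECT points FROM scores").fetchall()

# UPDATE changes the rows that match the WHERE condition.
conn.execute("UPDATE scores SET points = 25 WHERE points = 10")
after = conn.execute("SELECT points FROM scores").fetchall()

print(before, after)
conn.close()
```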



## Part 1 - Creating a table ##
**In the code below we will:** 
1. Connect to a new database (school.db) using DuckDB.
2. Create a table called 'test' with one column: "num" which stores an integer. *Integers are whole numbers (i.e. positive, negative, or zero, with no decimal part).*
3. Insert some data into this table.
4. Query the table to see what we have inserted.
5. Update the data in the table and query again to see the changes.
6. Leave the connection open, because the later parts of this notebook reuse it.

> Note: Some code only needs to run once, while other code can be run many times, so the steps are split into separate cells. To run the code, click the play button on each cell in order.


In [2]:
# You just have to run this cell once to load the database.
import duckdb
import pandas as pd

%load_ext sql
conn = duckdb.connect('../data/school.db')
%sql conn --alias duckdb

In [3]:
%%sql
CREATE OR REPLACE TABLE test (num INTEGER);

Count


In [4]:
%%sql

-- This code will add a new row to the table each time you run it. You can run it as many times as you want.
-- Change the number to add a different value to see this in action. 
INSERT INTO test VALUES (42);
CHECKPOINT;

SELECT * FROM test

num
42


In [5]:
%%sql
-- Change the value of a row in the table.
UPDATE test SET num = 17 WHERE num = 42;

CHECKPOINT;
SELECT * FROM test;

num
17


## Additional SQL Commands ##
In addition to `SELECT`, `INSERT`, and `UPDATE`, there are other SQL commands that can be used to interact with databases. We already saw how to create a table, but there are also commands to delete a table (`DROP TABLE`), delete data from a table (`DELETE FROM`), and alter a table (`ALTER TABLE`). These commands can be used to modify the structure of a database, add or remove data, and perform other operations.
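
The sketch below walks through these three commands in order. It assumes Python's built-in `sqlite3` module as a stand-in for the database, and the `inventory` table is a made-up example. The last step shows that a dropped table can no longer be queried; the exact error type is specific to `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item VARCHAR)")  # hypothetical table
conn.execute("INSERT INTO inventory VALUES ('pencil')")

# ALTER TABLE changes the structure: existing rows pick up the default value.
conn.execute("ALTER TABLE inventory ADD COLUMN quantity INTEGER DEFAULT 0")
after_alter = conn.execute("SELECT * FROM inventory").fetchall()

# DELETE FROM removes rows but keeps the (now empty) table.
conn.execute("DELETE FROM inventory WHERE item = 'pencil'")
after_delete = conn.execute("SELECT * FROM inventory").fetchall()

# DROP TABLE removes the table itself, so a later query fails.
conn.execute("DROP TABLE inventory")
try:
    conn.execute("SELECT * FROM inventory")
    table_exists = True
except sqlite3.OperationalError:
    table_exists = False

print(after_alter, after_delete, table_exists)
conn.close()
```

The difference between the last two commands is worth remembering: `DELETE FROM` empties a table, while `DROP TABLE` removes it entirely.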

## Part 2 - Modifying and deleting from tables ##
**In the code below we will:**
1. Keep using the connection to the database (school.db) that we opened in Part 1.
2. Alter the table 'test' by adding a new column called "name" which stores text data as `VARCHAR`. Set the default value of this column to "Boris".
3. Insert some new data into this table and query it.
4. Delete some of the data from the table and query again to see the changes.
5. Drop the table 'test'.


In [6]:
%%sql
-- Adding a new column to the table. NOTE: This will error if the column already exists (so only run it once)
ALTER TABLE test ADD COLUMN name VARCHAR DEFAULT 'Boris';

SELECT * FROM test;

num,name
17,Boris


In [7]:
%%sql
-- 3. Add a row to the table. Feel free to use your own values.
INSERT INTO test VALUES (42, 'Alice');

SELECT * FROM test;

num,name
17,Boris
42,Alice


In [8]:
%%sql
DELETE FROM test WHERE name = 'Boris';

SELECT * FROM test;

num,name
42,Alice


In [9]:
%%sql
DROP TABLE test;

Success


## Part 3 - Writing your own SQL ##
Now that we have dropped the `test` table, we can start to apply our knowledge of SQL to a real-world problem. In the code below we will:

1. Create a new table called 'students' with the following columns:
    - `student_id` (integer) - primary key
    - `first_name` (text) - the first name of the student
    - `last_name` (text) - the last name of the student
    - `age` (integer) - the age of the student
    - `year_level` (integer) - the grade of the student
2. Insert some data into this table
    - Create at least one row using a favourite fictional character
    - The `student_id` field should be unique. Later in this notebook, we will learn how to automatically generate unique ids. For now, just start at 1 and go up by 1 from there.
3. Query the table to see what we have inserted
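
To see why a primary key matters, here is a minimal sketch of what happens when the same id is inserted twice. It assumes Python's built-in `sqlite3` module as a stand-in database and a hypothetical `books` table; DuckDB also rejects duplicate primary keys, but reports the problem with its own error type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: book_id is the primary key, so it must be unique.
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title VARCHAR)")
conn.execute("INSERT INTO books VALUES (1, 'Matilda')")

try:
    # Reusing id 1 breaks the uniqueness rule, so the database refuses it.
    conn.execute("INSERT INTO books VALUES (1, 'The BFG')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print(duplicate_rejected, row_count)
conn.close()
```

The rejected row is never stored, so the table still holds only the first book.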

Fill in the blanks in the code below to complete the tasks. Each task has its own cell so that you can run and debug each operation separately.

In [10]:
%%sql
-- 1. Creating a new table called 'students' with five columns: student_id, first_name, last_name, age, and year_level.
-- Uncomment the code and complete it to create the table.

-- CREATE TABLE students (student_id INTEGER PRIMARY KEY, first_name VARCHAR, );


SELECT * FROM students;

student_id,first_name,last_name,age,year_level
1,Roger,Rabbit,18,11
2,Jessica,Rabbit,17,10
3,Bugs,Bunny,13,7
4,Daffy,Duck,10,9
100,Charlie,Bucket,17,11
101,Roger,Ramjet,17,11


In [11]:
%%sql

-- 2. Adding rows to the table. Add as many as you like (at least 5), just be sure to change the values and increment the id. 
-- Be sure to include different ages and year levels. The id must be unique for each row.
-- INSERT INTO students VALUES (1, 'Roger', 'Rabbit', 18, 11);

CHECKPOINT;

SELECT * FROM students;
    

student_id,first_name,last_name,age,year_level
1,Roger,Rabbit,18,11
2,Jessica,Rabbit,17,10
3,Bugs,Bunny,13,7
4,Daffy,Duck,10,9
100,Charlie,Bucket,17,11
101,Roger,Ramjet,17,11


In [12]:
%%sql
-- 3. Querying the database. So far we have used SELECT * to get every row and column from a table. Now we want to write
-- custom SQL queries to get specific data.
-- This first query gets the first name and age of all the students who are older than 15.
-- Use it as a template for the query in the next cell,
-- which will get all the students who have a year level of 12.
 
SELECT first_name, age 
    FROM students 
    WHERE age > 15;


first_name,age
Roger,18
Jessica,17
Charlie,17
Roger,17


In [13]:
%%sql

-- Write your own query to get the first name and last name of all students in year 12.




## Part 4 - Creating a sequence ##
At the moment we have to manually enter the `student_id` field for each row. This is not ideal as it is easy to make a mistake and enter the same student_id twice. To avoid this, we can create a sequence that will automatically generate unique ids for us.

A sequence is a database object that hands out numbers in order: each time we ask it for the next value, it returns a number it has not returned before. This ensures that our ids will always be unique when we use the sequence to generate them.
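
As a rough mental model only, a sequence behaves like a counter that never repeats itself. The sketch below imitates one in plain Python with `itertools.count`; the `nextval` helper here is a hypothetical stand-in written for this example, not the database function itself, and a real sequence is stored inside the database rather than in Python.

```python
import itertools

# A stand-in for a sequence that starts at 100 and goes up by 1.
student_id_seq = itertools.count(start=100, step=1)

def nextval(seq):
    """Return the next unused number from the counter."""
    return next(seq)

# Each new student receives the next number, so no id is handed out twice.
new_students = ["Roger Ramjet", "Charlie Bucket", "Bugs Bunny"]
assigned = {name: nextval(student_id_seq) for name in new_students}

print(assigned)
```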

**In the code below we will:**
1. Create a sequence called 'student_id_seq' that starts at 100 (well above the ids we entered by hand, so the two cannot collide) and increments by 1.
2. Alter the 'students' table to set the default value of the `student_id` column to the next value in the sequence.
3. Insert some new data into the 'students' table and query to see the changes.

In [None]:
%%sql
;
-- 1. Creating a sequence. This will be used to generate unique student IDs.
--  NOTE: as with any CREATE or ALTER statement, we only want to run it once. 
CREATE OR REPLACE SEQUENCE student_id_seq START 100;

--  2. Altering the students table 
ALTER TABLE students ALTER COLUMN student_id SET DEFAULT nextval('student_id_seq');

CHECKPOINT;

In [None]:
%%sql
-- 3. Adding a new student to the table. This time we don't need to specify the ID, as it will be generated automatically.
-- Make sure you have at least one pair of students with the same first name, but different last names.
INSERT INTO students (first_name, last_name, age, year_level) VALUES ('Roger', 'Ramjet', 17, 11);
    

## Part 5 - Complex SQL Conditions ##

In addition to simple conditions, SQL allows us to use complex conditions to filter data. We can use `AND`, `OR`, and `NOT` to combine conditions and create more complex queries. We can also use comparison operators such as `=`, `!=`, `<`, `>`, `<=`, and `>=` to compare values. These operators can be used to filter data based on specific criteria.
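
The sketch below shows one example of each combining keyword. It deliberately uses a made-up `pets` table rather than `students`, so the exercises that follow are left for you to solve. It assumes Python's built-in `sqlite3` module as a stand-in database, and `names` is a small helper written just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (name VARCHAR, species VARCHAR, age INTEGER)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [("Rex", "dog", 3), ("Tom", "cat", 7), ("Felix", "cat", 2), ("Nemo", "fish", 1)],
)

def names(where_clause):
    """Return the pet names matching a condition, sorted alphabetically."""
    query = f"SELECT name FROM pets WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(query)]

# AND: both conditions must be true.
both = names("species = 'cat' AND age > 5")
# OR: at least one condition must be true.
either = names("species = 'dog' OR age < 2")
# NOT: keeps the rows where the condition is false.
negated = names("NOT (species = 'fish')")

print(both, either, negated)
conn.close()
```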

**In the code below, you should:**
1. Query the `students` table to find all students who are in year 10 **and** are 15 years old.
2. Query the `students` table to find all students who are in year 12 **or** are 18 years old.
3. Query the `students` table to find all students who are **not** in year 9.

You might want to add additional rows to the `students` table to test these queries.

In [None]:
%%sql

-- It's up to you now! 
-- 1. All students who are in year 10 AND are 15 years old.
SELECT * FROM students WHERE year_level = 10 AND age = 15;

In [None]:
%%sql
-- 2. All students who are in Year 12 OR are 18 years old.



In [None]:
%%sql
-- 3. All students who are not in Year 9



In [15]:
%%sql
CHECKPOINT;

Success
