# Introduction to Databases and SQL #
## Databases ##
A **database** is an organised collection of data stored in a computer system. Databases let us store, retrieve, and manipulate data, and they sit behind many applications, such as websites, mobile apps, and desktop applications. The databases in this notebook are relational: they store data in tables made up of columns and rows. Columns represent the *attributes* of the data, and rows represent *individual records*.

## SQL ##
**SQL (Structured Query Language)** is the language used to interact with relational databases. It lets us create, read, update, and delete data, for example by querying a table, inserting new rows, changing existing values, or removing rows.

**The basic SQL commands are:**
- `SELECT`: Used to retrieve data from a database.
- `INSERT`: Used to insert data into a database.
- `UPDATE`: Used to update data in a database.



## Part 1 - Creating a table ##
**In the code below we will:** 
1. Connect to a new database (school.db) using duckdb.
2. Create a table called 'test' with one column, "num", which stores an integer. *Integers are whole numbers (i.e. positive, negative, or zero, with no decimal part).*
3. Insert some data into this table 
4. Query the table to see what we have inserted. 
5. Update the data in the table and query again to see the changes.
6. Close the connection to the database.

> Note: some code should only be run once, while other code can be run many times, so we have split the code into separate cells. To run the code, click on the play button for each cell in order.


In [1]:
import duckdb

# 1. Create a new database called 'school.db' in the data folder, accessed through the con variable.
# If the database already exists, this code will connect to the existing database file.
con = duckdb.connect('../data/school.db')



In [None]:
# 2. Creating the table
# Only run this code once: if you run it again it will error (because the table already exists)
con.sql("CREATE TABLE test (num INTEGER)")
con.table("test").show()

In [None]:
# 3. This code will add a new row to the table each time you run it. You can run it as many times as you want.
# Change the number to add a different value to see this in action. 
con.sql("INSERT INTO test VALUES (42)")
# 4. Query the table
con.table("test").show()

In [None]:
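# 5. Update every row where num is 42 so that it holds 17 instead, then query again to see the change.
con.sql("UPDATE test SET num = 17 WHERE num = 42")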
con.table("test").show()

In [7]:
# 6. We want to close the connection to avoid any issues with the database file.
# In the future we will do this using 'with:' statements (which use a context manager). 
con.close()

## Additional SQL Commands ##
In addition to `SELECT`, `INSERT`, and `UPDATE`, there are other SQL commands that can be used to interact with databases. We already saw how to create a table, but there are also commands to delete a table (`DROP TABLE`), delete data from a table (`DELETE FROM`), and alter a table (`ALTER TABLE`). These commands can be used to modify the structure of a database, add or remove data, and perform other operations.

## Part 2 - Modifying and deleting from tables ##
**In the code below we will:**
1. Reconnect to the database (school.db) using a self-closing context manager.
2. Alter the table 'test' by adding a new column called "name" which stores text data as `VARCHAR`. Set the default value of this column to "Boris".
3. Insert some new data into this table and query it.
4. Delete some of the data from the table and query again to see the changes.
5. Drop the table 'test' and close the connection to the database.


In [None]:
# 1. Using a context manager to avoid having to close the connection manually. We will do this from now on.
with duckdb.connect('../data/school.db') as con:
    # 2. Adding a new column to the table. NOTE: This will error if the column already exists (so only run it once)
    con.sql("ALTER TABLE test ADD COLUMN name VARCHAR DEFAULT 'Boris'")
    con.table('test').show()

In [None]:
# 3. Add a row to the table. Feel free to use your own values.
with duckdb.connect('../data/school.db') as con:
    con.sql("INSERT INTO test VALUES (42, 'Alice')")
    con.table('test').show()

In [None]:
# 4. Deleting rows. We'll get rid of the rows where name is 'Boris'.

with duckdb.connect('../data/school.db') as con:
    con.sql("DELETE FROM test WHERE name = 'Boris'")
    con.table('test').show()

In [None]:
# 5. Dropping the table. This will delete the table and all its contents.
with duckdb.connect('../data/school.db') as con:
    con.sql("DROP TABLE test")

## Part 3 - Writing your own SQL ##
Now that we have cleared out our table, we can start to apply our knowledge of SQL to a real-world problem. In the code below we will:

1. Create a new table called 'students' with the following columns:
    - `student_id` (integer) - primary key
    - `first_name` (text) - the first name of the student
    - `last_name` (text) - the last name of the student
    - `age` (integer) - the age of the student
    - `year_level` (text) - the grade of the student
2. Insert some data into this table
    - Create at least one row using a favourite fictional character
    - The `student_id` field should be unique. Later in this notebook, we will learn how to automatically generate unique ids. For now, just start at 1 and go up by 1 from there.
3. Query the table to see what we have inserted

Fill in the blanks in the code below to complete the tasks. Each task has its own cell so that you can run and debug each operation separately.

In [None]:
# 1. Creating a new table called 'students' with five columns: student_id, first_name, last_name, age, and year_level.
# Replace the ___ with the definitions of the three remaining columns (last_name, age, and year_level).
with duckdb.connect('../data/school.db') as con:
    con.sql("CREATE TABLE students (student_id INTEGER PRIMARY KEY, first_name VARCHAR, ___)")
    con.table('students').show()

In [None]:
# 2. Adding rows to the table. Add as many as you like (at least 3), just be sure to change the values and increment the id. 
# To make it sufficiently interesting, give different ages and year levels.
with duckdb.connect('../data/school.db') as con:
    # year_level is a text column, so its values are quoted like the names.
    con.sql("INSERT INTO students VALUES (1, 'Roger', 'Rabbit', 18, '11')")
    con.sql("INSERT INTO students VALUES (2, 'Jessica', 'Rabbit', 17, '10')")
    con.sql("INSERT INTO students VALUES (3, 'Bugs', 'Bunny', 13, '7')")
    con.sql("INSERT INTO students VALUES (4, 'Daffy', 'Duck', 10, '9')")

    con.table('students').show()
    

In [None]:
# 3. Querying the Database. To get all the values from a table, we just used con.table('table_name').show(). Now we want to write
# custom SQL queries to get specific data. 
# The first query will get the first name and age of all the students who are older than 15. 
# Uncomment the code and use the formatting of the first query to write the second query, 
# which will get all the students who have a year level of 12.

with duckdb.connect('../data/school.db') as con:
    old_students = con.sql("""
                           SELECT first_name, age 
                           FROM students 
                           WHERE age > 15
                           """)
    old_students.show()
    
    # year_12s =  con.sql("""
    #                   SELECT ___
    #                   """)
    # year_12s.show()

## Part 4 - Creating a sequence ##
At the moment we have to manually enter the `student_id` field for each row. This is not ideal as it is easy to make a mistake and enter the same student_id twice. To avoid this, we can create a sequence that will automatically generate unique ids for us.

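A sequence is a database object that hands out numbers in order: each time we ask it for a value, it returns the next number and never repeats one it has already given out. This ensures that our ids will always be unique when we use the sequence to generate them.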

**In the code below we will:**
1. Create a sequence called 'student_id_seq' that starts at 100 (so the generated ids cannot clash with the ones we entered by hand) and increments by 1.
2. Alter the 'students' table to set the default value of the `student_id` column to the next value in the sequence.
3. Insert some new data into the 'students' table and query to see the changes.

In [6]:
# 1. Creating a sequence. This will be used to generate unique student IDs.
# NOTE: as with any CREATE or ALTER statement, we only want to run it once. 
with duckdb.connect('../data/school.db') as con:
    con.sql("CREATE SEQUENCE student_id_seq START 100")
    # 2. Altering the students table so that student_id defaults to the next value in the sequence
    con.sql("ALTER TABLE students ALTER COLUMN student_id SET DEFAULT nextval('student_id_seq')")
    

In [None]:
# 3. Adding a new student to the table. This time we don't need to specify the ID, as it will be generated automatically.
# Make sure you have at least one pair of students with the same first name, but different last names.
with duckdb.connect('../data/school.db') as con:
    # Note: because we aren't providing an id, we need to specify the columns we are adding.
    con.sql("INSERT INTO students (first_name, last_name, age, year_level) VALUES ('Roger', 'Ramjet', 17, '11')")
    con.table('students').show()

## Part 5 - Complex SQL Conditions ##

SQL also lets us combine simple conditions into more complex ones. `AND`, `OR`, and `NOT` join or negate conditions, and comparison operators such as `=`, `!=`, `<`, `>`, `<=`, and `>=` compare values, so a single `WHERE` clause can filter rows on several criteria at once.

**In the code below, you should:**
1. Query the `students` table to find all students who are in year 10 **and** are 15 years old.
2. Query the `students` table to find all students who are in year 12 **or** are 18 years old.
3. Query the `students` table to find all students who are **not** in year 9.

You might want to add additional rows to the `students` table to test these queries.

In [None]:
with duckdb.connect('../data/school.db') as con:
    ## It's up to you now! Include all three queries (year 10 and 15 years; year 12 or 18; not in year 9)
    pass