# SQLite Python exercise

In [None]:
import sqlite3

## Intro

Create DB

In [None]:
# a database is represented as a 'connection'
# a new database file is created if none exists with the given name
con = sqlite3.connect("movie.db")

The database interface is called a 'cursor'

In [None]:
cur = con.cursor()

The cursor allows us to interact with the database using SQL

In [None]:
cur.execute("CREATE TABLE movie (title, year, score)")


Add data:

In [None]:
cur.execute("""
    INSERT INTO movie VALUES
        ('Monty Python and the Holy Grail', 1975, 8.2),
        ('And Now for Something Completely Different', 1971, 7.5)
""")
con.commit() # changes must be committed
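As a side note on committing: a connection can also be used as a context manager, which commits when the block succeeds and rolls back when it raises. The sketch below is only an illustration. It uses an in-memory database and made-up rows (`Example A`, `Example B`) so that it does not touch `movie.db`.

```python
import sqlite3

# in-memory database, used here only so the sketch is self-contained
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE movie (title, year, score)")

# the block finishes normally, so the insert is committed on exit
with demo_con:
    demo_con.execute("INSERT INTO movie VALUES (?, ?, ?)", ("Example A", 2000, 5.0))

# the block raises, so the pending insert is rolled back on exit
try:
    with demo_con:
        demo_con.execute("INSERT INTO movie VALUES (?, ?, ?)", ("Example B", 2001, 6.0))
        raise RuntimeError("simulated failure")  # hypothetical error
except RuntimeError:
    pass

titles = [row[0] for row in demo_con.execute("SELECT title FROM movie")]
print(titles)  # only the committed row remains
```

Note that the context manager handles the transaction only; it does not close the connection.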

Query data:

In [None]:
res = cur.execute("SELECT * FROM movie")
res.fetchall()

You can also add multiple rows using a one-liner:

In [None]:
data = [
    ("Monty Python Live at the Hollywood Bowl", 1982, 7.9),
    ("Monty Python's The Meaning of Life", 1983, 7.5),
    ("Monty Python's Life of Brian", 1979, 8.0),
]
cur.executemany("INSERT INTO movie VALUES(?, ?, ?)", data)
con.commit()

Notice that ? placeholders are used to bind data to the query. Always use placeholders instead of string formatting to bind Python values to SQL statements, to avoid SQL injection attacks.

What is an SQL injection attack? Consider the following:

In [None]:
# let's imagine we ask a user for their information. Instead
# they give us the following data:
row = "('dmbfkl', 1,1);DROP TABLE movie"

cur.executescript(f"INSERT INTO movie VALUES {row}")

The injected 'DROP TABLE' statement has destroyed the table:

In [None]:
try:
    cur.execute("SELECT * FROM movie")
except Exception as e:
    print(e)
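For comparison, here is a minimal sketch of the same kind of hostile input passed through `?` placeholders. It assumes a fresh in-memory database so it can run independently of the cells above, and the variable name `malicious_title` is made up for illustration. Because the value is bound as data, it is stored as an ordinary string and never executed as SQL.

```python
import sqlite3

# fresh in-memory database so the sketch does not depend on movie.db
safe_con = sqlite3.connect(":memory:")
safe_cur = safe_con.cursor()
safe_cur.execute("CREATE TABLE movie (title, year, score)")

# hostile input similar to the one above (hypothetical value)
malicious_title = "dmbfkl', 1, 1); DROP TABLE movie; --"

# the placeholder binds the whole string as a single value
safe_cur.execute("INSERT INTO movie VALUES (?, ?, ?)", (malicious_title, 1, 1))
safe_con.commit()

# the table still exists and contains the string verbatim
rows = safe_cur.execute("SELECT title, year, score FROM movie").fetchall()
print(rows)
```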


## Exercise 1: Load xlsx and store rows in a single SQLite table

Data: HUS purchase invoices (ostolaskut), Q1 2022

In [None]:
import pandas as pd

In [None]:
# read only 1000 rows, because the file is huge!
pd.read_excel("https://www.hus.fi/sites/default/files/2022-04/husin-ostolaskutiedot-q1-2022.xlsx", nrows = 1000)

In [None]:
#  your code here

## Exercise 2: There is a change! 

A provider called 'HENRY INVESTMENT OY' changes their company name to 'HENRY CAPITAL OY'. Perform the change on the records in the database. 

In [None]:
# your code here

## Exercise 3: Time for normalization!

From the previous exercise, you may have noticed how impractical it is to keep all data in a single master table. Much of the information is duplicated, so any change risks leaving the data inconsistent, and every operation has to process the whole table, which makes it expensive. This is why normalization is needed.

Normalization means breaking the table into entities and relations. The fundamental principle is to write each unit of information only once in the database, except for keys (IDs). There are exceptions, but this is usually the aim.
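To make the idea concrete, here is a small sketch on a toy example that is unrelated to the exercise data. All table names, column names and rows (`studio`, `film`, `Studio A` and so on) are invented for illustration. The studio name is stored once, and films refer to it through a key.

```python
import sqlite3

norm_con = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on
norm_con.execute("PRAGMA foreign_keys = ON")

norm_con.executescript("""
    CREATE TABLE studio (
        studio_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE film (
        film_id   INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        studio_id INTEGER NOT NULL REFERENCES studio(studio_id)
    );
""")

with norm_con:
    norm_con.execute("INSERT INTO studio (studio_id, name) VALUES (1, 'Studio A')")
    norm_con.executemany(
        "INSERT INTO film (title, studio_id) VALUES (?, ?)",
        [("Film 1", 1), ("Film 2", 1), ("Film 3", 1)],
    )

# renaming the studio touches the studio table only
with norm_con:
    changed = norm_con.execute(
        "UPDATE studio SET name = ? WHERE name = ?", ("Studio B", "Studio A")
    ).rowcount

# a join reassembles the combined view when it is needed
joined = norm_con.execute("""
    SELECT film.title, studio.name
    FROM film JOIN studio USING (studio_id)
    ORDER BY film.film_id
""").fetchall()
print(changed)
print(joined)
```

The trade-off is that reading the combined data now requires a join, in exchange for each fact living in exactly one place.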

First, perform entity relationship modelling of the table (pen-and-paper or for example using https://erdplus.com). What are primary and foreign keys, attributes and constraints for the tables? Are the relations one-to-one or one-to-many?

Then, create and fill the tables according to the ER model.

In [None]:
# your code here

Finally, try to perform the change of name from 'HENRY INVESTMENT OY' to 'HENRY CAPITAL OY'. How many rows do you need to change?

In [None]:
# your code here

BONUS: using your normalized tables, for each unit ('tulosyksikkö'), find out which provider ('toimittaja') billed the highest total during January 2022, and how much that total is.

In [None]:
# your code here