### Adapted from ST2195 - Programming for Data Science
### Authors: Christine Yuen

#### Creating and manipulating databases in Python


**Please run the code cells in order.**

# Using databases with Python

We have shown how to create, update and query a database using DB Browser for SQLite. We now show how to do the same in Python, continuing with the University example (see the data folder).

# Connect to database using Python

We import the `sqlite3` module and use its `connect` function to create a connection object, `conn`, which we use to manipulate the database `University`. If the database file does not exist yet, SQLite creates it.

In [1]:
# This makes sure you can run this notebook multiple times without errors
# Update the path to match the location of your files;
# /content is the default path inside Google Colab
import os 
try:
    os.remove('/content/University.db')
except OSError:
    pass

In [2]:
import sqlite3

# Update the path to match the location of your files;
# /content is the default path inside Google Colab
conn = sqlite3.connect('/content/University.db')
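
For quick experiments it can help to leave the file on disk untouched. As a sketch, `sqlite3.connect` also accepts the special name `:memory:`, which gives a temporary database held in RAM that disappears when the connection is closed. The connection name `scratch` and the table `Demo` below are purely illustrative and are not part of the University example.

```python
import sqlite3

# ":memory:" asks SQLite for a temporary database that lives only in RAM
scratch = sqlite3.connect(":memory:")

# Hypothetical one-column table, used only to show the connection works
scratch.execute("CREATE TABLE Demo (x INTEGER)")
scratch.execute("INSERT INTO Demo VALUES (1)")
rows = scratch.execute("SELECT x FROM Demo").fetchall()
print(rows)

scratch.close()  # the in-memory database is discarded here
```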

# Creating tables using Python

Now we are going to create some tables in the database `University`. As before, we create the tables from the data saved in the CSV files. We first load the CSV files into pandas DataFrames:

In [3]:
import pandas as pd

# Update the paths to match the location of your files
student = pd.read_csv("/content/student.csv")
course = pd.read_csv("/content/course.csv")
grade = pd.read_csv("/content/grade.csv")

We then write records stored in the DataFrames `student`, `grade` and `course` as tables to the database `University` using the `DataFrame` method `to_sql`.

In [4]:
# index=False ensures the DataFrame row index is not written into the SQL tables
student.to_sql('Student', con=conn, index=False)
course.to_sql('Course', con=conn, index=False)
grade.to_sql('Grade', con=conn, index=False)
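
By default, `to_sql` refuses to write to a table that already exists, which is one reason the first cell deletes the old database file. The sketch below shows that behaviour and the `if_exists="replace"` alternative on an in-memory database. The two rows are copied from the `Student` table in this notebook; the column name `year` and the names `demo_conn` and `demo_student` are assumptions made for illustration.

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(":memory:")

# Two Student rows from this notebook; the column name "year" is an assumption
demo_student = pd.DataFrame({
    "student_id": [201921323, 201832220],
    "name": ["Ava Smith", "Ben Johnson"],
    "year": [2, 3],
})

demo_student.to_sql("Student", con=demo_conn, index=False)

# A second plain call fails because the default is if_exists="fail"
try:
    demo_student.to_sql("Student", con=demo_conn, index=False)
    second_write_failed = False
except ValueError:
    second_write_failed = True

# if_exists="replace" drops and recreates the table instead of failing
demo_student.to_sql("Student", con=demo_conn, index=False, if_exists="replace")
n_rows = demo_conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(second_write_failed, n_rows)

demo_conn.close()
```

Whether to delete the file or use `if_exists="replace"` is a matter of taste; deleting the file also removes any tables created by hand.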

Again, we can check that the database has been created properly by opening it in DB Browser for SQLite and browsing the tables.

# Manipulate databases using Python


We can manipulate the database from Python using the `execute` and `fetchall` methods of a `sqlite3` cursor, which lets us reuse the SQL commands we have already learned. We first create a cursor object `c`:

In [5]:
c = conn.cursor()

After that, we can run the SQL commands we learned before using the cursor methods `execute` and `fetchall`. For example, to list all the tables in the database, we can run:

In [6]:
c.execute('''
SELECT name 
  FROM sqlite_master 
 WHERE type='table'
''')

<sqlite3.Cursor at 0x7fa62d5992d0>

`execute` returns the cursor itself; the rows are not returned until we call `fetchall`:

In [7]:
c.fetchall()

[('Student',), ('Course',), ('Grade',)]
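
`fetchall` consumes the result set, so calling it a second time returns an empty list. The sketch below illustrates this, together with `fetchone`, on an in-memory database. The two empty tables and the names `demo_conn` and `cur` are assumptions for illustration, and `ORDER BY name` is added only to make the row order predictable.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()

# Two empty tables, just so there is something to list
cur.execute("CREATE TABLE Student (student_id INTEGER, name TEXT)")
cur.execute("CREATE TABLE Course (course_id TEXT, name TEXT)")

cur.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
first = cur.fetchone()   # the next row, as a tuple
rest = cur.fetchall()    # all remaining rows, as a list of tuples
again = cur.fetchall()   # nothing left: the result set is exhausted
print(first, rest, again)

demo_conn.close()
```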

We can see there are three tables in the database. To browse the table `Student`, we can run:

In [8]:
c.execute("SELECT * FROM Student").fetchall()

[(201921323, 'Ava Smith', 2),
 (201832220, 'Ben Johnson', 3),
 (202003219, 'Charlie Jones', 1),
 (202045234, 'Dan Norris', 1),
 (201985603, 'Emily Wood', 1),
 (201933222, 'Freddie Harris', 2),
 (201875940, 'Grace Clarke', 2)]

Note that here we chain `execute` and `fetchall` in a single line, which works because `execute` returns the cursor.

## Add a new table

We can add a new table by running the SQL command through `execute`:

In [9]:
c.execute('''
CREATE TABLE Teacher (
    staff_id TEXT PRIMARY KEY,
        name TEXT)
''')
conn.commit() # save (commit) the changes

Listing the tables again now returns four tables. We chain `fetchall`, because `execute` on its own only returns the cursor:

In [10]:
c.execute('''
SELECT name 
  FROM sqlite_master 
 WHERE type='table'
''').fetchall()

## Delete a table

We can delete a table by running the SQL command through `execute`:

In [11]:
c.execute("DROP TABLE Teacher")
conn.commit() 

Listing the tables again now returns three tables. As before, we chain `fetchall` to retrieve the rows:

In [12]:
c.execute('''
SELECT name 
  FROM sqlite_master 
 WHERE type='table'
''').fetchall()
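
Running `CREATE TABLE` for a table that already exists, or `DROP TABLE` for one that does not, raises an `OperationalError`. One way to make such cells safe to re-run is the `IF NOT EXISTS` and `IF EXISTS` clauses that SQLite supports. The following is a minimal sketch on an in-memory database; the names `demo_conn`, `cur` and `create_sql` are illustrative.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()

create_sql = """
CREATE TABLE IF NOT EXISTS Teacher (
    staff_id TEXT PRIMARY KEY,
    name     TEXT)
"""
cur.execute(create_sql)
cur.execute(create_sql)  # second call does nothing instead of raising an error

tables_after_create = cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()

cur.execute("DROP TABLE IF EXISTS Teacher")
cur.execute("DROP TABLE IF EXISTS Teacher")  # also safe to repeat

tables_after_drop = cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print(tables_after_create, tables_after_drop)

demo_conn.close()
```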

## Insert tuples / rows

Insert the year 1 student Harper Taylor, with student id 202029744, into the table `Student`:

In [13]:
c.execute("INSERT INTO Student VALUES(202029744, 'Harper Taylor', 1)")
conn.commit() 

When we browse the table, we can see the new row has been added.

In [14]:
c.execute("SELECT * FROM Student").fetchall()

[(201921323, 'Ava Smith', 2),
 (201832220, 'Ben Johnson', 3),
 (202003219, 'Charlie Jones', 1),
 (202045234, 'Dan Norris', 1),
 (201985603, 'Emily Wood', 1),
 (201933222, 'Freddie Harris', 2),
 (201875940, 'Grace Clarke', 2),
 (202029744, 'Harper Taylor', 1)]
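
Writing values directly into the SQL string works for a fixed example, but values held in Python variables are usually passed separately through `?` placeholders, so that `sqlite3` handles the quoting. A minimal sketch on an in-memory database follows; the column name `year` is an assumption (the notebook does not show the column names of `Student`), and `demo_conn`, `cur` and `new_students` are illustrative names.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()
# Column name "year" is an assumption
cur.execute("CREATE TABLE Student (student_id INTEGER, name TEXT, year INTEGER)")

# One row: the values are supplied as a tuple, one item per "?"
cur.execute("INSERT INTO Student VALUES (?, ?, ?)",
            (202029744, "Harper Taylor", 1))

# Several rows at once with executemany
new_students = [(201921323, "Ava Smith", 2), (201832220, "Ben Johnson", 3)]
cur.executemany("INSERT INTO Student VALUES (?, ?, ?)", new_students)
demo_conn.commit()

inserted = cur.execute(
    "SELECT * FROM Student ORDER BY student_id").fetchall()
print(inserted)

demo_conn.close()
```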

## Update tuples / rows

Update the student id of student Harper Taylor to 201929744:

In [15]:
c.execute('''
UPDATE Student
   SET student_id = 201929744
 WHERE name = 'Harper Taylor'
''')
conn.commit()

When we browse the table, we can see the row has changed.

In [16]:
c.execute("SELECT * FROM Student").fetchall()

[(201921323, 'Ava Smith', 2),
 (201832220, 'Ben Johnson', 3),
 (202003219, 'Charlie Jones', 1),
 (202045234, 'Dan Norris', 1),
 (201985603, 'Emily Wood', 1),
 (201933222, 'Freddie Harris', 2),
 (201875940, 'Grace Clarke', 2),
 (201929744, 'Harper Taylor', 1)]
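
An `UPDATE` whose `WHERE` clause matches nothing succeeds silently, so it can be worth checking how many rows were changed. As a sketch, the cursor's `rowcount` attribute reports this after `execute`. The in-memory table below assumes the column name `year`, and the student name "No Such Student" is made up to show a non-matching update.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()
# Column name "year" is an assumption
cur.execute("CREATE TABLE Student (student_id INTEGER, name TEXT, year INTEGER)")
cur.execute("INSERT INTO Student VALUES (202029744, 'Harper Taylor', 1)")

cur.execute("UPDATE Student SET student_id = ? WHERE name = ?",
            (201929744, "Harper Taylor"))
updated = cur.rowcount   # number of rows changed by the last statement

# Hypothetical name that is not in the table
cur.execute("UPDATE Student SET year = ? WHERE name = ?",
            (2, "No Such Student"))
missed = cur.rowcount    # no row matched, so nothing was changed
demo_conn.commit()

final_rows = cur.execute("SELECT * FROM Student").fetchall()
print(updated, missed, final_rows)

demo_conn.close()
```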

## Delete tuples / rows

Delete the record for the student Harper Taylor from table `Student`:

In [17]:
c.execute('''
DELETE FROM Student
 WHERE name = 'Harper Taylor'
''')
conn.commit()

When we browse the table, we can see the row has been removed.

In [18]:
c.execute("SELECT * FROM Student").fetchall()

[(201921323, 'Ava Smith', 2),
 (201832220, 'Ben Johnson', 3),
 (202003219, 'Charlie Jones', 1),
 (202045234, 'Dan Norris', 1),
 (201985603, 'Emily Wood', 1),
 (201933222, 'Freddie Harris', 2),
 (201875940, 'Grace Clarke', 2)]
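
Changes such as this `DELETE` only become permanent at `conn.commit()`. Until then they can be undone with `conn.rollback()`, which is a useful safety net when a `WHERE` clause turns out to be wrong. The sketch below assumes the default transaction handling of the `sqlite3` module and uses an in-memory table with an assumed column name `year`.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()
# Column name "year" is an assumption
cur.execute("CREATE TABLE Student (student_id INTEGER, name TEXT, year INTEGER)")
cur.execute("INSERT INTO Student VALUES (202029744, 'Harper Taylor', 1)")
demo_conn.commit()  # the inserted row is now saved

cur.execute("DELETE FROM Student WHERE name = 'Harper Taylor'")
count_before_rollback = cur.execute(
    "SELECT COUNT(*) FROM Student").fetchone()[0]  # row is gone, but not committed

demo_conn.rollback()  # discard everything since the last commit
count_after_rollback = cur.execute(
    "SELECT COUNT(*) FROM Student").fetchone()[0]  # row is back
print(count_before_rollback, count_after_rollback)

demo_conn.close()
```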

# Query the database using Python

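We can query the database in Python with the same cursor methods, `execute` and `fetchall`, which run SQL commands and return the matching rows. Here we display the results as pandas DataFrames. Because `fetchall` returns plain tuples, the DataFrame columns are labelled 0, 1, and so on rather than by name. The SQL commands used here have been discussed in the previous notebooks.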

## Example 1: get all the grades for the course with `course_id` `ST101`

In [19]:
q1 = c.execute('''
SELECT final_mark 
  FROM Grade 
 WHERE course_id = 'ST101'
''').fetchall()

import pandas as pd
pd.DataFrame(q1)

    0
0  78
1  60
2  47
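
As an alternative to `fetchall` followed by `pd.DataFrame`, pandas can run the query itself through `pd.read_sql_query`, which keeps the SQL column names as DataFrame column names. The sketch below uses an in-memory `Grade` table; the rows, the short student ids and the exact column layout are illustrative assumptions, and `ORDER BY` is added only to make the row order predictable.

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE Grade (course_id TEXT, student_id INTEGER, final_mark INTEGER)")
# Illustrative rows only, not the contents of grade.csv
demo_conn.executemany("INSERT INTO Grade VALUES (?, ?, ?)", [
    ("ST101", 1, 78),
    ("ST101", 2, 60),
    ("ST115", 1, 90),
])
demo_conn.commit()

# params fills the "?" placeholder, as with cursor.execute
marks = pd.read_sql_query(
    "SELECT final_mark FROM Grade WHERE course_id = ? ORDER BY final_mark",
    demo_conn,
    params=("ST101",),
)
print(marks)

demo_conn.close()
```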


## Example 2: get the names of the students who took the course with course_id `ST101` in alphabetical order

In [20]:
q2 = c.execute('''
SELECT Student.name
  FROM Grade, Student
 WHERE Grade.course_id = 'ST101' AND Student.student_id = Grade.student_id
 ORDER BY Student.name
''').fetchall()

pd.DataFrame(q2)

               0
0      Ava Smith
1  Charlie Jones
2     Emily Wood
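
The query in Example 2 joins the two tables by listing both in `FROM` and matching them in `WHERE`. The same join can be written with an explicit `JOIN ... ON` clause, which separates the join condition from the filter. The sketch below checks on a small in-memory database that both forms return the same rows; the student names and ids come from this notebook, while the marks and the course assignments are illustrative assumptions.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()
cur.execute("CREATE TABLE Student (student_id INTEGER, name TEXT)")
cur.execute("CREATE TABLE Grade (student_id INTEGER, course_id TEXT, final_mark INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?)", [
    (201921323, "Ava Smith"),
    (201832220, "Ben Johnson"),
    (202003219, "Charlie Jones"),
])
# Marks and course assignments are made up for illustration
cur.executemany("INSERT INTO Grade VALUES (?, ?, ?)", [
    (201921323, "ST101", 70),
    (202003219, "ST101", 65),
    (201832220, "ST115", 80),
])
demo_conn.commit()

implicit = cur.execute("""
SELECT Student.name
  FROM Grade, Student
 WHERE Grade.course_id = 'ST101' AND Student.student_id = Grade.student_id
 ORDER BY Student.name
""").fetchall()

explicit = cur.execute("""
SELECT Student.name
  FROM Grade
  JOIN Student ON Student.student_id = Grade.student_id
 WHERE Grade.course_id = 'ST101'
 ORDER BY Student.name
""").fetchall()
print(implicit == explicit, explicit)

demo_conn.close()
```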


## Example 3: get the names of the courses taken by the student Ava Smith or Freddie Harris

In [21]:
q3 = c.execute('''
SELECT DISTINCT Course.name
  FROM Student, Grade, Course
 WHERE (Student.name = 'Ava Smith' OR Student.name = 'Freddie Harris') AND
        Student.student_id = Grade.student_id AND Course.course_id = Grade.course_id
''').fetchall()

pd.DataFrame(q3)

                               0
0   Programming for Data Science
1  Managing and Visualising Data
2                      Databases


## Example 4: calculate the average mark for each course, grouped by `course_id`

In [22]:
q4 = c.execute('''
SELECT course_id, AVG(final_mark) as avg_mark 
  FROM Grade 
 GROUP BY course_id
''').fetchall()

pd.DataFrame(q4)

       0          1
0  ST101  61.666667
1  ST115  82.333333
2  ST207  66.500000
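
The alias `avg_mark` does not appear in the DataFrame above because `fetchall` returns only the values. One way to recover the column names is the cursor's `description` attribute, whose entries start with the column name. The sketch below uses an in-memory `Grade` table: the three `ST101` marks are those from Example 1, while the `ST207` marks and the two-column table layout are illustrative assumptions.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
cur = demo_conn.cursor()
cur.execute("CREATE TABLE Grade (course_id TEXT, final_mark INTEGER)")
cur.executemany("INSERT INTO Grade VALUES (?, ?)", [
    ("ST101", 78), ("ST101", 60), ("ST101", 47),  # marks from Example 1
    ("ST207", 70), ("ST207", 63),                 # illustrative marks only
])
demo_conn.commit()

cur.execute("""
SELECT course_id, AVG(final_mark) AS avg_mark
  FROM Grade
 GROUP BY course_id
 ORDER BY course_id
""")

# The first item of each description entry is the column name
columns = [col[0] for col in cur.description]
records = [dict(zip(columns, row)) for row in cur.fetchall()]
print(columns)
print(records)

demo_conn.close()
```

Passing `columns=columns` to `pd.DataFrame` along with the fetched rows would label the DataFrame columns in the same way.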


# Disconnecting from the database

After we finish manipulating the database, we can close the connection using the method `close` on `conn`:

In [23]:
conn.close()
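
If an error occurs before `conn.close()` is reached, the connection stays open. A common pattern, sketched here under the assumption that a short-lived connection is acceptable, is to wrap the connection in `contextlib.closing` so that it is closed when the `with` block ends. The table `Demo` is purely illustrative.

```python
import sqlite3
from contextlib import closing

# closing() calls demo_conn.close() when the block ends, even after an error
with closing(sqlite3.connect(":memory:")) as demo_conn:
    demo_conn.execute("CREATE TABLE Demo (x INTEGER)")
    demo_conn.execute("INSERT INTO Demo VALUES (1)")
    demo_conn.commit()
    total = demo_conn.execute("SELECT COUNT(*) FROM Demo").fetchone()[0]

# The connection is closed now, so further use raises ProgrammingError
try:
    demo_conn.execute("SELECT 1")
    is_closed = False
except sqlite3.ProgrammingError:
    is_closed = True
print(total, is_closed)
```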