## Databases & Python

This is not a tutorial on databases or SQL. If you need one, there are many good tutorials online, such as https://www.w3schools.com/sql

This is about how to inter-operate with a database using Python.

Why might you want to do this? Here are a few examples -

- design a web page that accesses the database to display product information
- use a GUI to allow users to access the company database on an intranet
- analyse data by retrieving data from a database


#### SQL vs NoSQL databases

Most people think of Relational Databases (RDBMS) when talking of databases. Examples are -
- PostgreSQL
- Oracle
- MS SQL Server
- SQLite
- MySQL

In recent years an alternative form of storing data has emerged, called NoSQL databases. A few are -
- MongoDB
- CouchDB
- Redis

Here is a brief overview of the differences -

###### RDBMS
- uses a predefined schema
- stores data in tables, rows, and columns
- structure must be set up before it can be used to store data
- difficult to modify schema once it contains any data
- uses a standard query language - SQL

###### NoSQL
- uses dynamic schema for unstructured data
- data can be stored as documents, key-value pairs, graph structures, and more 
- you can store data without having to define its structure
- each document can have its own unique structure
- each product has its own method for querying the data - no standards
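
The difference in schema handling can be sketched in a few lines. This is only an illustration: the built-in sqlite3 module plays the relational side, and a plain list of dicts stands in for a document store (no real NoSQL product is used). The `products` table and its columns are made up for the example.

```python
import sqlite3

# Relational: the table structure is declared before any data is stored.
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", ("kettle", 24.5))

# A column that is not part of the schema is rejected.
try:
    conn.execute("INSERT INTO products (name, colour) VALUES (?, ?)", ("mug", "red"))
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

# Document-style: a list of dicts is used here as a stand-in for a document store.
# Each "document" can carry its own set of fields.
documents = [
    {"name": "kettle", "price": 24.5},
    {"name": "mug", "colour": "red", "tags": ["kitchen", "gift"]},
]

print(schema_enforced)
print([sorted(doc) for doc in documents])
conn.close()
```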

NoSQL databases have become popular, and definitely have their place. However, this workshop will focus on relational databases.

#### DB-API 2.0

Each RDBMS supplier publishes an API allowing it to be accessed programmatically. In the early days of Python, different projects were started by different people to write a Python adaptor for each database. They worked, but as they were all different it was difficult to change from one RDBMS supplier to another.

Therefore it was agreed to produce a specification to provide consistency between the various adaptors. The first version, DB-API 1.0, was published in 1996. The second version, DB-API 2.0, was published in 1999. There have been attempts to come up with a revised version since then, but for various reasons they have not been successful. Therefore DB-API 2.0 is the standard on which all current database adaptors are based. The full specification can be read here - https://www.python.org/dev/peps/pep-0249/

Here is a selection of database adaptors -

- PostgreSQL - psycopg2
- SQL Server - pyodbc
- sqlite3 - the built-in 'sqlite3' module
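
One visible effect of the specification is that every compliant adaptor module exposes the same module-level constants describing itself. A minimal sketch using the built-in sqlite3 module (chosen only because it needs no installation) -

```python
import sqlite3

api_level = sqlite3.apilevel        # DB-API version the module claims to support
param_style = sqlite3.paramstyle    # placeholder style expected in SQL statements
thread_safety = sqlite3.threadsafety  # integer 0-3: how safely the module can be shared between threads

print(api_level, param_style, thread_safety)
```

The same three names should be available on other DB-API 2.0 adaptors, although their values will differ.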

#### Connection object

All adaptors support the creation of a Connection object, using a module-level function called connect().

```
import psycopg2
conn = psycopg2.connect(<connection string>)
```

The connection string will vary according to the database installation.

Connection objects support the following methods -

- .cursor() returns a Cursor object (see next)

- .commit() commits the current transaction (see Transactions)

- .rollback() rolls back the current transaction (see Transactions)

- .close() closes the connection
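
Here is a minimal sketch of these four methods working together. It assumes the built-in sqlite3 module with an in-memory database, so nothing needs to be installed; the `customers` table is made up for the example. Other adaptors should behave similarly, but details of transaction handling can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_name TEXT)")

cur.execute("INSERT INTO customers (customer_name) VALUES (?)", ("Alice",))
conn.commit()      # make the first insert permanent

cur.execute("INSERT INTO customers (customer_name) VALUES (?)", ("Bob",))
conn.rollback()    # discard the uncommitted second insert

cur.execute("SELECT customer_name FROM customers")
names = [row[0] for row in cur.fetchall()]
print(names)       # only the committed row remains

conn.close()
```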


#### Cursor object

All adaptors support the creation of a Cursor object.

```
cur = conn.cursor()
```

A cursor object is the means by which we issue SQL statements to the database and get the results.

Cursor objects support the following methods -

- .execute(statement, parameters)
      prepare and execute a SQL statement (query or command)
      parameters can be supplied as a tuple or a dictionary (see Parameters)
- .executemany(statement, sequence of parameters)
      prepare a SQL statement (query or command) and execute it once for each set of parameters in the sequence
- .fetchone()
      fetch the next row of a result set, or None if no rows available
- .fetchmany(size=cursor.arraysize)
      fetch the next set of rows from the result set, up to a maximum of 'size'
      the set is returned as a list of tuples, or an empty list if no rows available
      size defaults to cursor.arraysize, which can be set as a cursor attribute
- .fetchall()
      fetch all remaining rows from the result set
      the set is returned as a list of tuples, or an empty list if no rows available
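
The methods above can be sketched as follows, again assuming the built-in sqlite3 module and a made-up `products` table. Note how each fetch call continues from where the previous one stopped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

# executemany runs the same statement once per parameter tuple
cur.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("kettle",), ("mug",), ("plate",), ("spoon",), ("fork",)],
)

cur.execute("SELECT id, name FROM products ORDER BY id")
first = cur.fetchone()        # the first row, as a tuple
next_two = cur.fetchmany(2)   # the next two rows, as a list of tuples
rest = cur.fetchall()         # whatever is left
exhausted = cur.fetchone()    # None, because no rows remain

print(first, next_two, rest, exhausted)
conn.close()
```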

Although it is only listed as an optional extension in DB-API 2.0, most adaptors allow a cursor to return its results in the form of an iterator -

```
cur = conn.cursor()
cur.execute(<statement>)
for row in cur:
    # each row is returned as a tuple
    print(row)
```


#### Parameters

You may want to execute a query that looks like this -

```
cur.execute(f"SELECT * FROM customers WHERE customer_name = '{customer_name}'")
```

It is strongly recommended that the query is changed to this -

```
cur.execute("SELECT * FROM customers WHERE customer_name = ?", (customer_name, ))
```

This is known as a parameterised query. The value for customer_name is replaced by a placeholder, and the value is supplied in the form of a tuple. You can have many values and many placeholders, provided that the tuple contains as many values as there are placeholders.

This is recommended for various reasons -

- if the same query is executed more than once with different customer names, the parameterised version allows the database engine to optimise the query.
- if the value for customer_name comes from an untrusted source (e.g. a web site) the un-parameterised version is vulnerable to SQL injection if the source is malicious. The parameterised version is immune to SQL injection.
- if the value is in the form of a Python object such as a datetime.date or a decimal.Decimal, the adaptor will generally handle converting it into the appropriate form without you having to worry about it.
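
The SQL injection point is easiest to see with a small experiment. The sketch below assumes the built-in sqlite3 module and a made-up `customers` table, and feeds the same hostile value to both forms of the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customer_name TEXT)")
cur.executemany("INSERT INTO customers VALUES (?)", [("Alice",), ("Bob",)])

# A value crafted to change the meaning of the WHERE clause
customer_name = "nobody' OR '1'='1"

# Unsafe: the value is pasted into the SQL text and becomes part of the statement
cur.execute(f"SELECT * FROM customers WHERE customer_name = '{customer_name}'")
unsafe_rows = cur.fetchall()

# Safe: the value is passed separately and is only ever treated as data
cur.execute("SELECT * FROM customers WHERE customer_name = ?", (customer_name,))
safe_rows = cur.fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```

The un-parameterised query returns every customer, because the injected `OR '1'='1'` is always true. The parameterised query returns nothing, because no customer has that literal name.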

A complication is that different database adaptors use different placeholders. sqlite3 and pyodbc use '?', psycopg2 uses '%s'.
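
One possible way to cope with this is to look at the module's `paramstyle` constant, which DB-API 2.0 requires every adaptor to define. The helper below, `placeholder_for`, is hypothetical and only covers the styles mentioned here. A `SimpleNamespace` stands in for psycopg2 (which reports 'pyformat' and accepts a positional '%s') so that the sketch runs without it being installed.

```python
import sqlite3
from types import SimpleNamespace


def placeholder_for(module):
    """Hypothetical helper: return the positional placeholder for a DB-API module."""
    style = module.paramstyle
    if style == "qmark":
        return "?"
    if style in ("format", "pyformat"):
        # assumption: the adaptor accepts a positional %s, as psycopg2 does
        return "%s"
    raise ValueError(f"unsupported paramstyle: {style}")


# Stand-in for an adaptor that reports 'pyformat'; not the real psycopg2 module
fake_psycopg2 = SimpleNamespace(paramstyle="pyformat")

sqlite_query = f"SELECT * FROM customers WHERE customer_name = {placeholder_for(sqlite3)}"
pg_query = f"SELECT * FROM customers WHERE customer_name = {placeholder_for(fake_psycopg2)}"

print(sqlite_query)
print(pg_query)
```

Only the placeholder is built with string formatting here. The values themselves are still passed separately to execute().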


#### ORMs


#### Transactions
