### Hello, databases

Databases are _very_ common software tools. They are fundamental infrastructure underlying the digital world. 

In this module, you will learn to use databases and understand how they work. This notebook will help you get setup with a particular kind of database called [SQLite](https://www.sqlite.org/index.html). There are lots of other kinds of databases too. We'll talk about that at the end of this notebook.

### Installation and setup

If you are using Anaconda, then it is very easy to import sqllite. The package is included in standard conda environments.

In [5]:
import sqlite3 as sql 

If you were able to import sqlite3 in the prior line, congrats you are all set up! If you are not using Anaconda, you might have to do some more work to install and configure sqlite3. 

In [6]:
## This cell shows you how to create a database

dbname = "shapedatabase"     # name the database

conn  = sql.connect(dbname)    # Connect to the database
cur = conn.cursor()            # Your connection to the database is maintained via a cursor

cur.execute("DROP TABLE shapes")   # This statement is just here in case you re-run your notebook; you can ignore it


<sqlite3.Cursor at 0x7f8247770340>

### Hello, SQL
We interact with databases via SQL. SQL is a computer language, sort of like Python. With Python you tell a computer *how* you want to do something, step-by-step. With SQL, you just [declare](https://en.wikipedia.org/wiki/Declarative_programming) *what* you want and then the computer figures out how to do it for you.

A SQL statement is a short command, either requesting information 
from a database or updating a database. The SQL statement below
creates a table called "shapes" in the database "shapedatabase." Please take a minute
to read over the `SQLStatement`.

The statement id `INTEGER PRIMARY KEY AUTOINCREMENT` says that there is a field called
id that is an ingeger and is aun automatically incremented primary key on the shapes table


In [7]:
SQLStatement = '''
CREATE TABLE shapes ( 
id INTEGER PRIMARY KEY AUTOINCREMENT, 
shape VARCHAR, 
color VARCHAR 
)'''     

cur.execute(SQLStatement) 

<sqlite3.Cursor at 0x7f8247770340>

**Questions**

Using the [docs](https://sqlite.org/lang_createtable.html) as a reference, what do you think `CREATE TABLE` does in the statement above?

[Type your answer here] 

Again using the docs, what does `VARCHAR` mean in the `SQLStatement` above?

[Type your answer here] 

Try running `cur.execute(SQLStatement)` a second time. What happens? Why do you think this occurs?

[Type your answer here]

In [8]:
### This code prints out the structure of the database 

SQLStatement = '''pragma table_info('shapes')'''     # this command requests the schema for the table shapes 
                                                     # A schema shows the layout of a database

sth = cur.execute(SQLStatement).fetchall()

print("cid", "name", "type", "primary key")
for s in sth:
    print(s[0], s[1], s[2], s[-1])

cid name type primary key
0 id INTEGER 1
1 shape VARCHAR 0
2 color VARCHAR 0


**Questions** 

How many rows are printed out above? Why do you think that is the case?

[Type your answer here]

What do you think the cid column might represent?

[Type your answer here]

Why is there a 1 in the primary key column for the first row?

[Type your answer here]


### Hello, insert statement

The whole point of a database is to store information. An [INSERT](https://www.sqlitetutorial.net/sqlite-insert/) statement is a kind of SQL statement that adds rows to a database. Please refer to the pre-class lecture for more details.

In [9]:
# If you run this code, it will insert a row into the table
insert_statement = '''INSERT INTO shapes (shape, color) VALUES ("square", "red")''' 
 
cur.execute(insert_statement)  # this line tells SQLite to run your insert statement
 

<sqlite3.Cursor at 0x7f8247770340>

### Hello, query statement

After you insert information into a database, you them must use a [SELECT](https://www.sqlitetutorial.net/sqlite-select/) statement to query rows from the database. Please refer to the pre-class lecture for more details.

In [11]:
# An QUERY statement is a kind of SQL statement that selects rows from a database
# This statement selects data from the id, shape and color columns in the database 

query_statement = '''SELECT id, shape, color FROM shapes''' 
 
sth = cur.execute(query_statement) 
results = sth.fetchall() 
for i in results: 
    print(i) 

(1, 'square', 'red')


In [14]:
# If you run this code, it will insert more rows into the table


insert_statement = '''INSERT INTO shapes (shape, color) VALUES ("triangle", "blue")''' 
cur.execute(insert_statement)  # this line tells SQLite to run your insert statement
 
insert_statement2 = '''INSERT INTO shapes (shape, color) VALUES ("triangle", "green")''' 
cur.execute(insert_statement2)  # this line tells SQLite to run your 2nd insert statement

insert_statement2 = '''INSERT INTO shapes (shape, color) VALUES ("circle", "blue")''' 
cur.execute(insert_statement2)  # this line tells SQLite to run your 2nd insert statement

<sqlite3.Cursor at 0x7f8247770340>

**Question**

What do you think will happen if you run the query `SELECT id, shape, color FROM shapes` again, now that you have run more insert statements? 

[Type your answer here]

In [15]:
query_statement = '''SELECT id, shape, color FROM shapes''' 
 
sth = cur.execute(query_statement) 
results = sth.fetchall() 
for i in results: 
    print(i) 

(1, 'square', 'red')
(2, 'triangle', 'blue')
(3, 'triangle', 'green')
(4, 'circle', 'blue')


### Hello, WHERE clause 

Query statements will select all rows from database. When you use a WHERE clause in a SELECT statement, it specifies that you are only selecting rows that match a certain condition

In [17]:
'''The statement below selects all circles from the database'''

query_statement = '''SELECT id, shape, color FROM shapes WHERE shape ="circle"''' 
 
sth = cur.execute(query_statement)
results = sth.fetchall() 
for i in results: 
    print(i) 


(4, 'circle', 'blue')


In [15]:
# Question: select all red shapes from the database

newSQLStatement = None # implement me! 

# [type your code here. Your code should print out all of the red shapes from the database]

### The wide world of databases

In this notebook, you got set up with SQLite and got practice with a few basic SQL commands. It is worth pointing out that there are lots and lots of different kinds of databases. We just picked SQLite for this assignment set because it is easy to get working. In your career in information science, you will have many database choices. 

**Postgres** and **MySQL** are standard, established and popular open-source, relational databases. Postgres in particular is a great choice for a reliable and performant database to support many applications (beware, Postgres can be slightly annoying to set up). 

Beyond reliable favorites like Postgres and MySQL, there are lots and lots of other kinds of specialized databases, which fill different software niches and use cases. For instance, standard databases store records on disk, but **VoltDB** stores records in memory. Some databases like **Oracle** are designed to support complex permission structures (i.e. who can access what record) which are sometimes needed at large organizations. **CockroachDB** replicates information in many machines across the cloud, to ensure that information is always accessible. Google **BigQuery** supports a SQL-like API for records stored on Google cloud. A SQL-like API lets you add, retrieve and update records using SQL commands.

In general, when people say “database” they mean something that supports a SQL-like API and guarantees four properties anytime you interact with the API. There are many, many articles online explaining the [ACID](https://en.wikipedia.org/wiki/ACID) properties. Recently, there has also been interest in [key-value stores](https://en.wikipedia.org/wiki/Key%E2%80%93value_database) which offer an alternative API that stores information in a systematic manner without SQL. One common key-value store is **MongoDB**.

**Question**

VoltDB stores records in memory but SQLite stores records on disk. Why do you think someone designed an in-memory database? What might be the advantage of this sort of software? What kinds of applications might be suited to VoltDB? (Hint: accessing disk is slow)

[Type your answer here]


The point of all that is not to overwhelm you! Instead, imagine that you are walking into the tool aisle at Home Depot. There are hundreds and hundreds of different tools on the shelves. When you install SQLite, you are essentially picking one particular kind of wrench. That's just a good thing to keep in mind. 