# Introduction to Data Persistence

-----

Persisting data is an important task, and not just for data science applications. Programs may need to persist data to preserve state, to share information, and to improve performance. As a result, many different approaches exist for saving data, spanning everything from basic file input/output techniques to enterprise-level database management software. In this lesson, we will introduce the most popular data persistence technology, the relational database management system (RDBMS). We will also introduce SQLite, a file-based database that is built into Python.

-----

## Table of Contents

[Database Systems](#Database-Systems)

- [Database Roles](#Database-Roles)
- [ACID Test](#ACID-Test)
- [Relational Database](#Relational-Database)
- [SQL: Structured Query Language](#SQL:-Structured-Query-Language)

[SQLite](#SQLite)

- [SQLite Overview](#SQLite-Overview)
- [SQLite Data Types](#SQLite-Data-Types)

[Using SQLite](#Using-SQLite)
- [Creating and Populating a Database](#Creating-and-Populating-a-Database)
- [SQL Select](#SQL-Select)
- [SQL Delete](#SQL-Delete)
- [SQL Update](#SQL-Update)

-----
[[Back to TOC]](#Table-of-Contents)

## Database Systems

Whether you realize it or not, as you surf the Internet you're interacting with a variety of database-backed web applications. This nomenclature may be unfamiliar, but it simply means that a website you visit is dynamically created using data saved in a database. To demonstrate, consider the following types of web sites that you may visit:

- An information portal, like [Yahoo][1]

- A newspaper web site to catch up on the local news or sports

- A financial web site, like that of a bank or investment institution, to monitor your financial portfolio

- A map website to find driving directions

- A search engine where you can identify interesting web sites for more detailed information on a subject

Each of these examples uses databases to store, locate, and retrieve information dynamically. In each of these applications, the website collects necessary information from the user (such as a street address), queries the application database, and assembles the requested data into a suitable visual result.

Many of these database systems are large and complex; imagine holding all the map information needed to provide accurate driving directions with pictures! Clearly, storing data and making it available to applications is a big task, one that has been addressed by a number of open-source projects and commercial vendors that provide solutions optimized for different tasks. Many of these open-source or commercial database systems provide full, enterprise-class capabilities. As a result, they can hold enormous quantities of data, concurrently interact with a large number of users, and scale across large computational systems.

We can broadly classify these systems into two categories:

1. Relational database management systems like the open-source [MySQL][2] and [PostgreSQL][3], and commercial systems like [IBM DB2][4], [Microsoft SQL Server][5], or [Oracle Database][6] that rely on a tabular data model.

2. NoSQL (or Not only SQL) systems that abandon the tabular data model to achieve a simpler design, better scaling, or higher availability than is traditionally possible with relational databases. NoSQL databases can be classified by their data model, and include key-value stores like Amazon's [Dynamo][7], object databases like [ZODB][8], document stores like [MongoDB][9], and column databases like [Cassandra][10] or [HBase][11], which are open-source systems that draw on Google's [BigTable][bt] model.

While the NoSQL databases are extremely interesting, many of them have been developed to meet the **big data** challenges faced by companies like Google, Facebook, or Amazon. For the rest of this week's lessons, we will focus on relational database systems.

-----
[rdb]: https://en.wikipedia.org/wiki/Relational_database
[nosql]: https://en.wikipedia.org/wiki/NoSQL
[1]: http://yahoo.com
[2]: https://www.mysql.com
[3]: http://www.postgresql.org
[4]: http://www-01.ibm.com/software/data/db2/
[5]: http://www.microsoft.com/en-us/server-cloud/products/sql-server/
[6]: https://www.oracle.com/database/index.html
[7]: https://aws.amazon.com/dynamodb/
[8]: http://www.zodb.org/en/latest/
[9]: https://www.mongodb.org
[bt]: https://en.wikipedia.org/wiki/BigTable
[10]: https://cassandra.apache.org
[11]: https://hbase.apache.org

### ACID Test

Diamonds are obviously a valuable commodity, so valuable that counterfeits are a serious concern. One simple and (at least, in the movies) popular test to determine whether a diamond is real is to run it across a piece of glass. Because diamonds are one of the hardest materials known, a real diamond easily cuts the glass surface; a fake, especially if it's made of glass itself, won't.

To a software developer, databases are equally valuable. If you use a database, you want to be sure it will safely store your data and let you easily retrieve the data later. You also want your database to allow multiple programs (or people) to work with the database without interfering with each other. To demonstrate, imagine you own a bank. The database for your bank must do the following, among other things:

- Safely store the appropriate data
- Quickly retrieve the appropriate data
- Support multiple, concurrent user sessions

These tasks can be collectively referred to as the ACID test; ACID is an acronym for Atomicity, Consistency, Isolation, and Durability.

**Atomicity** means that operations with the database can be grouped together and treated as a single unit. This unit of work, typically bundled as a *transaction*, ensures that either all operations are performed successfully (a *commit*), or none of them are performed (a *rollback*). In other words, a database can't be in an unfinished state.

**Consistency** guarantees that only valid data are written to the database, and that this process follows all rules and constraints present in the database.

To understand why these two characteristics are important, think about a bank transaction during which money is transferred from a savings account into a checking account. If the transfer process fails after subtracting the money from your savings account and before adding it to your checking account, you would lose money, and the bank would have an angry (ex)customer! Atomicity enables the two operations, the subtraction from the savings account and the addition to the checking account, to be treated as a single transaction. Consistency guarantees that only valid data is written during the transaction or rollback. That way, your money isn't lost.
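
The transfer scenario can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not how a real bank would do it: the `accounts` table, the `transfer` helper, and the balances are all hypothetical, and a `CHECK` constraint stands in for the bank's consistency rules. The `with conn:` block commits if every statement succeeds and rolls back if any statement raises an error.

```python
import sqlite3

# In-memory database with a hypothetical two-account table.
# The CHECK constraint is the consistency rule: no negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("savings", 500.0), ("checking", 100.0)],
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts as one transaction; True if committed."""
    try:
        with conn:  # commit on success, rollback if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The second UPDATE violated the CHECK constraint, so the first
        # UPDATE was rolled back along with it.
        return False


ok = transfer(conn, "savings", "checking", 200.0)       # valid transfer
failed = transfer(conn, "savings", "checking", 1000.0)  # would overdraw savings
final = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, final)
```

The second transfer credits the checking account before the debit fails, yet the final balances show no trace of that credit, because the whole unit of work was undone.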

**Isolation** means that independent sets of database transactions are performed in such a way that they don't conflict with each other. Continuing the bank analogy, consider two customers who transfer funds between accounts at the same time. The database must track both transfers separately; otherwise, the funds could go into the wrong accounts, and the bank might be left with two angry (ex)customers.

**Durability** guarantees that the database is safe against unexpected terminations. It may be a minor inconvenience if your television or computer doesn't work when the power goes out, but the same can't be said for a database. If the bank's computers lose power when transferring your funds, you won't be a happy customer if the transaction is lost. Durability guarantees that if the database terminates abnormally during a funds transfer, then when the database is brought back up, it will be able to recover the transaction and continue with normal operations.

Passing the ACID test is nontrivial, and many simple databases fall short. For critical e-business or Web-based applications, passing the ACID test is a must. This is one of the reasons so many companies and individuals utilize enterprise-level database systems, such as Oracle Database, Microsoft SQL Server, MySQL, or IBM DB2. These databases are fully compliant with the ACID test and can meet many of the data persistence needs of large corporations or organizations. However, to do so often requires a large team that includes database administrators, database developers, and database application developers to ensure that data is effectively persisted and available as necessary for business applications.

-----

### Relational Database

A relational database stores data in relations made up of records with fields. The relations are usually represented as tables; each record is usually shown as a row, and the fields as columns. In most cases, each record will have a unique identifier, called a key, which is stored as one of its fields. Records may also contain keys that refer to records in other tables, which enables us to combine information from two or more sources. 
<img src="images/db.png"/>


Related tables are often grouped together into a schema. You can think of a schema as a container for all the related structure definitions within a particular database. A table name must be unique within a given schema. Thus, by using schemas, you can have identically named objects (such as tables) enclosed within different schemas. 

In an abstract sense, these database concepts may seem confusing, but in practice they're fairly straightforward. For example, imagine you own a store called Bigdog's Surf Shop that sells a variety of items like sunglasses, shirts, and so on. If you want to be profitable, you must keep a close eye on your inventory so you can easily order additional inventory or change vendors to keep your overhead to a minimum. One simple method for tracking this information is to write entries in a table-like format:

**<DIV ALIGN=CENTER>Suppliers Table </DIV>**

| Supplier# | Supplier Name |
| ----- | ---------- |
| 101 | Mikal Arroyo |
| 102 | Quiet Beach Industries |

**<DIV ALIGN=CENTER>Product Table </DIV>**

| Item# | Price | Supplier# | Stock Date | Description |
| ---- | ----- | ---- | ---------- | ----------- |
| 1 | 29.95 | 101 | 1/15/15 | Basic Sunglasses |
| 2 | 9.95 | 101 | 12/14/14 | Generic Shirt |
| 3 | 99.95 | 102 | 8/04/14 | Boogie Board |



From this simple visual design you can easily map the business logic straight into database tables. You have two database tables, `Suppliers` and `Products`, which are naturally linked by the supplier number. The data types for the columns in each table are easy to determine. Later in this lesson we will actually create this sample schema for Bigdog's Surf Shop, which consists of these two tables, in a SQLite database. 
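
As a rough sketch of how the supplier number links the two tables, the following Python code builds both tables in an in-memory SQLite database and follows the key from a product to its supplier. The column names and the ISO-style date strings are assumptions made for this illustration; the rows come from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers (
        supplierNumber INTEGER PRIMARY KEY,
        supplierName TEXT
    );
    CREATE TABLE Products (
        itemNumber INTEGER PRIMARY KEY,
        price REAL,
        supplierNumber INTEGER REFERENCES Suppliers (supplierNumber),
        stockDate TEXT,
        description TEXT
    );
    INSERT INTO Suppliers VALUES
        (101, 'Mikal Arroyo'),
        (102, 'Quiet Beach Industries');
    INSERT INTO Products VALUES
        (1, 29.95, 101, '2015-01-15', 'Basic Sunglasses'),
        (2, 9.95, 101, '2014-12-14', 'Generic Shirt'),
        (3, 99.95, 102, '2014-08-04', 'Boogie Board');
""")

# Step 1: read the supplier key stored in the product record.
(supplier_key,) = conn.execute(
    "SELECT supplierNumber FROM Products WHERE description = ?",
    ("Boogie Board",),
).fetchone()

# Step 2: use that key to find the matching record in the other table.
(supplier_name,) = conn.execute(
    "SELECT supplierName FROM Suppliers WHERE supplierNumber = ?",
    (supplier_key,),
).fetchone()

print(supplier_key, supplier_name)
```

The product record stores only the key; the supplier's name lives in one place, so renaming a supplier means changing a single row.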

### SQL: Structured Query Language

Database systems can be complex pieces of software, especially when they scale to support enterprise-level applications. As a result, you may expect that every database has its own application programming interface (API) and that these APIs may be different from one system to the next. When relational databases were first developed, this was the case; but, fortunately, a number of vendors agreed to develop a standard language for accessing and manipulating relational databases. This language is officially called Structured Query Language (or SQL, often pronounced "sequel").

SQL has two main components: a Data Definition Language (DDL) and a Data Manipulation Language (DML). DDL commands are used to create, modify, or delete items (such as tables) in a database. DML commands are used to add, modify, delete, or select data from a table in the database. In this lesson, we will use DDL commands to create new tables and DML commands to populate, query, and modify the data they hold.

[[Back to TOC]](#Table-of-Contents)

## SQLite

Not all applications are this demanding, especially when you're starting out and trying to learn the basic relational database concepts. If you're just learning to work with databases, or if you want to quickly prototype a database application, most commercial database systems can be cumbersome. Fortunately, open-source, ACID-compliant database systems exist, including the zero-configuration, serverless relational database system known as [SQLite][1]. By using SQLite, you can learn to work with a relational database either by using SQL or by using the Python programming language. If you later find that your application needs more capability, you can always migrate to a more powerful database system.

-----

[1]: https://www.sqlite.org

### SQLite Overview

SQLite is quite different from traditional relational database systems. SQLite does not have a separate server process; instead, SQLite is a software library that, as the website states:

> implements a self-contained, serverless, zero-configuration,
> transactional SQL database engine.

Before progressing, let's examine each of these concepts in turn:

- *Self-contained*: Nothing else is needed to use SQLite but the software library. Since, by default, this comes with Python, we can use SQLite without any additional software downloads or installs. In addition, if you want to embed SQLite in your own application, you can obtain a single ANSI-C file that contains the entire SQLite library.

- *Serverless*: We interact with the SQLite database by using the SQLite library. The database is stored in a single file that is platform independent (so you can simply copy it over to a new machine with no further effort).

- *Zero-configuration*: SQLite does not use a server process, so there is no configuration required. While you can customize SQLite to change [default limits][1], for most applications this is unnecessary. You can also pre-specify certain options for the `sqlite3` command line client in a separate configuration file (e.g., `.sqliterc`, which is located in the current user's home directory).

- *Transactional*: A transaction is a logical set of operations. SQLite is ACID-compliant by implementing [atomic commits][2], which means that either every operation within the transaction completes successfully or none of them do. No partial writes are persisted, so the database is always in a consistent state.

With this power, it is even more surprising that the SQLite library is quite small and can be compacted to as small as **300 KB** if required.

SQLite by default will store data in a single database file; however, it can also be used as an _in memory_ database. SQLite has been distributed as a component within the Python language for many years, but also has a stand-alone command line interface client, called `sqlite3` that we will use in this lesson to create a database, create schema within that database, and to import data.
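
The difference between a file-backed and an in-memory database can be sketched from Python, whose standard library includes a `sqlite3` module. The file name `example.db` and the `notes` table are hypothetical; the special name `":memory:"` is what requests an in-memory database.

```python
import os
import sqlite3
import tempfile

# A file-backed database: the whole database lives in this one file.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")  # hypothetical name
file_conn = sqlite3.connect(db_path)  # creates the file if it is missing
file_conn.execute("CREATE TABLE notes (body TEXT)")
file_conn.execute("INSERT INTO notes VALUES ('saved to disk')")
file_conn.commit()
file_conn.close()

# Reconnecting to the same path finds the data again.
reopened = sqlite3.connect(db_path)
file_rows = reopened.execute("SELECT body FROM notes").fetchall()
reopened.close()

# An in-memory database disappears when its connection is closed.
mem_conn = sqlite3.connect(":memory:")
mem_conn.execute("CREATE TABLE notes (body TEXT)")
mem_conn.close()

# A new in-memory connection starts empty: no tables are defined.
fresh = sqlite3.connect(":memory:")
mem_tables = fresh.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
fresh.close()

print(file_rows, mem_tables)
```

In-memory databases are convenient for experiments and tests, since nothing needs to be cleaned up afterwards.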

-----
[1]: https://www.sqlite.org/limits.html
[2]: https://www.sqlite.org/atomiccommit.html

### SQLite Data Types

SQL, being a programming language in its own right, defines a rich data-type hierarchy. Persisting these data types is one of the most important responsibilities of the database. As databases have become more powerful, this type hierarchy has grown more complex. But most simple databases don't require the full range of allowed types, and often they need to store only numerical, character, and date or time data. 

While the SQL standard defines basic [data types][1], different database systems can support the standard to varying degrees. While this might seem odd, doing so provides more flexibility in allowing a particular implementation to achieve a market niche. In the case of SQLite, the design decisions support a compact, zero-configuration database file that is platform-independent. As a result, [SQLite does not support][2] a rich data type hierarchy, and instead focuses on ease-of-use. 

SQLite supports five storage classes:

- **NULL**: A null value.

- **INTEGER**: A signed integer; the number of bytes used (1, 2, 3, 4, 6, or 8) depends on the magnitude of the value.

- **REAL**: A floating-point value stored as an 8 byte IEEE floating-point value.

- **TEXT**: A string of character values stored in the default database encoding (e.g., UTF-8).

- **BLOB**: A blob of data stored *exactly* as is in the database.

Note that SQLite does not support Boolean or Date/Time values directly. Instead, Boolean values are encoded as INTEGERs (0 = False, 1 = True). Likewise Date/Time values can be encoded either as TEXT, REAL, or INTEGER values. For full details, see the [SQLite documentation][sd]. 
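
One way to see the storage classes in action is SQLite's built-in `typeof()` function, called here through Python's `sqlite3` module. This is a small sketch: the sample values and the `flags` table are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ask SQLite which storage class it assigns to each bound Python value.
values = [None, 42, 3.14, "surf", b"\x00\x01"]
storage_classes = [
    conn.execute("SELECT typeof(?)", (value,)).fetchone()[0]
    for value in values
]
print(storage_classes)

# Booleans and dates have no storage class of their own: a Python bool is
# stored as an INTEGER, and this date is stored as ISO-formatted TEXT.
conn.execute("CREATE TABLE flags (inStock INTEGER, stockDate TEXT)")
conn.execute("INSERT INTO flags VALUES (?, ?)", (True, "2015-01-15"))
row = conn.execute(
    "SELECT inStock, typeof(inStock), stockDate, typeof(stockDate) FROM flags"
).fetchone()
print(row)
```

Storing dates as `YYYY-MM-DD` text has the side benefit that the strings sort in chronological order.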

-----
[sd]: https://www.sqlite.org/lang.html
[1]: https://en.wikipedia.org/wiki/SQL#Data_types
[2]: https://www.sqlite.org/datatype3.html

[[Back to TOC]](#Table-of-Contents)

## Using SQLite

By default, the `sqlite3` command line client operates in interactive mode. However, this tool will also read and execute commands from a separate file by redirecting STDIN, or from a quoted string passed as a command line argument. Since SQLite databases are files, unless explicitly created from within a program as in-memory databases, we pass the name of the database file as a command line argument. Thus, to connect to the `i2ds` database with the `sqlite3` command line client in interactive mode, we simply enter the following at a command prompt:

```sql
/home/data_scientist/database: $ sqlite3 i2ds
SQLite version 3.19.3 2017-06-08 14:26:16
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> 
```

To exit from the `sqlite3` client, press Ctrl-D or use the `.quit` command. The `sqlite3` client accepts both SQL commands and dot commands, which are instructions to the client itself that begin with a `.` character. The available dot commands can be listed by entering `.help` at the `sqlite3` client prompt. We can do this from within our Notebook by creating and using a file as shown below.

-----

### Executing Terminal Commands in Jupyter Notebooks

In a Jupyter Notebook, you can execute terminal commands in a code cell by prepending an exclamation mark (`!`) to the command. This is equivalent to running the command at a terminal prompt. For example, if the notebook server is running on Linux or macOS, you can list all files in the current folder by executing `!ls` in a code cell. If the notebook server is running on Windows, you can use `!dir`.

In the rest of the notebook, we will use the terminal command `!sqlite3` to interact with SQLite databases.

**Note**: `!sqlite3` is not Python code, so it won't work inside a Python script. We will introduce how to interact with a SQLite database from Python in lesson three.

### Creating and Populating a Database

We can easily create and populate a database by using the `sqlite3` client. While we could do this at the command line (and advanced users are encouraged to do so), we can also complete these tasks from within a notebook. The steps we must complete include:

1. Create the new database. We do this by simply passing the name of our new database to the `sqlite3` client. If the file does not exist, a new file will be created. This file will hold the entire contents of the new database.

2. Create the schema for our new database. A relational database is built on a tabular data model. Thus our schema consists of the table definitions, as well as the relationships that might exist between tables. To accomplish this, we must execute SQL `CREATE TABLE` statements.

3. Populate the tables with data. For simplicity, we can use the `.import` command within the `sqlite3` client to import data from a file directly into the relevant table in our database.
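
For comparison, the same three steps can be sketched in Python with the built-in `sqlite3` and `csv` modules. This is only an approximation of what the `.import` dot command does: the comma-delimited text below is a stand-in for a data file, and the rows are inserted with an ordinary `INSERT` statement.

```python
import csv
import io
import sqlite3

# Step 1: create the database (in memory here; pass a file name to persist it).
conn = sqlite3.connect(":memory:")

# Step 2: create the schema with a CREATE TABLE statement.
conn.execute(
    "CREATE TABLE mySuppliers (supplierNumber INT NOT NULL, supplierName TEXT)"
)

# Step 3: populate the table from delimited text, one row per line.
csv_text = (
    "101,Luna Vista Limited\n"
    "102,Mikal Arroyo Incorporated\n"
    "103,Quiet Beach Industries\n"
)
rows = [(int(number), name) for number, name in csv.reader(io.StringIO(csv_text))]
with conn:  # commit all inserts together
    conn.executemany("INSERT INTO mySuppliers VALUES (?, ?)", rows)

supplier_count = conn.execute("SELECT COUNT(*) FROM mySuppliers").fetchone()[0]
print(supplier_count)
```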

In the following code cells, we will create a database named `testdb`, use the SQL Data Definition Language (DDL) to create two tables, `mySuppliers` and `myProducts`, and then populate the tables with values.

First, we will create a text file containing SQL statements by using the notebook cell magic `%%writefile filename`. This magic writes the remaining content of the cell to a text file (`filename`). We can then use the `!sqlite3` command to execute the statements in the file to create tables in the database.

In [1]:
%%writefile create.sql

-- First we drop any tables if they exist

DROP TABLE IF EXISTS mySuppliers;
DROP TABLE IF EXISTS myProducts;

-- Vendor Table
    
CREATE TABLE mySuppliers (
    supplierNumber INT NOT NULL,
    supplierName TEXT
) ;

-- Product Table
    
CREATE TABLE myProducts (
    itemNumber INT NOT NULL,
    price REAL,
    supplierNumber INT,
    stockDate TEXT,
    description TEXT
) ;

Overwriting create.sql


In [2]:
# Now create the schema in a new test database

!sqlite3 testdb < create.sql

Let's examine some of the SQL statements in the cells above.
```
DROP TABLE IF EXISTS mySuppliers;
```
drops table mySuppliers from the database, if the table already exists.

```
CREATE TABLE mySuppliers (
    supplierNumber INT NOT NULL,
    supplierName TEXT
) ;
```
creates the table mySuppliers, which has two columns:
- supplierNumber, with data type INT, which must have a value
- supplierName, with data type TEXT

Now we've created two tables in database testdb. We can verify the schemas with `!sqlite3 testdb .schema`.

In [3]:
!sqlite3 testdb .schema

CREATE TABLE IF NOT EXISTS "myProductSupplier" (
"itemNumber" INTEGER,
  "price" REAL,
  "stockDate" TEXT,
  "description" TEXT,
  "supplierName" TEXT
);
CREATE TABLE mySuppliers (
    supplierNumber INT NOT NULL,
    supplierName TEXT
);
CREATE TABLE myProducts (
    itemNumber INT NOT NULL,
    price REAL,
    supplierNumber INT,
    stockDate TEXT,
    description TEXT
);
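
The text printed by `.schema` is the `CREATE` statement that SQLite stores for each object in its internal `sqlite_master` table. As a hedged sketch, the same information can be read with an ordinary query from Python; the in-memory database below recreates only the two tables defined in `create.sql`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mySuppliers (
        supplierNumber INT NOT NULL,
        supplierName TEXT
    );
    CREATE TABLE myProducts (
        itemNumber INT NOT NULL,
        price REAL,
        supplierNumber INT,
        stockDate TEXT,
        description TEXT
    );
""")

# Each table has one row in sqlite_master holding its name and its
# original CREATE TABLE text.
tables = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for name, _ in tables]

for name, sql in tables:
    print(name)
    print(sql)
```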


SQL syntax is strict. Notice that in the `CREATE TABLE` statement, column definitions are separated by `,`, but the last column definition **cannot** be followed by a `,`, or the statement will fail with a syntax error.
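
The effect of a stray trailing comma is easy to reproduce. The following sketch uses Python's `sqlite3` module and a throwaway table named `demo` (a hypothetical name) to capture the error that SQLite reports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The comma after the last column definition makes this statement invalid.
bad_sql = "CREATE TABLE demo (a INT, b TEXT,)"
try:
    conn.execute(bad_sql)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Without the trailing comma, the same statement succeeds.
conn.execute("CREATE TABLE demo (a INT, b TEXT)")
created = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'demo'"
).fetchone()[0]
```

Because the failed statement creates nothing, the corrected statement can be run immediately afterwards without dropping anything first.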

The two new tables are empty, so let's populate them with values. First we will write some SQL INSERT statements into a file, then use the `!sqlite3` command to execute the statements in the file. The SQL INSERT statement has the following format:

```sql
INSERT INTO table-Name
    [ (Simple-column-Name [ , Simple-column-Name]* ) ]
	  Expression
```

In [4]:
%%writefile insert.sql

-- Insert into mySuppliers

INSERT INTO mySuppliers(supplierNumber, supplierName)
VALUES (101, 'Luna Vista Limited'),
       (102, 'Mikal Arroyo Incorporated'),
       (103, 'Quiet Beach Industries') ;
    
-- Insert into myProducts    
INSERT INTO myProducts (itemNumber, price, supplierNumber, stockDate, description)
VALUES (1, 29.95, 101, '2015-02-10', 'Male bathing suit, blue'),
       (2, 49.95, 101, '2015-02-20', 'Female bathing suit, one piece, aqua'),
       (3, 9.95, 101, '2015-01-15', 'Child sand toy set'),
       (4, 24.95, 102, '2014-12-20', 'White beach towel'),
       (5, 32.95, 102,'2014-12-22', 'Blue-striped beach towel'),
       (6, 12.95, 103, '2015-03-12', 'Flip-flop'),
       (7, 34.95, 103, '2015-01-24', 'Open-toed sandal') ;
        


Overwriting insert.sql


In [5]:
!sqlite3 testdb < insert.sql
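
The column list in an INSERT statement is optional, and it does not have to name every column. The sketch below, written with Python's `sqlite3` module, shows what happens to columns that are left out. Item 8 and its description are hypothetical and are not part of the lesson's data; the `?` placeholders are the module's way of passing values separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myProducts ("
    "itemNumber INT NOT NULL, price REAL, supplierNumber INT, "
    "stockDate TEXT, description TEXT)"
)

with conn:
    # Full column list: every column receives a value.
    conn.execute(
        "INSERT INTO myProducts "
        "(itemNumber, price, supplierNumber, stockDate, description) "
        "VALUES (?, ?, ?, ?, ?)",
        (6, 12.95, 103, "2015-03-12", "Flip-flop"),
    )
    # Partial column list: the omitted columns are filled with NULL.
    conn.execute(
        "INSERT INTO myProducts (itemNumber, description) VALUES (?, ?)",
        (8, "Beach umbrella"),  # hypothetical row for illustration
    )

partial_row = conn.execute(
    "SELECT itemNumber, price, supplierNumber, stockDate, description "
    "FROM myProducts WHERE itemNumber = 8"
).fetchone()
print(partial_row)

# Omitting a NOT NULL column is rejected rather than silently accepted.
try:
    conn.execute("INSERT INTO myProducts (description) VALUES ('No number')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```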

### SQL Select

In the SQL programming language, the task of performing a query falls to the SELECT statement. To provide all the query functionality required by database applications, the SELECT statement's capabilities are extensive. Before looking at example SELECT statements, let's first look at the formal syntax of SELECT, which, as shown below, is actually simple. The basic format is `SELECT ... FROM ... WHERE;`, where you select the columns of interest from rows in a table or tables where certain conditions are satisfied. Of course, things can become considerably more complex. This section covers the basic features of SELECT and defers the more advanced issues to the official documentation.

```
SELECT [ DISTINCT | ALL ] SelectItem [ , SelectItem ]*
FROM clause
[ WHERE clause ]
[ GROUP BY clause ]
[ HAVING clause ]
```

From this you can see that a basic SELECT statement requires only a SELECT and a FROM; you must specify what data to select and indicate the location of the data of interest. Everything else is optional (as indicated by the square brackets). The DISTINCT and ALL keywords are optional qualifiers to indicate that either rows with unique values or all rows should be selected, respectively. By default, ALL is implicitly assumed, and you can use only one DISTINCT qualifier per SELECT statement.
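
The difference between the two qualifiers can be sketched with the supplier numbers from the `myProducts` data. The example uses Python's `sqlite3` module and keeps only two of the table's columns for brevity; the `ORDER BY` clause is an addition that is not part of the syntax shown above and is used only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myProducts (itemNumber INT NOT NULL, supplierNumber INT)")
conn.executemany(
    "INSERT INTO myProducts VALUES (?, ?)",
    [(1, 101), (2, 101), (3, 101), (4, 102), (5, 102), (6, 103), (7, 103)],
)

# ALL is the default, so these two queries return the same seven rows.
all_rows = conn.execute("SELECT ALL supplierNumber FROM myProducts").fetchall()
default_rows = conn.execute("SELECT supplierNumber FROM myProducts").fetchall()

# DISTINCT collapses duplicate values into a single row each.
distinct_rows = conn.execute(
    "SELECT DISTINCT supplierNumber FROM myProducts ORDER BY supplierNumber"
).fetchall()

print(len(all_rows), len(default_rows), distinct_rows)
```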

A SELECT statement can have multiple columns listed following the SELECT keyword. Multiple elements (or, more generally, column names) are separated by commas. For example, `SELECT a, b, c` selects the three columns `a`, `b`, and `c`. To select all columns from a table, you can use the asterisk character (`*`) as a shorthand for all columns. An important point to remember is that the result of any SELECT statement is a transient SQLite table, and you can use it in many of the same ways you use a more permanent table.

The FROM component of a SELECT statement indicates from which table (or multiple tables) the data will be extracted. For now, we will focus on selecting data from a single table; later we will cover table joins and selecting data from multiple tables. In this case, the fully qualified name of the table to query must follow the FROM keyword.

The rest of the SELECT statement is optional. Before you build your first query, however, let's review the order in which the SELECT statement components are evaluated:

1. FROM clause
2. WHERE clause
3. GROUP BY clause
4. HAVING clause
5. SELECT clause

When you break down the process SQLite follows when processing a query, this order is intuitive. First, you must locate the data to be analyzed, after which you filter out the rows of interest. The next steps are to group related rows and, finally, to select the actual columns of interest.
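
To make the evaluation order concrete, here is a sketch of one query that uses every clause, run through Python's `sqlite3` module against the prices and supplier numbers from the `myProducts` data. The comments number the clauses in logical evaluation order, not in the order they are written. The thresholds are arbitrary, and the trailing `ORDER BY` is an addition outside the syntax shown earlier, included only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myProducts (itemNumber INT NOT NULL, price REAL, supplierNumber INT)"
)
conn.executemany(
    "INSERT INTO myProducts VALUES (?, ?, ?)",
    [
        (1, 29.95, 101), (2, 49.95, 101), (3, 9.95, 101),
        (4, 24.95, 102), (5, 32.95, 102),
        (6, 12.95, 103), (7, 34.95, 103),
    ],
)

query = """
    SELECT supplierNumber, COUNT(*) AS itemCount  -- 5. choose output columns
    FROM myProducts                               -- 1. locate the data
    WHERE price > 20.00                           -- 2. keep rows over 20.00
    GROUP BY supplierNumber                       -- 3. group surviving rows
    HAVING COUNT(*) >= 2                          -- 4. keep groups of 2 or more
    ORDER BY supplierNumber
"""
result = conn.execute(query).fetchall()
print(result)
```

Supplier 103 drops out because only one of its items survives the WHERE filter, so its group fails the HAVING condition even though the table holds two items for that supplier.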

To demonstrate a SELECT statement, we can extract all the columns from the `myProducts` table by using the `sqlite3` tool and passing the SQL statement in as a command line argument.




In [6]:
!sqlite3 testdb "SELECT * FROM myProducts ;"

1|29.95|101|2015-02-10|Male bathing suit, blue
2|49.95|101|2015-02-20|Female bathing suit, one piece, aqua
3|9.95|101|2015-01-15|Child sand toy set
4|24.95|102|2014-12-20|White beach towel
5|32.95|102|2014-12-22|Blue-striped beach towel
6|12.95|103|2015-03-12|Flip-flop
7|34.95|103|2015-01-24|Open-toed sandal


In the previous Code cell, we used the asterisk character to select all columns from the myProducts table without listing them explicitly. This can be a useful shortcut, especially when you're developing database applications, but it isn't a recommended practice. By using the shortcut, you don't explicitly specify the database column names or their order. In a database application, if you always assume that the column names and their order in a table are fixed, you may end up with subtle bugs if someone else modifies the database tables on which your application depends. You should always explicitly name the database columns in your SELECT statements and list the order you require.

With that in mind, let's look at explicitly listing the columns to extract. This is a recommended practice that also allows us to control the order in which the columns are listed in the query output.

In [7]:
!sqlite3 testdb "SELECT price, itemNumber, description FROM myProducts ;"

29.95|1|Male bathing suit, blue
49.95|2|Female bathing suit, one piece, aqua
9.95|3|Child sand toy set
24.95|4|White beach towel
32.95|5|Blue-striped beach towel
12.95|6|Flip-flop
34.95|7|Open-toed sandal
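
In application code, one way to reduce the dependence on column position is to read result columns by name. The sketch below uses `sqlite3.Row`, a row type provided by Python's `sqlite3` module, on a cut-down version of the `myProducts` table; it is one possible approach, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows allow access by column name
conn.execute(
    "CREATE TABLE myProducts (itemNumber INT NOT NULL, price REAL, description TEXT)"
)
conn.execute("INSERT INTO myProducts VALUES (6, 12.95, 'Flip-flop')")

row = conn.execute(
    "SELECT price, itemNumber, description FROM myProducts"
).fetchone()

column_order = row.keys()  # the order requested in the SELECT list
by_position = row[0]       # depends on the order of the SELECT list
by_name = row["price"]     # keeps working if the SELECT list is reordered

print(column_order, by_position, by_name)
```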


Up to this point, we have only selected columns for all rows in a single table. This can be expensive in terms of query performance, especially if you only want a subset of the rows from a large table. A more efficient technique is to filter database rows by placing conditions in the WHERE clause, which is evaluated immediately after the tables are specified within the FROM clause. The rest of this section discusses some of the basic features that are enabled by using the WHERE clause, including the ability to select rows that satisfy Boolean conditions as well as join multiple tables to perform more complex queries.
The simplest and most common use of the WHERE clause is to filter the rows from a table before selecting any columns, as shown in the next two Code cells.


In [8]:
!sqlite3 testdb "SELECT itemNumber, price, supplierNumber, description FROM myProducts WHERE price > 30.00;"

2|49.95|101|Female bathing suit, one piece, aqua
5|32.95|102|Blue-striped beach towel
7|34.95|103|Open-toed sandal


In [9]:
!sqlite3 testdb "SELECT itemNumber, price, supplierNumber, description FROM myProducts WHERE description LIKE '%towel%';"

4|24.95|102|White beach towel
5|32.95|102|Blue-striped beach towel


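In the cell above, we use the SQL `LIKE` operator in the `WHERE` clause to search for a pattern in a column. The `%` wildcard matches any sequence of characters, so `%towel%` matches every string that contains `towel`. SQL keywords are case insensitive, and in SQLite the `LIKE` operator is also case insensitive for ASCII characters by default, so `LIKE '%towel%'` and `LIKE '%TOWEL%'` return the same result. Comparisons with `=`, by contrast, are case sensitive by default.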
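
SQLite's default handling of letter case can be checked directly. This small sketch, using Python's `sqlite3` module, evaluates a few expressions without needing any table; SQLite reports true as `1` and false as `0`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Pattern matching with LIKE ignores ASCII case by default in SQLite,
# while the = operator compares text exactly.
like_lower, like_upper, equals_upper = conn.execute(
    "SELECT 'White beach towel' LIKE '%towel%', "
    "'White beach towel' LIKE '%TOWEL%', "
    "'towel' = 'TOWEL'"
).fetchone()
print(like_lower, like_upper, equals_upper)

# Keywords themselves can be written in either case.
keyword_case_matches = (
    conn.execute("select 1").fetchone() == conn.execute("SELECT 1").fetchone()
)
```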

-----
### SQL Delete

To delete data in a SQLite database, you use the SQL DELETE statement, which can delete either all rows in a table or a specific subset of rows. The formal syntax for the SQL DELETE statement is remarkably simple:

```sql
DELETE FROM tableName
    [WHERE clause]
```

The DELETE statement deletes all rows from the specified table that satisfy an optional WHERE clause. If no WHERE clause is included, all rows in the table are deleted. To demonstrate this use of the DELETE statement, we can create a temporary table, insert several rows, and delete them all.

-----

In [10]:
%%writefile delete.sql

-- First create the temporary table
CREATE TABLE temp (aValue INT) ;

-- Insert fake data
INSERT INTO temp VALUES(0), (1), (2), (3) ;

-- Count rows in the table
SELECT COUNT(*) AS COUNT FROM temp ; 

-- Delete all rows
DELETE FROM temp ;

-- Count all rows in the table
SELECT COUNT(*) AS COUNT FROM temp ; 

-- Now drop the temporary table

DROP TABLE temp ;

Overwriting delete.sql


In [11]:
!sqlite3 testdb < delete.sql

4
0


-----

The previous example created a single-column temporary table to hold integer values. Next, we inserted four rows into the table and issued a SELECT statement to verify that the new table contained four rows. The unconstrained DELETE statement then deleted all four rows from the temporary table, as verified by the second SELECT statement, which reports that the temporary table contains zero rows. Finally, the DROP TABLE statement deletes the empty table from the schema.

In general, however, you don't want to delete all rows from a table; instead, you'll selectively delete rows. To do this, you create an appropriate WHERE clause that identifies all rows of interest. The syntax for the WHERE clause that you can use with a DELETE statement is identical to that discussed previously when we presented the full SQL SELECT statement syntax. The basic building blocks for constructing a Boolean expression within a WHERE clause, such as comparison operators and `LIKE`, were used in the SELECT examples above. The following example demonstrates using a WHERE clause in a DELETE statement, where we delete all rows that satisfy at least one of two conditions.

-----

In [12]:
%%writefile delete2.sql

SELECT '------Before Delete------';
-- First display data
SELECT itemNumber, description FROM myProducts ;

-- Selectively delete rows
DELETE FROM myProducts 
    WHERE description LIKE '%towel%' OR itemNumber <= 3 ;
SELECT '------After Delete------';
-- Confirm the proper deletion
SELECT itemNumber, description FROM myProducts ;

Overwriting delete2.sql


In [13]:
!sqlite3 testdb < delete2.sql

------Before Delete------
1|Male bathing suit, blue
2|Female bathing suit, one piece, aqua
3|Child sand toy set
4|White beach towel
5|Blue-striped beach towel
6|Flip-flop
7|Open-toed sandal
------After Delete------
6|Flip-flop
7|Open-toed sandal
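
Because a DELETE cannot be undone once it is committed, a common precaution is to run the same WHERE clause in a SELECT first and check how many rows it matches. The following sketch does this with Python's `sqlite3` module on a two-column version of the `myProducts` data; the cursor's `rowcount` attribute reports how many rows the DELETE removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myProducts (itemNumber INT NOT NULL, description TEXT)")
conn.executemany(
    "INSERT INTO myProducts VALUES (?, ?)",
    [
        (1, "Male bathing suit, blue"),
        (2, "Female bathing suit, one piece, aqua"),
        (3, "Child sand toy set"),
        (4, "White beach towel"),
        (5, "Blue-striped beach towel"),
        (6, "Flip-flop"),
        (7, "Open-toed sandal"),
    ],
)
conn.commit()

condition = "description LIKE ? OR itemNumber <= ?"
params = ("%towel%", 3)

# Preview: count the rows the WHERE clause matches before deleting anything.
to_delete = conn.execute(
    "SELECT COUNT(*) FROM myProducts WHERE " + condition, params
).fetchone()[0]

# Delete with the identical condition and confirm the number removed.
cursor = conn.execute("DELETE FROM myProducts WHERE " + condition, params)
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute(
    "SELECT itemNumber, description FROM myProducts ORDER BY itemNumber"
).fetchall()
print(to_delete, deleted, remaining)
```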


-----
### SQL Update

The last SQL task we need to address is updating specific column values for selected rows in a table. At some level, the SQL UPDATE statement is the union of the SQL INSERT and DELETE statements, because you must select rows to modify as well as specify how to modify them. Formally, the UPDATE statement syntax is straightforward: you specify the new column values for the set of rows to be updated:

```sql
UPDATE tableName
    SET columnName = Value
    [ , columnName = Value ]*
    [WHERE clause]
```

As shown in this syntax, an SQL UPDATE statement must have at least one `columnName = Value` assignment in its SET clause. Additional assignments, separated by commas, and the WHERE clause are optional. If the WHERE clause isn't included, the UPDATE statement modifies the indicated columns for all rows in the table.

Issuing an UPDATE statement is fairly easy, as shown in the following Code cell, where we modify two columns of a single row.

-----

In [14]:
%%writefile update.sql
SELECT '------Before Update------';
-- Extract the test row
SELECT itemNumber, price, stockDate FROM myProducts WHERE itemNumber = 6 ;

-- Update the row
UPDATE myProducts SET price = price * 2, stockDate = '2019-04-01'  WHERE itemNumber = 6 ;
SELECT '------After Update------';
-- Show the new result
SELECT itemNumber, price, stockDate FROM myProducts WHERE itemNumber = 6 ;

Overwriting update.sql


In [15]:
!sqlite3 testdb < update.sql

------Before Update------
6|12.95|2015-03-12
------After Update------
6|25.9|2019-04-01
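
The same update can be sketched from Python with the `sqlite3` module. Only rows 6 and 7 of `myProducts` are loaded here, so that the effect of the WHERE clause is visible: `rowcount` reports a single modified row, and row 7 is left unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myProducts (itemNumber INT NOT NULL, price REAL, stockDate TEXT)"
)
conn.executemany(
    "INSERT INTO myProducts VALUES (?, ?, ?)",
    [(6, 12.95, "2015-03-12"), (7, 34.95, "2015-01-24")],
)
conn.commit()

# Update two columns of the single row selected by the WHERE clause.
cursor = conn.execute(
    "UPDATE myProducts SET price = price * 2, stockDate = ? WHERE itemNumber = ?",
    ("2019-04-01", 6),
)
updated = cursor.rowcount
conn.commit()

after = conn.execute(
    "SELECT itemNumber, price, stockDate FROM myProducts ORDER BY itemNumber"
).fetchall()
print(updated, after)
```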


-----

## Ancillary Information

The following links are to additional documentation that you might find helpful in learning this material. Reading these web-accessible documents is completely optional. The last two sites also allow you to try out SQL commands online.

1. The [SQLite documentation][23] provides more details on the commands presented in this notebook.
2. [W3Schools SQL][1], a general SQL demo site
3. [SQLZoo][2], which allows you to specify the relational database to target

-----

[23]: https://www.sqlite.org/lang.html
[1]: http://www.w3schools.com/SQL/
[2]: http://sqlzoo.net/wiki/SELECT_basics

**&copy; 2019: Gies College of Business at the University of Illinois.**

This notebook is released under the [Creative Commons license CC BY-NC-SA 4.0][ll]. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.

[ll]: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode