# Modifying data in existing tables

The previous activity covered a broad range of standard SQL syntax for querying data in existing tables. Now we're going to have a look at how we can set these tables up from scratch and how we can manipulate them in more advanced ways.

As a starting point, let's have a look at how we can modify data in an existing table.

There are three basic commands to manipulate a table's content:

* INSERT: adds new rows to a table;
* DELETE: removes the rows that match a condition;
* UPDATE: changes attribute values in existing rows.

We're going to look at each of them in turn using a small version of our Star table from the previous activity.

# The INSERT statement

The INSERT statement adds a row to an existing table using the following syntax:

INSERT INTO $\text{<tablename>}$ $\text{(<attributes>)}$ VALUES $(...)$;

The list of attributes is optional. When you give it, it defines which columns the values go into and in what order. The data goes into the VALUES $(...)$ clause. For example, let's add a new star to our Star table:

In [None]:
INSERT INTO Star (kepler_id, t_eff, radius)
VALUES (2713050, 5000, 0.956);

Have a look at the table before and after the INSERT command using SELECT * FROM Star.

You can insert multiple rows at once using multiple tuples of values:

In [None]:
INSERT INTO Star (kepler_id, t_eff, radius) VALUES
  (2713050, 5000, 0.956),
  (2713051, 3100, 1.321);

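Although the list of attributes is not strictly required, including it is good practice: if the order of the table's columns changes, an INSERT that names its attributes still works unchanged. Naming the attributes also lets you list the values in any order, so the following command has the same effect as the first INSERT. Note that these examples all use kepler_id 2713050; if kepler_id is the table's primary key, each INSERT after the first fails with a duplicate-key error unless you delete the earlier row first: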

In [None]:
INSERT INTO Star (kepler_id, radius, t_eff)
VALUES (2713050, 0.956, 5000);
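Any column left out of the attribute list is filled with its default value, which is NULL unless the table declares a DEFAULT. A minimal sketch, assuming PostgreSQL and a hypothetical temporary table **star_insert_demo**, so that the real Star table is left alone:

```sql
-- Hypothetical scratch table; TEMP tables are dropped at the end of the session
CREATE TEMP TABLE star_insert_demo (
  kepler_id INTEGER,
  t_eff INTEGER,
  radius FLOAT
);

-- t_eff is not in the attribute list, so it is filled with NULL
INSERT INTO star_insert_demo (kepler_id, radius)
VALUES (2713050, 0.956);

-- The returned row has an empty (NULL) t_eff
SELECT * FROM star_insert_demo;
```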

# The DELETE statement

The DELETE statement deletes zero or more rows in the selected table. The rows are selected using a WHERE clause:

DELETE FROM $\text{<tablename>}$ WHERE $\text{<condition>}$;

To delete the planets in the Planet table with a **kepler_id** of 2713049, we put this condition in the WHERE clause:

In [None]:
SELECT * FROM Planet
WHERE kepler_id = 2713049;

DELETE FROM Planet
WHERE kepler_id = 2713049;

SELECT * FROM Planet
WHERE kepler_id = 2713049;

### Query Results

**Before Deleting the Entry**

| kepler_id | koi_name  | kepler_name |  status   |   period    | radius | t_eq |
|-----------|-----------|-------------|-----------|-------------|--------|------|
|   2713049 | K00794.01 | Kepler-683b | CONFIRMED | 2.539183179 |   1.97 | 1348 |

**After Deleting the Entry**

| kepler_id | koi_name | kepler_name | status | period | radius | t_eq |
|-----------|----------|-------------|--------|--------|--------|------|

(0 rows)


If you leave out the WHERE clause, the DELETE statement deletes every row in the table:

In [None]:
DELETE FROM Planet;

SELECT * FROM Planet;

### Query Results

**After Deleting 95 Entries**

| kepler_id | koi_name | kepler_name | status | period | radius | t_eq |
|-----------|----------|-------------|--------|--------|--------|------|

(0 rows)


# 🛑**DANGER!**
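One way to guard against this is to run destructive statements inside a transaction and only commit once the result looks right. A sketch, assuming PostgreSQL and a hypothetical temporary table **planet_tx_demo** holding two planets from this activity:

```sql
-- Hypothetical scratch table with two planets from this activity
CREATE TEMP TABLE planet_tx_demo (
  kepler_id INTEGER,
  koi_name VARCHAR(15)
);
INSERT INTO planet_tx_demo VALUES
  (2713049, 'K00794.01'),
  (10187017, 'K00082.05');

BEGIN;                                -- start an explicit transaction
DELETE FROM planet_tx_demo;           -- no WHERE clause: every row is deleted
SELECT count(*) FROM planet_tx_demo;  -- 0 inside the transaction
ROLLBACK;                             -- undo everything since BEGIN

SELECT count(*) FROM planet_tx_demo;  -- 2 again
```

Replacing ROLLBACK with COMMIT keeps the deletion. By default psql commits each statement immediately unless it is inside a BEGIN block, so without BEGIN there is nothing to roll back.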

# The UPDATE statement

The UPDATE statement allows us to modify one or more attributes of existing rows using the following syntax:

UPDATE $\text{<tablename>}$

SET $\text{<attribute>}$ = $\text{<value>}$, ...

WHERE $\text{<condition>}$;

The SET clause lists the attributes we want to change together with their new values, and the WHERE condition selects the rows in which the changes are made.
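SET can change several attributes in one statement, and a new value may be an expression that uses the row's current values. A sketch, assuming PostgreSQL and a hypothetical temporary table **star_update_demo** holding two stars from this activity; the new radius is an arbitrary illustrative value:

```sql
CREATE TEMP TABLE star_update_demo (
  kepler_id INTEGER,
  t_eff INTEGER,
  radius FLOAT
);
INSERT INTO star_update_demo VALUES
  (2713049, 5996, 0.956),
  (10341777, 6302, 0.815);

-- Two attributes at once; t_eff is computed from its current value (5996 + 4)
UPDATE star_update_demo
SET t_eff = t_eff + 4,
    radius = 0.96
WHERE kepler_id = 2713049;

-- 2713049 now has t_eff 6000 and radius 0.96; 10341777 is unchanged
SELECT * FROM star_update_demo;
```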

For example, let's suppose that we want to change the temperature of the star with **kepler_id** = 2713049:

In [None]:
SELECT * FROM Star
WHERE kepler_id = 2713049;

UPDATE Star
SET t_eff = 6000
WHERE kepler_id = 2713049;

SELECT * FROM Star
WHERE kepler_id = 2713049;

### Query Results

**Before the Update**

| kepler_id | t_eff | radius |
|-----------|-------|--------|
|   2713049 |  5996 |  0.956 |

**After the Update**

| kepler_id | t_eff | radius |
|-----------|-------|--------|
|   2713049 |  6000 |  0.956 |


Again, make sure that you always include the condition in an UPDATE command. Here's what happens when you don't:

In [None]:
UPDATE Star
SET t_eff = 6000;

SELECT * FROM Star;

### Query Results

**Update of 66 Rows**

| kepler_id | t_eff | radius |
|-----------|-------|--------|
|   2713049 |  6000 |  0.956 |
|   3114167 |  6000 |  0.677 |
|   3115833 |  6000 |  0.847 |
|   3246984 |  6000 |  0.973 |
|   3342970 |  6000 |  1.064 |
|   3351888 |  6000 |  1.057 |
|   3453214 |  6000 |   0.77 |
|   3641726 |  6000 |   0.82 |
|   3832474 |  6000 |  0.867 |
|   3935914 |  6000 |  0.893 |
|   3940418 |  6000 |  0.807 |
|   4049131 |  6000 |  0.761 |
|   4139816 |  6000 |   0.48 |
|   4275191 |  6000 |  0.781 |
|   4476123 |  6000 |  0.751 |
|   5358241 |  6000 |  0.945 |
|   5358624 |  6000 |  0.788 |
|   5456651 |  6000 |  0.734 |
|   6862328 |  6000 |  0.871 |
|   6922244 |  6000 |  1.451 |
|   8395660 |  6000 |  1.029 |
|   9579641 |  6000 |  1.332 |
|  10187017 |  6000 |  0.755 |
|  10480982 |  6000 |  0.947 |
|  10526549 |  6000 |  0.696 |
|  10583066 |  6000 |  0.693 |
|  10601284 |  6000 |  0.806 |
|  10662202 |  6000 |  0.527 |
|  10666592 |  6000 |  1.991 |
|  10682541 |  6000 |  0.847 |
|  10797460 |  6000 |   1.04 |
|  10811496 |  6000 |  0.868 |
|  10848459 |  6000 |  0.803 |
|  10854555 |  6000 |  1.046 |
|  10872983 |  6000 |  0.972 |
|  10875245 |  6000 |  1.411 |
|  10910878 |  6000 |  0.742 |
|  10984090 |  6000 |  1.073 |
|  10987985 |  6000 |  0.826 |
|  11018648 |  6000 |  0.796 |
|  11138155 |  6000 |  1.025 |
|  11153539 |  6000 |  0.969 |
|  11304958 |  6000 |  1.046 |
|  11391957 |  6000 |  0.782 |
|  11403044 |  6000 |  1.103 |
|  11414511 |  6000 |  0.965 |
|  11460018 |  6000 |  0.831 |
|  11465813 |  6000 |  0.983 |
|  11493732 |  6000 |  1.091 |
|  11507101 |  6000 |  0.971 |
|  11754553 |  6000 |   0.54 |
|  11812062 |  6000 |  0.812 |
|  11818800 |  6000 |  0.781 |
|  11853255 |  6000 |   0.45 |
|  11904151 |  6000 |  1.056 |
|  11918099 |  6000 |  0.727 |
|  11923270 |  6000 |   0.49 |
|  11960862 |  6000 |  0.989 |
|  12020329 |  6000 |  0.867 |
|  12066335 |  6000 |   0.48 |
|  12070811 |  6000 |  0.752 |
|  12110942 |  6000 |  0.917 |
|  12366084 |  6000 |  0.931 |
|  12404086 |  6000 |  0.775 |
|  12470844 |  6000 |  0.788 |
|  12644822 |  6000 |  0.919 |


# **Question: Adding Stars**

As a warm-up, add the following stars to the existing Star table:

| kepler_id | t_eff | radius |
|-----------|-------|--------|
| 7115384   | 3789  | 27.384 |
| 8106973   | 5810  | 0.811  |
| 9391817   | 6200  | 0.958  |


Have a look at the table before and after the insertion to make sure everything went fine. The automarker will run a SELECT * FROM Star query to check the result.

# ⌛Solution:

In [None]:
INSERT INTO Star (kepler_id, t_eff, radius) VALUES
(7115384, 3789, 27.384),
(8106973, 5810, 0.811),
(9391817, 6200, 0.958);

# **Question: A messed up table**

Your task is to fix a **Planet** table that has been corrupted.

There are two problems: some of the rows were copied with a negative radius, and the unconfirmed planets were given fake Kepler names. To fix this, you'll have to:

* update the Kepler names of planets which don't have a confirmed status (replace the **kepler_name** with NULL);
* delete rows where the radius is negative.

Have a look at the table to find out what needs to be changed.

# ⌛Solution:

In [None]:
UPDATE Planet
SET kepler_name = NULL
WHERE status != 'CONFIRMED';

DELETE FROM Planet
WHERE radius < 0;

# Creating tables

After looking at how to modify data in existing tables, let's have a look at how to set up new tables from scratch.

For this, we're going to use the CREATE TABLE statement which adds a new table to a database using the following syntax:

CREATE TABLE $\text{<tablename>}$ (

  $\text{<attribute1>}$  $\text{<type1>}$ (size1) $\text{<constraint1>}$,

  $\text{<attribute2>}$  $\text{<type2>}$ (size2) $\text{<constraint2>}$,

  $...$

);

Each attribute must have a data type, optionally with a size argument. Attributes may also have constraints, which establish further rules that the data in the table has to fulfill.

# Data types

SQL uses standard data types that may have size arguments, specifying e.g. the length of a string. A few of the most commonly used data types are listed below:

| Data Type  | Description                            |
|------------|----------------------------------------|
| SMALLINT   | Signed two-byte integer               |
| INTEGER    | Signed four-byte integer              |
| FLOAT      | Eight-byte floating-point number      |
| CHAR(n)    | Fixed-length string with n characters |
| VARCHAR(n) | Variable-length string with maximum n characters |

With these data types, we can now set up a table from scratch. Let's have a look at how our Star table could be created:

In [None]:
CREATE TABLE Star (
  kepler_id INTEGER,
  t_eff INTEGER,
  radius FLOAT
);

INSERT INTO Star VALUES
  (10341777, 6302, 0.815);

The last statement inserts a star into the new table. Take a look at the resulting table using a SELECT statement.
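Size arguments are enforced when data is stored. A sketch, assuming PostgreSQL and a hypothetical temporary table **name_demo**: an explicit cast to VARCHAR(5) silently shortens a string, but inserting an over-long string is an error. The DO block is PostgreSQL's way of running a small procedural snippet; here it only catches the error so the example runs to completion.

```sql
CREATE TEMP TABLE name_demo (kepler_name VARCHAR(5));

-- An explicit cast cuts the string down to 5 characters: 'Keple'
SELECT 'Kepler-102 b'::VARCHAR(5);

-- Storing a value that is too long raises an error instead
-- (SQLSTATE 22001, string_data_right_truncation), caught here
DO $$
BEGIN
  INSERT INTO name_demo VALUES ('Kepler-102 b');
  RAISE NOTICE 'insert unexpectedly succeeded';
EXCEPTION
  WHEN string_data_right_truncation THEN
    RAISE NOTICE 'rejected: %', SQLERRM;
END $$;

SELECT count(*) FROM name_demo;   -- 0: the failed insert left no row
```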

# Consistency constraints

In addition to its data type, each attribute may be constrained to only hold data of a specified form. For this, SQL uses so-called consistency constraints, which are specified right after the data type in the CREATE TABLE command.

Constraints enforce rules for data that is added to the table. We have already encountered the NOT NULL constraint when we looked at a table's structure using the psql \d command. This constraint forbids the attribute from being NULL. The most common constraints are:

| Constraint Type | Description                                                   |
|-----------------|---------------------------------------------------------------|
| NOT NULL        | Value cannot be NULL                                          |
| UNIQUE          | Value must be unique in the table                             |
| DEFAULT         | Specifies a value to use when none is given                   |
| CHECK           | Ensures that the value meets a specific condition             |
| PRIMARY KEY     | Combination of NOT NULL and UNIQUE; at most one per table     |
| FOREIGN KEY     | Ensures the value matches a value of the referenced attribute in another table |


With these constraints, we can now, for example, set up the Star table like this:

In [None]:
CREATE TABLE Star (
  kepler_id INTEGER PRIMARY KEY,
  t_eff INTEGER CHECK (t_eff > 3000),
  radius FLOAT
);

INSERT INTO Star VALUES
  (10341777, 6302, 0.815);

Now **kepler_id** is a unique attribute that must always be filled in, and any star added to the table must have a temperature above 3000 K.

Try inserting a row with an existing **kepler_id**, or with a **t_eff** of 3000 or less.
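The Star example does not use DEFAULT or UNIQUE. A sketch of both, assuming PostgreSQL and a hypothetical temporary table **planet_default_demo**. The default is used only when a column is left out of the INSERT, not when NULL is given explicitly:

```sql
CREATE TEMP TABLE planet_default_demo (
  koi_name VARCHAR(15) NOT NULL UNIQUE,
  status VARCHAR(20) DEFAULT 'CANDIDATE',   -- used when no status is given
  radius FLOAT CHECK (radius > 0)
);

-- status is left out, so the default 'CANDIDATE' is stored
INSERT INTO planet_default_demo (koi_name, radius)
VALUES ('K00865.01', 119.021);

-- status is given explicitly as NULL, so NULL is stored
INSERT INTO planet_default_demo (koi_name, status, radius)
VALUES ('K00082.05', NULL, 5.286);

SELECT koi_name, status FROM planet_default_demo;
```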

# Violating data types and constraints

What happens if we insert data into a table with the wrong type or violating consistency constraints?

In most cases, inserting the wrong data type should result in an error message and an aborted transaction. However, in some cases, values may be cast into the type required by the table attribute, as it happens in the following example:

In [None]:
CREATE TABLE Star (
  kepler_id INTEGER
);

INSERT INTO Star VALUES (3.141);
SELECT * FROM Star;

INSERT INTO Star VALUES ('a string');
SELECT * FROM Star;

Here the float is silently rounded to the integer 3, with no error message. Trying to insert a string, however, fails with an error and the transaction is aborted.
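The conversion to INTEGER rounds to the nearest integer rather than truncating, which a value like 3.7 makes visible. A sketch, assuming PostgreSQL and a hypothetical temporary table **cast_demo**:

```sql
CREATE TEMP TABLE cast_demo (kepler_id INTEGER);

-- 3.7 is rounded to 4 when it is stored, not truncated to 3
INSERT INTO cast_demo VALUES (3.7);
SELECT * FROM cast_demo;      -- 4

-- If truncation is what you want, ask for it explicitly with trunc()
SELECT trunc(3.7)::INTEGER;   -- 3
```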

Consistency constraints, on the other hand, are always enforced: an INSERT or UPDATE that would violate one fails.

In [None]:
CREATE TABLE Star (
  kepler_id INTEGER CHECK(kepler_id > 10)
);

INSERT INTO Star VALUES (3);
SELECT * FROM Star;

This insertion fails because we're trying to insert an ID of 3, while the consistency constraint requires a value greater than 10.
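The same check is applied when an UPDATE changes existing data. A sketch, assuming PostgreSQL and a hypothetical temporary table **check_demo**; the DO block catches the error (SQLSTATE 23514, check_violation) so that the example runs through:

```sql
CREATE TEMP TABLE check_demo (kepler_id INTEGER CHECK (kepler_id > 10));
INSERT INTO check_demo VALUES (2713049);   -- satisfies the constraint

-- Setting kepler_id to 3 would violate the CHECK, so the UPDATE is rejected
DO $$
BEGIN
  UPDATE check_demo SET kepler_id = 3;
  RAISE NOTICE 'update unexpectedly succeeded';
EXCEPTION
  WHEN check_violation THEN
    RAISE NOTICE 'rejected: %', SQLERRM;
END $$;

SELECT * FROM check_demo;   -- still 2713049
```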

# Make your own table

Now you're ready to create your first table. Your task is to set up a new Planet table and fill it with the planets listed below.

Your table should consist of the following attributes in this order:

* **kepler_id**, as type INTEGER
* **koi_name**, as type VARCHAR(15)
* **kepler_name**, as type VARCHAR(15)
* **status**, as type VARCHAR(20)
* **radius**, as type FLOAT

Add the following constraints:

* make all attributes NOT NULL except **kepler_name**;
* make **koi_name** a unique attribute.

Finally, insert the following planets into your new table:

| Kepler ID | KOI Name   | Kepler Name  | Status     | Radius   |
|-----------|------------|--------------|------------|----------|
| 6862328   | K00865.01  |              | CANDIDATE  | 119.021  |
| 10187017  | K00082.05  | Kepler-102 b | CONFIRMED  | 5.286    |
| 10187017  | K00082.04  | Kepler-102 c | CONFIRMED  | 7.071    |


Don't forget to use NULL for the missing planet name!

# ⌛Solution:

In [None]:
CREATE TABLE Planet (
    kepler_id INTEGER NOT NULL,
    koi_name VARCHAR(15) NOT NULL UNIQUE,
    kepler_name VARCHAR(15),
    status VARCHAR(20) NOT NULL,
    radius FLOAT NOT NULL
);

INSERT INTO Planet (kepler_id, koi_name, kepler_name, status, radius) VALUES
(6862328, 'K00865.01', NULL, 'CANDIDATE', 119.021),
(10187017, 'K00082.05', 'Kepler-102 b', 'CONFIRMED', 5.286),
(10187017, 'K00082.04', 'Kepler-102 c', 'CONFIRMED', 7.071);

# Primary key constraints

Primary and foreign keys are very important constraints in relational databases. We're going to have a closer look at how they work and motivate why you'd want to use these constraints for your tables.

As we've seen previously, a primary key is a unique identifier of a row and cannot take NULL values. We can create a primary key by adding the constraint to an attribute in our table like this:

In [None]:
CREATE TABLE Star (
  kepler_id INTEGER PRIMARY KEY
);

A table can have at most one primary key, which may consist of one or more attributes. A primary key enforces data integrity: it forbids duplicates, so each key value identifies exactly one row and determines the values of the other attributes in that row.

It also plays an important role when linking multiple tables together, as we will see in a moment.
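A primary key made of more than one attribute is declared as a separate entry in the CREATE TABLE list. A sketch, assuming PostgreSQL and a hypothetical temporary table **planet_pk_demo** whose rows are identified by the pair (kepler_id, koi_name): the same kepler_id may appear several times, as long as each pair is unique.

```sql
CREATE TEMP TABLE planet_pk_demo (
  kepler_id INTEGER,
  koi_name VARCHAR(15),
  radius FLOAT,
  PRIMARY KEY (kepler_id, koi_name)   -- the pair must be unique and non-NULL
);

-- Both planets orbit star 10187017; the pairs differ, so both rows are accepted
INSERT INTO planet_pk_demo VALUES
  (10187017, 'K00082.05', 5.286),
  (10187017, 'K00082.04', 7.071);

SELECT * FROM planet_pk_demo;
```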

# Foreign key constraints

A foreign key links data shared between two or more tables, thereby enforcing *referential integrity*. An attribute with a foreign key constraint is linked to an attribute in another table.

**It can only be filled with a value that exists in the other table (or with NULL, unless the attribute is also NOT NULL).**

For example, **kepler_id** is shared by both stars and planets in our tables and links planets to the stars they orbit.

We can use a foreign key constraint for **kepler_id** in the **Planet** table to link the planets to their corresponding stars. The syntax for creating a foreign key constraint is:

In [None]:
CREATE TABLE Star (
  kepler_id INTEGER PRIMARY KEY
);

CREATE TABLE Planet (
  kepler_id INTEGER REFERENCES Star (kepler_id)
);

INSERT INTO Star VALUES (10341777);
INSERT INTO Planet VALUES (10341777);

Try changing the **kepler_id** value for the **Planet** table and see what happens.

# ⏰ **Hint**:

A foreign key can only reference an attribute that is unique in the referenced table, i.e. one declared as PRIMARY KEY or UNIQUE. If you remove the PRIMARY KEY constraint from the Star table, creating the Planet table will fail.
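A sketch of what happens, assuming PostgreSQL and hypothetical temporary tables **star_fk_demo** and **planet_fk_demo**: a planet whose kepler_id matches no star is rejected, a NULL kepler_id is accepted, and a star that still has planets cannot be deleted. The DO blocks catch each error (SQLSTATE 23503, foreign_key_violation) so the example runs through:

```sql
CREATE TEMP TABLE star_fk_demo (kepler_id INTEGER PRIMARY KEY);
CREATE TEMP TABLE planet_fk_demo (
  kepler_id INTEGER REFERENCES star_fk_demo (kepler_id)
);

INSERT INTO star_fk_demo VALUES (10341777);
INSERT INTO planet_fk_demo VALUES (10341777);   -- OK: the star exists
INSERT INTO planet_fk_demo VALUES (NULL);       -- OK: NULL is not checked

-- No star has kepler_id 12345, so this insert is rejected
DO $$
BEGIN
  INSERT INTO planet_fk_demo VALUES (12345);
  RAISE NOTICE 'insert unexpectedly succeeded';
EXCEPTION
  WHEN foreign_key_violation THEN
    RAISE NOTICE 'rejected: %', SQLERRM;
END $$;

-- Deleting a star that still has planets is rejected for the same reason
DO $$
BEGIN
  DELETE FROM star_fk_demo WHERE kepler_id = 10341777;
  RAISE NOTICE 'delete unexpectedly succeeded';
EXCEPTION
  WHEN foreign_key_violation THEN
    RAISE NOTICE 'rejected: %', SQLERRM;
END $$;
```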

# Copying CSV data into tables

Earlier, we inserted data into a table using the INSERT statement:

INSERT INTO $\text{<tablename>}$ $\text{(attr1, attr2, ...)}$ VALUES
  $(...)$;

It is inconvenient to insert large datasets like this. Instead, we can read data from CSV files using the COPY statement:

COPY $\text{<tablename> (attr1, attr2, ...)}$ FROM 'filename' CSV;

Let's have a look at an example. We have a CSV file with star attributes in the following order (**kepler_id**, **t_eff**, **radius**):

| Kepler ID | T_eff | Radius |
|-----------|-------|--------|
| 10341777  | 6302  | 0.815  |
| 11296798  | 6335  | 3.523  |
| 3836450   | 5160  | 0.784  |
| 4483235   | 8782  | 1.965  |
| 6590362   | 5926  | 0.887  |

To create and fill the table, we use the following commands:

In [None]:
CREATE TABLE Star (
  kepler_id INTEGER PRIMARY KEY,
  t_eff INTEGER,
  radius FLOAT
);

COPY Star (kepler_id, t_eff, radius)
  FROM 'stars.csv' CSV;

SELECT * FROM Star;

### Query Results

**Table Creation and Data Insertion**

| kepler_id | t_eff | radius |
|-----------|-------|--------|
|  10341777 |  6302 |  0.815 |
|  11296798 |  6335 |  3.523 |
|   3836450 |  5160 |  0.784 |
|   4483235 |  8782 |  1.965 |
|   6590362 |  5926 |  0.887 |

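The COPY statement copies the contents of the CSV file into the table; the new rows are added to any rows already in it. Note that COPY reads the file on the database server; in psql, the \copy meta-command reads a file on the client machine instead.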

# **Question: DIY exoplanet archive**

You can now create the Star and Planet tables that we've been querying throughout these activities and fill the tables using the two CSV files which contain the star and planet data.

The tables should have the following columns, which appear in this order in the CSV files:

stars.csv:

* **kepler_id** as an integer and primary key;
* **t_eff** as an integer;
* **radius** as a float.

planets.csv:

* **kepler_id** as an integer and a foreign key referencing Star on kepler_id;
* **koi_name** as a variable char (max 20) and primary key;
* **kepler_name** as a variable char (max 20);
* **status** as a variable char (max 20);
* **period** as a float;
* **radius** as a float;
* **t_eq** as an integer.

Add the following other constraints to your tables:

* ensure **t_eff** and **radius** in Star cannot be NULL;
* ensure **status** in Planet cannot be NULL.

# ⌛Solution:

In [None]:
CREATE TABLE Star (
    kepler_id INTEGER PRIMARY KEY,
    t_eff INTEGER NOT NULL,
    radius FLOAT NOT NULL
);

CREATE TABLE Planet (
    kepler_id INTEGER,
    koi_name VARCHAR(20) PRIMARY KEY,
    kepler_name VARCHAR(20),
    status VARCHAR(20) NOT NULL,
    period FLOAT,
    radius FLOAT,
    t_eq INTEGER,
    FOREIGN KEY (kepler_id) REFERENCES Star(kepler_id)
);

COPY Star (kepler_id, t_eff, radius) FROM 'stars.csv' CSV;

COPY Planet (kepler_id, koi_name, kepler_name, status, period, radius, t_eq) FROM 'planets.csv' CSV;

# Modifying the structure of existing tables

Up to here we have covered the manipulation of a table's data, i.e. its rows, using the INSERT, DELETE and UPDATE statements. We have also looked at how to create new tables and how to define their attributes and properties.

But what can we do if we not only want to change the data, but also the attributes of the table, i.e. its columns, or their properties after we've created it?

For this, we can use the ALTER TABLE command, which allows us to add, delete and modify the columns in an existing table.

# Adding and removing columns

To add a new column to an existing table, we use the ADD COLUMN clause in the ALTER TABLE statement:

ALTER TABLE $\text{<tablename>}$

ADD COLUMN $\text{<columnname>}$ $\text{<datatype>}$ $\text{<modifiers>}$;

To delete a column, we use the DROP COLUMN clause:

ALTER TABLE $\text{<tablename>}$

DROP COLUMN $\text{<columnname>}$;

For example, let's add two columns to our **Star** table to hold each star's equatorial coordinates, right ascension (ra) and declination (decl), and then remove them again:

In [None]:
SELECT * FROM Star LIMIT 1;

ALTER TABLE Star
ADD COLUMN ra FLOAT,
ADD COLUMN decl FLOAT;

SELECT * FROM Star LIMIT 1;

ALTER TABLE Star
DROP COLUMN ra,
DROP COLUMN decl;

SELECT * FROM Star LIMIT 1;

### Query Results

**Before Altering the Table**

| kepler_id | t_eff | radius |
|-----------|-------|--------|
| 2713049   | 5996  | 0.956  |

(1 row)

**After Adding Columns**

ALTER TABLE

| kepler_id | t_eff | radius | ra | decl |
|-----------|-------|--------|----|------|
| 2713049   | 5996  | 0.956  |    |      |

(1 row)

**After Removing Columns**

ALTER TABLE

| kepler_id | t_eff | radius |
|-----------|-------|--------|
| 2713049   | 5996  | 0.956  |

(1 row)
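When a column is added to a table that already holds rows, those rows get the column's default value, which is NULL unless the column is declared with DEFAULT. A sketch, assuming PostgreSQL and a hypothetical temporary table **star_alter_demo**; the **catalogue** column is likewise made up for illustration:

```sql
CREATE TEMP TABLE star_alter_demo (
  kepler_id INTEGER,
  t_eff INTEGER,
  radius FLOAT
);
INSERT INTO star_alter_demo VALUES
  (2713049, 5996, 0.956),
  (10341777, 6302, 0.815);

-- No DEFAULT: both existing rows get NULL in ra
ALTER TABLE star_alter_demo ADD COLUMN ra FLOAT;

-- With DEFAULT: both existing rows are filled with 'Kepler'
ALTER TABLE star_alter_demo ADD COLUMN catalogue VARCHAR(10) DEFAULT 'Kepler';

SELECT * FROM star_alter_demo;
```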


# Modifying data types and constraints

We can also use the ALTER TABLE statement to modify the data type and constraints of a column:

ALTER TABLE $\text{<tablename>}$

ALTER COLUMN $\text{<columnname>}$ SET DATA TYPE $\text{<newtype>}$;

ALTER TABLE $\text{<tablename>}$

ADD CONSTRAINT $\text{<constraintname>}$ $\text{<newconstraint>}$;

When changing either the data type or a constraint, you have to be careful that the change does not conflict with the data already in the table; otherwise the ALTER TABLE statement fails.
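A type change works without further help only if PostgreSQL knows how to convert the old values automatically, as it does from INTEGER to FLOAT. For other conversions, such as text to numbers, PostgreSQL's ALTER COLUMN ... SET DATA TYPE accepts a USING clause that says how to compute each new value. A sketch with a hypothetical temporary table **star_type_demo** in which temperatures were stored as text:

```sql
CREATE TEMP TABLE star_type_demo (
  kepler_id INTEGER,
  t_eff VARCHAR(10)   -- temperatures stored as text by mistake
);
INSERT INTO star_type_demo VALUES
  (2713049, '5996'),
  (10341777, '6302');

-- Without USING this fails, because VARCHAR is not converted to INTEGER
-- automatically; USING tells PostgreSQL to cast each stored value
ALTER TABLE star_type_demo
  ALTER COLUMN t_eff SET DATA TYPE INTEGER USING t_eff::INTEGER;

SELECT * FROM star_type_demo;
```

If any stored string is not a valid integer, the whole ALTER TABLE fails and the column keeps its old type.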

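The query below changes the type of t_eff to float and adds a constraint requiring the radius to be positive. Note that the name after ADD CONSTRAINT is the name of the constraint itself; here it is called radius, the same as the column it checks: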

In [None]:
\d Star;

ALTER TABLE Star
 ALTER COLUMN t_eff SET DATA TYPE FLOAT;

ALTER TABLE Star
  ADD CONSTRAINT radius CHECK(radius > 0);

\d Star;

### Query Results

| Column    | Type             | Modifiers |
|-----------|------------------|-----------|
| kepler_id | integer          | not null  |
| t_eff     | integer          | not null  |
| radius    | double precision | not null  |

Indexes:
- "star_pkey" PRIMARY KEY, btree (kepler_id)

Referenced by:
- TABLE "planet" CONSTRAINT "planet_kepler_id_fkey" FOREIGN KEY (kepler_id) REFERENCES star(kepler_id)

**After Altering the Table**  

ALTER TABLE  

| Column    | Type             | Modifiers |
|-----------|------------------|-----------|
| kepler_id | integer          | not null  |
| t_eff     | double precision | not null  |
| radius    | double precision | not null  |

Indexes:
- "star_pkey" PRIMARY KEY, btree (kepler_id)

Check constraints:
- "radius" CHECK (radius > 0::double precision)

Referenced by:
- TABLE "planet" CONSTRAINT "planet_kepler_id_fkey" FOREIGN KEY (kepler_id) REFERENCES star(kepler_id)


# **Question: Star coordinates**

Your task is to add two columns to the existing Star table to hold the equatorial coordinates as RA and declination angles and then fill the new columns with data. Call the attributes **ra** and **decl**.

The full star data, including the equatorial coordinates, is stored in stars_full.csv. The attributes in this file are ordered as follows:
(**kepler_id**, **t_eff**, **radius**, **ra**, **decl**)

# **Hint:**

To update the data in the table to fill in values for the new attributes, use the DELETE statement to empty the table and then copy the full CSV file in.

In [None]:
-- Add the new columns for RA and Declination
ALTER TABLE Star
ADD COLUMN ra FLOAT,
ADD COLUMN decl FLOAT;

-- Delete all existing rows from the Star table
DELETE FROM Star;

-- Populate the Star table with the new data from stars_full.csv
COPY Star (kepler_id, t_eff, radius, ra, decl)
FROM 'stars_full.csv'
CSV;
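Emptying Star with DELETE works only while no other table references its rows: if Planet has a foreign key on **kepler_id** and still holds planets of these stars, the DELETE is rejected. An alternative that keeps the existing rows is to load the full file into a staging table and copy the new columns across with PostgreSQL's UPDATE ... FROM. A sketch under those assumptions; the table names are hypothetical, and the coordinates are placeholder numbers standing in for the contents of stars_full.csv, not real data:

```sql
-- Hypothetical stand-in for the existing Star table after ADD COLUMN
CREATE TEMP TABLE star_fill_demo (
  kepler_id INTEGER PRIMARY KEY,
  t_eff INTEGER, radius FLOAT, ra FLOAT, decl FLOAT
);
INSERT INTO star_fill_demo (kepler_id, t_eff, radius) VALUES
  (10341777, 6302, 0.815),
  (11296798, 6335, 3.523);

-- Staging table in the CSV's column order; in practice you would fill it with
--   COPY stars_staging FROM 'stars_full.csv' CSV;
CREATE TEMP TABLE stars_staging (
  kepler_id INTEGER, t_eff INTEGER, radius FLOAT, ra FLOAT, decl FLOAT
);
INSERT INTO stars_staging VALUES   -- placeholder coordinates, not real data
  (10341777, 6302, 0.815, 1.0, 2.0),
  (11296798, 6335, 3.523, 3.0, 4.0);

-- Copy ra and decl across, matching rows on kepler_id
UPDATE star_fill_demo AS s
SET ra = st.ra,
    decl = st.decl
FROM stars_staging AS st
WHERE s.kepler_id = st.kepler_id;

SELECT * FROM star_fill_demo;
```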